The econometric discrete dependent variable multinomial Logit model
Eleftherios Giovanis
This paper examines consumers' preferences in the local furniture market of the Province of Serres. We apply a multinomial logit model to estimate the probability of buying furniture in the following four-month period. We also analyze the demographic characteristics and conclude that they play a major role among other factors. The questionnaire analyzed in this project is a subset of the original: the initial questionnaire contained too many questions, which would have made the analysis very long. We therefore concentrate on the most important factors that strongly influence consumers' choice decisions.
Introduction
According to the findings of the sector study carried out by ICAP (COMCENTER, 2007), the majority of furniture production units in Greece are small, usually family-run, and do not have automated production. Medium and large production units are estimated to hold about 30% of the market. The study concludes that Greek enterprises show declining export activity, while market share has shifted to super-markets and to importing enterprises operating through franchising. Another conclusion of the study is that furniture purchases and consumption are directly connected with disposable income. The problem that emerges is that an important part of the disposable income of Greek households is absorbed by loan repayment obligations, which changes the timing of the replacement of existing furniture.
Domestic furniture consumption followed an upward course during the period 1998-2006, with an average annual growth rate of 4.6%. In 2006 living room furniture is estimated to have accounted for 47.0% of the total domestic market, bedroom furniture for 27.0% and dining room furniture for 26.0% (COMCENTER, 2007).
Jonkers (2006), in a report conducted in collaboration with the CBI Market Survey, finds that one of the major threats to the Greek domestic furniture market is that the Greek economy is quite dependent on furniture imports, which compete mainly on low prices. This creates opportunities for developing-country exporters, because imports from these countries are increasing at a faster rate than imports from developed countries. According to Jonkers, the best opportunities are in living and dining room furniture, where domestic production is declining. In the same survey, imports increased by 80% in value between 2001 and 2005, while exports increased by only 14%. The major developing-country exporters are China with € 56 million, Turkey with € 28.8 million, Indonesia with € 16.4 million, Vietnam with € 10.2 million, India with € 4.9 million and Malaysia with € 4.2 million, followed by smaller suppliers such as Albania, Egypt and South Africa. As for furniture exports, the largest destination country is Cyprus, followed by Bulgaria, Germany and Romania.
The most important firm in Serres, and one of the most important in Greece, is “DROMEAS” ABEEA, which was established in 1979 and is located in the Industrial Area outside Serres, about 80 km northeast of Thessalonica. The firm's achievements include supplying 10,000 seats for the waiting area of Manila's airport and 4,000 seats for two airports in Egypt. Smaller projects bear its stamp in the UK, Saudi Arabia and Australia. The firm also undertook 40.0% of the furniture production that the Olympic Committee needed (Interwood, 2007). Other furniture firms and shops operating in the Prefecture of Serres are “BLACK RED WHITE”, “Fratzana”, “Kioutsoukis” and “ARREDO”, as well as shops such as “NEOSET”, “SATO” and others.
The main aim of this project is to present some of the most important Logit models that can be used in marketing survey research and to choose the best possible model. This choice is not unique: it depends on the kind of product or service, the questionnaire and sample design, the kind of market, the city or country, and the demographic characteristics of the place where a specific study is carried out.
Data
The data were obtained from a marketing survey carried out by telephone interview on 12-15 February 2008 by the firm “Analysis Center”. The sample consists of 387 households in the Prefecture of Serres, in the region of Macedonia, Greece. In the first stage the sample design was random; in the second stage the data were weighted by age and sex. Although the survey refers to households, we are also concerned with sex, because we would like to test hypotheses about differences in opinions and preferences between the two sexes. The weights are based on demographic data provided by the National Statistical Service of Greece. No urban weighting is necessary, because the survey covers the city of Serres and the capitals of the regional Municipalities, so we are concerned only with the urban population. Note that if the first-stage sample had not been random but stratified, for example by industries in a specific sector, by a particular age category of a particular sex, or by a specific geographical region, the weighted models would create problems such as understated standard errors and, consequently, erroneous interpretation of significance tests. The name of the firm that commissioned the survey cannot be disclosed for confidentiality reasons; our aim is simply to give a guide to different approaches to estimating Logit models and to interpreting the results.
Methodology
We first explain why we use the Logit rather than the Probit model. In most applications the two models are quite similar; the main difference is that the logistic distribution has slightly fatter tails, as Figure 1 shows. There is no compelling reason to choose one model over the other, and in practice many researchers prefer the Logit model because of its mathematical simplicity (Gujarati, 2004).
Figure 1 Probit and logit cumulative distributions
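The tail comparison between the two distributions can be checked numerically. The short sketch below is illustrative only: it rescales the logistic distribution to unit variance (an assumption made here so that the two curves are comparable) and prints both cumulative distributions at a few points. The function names are hypothetical and are not taken from the paper.

```python
import math

def logistic_cdf(x: float) -> float:
    """CDF of the standard logistic distribution (variance pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assumption: rescale the logistic to unit variance before comparing.
SCALE = math.pi / math.sqrt(3.0)

def unit_logistic_cdf(x: float) -> float:
    """Logistic CDF rescaled so that the distribution has variance 1."""
    return logistic_cdf(SCALE * x)

for z in (-4.0, -3.0, -2.0, 0.0):
    print(f"z={z:5.1f}  logit={unit_logistic_cdf(z):.6f}  probit={normal_cdf(z):.6f}")
```

Far from the centre the logistic curve assigns more probability than the normal curve, which is the "fatter tails" property mentioned above.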
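In our model we estimate the probability of buying furniture in the next four-month period based on the kind of furniture that consumers would generally prefer to buy, the criteria by which they choose a shop, how much money they intend to spend, and demographic data such as sex, age, income and profession. The multinomial logit model in its general theoretical form is:

$$
\begin{aligned}
L_i = {} & \alpha + b_1 C_1 + b_2 C_2 + b_3 C_3 + b_4 C_4 + b_5 C_5 + b_6 \mathit{Crit}_1 + b_7 \mathit{Crit}_2 \\
& + b_8 \mathit{Crit}_3 + b_9 \mathit{Crit}_4 + b_{10} \mathit{Mon}_2 + b_{11} \mathit{Mon}_3 + b_{12} \mathit{Price} + b_{13} \mathit{Variety} \\
& + b_{14} \mathit{loy} + b_{15} \mathit{Inf}_1 + b_{16} \mathit{Inf}_2 + b_{17} \mathit{Inf}_3 + b_{18} \mathit{Inf}_4 + b_{19} \mathit{Inf}_6 + b_{20} \mathit{Sex} \\
& + b_{21} \mathit{age} + b_{22} \mathit{id} + b_{23} \mathit{Inc}_1 + b_{24} \mathit{Inc}_2 + b_{25} \mathit{Inc}_3 + b_{26} \mathit{Inc}_4 + b_{27} \mathit{Pf}_1 \\
& + b_{28} \mathit{Pf}_2 + b_{29} \mathit{Pf}_3 + b_{30} \mathit{Pf}_4 + b_{31} \mathit{Pf}_5 + b_{32} \mathit{Pf}_6 + b_{33} \mathit{Pf}_8
\end{aligned}
$$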
where α is a constant. C1 is a dummy variable referring to question 1 (table 3.1, presented in section 3), with C1=1 for living rooms and C1=0 otherwise; C2, C3, C4 and C5 equal 1 for dining rooms, children's furniture, garden furniture, bedrooms and office furniture respectively, and zero otherwise. Crit is a dummy variable referring to question 2 (table 3.2), with Crit1=1 for price and Crit1=0 otherwise; Crit2, Crit3 and Crit4 equal 1 for quality, variety and trade name respectively, and zero otherwise. Mon2 is a dummy variable referring to question 3 (table 3.3), with Mon2=1 for 250-600 € and Mon2=0 otherwise, and Mon3=1 for ≥600 € and Mon3=0 otherwise. The variables “Price” and “Variety” are quantitative and refer to question 4 (table 3.4). Loy is a dummy variable referring to question 5 (table 3.5), with loy=1 for Serres and loy=0 otherwise. Inf1 is a dummy variable referring to question 6 (table 3.6), with Inf1=1 for TV and Inf1=0 otherwise, and similarly for the other Inf variables. Sex is a dummy variable with Sex=1 for male and Sex=0 for female. Age is a quantitative variable presented in table 3.10. The variable “id” is a dummy that equals 1 when the consumer lives in the Municipality of Serres and 0 when the consumer lives in the regional Municipalities of Serres Prefecture. We include this variable to examine whether consumers have homogeneous preferences across locations or whether there is heterogeneity among them. We could make the analysis more complicated by clustering the main geographic regions into groups, but we assume that preferences are homogeneous, because in question 2 the “location” criterion accounts for only 0.9% of answers and therefore does not play a crucial role in consumer choices. The Inc variables are income dummies, with Inc1=1 for income <500 € and Inc1=0 otherwise; the same procedure is followed for the other income variables Inc2, Inc3 and Inc4, which are presented in table 3.8. Finally, the Pf variables (table 3.8) are profession dummies, with Pf1=1 for employees in the rural sector and Pf1=0 otherwise; the same procedure is followed for the other employment variables Pf2, Pf3, Pf4, Pf5, Pf6 and Pf8. The dependent polytomous variable is Li, which refers to question 7 (table 3.7): Li=1 for those who answered YES, Li=2 for those who answered NO and Li=3 for those who answered MAY BE.
For a dependent variable with S categories, S-1 equations must be estimated, one for each category relative to the reference category. In multinomial logistic regression, one category of the dependent variable is chosen as the comparison category; here this category is Li=3. The probability is defined as

$$
\Pr(y_i = j) = \frac{\exp(X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{1}
$$
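and the log-likelihood function can be written as

$$
\ln L = \sum_{i} \left[ \sum_{j=1}^{J} \mathbf{1}(y_i = j)\, X_i \beta_j - \log\!\left( 1 + \sum_{k=1}^{J} \exp(X_i \beta_k) \right) \right] \tag{2}
$$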
where, for the ith individual, $y_i$ is the observed outcome (dependent variable) and $X_i$ is a vector of explanatory variables, categorical or not; j is a particular outcome and J refers to all outcomes except the base category. The unknown parameters $\beta_j$ are estimated by maximum likelihood (Bartels, Boztug & Muller, 1999). The explanatory variables in relation (1) carry no choice-specific subscript, because the cases are the same for each choice j. With this model we intend to explain how an unordered set of outcomes applies to the different individuals in our sample, which means that the probabilities of all these outcomes depend on the same characteristics (Davidson & MacKinnon, 1999). A simple estimation example is shown in the Results section. The multinomial Logit relies on the assumption of independence from irrelevant alternatives (IIA), which requires the disturbances to be independent and homoscedastic (Greene, 2002). Because the dependent variable has three outcomes, we take outcome 3 (MAY BE), the comparison category chosen above, as the base reference category and estimate equations for the other two outcomes. The probability for outcome 1 (YES) is
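$$
\Pr(y_i = 1) = \frac{\exp(X_i \beta_1)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{3}
$$

for outcome 2 (NO)

$$
\Pr(y_i = 2) = \frac{\exp(X_i \beta_2)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{4}
$$

and finally the probability for outcome 3 (MAY BE) is

$$
\Pr(y_i = 3) = \frac{1}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{5}
$$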
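As an illustration of relations (1) to (5), the sketch below computes the three outcome probabilities and the log-likelihood for a small made-up data set. The regressors, coefficient values and function names are hypothetical and are not estimates from the survey; the only structure assumed is that the last outcome (MAY BE) is the base category with coefficients fixed at zero.

```python
import numpy as np

def mnl_probabilities(X, B):
    """Multinomial logit probabilities with a base category.

    X: (n, K) matrix of regressors.
    B: (J, K) coefficients for the J non-base outcomes.
    Returns an (n, J + 1) matrix; the last column is the base outcome.
    """
    eta = X @ B.T                                   # linear predictors, (n, J)
    denom = 1.0 + np.exp(eta).sum(axis=1, keepdims=True)
    return np.hstack([np.exp(eta) / denom, 1.0 / denom])

def mnl_loglik(X, y, B):
    """Log-likelihood; y holds 0..J, where J indexes the base outcome."""
    P = mnl_probabilities(X, B)
    return float(np.log(P[np.arange(len(y)), y]).sum())

# Hypothetical example: intercept and age for two consumers.
X = np.array([[1.0, 30.0], [1.0, 45.0]])
B = np.array([[0.2, -0.01],    # coefficients for outcome 1 (YES)
              [0.5, -0.02]])   # coefficients for outcome 2 (NO)
P = mnl_probabilities(X, B)     # columns: YES, NO, MAY BE
print(P.round(3))
print(mnl_loglik(X, np.array([0, 2]), B))
```

Each row of `P` sums to one, and setting all coefficients to zero gives equal probabilities for the three outcomes.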
These expressions follow Davidson & MacKinnon (1999). A final matter to address is that from question 4 we took only the variables price and variety. The reason is that consumers seem to respond to several items in the same way, which means that, for example, price and quality may be treated as a single grouped variable. We thus reduce the number of variables in order to avoid multicollinearity. Because the variables of question 4 are in effect hierarchical, we group them by cluster analysis, using an agglomerative hierarchical method that begins with all variables separate, six in our case, each forming its own cluster. In the first step, the two variables closest together are joined. In the next step, either a third variable joins the first two, or two other variables join together into a different cluster. This process continues until all clusters are joined into one, but we choose to keep two groups, as this is more sensible for our data. First we must compute the similarity measures between the variables, which can be done with the commonly used correlation coefficient distance measure
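$$
r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}} \tag{6}
$$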
The objective of Ward's clustering method is to minimize the sum of squared deviations from the mean value (Žiberna et al, 2004)

$$
ESS = \sum_{i} \sum_{j} \sum_{k} \left( X_{ijk} - \bar{x}_{ik} \right)^2 \tag{7}
$$
The results of Ward's clustering method are presented in Figure 2, from which we conclude that the first group consists of price, quality, service and service after shopping, and the second group consists of variety and delivery. The next step is to take the average of each group to obtain the new variables.
Figure 2
Ward’s clustering method
Dendrogram with Ward linkage and absolute correlation coefficient distance. Horizontal axis (variables): price, quality, service, service after shopping, variety, delivery. Vertical axis (similarity): -3.70, 30.86, 65.43, 100.00.
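The clustering of variables described above can be sketched with SciPy. The data below are synthetic, generated so that four items share one latent factor and two items share another; they are not the survey answers. The distance 1 - |r| is an assumption about how the "absolute correlation coefficient distance" is defined, and SciPy's Ward linkage is strictly defined for Euclidean distances, so this is only an approximation of the procedure behind the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n = 387                                   # same size as the survey sample
f1, f2 = rng.normal(size=n), rng.normal(size=n)   # two latent factors (synthetic)
names = ["price", "quality", "service", "service_after", "variety", "delivery"]
data = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(4)]
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(2)]
)

# Distance between variables: one minus the absolute correlation (assumed form).
corr = np.corrcoef(data, rowvar=False)
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)

# Agglomerative clustering with Ward linkage, cut into two groups.
Z = linkage(squareform(dist, checks=False), method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(names, labels)))
```

With this synthetic structure the first four variables fall into one group and the last two into the other; each group can then be averaged to form a new variable.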
The second method is principal components. First we compute the covariance matrix of the six variables above. Then we find the eigenvalues of the covariance matrix, shown in table 2.1. There are two components with eigenvalues greater than unity. Table 2.2 presents the eigenvector of the first principal component, and we conclude again that price, service, quality and service after shopping can be combined into one variable, and variety and delivery into another.
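The principal components step can be sketched in the same spirit. The example uses synthetic data with two latent factors (not the survey answers), computes the eigenvalues of the covariance matrix and counts those greater than one. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 387
f1, f2 = rng.normal(size=n), rng.normal(size=n)   # two latent factors (synthetic)
data = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(4)]     # price-like items
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(2)]   # variety-like items
)

# Eigen-decomposition of the covariance matrix, largest eigenvalue first.
cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_retained = int((eigvals > 1.0).sum())     # components with eigenvalue > 1
first_loadings = np.abs(eigvecs[:, 0])      # loadings of the first component
print(eigvals.round(2), n_retained)
print(first_loadings.round(2))
```

In this synthetic case two eigenvalues exceed one, and the first eigenvector loads mainly on the first four variables, mirroring the grouping described above.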
The first method is the frequency-weighted multinomial logistic regression based on age. The survey was conducted on households, but age is a significant clustering variable, because there is no great age difference between the partners in a couple, and significant information can be derived from age. The 30-50 age category is the largest and most frequent, especially in the city. This category therefore has a greater weight than the 18-24 or 65-and-over categories, because couples aged 30-50 are more likely to buy furniture, for various reasons such as marriage, replacement because of deterioration or renovation, or purchases for children who will live in another house or another city for education, work or marriage. The probability is:
$$
\Pr(y_i = j) = \frac{\exp(W_i X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(W_i X_i \beta_k)} \tag{8.a.}
$$
while (8.a.) can be written as

$$
\Pr(y_i = j) = \frac{\exp\!\left( X_i \beta_j \, \dfrac{n - j}{m - j} \right)}{1 + \sum_{k=1}^{J} \exp\!\left( X_i \beta_k \, \dfrac{n - k}{m - k} \right)} \tag{8.b.}
$$
where n is the number of observations, j is the specific outcome, J denotes all outcomes except the base category, and m is the number of cases (Langholz & Goldstein, 2001). For example, if three persons aged 30 choose outcome 2 (NO), so that the number of cases m equals three, what is the probability given their answers to the questions and their demographic data?
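A rough sketch of frequency-weighted estimation is given below. It uses the standard frequency-weighted log-likelihood, in which each observation's log-probability is multiplied by its weight; this is an assumption made for illustration and is not necessarily the exact form of (8.a.) or (8.b.). The data, weights and names are synthetic and hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def mnl_probs(X, B):
    """Probabilities for J non-base outcomes plus the base outcome (last column)."""
    eta = X @ B.T
    denom = 1.0 + np.exp(eta).sum(axis=1, keepdims=True)
    return np.hstack([np.exp(eta) / denom, 1.0 / denom])

def weighted_loglik(X, y, w, B):
    """Frequency-weighted log-likelihood: weight times log-probability, summed."""
    P = mnl_probs(X, B)
    return float((w * np.log(P[np.arange(len(y)), y])).sum())

# Synthetic data: intercept plus two regressors, three outcomes, integer weights.
rng = np.random.default_rng(0)
n, K, J = 300, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
B_true = np.array([[0.5, 1.0, -0.5], [-0.3, 0.2, 0.8]])   # hypothetical values
u = rng.random(n)
y = np.minimum((u[:, None] > mnl_probs(X, B_true).cumsum(axis=1)).sum(axis=1), J)
w = rng.integers(1, 4, size=n)                            # frequency weights 1..3

def neg_ll(theta):
    """Objective for the optimizer: minus the weighted log-likelihood."""
    return -weighted_loglik(X, y, w, theta.reshape(J, K))

fit = minimize(neg_ll, np.zeros(J * K), method="BFGS")
B_hat = fit.x.reshape(J, K)
print(B_hat.round(2))
```

With integer weights this likelihood equals the unweighted likelihood of a data set in which each row is repeated as many times as its weight, which is one way to read the three-persons example above.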
The second model is the weighted robust multinomial Logit, in which we use the same weights as in the weighted multinomial Logit model. The problem with the previous model is that maximum likelihood estimation (MLE) and Rao's score test can be misleading under model misspecification caused by misclassification errors or by extreme data points, the well-known outliers, in the sample (Pia & Feser, 2000). Pregibon (1982) suggests some tools that remove data from the sample, but this procedure is iterative and leaves the analyst with a considerably reduced sample. By robust we mean the well-known Huber-White sandwich variance estimator. Probabilities are defined as in (8.a.). The Huber-White variance estimator is
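$$
V_E = \frac{1}{n} \left[ \hat{H}(\hat{\beta}_E) \right]^{-1} \hat{\Phi}(\hat{\beta}_E) \left[ \hat{H}(\hat{\beta}_E) \right]^{-1} \tag{9}
$$

where

$$
\hat{H}(\hat{\beta}_E) = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E \, \partial \hat{\beta}_E'} \tag{9.a.}
$$

$\hat{H}$ is the Hessian matrix and

$$
\hat{\Phi}(\hat{\beta}_E) = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\partial \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E} \right] \left[ \frac{\partial \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E} \right]' \tag{9.b.}
$$

while if $\hat{\beta}_E$ is the true MLE estimator then $V_E$ simplifies to $\{ -[H(\hat{\beta}_E)] \}^{-1}$ (Greene, 2002).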
Note that in the case we study these standard errors are robust to certain misspecifications of the distribution of the dependent variable, not to heteroscedasticity. We claim this because the assumption that the disturbances are independent and homoscedastic is confirmed by Hausman's test, which we analyze in the next part of the project.
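The sandwich formula in (9), (9.a.) and (9.b.) can be sketched for an unweighted multinomial logit using the analytic score and Hessian of the log-likelihood. The data and coefficients below are made up, and the variance is evaluated at an arbitrary coefficient matrix purely to show the mechanics; in practice it would be evaluated at the maximum likelihood estimate. The function names are hypothetical.

```python
import numpy as np

def mnl_probs(X, B):
    """Probabilities of the J non-base outcomes, shape (n, J)."""
    eta = X @ B.T
    denom = 1.0 + np.exp(eta).sum(axis=1, keepdims=True)
    return np.exp(eta) / denom

def sandwich_variance(X, y, B):
    """Huber-White variance (1/n) H^-1 Phi H^-1 for coefficients stacked by outcome."""
    n, K = X.shape
    J = B.shape[0]
    P = mnl_probs(X, B)
    D = np.zeros((n, J))                       # indicators of non-base outcomes
    mask = y < J
    D[np.arange(n)[mask], y[mask]] = 1.0
    H = np.zeros((J * K, J * K))
    Phi = np.zeros((J * K, J * K))
    for i in range(n):
        score = np.kron(D[i] - P[i], X[i])     # gradient of log g for observation i
        Phi += np.outer(score, score)
        H -= np.kron(np.diag(P[i]) - np.outer(P[i], P[i]), np.outer(X[i], X[i]))
    H /= n
    Phi /= n
    H_inv = np.linalg.inv(H)
    return H_inv @ Phi @ H_inv / n

# Synthetic illustration (not the survey data).
rng = np.random.default_rng(0)
n, K, J = 200, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.integers(0, J + 1, size=n)             # outcomes 0, 1 and base outcome 2
B = np.array([[0.2, -0.1, 0.3], [-0.4, 0.1, 0.0]])
V = sandwich_variance(X, y, B)
robust_se = np.sqrt(np.diag(V))
print(robust_se.round(4))
```

The square roots of the diagonal of `V` are the robust standard errors; when the model is correctly specified and the coefficients are the MLE, `V` is close to the inverse of the negative Hessian.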
The third method is the replication method with jackknife standard errors. The jackknife is a nonparametric technique for estimating the standard error of a statistic. The procedure systematically recomputes the estimate, leaving out one observation at a time from the sample. Thus, each subsample consists of n − 1 observations, formed by deleting a different observation from the sample. The jackknife estimator and its standard error are then calculated from these truncated subsamples (Greene, 2002). For example, suppose θ is the parameter of interest and let $\hat{\theta}_{(1)}, \hat{\theta}_{(2)}, \ldots, \hat{\theta}_{(n)}$ be estimates of θ based on n subsamples, each of size n − 1. The jackknife estimator of θ is given by (Wolter, 2007)
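$$
\hat{\theta}_J = \frac{\sum_{i=1}^{n} \hat{\theta}_{(i)}}{n} \tag{10}
$$

and the jackknife estimate of the standard error of $\hat{\theta}_J$ is

$$
\hat{\sigma}_{\hat{\theta}_J} = \left[ \frac{n - 1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \hat{\theta}_J \right)^2 \right]^{1/2} \tag{11}
$$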
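The t-statistic can be defined as

$$
t = \frac{\hat{\theta}_{(i)} - \hat{\theta}_J}{\left[ \dfrac{1}{n(n - 1)} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \hat{\theta}_J \right)^2 \right]^{1/2}} \tag{12}
$$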
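The leave-one-out procedure behind relations (10) and (11) is easy to sketch for a generic statistic. The example below applies it to the mean of a synthetic sample; the function name and the data are illustrative and are not part of the survey analysis, where the statistic would be a Logit coefficient.

```python
import numpy as np

def jackknife(sample, statistic):
    """Delete-one jackknife estimate and standard error of a statistic."""
    n = len(sample)
    # Recompute the statistic n times, each time leaving out one observation.
    leave_one_out = np.array(
        [statistic(np.delete(sample, i, axis=0)) for i in range(n)]
    )
    theta_j = leave_one_out.mean(axis=0)                              # relation (10)
    se = np.sqrt((n - 1) / n * ((leave_one_out - theta_j) ** 2).sum(axis=0))  # (11)
    return theta_j, se

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=50)     # synthetic sample
theta_j, se = jackknife(x, np.mean)
print(theta_j, se)
```

For the sample mean the jackknife standard error coincides with the usual s/√n, which gives a simple check that the formula is implemented correctly.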
Results
It should be noted that nothing equivalent is available in the literature against which we could compare our results. Marketing research firms deal with these matters, but their results are not publicly available. From the results presented in tables 1-3 in the appendix, we reject the simple weighted Logit model because of the large number of statistically insignificant variables, even though table 4 and the Hausman test indicate that the independence from irrelevant alternatives (IIA) hypothesis holds. We also reject the weighted Logit model with robust Huber-White standard errors because of the presence of heteroscedasticity and hence the violation of the IIA assumption. We therefore accept as the best estimation the weighted multinomial Logit with jackknife standard errors, which also satisfies the IIA assumption. To predict the probability that a consumer will buy, will not buy, or is not sure about buying in the next four-month period, we use the following probabilities.
$$
\Pr(y_i = 1) = \frac{\exp(L_1)}{1 + \sum_{T} \exp(L_T)}, \qquad
\Pr(y_i = 2) = \frac{\exp(L_2)}{1 + \sum_{T} \exp(L_T)}
$$

and

$$
\Pr(y_i = 3) = \frac{1}{1 + \sum_{T} \exp(L_T)}
$$
For example, suppose a consumer chose the answer Living rooms in question 1; her main criterion for buying from a furniture shop is price; she is female; she intends to spend 250-600 €; she rates all the characteristics of her previous shopping (price, quality and the others) with 5; she is 30 years old; she prefers Serres as the region for shopping; she prefers to be informed by leaflets; her income is 1001-1500 €; her profession is businessman; and she lives in the Municipality of Serres. Then by Table 3 in the appendix the probabilities for the multinomial
The next step is to apply a Monte-Carlo simulation to evaluate the performance and capability of the model presented. The expected coefficient value can be defined as (Janke, 2002)
$$
\bar{X} = \frac{1}{N} \sum_{i=1}^{N} f(X_i) \tag{13}
$$

where $\langle X \rangle$ is the expectation value and the estimator $\bar{X}$ is a random number fluctuating around the theoretical expected value. The variance is

$$
\sigma^2_{\bar{X}} = \langle \bar{X}^2 \rangle - \langle \bar{X} \rangle^2 \tag{14}
$$

from which we obtain the standard error $\sigma / \sqrt{N}$. The formula for the standard error is important, because the standard error of a Monte-Carlo simulation decreases with the square root of the sample size. For example, to halve the error we must quadruple the number of random draws. As we already know from relation (1)
$$
\pi_j \equiv \Pr(y_i = j) = \frac{\exp(X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{15}
$$
We can therefore draw a predicted value y from a multinomial distribution with parameters $\pi_j$ and n=1. We simulated the model with 500 sets of parameters and then used relations (13) and (14) to find the mean estimated parameters and their standard errors. We chose to simulate our estimates because our sample is finite, so the parameter estimates are never certain (Tomz et al, 2000) and may not be reliable and efficient. More specifically, the program draws simulations of the parameters from their asymptotic sampling distribution, with mean equal to the vector of estimated parameters and variance equal to the variance-covariance matrix of the estimates (Tomz et al, 2000). From the results in table 7 we conclude that our model is fairly good, because the coefficients estimated by Monte-Carlo simulation are very close to the estimated coefficients of the weighted multinomial Logit model with jackknife standard errors.
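The simulation step can be sketched as follows. The coefficient vector, covariance matrix and outcome probabilities are hypothetical placeholders, not values from table 7, and the use of a multivariate normal for the asymptotic sampling distribution is an assumption made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical estimates and variance-covariance matrix (not from the paper).
beta_hat = np.array([0.8, -1.2, 0.5])
V_hat = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])

# Draw 500 parameter sets from the assumed asymptotic sampling distribution.
draws = rng.multivariate_normal(beta_hat, V_hat, size=500)
sim_mean = draws.mean(axis=0)                  # relation (13) per coefficient
sim_se = draws.std(axis=0, ddof=1)             # square root of relation (14)
mc_error = sim_se / np.sqrt(len(draws))        # shrinks with the square root of N

# Draw one predicted outcome from a multinomial distribution with n = 1.
pi = np.array([0.2, 0.5, 0.3])                 # hypothetical Pr(YES), Pr(NO), Pr(MAY BE)
y_draw = rng.multinomial(1, pi)

print(sim_mean.round(3), sim_se.round(3), mc_error.round(4), y_draw)
```

The simulated means should lie close to `beta_hat`, and quadrupling the number of draws would halve `mc_error`, in line with the square-root rule stated above.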
Conclusions
We applied three different multinomial Logit models to the marketing survey conducted in the Prefecture of Serres for the furniture market. The aim of the research was to estimate the probability of buying furniture in the next four-month period, based on the questionnaire and the demographic characteristics of potential consumers. We found that the simple weighted multinomial Logit suffers from many statistically insignificant variables, quite possibly because of multicollinearity. On the other hand, the weighted multinomial Logit with Huber-White robust standard errors exhibits heteroscedasticity and violates the IIA hypothesis. We therefore preferred the weighted multinomial Logit with jackknife standard errors. We applied a simple Monte-Carlo simulation and concluded that the proposed model is quite a good option in our case. Other estimation approaches, such as Principal Components (PC) logit or the bootstrap, also perform well, but their estimates are quite similar to those of the model we propose here, so it is not necessary to present their results. These methods are nevertheless worth mentioning, because in other cases their estimates might be considerably better.
References
COMCENTER (2007), “The highly-fragmented furniture market in Greece”, I.C.A.P.
Bartels K., Boztug Y. & Muller M. (1999), “Testing the multinomial logit model”, working paper, University Potsdam, Humboldt-University at Berlin, Germany.
Davidson R. & MacKinnon J.G. (1999), “Econometric theory and methods”, Oxford University Press, New York, pp. 460-462.
Greene W.H. (2003), “Econometric Analysis”, Fifth edition, Prentice Hall, New Jersey, U.S.A., pp. 518-521, 724, 924.
Gujarati D. (2004), “Basic Econometrics”, Fourth edition, McGraw-Hill, U.S.A., pp. 614-615.
Interwood magazine (2007), “Dromeas presentation”, pp. 12-21.
Janke W. (2002), “Statistical Analysis of Simulations: Data Correlations and Error Estimation”, John von Neumann Institute for Computing, Julich, NIC Series, Vol. 10, pp. 423-445.
Jonkers J. (2006), “The domestic furniture market in Greece”, CBI MARKET SURVEY, Centre for the promotion of imports from developing countries, The Netherlands.
Langholz B. & Goldstein L. (2001), “Conditional logistic analysis of case-control studies with complex sampling”, Biostatistics, 2(1), 63-84.
Pia M. & Feser V. (2000), “Robust Logistic Regression for Binomial Responses”, working paper, University of Geneva.
Pregibon D. (1982), “Resistant fits for some commonly used logistic models with medical applications”, Biometrics, 38, 485-498.
Tomz M., Wittenberg J. & King G. (2000), “Making the Most of Statistical Analyses: Improving Interpretation and Presentation”, American Journal of Political Science, Vol. 44, pp. 341-355.
Wolter K.M. (2007), “Introduction to Variance Estimation”, Statistics for Social and Behavioural Sciences, Second edition, Springer, pp. 151-153.
Žiberna A., Kejžar N. & Golob P. (2004), “A Comparison of Different Approaches to Hierarchical Clustering of Ordinal Data”, Metodološki zvezki, 1(1), 57-73.
Note: Market = 3 is the base outcome; standard errors in parentheses; * denotes significance at the 5% level; z denotes z-statistics.
TABLE 2
Weighted multinomial Logit model with Huber-White robust standard errors
Market = 1 Coef. z Market = 1 Coef. z Market = 2 Coef. z Market = 2 Coef. z
C1 -24.1911*