
Multinomial Logit Models with Individual Heterogeneity in R: The gmnl Package

Mauricio Sarrias, Cornell University

Ricardo Daziano, Cornell University

Abstract

gmnl is a package for R that allows users to estimate multinomial logit models with unobserved heterogeneity across individuals for cross-sectional and panel data. The models supported by gmnl are MNL, MIXL, S-MNL, G-MNL, LC, and MM-MNL. This document is a general description of gmnl, and all functionalities are illustrated using real datasets.

Keywords: discrete choice models, logit model, simulated maximum likelihood, R, econometrics, random parameters, latent class model.

1. Introduction

Modeling individual choices has been an important research agenda in diverse fields such as marketing, transportation, political science, and environmental, health, and urban economics. In all these areas, the most widely used method for modeling choice among mutually exclusive alternatives has been the Conditional or Multinomial Logit model (MNL) (McFadden 1974), which belongs to the family of Random Utility Maximization (RUM) models. The main advantage of the MNL model has been its simplicity in terms of both estimation and interpretation. On the one hand, the MNL has a closed-form choice probability and a globally concave likelihood function, so MNL estimation is straightforward using the Maximum Likelihood Estimator (MLE). On the other hand, it has been recognized that the MNL not only imposes constant competition across alternatives (a consequence of the independence of irrelevant alternatives (IIA) property) but also lacks the flexibility to allow for individual-specific preferences.

With the advent of more powerful computers and the improvement of simulation-based procedures in recent decades, researchers are no longer constrained to use models with closed-form solutions that may lead to unrealistic behavioral specifications. In fact, much recent work focuses on extending the MNL to allow for unobserved preference heterogeneity.

One of the most popular MNL extensions is the Mixed Logit model (MIXL). MIXL allows coefficients to vary randomly over individuals by assuming some continuous heterogeneity distribution a priori, while keeping the assumption that the error term is independent and identically distributed (i.i.d.) extreme value type 1 (McFadden and Train 2000; Train 2009; Hensher and Greene 2003). MIXL is a very flexible model that can approximate any RUM model, and it does not exhibit the IIA property of the MNL.


Latent Class (LC) models offer an alternative to MIXL by replacing the continuous distribution assumption with a discrete distribution in which preference heterogeneity is captured by membership in distinct classes or segments (Boxall and Adamowicz 2002; Greene and Hensher 2003; Shen 2009). The standard LC specification is useful if the assumption of preference homogeneity holds within segments. In effect, all individuals in a given class have the same parameters (fixed parameters within a class), but the parameters vary across classes (heterogeneity across classes).

Bujosa, Riera, and Hicks (2010), and more recently Greene and Hensher (2013), have extended the LC model to allow for unobserved heterogeneity both within and across groups (LC-MIXL). The cross-group variation is modeled with the LC model, whereas the within-group variation is modeled as continuous variation of parameters within classes. An important characteristic of this model is that it nests both the MIXL and LC models in a double-mixture specification. This model is also known as MM-MNL.

Other researchers have focused on MIXL extensions that allow for more flexible individual-specific parameters. For example, Fiebig, Keane, Louviere, and Wasi (2010) proposed two new models, namely the Scale Heterogeneity (S-MNL) model and the Generalized Multinomial Logit (G-MNL) model. S-MNL extends the MNL by letting the scale of the errors vary across individuals (via a parametric specification of heteroskedasticity), whereas G-MNL nests the S-MNL, MIXL, and MNL models. For a discussion of confounding effects between scale and taste heterogeneity, see Hess and Rose (2012) and Hess and Stathopoulos (2013).

Although a variety of new discrete choice models accommodate individual preference heterogeneity, statistical software for estimating these models in practice is limited. The gmnl package is therefore intended to consolidate, in a single package, the whole range of discrete choice models with random parameters for use by researchers and practitioners.

The aim of this paper is to describe the gmnl package in R (R Development Core Team 2012), which can be used to fit the models briefly discussed above. The gmnl package complements other R packages that estimate MIXL models, such as mlogit (Croissant 2012) and RSGHB (Dumont, Keller, and Carpenter 2014).

The paper is organized as follows: Section 2 presents a brief overview of the models, and Section 3 discusses the functionalities of the package for each model.

2. Models

2.1. Mixed and Latent Class Logit Model

The MIXL generalizes the MNL model by allowing the preference parameters to be different for each person (McFadden and Train 2000; Train 2009). The utility of person $i$ for alternative $j$ on choice occasion $t$ is:

$$U_{ijt} = x_{ijt}'\beta_i + \varepsilon_{ijt}, \qquad i = 1, \dots, N;\; j = 1, \dots, J;\; t = 1, \dots, T_i, \tag{1}$$

where $x_{ijt}$ is a $K \times 1$ vector of observed alternative attributes; $\varepsilon_{ijt}$ (the idiosyncratic error term or taste shock) is i.i.d. extreme value type 1; and the parameter vector $\beta_i$ is unobserved for each $i$ and is assumed to vary in the population following the continuous density $f(\beta_i \mid \theta)$, where $\theta$ are the parameters of this distribution. This mixing distribution can, in principle, take any shape. For example, by assuming that the parameters are multivariate normally distributed, $\beta_i \sim MVN(\beta, \Sigma)$, the vector $\beta_i$ can be rewritten as:

$$\beta_i = \beta + L\eta_i,$$

where $\eta_i \sim N(0, I)$, and $L$ is the lower-triangular Cholesky factor of $\Sigma$ such that $LL' = \Sigma$. If the off-diagonal elements of $L$ are zero, the parameters are independently normally distributed. Observed heterogeneity (deterministic taste variation) can also be accommodated in the random parameters by including individual-specific covariates (see, for example, Greene 2012). Specifically, the vector of random coefficients is:

$$\beta_i = \beta + \Pi z_i + L\eta_i, \tag{2}$$

where $z_i$ is a set of $M$ characteristics of individual $i$ that influence the mean of the preference parameters, and $\Pi$ is a $K \times M$ matrix of additional parameters.
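To make Equation (2) concrete, the base-R sketch below (not gmnl code) simulates draws of $\beta_i$ for a single individual. All numerical values for $\beta$, $\Pi$, $z_i$, and $\Sigma$ are hypothetical. Note that base R's `chol()` returns the upper-triangular factor, so its transpose is the lower-triangular $L$ with $LL' = \Sigma$.

```r
# Hypothetical inputs: K = 2 random coefficients, M = 2 individual characteristics
set.seed(123)
K     <- 2
beta  <- c(-0.5, 1.0)                              # population means
Pi    <- matrix(c(0.1,  0.0,
                  0.0, -0.2), nrow = K, byrow = TRUE)  # K x M matrix
z_i   <- c(1.5, 2.0)                               # characteristics of individual i
Sigma <- matrix(c(1.0, 0.3,
                  0.3, 0.5), nrow = K)             # covariance of the random part

# chol() returns upper-triangular U with t(U) %*% U = Sigma; L = t(U) is lower-triangular
L <- t(chol(Sigma))

# Each column of eta is one draw of eta_i ~ N(0, I)
R     <- 10000
eta   <- matrix(rnorm(K * R), nrow = K)
mu_i  <- drop(beta + Pi %*% z_i)                   # individual-specific mean beta + Pi z_i
draws <- mu_i + L %*% eta                          # K x R matrix of simulated beta_i

rowMeans(draws)  # should be close to mu_i
cov(t(draws))    # should be close to Sigma
```

Setting the off-diagonal element of `Sigma` to zero makes `L` diagonal, which corresponds to independently distributed random parameters.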

Unlike the MIXL model, the LC model uses a discrete mixing distribution, in which individual $i$ belongs to class $q$ with probability $w_{iq}$, i.e.:

$$\beta_i = \beta_q \text{ with probability } w_{iq}, \qquad q = 1, \dots, Q,$$

where $\sum_q w_{iq} = 1$ and $w_{iq} > 0$. The discrete mixing distribution (or class assignment probability) is unknown to the analyst. The most widely used formulation for $w_{iq}$ is the semi-parametric multinomial logit format (Greene and Hensher 2003; Shen 2009):

$$w_{iq} = \frac{\exp(h_i'\gamma_q)}{\sum_{l=1}^{Q}\exp(h_i'\gamma_l)}, \qquad q = 1, \dots, Q, \quad \gamma_1 = 0,$$

where $h_i$ denotes a set of socio-economic characteristics that determine assignment to classes. The parameters of the first class are normalized to zero for identification of the model. Note that one could omit any socio-economic covariate as a determinant of the class assignment probability. Under this scenario, the class probabilities are:

$$w_{iq} = \frac{\exp(\gamma_q)}{\sum_{l=1}^{Q}\exp(\gamma_l)}, \qquad q = 1, \dots, Q, \quad \gamma_1 = 0,$$

where $\gamma_q$ is a constant (Scarpa and Thiene 2005).
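The class-assignment formula above is a softmax with the first class as the reference. The following base-R sketch computes $w_{iq}$ for one individual; the function name `class_probs` and all $\gamma_q$ values are hypothetical and are not part of gmnl.

```r
# Latent-class membership probabilities w_iq in multinomial logit form, gamma_1 = 0
class_probs <- function(h_i, Gamma) {
  # h_i:   vector of characteristics of individual i (use 1 for a constant-only model)
  # Gamma: matrix with one column gamma_q per class; column 1 is the zero normalization
  stopifnot(all(Gamma[, 1] == 0))
  v <- drop(crossprod(Gamma, h_i))  # v_q = h_i' gamma_q
  v <- v - max(v)                   # shift for numerical stability; probabilities unchanged
  exp(v) / sum(exp(v))
}

# Constant-only case with Q = 3 classes (hypothetical gamma_q values)
Gamma <- matrix(c(0, 0.5, -0.2), nrow = 1)
w <- class_probs(1, Gamma)
w
sum(w)

# A constant plus one covariate (for example, income): Gamma is 2 x Q (hypothetical values)
Gamma2 <- rbind(c(0,  0.3,  -0.1),
                c(0, -0.01,  0.02))
class_probs(c(1, 40), Gamma2)
```

Because $\gamma_1 = 0$, the ratio $w_{i2}/w_{i1}$ in the constant-only case is simply $\exp(\gamma_2)$.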


Let $y_{ijt} = 1$ if individual $i$ chooses alternative $j$ on occasion $t$, and 0 otherwise. Then, the unconditional probabilities of the sequence of choices for MIXL and LC are, respectively:

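$$P_i(\theta) = \int \prod_{t=1}^{T_i}\prod_{j=1}^{J}\left[\frac{\exp(x_{ijt}'\beta_i)}{\sum_{l=1}^{J}\exp(x_{ilt}'\beta_i)}\right]^{y_{ijt}} f(\beta_i \mid \theta)\, d\beta_i,$$

$$P_i(\theta) = \sum_{q=1}^{Q} w_{iq}\prod_{t=1}^{T_i}\prod_{j=1}^{J}\left[\frac{\exp(x_{ijt}'\beta_q)}{\sum_{l=1}^{J}\exp(x_{ilt}'\beta_q)}\right]^{y_{ijt}}.$$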
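The two probabilities above share the same building block: the product over occasions of the logit probability of the chosen alternative (the exponent $y_{ijt}$ switches off every non-chosen $j$). The base-R sketch below evaluates that block, then averages it over random draws (MIXL, approximating the integral by simulation) or weights it by class probabilities (LC). The toy data and function names are hypothetical; gmnl's own implementation (for example, its use of Halton draws) may differ.

```r
# Probability of one individual's observed choice sequence for a given beta.
# X: list with one J x K attribute matrix per choice occasion t
# y: index of the chosen alternative on each occasion
seq_prob <- function(X, y, beta) {
  p <- vapply(seq_along(X), function(t) {
    v <- drop(X[[t]] %*% beta)
    v <- v - max(v)                      # stabilize exp() without changing the ratio
    exp(v[y[t]]) / sum(exp(v))
  }, numeric(1))
  prod(p)
}

# MIXL: simulate the integral by averaging over R draws of beta_i (columns of `draws`)
mixl_prob <- function(X, y, draws) {
  mean(apply(draws, 2, function(b) seq_prob(X, y, b)))
}

# LC: weight each class-specific sequence probability (columns of `betas`) by w_iq
lc_prob <- function(X, y, betas, w) {
  sum(w * apply(betas, 2, function(b) seq_prob(X, y, b)))
}

# Toy data: T = 2 occasions, J = 3 alternatives, K = 2 attributes (hypothetical)
set.seed(1)
X <- list(matrix(rnorm(6), 3, 2), matrix(rnorm(6), 3, 2))
y <- c(1, 3)
draws <- matrix(rnorm(2 * 200, mean = c(-1, 0.5)), nrow = 2)  # beta_i ~ N((-1, 0.5), I)
mixl_prob(X, y, draws)
lc_prob(X, y, betas = cbind(c(-1, 0.5), c(0.2, -0.3)), w = c(0.6, 0.4))
```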

Both models are widely used to accommodate preference heterogeneity across respondents. In the MIXL approach, heterogeneity is introduced by assuming that the parameters (the tastes of the individuals) vary across the population according to some pre-specified statistical distribution, which defines a continuous segmentation of preferences. In the LC model, heterogeneity is accommodated by a discrete number of separate classes (or segments) with different values of the vector of taste coefficients. Beyond continuous versus discrete consumer segments, there are further differences between the MIXL and LC models. For example, compared with the MIXL approach, the LC model has the advantage of being relatively simple, reasonably plausible, and statistically testable (Shen 2009). LC is also a semiparametric specification, which frees the analyst from potential misspecification of the distribution of individual heterogeneity. Indeed, the main disadvantage of MIXL is that the researcher has to choose the distribution of the random parameters a priori, whereas the LC model makes no assumption about the shape of the heterogeneity distribution other than the number of support points, which equals the number of classes. Nevertheless, LC is less flexible than MIXL precisely because the parameters within each class are fixed. Another important difference between the two models is the estimation procedure: MIXL requires the maximum simulated likelihood estimator, which can be very costly in computational time, whereas LC requires no simulation.¹

To take advantage of the benefits of both models, recent empirical papers have combined LC and MIXL in one model. This double-mixture model is known as the 'Mixed-Mixed' Logit model (MM-MNL) (Keane and Wasi 2013).² Bujosa et al. (2010) and Greene and Hensher (2013) developed this MM-MNL model by extending the LC model to allow for random parameters within each class.

Consider the case where the heterogeneity distribution is generalized to a discrete mixture of multivariate normal distributions. In this case we have:

$$\beta_i \sim N(\beta_q, \Sigma_q) \text{ with probability } w_{iq}, \qquad q = 1, \dots, Q. \tag{3}$$

¹ For an empirical comparison between these two models, see, for example, Greene and Hensher (2003), Shen (2009), and Hess, Ben-Akiva, Gopinath, and Walker (2011).

² Train (2008) refers to this model as a 'discrete mixture of continuous distributions', whereas Greene and Hensher (2013) label it 'LC-MIXL'.


The appeal of using a Gaussian mixture for the heterogeneity distribution is that any continuous distribution can be approximated by a discrete mixture of normal distributions (Train 2008). Note that the MM-MNL with only one class is equivalent to the MIXL model. Furthermore, if $\Sigma_q \to 0$ for all $q$, model (3) becomes an LC model (Bujosa et al. 2010; Keane and Wasi 2013). Thus, MM-MNL nests both MIXL and LC.

The choice probabilities for the MM-MNL are given by:

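$$P_i(\theta) = \sum_{q=1}^{Q} w_{iq}\int \prod_{t=1}^{T_i}\prod_{j=1}^{J}\left[\frac{\exp(x_{ijt}'\beta_i)}{\sum_{l=1}^{J}\exp(x_{ilt}'\beta_i)}\right]^{y_{ijt}} f(\beta_i)\, d\beta_i,$$

where $f(\beta_i)$ is the density of $N(\beta_q, \Sigma_q)$.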
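The MM-MNL probability can be simulated class by class: draw $\beta_i$ from $N(\beta_q, \Sigma_q)$, average the sequence probability, and weight by $w_{iq}$. The base-R sketch below does this with pseudo-random draws and hypothetical data, and also illustrates the nesting result stated above: with every $\Sigma_q$ shrunk toward zero, the MM-MNL probability collapses to the LC probability. Function names are illustrative only.

```r
# Logit probability of the observed choice sequence for a given beta
seq_prob <- function(X, y, beta) {
  prod(vapply(seq_along(X), function(t) {
    v <- drop(X[[t]] %*% beta)
    v <- v - max(v)
    exp(v[y[t]]) / sum(exp(v))
  }, numeric(1)))
}

# Simulated MM-MNL probability: class weights times class-specific MIXL probabilities
mmmnl_prob <- function(X, y, w, means, Sigmas, R = 500) {
  # w: Q class probabilities; means: K x Q matrix of beta_q; Sigmas: list of Q K x K matrices
  K <- nrow(means)
  class_p <- vapply(seq_along(w), function(q) {
    L     <- t(chol(Sigmas[[q]]))           # lower-triangular factor of Sigma_q
    eta   <- matrix(rnorm(K * R), nrow = K)
    draws <- means[, q] + L %*% eta         # draws of beta_i ~ N(beta_q, Sigma_q)
    mean(apply(draws, 2, function(b) seq_prob(X, y, b)))
  }, numeric(1))
  sum(w * class_p)
}

set.seed(2)
X <- list(matrix(rnorm(6), 3, 2), matrix(rnorm(6), 3, 2))  # hypothetical data
y <- c(2, 1)
w <- c(0.7, 0.3)
means <- cbind(c(-1, 0.5), c(0.4, -0.2))
p_mm <- mmmnl_prob(X, y, w, means, list(diag(0.5, 2), diag(0.2, 2)))

# With Sigma_q close to zero, MM-MNL reduces to LC
p_tiny <- mmmnl_prob(X, y, w, means, list(diag(1e-12, 2), diag(1e-12, 2)))
p_lc   <- sum(w * apply(means, 2, function(b) seq_prob(X, y, b)))
c(p_mm = p_mm, p_tiny = p_tiny, p_lc = p_lc)
```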

2.2. Generalized Multinomial Logit Model

Fiebig et al. (2010) proposed a general version of the MIXL model, called the generalized multinomial logit (G-MNL) model, in which the parameters vary across individuals according to:

$$\beta_i = \sigma_i\beta + [\gamma + \sigma_i(1 - \gamma)]\, L\eta_i, \tag{4}$$

where $\sigma_i$ is the individual-specific scale of the idiosyncratic error term, and $\gamma$ is a scalar parameter that controls how the variance of residual taste heterogeneity, $L\eta_i$, varies with scale. To better understand this specification, it is useful to note that different sub-models arise when some structural parameters of the G-MNL model are constrained:

• G-MNL-I: If $\gamma = 1$, then $\beta_i = \sigma_i\beta + L\eta_i$. In this model, the residual taste heterogeneity is independent of the scaling of $\beta$.

• G-MNL-II: If $\gamma = 0$, then $\beta_i = \sigma_i(\beta + L\eta_i)$. In this model, the residual taste heterogeneity is proportional to $\sigma_i$.

• S-MNL: If $\mathrm{Var}(\eta_i) = 0$, then $\beta_i = \sigma_i\beta$. As pointed out by Fiebig et al. (2010), this model is observationally equivalent to a particular type of heterogeneity in which the parameters are scaled up or down proportionately across individuals by the scaling factor $\sigma_i$. This model provides a more parsimonious description of the data than MIXL, because $\sigma_i\beta$ is a simpler object than $\beta + L\eta_i$ (Fiebig et al. 2010).

• MIXL: If $\sigma_i = 1$, then $\beta_i = \beta + L\eta_i$.

• MNL: If $\sigma_i = 1$ and $\mathrm{Var}(\eta_i) = 0$, then $\beta_i = \beta$.
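The sub-models listed above can be checked mechanically from Equation (4). The base-R sketch below implements the G-MNL coefficient for given draws; the function name `gmnl_beta` and all numerical inputs are hypothetical and do not come from the gmnl package.

```r
# beta_i under G-MNL, Equation (4), for given eta_i and scale sigma_i
gmnl_beta <- function(beta, L, eta_i, sigma_i, gamma) {
  sigma_i * beta + (gamma + sigma_i * (1 - gamma)) * drop(L %*% eta_i)
}

beta    <- c(-1, 0.5)                             # hypothetical values
L       <- matrix(c(0.8, 0.2, 0, 0.4), nrow = 2)  # lower-triangular (column-major fill)
eta_i   <- c(0.3, -1.1)
sigma_i <- 1.7

gmnl_beta(beta, L, eta_i, sigma_i, gamma = 1)   # G-MNL-I:  sigma_i * beta + L eta_i
gmnl_beta(beta, L, eta_i, sigma_i, gamma = 0)   # G-MNL-II: sigma_i * (beta + L eta_i)
gmnl_beta(beta, L, eta_i, 1, gamma = 0.4)       # MIXL: sigma_i = 1 gives beta + L eta_i for any gamma
gmnl_beta(beta, 0 * L, eta_i, sigma_i, 0.4)     # S-MNL: no residual heterogeneity gives sigma_i * beta
```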

Fiebig et al. (2010) note that some restrictions are needed to estimate the G-MNL model. First, the domain of $\sigma_i$ should be the positive real line. To constrain the scale parameter to be positive, Fiebig et al. (2010) assume that $\sigma_i$ is log-normally distributed, with $\bar\sigma$ and $\tau$ the mean and standard deviation of $\ln\sigma_i$:³

³ In fact, any distribution whose support is the positive real line can be used.


$$\sigma_i = \exp(\bar\sigma + \tau\upsilon_i),$$

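where $\upsilon_i$ follows a standard normal distribution truncated at $\pm 2$, $\upsilon_i \sim TN(-2, 2)$.⁴ Note that the parameters $\bar\sigma$, $\tau$, and $\beta$ are not separately identified. Fiebig et al. (2010) suggest normalizing the mean $\bar\sigma$ by setting: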

$$\bar\sigma = -\log\left[\frac{1}{N}\sum_{i=1}^{N}\exp(\tau\upsilon_i)\right].$$
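The base-R sketch below draws $\upsilon_i$ in the two ways described in this section and its footnote (a standard normal truncated at $\pm 2$ via inverse-CDF sampling, and the Greene and Hensher (2010) transform bounded at $\pm 1.96$), then applies the normalization of $\bar\sigma$. It is an independent illustration, not gmnl's internal code; `N` and `tau` are hypothetical. With this normalization, $\sigma_i$ averages exactly one in the sample, because $\exp(\bar\sigma)$ cancels the sample mean of $\exp(\tau\upsilon_i)$.

```r
set.seed(42)
N   <- 1000          # number of individuals (hypothetical)
tau <- 0.5           # hypothetical value of tau
u   <- runif(N)      # standard uniform draws

# (a) Standard normal truncated at -2 and 2, by inverse-CDF sampling
ups_tn <- qnorm(pnorm(-2) + (pnorm(2) - pnorm(-2)) * u)
# (b) Greene and Hensher (2010): Phi^{-1}(0.025 + 0.95 u), bounded by -1.96 and 1.96
ups_gh <- qnorm(0.025 + 0.95 * u)

# Normalize sigma_bar so that the scale sigma_i has sample mean one
sigma_bar <- -log(mean(exp(tau * ups_tn)))
sigma_i   <- exp(sigma_bar + tau * ups_tn)
mean(sigma_i)   # 1, up to floating-point error
range(ups_gh)   # stays within -1.96 and 1.96
```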

Another important issue in G-MNL is the domain of $\gamma$. Initially, Fiebig et al. (2010) imposed $\gamma \in [0, 1]$. To constrain $\gamma$ to this interval, the authors used the logistic transformation:

$$\gamma = \frac{\exp(\gamma^*)}{1 + \exp(\gamma^*)},$$

and estimated $\gamma^*$. However, Keane and Wasi (2013) pointed out that $\gamma < 0$ or $\gamma > 1$ still have meaningful behavioral interpretations, so they estimate $\gamma$ directly. gmnl allows $\gamma$ to be estimated using either procedure.
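A short base-R sketch of the indirect parameterization: the logistic map sends any real $\gamma^*$ into the open interval $(0, 1)$, and its inverse (the logit) recovers $\gamma^*$. The helper names are hypothetical; base R's `plogis()` and `qlogis()` implement the same pair of maps.

```r
# Indirect parameterization: gamma = exp(gamma*) / (1 + exp(gamma*))
to_gamma      <- function(gamma_star) exp(gamma_star) / (1 + exp(gamma_star))
# Inverse map (the logit), defined only for 0 < gamma < 1
to_gamma_star <- function(gamma) log(gamma / (1 - gamma))

gs <- c(-5, 0, 5)
to_gamma(gs)                          # every value lies strictly between 0 and 1
to_gamma_star(to_gamma(gs))           # recovers gs
all.equal(to_gamma(gs), plogis(gs))   # same map as base R's logistic CDF
```

Because the map never leaves $(0, 1)$, a directly estimated $\gamma$ outside that interval has no $\gamma^*$ counterpart, which is why the two procedures are not interchangeable.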

Finally, one can allow the mean of the scale to differ across individuals by including individual-specific characteristics. Thus, the scale parameter can be written as:

$$\sigma_i = \exp(\bar\sigma + \delta's_i + \tau\upsilon_i),$$

where $s_i$ is a vector of attributes of individual $i$.

In terms of computation, all models except the LC and MNL models are estimated using the maximum simulated likelihood estimator (MSLE). For a complete derivation of the asymptotic properties of the MSLE and a more comprehensive review of how to implement this estimator, see, for example, Train (2009), Lee (1992), Gourieroux and Monfort (1997), or Hajivassiliou and Ruud (1986).

3. The gmnl package

To specify multinomial logit models with gmnl, the formula argument is used in the same way as in mlogit. Users who have previously worked with mlogit will therefore be familiar with the requirements for model specification. In any case, users are encouraged to review the formula specification in the mlogit manual.

3.1. Estimating S-MNL Models

gmnl is loaded by typing:

⁴ Fiebig et al. (2010) also note that when $\tau$ is too large, numerical problems arise for extreme draws of $\upsilon_i$. To avoid this issue, they suggest using a truncated normal distribution for $\upsilon_i$ with truncation at $\pm 2$. Greene and Hensher (2010) found that constraining $\upsilon_i$ to lie between $-1.96$ and $+1.96$ maintains the smoothness of the estimator. Specifically, they used $\upsilon_{ir} = \Phi^{-1}(0.025 + 0.95u_{ir})$, where $u_{ir}$ is a draw from the standard uniform distribution. gmnl lets the user choose between these two ways of drawing $\upsilon_i$ through the argument typeR.


library(gmnl)

The function mlogit.data from mlogit is useful for handling data in multinomial format, so gmnl uses the same class of data for estimation.

We first show how to estimate an S-MNL model using the TravelMode data from the AER package:

data("TravelMode", package = "AER")

library(mlogit)
TM <- mlogit.data(TravelMode, choice = "choice", shape = "long",
                  alt.levels = c("air", "train", "bus", "car"),
                  chid.var = "individual")

Note that the gmnl function requires the data in the mlogit.data format. If the user forgets to put the data in this format, gmnl will give an error message and the estimation process will stop. Once the data is of class mlogit.data, one can use the gmnl function.

We start by estimating an S-MNL where the alternative-specific constants (ASCs) are fixed and not scaled. Fiebig et al. (2010) found that in a model where all attributes are scaled, including the ASCs, the estimates often 'blow up' and the model actually produces a worse fit. The basic syntax is the following:

smnl <- gmnl(choice ~ wait + vcost + travel + gcost | 1,
             data = TM,
             model = "smnl",
             R = 30,
             notscale = c(1, 1, 1, rep(0, 4)))

The following variables are not scaled:

[1] "train:(intercept)" "bus:(intercept)" "car:(intercept)"

Estimating SMNL model

The formula component of the function is the same as in the mlogit package. The component | 1 in the formula implies that the model is fitted with ASCs for $J - 1$ alternatives. The main argument is model = "smnl", which tells the function that the user wants to estimate the S-MNL model (without random parameters). R = 30 indicates that only 30 draws are used to simulate the probabilities. Another important argument is notscale, a vector that indicates which variables (not coefficients) will not be scaled (1 = not scaled, 0 = scaled). Since the ASCs are always the first variables entering the model (if they are specified using | 1) and only $J - 1 = 3$ ASCs are created, notscale = c(1, 1, 1, rep(0, 4)) implies that the constants will not be scaled.
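Because notscale must line up with the order of the variables, it can be convenient to build it from the variable names rather than by hand. The sketch below uses a hypothetical helper pattern in base R (not a gmnl function), with the names copied from the summary output of this model.

```r
# Flag the ASC columns as not scaled (1) and all attributes as scaled (0)
coef_names <- c("train:(intercept)", "bus:(intercept)", "car:(intercept)",
                "wait", "vcost", "travel", "gcost")
notscale <- as.integer(grepl("(intercept)", coef_names, fixed = TRUE))
notscale
#> [1] 1 1 1 0 0 0 0
```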

summary(smnl)


Model estimated on: Tue Jan 27 11:30:57 2015

Call:
gmnl(formula = choice ~ wait + vcost + travel + gcost | 1, data = TM,
    model = "smnl", R = 30, notscale = c(1, 1, 1, rep(0, 4)),
    method = "bfgs")

Frequencies of categories:
    air   train     bus     car
0.27619 0.30000 0.14286 0.28095

The estimation took: 0h:0m:4s

Coefficients:
                    Estimate Std. Error t-value  Pr(>|t|)
train:(intercept) -1.1801236  0.5809429 -2.0314 0.0422151 *
bus:(intercept)   -1.9272499  0.7022906 -2.7442 0.0060652 **
car:(intercept)   -7.0765718  1.3056815 -5.4198 5.966e-08 ***
wait              -0.1336641  0.0206988 -6.4576 1.064e-10 ***
vcost             -0.1174104  0.0316600 -3.7085 0.0002085 ***
travel            -0.0172084  0.0039124 -4.3985 1.090e-05 ***
gcost              0.0922862  0.0259145  3.5612 0.0003692 ***
tau                0.4301553  0.1325649  3.2449 0.0011751 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Optimization of log-likelihood by BFGS maximisation
Log Likelihood: -180.06
Number of observations: 210
Number of iterations: 63
Exit of MLE: successful convergence
Simulation based on 30 draws

The results report the point estimates for each variable and for $\tau$, which governs the dispersion of the scale $\sigma_i$ (it is the standard deviation of the normal term in $\ln\sigma_i$). The output also gives useful information about the estimation procedure. First, the model is estimated using the BFGS procedure. Other optimization procedures, such as BHHH and Newton-Raphson (NR), can be called using the argument method.⁵ BHHH is generally faster than the other procedures, but it can blow up if the variables have very different scales. The larger the ratio between the largest and the smallest standard deviation of the variables, the more problems the user will have with the estimation procedure (Scott Long 1997). We therefore encourage users to check the variables and, if necessary, re-scale or recode them. gmnl uses the numerical Hessian if method = 'nr', which can slow down estimation compared with the other optimization methods.
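The scaling check described above is easy to automate before calling gmnl. The base-R sketch below computes the ratio of the largest to the smallest standard deviation among the numeric columns of a data frame; the helper name and the toy data are hypothetical, and the paper does not prescribe a threshold for this ratio.

```r
# Ratio of the largest to the smallest standard deviation among numeric columns
sd_ratio <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  s   <- vapply(df[num], sd, numeric(1), na.rm = TRUE)
  max(s) / min(s)
}

# Toy data (hypothetical): income in dollars, travel time in hours
df <- data.frame(income = c(20000, 35000, 50000, 80000),
                 time   = c(0.5, 1.0, 1.5, 2.0),
                 mode   = c("air", "train", "bus", "car"))
r_before  <- sd_ratio(df)
df$income <- df$income / 1000   # re-express income in thousands
r_after   <- sd_ratio(df)
c(before = r_before, after = r_after)   # rescaling shrinks the ratio by a factor of 1000
```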

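Another important point is that the number of observations reported by gmnl is the number of choice situations, that is, the number of rows of the long-format data divided by $J$. For cross-sectional data this equals the number of individuals (210 in the TravelMode example, whose long-format data has 840 rows for 4 alternatives), whereas for panel data (repeated choice situations) it equals the total number of choice situations across all individuals. Finally, it is always important to check all details in the estimation output. In our example, the output informs us that convergence was achieved successfully.

⁵ All the estimation procedures are carried out using the maxLik package (Henningsen and Toomet 2011). For more information about the arguments of this function, type help(maxLik).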

In the next example, we allow the scale to differ across individuals according to their income. Specifically, we assume that:

$$\sigma_i = \exp(\bar\sigma + \delta_{income}\,\text{income}_i + \tau\upsilon_i).$$

The syntax is very similar to our previous example, with minor changes in the formula:

smnl.het <- gmnl(choice ~ wait + vcost + travel + gcost | 1 | 0 | 0 | income - 1,
                 data = TM,
                 model = "smnl",
                 R = 30,
                 notscale = c(1, 1, 1, 0, 0, 0, 0))


[1] "train:(intercept)" "bus:(intercept)" "car:(intercept)"


The fifth part of the formula is reserved for individual-specific variables that affect the scale. In this example, income - 1 specifies that income enters $\sigma_i$ without a constant.

summary(smnl.het)


Call:
gmnl(formula = choice ~ wait + vcost + travel + gcost | 1 | 0 |
    0 | income - 1, data = TM, model = "smnl", R = 30, notscale = c(1,
    1, 1, 0, 0, 0, 0), method = "bfgs")

    air   train     bus     car
0.27619 0.30000 0.14286 0.28095

Coefficients:
train:(intercept) -0.8483193 0.6125736 -1.3848 0.1661000
bus:(intercept)   -1.5034428 0.7440100 -2.0207 0.0433078 *
car:(intercept)   -6.6921151 1.3277108 -5.0403 4.647e-07 ***
wait              -0.1115623 0.0204826 -5.4467 5.132e-08 ***
vcost             -0.0905990 0.0282352 -3.2087 0.0013332 **
travel            -0.0142813 0.0034442 -4.1464 3.377e-05 ***
gcost              0.0746146 0.0224106  3.3294 0.0008702 ***
tau                0.4666873 0.1459901  3.1967 0.0013901 **
het.income         0.0058326 0.0033887  1.7212 0.0852111 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results are very similar to those of the previous example. All the parameters for the variables that enter the scale are preceded by the string het. Thus, the coefficient het.income corresponds to $\delta_{income}$.

Suppose now that we want to test the null hypothesis $H_0: \delta_{income} = 0$. This test can be performed using the function waldtest or lrtest from the package lmtest:

library(lmtest)
waldtest(smnl, smnl.het)

Wald test

Model 1: choice ~ wait + vcost + travel + gcost | 1
Model 2: choice ~ wait + vcost + travel + gcost | 1 | 0 | 0 | income -
    1
  Res.Df Df  Chisq Pr(>Chisq)
1    202
2    201  1 2.9626    0.08521 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtest(smnl, smnl.het)

Likelihood ratio test

Model 1: choice ~ wait + vcost + travel + gcost | 1
Model 2: choice ~ wait + vcost + travel + gcost | 1 | 0 | 0 | income -
    1
  #Df  LogLik Df  Chisq Pr(>Chisq)
1   8 -180.06
2   9 -178.16  1 3.8053    0.05109 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
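Both tests have one degree of freedom because smnl.het adds a single parameter. As a sanity check, the base-R sketch below recomputes the statistics from the printed numbers (not from the fitted objects). The LR statistic is twice the log-likelihood difference, and with one restriction the Wald statistic is the squared z value of het.income; small differences from the lmtest output come from rounding of the printed values.

```r
# Log-likelihoods as printed (rounded to two decimals)
ll_smnl     <- -180.06   # restricted model: smnl
ll_smnl_het <- -178.16   # unrestricted model: smnl.het
lr   <- 2 * (ll_smnl_het - ll_smnl)              # LR statistic, 1 restriction
p_lr <- pchisq(lr, df = 1, lower.tail = FALSE)
c(lr = lr, p = p_lr)      # close to lrtest(): 3.8053 and 0.05109

# Wald statistic for one restriction: squared z (t) value of het.income
z      <- 1.7212
wald   <- z^2
p_wald <- pchisq(wald, df = 1, lower.tail = FALSE)  # equals 2 * pnorm(-abs(z))
c(wald = wald, p = p_wald) # close to waldtest(): 2.9626 and 0.08521
```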

3.2. Estimating MIXL Models

In the following examples we show how to estimate MIXL models using gmnl. The mlogit package is very efficient at estimating MIXL models. However, one advantage of using gmnl is the inclusion of individual-specific variables to explain the mean of the random parameters (see Equation 2). Other extensions include the possibility of producing point and interval estimates at the individual level, and the availability of triangular and Johnson $S_B$ heterogeneity distributions.

If we assume that the coefficients of travel and wait vary across individuals according to:

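$$\beta_{travel,i} = \beta_1 + \pi_{11}\,\text{income}_i + \pi_{12}\,\text{size}_i + \sigma_1\eta_{1i},$$

$$\beta_{wait,i} = \beta_2 + \pi_{21}\,\text{income}_i + \sigma_2\eta_{2i},$$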

where $\eta_{1i}$ is triangular and $\eta_{2i} \sim N(0, 1)$. The corresponding MIXL model is estimated by using model = "mixl":

mixl.hier <- gmnl(choice ~ vcost + gcost + travel + wait | 1 | 0 | income + size - 1,
                  data = TM,
                  model = "mixl",
                  ranp = c(travel = "t", wait = "n"),
                  mvar = list(travel = c("income", "size"), wait = c("income")),
                  R = 50,
                  haltons = list("primes" = c(2, 17), "drop" = rep(19, 2)))

Estimating MIXL model

Note that the fourth part of the formula is reserved for all the variables that enter the mean of the random parameters. The argument mvar indicates which variables enter each specific random parameter. For example, travel = c("income", "size") indicates that the mean of the travel coefficient varies with income and size.

Another important issue is the order of the variables in the first part of the formula. The user should specify the variables with fixed parameters first, followed by the variables with random parameters. (In our example, vcost and gcost have fixed parameters, so both are specified first in the formula.) Finally, haltons indicates the prime numbers to use and how many initial elements to drop when generating the Halton draws for each random parameter.
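For readers unfamiliar with Halton draws, the base-R sketch below generates a Halton sequence in a prime base (the radical inverse of the integers), drops the first elements, and maps the uniforms to the two distributions used in this model. It is an illustration under stated assumptions, not gmnl's implementation: we assume that prime 2 feeds travel and prime 17 feeds wait, that the triangular distribution is the symmetric one on $[-1, 1]$, and we ignore how gmnl allocates draws across individuals.

```r
# Halton sequence in a prime base, dropping the first `drop` elements
halton <- function(n, base, drop = 0) {
  out <- numeric(n)
  for (k in seq_len(n)) {
    i <- k + drop      # position in the full sequence
    f <- 1
    r <- 0
    while (i > 0) {
      f <- f / base
      r <- r + f * (i %% base)   # mirror the base-`base` digits about the radix point
      i <- i %/% base
    }
    out[k] <- r
  }
  out
}

# Inverse CDF of the symmetric triangular distribution on [-1, 1]
tri_inv <- function(u) ifelse(u <= 0.5, sqrt(2 * u) - 1, 1 - sqrt(2 * (1 - u)))

R <- 50
eta_travel <- tri_inv(halton(R, base = 2,  drop = 19))  # triangular draws (assumed prime 2)
eta_wait   <- qnorm(halton(R, base = 17, drop = 19))    # standard normal draws (assumed prime 17)
halton(5, base = 2)   # first five base-2 Halton points
```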

summary(mixl.hier)


Call:
gmnl(formula = choice ~ vcost + gcost + travel + wait | 1 | 0 |
    income + size - 1, data = TM, model = "mixl", ranp = c(travel = "t",
    wait = "n"), R = 50, haltons = list(primes = c(2, 17), drop = rep(19,
    2)), mvar = list(travel = c("income", "size"), wait = c("income")),
    method = "bfgs")

    air   train     bus     car
0.27619 0.30000 0.14286 0.28095

Coefficients:
train:(intercept) -3.1465e-01 1.0400e+00 -0.3025 0.7622421
bus:(intercept)   -1.1038e+00 1.0978e+00 -1.0054 0.3147130
car:(intercept)   -7.9684e+00 2.0632e+00 -3.8621 0.0001124 ***
vcost             -5.0956e-02 4.5778e-02 -1.1131 0.2656635
gcost              2.8150e-02 4.5146e-02  0.6235 0.5329310
travel            -9.6680e-03 5.2113e-03 -1.8552 0.0635698 .
wait              -1.3329e-01 3.9310e-02 -3.3908 0.0006969 ***
travel.income     -1.2805e-04 5.6331e-05 -2.2731 0.0230178 *
travel.size        2.2209e-03 1.2745e-03  1.7426 0.0813999 .
wait.income       -1.1151e-03 5.5324e-04 -2.0155 0.0438495 *
sd.travel          2.4119e-03 6.0674e-03  0.3975 0.6909828
sd.wait            6.9475e-02 2.7707e-02  2.5075 0.0121583 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output shows the parameter estimates in the following order: fixed parameters, means of the random parameters, effects of the variables that shift the means of the random parameters, and finally the standard deviations (spreads) of the random parameters. Note that travel.income corresponds to $\pi_{11}$, travel.size to $\pi_{12}$, and wait.income to $\pi_{21}$.
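To see how these estimates combine, the short sketch below evaluates the conditional mean of the travel coefficient, $\beta_1 + \pi_{11}\,\text{income}_i + \pi_{12}\,\text{size}_i$, for one individual. The coefficient values are copied from summary(mixl.hier); the individual's income and household size are hypothetical.

```r
# Estimates copied from summary(mixl.hier)
b_travel <- -9.6680e-03   # travel (beta_1)
pi_11    <- -1.2805e-04   # travel.income
pi_12    <-  2.2209e-03   # travel.size

# Hypothetical individual
income_i <- 35
size_i   <- 2

mean_travel_i <- b_travel + pi_11 * income_i + pi_12 * size_i
mean_travel_i
```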

We now estimate a correlated random parameter model using the Electricity data from the mlogit package, which is a panel dataset. To keep computation time short, we use just a subsample of this database (subset = 1:3000). Users may want to use the whole sample to reproduce this example.

data("Electricity", package = "mlogit")
Electr <- mlogit.data(Electricity, id.var = "id", choice = "choice",
                      varying = 3:26, shape = "wide", sep = "")

In this example, two arguments are especially relevant. First, panel = TRUE indicates that the data is a panel. When using panel data, the user needs to specify the individual identifier through the id.var argument of mlogit.data. Second, to estimate correlated random parameters, correlation = TRUE needs to be specified in the gmnl function.


Elec.cor <- gmnl(choice ~ pf + cl + loc + wk + tod + seas | 0, data = Electr,
                 subset = 1:3000,
                 model = 'mixl', R = 50,
                 panel = TRUE,
                 ranp = c(cl = "n", loc = "n", wk = "n", tod = "n", seas = "n"),
                 correlation = TRUE)

Estimating MIXL model

summary(Elec.cor)


Call:
gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas | 0,
    data = Electr, subset = 1:3000, model = "mixl", ranp = c(cl = "n",
    loc = "n", wk = "n", tod = "n", seas = "n"), R = 50,
    correlation = TRUE, panel = TRUE, method = "bfgs")

      1       2       3       4
0.21467 0.30267 0.21733 0.26533

Coefficients:
pf           -0.870190 0.078631 -11.0668 < 2.2e-16 ***
cl           -0.176484 0.042958  -4.1083 3.986e-05 ***
loc           2.382190 0.305305   7.8027 5.995e-15 ***
wk            1.944683 0.249278   7.8013 6.217e-15 ***
tod          -8.502642 0.742339 -11.4538 < 2.2e-16 ***
seas         -8.645581 0.780296 -11.0799 < 2.2e-16 ***
sd.cl.cl      0.391907 0.041985   9.3344 < 2.2e-16 ***
sd.cl.loc     0.492062 0.198342   2.4809 0.0131059 *
sd.cl.wk      0.551362 0.213074   2.5877 0.0096633 **
sd.cl.tod    -0.983407 0.280226  -3.5093 0.0004492 ***
sd.cl.seas   -0.147022 0.229663  -0.6402 0.5220643
sd.loc.loc    2.592474 0.422565   6.1351 8.511e-10 ***
sd.loc.wk     1.931095 0.360997   5.3493 8.827e-08 ***
sd.loc.tod    1.019809 0.565131   1.8046 0.0711447 .
sd.loc.seas   0.094062 0.457852   0.2054 0.8372266
sd.wk.wk     -0.332956 0.221197  -1.5052 0.1322609
sd.wk.tod     1.934071 0.320829   6.0283 1.656e-09 ***
sd.wk.seas    0.734865 0.302988   2.4254 0.0152919 *
sd.tod.tod    2.063511 0.330087   6.2514 4.067e-10 ***
sd.tod.seas   1.168916 0.253925   4.6034 4.157e-06 ***
sd.seas.seas  1.703389 0.253345   6.7236 1.773e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimates from sd.cl.cl to sd.seas.seas are the elements of the lower-triangular matrix $L$. We can recover the full variance-covariance matrix $\Sigma = LL'$ using the function cov.gmnl:

cov.gmnl(Elec.cor)
              cl       loc         wk        tod        seas
cl    0.15359085 0.1928424  0.2160824 -0.3854037 -0.05761902
loc   0.19284236 6.9630446  5.2776169  2.1599309  0.17150959
wk    0.21608239 5.2776169  4.1439870  0.7831748 -0.14409709
tod  -0.38540369 2.1599309  0.7831748 10.0058043  4.07385871
seas -0.05761902 0.1715096 -0.1440971  4.0738587  4.83838660
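As a cross-check of what cov.gmnl does, the base-R sketch below rebuilds $L$ from the sd.* estimates printed by summary(Elec.cor) and recomputes $\Sigma = LL'$, the standard deviations, and the correlation matrix. It assumes, consistently with the printed output, that sd.a.b is the element of $L$ in row b and column a, so the estimates fill the lower triangle column by column. Small discrepancies with cov.gmnl() and cor.gmnl() come from the rounding of the printed estimates.

```r
# sd.* estimates from summary(Elec.cor), listed column by column of L
l_elems <- c(0.391907, 0.492062, 0.551362, -0.983407, -0.147022,  # column cl
             2.592474, 1.931095, 1.019809,  0.094062,             # column loc
            -0.332956, 1.934071, 0.734865,                        # column wk
             2.063511, 1.168916,                                  # column tod
             1.703389)                                            # column seas
vars <- c("cl", "loc", "wk", "tod", "seas")
L <- matrix(0, 5, 5, dimnames = list(vars, vars))
L[lower.tri(L, diag = TRUE)] <- l_elems   # logical indexing fills column by column

Sigma <- L %*% t(L)        # compare with cov.gmnl(Elec.cor)
sqrt(diag(Sigma))          # standard deviations of the random parameters
round(cov2cor(Sigma), 4)   # compare with cor.gmnl(Elec.cor)
```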

If the user is also interested in the standard errors of the elements of the variance-covariance matrix of the random parameters, the function se.cov.gmnl is very useful. This function is a wrapper of the deltamethod function from the msm package, and it computes the standard errors for each element of $\Sigma$ when sd = FALSE (the default). If sd = TRUE, the standard errors of the standard deviations are computed instead.

# Standard errors for Sigma
se.cov.gmnl(Elec.cor)

Elements of the variance-covariance matrix

v.cl.cl       0.153591 0.032909  4.6672 3.054e-06 ***
v.cl.loc      0.192842 0.081577  2.3639  0.018083 *
v.cl.wk       0.216082 0.091729  2.3557  0.018489 *
v.cl.tod     -0.385404 0.129018 -2.9872  0.002815 **
v.cl.seas    -0.057619 0.090617 -0.6359  0.524871
v.loc.loc     6.963045 2.206457  3.1558  0.001601 **
v.loc.wk      5.277617 1.663738  3.1721  0.001513 **
v.loc.tod     2.159931 1.332301  1.6212  0.104974
v.loc.seas    0.171510 1.122159  0.1528  0.878525
v.wk.wk       4.143987 1.329288  3.1174  0.001824 **
v.wk.tod      0.783175 0.852976  0.9182  0.358531
v.wk.seas    -0.144097 0.752504 -0.1915  0.848142
v.tod.tod    10.005804 3.476283  2.8783  0.003998 **
v.tod.seas    4.073859 1.521734  2.6771  0.007426 **
v.seas.seas   4.838387 1.185096  4.0827 4.452e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Standard errors for standard deviations
se.cov.gmnl(Elec.cor, sd = TRUE)

Standard deviations of the random parameters

cl    0.391907 0.041985 9.3344 < 2.2e-16 ***
loc   2.638758 0.418086 6.3115 2.763e-10 ***
wk    2.035679 0.326498 6.2349 4.521e-10 ***
tod   3.163195 0.549489 5.7566 8.582e-09 ***
seas  2.199633 0.269385 8.1654 2.220e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
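The two outputs of se.cov.gmnl are linked by the delta method. Since each standard deviation is $\sqrt{v}$ of a variance $v$, its standard error is, to first order, $\mathrm{se}(v)/(2\sqrt{v})$; applying the delta method to the Cholesky parameters, as se.cov.gmnl does through msm, gives the same first-order result by the chain rule. The base-R sketch below reproduces the second table from the printed first one, up to rounding.

```r
# Variances and their standard errors, copied from se.cov.gmnl(Elec.cor)
var_hat <- c(cl = 0.153591, loc = 6.963045, wk = 4.143987, tod = 10.005804, seas = 4.838387)
se_var  <- c(cl = 0.032909, loc = 2.206457, wk = 1.329288, tod = 3.476283,  seas = 1.185096)

sd_hat <- sqrt(var_hat)
se_sd  <- se_var / (2 * sd_hat)   # delta method: d sqrt(v) / dv = 1 / (2 sqrt(v))
cbind(sd = sd_hat, se = se_sd)    # compare with se.cov.gmnl(Elec.cor, sd = TRUE)
```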

The correlation matrix of the random parameters can be recovered using the function cor.gmnl:

cor.gmnl(Elec.cor)
              cl        loc          wk        tod        seas
cl    1.00000000 0.18647481  0.27084917 -0.3108903 -0.06683946
loc   0.18647481 1.00000000  0.98249212  0.2587702  0.02954871
wk    0.27084917 0.98249212  1.00000000  0.1216252 -0.03218072
tod  -0.31089031 0.25877020  0.12162518  1.0000000  0.58550375
seas -0.06683946 0.02954871 -0.03218072  0.5855037  1.00000000

3.3. Estimating G-MNL Models

In the following examples, we show how to estimate G-MNL models. We assume that the ASCs are random. Creating the ASCs through the formula causes problems in the ranp argument because of the way the constants are labeled, so we first create the ASCs by hand:

Electr$asc2 <- as.numeric(Electr$alt == 2)
Electr$asc3 <- as.numeric(Electr$alt == 3)
Electr$asc4 <- as.numeric(Electr$alt == 4)
The G-MNL model is estimated using model = "gmnl":

Elec.gmnl <- gmnl(choice ~ pf + cl + loc + wk + tod + seas + asc2 + asc3 + asc4 | 0,
                  data = Electr,
                  subset = 1:3000,
                  model = 'gmnl', R = 50,
                  panel = TRUE,
                  notscale = c(rep(0, 6), 1, 1, 1),
                  ranp = c(cl = "n", loc = "n", wk = "n", tod = "n", seas = "n",
                           asc2 = "n", asc3 = "n", asc4 = "n"))



[1] "asc2" "asc3" "asc4"

Estimating GMNL model

summary(Elec.gmnl)


Call:
gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas + asc2 +
    asc3 + asc4 | 0, data = Electr, subset = 1:3000, model = "gmnl",
    ranp = c(cl = "n", loc = "n", wk = "n", tod = "n", seas = "n",
    asc2 = "n", asc3 = "n", asc4 = "n"), R = 50, panel = TRUE,
    notscale = c(rep(0, 6), 1, 1, 1), method = "bfgs")

      1       2       3       4
0.21467 0.30267 0.21733 0.26533

Coefficients:
pf      -0.873321 0.106578 -8.1942 2.220e-16 ***
cl      -0.171766 0.042199 -4.0704 4.694e-05 ***
loc      1.808101 0.228900  7.8991 2.887e-15 ***
wk       1.754280 0.222239  7.8937 2.887e-15 ***
tod     -8.595975 0.986489 -8.7137 < 2.2e-16 ***
seas    -8.865332 1.012750 -8.7537 < 2.2e-16 ***
asc2     0.304378 0.153868  1.9782  0.047910 *
asc3     0.156338 0.159796  0.9784  0.327897
asc4     0.113329 0.156808  0.7227  0.469847
sd.cl    0.364344 0.044176  8.2475 2.220e-16 ***
sd.loc   1.101449 0.273783  4.0231 5.744e-05 ***
sd.wk    1.205311 0.249255  4.8357 1.327e-06 ***
sd.tod   1.465550 0.233550  6.2751 3.494e-10 ***
sd.seas  1.810961 0.295819  6.1219 9.249e-10 ***
sd.asc2  0.526420 0.179072  2.9397  0.003285 **
sd.asc3  0.084905 0.224624  0.3780  0.705442
sd.asc4  0.240725 0.188140  1.2795  0.200720
tau      0.677748 0.149017  4.5481 5.412e-06 ***
gamma    0.362509 0.174670  2.0754  0.037951 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since we include the ASCs as additional variables in the first part of the formula, the second part of the formula contains no alternative-specific intercepts (| 0). Note also that even though the ASCs are random, they are not scaled: notscale = c(rep(0, 6), 1, 1, 1) indicates that the last three variables in the first part of the formula (asc2, asc3, and asc4) are not scaled. Recall also that the ASCs are random; therefore, they should be the last variables in the first part of the formula.

Another important issue is that, by default, gmnl estimates γ directly, as suggested by Keane and Wasi (2013). Alternatively, one can estimate γ∗, where γ = exp(γ∗)/(1 + exp(γ∗)), as suggested by Fiebig et al. (2010), by specifying hgamma = "indirect". (Thus, hgamma = "direct" is the default setting.)
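The two parametrizations are linked by the logistic function. The base-R sketch below only converts numbers between the γ and γ∗ scales; it does not refit the model. The value 0.362509 is the direct estimate reported by summary(Elec.gmnl).

```r
# gamma = exp(gamma_star) / (1 + exp(gamma_star)): the logistic transform,
# which keeps gamma strictly inside (0, 1)
gamma_from_star <- function(gamma_star) exp(gamma_star) / (1 + exp(gamma_star))

# Inverse (logit) transform: gamma_star = log(gamma / (1 - gamma))
star_from_gamma <- function(gamma) log(gamma / (1 - gamma))

gamma_hat  <- 0.362509                  # direct estimate of gamma from summary(Elec.gmnl)
gamma_star <- star_from_gamma(gamma_hat)
print(gamma_star)
print(gamma_from_star(gamma_star))      # maps back to gamma_hat

# Base R already provides these maps as plogis() and qlogis()
stopifnot(isTRUE(all.equal(gamma_from_star(gamma_star), plogis(gamma_star))),
          isTRUE(all.equal(gamma_star, qlogis(gamma_hat))))
```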

For all models estimated with gmnl, the AIC and BIC goodness-of-fit criteria can be extracted:

AIC(Elec.gmnl)

[1] 1496.491

BIC(Elec.gmnl)

[1] 1584.272
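Both criteria penalize the maximized log-likelihood by the number of estimated parameters k. BIC also uses the sample size n. In this sketch:

- k = 19 counts the parameters of Elec.gmnl (9 means, 8 standard deviations, τ and γ).
- The log-likelihood is backed out from the reported AIC rather than read from the output.
- n = 750 is an assumption (3000 long-format rows divided by 4 alternatives).

Under that assumption the standard formulas reproduce both reported values.

```r
# AIC = -2 logL + 2k,  BIC = -2 logL + k log(n)
info_criteria <- function(logLik, k, n) {
  c(AIC = -2 * logLik + 2 * k,
    BIC = -2 * logLik + k * log(n))
}

k  <- 19                       # Elec.gmnl: 9 means + 8 std. deviations + tau + gamma
n  <- 750                      # assumption: 3000 long-format rows / 4 alternatives
ll <- -(1496.491 - 2 * k) / 2  # log-likelihood implied by the reported AIC
print(ll)
ic <- info_criteria(ll, k, n)
print(round(ic, 3))
```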

The G-MNL estimation code is also convenient for estimating S-MNL models with random effects (Keane and Wasi 2013). In this case, the user can fix γ at zero and use model = "gmnl":

Elec.smnl.re <- gmnl(choice ~ pf + cl + loc + wk + tod + seas + asc2 + asc3 + asc4 | 0,

data = Electr,

subset = 1:3000,

model = 'gmnl',R = 50,

panel = TRUE,

print.init = TRUE,

notscale = c(rep(0, 6), 1, 1, 1),

ranp = c(asc2 = "n", asc3 = "n", asc4 = "n"),

init.gamma = 0,

fixed = c(rep(FALSE, 16), TRUE),

correlation = TRUE)


[1] "asc2" "asc3" "asc4"

Starting Values:

pf cl loc wk tod

-0.6018425 -0.1350292 1.2222840 1.0386685 -5.3686080

seas asc2 asc3 asc4 sd.asc2.asc2

-5.5623402 0.2097372 0.0811432 0.1064875 0.1000000

sd.asc2.asc3 sd.asc2.asc4 sd.asc3.asc3 sd.asc3.asc4 sd.asc4.asc4

0.1000000 0.1000000 0.1000000 0.1000000 0.1000000

tau gamma


0.1000000 0.0000000


summary(Elec.smnl.re)


Call:

gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas + asc2 +

asc3 + asc4 | 0, data = Electr, subset = 1:3000, model = "gmnl",

ranp = c(asc2 = "n", asc3 = "n", asc4 = "n"), R = 50, correlation = TRUE,

panel = TRUE, init.gamma = 0, notscale = c(rep(0, 6), 1,

1, 1), print.init = TRUE, fixed = c(rep(FALSE, 16), TRUE),

method = "bfgs")


1 2 3 4

0.21467 0.30267 0.21733 0.26533


Coefficients:

    Estimate Std. Error z-value Pr(>|z|)

pf -0.629883 0.114787 -5.4874 4.079e-08 ***

cl -0.137680 0.032820 -4.1950 2.729e-05 ***

loc 1.274858 0.219984 5.7952 6.823e-09 ***

wk 1.111911 0.192879 5.7648 8.174e-09 ***

tod -6.200782 1.045037 -5.9336 2.965e-09 ***

seas -6.368142 1.080162 -5.8955 3.735e-09 ***

asc2 0.212377 0.144220 1.4726 0.140862

asc3 0.229546 0.135119 1.6988 0.089350 .

asc4 0.153643 0.131028 1.1726 0.240955

sd.asc2.asc2 0.569372 0.218351 2.6076 0.009118 **

sd.asc2.asc3 0.306593 0.181275 1.6913 0.090777 .

sd.asc2.asc4 0.150838 0.199530 0.7560 0.449669

sd.asc3.asc3 0.044531 0.219183 0.2032 0.839005

sd.asc3.asc4 0.089272 0.218777 0.4081 0.683237

sd.asc4.asc4 -0.040568 0.203401 -0.1995 0.841910

tau 1.100902 0.185246 5.9429 2.800e-09 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1







The argument init.gamma sets the initial value for γ; in this case we set it at zero. The next step is to fix parameters using the argument fixed. The user needs to be careful with the order of the parameters. We encourage the user to first estimate a model in which all the parameters are freely estimated, using the argument print.init = TRUE. This argument displays the initial values used by gmnl, and therefore the order of the parameters. Generally, γ is the last parameter that enters the likelihood specification. So, by typing fixed = c(rep(FALSE, 16), TRUE), we hold only γ fixed at zero, and the remaining coefficients are freely estimated.
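Because fixed is positional, one way to avoid counting by hand is to build it from the parameter names printed by print.init = TRUE. In this small base-R sketch, par_names is a hypothetical helper vector typed in from the starting-values output above.

```r
# Parameter order as displayed by print.init = TRUE for Elec.smnl.re
par_names <- c("pf", "cl", "loc", "wk", "tod", "seas",
               "asc2", "asc3", "asc4",
               "sd.asc2.asc2", "sd.asc2.asc3", "sd.asc2.asc4",
               "sd.asc3.asc3", "sd.asc3.asc4", "sd.asc4.asc4",
               "tau", "gamma")

# Mark as fixed only the parameters listed by name (here, gamma)
fixed <- par_names %in% c("gamma")
print(length(fixed))
```

The resulting logical vector can then be passed as fixed = fixed.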

By default, the initial values for the means of the random parameters come from an MNL model, and the standard deviations (spreads) are set at 0.1. The MNL estimates may not be the best starting values, since the G-MNL log-likelihood is not globally concave. Better starting values for a G-MNL model with correlated parameters might come from one of the following:

1) a G-MNL model with uncorrelated parameters;
2) a MIXL model with correlated parameters;
3) a G-MNL model with correlated parameters and γ fixed at 0.

One can estimate one of these models first and then pass the appropriate vector of starting values through the start argument of gmnl (see Section 3.5 for an example of how to use the start argument).

3.4. Estimating LC and MM-MNL Models

The next example shows how to estimate an LC model with two classes and a constant in the class-share equation:

Elec.lc <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0 | 0 | 0 | 1,

data = Electr,

subset = 1:3000,

model = 'lc',panel = TRUE,

Q = 2)

Estimating LC model

Note that for the LC model, one needs to specify at least a constant in the fifth part of the formula. If the class assignment probabilities, $w_{iq}$, also depend on socio-economic characteristics, those variables can be included in the fifth part as well. The LC model is estimated by setting model = "lc", and the number of classes is given by the argument Q.

summary(Elec.lc)


Call:

gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas | 0 |

0 | 0 | 1, data = Electr, subset = 1:3000, model = "lc",

Q = 2, panel = TRUE, method = "bfgs")


1 2 3 4

0.21467 0.30267 0.21733 0.26533



Coefficients:

    Estimate Std. Error z-value Pr(>|z|)

class.1.pf -0.445822 0.087567 -5.0912 3.557e-07 ***

class.1.cl -0.184654 0.030077 -6.1393 8.288e-10 ***

class.1.loc 1.214376 0.161829 7.5041 6.195e-14 ***

class.1.wk 0.964074 0.142875 6.7477 1.502e-11 ***

class.1.tod -3.218417 0.687990 -4.6780 2.897e-06 ***

class.1.seas -3.486496 0.692898 -5.0318 4.860e-07 ***

class.2.pf -0.843079 0.096787 -8.7107 < 2.2e-16 ***

class.2.cl -0.124183 0.045291 -2.7419 0.006109 **

class.2.loc 1.644477 0.268852 6.1167 9.555e-10 ***

class.2.wk 1.413870 0.211971 6.6701 2.556e-11 ***

class.2.tod -9.373176 0.867618 -10.8033 < 2.2e-16 ***

class.2.seas -9.264694 0.884694 -10.4722 < 2.2e-16 ***

(class)2 -0.220005 0.078803 -2.7918 0.005241 **

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
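The (class)2 coefficient is the constant of the class-share equation. This sketch assumes the usual multinomial-logit share specification with class 1 as the normalized base. That is an assumption here, though it is consistent with only class 2 having a constant in the output. Under it, the implied prior class shares are roughly 0.555 for class 1 and 0.445 for class 2.

```r
# Class shares assumed to follow a multinomial logit with class 1 as the base:
#   w_q = exp(theta_q) / sum_q exp(theta_q),  theta_1 = 0
class_shares <- function(theta) {
  theta <- c(0, theta)          # prepend the normalized class-1 constant
  exp(theta) / sum(exp(theta))
}

shares <- class_shares(-0.220005)   # "(class)2" estimate from summary(Elec.lc)
print(round(shares, 3))
```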






The next example estimates an MM-MNL model in which the coefficients follow a mixture of two normals:

Elec.mm <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0 | 0 | 0 | 1,

data = Electr,

subset = 1:3000,

model = 'mm',R = 50,

panel = TRUE,

ranp = c(pf = "n", cl = "n", loc = "n", wk = "n", tod = "n",

seas = "n"),

Q = 2,

iterlim = 500)

Estimating MM-MNL model

summary(Elec.mm)


Call:

gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas | 0 |

0 | 0 | 1, data = Electr, subset = 1:3000, model = "mm",

ranp = c(pf = "n", cl = "n", loc = "n", wk = "n", tod = "n",

seas = "n"), R = 50, Q = 2, panel = TRUE, iterlim = 500,


method = "bfgs")


1 2 3 4

0.21467 0.30267 0.21733 0.26533


Coefficients:

    Estimate Std. Error z-value Pr(>|z|)

class.1.pf -1.2804726 0.1228091 -10.4265 < 2.2e-16 ***

class.1.cl -0.4971537 0.0806961 -6.1608 7.237e-10 ***

class.1.loc 0.6445668 0.2409667 2.6749 0.0074747 **

class.1.wk 0.7124326 0.2182372 3.2645 0.0010966 **

class.1.tod -11.6253933 1.0275088 -11.3142 < 2.2e-16 ***

class.1.seas -12.6572772 1.1396428 -11.1064 < 2.2e-16 ***

class.2.pf -0.4357360 0.1158819 -3.7602 0.0001698 ***

class.2.cl 0.0846431 0.0828833 1.0212 0.3071445

class.2.loc 3.3862203 0.3818676 8.8675 < 2.2e-16 ***

class.2.wk 2.7209205 0.3268854 8.3238 < 2.2e-16 ***

class.2.tod -4.7164132 1.0541542 -4.4741 7.673e-06 ***

class.2.seas -4.6475366 0.9600010 -4.8412 1.291e-06 ***

class.1.sd.pf 0.1053542 0.0394055 2.6736 0.0075045 **

class.1.sd.cl 0.2689940 0.0583282 4.6117 3.993e-06 ***

class.1.sd.loc 0.0075946 0.2655152 0.0286 0.9771810

class.1.sd.wk 0.1141660 0.5806554 0.1966 0.8441283

class.1.sd.tod 2.2052059 0.4503540 4.8966 9.751e-07 ***

class.1.sd.seas 2.3219899 0.4519776 5.1374 2.786e-07 ***

class.2.sd.pf 0.1969590 0.0337153 5.8418 5.163e-09 ***

class.2.sd.cl 0.3552958 0.0749217 4.7422 2.114e-06 ***

class.2.sd.loc 0.5870921 0.2869922 2.0457 0.0407886 *

class.2.sd.wk 1.1003680 0.3032253 3.6289 0.0002847 ***

class.2.sd.tod 1.4025201 0.5186779 2.7040 0.0068504 **

class.2.sd.seas 0.0731118 0.2565453 0.2850 0.7756548

(class)2 -0.1371921 0.0784367 -1.7491 0.0802771 .

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1







The specification is similar to that of the LC model, but we now allow the parameters in each class to be normally distributed using the argument ranp. This model requires more iterations than the previous models, so we have set the maximum number of iterations to 500 using the argument iterlim.
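Under the MM-MNL specification, each coefficient follows a finite mixture of normals. This base-R sketch computes the overall mean and standard deviation of the loc coefficient implied by the estimates above. It assumes multinomial-logit class shares with class 1 as the normalized base, which is an assumption rather than something the output states.

```r
# Class shares, assuming multinomial-logit shares with class 1 as the base
theta2 <- -0.1371921                   # "(class)2" estimate from summary(Elec.mm)
w <- c(1, exp(theta2)) / (1 + exp(theta2))

# Class-specific mean and standard deviation of the loc coefficient
b <- c(0.6445668, 3.3862203)           # class.1.loc, class.2.loc
s <- c(0.0075946, 0.5870921)           # class.1.sd.loc, class.2.sd.loc

# First two moments of the finite mixture of normals
mix_mean <- sum(w * b)
mix_var  <- sum(w * (s^2 + b^2)) - mix_mean^2
print(c(mean = mix_mean, sd = sqrt(mix_var)))
```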


3.5. Willingness-to-Pay Space

Willingness-to-pay (WTP) space models reparameterize the model so that the parameters are the marginal WTP for each attribute rather than its marginal utility. Train and Weeks (2005) and Sonnier, Ainslie, and Otter (2007) extend the WTP-space approach by allowing random parameters. This approach is appealing because it allows the analyst to estimate the distribution of WTP heterogeneity directly (Scarpa, Thiene, and Train 2008).

To illustrate the concept of WTP space, and how it can be estimated using gmnl, we first show the case without random parameters. The standard procedure for deriving willingness-to-pay measures is to start with a model in preference space. For example, consider the simple conditional logit model:

clogit <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0,

data = Electr,

subset = 1:3000)

summary(clogit)


Call:

gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas | 0,

data = Electr, subset = 1:3000, method = "nr")


1 2 3 4

0.21467 0.30267 0.21733 0.26533


Coefficients:

    Estimate Std. Error z-value Pr(>|z|)

pf -0.611257 0.054828 -11.1486 < 2.2e-16 ***

cl -0.139809 0.020399 -6.8539 7.188e-12 ***

loc 1.198647 0.119740 10.0104 < 2.2e-16 ***

wk 1.030431 0.106319 9.6918 < 2.2e-16 ***

tod -5.453994 0.434132 -12.5630 < 2.2e-16 ***

seas -5.664840 0.441899 -12.8193 < 2.2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Optimization of log-likelihood by Newton-Raphson maximisation




Exit of MLE: gradient close to zero

To estimate the willingness-to-pay for each attribute, one divides each attribute coefficient by the coefficient of price, pf. These ratios can be retrieved with the function wtp.gmnl:


wtp.gmnl(clogit, wrt = "pf")

Willigness-to-pay respect to: pf


cl 0.22872 0.03585 6.3801 1.77e-10 ***

loc -1.96095 0.23035 -8.5129 < 2.2e-16 ***

wk -1.68576 0.19490 -8.6493 < 2.2e-16 ***

tod 8.92258 0.20248 44.0657 < 2.2e-16 ***

seas 9.26752 0.21640 42.8250 < 2.2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The argument wrt = "pf" indicates that all the coefficients are divided by the coefficient of the attribute pf.
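The ratios are easy to reproduce by hand from the MNL point estimates. This minimal base-R sketch matches the wtp.gmnl estimates up to rounding of the printed coefficients. It covers point estimates only; the standard errors would also need the covariance between each coefficient and the price coefficient (e.g., from vcov(clogit)).

```r
# Preference-space MNL coefficients from summary(clogit)
b <- c(pf = -0.611257, cl = -0.139809, loc = 1.198647,
       wk = 1.030431, tod = -5.453994, seas = -5.664840)

# Divide every attribute coefficient by the price coefficient, as wtp.gmnl does
ratio <- b[names(b) != "pf"] / b[["pf"]]
print(round(ratio, 5))
```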

Another way to estimate the same WTP coefficients is to use the S-MNL model. We first need to take the negative of the price variable, using the opposite argument of the mlogit.data function:

ElectrO <- mlogit.data(Electricity, id = "id", choice = "choice",

varying = 3:26, shape = "wide", sep = "",

opposite = c("pf"))

Next, we need to fix the price coefficient at 1 and τ at 0. The fixed argument holds these two parameters fixed at the values supplied in the start vector.

# Starting values

start <- c(1, 0, 0, 0, 0, 0, 0, 0)

# Estimate the model

wtps <- gmnl(choice ~ pf + cl + loc + wk + tod + seas|0 | 0 | 0 | 1,

data = ElectrO,

model = "smnl",

subset = 1:3000,

R = 1,

fixed = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),

panel = TRUE,

start = start,

method = "bhhh",

iterlim = 500)


Note also that we fit the S-MNL model with a constant in the scale. After a proper transformation, this constant represents the price coefficient. Since we are working with a fixed-parameter model, the number of draws is set to 1.


summary(wtps)


Call:

gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas | 0 |

0 | 0 | 1, data = ElectrO, subset = 1:3000, model = "smnl",

start = start, R = 1, panel = TRUE, fixed = c(TRUE, FALSE,

FALSE, FALSE, FALSE, FALSE, TRUE, FALSE), method = "bhhh",

iterlim = 500)


1 2 3 4

0.21467 0.30267 0.21733 0.26533


Coefficients:

    Estimate Std. Error z-value Pr(>|z|)

cl -0.228723 0.036101 -6.3356 2.364e-10 ***

loc 1.960954 0.228380 8.5864 < 2.2e-16 ***

wk 1.685756 0.191544 8.8009 < 2.2e-16 ***

tod -8.922585 0.202505 -44.0611 < 2.2e-16 ***

seas -9.267524 0.216588 -42.7888 < 2.2e-16 ***

het.(Intercept) -0.492237 0.091686 -5.3687 7.929e-08 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Optimization of log-likelihood by BHHH maximisation




Exit of MLE: successive function values within tolerance limit


Each value in the output is the WTP estimate for the respective attribute. These WTP estimates equal those obtained with the wtp.gmnl function, but with the opposite sign, because the price variable now enters with a negative sign. The price coefficient can be obtained using the following transformation:

- exp(coef(wtps)["het.(Intercept)"])

het.(Intercept)

-0.6112572

If the standard error is required, the deltamethod function from the msm package can be used as follows:


library(msm)

estmean <- coef(wtps)

estvar <- vcov(wtps)

se <- deltamethod(~ - exp(x6), estmean, estvar, ses = TRUE)

se

[1] 0.05604362
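For a single parameter, the delta method reduces to multiplying the standard error by the absolute derivative of the transformation. The following base-R check is hand-rolled and uses the rounded values printed by summary(wtps), so it matches the deltamethod result only up to that rounding.

```r
# het.(Intercept) estimate and its standard error from summary(wtps)
x    <- -0.492237
se_x <- 0.091686

# Price coefficient g(x) = -exp(x); its derivative is g'(x) = -exp(x)
g    <- -exp(x)
se_g <- abs(-exp(x)) * se_x             # first-order delta method: |g'(x)| * se(x)
print(c(estimate = g, se = se_g))
```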

Using the same idea, one can let the WTP vary across individuals. To do so, we can estimate a G-MNL model in which the price parameter and γ are fixed:

# Starting Values

start2 <- c(1, coef(wtps), rep(0.1, 5), 0.1, 0)

# Estimate the model

wtps2 <- gmnl(choice ~ pf + cl + loc + wk + tod + seas|0 | 0 | 0 | 1,

data = ElectrO,

subset = 1:3000,

model = "gmnl",

R = 50,

fixed = c(TRUE, rep(FALSE, 12), TRUE),

panel = TRUE,

start = start2,

ranp = c(cl = "n", loc = "n", wk = "n", tod = "n", seas = "n"))


summary(wtps2)


Call:

gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas | 0 |

0 | 0 | 1, data = ElectrO, subset = 1:3000, model = "gmnl",

start = start2, ranp = c(cl = "n", loc = "n", wk = "n", tod = "n",

seas = "n"), R = 50, panel = TRUE, fixed = c(TRUE, rep(FALSE,

12), TRUE), method = "bfgs")


1 2 3 4

0.21467 0.30267 0.21733 0.26533


Coefficients:

    Estimate Std. Error z-value Pr(>|z|)

cl -0.272874 0.051859 -5.2619 1.426e-07 ***

loc 2.164435 0.245296 8.8238 < 2.2e-16 ***

wk 1.943581 0.193666 10.0358 < 2.2e-16 ***

tod -9.679178 0.293550 -32.9728 < 2.2e-16 ***


seas -9.887635 0.277413 -35.6423 < 2.2e-16 ***

het.(Intercept) 0.113699 0.140126 0.8114 0.4171

sd.cl 0.411530 0.046497 8.8506 < 2.2e-16 ***

sd.loc 1.784992 0.232410 7.6804 1.577e-14 ***

sd.wk 1.286656 0.201673 6.3799 1.772e-10 ***

sd.tod 1.717628 0.222258 7.7281 1.088e-14 ***

sd.seas 2.214847 0.383833 5.7703 7.911e-09 ***

tau 0.690206 0.139422 4.9505 7.403e-07 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1







Finally, the parameter for price can be recovered using:

coef(wtps2)["het.(Intercept)"] - (coef(wtps2)["tau"]^2)/2

3.6. Individual Parameters

The gmnl package also allows the analyst to obtain conditional estimates for each individual using Bayes’ theorem (see, for example, Train 2009; Greene 2012). To show how these conditional estimates are derived, let the probability for individual $i$ in choice occasion $t$ be

$$f(y_{it}\mid x_{it},\beta_i,\theta)=\prod_{j=1}^{J}\left[\frac{\exp(x_{ijt}'\beta_i)}{\sum_{k=1}^{J}\exp(x_{ikt}'\beta_i)}\right]^{y_{ijt}},$$

where $\beta_i = \sigma_i\beta + [\gamma + \sigma_i(1-\gamma)]L\eta_i$ and $\theta$ represents any other parameters in the model. If we have several choice occasions for an individual, then the joint density of the $T_i$ observations is given by

$$f(y_{i1},y_{i2},\ldots,y_{iT_i}\mid X_i,\beta_i,\theta)=f(y_i\mid X_i,\beta_i,\theta)=\prod_{t=1}^{T_i}f(y_{it}\mid x_{it},\beta_i,\theta).$$

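The marginal density of $\beta_i$ is $f(\beta_i\mid\Omega)$, where $\Omega$ collects all the parameters that enter $\beta_i$. Then, the joint distribution of $y_i$ and $\beta_i$ can be written as

$$f(y_i,\beta_i\mid X_i,\theta,\Omega)=f(y_i\mid X_i,\beta_i,\theta)\,f(\beta_i\mid\Omega).$$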

Using Bayes’ theorem, we obtain

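$$f(\beta_i\mid y_i,X_i,\theta,\Omega)=\frac{f(y_i\mid X_i,\beta_i,\theta)\,f(\beta_i\mid\Omega)}{\int_{\beta_i}f(y_i\mid X_i,\beta_i,\theta)\,f(\beta_i\mid\Omega)\,d\beta_i}.$$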


So, the conditional expectation of βi is

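$$E[\beta_i\mid y_i,X_i,\theta,\Omega]=\int_{\beta_i}\beta_i\,\frac{f(y_i\mid X_i,\beta_i,\theta)\,f(\beta_i\mid\Omega)}{\int_{\beta_i}f(y_i\mid X_i,\beta_i,\theta)\,f(\beta_i\mid\Omega)\,d\beta_i}\,d\beta_i.$$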

This expectation is the mean of the conditional distribution of the random parameters given the individual's observed choices, also known as the posterior distribution of the individual parameters. A simulator for this conditional estimate is

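$$E[\beta_i\mid y_i,X_i,\theta,\Omega]=\frac{\frac{1}{R}\sum_{r=1}^{R}\beta_{ir}\prod_{t}f(y_{it}\mid x_{it},\beta_{ir},\theta)}{\frac{1}{R}\sum_{r=1}^{R}\prod_{t}f(y_{it}\mid x_{it},\beta_{ir},\theta)}=\sum_{r=1}^{R}W_{ir}\,\beta_{ir},$$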

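where $\beta_{ir} = \sigma_{ir}\beta + [\gamma + \sigma_{ir}(1-\gamma)]L\eta_{ir}$ is the $r$-th draw for individual $i$. Its weight is $W_{ir} = \prod_t f(y_{it}\mid x_{it},\beta_{ir},\theta) \big/ \sum_{r=1}^{R}\prod_t f(y_{it}\mid x_{it},\beta_{ir},\theta)$. We can also estimate the standard deviation of this distribution by first calculating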

$$E[\beta_i^2\mid y_i,X_i,\theta,\Omega]=\sum_{r=1}^{R}W_{ir}\,\beta_{ir}^2,$$

and then computing the square root of the estimated variance,

$$\sqrt{E[\beta_i^2\mid y_i,X_i,\theta,\Omega]-E[\beta_i\mid y_i,X_i,\theta,\Omega]^2}.$$
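To make the simulator concrete, the following base-R sketch computes the weights $W_{ir}$ and the conditional mean and standard deviation for a single hypothetical individual. Everything in it is an illustrative assumption:

- the attribute data are simulated;
- there is only one random coefficient, drawn from N(1, 0.8²);
- $\sigma_{ir} = 1$, which is the MIXL special case of the G-MNL expression.

This is not how gmnl organizes the computation internally.

```r
set.seed(123)

# Hypothetical individual: T_i choice occasions, J alternatives, one attribute
T_i <- 8
J   <- 3
R   <- 2000
x <- matrix(rnorm(T_i * J), nrow = T_i, ncol = J)  # x_ijt: rows are t, columns are j
y <- sample(seq_len(J), T_i, replace = TRUE)       # chosen alternative in each t

# Draws beta_ir from the assumed population distribution N(1, 0.8^2)
# (MIXL special case: sigma_ir = 1, so beta_ir = beta + L * eta_ir)
beta_r <- 1 + 0.8 * rnorm(R)

# prod_t f(y_it | x_it, beta_ir): probability of the observed choice sequence
seq_prob <- vapply(beta_r, function(b) {
  v <- b * x                              # utilities, T_i x J
  p <- exp(v) / rowSums(exp(v))           # logit probabilities in each occasion
  prod(p[cbind(seq_len(T_i), y)])         # probabilities of the chosen alternatives
}, numeric(1))

# Weights W_ir and the conditional (posterior) moments
W         <- seq_prob / sum(seq_prob)
post_mean <- sum(W * beta_r)
post_sd   <- sqrt(sum(W * beta_r^2) - post_mean^2)
print(c(mean = post_mean, sd = post_sd))
```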

For the latent class model, the expected individual-level parameters can be derived in a similar manner (Boxall and Adamowicz 2002; Greene and Hensher 2003). The prior class probabilities are $w_{iq}$. Using Bayes’ theorem, we can obtain a posterior estimate of the latent class probabilities:

$$w_{q\mid i}=\frac{P_{i\mid q}\,w_{iq}}{\sum_{q=1}^{Q}P_{i\mid q}\,w_{iq}}.$$

Note that the notation $w_{q\mid i}$ indicates the individual-specific estimate of the class probability, conditional on the individual's estimated choice probabilities (Greene and Hensher 2003). The posterior estimate of the individual-specific parameter vector is

$$\hat{\beta}_i=\sum_{q=1}^{Q}w_{q\mid i}\,\beta_q.$$
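The two latent-class formulas can be sketched in a few lines of base R. The prior shares come from the (class)2 estimate of Elec.lc, assuming multinomial-logit shares with class 1 as the base. The class-specific loc coefficients are taken from summary(Elec.lc). The per-class likelihoods P_iq are hypothetical values for one individual.

```r
# Prior class probabilities w_iq: shares implied by the "(class)2" constant of
# Elec.lc, assuming multinomial-logit shares with class 1 as the base
theta2  <- -0.220005
w_prior <- c(1, exp(theta2)) / (1 + exp(theta2))

# Hypothetical likelihoods P_{i|q} of one individual's choice sequence in each class
P_iq <- c(2.0e-6, 8.0e-6)

# Posterior class probabilities w_{q|i} (Bayes' theorem)
w_post <- P_iq * w_prior / sum(P_iq * w_prior)

# Class-specific loc coefficients from summary(Elec.lc)
beta_q <- c(class1 = 1.214376, class2 = 1.644477)

# Individual-specific posterior estimate: sum over q of w_{q|i} * beta_q
beta_i <- sum(w_post * beta_q)
print(round(w_post, 4))
print(beta_i)
```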

The gmnl package uses these formulae to compute the individual parameters along with their standard deviations. For example, we can plot the kernel density of the loc parameter for the model with correlated parameters by typing the following:


plot(Elec.cor, par = "loc", effect = "ce", type = "density", col = "grey")

Figure 1: Kernel Density of the Individual’s Conditional Mean [plot "Conditional Distribution for loc"; x-axis: E(β̂i); y-axis: Density]

The coefficients for the first 30 individuals can be plotted using:

plot(Elec.cor, par = "loc", effect = "ce", ind = TRUE, id = 1:30)


Figure 2: Individual Confidence Intervals for the Conditional Means [plot "95% Probability Intervals for loc" for individuals 1 to 30; x-axis: Individuals; y-axis: E(β̂i)]

Another important function is effect.gmnl. This function allows users to obtain the individuals’ conditional means of both the parameters and the willingness-to-pay. For example, one can get the individuals’ conditional means and standard errors plotted in Figure 2 by typing:

bi.loc <- effect.gmnl(Elec.cor, par = "loc", effect = "ce")

summary(bi.loc$mean)

Min. 1st Qu. Median Mean 3rd Qu. Max.

-0.7889 0.4223 2.0430 2.1190 3.4600 7.1300

summary(bi.loc$sd.est)

Min. 1st Qu. Median Mean 3rd Qu. Max.

0.1127 0.5636 0.7955 0.8665 1.1320 1.8570

The willingness-to-pay for loc for all individuals can be obtained by:

wtp.loc <- effect.gmnl(Elec.cor, par = "loc", effect = "wtp", wrt = "pf")

This function computes the conditional mean of $wtp = \beta_{i,loc}/\beta_{pf}$ for all individuals. Note that the argument par is the variable whose coefficient goes in the numerator, and the argument wrt is a string indicating which coefficient goes in the denominator.


summary(wtp.loc$mean)

Min. 1st Qu. Median Mean 3rd Qu. Max.

-8.1940 -3.9760 -2.3470 -2.4360 -0.4853 0.9066

summary(wtp.loc$sd.est)

Min. 1st Qu. Median Mean 3rd Qu. Max.

0.1295 0.6477 0.9141 0.9957 1.3010 2.1340

References

Boxall PC, Adamowicz WL (2002). “Understanding Heterogeneous Preferences in Random Utility Models: A Latent Class Approach.” Environmental and Resource Economics, 23(4), 421–446.

Bujosa A, Riera A, Hicks RL (2010). “Combining Discrete and Continuous Representations of Preference Heterogeneity: A Latent Class Approach.” Environmental and Resource Economics, 47(4), 477–493.

Croissant Y (2012). “Estimation of Multinomial Logit Models in R: The mlogit Packages.” R package version 0.2-2. URL http://cran.r-project.org/web/packages/mlogit/vignettes/mlogit.pdf.

Dumont J, Keller J, Carpenter C (2014). RSGHB: Functions for Hierarchical Bayesian Estimation: A Flexible Approach. R package version 1.0.2, URL http://CRAN.R-project.org/package=RSGHB.

Fiebig DG, Keane MP, Louviere J, Wasi N (2010). “The Generalized Multinomial Logit Model: Accounting for Scale and Coefficient Heterogeneity.” Marketing Science, 29(3), 393–421.

Gourieroux C, Monfort A (1997). Simulation-based Econometric Methods. Oxford University Press.

Greene WH (2012). Econometric Analysis. 7th edition. Prentice Hall. ISBN 0131395386.

Greene WH, Hensher DA (2003). “A Latent Class Model for Discrete Choice Analysis: Contrasts With Mixed Logit.” Transportation Research Part B: Methodological, 37(8), 681–698.

Greene WH, Hensher DA (2010). “Does Scale Heterogeneity Across Individuals Matter? An Empirical Assessment of Alternative Logit Models.” Transportation, 37(3), 413–428.

Greene WH, Hensher DA (2013). “Revealing Additional Dimensions of Preference Heterogeneity in a Latent Class Mixed Multinomial Logit Model.” Applied Economics, 45(14), 1897–1902.



Hajivassiliou VA, Ruud PA (1986). “Classical Estimation Methods for LDV Models Using Simulation.” In RF Engle, D McFadden (eds.), Handbook of Econometrics, volume 4 of Handbook of Econometrics, chapter 40, pp. 2383–2441. Elsevier.

Henningsen A, Toomet O (2011). “maxLik: A Package for Maximum Likelihood Estimation in R.” Computational Statistics, 26(3), 443–458.

Hensher DA, Greene WH (2003). “The Mixed Logit Model: The State of Practice.” Transportation, 30(2), 133–176.

Hess S, Ben-Akiva M, Gopinath D, Walker J (2011). “Advantages of Latent Class Over Continuous Mixture of Logit Models.” Unpublished: http://www.stephanehess.me.uk/papers/Hess Ben-Akiva Gopinath Walker May 2011.pdf [accessed August 2012].

Hess S, Rose JM (2012). “Can Scale and Coefficient Heterogeneity be Separated in Random Coefficients Models?” Transportation, 39(6), 1225–1239.

Hess S, Stathopoulos A (2013). “Linking Response Quality to Survey Engagement: A Combined Random Scale and Latent Variable Approach.” Journal of Choice Modelling, 7, 1–12.

Keane M, Wasi N (2013). “Comparing Alternative Models of Heterogeneity in Consumer Choice Behavior.” Journal of Applied Econometrics, 28(6), 1018–1045.

Lee LF (1992). “On Efficiency of Methods of Simulated Moments and Maximum Simulated Likelihood Estimation of Discrete Response Models.” Econometric Theory, 8(04), 518–552.

McFadden D (1974). “Conditional Logit Analysis of Qualitative Choice Behavior.” In P Zarembka (ed.), Frontiers in Econometrics, pp. 105–142. Academic Press, New York.

McFadden D, Train K (2000). “Mixed MNL Models for Discrete Response.” Journal of Applied Econometrics, 15(5), 447–470.

R Development Core Team (2012). “R: A Language and Environment for Statistical Computing.” URL http://www.R-project.org/.

Scarpa R, Thiene M (2005). “Destination Choice Models for Rock Climbing in the Northeastern Alps: A Latent-class Approach Based on Intensity of Preferences.” Land Economics, 81(3), 426–444.

Scarpa R, Thiene M, Train K (2008). “Utility in Willingness to Pay Space: A Tool to Address Confounding Random Scale Effects in Destination Choice to the Alps.” American Journal of Agricultural Economics, 90(4), 994–1010.

Scott Long J (1997). “Regression Models for Categorical and Limited Dependent Variables.” Advanced Quantitative Techniques in the Social Sciences, 7.

Shen J (2009). “Latent Class Model or Mixed Logit Model? A Comparison by Transport Mode Choice Data.” Applied Economics, 41(22), 2915–2924.

Sonnier G, Ainslie A, Otter T (2007). “Heterogeneity Distributions of Willingness-to-Pay in Choice Models.” Quantitative Marketing and Economics, 5(3), 313–331.



Train K (2009). Discrete Choice Methods with Simulation. 2nd edition. Cambridge University Press.

Train K, Weeks M (2005). “Discrete Choice Models in Preference Space and Willingness-to-Pay Space.” In R Scarpa, A Alberini (eds.), Applications of Simulation Methods in Environmental and Resource Economics, volume 6 of The Economics of Non-Market Goods and Resources, pp. 1–16. Springer Netherlands. ISBN 978-1-4020-3683-5. doi:10.1007/1-4020-3684-1_1. URL http://dx.doi.org/10.1007/1-4020-3684-1_1.

Train KE (2008). “EM Algorithms for Nonparametric Estimation of Mixing Distributions.” Journal of Choice Modelling, 1(1), 40–69.

Affiliation:

Mauricio Sarrias
325 W. Sibley Hall
Department of City and Regional Planning
Cornell University
E-mail: [email protected]
URL: https://msarrias.weebly.com

