7/30/2019 1-s2.0-S0304407602002233-main
1/24
Journal of Econometrics 114 (2003) 197–220
www.elsevier.com/locate/econbase
Bayesian analysis of a self-selection model with multiple outcomes using simulation-based
estimation: an application to the demand for healthcare
Murat K. Munkin (a), Pravin K. Trivedi (b,*)
(a) Department of Economics, 531 Stokely Management Center, University of Tennessee, Knoxville, TN 37919, USA
(b) Department of Economics, Wylie Hall, Indiana University, 100 South Woodlawn, Bloomington, IN 47405, USA
Received 25 April 2001; received in revised form 15 August 2002; accepted 14 September 2002
Abstract
This paper studies a self-selection model with discrete and continuous outcomes and a treat-
ment variable. The treatment variable is endogenous to the two outcome variables. The approach
of the paper is fully parametric and Bayesian. The Bayes factor is calculated with the
Savage–Dickey density ratio and used for model selection. The model is applied to two different
micro data sets, the 1987–1988 National Medical Expenditure Survey and the 1996 Medical
Expenditure Panel Survey. The paper studies the effect of managed care and fee-for-service type
of private insurance on the demand for healthcare. It also compares the effects of private
insurance and Medicaid in covering health care expenses of elderly Americans.
© 2002 Elsevier Science B.V. All rights reserved.
JEL classification: I11; C11; C31; C35
Keywords: Health insurance; Self-selection; Managed care; Medicare; Medicaid; Markov chain Monte Carlo
1. Introduction
This paper has two major foci, one methodological and the other empirical. The
methodological component deals with estimation of a three-equation self-selection model
with two correlated outcomes, one of which is a count and the second is a contin-
uous variable. We are interested in the impact of selection on the conditional mean
* Corresponding author. Tel.: +1-812-855-3567; fax: +1-812-855-3736.
E-mail address: [email protected] (P.K. Trivedi).
0304-4076/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0304-4076(02)00223-3
of the outcomes, allowing for endogenous selection. The econometric methodology
used is Bayesian and parametric; it is strongly motivated by the difficult computational
problems that arose when the same model was estimated in a simulated maximum
likelihood framework. The empirical component of the paper investigates the impact
of public and private health insurance, which is our treatment variable, on the demand
for healthcare and expenditure, which constitute our outcome measures. The motivation
behind the empirical component is a long-standing and inconclusive debate in the health
economics literature on the impact of the choice between the traditional fee-for-service
(FFS) types of private insurance and health maintenance organization (HMO) plans on
the demand for healthcare consumption for individuals of all ages.
In the first part of the paper we consider simulation-based estimation of an econometric
model with $y_1$ and $y_2$, which are two jointly dependent discrete and continuous
random outcome variables, respectively. A third variable in our model is denoted $d$,
which will be referred to as the selection, or treatment, variable. For simplicity let $d = 1$
refer to the treated state and $d = 0$ refer to the untreated state. Suppose that our main
interest is in the average value of a partial derivative such as $\partial y_1/\partial d$ or
$\partial y_2/\partial d$. If $(y_1, y_2, d)$ are jointly dependent, then it is known that ignoring
endogeneity of $d$ results in self-selection bias, i.e. the causal effect of $d$ on the outcome
variable is not identified. Bias arises because the treatment (self-selection) variable $d$
reflects choices or decisions of an individual, and hence is endogenous.
Lee (2000) provides a recent survey of the large literature on self-selection. The
most widely used version of the selection model is a two-equation model in which
the outcome variable is continuous and the outcome equation is linear. Heckman (1976) proposes a two-stage estimation method for this type of model. In contrast,
the model considered here is nonlinear with both discrete (count) and continuous out-
comes. This model is an extension of Terza (1998), Crepon and Duguet (1997), Greene
(1997) and Winkelmann (1998) who have proposed both two-step moment-based and
simulation-based full-information estimation methods to estimate a selection model in
which the outcome variable is a count (also see van Ophem, 2000). Other approaches
to the same problem follow the traditional selection model format, e.g. Dowd et al.
(1991), in so far as they ignore discreteness, nonnormality, and heteroskedasticity that
are inherent in the data. However, such moment-based procedures are in general inefficient,
even though computationally they are easier to implement. Further, the method of moments
does not allow one to estimate the full set of parameters for models with correlated multiple
outcomes. A second possibility is to use a weighted nonlinear instrumental variable approach.
Such an approach in other contexts has not been very successful, in part because of the
difficulty of estimating the weights consistently. Another approach is the simulated maximum
likelihood (SML) method, which requires a sufficient number of simulations for consistency,
but we often do not know the operational meaning of "sufficient" (see footnote 1). In addition,
the above-mentioned moment-based procedures are difficult to generalize to the case of
multiple outcomes.
1 The authors have found that the SML estimator for the present problem converges very slowly as the
number of outcomes increases. Moreover, convergence can be impossible for some parameterizations and
parameter values because of unbounded gradient vectors.
We approach the estimation problem in a Bayesian setting, assuming specific marginal
distributions for the dependent variables. We consider the case of selection on unobserved
heterogeneity with factor structure (for greater generality). Prior distributions
are assigned to the parameters of the model. We apply the Markov chain Monte Carlo
(MCMC) approach to estimation, building on the recent research of Albert and Chib
(1993), Chib et al. (1998), Chib and Hamilton (2000), Koop and Poirier (1997), Li
(1998), McCulloch et al. (2000), as well as several earlier studies.
The existence of possible selection bias has long been an issue of contention in
empirical health economics based on observational data in which insurance status is
a choice variable and not exogenously assigned (Maddala, 1985). The standard treatment
of the selection effect in a linear model needs significant extensions in dealing with
a typical healthcare application where outcomes are often count as well as continuous
variables and the selection decision may be multinomial. The computational difficulty of
simultaneously dealing with nonlinearity, discreteness, and sample selection has
discouraged a full treatment of this topic. However, the issue is clearly important. Public
insurance programs, such as Medicaid and Medicare, target only particular groups of
eligible individuals, such as the low-income and the elderly.
A subset of these individuals also purchase private (Medigap) insurance that covers
out-of-pocket expenses that constitute the gap between the provider charges and
the public insurance benefits. Therefore, although the public insurance status can be
viewed as predetermined, the choice of private gap insurance plans may be endogenously
determined jointly with the level of the demand for healthcare. The issue of
endogeneity is relevant to comparisons of access to, utilization of, and evaluation of
quality of care between groups of healthcare users classified by their health insurance
status. If one can validly assume exogeneity of insurance status, such comparisons
are econometrically easier to implement because insurance choice and healthcare use
can be modeled separately. If not, then the modeling exercise is more computationally
complex, especially if one wants to efficiently estimate all parameters using a full
information system estimator. We pursue this objective in order to facilitate more precise
comparisons between utilization patterns of different categories of insurees.
The literature on the selection models for healthcare utilization does not present a
full consensus on the importance of endogeneity of the insurance decision. In Section 6
we will compare our empirical findings on endogeneity with those from several recent
studies. Here we briefly mention findings from a few studies to reflect the mixed
empirical evidence on the endogeneity issue. For example, Miller and Luft (1994,
1997) survey several studies on HMOs and their impact on utilization. These studies
produce mixed results regarding the bias due to neglect of endogeneity (see footnote 2).
Dowd et al. (1991) find negligible evidence of selection bias. Reschovsky (2000) argues that
self-selectivity of the HMO variable is due to observed characteristics of the individuals
and insurance markets and the threat of selection bias arises from not measuring and
2 Half of these studies show higher rates of physician doctor visits for HMO enrollees and the other half
reach the opposite conclusion. Most of the studies ignore the self-selecting behavior of the individuals. Some
of them argue that for particular types of healthcare, such as physician doctor visits, the insurance status is
exogenous. Others acknowledge the problem but avoid the issue of endogeneity for computational simplicity.
including these factors. Although unmeasured they are observable and an especially
rich set of explanatory variables could control for selectivity. Tu et al. (2000) and
Kemper et al. (2000) study healthcare utilization of HMO enrollees and generally
argue against the importance of the endogeneity issue and register reservations about
some instrumental variable type approaches for handling it. Goldman (1995), on the
other hand, emphasizes the role and importance of endogeneity. Because of differences
in data and methods used in different investigations a full consensus is hard to achieve.
In this study we reconsider the modeling issues using a methodology that addresses
several neglected issues and apply our methodology using 1996 MEPS data as well as
the 1987 NMES data.
The rest of the paper is organized as follows. Section 2 specifies the model. MCMC
estimation of the model is presented in Section 3. Section 4 considers issues of Bayesian
model selection. Section 5 considers an example with artificially generated data and
Section 6 deals with two empirical applications and concludes the paper.
2. Model specification
We observe $N$ independent observations ($i = 1, \ldots, N$) and it is assumed that: the
count (outcome) variable $y_{1i}$ is Poisson distributed conditional on exogenous covariates
$x_{1i}$, the endogenous variable $d_i$ and unobserved heterogeneity $\varepsilon_{1i}$; the
continuous nonnegative (outcome) variable $y_{2i}$ is exponentially distributed conditional on
exogenous covariates $x_{2i}$, the endogenous variable $d_i$ and unobserved heterogeneity
$\varepsilon_{2i}$. The presence of unobserved heterogeneity in this structure will permit us to
model counts and continuous variables that display overdispersion. Specifically,

$$y_{1i} \mid x_{1i}, d_i, \varepsilon_{1i} \overset{\text{ind}}{\sim} P[\mu_i], \quad y_{1i} = 0, 1, 2, \ldots, \tag{2.1}$$

$$y_{2i} \mid x_{2i}, d_i, \varepsilon_{2i} \overset{\text{ind}}{\sim} \exp[\lambda_i], \quad y_{2i} \geq 0, \tag{2.2}$$

where $P$ and $\exp$ stand for Poisson and exponential distributions with mean $\mu_i$ and
$1/\lambda_i$, respectively. Variables $y_{1i}$ and $y_{2i}$ are assumed to be independent
conditional on the unobserved heterogeneity. Their respective marginal distributions, obtained by
integrating out $\varepsilon_1$ and $\varepsilon_2$, will be more flexible. The specification of the
conditional means is

$$\log \mu_i = x_{1i}'\beta_1 + \rho_1 d_i + \varepsilon_{1i}, \tag{2.3}$$

$$\log(1/\lambda_i) = x_{2i}'\beta_2 + \rho_2 d_i + \varepsilon_{2i}. \tag{2.4}$$

The third (selection) equation in the model defines a latent variable $z_i$ such that

$$z_i = x_{3i}'\alpha + u_i, \tag{2.5}$$

$$d_i = \begin{cases} 1 & \text{if } z_i \geq 0, \\ 0 & \text{if } z_i < 0, \end{cases}$$

where $d_i$ is the treatment variable, $x_{3i}$ is a vector of exogenous explanatory variables,
and $z_i$ is a latent variable related to $d_i$. More specifically, it could be the propensity
to purchase private insurance or the propensity to be in an HMO.
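As a minimal sketch of Eqs. (2.1)–(2.5), the Python functions below evaluate the conditional log-density of one observation given the two linear indices, and map the latent propensity to the treatment indicator. The function and argument names are illustrative assumptions and do not come from the paper: `eta1` stands for the log-mean of the count in (2.3) and `eta2` for the log-mean of the continuous outcome in (2.4).

```python
import math

def poisson_logpmf(y1, eta1):
    """log P(y1) for a Poisson count whose log-mean is eta1."""
    return y1 * eta1 - math.exp(eta1) - math.lgamma(y1 + 1)

def exponential_logpdf(y2, eta2):
    """Log density of an exponential outcome whose mean is exp(eta2).

    The rate is lam = exp(-eta2), so the log density is log(lam) - lam * y2.
    """
    return -eta2 - math.exp(-eta2) * y2

def treatment(z):
    """Selection rule attached to (2.5): d = 1 when the latent z is nonnegative."""
    return 1 if z >= 0 else 0

def outcome_loglik(y1, y2, eta1, eta2):
    """The outcomes are independent given the heterogeneity terms, so the logs add."""
    return poisson_logpmf(y1, eta1) + exponential_logpdf(y2, eta2)
```

Working with the two indices rather than with the raw parameters keeps the sketch independent of how the covariates and the heterogeneity terms are stored.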
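Endogeneity of $d_i$ is modeled through correlation between the unobserved variables
$\varepsilon_{1i}$, $\varepsilon_{2i}$ and $u_i$, which are assumed to be jointly normally distributed:

$$(\varepsilon_{1i}, \varepsilon_{2i}, u_i) \sim N[(0, 0, 0), \Sigma]. \tag{2.6}$$

Thus, the outcome variables depend upon the treatment propensity through correlations
between $\varepsilon_{1i}$, $\varepsilon_{2i}$ and $u_i$. Since $z_i$ in the selection equation (2.5)
is unobservable, only the ratio $\alpha/\sqrt{\sigma_{uu}}$ is identified. We restrict the variance
of the latent variable, $\sigma_{uu}$, to unity so that the covariance matrix is

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{1u} \\ \sigma_{12} & \sigma_{22} & \sigma_{2u} \\ \sigma_{1u} & \sigma_{2u} & 1 \end{pmatrix}. \tag{2.7}$$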
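The scale normalization can be checked numerically. The sketch below, which uses hypothetical coefficient values and simulated draws, shows that multiplying the selection coefficients and the error term by a positive constant leaves every treatment indicator unchanged. This is why only the ratio of the coefficients to the standard deviation of the error is identified and the variance of the latent error is fixed at one.

```python
import random

def treatment_indicators(alpha, x3, u):
    """d_i = 1 if z_i = x3_i' alpha + u_i >= 0, else 0."""
    d = []
    for row, ui in zip(x3, u):
        z = sum(a * x for a, x in zip(alpha, row)) + ui
        d.append(1 if z >= 0 else 0)
    return d

rng = random.Random(0)                      # fixed seed, for illustration only
x3 = [(1.0, rng.gauss(0.0, 1.0)) for _ in range(500)]
u = [rng.gauss(0.0, 1.0) for _ in range(500)]
alpha = (1.0, 1.0)                          # hypothetical coefficients

c = 2.0   # positive rescaling; a power of two keeps the floating-point scaling exact
d_original = treatment_indicators(alpha, x3, u)
d_rescaled = treatment_indicators([c * a for a in alpha], x3, [c * ui for ui in u])
same = d_original == d_rescaled
```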
The next section assigns prior distributions to the parameters of the model and
describes the estimation procedure, which uses the MCMC algorithms to construct
ergodic Markov chains converging to the posterior distributions of the parameters.
3. MCMC estimation
In our model the set of parameters to estimate is $\beta_1$, $\rho_1$, $\beta_2$, $\rho_2$, $\alpha$
and five elements of the matrix $\Sigma$, with $\sigma_{uu} = 1$ as the identification restriction.
If the sign of one coefficient of $\alpha$ were known a priori then one could fix it to $1$ or $-1$
as an alternative identification restriction. That would allow $\sigma_{uu}$ to vary and simplify
the MCMC algorithm, leading us to a convenient form of the Wishart distribution for the
parameters of the matrix $\Sigma$. However, such information is not generally available. McCulloch
and Rossi (1994), in their analysis of the multinomial probit model, propose to specify
proper priors for the full set of parameters $(\alpha, \Sigma)$ without any identification
restrictions, derive the posterior distribution for both $\alpha$ and $\Sigma$, and report the marginal
posterior of the identified parameters $(\alpha/\sqrt{\sigma_{uu}}, \Sigma \mid \sigma_{uu} = 1)$.
Nobile (1998) proposes a hybrid sampler that improves the convergence and autocorrelation
properties of the algorithm. However, with this approach it is impossible to assign improper
priors to the parameters. Nobile (2000) and Linardakis and Dellaportas (1999) propose
algorithms to draw directly from the Wishart distribution conditional on one of the diagonal
elements.
We use the data augmentation approach (Tanner and Wong, 1987) and include the unobservable
variables $z_i$, $\varepsilon_{1i}$ and $\varepsilon_{2i}$ in the algorithm, drawing them at each
iteration and for all observations. Denote $\varepsilon_i = (\varepsilon_{1i}, \varepsilon_{2i})'$,
$\Sigma_{21} = E(\varepsilon_i u_i)$ and $\Sigma_{22} = E(\varepsilon_i \varepsilon_i')$. We follow
the approach of Koop and Poirier (1997), Li (1998) and McCulloch et al. (2000) and
write the joint distribution of $v_i = (\varepsilon_{1i}, \varepsilon_{2i}, u_i)$ as the product of the
marginal distribution of $u_i$ and the conditional distribution of $\varepsilon_i \mid u_i$. This allows
us to use the Gibbs sampling algorithm (Geman and Geman, 1984) to draw values from the posterior
distributions of the elements of the matrix $\Sigma$. The distribution of $u_i$ is standard normal
and the conditional distribution is $\varepsilon_i \mid u_i \sim N(\Sigma_{12} u_i, \Omega)$, where
$\Omega = \Sigma_{22} - \Sigma_{21}\Sigma_{12}$. Since there is a one-to-one
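correspondence between $\Sigma$ and $(\Sigma_{12}, \Omega)$,

$$\Sigma = \begin{pmatrix} \Omega + \Sigma_{21}\Sigma_{12} & \Sigma_{21} \\ \Sigma_{12} & 1 \end{pmatrix},$$

the MCMC procedure can be organized by blocking $\Sigma$ as $\Sigma_{12}$ and $\Omega$ and including
them in the MCMC algorithm. Block the rest of the parameters as $(\beta_1', \rho_1)$, $(\beta_2', \rho_2)$
and $\alpha$. For brevity, denote $\theta_1 = (\beta_1', \rho_1)'$, $\theta_2 = (\beta_2', \rho_2)'$,
$\theta = (\theta_1, \theta_2)$, and redefine $x_1 = (x_1', d)'$ and $x_2 = (x_2', d)'$.
Assume the following prior distributions:

$$\theta_1 \sim N(\theta_{01}, B_{01}^{-1}), \quad \theta_2 \sim N(\theta_{02}, B_{02}^{-1}), \quad \alpha \sim N(\alpha_0, A_0^{-1}),$$

$$\Omega^{-1} \sim \mathrm{Wish}(n_0, D_0), \quad \Sigma_{12} \sim N(\varpi_{12}, \Lambda_{12}^{-1}), \tag{3.1}$$

where $\theta_{01}$, $B_{01}^{-1}$, $\theta_{02}$, $B_{02}^{-1}$, $n_0$, $D_0$, $\varpi_{12}$, $\Lambda_{12}^{-1}$,
$\alpha_0$, $A_0$ are known parameters, $N(\theta_0, B_0^{-1})$ denotes the multivariate normal
distribution with mean vector $\theta_0$ and covariance matrix $B_0^{-1}$, and $\mathrm{Wish}(n_0, D_0)$
is the Wishart distribution with $n_0$ degrees of freedom and scale matrix $D_0$. Denote
$\Theta = (\theta_1, \theta_2, \alpha, \Omega^{-1}, \Sigma_{12})$. Then the joint posterior density of the
parameters and unobservables $\varepsilon_i$ and $z_i$ given the data is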
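
$$\pi(\Theta, \varepsilon, z \mid y) = C\,\pi(\theta_1)\,\pi(\theta_2)\,\pi(\alpha)\,\pi(\Omega^{-1})\,\pi(\Sigma_{12}) \prod_{i=1}^{N} \left[ I\{d_i = 1\} I\{z_i \geq 0\} + I\{d_i = 0\} I\{z_i < 0\} \right]$$

$$\times\ \frac{\exp(-\mu_i)\,\mu_i^{y_{1i}}}{y_{1i}!}\ \lambda_i \exp(-\lambda_i y_{2i})\ \phi(\varepsilon_i \mid \Sigma_{12} u_i, \Omega)\ \phi(u_i \mid 0, 1), \tag{3.2}$$

where $I\{\cdot\}$ is the indicator function, $\phi(\cdot \mid \mu, \sigma^2)$ is the p.d.f. of the
$N[\mu, \sigma^2]$ distribution, $C$ is a proportionality constant, $\varepsilon = (\varepsilon_1\ \varepsilon_2)$,
$z$ and $y = (y_1\ y_2\ d)$ are matrices of $N$ observations, and $u_i = z_i - x_{3i}'\alpha$.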
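We construct our Markov chain by blocking the parameters as
$\varepsilon_i = (\varepsilon_{1i}\ \varepsilon_{2i})$, $\theta_1$, $\theta_2$, $z_i$, $\alpha$,
$\Sigma_{12}$ and $\Omega^{-1}$, with the full conditional distributions

$$[\varepsilon_1, \varepsilon_2 \mid y, \Sigma_{12}, \Omega, \theta, u], \quad [\theta_1 \mid y_1, \varepsilon_1], \quad [\theta_2 \mid y_2, \varepsilon_2], \quad [z \mid \varepsilon_1, \varepsilon_2, \alpha, \Sigma_{12}, \Omega],$$

$$[\alpha \mid \theta, z, \varepsilon_1, \varepsilon_2, \Sigma_{12}, \Omega], \quad [\Sigma_{12} \mid \Omega, \varepsilon_1, \varepsilon_2, u] \quad \text{and} \quad [\Omega^{-1} \mid \Sigma_{12}, \varepsilon_1, \varepsilon_2, u].$$

Notice that given $z$ and $\alpha$, $u$ is known with certainty, so conditioning on $u$
formally means conditioning on $z$ and $\alpha$, or on $(z - x_3\alpha)$. The following steps
summarize our algorithm.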
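3.1. Sampling $\varepsilon_1$ and $\varepsilon_2$
The full conditional density for $\varepsilon_1$ and $\varepsilon_2$ is

$$\pi(\varepsilon_1, \varepsilon_2 \mid y, \Sigma_{12}, \Omega, \theta, u) = \prod_{i=1}^{N} \pi(\varepsilon_{1i}, \varepsilon_{2i} \mid y_i, \Sigma_{12}, \Omega, \theta, u_i),$$
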
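the product of $N$ independent terms. We utilize the Metropolis–Hastings algorithm
(Metropolis et al., 1953; Hastings, 1970) to sample $(\varepsilon_{1i}, \varepsilon_{2i})$ for each
observation $i$ from the density

$$\pi(\varepsilon_{1i}, \varepsilon_{2i} \mid y_i, \Sigma_{12}, \Omega, \theta, u_i) = c_i \exp(-\exp(x_{1i}'\theta_1 + \varepsilon_{1i}))\,(\exp(\varepsilon_{1i}))^{y_{1i}} \exp(-\varepsilon_{2i})\, \exp(-\exp(-x_{2i}'\theta_2 - \varepsilon_{2i})\, y_{2i})\ \phi(\varepsilon_i \mid \Sigma_{12} u_i, \Omega),$$

where $c_i$ is a proportionality constant, and choose a $t$-distribution centered at the modal
value of the full conditional density as the proposal density. Let

$$\hat{\varepsilon}_i = (\hat{\varepsilon}_{1i}, \hat{\varepsilon}_{2i}) = \arg\max \log \pi(\varepsilon_{1i}, \varepsilon_{2i} \mid y_i, \Sigma_{12}, \Omega, \theta, u_i)$$

and let $V_{\hat{\varepsilon}_i} = -(H_{\hat{\varepsilon}_i})^{-1}$ be the negative inverse of the Hessian of
$\log \pi(\varepsilon_{1i}, \varepsilon_{2i} \mid y_i, \Sigma_{12}, \Omega, \theta, u_i)$ evaluated at the mode
$\hat{\varepsilon}_i$. The gradient vector and the Hessian are derived in the Computational Appendix
and used for a few steps of the Newton–Raphson algorithm to find the modal value, and the Hessian
formula is used to calculate the covariance matrix of the proposal distribution
$q(\varepsilon_i \mid y_i, \Sigma_{12}, \Omega, \theta, u_i) = f_T(\varepsilon_i \mid \hat{\varepsilon}_i, V_{\hat{\varepsilon}_i}, \nu)$,
a bivariate $t$-distribution with $\nu$ degrees of freedom, a tuning parameter selected to obtain
reasonable acceptance rates. When a proposal value $\varepsilon_i^* = (\varepsilon_{1i}^*, \varepsilon_{2i}^*)$
is drawn, the chain moves to the proposal value with probability

$$p(\varepsilon_i, \varepsilon_i^*) = \min\left\{ \frac{\pi(\varepsilon_i^* \mid y_i, \Sigma_{12}, \Omega, \theta, u_i)\ q(\varepsilon_i \mid y_i, \Sigma_{12}, \Omega, \theta, u_i)}{\pi(\varepsilon_i \mid y_i, \Sigma_{12}, \Omega, \theta, u_i)\ q(\varepsilon_i^* \mid y_i, \Sigma_{12}, \Omega, \theta, u_i)},\ 1 \right\}.$$

If the proposal value is rejected then the next state of the chain is at the current value
$\varepsilon_i = (\varepsilon_{1i}, \varepsilon_{2i})$.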
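The step above can be sketched for a single observation as follows. This is an illustration under stated assumptions, not the authors' implementation: the gradient and Hessian are derived here directly from the kernel displayed above (the paper's own derivation is in its Computational Appendix), the Newton–Raphson search adds step halving as a safeguard, and all numerical inputs (`y1`, `y2`, the indices `a1` and `a2`, the conditional mean `m` and the precision matrix `P`) are hypothetical.

```python
import math
import random

def log_target(e, y1, y2, a1, a2, m, P):
    """Log kernel of the full conditional of (eps1, eps2) for one observation.

    a1, a2: linear indices x1'theta1 and x2'theta2; m: conditional prior mean;
    P: inverse of the conditional covariance Omega (2x2, symmetric).
    """
    r1, r2 = e[0] - m[0], e[1] - m[1]
    quad = P[0][0] * r1 * r1 + 2.0 * P[0][1] * r1 * r2 + P[1][1] * r2 * r2
    return (-math.exp(a1 + e[0]) + y1 * e[0]
            - e[1] - y2 * math.exp(-a2 - e[1]) - 0.5 * quad)

def grad_hess(e, y1, y2, a1, a2, m, P):
    """Gradient and Hessian of log_target, derived from the kernel above."""
    r1, r2 = e[0] - m[0], e[1] - m[1]
    mu = math.exp(a1 + e[0])
    w = y2 * math.exp(-a2 - e[1])
    g = (y1 - mu - P[0][0] * r1 - P[0][1] * r2,
         w - 1.0 - P[0][1] * r1 - P[1][1] * r2)
    H = ((-mu - P[0][0], -P[0][1]), (-P[0][1], -w - P[1][1]))
    return g, H

def find_mode(args, start, steps=30):
    """Newton-Raphson with step halving; returns the mode and V = -(H^{-1})."""
    e = tuple(start)
    for _ in range(steps):
        g, H = grad_hess(e, *args)
        det = H[0][0] * H[1][1] - H[0][1] ** 2
        step = (-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                -(H[0][0] * g[1] - H[0][1] * g[0]) / det)
        t, base = 1.0, log_target(e, *args)
        while t > 1e-8 and log_target((e[0] + t * step[0], e[1] + t * step[1]), *args) < base:
            t *= 0.5
        e = (e[0] + t * step[0], e[1] + t * step[1])
    g, H = grad_hess(e, *args)
    det = H[0][0] * H[1][1] - H[0][1] ** 2
    V = ((-H[1][1] / det, H[0][1] / det), (H[0][1] / det, -H[0][0] / det))
    return e, V

def log_t_kernel(x, loc, Vinv, nu):
    """Log kernel of a bivariate t density with nu degrees of freedom."""
    r1, r2 = x[0] - loc[0], x[1] - loc[1]
    quad = Vinv[0][0] * r1 * r1 + 2.0 * Vinv[0][1] * r1 * r2 + Vinv[1][1] * r2 * r2
    return -0.5 * (nu + 2.0) * math.log1p(quad / nu)

def draw_t(loc, V, nu, rng):
    """Bivariate t draw: Cholesky-scaled normals divided by a chi-square mixing term."""
    l11 = math.sqrt(V[0][0])
    l21 = V[0][1] / l11
    l22 = math.sqrt(V[1][1] - l21 * l21)
    scale = math.sqrt(nu / rng.gammavariate(nu / 2.0, 2.0))
    n1, n2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (loc[0] + scale * l11 * n1, loc[1] + scale * (l21 * n1 + l22 * n2))

def mh_step(current, mode, V, args, nu, rng):
    """One independence Metropolis-Hastings update with the tailored t proposal."""
    det = V[0][0] * V[1][1] - V[0][1] ** 2
    Vinv = ((V[1][1] / det, -V[0][1] / det), (-V[0][1] / det, V[0][0] / det))
    cand = draw_t(mode, V, nu, rng)
    log_ratio = (log_target(cand, *args) + log_t_kernel(current, mode, Vinv, nu)
                 - log_target(current, *args) - log_t_kernel(cand, mode, Vinv, nu))
    if math.log(1.0 - rng.random()) < min(0.0, log_ratio):
        return cand, True
    return current, False

rng = random.Random(1)
P = ((1.5, -0.5), (-0.5, 1.5))              # inverse of [[0.75, 0.25], [0.25, 0.75]]
args = (3, 2.0, 1.0, 1.0, (0.25, 0.25), P)  # y1, y2, a1, a2, m, P (all hypothetical)
mode, V = find_mode(args, start=args[4])
eps, accepted = mode, 0
for _ in range(2000):
    eps, ok = mh_step(eps, mode, V, args, nu=15, rng=rng)
    accepted += ok
```

Within the full sampler the mode and the proposal covariance would be recomputed for every observation whenever the conditioning values change; they are computed once here only because the inputs are held fixed.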
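3.2. Sampling $\theta_1$ and $\theta_2$
The full conditional densities for $\theta_1$ and $\theta_2$ are

$$\pi(\theta_1 \mid y_1, \varepsilon_1) = C_1\, \phi(\theta_1 \mid \theta_{01}, B_{01}^{-1}) \prod_{i=1}^{N} \exp(-\exp(x_{1i}'\theta_1 + \varepsilon_{1i}))\,(\exp(x_{1i}'\theta_1 + \varepsilon_{1i}))^{y_{1i}},$$

$$\pi(\theta_2 \mid y_2, \varepsilon_2) = C_2\, \phi(\theta_2 \mid \theta_{02}, B_{02}^{-1}) \prod_{i=1}^{N} \exp(-x_{2i}'\theta_2 - \varepsilon_{2i})\, \exp(-y_{2i} \exp(-x_{2i}'\theta_2 - \varepsilon_{2i})),$$

where $C_1$ and $C_2$ are proportionality constants. The Metropolis–Hastings algorithm
is used again to draw samples from these densities. Denote by $\theta_j$ ($j = 1, 2$) the current
state of the Markov chain, by $\hat{\theta}_j$ the mode of the full conditional density and by
$\theta_j^*$ the candidate for the new value of the chain. Following Chib et al. (1998), the proposal
density $q(\theta_j, \theta_j^*)$ is selected to be a multivariate $t$-distribution with $k$ degrees of
freedom, $f_T(\theta_j^* \mid \hat{\theta}_j - (\theta_j - \hat{\theta}_j), \tau_j V_j)$, centered at
$\hat{\theta}_j - (\theta_j - \hat{\theta}_j)$. This proposal density is symmetric in $\theta_j$ and
$\theta_j^*$. The covariance matrix of the multivariate $t$-density is set to
$V_j = -H_{\hat{\theta}_j}^{-1}$, the negative inverse of the Hessian of
$\log \pi(\theta_j \mid y_j, \varepsilon_j)$ evaluated at the modal value $\hat{\theta}_j$, and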
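$\tau_j$ is an adjustable constant. The candidate $\theta_j^*$, drawn from the proposal distribution, is
accepted with probability

$$\Pr(\theta_j, \theta_j^* \mid y_j, \varepsilon_j) = \min\left\{ 1,\ \frac{\pi(\theta_j^* \mid y_j, \varepsilon_j)}{\pi(\theta_j \mid y_j, \varepsilon_j)} \right\}.$$

If the candidate is not accepted then the chain does not change its value.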
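A deliberately reduced illustration of this update may help. The sketch below assumes a single coefficient (an intercept-only Poisson equation), holds the heterogeneity terms at zero, and uses a hypothetical list of counts; the proposal is a univariate t density centered at the reflection of the current value through the mode, so the proposal densities cancel and only the ratio of target densities enters the acceptance probability. None of the names or values come from the paper.

```python
import math
import random

def log_post(theta, y, prior_mean=0.0, prior_var=10.0):
    """Log kernel: normal prior times a Poisson likelihood with log-mean theta."""
    loglik = sum(yi * theta - math.exp(theta) for yi in y)
    return loglik - 0.5 * (theta - prior_mean) ** 2 / prior_var

def mode_and_variance(y, prior_mean=0.0, prior_var=10.0, steps=50):
    """Newton-Raphson for the mode; V is the negative inverse Hessian at the mode."""
    n, s = len(y), sum(y)
    theta = math.log(max(s, 1) / n)
    for _ in range(steps):
        grad = s - n * math.exp(theta) - (theta - prior_mean) / prior_var
        hess = -n * math.exp(theta) - 1.0 / prior_var
        theta -= grad / hess
    hess = -n * math.exp(theta) - 1.0 / prior_var
    return theta, -1.0 / hess

def reflection_step(theta, mode, V, tau, nu, y, rng):
    """Propose from a t density centered at the reflection of theta through the mode."""
    center = mode - (theta - mode)
    t_draw = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    cand = center + math.sqrt(tau * V) * t_draw
    # The proposal density depends on (theta, cand) only through
    # cand + theta - 2 * mode, so it is symmetric and cancels in the ratio.
    log_ratio = log_post(cand, y) - log_post(theta, y)
    if math.log(1.0 - rng.random()) < min(0.0, log_ratio):
        return cand
    return theta

rng = random.Random(2)
y = [2, 0, 3, 1, 4, 2, 1, 3, 2, 2] * 20     # hypothetical counts with sample mean 2
mode, V = mode_and_variance(y)
theta, draws = mode, []
for _ in range(3000):
    theta = reflection_step(theta, mode, V, tau=1.0, nu=15, y=y, rng=rng)
    draws.append(theta)
post_mean = sum(draws) / len(draws)
```

The constant `tau` plays the role of the adjustable scale in the text: larger values spread the candidates further from the reflected point and lower the acceptance rate.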
3.3. Sampling z
Variables $z_i$ are included in the MCMC algorithm (Albert and Chib, 1993). Sample
$N$ independent random variables $z_i$ such that $z_i \mid \varepsilon_{1i}, \varepsilon_{2i}, \alpha, \Sigma_{12}, \Omega$
is normally distributed with mean $x_{3i}'\alpha + \Sigma_{12}\Sigma_{22}^{-1}(\varepsilon_{1i}, \varepsilon_{2i})'$
and variance $1 - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ (with $\Sigma_{22} = \Omega + \Sigma_{21}\Sigma_{12}$),
truncated at zero on the left if $d_i = 1$ and on the right if $d_i = 0$. To sample from the
truncated normal we follow Geweke (1991). See also Devroye (1986, p. 380).
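The conditional moments and the truncated draw can be sketched as below. Note the substitution: the paper follows Geweke (1991) for the truncated normal, whereas this illustration uses a plain inverse-CDF draw built on the standard library's `statistics.NormalDist`, which is simpler but less robust far in the tails. Function names and inputs are assumptions made for the example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_moments(x3_alpha, eps, sigma12, omega):
    """Mean and variance of z_i given (eps_1i, eps_2i).

    sigma12 holds the two covariances with u; omega is the 2x2 conditional
    covariance, so Sigma_22 = omega + sigma12 sigma12'.
    """
    s11 = omega[0][0] + sigma12[0] ** 2
    s12 = omega[0][1] + sigma12[0] * sigma12[1]
    s22 = omega[1][1] + sigma12[1] ** 2
    det = s11 * s22 - s12 * s12
    # b = Sigma_22^{-1} Sigma_21
    b1 = (s22 * sigma12[0] - s12 * sigma12[1]) / det
    b2 = (s11 * sigma12[1] - s12 * sigma12[0]) / det
    mean = x3_alpha + b1 * eps[0] + b2 * eps[1]
    var = 1.0 - (sigma12[0] * b1 + sigma12[1] * b2)
    return mean, var

def draw_truncated_z(mean, var, d, rng):
    """Inverse-CDF draw from N(mean, var) restricted to z >= 0 (d = 1) or z < 0 (d = 0)."""
    sd = var ** 0.5
    p0 = STD_NORMAL.cdf((0.0 - mean) / sd)       # probability mass below zero
    v = rng.random()
    p = p0 + (1.0 - p0) * v if d == 1 else p0 * v
    p = min(max(p, 1e-12), 1.0 - 1e-12)           # keep inv_cdf away from 0 and 1
    return mean + sd * STD_NORMAL.inv_cdf(p)

rng = random.Random(3)
omega = ((0.75, 0.25), (0.25, 0.75))             # hypothetical values
mean, var = conditional_moments(1.0, (0.3, -0.3), (0.5, 0.5), omega)
z_treated = [draw_truncated_z(mean, var, 1, rng) for _ in range(1000)]
z_untreated = [draw_truncated_z(mean, var, 0, rng) for _ in range(1000)]
```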
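3.4. Sampling $\alpha$, $\Sigma_{12}$ and $\Omega$
From Eqs. (2.3), (2.4) and given $\theta_1$, $\theta_2$ and $\varepsilon_1$, $\varepsilon_2$, the variables
$\log \mu$ and $\log 1/\lambda$ are known with certainty. Since the variables $\log \mu$, $\log 1/\lambda$
and $z$ are multivariate normal, $\varepsilon_1$, $\varepsilon_2$ and $\alpha$ are jointly normal and the
conditional distribution of $\alpha$ given $\varepsilon_1$, $\varepsilon_2$ is also normal. Denote the
parameters of this conditional distribution as
$N(\bar{\alpha}_{\alpha\mid\varepsilon}, H_{\alpha\mid\varepsilon}^{-1})$ (they are derived in the
Computational Appendix). Thus, given $\theta$, $\varepsilon_1$, $\varepsilon_2$, $z$, $\Sigma_{12}$, $\Omega$ and
the prior $N(\alpha_0, A_0^{-1})$, the posterior distribution of $\alpha$ is normal with mean
$[A_0 + H_{\alpha\mid\varepsilon}]^{-1}[A_0\alpha_0 + H_{\alpha\mid\varepsilon}\bar{\alpha}_{\alpha\mid\varepsilon}]$
and variance $[A_0 + H_{\alpha\mid\varepsilon}]^{-1}$. It is straightforward to sample
from this distribution.
Conditional on $\Omega^{-1}$, $\varepsilon_1$, $\varepsilon_2$ and $u$, and given the prior
$\Sigma_{12} \sim N(\varpi_{12}, \Lambda_{12}^{-1})$, the posterior distribution of $\Sigma_{12}$ is normal
with mean $(\Lambda_{12} + \Omega^{-1}u'u)^{-1}(\Lambda_{12}\varpi_{12} + \Omega^{-1}\varepsilon'u)$
and variance $(\Lambda_{12} + \Omega^{-1}u'u)^{-1}$.
Given $\Sigma_{12}$, $\varepsilon_1$, $\varepsilon_2$, $u$ and the prior $\Omega^{-1} \sim \mathrm{Wish}(n_0, D_0)$,
the posterior distribution for $\Omega^{-1}$ is

$$(\Omega^{-1} \mid v) \sim \mathrm{Wish}\left( n_0 + N,\ \left[ D_0^{-1} + \sum_{i=1}^{N} (\varepsilon_i - \Sigma_{12}u_i)(\varepsilon_i - \Sigma_{12}u_i)' \right]^{-1} \right);$$

see Zellner (1971, p. 389) and Johnson (1987, pp. 203–204) for the details of the
Wishart density and the algorithm that draws values from it.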
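The normal update for the covariance block between the outcome errors and the selection error reduces to a few lines of 2x2 arithmetic. The sketch below computes the posterior mean and covariance only (drawing from the resulting normal is omitted); the function name, the argument layout and the numbers in the usage lines are assumptions made for illustration.

```python
def sigma12_posterior(eps, u, omega_inv, prior_mean, prior_prec):
    """Posterior mean and covariance of the covariance block given Omega, eps and u.

    eps: list of (eps_1i, eps_2i); u: list of u_i; omega_inv, prior_prec: 2x2.
    Precision = prior_prec + omega_inv * sum(u_i^2);
    mean = precision^{-1} (prior_prec * prior_mean + omega_inv * sum(eps_i * u_i)).
    """
    uu = sum(ui * ui for ui in u)
    eu = [sum(e[k] * ui for e, ui in zip(eps, u)) for k in range(2)]
    prec = [[prior_prec[r][c] + omega_inv[r][c] * uu for c in range(2)] for r in range(2)]
    rhs = [sum(prior_prec[r][c] * prior_mean[c] + omega_inv[r][c] * eu[c] for c in range(2))
           for r in range(2)]
    det = prec[0][0] * prec[1][1] - prec[0][1] * prec[1][0]
    cov = [[prec[1][1] / det, -prec[0][1] / det], [-prec[1][0] / det, prec[0][0] / det]]
    mean = [cov[r][0] * rhs[0] + cov[r][1] * rhs[1] for r in range(2)]
    return mean, cov

# Hypothetical inputs: three observations and a prior with precision 8 on each element.
omega_inv = ((1.5, -0.5), (-0.5, 1.5))
u = [1.0, -2.0, 0.5]
eps = [(0.6, 0.4), (-0.9, -1.1), (0.2, 0.3)]
post_mean, post_cov = sigma12_posterior(eps, u, omega_inv, (0.0, 0.0), ((8.0, 0.0), (0.0, 8.0)))
```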
4. Model comparison
This section discusses the issues of Bayesian model comparisons. The model we
consider in this paper permits endogeneity of the treatment variable. We would like to
develop a decision rule that would allow one to select between two models: model $M_0$,
with the constraint $\Sigma_{12} = (0\ 0)$, and $M_1$, which leaves $\Sigma_{12}$ unconstrained. The Bayes factor
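for the null hypothesis $H_0: \Sigma_{12} = (0\ 0)$ is defined as

$$B_{0,1} = \frac{m(y \mid M_0)}{m(y \mid M_1)},$$

where $m(y \mid M_j)$ is the marginal likelihood of the model specification $M_j$. Since these
two models are nested we take the Savage–Dickey density ratio approach (Verdinelli
and Wasserman, 1995) to calculate the Bayes factor as

$$B_{0,1} = \frac{\pi(\Sigma_{12}^* \mid y)}{\pi(\Sigma_{12}^*)},$$

where $\pi(\Sigma_{12}^* \mid y)$ is the posterior density and $\pi(\Sigma_{12}^*)$ is the prior density of
the parameter $\Sigma_{12}$ calculated at the point $\Sigma_{12}^* = (0\ 0)$. To estimate
$\pi(\Sigma_{12}^* \mid y)$ we approximate it by averaging the full conditional density of $\Sigma_{12}$,
$\pi(\Sigma_{12}^* \mid \Omega, \varepsilon_1, \varepsilon_2, u)$, with respect to a posterior sample
$\Omega^s$, $\varepsilon_1^s$, $\varepsilon_2^s$ and $u^s$, $s = 1, \ldots, S$. Let
$V_{12}^s = (\Lambda_{12} + (\Omega^s)^{-1} u^{s\prime} u^s)^{-1}$ and
$\bar{\Sigma}_{12}^s = V_{12}^s(\Lambda_{12}\varpi_{12} + (\Omega^s)^{-1} \varepsilon^{s\prime} u^s)$. Then

$$\hat{\pi}(\Sigma_{12}^* \mid y) = \frac{1}{S} \sum_{s=1}^{S} \pi(\Sigma_{12}^* \mid \Omega^s, \varepsilon_1^s, \varepsilon_2^s, u^s),$$

where $\pi(\Sigma_{12}^* \mid \Omega^s, \varepsilon_1^s, \varepsilon_2^s, u^s)$ is the bivariate normal density
with mean $\bar{\Sigma}_{12}^s$ and variance $V_{12}^s$ evaluated at $\Sigma_{12}^*$. One also has to
evaluate the prior density at $\Sigma_{12}^*$. In general, less informative priors would favor the null
hypothesis, so improper priors are not applicable for testing. We will choose informative priors
for $\Sigma_{12}$ without a large spread. The specification of the prior distributions is discussed in
more detail in the next sections.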
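The averaging step can be sketched as follows, working on the log scale so that very small density values do not underflow. The function takes the conditional means and covariances as given (one pair per retained draw); how they are produced, the function names and the example inputs are all assumptions made for illustration.

```python
import math

def bvn_logpdf(x, mean, cov):
    """Log density of a bivariate normal distribution at x."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    r1, r2 = x[0] - mean[0], x[1] - mean[1]
    quad = (cov[1][1] * r1 * r1 - 2.0 * cov[0][1] * r1 * r2 + cov[0][0] * r2 * r2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def savage_dickey_log_bf(cond_moments, prior_mean, prior_cov, point=(0.0, 0.0)):
    """Log Bayes factor: log of the averaged conditional density minus the log prior density.

    cond_moments is a list of (mean, cov) pairs, one per retained MCMC draw.
    """
    logs = [bvn_logpdf(point, m, c) for m, c in cond_moments]
    top = max(logs)
    log_post = top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))
    return log_post - bvn_logpdf(point, prior_mean, prior_cov)

# Hypothetical draws whose conditional densities sit far from the origin.
prior_cov = ((0.125, 0.0), (0.0, 0.125))
draws = [((0.5, 0.5), ((0.01, 0.0), (0.0, 0.01))) for _ in range(5)]
log_bf = savage_dickey_log_bf(draws, (0.0, 0.0), prior_cov)
```

A strongly negative log Bayes factor corresponds to evidence against the restricted model, since the posterior then places far less density at the origin than the prior does.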
5. Example based on artificial data
As an example we generate artificial data according to the selection model and
estimate it by the MCMC algorithm. The motivation for this exercise is to see how the
method performs when the model is correctly specified and to investigate the existence
of a self-selection bias when endogeneity of the treatment variable is ignored. To do
that one can estimate the model under the restriction $\Sigma_{12} = (0\ 0)$. The
MCMC algorithm is easily implemented under this restriction by fixing the draws of
$\Sigma_{21}$ to the zero vector. The data set, consisting of 1000 observations
($i = 1, \ldots, 1000$), is generated as follows.
1. $x_{1i} = x_{2i} = (1, \xi_i)$ and $\xi_i \sim N(0, 1)$, $x_{3i} = (1, w_i)$ and $w_i \sim N(0, 1)$,
$\alpha = (1, 1)$, $\theta_1 = (1, 1, 0.5)$, and $\theta_2 = (1, 1, 0.5)$.
2. $(u_i, \varepsilon_{1i}, \varepsilon_{2i}) \sim N[(0, 0, 0), \Sigma]$, where

$$\Sigma = \begin{pmatrix} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0.5 \\ 0.5 & 0.5 & 1 \end{pmatrix}.$$
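3. Generate $z_i = x_{3i}'\alpha + u_i$ and $d_i$ such that

$$d_i = 1 \text{ iff } z_i \geq 0, \qquad d_i = 0 \text{ iff } z_i < 0.$$

The values of $\alpha$ and $x_3$ are chosen in such a way that about 75% of all generated
observations have $d = 1$ and the rest have $d = 0$. Set $x_{1i} = (x_{1i}', d_i)'$ and
$x_{2i} = (x_{2i}', d_i)'$. Let $k_1$, $k_2$ and $k_3$ denote the number of variables in $x_1$, $x_2$
and $x_3$, respectively.
4. Finally generate $y_{1i}$ and $y_{2i}$ as

$$y_{1i} \sim P[\exp(x_{1i}'\theta_1 + \varepsilon_{1i})],$$

$$y_{2i} \sim \exp[1/\exp(x_{2i}'\theta_2 + \varepsilon_{2i})].$$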
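Steps 1 to 4 can be sketched with the standard library alone. The parameter values passed at the bottom are the magnitudes that survive in this text; the signs of some entries may have been lost, so they should be read as assumptions rather than as the authors' exact design. The Poisson sampler (Knuth's multiplication method applied in chunks) and all function names are choices made for this illustration.

```python
import math
import random

def cholesky3(S):
    """Lower-triangular Cholesky factor of a 3x3 positive definite matrix."""
    L = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def draw_poisson(mu, rng):
    """Poisson draw by Knuth's method, applied in chunks with mean at most 30."""
    total = 0
    while mu > 0.0:
        chunk = min(mu, 30.0)
        limit, p = math.exp(-chunk), rng.random()
        while p > limit:
            total += 1
            p *= rng.random()
        mu -= chunk
    return total

def simulate(n, alpha, theta1, theta2, Sigma, seed=0):
    """Generate rows (y1, y2, d, xi, w); Sigma is ordered as (u, eps1, eps2)."""
    rng = random.Random(seed)
    L = cholesky3(Sigma)
    data = []
    for _ in range(n):
        xi, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e = [rng.gauss(0.0, 1.0) for _ in range(3)]
        u, eps1, eps2 = (sum(L[r][k] * e[k] for k in range(r + 1)) for r in range(3))
        d = 1 if alpha[0] + alpha[1] * w + u >= 0.0 else 0
        eta1 = theta1[0] + theta1[1] * xi + theta1[2] * d + eps1
        eta2 = theta2[0] + theta2[1] * xi + theta2[2] * d + eps2
        y1 = draw_poisson(math.exp(eta1), rng)
        y2 = rng.expovariate(math.exp(-eta2))   # rate exp(-eta2), so the mean is exp(eta2)
        data.append((y1, y2, d, xi, w))
    return data

# Assumed values: magnitudes as they appear in the text, signs not verified.
Sigma = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
sample = simulate(1000, alpha=(1.0, 1.0), theta1=(1.0, 1.0, 0.5),
                  theta2=(1.0, 1.0, 0.5), Sigma=Sigma)
share_treated = sum(row[2] for row in sample) / len(sample)
```

With these selection coefficients the latent propensity has mean 1 and variance 2, so roughly three quarters of the simulated observations fall in the treated state, in line with the description in step 3.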
We center the priors for the parameters $\alpha$, $\theta_1$ and $\theta_2$ at zero and choose them to be

$$\theta_1 \sim N(0_{k_1}, 10 I_{k_1}), \quad \theta_2 \sim N(0_{k_2}, 10 I_{k_2}), \quad \alpha \sim N(0_{k_3}, 10 I_{k_3}) \tag{5.1}$$

to reflect weak prior information on these parameters. We select a proper prior for $\Sigma$
and center it at the identity matrix (since in general there is no information on the
covariance parameters $\sigma_{1u}$, $\sigma_{2u}$ and $\sigma_{12}$, we center their priors at zero).
McCulloch et al. (2000), analyzing the multinomial probit model, point out that in order to choose
such a prior for $\Sigma$ one can select the following priors for $\Sigma_{12}$ and $\Omega^{-1}$:

$$\Sigma_{12} \sim N(0_2, \tau I_2), \qquad \Omega^{-1} \sim \mathrm{Wish}(n_0, (n_0 - 2)(1 - \tau) I_2) \tag{5.2}$$

and specify only two scalars, $\tau$ and $n_0$. In our example we choose $\tau = 1/8$ and $n_0 = 5$.
Table 1 (see footnote 3) gives estimates of the posterior means, standard deviations and the
autocorrelation function of the coefficients at lag 20. When endogeneity of the treatment
variable is ignored, it results in a self-selection bias and the coefficients $\rho_1$ and $\rho_2$ are
inconsistently estimated. We use the Savage–Dickey density ratio to calculate the Bayes
factor to test $H_0: \Sigma_{12} = (0\ 0)$. The calculated value of $B_{0,1}$ is of order $10^{-6}$, which
provides decisive evidence against $H_0$. This suggests potentially serious consequences
of ignoring the self-selection problem. The influence of the self-selection bias on the
predicted utilization and expenditure is examined in more detail in the next section.
In addition, we generate 3000 observations according to the selection model and
estimate the model under the $M_1$ model specification. The results are given in Table 1.
Fig. 1 displays the prior and posterior distributions (1000 and 3000 observations)
for one of the parameters, $\sigma_{2u}$. The histograms are based on 20 000 iterations. The
solid lines, $\sigma_{2u} = 0.5$, are drawn at the true value of the data generating process. The
3 We run 40 000 replications following a burn-in phase of 1000 replications. During the burn-in
phase the Markov chains converge to the stationary distributions. The posterior means and posterior standard
deviations are calculated based on the 40 000 draws. Values for the tuning parameters, $\tau_1 = 0.7$,
$\tau_2 = 0.9$ and $k = \nu = 15$, are obtained in short preliminary runs by choosing reasonable
acceptance rates close on average to 0.3 and examining the serial correlations of the Markov chains.
Table 1
MCMC estimation for generated data
Posterior standard deviations are reported beneath the posterior means.

Coefficient     True      Unrestricted, N=1000Restricted, N=1000  Unrestricted, N=3000
                value     Mean      ACF(20)   Mean      ACF(20)   Mean
Const1          1         1.111     0.560     1.523     0.148     0.955
                          0.125               0.072               0.069
x1              1         0.990     0.174     0.966     0.373     1.022
                          0.036               0.039               0.022
d               0.5       0.450     0.586     0.132     0.134     0.529
                          0.154               0.083               0.084
Const2          1         1.043     0.402     0.665     0.018     1.005
                          0.187               0.096               0.091
x2              1         1.029     0.024     0.923     0.036     1.003
                          0.048               0.048               0.028
d               0.5       0.522     0.488     0.017     0.012     0.495
                          0.251               0.109               0.114
Const3          1         1.080     0.071     0.986     0.003     1.062
                          0.059               0.059               0.034
x3              1         0.992     0.299     1.048     0.004     1.018
                          0.074               0.069               0.039
$\sigma_{1u}$   0.5       0.512     0.453                         0.474
                          0.099                                   0.060
$\sigma_{2u}$   0.5       0.551     0.504                         0.500
                          0.169                                   0.079
$\sigma_{11}$   1         0.954     0.189     0.979     0.031     0.981
                          0.066               0.060               0.039
$\sigma_{12}$   0.5       0.520     0.236     0.507     0.102     0.475
                          0.069               0.055               0.036
$\sigma_{22}$   1         1.089     0.081     0.979     0.052     0.996
                          0.112               0.098               0.065
results indicate that the impact of the priors diminishes as the number of observations
increases: the posterior means move closer to the true values of the parameters and
the posterior standard deviations become smaller. Overall the estimation
algorithm performs well: it produces Markov chains with reasonable properties, and
the estimated results are consistent with the true parameters.
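The three summaries reported for each coefficient (posterior mean, posterior standard deviation and autocorrelation at lag 20) can be computed from a stored chain as sketched below. The function names and the convention of discarding the first `burn_in` draws are assumptions made for this illustration.

```python
import math

def acf(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    denom = sum((x - mean) ** 2 for x in chain)
    num = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return num / denom

def summarize(chain, burn_in, lag=20):
    """Posterior mean, standard deviation and lag autocorrelation after the burn-in."""
    kept = chain[burn_in:]
    n = len(kept)
    mean = sum(kept) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in kept) / (n - 1))
    return mean, sd, acf(kept, lag)
```

A lag-20 autocorrelation close to zero indicates that draws twenty iterations apart are nearly independent, which is the sense in which the chains are described as having reasonable properties.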
6. Empirical application
In this section we investigate the self-selectivity of private insurance considering
two different situations. The model is applied to two different data sets from two
household-based medical expenditure surveys sponsored by the Agency for Healthcare
Research and Quality (AHRQ). These are nationally representative surveys of healthcare
use, expenditure, source of payment and insurance coverage for the US civilian
[Figure 1: three histograms of $\sigma_{2u}$ with panels "Prior distribution", "Posterior distribution (N=1000)" and "Posterior distribution (N=3000)"; solid lines mark $\sigma_{2u} = 0.5$.]
Fig. 1. Prior and posterior distributions for $\sigma_{2u}$ in the artificial data example.
noninstitutionalized population. The first sample pertains to the U.S. elderly population
and the second to the U.S. nonelderly population. Both data sets are based on publicly
available data at AHRQ and contain only individuals with positive healthcare expenditure.
Definitions and summary statistics for the variables from the data sets used in
this paper are given in Table 2.
6.1. Private insurance
First, we analyze the impact of private insurance on the number of physician doctor
visits and associated expenditures by elderly Americans. A sample of 3690 observations
is obtained from the National Medical Expenditure Survey conducted in 1987 and 1988
(NMES, 1987).
Table 2
Variable definition and summary statistics

Variable   Definition                                             MEPS (N = 2893)   NMES (N = 3690)
                                                                   Mean St. Dev.     Mean St. Dev.
DOCVIS     Number of physician office visits                       4.74     6.30     6.88     6.85
DVEXP      Expenditure on physician office visits                 481.8    972.0    424.2    788.5
EXCHLTH    Equals 1 if self-perceived health is excellent          0.32     0.47     0.07     0.25
POORHLTH   Equals 1 if self-perceived health is poor               0.02     0.13     0.13     0.34
NUMCHRON   Number of chronic conditions                            0.75     1.11     1.66     1.35
ADLDIFF    Equals 1 if the person has a condition which                              0.21     0.41
           limits activities of daily living
INJURY     Number of injuries which limit activities               0.42     0.83
           of daily living during 1996
NOREAST    Equals 1 if the person lives in northeastern U.S.       0.20     0.40     0.19     0.39
MIDWEST    Equals 1 if the person lives in midwestern U.S.         0.25     0.44     0.26     0.44
WEST       Equals 1 if the person lives in western U.S.            0.21     0.41     0.19     0.39
AGE        Age in years (divided by 10)                            4.03     1.29     7.41     0.62
BLACK      Equals 1 if the person is African American              0.10     0.30     0.10     0.31
FEMALE     Equals 1 if the person is female                        0.58     0.49     0.61     0.49
MARRIED    Equals 1 if the person is married                       0.65     0.48     0.55     0.50
SCHOOL     Number of years of education                           13.28     2.58     10.5      3.7
FAMINC     Family income in $1,000                                58.84    38.63     25.6     29.9
EMPLOYED   Equals 1 if the person is employed                      0.82     0.38     0.10     0.30
PRIVATE    Equals 1 if the person is covered by                                      0.80     0.40
           private health insurance
INSURANCE  Equals 0 if the person is in an HMO,                    0.51     0.50
           1 if the person has an FFS plan
MEDICAID   Equals 1 if the person is covered by Medicaid                             0.09     0.29
SELFEMP    Equals 1 if the person is self-employed                 0.09     0.28
SIZE       The size of the company where the person works         124.0    176.1
LOCATION   Equals 1 if the company has multiple locations          0.53     0.50
GOVT       Equals 1 if the company is governmental                 0.18     0.39
Deb and Trivedi (1997) and Munkin and Trivedi (2000) treated private insurance
as an exogenous variable for two reasons. First, individuals have strong incentives to
purchase private insurance before they are 65 years old because its price rises sharply
after that age. Second, individuals older than 66 are covered by Medicare, a generous
public insurance program that offers substantial protection against healthcare costs.
Medicare covers expenses of mostly acute healthcare needs including those associated
with the type of utilization that we consider, physician doctor visits, but not the costs
of long-term healthcare. Hence some will choose to purchase additional insurance such
as private insurance to cover out-of-pocket expenses, justifying the treatment of private
insurance as an endogenous variable.
We also compare the relative impact of private insurance and Medicaid on the
level of healthcare utilization and expenditure. Medicaid provides health insurance to
low-income individuals at public expense by covering the cost difference between the
cost of health service and the Medicare coverage. Those ineligible for Medicaid may
purchase private insurance with coverage similar to that of Medicaid.
The three-equation joint model analyzes the number of doctor visits (DOCVIS),
doctor visit expenditure (DVEXP) and private insurance (d). We choose the components
of vectors $x_1$, $x_2$ and $x_3$ as follows. The determinants of healthcare consumption
and expenditure, vectors $x_1$ and $x_2$, contain the same set of variables, which
consists of self-perceived health status variables EXCLHLTH and POORHLTH,
measures of chronic diseases and disability status NUMCHRON and ADLDIFF,
geographical variables NOREAST, MIDWEST and WEST, demographic variables
BLACK, MALE, MARRIED, SCHOOL, AGE, economic variable EMPLOYED
and insurance variables MEDICAID and PRIVINS (d). In general, the health status of an
individual is unobservable and difficult to measure. However, self-perceived health
variables, together with the evaluation of chronic conditions and disability status, have
proven to be a good measure of health status. The geographical variables are included
to capture differences in the local insurance and healthcare markets. Vector $x_3$
comprises factors that influence the decision to purchase private insurance, including
EXCLHLTH, POORHLTH, NUMCHRON, ADLDIFF, NOREAST, MIDWEST,
WEST, BLACK, MALE, MARRIED, SCHOOL, EMPLOYED, AGE and FAMINC.
MEDICAID, which targets low-income individuals, is excluded from the insurance
equation.
In a nonlinear simultaneous model identification can in principle be secured by
nonlinearity of the functional forms (McManus, 1992). However, genuine exclusion
restrictions, if available, ensure more robust identification of causal parameters
(Heckman, 2000). Thus motivated, we assume that the variable FAMINC influences the
decision to purchase private (Medigap) insurance, but does not affect utilization. This
restriction is empirically supported in this study.
Our choice of priors is similar to that in the numerical example of the previous
section and is given by (5.1) and (5.2).
The posterior means and posterior standard deviations are presented in Table 3 (see
footnote 4), which also presents estimation results for the restricted model with
$\Sigma_{12} = (0\ 0)$. The posterior mean estimates for the coefficient of PRIVINS are
substantially different for the restricted model (Table 3, Docvis and Dvexp columns
under M0) and the unrestricted model (Docvis and Dvexp columns under M1). The
restricted coefficients are positive and relatively precisely estimated, whereas the
unrestricted coefficients are positive but quite imprecisely estimated; i.e., they become
statistically insignificant under endogeneity assumptions. This result is consistent with
the presence of selection bias in the following sense. If, after accounting for correlation
between the insurance and utilization decisions, PRIVINS is no longer a significant
variable, then the presence of selection bias is confirmed. However, for this argument
to be plausible, our estimates of the covariance parameters $\sigma_{1u}$ and $\sigma_{2u}$
must be significantly different from zero. Although both are estimated to have positive
posterior means, their posterior standard deviations are too large to permit reliable
inference. Consequently, the result on selection bias is inconclusive.
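The precision argument above can be made concrete with a small calculation. The sketch below divides each posterior mean by its posterior standard deviation, using values transcribed from Table 3. The cut-off of 2 is an assumed rule of thumb for illustration only; it is not a criterion stated in the paper, and the dictionary labels are hypothetical names.

```python
# Posterior (mean, standard deviation) pairs transcribed from Table 3.
estimates = {
    "PRIVINS in Docvis, M0": (0.198, 0.041),
    "PRIVINS in Dvexp, M0": (0.337, 0.064),
    "PRIVINS in Docvis, M1": (0.057, 0.192),
    "PRIVINS in Dvexp, M1": (0.227, 0.285),
    "sigma_1u, M1": (0.076, 0.106),
    "sigma_2u, M1": (0.059, 0.154),
}

# Assumed rule of thumb (not from the paper): a ratio above 2 is read as
# a posterior mean that is well separated from zero.
THRESHOLD = 2.0

ratios = {name: mean / sd for name, (mean, sd) in estimates.items()}
for name, ratio in ratios.items():
    label = "well determined" if ratio > THRESHOLD else "imprecise"
    print(f"{name}: mean/sd = {ratio:.2f} ({label})")
```

Under this rough reading only the restricted-model PRIVINS coefficients clear the threshold, which matches the verbal conclusion that the unrestricted coefficients and both covariance parameters are imprecisely estimated.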
4 The MCMC estimation is based on 40 000 replications. The two tuning parameters with
subscripts 1 and 2 are both set to 0.1, and k = v = 15.
Table 3
MCMC estimates of the restricted and unrestricted models for private insurance
(posterior means, with posterior standard deviations in parentheses)
                  M1 (Unrestricted)                                  M0 (Restricted)
Variable          Insurance        Docvis           Dvexp            Insurance        Docvis           Dvexp
CONST             0.172 (0.334)    1.402 (0.172)    4.925 (0.190)    0.179 (0.350)    1.336 (0.151)    4.889 (0.136)
EXCLHLTH          0.098 (0.110)    0.278 (0.055)    0.245 (0.082)    0.106 (0.111)    0.281 (0.055)    0.247 (0.082)
POORHLTH          0.232 (0.073)    0.266 (0.045)    0.291 (0.077)    0.235 (0.076)    0.274 (0.043)    0.300 (0.072)
NUMCHRON          0.014 (0.020)    0.134 (0.011)    0.143 (0.018)    0.014 (0.020)    0.133 (0.011)    0.141 (0.019)
ADLDIFF           0.216 (0.064)    0.076 (0.040)    0.107 (0.069)    0.219 (0.067)    0.081 (0.039)    0.110 (0.065)
MEDICAID          -                0.204 (0.056)    0.254 (0.082)    -                0.206 (0.056)    0.253 (0.079)
PRIVINS           -                0.057 (0.192)    0.227 (0.285)    -                0.198 (0.041)    0.337 (0.064)
NOREAST           0.136 (0.071)    0.078 (0.038)    0.188 (0.065)    0.135 (0.074)    0.076 (0.039)    0.186 (0.064)
MIDWEST           0.306 (0.067)    0.0003 (0.039)   0.049 (0.064)    0.307 (0.070)    0.005 (0.036)    0.051 (0.059)
WEST              0.144 (0.071)    0.101 (0.039)    0.325 (0.064)    0.141 (0.075)    0.102 (0.038)    0.327 (0.062)
BLACK             0.881 (0.074)    0.038 (0.080)    0.047 (0.119)    0.874 (0.076)    0.022 (0.051)    0.039 (0.078)
MALE              0.018 (0.059)    0.030 (0.032)    0.004 (0.054)    0.015 (0.059)    0.030 (0.032)    0.005 (0.052)
MARRIED           0.266 (0.059)    0.041 (0.035)    0.008 (0.059)    0.266 (0.060)    0.045 (0.032)    0.011 (0.051)
SCHOOL            0.100 (0.007)    0.017 (0.007)    0.026 (0.010)    0.100 (0.008)    0.016 (0.004)    0.024 (0.007)
EMPLOYED          0.062 (0.096)    0.001 (0.052)    0.020 (0.079)    0.061 (0.099)    0.001 (0.049)    0.020 (0.077)
AGE               0.007 (0.042)    0.038 (0.018)    0.022 (0.020)    0.006 (0.044)    0.039 (0.019)    0.023 (0.020)
FAMINC            0.061 (0.014)    -                -                0.062 (0.014)    -                -
σ_1u; σ_2u        -                0.076 (0.106)    0.059 (0.154)    -                -                -
σ_11; σ_22        -                0.503 (0.021)    0.777 (0.043)    -                0.496 (0.016)    0.762 (0.036)
σ_12              -                0.553 (0.027)    -                -                0.550 (0.020)    -
Table 4
Predicted values for doctor visits and expenditure, private insurance/Medicaid
Insurance status    Private insurance;              Medicaid;                       No Medicaid;
                    no Medicaid                     no private insurance            no private insurance
Health status       Excellent       Poor            Excellent       Poor            Excellent       Poor
Pooled model
Number of visits    4.55            9.72            5.13            10.87           4.23            9.03
                    0.19            0.30            0.30            0.46            0.18            0.29
Expenditure         296.03          606.18          350.66          670.11          274.22          535.96
                    18.69           33.87           31.07           47.42           18.01           30.18

We calculate the Bayes factor using the Savage-Dickey density ratio. M1 is the
unrestricted specification that allows endogeneity of the treatment variable and M0 is
the one that ignores it. The calculated Bayes factor value is $B_{0,1} = 2.95468$ (4.27171).
Kass and Raftery (1995) indicate that if $B_{0,1}$ does not exceed 10 there is no strong
evidence against H1, and if it exceeds 100 then the evidence is decisive. In our case
there is no strong evidence in favor of either of the models and neither of them can be
ignored in the Bayesian inferential framework. Given the value of the Bayes factor,
and assuming that the prior model probabilities are equal, $P(M_0) = P(M_1) = 1/2$, the
posterior model probabilities are $P(M_0|y) = 0.74713$ and $P(M_1|y) = 0.25287$. Following
Draper (1995) we form our predictive distribution (pooled model) by averaging
posterior densities obtained under model specifications M0 and M1, using the posterior
model probabilities as weights. For example, the weighted coefficients of PRIVINS in
the utilization and expenditure equations are 0.162 (0.079) and 0.309 (0.120), respectively.
These effects are significantly positive, but slightly weaker than those predicted by the
restricted model. We do not fully report moments of the pooled posterior density,
but we use it as the predictive distribution in the following calculations.
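The step from the Bayes factor to the posterior model probabilities and the pooled coefficients is simple arithmetic, sketched below. The function names are hypothetical, and only the pooled posterior means are reproduced; the pooled standard deviations reported in the text depend on the full set of posterior draws and are not recomputed here.

```python
def posterior_model_probs(bf01, prior_m0=0.5):
    """Posterior probabilities of M0 and M1 implied by the Bayes factor B_{0,1}."""
    prior_odds = prior_m0 / (1.0 - prior_m0)
    posterior_odds = bf01 * prior_odds          # odds of M0 against M1
    p_m0 = posterior_odds / (1.0 + posterior_odds)
    return p_m0, 1.0 - p_m0


def pooled_mean(mean_m0, mean_m1, p_m0):
    """Model-averaged posterior mean, weighted by posterior model probabilities."""
    return p_m0 * mean_m0 + (1.0 - p_m0) * mean_m1


# Bayes factor from the text, PRIVINS posterior means from Table 3.
p_m0, p_m1 = posterior_model_probs(2.95468)
privins_docvis = pooled_mean(0.198, 0.057, p_m0)
privins_dvexp = pooled_mean(0.337, 0.227, p_m0)
print(round(p_m0, 3), round(privins_docvis, 3), round(privins_dvexp, 3))
# prints: 0.747 0.162 0.309
```

With equal prior probabilities the posterior odds equal the Bayes factor, so $P(M_0|y) = B_{0,1}/(1 + B_{0,1})$, and the weighted PRIVINS coefficients agree with the 0.162 and 0.309 quoted in the text.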
In Table 4 we compare the levels of healthcare use and expenditure for three groups
defined according to their insurance status: those individuals who have private
insurance and no Medicaid; those who have Medicaid and no private insurance; and those
who have neither private insurance nor Medicaid. The individuals in the last group
are covered only by Medicare. We divide these three categories further according
to self-perceived health status (excellent or poor health). The mean function for
$y_j$ ($j = 1, 2$), after integrating out unobserved heterogeneity, has a closed form,
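$$E[y_j \mid \Theta_j] = \Phi(x_3'\alpha + \sigma_{ju}) \exp\left(x_j'\beta_j + \rho_j + \frac{\sigma_{jj}}{2}\right) + \left(1 - \Phi(x_3'\alpha + \sigma_{ju})\right) \exp\left(x_j'\beta_j + \frac{\sigma_{jj}}{2}\right),$$
where $\Theta_j = (\beta_j, \alpha, \sigma_{ju}, \sigma_{jj})$, $\Phi$ is the standard normal
distribution function and $\rho_j$ is the coefficient on the treatment variable $d$ in equation $j$.
We calculate posterior moments of the dependent variables for each group, evaluated
at the group's mean value of the regressors, $(\bar{x}_j, \bar{x}_3)$, and taken with respect
to the pooled posterior distribution of the parameters $\Theta_j$. The posterior moments are
approximated as
$$E[y_j \mid \bar{x}_j, \bar{x}_3] = \frac{1}{S} \sum_{i=1}^{S} E[y_j \mid \bar{x}_j, \bar{x}_3, \Theta_{ji}], \quad j = 1, 2,$$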
where $S$ is the posterior sample size. The results are presented in Table 4. Based on
the estimation results one could conclude that for the poor health group the level of
utilization is higher for Medicaid patients than for those with private insurance, and
higher for those with private insurance than for those covered by Medicare only, by
about one visit a year. However, for the excellent health group the differences are not
as large. These results suggest that the additional
impact of Medigap insurance on average utilization levels for the Medicare elderly
is relatively small, albeit larger for those in poor health. The impact of Medicaid is
slightly larger on average, of the order of two visits for those in poor health and about
one visit for those in excellent health.
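The closed-form mean and its posterior average can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the argument names are hypothetical, `xb` stands for the linear index of the outcome equation without the treatment term, `x3a` for the index of the insurance equation, and `rho` for the treatment coefficient. The latent errors are taken to be jointly normal with unit variance in the insurance equation, and all parameter values are made up. A simulation check is included to confirm that the closed form matches the integral it replaces.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def expected_outcome(xb, x3a, rho, sigma_ju, sigma_jj):
    """Closed-form mean of y_j after integrating out the latent errors."""
    p = PHI(x3a + sigma_ju)
    treated = math.exp(xb + rho + sigma_jj / 2.0)
    untreated = math.exp(xb + sigma_jj / 2.0)
    return p * treated + (1.0 - p) * untreated


def posterior_mean_outcome(draws):
    """Average the closed-form mean over S posterior draws of the parameters."""
    values = [expected_outcome(**draw) for draw in draws]
    return sum(values) / len(values)


def simulate_mean(xb, x3a, rho, sigma_ju, sigma_jj, n=200_000, seed=0):
    """Monte Carlo version of the same mean, drawing (eps_j, u) directly."""
    rng = random.Random(seed)
    resid_sd = math.sqrt(sigma_jj - sigma_ju ** 2)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                      # selection error, variance 1
        eps = sigma_ju * u + resid_sd * rng.gauss(0.0, 1.0)
        d = 1.0 if x3a + u > 0.0 else 0.0            # treatment indicator
        total += math.exp(xb + rho * d + eps)
    return total / n


# Illustrative (made-up) parameter values.
closed = expected_outcome(1.0, 0.3, 0.2, 0.2, 0.5)
simulated = simulate_mean(1.0, 0.3, 0.2, 0.2, 0.5)
draws = [
    dict(xb=1.0, x3a=0.3, rho=0.2, sigma_ju=0.2, sigma_jj=0.5),
    dict(xb=1.1, x3a=0.25, rho=0.15, sigma_ju=0.1, sigma_jj=0.6),
]
pooled = posterior_mean_outcome(draws)
print(closed, simulated, pooled)
```

In an application the list `draws` would hold the retained MCMC draws evaluated at a group's mean regressors, so that `posterior_mean_outcome` plays the role of the average over $i = 1, \ldots, S$.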
6.2. HMO versus FFS
In this application we study the choice of a specific type of private insurance by
individuals aged between 16 and 65 years. The individuals choose between two types
of private insurance: FFS options and HMO plans. The HMO plan serves as a proxy
for managed care type organizations, which often control costs and access through
features such as provider networks, gatekeeping, provider payment mechanisms and
so forth. The literature has emphasized that HMOs may increase the utilization of certain
types of care, e.g. preventive care, while reducing that of other more expensive types
of care, e.g. hospital nights.
Favorable selection into HMO plans means that those who expect to be low users of
services will tend to enrol in these plans, while those who expect to be heavy users
will enrol in the indemnity plans (FFS). If expected future usage can be adequately
proxied by observed variables, then such selection can be controlled for by introducing
appropriate proxy variables in the insurance and utilization equations (Reschovsky, 2000).
Under this scenario, estimation is considerably simplified. Ignoring the endogeneity
issue, several studies claim that HMO and FFS plans are similar in meeting
individuals' needs for healthcare services and in covering the associated costs.
An issue that is relevant in discussing the endogeneity of insurance plans is that
individuals may have no choice, or only a very limited choice, of insurance
plans. An overwhelming majority (80%) of our sample are employed. A high proportion
of these are thought to have a very limited choice of plans. This factor softens the
impact of the endogeneity issue even though it is not equivalent to exogenous assignment
of insurance plans. An example of a factor subsumed under unobserved heterogeneity
is attitude towards health risk. For example, a risk-averse individual may choose a
health plan conservatively and may see a doctor more often than an individual who
behaves like a risk lover. Attitude towards health risk is not directly observed in our
sample.
We use data from the 1996 Medical Expenditure Panel Survey (MEPS). These are
collected from each household in a series of five rounds of data collection over a
period of 2.5 years. The first round of the data consists of 10 639 households with more
than 23 000 individuals. Our MEPS sample size is 2893 and it consists of privately
insured individuals aged from 16 to 65 years whose healthcare expenditure is positive.
About 50% have an FFS type insurance and the other half purchased their insurance
through an HMO. The categorical variable INSURANCE takes the value 1 for the
FFS category and 0 for the HMO category. Neither Medicare nor Medicaid are relevant
for this nonelderly sample.
As before, the selection model is estimated for the number of doctor visits (DOCVIS),
doctor visit expenditure (DVEXP) and insurance status (d). Vectors $x_1$ and $x_2$
include EXCLHLTH, POORHLTH, NUMCHRON, INJURY, BLACK, FEMALE,
MARRIED, SCHOOL, EMPLOYED, AGE, NOREAST, MIDWEST, WEST and
INSURANCE (d).
As mentioned before, there is a problem of limited insurance choices affecting the
selection process. More than 80% of the individuals in the data set are employed and
some employers provide only limited insurance options. If data were available, this
problem of constraints on selection could be solved by restricting the sample to
only those individuals who had an actual choice between FFS and HMO when
selecting their insurance plans. However, this study takes a different approach. Including
variables that control for the type of company and for employment status, such
as size, existence of multiple locations, being self-employed and belonging to a
governmental organization, could capture the effect of the employer's constraint on
selection. Vector $x_3$ consists of EXCLHLTH, POORHLTH, NUMCHRON, INJURY,
NOREAST, MIDWEST, WEST, BLACK, FEMALE, MARRIED, SCHOOL,
EMPLOYED, AGE, SIZE, GOVT, LOCATION, SELFEMP and FAMINC. The
geographical variables are included to control for inequalities in HMO penetration
and differences in local prices.
The prior distributions of the parameters are the same as those in the previous model.
The posterior means and posterior standard deviations of the parameters are given in
Table 5 (see footnote 5). For this sample neither the restricted nor the unrestricted
estimates suggest that the FFS plan has a significant positive impact on doctor visits or
expenditures relative to the HMOs. Once again the covariances $\sigma_{1u}$ and $\sigma_{2u}$
are quite imprecisely estimated, so the evidence in support of the endogeneity
hypothesis remains weak.
The Bayes factor value is $B_{0,1} = 1.77255$ (2.93544) and the posterior model
probabilities are $P(M_0|y) = 0.63932$ and $P(M_1|y) = 0.36068$. According to these results
there is again no strong evidence in favor of either of the models. We use the posterior
model probabilities to calculate the predictive distribution (pooled model) as the weighted
average. The effect of INSURANCE on utilization and expenditure in the pooled
model is 0.084 (0.146) and 0.125 (0.167), respectively.
Calculations similar to those in the previous section are made for four different groups
based on whether the individual belongs to an HMO or FFS and according to health
status, excellent health or poor health. The expected utilization and expenditure for all
groups are presented in Table 6 and the results are based on the posterior distribution
of the pooled model. The results are not surprising given that the posterior mean
estimates for $\sigma_{12}$ and $\sigma_{13}$, as well as the coefficients of the INSURANCE
variable, are not significantly different from zero. The average number of visits is at
almost the same level for the excellent health group in HMO and FFS plans. However,
for the poor health group HMO patients have a slightly higher utilization level.
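The Bayes factors used above rest on the Savage-Dickey density ratio, which for nested models equals the posterior density of the restricted parameter at the restriction divided by its prior density at the same point, both evaluated under the unrestricted model. The sketch below illustrates the identity in a deliberately simple conjugate setting, chosen here as an assumption for illustration: $y_i \sim N(\theta, 1)$, with M0: $\theta = 0$ and M1: $\theta \sim N(0, \tau^2)$. It is not the three-equation model of the paper, and the data and function names are made up. The ratio is checked against the Bayes factor computed from the marginal likelihoods by numerical integration.

```python
import math
from statistics import NormalDist


def savage_dickey_bf01(y, tau2):
    """B_{0,1} as posterior density at theta = 0 over prior density at theta = 0."""
    n = len(y)
    post_var = 1.0 / (n + 1.0 / tau2)      # conjugate normal update under M1
    post_mean = post_var * sum(y)
    posterior_at_zero = NormalDist(post_mean, math.sqrt(post_var)).pdf(0.0)
    prior_at_zero = NormalDist(0.0, math.sqrt(tau2)).pdf(0.0)
    return posterior_at_zero / prior_at_zero


def quadrature_bf01(y, tau2, half_width=10.0, steps=20000):
    """The same Bayes factor from marginal likelihoods, by the trapezoidal rule."""
    n, s = len(y), sum(y)
    prior = NormalDist(0.0, math.sqrt(tau2))
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        theta = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # p(y | theta) / p(y | theta = 0) = exp(theta * s - n * theta^2 / 2)
        total += weight * math.exp(theta * s - 0.5 * n * theta * theta) * prior.pdf(theta)
    return 1.0 / (total * h)               # p(y | M0) / p(y | M1)


y = [0.3, -0.1, 0.8, 0.2, 0.5]             # made-up data
bf_sd = savage_dickey_bf01(y, tau2=4.0)
bf_quad = quadrature_bf01(y, tau2=4.0)
print(bf_sd, bf_quad)
```

The two numbers agree to numerical precision, which is the content of the Savage-Dickey identity. In the selection model the posterior density at the restriction is not available in closed form and has to be estimated from the MCMC output.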
5 The results are based on 40 000 replications. The two tuning parameters with
subscripts 1 and 2 are both set to 0.1, and k = v = 15.
Table 5
MCMC estimates of the HMO/FFS model
(posterior means, with posterior standard deviations in parentheses)
                  M1 (Unrestricted)                                  M0 (Restricted)
Variable          Insurance        Docvis           Dvexp            Insurance        Docvis           Dvexp
CONST             0.094 (0.159)    0.584 (0.208)    4.650 (0.246)    0.093 (0.159)    0.425 (0.110)    4.508 (0.156)
EXCLHLTH          0.098 (0.051)    0.169 (0.041)    0.079 (0.061)    0.098 (0.053)    0.181 (0.039)    0.083 (0.056)
POORHLTH          0.115 (0.179)    0.402 (0.126)    0.589 (0.176)    0.103 (0.187)    0.390 (0.116)    0.579 (0.169)
NUMCHRON          0.004 (0.023)    0.214 (0.016)    0.235 (0.026)    0.003 (0.024)    0.214 (0.016)    0.235 (0.026)
INJURY            0.019 (0.028)    0.152 (0.019)    0.180 (0.032)    0.019 (0.029)    0.151 (0.019)    0.179 (0.032)
INSURANCE         -                0.285 (0.347)    0.220 (0.371)    -                0.030 (0.033)    0.071 (0.052)
NOREAST           0.072 (0.065)    0.117 (0.049)    0.076 (0.076)    0.075 (0.067)    0.121 (0.048)    0.079 (0.074)
MIDWEST           0.225 (0.062)    0.014 (0.049)    0.119 (0.077)    0.226 (0.063)    0.019 (0.046)    0.125 (0.071)
WEST              0.426 (0.066)    0.026 (0.064)    0.163 (0.095)    0.430 (0.067)    0.035 (0.049)    0.169 (0.074)
BLACK             0.162 (0.078)    0.137 (0.061)    0.201 (0.095)    0.163 (0.083)    0.132 (0.060)    0.195 (0.092)
FEMALE            0.137 (0.047)    0.282 (0.039)    0.325 (0.057)    0.138 (0.049)    0.289 (0.036)    0.329 (0.053)
MARRIED           0.031 (0.052)    0.006 (0.038)    0.077 (0.059)    0.033 (0.055)    0.004 (0.037)    0.077 (0.058)
SCHOOL            0.001 (0.009)    0.026 (0.007)    0.034 (0.010)    0.0004 (0.010)   0.025 (0.007)    0.034 (0.010)
EMPLOYED          0.092 (0.074)    0.111 (0.049)    0.088 (0.072)    0.090 (0.077)    0.107 (0.046)    0.088 (0.071)
AGE               0.052 (0.020)    0.038 (0.016)    0.089 (0.025)    0.053 (0.021)    0.037 (0.015)    0.088 (0.022)
FAMINC            0.0008 (0.0006)  -                -                0.0009 (0.0007)  -                -
SELFEMP           0.202 (0.093)    -                -                0.206 (0.100)    -                -
GOVT              0.107 (0.063)    -                -                0.112 (0.066)    -                -
SIZE              0.0006 (0.0001)  -                -                0.0006 (0.0002)  -                -
LOCATION          0.064 (0.059)    -                -                0.063 (0.063)    -                -
σ_1u; σ_2u        -                0.195 (0.215)    0.181 (0.228)    -                -                -
σ_11; σ_22        -                0.561 (0.065)    0.828 (0.076)    -                0.513 (0.020)    0.778 (0.041)
σ_12              -                0.582 (0.055)    -                -                0.552 (0.023)    -
Table 6
Predicted values for doctor visits and expenditure, FFS/HMO
Insurance status    FFS                             HMO
Health status       Excellent       Poor            Excellent       Poor
Pooled model
Number of visits    3.53            8.79            3.49            9.50
                    0.12            0.76            0.12            0.81
Expenditure         363.92          1089.42         368.80          1181.61
                    17.15           141.32          18.02           153.33
Thus, we conclude that the type of insurance does not significantly affect the level of
healthcare use.
6.3. Discussion and concluding remarks
How do our results compare with previous estimates? Dowd et al. (1991) modelled
physician visits and inpatient hospital days using 1984 survey data from 20 Twin Cities
firms that offered their employees a choice from at least one HMO plan and one FFS
plan. This study found no statistically significant evidence for selection bias. However,
the study is subject to an important qualification. The authors estimated a linear
selection model after restricting the sample to those with positive levels of utilization. They
used log(physician visits) or log(hospital days) as their outcome variable and did not
account fully for the intrinsically discrete and heteroskedastic nature of the response
variable. By contrast, our formulation takes into account both these features. Yet Dowd
et al. (1991) did not find a significant difference between HMO and FFS insurees in the
average number of doctor visits. (They did find that HMO insurees had fewer
inpatient days on average.) In studies of the impact of HMOs on healthcare utilization based
on the Community Tracking Study Household Survey 1996-1997 (Reschovsky, 2000;
Reschovsky and Kemper, 2000), the issue of selection bias was discussed, albeit not
dealt with in a comprehensive econometric framework. Reschovsky and Kemper (2000,
p. 385) argue that there is little evidence of selection on observables, and mention but
do not pursue the possibility of selection on unobservables via an econometric model.
After some tests they conclude that the risk of estimates of the impact of HMOs on
healthcare use being affected by selection bias was small in their study. Tu et al. (2000)
analyze the same data as Reschovsky and Kemper (2000), using similar econometric
methodology, and find no significant differences between HMO and non-HMO
enrollees in the use of hospital, surgery, and emergency room services. Mello et al.
(2002), in their study of the Medicare population based on data from 1993-1996,
report tests of endogeneity of insurance choice. Based on empirical models with discrete
factor structures, they do not find evidence that supports endogeneity of the HMO
variable in their utilization equations, but they do find evidence of favorable selection into
HMOs (healthier individuals self-select into cheaper health plans) and reduced
utilization of hospital services by HMO enrollees. An important qualification to the above
results is that incentives for controlling healthcare costs may now also be present in
FFS plans, and hence the marginal impact of HMOs on utilization may be smaller and
harder to detect. A second qualification is that studies that model disaggregated
measures of utilization, such as specific preventive (blood pressure checks, mammograms,
etc.) and curative services (surgery or hospital nights), may provide sharper tests of
the endogeneity hypothesis and improved estimates of the differential impact of HMO
and non-HMO plans on use of such services. This remains a topic for future research.
The final qualification concerns the definition of an HMO plan. The definition used in
this study may be too broad, and a finer distinction based on the attributes of various
managed care plans may provide improved tests of the endogeneity hypothesis.
Embracing computational complications inherent in the problem, we have developed
a flexible approach to modeling self-selectivity of the treatment variable in a model
with multiple outcomes. In our analysis of two separate data sets we find mixed or
weak evidence of self-selectivity. However, the Bayes factor values suggest that the
results for both unrestricted and restricted (no endogeneity) models should be used in
a Bayesian inferential framework because neither model dominates the other.
Acknowledgements
We thank John Geweke, Co-Editor Arnold Zellner, an Associate Editor and three
anonymous referees for their helpful comments on earlier versions of this paper. We
have also benefited from presentation of an earlier version at the 2000 Mid-West
Econometric Group Meeting in Chicago, Purdue University, University of Tennessee and
Tulane University. However, we retain responsibility for any errors.
Appendix A. Computational
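A.1. Sampling $\varepsilon_1$ and $\varepsilon_2$
The gradient vector has the following two components:
$$g_{1i} = -\mu_i + y_{1i} - (\varepsilon_i - \Sigma_{12} u_i)' \Omega^{-1} \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
and
$$g_{2i} = -1 + y_{2i}\lambda_i - (\varepsilon_i - \Sigma_{12} u_i)' \Omega^{-1} \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
where $\mu_i = \exp(x_{1i}'\beta_1 + \varepsilon_{1i})$ and $1/\lambda_i = \exp(x_{2i}'\beta_2 + \varepsilon_{2i})$, $\Omega$ denotes
the covariance matrix of $\varepsilon_i = (\varepsilon_{1i}, \varepsilon_{2i})'$ conditional on $u_i$, and the Hessian matrix is
$$H_{\varepsilon_i} = -\begin{pmatrix} \mu_i & 0 \\ 0 & y_{2i}\lambda_i \end{pmatrix} - \Omega^{-1}.$$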
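A.2. Sampling $\beta_1$ and $\beta_2$
The gradient vectors and the Hessian matrices are
$$g_{\beta_1} = -B_{01}^{-1}(\beta_1 - \beta_{01}) + \sum_{i=1}^{N} \left(y_{1i} - \exp(x_{1i}'\beta_1 + \varepsilon_{1i})\right) x_{1i},$$
$$g_{\beta_2} = -B_{02}^{-1}(\beta_2 - \beta_{02}) + \sum_{i=1}^{N} \left(-1 + y_{2i}\exp(-x_{2i}'\beta_2 - \varepsilon_{2i})\right) x_{2i}$$
and
$$H_{\beta_1} = -B_{01}^{-1} - \sum_{i=1}^{N} \exp(x_{1i}'\beta_1 + \varepsilon_{1i})\, x_{1i}x_{1i}',$$
$$H_{\beta_2} = -B_{02}^{-1} - \sum_{i=1}^{N} y_{2i}\exp(-x_{2i}'\beta_2 - \varepsilon_{2i})\, x_{2i}x_{2i}'.$$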
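As an illustration of how the count-equation expressions can be assembled, the sketch below computes the gradient and Hessian of the log conditional posterior of the first coefficient vector for a Poisson outcome with a normal prior. It is a minimal pure-Python sketch, not the authors' implementation: the function names, the nested-list representation of matrices and the toy data are all assumptions made for illustration. The log posterior kernel is included only so that the analytic gradient can be checked by finite differences.

```python
import math


def log_post_beta1(beta, X, y, eps, beta0, B0_inv):
    """Log conditional posterior kernel: Poisson log likelihood plus normal prior."""
    k = len(beta)
    dev = [beta[j] - beta0[j] for j in range(k)]
    quad = sum(dev[r] * B0_inv[r][c] * dev[c] for r in range(k) for c in range(k))
    loglik = 0.0
    for x_i, y_i, e_i in zip(X, y, eps):
        eta = sum(b * x for b, x in zip(beta, x_i)) + e_i
        loglik += y_i * eta - math.exp(eta)
    return loglik - 0.5 * quad


def grad_hess_beta1(beta, X, y, eps, beta0, B0_inv):
    """Gradient and Hessian of the kernel above with respect to beta."""
    k = len(beta)
    # Prior contribution: -B0^{-1} (beta - beta0) and -B0^{-1}.
    g = [-sum(B0_inv[r][c] * (beta[c] - beta0[c]) for c in range(k)) for r in range(k)]
    H = [[-B0_inv[r][c] for c in range(k)] for r in range(k)]
    # Likelihood contribution, one observation at a time.
    for x_i, y_i, e_i in zip(X, y, eps):
        mu_i = math.exp(sum(b * x for b, x in zip(beta, x_i)) + e_i)
        for r in range(k):
            g[r] += (y_i - mu_i) * x_i[r]
            for c in range(k):
                H[r][c] -= mu_i * x_i[r] * x_i[c]
    return g, H


# Toy data (made up): three observations, intercept plus one regressor.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [2, 0, 5]
eps = [0.1, -0.2, 0.0]                     # current draws of the latent errors
beta0 = [0.0, 0.0]
B0_inv = [[0.1, 0.0], [0.0, 0.1]]          # prior precision matrix
beta = [0.2, 0.3]

g, H = grad_hess_beta1(beta, X, y, eps, beta0, B0_inv)
print(g, H)
```

Because the Hessian is negative definite, the gradient and Hessian pair can be used to locate the mode of the conditional posterior by Newton-Raphson steps, which is one common way to build a proposal density for a Metropolis-Hastings update.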
A.3. Sampling ; 12 and
Denote = (1; 2;
), X = diag(x1; x2; x3), Z = (log log1=z
). Then from
Eqs. (2.3), (2.4) and (2.5) has multivariate normal distribution N( ; ) where =
[X(1 IN)X]1 and = [X(1 IN)Z]. Partition and with respect to = (1; 2) and as = ( ) and
=
:
The conditional distribution of given 1 and 2 is normal with mean | = +
1 ( ) and variance 1| = 1 .
References
Albert, J.H., Chib, S., 1993. Bayesian analysis of binary and polychotomous response data. Journal of
American Statistical Association 88, 669-679.
Chib, S., Hamilton, B.H., 2000. Bayesian analysis of cross-section and clustered data treatment models.
Journal of Econometrics 97, 25-50.
Chib, S., Greenberg, E., Winkelmann, R., 1998. Posterior simulation and Bayes factor in panel count data
models. Journal of Econometrics 86, 33-54.
Crepon, B., Duguet, E., 1997. Research and development, competition and innovation: pseudo-maximum
likelihood and simulated maximum likelihood methods applied to count data models with heterogeneity.
Journal of Econometrics 79, 355-378.
Deb, P., Trivedi, P.K., 1997. Demand for medical care by the elderly: a finite mixture approach. Journal of
Applied Econometrics 12, 313-336.
Devroye, L., 1986. Non-Uniform Random Variate Generation. Springer, New York.
Dowd, B., Feldman, R., Cassou, S., Finch, M., 1991. Health plan choice and utilization of health care
services. Review of Economics and Statistics 73, 85-93.
Draper, D., 1995. Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society,
Series B 57, 45-97.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images.
IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 609-628.
Geweke, J., 1991. Efficient simulation from the multivariate normal and Student-t distributions subject to
linear constraints. In: Keramidas, E.M. (Ed.), Computing Science and Statistics: Proceedings of the 23rd
Symposium on the Interface, pp. 571-578.
Goldman, D.P., 1995. Managed care as a public cost-containment mechanism. Rand Journal of Economics
26, 277-295.
Greene, W.H., 1997. FIML estimation of sample selection models for count data. Discussion Paper EC-97-02,
Department of Economics, Stern School of Business, New York University.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika
57, 97-109.
Heckman, J.J., 1976. The common structure of statistical models of truncation, sample selection and limited
dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement
5, 475-492.
Heckman, J.J., 2000. Causal parameters and policy analysis in economics: a twentieth century retrospective.
Quarterly Journal of Economics 115 (1), 45-97.
Johnson, M., 1987. Multivariate Statistical Simulation. Wiley, New York.
Kass, R.E., Raftery, A.E., 1995. Bayes factors. Journal of American Statistical Association 90, 773-795.
Kemper, P., Reschovsky, J.D., Tu, H.T., 2000. Do HMOs make a difference? Summary and implications.
Inquiry 36, 419-425.
Koop, G., Poirier, D.J., 1997. Learning about the across-regime correlation in switching regression models.
Journal of Econometrics 78, 217-227.
Lee, L.-F., 2000. Self-selection. In: Baltagi, B.H. (Ed.), A Companion to Theoretical Econometrics.
Blackwell, Oxford (Chapter 18).
Li, K., 1998. Bayesian inference in a simultaneous equation model with limited dependent variables. Journal
of Econometrics 85, 387-400.
Linardakis, M., Dellaportas, P., 1999. Bayesian analysis of latent utilities for transportation services via
extensions of the multinomial probit model. Working paper, Athens University of Economics and Business.
Maddala, G.S., 1985. A survey of the literature on selectivity bias as it pertains to health care markets.
Health Economics and Health Services Research 6, 3-18.
McCulloch, R.E., Rossi, P.E., 1994. An exact likelihood analysis of the multinomial probit model. Journal
of Econometrics 64, 207-240.
McCulloch, R.E., Polson, N.G., Rossi, P.E., 2000. A Bayesian analysis of the multinomial probit model with
fully identified parameters. Journal of Econometrics 99, 173-193.
McManus, D.A., 1992. How common is identification in parametric models? Journal of Econometrics 53
(1-3), 5-23.
Mello, M.M., Stearns, S.C., Norton, E.C., 2002. Do Medicare HMOs still reduce health service use after
controlling for selection bias? Health Economics 11, 323-340.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equations of state
calculations by fast computing machines. Journal of Chemical Physics 21, 1087-1092.
Miller, R.H., Luft, H.S., 1994. Managed care plan performance since 1980. Journal of American Medical
Association 271, 1512-1519.
Miller, R.H., Luft, H.S., 1997. Does managed care lead to better or worse quality of care? Health Affairs
16, 7-25.
Munkin, M.K., Trivedi, P.K., 2000. Analysis of patterns of healthcare utilization among the elderly using
mixed discrete-continuous models with unobserved heterogeneity. Working paper.
Nobile, A., 1998. A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Statistics
and Computing 8, 229-242.
Nobile, A., 2000. Comment: Bayesian multinomial probit models with a normalization constraint. Journal of
Econometrics 99, 335-345.
Reschovsky, J.D., 2000. Do HMOs make a difference? Data and methods. Inquiry 36, 378-389.
Reschovsky, J.D., Kemper, P., 2000. Do HMOs make a difference? Introduction. Inquiry 36, 374-377.
Tanner, M.A., Wong, W.H., 1987. The calculation of posterior distribution by data augmentation. Journal of
American Statistical Association 82, 528-540.
Terza, J.V., 1998. Estimating count data models with endogenous switching: sample selection and endogenous
treatment effects. Journal of Econometrics 84, 129-154.
Tu, H.T., Kemper, P., Wong, H.J., 2000. Do HMOs make a difference? Use of health services. Inquiry 36,
401-410.
van Ophem, H., 2000. Modeling selectivity in count data models. Journal of Business and Economic Statistics
18, 503-510.
Verdinelli, I., Wasserman, L., 1995. Computing Bayes factors using a generalization of the Savage-Dickey
density ratio. Journal of American Statistical Association 90, 614-618.
Winkelmann, R., 1998. Count data models with selectivity. Econometric Reviews 17, 339-359.
Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. Wiley, New York.