arXiv:1512.07273v1 [stat.ME] 22 Dec 2015

Computationally Efficient Distribution Theory for Bayesian Inference of High-Dimensional Dependent Count-Valued Data

Jonathan R. Bradley1, Scott H. Holan2, Christopher K. Wikle2

Abstract

We introduce a Bayesian approach for multivariate spatio-temporal prediction for high-dimensional count-valued data. Our primary interest is when there are possibly millions of data points referenced over different variables, geographic regions, and times. This problem requires extensive methodological advancements, as jointly modeling correlated data of this size leads to the so-called “big n problem.” The computational complexity of prediction in this setting is further exacerbated by acknowledging that count-valued data are naturally non-Gaussian. Thus, we develop a new computationally efficient distribution theory for this setting. In particular, we introduce a multivariate log-gamma distribution and provide substantial theoretical development including: results regarding conditional distributions, marginal distributions, an asymptotic relationship with the multivariate normal distribution, and full-conditional distributions for a Gibbs sampler. To incorporate dependence between variables, regions, and time points, a multivariate spatio-temporal mixed effects model (MSTM) is used. The results in this manuscript are extremely general, and can be used for data that exhibit fewer sources of dependency than what we consider (e.g., multivariate, spatial-only, or spatio-temporal-only data). Hence, the implications of our modeling framework may have a large impact on the general problem of jointly modeling correlated count-valued data. We show the effectiveness of our approach through a simulation study. Additionally, we demonstrate our proposed methodology with an important application analyzing data obtained from the Longitudinal Employer-Household Dynamics (LEHD) program, which is administered by the U.S. Census Bureau.

Keywords: American Community Survey; Big Data; Aggregation; Quarterly Workforce Indicators; Bayesian hierarchical model; Longitudinal Employer-Household Dynamics (LEHD) program; Markov chain Monte Carlo; Non-Gaussian.

1 (to whom correspondence should be addressed) Department of Statistics, University of Missouri, 146 Middlebush Hall, Columbia, MO 65211, [email protected]
2 Department of Statistics, University of Missouri, 146 Middlebush Hall, Columbia, MO 65211-6100
Figures and Tables
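Table 1: A comprehensive list of functions, matrices, vectors, and constants used within Proposition 3. If there are no zero counts within the dataset, then set $d_{\beta,k} = d_{\eta,1,k} = \ldots = d_{\eta,T,k} = d_{\xi,1,k} = \ldots = d_{\xi,T,k} = 0$.

Definition; Additional Notes

$\mathbf{H}_{\beta,k} = (\mathbf{X}_1', \ldots, \mathbf{X}_T', \alpha_k^{-1/2}\sigma_\beta^{-1}\mathbf{I}_p)'$

$\mathbf{H}_{\eta,t,k} = (\boldsymbol{\Psi}_t', \mathbf{W}_{t,k}^{-1/2}, -\mathbf{W}_{t+1,k}^{-1/2}\mathbf{M}_{t+1})'$

$\mathbf{H}_{\eta,T,k} = (\boldsymbol{\Psi}_T', \mathbf{W}_{T,k}^{-1/2})'$

$\mathbf{H}_{\xi,t,k} = (\mathbf{I}_{n_t}, \alpha_k^{-1/2}\sigma_\xi^{-1}\mathbf{I}_{n_t})'$

$\boldsymbol{\kappa}^{(-1)}_{\beta,k} = \{\exp(\boldsymbol{\Psi}_1\boldsymbol{\eta}_1 + \boldsymbol{\xi}_1)', \ldots, \exp(\boldsymbol{\Psi}_T\boldsymbol{\eta}_T + \boldsymbol{\xi}_T)', \frac{1}{\kappa_k}\mathbf{1}_p'\}'$

$\boldsymbol{\kappa}^{(-1)}_{\eta,t,k} = \{\exp(\mathbf{X}_t\boldsymbol{\beta} + \boldsymbol{\xi}_t)', \frac{1}{\kappa_k}\exp(-\mathbf{W}_{t,k}^{-1/2}\mathbf{M}_t\boldsymbol{\eta}_{t-1})', \frac{1}{\kappa_k}\exp(\mathbf{W}_{t+1,k}^{-1/2}\boldsymbol{\eta}_{t+1})'\}'$; $1 < t < T$

$\boldsymbol{\kappa}^{(-1)}_{\eta,1,k} = \{\exp(\mathbf{X}_1\boldsymbol{\beta} + \boldsymbol{\xi}_1)', \frac{1}{\kappa_k}\mathbf{1}_r', \frac{1}{\kappa_k}\exp(\mathbf{W}_{2,k}^{-1/2}\boldsymbol{\eta}_2)'\}'$; Provided $T > 1$

$\boldsymbol{\kappa}^{(-1)}_{\eta,T,k} = \{\exp(\mathbf{X}_T\boldsymbol{\beta} + \boldsymbol{\xi}_T)', \frac{1}{\kappa_k}\exp(-\mathbf{W}_{T,k}^{-1/2}\mathbf{M}_T\boldsymbol{\eta}_{T-1})'\}'$; If $T = 1$ then replace $\mathbf{M}_T$ and $\boldsymbol{\eta}_{T-1}$ with $\mathbf{0}_{r,r}$ and $\mathbf{0}_r$ within the expression of $\boldsymbol{\kappa}^{(-1)}_{\eta,T,k}$ above.

$\boldsymbol{\kappa}^{(-1)}_{\xi,t,k} = \{\exp(\mathbf{X}_t\boldsymbol{\beta} + \boldsymbol{\Psi}_t\boldsymbol{\eta}_t)', \frac{1}{\kappa_k}\mathbf{1}_{n_t}'\}'$

$\boldsymbol{\alpha}_{\beta,k} = \{\mathbf{Z}_1' + d_{\beta,k}\mathbf{1}_{n_1}', \ldots, \mathbf{Z}_T' + d_{\beta,k}\mathbf{1}_{n_T}', \alpha_k\mathbf{1}_p' - d_{\beta,k}\alpha_k^{1/2}\sigma_\beta\sum_{t=1}^{T}\mathbf{1}_{n_t}'\mathbf{X}_t\}'$

$\boldsymbol{\alpha}_{\eta,t,k} = \{\mathbf{Z}_t' + d_{\eta,t,k}\mathbf{1}_{n_t}', \alpha_k\mathbf{1}_r' - \frac{d_{\eta,t,k}}{2}\mathbf{1}_{n_t}'\boldsymbol{\Psi}_t\mathbf{W}_{t,k}^{1/2}, \alpha_k\mathbf{1}_r' + \frac{d_{\eta,t,k}}{2}\mathbf{1}_{n_t}'\boldsymbol{\Psi}_t\mathbf{M}_{t+1}'\mathbf{W}_{t+1,k}^{1/2}\}'$; $1 < t < T$

$\boldsymbol{\alpha}_{\eta,1,k} = \{\mathbf{Z}_1' + d_{\eta,1,k}\mathbf{1}_{n_1}', \alpha_k\mathbf{1}_r' - \frac{d_{\eta,1,k}}{2}\mathbf{1}_{n_1}'\boldsymbol{\Psi}_1\mathbf{W}_{1,k}^{1/2}, \alpha_k\mathbf{1}_r' + \frac{d_{\eta,1,k}}{2}\mathbf{1}_{n_1}'\boldsymbol{\Psi}_1\mathbf{M}_2'\mathbf{W}_{2,k}^{1/2}\}'$; Provided $T > 1$

$\boldsymbol{\alpha}_{\eta,T,k} = \{\mathbf{Z}_T' + d_{\eta,T,k}\mathbf{1}_{n_T}', \alpha_k\mathbf{1}_r' - d_{\eta,1,k}\mathbf{1}_{n_1}'\boldsymbol{\Psi}_1\mathbf{W}_{1,k}^{1/2}\}'$

$\boldsymbol{\alpha}_{\xi,t,k} = \{\mathbf{Z}_t' + d_{\xi,t,k}\mathbf{1}_{n_T}', \alpha_k\mathbf{1}_{n_t}' - d_{\xi,t,k}\alpha_k^{1/2}\sigma_\xi\mathbf{1}_{n_t}'\}'$; $1 \le t \le T$

$d_{\beta,k} = \alpha_k \big/ \left[1 + \max\left\{\mathrm{abs}\left(\alpha_k^{1/2}\sigma_\beta\sum_{t=1}^{T}\mathbf{1}_{n_t}'\mathbf{X}_t\right)\right\}\right]$

$d_{\eta,t,k} = \alpha_k \big/ \left(1 + \max\left[\mathrm{abs}\left\{\left(\mathbf{1}_{n_t}'\boldsymbol{\Psi}_t\mathbf{W}_{t,k}^{1/2}, -\mathbf{1}_{n_t}'\boldsymbol{\Psi}_t\mathbf{M}_{t+1}'\mathbf{W}_{t,k}^{1/2}\right)\right\}\right]\right)$; $1 \le t < T$

$d_{\eta,T,k} = \alpha_k \big/ \left(1 + \max\left[\mathrm{abs}\left\{\mathbf{1}_{n_t}'\boldsymbol{\Psi}_t\mathbf{W}_{t,k}^{1/2}\right\}\right]\right)$

$d_{\xi,t,k} = \alpha_k \big/ \left[1 + \max\left\{\mathrm{abs}\left(\alpha_k^{1/2}\sigma_\xi\mathbf{1}_{n_t}'\right)\right\}\right]$; $1 \le t \le T$

$f(\sigma_K \mid \cdot) = \dfrac{f(\boldsymbol{\eta}_1 \mid \sigma_K)\prod_{i=2}^{T} f(\boldsymbol{\eta}_t \mid \boldsymbol{\eta}_{t-1}, \sigma_K)}{\sum_{\sigma_K} f(\boldsymbol{\eta}_1 \mid \sigma_K)\prod_{i=2}^{T} f(\boldsymbol{\eta}_t \mid \boldsymbol{\eta}_{t-1}, \sigma_K)}$; The exact expression can be found in (A.5).

$p_\xi(\sigma_\xi) = \dfrac{\prod_{t=1}^{T}\exp\left\{\frac{\alpha_k^{1/2}}{\sigma_\xi}\alpha_k\mathbf{1}_{n_t}'\boldsymbol{\xi}_t - \frac{1}{\kappa_k}\mathbf{1}_{n_t}'\exp\left(\frac{\alpha_k^{1/2}}{\sigma_\xi}\boldsymbol{\xi}_t\right)\right\}}{\sum_{\sigma_\xi}\prod_{t=1}^{T}\exp\left\{\frac{\alpha_k^{1/2}}{\sigma_\xi}\alpha_k\mathbf{1}_{n_t}'\boldsymbol{\xi}_t - \frac{1}{\kappa_k}\mathbf{1}_{n_t}'\exp\left(\frac{\alpha_k^{1/2}}{\sigma_\xi}\boldsymbol{\xi}_t\right)\right\}}$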
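Algorithm 1: The Gibbs Sampler for the P-MSTM
1. Choose a value of $k$ in (21): either $k = 0$ to specify sMLG distributions, or $k = 1$ to specify nMLG distributions.
2. Initialize $\boldsymbol{\beta}$, $\sigma_K$, $\sigma_\xi$, and $\boldsymbol{\xi}_t$ and $\boldsymbol{\eta}_t$ for each $t$. Denote these initializations with $\boldsymbol{\beta}^{[0]}$, $\sigma_K^{[0]}$, $\sigma_\xi^{[0]}$, and $\boldsymbol{\xi}_t^{[0]}$ and $\boldsymbol{\eta}_t^{[0]}$ for each $t$.
3. Set $b = 1$.
4. Set $\boldsymbol{\beta}^{[b]}$ equal to a draw from mMLG$(\mathbf{H}_{\beta,k}, \boldsymbol{\alpha}_{\beta,k}, \boldsymbol{\kappa}_{\beta,k})$ using Theorem 2(ii).
5. If $t < T$, then set $\boldsymbol{\eta}_t^{[b]}$ equal to a draw from mMLG$(\mathbf{H}_{\eta,t,k}, \boldsymbol{\alpha}_{\eta,t,k}, \boldsymbol{\kappa}_{\eta,t,k})$ using Theorem 2(ii).
6. Set $\boldsymbol{\eta}_T^{[b]}$ equal to a draw from mMLG$(\mathbf{H}_{\eta,T,k}, \boldsymbol{\alpha}_{\eta,T,k}, \boldsymbol{\kappa}_{\eta,T,k})$ using Theorem 2(ii).
7. For each $t$ let $\boldsymbol{\xi}_t^{[b]}$ be a draw from mMLG$(\mathbf{H}_{\xi,t,k}, \boldsymbol{\alpha}_{\xi,t,k}, \boldsymbol{\kappa}_{\xi,t,k})$ using Theorem 2(ii).
8. Set $\sigma_K^{[b]}$ equal to a randomly selected value from $\{a_1^{(K)}, \ldots, a_{U_K}^{(K)}\}$, with respective probabilities determined by $p_K(\cdot)$ in (22).
9. Set $\sigma_\xi^{[b]}$ equal to a randomly selected value from $\{a_1^{(\xi)}, \ldots, a_{U_\xi}^{(\xi)}\}$, with respective probabilities determined by $p_\xi(\cdot)$ in (22).
10. Set $b = b + 1$.
11. Repeat steps 4 through 10 until $b$ is equal to the desired value (i.e., convergence is achieved).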
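Steps 8 and 9 of Algorithm 1 select a value from a finite grid with probabilities given by a normalized ratio. The following Python sketch illustrates only that selection step. The function names, the three-point grid, and the log-weights are hypothetical, and working on the log scale with a max shift is an implementation choice assumed here for numerical stability, not something stated in the paper.

```python
import numpy as np

def normalize_log_weights(log_w):
    """Turn unnormalized log-weights into probabilities that sum to one."""
    log_w = np.asarray(log_w, dtype=float)
    shifted = log_w - np.max(log_w)   # shift so the largest weight is exp(0)
    w = np.exp(shifted)
    return w / w.sum()

def draw_from_grid(grid, log_w, rng):
    """Select one grid value with probability proportional to exp(log_w)."""
    probs = normalize_log_weights(log_w)
    return rng.choice(np.asarray(grid), p=probs)

rng = np.random.default_rng(0)
grid = np.array([0.5, 1.0, 2.0])               # hypothetical candidate values
log_w = np.array([-1000.0, -1001.0, -1002.0])  # hypothetical log-weights
probs = normalize_log_weights(log_w)
sigma = draw_from_grid(grid, log_w, rng)
```

Exponentiating the raw log-weights here would underflow to zero for every grid point, which is why the sketch subtracts the maximum before normalizing.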
[Figure 1, panels (b) to (d): (b) Example Trace Plot Using MALA; (c) Example Trace Plot Using Adaptive Robbins-Monro; (d) Example Trace Plot Using LAP. x-axis: MCMC Iteration (0 to 2 x 10^4); y-axis: A Regression Parameter.]
Figure 1: (a), Map of the LEHD estimated number of individuals employed in the beginning of the fourth quarter of 2013 within the information industry. White areas indicate “suppressed” QWIs. In panels (b), (c), and (d) we plot the trace plot from an MCMC using a Metropolis-within-Gibbs algorithm associated with a Gaussian process model from Bradley et al. (2015a). These trace plots are for an intercept parameter. Panels (b), (c), and (d) were computed using different methods for tuning a Metropolis-within-Gibbs algorithm. Specifically, (b), (c), and (d) were computed using MALA, an adaptive Robbins-Monro process, and LAP, respectively. In these three panels we see that convergence is not achieved.
[Figure 2: Normal QQ Plot of nMLG. x-axis: Quantiles of Standard Normal; y-axis: Quantiles of nMLG; both axes run from -5 to 5.]
Figure 2: (a), In this plot we provide the normal QQ plot associated with a scaled log-gamma random variable. Specifically, we randomly select 10,000,000 values from a $Q \equiv \alpha_G \mathrm{LG}(\alpha_G, 1/\alpha_G)$, and computed the normal QQ plot above. There are enough simulated values in Figure 2 to obtain values from -5 to 5 (recall that the standard normal distribution has 99.7% of its mass between -3 and 3). This demonstrates that nMLG provides an excellent approximation of a normal random variable.
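The normal approximation shown in Figure 2 can be checked numerically on a small scale. The Python sketch below rests on two assumptions that are not legible in this text: that LG($\alpha$, $\kappa$) is the distribution of the logarithm of a gamma random variable with shape $\alpha$ and scale $\kappa$, and that the scaling constant is $\alpha_G^{1/2}$, which is the choice that drives the variance toward one. The value $\alpha_G = 1000$ and the sample size are illustrative only.

```python
import numpy as np

def scaled_log_gamma(alpha, size, rng):
    """Draw sqrt(alpha) * log(G), with G ~ Gamma(shape=alpha, scale=1/alpha)."""
    g = rng.gamma(shape=alpha, scale=1.0 / alpha, size=size)
    return np.sqrt(alpha) * np.log(g)

rng = np.random.default_rng(1)
q = scaled_log_gamma(alpha=1000.0, size=200_000, rng=rng)

# Compare a few empirical quantiles of q with standard normal quantiles.
probs = np.array([0.025, 0.5, 0.975])
empirical = np.quantile(q, probs)
normal_ref = np.array([-1.959964, 0.0, 1.959964])
max_gap = np.max(np.abs(empirical - normal_ref))
```

With a large shape parameter the empirical mean and variance of `q` should sit close to 0 and 1, and `max_gap` should be small; a smaller `alpha` makes the left skew of the log-gamma distribution visible.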
[Figure 3 panels: (a) QWI on the Log Scale; (b) Pseudo Data on the Log Scale; (c) Predictions using sMLG; (d) Predictions using nMLG. Each panel uses a color scale from 3 to 10.]
Figure 3: (a), The LEHD estimated number of individuals employed in the beginning of the fourth quarter of 2013 within the information industry (i.e., $\{Z^{(1)}_{96}(A)\}$) in Minnesota. For comparison, a map of the pseudo-data $\{R^{(1)}_{96}(A)\}$ computed using (23) is given in (b). The white areas indicate “suppressed” QWIs. In (c) and (d), we provide the predictions of $\{Z^{(1)}_{96}(A)\}$ that are computed using P-MSTM and the pseudo-data $\{R^{(\ell)}_t(A)\}$ from Equation (23). In (c), the predictions are computed using the sMLG specification, and in (d) the predictions are computed using the nMLG specification.
[Figure 4 panels: (a) Plot of sMLG based Predictions and the Truth; (b) Plot of nMLG based Predictions and the Truth (x-axis: Arbitrary ordering of spatial regions; y-axis: Log Count; legend: Truth, Predicted). (c) Scatterplot of sMLG based Predictions and the Truth; (d) Scatterplot of nMLG based Predictions and the Truth (x-axis: Log QWI; y-axis: Predicted Value).]
Figure 4: In (a) and (b), we plot the LEHD estimated number of individuals employed in the beginning of the fourth quarter of 2013 within the information industry in Minnesota, and the predicted values. In (a) the predictions are based on the sMLG specification, and in (b) the predictions are based on the nMLG specification. In (c) and (d), we produce scatterplots of the LEHD estimated number of individuals employed in the beginning of the fourth quarter of 2013 within the information industry in Minnesota, versus the predicted values. In (c) the predictions are based on the sMLG specification, and in (d) the predictions are based on the nMLG specification.
[Figure 5: Boxplot of Prediction Error by Model Assumptions. x-axis categories: sMLG, nMLG; y-axis runs from 0.46 to 0.5.]
Figure 5: A boxplot of the average absolute error diagnostic in (24) over 100 replicates (i.e., $j = 1, \ldots, 100$). The distributional assumption used to produce the predictions is given on the x-axis.
[Figure 6, panel (a): Histogram of CPU time (seconds) for each MCMC Iteration. x-axis runs from 0 to 12; y-axis runs from 0 to 12000.]
Figure 6: In (a), we plot the CPU time (in seconds) to produce each MCMC replicate from the P-MSTM applied to the 4,089,755 QWIs analyzed in Section 4. All computations were performed using Matlab (Version 8.0) on a dual 10 core 2.8 GHz Intel Xeon E5-2680 v2 processor, with 256 GB of RAM. In (b), we provide the trace plot associated with an intercept term (compare to Figure 1(b,c,d)).
[Figure 7, panel (d): Boxplot of Residuals. y-axis runs from -0.2 to 0.3.]
Figure 7: (a), Map of the LEHD estimated number of individuals employed in the beginning of the fourth quarter of 2013 within the information industry (i.e., $\{Z^{(1)}_{96}(A)\}$). In (b) and (c), we present the predictions and standard deviations, respectively. Note that (a,b,c) are only a subset of the available QWIs, predictions, and posterior standard deviations. Specifically, there are QWIs available over the 20 NAICS sectors, US counties, and 96 quarters. Additionally, the predictions and posterior standard deviations have complete coverage over all 20 sectors, 3,145 US counties, and 96 quarters. In (d), we plot the difference between the log predictions (plus 1) and the log QWI (plus 1), to demonstrate the in-sample error of the predictions.
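The residual plotted in Figure 7(d) is the difference between the log of the prediction plus one and the log of the QWI plus one. A minimal Python sketch of that calculation follows; the function name and the three example values are hypothetical.

```python
import numpy as np

def log_plus_one_residuals(predictions, observed):
    """Return log(prediction + 1) - log(observed + 1), elementwise."""
    predictions = np.asarray(predictions, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.log1p(predictions) - np.log1p(observed)

# Hypothetical predictions and counts, including a zero count.
res = log_plus_one_residuals([0.0, 9.0, 120.0], [0.0, 9.0, 99.0])
```

Adding one before taking logs keeps the residual defined when a count is zero.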
Supplemental Material: Computationally Efficient Distribution Theory for Bayesian Inference of High-Dimensional Correlated Count-Valued Data
Jonathan R. Bradley1, Scott H. Holan2, Christopher K. Wikle2
Appendix A: Details on Basis Functions and Propagator Matrices

In this section, we review the MI basis functions, and the MI propagator matrix from Bradley et al. (2015a). These quantities can be used to define $\{\boldsymbol{\psi}^{(\ell)}_t(A)\}$ and $\{\mathbf{M}_t\}$ in the expression of the statistical model given in (21).

A Review of the MI Basis Functions: In this manuscript we choose to set $\{\boldsymbol{\psi}^{(\ell)}_t(A)\}$ equal to the MI basis functions, which are a priori defined to be linearly independent of $\{\mathbf{x}^{(\ell)}_t(A)\}$. That is, the $N_t \times r$ matrix $\boldsymbol{\Psi}^P_t$ is specified to be contained within the orthogonal complement of the column space of $\mathbf{X}^P_t(\mathbf{X}^{P\prime}_t\mathbf{X}^P_t)^{-1}\mathbf{X}^{P\prime}_t$. Specifically, define the MI operator as

$$G(\mathbf{X}^P_t, \mathbf{A}_t) \equiv \left(\mathbf{I}_{N_t} - \mathbf{X}^P_t(\mathbf{X}^{P\prime}_t\mathbf{X}^P_t)^{-1}\mathbf{X}^{P\prime}_t\right)\mathbf{A}_t\left(\mathbf{I}_{N_t} - \mathbf{X}^P_t(\mathbf{X}^{P\prime}_t\mathbf{X}^P_t)^{-1}\mathbf{X}^{P\prime}_t\right); \quad t = 1, \ldots, T, \qquad \text{(A.1)}$$

where $\mathbf{I}_{N_t}$ is an $N_t \times N_t$ identity matrix, and $\mathbf{A}_t$ is a generic $N_t \times N_t$ weight matrix. In the setting where $|D^{(\ell)}_t| > 1$ for at least one $(t, \ell)$ combination, we let $\mathbf{A}_t$ be the adjacency matrix corresponding to the edges formed by $\{D^{(\ell)}_{t,P} : \ell = 1, \ldots, L\}$. Notice that the MI operator in (A.1) defines a column space that is orthogonal to $\mathbf{X}^P_t$. Then, let the spectral representation $G(\mathbf{X}^P_t, \mathbf{A}_t) = \boldsymbol{\Phi}_t\boldsymbol{\Lambda}_t\boldsymbol{\Phi}_t'$, and denote the $N_t \times r$ real matrix formed from the first $r$ columns of $\boldsymbol{\Phi}_t$ as $\boldsymbol{\Psi}^P_t$. As done in Bradley et al. (2015a), we set the row of $\boldsymbol{\Psi}^P_t$ that corresponds to variable $\ell$ and areal unit $A$ equal to $\boldsymbol{\psi}^{(\ell)}_t(A)$. Hughes and Haran (2013) suggest setting $r$ equal to roughly 10% of the positive eigenvalues given on the diagonal of $\boldsymbol{\Lambda}_t$.
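The construction of the MI basis functions in (A.1) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the weight matrix is taken to be symmetric, the "first r columns" are read as the eigenvectors with the r largest eigenvalues, and the six-unit line graph with an intercept-only covariate is a made-up example.

```python
import numpy as np

def mi_operator(X, A):
    """G(X, A) = (I - P_X) A (I - P_X), with P_X the projection onto col(X)."""
    n = X.shape[0]
    P = X @ np.linalg.solve(X.T @ X, X.T)
    R = np.eye(n) - P
    return R @ A @ R

def mi_basis(X, A, r):
    """Eigenvectors of G(X, A) for the r largest eigenvalues, plus all eigenvalues."""
    G = mi_operator(X, A)
    G = (G + G.T) / 2.0                 # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1]   # decreasing eigenvalue order
    return eigvecs[:, order[:r]], eigvals[order]

# Hypothetical example: 6 areal units on a line, intercept-only covariate.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = np.ones((n, 1))
Psi, lam = mi_basis(X, A, r=2)
```

In this example the columns of `Psi` are orthonormal and orthogonal to the column of `X`, which is the property the text relies on to avoid confounding with the covariates.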
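A Review of the MI Propagator Matrix: Recall, we specify $\{\boldsymbol{\psi}^{(\ell)}_t\}$ such that it is not in the column space spanned by $\{\mathbf{x}^{(\ell)}_t\}$; this allows one to perform inference on $\boldsymbol{\beta}$. We can use this thinking for the VAR(1) model. That is, substitute the VAR(1) expansion into (15) to obtain

$$Y^{(\ell)}_t(A) = \mathbf{x}^{(\ell)}_t(A)'\boldsymbol{\beta} + \boldsymbol{\psi}^{(\ell)}_t(A)'\mathbf{M}_t\boldsymbol{\eta}_{t-1} + \boldsymbol{\psi}^{(\ell)}_t(A)'\mathbf{b}_t + \xi^{(\ell)}_t(A); \quad \ell = 1, \ldots, L, \; t = T^{(\ell)}_L, \ldots, T^{(\ell)}_U, \; A \in D^{(\ell)}_{t,P}. \qquad \text{(A.2)}$$

Stack the components of Equation (A.2) to obtain

$$\mathbf{Y}^P_t = \mathbf{X}^P_t\boldsymbol{\beta} + \boldsymbol{\Psi}^P_t\mathbf{M}_t\boldsymbol{\eta}_{t-1} + \boldsymbol{\Psi}^P_t\mathbf{b}_t + \boldsymbol{\xi}^P_t, \qquad \text{(A.3)}$$

where $\mathbf{Y}^P_t \equiv (Y^{(\ell)}_t(A) : \ell = 1, \ldots, L, \; A \in D^{(\ell)}_{t,P})'$ and $\boldsymbol{\xi}^P_t \equiv (\xi^{(\ell)}_t(A) : \ell = 1, \ldots, L, \; A \in D^{(\ell)}_{t,P})'$ are $N_t$-dimensional latent random vectors. Then, organize (A.3) to get

$$\boldsymbol{\Psi}^{P\prime}_t(\mathbf{Y}^P_t - \boldsymbol{\xi}^P_t) = \mathbf{B}_t\boldsymbol{\zeta}_t + \mathbf{M}_t\boldsymbol{\eta}_{t-1}; \quad t = 2, \ldots, T,$$

where the $r \times (p + r)$ matrix $\mathbf{B}_t \equiv (\boldsymbol{\Psi}^{P\prime}_t\mathbf{X}^P_t, \mathbf{I})$ and the $(p + r)$-dimensional random vector $\boldsymbol{\zeta}_t \equiv (\boldsymbol{\beta}', \mathbf{b}_t')'$. To ensure that $\mathbf{M}_t$ is not confounded with $\mathbf{B}_t$, for $t = 1, \ldots, T$, we let $\mathbf{M}_t$ equal the $r$ eigenvectors of $G(\mathbf{B}_t, \mathbf{U}_t)$, where in general $\mathbf{U}_t$ can be any $r \times r$ weight matrix. In Section 4, we let $\mathbf{U}_t \equiv \mathbf{I}_r$ as is done in Bradley et al. (2015a). This specification of $\{\mathbf{M}_t\}$ is called the Moran’s I (MI) propagator matrix (see Bradley et al., 2015a, for a discussion).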
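The step from (A.3) to the organized form can be written out explicitly. The derivation below assumes that the columns of $\boldsymbol{\Psi}^P_t$ are orthonormal, so that $\boldsymbol{\Psi}^{P\prime}_t\boldsymbol{\Psi}^P_t = \mathbf{I}_r$; this holds when they are taken as eigenvectors of a symmetric MI operator, but it is an assumption made here rather than a statement from the text.

$$
\begin{aligned}
\boldsymbol{\Psi}^{P\prime}_t(\mathbf{Y}^P_t - \boldsymbol{\xi}^P_t)
&= \boldsymbol{\Psi}^{P\prime}_t\mathbf{X}^P_t\boldsymbol{\beta} + \boldsymbol{\Psi}^{P\prime}_t\boldsymbol{\Psi}^P_t\mathbf{M}_t\boldsymbol{\eta}_{t-1} + \boldsymbol{\Psi}^{P\prime}_t\boldsymbol{\Psi}^P_t\mathbf{b}_t \\
&= \boldsymbol{\Psi}^{P\prime}_t\mathbf{X}^P_t\boldsymbol{\beta} + \mathbf{b}_t + \mathbf{M}_t\boldsymbol{\eta}_{t-1} \\
&= (\boldsymbol{\Psi}^{P\prime}_t\mathbf{X}^P_t, \mathbf{I})(\boldsymbol{\beta}', \mathbf{b}_t')' + \mathbf{M}_t\boldsymbol{\eta}_{t-1}
= \mathbf{B}_t\boldsymbol{\zeta}_t + \mathbf{M}_t\boldsymbol{\eta}_{t-1}.
\end{aligned}
$$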
[Figure 8 panels: (a) 2013 ACS 3-year period Estimates of Poverty; (b) 2013 ACS 3-year period Estimates of MOE; (c) P-MSTM Based Estimates of Poverty; (d) P-MSTM Based Estimates of MOE. Color scales: 0.5 to 5 (x 10^5) for (a) and (c); 1000 to 11000 for (b) and (d).]
Figure 8: (a), Map of the 2013 ACS period estimates of the number of US citizens below the poverty threshold. In (b), we plot the associated ACS margin of error (MOE). In (c), we plot the P-MSTM based predictions of the mean number of US citizens that fall below the poverty threshold. In (d), we give the width of the 95% credible interval. That is, the model based MOE in panel (d) is defined to be the 97.5% quantile of the posterior distribution of $Y^{(\ell)}_t(\cdot)$ minus the 2.5% quantile of the posterior distribution of $Y^{(\ell)}_t(\cdot)$.
Appendix B: A Spatial-Only Example: The American Community Survey
The American Community Survey (ACS) produces 1-, 3-, and 5-year period estimates of important
US demographics. ACS 1-year period estimates are provided over areal units associated with
populations of 65,000 or larger, 3-year period estimates are provided over areal units associated
with populations of 20,000 or larger, and 5-year period estimates are provided over all population
sizes. This is done in an effort to produce estimates that strike a balance between precision (i.e.,
smaller margins of error are associated with longer periods) and spatial coverage.
It has recently been announced that 3-year period estimates will be discontinued starting in
the 2016 fiscal year (US Census Bureau, 2015; Bradley et al., 2015e). Thus, the introduction and
subsequent removal of 3-year period estimates provides a need for precise small area estimates for
ACS period estimates, since the option of the more precise (compared to 1-year period estimates)
3-year period estimates will no longer be available. Furthermore, small area estimation for ACS
has been a growing topic of interest (e.g., see Porter et al., 2013; Bradley et al., 2015c,e, among
others). In this section, we use this important problem to demonstrate the use of the P-MSTM in
the spatial-only setting (i.e.,T = L = 1).
In Figures 8(a) and 8(b), we plot ACS period estimates of the number of US citizens in poverty
within the last 12 months of 2013 and their associated margins of error (MOE) (see US Census
Bureau, 2008, for ACS MOEs). The estimates are defined over counties in Florida, and white areas
represent counties that do not have ACS poverty estimates. We fit this dataset
with the spatial-only version of the P-MSTM, given by
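Data Model: $Z^{(1)}_1(A) \mid \boldsymbol{\beta}, \boldsymbol{\eta}_1, \xi^{(1)}_1(A) \overset{\mathrm{ind}}{\sim} \mathrm{Pois}\{\exp(\mathbf{x}^{(1)}_1(A)'\boldsymbol{\beta} + \boldsymbol{\psi}^{(1)}_1(A)'\boldsymbol{\eta}_1 + \xi^{(1)}_1(A))\}$;
Process Model 1: $\boldsymbol{\eta}_1 \mid \sigma_K \sim \mathrm{MLG}(\mathbf{0}, \mathbf{W}^{1/2}_{1,k}, \alpha_k\mathbf{1}_r, \kappa_k\mathbf{1}_r)$;
Process Model 2: $\boldsymbol{\xi}_1 \mid \sigma_\xi \sim \mathrm{MLG}(\mathbf{0}, \alpha_k^{1/2}\sigma_\xi\mathbf{I}_{n_1}, \alpha_k\mathbf{1}_{n_1}, \kappa_k\mathbf{1}_{n_1})$;
Parameter Model 1: $\boldsymbol{\beta} \sim \mathrm{MLG}(\mathbf{0}_{p,1}, \alpha_k^{1/2}\sigma_\beta\mathbf{I}_p, \alpha_k\mathbf{1}_p, \kappa_k\mathbf{1}_p)$;
Parameter Model 2: $f(\sigma_K) = \frac{1}{U_K}$; $\sigma_K = a^{(K)}_1, \ldots, a^{(K)}_{U_K}$
Parameter Model 3: $f(\sigma_\xi) = \frac{1}{U_\xi}$; $\sigma_\xi = a^{(\xi)}_1, \ldots, a^{(\xi)}_{U_\xi}$, $A \in D^{(1)}_{O,1}$, (B.1)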
where we let $a^{(K)}_1 = a^{(\xi)}_1 = 0.01$, $a^{(K)}_2 = a^{(\xi)}_2 = 0.02$, $\ldots$, $a^{(K)}_{U_K} = a^{(\xi)}_{U_\xi} = 10$. Let $\boldsymbol{\psi}^{(1)}_1(\cdot)$ be the
MI basis function, where the weight matrix is set equal to the immediate adjacency matrix (see
Appendix A). Notice that $\boldsymbol{\psi}^{(\ell)}_t$ can be any $r$-dimensional real-valued vector (not necessarily the MI
basis functions). For illustration, let $\mathbf{x}^{(1)}_1(A) \equiv 1$ and let $r = 7$ (roughly 10% of the available basis
functions). From Proposition 3, we have the following full conditional distributions associated
with the model in (B.1),
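$f(\boldsymbol{\beta} \mid \cdot) = \mathrm{mMLG}(\mathbf{C}_{\beta,k}, (\mathbf{Z}', \alpha_k\mathbf{1}_p')', \mathbf{c}_{\beta,k})$
$f(\boldsymbol{\eta}_1 \mid \cdot) = \mathrm{mMLG}(\mathbf{C}_{\eta,1,k}, (\mathbf{Z}', \alpha_k\mathbf{1}_r')', \mathbf{c}_{\eta,1,k})$
$f(\boldsymbol{\xi}_1 \mid \cdot) = \mathrm{mMLG}(\mathbf{C}_{\xi,1,k}, (\mathbf{Z}_1', \alpha_k\mathbf{1}_{n_1}')', \mathbf{c}_{\xi,1,k})$,
$f(\sigma_K \mid \cdot) = p_K(\sigma_K)$; $\sigma_K = a_1, \ldots, a_{U_K}$,
$f(\sigma_\xi \mid \cdot) = p_\xi(\sigma_\xi)$; $\sigma_\xi = a_1, \ldots, a_{U_\xi}$,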
which can be used within Algorithm 1. Since both the simulation study in Section 4.1 and the real
data analysis in Section 4.2 suggested that sMLG resulted in improved prediction performance, we
use the sMLG specification. Then, we ran Algorithm 1 for 10,000 iterations and visually inspected
trace plots to assess convergence. In Figures 8(c) and 8(d), we display the posterior mean and the width
of the 95% credible interval computed using the Gibbs sampler in Algorithm 1. In general, the
predictions reflect the overall pattern of the data. Furthermore, the estimates have complete spatial
coverage, and upon comparison of Figure 8(b) to 8(d) we see that we have produced MOEs on a
similar order of magnitude as the ACS 3-year period estimates.
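The posterior mean and the model based MOE described above can be computed directly from stored MCMC draws. The Python sketch below assumes the draws are held in an array with one row per iteration and one column per areal unit; the function name and the simulated stand-in draws are hypothetical and are not output from the P-MSTM.

```python
import numpy as np

def credible_interval_width(draws, lower=0.025, upper=0.975):
    """Width of the central posterior interval for each column of `draws`.

    draws: array of shape (n_iterations, n_regions), one row per MCMC draw.
    """
    draws = np.asarray(draws, dtype=float)
    lo = np.quantile(draws, lower, axis=0)
    hi = np.quantile(draws, upper, axis=0)
    return hi - lo

# Stand-in draws for two hypothetical regions (not real posterior output).
rng = np.random.default_rng(2)
fake_draws = rng.normal(loc=[100.0, 500.0], scale=[5.0, 20.0], size=(10_000, 2))
post_mean = fake_draws.mean(axis=0)
width = credible_interval_width(fake_draws)
```

For normal stand-in draws the width is close to 3.92 standard deviations, which gives a quick sanity check on the quantile calculation.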
References
Abowd, J., Schneider, M., and Vilhuber, L. (2013). “Differential privacy applications to Bayesian and linear mixed model estimation.” Journal of Privacy and Confidentiality, 5, 73–105.
Abowd, J., Stephens, B., Vilhuber, L., Andersson, F., McKinney, K., Roemer, M., and Woodcock, S. (2009). “The LEHD infrastructure files and the creation of the Quarterly Workforce Indicators.” In Producer Dynamics: New Evidence from Micro Data, eds. T. Dunne, J. Jensen, and M. Roberts, 149–230. Chicago: University of Chicago Press for the National Bureau of Economic Research.
Allegretto, S., Dube, A., Reich, M., and Zipperer, B. (2013). “Credible research designs for minimum wage studies.” In Working Paper Series, 1–63. Institute for Research on Labor and Employment.
Anderson, T. (1958). Introduction to Multivariate Statistical Analysis. Canada: Wiley and Sons.
Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2015). Hierarchical Modeling and Analysis for Spatial Data. London, UK: Chapman and Hall.
Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). “Gaussian predictive process models for large spatial data sets.” Journal of the Royal Statistical Society, Series B, 70, 825–848.
Bradley, J., Holan, S., and Wikle, C. (2015a). “Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics.” The Annals of Applied Statistics, To Appear.
— (2015b). “Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics.” The Annals of Applied Statistics, To Appear, arXiv preprint: 1503.00982.
Bradley, J., Wikle, C., and Holan, S. (2015c). “Bayesian spatial change of support for count-valued survey data.” Journal of the American Statistical Association, forthcoming.
— (2015d). “Regionalization of multiscale spatial processes using a criterion for spatial aggregation error.” arXiv preprint: 1502.01974.
— (2015e). “Spatio-temporal change of support with application to American Community Survey multi-year period estimates.” arXiv preprint: 1508.01451.
Bradley, J. R., Cressie, N., and Shi, T. (2014). “A comparison of spatial predictors when datasets could be very large.” arXiv preprint: 1410.7748.
— (2015f). “Comparing and selecting spatial predictors using local criteria.” TEST, 24, 1–28.
Casella, G. and Berger, R. (2002). Statistical Inference. Pacific Grove, CA: Duxbury.
Cressie, N. and Johannesson, G. (2008). “Fixed rank kriging for very large spatial data sets.” Journal of the Royal Statistical Society, Series B, 70, 209–226.
Cressie, N. and Wikle, C. K. (2011). Statistics for Spatio-Temporal Data. Hoboken, NJ: Wiley.
Crooks, G. (2015). “The Amoroso distribution.” arXiv preprint: 1005.3274.
Datta, A., Banerjee, S., Finley, A., and Gelfand, A. (2015). “Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets.” arXiv preprint: 1406.7343.
Davis, E., Freedman, M., Lane, J., McCall, B., Nestoriak, N., and Park, T. (2006). “Supermarket human resource practices and competition from mass merchandisers.” American Journal of Agricultural Economics, 88, 1289–1295.
De Oliveira, V. (2013). “Hierarchical Poisson models for spatial count data.” Journal of Multivariate Analysis, 122, 393–408.
Demirhan, H. and Hamurkaroglu, C. (2011). “On a multivariate log-gamma distribution and the use of the distribution in the Bayesian analysis.” Journal of Statistical Planning and Inference, 141, 1141–1152.
Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1998). “Model-based geostatistics.” Journal of the Royal Statistical Society, Series C, 47, 299–350.
Dube, A., Lester, T., and Reich, M. (2013). “Minimum wage, labor market flows, job turnover, search frictions, monopsony, unemployment.” In Working Paper Series, 1–63. Institute for Research on Labor and Employment.
Finley, A. O., Banerjee, S., Waldmann, P., and Ericsson, T. (2010). “Hierarchical spatial process models for multiple traits in large genetic trials.” Journal of the American Statistical Association, 105, 506–521.
Finley, A. O., Sang, H., Banerjee, S., and Gelfand, A. E. (2009). “Improving the performance of predictive process modeling for large datasets.” Computational Statistics and Data Analysis, 53, 2873–2884.
Ford, K. and Fricker, J. (2009). Real-Time SocioEconomic Data for Travel Demand Modeling and Project Evaluation. Publication FHWA/IN/JTRP-2008/22. Indiana Department of Transportation and Purdue University, West Lafayette, Indiana, doi: 10.5703/1288284314314.: Joint Transportation Research Program.
Garthwaite, P., Fan, Y., and Sisson, S. (2010). “Adaptive optimal scaling of Metropolis-Hastings algorithms using the Robbins-Monro process.” arXiv preprint: 1006.3690.
Gelman, A. and Rubin, D. (1992). “Inference from iterative simulation using multiple sequences.” Statistical Science, 7, 473–511.
Glaeser, E. (1992). “Is there a new urbanism? The growth of U.S. cities in the 1990s.” Journal of Economic Perspectives, 12, 139–60.
Glaeser, E. and Shapiro, J. (2009). Is there a new urbanism? The growth of U.S. cities in the 1990s. Cambridge, MA: NBER Working Paper no. 8357, National Bureau of Economic Research.
Griffith, D. (2000). “A linear regression solution to the spatial autocorrelation problem.” Journal of Geographical Systems, 2, 141–156.
— (2002). “A spatial filtering specification for the auto-Poisson model.” Statistics and Probability Letters, 58, 245–251.
— (2004). “A spatial filtering specification for the auto-logistic model.” Environment and Planning A, 36, 1791–1811.
Griffith, D. and Tiefelsdorf, M. (2007). “Semiparametric filtering of spatial autocorrelation: The eigenvector approach.” Environment and Planning A, 39, 1193–1221.
Hobbs, N. and Hooten, M. (2015). Bayesian Models: A Statistical Primer for Ecologists. Princeton University Press.
Holan, S. and Wikle, C. (2015). “Hierarchical dynamic generalized linear mixed models for discrete-valued spatio-temporal data.” In Handbook of Discrete–Valued Time Series. To Appear.
Hughes, J. and Haran, M. (2013). “Dimension reduction and alleviation of confounding for spatial generalized linear mixed model.” Journal of the Royal Statistical Society, Series B, 75, 139–159.
Johnson, R. and Wichern, D. (1999). Applied Multivariate Statistical Analysis, 3rd ed. Englewood Cliffs, New Jersey: Prentice Hall, Inc.
Jones, G., Haran, M., Caffo, B., and Neath, R. (2006). “Fixed-width output analysis for Markov chain Monte Carlo.” Journal of the American Statistical Association, 101, 1537–1547.
Kotz, S., Balakrishnan, N., and Johnson, N. (2000). Continuous Multivariate Distributions, Volume 1: Models and Applications. New York, NY: Wiley.
Lee, Y. and Nelder, J. (1974). “Double hierarchical generalized linear models with discussion.” Applied Statistics, 55, 129–185.
Lehmann, E. (1999). Elements of Large-Sample Theory. New York, NY: Springer.
Lindgren, F., Rue, H., and Lindstrom, J. (2011). “An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach.” Journal of the Royal Statistical Society, Series B, 73, 423–498.
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. London, UK: Chapman and Hall.
Nychka, D., Bandyopadhyay, S., Hammerling, D., Lindgren, F., and Sain, S. (2014). “A multi-resolution Gaussian process model for the analysis of large spatial data sets.” Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2014.914946.
Porter, A., Holan, S. H., and Wikle, C. K. (2013). “Small area estimation via multivariate Fay-Herriot models with latent spatial dependence.” Australian & New Zealand Journal of Statistics, 75, 15–29.
Prentice, R. (1974). “A log gamma model and its maximum likelihood estimation.” Biometrika, 61, 539–544.
Roberts, G. (1996). “Markov chain concepts related to sampling algorithms.” In Markov Chain Monte Carlo in Practice, eds. W. Gilks, S. Richardson, and D. Spiegelhalter, 45–57. Chapman and Hall, Boca Raton.
Roberts, G. and Tweedie, R. (2011). “Exponential convergence of Langevin distributions and their discrete approximations.” Bernoulli, 2, 341–363.
Royle, J., Berliner, M., Wikle, C., and Milliff, R. (1999). “A hierarchical spatial model for constructing wind fields from scatterometer data in the Labrador sea.” In Case Studies in Bayesian Statistics, eds. C. Gatsonis, R. Kass, B. Carlin, A. Carriquiry, A. Gelman, I. Verdinelli, and M. West, 367–382. Springer New York.
Rue, H., Martino, S., and Chopin, N. (2009). “Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations.” Journal of the Royal Statistical Society, Series B, 71, 319–392.
Sengupta, A., Cressie, N., Frey, R., and Kahn, B. (2012). “Statistical modeling of MODIS cloud data using the Spatial Random Effects model.” In Proceedings of the Joint Statistical Meetings, 3111–3123. Alexandria, VA: American Statistical Association.
Shaby, B. and Wells, M. (2011). “Exploring an adaptive Metropolis algorithm.” In Technical Report. Department of Statistics: Duke University.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society, Series B, 64, 583–616.
Stein, M. (2014). “Limitations on low rank approximations for covariance matrices of spatial data.” Spatial Statistics, 8, 1–19.
Thompson, J. (2009). “Using local labor market data to re-examine the employment effects of the minimum wage.” Industrial and Labor Relations Review, 63, 343–366.
US Census Bureau (2008). “What General Data Users Need to Know.” http://www.census.gov/content/dam/Census/library/publications/2008/acs/ACSGeneralHandbook.pdf.
— (2015). “Census Bureau statement on American Community Survey 3-Year statistical product.” http://www.census.gov/programs-surveys/acs/news/data-releases/2014/release.html.
Waller, L., Carlin, B., Xia, H., and Gelfand, A. (1997). “Hierarchical spatio-temporal mapping of disease rates.” Journal of the American Statistical Association, 92, 607–617.
Wikle, C., Milliff, R., Nychka, D., and Berliner, L. (2001). “Spatiotemporal hierarchical Bayesian modeling tropical ocean surface winds.” Journal of the American Statistical Association (Theory and Methods), 96, 382–397.
Wikle, C. K. and Anderson, C. J. (2003). “Climatological analysis of tornado report counts using a hierarchical Bayesian spatio-temporal model.” Journal of Geophysical Research-Atmospheres, 108, 9005.
Wolpert, R. and Ickstadt, K. (1998). “Poisson/gamma random field models for spatial statistics.” Biometrika, 85, 251–267.
Wu, G., Holan, S. H., and Wikle, C. K. (2013). “Hierarchical Bayesian Spatio-Temporal Conway-Maxwell Poisson Models with Dynamic Dispersion.” Journal of Agricultural, Biological, and Environmental Statistics, 18, 335–356.