Studying the effects of air pollution on children’s health Presented by Elizabeth Stanwyck with Dr. Bimal Sinha University of Maryland, Baltimore County June 30, 2008
Objective
•To study the effects of air pollution on the health of children (and the elderly)
▫Focus on respiratory health
Elizabeth Stanwyck 30 June
2008 UMBC
2
Adverse Health Effects
• Types of responses
▫ Binary
▫ Ordinal
▫ Continuous
• Possibility of measurement error
• Measurements can be taken
▫ Hourly
▫ Daily
▫ Monthly
▫ Annually
Adverse Health Effects
• Common health effects used in these studies:
▫ Mortality [binary]
  Mortality due to respiratory causes
  Mortality due to cardiovascular causes
▫ Disease rates [continuous]
  Cardiac arrest/cardiovascular events
  Respiratory disease
▫ Specific outcomes
  Cough, wheeze, bronchitis, asthma [mostly binary]
  Lung function [continuous]
• Health effects on
▫ Children
▫ The elderly
Pollutants
• Commonly measured pollutants are
▫ Sulfur Dioxide (SO2)
▫ Oxides of Nitrogen (NOx, or NO and NO2)
▫ Ozone (O3)
▫ Particulate Matter (PM2.5, PM10)
▫ Carbon Monoxide (CO)
Personal Level Covariates
• Smoking status
• Mode of cooking and heating
• Health history
• Income level / Living conditions
• Age
• Ethnicity
• Body Mass Index
• Exercise
• Gender
Community Level Covariates
•Distance to nearest busy road/intersection
•Presence of factories/mills in the community
•Topography of study region
•Weather conditions
NHANES-III Data
• National Health and Nutrition Examination Survey
• Conducted from 1988-91 and 1992-94
• Complex, multi-stage probability sampling design
• Designed to give a snapshot of the nation’s health
• Includes:
▫ Questionnaire data
  Personal covariates: age, race, gender, housing characteristics, family characteristics, smoking
  Respiratory and allergy questions
▫ Examination data
  Height, weight, spirometry measurements
▫ Laboratory data
  Tests on blood and urine
A Proposed Model: General Description
▫ Molitor et al. (2006) “Bayesian Modeling of Air Pollution Health Effects with Missing Exposure Data” American Journal of Epidemiology
▫ Molitor et al. (2007) “Assessing Uncertainty in Spatial Exposure Models for Air Pollution Health Effects Assessment” Environmental Health Perspectives
• Objective: Model the relationship between health effect and exposure to pollution
▫ Disease Model
  Model the effect of household-level long-term pollutant exposure and the effects of various personal-level covariates on lung function [response]
▫ Measurement Model
  Model the long-term level of pollutant(s) exposure for an individual
▫ Exposure Model
  Model long-term pollutant exposure using various household-level covariates
The Data – Pollution (Molitor et al. 2006)
• Southern California Children’s Health Study
▫ Continuous, long-term central site measurements of air pollution in multiple (11) communities
▫ Two seasonal short-term household-level measurements at a subset of residences within communities
▫ Pollutants measured:
  Ozone – O3
  Nitrogen Dioxide – NO2
  Particulate matter with diameter of 10 μm or less – PM10
The Data – Health Outcomes
• Questionnaire data on demographic characteristics, health outcomes, activities, housing characteristics
• Height, weight
• Lung function (using spirometry): response of interest
▫ FVC: forced vital capacity
  Measure of lung volume
▫ FEV1: forced expiratory volume in 1 second
  Measure of airway flow
▫ These measures have been shown to be sensitive indicators of lung response
▫ All measurements were taken annually from study entry until high school graduation
The Data
• Geocoded locations of all residences and schools
• Information about the distance from residence to nearest freeway
• Predicted pollution exposures using CALINE4
▫ Package developed by the California Department of Transportation (CalTrans) to predict air concentrations of PM, CO, and NO2 near roadways
The Model
• C = 11 communities, c = 1, 2, ..., 11
• i = 1, 2, ..., N_c individuals per community
• Measurements over j = 1, 2 seasons
Disease Model
• is the health effect of the ith individual in the cth community
• is a community-level [response] random effect▫ Modeled as ;
• is the (unobserved) true household-level concentration of a pollutant in community c
• is the (unobserved) true average concentration of a pollutant in community c
• is the vector of personal-level covariates that directly affect the health outcome for subject i in community c
•
Measurement Model
• are observed, household-level exposure measurements in community c, in season j, for subject i.
• are central-site ambient pollution measurements in community c, in season j
• are central-site ambient pollution measurements in community c, averaged over all seasons
•
Exposure Model
• is a community level [pollutant] random effect▫Modeled as ,
• are household-level exposure variables
•
The Complete Model
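The equations on the model slides did not survive extraction. As a reading aid only, the block below sketches one plausible form of the three linked sub-models that is consistent with the bullet definitions given for the disease, measurement, and exposure models. Every symbol is assumed notation introduced here, not recovered from the slides, and the functional forms may differ from those in Molitor et al. (2006).

```latex
% Hypothetical notation (all symbols are assumptions):
%   Y_{ci}       health effect of individual i in community c
%   a_c, b_c     community-level random effects (response, pollutant)
%   x_{ci}       unobserved true household-level concentration
%   \bar{x}_c    unobserved true community-average concentration
%   z_{ci}       personal-level covariates
%   W_{cij}      observed household measurement in season j
%   P_{cj}       central-site measurement in season j
%   \bar{P}_c    central-site measurement averaged over seasons
%   v_{ci}       household-level exposure variables
\begin{align*}
\text{Disease:}     \quad & Y_{ci} = a_c + \beta_1 (x_{ci} - \bar{x}_c) + \beta_2 \bar{x}_c
                            + \gamma^{\top} z_{ci} + \varepsilon_{ci} \\
\text{Measurement:} \quad & W_{cij} = x_{ci} + (P_{cj} - \bar{P}_c) + \delta_{cij} \\
\text{Exposure:}    \quad & x_{ci} = b_c + \eta^{\top} v_{ci} + \xi_{ci}
\end{align*}
```

In this sketch the measurement model links the two seasonal household readings to the long-term household exposure through the seasonal deviation of the central-site monitor, and the exposure model supplies a prediction for households that were never measured.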
Molitor et al. (2006) vs. Molitor et al. (2007)
▫ Molitor et al. (2006) “Bayesian Modeling of Air Pollution Health Effects with Missing Exposure Data” American Journal of Epidemiology
  Does not include spatial autocorrelation in the analysis
▫ Molitor et al. (2007) “Assessing Uncertainty in Spatial Exposure Models for Air Pollution Health Effects Assessment” Environmental Health Perspectives
  Does include spatial autocorrelation in the analysis
Spatial Autocorrelation
• Standard regression models for exposure prediction assume independence
• Air pollution has been shown to be spatially correlated within communities
• Health effects are also often spatially correlated
• If these correlations are not accounted for within the model, the result is biased parameter estimates and inefficient significance tests
Spatial Autocorrelation
• Disease Model
within-town spatial error influencing lung function measurements
• Exposure Model
within-town spatial error influencing “true” long-term pollutant exposure
• Measurement Model
Spatial Autocorrelation
•Community-specific random effects:▫
is a between-community spatial error influencing lung function
▫
is a between-community spatial error term influencing exposure
Frequentist Analysis: Molitor et al. (2006) (without Spatial Autocorrelation)
• Use and to estimate
• Exposure model:
• Disease model:
• Fit exposure model, then use fitted values in the disease model via
• Three approaches to frequentist regression
Frequentist Analysis (without Spatial Correlation)
• Naïve model:
• Weighted single-imputation model:
• Multiple imputation model:
▫ 5 sets of multiple first stage NO2 measurements were generated for each person from , then imputed into the disease model
▫ Multiple regression results based on imputed values were combined to get final parameter estimates.
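The slides say only that the regression results from the imputed data sets "were combined". A minimal sketch of one standard way to do this is Rubin's rules, which the slides do not name, so its use here is an assumption. The function name `pool_rubin` and all numbers are hypothetical illustrations, not results from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical slopes and squared standard errors from m = 5 imputed data sets.
slopes = [-1.0, -1.2, -0.8, -1.1, -0.9]
squared_ses = [0.04, 0.05, 0.04, 0.06, 0.05]
pooled_slope, pooled_var = pool_rubin(slopes, squared_ses)
print(pooled_slope, pooled_var ** 0.5)
```

The between-imputation term is what separates multiple imputation from single imputation: it carries the uncertainty about the unmeasured exposures into the final standard error.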
Bayesian Analysis: Molitor et al. (2006) (without Spatial Autocorrelation)
• Markov chain Monte Carlo method (Gibbs sampling) using WinBUGS
• Parameters, latent variables, and missing values can be estimated simultaneously (treated as random variables)
• 20,000 iterations for burn-in
• 100,000 iterations to compute posterior distributions
• Diffuse priors on parameters
▫ Regression parameters with normal priors
▫ Variance components with flat (uniform) priors
Molitor et al. (2006) “Bayesian Modeling of Air Pollution Health Effects with Missing Exposure Data” American Journal of Epidemiology
Bayesian Analysis (with Spatial Autocorrelation)
• The within-town spatial errors in the disease and exposure models are assumed to follow a spatial distribution defined by the CAR (Conditional Autoregressive) model
• The model conditions each subject’s error on the vector of spatial residual errors excluding subject i
▫ A weight matrix is specified to reflect the amount of spatial similarity between all pairs of individuals
• The between-community spatial errors are assumed to follow a similar CAR model
▫ Elements of the weight matrix are specified as the inverse of driving distance between two communities
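To make the weight-matrix idea concrete, here is a minimal sketch that builds inverse-distance weights and computes the conditional mean of one unit's spatial error given the others. It assumes a common (intrinsic) CAR form in which that conditional mean is the weight-normalized average of the neighbors' errors; the exact specification used in the paper is not recoverable from the slides. The function names, distances, and residuals are hypothetical.

```python
def inverse_distance_weights(dist):
    """Build a weight matrix with w[i][j] = 1 / d[i][j] and a zero diagonal."""
    n = len(dist)
    return [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]


def car_conditional_mean(weights, residuals, i):
    """Weight-normalized average of the other units' residuals.

    Under the assumed intrinsic CAR form this is the conditional mean of
    unit i's spatial error given all other units. The zero diagonal means
    unit i's own residual does not contribute.
    """
    total = sum(weights[i])
    return sum(w * r for w, r in zip(weights[i], residuals)) / total


# Hypothetical driving distances (km) between three communities.
distances = [
    [0.0, 10.0, 40.0],
    [10.0, 0.0, 20.0],
    [40.0, 20.0, 0.0],
]
W = inverse_distance_weights(distances)
u = [0.5, -0.2, 0.1]  # hypothetical spatial errors
print(car_conditional_mean(W, u, 0))
```

Because the weights decay with distance, the nearer community dominates the conditional mean for community 0 in this example.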
Weight matrices based on pairwise spatial similarities: Example 1
Weight matrices based on pairwise spatial similarities: Example 2
Images courtesy of Boots, B.N. “Weighting Thiessen Polygons” (1980) Economic Geography
Results
Molitor et al. (2007) “Assessing Uncertainty in Spatial Exposure Models for Air Pollution Health Effects Assessment” Environmental Health Perspectives
Unique to this study...
• Outdoor measurements of pollutant concentrations at subjects’ homes for 4 weeks (2 weeks in the summer and 2 weeks in the winter)
• This allowed a model that could incorporate the relationship between measurements from fixed-site monitors and measurements made at the subjects’ homes
• What if this information is not available?
Spatial Interpolation Methods
• Wong et al. (2004) “Comparison of Spatial Interpolation Methods for the Estimation of Air Quality Data” Journal of Exposure Analysis and Environmental Epidemiology
• Estimation methods to assess personal exposure to pollutants given only central-site monitoring measurements
• Four interpolation methods are presented and compared:
▫ Spatial Averaging
▫ Nearest Neighbor
▫ Inverse Distance Weighting
▫ Kriging
Spatial Interpolation Methods
• All methods use a weighted average; the only difference is in the choice of weights

$$\hat{Z}_0 = \sum_{i=1}^{n} w_i Z_i$$

where
• $w_i$ are weights
• $\hat{Z}_0$ is the estimated air pollution concentration at an unsampled point
• $Z_i$ are the concentrations at neighboring sampled points
Spatial Interpolation Methods
• Spatial Averaging
▫ All sampled values within a fixed distance from the point of interest are assigned the same weight (based on the number of monitors)
• Nearest Neighbor
▫ The single sample value closest to the point of interest is assigned a weight of 1
• Inverse Distance Weighting
▫ Samples closer to the point of interest have correspondingly larger weights
• Kriging
▫ Weights are assigned based on spatial autocorrelation statistics
• Choice of method depends on density and nature of monitoring sites
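The first three weighting schemes can be sketched in a few lines; each returns a list of weights that is then applied as a weighted average of the monitor readings. This is an illustrative sketch, not the implementation used by Wong et al.: the function names, the planar coordinates, the inverse-distance power of 2, and the monitor values are all assumptions. Kriging is omitted because it requires a fitted variogram.

```python
import math


def _dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def spatial_average_weights(target, sites, radius):
    """Equal weight for every monitor within `radius` of the target, zero otherwise."""
    inside = [_dist(target, s) <= radius for s in sites]
    k = sum(inside)
    if k == 0:
        raise ValueError("no monitor within the radius")
    return [1.0 / k if flag else 0.0 for flag in inside]


def nearest_neighbor_weights(target, sites):
    """Weight 1 for the closest monitor, 0 for all others."""
    d = [_dist(target, s) for s in sites]
    nearest = d.index(min(d))
    return [1.0 if i == nearest else 0.0 for i in range(len(sites))]


def idw_weights(target, sites, power=2.0):
    """Weights proportional to 1 / distance**power, normalized to sum to 1."""
    d = [_dist(target, s) for s in sites]
    if min(d) == 0.0:  # target coincides with a monitor
        return nearest_neighbor_weights(target, sites)
    raw = [x ** -power for x in d]
    total = sum(raw)
    return [r / total for r in raw]


def interpolate(weights, values):
    """Weighted average of the monitor readings."""
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical monitors at distances 1, 2, and 5 from the target point.
home = (0.0, 0.0)
monitors = [(1.0, 0.0), (0.0, 2.0), (4.0, 3.0)]
readings = [10.0, 20.0, 40.0]

print(interpolate(spatial_average_weights(home, monitors, radius=3.0), readings))
print(interpolate(nearest_neighbor_weights(home, monitors), readings))
print(interpolate(idw_weights(home, monitors), readings))
```

With these made-up numbers the three methods give different estimates from the same readings, which is why the choice of method matters when monitors are sparse.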
Incorporating Topography and Atmospheric Conditions
• Sheppard et al. (2001) “Correcting for the Effects of Location and Atmospheric Conditions on Air Pollution Exposures in a Case-Crossover Study” Journal of Exposure Analysis and Environmental Epidemiology
• Under stagnant conditions, the distribution of a pollutant may not be uniform, especially if the topography of the study area is very hilly or mountainous
• Systematic variation in the distribution of a pollutant can
▫ alter personal exposure levels and
▫ bias health effect analysis
Incorporating Topography and Atmospheric Conditions
• Additional covariates that may be useful:
▫ Whether or not the study area is subject to a lot of wood-burning
▫ Elevation
▫ Topographical Index (TI) can be used to classify airshed
▫ Outdoor temperature
▫ Measure of stagnant weather conditions (e.g. data on daily wind-speeds)
▫ Season (winter or summer)
▫ Geocoding for subject residences
▫ Interactions (e.g., between temperature and stagnation)
Another approach to studying the adverse health effects of air pollution in children, based on the
paper (title above) by Zhengmin Qian et al. (2004), Environment International.
Study Area
• Two districts in each of four Chinese cities
▫ The cities are Chongqing, Guangzhou, Lanzhou, and Wuhan
▫ One urban (relatively high pollution levels) and one suburban (relatively low pollution levels) district chosen in each city
▫ Cities were chosen because they were expected to exhibit a substantial gradient in pollution levels
Children’s age groups
• 5-16 years of age
• All students from one (or two) elementary schools in each district were recruited
• 99% (7754 of 7817) of the recruited students were represented in questionnaire responses
• 7058 students (91% of the 7754 respondents) were used in the analysis
Pollutants of Interest
• TSP (total suspended particles)
• SO2 (sulfur dioxide)
• NOx (oxides of nitrogen)
• Size-fractionated particulate matter:
▫ PM2.5
▫ PM10-2.5 (= PM10 – PM2.5)
▫ PM10
• NOTE: this study deals with multiple pollutants at the same time (contrast with Molitor et al.)
Pollutant measurement concerns
• 8 districts may not be independent in terms of ambient pollution levels
▫ High levels of correlation between pollutants
▫ This multicollinearity interferes with estimates of the exposure-response relationship
• Exposure assessment in this study is indirect (as opposed to direct biological/personal monitoring)
Children’s health outcomes
• 6 respiratory health outcomes were explored, based on questionnaire responses [binary]
▫ Cough
▫ Phlegm
▫ Cough with phlegm
▫ Wheeze
▫ Asthma
▫ Bronchitis
Data sources
•Pollutant level data were obtained from municipal sources and school-yard monitors
•Health outcome data were obtained from questionnaires
•Covariate data were obtained from questionnaires
Time scale of the study
• 1993-1996
• Health and covariate data (questionnaires) were collected during the years 1993-1996
• TSP, NOx, and SO2 measurements were collected during the years 1993-1996
• PM10, PM10-2.5, and PM2.5 measurements were collected during the years 1995-1996
• This is not a time-series analysis
Analysis – Stage 1
• Data were analyzed using a two-stage procedure
▫ Stage 1: group the 8 districts into 4 district clusters using hierarchical clustering.
  This will create homogeneous study areas
  The “cluster number” will serve as a single aggregate measure of the pollutants under study (by ordering the clusters according to pollutant levels)
Analysis – Stage 1
▫ District classification was driven by particulate matter pollution (TSP, PM10, PM10-2.5, PM2.5)
▫ Ordering of clusters:
  Total pollution, TSP-PM10 and PM10-2.5: C4>C3>C2>C1
  PM2.5: C4>C2>C3>C1
  SO2: C3>C4>C2>C1
  NOx: C2>C4>C3>C1
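Stage 1 can be illustrated with a small agglomerative clustering sketch that merges 8 districts into 4 clusters and then orders the clusters by pollution level. The slides do not state the linkage rule or distance measure, so average linkage on Euclidean distance is an assumption here, and the district pollutant values are invented for illustration. In practice the pollutant variables would usually be standardized first.

```python
import math


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs with one member in each cluster."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical (TSP, PM10) levels for 8 districts; not data from the study.
districts = [
    (150, 80), (160, 85), (250, 130), (260, 140),
    (350, 190), (360, 200), (480, 260), (500, 270),
]
clusters = agglomerate(districts, n_clusters=4)

# Order clusters by mean TSP so the cluster number acts as an ordinal pollution measure.
ordered = sorted(clusters, key=lambda c: sum(districts[i][0] for i in c) / len(c))
labels = {f"C{k + 1}": sorted(c) for k, c in enumerate(ordered)}
print(labels)
```

The ordered labels play the role of the "cluster number" on the slide: a single ordinal stand-in for several correlated pollutants.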
Analysis – Stage 1
Analysis – Stage 2
▫ Stage 2: Logistic Regression
  Unconditional logistic regression models were used to calculate covariate-adjusted odds ratios of each health outcome (separately) with respect to district clusters
  Gradients of odds ratios were then compared to pollutant gradients
Analysis – Stage 2
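• Logistic regression, with $p$ the probability of the health outcome:

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 X_1 + \dots + \beta_k X_k$$

or

$$p = \frac{\exp(\alpha + \beta_1 X_1 + \dots + \beta_k X_k)}{1 + \exp(\alpha + \beta_1 X_1 + \dots + \beta_k X_k)}$$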
Analysis – Stage 2
• In our application:

$$\log\left(\frac{p}{1-p}\right) = \alpha + \sum_{k=1}^{17} \beta_k X_k$$

With $p$ = probability of the health outcome, and
X1, X2, X3 = district cluster dummy variables (for clusters 2, 3, and 4)
X4 = age
X5 = gender indicator variable (1 if male)
X6 = indicator for child sleeps in own room
X7 = indicator for mother’s education level (1 if more than middle school)
X8 = indicator for father’s smoking status (1 if smokes)
X9, X10 = indicators for house type (apartment or one-story house)
X11 = indicator for cooking oil type (1 if rapeseed oil)
X12, X13, X14 = indicators for coal heating exposure
X15, X16, X17 = indicators for coal cooking exposure
** Notice that α is not really meaningful for interpretation, since no one in the study had age zero
Analysis – Stage 2
• Estimates for the regression parameters:
• Say $\hat{\beta}_1$ is the estimate for the effect of cluster 2, then the odds ratio of the effect of cluster 2 with respect to cluster 1 is $\exp(\hat{\beta}_1)$
• Likewise, if $\hat{\beta}_2$ is the estimated effect for cluster 3, then the odds ratio of the effect of cluster 3 with respect to cluster 1 is $\exp(\hat{\beta}_2)$
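The step from a fitted coefficient to an odds ratio is a single exponentiation, sketched below together with a Wald confidence interval. The interval is an addition for illustration (the slides do not say how intervals were computed), and the coefficient and standard error are hypothetical values, not estimates from the study.

```python
import math


def odds_ratio(beta_hat):
    """Odds ratio relative to the reference cluster for a dummy-variable coefficient."""
    return math.exp(beta_hat)


def odds_ratio_ci(beta_hat, std_err, z=1.96):
    """Wald interval computed on the log-odds scale, then exponentiated."""
    return math.exp(beta_hat - z * std_err), math.exp(beta_hat + z * std_err)


# Hypothetical coefficient and standard error for the cluster 2 dummy variable.
beta_cluster2, se_cluster2 = 0.405, 0.12
print(odds_ratio(beta_cluster2))
print(odds_ratio_ci(beta_cluster2, se_cluster2))
```

A coefficient of zero gives an odds ratio of 1, meaning no difference from the reference cluster, so an interval that excludes 1 corresponds to a statistically significant cluster effect at the chosen level.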
Analysis – Stage 2 / Results
Results
• Crude prevalence rates of phlegm, cough with phlegm, bronchitis and wheeze had the same ranking order as the combined pollution levels (C4>C3>C2>C1).
• Cluster 1 was treated as a reference group, since that cluster had the lowest of all crude prevalence rates
• Odds ratios of cough with phlegm and wheeze had the same ranking order as combined pollution levels (C4>C3>C2>C1)
• Odds ratios were significantly higher in other clusters than in the reference for all outcomes
Results
• Integrated pollution levels were associated with prevalence of cough with phlegm and wheeze
• PM10-2.5 and TSP were associated with prevalence of cough with phlegm and wheeze
• PM2.5 was associated with prevalence of cough, phlegm, bronchitis, and asthma
• SO2 and NOx were not associated with any of the health outcomes
Results/Conclusions
• Relationships between ambient air pollutant mixture exposure and prevalence of cough with phlegm and wheeze are:
▫ Monotonic
▫ Positive
▫ Statistically significant
• These relationships are driven by particulate matter pollution levels across the district clusters.
Concerns
• Toxicological importance of different pollutants cannot be reasonably weighted
• Health effects are self-reported, and thus there is potential for misclassification and/or recall bias
• Assumption: community mean concentrations of pollutant levels are a good surrogate for personal exposure/dosage
• Cluster analysis is highly empirical, and it may be difficult to extend the conclusions of this study to other cities/districts
Future Direction
• Develop a model that will simultaneously incorporate two (or more) health outcomes
▫ Two binary outcomes
▫ Two continuous outcomes
▫ One binary and one continuous outcome
• Develop a model that will simultaneously incorporate two (or more) pollutants
▫ Handle multicollinearity among pollutants
• Combine the models to create a model involving multiple pollutants and multiple health outcomes
Elizabeth Stanwyck: [email protected]
Dr. Bimal Sinha: [email protected]