An Overview of Two Recent Advances in Trajectory Modeling
Daniel S Nagin
Combining Propensity Score Matching and Group-Based Trajectory Analysis in an Observational Study (Psychological Methods, 2007; also Developmental Psychology, 2008)
Amelia Haviland, RAND Corporation
Daniel S. Nagin, Carnegie Mellon University
Paul R. Rosenbaum, University of Pennsylvania
Problem Setting
• Inferring the “treatment (aka causal) effect” of an important life event or a therapeutic intervention with non-experimental longitudinal data
• Overcoming a severe selection problem whereby treatment probability depends heavily upon the prior trajectory of the outcome
  – Boys with high prior violence levels are more likely to join gangs
• Dealing with feedback effects: violence and gang membership may be mutually reinforcing
• Treatment effect may also depend upon the prior trajectory of the outcome
• Measuring the effect of gang membership is a prototypical example of a large set of important inference problems in psychopathology
  – Divorce and depression
  – Drug treatment and drug abuse
Montreal Data
1037 Caucasian, francophone, nonimmigrant males
First assessment at age 6 in 1984
Most recent assessment at age 17 in 1995
Data collected on a wide variety of individual, familial, and parental characteristics including self-reported violent delinquency and gang membership from age 11 to 17
Prototypical modern longitudinal dataset—rich measurements about the characteristics and behaviors of participants
Annual Assessments of Violent Delinquency and Gang Membership
• Violent Delinquency: frequency in last year of
  – Gang fighting
  – Fist fighting
  – Carrying/using a deadly weapon
  – Threatening or attacking someone
  – Throwing an object at someone
• Gang Membership: In the past year have you been part of a group or gang that committed reprehensible acts?
The Selection Problem: Violent Delinquency from Age 11 to 14 of Gang Members at Age 14
[Figure: violence at ages 11, 12, 13, and 14 (vertical axis 0 to 4) for two series: gang member at age 14 and non-gang member at age 14]
Cochran’s Advice on how to proceed: “How should the study be conducted if it were possible to do it by controlled experimentation?”
• Well-defined treatment: what is the effect of first-time gang membership at age 14 on violence at age 14 and beyond?
• Good baseline measurements on the treated (gang members at 14) and controls (non-gang members at 14): provided by trajectory groups
• Randomize treatment to create comparability (i.e., balance) on all covariates between treated and controls: provided by propensity score matching
Treatment, Covariates, & Outcomes
• Time = −: Baseline covariates, fixed and time varying, including violence prior to age 14
• Time = 0: Treatment assignment, first-time gang status at 14
• Time = +: Responses to gang status at 14
  – Outcomes: violence at 14 and beyond
  – “Treatment compliance”: gang status at 15 and beyond
Baseline Measurements: Trajectories of Violent Delinquency from Age 11 to 13 for Sub-sample with NO Gang Involvement over this Period
[Figure: delinquent violence (vertical axis, 0 to 5) by age (horizontal axis, 11 to 13) for the trajectory groups]
• 31% of Chronics join gangs at age 14
• 15% of Decliners join gangs at age 14
• 7% of Lows join gangs at age 14
Trajectory Groups as Baseline Measurements
• Allows a test of whether the facilitation effect of gang membership depends on developmental history
• Aids in controlling for selection effects by comparing gang and non-gang members with comparable histories of violence that are uncontaminated by the effects of prior gang membership
Creating balance with propensity score matching
• Propensity score relates probability of treatment to specified covariates
• By matching on propensity score, treated and controls are balanced on the covariates in the propensity score
• Imbalance may remain on other covariates
Creating balance: match first-time gang joiners at 14 with one or more “comparable” non-gang joiners
• Match within trajectory group
  – Group-specific treatment effect estimates
  – Helps to balance prior history of violence
• Within-group matching based on propensity score for gang membership at age 14
• Covariates in the propensity score include:
  – Self-reported violence at ages 10-13 plus teacher and peer ratings of aggression
  – Posterior probability of trajectory group membership
  – Many risk factors for violence and gang membership, such as low IQ, having a teen mother, hyperactivity, and opposition
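As an illustration of matching within trajectory group, here is a minimal Python sketch. It assumes the propensity scores have already been estimated and uses a simple greedy nearest-neighbor rule with a caliper; this is a simplification for illustration, not the matching algorithm used in the study. The function name, the `max_controls` and `caliper` parameters, and the data are all hypothetical.

```python
from collections import defaultdict

def match_within_group(subjects, max_controls=2, caliper=0.1):
    """Greedy nearest-neighbor matching on the propensity score, within trajectory group.

    subjects: list of dicts with keys 'id', 'group', 'treated' (bool), 'pscore' (float).
    Returns {treated_id: [control_id, ...]}; each control is used at most once.
    """
    by_group = defaultdict(lambda: {"treated": [], "controls": []})
    for s in subjects:
        key = "treated" if s["treated"] else "controls"
        by_group[s["group"]][key].append(s)

    matches = {}
    for members in by_group.values():
        available = list(members["controls"])
        for t in members["treated"]:
            # Rank the remaining controls in this group by propensity score distance.
            available.sort(key=lambda c: abs(c["pscore"] - t["pscore"]))
            chosen = [c for c in available[:max_controls]
                      if abs(c["pscore"] - t["pscore"]) <= caliper]
            matches[t["id"]] = [c["id"] for c in chosen]
            chosen_ids = {c["id"] for c in chosen}
            available = [c for c in available if c["id"] not in chosen_ids]
    return matches

# Hypothetical subjects; the propensity scores are made up for illustration.
subjects = [
    {"id": "t1", "group": "low", "treated": True, "pscore": 0.30},
    {"id": "c1", "group": "low", "treated": False, "pscore": 0.28},
    {"id": "c2", "group": "low", "treated": False, "pscore": 0.35},
    {"id": "c3", "group": "low", "treated": False, "pscore": 0.90},
    {"id": "t2", "group": "declining", "treated": True, "pscore": 0.50},
    {"id": "c4", "group": "declining", "treated": False, "pscore": 0.52},
    {"id": "c5", "group": "low", "treated": False, "pscore": 0.50},
]
matches = match_within_group(subjects)
print(matches)
```

Control `c5` has the same propensity score as `t2` but is never matched to it, because the two belong to different trajectory groups.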
Twelve Covariates Comparing Gang Joiners at 14 with Potential Controls
Propensity for gang joining by trajectory group (before matching)
Matching Strategy
• 21 gang joiners in low trajectory matched with 105 (out of 276) non-gang joiners from that trajectory
  – Number of matches ranges from 2 to 7
• 38 gang joiners in declining trajectory matched with 114 (out of 216) non-gang joiners from that trajectory
  – Number of matches ranges from 1 to 6
Balance before and after matching for selected variables
Standardized differences across the 15 variables used in matching
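The slide does not define the standardized difference. The Python sketch below assumes one common definition: the difference in covariate means between treated and controls, divided by the square root of the average of the two group variances, expressed in percent. The covariate values are hypothetical and are not from the Montreal data.

```python
from math import sqrt
from statistics import mean, variance

def standardized_difference(treated, controls):
    """Difference in means over the pooled standard deviation, in percent (assumed definition)."""
    pooled_sd = sqrt((variance(treated) + variance(controls)) / 2)
    return 100 * (mean(treated) - mean(controls)) / pooled_sd

# Hypothetical values of one covariate
treated = [2.0, 3.0, 4.0]
all_controls = [0.0, 1.0, 2.0]      # before matching
matched_controls = [2.0, 3.0, 3.0]  # after matching

before = standardized_difference(treated, all_controls)
after = standardized_difference(treated, matched_controls)
print(f"before matching: {before:.1f}%, after matching: {after:.1f}%")
```

A value closer to zero after matching indicates better balance on that covariate.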
“Intent to Treat” Effects of First-time Gang Membership at 14 on Violence at Age 14 to 17
Age   Group       Significance Level
14    Low         .008
14    Declining   .033
15    Low         .034
15    Declining   .086
16    Low         .044
16    Declining   .753
17    Low         .070
17    Declining   .530
Effects of First-time Gang Membership at 14 on Violence at 14 to 17
[Figure, two panels, each plotting violence at ages 14, 15, 16, and 17 for gang members at age 14 and non-gang members at age 14]
• Low Trajectory: Violence at Ages 14-17 by Gang Status at Age 14
• Declining Trajectory: Violence at Ages 14 to 17 by Gang Status at Age 14
Concluding Observations on Strengths of this Approach
• Trajectory group specific effects
• Transparency: weaknesses open to view
• Keeping time in order
Extending Group-Based Trajectory Modeling to Account for Subject Attrition
Daniel S. Nagin, Carnegie Mellon University
Bobby Jones, Carnegie Mellon University
Amelia Haviland, RAND Corporation
Trajectories Based on 1979 Dutch Conviction Cohort
Missing Data
• Two types
  – Intermittent missing assessments (y1, y2, ., y4, ., y6)
  – Subject attrition where assessments cease starting in period τ (y1, y2, y3, ., ., .)
• Both types assumed to be missing at random
• Model extension designed to account for potentially non-random subject attrition
• No change in the model for intermittent missing assessments
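The two types of missing data can be told apart mechanically from a subject's response sequence. The following Python sketch is illustrative only: the function name is hypothetical, `None` stands for a missing assessment, and any run of missing values at the end of the sequence is treated as attrition.

```python
def classify_missingness(y):
    """Split a response sequence into intermittent gaps and attrition.

    y: assessments in period order, with None marking a missing one.
    Returns (intermittent_periods, dropout_period) using 1-based periods;
    dropout_period is None when the final assessment is observed.
    """
    last_observed = max((t for t, v in enumerate(y, start=1) if v is not None), default=0)
    intermittent = [t for t, v in enumerate(y, start=1) if v is None and t < last_observed]
    dropout_period = last_observed + 1 if last_observed < len(y) else None
    return intermittent, dropout_period

# Pattern (y1, y2, ., y4, ., y6): intermittent gaps in periods 3 and 5, no attrition
gaps = classify_missingness([1.2, 0.8, None, 1.1, None, 0.9])
# Pattern (y1, y2, y3, ., ., .): attrition starting in period 4
attrition = classify_missingness([1.2, 0.8, 1.0, None, None, None])
print(gaps, attrition)
```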
Some Notation
$\tau_i$ = period t in which subject i drops out
T = number of assessment periods
$\theta_t^j$ = probability of dropout in group j in period t
Probability of Dropout in Period t
Period        Probability of Dropout
1             0
2             $\theta_2^j$
3             $\theta_3^j$
4             $\theta_4^j$
...           ...
T             $\theta_T^j$
No dropout    1 – all the above probabilities
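The Dropout Extended Likelihood for Group j
\[
P(Y_i \mid \mathrm{age}_i, j; \beta^j, \theta^j) = \left[ \prod_{t=1}^{\tau_i - 1} p(y_{it} \mid w_{it} = 0, \mathrm{age}_{it}, j; \beta^j)\,(1 - \theta_t^j) \right] \theta_{\tau_i}^j \qquad (3)
\]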
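The structure of the group-j likelihood with dropout can be sketched in Python. The sketch assumes that a subject who drops out in period τ contributes, for each period before τ, the outcome density times one minus that period's dropout probability, and then the dropout probability for period τ itself. The treatment of subjects who never drop out (the product runs over all T periods with no final dropout factor) is an assumption, as are the function name and the numbers.

```python
def dropout_extended_likelihood(densities, theta, tau=None):
    """Likelihood contribution of one subject, conditional on membership in group j.

    densities: outcome densities p(y_it | ...) for the periods observed before dropout.
    theta: per-period dropout probabilities for group j; theta[t - 1] is period t.
    tau: 1-based dropout period, or None if the subject is observed in every period.
    """
    last = len(theta) if tau is None else tau - 1
    likelihood = 1.0
    for t in range(1, last + 1):
        # Observed in period t: outcome density times probability of not dropping out.
        likelihood *= densities[t - 1] * (1.0 - theta[t - 1])
    if tau is not None:
        # Dropout occurs in period tau.
        likelihood *= theta[tau - 1]
    return likelihood

theta = [0.0, 0.1, 0.2]  # hypothetical; dropout probability is 0 in period 1
dropped_in_3 = dropout_extended_likelihood([0.5, 0.4], theta, tau=3)
completer = dropout_extended_likelihood([0.5, 0.4, 0.3], theta)
print(dropped_in_3, completer)
```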
Specification of $\theta_t^j$
• Binary logit model
• Predictor variables
  – Fixed characteristics of i, $x_i$
  – Prior values of outcome, $y_{it-1}, y_{it-2}, \ldots$
• If trajectory group was known, within trajectory group j dropout would be “exogenous” or “ignorable conditional on observed covariates”
• Because trajectory group is latent, at population level, dropout is “non-ignorable”
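A binary logit specification of the dropout probability might look like the Python sketch below. It assumes one fixed characteristic and one lagged outcome as predictors; the function name and coefficient values are hypothetical and would in practice be estimated separately for each trajectory group.

```python
from math import exp

def dropout_probability(intercept, coef_x, x, coef_prior, y_prior):
    """Binary logit: 1 / (1 + exp(-(intercept + coef_x * x + coef_prior * y_prior)))."""
    linear = intercept + coef_x * x + coef_prior * y_prior
    return 1.0 / (1.0 + exp(-linear))

# Hypothetical coefficients for one group; y_prior is the outcome in the prior period.
p_low_prior = dropout_probability(-2.0, 0.5, 1.0, 0.3, 0.0)
p_high_prior = dropout_probability(-2.0, 0.5, 1.0, 0.3, 4.0)
print(round(p_low_prior, 3), round(p_high_prior, 3))
```

With a positive coefficient on the prior outcome, a higher prior value raises the dropout probability.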
Simulation Objectives
Examine effects of differential attrition rate across groups that are not initially well separated
Examine the effects of using model estimates to make population level projections
Simulation 1: Two Group Model With Different Drop Probabilities and Small Initial Separation
[Figure: four panels plotting E(y) against Time; labels recovered from the panels: “No dropout”, “Slope=.5”]
Simulation Results: Group 1 and Group 2 Initially not Well Separated
Columns: (1) Group 1 per-period dropout probability; (2) expected Group 1 assessment periods; (3) probability of Group 1 dropout on or before period 6; (4)-(5) model without dropout: Group 1 probability estimate (π1) and percent bias; (6)-(8) model with dropout: Group 1 probability estimate (π1), percent bias, and dropout probability estimate
(1)    (2)    (3)     (4)     (5)      (6)     (7)     (8)
0      6.0    0       .200    0.0      .200    0.0     .000
.05    5.3    .226    .171    -14.5    .199    -0.5    .051
.10    4.7    .410    .146    -27.0    .199    -0.5    .099
.15    4.2    .556    .122    -39.0    .200    0.0     .150
.20    3.7    .672    .100    -50.0    .199    -0.5    .199
.25    3.3    .762    .079    -60.5    .200    0.0     .250
.30    2.9    .832    .061    -69.5    .199    -0.5    .301
.35    2.6    .884    .046    -77.0    .199    -0.5    .350
.40    2.4    .922    .034    -83.0    .199    -0.5    .398
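Several columns of the simulation results follow from the per-period dropout probability by simple arithmetic. The Python sketch below assumes six assessment periods, no dropout in period 1, and a constant dropout probability in each later period for subjects still in the study. It also assumes that percent bias is measured against a Group 1 probability of .200, the value estimated when there is no dropout. Function names are hypothetical.

```python
def dropout_summary(theta, periods=6):
    """Cumulative dropout probability and expected number of assessments."""
    # survival[k] is the probability of being assessed in period k + 1.
    survival = [(1 - theta) ** k for k in range(periods)]
    prob_dropout = 1 - survival[-1]   # dropout on or before the last period
    expected_periods = sum(survival)  # expected number of assessment periods
    return prob_dropout, expected_periods

def percent_bias(estimate, truth=0.200):
    return 100 * (estimate - truth) / truth

# (per-period dropout probability, Group 1 estimate from the model without dropout)
for theta, pi1_without in ((0.05, 0.171), (0.10, 0.146), (0.20, 0.100)):
    p, e = dropout_summary(theta)
    print(f"{theta:.2f}  {e:.1f}  {p:.3f}  {percent_bias(pi1_without):.1f}")
# prints:
# 0.05  5.3  0.226  -14.5
# 0.10  4.7  0.410  -27.0
# 0.20  3.7  0.672  -50.0
```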
Simulation 2: Projecting to the Population Level from Model Parameter Estimates
Chinese Longitudinal Healthy Longevity Survey (CLHLS)
• Randomly selected counties and cities in 22 provinces
• 4 waves, 1998 to 2005
• 80 to 105 years old at baseline
• 8805 individuals at baseline
• 68.9% had died by 2005
• Analyzed the cohort aged 90-93 in 1998
Activities of Daily Living
• On your own and without assistance can you:
  – Bathe
  – Dress
  – Toilet
  – Get up from bed or chair
  – Eat
• Disability measured by count of items where assistance is required
Table 3
Summary Statistics for the Age 90 to 93 CLHLS Cohort at Baseline
Variable                   N      Average
ADL 1998 Count             1078   .84
ADL 2000 Count             580    1.05
ADL 2002 Count             335    1.16
ADL 2005 Count             120    1.26
Female                     1078   .52
Life Threatening Disease   1078   .11
Table 4
Predicted Population Average ADL Counts from the Models With and Without Dropout
Columns: (1) period; (2) average ADL count; (3)-(4) model without dropout: predicted ADL count and % error; (5)-(9) model with dropout: $\tilde{\pi}_t^1$, $\tilde{\pi}_t^2$, $\tilde{\pi}_t^3$, predicted ADL count, and % error
(1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
1998    .84     .91     8.3     .201    .586    .213    .93     10.7
2000    1.05    1.19    13.3    .254    .600    .146    1.07    1.9
2002    1.16    1.42    22.4    .309    .593    .097    1.17    .9
2005    1.26    1.89    50.0    .366    .571    .063    1.58    25.4
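The % error columns of Table 4 can be reproduced from the observed and predicted ADL counts. The Python sketch below assumes the error is the predicted count minus the observed average, as a percentage of the observed average; the table itself does not state the formula, but this definition matches the reported values.

```python
observed = {1998: 0.84, 2000: 1.05, 2002: 1.16, 2005: 1.26}
without_dropout = {1998: 0.91, 2000: 1.19, 2002: 1.42, 2005: 1.89}
with_dropout = {1998: 0.93, 2000: 1.07, 2002: 1.17, 2005: 1.58}

def percent_error(predicted, actual):
    """Prediction error as a percentage of the observed value (assumed definition)."""
    return 100 * (predicted - actual) / actual

errors = {
    year: (round(percent_error(without_dropout[year], observed[year]), 1),
           round(percent_error(with_dropout[year], observed[year]), 1))
    for year in observed
}
for year, (err_without, err_with) in errors.items():
    print(year, err_without, err_with)
```

By 2005 the model without dropout overpredicts the average ADL count by 50.0%, against 25.4% for the model with dropout.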
Adding Covariates to Model to Test the Morbidity Compression v. Expansion Hypothesis
• Will increases in longevity compress or expand disability level in the population of the elderly?
• “Had a life threatening disease” at baseline or prior is positively correlated with both ADL counts at baseline and subsequent mortality rate.
• Question: Would a reduction in the incidence of life threatening diseases at baseline increase or decrease the population level ADL count?
Testing Strategy and Results
• Specify group membership probability (πj) and dropout probability ($\theta_t^j$) to be a function of the life threatening disease variable
• Both are also functions of sex; the dropout probability alone is also a function of ADL count in the prior period
• Life threatening disease is significantly related to group membership in the expected way but has no relationship with dropout due to death
• Thus, unambiguous support for compression
Projecting the reduction in population average ADL count from a 25% reduction in the incidence of the life threatening disease at baseline
Projected % Reduction in Population Average ADL Count
Year            1998   2000   2002   2005
Reduction (%)   3.0    2.2    1.5    .7
Table 6
Own and Cross Elasticity Estimates (%) for Life Threatening Disease Incidences
                                          Cross Elasticity
Group                    Own Elasticity   Group 2   Group 3   Total Elasticity
1. Low (π1 = .201)       NA               -.033     -.059     -.092
2. Medium (π2 = .586)    .069             NA        -.173     -.104
3. High (π3 = .213)      .232             -.036     NA        .196
Conclusions and Future Research
• Large differences in dropout rates across trajectory groups matter
• Future research
  – Investigate effects of endogenous selection
  – Compare results in data sets with more modest dropout rates
  – Further research on morbidity expansion and contraction