Event History Analysis 6 Sociology 8811 Lecture 20 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission
Dec 20, 2015
Announcements
• Paper Assignment #2 handed out previously
  • Due April 26
• Class topic:
  • More details on Cox model diagnostics
  • Next class: parametric models, AFT models (if time)…
  • Then – new topics!
Review: Cox Model Basics
• Choosing how to deal with ties
  • Several algorithms available
  • Efron method is better than Breslow (default)
  • Exact marginal is even better – but time consuming
• The baseline hazard rate
  • Can be computed in Stata…
  • Plus, you can estimate the hazard rates at particular values of variables
• The proportional hazard assumption
  • We’ll continue discussing several methods…
Proportional Hazard Assumption
• Key assumption: Proportional hazards
  • Hazards of different groups are proportional to one another over time
  • i.e., the hazard ratio does NOT vary over time
– Example: Effect of “abstinence” program on sexual behavior
• Issue: Do abstinence programs lower the rate in a consistent manner across time?
– Or, perhaps the rate is lower initially… but then the rate jumps back up (maybe even exceeds the control group).
– Groups are assumed to have “parallel” hazards (parallel on a log scale)
  • Rather than rates that diverge, converge (or cross).
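The assumption can be written compactly. A sketch in standard Cox notation; the symbols h_0 (baseline hazard), x (covariate value), and β (coefficient) do not appear on the slides and are introduced here only for illustration:

```latex
h(t \mid x) = h_0(t)\,\exp(x\beta)
\qquad\Longrightarrow\qquad
\frac{h(t \mid x_1)}{h(t \mid x_0)} = \exp\big((x_1 - x_0)\beta\big)
```

The ratio on the right contains no t. Taking logs gives ln h(t | x_1) − ln h(t | x_0) = (x_1 − x_0)β, a constant gap, which is why the groups' log hazards should look parallel.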
Proportional Hazard Assumption
• Strategies:
• 1. Visually examine raw hazard plots for sub-groups in your data
  • Watch for non-parallel trends
  • A simple, crude method… but often identifies big violations
Proportional Hazard Assumption
• Visual examination of raw hazard rate
[Figure: Smoothed hazard estimates, by west (west = 0 vs. west = 1); x-axis: analysis time, 1970 to 2000]
Parallel trends in hazard rate indicate proportionality
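A plot of this kind can be produced with Stata's “sts graph”. A minimal sketch, assuming the cancer dataset bundled with Stata as stand-in data (not the lecture data) and a hypothetical “treated” dummy playing the role of “west”:

```stata
* Stand-in data: drug-trial dataset shipped with Stata (not the lecture data)
sysuse cancer, clear
* Hypothetical binary grouping variable: drug 1 is the placebo arm
generate treated = drug > 1
stset studytime, failure(died)
* Smoothed hazard estimates by group; look for roughly parallel curves
sts graph, hazard by(treated)
```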
Proportional Hazard Assumption
• 2. Plot -ln(-ln(survival probability)) versus ln(time) across values of X variables
  • What Stata calls “stphplot”
  • Parallel lines indicate proportional hazards
  • Again, convergence and divergence (or crossing) indicates violation
– A less-common approach: compare observed survivor plot to predicted values (for different values of X)
  • What Stata calls “stcoxkm”
  • If observed are similar to predicted, assumption is not likely to be violated.
Proportional Hazard Assumption
• -ln(-ln(survivor)) vs. ln(time) – “stphplot”
[Figure: -ln[-ln(Survival Probability)] vs. ln(analysis time), for west = 0 and west = 1]
Convergence suggests violation of proportional hazard assumption
(But, I’ve seen worse!)
Proportional Hazard Assumption
• Cox estimate vs. observed KM – “stcoxkm”
[Figure: Survival Probability vs. analysis time (1970 to 2000); observed and predicted curves for west = 0 and west = 1]
Predicted differs from observed for countries in West
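Both graphical checks are single commands once the data are stset. A minimal sketch, again assuming Stata's bundled cancer data and a hypothetical “treated” dummy rather than the lecture's “west” variable:

```stata
* Stand-in data and hypothetical grouping variable (not the lecture data)
sysuse cancer, clear
generate treated = drug > 1
stset studytime, failure(died)
* -ln(-ln(S(t))) against ln(t), one curve per group; parallel curves support PH
stphplot, by(treated)
* Observed Kaplan-Meier curves overlaid on Cox-predicted survival curves
stcoxkm, by(treated)
```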
Proportional Hazard Assumption
• 3. Piecewise Models
  • Piecewise = break model up into pieces (by time)
– Ex: Split analysis into “early” vs “late” time
• If coefficients vary in different time periods, hazards are not proportional
– Example:
  • stcox var1 var2 var3 if _t < 10
  • stcox var1 var2 var3 if _t >= 10
  • Look for large changes in coefficients!
Proportional Hazard Assumption
• In a piecewise model, coefficients would differ in non-proportional models
[Figure: two panels, “Proportional” and “Non-Proportional”, each showing the effect in the Early and Late periods]
• Proportional: Here, the effect is the same in both time periods
• Non-Proportional: Here, the effect is negative in the early period and positive in the late period
Piecewise Models
• Look at coefficients at 2 (or more) spans of time

EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4465818   .4255587     1.05   0.294    -.3874979    1.280661
 degradation |   -.282548   .1572746    -1.80   0.072    -.5908005    .0257045
   education |  -.0195118   .0328195    -0.59   0.552    -.0838368    .0448131
   democracy |   .2295673   .2625205     0.87   0.382    -.2849634     .744098
         ngo |   .6792462   .3110294     2.18   0.029     .0696399    1.288853
        ingo |   .6664661   .4804229     1.39   0.165    -.2751456    1.608078
------------------------------------------------------------------------------

LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4963942    .357739     1.39   0.165    -.2047613     1.19755
 degradation |  -.5702894   .2395257    -2.38   0.017    -1.039751   -.1008277
   education |   .0142118   .0143762     0.99   0.323    -.0139649    .0423886
   democracy |   .2541799   .0981386     2.59   0.010     .0618317    .4465281
         ngo |   .1742862   .1448187     1.20   0.229    -.1095532    .4581256
        ingo |  -.1134661   .2104308    -0.54   0.590    -.5259028    .2989707
------------------------------------------------------------------------------
Note: Effect of ngo is larger in early period
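The lecture data are in country-year form, so “if year < 1985” selects the right records directly. With one record per subject, “if _t < 10” would drop subjects who exit later rather than censoring them at the cut point, so the records should be split first. A minimal sketch, assuming Stata's bundled cancer data as a stand-in, a hypothetical “treated” dummy, and an arbitrary cut point at analysis time 15:

```stata
* Stand-in data and hypothetical variables (not the lecture data)
sysuse cancer, clear
generate treated = drug > 1
generate id = _n
* stsplit needs an id() variable declared in stset
stset studytime, failure(died) id(id)
* Split each record at time 15; period is 0 (early) or 15 (late)
stsplit period, at(15)
* Fit the same model in each period and compare coefficients
stcox treated age if period == 0, nohr
stcox treated age if period == 15, nohr
```

Subjects who survive past the cut point now contribute a censored record to the early period and a second record to the late period.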
Proportional Hazard Assumption
• 4. Tests based on re-estimating model
  • Try including time interactions in your model
  • Recall: Interactions – effect of A on C varies with B
  • If effect of variable X on hazard rate (or ratio) varies with time, then hazards aren’t proportional
– Recall example: Abstinence programs
  • Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on
Proportional Hazard Assumption
• Red = Abstinence group; green = control
[Figure: two panels, “No time interaction” and “Positive time interaction”]
In non-proportional case, the effect of abstinence programs varies across time
Proportional Hazard Assumption
• Strategy: Create variables that reflect the interaction of X variables with time
• Significant effects of time interactions indicate non-proportional hazard
• Fortunately, including the interaction term in the model corrects the problem, provided the interaction captures how the effect actually changes over time.
• Issue: X variables can interact with time in multiple ways…
– Linearly
– With “log time” or time squared
– With time dummies
– You may have to try a range of things…
Proportional Hazard Assumption
• Red = Abstinence group; green = control
[Figure: two panels]
• Linear time interaction: Effect grows consistently over time. Try “Abstinence*time”
• Interaction with time-period… Effect differs early vs. late. Try “Abstinence*DLate”
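One way to build a linear time interaction by hand is to split the records at every failure time, so that _t takes the current analysis time in each risk set, and then multiply. A minimal sketch, assuming Stata's bundled cancer data as a stand-in and hypothetical variables “treated” and “treated_t”:

```stata
* Stand-in data and hypothetical variables (not the lecture data)
sysuse cancer, clear
generate treated = drug > 1
generate id = _n
stset studytime, failure(died) id(id)
* Split records at every observed failure time
stsplit, at(failures)
* Linear interaction of the covariate with analysis time
generate treated_t = treated * _t
stcox treated age treated_t, nohr
* A significant interaction suggests the effect of treated changes over time
test treated_t
```

A time-period version would swap “_t” for a dummy such as “_t > 15” (the cut point is arbitrary here), which corresponds to the “Abstinence*DLate” idea.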
Proportional Hazard Assumption
• 5. Grambsch & Therneau test
– Ex: Stata “estat phtest”
• Test for non-zero slope of Schoenfeld residuals vs time
– A zero slope implies a constant log hazard ratio over time, i.e., proportional hazards
• Can be applied to general model, or for each variable

stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)
. estat phtest

      Test of proportional hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |                    chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      global test |                   18.14        6         0.0059
      ----------------------------------------------------------------

Significant chi-square indicates violation of proportional hazard assumption
Proportional Hazard Assumption
• Variable-by-variable test “estat phtest”:
. estat phtest, detail
Test of proportional hazards assumption
      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      gdp         |      0.09035         0.63        1         0.4277
      degradation |     -0.22735         3.41        1         0.0646
      education   |      0.06915         0.47        1         0.4950
      democracy   |     -0.04929         0.20        1         0.6560
      ngo         |     -0.18691         4.56        1         0.0327
      ingo        |     -0.03759         0.34        1         0.5609
      ------------+---------------------------------------------------
      global test |                     18.14        6         0.0059
      ----------------------------------------------------------------
Note: Certain variables are especially problematic…
Proportional Hazard Assumption
• Notes on estat phtest:
– 1. Requires that you calculate “Schoenfeld residuals” when you run the original Cox model
– And, if you want a test for each variable, you must also request scaled Schoenfeld residuals
– 2. Test is based on identifying non-zero time trend… but how should we characterize time?
  • Options: normal/linear time, log time, time dummies, etc.
– Results may differ depending on your choice
– Ex: estat phtest, log – specifies “log time”
• Plot of smoothed Schoenfeld residuals can indicate best way to characterize time
– Linear trend (not a curve) indicates that time is characterized OK
– Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)
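The options above can be tried side by side. A minimal sketch, assuming Stata's bundled cancer data as a stand-in and a hypothetical “treated” dummy; it also assumes a recent Stata release in which “estat phtest” computes the Schoenfeld residuals itself, so they are not requested on the stcox line:

```stata
* Stand-in data and hypothetical variable (not the lecture data)
sysuse cancer, clear
generate treated = drug > 1
stset studytime, failure(died)
stcox treated age, nohr
* Smoothed scaled Schoenfeld residuals for one covariate
estat phtest, plot(age)
* Global and variable-by-variable tests, using untransformed analysis time
estat phtest, detail
* Same tests with log(time) as the time scale
estat phtest, log detail
```

If the linear-time and log-time versions disagree, the residual plot is the place to look before choosing between them.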
Proportional Hazard Assumption
• What if the assumption is violated?
• 1. Improve model specification
  • Add time interactions to address nonproportionality
  • Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
  • Variables can be interacted with linear time, log time, time dummies, etc., to address the issue
• 2. Model groups separately
  • Split sample along variables that are non-proportional.
Proportional Hazard Assumption
• What if the assumption is violated?
• 3. Use a stratified Cox model
  • Allows a different baseline hazard for each group
– But, you can’t estimate effect of stratifying variable!
• Ex: stcox var1 var2 var3, strata(Dhighdemoc)
• 4. Use a piecewise model
  • Split time into chunks… in which PH assumption is met
– Requires sufficient sample size in all time periods!
Proportional Hazard Assumption
• What if the assumption is violated?
• 5. Live with it (but temper your conclusions)
  • Violation of proportional hazard assumption tends to:
– Overestimate the effect of variables whose hazard ratios are increasing over time
– And, underestimate those whose hazard ratios are decreasing
• However, Allison points out: Cox model is reasonably robust
– Other issues (e.g., model misspecification) are bigger issues
Cox Model: Residuals
• OLS regression: Residuals = difference between predicted value of Y and observed
• Y-hat – Yi
• EHA: Residuals are more complicated
  • You could compute predicted failure minus observed…
  • But, what about censored cases? What is observed?
  • There are a number of different ways to calculate residuals… each with different properties.
Cox Model: Residuals – Summary
• From Cleves et al. (2004) An Introduction to Survival Analysis Using Stata, p. 184:
• 1. Cox-Snell residuals
  • … are useful for assessing overall model fit
• 2. Martingale residuals
  • Are useful in determining the functional form of the covariates to be included in the model
• 3. Schoenfeld residuals (scaled & unscaled), score residuals, and efficient score residuals
  • Are useful for checking & testing the proportional hazard assumption, examining leverage points, and identifying outliers
• 4. Deviance residuals
  • Are useful in examining model accuracy and identifying outliers.
Martingale/Deviance Residuals: Outliers
• Martingale residuals: difference over time of observed failures minus expected failures
• Feature: range from +1 to –infinity
– Deviance residuals = martingale residuals that are rescaled to be symmetric around zero
• Easier to interpret
• Extreme martingale or deviance residuals may indicate outliers
  • Plot residuals vs. time, case number, IVs, etc.
  • Or simply sort data by residuals & list the cases.
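For reference, the two residuals can be written out for the simple case of one record per subject. The notation is an assumption introduced here, not taken from the slides: δ_i is 1 if case i failed and 0 if censored, and Ĥ_i is the model's estimated cumulative hazard for case i at its exit time (the Cox-Snell residual).

```latex
\hat{M}_i = \delta_i - \hat{H}_i ,
\qquad
\hat{D}_i = \operatorname{sign}(\hat{M}_i)\,
  \sqrt{-2\left[\hat{M}_i + \delta_i \ln\!\big(\delta_i - \hat{M}_i\big)\right]}
```

Because δ_i is at most 1 and Ĥ_i is non-negative, the martingale residual cannot exceed +1 but has no lower bound, which is the asymmetry the deviance transformation is meant to reduce.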
Martingale & Deviance Residuals: Outliers
• Stata code to identify outliers:
* Run Cox model, calculate martingale residuals
stcox var1 var2 var3, robust nohr mgale(mg)
* Creates variable “mg” which contains martingale residuals
* Next, compute deviance residuals using “predict”
predict dev, deviance
gen caseid = _n
* Create plots of various types
scatter mg caseid
* Deviance residual plots are generally easier to interpret
scatter dev caseid
scatter dev caseid, mlabel(newname2)
scatter dev _t
Deviance Residuals Plot
• Extreme values may be outliers
[Figure: deviance residual vs. caseid (0 to 3000), points labeled by country name]
Here, no obvious outliers are visible
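The “sort data by residuals & list the cases” route can be sketched as follows. This assumes Stata's bundled cancer data as a stand-in, a hypothetical “treated” dummy, and a recent Stata release in which both residuals are available through “predict” after stcox:

```stata
* Stand-in data and hypothetical variables (not the lecture data)
sysuse cancer, clear
generate treated = drug > 1
generate caseid = _n
stset studytime, failure(died)
stcox treated age, nohr
* Martingale and deviance residuals via predict
predict mg, mgale
predict dev, deviance
scatter dev caseid, yline(0) mlabel(caseid)
* List the five cases with the largest absolute deviance residuals
generate absdev = abs(dev)
gsort -absdev
list caseid studytime died dev in 1/5
```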
Efficient Score Residuals: Influential Cases
• Procedure for identifying outliers using ESRs
  • It is possible to compute DFBETAs based on ESRs
  • DFBETA: Change in a variable’s coefficient due to a particular case in the analysis
– Cases with big DFBETAs may be overly influential
– Issue: Stata cannot automatically compute DFBETAs…
  • You have to compute them manually
  • Also, computation = limited to 800 cases (for “intercooled Stata”)
  • Hopefully Stata will improve this in the future.
ESRs: Influential Cases
• Stata code to estimate DFBETAs:
* Run Cox model, request efficient score residuals
* Creates vars: esr1 to esr4 corresponding to vars listed in model
* (fit without robust so that e(V) is the model-based variance matrix)
stcox var1 var2 var3 var4, nohr esr(esr*)
* Create room for a matrix of up to 800 rows (for your cases)
set matsize 800
* Create esr matrix (one column per variable in the model)
mkmat esr1 esr2 esr3 esr4, matrix(esr)
* Multiply ESRs and Var/Cov matrix to estimate DFBETAs, save results
mat V=e(V)
mat Inf = esr*V
svmat Inf, names(s)
* Label estimates for subsequent plots
label var s1 "dfbeta – var 1"
label var s2 "dfbeta – var 2"
label var s3 "dfbeta – var 3"
label var s4 "dfbeta – var 4"
* Plot DFBETAs for each variable vs. time or case number
scatter s1 _t, yline(0) mlab(caseID) s(i)
scatter s1 casenumber, yline(0) mlab(caseID) s(i)
* Look for extreme values (for each IV – s1 to s4)
DFBETA Example
• DFBETA for NGOs (plotted by casenumber)
[Figure: dfbeta – ngo vs. casenumber (0 to 800), points labeled by country name]
DFBETA value indicates that presence of Latvia changes NGO coefficient by +.075 standard deviations
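More recent Stata releases can produce DFBETAs directly with “predict” after stcox, which avoids the matrix steps. A minimal sketch, assuming such a release, Stata's bundled cancer data as a stand-in, and a hypothetical “treated” dummy:

```stata
* Stand-in data and hypothetical variables (not the lecture data)
sysuse cancer, clear
generate treated = drug > 1
generate caseid = _n
stset studytime, failure(died)
stcox treated age, nohr
* One DFBETA variable per covariate, in model order: dfb1 (treated), dfb2 (age)
predict double dfb*, dfbeta
* Plot by case number and look for extreme values
scatter dfb1 caseid, yline(0) mlabel(caseid)
summarize dfb1 dfb2
```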
Cox-Snell residuals: Model Fit
• Cox-Snell residuals can be plotted to assess model fit
• If model fits well, graph of integrated (cumulative) hazard conditional on Cox-Snell residuals vs. Cox-Snell residuals will fall on a 45-degree line
– Strategy in Stata:
  • Run Cox model, request martingale residuals
  • Use “predict” to compute Cox-Snell residuals
  • Stset your data again, with Cox-Snell as time variable
  • Compute integrated hazard
  • Graph integrated hazard versus residuals.
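The steps above can be sketched as follows. This assumes Stata's bundled cancer data as a stand-in, a hypothetical “treated” dummy, and a recent Stata release in which “predict, csnell” works without first saving martingale residuals:

```stata
* Stand-in data and hypothetical variable (not the lecture data)
sysuse cancer, clear
generate treated = drug > 1
stset studytime, failure(died)
stcox treated age, nohr
* Cox-Snell residuals
predict cs, csnell
* Re-declare the data with the Cox-Snell residual as the time variable
stset cs, failure(died)
* Nelson-Aalen cumulative hazard of the residuals
sts generate H = na
* Cumulative hazard vs. residuals, plus the 45-degree reference line (cs vs. cs)
line H cs cs, sort
```

If the model fits, the first line should track the second closely except where few cases remain.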
Cox-Snell Model Fit Example
• Cox-Snell Plot for Environmental Law data
[Figure: Nelson-Aalen cumulative hazard and 45-degree reference line vs. partial Cox-Snell residual]
This looks quite bad. Cumulative hazard should fall on the line… Instead, there is a sizable gap.
Note: Don’t worry about small deviations from the line at the right edge of the plot. There are typically few cases there…
Stratified Cox Models
• Stratified models allow different baseline hazards for sub-groups in your data
• But constrain all model coefficients to be the same across all groups
• Useful if we think that some sub-groups have very different hazard curves over time
– AND we aren’t really interested in differences across those groups – it is just a nuisance
• Another option is to simply analyze groups separately
– But, we lose sample size. Stratifying avoids that.
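A stratified fit differs from an ordinary one only by the strata() option. A minimal sketch, assuming Stata's bundled cancer data as a stand-in and a hypothetical “treated” dummy as the nuisance grouping:

```stata
* Stand-in data and hypothetical variable (not the lecture data)
sysuse cancer, clear
generate treated = drug > 1
stset studytime, failure(died)
* Separate baseline hazard for each value of treated;
* the age coefficient is constrained to be the same in both strata
stcox age, strata(treated) nohr
```

Note that the output contains a coefficient for age only: the stratifying variable is absorbed into the baseline hazards and gets no estimate of its own.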
Cox Models for Grouped Data
• Sometimes cases are not independent
  • Ex: Students in same class; people in same family
• Two useful options:
– 1. Stata’s “cluster” option: Adjusts standard errors based on group membership
  • stcox var1 var2 var3, cluster(FamilyID)
– 2. Cox model with shared frailty
  • Another name for a random effects model
– We’ll discuss this later in the course
– Don’t confuse with non-shared frailty models!
  • stcox var1 var2 var3, shared(FamilyID)
Stata Note: TVC
• Note: Stata has a special option for “time varying covariates”
• You DO NOT need to use this!!!
– It is designed for cases where you wish to specify the character of time variation (e.g., a rate of decay)
• You can simply create time-varying variables in your dataset…
– Stata will analyze it properly using the stcox command.