Slide 1
Semiparametric Modeling, Penalized Splines, and Mixed Models
David Ruppert
Cornell University
http://www.orie.cornell.edu/~davidr
January 2004
Joint work with Babette Brumback, Ray Carroll, Brent
Coull, Ciprian Crainiceanu, Matt Wand, Yan Yu, and
others
Slide 2
Example (data from Hastie and James, this
analysis in RWC)
[Figure: scatterplot of spinal bone mineral density versus age (years); age axis 10 to 25, density axis 0.6 to 1.4]
Slide 3
Possible Model
$\mathrm{SBMD}_{i,j}$ is the spinal bone mineral density of the $i$th subject at
age $\mathrm{age}_{i,j}$.
$$\mathrm{SBMD}_{i,j} = U_i + m(\mathrm{age}_{i,j}) + \varepsilon_{i,j},$$
$i = 1, \ldots, m = 230$, $j = 1, \ldots, n_i$.
$U_i$ is the random intercept for subject $i$.
$\{U_i\}$ are assumed i.i.d. $N(0, \sigma_U^2)$.
Slide 4
Underlying philosophy
1. minimalist statistics
• keep it as simple as possible
2. build on classical parametric statistics
3. modular methodology
Slide 5
Reference
Semiparametric Regression by Ruppert, Wand, and
Carroll (2003)
• Lots of examples from biostatistics.
Slide 6
Recent Example — April 17, 2003
Canfield et al. (2003) — Intellectual impairment
and blood lead.
• longitudinal (mixed model)
• nine covariates (modelled linearly)
• effect of lead modelled as a spline (semiparametric
model)
– disturbing conclusion
Slide 7
[Figure: IQ (60 to 130) versus lead (microgram/deciliter, 0 to 35), showing the quadratic and spline fits]
Thanks to Rich Canfield for data and estimates.
Slide 8
Semiparametric regression
Partial linear or partial spline model:
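$$Y_i = \mathbf{W}_i^T \boldsymbol{\beta}_W + m(X_i) + \varepsilon_i.$$
$$m(X_i) = \mathbf{X}_i^T \boldsymbol{\beta}_X + \mathbf{B}^T(X_i)\,\mathbf{b}.$$
$$\mathbf{B}^T(x) = \begin{pmatrix} B_1(x) & \cdots & B_K(x) \end{pmatrix}.$$
E.g.,
$$\mathbf{X}_i^T = \begin{pmatrix} X_i & \cdots & X_i^p \end{pmatrix}$$
$$\mathbf{B}^T(x) = \begin{pmatrix} (x - \kappa_1)_+^p & \cdots & (x - \kappa_K)_+^p \end{pmatrix}$$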
Slide 9
Example
$$m(x) = \beta_0 + \beta_1 x + b_1 (x - \kappa_1)_+ + \cdots + b_K (x - \kappa_K)_+$$
• slope jumps by $b_k$ at $\kappa_k$
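The slope-jump property can be checked numerically. The sketch below is only an illustration: the knots and coefficients are hypothetical values chosen for the example, and the helper names `plus`, `linear_spline`, and `slope` are not from the slides.

```python
import numpy as np

def plus(x, knot):
    """Linear plus function (x - knot)_+ = max(x - knot, 0)."""
    return np.maximum(x - knot, 0.0)

def linear_spline(x, beta0, beta1, b, knots):
    """Evaluate m(x) = beta0 + beta1*x + sum_k b[k] * (x - knots[k])_+."""
    x = np.asarray(x, dtype=float)
    m = beta0 + beta1 * x
    for b_k, kappa_k in zip(b, knots):
        m = m + b_k * plus(x, kappa_k)
    return m

# Hypothetical coefficients and knots, chosen only for illustration.
knots = [1.0, 2.0]
b = [0.5, -2.0]
beta0, beta1 = 0.0, 1.0

def slope(lo, hi):
    """Slope of m on an interval [lo, hi] that contains no knot."""
    m_lo = linear_spline(lo, beta0, beta1, b, knots)
    m_hi = linear_spline(hi, beta0, beta1, b, knots)
    return (m_hi - m_lo) / (hi - lo)

# One interval on each side of the two knots.
slopes = [slope(0.2, 0.8), slope(1.2, 1.8), slope(2.2, 2.8)]
jumps = np.diff(slopes)  # change in slope across each knot; matches b
```

Here the slope is `beta1` to the left of the first knot, and each entry of `jumps` equals the corresponding coefficient in `b`.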
Slide 10
Linear “plus” function
[Figure: the linear plus function and its derivative, plotted for x from 0 to 3]
Slide 11
Fitting LIDAR data with plus functions
[Figure: LIDAR data, log ratio (-1.0 to 0.0) versus range (400 to 700), fitted with plus functions]
Slide 12
Generalization
$$m(x) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p + b_1 (x - \kappa_1)_+^p + \cdots + b_K (x - \kappa_K)_+^p$$
• $p$th derivative jumps by $p!\, b_k$ at $\kappa_k$
• first $p - 1$ derivatives are continuous
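Both bullets follow from differentiating a single truncated power function. A short worked derivation, using only the notation already on the slide:

```latex
% For 0 <= j <= p - 1 the j-th derivative is continuous and vanishes at the knot:
\frac{d^{j}}{dx^{j}} (x - \kappa_k)_+^{p}
  = \frac{p!}{(p - j)!}\,(x - \kappa_k)_+^{\,p - j},
  \qquad 0 \le j \le p - 1 .

% The p-th derivative is a step function:
\frac{d^{p}}{dx^{p}} (x - \kappa_k)_+^{p}
  = p!\;\mathbf{1}\{x > \kappa_k\},
  \qquad x \ne \kappa_k .

% Hence, since the polynomial part of m is smooth,
m^{(p)}(\kappa_k^{+}) - m^{(p)}(\kappa_k^{-}) = p!\, b_k .
```

With $p = 1$ this reduces to the linear case: the slope jumps by $b_k$ at $\kappa_k$.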
Slide 13
Quadratic “plus” function
[Figure: the quadratic plus function with its derivative and second derivative, plotted for x from 0 to 3]
Slide 14
Ordinary Least Squares
[Figure: eight panels, each with a vertical axis from -1 to 0 and horizontal ticks at 400 and 600: Raw Data, 2 knots, 3 knots, 5 knots, 10 knots, 20 knots, 50 knots, 100 knots]
Slide 15
Penalized least-squares
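Minimize
$$\sum_{i=1}^{n} \left\{ Y_i - \left( \mathbf{W}_i^T \boldsymbol{\beta}_W + \mathbf{X}_i^T \boldsymbol{\beta}_X + \mathbf{B}^T(X_i)\,\mathbf{b} \right) \right\}^2 + \lambda\, \mathbf{b}^T \mathbf{D}\, \mathbf{b}.$$
E.g.,
$\mathbf{D} = \mathbf{I}$.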
Slide 16
Penalized Least Squares
[Figure: eight panels, each with a vertical axis from -1 to 0 and horizontal ticks at 400 and 600: Raw Data, 2 knots, 3 knots, 5 knots, 10 knots, 20 knots, 50 knots, 100 knots]
Slide 17
Ridge Regression
From previous slide:
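$$\sum_{i=1}^{n} \left\{ Y_i - \left( \mathbf{W}_i^T \boldsymbol{\beta}_W + \mathbf{X}_i^T \boldsymbol{\beta}_X + \mathbf{B}^T(X_i)\,\mathbf{b} \right) \right\}^2 + \lambda\, \mathbf{b}^T \mathbf{D}\, \mathbf{b}.$$
Let $\mathcal{X}$ have $i$th row $\begin{pmatrix} \mathbf{W}_i^T & \mathbf{X}_i^T & \mathbf{B}^T(X_i) \end{pmatrix}$. Then
$$\begin{pmatrix} \hat{\boldsymbol{\beta}}_W \\ \hat{\boldsymbol{\beta}}_X \\ \hat{\mathbf{b}} \end{pmatrix}
= \left\{ \mathcal{X}^T \mathcal{X} + \lambda\, \mathrm{blockdiag}(\mathbf{0}, \mathbf{0}, \mathbf{D}) \right\}^{-1} \mathcal{X}^T \mathbf{Y}.$$
• Also a BLUP in a mixed model and an empirical
Bayes estimator.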
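The closed form above takes only a few lines of linear algebra. The sketch below is a minimal illustration under stated assumptions: it uses a linear spline with $\mathbf{D} = \mathbf{I}$, no $\mathbf{W}$ covariates, equally spaced knots, and simulated data rather than the LIDAR data; the function names `design` and `penalized_fit` are made up for this example.

```python
import numpy as np

def design(x, knots):
    """Columns 1, x (unpenalized) followed by (x - kappa_k)_+ (penalized)."""
    x = np.asarray(x, dtype=float)
    fixed = np.column_stack([np.ones_like(x), x])
    spline = np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0)
    return np.hstack([fixed, spline])

def penalized_fit(x, y, knots, lam):
    """Solve {C'C + lam * blockdiag(0, D)} theta = C'y with D = I."""
    C = design(x, knots)
    K = len(knots)
    penalty = np.zeros((C.shape[1], C.shape[1]))
    penalty[-K:, -K:] = np.eye(K)  # penalize the spline coefficients only
    theta = np.linalg.solve(C.T @ C + lam * penalty, C.T @ y)
    return C, theta

# Simulated data (an assumption; not the data shown on the slides).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
knots = np.linspace(0.0, 1.0, 22)[1:-1]  # 20 interior knots

_, theta_small = penalized_fit(x, y, knots, lam=0.01)  # light smoothing
_, theta_large = penalized_fit(x, y, knots, lam=10.0)  # heavy smoothing
```

Increasing `lam` shrinks the spline coefficients toward zero, so the fit moves from a wiggly interpolant toward the ordinary least-squares straight line.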
Slide 18
Linear Mixed Models
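$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon}$$
where $\mathbf{b}$ is $N(\mathbf{0}, \sigma_b^2 \boldsymbol{\Sigma}_b)$.
$\mathbf{X}\boldsymbol{\beta}$ are the “fixed effects” and $\mathbf{Z}\mathbf{b}$ are the “random
effects.”
Henderson’s equations:
$$\begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{b}} \end{pmatrix}
= \begin{pmatrix} \mathbf{X}^T\mathbf{X} & \mathbf{X}^T\mathbf{Z} \\ \mathbf{Z}^T\mathbf{X} & \mathbf{Z}^T\mathbf{Z} + \lambda \boldsymbol{\Sigma}_b^{-1} \end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{X}^T\mathbf{Y} \\ \mathbf{Z}^T\mathbf{Y} \end{pmatrix},
\qquad \lambda = \frac{\sigma_\varepsilon^2}{\sigma_b^2}.$$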
Slide 19
From previous slides:
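Let $\mathcal{X}$ have $i$th row $\begin{pmatrix} \mathbf{W}_i^T & \mathbf{X}_i^T & \mathbf{B}^T(X_i) \end{pmatrix}$. Then
$$\begin{pmatrix} \hat{\boldsymbol{\beta}}_W \\ \hat{\boldsymbol{\beta}}_X \\ \hat{\mathbf{b}} \end{pmatrix}
= \left\{ \mathcal{X}^T \mathcal{X} + \lambda\, \mathrm{blockdiag}(\mathbf{0}, \mathbf{0}, \mathbf{D}) \right\}^{-1} \mathcal{X}^T \mathbf{Y}.$$
Linear mixed model:
$$\begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{b}} \end{pmatrix}
= \begin{pmatrix} \mathbf{X}^T\mathbf{X} & \mathbf{X}^T\mathbf{Z} \\ \mathbf{Z}^T\mathbf{X} & \mathbf{Z}^T\mathbf{Z} + \lambda \boldsymbol{\Sigma}_b^{-1} \end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{X}^T\mathbf{Y} \\ \mathbf{Z}^T\mathbf{Y} \end{pmatrix}
= \left\{ (\mathbf{X}\ \mathbf{Z})^T (\mathbf{X}\ \mathbf{Z}) + \lambda\, \mathrm{blockdiag}(\mathbf{0}, \boldsymbol{\Sigma}_b^{-1}) \right\}^{-1} (\mathbf{X}\ \mathbf{Z})^T \mathbf{Y}$$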
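The match between the penalized (ridge) form and the mixed-model estimates can be verified numerically. The sketch below is an illustration only: the designs, response, variance components, and $\boldsymbol{\Sigma}_b$ are arbitrary simulated values treated as known, and the comparison is against the standard GLS and BLUP formulas based on $\mathbf{V} = \mathrm{Cov}(\mathbf{Y})$, which the slides do not write out.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 50, 2, 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed-effects design
Z = rng.normal(size=(n, q))                             # random-effects design
Y = rng.normal(size=n)                                  # arbitrary response
sigma2_eps, sigma2_b = 0.5, 2.0                         # variances assumed known
Sigma_b = np.diag(np.linspace(0.5, 2.0, q))             # any positive definite matrix
lam = sigma2_eps / sigma2_b

# Henderson's equations written as a ridge system in C = (X Z).
C = np.hstack([X, Z])
P = np.zeros((p + q, p + q))
P[p:, p:] = np.linalg.inv(Sigma_b)  # blockdiag(0, Sigma_b^{-1})
theta = np.linalg.solve(C.T @ C + lam * P, C.T @ Y)
beta_h, b_h = theta[:p], theta[p:]

# Standard GLS estimate of beta and BLUP of b, using V = Cov(Y).
V = sigma2_b * Z @ Sigma_b @ Z.T + sigma2_eps * np.eye(n)
V_inv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
b_blup = sigma2_b * Sigma_b @ Z.T @ V_inv @ (Y - X @ beta_gls)
```

Up to floating-point error, `beta_h` agrees with `beta_gls` and `b_h` with `b_blup`, which is the sense in which the penalized-spline fit with $\mathbf{D} = \boldsymbol{\Sigma}_b^{-1}$ and $\lambda = \sigma_\varepsilon^2 / \sigma_b^2$ is a BLUP.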
Slide 20
Selecting λ
1. cross-validation (CV)
2. generalized cross-validation (GCV)
3. ML or REML in mixed model framework
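As an illustration of option 2, the sketch below evaluates a common form of the GCV criterion, $\mathrm{GCV}(\lambda) = (\mathrm{RSS}/n) / (1 - \mathrm{df}/n)^2$ with $\mathrm{df} = \mathrm{tr}(S_\lambda)$, over a grid of $\lambda$ values. The simulated data, the knot placement, the grid, and the helper name `gcv_curve` are all assumptions made for the example; the slides do not specify an implementation.

```python
import numpy as np

def gcv_curve(C, y, penalty, lambdas):
    """Return GCV(lam) = (RSS/n) / (1 - df/n)^2 and df = trace(S_lam)."""
    n = len(y)
    scores, dfs = [], []
    for lam in lambdas:
        # Smoother matrix S_lam maps y to the fitted values.
        S = C @ np.linalg.solve(C.T @ C + lam * penalty, C.T)
        resid = y - S @ y
        df = np.trace(S)
        scores.append((resid @ resid / n) / (1.0 - df / n) ** 2)
        dfs.append(df)
    return np.array(scores), np.array(dfs)

# Simulated data and a linear-spline basis (assumptions for illustration).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
knots = np.linspace(0.0, 1.0, 22)[1:-1]  # 20 interior knots
C = np.column_stack([np.ones_like(x), x,
                     np.maximum(x[:, None] - knots[None, :], 0.0)])
penalty = np.diag([0.0, 0.0] + [1.0] * len(knots))  # blockdiag(0, I)

lambdas = 10.0 ** np.linspace(-4, 4, 41)
gcv, df = gcv_curve(C, y, penalty, lambdas)
lam_gcv = lambdas[np.argmin(gcv)]  # grid value minimizing GCV
```

The degrees of freedom `df` decrease from near the number of basis functions toward 2 (the straight-line fit) as `lam` grows, and `lam_gcv` is the grid point with the smallest criterion value.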
Slide 21
Selecting the Number of Knots
[Figure, n = 200, four panels: (a) SpaHet, j = 3, typical data set, y versus x with the true curve and the full-search fit; (b) MASE comparisons, relative MASE versus K (5, 20, 40, 80, 120) for fixed nknots, myopic, and full-search; frequency versus number of knots (coded); ASE for K = 40 versus ASE for K = 5]
Slide 22
[Figure, n = 2,000, four panels: (a) SpaHetLS, j = 3, n = 2,000, y versus x with the true curve and the full-search fit; (b) MASE comparisons, relative MASE versus K (5, 20, 40, 80, 120) for fixed nknots, myopic, and full-search; frequency versus number of knots (coded); ASE for K = 40 versus ASE for K = 5]
Slide 23
[Figure: MSE, bias, and variance plotted against $\mathrm{df}_{\mathrm{fit}}(\lambda)$ (0 to 25), with the optimal value marked]
n = 10,000, 20 knots, quadratic spline
Slide 24
Return to spinal bone mineral density study
[Figure: scatterplot of spinal bone mineral density versus age (years); age axis 10 to 25, density axis 0.6 to 1.4]
$$\mathrm{SBMD}_{i,j} = U_i + m(\mathrm{age}_{i,j}) + \varepsilon_{i,j},$$
$i = 1, \ldots, m = 230$, $j = 1, \ldots, n_i$.
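Slide 25
$$X = \begin{pmatrix}
1 & \mathrm{age}_{11} \\
\vdots & \vdots \\
1 & \mathrm{age}_{1 n_1} \\
\vdots & \vdots \\
1 & \mathrm{age}_{m1} \\
\vdots & \vdots \\
1 & \mathrm{age}_{m n_m}
\end{pmatrix}$$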
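Slide 26
$$Z = \begin{pmatrix}
1 & \cdots & 0 & (\mathrm{age}_{11} - \kappa_1)_+ & \cdots & (\mathrm{age}_{11} - \kappa_K)_+ \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
1 & \cdots & 0 & (\mathrm{age}_{1 n_1} - \kappa_1)_+ & \cdots & (\mathrm{age}_{1 n_1} - \kappa_K)_+ \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 1 & (\mathrm{age}_{m1} - \kappa_1)_+ & \cdots & (\mathrm{age}_{m1} - \kappa_K)_+ \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 1 & (\mathrm{age}_{m n_m} - \kappa_1)_+ & \cdots & (\mathrm{age}_{m n_m} - \kappa_K)_+
\end{pmatrix}$$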
Slide 27
$$u = \begin{pmatrix} U_1 \\ \vdots \\ U_m \\ b_1 \\ \vdots \\ b_K \end{pmatrix}$$
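The matrices on Slides 25 and 26 can be assembled directly from long-format data. The sketch below is a minimal illustration: the toy subject labels, ages, and knots are invented, and `mixed_model_matrices` is a hypothetical helper name, not code from the study.

```python
import numpy as np

def mixed_model_matrices(subject, age, knots):
    """Build X = [1, age] and Z = [subject indicators | (age - kappa_k)_+]."""
    subject = np.asarray(subject)
    age = np.asarray(age, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # index[r] is the position of row r's subject among the sorted unique ids.
    ids, index = np.unique(subject, return_inverse=True)
    X = np.column_stack([np.ones_like(age), age])
    Z_subject = np.zeros((age.size, ids.size))
    Z_subject[np.arange(age.size), index] = 1.0  # random-intercept columns
    Z_spline = np.maximum(age[:, None] - knots[None, :], 0.0)  # plus functions
    return X, np.hstack([Z_subject, Z_spline])

# Toy long-format data: three subjects with 2, 1, and 3 visits (made up).
subject = [1, 1, 2, 3, 3, 3]
age = [10.0, 11.0, 15.0, 20.0, 21.0, 22.0]
knots = [12.0, 18.0]
X, Z = mixed_model_matrices(subject, age, knots)
```

The columns of `Z` line up with the entries of $u$: first the $m$ subject intercepts $U_1, \ldots, U_m$, then the $K$ spline coefficients $b_1, \ldots, b_K$.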
Slide 28
[Figure: spinal bone mineral density (0.6 to 1.0) versus age (years, 10 to 25)]
Variability bars on $m$ and estimated density of $U_i$
Slide 29
Broken down by ethnicity
[Figure: spinal bone mineral density (0.6 to 1.4) versus age (years, 10 to 25), in four panels: Asian, Black, Hispanic, White]
Slide 30
Model with ethnicity effects
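$$\mathrm{SBMD}_{ij} = U_i + m(\mathrm{age}_{ij}) + \beta_1 \mathrm{black}_i + \beta_2 \mathrm{hispanic}_i + \beta_3 \mathrm{white}_i + \varepsilon_{ij}, \quad 1 \le j \le n_i, \ 1 \le i \le m.$$
Asian is the reference group.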
Slide 31
Only requires an expansion of the fixed effects by adding
the columns
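$$\begin{pmatrix}
\mathrm{black}_1 & \mathrm{hispanic}_1 & \mathrm{white}_1 \\
\vdots & \vdots & \vdots \\
\mathrm{black}_1 & \mathrm{hispanic}_1 & \mathrm{white}_1 \\
\vdots & \vdots & \vdots \\
\mathrm{black}_m & \mathrm{hispanic}_m & \mathrm{white}_m \\
\vdots & \vdots & \vdots \\
\mathrm{black}_m & \mathrm{hispanic}_m & \mathrm{white}_m
\end{pmatrix}$$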
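Since ethnicity is constant within a subject, the added columns are subject-level indicators with each row repeated $n_i$ times. A small sketch, with invented subjects and visit counts:

```python
import numpy as np

# Hypothetical subject-level data: one ethnicity label and visit count per subject.
ethnicity_by_subject = np.array(["Asian", "Black", "White"])
n_i = np.array([2, 1, 3])  # number of observations for each subject

# Asian is the reference group, so it gets no indicator column.
levels = ["Black", "Hispanic", "White"]
per_subject = np.column_stack(
    [(ethnicity_by_subject == level).astype(float) for level in levels]
)

# Repeat each subject's row n_i times to match the rows of X.
extra_columns = np.repeat(per_subject, n_i, axis=0)
```

Appending `extra_columns` to the fixed-effects design leaves the random-effects design unchanged, which is why the extension is described as requiring only extra fixed-effect columns.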
Slide 32
[Figure: contrast with Asian subjects, on a scale from 0.00 to 0.15, for Black, Hispanic, and White subjects]
Slide 33
• In this model, the age-effect curves for the four ethnic
groups are parallel.
• Could we model them as non-parallel?
• Might be problematic in this example because of the
small values of the ni.
• But the methodology should be useful in other
contexts.
Slide 34
• Add interactions between age and black, hispanic,
and white.
– These are fixed effects.
• Then add interactions between black, hispanic,
white, and asian and the linear plus functions in
age.
– These are mean-zero random effects with their own
variance component.
– This variance component controls the amount of
shrinkage of the ethnicity-specific curves to the
overall effect.
Slide 35
Penalized Splines and Additive Models
Additive model:
$$Y_i = m_1(X_{1,i}) + \cdots + m_P(X_{P,i}) + \varepsilon_i$$
Slide 36
Bivariate additive spline model
Yi = β0+βx,1Xi+ bx,1(Xi−κx,1)++ · · ·+ bx,K(Xi−κx,Kx)+