Implications of Longitudinal Data in Machine Learning for Medicine and Epidemiology
Billy Heung Wing Chang, Yanxian Chen, Mingguang He
Zhongshan Ophthalmic Center, Sun Yat-sen University
Biostatistics Seminar, Dalla Lana School of Public Health
Feb 3, 2015

Outline
1 Myopia and Myopia Prediction
2 Supervised Machine Learning and Prediction
3 Myopia Progression
4 Principal Component Analysis
5 Conclusion
Myopia
Commonly known as short-sightedness.
Measured by spherical equivalent (SE), in dioptres (D).
0 D: emmetropia (no myopia).
0 D to -3 D: low myopia.
Correctable by wearing glasses.
Morgan et al. (2012) Lancet 379:1739-1748

Myopia Progression
COMET (2013) IOVS 54:7871-7883
Emmetropic at early ages.
Myopia onset during elementary school.
Myopia stabilization during secondary school.
Age of onset, age of stabilization, and progression rates vary.

Risk Factors for Myopia
Largely believed to be genetic in the past.
Prevalence in certain countries has risen rapidly in recent years.
Lin et al. (2004) Ann Acad Med Singapore 33:27-33.
Education, near work, outdoor time.

High Myopia
An extreme level of myopia: SE < -6.0 D.
Increased risk of blindness.
Irreversible, hence the emphasis on prevention.

Preventive Treatment of High Myopia
To arrest myopia progression towards high myopia.
Popular treatments: atropine eye drops, specialized contact lenses.
Long-term treatment with risk of severe side effects.
Idea: target only children at risk of high myopia.

Prediction of Children At-Risk
Given SE at early ages (10-13 years old), predict the SE at age 15.

Supervised Machine Learning
Construct a prediction model based on a "training" sample of predictors and responses $\{x_i, y_i\}_{i=1}^N$, $(x_i, y_i) \sim (X, Y)$.
At prediction time: input the test case $x^{\mathrm{test}} = (x_1^{\mathrm{test}}, x_2^{\mathrm{test}}, \ldots)$ into the fitted model to obtain the prediction $\hat{y}^{\mathrm{test}}$.
E.g. linear regression:
    - $E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$
    - Training data $\{x_i, y_i\}_{i=1}^N$ are used to estimate $\beta_0, \beta_1, \beta_2, \ldots$
    - $\hat{y}^{\mathrm{test}} = \hat{\beta}_0 + \hat{\beta}_1 x_1^{\mathrm{test}} + \hat{\beta}_2 x_2^{\mathrm{test}} + \cdots$

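The train-then-predict scheme on this slide can be sketched for the simplest case of a single predictor. This is only an illustration: the function names `fit_simple_linear` and `predict`, and the training values, are made up for the example and do not come from the talk.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (b0, b1) for E(Y | X) = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x_test):
    """Plug a test predictor into the fitted model."""
    return b0 + b1 * x_test


# Hypothetical training sample (illustrative values only).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0]

b0, b1 = fit_simple_linear(train_x, train_y)
y_hat = predict(b0, b1, 5.0)
print(b0, b1, y_hat)
```

The coefficients are estimated once from the training sample; prediction only needs the fitted coefficients and the test predictor.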
Criterion for a Good Prediction Model
Generalization ability:
    - Can the model make accurate predictions for data not used for training?
    - Can the model be applied for prediction in the future?
    - Can the model be applied to other populations?

Existing Works
Follow the above scheme:
Training Data → Prediction Model → Prediction
Issues of generalization:
    - Must use data from the past.
    - Rely on population parameters.
Also need Y = endpoint SE: unrealistic.

Prediction using Longitudinal Data
With longitudinal data, we can extrapolate using SE at early ages.
Endpoint SE not needed for model building.
But this naive approach ignores myopia stabilization.
[Figure: two panels of SE against age.]

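The naive extrapolation can be sketched as below. This is a minimal illustration that assumes a straight-line fit to each child's own early measurements; the slides do not state the exact form of the fit. The function name `extrapolate_se` and the sample values are hypothetical.

```python
def extrapolate_se(ages, se_values, target_age):
    """Fit a least-squares line to one child's (age, SE) pairs and extend it to target_age."""
    n = len(ages)
    a_bar = sum(ages) / n
    s_bar = sum(se_values) / n
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (s - s_bar) for a, s in zip(ages, se_values))
    slope = sxy / sxx
    return s_bar + slope * (target_age - a_bar)


# Hypothetical early measurements for one child (not study data).
ages = [10.0, 11.0, 12.0]
se = [-1.0, -1.5, -2.0]

naive_se_15 = extrapolate_se(ages, se, 15.0)
print(naive_se_15)
```

Because the line is extended without limit, a child who progresses quickly at ages 10 to 12 is predicted to keep progressing at the same rate, which is the weakness the slide points out.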
Change-point Model
Myopia progression will stabilize during adolescence.
Use a change-point regression model to imitate stabilization.
[Figure: two panels of SE against age, each with the change-point marked.]

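One simple way to imitate stabilization is sketched below. It assumes, as an illustration only, that SE follows the fitted line up to the change-point age and stays constant afterwards; the slides do not spell out the exact model. The names `fit_line` and `change_point_predict` and the sample values are hypothetical.

```python
def fit_line(ages, se_values):
    """Return (intercept, slope) of the least-squares line SE = intercept + slope * age."""
    n = len(ages)
    a_bar = sum(ages) / n
    s_bar = sum(se_values) / n
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (s - s_bar) for a, s in zip(ages, se_values))
    slope = sxy / sxx
    return s_bar - slope * a_bar, slope


def change_point_predict(ages, se_values, change_point, target_age):
    """Linear progression until change_point, flat afterwards (assumed form)."""
    intercept, slope = fit_line(ages, se_values)
    # Beyond the change-point the prediction is frozen at its change-point value.
    effective_age = min(target_age, change_point)
    return intercept + slope * effective_age


# Hypothetical early measurements for one child (not study data).
ages = [10.0, 11.0, 12.0]
se = [-1.0, -1.5, -2.0]

cp_se_15 = change_point_predict(ages, se, change_point=14.0, target_age=15.0)
print(cp_se_15)
```

With a change-point later than the target age, the function reduces to the naive straight-line extrapolation.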
Mean-Matching for Change-point Selection
Fit regression using the available SE measures.
Mean SE at midpoint age (14 years) was estimated.
Fit change-point models using a range of change-points.
Choose the change-point with the averaged prediction values that best matched the regression-predicted mean at the midpoint age.
[Figure: two panels of SE against age, with the mid-point and the change-point marked.]

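The selection step can be sketched as a small search over candidate change-points. Several details here are assumptions made for illustration: the change-point model is taken to be linear then flat, the regression-predicted mean at the midpoint age is passed in as `target_mean` rather than computed (the slides do not describe that regression in detail), and all names and numbers are hypothetical.

```python
def fit_line(ages, se_values):
    """Return (intercept, slope) of the least-squares line SE = intercept + slope * age."""
    n = len(ages)
    a_bar = sum(ages) / n
    s_bar = sum(se_values) / n
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (s - s_bar) for a, s in zip(ages, se_values))
    slope = sxy / sxx
    return s_bar - slope * a_bar, slope


def change_point_predict(ages, se_values, change_point, target_age):
    """Linear progression until change_point, flat afterwards (assumed form)."""
    intercept, slope = fit_line(ages, se_values)
    return intercept + slope * min(target_age, change_point)


def select_change_point(children, candidates, midpoint_age, target_mean):
    """Pick the candidate whose average prediction at midpoint_age is closest to target_mean."""
    best_cp, best_gap = None, float("inf")
    for cp in candidates:
        preds = [change_point_predict(a, s, cp, midpoint_age) for a, s in children]
        gap = abs(sum(preds) / len(preds) - target_mean)
        if gap < best_gap:
            best_cp, best_gap = cp, gap
    return best_cp


# Hypothetical early measurements for two children (not study data).
children = [
    ([10.0, 11.0, 12.0], [-1.0, -1.5, -2.0]),
    ([10.0, 11.0, 12.0], [0.0, -0.25, -0.5]),
]

# target_mean = -1.6 is a made-up stand-in for the regression-predicted mean at age 14.
best = select_change_point(children, [12.0, 13.0, 14.0], 14.0, -1.6)
print(best)
```

The search only uses early measurements and one population-level summary, so no endpoint SE is needed to choose the change-point.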
Data Set
Training data: Guangzhou Twin Eye Study.
    - 1281 pairs of twins. First-born twins are considered for analysis.
    - Inclusion: 2nd follow-up SE before age 13; endpoint age > 15.
    - 72 subjects remain. Right-eye SE is used.
Validation data: Zhongshan Ophthalmic Center Optometry Clinic data.
    - 1573 subjects.
    - Same inclusion criteria as above.
    - 56 subjects remain. Left-eye SE is used.
Proposed methods are compared with the linear mixed effects (LME) model.

Results: Prediction MSE for Twin Data
[Figure: two rows of panels, "One Follow-up" and "Two Follow-up". Each row shows SE against age (10 to 16 years), followed by scatter plots of predicted SE against endpoint SE for the Naive, Change-Point, and LME methods.]

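The prediction MSE named in the slide title is the mean squared difference between predicted and observed endpoint SE. A minimal sketch follows; the function name `prediction_mse` and the numbers are hypothetical and are not results from the study.

```python
def prediction_mse(predicted, observed):
    """Mean squared error between predicted and observed endpoint SE."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Made-up values in dioptres, for illustration only.
predicted = [-3.0, -1.0, -4.5]
endpoint = [-2.5, -1.0, -5.0]

mse = prediction_mse(predicted, endpoint)
print(mse)
```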
Results: Validation on Optometry Clinic Data, Prediction MSE
[Figure: scatter plots of predicted SE against endpoint SE on the validation data, in two panels: "2 follow-up, Change-point" and "2 follow-up, LME".]

Brief Summary
A simple change-point model for future SE prediction.
Higher accuracy than the linear mixed effects model.
Potential reasons:
    - The change-point model accounts for myopia stabilization.
    - The linear mixed effects model contains many population parameters and may therefore lack generalization ability.

Analysis of Myopia Progression
To study the various aspects of myopia progression:
progression rate, myopia onset, myopia stabilization.
To identify factors associated with progression rate, onset and stabilization.

Existing Approach for Progression Modelling
Gompertz model.
A pre-defined model for modelling the entire progression.
Requires long-term follow-up data. Lack-of-fit issues.
Idea: perhaps with shorter-term follow-up data, we can still do some analysis?
COMET (2013) IOVS 54:7871-7883

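For readers unfamiliar with the Gompertz model, the generic three-parameter Gompertz curve is sketched below. This is the textbook form; the parametrization used in the COMET paper may differ, and the parameter names and values here are assumptions chosen only to show the shape (near zero early, then levelling off at a negative asymptote).

```python
import math


def gompertz(t, asymptote, b, c):
    """Generic Gompertz curve: asymptote * exp(-b * exp(-c * t))."""
    return asymptote * math.exp(-b * math.exp(-c * t))


# Made-up parameters: SE levels off at -4 D; t is years since a reference age.
se_early = gompertz(0.0, asymptote=-4.0, b=5.0, c=0.5)
se_late = gompertz(20.0, asymptote=-4.0, b=5.0, c=0.5)
print(se_early, se_late)
```

Because a single curve must describe onset, progression, and stabilization together, fitting it reliably needs observations across that whole range, which is the long-term follow-up requirement noted on the slide.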
Principal Component Analysis (PCA)
Let $x = (x_1, x_2, \ldots, x_P)^T \sim F$ with $E(x) = \vec{0}$, and let $\{x_i\}_{i=1}^N$ be i.i.d. samples from $F$.
1 Find a unit vector $v_1$ such that $\widehat{\mathrm{var}}(x^T v_1)$ is maximized.
2 Find a unit vector $v_2$, $v_2 \perp v_1$, such that $\widehat{\mathrm{var}}(x^T v_2)$ is maximized.
3 Repeat if necessary for $v_3$, $v_4$, etc.
$z_{ij} = x_i^T v_j$ is the $j$th principal component score for $x_i$.
[Figure: two 3-D scatter plots of the samples on axes x1, x2, x3, labelled Step 1 and Step 2.]

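The steps above can be sketched in plain Python. The direction that maximizes the sample variance of $x^T v$ is the leading eigenvector of the sample covariance matrix; here it is found by power iteration, with deflation for later components. Power iteration is a choice made for this sketch, not something the slides specify, and all function names and data are hypothetical.

```python
import math


def center(rows):
    """Subtract the column means so the sample has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenvector(mat, iters=200):
    """Power iteration: returns (unit vector, eigenvalue) for the largest eigenvalue."""
    p = len(mat)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(mat[a][b] * v[b] for b in range(p)) for a in range(p))
    return v, lam


def pca(rows, k):
    """Return the first k principal directions and the scores z_ij = x_i^T v_j."""
    x = center(rows)
    cov = covariance(x)
    p = len(cov)
    components = []
    for _ in range(k):
        v, lam = leading_eigenvector(cov)
        components.append(v)
        # Deflation removes the variance already explained, so the next
        # direction found is orthogonal to the ones before it.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(xi[a] * v[a] for a in range(p)) for v in components] for xi in x]
    return components, scores


# Made-up 2-D sample whose largest spread lies along the first axis.
data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
components, scores = pca(data, 2)
print(components)
print(scores)
```

This sketch can fail if the starting vector happens to be orthogonal to the leading direction; a library eigendecomposition would normally be used in practice.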
Principal Component Analysis
[Figure: 3-D scatter plot of the samples on axes x1, x2, x3, next to a 2-D scatter plot of the principal component scores z1 and z2.]

PCA on Longitudinal Data
What if PCA is applied to longitudinal data?
It finds major trends hidden within the data.
Revisit the Twin data set.
    - 637 first-born twins without a history of cataract surgery or loss of 3 consecutive visits.
    - Right-eye SE is used.
    - Missing data are imputed using linear regression.
Purpose: to identify major trends of myopia progression, and identify potential factors associated with those trends.

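PCA needs a complete row of SE values per child, one column per follow-up visit, so missing visits have to be filled first. The sketch below assumes the imputation regresses each child's observed SE on visit number; the slides only say "linear regression", so this per-child form is an assumption. The function name `impute_linear` and the values are hypothetical.

```python
def impute_linear(visits, se_values):
    """Fill None entries using a least-squares line through the observed visits."""
    observed = [(v, s) for v, s in zip(visits, se_values) if s is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed visits to fit a line")
    n = len(observed)
    v_bar = sum(v for v, _ in observed) / n
    s_bar = sum(s for _, s in observed) / n
    sxx = sum((v - v_bar) ** 2 for v, _ in observed)
    sxy = sum((v - v_bar) * (s - s_bar) for v, s in observed)
    slope = sxy / sxx
    return [s if s is not None else s_bar + slope * (v - v_bar)
            for v, s in zip(visits, se_values)]


# One hypothetical child with seven scheduled follow-ups and two missed visits.
visits = [1, 2, 3, 4, 5, 6, 7]
se = [0.0, -0.5, None, -1.5, -2.0, None, -3.0]

filled = impute_linear(visits, se)
print(filled)
```

Each completed row then becomes one observation $x_i$ for the PCA, with the follow-up visits playing the role of the variables.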
PCA on the Twin Data Set
[Figure: three panels (PC 1, PC 2, PC 3), each plotting PC loading against follow-up visit (1 to 7).]

PC Scores > 0.5 vs. < -0.5
PC2 represents rate of myopia progression.
PC3 with positive scores represents myopia stabilization.
PC3 with negative scores represents myopia onset.

Using the PC Scores as Responses in Regression
The PC scores $z_{ij} = x_i^T v_j$ are measures of the strength of the trends.
To identify risk factors for each trend, regress $z_{ij} = x_i^T v_j$ onto the predictors.

PCA scores and Risk Factors
Conclusion
Change-point model with longitudinal data: a prediction method with good generalization ability.
PCA: hypothesis-free approach to analyze longitudinal trends in myopia progression.
Hopefully, this presentation can suggest some ideas on how longitudinal data can be used for prediction, and how dimension reduction techniques can be used in longitudinal data analysis.