Implications of Longitudinal Data in Machine Learning for Medicine and Epidemiology

Billy Heung Wing Chang, Yanxian Chen, Mingguang He
Zhongshan Ophthalmic Center, Sun Yat-sen University

Biostatistics Seminar
Dalla Lana School of Public Health

Feb 3, 2015

Longitudinal Prediction Feb 3, 2015 1 / 33

Outline

1 Myopia and Myopia Prediction

2 Supervised Machine Learning and Prediction

3 Myopia Progression

4 Principal Component Analysis

5 Conclusion

Myopia

- Commonly known as short-sightedness.
- Measured by spherical equivalent (SE), in dioptres (D).
- 0 D: emmetropia (no myopia).
- Between 0 D and -3 D: low myopia.
- Correctable by wearing glasses.

Morgan et al. (2012) Lancet 379:1739-1748

Myopia Progression

COMET (2013) IOVS 54:7871-7883

- Emmetropic at early ages.
- Myopia onset during elementary school.
- Myopia stabilization during secondary school.
- Age of onset, age of stabilization, and progression rates vary between children.

Risk Factors for Myopia

- In the past, largely believed to be genetic.
- Prevalence has recently risen rapidly in certain countries.

Lin et al. (2004) Ann Acad Med Singapore 33:27-33

Environmental factors: education, near work, outdoor time.

High Myopia

- An extreme level of myopia: SE < -6.0 D.
- Increased risk of blindness.
- Irreversible, hence the emphasis on prevention.

Preventive Treatment of High Myopia

- Aim: to arrest myopia progression towards high myopia.
- Popular treatments: atropine eye drops, specialized contact lenses.

Shih et al. (2002) Acta Ophthalmologica Scandinavica 79:3, 233-236

- Long-term treatment with a risk of severe side effects.
- Idea: target only children at risk of high myopia.

Prediction of Children At-Risk

Given SE measured at early ages (10 to 13 years old), predict the SE at age 15.

Supervised Machine Learning

Construct a prediction model based on a "training" sample of predictors and responses $\{x_i, y_i\}_{i=1}^N$, $(x_i, y_i) \sim (X, Y)$.
At prediction time: input the test case $x^{test} = (x_1^{test}, x_2^{test}, \ldots)$ into the fitted model to obtain the prediction $y^{test}$.
E.g. linear regression:

- $E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$
- Training data $\{x_i, y_i\}_{i=1}^N$ are used to estimate $\beta_0, \beta_1, \beta_2, \ldots$
- $y^{test} = \hat\beta_0 + \hat\beta_1 x_1^{test} + \hat\beta_2 x_2^{test} + \cdots$
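The train-then-predict scheme for linear regression can be sketched in a few lines of numpy. This is a generic ordinary-least-squares illustration, not code from the talk; the function names and the noise-free toy data are hypothetical.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Estimate beta_0, beta_1, ..., beta_P by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta

def predict(beta, x_test):
    """y_test = beta_0 + beta_1 * x_test_1 + beta_2 * x_test_2 + ..."""
    x_test = np.atleast_2d(np.asarray(x_test, dtype=float))
    return beta[0] + x_test @ beta[1:]

# Toy training sample generated without noise from y = 1 + 2*x1 - 3*x2
X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]])
y_train = 1 + 2 * X_train[:, 0] - 3 * X_train[:, 1]
beta = fit_linear_regression(X_train, y_train)
y_test = predict(beta, [[4, 2]])  # 1 + 2*4 - 3*2 = 3
```

Only the training sample is used to estimate the coefficients; the test case enters only through `predict`.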

Criterion for a Good Prediction Model

Generalization ability:
- Can the model make accurate predictions for data not used for training?
- Can the model be applied for prediction in the future?
- Can the model be applied to other populations?
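One common way to quantify generalization is the mean squared prediction error (MSE) on cases the model never saw, which is also the criterion in the results slides. The sketch below, with hypothetical names and synthetic data, fits a line on one population and evaluates it both in-sample and on a second population whose intercept differs.

```python
import numpy as np

def prediction_mse(y_true, y_pred):
    """Mean squared prediction error over cases that were not used for training."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Fit a straight line on a training population generated from y = 1 + 2x ...
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = 1.0 + 2.0 * x_train
coef = np.polyfit(x_train, y_train, deg=1)  # [slope, intercept]

# ... then evaluate on a hypothetical new population where y = 2x (intercept differs).
x_new = np.array([0.5, 1.5, 2.5])
y_new = 2.0 * x_new
mse_train = prediction_mse(y_train, np.polyval(coef, x_train))  # essentially 0
mse_new = prediction_mse(y_new, np.polyval(coef, x_new))        # every prediction is off by 1
```

A perfect in-sample fit says nothing about the third question on this slide: the fitted intercept is a population parameter that does not transfer.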

Existing Works

Existing works follow the above scheme:

Training Data → Prediction Model → Prediction

Issues of generalization:
- Must use data from the past.
- Rely on population parameters.

They also need Y = endpoint SE in the training data, which is often unrealistic to obtain.

Prediction using Longitudinal Data

- With longitudinal data, we can extrapolate each child's trend from the SE measured at early ages.
- The endpoint SE is not needed for model building.
- But this naive approach ignores myopia stabilization.

[Figure: two panels plotting SE against Age.]
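A minimal sketch of the naive approach, assuming it means fitting a straight line to one child's own early SE measurements and extending that line to age 15. The function name and the example child are hypothetical.

```python
import numpy as np

def naive_extrapolate(ages, se, target_age=15.0):
    """Fit a straight line to one child's early SE measurements and extend it to target_age."""
    slope, intercept = np.polyfit(np.asarray(ages, float), np.asarray(se, float), deg=1)
    return intercept + slope * target_age

# Hypothetical child: -1.0 D at age 10, -2.0 D at age 12 (progressing 0.5 D per year).
pred = naive_extrapolate([10.0, 12.0], [-1.0, -2.0])  # -2.0 - 0.5 * 3 = -3.5 D
```

No endpoint SE is needed, but the child is assumed to keep progressing at the same rate all the way to age 15.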

Change-point Model

- Myopia progression stabilizes during adolescence.
- Use a change-point regression model to imitate stabilization.

[Figure: two panels plotting SE against Age, each marking a change-point.]
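One simple way to encode stabilization, offered as an assumption rather than the authors' exact model: progress linearly in age up to the change-point, then stay flat. The sketch also assumes all of the child's observed measurements fall before the change-point; names and data are hypothetical.

```python
import numpy as np

def change_point_predict(ages, se, change_point, target_age=15.0):
    """Linear progression up to change_point, then flat (stabilized) afterwards.

    Assumes the child's observed measurements all lie before the change-point.
    """
    slope, intercept = np.polyfit(np.asarray(ages, float), np.asarray(se, float), deg=1)
    return intercept + slope * min(target_age, change_point)

# Hypothetical child: -1.0 D at age 10, -2.0 D at age 12.
ages, se = [10.0, 12.0], [-1.0, -2.0]
pred_cp13 = change_point_predict(ages, se, 13.0)  # -2.0 - 0.5 * 1 = -2.5 D
pred_cp16 = change_point_predict(ages, se, 16.0)  # after age 15: same as the naive line, -3.5 D
```

With a change-point later than the target age, the model reduces to naive linear extrapolation.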

Mean-Matching for Change-point Selection

1. Fit a regression using the available SE measures.
2. Estimate the mean SE at the midpoint age (14 years).
3. Fit change-point models using a range of candidate change-points.
4. Choose the change-point whose average predicted SE at the midpoint age best matches the regression-estimated mean.

[Figure: two panels plotting SE against Age, marking the mid-point age and the change-point.]
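The mean-matching steps can be sketched as a grid search. The slides do not specify the form of the regression used to estimate the mean SE at age 14, so this sketch takes that estimate as an input (`midpoint_mean`). It also assumes a common change-point for all children, a per-child straight line before it and a flat line after it. All names and the toy data are hypothetical.

```python
import numpy as np

def change_point_predict(ages, se, change_point, target_age):
    """Per-child line in age, held flat after the change-point (assumed form)."""
    slope, intercept = np.polyfit(np.asarray(ages, float), np.asarray(se, float), deg=1)
    return intercept + slope * min(target_age, change_point)

def select_change_point(children, midpoint_mean, candidates, midpoint_age=14.0):
    """Return the candidate whose average predicted SE at midpoint_age is
    closest to midpoint_mean (the regression-estimated mean SE there).

    children: list of (ages, se) pairs holding each child's early measurements.
    """
    best_cp, best_gap = None, np.inf
    for cp in candidates:
        avg_pred = np.mean([change_point_predict(a, s, cp, midpoint_age)
                            for a, s in children])
        gap = abs(avg_pred - midpoint_mean)
        if gap < best_gap:
            best_cp, best_gap = cp, gap
    return best_cp

# Two hypothetical children, both progressing at -0.5 D per year.
children = [([10.0, 12.0], [-1.0, -2.0]),
            ([10.0, 12.0], [0.0, -1.0])]
candidates = np.arange(12.0, 15.01, 0.5)  # 12.0, 12.5, ..., 15.0
# -1.75 D is a hypothetical regression-estimated mean SE at age 14.
chosen = select_change_point(children, midpoint_mean=-1.75, candidates=candidates)
```

Here the average prediction at age 14 is 4.5 - 0.5 * min(14, cp), which equals -1.75 at cp = 12.5, so that candidate is selected. The endpoint SE is never used.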

Data Set

Training data: Guangzhou Twin Eye Study.
- 1281 pairs of twins; first-born twins are considered for analysis.
- Inclusion: 2nd follow-up SE measured before age 13; endpoint age > 15.
- 72 subjects remain. Right-eye SE is used.

Validation data: Zhongshan Ophthalmic Center Optometry Clinic data.
- 1573 subjects.
- Same inclusion criteria as above.
- 56 subjects remain. Left-eye SE is used.

The proposed methods are compared with a linear mixed effects model (LME).

Results: Prediction MSE for Twin Data

[Figure: SE vs. Age trajectories for subjects with one follow-up and with two follow-ups; scatter plots of predicted vs. endpoint SE for the Naive, Change-Point, and LME methods using 1 and 2 follow-ups.]

Results: Validation on Optometry Clinic Data, Prediction MSE

[Figure: scatter plots of predicted vs. endpoint SE on the validation data, using 2 follow-ups, for the Change-point and LME methods.]

Brief Summary

- A simple change-point model for future SE prediction.
- Higher prediction accuracy than the linear mixed effects model.
- Potential reasons:
  - The change-point model accounts for myopia stabilization.
  - The linear mixed effects model contains many population parameters and so lacks generalization ability.

Analysis of Myopia Progression

- To study various aspects of myopia progression: progression rate, myopia onset, and myopia stabilization.
- To identify factors associated with progression rate, onset, and stabilization.

Existing Approach for Progression Modelling

- Gompertz model.
- A pre-defined model for modelling the entire progression.
- Requires long-term follow-up data; lack-of-fit issues.
- Idea: perhaps with shorter-term follow-up data we can still do some analysis?

COMET (2013) IOVS 54:7871-7883

Principal Component Analysis (PCA)

Let $x = (x_1, x_2, \ldots, x_P) \sim F$ with $E(x) = \vec{0}$, and let $\{x_i\}_{i=1}^N$ be i.i.d. samples from $F$.

1. Find a unit vector $v_1$ such that $\widehat{\mathrm{var}}(x^T v_1)$ is maximized.
2. Find a unit vector $v_2$, $v_2 \perp v_1$, such that $\widehat{\mathrm{var}}(x^T v_2)$ is maximized.
3. Repeat if necessary for $v_3, v_4$, etc.

[Figure: panels "Step 1" and "Step 2", 3-D scatter plots of the sample in $(x_1, x_2, x_3)$ space.]

$z_{ij} = x_i^T v_j$ is the $j$th principal component score for $x_i$.
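The sequential maximization above is solved by the eigenvectors of the sample covariance matrix, taken in decreasing order of eigenvalue. A minimal numpy sketch follows; the function name and the synthetic data are hypothetical, and each $v_j$ is only determined up to sign.

```python
import numpy as np

def pca(X, n_components):
    """Return loadings V (P x k) and scores Z (N x k), with z_ij = x_i^T v_j."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre, so the sample mean is 0
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix (P x P)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]                    # unit, mutually orthogonal v_1, v_2, ...
    Z = Xc @ V                               # principal component scores
    return V, Z

# Synthetic 3-D sample whose spread is largest along x1 and smallest along x3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.3])
V, Z = pca(X, n_components=2)
```

`V[:, 0]` should point (up to sign) close to the $x_1$ axis, and the scores in `Z[:, 0]` should have the largest sample variance.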

Principal Component Analysis

[Figure: the sample plotted in $(x_1, x_2, x_3)$ space, and the same sample plotted by its first two principal component scores $(z_1, z_2)$.]

PCA on Longitudinal Data

- What if PCA is applied to longitudinal data? It finds major trends hidden within the data.
- Revisit the Twin data set:
  - 637 first-born twins with no history of cataract surgery and no 3 consecutive missed visits.
  - Right-eye SE is used.
  - Missing data are imputed using linear regression.
- Purpose: to identify major trends of myopia progression, and potential factors associated with those trends.
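To apply PCA to longitudinal data, each subject's SE at visits 1, 2, ... becomes one row, so each loading vector is a pattern over time. The slides do not say which linear regression was used for imputation; this sketch assumes a per-subject straight line in visit number, fitted to that subject's observed visits. The function names and the 4 x 4 toy matrix are hypothetical.

```python
import numpy as np

def impute_by_subject_line(se):
    """Fill NaNs in each row (subject) from a straight line fitted to that
    subject's observed visits. Assumes >= 2 observed visits per subject."""
    se = np.array(se, dtype=float)            # copy, so the input is not modified
    visits = np.arange(se.shape[1], dtype=float)
    for row in se:                            # each row is a view into se
        observed = ~np.isnan(row)
        slope, intercept = np.polyfit(visits[observed], row[observed], deg=1)
        row[~observed] = intercept + slope * visits[~observed]
    return se

def longitudinal_pca(se_matrix, n_components):
    """PCA with visits as variables: each loading vector is a trend over time."""
    centred = se_matrix - se_matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_components].T            # (visits x k), one column per trend
    scores = centred @ loadings               # strength of each trend per subject
    return loadings, scores

se = np.array([
    [ 0.0,   -0.5, np.nan, -1.5],
    [-1.0, np.nan,   -3.0, -4.0],
    [ 0.5,    0.0,  -0.25, np.nan],
    [-0.5,   -0.5,  -0.75, -1.0],
])
filled = impute_by_subject_line(se)
loadings, scores = longitudinal_pca(filled, n_components=2)
```

Plotting each column of `loadings` against visit number gives loading curves of the kind shown for the Twin data.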

PCA on the Twin Data Set

[Figure: loadings of PC 1, PC 2, and PC 3 plotted against follow-up visits 1 to 7 (y-axis: PC loading, from -0.4 to 0.6).]

PC Scores > 0.5 vs. < -0.5

- PC2 represents the rate of myopia progression.
- PC3 with positive scores represents myopia stabilization.
- PC3 with negative scores represents myopia onset.
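Interpretations like these come from comparing subjects at the two ends of a score. A minimal sketch, assuming the comparison is of average SE trajectories for subjects with score > 0.5 versus < -0.5 on one PC; whether the talk used raw or standardized scores is not stated. Names and data are hypothetical.

```python
import numpy as np

def mean_trajectories_by_score(se_matrix, scores, threshold=0.5):
    """Average SE trajectory of subjects with score > threshold vs. score < -threshold.

    A group with no members yields NaNs (numpy also warns about the empty mean).
    """
    se_matrix = np.asarray(se_matrix, dtype=float)
    scores = np.asarray(scores, dtype=float)
    high = se_matrix[scores > threshold].mean(axis=0)
    low = se_matrix[scores < -threshold].mean(axis=0)
    return high, low

# Hypothetical: 4 subjects x 3 visits, with their scores on one PC.
se_matrix = np.array([[-1.0, -2.0, -3.0],
                      [-1.0, -1.5, -2.0],
                      [-1.0, -1.2, -1.3],
                      [-1.0, -1.1, -1.1]])
pc_scores = np.array([0.9, 0.6, 0.1, -0.8])
high, low = mean_trajectories_by_score(se_matrix, pc_scores)
```

Subjects with scores between -0.5 and 0.5 are left out, which sharpens the contrast between the two groups.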

Using the PC Scores as Responses in Regression

- The PC scores $z_{ij} = x_i^T v_j$ measure the strength of the $j$th trend in subject $i$.
- To identify risk factors for each trend, regress $z_{ij} = x_i^T v_j$ onto the predictors.
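Because the scores are ordinary numeric responses, each trend can be regressed on candidate risk factors with standard least squares. The sketch below fits several PC-score columns at once (one set of coefficients per trend). The predictors, coefficient values, and function name are hypothetical, and the data are noise-free so the fit can be checked. Standard errors and p-values would come from a statistics package, not from this point-estimate sketch.

```python
import numpy as np

def regress_scores(scores, predictors):
    """OLS of each PC score column on the predictors.

    scores: (N x k) PC scores; predictors: (N x Q).
    Returns a ((Q + 1) x k) array: row 0 = intercepts, column j = trend j.
    """
    design = np.column_stack([np.ones(len(scores)), np.asarray(predictors, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, np.asarray(scores, dtype=float), rcond=None)
    return coef

# Synthetic data with two hypothetical predictors (e.g. hours outdoors, hours of near work).
rng = np.random.default_rng(1)
predictors = rng.normal(size=(50, 2))
true_coef = np.array([[0.2, -0.1],   # intercepts for two PC-score columns
                      [-0.5, 0.0],   # effect of predictor 1 on each trend
                      [0.3, 0.4]])   # effect of predictor 2 on each trend
scores = np.column_stack([np.ones(50), predictors]) @ true_coef  # noise-free for checking
coef = regress_scores(scores, predictors)
```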

PCA scores and Risk Factors

Conclusion

- Change-point model with longitudinal data: a prediction method with good generalization ability.
- PCA: a hypothesis-free approach to analyze longitudinal trends in myopia progression.
- Hopefully, this presentation suggests some ideas on how longitudinal data can be used for prediction, and how dimension reduction techniques can be used in longitudinal data analysis.
