Statistical Issues in Development and Evaluation of Genetic Risk Prediction Models Nilanjan Chatterjee, PhD Chief and Senior Investigator Biostatistics Branch, Division of Cancer Epidemiology and Genetics

Dec 21, 2015


Thanks to team science!

• Biostatistics Branch: JuHyun Park, Fellow; Paige Maas, Fellow; Jianxin Shi, TT Investigator; Joshua Sampson, TT Investigator; Bin Zhu, TT Investigator; Mitchell Gail, Investigator; Minsun Song, Fellow

• DCEG: Stephen Chanock, Director; Nat Rothman, Investigator; Debra Silverman, Investigator

• Other institutions/collaborations: Peter Kraft, HSPH; Montserrat Garcia-Closas, ICR, UK; Cambridge University, UK; German Cancer Research Center; BPC3 Consortium; BCAC Consortium


Utility of Risk Models

• Individual counseling
– Weighing risks and benefits of various preventive interventions (screening, medication, risk-factor modification)

• Understanding the distribution of risk at the population level to inform public health strategies for prevention

• Comparative effectiveness studies

• Design of intervention trials


Methodological Issues

• Sample size and study design

• Model building
– Polygenic risk score (PRS)
– Incorporating environmental risk factors
– Using external information
– Model calibration

• Model validation and evaluation


Limited Discriminatory Ability of Early GWAS Discoveries

“A tiny step to personalized risk prediction of breast cancer” - Devilee and Rookus, NEJM, Editorial


Many more to be found


Utility of Foreseeable Cancer SNPs (AUC)

Cancer site | Family history only | Known SNPs | Foreseeable SNPs | Family history and known SNPs | Family history and foreseeable SNPs | Epidemiologic risk factors and foreseeable SNPs
Breast | 0.536 | 0.599 | 0.635 | 0.613 | 0.646 | 0.670
Prostate | 0.549 | 0.647 | 0.676 | 0.668 | 0.694 | -
Colorectum | 0.528 | 0.582 | 0.616 | 0.598 | 0.629 | 0.658
Ovary | 0.509 | 0.557 | 0.568 | 0.564 | 0.575 | -
Bladder | 0.514 | 0.596 | 0.615 | 0.602 | 0.620 | 0.726
Glioma | 0.503 | 0.597 | 0.621 | 0.598 | 0.622 | -
Pancreas | 0.517 | 0.576 | 0.600 | 0.588 | 0.610 | -

Park et al., JCO, 2012
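The values in this table read as discrimination measures on the area-under-the-ROC-curve (AUC) scale, where 0.5 means no discrimination; that reading is inferred from their range and from the AUC tables later in the talk. As a reminder of what the number measures, here is a minimal Python sketch of the empirical AUC written as a Mann-Whitney probability. The function name and the toy scores are hypothetical, and the double loop is O(n·m), chosen for clarity rather than speed.

```python
def empirical_auc(case_scores, control_scores):
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control, counting ties as 1/2 (the Mann-Whitney form of the AUC)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk scores, for illustration only
cases = [0.9, 0.7, 0.6, 0.4]
controls = [0.8, 0.5, 0.3, 0.2, 0.1]
print(empirical_auc(cases, controls))  # 16 of 20 case-control pairs ordered correctly: 0.8
```

An AUC of 0.6, typical of the single-source columns above, means a randomly chosen case outranks a randomly chosen control only 60% of the time.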


Hidden Heritability for Complex Traits

• Heritability: fraction of the total variance attributable to genetic susceptibility (quantitative traits), or its analogue based on sibling recurrence risks (qualitative traits)

Trait | HT | BMI | TC | HDL | LDL | CD | T1D | T2D | PrCA | CAD
Narrow-sense heritability (h_g^2) | 0.45 | 0.14 | - | 0.12 | - | 0.22 | 0.30 | 0.51 | 0.22 | -
Effective sample size of the largest GWAS | 133K | 162K | 100K | 100K | 95K | 25K | 22K | 36K | 28K | 73K
No. of detected SNPs | 108 | 31 | 45 | 35 | 36 | 64 | 30 | 22 | 20 | 21
Heritability explained by detected SNPs | 0.066 | 0.014 | 0.063 | 0.046 | 0.059 | 0.066 | 0.053 | 0.034 | 0.061 | 0.024
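For a quantitative trait, one standard way to obtain "heritability explained by detected SNPs" is to add up the variance each SNP contributes under an additive model. The sketch below assumes independent SNPs in Hardy-Weinberg equilibrium and per-allele effects on the trait scale. The function name and the frequencies and effects are hypothetical, and the table's own values may have been computed differently (for binary traits, for instance, on a liability scale, which this sketch does not cover).

```python
def variance_explained(freqs, betas, var_y=1.0):
    """Fraction of Var(Y) explained by independent SNPs under an additive model in
    Hardy-Weinberg equilibrium: sum_j 2 * p_j * (1 - p_j) * beta_j**2 / Var(Y)."""
    if len(freqs) != len(betas):
        raise ValueError("freqs and betas must have the same length")
    return sum(2 * p * (1 - p) * b * b for p, b in zip(freqs, betas)) / var_y

# Hypothetical detected SNPs: risk-allele frequencies and per-allele effects (trait SD units)
freqs = [0.5, 0.2, 0.1]
betas = [0.05, 0.03, 0.04]
print(variance_explained(freqs, betas))  # about 0.0018, i.e. 0.18% of trait variance
```

Effects of a few hundredths of a standard deviation, typical of GWAS hits, explain very little variance individually, which is why dozens of detected SNPs still leave most of the heritability unexplained.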


Challenges

• Many loci with very small effects are undetectable at the genome-wide significance level

• Can we still exploit them to improve risk prediction?
– Using a more liberal threshold or a fancier penalized regression method?

• Needs an understanding of "power" in the context of prediction


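Predictive Correlation Coefficient (PCC)

• Correlation between the outcome of a new individual and its prediction from a model built on a "training" dataset

– Covariances and variances are taken with respect to the randomness of the "new" observation for which prediction is desired

– The remaining randomness is due to the "training" dataset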
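PCC can be made concrete with a small simulation: estimate PRS weights on a training set, correlate the PRS with the outcome in independent new individuals, and average over replicate training sets to approximate the expectation over the remaining (training) randomness. This sketch assumes a quantitative trait, independent SNPs with equal, hypothetical effects, and single-SNP regression slopes as PRS weights. The function names and parameter values are illustrative, not taken from the talk.

```python
import math
import random

def simulate(n, freqs, betas, h2, rng):
    """Draw n individuals: genotypes G_j ~ Binomial(2, p_j) and a quantitative trait
    Y = sum_j beta_j * G_j + noise, with noise variance set so the SNPs explain h2 of Var(Y)."""
    genetic_var = sum(2 * p * (1 - p) * b * b for p, b in zip(freqs, betas))
    noise_sd = math.sqrt(genetic_var * (1 - h2) / h2)
    data = []
    for _ in range(n):
        g = [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        y = sum(b * x for b, x in zip(betas, g)) + rng.gauss(0.0, noise_sd)
        data.append((g, y))
    return data

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def marginal_weights(train):
    """PRS weights: least-squares slope of Y on each SNP separately."""
    ys = [y for _, y in train]
    my = sum(ys) / len(ys)
    weights = []
    for j in range(len(train[0][0])):
        gs = [g[j] for g, _ in train]
        mg = sum(gs) / len(gs)
        sgg = sum((x - mg) ** 2 for x in gs)
        sgy = sum((x - mg) * (y - my) for x, y in zip(gs, ys))
        weights.append(sgy / sgg if sgg > 0 else 0.0)
    return weights

def pcc(train, test):
    """Correlation between Y and the PRS in new individuals; weights come from train only."""
    w = marginal_weights(train)
    prs = [sum(wj * x for wj, x in zip(w, g)) for g, _ in test]
    return corr(prs, [y for _, y in test])

def expected_pcc(n_train, reps=3, n_test=1000, m=50, freq=0.3, h2=0.5, seed=0):
    """Average PCC over replicate training sets (the remaining, training randomness)."""
    rng = random.Random(seed)
    freqs, betas = [freq] * m, [1.0] * m  # hypothetical: m SNPs with equal effects
    values = []
    for _ in range(reps):
        train = simulate(n_train, freqs, betas, h2, rng)
        test = simulate(n_test, freqs, betas, h2, rng)
        values.append(pcc(train, test))
    return sum(values) / reps

small = expected_pcc(n_train=50)
large = expected_pcc(n_train=2000)
print(round(small, 2), round(large, 2))
```

Under these assumptions the population PCC cannot exceed sqrt(h2), about 0.71 here; the larger training set gets much closer to that ceiling, which is the sense in which training sample size plays the role of "power" for prediction.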


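The Expected PCC Value for GWAS Polygenic Models

• The expected PCC depends on:
– Parameters of the genetic architecture (e.g., number of susceptibility SNPs and distribution of their effect sizes)
– Properties of the statistical method (e.g., training sample size N and significance threshold α used to select SNPs)

• For fixed N, the optimal threshold α_opt(N) can be chosen by maximizing the expected PCC, ρ(N, α)

Chatterjee et al., Nature Genetics, 2013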
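The trade-off behind choosing α for a fixed N can be explored by simulation. This sketch is not the analytic formula of Chatterjee et al. (2013). It assumes independent standardized SNPs, a standardized quantitative trait, equal effects at a hypothetical number of causal SNPs, and training estimates drawn as β̂_j ~ N(β_j, 1/N). Under those assumptions the PCC for a given training set has a closed form, the sum of β_j·β̂_j over the selected SNPs divided by the square root of the sum of β̂_j², so only the training randomness needs to be simulated. All names and parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist

def expected_pcc_by_alpha(alphas, n_train, m_total, m_causal, h2, reps=10, seed=0):
    """Monte Carlo estimate of the expected PCC of a thresholded PRS, rho(N, alpha).

    Simplifying assumptions (hypothetical, for illustration only):
    - independent SNPs with standardized genotypes and a standardized trait (Var(Y) = 1);
    - m_causal SNPs share the heritability h2 equally, the rest have zero effect;
    - training estimates follow beta_hat_j ~ N(beta_j, 1 / n_train).
    """
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n_train)
    betas = [math.sqrt(h2 / m_causal)] * m_causal + [0.0] * (m_total - m_causal)
    # |beta_hat| cut-off corresponding to a two-sided test at level alpha
    cutoffs = {a: NormalDist().inv_cdf(1.0 - a / 2.0) * se for a in alphas}
    totals = dict.fromkeys(alphas, 0.0)
    for _ in range(reps):
        hats = [b + rng.gauss(0.0, se) for b in betas]  # one simulated training set
        for a in alphas:
            num = den = 0.0
            for b, bh in zip(betas, hats):
                if abs(bh) > cutoffs[a]:
                    num += b * bh   # Cov(Y, PRS) over a new individual
                    den += bh * bh  # Var(PRS) over a new individual
            totals[a] += num / math.sqrt(den) if den > 0 else 0.0
    return {a: t / reps for a, t in totals.items()}

alphas = [1e-7, 1e-5, 1e-4, 1e-3, 1e-2, 0.05, 1.0]
rho = expected_pcc_by_alpha(alphas, n_train=10_000, m_total=10_000, m_causal=200, h2=0.5)
alpha_opt = max(rho, key=rho.get)
for a in alphas:
    print(f"alpha={a:g}  expected PCC={rho[a]:.3f}")
print("alpha_opt:", alpha_opt)
```

With these hypothetical settings, α = 10⁻⁷ discards many true signals while α = 1 admits thousands of null SNPs, so the maximum lies in between; changing m_causal or n_train moves α_opt.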


Further Results

• Many measures of the discriminatory performance of a risk model have a one-to-one relationship with PCC

• The performance of models that include a polygenic risk score (PRS) and family history can be projected
– The family history effect is attenuated by a quantity related to PCC

Chatterjee et al., Nature Genetics, 2013
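The mapping itself is not given on the slide. One well-known instance of such a one-to-one relationship, offered as an illustration and not necessarily the exact result of Chatterjee et al. (2013): if a risk score on the log relative-risk scale is normally distributed with standard deviation σ and the disease is rare, cases' scores are shifted by σ², so AUC = Φ(σ/√2). The sketch below computes this and checks it by simulation; the function names and settings are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist

def auc_from_sd(sigma):
    """AUC implied by a log relative-risk score that is N(0, sigma^2) in the population,
    assuming a rare disease: cases ~ N(sigma^2, sigma^2), controls ~ N(0, sigma^2)."""
    return NormalDist().cdf(sigma / math.sqrt(2.0))

def auc_monte_carlo(sigma, n=20000, seed=1):
    """Simulation check. With RR = exp(score), the case score density is proportional to
    exp(s) times the population density, so population draws are weighted by exp(s);
    controls are drawn from the population (rare-disease approximation)."""
    rng = random.Random(seed)
    pop = sorted(rng.gauss(0.0, sigma) for _ in range(n))
    tail = [0.0] * (n + 1)  # tail[i] = total case weight of pop[i:]
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + math.exp(pop[i])
    controls = [rng.gauss(0.0, sigma) for _ in range(n)]
    # For each control, the weighted share of case scores strictly above it
    above = sum(tail[bisect.bisect_right(pop, t)] for t in controls)
    return above / (tail[0] * n)

for sigma in (0.25, 0.5, 1.0):
    print(sigma, round(auc_from_sd(sigma), 3), round(auc_monte_carlo(sigma), 3))
```

Because Φ is monotone, any quantity that pins down σ also pins down the AUC, which is what makes projections of one performance measure translate into the others.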


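AUC (Cont'd)

FH: family history; N: current sample size; α: significance threshold for including SNPs (α_OPT: optimal threshold).

Trait (AUC with FH alone) | Model | N, α=10⁻⁷ | N, α_OPT | 3×N, α=10⁻⁷ | 3×N, α_OPT | 5×N, α=10⁻⁷ | 5×N, α_OPT
T2D (0.595) | SNPs | 0.570 | 0.598 | 0.617 | 0.704 | 0.660 | 0.750
T2D (0.595) | SNPs+FH | 0.632 | 0.654 | 0.667 | 0.736 | 0.700 | 0.776
PrCA (0.552) | SNPs | 0.621 | 0.625 | 0.637 | 0.648 | 0.646 | 0.673
PrCA (0.552) | SNPs+FH | 0.648 | 0.651 | 0.661 | 0.670 | 0.669 | 0.692
CAD (0.601) | SNPs | 0.582-0.584 | 0.587-0.589 | 0.595-0.604 | 0.612-0.650 | 0.603-0.629 | 0.635-0.676
CAD (0.601) | SNPs+FH | 0.647-0.648 | 0.651-0.652 | 0.656-0.663 | 0.669-0.697 | 0.663-0.681 | 0.686-0.717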


Architecture of Joint Effects: Implications for Disease Prevention


Breast Cancer Risk Modeling: BPC3 Study

• 17,176 cases and 19,860 controls from 8 prospective studies

• Risk factors
– Family history, height, reproductive risk factors, smoking, BMI, alcohol and HRT use

• SNPs
– 24 genotyped SNPs; imputed PRS for 86 SNPs


Steps for Building an Absolute Risk Model and Projecting the Risk Distribution

• Develop models for relative risk
– Construction of an efficient PRS; model selection for gene-gene/gene-environment interactions

• Utilize incidence rates from the SEER cancer registry to calibrate absolute risk to the US population

• Use national survey data to project the risk distribution
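The calibration step can be sketched with the piecewise-constant hazard approach used in Gail-type absolute risk models. This is a minimal sketch under stated assumptions, not the BPC3 implementation: the baseline hazard in each age interval is the registry incidence divided by a population-average relative risk, and death from other causes is treated as a competing risk. All rates and function names are hypothetical; SEER incidence and mortality tables are not reproduced here.

```python
import math

def baseline_hazards(pop_incidence, mean_rr):
    """Calibrate baseline hazards to registry incidence: h0 = lambda_pop / E[RR],
    equivalently lambda_pop * (1 - AR) with attributable risk AR = 1 - 1 / E[RR].
    E[RR] is the population-average relative risk (a single hypothetical value here;
    in practice it can vary by age)."""
    return [lam / mean_rr for lam in pop_incidence]

def absolute_risk(rr, h0, mortality, widths):
    """Probability of developing the disease over consecutive age intervals, with
    piecewise-constant hazards and death from other causes as a competing risk."""
    risk, surv = 0.0, 1.0  # surv: probability of being alive and disease-free
    for h, mu, w in zip(h0, mortality, widths):
        total = h * rr + mu
        if total > 0:
            risk += surv * (h * rr / total) * (1.0 - math.exp(-total * w))
        surv *= math.exp(-total * w)
    return risk

# Hypothetical annual rates for ages 50-54 and 55-59 (not SEER values)
incidence = [0.002, 0.003]  # population disease incidence
mortality = [0.004, 0.006]  # competing mortality
widths = [5.0, 5.0]         # interval lengths in years
h0 = baseline_hazards(incidence, mean_rr=1.25)
for rr in (0.5, 1.0, 2.0):
    print(rr, round(absolute_risk(rr, h0, mortality, widths), 4))
```

Dividing by the average relative risk keeps the model's population-average incidence in line with the registry, so individual relative risks from the case-control or cohort data translate into absolute risks for the target population.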


Gene-gene/Gene-Environment Interactions in Disease Risk

• Interaction on what scale?
– Logistic, probit (liability threshold), additive…

• Little evidence of SNP-SNP/SNP-E interactions on the logistic scale
– Lack of power, or are risks truly multiplicative?
– Does the scale matter?

• Important to have a good model fit at the extremes of disease risk
– Clinically important
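Why the scale matters shows up already with two factors: if they combine without interaction on the multiplicative (logistic) scale, the same joint risk shows a positive interaction on the additive scale, and vice versa. The sketch below uses the relative excess risk due to interaction (RERI) as the additive-scale measure and the ratio RR_AB / (RR_A · RR_B) as the multiplicative one. The function names and RR values are hypothetical.

```python
def joint_rr(rr_a, rr_b, scale):
    """Relative risk for carrying both factors when there is no interaction on the
    given scale (all RRs relative to carrying neither factor)."""
    if scale == "multiplicative":
        return rr_a * rr_b
    if scale == "additive":
        return rr_a + rr_b - 1.0
    raise ValueError(f"unknown scale: {scale!r}")

def reri(rr_ab, rr_a, rr_b):
    """Relative excess risk due to interaction: 0 means no additive-scale interaction."""
    return rr_ab - rr_a - rr_b + 1.0

def multiplicative_interaction(rr_ab, rr_a, rr_b):
    """Ratio of RRs: 1 means no multiplicative-scale interaction."""
    return rr_ab / (rr_a * rr_b)

rr_snp, rr_env = 1.5, 2.0  # hypothetical RRs for a SNP genotype and an exposure
for scale in ("multiplicative", "additive"):
    rr_both = joint_rr(rr_snp, rr_env, scale)
    print(scale, rr_both, reri(rr_both, rr_snp, rr_env),
          multiplicative_interaction(rr_both, rr_snp, rr_env))
```

With RRs of 1.5 and 2.0, the multiplicative null gives a joint RR of 3.0 and RERI = 0.5, while the additive null gives a joint RR of 2.5 and a ratio of about 0.83. Whenever both factors have an effect, absence of interaction on one scale implies interaction on the other.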


Linear Logistic vs Linear Additive Null Models

• Linear logistic

• Linear additive

• The additive model can be fitted on the logistic scale under the rare-disease assumption
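The model formulas on this slide did not survive extraction. A common way to write the two null (no-interaction) models is shown below; the notation is an assumption rather than the slide's exact formulas: G_j is the risk-allele count at SNP j, K the number of SNPs, and R_0 the baseline risk. Under the rare-disease assumption logit(p) ≈ log(p), so the additive model becomes a logistic model whose predictor is an intercept plus log(1 + Σ θ_j G_j).

```latex
\begin{aligned}
\text{Linear logistic:}\quad & \operatorname{logit}\Pr(D=1 \mid G) = \beta_0 + \sum_{j=1}^{K} \beta_j G_j \\
\text{Linear additive:}\quad & \Pr(D=1 \mid G) = R_0 \Bigl(1 + \sum_{j=1}^{K} \theta_j G_j\Bigr) \\
\text{Rare disease } (\mathrm{OR} \approx \mathrm{RR}):\quad & \operatorname{logit}\Pr(D=1 \mid G) \approx \log R_0 + \log\Bigl(1 + \sum_{j=1}^{K} \theta_j G_j\Bigr)
\end{aligned}
```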


[Figure: log OR by number of risk alleles at the 19 loci, with a histogram of the frequency of each risk-allele count]



A Tail-based Goodness-of-fit Test (also a global test for interaction)

Song et al. (Biostatistics, In Press)


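Goodness-of-fit test p-values

Columns: (1) multiplicative model, complete-case analysis; (2) multiplicative model, analysis including subjects with missing genotypes; (3) additive model, complete-case analysis.

Test | (1) Hom OR | (1) Het OR | (2) Hom OR | (2) Het OR | (3) Hom OR | (3) Het OR
Hosmer and Lemeshow test | 0.11 | 0.87 | - | - | 0.0003 | 0.01
Tail-based test, C=25 | 0.11 | 0.85 | 0.16 | 0.11 | 0 | 0
Tail-based test, C=100 | 0.20 | 0.77 | 0.23 | 0.17 | 0 | 0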
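The tail-based test of Song et al. is not specified on these slides, so only its comparator is sketched here: a minimal Hosmer-Lemeshow test as it is commonly defined, with groups formed from deciles of predicted risk and a chi-square reference on g - 2 degrees of freedom. The helper names and simulated data are hypothetical, and the closed-form chi-square tail used here holds only for even degrees of freedom, which covers the usual ten groups.

```python
import math
import random

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of
    degrees of freedom: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic with (nearly) equal-size groups formed by sorting on
    predicted risk. Returns (statistic, df, p-value) with df = groups - 2; the p-value
    helper needs an even df, i.e. an even number of groups."""
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        m = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        pbar = expected / m
        stat += (observed - expected) ** 2 / (m * pbar * (1.0 - pbar))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)

rng = random.Random(7)
probs = [rng.uniform(0.05, 0.45) for _ in range(5000)]       # hypothetical predicted risks
calibrated = [int(rng.random() < p) for p in probs]          # outcomes follow the model
underpredicted = [int(rng.random() < 2 * p) for p in probs]  # true risk is twice the prediction
print(hosmer_lemeshow(probs, calibrated))
print(hosmer_lemeshow(probs, underpredicted))
```

Because the groups are deciles of predicted risk, a misfit confined to a small extreme tail is diluted within the top decile, which motivates a test that looks at the tails directly.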


Statistically Speaking…

• The multiplicative model could not be rejected even with a large dataset and a powerful method
– Fit seems adequate even at the extremes

• A modest departure cannot be ruled out

• The additive model is soundly rejected
– Implies a plethora of gene-gene interactions on the additive scale


Does the Scale Matter Clinically?

• Stronger risk variation (risk stratification) under the multiplicative model than under the additive model

• Proportion of the population identified as having twofold or higher than average risk:
– 1.16% under the multiplicative model
– 0.02% under the additive model

• Correlation between the PRSs under the two models = 0.93 (AUC is hardly different)
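The tail gap between the two models can be illustrated with a deliberately simple calculation: identical, independent SNPs, so the risk-allele count K is binomial and the relative risk is either r^K (multiplicative) or 1 + (r - 1)K (additive). The number of SNPs, allele frequency and per-allele RR below are hypothetical, not the BPC3 estimates, so the output will not reproduce the 1.16% and 0.02% on the slide; it only shows how the convex multiplicative model places more people above a twofold threshold.

```python
from math import comb

def prop_high_risk(n_snps, freq, per_allele_rr, fold=2.0):
    """Share of the population at >= `fold` times the average risk when risk depends on
    the risk-allele count K ~ Binomial(2 * n_snps, freq) of identical, independent SNPs."""
    n = 2 * n_snps
    pmf = [comb(n, k) * freq**k * (1 - freq) ** (n - k) for k in range(n + 1)]
    models = {
        "multiplicative": [per_allele_rr**k for k in range(n + 1)],       # RR = r^K
        "additive": [1 + (per_allele_rr - 1) * k for k in range(n + 1)],  # RR = 1 + (r - 1) K
    }
    result = {}
    for name, risk in models.items():
        average = sum(p * r for p, r in zip(pmf, risk))
        result[name] = sum(p for p, r in zip(pmf, risk) if r >= fold * average)
    return result

res = prop_high_risk(n_snps=19, freq=0.3, per_allele_rr=1.1)  # hypothetical parameters
print(res)
```

Both models rank people identically by K here, so their AUCs coincide, yet the share above twice the average risk differs by orders of magnitude; this is the slide's point that AUC hides differences in risk stratification.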


Concluding Remarks

• Translating heritability into predictability is hard
– Due to the highly polygenic (non-sparse) architecture

• The multiplicative model for gene-gene and gene-environment interactions works amazingly well

• Time to think seriously about the public health implications of joint effects
– Evaluate risk stratification
– Stop using AUC
