Association Analysis of Rare Genetic Variants. Qunyuan Zhang, Division of Statistical Genomics, Course M21-621 Computational Statistical Genetics

Feb 06, 2016


Transcript
Page 1: Association Analysis of Rare Genetic Variants


Page 2: Rare Variants

Low allele frequency: usually less than 1%

Low power: for most analyses, because so few subjects carry each variant that the observations show little variation

High false positive rate: for some model-based analyses, because sparse data lead to unstable or biased parameter estimates and inflated test statistics (p-values that are too small)

Page 3: An Example of Low Power

Jonathan C. Cohen, et al. Science 305, 869 (2004)

Page 4: An Example of High False Positive Rate

Q-Q plots from GWAS data (unpublished). [Figure: four Q-Q panels: N≈2500, MAF>0.03; N≈2500, MAF<0.03; N≈2500, MAF<0.03, permuted; N=50000, MAF<0.03, bootstrapped]
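The Q-Q plots on this slide compare observed p-values with those expected under the null. A common one-number summary of the inflation they show is the genomic-control factor λ: the median observed 1-df χ² statistic divided by its null median (about 0.455). The sketch below is a minimal stdlib version, assuming two-sided p-values from 1-df tests; the function names are illustrative, not from the slides.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)

# Median of a 1-df chi-square variable: the squared 75th percentile of N(0, 1), about 0.455.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_1df_from_p(p):
    """1-df chi-square statistic implied by a two-sided p-value (0 < p <= 1)."""
    z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(pvalues):
    """Genomic-control lambda: median observed chi-square over its null median."""
    return median([chi2_1df_from_p(p) for p in pvalues]) / CHI2_1DF_MEDIAN


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs for a Q-Q plot, most significant first."""
    n = len(pvalues)
    observed = sorted(pvalues)
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(observed)]
```

A value of λ near 1 matches the null; values well above 1 correspond to the upward-bending Q-Q curves seen for the rare-variant panels.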

Page 5: Three Levels of Rare Variant Data

Level 1: Individual-level

Level 2: Summarized over subjects

Level 3: Summarized over both subjects and variants

Page 6: Level 1, individual-level

Subject  V1  V2  V3  V4  Trait-1  Trait-2
1        1   0   0   0    90.1    1
2        0   1   0   .    99.2    1
3        0   0   0   0   105.9    0
4        0   0   0   0    89.5    0
5        0   .   0   0    97.6    0
6        0   0   0   0   110.5    0
7        0   0   1   0    88.8    0
8        0   0   0   1    95.4    1

(V1 to V4: genotypes at four rare variants; "." marks a missing genotype; Trait-1 is quantitative and Trait-2 binary.)
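Levels 2 and 3 are summaries of Level 1. The sketch below aggregates the Level 1 table, assuming Trait-2 is case (1) / control (0) status and "." is a missing genotype (encoded as None). Because missing genotypes are skipped, the number of called subjects differs by variant, which is exactly what Level 2 methods must accommodate. Function names are hypothetical.

```python
# Genotypes V1-V4 from the Level 1 table; None marks a missing genotype (".").
GENOTYPES = {
    1: [1, 0, 0, 0],
    2: [0, 1, 0, None],
    3: [0, 0, 0, 0],
    4: [0, 0, 0, 0],
    5: [0, None, 0, 0],
    6: [0, 0, 0, 0],
    7: [0, 0, 1, 0],
    8: [0, 0, 0, 1],
}
# Binary phenotype (Trait-2): 1 = case, 0 = control.
TRAIT2 = {1: 1, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 1}


def summarize_level2(genotypes, trait):
    """Level 2: per variant, (case variant count, called cases,
    control variant count, called controls)."""
    n_variants = len(next(iter(genotypes.values())))
    table = []
    for v in range(n_variants):
        counts = {1: [0, 0], 0: [0, 0]}  # group -> [summed genotype codes, called subjects]
        for subject, row in genotypes.items():
            g = row[v]
            if g is None:
                continue  # a missing genotype adds nothing, so sample size varies by variant
            counts[trait[subject]][0] += g
            counts[trait[subject]][1] += 1
        table.append((counts[1][0], counts[1][1], counts[0][0], counts[0][1]))
    return table


def summarize_level3(level2):
    """Level 3: pool variant counts over all variants (e.g. over a gene)."""
    return sum(row[0] for row in level2), sum(row[2] for row in level2)
```

Here V2 has 4 called controls and V4 has 2 called cases, while the pooled Level 3 counts are 3 variant alleles in cases and 1 in controls.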

Page 7: Level 2, summarized over subjects (by group)

Jonathan C. Cohen, et al. Science 305, 869 (2004)

Page 8: Level 3, summarized over subjects (by group) and variants (usually by gene)

Group            Variant allele number  Reference allele number  Total
Low-HDL group    20                     236                      256
High-HDL group    2                     254                      256
Total            22                     490                      512

Page 9: Methods for Level 3 Data

Page 10: Single-Variant Test vs. Total Frequency Test (TFT)

Jonathan C. Cohen, et al. Science 305, 869 (2004)
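Once the variants are pooled, a total frequency test is a single 2×2 comparison. The slides do not say which test is applied to the pooled counts. Fisher's exact test is one common choice, and the sketch below assumes it, applying it to the Level 3 table on Page 8 (low-HDL: 20 variant and 236 reference alleles; high-HDL: 2 and 254). The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are (variant alleles, reference alleles).
    With all margins fixed, the count a is hypergeometric; the p-value sums
    the probabilities of every table no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Pooled (Level 3) counts from the low-HDL vs. high-HDL table.
p_tft = fisher_exact_two_sided(20, 236, 2, 254)
```

No single variant in such a gene could reach this significance alone: a variant seen only a few times produces tables whose most extreme p-value is not small.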

Page 11: What we have learned …

Single-variant tests of rare variants have very low power to detect association because of the variants' extremely low frequency (usually < 0.01)

Testing the collective effect of a set of rare variants may increase power (sum test, collective test, group test, collapsing test, burden test, …)

Page 12: Methods for Level 2 Data

Allow different sample sizes for different variants

Allow different variants to be weighted differently

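Page 13: CAST, a cohort allelic sums test

Morgenthaler and Thilly, Mutation Research 615 (2007) 28–56

Under H0: $\dfrac{S_{\text{cases}}}{2N_{\text{cases}}} - \dfrac{S_{\text{controls}}}{2N_{\text{controls}}} = 0$, where S is the number of variant alleles and N the sample size.

$T = S_{\text{cases}} - S_{\text{controls}}\dfrac{N_{\text{cases}}}{N_{\text{controls}}} = S_{\text{cases}} - S^{*}_{\text{controls}}$

(S can be calculated variant by variant, and variants can be weighted differently; the final statistic is then $T = \sum_i w_i S_i$.)

$Z = T/\sqrt{\operatorname{Var}(T)} \sim N(0,1)$

Because case and control counts are independent:

$\operatorname{Var}(T) = \operatorname{Var}(S_{\text{cases}} - S^{*}_{\text{controls}}) = \operatorname{Var}(S_{\text{cases}}) + \operatorname{Var}(S^{*}_{\text{controls}}) = \operatorname{Var}(S_{\text{cases}}) + \operatorname{Var}(S_{\text{controls}})\left(\dfrac{N_{\text{cases}}}{N_{\text{controls}}}\right)^{2}$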
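The CAST formulas on this slide translate directly into code. The slide does not say how Var(S) is estimated. This sketch assumes a binomial variance over the 2N alleles of each group, using the pooled variant-allele frequency under H0. Names are illustrative, and the weighted multi-variant form is not implemented.

```python
from math import erfc, sqrt


def cast_z(s_cases, n_cases, s_controls, n_controls):
    """CAST-style Z statistic for pooled variant-allele counts.

    s_*: number of variant alleles; n_*: number of subjects (2n alleles each).
    Assumption: Var(S) = 2N * p * (1 - p) with p the pooled frequency under H0.
    """
    ratio = n_cases / n_controls
    t = s_cases - s_controls * ratio  # T = S(cases) - S*(controls)
    p_pooled = (s_cases + s_controls) / (2 * n_cases + 2 * n_controls)
    if p_pooled in (0.0, 1.0):
        raise ValueError("no variation: variance of T is zero")
    var_cases = 2 * n_cases * p_pooled * (1 - p_pooled)
    var_controls = 2 * n_controls * p_pooled * (1 - p_pooled)
    var_t = var_cases + var_controls * ratio ** 2  # Var(S*) = Var(S) * ratio^2
    z = t / sqrt(var_t)
    p_two_sided = erfc(abs(z) / sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Page 8 counts: 256 alleles per group, i.e. 128 subjects per group.
z_cast, p_cast = cast_z(20, 128, 2, 128)
```

With these counts T = 18 and Z is close to 3.9.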

Page 14: C-alpha

PLOS Genetics, 2011 | Volume 7 | Issue 3 | e1001322

Effect direction problem

Page 15: C-alpha
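The C-alpha slides show only the title and the PLoS Genetics reference. This sketch follows the C-alpha formulation in that paper as we understand it; check it against the paper before relying on it. Each variant i, with n_i copies in the whole sample and y_i of them in cases, contributes (y_i − n_i p_0)² − n_i p_0(1 − p_0), where p_0 is the expected fraction of copies in cases. Squaring is what addresses the effect direction problem: variants enriched in cases and variants enriched in controls both add to the statistic instead of cancelling. The null variance is summed exactly over binomial outcomes. Names are illustrative.

```python
from math import comb, erfc, sqrt


def c_alpha(variant_counts, p0):
    """C-alpha sketch.

    variant_counts: list of (n_i, y_i): copies of variant i in the sample and in cases.
    p0: expected fraction of copies in cases under H0 (e.g. the case fraction).
    Returns (T, Z, one-sided upper-tail p-value).
    """
    t = 0.0
    c = 0.0
    for n, y in variant_counts:
        # Squared deviation from the binomial expectation, minus the binomial variance.
        t += (y - n * p0) ** 2 - n * p0 * (1 - p0)
        # Null variance of this contribution, summed over all Binomial(n, p0) outcomes u.
        for u in range(n + 1):
            f = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            c += ((u - n * p0) ** 2 - n * p0 * (1 - p0)) ** 2 * f
    z = t / sqrt(c)
    p_value = 0.5 * erfc(z / sqrt(2))  # only excess dispersion counts as evidence
    return t, z, p_value
```

For two doubletons at p_0 = 0.5, one seen only in cases and one only in controls, T = 1 and Z = √2. A burden-style sum would instead see one excess copy in each group and report no difference.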

Page 16: Q-Q Plots of Existing Methods (under the null)

[Figure: Q-Q panels for CAST, C-alpha, EFT and TFT]

• EFT and C-alpha: inflated, with excess false positives

• TFT and CAST: no inflation, but they assume a single effect direction

• Objective: more general and more powerful methods …

Page 17: More Generalized Methods for Level 2 Data

Page 18: Structure of Level 2 Data

[Figure: one table per variant: variant 1, variant 2, variant 3, …, variant i, …, variant k]

Strategy

Instead of testing the total frequency/number, we test the randomness of all tables.

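Page 19: Exact Probability Test (EPT)

For variant i, let $a_{i1}$ and $a_{i2}$ be the variant-allele counts and $n_{i1}$ and $n_{i2}$ the allele totals in the two groups, with $A_i = a_{i1} + a_{i2}$ and $N_i = n_{i1} + n_{i2}$.

1. Calculate the probability of each table from the hypergeometric distribution: $P_i = \dfrac{\binom{n_{i1}}{a_{i1}}\binom{n_{i2}}{a_{i2}}}{\binom{N_i}{A_i}}$

2. Calculate the log joint probability (L) of all k tables: $L = \sum_{i=1}^{k} \log(P_i)$

3. Enumerate all possible tables (margins fixed) and their L scores.

4. Calculate the p-value: $P = \Pr(L \le L_{\text{obs}})$, where $L_{\text{obs}}$ is the score of the observed tables.

ASHG Meeting 1212, Zhang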
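The four EPT steps can be sketched directly for a handful of variants. Assumptions: each variant's table is given as (a1, n1, a2, n2), with a1 and a2 the variant-allele counts and n1 and n2 the allele totals in cases and controls. Margins are held fixed, and the p-value sums the joint probability of every combination of tables whose L is no larger than the observed L. Full enumeration grows as the product of the per-variant table counts, so this only suits small k. Names are illustrative.

```python
from itertools import product
from math import comb, log


def table_probs(n1, n2, total_variant):
    """Step 1: hypergeometric probability of every possible case count a1 for one variant."""
    denom = comb(n1 + n2, total_variant)
    lo, hi = max(0, total_variant - n2), min(n1, total_variant)
    return {a1: comb(n1, a1) * comb(n2, total_variant - a1) / denom for a1 in range(lo, hi + 1)}


def exact_probability_test(tables):
    """EPT sketch; tables is a list of (a1, n1, a2, n2), one per variant."""
    dists = [table_probs(n1, n2, a1 + a2) for a1, n1, a2, n2 in tables]
    # Step 2: log joint probability of the observed tables.
    l_obs = sum(log(d[t[0]]) for d, t in zip(dists, tables))
    p_value = 0.0
    # Step 3: enumerate every combination of possible tables.
    for combo in product(*(list(d.items()) for d in dists)):
        log_joint = sum(log(prob) for _, prob in combo)
        # Step 4: add the probability of combinations no more likely than the observed one.
        if log_joint <= l_obs + 1e-9:
            joint = 1.0
            for _, prob in combo:
                joint *= prob
            p_value += joint
    return p_value
```

With a single variant this reduces to the two-sided Fisher exact test. For several variants it tests whether the whole set of tables is unusual, regardless of the direction in which each variant departs.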

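Page 20: Likelihood Ratio Test (LRT)

$LR = -2\log\dfrac{\Pr(a_1, b_1, a_2, b_2, \ldots, a_k, b_k \mid H_0)}{\Pr(a_1, b_1, a_2, b_2, \ldots, a_k, b_k \mid H_1)} \sim \chi^2_{df=k}$

Here $a_i$ and $b_i$ are the variant-allele counts of variant i in the two groups, and each probability is a product over the k variants of binomial probabilities (binomial distribution). Under H0 each variant has one allele frequency; under H1 it has one per group, which gives k degrees of freedom.

ASHG Meeting 1212, Zhang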
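The LRT can be sketched with the standard library only. Assumptions: per-variant input (a, n_a, b, n_b) gives variant-allele counts out of allele totals in the two groups; H0 fits one frequency per variant and H1 one per group; binomial constants are dropped because they cancel in the ratio. The chi-square tail uses a series for the regularized incomplete gamma function, which is adequate for moderate statistics but loses precision for p-values below about 1e-15. Names are illustrative.

```python
from math import exp, lgamma, log


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def _binom_loglik(a, n, p):
    """Binomial log-likelihood of a successes in n trials, without the constant term."""
    return _xlogy(a, p) + _xlogy(n - a, 1 - p)


def chi2_sf(x, df):
    """Upper tail of chi-square(df): 1 - P(df/2, x/2) via the lower-gamma series."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = s
    for _ in range(10000):
        k += 1
        term *= z / k
        total += term
        if term < total * 1e-15:
            break
    lower = total * exp(-z + s * log(z) - lgamma(s))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(tables):
    """Sum of per-variant LR statistics, referred to chi-square with k df."""
    lr = 0.0
    for a, n_a, b, n_b in tables:
        p0 = (a + b) / (n_a + n_b)  # H0: common frequency
        ll0 = _binom_loglik(a, n_a, p0) + _binom_loglik(b, n_b, p0)
        ll1 = _binom_loglik(a, n_a, a / n_a) + _binom_loglik(b, n_b, b / n_b)  # H1: separate
        lr += 2.0 * (ll1 - ll0)
    return lr, chi2_sf(lr, df=len(tables))
```

Like the EPT, each variant contributes a non-negative term whatever the direction of its effect, so opposite effects do not cancel.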

Page 21: Q-Q Plots of EPT and LRT (under the null)

[Figure: Q-Q panels for EPT with N=500 and N=3000, and LRT with N=500 and N=3000]

Page 22: Power Comparison (significance level = 0.00001)

Variant proportions: positive causal 80%, neutral 20%, negative causal 0%

[Figure: three panels of power vs. sample size]

Page 23: Power Comparison (significance level = 0.00001)

Variant proportions: positive causal 60%, neutral 20%, negative causal 20%

[Figure: power vs. sample size]

Page 24: Power Comparison (significance level = 0.00001)

Variant proportions: positive causal 40%, neutral 20%, negative causal 40%

[Figure: power vs. sample size]

Page 25: Methods for Level 1 Data

• Including covariates

• Extension to quantitative traits

• Better control for population structure

• More sophisticated models

Page 26: Collapsing (C) Test

Step 1: collapse the rare variants into one indicator X (X = 1 if a subject carries any of them, otherwise 0)

Step 2: $\operatorname{logit}(\Pr(y = 1)) = a + bX$ (logistic regression)

Li and Leal, The American Journal of Human Genetics 2008(83): 311–321

Page 27: Variant Collapsing

Effect   (+)  (+)  (.)  (.)
Subject  V1   V2   V3   V4   Collapsed  Trait
1        1    0    0    0    1          1
2        0    1    0    0    1          1
3        0    0    0    0    0          0
4        0    0    0    0    0          0
5        0    0    0    0    0          0
6        0    0    0    0    0          0
7        0    0    1    0    1          0
8        0    0    0    1    1          1
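Step 1 of the collapsing test is simple enough to show on the table above. This sketch collapses each subject's genotypes into a carrier indicator and cross-tabulates it against the binary trait. Function names are illustrative.

```python
# Genotypes V1-V4 and binary trait for subjects 1-8, from the Variant Collapsing table.
GENOTYPES = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
TRAIT = [1, 1, 0, 0, 0, 0, 0, 1]


def collapse(genotypes):
    """Step 1: X = 1 if the subject carries at least one rare variant, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def carrier_table(collapsed, trait):
    """Counts of (X, y) pairs: carriers/non-carriers among cases/controls."""
    table = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for x, y in zip(collapsed, trait):
        table[(x, y)] += 1
    return table
```

In Step 2, X becomes the single predictor in the logistic model. On this toy table no non-carrier is a case, so the maximum-likelihood estimate of a would diverge (quasi-complete separation). The logistic fit only makes sense with realistic sample sizes.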

Page 28: WSS

Page 29: WSS

Page 30: WSS

Page 31: Weighted Sum Test

$s = \sum_{i=1}^{m} w_i g_i$

Collapsing test (Li & Leal, 2008): $w_i = 1$, and s is set to 1 if s > 1

Weighted-sum test (Madsen & Browning, 2009): $w_i$ calculated from the allele frequency in the control group

aSum, adaptive sum test (Han & Pan, 2010): $w_i = -1$ if $b_i < 0$ and $p_i < 0.1$, otherwise $w_i = 1$ ($b_i$ and $p_i$: estimated effect and p-value of variant i)

KBAC (Liu and Leal, 2010): $w_i$ = left-tail p-value

RBT (Ionita-Laza et al., 2011): $w_i$ = log-scaled probability

PWST, p-value weighted sum test (Zhang et al., 2011): $w_i$ = rescaled left-tail p-value, incorporating both significance and direction

EREC (Lin et al., 2011): $w_i$ = estimated effect size
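As an example of a frequency-based weight, the sketch below follows the Madsen & Browning weighting as we recall it; verify it against the paper. The frequency of variant i is estimated in controls as q_i = (m_i + 1)/(2n_U + 2), where m_i is the number of variant alleles and n_U the number of controls. Then w_i = √(n q_i(1 − q_i)), and the genotype is divided by w_i, so the effective weight on g_i in s = Σ w_i g_i is 1/w_i. It assumes no missing genotypes, so n is the total number of subjects. The data are the collapsing table on Page 27; names are illustrative, and the rank-sum/permutation test that normally follows is omitted.

```python
from math import sqrt

# Subjects 1-8 from the Variant Collapsing table: genotypes V1-V4 and binary trait.
EXAMPLE_GENOTYPES = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],
    [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
]
EXAMPLE_TRAIT = [1, 1, 0, 0, 0, 0, 0, 1]


def madsen_browning_weights(genotypes, trait):
    """Effective weights 1 / sqrt(n * q_i * (1 - q_i)), with q_i estimated in controls."""
    n = len(genotypes)  # all subjects; assumes every variant is genotyped in everyone
    controls = [row for row, y in zip(genotypes, trait) if y == 0]
    n_controls = len(controls)
    weights = []
    for i in range(len(genotypes[0])):
        m_i = sum(row[i] for row in controls)
        q_i = (m_i + 1) / (2 * n_controls + 2)  # add-one smoothed control frequency
        weights.append(1.0 / sqrt(n * q_i * (1 - q_i)))
    return weights


def weighted_sum_scores(genotypes, weights):
    """s_j = sum_i w_i * g_ij for every subject j."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
```

Variants never seen in controls (V1, V2, V4) receive the largest weight; V3, seen once in a control, receives less.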

Page 32

When there are only causal(+) variants …

Effect   (+)  (+)
Subject  V1   V2   Collapsed  Trait
1        1    0    1          3.00
2        0    1    1          3.10
3        0    0    0          1.95
4        0    0    0          2.00
5        0    0    0          2.05
6        0    0    0          2.10

Collapsing (Li & Leal, 2008) works well: power increases

Page 33

Effect   (+)  (+)  (.)  (.)
Subject  V1   V2   V3   V4   Collapsed  Trait
1        1    0    0    0    1          3.00
2        0    1    0    0    1          3.10
3        0    0    0    0    0          1.95
4        0    0    0    0    0          2.00
5        0    0    0    0    0          2.05
6        0    0    0    0    0          2.10
7        0    0    1    0    1          2.00
8        0    0    0    1    1          2.10

When there are causal(+) and non-causal(.) variants …

Collapsing still works, but power is reduced

Page 34

Effect   (+)  (+)  (.)  (.)  (-)  (-)
Subject  V1   V2   V3   V4   V5   V6   Collapsed  Trait
1        1    0    0    0    0    0    1          3.00
2        0    1    0    0    0    0    1          3.10
3        0    0    0    0    0    0    0          1.95
4        0    0    0    0    0    0    0          2.00
5        0    0    0    0    0    0    0          2.05
6        0    0    0    0    0    0    0          2.10
7        0    0    1    0    0    0    1          2.00
8        0    0    0    1    0    0    1          2.10
9        0    0    0    0    1    0    1          0.95
10       0    0    0    0    0    1    1          1.00

When there are causal(+), non-causal(.) and causal(-) variants …

The power of the collapsing test drops sharply

Page 35: P-value Weighted Sum Test (PWST)

Effect     (+)    (+)    (.)    (.)    (-)    (-)
Subject    V1     V2     V3     V4     V5     V6     Collapsed  pSum   Trait
1          1      0      0      0      0      0      1           0.86  3.00
2          0      1      0      0      0      0      1           0.90  3.10
3          0      0      0      0      0      0      0           0.00  1.95
4          0      0      0      0      0      0      0           0.00  2.00
5          0      0      0      0      0      0      0           0.00  2.05
6          0      0      0      0      0      0      0           0.00  2.10
7          0      0      1      0      0      0      1          -0.02  2.00
8          0      0      0      1      0      0      1           0.08  2.10
9          0      0      0      0      1      0      1          -0.90  0.95
10         0      0      0      0      0      1      1          -0.88  1.00
t          1.61   1.84   -0.04  0.11   -1.84  -1.72
p(x≤t)     0.93   0.95   0.49   0.54   0.05   0.06
2*(p-0.5)  0.86   0.90   -0.02  0.08   -0.90  -0.88

The rescaled left-tail p-value, in [-1, 1], is used as the weight.
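The PWST arithmetic on this slide can be reproduced from the left-tail p-values in its last rows. The p-values are copied from the slide rather than recomputed, since the slide does not state the degrees of freedom behind its t statistics. Names are illustrative.

```python
# Genotypes for subjects 1-10 (rows) and variants V1-V6 (columns), from the PWST table.
GENOTYPES = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
# Left-tail p-values p(x <= t) for V1-V6, copied from the slide.
LEFT_TAIL_P = [0.93, 0.95, 0.49, 0.54, 0.05, 0.06]


def pwst_weights(left_tail_p):
    """Rescale each left-tail p-value from [0, 1] to a signed weight 2 * (p - 0.5) in [-1, 1]."""
    return [2.0 * (p - 0.5) for p in left_tail_p]


def p_sum(genotypes, weights):
    """pSum for each subject: sum_i w_i * g_i."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


scores = p_sum(GENOTYPES, pwst_weights(LEFT_TAIL_P))
```

Trait-raising variants get weights near +1, trait-lowering variants near -1, and neutral variants near 0. As a result, carriers of V5 and V6 score low instead of being lumped with carriers of V1 and V2, as plain collapsing does.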

Page 36: P-value Weighted Sum Test (PWST)

The power of the collapsing test is retained even when there are bidirectional effects

Page 37: PWST Q-Q Plots Under the Null

Direct test: inflation of type I error

Corrected by a permutation test (permutation of phenotype)
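The direct test is inflated because the PWST weights are estimated from the same phenotype that is then tested. Permuting the phenotype corrects this only if the weights are re-estimated inside every permutation. A minimal sketch, with these assumptions: per-variant weights 2(Φ(z_i) − 0.5) use a normal approximation to the standardized carrier vs. non-carrier mean difference (the slides use a t statistic), and the test statistic is the absolute covariance between the weighted sum and the trait. All names are hypothetical; the data are the PWST table.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, pstdev

_PHI = NormalDist().cdf  # standard normal CDF

GENOTYPES = [
    [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1],
]
TRAIT = [3.00, 3.10, 1.95, 2.00, 2.05, 2.10, 2.00, 2.10, 0.95, 1.00]


def data_driven_weights(genotypes, trait):
    """PWST-like weights 2 * (Phi(z_i) - 0.5), learned from the phenotype."""
    sd = pstdev(trait)
    weights = []
    for i in range(len(genotypes[0])):
        carriers = [y for row, y in zip(genotypes, trait) if row[i] > 0]
        others = [y for row, y in zip(genotypes, trait) if row[i] == 0]
        if not carriers or not others or sd == 0:
            weights.append(0.0)  # no contrast available for this variant
            continue
        z = (mean(carriers) - mean(others)) / (sd * sqrt(1 / len(carriers) + 1 / len(others)))
        weights.append(2.0 * (_PHI(z) - 0.5))
    return weights


def pwst_statistic(genotypes, trait):
    """|covariance| between the weighted sum and the trait, weights learned from this trait."""
    w = data_driven_weights(genotypes, trait)
    scores = [sum(wi * g for wi, g in zip(w, row)) for row in genotypes]
    y_bar = mean(trait)
    return abs(sum(s * (y - y_bar) for s, y in zip(scores, trait)))


def permutation_pvalue(genotypes, trait, n_perm=999, seed=1):
    """Permute the phenotype; weights are re-learned inside every permutation."""
    rng = random.Random(seed)
    observed = pwst_statistic(genotypes, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pwst_statistic(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Re-learning the weights in each permutation is the key design choice. The permuted statistics then carry the same optimism as the observed one, which is what removes the inflation seen with the direct test.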

Page 38: Generalized Linear Mixed Model (GLMM) & Weighted Sum Test (WST)

Page 39: GLMM & WST

$Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + \tau X + Z\gamma + \varepsilon$

Y: quantitative trait or logit(binary trait)

α: intercept

β: regression coefficient of the weighted sum

m: number of rare variants (RVs) to be collapsed

$w_i$: weight of variant i

$g_i$: genotype (recoded) of variant i

$\sum w_i g_i$: weighted sum (WS)

X: covariate(s), such as population structure variable(s)

τ: fixed effect(s) of X

Z: design matrix corresponding to γ

γ: random polygenic effects for individual subjects, $\gamma \sim N(0, G)$ with $G = 2\sigma^2 K$, where K is the kinship matrix and σ² the additive polygenic variance

ε: residual
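The covariance G = 2σ²K of the random polygenic effects can be made concrete with a small pedigree. Under the standard kinship coefficients (1/2 for a non-inbred subject with itself, 1/4 for parent-offspring and full-sib pairs, 0 for unrelated spouses), G = 2σ²K gives each subject polygenic variance σ² and each first-degree pair covariance σ²/2. The four-person family and the function names are illustrative assumptions.

```python
def nuclear_family_kinship():
    """Kinship matrix K: unrelated, non-inbred parents (0, 1) and two full sibs (2, 3)."""
    return [
        [0.5, 0.0, 0.25, 0.25],
        [0.0, 0.5, 0.25, 0.25],
        [0.25, 0.25, 0.5, 0.25],
        [0.25, 0.25, 0.25, 0.5],
    ]


def polygenic_covariance(kinship, sigma2):
    """G = 2 * sigma^2 * K, the covariance matrix of the random effects gamma."""
    return [[2.0 * sigma2 * k for k in row] for row in kinship]


G = polygenic_covariance(nuclear_family_kinship(), sigma2=4.0)
```

It is this off-diagonal covariance that an ordinary regression ignores when relatives are treated as independent subjects.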

Page 40: Weight

Choices for the weight $w_i$ in $\sum_{i=1}^{m} w_i g_i$:

• Based on allele frequency: binary (0,1) or continuous, with a fixed or variable threshold;

• Based on functional annotation/prediction: SIFT, PolyPhen, etc.;

• Based on sequencing quality (coverage, mapping quality, genotyping quality, etc.);

• Data-driven, using both genotype and phenotype data: weights learned from the data or by adaptive selection, assessed with a permutation test;

• Any combination …
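The frequency-based options in the first bullet can be sketched as two small functions: a binary threshold weight and a continuous weight. For the continuous weight, the Beta(1, 25) density of the MAF is a commonly used choice (it is the default in the SKAT R package); the threshold of 0.01 and the combination rule in `combined_weight` are illustrative assumptions, not recommendations from the slides.

```python
from math import exp, lgamma, log


def threshold_weight(maf, threshold=0.01):
    """Binary frequency weight: 1 for variants rarer than the threshold, else 0."""
    return 1.0 if maf < threshold else 0.0


def beta_weight(maf, a=1.0, b=25.0):
    """Continuous frequency weight: the Beta(a, b) density at the MAF (0 < maf < 1)."""
    log_beta_fn = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp((a - 1) * log(maf) + (b - 1) * log(1 - maf) - log_beta_fn)


def combined_weight(maf, annotation_score, threshold=0.01):
    """Hypothetical combination: frequency weight times an annotation-based score
    in [0, 1] (e.g. a rescaled functional-prediction score; the scale is assumed)."""
    return threshold_weight(maf, threshold) * annotation_score
```

With a = 1 and b = 25 the density is 25(1 − MAF)^24, which decreases smoothly with frequency instead of dropping to zero at a cutoff.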

Page 41: Application 1, Family Data

Adjusting for relatedness in family data for non-data-driven tests of rare variants.

Unadjusted: $Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + \varepsilon$

Adjusted: $Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + Z\gamma + \varepsilon$, with $\gamma \sim N(0, 2\sigma^2 K)$

Page 42: Q-Q Plots of –log10(P) under the Null

Li & Leal's collapsing test, ignoring family structure: inflation of type I error

Li & Leal's collapsing test, modeling family structure via GLMM: inflation is corrected

(From Zhang et al., 2011, BMC Proc.)

Page 43: Application 2, Permuting Family Data

MMPT: Mixed Model-based Permutation Test

Adjusting for relatedness in family data for data-driven permutation tests of rare variants.

$Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + Z\gamma + \varepsilon$, with $\gamma \sim N(0, 2\sigma^2 K)$

(Figure labels on the model terms: "Permuted"; "Non-permuted, subject IDs fixed")

Page 44: Q-Q Plots under the Null

[Figure: Q-Q panels for WSS, SPWST, PWST and aSum]

Permutation test ignoring family structure: inflation of type I error

(From Zhang et al., 2011, IGES Meeting)

Page 45: Q-Q Plots under the Null

[Figure: Q-Q panels for WSS, SPWST, PWST and aSum]

Mixed model-based permutation test (MMPT) modeling family structure: inflation corrected

(From Zhang et al., 2011, IGES Meeting)

Page 46: Burden Test vs. Non-burden Test

Burden test

$Y = \beta\left(\sum_{i=1}^{k} w_i x_i\right), \qquad H_0: \beta = 0$

Non-burden test

$Y = \sum_{i=1}^{k} \beta_i x_i, \qquad H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$

T-test, likelihood ratio test, F-test, score test, …

SKAT: sequence kernel association test
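The difference between the two families shows up clearly in score form on the Page 34 data, where V1 and V2 raise the trait and V5 and V6 lower it. This sketch assumes a linear model with an intercept and no covariates. The per-variant score is S_i = Σ_j x_ij(y_j − ȳ). The burden statistic is (Σ w_i S_i)², and a SKAT-like variance-component statistic is Σ w_i² S_i². SKAT itself derives its p-value analytically from a mixture of chi-squares; here a permutation p-value stands in for that. Names are illustrative.

```python
import random
from statistics import mean

# Page 34 example: subjects 1-10 x variants V1-V6 (V1, V2 raise the trait; V5, V6 lower it).
GENOTYPES = [
    [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1],
]
TRAIT = [3.00, 3.10, 1.95, 2.00, 2.05, 2.10, 2.00, 2.10, 0.95, 1.00]


def variant_scores(genotypes, trait):
    """S_i = sum_j x_ij * (y_j - mean(y)), the score under an intercept-only null model."""
    y_bar = mean(trait)
    resid = [y - y_bar for y in trait]
    return [sum(row[i] * r for row, r in zip(genotypes, resid)) for i in range(len(genotypes[0]))]


def burden_q(genotypes, trait, weights):
    """Burden statistic (sum_i w_i S_i)^2: one common beta, opposite scores cancel."""
    s = variant_scores(genotypes, trait)
    return sum(w * si for w, si in zip(weights, s)) ** 2


def variance_component_q(genotypes, trait, weights):
    """SKAT-like statistic sum_i w_i^2 S_i^2: each variant contributes a squared score."""
    s = variant_scores(genotypes, trait)
    return sum((w * si) ** 2 for w, si in zip(weights, s))


def permutation_pvalue(stat, genotypes, trait, weights, n_perm=999, seed=1):
    """Permutation p-value for either statistic (stand-in for SKAT's analytic p-value)."""
    rng = random.Random(seed)
    observed = stat(genotypes, trait, weights)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(genotypes, shuffled, weights) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With equal weights, the positive and negative scores in this example cancel exactly, so the burden statistic is essentially zero and its permutation p-value is 1. The variance-component statistic is about 4.32, because each variant's squared score counts regardless of sign.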

Page 47: SKAT (Sequence Kernel Association Test)

$Y = \sum_{i=1}^{k} \beta_i x_i, \qquad H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$

Page 48: Extension of SKAT to Family Data

[Figure: model annotated with the kinship matrix, the polygenic heritability of the trait, and the residual]

Han Chen et al., 2012, Genetic Epidemiology

Page 49: Other problems

Missing genotypes & imputation

Genotyping errors & QC (family consistency, sequence review)

Population stratification

Inherited variants and de novo mutations

Family data & linkage information

Variant validation and association validation

Public databases

And more …