Power of linkage analysis Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston


Jan 23, 2016



Celina Nitu

Page 1: Power of linkage analysis

Power of linkage analysis

Egmond, 2006

Manuel AR Ferreira

Massachusetts General Hospital, Harvard Medical School

Boston

Page 2: Power of linkage analysis

Outline

1. Aim

2. Statistical power

3. Estimate the power of linkage analysis

   Analytically

   Empirically

4. Improve the power of linkage analysis

Page 3: Power of linkage analysis

1. Aim

Page 4: Power of linkage analysis

1. Know what type-I error and power are

2. Know that you can/should estimate the power of your linkage analysis (analytically or empirically)

3. Be aware that there are MANY factors that increase the type-I error rate and decrease the power of linkage analysis

4. Show how to clean your data to detect these factors and minimize their impact

Page 5: Power of linkage analysis

2. Statistical power

Page 6: Power of linkage analysis

H0: Person A is not guilty

H1: Person A is guilty – send him to jail

                      In reality…
We decide…            H0 is true            H1 is true
H0 is true            1 – α                 Type-2 error (β)
H1 is true            Type-1 error (α)      Power (1 – β)

Page 7: Power of linkage analysis


H0: There is NO linkage between a marker and a trait

H1: There is linkage between a marker and a trait

The linkage test statistic has different distributions under H0 and H1.

Page 8: Power of linkage analysis

Where should I set the threshold to determine significance?

Threshold    Power (1 – β)    Type-1 error (α)
Too low      High             High

Above the threshold: I decide H1 is true (linkage). Below it: I decide H0 is true.

Page 9: Power of linkage analysis

Where should I set the threshold to determine significance?

Threshold    Power (1 – β)    Type-1 error (α)
Too low      High             High
Too high     Low              Low

Above the threshold: I decide H1 is true. Below it: I decide H0 is true.

Page 10: Power of linkage analysis

How do I maximise Power while minimising Type-1 error rate?

[Figure: H0 and H1 distributions split by the threshold into ‘I decide H1 is true’ and ‘I decide H0 is true’, with the Power (1 – β) and Type-1 error (α) areas marked]

1. Set a high threshold for significance (i.e. one that results in a low α, e.g. 0.05 to 0.00002).

2. Try to shift the distribution of the linkage test statistic when H1 is true as far as possible from its distribution when H0 is true.
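The trade-off on these slides can be made concrete. The sketch below assumes a 1-df linkage statistic and a hypothetical NCP of 10 under H1, and reports α and power for three thresholds. Raising the threshold lowers both. It relies on the identity that a 1-df non-central chi-square equals (Z + √NCP)² with Z standard normal, so only Python's standard library is needed. The function names are our own.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def chi2_1df_sf(x, ncp=0.0):
    """P(X > x) for a 1-df chi-square with non-centrality ncp (ncp=0: central)."""
    if x <= 0:
        return 1.0
    root_x, shift = x ** 0.5, ncp ** 0.5
    # X = (Z + shift)**2 exceeds x when Z > root_x - shift or Z < -root_x - shift.
    return (1.0 - _STD_NORMAL.cdf(root_x - shift)) + _STD_NORMAL.cdf(-root_x - shift)

def tradeoff(thresholds, ncp):
    """(threshold, alpha, power) for each threshold, given the NCP under H1."""
    return [(t, chi2_1df_sf(t), chi2_1df_sf(t, ncp)) for t in thresholds]

for t, alpha, power in tradeoff([2.71, 3.84, 13.8], ncp=10.0):
    print(f"threshold {t:5.2f}: alpha = {alpha:.5f}, power = {power:.3f}")
```

Changing the threshold alone only trades power against type-1 error. Increasing the NCP (point 2) is what raises power at a fixed α.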

Page 11: Power of linkage analysis

Non-centrality parameter

[Figure: H0 and H1 distributions, separated by the NCP]

                  Mean (μ)      Variance (σ²)
Central χ²        df            2·df
Non-central χ²    df + NCP      2·df + 4·NCP

The distributions drawn here ARE NOT chi-square with 1 df! They are just for illustration. Switch to R to see what they really look like.
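The mean and variance formulas in the table can be checked numerically. This serves as a stand-in for the R demonstration. The sketch below simulates non-central chi-square draws as sums of squared normal deviates and compares the sample moments with df + NCP and 2·df + 4·NCP. The seed, the number of draws and the (df, NCP) pairs are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

def simulate_chi2(df, ncp, n_draws, rng):
    """Draws from a chi-square with df degrees of freedom and non-centrality ncp.

    Each draw is a sum of df squared normal deviates. The whole non-centrality
    is put on the first deviate's mean (sqrt(ncp)), which is valid because only
    the sum of squared means matters.
    """
    shift = ncp ** 0.5
    draws = []
    for _ in range(n_draws):
        total = (rng.gauss(0.0, 1.0) + shift) ** 2
        total += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))
        draws.append(total)
    return draws

rng = random.Random(2006)
for df, ncp in [(1, 0.0), (1, 5.0), (3, 10.0)]:
    x = simulate_chi2(df, ncp, 100_000, rng)
    print(f"df={df} NCP={ncp:4.1f}: mean {fmean(x):6.2f} (theory {df + ncp:5.1f}), "
          f"variance {pvariance(x):6.2f} (theory {2 * df + 4 * ncp:5.1f})")
```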

Page 12: Power of linkage analysis

…R

Page 13: Power of linkage analysis

[Figure: H0 and H1 distributions, separated by the NCP]

Small NCP: big overlap between the H0 and H1 distributions, so lower power

Large NCP: small overlap between the H0 and H1 distributions, so greater power

Page 14: Power of linkage analysis

Short practical on GPC

The Genetic Power Calculator (GPC) is an online resource for carrying out basic power calculations.

For our first example we will use the Probability Function Calculator to explore power.

http://pngu.mgh.harvard.edu/~purcell/gpc/

Page 15: Power of linkage analysis

1. Go to ‘http://pngu.mgh.harvard.edu/~purcell/gpc/’ and click the ‘Probability Function Calculator’ tab.

2. We’ll focus on the first 3 input lines. These refer to the chi-square distribution that we’re interested in right now.

Using the Probability Function Calculator of the GPC

NCP

Degrees of freedom of your test, e.g. 1 df for univariate linkage (ignoring for now that it’s a mixture distribution)

Page 16: Power of linkage analysis

1. Let’s start with a simple exercise.

Determine the critical value (X) of a chi-square distribution with 1 df and NCP = 0, such that P(X>x) = 0.05.

Exercises

df = 1

NCP = 0

P(X>x) = 0.05

X = ?

Determine P(X>x) for a chi-square distribution with 1 df, NCP = 0 and X = 3.84.

df = 1

NCP = 0

P(X>x) = ?

X = 3.84
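Both parts of exercise 1 can be checked without the web calculator. With 1 df, a central chi-square is the square of a standard normal deviate, so both the critical value and the tail probability reduce to normal-distribution calls. This is a minimal sketch using only Python's standard library. The function names are our own.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def chi2_1df_critical(alpha: float) -> float:
    """x such that P(X > x) = alpha for a central 1-df chi-square."""
    # With 1 df, X = Z**2, so P(X > x) = alpha  <=>  sqrt(x) = z_(1 - alpha/2).
    return _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) ** 2

def chi2_1df_upper_tail(x: float) -> float:
    """P(X > x) for a central 1-df chi-square (x >= 0)."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(x ** 0.5))

print(round(chi2_1df_critical(0.05), 2))    # exercise 1a -> 3.84
print(round(chi2_1df_upper_tail(3.84), 3))  # exercise 1b -> 0.05
```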

Page 17: Power of linkage analysis

2. Find the power when the NCP of the test is 5, degrees of freedom=1, and the critical X is 3.84.

Exercises

df = 1

NCP = 5

P(X>x) = ?

X = 3.84

What if NCP = 10?

df = 1

NCP = 10

P(X>x) = ?

X = 3.84

[Figure: H1 distributions for NCP = 5 and NCP = 10, with the critical value 3.84 marked]
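In exercise 2, the power is the area of the non-central chi-square beyond 3.84. A minimal sketch using the (Z + √NCP)² identity for 1 df gives roughly 0.61 for NCP = 5 and 0.89 for NCP = 10. The function name is our own.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_1df(ncp: float, critical: float) -> float:
    """P(X > critical) for X ~ non-central chi-square with 1 df and the given NCP."""
    # X = (Z + sqrt(ncp))**2 with Z ~ N(0, 1); X exceeds the critical value
    # when Z lands in either tail, so add both tail areas.
    shift, root = ncp ** 0.5, critical ** 0.5
    return (1.0 - _STD_NORMAL.cdf(root - shift)) + _STD_NORMAL.cdf(-root - shift)

for ncp in (5, 10):
    print(f"NCP = {ncp:2d}: power = {power_1df(ncp, 3.84):.2f}")
```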

Page 18: Power of linkage analysis

3. Find the required NCP to obtain a power of 0.8, for degrees of freedom=1 and critical X = 3.84.

Exercises

df = 1

NCP = ?

P(X>x) = 0.8

X = 3.84

What if X = 13.8?

df = 1

NCP = ?

P(X>x) = 0.8

X = 13.8

[Figure: H1 distributions giving power 0.8 beyond the critical values 3.84 and 13.8; NCP = ?]
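Exercise 3 inverts the power calculation. Power increases with the NCP, so a simple bisection search can find the NCP that gives 80% power. The search is our choice of method, and the GPC may solve this differently. It gives about 7.85 for a critical value of 3.84 and about 20.8 for 13.8.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_1df(ncp, critical):
    """P(X > critical) for a 1-df non-central chi-square with the given NCP."""
    shift, root = ncp ** 0.5, critical ** 0.5
    return (1.0 - _STD_NORMAL.cdf(root - shift)) + _STD_NORMAL.cdf(-root - shift)

def required_ncp(target_power, critical, tol=1e-9):
    """Smallest NCP reaching target_power, by bisection (power rises with NCP)."""
    lo, hi = 0.0, 1.0
    while power_1df(hi, critical) < target_power:  # widen until bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if power_1df(mid, critical) < target_power:
            lo = mid
        else:
            hi = mid
    return hi

for critical in (3.84, 13.8):
    print(f"critical X = {critical}: NCP for 80% power = {required_ncp(0.8, critical):.2f}")
```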

Page 19: Power of linkage analysis

3. Estimate the power of linkage analysis

Page 20: Power of linkage analysis

Why is it important to estimate power?

To determine whether the study you’re designing/analysing can in fact localise the QTL you’re looking for.

You’ll need to do it for most grant applications.

When and how should I estimate power?

When?                 How?
Study design stage    Theoretically, empirically
Analysis stage        Empirically

Page 21: Power of linkage analysis

Theoretical power estimation

NCP determines the power to detect linkage

NCP = μ(H1 is true) - df

[Figure: H0 and H1 distributions, separated by the NCP]

If we can predict what the NCP of the test will be, we can estimate the power of the test

Page 22: Power of linkage analysis

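Theoretical power estimation

Variance components linkage analysis (and some HE extensions)

$$\mathrm{NCP} \approx \frac{s(s-1)}{2}\,\frac{1+r^{2}}{\left(1-r^{2}\right)^{2}}\left[V_A^{2}\,\mathrm{Var}(\hat{\pi}) + V_D^{2}\,\mathrm{Var}(\hat{z}) + 2\,V_A V_D\,\mathrm{Cov}(\hat{\pi},\hat{z})\right]$$

The NCP depends on:

1. The number of sibs in the sibship (s)

2. The residual sib correlation (r)

3. The squared variance due to the additive QTL component (V_A)

4. Marker informativeness (i.e. Var(π̂) and Var(ẑ))

5. The squared variance due to the dominance QTL component (V_D)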

Sham et al. 2000 AJHG 66: 1616
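The sketch below codes the sibship approximation. It assumes the formula has the form NCP ≈ s(s−1)/2 · (1 + r²)/(1 − r²)² · [V_A² Var(π̂) + V_D² Var(ẑ) + 2 V_A V_D Cov(π̂, ẑ)] per sibship; check this against Sham et al. (2000) before relying on it. The default IBD moments are those of a sib pair at a fully informative marker with no recombination: Var(π̂) = 1/8, Var(ẑ) = 3/16 and Cov(π̂, ẑ) = 1/8. Families are independent, so the total NCP is the per-family NCP times the number of families, and the number of families needed for a target NCP follows directly. The helper names and example values are hypothetical. The results need not match GPC output, which may use a different approximation.

```python
import math

def sibship_ncp(s, r, va, vd=0.0, var_pi=1 / 8, var_z=3 / 16, cov_pi_z=1 / 8):
    """Approximate linkage NCP contributed by one sibship of size s (assumed form)."""
    n_pairs = s * (s - 1) / 2.0                          # sib pairs per sibship
    corr_factor = (1.0 + r ** 2) / (1.0 - r ** 2) ** 2  # residual sib correlation r
    qtl_term = va ** 2 * var_pi + vd ** 2 * var_z + 2.0 * va * vd * cov_pi_z
    return n_pairs * corr_factor * qtl_term

def families_needed(target_ncp, ncp_per_family):
    """Smallest number of families N with N * ncp_per_family >= target_ncp."""
    return math.ceil(target_ncp / ncp_per_family)

# Hypothetical example: sib pairs, r = 0.5, additive QTL variance 0.2, no dominance.
per_pair = sibship_ncp(s=2, r=0.5, va=0.2)
print(per_pair, families_needed(7.85, per_pair))  # 7.85 ~ NCP for 80% power at X = 3.84
```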

Page 23: Power of linkage analysis

Another short practical on GPC

The idea is to see how genetic parameters and the study design influence the NCP – and so the power – of linkage analysis

Page 24: Power of linkage analysis

1. Go to ‘http://pngu.mgh.harvard.edu/~purcell/gpc/’ and click the ‘VC QTL linkage for sibships’ tab.

Using the ‘VC QTL linkage for sibships’ module of the GPC

Page 25: Power of linkage analysis

1. Let’s estimate the power of linkage for the following parameters:

Exercises

QTL additive variance: 0.2

QTL dominance variance: 0

Residual shared variance: 0.4

Residual nonshared variance: 0.4

Recombination fraction: 0

Sample Size: 200

Sibship Size: 2

User-defined type I error rate: 0.05

User-defined power: determine N : 0.8

Power = 0.36 (alpha = 0.05)

Sample size for 80% power = 681 families

Page 26: Power of linkage analysis

2. We can now assess the impact of varying the QTL heritability

Exercises

QTL additive variance: 0.4

QTL dominance variance: 0

Residual shared variance: 0.4

Residual nonshared variance: 0.4

Recombination fraction: 0

Sample Size: 200

Sibship Size: 2

User-defined type I error rate: 0.05

User-defined power: determine N : 0.8

Power = 0.73 (alpha = 0.05)

Sample size for 80% power = 237 families

Page 27: Power of linkage analysis

3. … the residual shared variance

Exercises

QTL additive variance: 0.2

QTL dominance variance: 0

Residual shared variance: 0.2

Residual nonshared variance: 0.6

Recombination fraction: 0

Sample Size: 200

Sibship Size: 2

User-defined type I error rate: 0.05

User-defined power: determine N : 0.8

Power = 0.26 (alpha = 0.05)

Sample size for 80% power = 1161 families

Page 28: Power of linkage analysis

4. … the sample size

Exercises

QTL additive variance: 0.2

QTL dominance variance: 0

Residual shared variance: 0.4

Residual nonshared variance: 0.2

Recombination fraction: 0

Sample Size: 400

Sibship Size: 2

User-defined type I error rate: 0.05

User-defined power: determine N : 0.8

Power = 0.94 (alpha = 0.05)

Sample size for 80% power = 294 families

Page 29: Power of linkage analysis

5. … the sibship size

Exercises

QTL additive variance: 0.2

QTL dominance variance: 0

Residual shared variance: 0.4

Residual nonshared variance: 0.2

Recombination fraction: 0

Sample Size: 200

Sibship Size: 3

User-defined type I error rate: 0.05

User-defined power: determine N : 0.8

Power = 0.99 (alpha = 0.05)

Sample size for 80% power = 78 families

Page 30: Power of linkage analysis

Theoretical power estimation

Advantages: Fast; easy to do with GPC

Disadvantages: Only an approximation; may not fit individual study designs well, particularly if one needs to consider more complex pedigrees, missing data, ascertainment strategies, etc.

Page 31: Power of linkage analysis

Empirical power estimation

Mx: simulate covariance matrices for 3 groups (IBD 0, 1 and 2 pairs) according to an FQE model (i.e. with VQ > 0) and then fit the wrong model (FE). The resulting test statistic (minus 1df) corresponds to the NCP of the test.

See powerFEQ.mx script.

Still has many of the disadvantages of the theoretical approach, but is a useful framework for general power estimations.

Simulate data: generate a dataset with a simulated phenotype and a marker that explains a proportion of the phenotypic variance. Test the marker for linkage with the phenotype. Repeat this N times. For a given α, Power = proportion of replicates with a P-value < α (e.g. < 0.05).
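The simulation approach can be sketched in a few dozen lines. This is a minimal illustration and not the Mx script. It swaps in Haseman-Elston regression as the linkage test, regressing the squared sib-pair trait difference on IBD sharing. It also makes three further assumptions: a fully informative marker sitting on the QTL, sib-pair traits drawn from a bivariate normal with covariance π·V_A + V_C, and a normal approximation for the one-sided p-value. All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def simulate_sibpairs(n_pairs, va, vc, ve, rng):
    """Return (pi, y1, y2) tuples for simulated sib pairs.

    Hypothetical model: pi is the true IBD proportion (0, 1/2, 1 with
    probability 1/4, 1/2, 1/4); sib traits are bivariate normal with
    covariance pi*va + vc and variance va + vc + ve.
    """
    total = va + vc + ve
    pairs = []
    for _ in range(n_pairs):
        pi = rng.choices((0.0, 0.5, 1.0), weights=(1, 2, 1))[0]
        cov = pi * va + vc
        shared = rng.gauss(0.0, cov ** 0.5)   # component common to both sibs
        own_sd = (total - cov) ** 0.5          # sib-specific component
        pairs.append((pi, shared + rng.gauss(0.0, own_sd), shared + rng.gauss(0.0, own_sd)))
    return pairs

def he_pvalue(pairs):
    """One-sided p-value for a negative Haseman-Elston slope (normal approximation)."""
    xs = [pi for pi, _, _ in pairs]
    ys = [(y1 - y2) ** 2 for _, y1, y2 in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = (sse / (n - 2) / sxx) ** 0.5
    return _STD_NORMAL.cdf(slope / se)  # linkage predicts slope < 0

def empirical_power(n_reps, n_pairs, va, vc, ve, alpha=0.05, seed=1):
    """Proportion of simulated replicates with p < alpha."""
    rng = random.Random(seed)
    hits = sum(he_pvalue(simulate_sibpairs(n_pairs, va, vc, ve, rng)) < alpha
               for _ in range(n_reps))
    return hits / n_reps

if __name__ == "__main__":
    print("power  :", empirical_power(100, 500, va=0.4, vc=0.2, ve=0.4))
    print("type-1 :", empirical_power(100, 500, va=0.0, vc=0.3, ve=0.7))
```

Setting va = 0 simulates under H0, so the same function also gives an empirical estimate of the type-1 error rate.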

Page 32: Power of linkage analysis

4. Improve the power of linkage analysis

Page 33: Power of linkage analysis

Factors that influence the type-1 error rate and power of linkage analysis

1. Selective sampling

2. Sample size

   QTL heritability

   Disease prevalence

   Residual correlation

   Sibship size

3. Deviations in trait distribution

4. Outliers

5. Pedigree errors

6. Genotyping errors

7. Marker informativeness

8. Marker density

9. Genetic map

Page 34: Power of linkage analysis

Pedigree errors

Definition. When the self-reported familial relationship for a given pair of individuals differs from the real relationship (determined from genotyping data). The same applies to gender mix-ups.

Impact on linkage analysis. Increases the type-1 error rate (can also decrease power).

Detection. Can be detected using genome-wide patterns of allele sharing; some errors are easy to detect. Software: GRR.

Correction. If the problem cannot be resolved, delete the problematic individuals (or family).

Boehnke and Cox (1997), AJHG 61:423-429; Broman and Weber (1998), AJHG 63:1563-4; McPeek and Sun (2000), AJHG 66:1076-94; Epstein et al. (2000), AJHG 67:1219-31.
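Detection works from genome-wide allele sharing between pairs of individuals. The minimal sketch below computes that kind of summary. It assumes genotypes are stored as 2-tuples of allele labels, with None for a missing genotype. For each marker it counts identity-by-state (IBS) sharing, then returns the mean and standard deviation of that count across markers. Pairs whose values fall away from the cluster for their reported relationship are candidate pedigree errors. This illustrates the idea only and is not GRR's code.

```python
from statistics import fmean, pstdev

def ibs(g1, g2):
    """Alleles shared identical-by-state (0, 1 or 2) between two genotypes."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)  # each allele can be matched only once
            shared += 1
    return shared

def ibs_summary(person1, person2):
    """Mean and SD of IBS sharing over markers typed in both individuals.

    person1 and person2 are equal-length lists of genotypes, one per marker,
    with None marking a missing genotype (a hypothetical data layout).
    """
    counts = [ibs(a, b) for a, b in zip(person1, person2)
              if a is not None and b is not None]
    return fmean(counts), pstdev(counts)

sib1 = [(1, 2), (3, 3), (1, 4), None, (2, 2)]
sib2 = [(1, 2), (3, 4), (2, 4), (1, 1), (1, 3)]
print(ibs_summary(sib1, sib2))
```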

Page 35: Power of linkage analysis

Pedigree errors

*Impact on linkage*

• CSGA (1997) A genome-wide search for asthma susceptibility loci in ethnically diverse populations. Nat Genet 15:389-92

• ~15 families with wrong relationships

• No significant evidence for linkage

• Error checking is essential!

Page 36: Power of linkage analysis
Page 37: Power of linkage analysis

http://www.sph.umich.edu/csg/abecasis

GRR

Pedigree errors

*Detection/Correction*

Page 38: Power of linkage analysis

Practical

Aim:

Identify pedigree errors with GRR

1. Go to ‘Egmondserver\share\Programs’ and copy the entire ‘GRR’ folder to your desktop.

2. Go into the ‘GRR’ folder on your desktop, and run the GRR.exe file.

3. Press the ‘Load’ button, and navigate into the same ‘GRR’ folder on the desktop. Select the file ‘sample.ped’ and press ‘Open’. Note that all sibpairs in ‘sample.ped’ were reported to be full sibs or half sibs.

I’ll identify one error. Can you identify the other two?

Page 39: Power of linkage analysis

Genotyping errors

Definition. When the observed genotype at a given locus does not match the true genotype at that locus.

Unavoidable (assay quality, genotyping platform), although error rates are much lower with the most recent genotyping technologies (chip arrays).

Impact on linkage analysis. Can substantially decrease power: e.g. 1% genotyping error can result in ~10-50% loss of power for linkage. Can also increase type-1 error rate.

Detection. Look at: assay failure rate (e.g. 20%), number of Mendelian errors, number of genotypes that imply unlikely recombination events. Can be hard to detect (SNPs)!

Correction. (1) Re-type problematic markers/individuals; (2) Remove the problematic genotypes; (3) leave errors in, but model them appropriately.

Page 40: Power of linkage analysis

Genotyping errors

*Impact on linkage*

[Figure: average LOD score (y-axis from −4 to 4); x-axis from 0 to 85]

Successive lines for 0, 0.5, 1, 2 and 5% error.

Page 41: Power of linkage analysis

Genotyping errors

*Detection/Correction*

Detection

(1) Assay failure rate

(2) Mendelian errors (e.g. SIBPAIR)

(3) Genotypes that imply unlikely recombination events

Correction

(1) Re-type problematic markers/individuals

(2) Remove the problematic genotypes

(3) Leave errors in, but model them appropriately.
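The simplest Mendelian check applies to a parent-offspring trio: the child's genotype must be formable from one allele of each parent. The minimal sketch below assumes unordered genotypes stored as 2-tuples of allele labels, and the function names are hypothetical. Checks in sibships without genotyped parents, such as those performed by programs like SIBPAIR, are more involved and are not shown.

```python
from itertools import product

def mendelian_consistent(father, mother, child):
    """True if the child's genotype can be built from one paternal and one maternal allele.

    Genotypes are unordered 2-tuples of allele labels, e.g. (1, 3).
    """
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))

def trio_errors(trios):
    """Indices of (father, mother, child) trios that fail the check."""
    return [i for i, (f, m, c) in enumerate(trios) if not mendelian_consistent(f, m, c)]

example = [((1, 2), (3, 4), (2, 3)),   # consistent
           ((1, 1), (2, 2), (1, 1)),   # child lacks a maternal allele
           ((1, 2), (1, 2), (2, 2))]   # consistent
print(trio_errors(example))            # -> [1]
```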

Page 42: Power of linkage analysis

Genotyping errors

*Detection/Correction*

http://www.sph.umich.edu/csg/abecasis

MERLIN

Page 43: Power of linkage analysis

Genotyping errors

*Detection/Correction*

• Genotype errors can change inferences about gene flow
  – May introduce additional recombinants

• Likelihood sensitivity analysis
  – How much impact does each genotype have on the likelihood of the overall data?


Page 44: Power of linkage analysis

MERLIN demo

Detect and correct genotyping errors

1. Use the input files error.dat, error.ped and error.map. You can download these from the MERLIN website.

2. I’ll first run the program pedstats to have a look at these files.

pedstats -d error.dat -p error.ped

There are 20 markers and 1 affection trait for 200 families with 4 individuals each.

Page 45: Power of linkage analysis

3. I’ll then test the trait for linkage with each of the 20 markers using MERLIN’s ‘--npl’ option.

Note this is done before detecting/correcting genotyping errors!

merlin -d error.dat -p error.ped -m error.map --npl

So before correcting any errors, we get a maximum LOD score of 1.69 at position 52.680 cM.

Page 46: Power of linkage analysis

4. But first we should have looked for genotyping errors. Let’s do that using the ‘--error’ option.

merlin -d error.dat -p error.ped -m error.map --error

MERLIN flagged 7 pairs of unlikely genotypes. There’s a very good chance that these resulted from genotyping errors!

Page 47: Power of linkage analysis

5. Let’s delete these unlikely genotypes using pedwipe

pedwipe -d error.dat -p error.ped

pedwipe will read the error.dat and error.ped files, and delete the genotypes that were stored in the file merlin.err produced in the previous step.

We get 2 new files, wiped.dat and wiped.ped, that do not have those genotyping errors.

Page 48: Power of linkage analysis

6. We can now check whether we get any improvement in the LOD score after removing those genotyping errors.

merlin -d wiped.dat -p wiped.ped -m error.map --npl

After deleting those 7 pairs of unlikely genotypes, the LOD score at position 52.680 cM increases from 1.69 to 2.10, an increase of ~24%!
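The LOD scores from this demo can be put on the same scale as the chi-square thresholds used earlier with the standard relation χ² = 2 ln(10) × LOD ≈ 4.6 × LOD. For example, LOD 3 corresponds to χ² ≈ 13.8. The sketch below also gives a one-sided p-value, assuming the 50:50 mixture of χ²₀ and χ²₁ often used for linkage tests. MERLIN prints its own p-values, which may be computed differently. The function names are our own.

```python
import math
from statistics import NormalDist

def lod_to_chi2(lod: float) -> float:
    """Likelihood-ratio chi-square equivalent of a LOD score."""
    return 2.0 * math.log(10.0) * lod

def lod_to_pvalue(lod: float) -> float:
    """One-sided p-value, assuming a 50:50 mixture of chi-square(0 df) and chi-square(1 df)."""
    if lod <= 0:
        return 1.0
    # 0.5 * P(chi2_1 > c) equals the upper normal tail beyond sqrt(c).
    return 1.0 - NormalDist().cdf(math.sqrt(lod_to_chi2(lod)))

for lod in (1.69, 2.10, 3.0):
    print(f"LOD {lod:.2f}: chi-square {lod_to_chi2(lod):.2f}, p = {lod_to_pvalue(lod):.2g}")
```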

Page 49: Power of linkage analysis

7. But instead of identifying and removing errors, MERLIN will soon allow the user to leave the genotyping errors in and model them appropriately. This is the ‘--fit’ function. Have a look at the MERLIN documentation for more info.

Page 50: Power of linkage analysis

Summary

1. Statistical power

2. Estimate the power of linkage analysis

3. Improve the power of linkage analysis