Linkage analysis: basic principles Manuel Ferreira & Pak Sham Boulder Advanced Course 2005
Feb 22, 2016

Page 1: Linkage analysis: basic principles

Linkage analysis: basic principles

Manuel Ferreira & Pak Sham

Boulder Advanced Course 2005

Page 2: Linkage analysis: basic principles

Outline

1. Aim
2. The Human Genome
3. Principles of Linkage Analysis
4. Parametric Linkage Analysis
5. Nonparametric Linkage Analysis

Page 3: Linkage analysis: basic principles

1. Aim

Page 4: Linkage analysis: basic principles

For a heritable trait...

Linkage: localizes a region of the genome where a locus (or loci) that regulates the trait is likely to be harboured

Family-specific phenomenon: Affected individuals in a family share the same ancestral predisposing DNA segment at a given trait locus

Association: identifies a locus that regulates the trait

Population-specific phenomenon: Affected individuals in a population share the same ancestral predisposing DNA segment at a given trait locus

Page 5: Linkage analysis: basic principles

2. Human Genome

Page 6: Linkage analysis: basic principles

DNA structure

A DNA molecule is a linear backbone of alternating sugar residues and phosphate groups

Attached to carbon atom 1' of each sugar is a nitrogenous base: A, C, G or T

Two DNA molecules are held together in anti-parallel fashion by hydrogen bonds between bases [Watson-Crick rules]: antiparallel double helix

Only one strand is read during gene transcription

Nucleotide: 1 phosphate group + 1 sugar + 1 base

A gene is a segment of DNA which is transcribed to give a protein or RNA product

[Figure: a double-stranded DNA sequence drawn as a column of complementary base pairs (C - G, A - T, ...)]

Page 7: Linkage analysis: basic principles

DNA polymorphisms

RFLPs

Minisatellites

Microsatellites: >100,000. Many alleles, (CA)n, very informative, even, easily automated

SNPs: 10,054,521 (25 Jan '05). Most with 2 alleles (up to 4), not very informative, even, easily automated

[Figure: DNA sequence drawn as a column of base pairs, with a (CA)n repeat marked and labels A and B]

Page 8: Linkage analysis: basic principles

DNA organization

Mitosis

[Figure: haploid gametes (♂ and ♀, 22 + 1 chromosomes each) fuse into a one-cell diploid zygote, 2 (22 + 1). The zygote passes through G1, S and M phases to give a diploid zygote of more than one cell, each cell 2 (22 + 1). Loci A and B are marked on the paternal and maternal copies of chr1.]

Page 9: Linkage analysis: basic principles

DNA recombination

Meiosis

[Figure: a diploid gamete precursor cell, 2 (22 + 1), with paternal (♂) and maternal (♀) copies of chr1 carrying loci A and B, gives haploid gamete precursors and then haploid gametes, 22 + 1 each. Of the four gametes shown, two are nonrecombinant (NR) and two are recombinant (R) for A and B.]

Page 10: Linkage analysis: basic principles

DNA recombination between linked loci

Meiosis

[Figure: a diploid gamete precursor, 2 (22 + 1), with paternal (♂) and maternal (♀) homologs carrying closely spaced loci A and B, gives haploid gamete precursors and then haploid gametes, 22 + 1. All four gametes shown are nonrecombinant (NR) for A and B.]

Page 11: Linkage analysis: basic principles

Human Genome - summary

DNA is a linear sequence of nucleotides partitioned into 23 chromosomes

Two copies of each chromosome (2 x 22 autosomes + XY), from paternal and maternal origins. During meiosis in gamete precursors, recombination can occur between maternal and paternal homologs

Recombination fraction between loci A and B (θ): proportion of gametes produced that are recombinant for A and B

If A and B are very far apart: 50% R : 50% NR, θ = 0.5

If A and B are very close together: <50% R, 0 ≤ θ < 0.5

Recombination fraction (θ) can be converted to genetic distance (cM)

Haldane: $\mathrm{cM} = 100 \times \left[-0.5 \ln(1 - 2\theta)\right]$, e.g. θ = 0.17, cM = 20.8

Kosambi: $\mathrm{cM} = 100 \times 0.25 \ln\left(\dfrac{1 + 2\theta}{1 - 2\theta}\right)$, e.g. θ = 0.17, cM = 17.7
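The conversion from recombination fraction to genetic distance can be checked numerically. The following is a minimal sketch of the Haldane and Kosambi map functions; the function names `haldane_cm` and `kosambi_cm` are illustrative and not from the slides.

```python
import math


def haldane_cm(theta: float) -> float:
    """Haldane map distance (cM) for a recombination fraction 0 <= theta < 0.5."""
    return 100.0 * (-0.5 * math.log(1.0 - 2.0 * theta))


def kosambi_cm(theta: float) -> float:
    """Kosambi map distance (cM) for a recombination fraction 0 <= theta < 0.5."""
    return 100.0 * 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


if __name__ == "__main__":
    # Tabulate both map functions for a few recombination fractions.
    for theta in (0.01, 0.10, 0.17, 0.30):
        print(f"theta={theta:.2f}  Haldane={haldane_cm(theta):.1f} cM  "
              f"Kosambi={kosambi_cm(theta):.1f} cM")
```

For small θ the two functions nearly agree; they diverge as θ approaches 0.5, where both distances grow without bound.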

Page 12: Linkage analysis: basic principles

3. Principles of Linkage Analysis

Page 13: Linkage analysis: basic principles

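Linkage Analysis requires genetic markers

[Figure: markers M1, M2, ..., Mn spaced along the chromosomes, each annotated with its recombination fraction θ with the trait locus Q. The θ values shown range from 0.5 down to 0.1 for the marker closest to Q.]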

Page 14: Linkage analysis: basic principles

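Linkage Analysis: Parametric vs. Nonparametric

[Figure: path diagram with a marker (M) and a gene (Q) on a chromosome, a phenotype (Phe), other genetic factors (A, D) and environmental factors (C, E). Labelled relationships: recombination, mode of inheritance, correlation.]

Adapted from Weiss & Terwilliger 2000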

Page 15: Linkage analysis: basic principles

4. Parametric Linkage Analysis

Page 16: Linkage analysis: basic principles

Linkage with informative phase known meiosis

Chromosome: markers M1..6. Gene: alleles Q1,2. Autosomal dominant, Q1 predisposing allele

Grandparents: M2M5 Q2Q2 and M1M6 Q1Q?

Parents: M1M2 Q1Q2, with phase M1Q1/M2Q2 (informative, phase known), and M3M4 Q2Q2

Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2

Gametes from the informative parent: NR: M1Q1, NR: M2Q2, R: M1Q2, R: M2Q1

θMQ = 1/6 = 0.17 (~20.8 cM)
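When the phase of the informative parent is known, estimating θ is a matter of counting. The sketch below classifies the six transmitted haplotypes of this pedigree against the known phase M1Q1/M2Q2; the function name and the tuple encoding of haplotypes are illustrative assumptions.

```python
def recombination_fraction(parental_phase, transmitted):
    """Count recombinant gametes for a parent whose phase is known.

    parental_phase: the two parental haplotypes, e.g. (("M1", "Q1"), ("M2", "Q2")).
    transmitted: haplotypes passed to each offspring by that parent.
    Returns (number of recombinants, number of gametes, estimated theta).
    """
    nonrecombinant = set(parental_phase)
    n_rec = sum(1 for hap in transmitted if hap not in nonrecombinant)
    return n_rec, len(transmitted), n_rec / len(transmitted)


# Haplotypes the M1Q1/M2Q2 parent transmitted to the six offspring.
phase = (("M1", "Q1"), ("M2", "Q2"))
gametes = [("M1", "Q1"), ("M2", "Q2"), ("M1", "Q1"),
           ("M1", "Q1"), ("M2", "Q2"), ("M2", "Q1")]

n_rec, n, theta = recombination_fraction(phase, gametes)
print(f"{n_rec} recombinant out of {n} gametes, theta = {theta:.2f}")
```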

Page 17: Linkage analysis: basic principles

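Linkage with informative phase unknown meiosis

Grandparents: Q2Q2 and Q1Q?

Parents: M1M2 Q1Q2 (informative, phase unknown) and M3M4 Q2Q2

Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2

Gamete    Phase M1Q1/M2Q2    Phase M1Q2/M2Q1    N
M1Q1      NR, P = 1-θ        R, P = θ           3
M2Q2      NR, P = 1-θ        R, P = θ           2
M1Q2      R, P = θ           NR, P = 1-θ        0
M2Q1      R, P = θ           NR, P = 1-θ        1

$L(X \mid \theta) = \tfrac{1}{2}\left[(1-\theta)^5\,\theta^1\right] + \tfrac{1}{2}\left[\theta^5\,(1-\theta)^1\right]$

$L(X \mid \theta = 0.5) = \tfrac{1}{2}\left[(1-0.5)^5\,0.5^1\right] + \tfrac{1}{2}\left[0.5^5\,(1-0.5)^1\right] = 0.5^6$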
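With phase unknown, the likelihood averages over the two possible phases, each weighted 1/2. A minimal sketch follows, assuming the gamete counts are summarised as `n_a` (gametes that are nonrecombinant under the first phase) and `n_b` (gametes that are nonrecombinant under the second phase); these names are illustrative.

```python
def phase_unknown_likelihood(theta: float, n_a: int, n_b: int) -> float:
    """Likelihood of the observed gametes, averaged over two equally likely phases.

    n_a gametes are NR under phase 1 (and R under phase 2);
    n_b gametes are R under phase 1 (and NR under phase 2).
    """
    phase1 = (1.0 - theta) ** n_a * theta ** n_b
    phase2 = theta ** n_a * (1.0 - theta) ** n_b
    return 0.5 * phase1 + 0.5 * phase2


# Pedigree above: 5 gametes of type M1Q1 or M2Q2, 1 gamete of type M1Q2 or M2Q1.
print(phase_unknown_likelihood(0.17, 5, 1))  # likelihood under linkage at theta = 0.17
print(phase_unknown_likelihood(0.5, 5, 1))   # likelihood under no linkage, 0.5**6
```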

Page 18: Linkage analysis: basic principles

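Parametric LOD score calculation

$\mathrm{OD} = \dfrac{L(X \mid \theta)}{L(X \mid \theta = 0.5)}$

$\mathrm{LOD} = \log_{10}\dfrac{L(X \mid \theta)}{L(X \mid \theta = 0.5)}$

For n families:

$\mathrm{OD} = \prod_{i=1}^{n}\dfrac{L(X_i \mid \theta)}{L(X_i \mid \theta = 0.5)}$

$\mathrm{LOD} = \log_{10}\prod_{i=1}^{n}\dfrac{L(X_i \mid \theta)}{L(X_i \mid \theta = 0.5)} = \sum_{i=1}^{n}\log_{10}\dfrac{L(X_i \mid \theta)}{L(X_i \mid \theta = 0.5)} = \sum_{i=1}^{n}\mathrm{LOD}_i$

Overall LOD score for a given θ is the sum of all family LOD scores at θ, e.g. LOD = 3 for θ = 0.28

For the phase unknown example:

$\mathrm{LOD} = \log_{10}\dfrac{\tfrac{1}{2}(1-\theta)^5\,\theta^1 + \tfrac{1}{2}\theta^5\,(1-\theta)^1}{0.5^6}$

[Figure: LOD score (y-axis, -5 to 3) plotted against θ (x-axis, 0.1 to 0.5)]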
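Because family LOD scores add, the overall LOD curve can be evaluated on a grid of θ values and the maximising θ read off. The sketch below does this for phase-unknown families; the `(n_a, n_b)` count encoding and the grid step of 0.01 are assumptions made for illustration.

```python
import math


def family_lod(theta: float, n_a: int, n_b: int) -> float:
    """LOD score of one phase-unknown family with gamete counts n_a and n_b."""
    lik = (0.5 * (1.0 - theta) ** n_a * theta ** n_b
           + 0.5 * theta ** n_a * (1.0 - theta) ** n_b)
    null = 0.5 ** (n_a + n_b)  # the same likelihood evaluated at theta = 0.5
    return math.log10(lik / null)


def total_lod(theta: float, families) -> float:
    """Overall LOD at theta: the sum of the family LOD scores."""
    return sum(family_lod(theta, n_a, n_b) for n_a, n_b in families)


# One family with 5 gametes of one type and 1 of the other.
families = [(5, 1)]
grid = [i / 100 for i in range(1, 51)]  # theta = 0.01 ... 0.50
best_theta = max(grid, key=lambda t: total_lod(t, families))
print(f"max LOD = {total_lod(best_theta, families):.2f} at theta = {best_theta:.2f}")
```

A single small family gives a maximum LOD well below 3, which is why scores are summed over many families before declaring linkage.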

Page 19: Linkage analysis: basic principles

Parametric Linkage Analysis - summary

For each marker, estimate the θ that yields the highest LOD score across all families

Markers with a significant parametric LOD score (>3) are said to be linked to the trait locus with recombination fraction θ

This θ (and the LOD) will depend upon the mode of inheritance (MOI) assumed

MOI determines the genotype at the trait locus Q and thus determines the number of meioses which are recombinant or nonrecombinant. Limited to Mendelian diseases.

[Figure: markers M1, M2, ..., Mn along a chromosome, with θ between each marker and the trait locus Q falling from 0.5 to 0.1 for the closest marker]

Page 20: Linkage analysis: basic principles

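Practical

Grandparents: Q1Q1 and Q2Q?

Parents: M1M2 Q1Q1 and M3M4 Q1Q2

Offspring: M2M3 Q1Q1, M1M4 Q1Q2, M1M4 Q1Q1, M2M4 Q1Q2

Gamete    Phase M3Q1/M4Q2    Phase M3Q2/M4Q1    N
M3Q1      NR, P = 1-θ        R, P = θ           1
M4Q2      NR, P = 1-θ        R, P = θ           2
M3Q2      R, P = θ           NR, P = 1-θ        0
M4Q1      R, P = θ           NR, P = 1-θ        1

$L(X \mid \theta) = \tfrac{1}{2}\left[(1-\theta)^3\,\theta^1\right] + \tfrac{1}{2}\left[\theta^3\,(1-\theta)^1\right]$

$L(X \mid \theta = 0.5) = \tfrac{1}{2}\left[(1-0.5)^3\,0.5^1\right] + \tfrac{1}{2}\left[0.5^3\,(1-0.5)^1\right] = 0.5^4$

1. Identify informative individual(s)
2. Reconstruct possible phase(s)
3. Classify gametes as R or NR
4. Count R and NR gametes
5. Express $L(X \mid \theta)$ and $L(X \mid \theta = 0.5)$
6. Express the LOD score as a function of θ, $f(\theta)$

Talk example:

$\mathrm{LOD} = \log_{10}\dfrac{\tfrac{1}{2}(1-\theta)^5\,\theta^1 + \tfrac{1}{2}\theta^5\,(1-\theta)^1}{0.5^6}$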

Page 21: Linkage analysis: basic principles

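Practical II

Talk example:

$\mathrm{LOD} = \log_{10}\dfrac{\tfrac{1}{2}(1-\theta)^5\,\theta^1 + \tfrac{1}{2}\theta^5\,(1-\theta)^1}{0.5^6}$

Practical example:

$\mathrm{LOD} = \log_{10}\dfrac{\tfrac{1}{2}(1-\theta)^3\,\theta^1 + \tfrac{1}{2}\theta^3\,(1-\theta)^1}{0.5^4}$

Graph each…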

Page 22: Linkage analysis: basic principles

Outline

1. Aim
2. The Human Genome
3. Principles of Linkage Analysis
4. Parametric Linkage Analysis
5. Nonparametric Linkage Analysis

Page 23: Linkage analysis: basic principles

5. Nonparametric Linkage Analysis

Page 24: Linkage analysis: basic principles

Approach

Parametric: genotype marker locus & genotype trait locus (latter inferred from phenotype according to a specific disease model)

Parameter of interest: θ between marker and trait loci

Nonparametric: genotype marker locus & phenotype

If a trait locus truly regulates the expression of a phenotype, then two relatives with similar phenotypes should have similar genotypes at a marker in the vicinity of the trait locus, and vice-versa.

Interest: correlation between phenotypic similarity and marker genotypic similarity

No need to specify mode of inheritance, allele frequencies, etc...

Page 25: Linkage analysis: basic principles

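Phenotypic similarity between relatives

Squared trait differences: $(X_1 - X_2)^2$

Squared trait sums: $(X_1 + X_2)^2$

Trait cross-product: $X_1 X_2$

Trait variance-covariance matrix: $\begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) \\ \mathrm{Cov}(X_1, X_2) & \mathrm{Var}(X_2) \end{pmatrix}$

Affection concordance

[Figure: plot with axes T1 and T2]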

Page 26: Linkage analysis: basic principles

Genotypic similarity between relatives

IBS: alleles shared Identical By State "look the same", may have the same DNA sequence but they are not necessarily derived from a known common ancestor

IBD: alleles shared Identical By Descent are a copy of the same ancestor allele

[Figure: pedigree with parental haplotypes M1Q1, M2Q2 and M3Q3, M3Q4, and sibs M1Q1/M3Q3 and M1Q1/M3Q4. At the marker the sibs share 2 alleles IBS and 1 allele IBD. Inheritance vector (M): 0 0 0 1 1]

Page 27: Linkage analysis: basic principles

Genotypic similarity between relatives - π̂

Sib 1        Sib 2        Number of alleles IBD    Proportion of alleles IBD (π̂)    Inheritance vector (M)
M1Q1/M3Q3    M2Q2/M3Q4    0                        0                                 0 0 1 1
M1Q1/M3Q3    M1Q1/M3Q4    1                        0.5                               0 0 0 1
M1Q1/M3Q3    M1Q1/M3Q3    2                        1                                 0 0 0 0

Page 28: Linkage analysis: basic principles

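Genotypic similarity between relatives - π̂

$\hat{\pi} = \tfrac{2}{2}P(\mathrm{IBD}=2) + \tfrac{1}{2}P(\mathrm{IBD}=1) + \tfrac{0}{2}P(\mathrm{IBD}=0)$

Sib pair with parents x0/x1 and x0/x1. Each inheritance vector specifies which parental allele (x0 or x1) each sib received from each parent. Number of inheritance vectors: $2^{2n}$ (16 for the sib pair)

Inheritance vector    Prior    IBD    Posterior (a)    Posterior (b)    Posterior (c)
0000                  1/16     2      0                0                0
0001                  1/16     1      1/4              1/6              1
0010                  1/16     1      0                0                0
0011                  1/16     0      0                1/12             0
0100                  1/16     1      1/4              1/6              0
0101                  1/16     2      0                0                0
0110                  1/16     0      0                1/12             0
0111                  1/16     1      0                0                0
1000                  1/16     1      0                0                0
1001                  1/16     0      0                1/12             0
1010                  1/16     2      0                0                0
1011                  1/16     1      1/4              1/6              0
1100                  1/16     0      0                1/12             0
1101                  1/16     1      0                0                0
1110                  1/16     1      1/4              1/6              0
1111                  1/16     2      0                0                0

             Prior    (a)    (b)    (c)
P (IBD=0)    1/4      0      1/3    0
P (IBD=1)    1/2      1      2/3    1
P (IBD=2)    1/4      0      0      0

Posterior probabilities are conditional on the observed genotypes: (a) genotypes labelled A1/A3 and A1/A2; (b) genotypes labelled A1A3 and A1A2; (c) pedigree of individuals 1 to 4 with genotype labels A1A2, A3A2, A1/A2, A1/A3, A1/A2, A3/A2.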
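The prior IBD distribution for a sib pair follows from enumerating the 16 equally likely inheritance vectors. The sketch below assumes the four bits are ordered (sib 1 paternal, sib 1 maternal, sib 2 paternal, sib 2 maternal), an ordering chosen because it reproduces the IBD column 2110120110210112; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def sib_pair_ibd(vector) -> int:
    """Number of alleles shared IBD by two sibs for one inheritance vector.

    vector = (sib1 paternal, sib1 maternal, sib2 paternal, sib2 maternal),
    where each bit says which of that parent's two alleles was transmitted.
    """
    s1_pat, s1_mat, s2_pat, s2_mat = vector
    return int(s1_pat == s2_pat) + int(s1_mat == s2_mat)


def pi_hat(p_ibd) -> Fraction:
    """Expected proportion of alleles IBD: 1/2 * P(IBD=1) + P(IBD=2)."""
    return Fraction(1, 2) * p_ibd[1] + p_ibd[2]


# Enumerate all 2**4 inheritance vectors, each with prior probability 1/16.
vectors = list(product((0, 1), repeat=4))
prior = Fraction(1, len(vectors))
ibd_dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
for v in vectors:
    ibd_dist[sib_pair_ibd(v)] += prior

print({k: str(p) for k, p in ibd_dist.items()})
print("prior pi-hat:", pi_hat(ibd_dist))
```

A posterior distribution is obtained the same way after replacing the uniform prior with weights that are zero for vectors incompatible with the observed genotypes.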

Page 29: Linkage analysis: basic principles

Statistics that incorporate both phenotypic and genotypic similarities

[Figure: phenotypic similarity (y-axis) plotted against genotypic similarity, π̂ (x-axis: 0, 0.5, 1)]

Page 30: Linkage analysis: basic principles

Haseman-Elston regression – Quantitative traits

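$(X_1 - X_2)^2$

$E\left[(X_1 - X_2)^2 \mid \hat{\pi}\right] = E\left[X_1^2 + X_2^2 - 2X_1X_2 \mid \hat{\pi}\right]$

$= \mathrm{Var}(X_1) + \mathrm{Var}(X_2) - 2\,\mathrm{Cov}(X_1, X_2 \mid \hat{\pi})$

$\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = V_Q + V_A + V_C + V_E$

$\mathrm{Cov}(X_1, X_2 \mid \hat{\pi}) = \hat{\pi}V_Q + V_A/2 + V_C$

$E\left[(X_1 - X_2)^2 \mid \hat{\pi}\right] = (2V_Q + V_A + 2V_E) - 2\hat{\pi}V_Q$

Phenotypic dissimilarity = b × Genotypic similarity + c

[Figure: regression line of $(X_1 - X_2)^2$ on π̂ (x-axis: 0, 0.5, 1)]

Pair    X1     X2     (X1-X2)^2    π̂
1       2.2    2.1    0.01         0.9
2       1.9    2.3    0.16         0.6
3       2.3    2.6    0.09         0.7
4       3.4    1.6    3.24         0.1
5       2.5    2.3    0.04         0.8
…
1000    2.4    2.4    0            0.9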
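The Haseman-Elston regression is an ordinary least-squares fit of the squared trait difference on π̂, with slope b = -2V_Q under the model above. The sketch below fits it to only the six sib pairs listed in the table, so the numbers are purely illustrative and not an estimate from the full sample of 1000 pairs; the helper `ols` is a hypothetical name.

```python
def ols(x, y):
    """Simple least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (X1, X2, pi-hat) for the six sib pairs shown in the table.
pairs = [(2.2, 2.1, 0.9), (1.9, 2.3, 0.6), (2.3, 2.6, 0.7),
         (3.4, 1.6, 0.1), (2.5, 2.3, 0.8), (2.4, 2.4, 0.9)]

pi_hat = [p for _, _, p in pairs]
sq_diff = [(x1 - x2) ** 2 for x1, x2, _ in pairs]

b, c = ols(pi_hat, sq_diff)
vq_estimate = -b / 2.0  # from b = -2 * V_Q
print(f"b = {b:.2f}, c = {c:.2f}, V_Q estimate = {vq_estimate:.2f}")
```

A negative slope is the signature of linkage: pairs sharing more alleles IBD at the marker have more similar trait values.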

Page 31: Linkage analysis: basic principles

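VC ML method – Quantitative & Categorical traits

H1: $\mathrm{Cov}(X_1, X_2 \mid \hat{\pi}) = \hat{\pi}V_Q + V_A/2 + V_C$

H0: $\mathrm{Cov}(X_1, X_2 \mid \hat{\pi}) = V_A/2 + V_C$

$\mathrm{LOD} = \log_{10}\dfrac{L(H_1)}{L(H_0)}$, e.g. LOD = 3

[Figure: Cov(X1, X2) plotted against π̂ (x-axis: 0, 0.5, 1)]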

Page 32: Linkage analysis: basic principles

Genome-wide linkage analysis (e.g. VC)

Individual LOD scores can be expressed as P values (pointwise)

LOD    Chi-sq (n-df)    P value
2.1    9.67             0.0009

Chi-sq = LOD × 4.6
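The factor 4.6 is 2 ln(10), which converts a base-10 log likelihood ratio into a chi-square statistic. The sketch below reproduces the conversion under two assumptions that are not stated on the slide but match its numbers: 1 degree of freedom, and a one-sided test in which the chi-square tail probability is halved.

```python
import math


def lod_to_chisq(lod: float) -> float:
    """Likelihood-ratio chi-square for a LOD score: 2 * ln(10) * LOD (about 4.6 * LOD)."""
    return 2.0 * math.log(10.0) * lod


def chisq_to_pointwise_p(chisq: float) -> float:
    """One-sided pointwise P value for 1 df (assumed): half the chi-square upper tail."""
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    upper_tail_1df = math.erfc(math.sqrt(chisq / 2.0))
    return 0.5 * upper_tail_1df


for lod in (2.1, 2.2, 3.6):
    chisq = lod_to_chisq(lod)
    print(f"LOD={lod}  Chi-sq={chisq:.2f}  P={chisq_to_pointwise_p(chisq):.6f}")
```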

Page 33: Linkage analysis: basic principles

Statistics for selected samples

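Mean IBD sharing statistics (Risch & Zhang 1995, 1996)

[Figure: plot of sib-pair trait values with axes T1 and T2, showing the selected pairs]

Concordant pairs: H0 (No linkage): Mean π̂ = 0.5. H1 (Linkage): Mean π̂ > 0.5

Discordant pairs: H0 (No linkage): Mean π̂ = 0.5. H1 (Linkage): Mean π̂ < 0.5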

Page 34: Linkage analysis: basic principles

Other Linkage statistics

Dependent variable: Phenotypes. Independent variable: π̂

Extensions to Haseman Elston (Wright 1997, Drigalenko 1998, Elston et al. 2000, Forrest 2001, Visscher & Hopper 2001, Xu et al. 2000, Sham & Purcell 2001)

VC ML with mixture distribution (Eaves et al. 1996)

Dependent variable: π̂. Independent variable: Phenotypes

Pedwide-regression Analysis ("reverse HE") (Sham et al. 2002)

Reverse VC ML (Sham et al. 2000)

Statistics for affection traits

Based on IBD scoring functions, e.g. Sall (Whittemore & Halpern 1994, Kong & Cox 1997)

Forrest & Feingold 2000 Mixed statistic

Page 35: Linkage analysis: basic principles

Nonparametric Linkage Analysis - summary

No need to specify mode of inheritance

Models phenotypic and genotypic similarity of relatives

Expression of phenotypic similarity, calculation of IBD

HE and VC are the most popular statistics used for linkage of quantitative traits

Other statistics available, especially for affection traits

Type I error? Power?

Page 36: Linkage analysis: basic principles

Type I error

[Figure: LOD scores relative to a threshold k, with regions labelled Type I error and True positive]

LOD threshold k: Theoretical (Lander & Kruglyak 1995) or Empirical

Page 37: Linkage analysis: basic principles

Theoretical genome-wide thresholds

Genome-wide threshold for suggestive linkage: LOD score that occurs by chance alone on average once per scan

LOD = 2.2, Chi-sq = 10.1, Pointwise P = 0.00074

Genome-wide threshold for significant linkage: LOD score that occurs by chance alone on average once per 20 scans

LOD = 3.6, Chi-sq = 16.7, Pointwise P = 0.000022

Page 38: Linkage analysis: basic principles

Empirical genome-wide thresholds

Genome-wide threshold for suggestive linkage: LOD score that occurs by chance alone on average once per scan

Genome-wide threshold for significant linkage: LOD score that occurs by chance alone on average once per 20 scans