
Linkage Disequilibrium Granovsky Ilana and Berliner Yaniv Computational Genetics 19.06.03.

Dec 20, 2015


Linkage Disequilibrium

Granovsky Ilana and Berliner Yaniv

Computational Genetics

19.06.03


What is Linkage Disequilibrium?

• When the occurrence of pairs of specific alleles at different loci on the same haplotype is not independent, the deviation from independence is termed linkage disequilibrium.

• In general, linkage disequilibrium is seen as an association between one specific allele at one locus and another specific allele at a second locus.


Linkage Disequilibrium Coefficient Definitions

                                    Marker 2
                                    Allele 1                Allele 2
                                    (probability = p2)      (probability = 1-p2)
Marker 1   Allele 1                 X1                      X2
           (probability = p1)       p1*p2 + D11             p1*(1-p2) - D11
           Allele 2                 X3                      X4
           (probability = 1-p1)     (1-p1)*p2 - D11         (1-p1)*(1-p2) + D11

• Xi - number of observations in cell i (X1+X2+X3+X4 = n)

• D11 - coefficient of gametic linkage disequilibrium between allele 1 at locus 1 and allele 1 at locus 2:

D11 = P1*P4 - P2*P3, where Pi = E[Xi | n=1] is the probability of cell i
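As a quick check of the table, the four cell probabilities should sum to 1, and the product of the diagonal cells minus the product of the off-diagonal cells should give back D11. A minimal Python sketch follows; the function name and the values of p1, p2 and D11 are illustrative assumptions, not taken from the slides.

```python
def cell_probabilities(p1, p2, d11):
    """Return the four haplotype cell probabilities of the 2x2 table."""
    return (
        p1 * p2 + d11,              # cell 1: allele 1 at marker 1, allele 1 at marker 2
        p1 * (1 - p2) - d11,        # cell 2: allele 1 at marker 1, allele 2 at marker 2
        (1 - p1) * p2 - d11,        # cell 3: allele 2 at marker 1, allele 1 at marker 2
        (1 - p1) * (1 - p2) + d11,  # cell 4: allele 2 at marker 1, allele 2 at marker 2
    )

# Illustrative values only
c1, c2, c3, c4 = cell_probabilities(p1=0.6, p2=0.3, d11=0.05)
total = c1 + c2 + c3 + c4
recovered_d11 = c1 * c4 - c2 * c3
print(round(total, 6), round(recovered_d11, 6))
```

The identity holds for any valid p1, p2 and D11, because the four cells always sum to 1.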


Population-based sampling and the EH program

• We wish to test for the absence of disequilibrium between allele A at locus 1 and allele B at locus 2 (DAB = 0).

• The sample consists of genotype data in which the haplotypes of each individual cannot all be fully distinguished.


Table of all possible two-locus genotypes

                 Locus 1
Locus 2    AA    Aa    aa
BB         k1    k2    k3
Bb         k4    k5    k6
bb         k7    k8    k9

In cell 5 there can be either of two phases, AB/ab or Ab/aB.


Analysis of likelihood

• We maximize the log likelihood of the data observed:

$\ln[L(\text{data})] = \sum_{i} k_i \ln(p_i)$, summed over all two-locus genotype cells (i = 1, ..., 9 for two biallelic loci)

• For cell 1: p1 = [P(A B)]^2

• For cell 4: p4 = 2P(A B)P(A b)

• For cell 5: p5 = P(A B/a b) + P(A b/a B) = 2P(A B)P(a b) + 2P(A b)P(a B)


Table of probabilities in each cell

                 Locus 1
Locus 2    AA                 Aa                                aa
BB         P(A B)^2           2P(A B)P(a B)                     P(a B)^2
Bb         2P(A B)P(A b)      2P(A B)P(a b) + 2P(A b)P(a B)     2P(a B)P(a b)
bb         P(A b)^2           2P(A b)P(a b)                     P(a b)^2
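The cell probabilities and the log likelihood they feed into can be sketched in Python as follows. The function names, the haplotype frequencies and the small count table are illustrative assumptions; the multinomial constant is omitted, as in the sum of k_i ln(p_i).

```python
import math

def genotype_probabilities(h_AB, h_Ab, h_aB, h_ab):
    """3x3 table of genotype probabilities; rows BB, Bb, bb, columns AA, Aa, aa."""
    return [
        [h_AB ** 2, 2 * h_AB * h_aB, h_aB ** 2],
        [2 * h_AB * h_Ab, 2 * h_AB * h_ab + 2 * h_Ab * h_aB, 2 * h_aB * h_ab],
        [h_Ab ** 2, 2 * h_Ab * h_ab, h_ab ** 2],
    ]

def log_likelihood(counts, probs):
    """Sum of k_i * ln(p_i) over the nine cells (empty cells contribute nothing)."""
    return sum(
        k * math.log(p)
        for row_k, row_p in zip(counts, probs)
        for k, p in zip(row_k, row_p)
        if k > 0
    )

# Illustrative haplotype frequencies (they must sum to 1) and made-up counts
probs = genotype_probabilities(0.4, 0.1, 0.2, 0.3)
counts = [[4, 3, 1], [2, 6, 2], [1, 1, 2]]
print(sum(sum(row) for row in probs))   # the nine cells sum to 1
print(log_likelihood(counts, probs))
```

Only the middle cell mixes two phases, which is why it is the only entry with two terms.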


Analysis of likelihood

• We maximize the likelihood above over the possible haplotype frequencies, parameterized by p(A), p(B) and DAB.

• This likelihood is then compared with the maximum likelihood when DAB is set equal to 0 (absence of linkage disequilibrium).


Example

Genotype counts:

                 Locus 1
Locus 2    AA         Aa         aa
BB         k1 = 10    k2 = 10    k3 = 3
Bb         k4 = 15    k5 = 50    k6 = 13
bb         k7 = 5     k8 = 13    k9 = 10

Haplotype counts (k5 censored):

       A      a
B      45     29
b      38     46

Haplotype frequencies (k5 censored):

       A       a
B      0.28    0.18
b      0.24    0.29

* When censoring k5, all the haplotypes can be uniquely determined.


Example cont.

• P(A) = (45 + 38)/158 = 0.525

• P(B) = (45 + 29)/158 = 0.468

• DAB = p(A B) - p(A)p(B) = 45/158 - (83/158)*(74/158) = 0.0388

* Biased example due to the elimination of the 50 observations in k5.
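The censored calculation can be reproduced by direct haplotype counting. The sketch below assumes the 3x3 layout of the example (rows BB, Bb, bb; columns AA, Aa, aa); the variable names are illustrative.

```python
# Genotype counts from the example; rows BB, Bb, bb, columns AA, Aa, aa
k = [[10, 10, 3],
     [15, 50, 13],
     [5, 13, 10]]

# Each individual carries two haplotypes. The double heterozygote k5 (k[1][1])
# is left out because its phase (AB/ab or Ab/aB) is unknown.
n_AB = 2 * k[0][0] + k[0][1] + k[1][0]
n_aB = 2 * k[0][2] + k[0][1] + k[1][2]
n_Ab = 2 * k[2][0] + k[1][0] + k[2][1]
n_ab = 2 * k[2][2] + k[1][2] + k[2][1]
total = n_AB + n_aB + n_Ab + n_ab

p_A = (n_AB + n_Ab) / total
p_B = (n_AB + n_aB) / total
d_AB = n_AB / total - p_A * p_B
print(n_AB, n_aB, n_Ab, n_ab)                       # 45 29 38 46
print(round(p_A, 3), round(p_B, 3), round(d_AB, 4))  # 0.525 0.468 0.0388
```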


EH program input file format

• EH = estimated haplotype.

– Input file EH.dat

Line 1: Number of alleles at each of the two loci

Line 2: k1 k4 k7

Line 3: k2 k5 k8

Line 4: k3 k6 k9
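For the example counts, the file layout listed above could be generated as in the following sketch. It follows only the four-line layout given on the slide; the helper name is hypothetical, and space-separated integers are an assumption about what the EH program accepts.

```python
def format_eh_input(k, n_alleles=(2, 2)):
    """Build the text of an EH input file from a 3x3 genotype table.

    k is indexed as k[row][col] with rows BB, Bb, bb and columns AA, Aa, aa,
    so each line after the first is one column of that table
    (k1 k4 k7, then k2 k5 k8, then k3 k6 k9).
    """
    lines = [" ".join(str(a) for a in n_alleles)]
    for col in range(3):
        lines.append(" ".join(str(k[row][col]) for row in range(3)))
    return "\n".join(lines) + "\n"

k = [[10, 10, 3],
     [15, 50, 13],
     [5, 13, 10]]
text = format_eh_input(k)
print(text, end="")
# To create the file itself: open("EH.dat", "w").write(text)
```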


EH program output file

• Output – Estimates of Gene Frequencies (including k5)

           Allele
Locus      1        2
1          0.515    0.484
2          0.480    0.519

# of typed individuals: 129
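These gene frequencies do not depend on phase, so they can be checked by simple gene counting over all nine cells, including k5. A short sketch, assuming the example's 3x3 layout (rows BB, Bb, bb; columns AA, Aa, aa):

```python
k = [[10, 10, 3],
     [15, 50, 13],
     [5, 13, 10]]

n = sum(sum(row) for row in k)                              # typed individuals
col_totals = [sum(row[c] for row in k) for c in range(3)]   # AA, Aa, aa
row_totals = [sum(row) for row in k]                        # BB, Bb, bb

# Homozygotes carry two copies of the allele, heterozygotes one
p_A = (2 * col_totals[0] + col_totals[1]) / (2 * n)
p_B = (2 * row_totals[0] + row_totals[1]) / (2 * n)
print(n, p_A, 1 - p_A, p_B, 1 - p_B)
```

The results agree with the reported output to within the last printed digit.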


EH program output file

                                           Haplotype frequency
Allele at locus 1    Allele at locus 2     Independent    w/association
1                    1                     0.248          0.328
1                    2                     0.268          0.188
2                    1                     0.232          0.153
2                    2                     0.252          0.332
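The slides do not say how EH maximizes the likelihood. One standard way to obtain the "w/association" column is an EM iteration that repeatedly splits the double heterozygotes (k5) between the two phases. The sketch below uses that approach as an assumption; the function name, starting point and iteration count are illustrative.

```python
def em_haplotype_frequencies(k, n_iter=200):
    """EM estimate of (AB, Ab, aB, ab) frequencies from a 3x3 genotype table.

    k[row][col]: rows BB, Bb, bb; columns AA, Aa, aa.
    """
    # Haplotype counts from all phase-known genotypes (everything except k5)
    base_AB = 2 * k[0][0] + k[0][1] + k[1][0]
    base_Ab = 2 * k[2][0] + k[1][0] + k[2][1]
    base_aB = 2 * k[0][2] + k[0][1] + k[1][2]
    base_ab = 2 * k[2][2] + k[1][2] + k[2][1]
    k5 = k[1][1]
    total = 2 * sum(sum(row) for row in k)

    h_AB = h_Ab = h_aB = h_ab = 0.25  # starting point: equal frequencies
    for _ in range(n_iter):
        # E-step: expected share of double heterozygotes with phase AB/ab
        cis = h_AB * h_ab
        trans = h_Ab * h_aB
        frac = cis / (cis + trans)
        # M-step: re-estimate frequencies from the completed counts
        h_AB = (base_AB + k5 * frac) / total
        h_ab = (base_ab + k5 * frac) / total
        h_Ab = (base_Ab + k5 * (1 - frac)) / total
        h_aB = (base_aB + k5 * (1 - frac)) / total
    return h_AB, h_Ab, h_aB, h_ab

k = [[10, 10, 3], [15, 50, 13], [5, 13, 10]]
h_AB, h_Ab, h_aB, h_ab = em_haplotype_frequencies(k)
d_AB = h_AB - (h_AB + h_Ab) * (h_AB + h_aB)
print(round(h_AB, 3), round(h_Ab, 3), round(h_aB, 3), round(h_ab, 3), round(d_AB, 3))
```

On the example data this converges to values close to the reported frequencies; small differences in the third decimal are to be expected from rounding in the program output.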


Chi square test

                                    df    Ln(L)      Chi-square
H0: No association                  2     -252.68    0.00
H1: Allelic association allowed     3     -248.23    8.89

• The difference between the two chi-square values is 8.89.

• The P-value associated with this chi-square (with 1 df) is 0.002873.

• It is clear that k5 contributes significant information.
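The test statistic is twice the difference of the two log likelihoods, with degrees of freedom equal to the difference in df (3 - 2 = 1). A small sketch using only the Python standard library; starting from the rounded log likelihoods in the table, it gives 8.90 rather than 8.89 and a P-value of about 0.0029, in line with the reported 0.002873.

```python
import math

ln_l0 = -252.68  # H0: no association (from the table)
ln_l1 = -248.23  # H1: allelic association allowed (from the table)

chi_square = 2 * (ln_l1 - ln_l0)
# Upper-tail probability of a chi-square variable with 1 degree of freedom
p_value = math.erfc(math.sqrt(chi_square / 2))
print(round(chi_square, 2), p_value)
```

The `erfc` shortcut is valid only for 1 df; a general chi-square survival function would be needed for other cases.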


Summary

Haplotype frequencies

              Without k5                   With k5
Haplotype     Independent    Associated    Independent    Associated
A B           0.246          0.284         0.247          0.327
A b           0.279          0.24          0.267          0.187
a B           0.222          0.183         0.232          0.152
a b           0.252          0.291         0.251          0.331

p(A)          0.525                        0.515
p(B)          0.468                        0.48
DAB           0.038                        0.079


Multiallelic genotype information in EH program

Line 1: Number of alleles at each locus

Subsequent lines:

               Locus 2
Locus 1    1/1    1/2    2/2    1/3    2/3    3/3
1/1        a1     b1     c1     d1     e1     f1
1/2        a2     b2     c2     d2     e2     f2
2/2        a3     b3     c3     d3     e3     f3
1/3        a4     b4     c4     d4     e4     f4
2/3        a5     b5     c5     d5     e5     f5
3/3        a6     b6     c6     d6     e6     f6


Multilocus genotype data

                         Locus 3
Locus 1    Locus 2    1/1    1/2    2/2
1/1        1/1        a1     b1     c1
           1/2        a2     b2     c2
           2/2        a3     b3     c3
1/2        1/1        a4     b4     c4
           1/2        a5     b5     c5
           2/2        a6     b6     c6
2/2        1/1        a7     b7     c7
           1/2        a8     b8     c8
           2/2        a9     b9     c9


Ex. 23

• Full data solution file

• Censored data solution file

Censored data

1/1 haplotype data

               Locus 2
Locus 1    1/1    1/2    1/3    1/4    2/2    2/3    2/4    3/3    3/4    4/4
1/1        10     5      6      4      1      2      3      1      2      0
1/2        6      3      3      3      1      2      1      1      2      1
2/2        12     9      8      11     3      2      5      1      0      3
1/3        1      2      2      1      1      1      1      0      4      2
2/3        0      2      2      8      2      2      9      3      6      8
3/3        8      6      4      10     3      3      8      5      9      13


Haplotypes from censored genotype data

Haplotype counts:

                       Allele at locus 2
Allele at locus 1      1        2        3        4
1                      42       14       13       12
2                      58       25       16       31
3                      37       26       29       63

Haplotype frequencies:

                       Allele at locus 2
Allele at locus 1      1        2        3        4
1                      0.11     0.038    0.035    0.032
2                      0.158    0.068    0.044    0.085
3                      0.10     0.07     0.079    0.172
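The haplotype counts can be reproduced from the Ex. 23 genotype table by dropping every double heterozygote and counting the two haplotypes of each remaining individual. The sketch below is illustrative (variable names are assumptions) and uses the genotype order of that table's rows and columns.

```python
locus1_genotypes = [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3)]
locus2_genotypes = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2),
                    (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]
counts = [
    [10, 5, 6, 4, 1, 2, 3, 1, 2, 0],
    [6, 3, 3, 3, 1, 2, 1, 1, 2, 1],
    [12, 9, 8, 11, 3, 2, 5, 1, 0, 3],
    [1, 2, 2, 1, 1, 1, 1, 0, 4, 2],
    [0, 2, 2, 8, 2, 2, 9, 3, 6, 8],
    [8, 6, 4, 10, 3, 3, 8, 5, 9, 13],
]

hap = {(i, j): 0 for i in (1, 2, 3) for j in (1, 2, 3, 4)}
for g1, row in zip(locus1_genotypes, counts):
    for g2, n in zip(locus2_genotypes, row):
        if g1[0] != g1[1] and g2[0] != g2[1]:
            continue  # double heterozygote: phase unknown, censored
        # With at least one homozygous locus, pairing the alleles position
        # by position gives the two haplotypes unambiguously.
        hap[(g1[0], g2[0])] += n
        hap[(g1[1], g2[1])] += n

total = sum(hap.values())
for i in (1, 2, 3):
    print([hap[(i, j)] for j in (1, 2, 3, 4)],
          [round(hap[(i, j)] / total, 3) for j in (1, 2, 3, 4)])
```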


Thank you very much!