STAT100, Module 3: Statistics in genetics Dr. Alexandre Bouchard-Côté Feb 8, 2011


Next topics

1. Review: Mendelian inheritance & parentage testing

2. Hardy-Weinberg principle


Review

Preview of the clicker question

Suppose A and a are two alleles on chromosome 1, and B and b are two alleles on chromosome 2.

Given the following parental profiles, what is the probability that their child has the AaBB profile?

Father: AaBb
Mother: AaBB

a) 1/3

b) 1/4

c) 1/2

d) 0

Review: locus and genotype

Alleles (versions) at locus 1 (for example, a SNP): A, a

Locus: address in the genome where there is a variation hot spot between individuals (plural: loci)

Example of a locus: 6p21.3

Genotype: the unordered combination of the alleles inherited from the mother and the father

Notation: Aa

Genotypes that we can distinguish (one locus)

aa or Aa or AA

Review: profile

Profile: the genotype at each of the loci

Alleles at locus 1: A, a

Alleles at locus 2: B, b

Notation: Aa BB, or: AaBB

Given the following parental profiles, what is the probability that their child has the AaBB profile?

Father (F): AaBb
Mother (M): AaBB
Child (C): AaBB

We want P(C | F, M).

F: AaBb, M: AaBB, C: AaBB

P(C | F, M)

3 steps:

1) do the computation for locus 1

2) do the computation for locus 2

3) multiply the two numbers (using the independence assumption)

Step 1) do the computation for locus 1

Possibilities:

Both parents have both alleles (F: Aa, M: Aa)

C    Probability
AA   1/4
Aa   1/2
aa   1/4

Both parents have only one allele (F: aa, M: aa)

C    Probability
AA   0
Aa   0
aa   1

One parent has both, one has only one (F: Aa, M: aa)

C    Probability
AA   0
Aa   1/2
aa   1/2

Father: AaBb
Mother: AaBB
Child: AaBB

Locus 1 (F: Aa, M: Aa, C: Aa): => 1/2

Step 2) do the computation for locus 2

Locus 2 (F: Bb, M: BB, C: BB): => 1/2

Step 3) multiply the two numbers: 1/2 x 1/2 = 1/4

Questions?
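The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `locus_probability` and `profile_probability` are hypothetical, profiles are assumed to be strings with two characters per locus (such as "AaBB"), and the loci are assumed to be inherited independently, as on the slides.

```python
from fractions import Fraction
from itertools import product


def locus_probability(child, father, mother):
    """P(child genotype | parental genotypes) at a single locus."""
    # Each parent passes on one of their two alleles with probability 1/2,
    # so each of the 4 (father allele, mother allele) pairs has probability 1/4.
    matches = sum(
        1
        for f_allele, m_allele in product(father, mother)
        if sorted(f_allele + m_allele) == sorted(child)  # genotypes are unordered
    )
    return Fraction(matches, 4)


def profile_probability(child, father, mother):
    """P(child profile | parental profiles), multiplying across loci."""
    prob = Fraction(1)
    # Assumption: a profile such as "AaBB" lists one locus per two characters.
    for i in range(0, len(child), 2):
        prob *= locus_probability(child[i:i + 2], father[i:i + 2], mother[i:i + 2])
    return prob


# Father AaBb, mother AaBB, child AaBB: 1/2 x 1/2
print(profile_probability("AaBB", "AaBb", "AaBB"))  # 1/4
```

Exact fractions are used instead of floats so that the output can be compared directly with the tables above.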

Clicker question for credits

Suppose A and a; B and b; and C and c are pairs of alleles located on different chromosomes.

Given the following parental profiles, what is the probability that their child has the AABBCc profile?

Father: AaBbCc
Mother: AaBBCc

a) 0

b) 1/4

c) 1/16

d) 1/3


Preview of the clicker question

Mother (M): AaBB
Child (C): aaBB
First man (F1): aaBb
Second man (F2): AaBb

Hypotheses: H1 (the father is F1) or H2 (the father is F2)

Find the likelihood ratio

Pr( Data | H1 ) / Pr( Data | H2 )

a) 1/2

b) 1

c) 2

d) 0

NB: the SNPs are on different chromosomes

Difference between H and F

F1: event that the first man has genotype aaBb
F2: event that the second man has genotype AaBb
H1: event that the first man is the biological father
H2: event that the second man is the biological father

Comparing probabilities

Can use a ratio: Pr( Data | H1 ) / Pr( Data | H2 )

Interpretation

- Greater than 1: evidence for H1 (the first man being the biological father)
- Less than 1: evidence for H2 (the second man being the biological father)

What is Pr( Data | H1 ) ?

M: AaBB, F1: aaBb, F2: AaBb, C: aaBB

Pr( Data | H1 ) = P(M, F1, F2, C | H1)
                = P(adults, C | H1)

This is the probability of all the observed profiles given that F1 is the father's genotype (hypothesis 1, i.e. H1).

What we want: Pr( Data | H1 ) = P(adults, C | H1)

What we know:

P(C | adults, H1) = P(C | F1, M)
P(C | adults, H2) = P(C | F2, M)

(How?) From the Mendelian inheritance table, shown here for two parents who each carry both alleles:

                                Allele contributed by mother
                                A (1/2)       a (1/2)
Allele contributed   A (1/2)    AA (1/4)      Aa (1/4)
by father            a (1/2)    aA (1/4)      aa (1/4)

What we know (continued):

P(adults) = P(F1) P(F2) P(M)

(How?) Each genotype probability is estimated from survey data. For example, with the five surveyed profiles AAbb, aabb, AaBB, AABb, AABb:

Estimated value for Pr( AA ) = (# of times AA is observed) / (# genotypes for SNP1) = 3/5
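The counting estimate above can be sketched as follows. The function name `estimate_genotype_frequencies` and the string encoding of profiles (two characters per locus) are assumptions made for illustration only.

```python
from collections import Counter
from fractions import Fraction


def estimate_genotype_frequencies(profiles, locus=0):
    """Estimate Pr(genotype) at one locus as (observed count) / (survey size)."""
    # Sort the two alleles so that "aA" and "Aa" count as the same genotype.
    genotypes = ["".join(sorted(p[2 * locus:2 * locus + 2])) for p in profiles]
    counts = Counter(genotypes)
    return {g: Fraction(n, len(genotypes)) for g, n in counts.items()}


# The five surveyed profiles from the slide; locus 0 is SNP1.
survey = ["AAbb", "aabb", "AaBB", "AABb", "AABb"]
print(estimate_genotype_frequencies(survey)["AA"])  # 3/5
```

A genotype that never appears in the survey is simply absent from the returned dictionary, which corresponds to an estimated probability of 0.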

Chain rule:

Pr( Data | H1 ) = P(M, F1, F2, C | H1)
                = P(adults, C | H1)
                = P(adults | H1) x P(C | adults, H1)
                = P(adults | H1) x P(C | F1, M)

What we know:

P(C | adults, H1) = P(C | F1, M)
P(C | adults, H2) = P(C | F2, M)
P(adults) = P(F1) P(F2) P(M)

The factor P(C | F1, M) is known. The factor P(adults | H1) is not quite the same as the known P(adults).

Extra assumption

Under what condition is P(adults | H1) = P(adults)?

> Independence

How can we interpret this assumption?

> Consider the extreme case: genotype AA implies infertility. Then learning that a man is the biological father would tell us that his genotype is not AA, so the adults' genotypes would not be independent of H1.

The assumption we need is that of ‘neutral alleles’.

With the neutrality assumption, P(adults | H1) = P(adults), so:

Pr( Data | H1 ) = P(adults | H1) x P(C | F1, M)
                = P(adults) x P(C | F1, M)    (neutrality)

where P(adults) = P(F1) P(F2) P(M).

What is the ratio ?

Pr( Data | H1 ) / Pr( Data | H2 ) = ( P(adults) x P(C | F1, M) ) / ( P(adults) x P(C | F2, M) )
                                  = P(C | F1, M) / P(C | F2, M)

M: AaBB, F1: aaBb, F2: AaBb, C: aaBB

For each candidate father:

1) do the computation for locus 1

2) do the computation for locus 2

3) multiply the two numbers (using the independence assumption)

P(C | F1, M) / P(C | F2, M) = ( (1/2) x (1/2) ) / ( (1/4) x (1/2) ) = 2

Questions?
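As a check on the ratio above, here is a minimal sketch of the computation for two candidate fathers. The names `profile_probability` and `likelihood_ratio` are hypothetical, profiles are assumed to be strings with two characters per locus, and the sketch relies on the neutrality and independence assumptions from the slides so that P(adults) cancels.

```python
from fractions import Fraction
from itertools import product


def profile_probability(child, father, mother):
    """P(child profile | parents), assuming loci are inherited independently."""
    prob = Fraction(1)
    for i in range(0, len(child), 2):  # assumption: one locus = two characters
        c, f, m = child[i:i + 2], father[i:i + 2], mother[i:i + 2]
        # Count the (father allele, mother allele) pairs that give the child's genotype.
        matches = sum(sorted(a + b) == sorted(c) for a, b in product(f, m))
        prob *= Fraction(matches, 4)
    return prob


def likelihood_ratio(child, mother, father1, father2):
    """P(C | F1, M) / P(C | F2, M); P(adults) cancels under neutrality."""
    # Raises ZeroDivisionError if the second man cannot be the father at all.
    return profile_probability(child, father1, mother) / profile_probability(child, father2, mother)


# M: AaBB, F1: aaBb, F2: AaBb, C: aaBB
print(likelihood_ratio("aaBB", "AaBB", "aaBb", "AaBb"))  # 2
```

A value greater than 1, as here, is read as evidence for H1.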

Clicker question for credits

Mother (M): AaBB
Child (C): aaBB
First man (F1): AABb
Second man (F2): AaBB

Hypotheses: H1 (the father is F1) or H2 (the father is F2)

Find the likelihood ratio

Pr( Data | H1 ) / Pr( Data | H2 ) = P(C | F1, M) / P(C | F2, M)

a) 1/2

b) 1

c) 2

d) 0

NB: the SNPs are on different chromosomes

Preview of the clicker question

Mother (M): Aa
Child (C): aa
First man (F1): aa
Unknown man (Unk): ??

Hypotheses: H1 (the father is F1) or H2 (the father is the unknown man)

SNP survey data: AA, aa, Aa, AA, AA

Find the likelihood ratio

Pr( Data | H1 ) / Pr( Data | H2 ) = P(C | F1, M) / E( P(C | U, M) )

a) 1/13

b) 10/3

c) 5/2

d) 1

The numerator P(C | F1, M) uses the usual steps (per-locus computation, then multiplication using the independence assumption). Different computation needed for the denominator E( P(C | U, M) )...

What is Pr( Data | H2 ) ?

Pr( Data | H2 ) = P( M, F1, F_AA, C | H2 )
                + P( M, F1, F_Aa, C | H2 )
                + P( M, F1, F_aa, C | H2 )

This is the probability of all the observed profiles given H2, summing over the possible values of the unknown genome.

M: event of the observed mother genome (Aa)
F_aa: event that the unknown genome is aa (pretending we know it's aa)

Very useful principle: sum over the uncertainty (marginalize)

Earlier result: for any genotype F,

P( M, F1, F, C | H2 ) = P(M) P(F1) P(F) P(C | F, M)

Applying it to each term of the sum:

Pr( Data | H2 ) = P(M) P(F1) P(F_AA) P(C | F_AA, M)
                + P(M) P(F1) P(F_Aa) P(C | F_Aa, M)
                + P(M) P(F1) P(F_aa) P(C | F_aa, M)

Factoring out the common terms:

Pr( Data | H2 ) = P(M) x P(F1) x ( P(F_AA) P(C | F_AA, M) + P(F_Aa) P(C | F_Aa, M) + P(F_aa) P(C | F_aa, M) )

P(M) x P(F1): gets cancelled by factors in the numerator, P(Data | H1)

The sum in parentheses: important part!

Slide from Corinne Riddell:

Recap: "falling in love" example

- Posted a handout on SLATE outlining this example
- Let's look at it now to clear up any misunderstandings
- Please ask questions now: if you are confused, it is very likely that other members of the class are confused as well!
- Back to our discussion of expectation. Recall the first example for last class.

Expectation Example

Value of the roll   1    2    3    4    5    6
Probability         1/6  1/6  1/6  1/6  1/6  1/6

- The Long Run expectation is just a weighted average of the values of the roll, where the weights are the probabilities. Using notation, we write:

E(score on dice) = 1 x 1/6 + 2 x 1/6 + 3 x 1/6 + 4 x 1/6 + 5 x 1/6 + 6 x 1/6 = 3.5

Now we have two dice...

- Consider the case when we have two dice and we roll them both and compute the sum of the scores shown.
- We know that for one dice the average score will be 3.5. How can we use this information to find the average score of the sum?
- Sum Rule: The average of a sum is the sum of the averages
- So, using the general rule, we write:

E(score of sum of two dice) = E(score on 1st die) + E(score on 2nd die) = ...

The same pattern applies to the paternity computation: the possible values are the P(C | F, M) terms, the probabilities (weights) are the P(F) terms, and in expectation notation:

E( P(C | Unknown genome, M) ) = P(F_AA) P(C | F_AA, M) + P(F_Aa) P(C | F_Aa, M) + P(F_aa) P(C | F_aa, M)
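The weighted-average view of expectation in the dice example can be checked numerically. This is a small sketch using exact fractions; the variable names are illustrative only, and the two dice are assumed to be fair and independent.

```python
from fractions import Fraction
from itertools import product

values = range(1, 7)   # possible values of one roll
prob = Fraction(1, 6)  # each value has probability 1/6

# Expectation = sum of (possible value x probability).
one_die = sum(v * prob for v in values)

# Two independent dice: each (a, b) pair has probability 1/36.
two_dice = sum((a + b) * prob * prob for a, b in product(values, repeat=2))

print(one_die)   # 7/2, i.e. 3.5
print(two_dice)  # 7, which equals 3.5 + 3.5 (sum rule)
```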

What is Pr( Data | H2 ) ?

E( P(C | Unknown genome, M) ) = P(F_AA) P(C | F_AA, M) + P(F_Aa) P(C | F_Aa, M) + P(F_aa) P(C | F_aa, M)

The probabilities P(F_AA), P(F_Aa), P(F_aa) come from the SNP survey data (AA, aa, Aa, AA, AA): P(F_AA) = 3/5, ...

Each P(C | F, M) term is computed in two steps, for example for F_AA:

1) pretend ?? = AA

2) use the usual Mendelian inheritance probabilities

Example

M: Aa, F1: aa, Unk: ??, C: aa

Hypotheses: H1 (the father is F1) or H2 (the father is the unknown man)

SNP survey data: AA, aa, Aa, AA, AA

Pr( Data | H1 ) / Pr( Data | H2 ) = P(C | F1, M) / E( P(C | U, M) )

Numerator: P(C | F1, M) = 1/2

Denominator:

E( P(C | U, M) ) = P(AA) x P(C | M, pretending Unk = AA) + P(Aa) x P(C | M, pretending Unk = Aa) + P(aa) x P(C | M, pretending Unk = aa)
                 = P(AA) x 0 + P(Aa) x 1/4 + P(aa) x 1/2
                 = 3/5 x 0 + 1/5 x 1/4 + 1/5 x 1/2
                 = 3/20

Ratio: (1/2) / (3/20) = 10/3

Questions?
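The worked example can be reproduced with a short sketch that sums over the unknown man's genotype. The function names are hypothetical, the example is restricted to a single locus as on the slide, and the genotype probabilities are assumed to be the direct survey estimates.

```python
from fractions import Fraction
from itertools import product


def locus_probability(child, father, mother):
    """P(child genotype | parental genotypes) at one locus."""
    matches = sum(sorted(a + b) == sorted(child) for a, b in product(father, mother))
    return Fraction(matches, 4)


def ratio_with_unknown(child, mother, father1, survey):
    """P(C | F1, M) / E( P(C | U, M) ), with U drawn from the survey frequencies."""
    numerator = locus_probability(child, father1, mother)
    # Marginalize: pretend the unknown man has each surveyed genotype in turn.
    # Weighting every survey entry by 1/n equals weighting genotypes by their frequency.
    weight = Fraction(1, len(survey))
    denominator = sum(weight * locus_probability(child, g, mother) for g in survey)
    return numerator / denominator


# M: Aa, F1: aa, C: aa, survey: AA, aa, Aa, AA, AA
print(ratio_with_unknown("aa", "Aa", "aa", ["AA", "aa", "Aa", "AA", "AA"]))  # 10/3
```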

Clicker question for credits

Mother (M): Aa
Child (C): Aa
First man (F1): aa
Unknown man (Unk): ??

Hypotheses: H1 (the father is F1) or H2 (the father is the unknown man)

SNP survey data: AA, aa, Aa, Aa, AA

Find the likelihood ratio

Pr( Data | H1 ) / Pr( Data | H2 ) = P(C | F1, M) / E( P(C | U, M) )

a) 1/3

b) 7/2

c) 5/3

d) 1

Hardy-Weinberg principle: the genotype structure of simplified populations

Motivation

Suppose the A vs. a allele determines the red hair phenotype:

AA or Aa: no red hair
aa: red hair

A: ‘Dominant allele’
a: ‘Recessive allele’

Assignment question: A couple wants to estimate the probability that their first child has red hair. The mother has red hair, but the father does not. You only know that red hair is controlled by a recessive allele, and that 1/5 of the people in the population have red hair.

Page 69

Motivation

What we needed: P(AA), P(Aa), P(aa)

How we got these numbers:

1- Direct estimation

2- Indirect estimation

[Figure: pedigree with father F1, mother M and child C, with genotype labels aa, ? ? (unknown) and aa, next to an SNP survey listing the genotypes AA, aa, Aa, AA.]

P(C) = E P(C | M, father genotype)

Page 70

Recall: direct estimation

[Figure: SNP survey listing the genotypes AA, aa, Aa, AA, AA.]

Estimated value for Pr( AA ) = (# of times AA is observed) / (# genotypes for SNP1)
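
As a rough illustration of direct estimation, the count-and-divide rule can be written as below. The helper name `estimate_genotype_probs` is invented for this sketch, and the survey list is simply the five genotypes read off the figure.

```python
from collections import Counter
from fractions import Fraction


def estimate_genotype_probs(survey):
    """Relative frequency of each genotype among the surveyed individuals."""
    counts = Counter(survey)
    return {genotype: Fraction(count, len(survey)) for genotype, count in counts.items()}


survey = ["AA", "aa", "Aa", "AA", "AA"]  # genotypes from the figure; order is irrelevant
probs = estimate_genotype_probs(survey)
print(probs["AA"])  # 3/5
```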

Page 72

Indirect estimation

Hair color statistics: No red hair, No red hair, No red hair, Red hair, No red hair

Known: red hair gene is recessive

AA or Aa : No red hair

aa : Red hair

This gives us Pr(aa). How can we get the other genotype probabilities, Pr(AA) and Pr(Aa), from this information?

Pages 73-75

In the first generation, not enough information

Generation 1

Known: P(aa)

Fraction of Aa versus AA???

[Figure: two different ways of splitting the remaining probability between P(Aa) and P(AA). Could be like this... or like this.]

Page 76

In the second generation, can be inferred exactly

Generation 1: Known: P(aa). Fraction of Aa versus AA???

Generation 2: Hardy-Weinberg. In the next generation, the fraction of Aa, aa and AA can be determined under idealized conditions.

Page 77

What is missing: a large population has been mixing

Generation 1 Generation 2

Random mating: Assume each individual in the next generation has a father taken uniformly at random from the previous generation, and a mother taken independently, also uniformly at random.

Page 78

Preliminaries

[Figure: a population of four individuals with genotypes AA, AA, Aa, aa, grouped into three sets: individuals with at least one copy of allele A, individuals with at least one copy of allele a, and individuals with red hair.]

Page 79

Population: AA, AA, Aa, aa

Allele frequencies

P( A ) = 5/8

P( a ) = 3/8

Genotype frequencies

P( AA ) = 2/4

P( Aa ) = 1/4

P( aa ) = 1/4

Page 80

Clicker Question

[Figure: a population grouped into individuals with at least one copy of allele A and individuals with at least one copy of allele a.]

What is the allele frequency of A, Pr(A) ?

a) 1/6

b) 1/2

c) 7/12

d) 2/3

Page 82

Clicker Question

[Figure: a population grouped into individuals with at least one copy of allele A and individuals with at least one copy of allele a.]

What is the genotype frequency of AA, Pr(AA) ?

a) 1/6

b) 1/2

c) 7/12

d) 2/3

Page 84

Allele frequencies

P( A ) = 5/8

P( a ) = 3/8

Genotype frequencies

P( AA ) = 2/4

P( Aa ) = 1/4

P( aa ) = 1/4

Useful formula: P( A ) = P( AA ) + P( Aa ) / 2

Here: 5/8 = 1/2 + 1/8
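
A small sketch of the useful formula, assuming a hypothetical helper `allele_frequencies` that takes the three genotype frequencies and returns P(A) and P(a). The input values are the ones on this slide.

```python
from fractions import Fraction


def allele_frequencies(p_AA, p_Aa, p_aa):
    """Return (P(A), P(a)) from the three genotype frequencies."""
    if p_AA + p_Aa + p_aa != 1:
        raise ValueError("genotype frequencies must sum to 1")
    # Each Aa individual contributes half of its allele copies to A and half to a.
    p = p_AA + p_Aa / 2
    q = p_aa + p_Aa / 2
    return p, q


p, q = allele_frequencies(Fraction(2, 4), Fraction(1, 4), Fraction(1, 4))
print(p, q)  # 5/8 3/8
```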

Page 85

Overview

What we need: P(red hair) = P(aa), P(Aa), P(AA)

How we will do it: from p = P(A) [ and hence q = P(a) ], get P(Aa), P(aa), P(AA) (why?)

Page 86

Hardy-Weinberg

What we need: P(red hair) = P(aa), P(Aa), P(AA)

How we will do it:

P(A) = p

P(a) = q = 1-p

P(Aa) = 2 x p x q

P(aa) = q x q

P(AA) = p x p

When a large population has been mixing under random mating for at least one generation
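
The three Hardy-Weinberg formulas translate directly into code. The sketch below uses a hypothetical function name `hardy_weinberg`, and the value p = 7/10 is only an illustrative choice.

```python
from fractions import Fraction


def hardy_weinberg(p):
    """Genotype frequencies implied by allele frequency p = P(A)."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


freqs = hardy_weinberg(Fraction(7, 10))
print(freqs["AA"], freqs["Aa"], freqs["aa"])  # 49/100 21/50 9/100
```

The three values always sum to 1, because p x p + 2 x p x q + q x q = (p + q) x (p + q) and p + q = 1.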

Pages 87-90

Hardy-Weinberg: application

Hair color statistics: No red hair, No red hair, No red hair, Red hair, No red hair

How we will do it:

P(A) = p

P(a) = q = 1-p

P(Aa) = 2 x p x q

P(aa) = q x q = 1/5

P(AA) = p x p

=> q = 1 / √5

p = 1 - 1 / √5

P(aa) = 1/5

P(AA) = 2(3-√5)/5

P(Aa) = 1 - 1/5 - P(AA)
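
The same calculation can be checked numerically. This is a sketch under the Hardy-Weinberg assumptions stated on the slides; the function name `genotype_freqs_from_recessive` is made up for the example.

```python
import math


def genotype_freqs_from_recessive(p_aa):
    """Recover q, p and the genotype frequencies from P(aa) = q x q."""
    q = math.sqrt(p_aa)
    p = 1 - q
    return {"q": q, "p": p, "AA": p * p, "Aa": 2 * p * q, "aa": p_aa}


result = genotype_freqs_from_recessive(1 / 5)
for name in ("q", "p", "AA", "Aa", "aa"):
    print(name, round(result[name], 3))
# q 0.447, p 0.553, AA 0.306, Aa 0.494, aa 0.2
```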

Page 91

Where do the formulas come from?

Part 1: stability of allele frequencies

Generation 1 (50 individuals): Suppose there are 70 copies of the A allele, 30 copies of the a allele

[Figure: generation 1 and generation 2, with a father F and a mother M.]

What is the probability that the allele inherited from the father is A?

A: 3/7

B: 3/5

C: 3/10

D: 7/10

Page 92

Hardy-Weinberg

Part 1: stability of allele frequencies

Suppose there are: 70 copies of the A allele, 30 copies of the a allele

[Figure: generation 1 and generation 2, with a father F and a mother M.]

What is the probability that the allele inherited from the father is A?

D: 7/10 = 70 / ( 30 + 70 )

Page 93

Hardy-Weinberg

Part 1: stability of allele frequencies

Suppose there are: 70 copies of the A allele, 30 copies of the a allele

[Figure: generation 1 and generation 2, with a father F and a mother M.]

Suppose there are still 50 people (100 allele copies) at generation 2. What is your best guess for the number of copies of the A allele in generation 2?

A: 70

B: About 70

C: Not enough information

D: Less than 70

Page 95

Let's roll a die 300 times...

Roll    Value of Die Roll    Average So Far
1       6                    6.00
2       2                    4.00
3       2                    3.33
4       2                    3.00
5       2                    2.80
6       4                    3.00
...     ...                  ...
298     4                    3.51
299     1                    3.50
300     2                    3.49

[Figure: running average of the die roll against the roll number, from 0 to 300 rolls.]

- We understand that in the long run, the average will converge to 3.5. How do we formally calculate this?

- Recall one of the definitions we used of probability:

For a random experiment, the probability of the event is the relative frequency of occurrence of the event if the experiment was repeated many times.

In a large population the expectation is close to the observed fraction

Consequence: in a large population, the allele frequency stays relatively constant across generations

Slide from Corinne Riddell
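
The large-population claim can be explored with a short simulation. This is a simplified sketch, not the lecture's own model: it assumes each allele copy in generation 2 is A independently with probability 0.7 (the generation 1 frequency), and the function name `next_generation_A_count` is invented for the example.

```python
import random


def next_generation_A_count(n_copies, p_A, rng):
    """Draw n_copies alleles independently; each is A with probability p_A."""
    return sum(1 for _ in range(n_copies) if rng.random() < p_A)


rng = random.Random(0)  # fixed seed so the run is repeatable
for n_copies in (100, 10_000, 100_000):
    count = next_generation_A_count(n_copies, 0.7, rng)
    # The observed fraction should get closer to 0.7 as n_copies grows.
    print(n_copies, count, count / n_copies)
```

With 100 copies the count is typically "about 70" rather than exactly 70; with more copies the fraction settles near 0.7.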

Page 96

Hardy-Weinberg

Part 2: genotype frequencies

Suppose there are: 70 copies of the A allele, 30 copies of the a allele

[Figure: generation 1 and generation 2, with a father F and a mother M.]

Let’s find P’(AA)

The prime means: ‘in generation 2’

Finding P’(aa) uses the same argument, and P’(Aa) = 1-P’(AA)-P’(aa)

Pages 97-105

Hardy-Weinberg

Part 2: genotype frequencies

Suppose there are: 70 copies of the A allele, 30 copies of the a allele

Let’s find P’(AA)

[Figure: a child C with genotype AA in generation 2, whose father F and mother M have unknown genotypes (? ?).]

Large population approximation (as in part 1):

$$P'(AA) \approx P(C_{AA})$$

Same argument as in parentage testing (marginalize over unknown variables):

$$
\begin{aligned}
P(C_{AA}) &= E\, P(C_{AA} \mid \text{parents}) \\
&= P(F_{AA})\, P(M_{AA})\, P(C_{AA} \mid F_{AA}, M_{AA}) \\
&\quad + P(F_{AA})\, P(M_{Aa})\, P(C_{AA} \mid F_{AA}, M_{Aa}) \\
&\quad + P(F_{Aa})\, P(M_{AA})\, P(C_{AA} \mid F_{Aa}, M_{AA}) \\
&\quad + P(F_{Aa})\, P(M_{Aa})\, P(C_{AA} \mid F_{Aa}, M_{Aa}) \\
&\quad + 0 + 0 + \dots
\end{aligned}
$$

Some parental genotypes cannot lead to AA (e.g. aa & Aa), which gives the zero terms. Plugging in the Mendelian probabilities:

$$
\begin{aligned}
P(C_{AA}) &= P(F_{AA})\, P(M_{AA}) \times 1 \\
&\quad + P(F_{AA})\, P(M_{Aa}) \times \tfrac{1}{2} \\
&\quad + P(F_{Aa})\, P(M_{AA}) \times \tfrac{1}{2} \\
&\quad + P(F_{Aa})\, P(M_{Aa}) \times \tfrac{1}{4}
\end{aligned}
$$

Fathers and mothers are drawn from the same generation, so they have the same genotype frequencies and the F and M labels can be dropped:

$$
\begin{aligned}
P(C_{AA}) &= P(AA)\, P(AA) \times 1 + P(AA)\, P(Aa) \times \tfrac{1}{2} + P(Aa)\, P(AA) \times \tfrac{1}{2} + P(Aa)\, P(Aa) \times \tfrac{1}{4} \\
&= P(AA)\, P(AA) + P(AA)\, P(Aa) + P(Aa)\, P(Aa) \times \tfrac{1}{4} \\
&= \left( P(AA) + P(Aa)/2 \right)^2 \qquad \text{since } x^2 + xy + y^2/4 = (x + y/2)^2 \\
&= (P(A))^2 \qquad \text{by the useful formula } P(A) = P(AA) + P(Aa)/2
\end{aligned}
$$
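
The four-term sum can be checked with exact arithmetic. This is a sketch with made-up names (`P_CHILD_AA`, `next_gen_AA`); the generation 1 genotype frequencies used here, P(AA) = 2/4 and P(Aa) = 1/4, are the ones from the earlier allele-frequency slide and serve only as an example.

```python
from fractions import Fraction

# P(child is AA | father genotype, mother genotype); omitted pairs have probability 0.
P_CHILD_AA = {
    ("AA", "AA"): Fraction(1),
    ("AA", "Aa"): Fraction(1, 2),
    ("Aa", "AA"): Fraction(1, 2),
    ("Aa", "Aa"): Fraction(1, 4),
}


def next_gen_AA(p_AA, p_Aa):
    """Sum P(father) * P(mother) * P(child AA | parents) over the four non-zero cases."""
    parent = {"AA": p_AA, "Aa": p_Aa}
    return sum(parent[f] * parent[m] * cond for (f, m), cond in P_CHILD_AA.items())


p_AA, p_Aa = Fraction(2, 4), Fraction(1, 4)
p = p_AA + p_Aa / 2  # useful formula: P(A) = P(AA) + P(Aa) / 2
print(next_gen_AA(p_AA, p_Aa), p ** 2)  # 25/64 25/64
```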

Pages 106-108

Hardy-Weinberg

Part 2: genotype frequencies

Let’s find P’(AA): Got: $(P(A))^2 = p^2$

For P’(aa): Do exactly the same argument, writing ‘a’ instead of ‘A’, and ‘A’ instead of ‘a’

Get: $(P(a))^2 = q^2$

For P’(Aa):

$$
\begin{aligned}
P'(Aa) &= 1 - P'(AA) - P'(aa) \\
&= 1 - p^2 - q^2 \\
&= 2pq \qquad \text{using } p + q = 1 \text{, so that } 1 = (p + q)^2 = p^2 + 2pq + q^2
\end{aligned}
$$
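
To see all three generation 2 frequencies at once, one can enumerate every pair of parental genotypes. The sketch below is illustrative only: `child_dist` and `next_generation` are hypothetical helper names, parents are assumed to be drawn independently from the same genotype distribution (random mating), and the starting frequencies are an arbitrary example.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")


def child_dist(father, mother):
    """Distribution of the child's genotype given both parents' genotypes."""
    dist = {g: Fraction(0) for g in GENOTYPES}
    # Each parent passes one of its two allele copies, each with probability 1/2.
    for x, y in product(father, mother):
        genotype = "".join(sorted(x + y))  # 'A' sorts before 'a', so "aA" becomes "Aa"
        dist[genotype] += Fraction(1, 4)
    return dist


def next_generation(probs):
    """Genotype frequencies after one round of random mating."""
    new = {g: Fraction(0) for g in GENOTYPES}
    for f, m in product(GENOTYPES, repeat=2):
        for genotype, cond in child_dist(f, m).items():
            new[genotype] += probs[f] * probs[m] * cond
    return new


gen1 = {"AA": Fraction(2, 4), "Aa": Fraction(1, 4), "aa": Fraction(1, 4)}
gen2 = next_generation(gen1)
gen3 = next_generation(gen2)
print(gen2["AA"], gen2["Aa"], gen2["aa"])  # 25/64 15/32 9/64
print(gen3 == gen2)  # True
```

With p = 5/8 and q = 3/8, the generation 2 values equal p x p, 2 x p x q and q x q, and a further round of random mating leaves them unchanged.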

Page 109

Hardy-Weinberg

What we need: P(red hair) = P(aa), P(Aa), P(AA)

How we will do it:

P(A) = p

P(a) = q = 1-p

P(Aa) = 2 x p x q

P(aa) = q x q

P(AA) = p x p

When a large population has been mixing under random mating for at least one generation