FST & Some Selection Index
Evolution, Population Genetics and Health 2014
김진섭
GSPH, SNU
October 29, 2014
김진섭 (GSPH, SNU) FST & Some Selection Index October 29, 2014 1 / 65
Contents
1 Fst
  Wright's F-statistics
  Cockerham's θ-statistics
2 Selection Index
  EHH
  iHS
  xp-EHH
3 Practice
Fst Wright's F-statistics
3 types of Heterozygosity[4]
Individual, Subpopulation, Total Population
1 $H_I = \frac{1}{n}\sum_{i=1}^{n} H_i$
2 $H_S = \frac{1}{n}\sum_{i=1}^{n} 2p_iq_i$
3 $H_T = 2\bar{p}\bar{q}$
($H_i$: observed heterozygosity in the $i$th subpopulation; $2p_iq_i$: expected heterozygosity in the $i$th subpopulation; $2\bar{p}\bar{q}$: expected heterozygosity of the total population, from the average allele frequencies $\bar{p}$ and $\bar{q}$)
All values are computed per locus.
Wright's F-statistics[4]
1 $F_{IS} = \frac{H_S - H_I}{H_S}$
2 $F_{ST} = \frac{H_T - H_S}{H_T}$
3 $F_{IT} = \frac{H_T - H_I}{H_T}$
Example
$F_{ST} = 0$ → subpopulation has no effect: the subpopulations do not differ.
$F_{ST} = 1$ → the subpopulations differ strongly.
Simple relation
$1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$
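The three heterozygosities and the F-statistics built from them can be checked numerically. The sketch below assumes a single biallelic locus and equal weight for every subpopulation; the function name `f_statistics` and the input values are made up for illustration and do not come from the lecture.

```python
def f_statistics(obs_het, p):
    """Wright's F-statistics for one biallelic locus.

    obs_het[i]: observed heterozygosity H_i in subpopulation i
    p[i]:       allele frequency p_i in subpopulation i
    Subpopulations are weighted equally (an assumption of this sketch).
    """
    n = len(p)
    h_i = sum(obs_het) / n                          # H_I
    h_s = sum(2 * pi * (1 - pi) for pi in p) / n    # H_S
    p_bar = sum(p) / n
    h_t = 2 * p_bar * (1 - p_bar)                   # H_T
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Illustrative values only: two subpopulations with very different frequencies.
f_is, f_st, f_it = f_statistics(obs_het=[0.30, 0.20], p=[0.8, 0.2])
print(f_is, f_st, f_it)
```

With these inputs the identity $1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$ holds up to floating-point error, which is a quick sanity check on any implementation.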
Figure source: http://academic.reed.edu/biology/professors/srenn/pages/research/2011_students/sean/SM_thesis.html
Figure source: http://www.johnderbyshire.com/Miscellaneous/Other/Fst.jpg
FST inference[5]
Convenient measure of genetic differentiation.
One of the most widely used descriptive statistics in population and evolutionary genetics.
Can point to natural selection acting in a particular subpopulation.
Problem in estimation
$H_T = 2\bar{p}\bar{q}$
1 What if the sample size differs between subpopulations?
2 Ex: SASIA 1,000 people, Oceania 100.
3 Then $\bar{p}$ is not properly estimated.
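The effect of unequal sample sizes is easy to see with numbers. In this sketch the sample sizes follow the 1,000 vs. 100 example, while the allele frequencies (0.9 and 0.1) and the helper name `mean_allele_frequency` are invented purely for illustration.

```python
def mean_allele_frequency(p, n_samples=None):
    """Average allele frequency across subpopulations.

    Without n_samples every subpopulation counts equally;
    with n_samples the frequencies are pooled by sample size.
    """
    if n_samples is None:
        return sum(p) / len(p)
    total = sum(n_samples)
    return sum(pi * ni for pi, ni in zip(p, n_samples)) / total


p = [0.9, 0.1]      # hypothetical allele frequencies
n = [1000, 100]     # sample sizes from the example above

unweighted = mean_allele_frequency(p)       # each subpopulation counts once
pooled = mean_allele_frequency(p, n)        # dominated by the larger sample

h_t_unweighted = 2 * unweighted * (1 - unweighted)
h_t_pooled = 2 * pooled * (1 - pooled)
print(unweighted, pooled, h_t_unweighted, h_t_pooled)
```

The two averages give clearly different $H_T$ values, so the resulting $F_{ST}$ depends on how the samples were collected rather than only on the populations.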
Fst Cockerham’s θ-statistics
ANOVA approach[1, 5]
$\theta = \frac{\sigma_P^2}{\sigma_T^2}$
($\sigma_P^2$: variance due to population, $\sigma_T^2$: total variance)
Wright's $F_{ST}$ = Cockerham's θ
In practice, most calculations use θ.
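For a biallelic locus the variance-ratio view and the heterozygosity view agree, because $H_T - H_S = 2\,\mathrm{Var}(p_i)$ and $H_T = 2\bar{p}\bar{q}$. The sketch below only illustrates that identity with equal weights and made-up frequencies; it is not the sample-size-corrected ANOVA estimator used by real software, and both function names are hypothetical.

```python
def fst_heterozygosity(p):
    """F_ST = (H_T - H_S) / H_T with equally weighted subpopulations."""
    n = len(p)
    h_s = sum(2 * pi * (1 - pi) for pi in p) / n
    p_bar = sum(p) / n
    h_t = 2 * p_bar * (1 - p_bar)
    return (h_t - h_s) / h_t


def fst_variance_ratio(p):
    """Between-population variance of p over total variance p_bar * (1 - p_bar)."""
    n = len(p)
    p_bar = sum(p) / n
    var_between = sum((pi - p_bar) ** 2 for pi in p) / n
    return var_between / (p_bar * (1 - p_bar))


p = [0.8, 0.2, 0.5]   # illustrative allele frequencies
a = fst_heterozygosity(p)
b = fst_variance_ratio(p)
print(a, b)
```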
θ inference
More than 2 populations
  A high value means some population departs from the rest.
  It does not tell you which population.
Pairwise FST
  Computed from just 2 populations.
  Gives a relative comparison.
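Pairwise comparison is what identifies the population that stands apart. A minimal sketch, assuming one biallelic locus, equal weights and hypothetical populations "A", "B" and "C" with invented allele frequencies:

```python
from itertools import combinations


def fst_two_pops(p1, p2):
    """F_ST between two populations from their allele frequencies."""
    p_bar = (p1 + p2) / 2
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def pairwise_fst(freqs):
    """Return {(pop_a, pop_b): F_ST} for every unordered pair of populations."""
    return {
        (a, b): fst_two_pops(freqs[a], freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }


freqs = {"A": 0.50, "B": 0.52, "C": 0.90}   # hypothetical frequencies
pairs = pairwise_fst(freqs)
for pair, value in pairs.items():
    print(pair, round(value, 4))
```

Here both pairs involving "C" have a much larger value than the A-B pair, which is the kind of pattern a single global θ cannot show.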
Figure: FST calculated for each SNP between Tibetan and Han populations[6]
Figure: Inter-population pairwise comparisons of FST statistics
http://academic.reed.edu/biology/professors/srenn/pages/research/2011_students/sean/SM_thesis.html
Selection Index
Is a particular haplotype especially common in a particular population?
Example: Erik Corona's slides (the slides that follow)
Population Genetics
Figure labels: Lactase + H2O, Glucose
HAPLOTYPES (frequencies on successive slides: a new haplotype appears and rises while others decline)
Haplotype        Start  New variant  Step 1  Step 2  Step 3
GATTACAGATTACA   22%    22%          21%     21%     20%
AATTACAGATTAAA   3%     3%           3%      3%      3%
GACTACAGATTACC   19%    19%          19%     19%     19%
GATTACCTATTAAC   24%    24%          24%     23%     23%
AACTACAGATTACC   16%    16%          16%     15%     15%
GATTACAGACTACA   7%     7%           7%      7%      6%
AATTACAGATTACA   9%     9%           8%      7%      5%
AATTGCAGATTACA   -      <1%          2%      5%      9%
Selection Index EHH
EHH: Sabeti, Reich et al. (2002)[7]
Extended Haplotype Homozygosity
What is the probability that two randomly drawn haplotypes are identical?
0 → all haplotypes are different.
1 → all haplotypes are the same.
The haplotype of interest is called the core.
$EHH_t = \frac{\sum_{i=1}^{s} \binom{e_{ti}}{2}}{\binom{c_t}{2}}$
($t$: core haplotype, $c$: the number of samples of a particular core haplotype, $e$: the number of samples of a particular extended haplotype, $s$: the number of unique extended haplotypes)
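The EHH formula is a ratio of pair counts, so it can be computed directly from how many samples carry each extended haplotype. This sketch assumes every sample with the core belongs to exactly one extended haplotype, so that $c$ is the sum of the $e$ values; the function name `ehh` is chosen here for illustration.

```python
from math import comb


def ehh(extended_counts):
    """EHH for one core haplotype.

    extended_counts: number of samples carrying each distinct extended
    haplotype of this core. Their sum is taken as c, the number of
    samples carrying the core (an assumption of this sketch).
    """
    c = sum(extended_counts)
    if c < 2:
        raise ValueError("need at least two samples carrying the core")
    identical_pairs = sum(comb(e, 2) for e in extended_counts)
    return identical_pairs / comb(c, 2)


print(ehh([4]))           # all four samples share one extended haplotype
print(ehh([1, 1, 1, 1]))  # every sample has a different extended haplotype
print(ehh([2, 2]))        # two extended haplotypes, two samples each
```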
How can we detect positive selection?
Core haplotypes (50 KB):
AATTACAGATTACA   50 people have this
GATTACAGATTACA   50 people have this
Extending the core: 50 KB + 20 KB = 70 KB
Core             Extension  Count
AATTACAGATTACA   AACACGC    10
AATTACAGATTACA   ATGATAG    8
AATTACAGATTACA   AACCCAG    7
AATTACAGATTACA   CTGACAG    5
AATTACAGATTACA   CAGACAG    3
AATTACAGATTACA   AACACAG    6
AATTACAGATTACA   CACACAG    4
AATTACAGATTACA   CACCCAG    7
GATTACAGATTACA   CACATAG    24
GATTACAGATTACA   CACACAG    26
Extended Haplotype Homozygosity (EHH)
For the AATTACAGATTACA core (extended haplotype counts 10, 8, 7, 5, 3, 6, 4, 7; 50 samples in total):
$EHH = \frac{\binom{10}{2}+\binom{8}{2}+\binom{7}{2}+\binom{5}{2}+\binom{3}{2}+\binom{6}{2}+\binom{4}{2}+\binom{7}{2}}{\binom{50}{2}} = \frac{149}{1225} \approx 0.121$ (truncated to three decimals)
EHH Drops Over Genetic Distance
EHH drops off quickly over genetic distance
  Starts at 1
  Ends at 0
Every haplotype block will eventually be unique
EHH: What It Is & What It Isn't
Detects over-representation of a haplotype
  This will raise P(two haplotypes are homozygous)
Does NOT detect whether a haplotype spread quickly
  Low recombination != spread quickly
Core             Extension  Count
AATTACAGATTACA   AACACGC    22
AATTACAGATTACA   ATGATAG    28
GATTACAGATTACA   CACATAG    24
GATTACAGATTACA   CACACAG    26
Compare EHH Scores
AATTACAGATTACA core (8 extended haplotypes): EHH ≈ 0.121
GATTACAGATTACA core (2 extended haplotypes): $EHH = \frac{\binom{24}{2}+\binom{26}{2}}{\binom{50}{2}} = \frac{601}{1225} \approx 0.490$
Possible reasons for a high score: low recombination, or an over-represented haplotype
Can EHH detect positive selection?
Relative EHH
Detects over-representation of a haplotype
  Low recombination
  This will raise P(two haplotypes are homozygous)
Does detect whether a haplotype spread quickly
  Other haplotype blocks are controls!
Recombination cold-spot / hot-spot agnostic
  Low score if both alleles are associated with high or low recombination
Relative EHH (REHH)
EHH of the GATTACAGATTACA core: 0.490
EHH of the AATTACAGATTACA core: 0.121
$REHH = \frac{0.490}{0.121} \approx 4.05$
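The REHH value can be reproduced from the extended-haplotype counts of the 70 KB example. The sketch below redefines a small `ehh` helper (a name chosen here, not taken from the lecture) and assumes the counts of each core sum to its 50 samples. With unrounded EHH values the ratio is about 4.03; the 4.05 above comes from dividing the rounded values 0.490 and 0.121.

```python
from math import comb


def ehh(extended_counts):
    """EHH from the sample counts of each extended haplotype of one core."""
    c = sum(extended_counts)
    return sum(comb(e, 2) for e in extended_counts) / comb(c, 2)


a_core = [10, 8, 7, 5, 3, 6, 4, 7]   # AATTACAGATTACA core, 50 samples
g_core = [24, 26]                    # GATTACAGATTACA core, 50 samples

ehh_a = ehh(a_core)          # 149 / 1225
ehh_g = ehh(g_core)          # 601 / 1225
rehh_g = ehh_g / ehh_a       # EHH of the G core relative to the A core

print(round(ehh_a, 4), round(ehh_g, 4), round(rehh_g, 2))
```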
REHH: Problem #1
We get a different REHH value at different genetic distance cutoffs
At 50 KB (core only):
AATTACAGATTACA   50
GATTACAGATTACA   50
REHH = 1.0
At 70 KB (core plus the 20 KB extension, with 8 extended haplotypes on the AATTACAGATTACA core and 2 on the GATTACAGATTACA core):
REHH = 4.05
Which REHH value to use?
Extend to the right, then extend to the left (the same haplotypes shown over successive slides; 70 KB window, REHH = 4.05):
…ACAGATTACAGTTACAGATTACAAACACGC…
…ACAGATTACAAATACAGATTACAATGATAG…
…ACAGATTACAATTACAGATTACAAACCCAG…
…ACAGATTACAATTTCAGATTACACTGACAG…
…ACAGATTACAATTAAAGATTACACAGACAG…
…ACAGATTACAATTACCGATTACAAACACAG…
…ACAGATTACAATTACAAATTACACACACAG…
…ACAGATTACAATTACAGTTACACACCCAG…
…TACAGATTAGATTACAGATTACACACATAG
…TACAGATTAGATTACAGATTACACACACAG
REHH: Problem #2
REHH score is heavily biased by allele frequencies
  Must normalize: P(REHH | Allele Freq.)
REHH: Problem #3
Not possible to detect selection in high-frequency alleles
  Solution requires a cross-population approach (discussed later)
REHH Overview
Leaves a lot to be desired
  Picking the maximum is arbitrary
    Why not the mean REHH score?
  Biased by allele frequency
    ln(REHH | allele freq.) ~ normal distribution
Still widely used and published with
Site-specific EHH[9]
Roughly the average of the two alleles' EHH values (weights: squared allele frequencies)
Gives the approximate EHH level at the focal SNP
Selection Index iHS
iHS: Sabeti (2007)[8]
Integrate EHH over all positions, then compare
Integrated Haplotype Score (iHS)
Unstandardized iHS = ln of the ratio of the two alleles' integrated EHH (the area under each allele's EHH curve, over backward distance y and forward distance x)
y = backward distance
x = forward distance
EHH_D = EHH of the derived allele
EHH_A = EHH of the ancestral allele
Integrated Haplotype Score (iHS)
Areas under the EHH curves (backward + forward):
0.7 + 0.5 = 1.2
4.0 + 4.4 = 8.4
Unstandardized iHS
ln(8.4/3.2) ≈ 0.965 (0.419 is the base-10 logarithm of the same ratio)
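The unstandardized score is the log of a ratio of two areas, which can be sketched with a trapezoidal integration. Several things here are assumptions rather than the lecture's definition: the ancestral allele is placed in the numerator (the slides do not say which allele goes where), distances are in arbitrary units with negative values for the backward side, and the EHH curves are made-up numbers.

```python
from math import log


def integrate_ehh(distances, ehh_values):
    """Area under an EHH curve by the trapezoidal rule.

    distances must be sorted in ascending order; negative values are the
    backward side of the core SNP, positive values the forward side.
    """
    area = 0.0
    for k in range(1, len(distances)):
        width = distances[k] - distances[k - 1]
        area += width * (ehh_values[k] + ehh_values[k - 1]) / 2
    return area


def unstandardized_ihs(area_ancestral, area_derived):
    """ln(ancestral area / derived area); numerator choice is an assumption."""
    return log(area_ancestral / area_derived)


dist = [-2, -1, 0, 1, 2]                  # hypothetical distances
ehh_anc = [0.0, 0.5, 1.0, 0.5, 0.0]       # EHH decays fast on the ancestral allele
ehh_der = [0.5, 1.0, 1.0, 1.0, 0.5]       # long haplotypes on the derived allele

area_anc = integrate_ehh(dist, ehh_anc)
area_der = integrate_ehh(dist, ehh_der)
ihs = unstandardized_ihs(area_anc, area_der)
print(area_anc, area_der, ihs)
```

Under this convention a negative score means the allele in the denominator (here the derived allele) sits on the longer haplotypes.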
iHS Characteristics
When both alleles have the same AUC, iHS is zero
Large negative values indicate selection of the allele in the denominator
Large positive values indicate selection of the allele in the numerator
Still heavily biased by allele frequency!
  Z-score normalization
Integrated Haplotype Score (iHS)
$iHS = \frac{\text{Unstandardized iHS} - E(iHS \mid \text{Allele Frequency})}{SD(iHS \mid \text{Allele Frequency})}$
E(iHS | Allele Freq.): estimated from the empirical distribution
SD(iHS | Allele Freq.): estimated from the empirical distribution
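Conditioning on allele frequency is usually done by binning SNPs by frequency and standardizing within each bin. The following is a minimal sketch of that idea: the bin width of 0.05, the use of the population standard deviation, the function name and the data are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_frequency(scores, freqs, bin_width=0.05):
    """Z-score each unstandardized score within its allele-frequency bin."""
    bins = defaultdict(list)
    for s, f in zip(scores, freqs):
        bins[int(f / bin_width)].append(s)

    # Empirical mean and SD of the scores in every frequency bin.
    stats = {b: (mean(v), pstdev(v)) for b, v in bins.items()}

    standardized = []
    for s, f in zip(scores, freqs):
        mu, sd = stats[int(f / bin_width)]
        standardized.append((s - mu) / sd if sd > 0 else 0.0)
    return standardized


# Hypothetical scores: raw values differ tenfold between the two frequency bins.
scores = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
freqs = [0.12, 0.13, 0.11, 0.42, 0.43, 0.44]
z = standardize_by_frequency(scores, freqs)
print([round(v, 3) for v in z])
```

After standardization the two bins are on the same scale, so a score is judged against SNPs of similar frequency rather than against the whole genome.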
iHS Overview
iHS and REHH are EHH-based methods to detect positive selection
iHS outperforms REHH at specific allele frequencies
  Neither completely outperforms the other
iHS: Problem #1
Still can't detect selection in high-frequency (old) alleles
  Relatively high EHH values are not present in high-frequency (old) alleles!
Use a reference population
  If positive selection didn't take place in the reference population, EHH is high
Selection Index xp-EHH
xp-EHH: Sabeti (2007)[8]
Compare the integrated EHH of the same allele between populations
Cross Population EHH (XP-EHH)
Same allele (AATTACAGATTACA core) in two different populations: one has 8 extended haplotypes (counts 10, 8, 7, 5, 3, 6, 4, 7), the other has 2 (CACATAG 20, CACACAG 30)
Integrate EHH over distance from the allele
  Calculated for forward/reverse sides independently
  Integrate until EHH = 0.04 in each population
Integrated EHH values: 3.3 and 0.5
XP-EHH = ln(3.3/0.5) = 1.89, followed by Z-score normalization
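The two steps, integrating EHH out to the 0.04 cutoff and taking the log ratio between populations, can be sketched as follows. The truncation rule used here (stop at the first point where EHH falls below the cutoff) is a simplification, and the function names and the example curve are invented for illustration; only the values 3.3 and 0.5 come from the example above.

```python
from math import log


def integrate_until(distances, ehh_values, cutoff=0.04):
    """Trapezoidal area under EHH on one side of the core, moving outwards,
    stopping at the first point where EHH falls below the cutoff."""
    area = 0.0
    for k in range(1, len(distances)):
        if ehh_values[k] < cutoff:
            break
        width = distances[k] - distances[k - 1]
        area += width * (ehh_values[k] + ehh_values[k - 1]) / 2
    return area


def xp_ehh(area_pop_a, area_pop_b):
    """Unstandardized cross-population score: ln of the ratio of integrated EHH."""
    return log(area_pop_a / area_pop_b)


# Hypothetical one-sided EHH curve; the third point is already below the cutoff.
one_side = integrate_until([0, 1, 2, 3], [1.0, 0.5, 0.02, 0.0])

score = xp_ehh(3.3, 0.5)   # integrated EHH values from the example
print(one_side, round(score, 2))
```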
Overview
REHH and iHS are more or less complementary
  Each is better at detecting positive selection at different frequencies
XP-EHH
  Can detect positive selection in high-frequency alleles
  Susceptible to population variation in recombination rate
Final Verdict: REHH vs iHS vs XP-EHH
REHH
iHS test
XP-EHH
Rsb[9]
Another index for comparing populations.
Compares populations only, not alleles.
iES: per locus, the average of the two alleles' integrated EHH
Compares the approximate degree of selection at a locus between populations.
Practice
FST
hierfstat[3]
PER3 gene in HGDP (Human Genome Diversity Panel): 289 SNPs & 7 populations
EHH, iHS
rehh[2]
Example data bundled with the package
References
[1] Cockerham, C. C. (1969). Variance of gene frequencies. Evolution, pages 72–84.
[2] Gautier, M. and Vitalis, R. (2012). rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics, 28(8):1176–1177.
[3] Goudet, J. (2005). Hierfstat, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes, 5(1):184–186.
[4] Hamilton, M. (2011). Population genetics. John Wiley & Sons.
[5] Holsinger, K. E. and Weir, B. S. (2009). Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10(9):639–650.
[6] Huerta-Sanchez, E., Jin, X., Bianba, Z., Peter, B. M., Vinckenbosch, N., Liang, Y., Yi, X., He, M., Somel, M., Ni, P., et al. (2014). Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature, 512(7513):194–197.
[7] Sabeti, P. C., Reich, D. E., Higgins, J. M., Levine, H. Z., Richter, D. J., Schaffner, S. F., Gabriel, S. B., Platko, J. V., Patterson, N. J., McDonald, G. J., et al. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature, 419(6909):832–837.
[8] Sabeti, P. C., Varilly, P., Fry, B., Lohmueller, J., Hostetter, E., Cotsapas, C., Xie, X., Byrne, E. H., McCarroll, S. A., Gaudet, R., et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449(7164):913–918.
[9] Tang, K., Thornton, K. R., and Stoneking, M. (2007). A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biology, 5(7):e171.
END
Email : [email protected]
(02)880-2743
H.P: 010-9192-5385