Imputation-based local ancestry inference in admixed populations Ion Mandoiu Computer Science and Engineering Department University of Connecticut Joint work with J. Kennedy and B. Pasaniuc

Dec 21, 2015

Page 1: Imputation-based local ancestry inference in admixed populations Ion Mandoiu Computer Science and Engineering Department University of Connecticut Joint.

Imputation-based local ancestry inference in admixed populations

Ion Mandoiu

Computer Science and Engineering Department

University of Connecticut

Joint work with J. Kennedy and B. Pasaniuc

Page 2:

Outline

Motivation and problem definition

Factorial HMM model of genotype data

Algorithms for genotype imputation and ancestry inference

Preliminary experimental results

Summary and ongoing work

Page 3:

Population admixture

http://www.garlandscience.co.uk/textbooks/0815341857.asp?type=resources

Page 4:

Admixture mapping

Patterson et al., AJHG 74:979-1000, 2004

Page 5:

Local ancestry inference problem

Given:
• Reference haplotypes for ancestral populations P1,…,Pn
• Whole-genome SNP genotype data for extant individual

Find: Allele ancestries at each locus

SNP genotypes:
rs11095710 T T
rs11117179 C T
rs11800791 G G
rs11578310 G G
rs1187611 G G
rs11804808 C C
rs17471518 A G
...

Inferred local ancestry:
rs11095710 P1 P1
rs11117179 P1 P1
rs11800791 P1 P1
rs11578310 P1 P2
rs1187611 P1 P2
rs11804808 P1 P2
rs17471518 P1 P2
...

Reference haplotypes:
1110001?0100110010011001111101110111?1111110111000 11100011010011001001100?100101?10111110111?0111000111100100110011010011100101101010111110111101110001110001001000100111110001111011100111?111110111000011101100110011011111100101101110111111111?011000011100010010001001111100010110111001111111110110000011?001?011001101111110010?1011101111111111011000011100110010001001111100011110111001111111110111000

Page 6:

Previous work

• MANY methods
  • Ancestry inference at different granularities, assuming different amounts of info about genetic makeup of ancestral populations
• Two main classes
  • HMM-based: SABER [Tang et al. 06], SWITCH [Sankararaman et al. 08a], HAPAA [Sundquist et al. 08], …
  • Window-based: LAMP [Sankararaman et al. 08b], WINPOP [Pasaniuc et al. 09]
• Poor accuracy when ancestral populations are closely related (e.g. Japanese and Chinese)
• Methods based on unlinked SNPs outperform methods that model LD!

Page 7:

Haplotype structure in panmictic populations

Page 8:

HMM model of haplotype frequencies

Similar models proposed in [Schwartz 04, Rastas et al. 05, Kennedy et al. 07, Kimmel & Shamir 05, Scheet & Stephens 06, …]

Page 9:

Graphical model representation

• Random variables
  • Fi = founder haplotype at locus i, between 1 and K
  • Hi = observed allele at locus i
• Model training
  • Based on haplotypes using the Baum-Welch algorithm, or
  • Based on genotypes using EM [Rastas et al. 05]
• Given haplotype h, $P(H=h \mid M)$ can be computed in $O(nK^2)$ time using a forward algorithm, where n = #SNPs, K = #founders

F1 F2 … Fn
H1 H2 … Hn
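The forward computation of P(H=h|M) mentioned on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the array layout (`init`, `trans`, `emit`), the 0-based indexing, the biallelic 0/1 allele coding, and the helper `random_model` are all assumptions made for the example.

```python
import numpy as np

def haplotype_likelihood(h, init, trans, emit):
    """Return P(H = h | M) for a founder HMM (0-based locus indices).

    h     : length-n sequence of alleles coded 0 or 1 (assumed coding)
    init  : shape (K,),        init[f]         = P(F_0 = f)
    trans : shape (n-1, K, K), trans[i][f, f2] = P(F_{i+1} = f2 | F_i = f)
    emit  : shape (n, K, 2),   emit[i][f, a]   = P(H_i = a | F_i = f)
    """
    alpha = init * emit[0][:, h[0]]            # joint prob. of h_0 and F_0
    for i in range(1, len(h)):
        # One K x K matrix-vector product per locus: O(n K^2) overall.
        alpha = (alpha @ trans[i - 1]) * emit[i][:, h[i]]
    return float(alpha.sum())

def random_model(n, K, seed=0):
    """Hypothetical helper: draw a random, properly normalised model."""
    rng = np.random.default_rng(seed)
    init = rng.dirichlet(np.ones(K))
    trans = rng.dirichlet(np.ones(K), size=(n - 1, K))  # each row sums to 1
    emit = rng.dirichlet(np.ones(2), size=(n, K))       # shape (n, K, 2)
    return init, trans, emit
```

A quick sanity check on such a sketch is that the likelihoods of all 2^n haplotypes sum to 1 for a small n.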

Page 10:

Factorial HMM for genotype data in a window with known local ancestry

F1 F2 … Fn
H1 H2 … Hn
F'1 F'2 … F'n
H'1 H'2 … H'n
G1 G2 … Gn

Page 11:

HMM Based Genotype Imputation

Probability of missing genotype given the typed genotype data:

\[ P(g_i = x \mid g, M) \propto P(g[g_i \leftarrow x] \mid M) \]

where $g[g_i \leftarrow x]$ denotes the genotype vector g with its i-th entry set to x.

$g_i$ is imputed as

\[ \arg\max_{x \in \{0,1,2\}} P(g[g_i \leftarrow x] \mid M) \]
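The imputation rule on this slide can be illustrated with a short sketch. It assumes a callable `likelihood(g)` that returns P(g|M); in the talk this would come from the factorial HMM, whereas the `toy_mixture_likelihood` below is a purely hypothetical stand-in used only to make the example runnable.

```python
import numpy as np

def impute_genotype(g, i, likelihood):
    """Impute g[i] as argmax over x in {0, 1, 2} of P(g[g_i <- x] | M).

    likelihood(g) must return P(g | M) for a fully specified genotype
    vector; the current value of g[i] (e.g. None for missing) is ignored.
    Returns the imputed genotype and the posterior P(g_i = x | g, M).
    """
    scores = np.empty(3)
    for x in (0, 1, 2):
        g_x = list(g)
        g_x[i] = x                       # this is g[g_i <- x]
        scores[x] = likelihood(g_x)
    posterior = scores / scores.sum()    # normalise over x in {0, 1, 2}
    return int(posterior.argmax()), posterior

def toy_mixture_likelihood(g, templates=((0, 0, 0), (2, 2, 2)), match=0.9):
    """Hypothetical stand-in for P(g | M): equal mixture of noisy templates."""
    miss = (1.0 - match) / 2.0
    total = 0.0
    for t in templates:
        p = 1.0
        for gj, tj in zip(g, t):
            p *= match if gj == tj else miss
        total += p / len(templates)
    return total
```

For example, `impute_genotype([2, None, 2], 1, toy_mixture_likelihood)` picks the genotype that agrees with the flanking SNPs, because the toy model correlates neighbouring sites.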

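Page 12:

Forward-backward computation

Diagram: slice of the factorial HMM at locus i, with founder states $f_i$ and $f'_i$ emitting alleles $h_i$ and $h'_i$, which combine into genotype $g_i$.

\[ P(g \mid M) = \sum_{f_i=1}^{K} \sum_{f'_i=1}^{K} \mathcal{F}^{i}_{f_i,f'_i} \, \mathcal{E}^{i}_{f_i,f'_i}(g_i) \, \mathcal{B}^{i}_{f_i,f'_i} \]

where $\mathcal{F}^i$, $\mathcal{E}^i(g_i)$, and $\mathcal{B}^i$ denote the forward, emission, and backward terms at locus i.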

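Page 16:

Runtime

Direct recurrences for computing forward probabilities:

\[ \mathcal{F}^{1}_{f_1,f'_1} = P(f_1)\,P(f'_1) \]

\[ \mathcal{F}^{i}_{f_i,f'_i} = \sum_{f_{i-1}=1}^{K} \sum_{f'_{i-1}=1}^{K} \mathcal{F}^{i-1}_{f_{i-1},f'_{i-1}} \, \mathcal{E}^{i-1}_{f_{i-1},f'_{i-1}}(g_{i-1}) \, P(f_i \mid f_{i-1}) \, P(f'_i \mid f'_{i-1}) \]

Here $\mathcal{F}^{i}_{f_i,f'_i}$ is the forward probability of founder pair $(f_i, f'_i)$ at locus i, and $\mathcal{E}^{i}_{f_i,f'_i}(g_i)$ is the probability that the pair emits $g_i$. Evaluated directly, each of the $K^2$ entries at a locus sums over $K^2$ predecessor pairs, for $O(nK^4)$ time overall.

Runtime reduced to $O(nK^3)$ by reusing common terms:

\[ \mathcal{F}^{i}_{f_i,f'_i} = \sum_{f'_{i-1}=1}^{K} \mathcal{T}^{i}_{f_i,f'_{i-1}} \, P(f'_i \mid f'_{i-1}) \]

where

\[ \mathcal{T}^{i}_{f_i,f'_{i-1}} = \sum_{f_{i-1}=1}^{K} \mathcal{F}^{i-1}_{f_{i-1},f'_{i-1}} \, \mathcal{E}^{i-1}_{f_{i-1},f'_{i-1}}(g_{i-1}) \, P(f_i \mid f_{i-1}) \]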
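The runtime reduction on this slide can be checked numerically with a small sketch that computes the forward table both ways. Several things here are assumptions for illustration only: the array layout, 0-based indexing, genotypes coded as the allele sum h + h' in {0, 1, 2}, and both haplotype chains sharing one founder model (as in a window where both alleles have the same ancestry).

```python
import numpy as np

def pair_emission(emit_i, g_i):
    """E[f, f2] = P(g_i | founders f, f2), assuming g_i = h_i + h'_i."""
    e0, e1 = emit_i[:, 0], emit_i[:, 1]
    if g_i == 0:
        return np.outer(e0, e0)
    if g_i == 2:
        return np.outer(e1, e1)
    return np.outer(e0, e1) + np.outer(e1, e0)   # heterozygous: two orders

def forward_direct(g, init, trans, emit):
    """O(n K^4): every entry sums over all K^2 predecessor pairs."""
    K = len(init)
    F = np.outer(init, init)                     # first locus: P(f) P(f')
    for i in range(1, len(g)):
        W = F * pair_emission(emit[i - 1], g[i - 1])
        T = trans[i - 1]
        F_new = np.zeros((K, K))
        for a in range(K):                       # previous founder, chain 1
            for b in range(K):                   # previous founder, chain 2
                for c in range(K):               # current founder, chain 1
                    for d in range(K):           # current founder, chain 2
                        F_new[c, d] += W[a, b] * T[a, c] * T[b, d]
        F = F_new
    return F

def forward_factored(g, init, trans, emit):
    """O(n K^3): sum out chain 1's previous founder, then chain 2's."""
    F = np.outer(init, init)
    for i in range(1, len(g)):
        W = F * pair_emission(emit[i - 1], g[i - 1])
        T = trans[i - 1]
        S = T.T @ W      # S[c, b] = sum_a W[a, b] * T[a, c]
        F = S @ T        # F[c, d] = sum_b S[c, b] * T[b, d]
    return F

def genotype_likelihood(g, init, trans, emit):
    """P(g | M): forward table at the last locus times its emission term."""
    F = forward_factored(g, init, trans, emit)
    return float((F * pair_emission(emit[-1], g[-1])).sum())
```

The intermediate `S` is the reused common term: it is computed once per (current founder of chain 1, previous founder of chain 2) pair instead of being re-summed for every current founder of chain 2. If the two chains had different ancestries, each would use its own transition and emission arrays.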

Page 17:

Imputation-based ancestry inference

• View local ancestry inference as a model selection problem
  • Each possible local ancestry defines a factorial HMM
  • Pick the model that re-imputes SNPs most accurately around the locus of interest
• Fixed-window version: pick the ancestry that maximizes the average posterior probability of true SNP genotypes within a fixed-size window centered at the locus
• Multi-window version: weighted voting over window sizes between 200 and 3000, with window weights proportional to average posterior probabilities
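The two selection rules above can be sketched as follows. This is only one reading of the slide: it assumes a precomputed table `post[a, j]` holding the posterior probability of the true genotype at SNP j when that SNP is re-imputed under candidate ancestry a, and it lets each window vote for its winning ancestry with weight equal to that winner's average posterior. In the method as described, posteriors would presumably be recomputed for each window; the function names are hypothetical.

```python
import numpy as np

def fixed_window_ancestry(post, locus, window):
    """Pick the ancestry with the highest mean posterior around `locus`.

    post   : shape (A, n), post[a, j] as described in the lead-in (assumed)
    window : number of SNPs spanned on both sides combined (approximate)
    Returns (index of best ancestry, its average posterior).
    """
    lo = max(0, locus - window // 2)
    hi = min(post.shape[1], locus + window // 2 + 1)
    avg = post[:, lo:hi].mean(axis=1)
    best = int(avg.argmax())
    return best, float(avg[best])

def multi_window_ancestry(post, locus, windows):
    """Weighted vote over several window sizes (assumed weighting scheme)."""
    votes = np.zeros(post.shape[0])
    for w in windows:
        best, weight = fixed_window_ancestry(post, locus, w)
        votes[best] += weight            # weight = winner's mean posterior
    return int(votes.argmax())
```

Small windows react to local signal but are noisy, while large windows are stable but can straddle ancestry breakpoints; voting across sizes is one way to trade these off.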

Page 18:

HMM imputation accuracy

Missing data rate and accuracy for imputed genotypes at different thresholds (WTCCC 58BC/Hapmap CEU)

Page 19:

Window size effect

N=2,000, g=7, α=0.2, n=38,864, r=10^-8

Page 20:

Number of founders effect

CEU-JPT, N=2,000, g=7, α=0.2, n=38,864, r=10^-8

Page 21:

Comparison with other methods

N=2,000, g=7, α=0.2, n=38,864, r=10^-8

Page 22:

Summary and ongoing work

• Imputation-based local ancestry inference achieves significant improvement over previous methods for admixtures between close ancestral populations
• Code at http://dna.engr.uconn.edu/software/
• Ongoing work
  • Evaluating accuracy under more realistic admixture scenarios (multiple ancestral populations/gene flow/drift in ancestral populations)
  • Extension to pedigree data
  • Exploiting inferred local ancestry for more accurate untyped SNP imputation and phasing of admixed individuals
  • Extensions to sequencing data
  • Inference of ancestral haplotypes from extant admixed populations

Page 23:

Untyped SNP imputation accuracy in admixed individuals

N=2,000, g=7, α=0.5, n=38,864, r=10^-8

Page 24:

HMM-based phasing

Maximum likelihood genotype phasing: given g, find

\[ (h_1, h_2) = \arg\max_{h_1 + h_2 = g} P(h_1 \mid M)\, P(h_2 \mid M) \]

F1 F2 … Fn
H1 H2 … Hn
F'1 F'2 … F'n
H'1 H'2 … H'n
G1 G2 … Gn
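To make the maximum likelihood phasing objective concrete, here is a brute-force sketch that enumerates every haplotype pair compatible with g. It is exponential in the number of heterozygous sites, so it is only an illustration for tiny inputs and not a practical algorithm. The array layout, 0/1 allele coding, and function names are assumptions for this example.

```python
import numpy as np
from itertools import product

def haplotype_likelihood(h, init, trans, emit):
    """P(h | M) by the forward algorithm (assumed array layout).

    init[f], trans[i][f, f2], emit[i][f, a] are the initial, transition,
    and emission probabilities of a founder HMM with 0-based loci.
    """
    alpha = init * emit[0][:, h[0]]
    for i in range(1, len(h)):
        alpha = (alpha @ trans[i - 1]) * emit[i][:, h[i]]
    return float(alpha.sum())

def ml_phasing_bruteforce(g, init, trans, emit):
    """Return ((h1, h2), p) maximising P(h1|M) P(h2|M) subject to h1 + h2 = g."""
    het = [i for i, gi in enumerate(g) if gi == 1]
    base = [gi // 2 for gi in g]       # homozygous sites: 0 -> 0, 2 -> 1
    best, best_p = None, -1.0
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b    # split each heterozygous site
        p = (haplotype_likelihood(h1, init, trans, emit)
             * haplotype_likelihood(h2, init, trans, emit))
        if p > best_p:
            best, best_p = (h1, h2), p
    return best, best_p
```

With two founders that emit almost purely 0s and almost purely 1s and no switching between them, an all-heterozygous genotype is phased into the all-0 and all-1 haplotypes, as expected.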

Page 25:

HMM-based phasing

• Bad news: Cannot approximate $\max_{h_1+h_2=g} P(h_1 \mid M)\,P(h_2 \mid M)$ within a factor of $O(n^{1/2-\varepsilon})$ for any $\varepsilon > 0$, unless ZPP=NP [KMP08]

• Good news: Viterbi-like heuristics yield phasing accuracy comparable to PHASE in practice [Rastas et al. 05]

Page 26:

Factorial HMM model for sequencing data

F1 F2 … Fn
H1 H2 … Hn
G1 G2 … Gn
F'1 F'2 … F'n
H'1 H'2 … H'n
R1,1 … R1,c1   R2,1 … R2,c2   …   Rn,1 … Rn,cn

Page 27:

Acknowledgments

• J. Kennedy and B. Pasaniuc
• Work supported in part by NSF awards IIS-0546457 and DBI-0543365.