Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs
Tamar Barzuza (1), Jacques S. Beckmann (2,3), Ron Shamir (4), Itsik Pe’er (5)
(1) Computer Science and Applied Mathematics, Weizmann Institute of Science
(2) Molecular Genetics, Weizmann Institute of Science
(3) Génétique Médicale, Universitätsspital Lausanne
(4) School of Computer Science, Tel-Aviv University
(5) Medical and Population Genetics Group, Broad Institute
Previous work
Haplotyping: haplotypes from genotypes
Input: Genotypes $G = \{G_1, \ldots, G_n\}$ on SNPs $S = \{s_1, \ldots, s_m\}$
Output: The haplotypes $H = \{H_1, \ldots, H_{2n}\}$ that gave rise to $G$
General heuristics: Clark '90; Excoffier and Slatkin '95
PPH: perfect phylogeny haplotyping ($n$ genotypes, $m$ SNPs):
  Gusfield 2002: $O(nm\alpha(n,m))$
  Bafna et al. 2002: $O(nm^2)$
  Eskin et al. 2003: $O(nm^2)$
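To make the input/output relation concrete, here is a minimal sketch of why haplotyping is ambiguous. It assumes a 0/1/2 encoding (0 or 1 for a homozygous site, 2 for a heterozygous site); the function names are illustrative and not taken from the slides.

```python
from itertools import product

def genotype(h1, h2):
    """Return 0/1 where the two haplotypes agree and 2 where they differ."""
    return tuple(a if a == b else 2 for a, b in zip(h1, h2))

def consistent_pairs(g):
    """Enumerate all unordered haplotype pairs that could produce genotype g."""
    het = [i for i, v in enumerate(g) if v == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # complementary alleles at each het site
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs

g = genotype((1, 0, 1), (1, 1, 0))
print(g)                         # (1, 2, 2)
print(len(consistent_pairs(g)))  # 2
```

A genotype with k heterozygous sites (k >= 1) has 2^(k-1) candidate pairs, which is why an additional model such as perfect phylogeny is needed to choose among them.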
Graph Realization
Previous work
  Tutte 1959: $O(n^2 m)$
  Gavril and Tamari 1983: $O(nm^2)$
  Bixby and Wagner 1988: $O(nm\alpha(n,m))$
The graph realization problem:
Input: A hypergraph $H = (\{1, \ldots, m\}, P)$
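The slide text breaks off after the input. Under the usual formulation of graph realization (an assumption here, since the output line is missing), a realization is a tree whose edges are labeled 1..m such that every hyperedge in P is the edge set of a path. The sketch below only checks a proposed tree; it does not construct one, and all names are hypothetical.

```python
from collections import Counter

def is_path(tree_edges, labels):
    """True if the tree edges named by `labels` form a single simple path.

    tree_edges maps an edge label to its (u, v) endpoints and is assumed
    to describe a tree (this is not verified here).
    """
    if not labels:
        return False
    degree = Counter()
    for lab in labels:
        u, v = tree_edges[lab]
        degree[u] += 1
        degree[v] += 1
    # A subgraph of a tree is a forest: it is connected iff V == E + 1,
    # and a connected forest with maximum degree 2 is a path.
    return max(degree.values()) <= 2 and len(degree) == len(labels) + 1

def realizes(tree_edges, hyperedges):
    """True if every hyperedge is a path in the labeled tree."""
    return all(is_path(tree_edges, p) for p in hyperedges)

star = {1: ("c", "a"), 2: ("c", "b"), 3: ("c", "d")}
chain = {1: ("a", "b"), 2: ("b", "c"), 3: ("c", "d")}
P = [{1, 2}, {1, 3}, {2, 3}]
print(realizes(star, P))   # True
print(realizes(chain, P))  # False
```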
Xor-haplotyping: haplotypes from xor-genotypes
Input: 1. Xor-genotype data (can be obtained by DHPLC)
       2. Three genotypes
Goal: Resolve the haplotypes and their perfect phylogeny
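As a small illustration of the input, the xor-genotype of an individual can be viewed as the set of heterozygous SNPs, that is, the positions where its two haplotypes differ. The sketch below assumes 0/1 haplotype vectors and 1-based SNP numbering; the function name is illustrative.

```python
def xor_genotype(h1, h2):
    """Return the set of 1-based SNP indices where the two haplotypes differ."""
    return {i + 1 for i, (a, b) in enumerate(zip(h1, h2)) if a != b}

h1 = (0, 1, 1, 0)
h2 = (0, 0, 1, 1)
print(xor_genotype(h1, h2))  # {2, 4}
```

Note that the xor-genotype records which SNPs are heterozygous but not which allele is carried at the homozygous ones.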
GREAL
  Find graph realization or determine that none exists
  Count the number of graph realization solutions for the data
  Stable and fast
  Available at http://www.cs.tau.ac.il/~rshamir/greal/
Simulations
  Simulate data of $n$ individuals using Hudson 2002
  Remove all SNPs with <5% minor allele frequency
  Apply GREAL: Is there a single solution?
  Repeat 5000 times for each $n$
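The filtering step of the simulation protocol can be sketched as follows. This is only an illustration on a hand-made 0/1 haplotype matrix: it does not call the Hudson simulator or GREAL, and the function name and return shape are assumptions.

```python
def filter_by_maf(haplotypes, min_maf=0.05):
    """Keep SNP columns whose minor allele frequency is at least min_maf.

    Returns the kept column indices (0-based) and the reduced haplotypes.
    """
    n = len(haplotypes)
    m = len(haplotypes[0])
    keep = []
    for j in range(m):
        freq = sum(h[j] for h in haplotypes) / n  # frequency of allele 1
        if min(freq, 1 - freq) >= min_maf:
            keep.append(j)
    reduced = [tuple(h[j] for j in keep) for h in haplotypes]
    return keep, reduced

# 40 haplotypes: column 0 has a rare allele (1/40), columns 1 and 2 do not.
haps = [(1 if i == 0 else 0, 1 if i < 20 else 0, 1 if i < 4 else 0) for i in range(40)]
keep, reduced = filter_by_maf(haps)
print(keep)  # [1, 2]
```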
We implemented Gavril & Tamari’s algorithm (1983) for graph realization: $O(m^2 n)$
Resolution up to bit flipping: gives the haplotype structure
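One way to see why xor-genotypes alone resolve haplotypes only up to bit flipping: complementing a SNP in every haplotype leaves all xor-genotypes unchanged. The sketch below demonstrates this on made-up haplotypes; names and data are illustrative.

```python
def xor_genotype(h1, h2):
    """Set of 1-based SNP indices where the two haplotypes differ."""
    return frozenset(i + 1 for i, (a, b) in enumerate(zip(h1, h2)) if a != b)

def flip(haplotypes, snp):
    """Complement the 1-based SNP `snp` in every haplotype."""
    return [tuple(1 - v if i + 1 == snp else v for i, v in enumerate(h))
            for h in haplotypes]

haps = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1)]
individuals = [(0, 1), (1, 2), (2, 3)]  # pairs of haplotype indices

before = [xor_genotype(haps[a], haps[b]) for a, b in individuals]
flipped = flip(haps, 2)
after = [xor_genotype(flipped[a], flipped[b]) for a, b in individuals]
print(before == after)  # True
```

This is consistent with the need for a few full genotypes in addition to the xor-data, since they fix the actual allele at each SNP.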
[Figure: a tree with edges 1, 2, 3 and the xor-genotypes {1, 2}, {1, 3}, {2, 3}; a genotype 1 2 2 with haplotypes of the form 1 x x.]
SNP #1 homozygous: can infer SNP #1 for all haplotypes
Need individuals with $\bigcap$ xor-genotypes (= {het SNPs}) $= \emptyset$
Theorem: $\bigcap$ xor-genotypes $= \emptyset$ $\Rightarrow$ there are three xor-genotypes with empty intersection
Proof: xor-genotypes are tree paths (otherwise: NP-hard)
(1) The intersection of two tree paths is an interval
(Proof)
(2) Pick $X_1$ arbitrarily; take $X_1 \cap X_2$, $X_1 \cap X_3$, …, $X_1 \cap X_n$
(3) $X_L$ ends first, $X_R$ begins last
$X_1 \cap X_L \cap X_R = \emptyset$
[Figure: the intervals of $X_L$ and $X_R$ drawn along the path $X_1$.]
Find 3 individuals to genotype in $O(nm)$
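The proof steps suggest a direct selection procedure, sketched below under stated assumptions: every xor-genotype is a tree path, the edges of $X_1$ are supplied in path order, and at least three xor-genotypes are given. The function name and the index convention are hypothetical.

```python
def three_with_empty_intersection(x1_order, others):
    """Pick three paths with empty common intersection, if any exist.

    x1_order: edges of X_1 listed in path order.
    others:   list of edge sets X_2..X_n (each assumed to be a tree path).
    Returns indices into [X_1] + others, or None if all paths share an edge.
    """
    if len(others) < 2:
        raise ValueError("need at least three xor-genotypes")
    pos = {e: i for i, e in enumerate(x1_order)}
    intervals = []
    for k, x in enumerate(others, start=1):
        hits = [pos[e] for e in x if e in pos]
        if not hits:
            # X_1 and X_k are disjoint, so any third path completes the triple.
            return (0, k, 2 if k == 1 else 1)
        # By step (1), X_1 ∩ X_k is contiguous along X_1.
        intervals.append((min(hits), max(hits), k))
    left = min(intervals, key=lambda t: t[1])   # X_L: interval that ends first
    right = max(intervals, key=lambda t: t[0])  # X_R: interval that begins last
    if left[1] < right[0]:
        return (0, left[2], right[2])
    return None  # every interval covers positions right[0]..left[1]

x1 = [1, 2, 3, 4]
print(three_with_empty_intersection(x1, [{1, 2, 5}, {3, 4, 6}, {2, 3}]))  # (0, 1, 2)
```

Each path is scanned once against $X_1$, which is in line with the $O(nm)$ bound stated on the slide.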
Informative SNPs
Informative SNPs (Bafna et al. 2003):
Input: 1. Haplotypes $H = \{H_1, \ldots, H_n\}$ on SNPs $S = \{s_1, \ldots, s_m\}$
       2. A set of interesting SNPs $S'' \subseteq S$
Output: Minimal set $S' \subseteq S \setminus S''$ that distinguishes the same haplotypes as $S''$
Haplotypes (rows, labeled 1-4 in the original figure) by SNPs (columns):
SNPs:  1 2 3 4 5
       1 0 0 0 0
       0 0 1 0 0
       0 0 0 1 1
       0 1 0 1 0
Not perfect phylogeny: NP-hard (MINIMUM TEST SET)
Perfect phylogeny, 1 interesting SNP: $O(nm)$, Bafna et al. 2003
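The notion of "distinguishes the same haplotypes" can be made concrete with a small check. The sketch assumes the following reading: a candidate set is informative if every pair of haplotypes that differs on some interesting SNP also differs on some candidate SNP. It uses the 4 x 5 matrix from the slide with 0-based SNP indices; the function names are illustrative.

```python
from itertools import combinations

def distinguished_pairs(haplotypes, snps):
    """Pairs of haplotype indices that differ on at least one SNP in `snps`."""
    return {
        (a, b)
        for a, b in combinations(range(len(haplotypes)), 2)
        if any(haplotypes[a][s] != haplotypes[b][s] for s in snps)
    }

def is_informative(haplotypes, candidate, interesting):
    """True if `candidate` separates every pair that `interesting` separates."""
    return (distinguished_pairs(haplotypes, interesting)
            <= distinguished_pairs(haplotypes, candidate))

H = [(1, 0, 0, 0, 0),
     (0, 0, 1, 0, 0),
     (0, 0, 0, 1, 1),
     (0, 1, 0, 1, 0)]
interesting = {3}  # SNP 4 in the slide's 1-based numbering

print(is_informative(H, {0, 2}, interesting))  # True
print(is_informative(H, {0, 1}, interesting))  # False
```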
Informative SNPs
Generalization of the previous definition
Informative SNPs:
Input: 1. Haplotypes $H = \{H_1, \ldots, H_n\}$ on SNPs $S = \{s_1, \ldots, s_m\}$
       2. A set of interesting SNPs $S'' \subseteq S$
       3. A perfect phylogeny for $H$
       4. A cost function $C : S \to \mathbb{R}^+$
Output: $S' \subseteq S \setminus S''$ with minimal cost that distinguishes the same haplotypes as $S''$
We find an informative SNPs set
  Of minimal cost
  For any number of interesting SNPs
  In $O(m)$
By a dynamic programming algorithm that climbs up the perfect phylogeny tree
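The dynamic program itself is not spelled out in the transcript, so the sketch below is not that algorithm. It is an exhaustive search over subsets, exponential in the number of SNPs, that could serve as a reference on tiny inputs. The matrix is the one from the slides with 0-based indices; the cost values and function names are made up for illustration.

```python
from itertools import combinations

def distinguished_pairs(haplotypes, snps):
    """Pairs of haplotype indices that differ on at least one SNP in `snps`."""
    return {
        (a, b)
        for a, b in combinations(range(len(haplotypes)), 2)
        if any(haplotypes[a][s] != haplotypes[b][s] for s in snps)
    }

def min_cost_informative(haplotypes, interesting, cost):
    """Brute-force search for a minimum-cost informative set outside `interesting`."""
    m = len(haplotypes[0])
    target = distinguished_pairs(haplotypes, interesting)
    candidates = [s for s in range(m) if s not in interesting]
    best, best_cost = None, float("inf")
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            if target <= distinguished_pairs(haplotypes, subset):
                c = sum(cost[s] for s in subset)
                if c < best_cost:
                    best, best_cost = set(subset), c
    return best, best_cost

H = [(1, 0, 0, 0, 0),
     (0, 0, 1, 0, 0),
     (0, 0, 0, 1, 1),
     (0, 1, 0, 1, 0)]
cost = {0: 5, 1: 1, 2: 5, 4: 1}  # hypothetical per-SNP costs
print(min_cost_informative(H, {3}, cost))  # ({1, 4}, 2)
```

With unit costs the same search returns a minimum-size set, which corresponds to the earlier, unweighted definition.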
We prove that the definition of informative SNPs generalizes to a more practical definition
  Under the perfect phylogeny model, informative SNPs on genotypes and haplotypes are equivalent
Summary
Xor-haplotyping:
  Definition
  Resolve haplotypes given xor-data and 3 genotypes in $O(nm\alpha(m,n))$
  Implementation
  Experimental results
Selection of tag SNPs:
  Generalize to arbitrary cost and many interesting SNPs
  Find optimal informative SNPs set in $O(m)$ time
  Combinatorial observation allows practical uses
Future research
  Relax the strong assumption of perfect phylogeny
  Deal with data errors and missing data
  Obtain empirical results for the theoretical work on informative SNPs
    Preliminary results show that blocks of up to 600 SNPs are distinguishable by ~20 informative SNPs
Theorem: All genotypes are distinct within a block
Proof: Assume to the contrary equivalency of two: