RECOMB Satellite Workshop, 2007. Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs. Yufeng Wu, UC Davis

Dec 22, 2015

Page 1

RECOMB Satellite Workshop, 2007

Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Yufeng Wu

UC Davis

Page 2

Association (or LD) Mapping

• Given a subset of SNPs from unrelated individuals, find unobserved genetic variations that strongly discriminate individuals with the trait (cases) and those without the trait (controls)

• Complex Diseases: difficult to map

Page 3

Illustration (Zollner and Pritchard, Genetics, 2005)

Figure labels: Cases, Controls, SNP markers

1: 001101
2: 110000
3: 001110
4: 001000
5: 000010
6: 111101
7: 100011
8: 110001
9: 110010
10: 100011
11: 010000
12: 101101

Page 4

Some Challenges in Association Mapping


Page 5

The Genealogy Approach

• “..the best information that we could possibly get about association is to know the full coalescent genealogy…” – Zollner and Pritchard

• Goal: infer genealogy from marker data with recombination
– Approximation (e.g. in Zollner and Pritchard)

Page 6

Ancestral Recombination Graph (ARG)

[Figure: two genealogies on four sequences. With mutations only, the leaves are S1 = 00, S2 = 01, S3 = 10, S4 = 10. With a recombination node, the leaves are S1 = 00, S2 = 01, S3 = 10, S4 = 11.]

Assumption: at most one mutation per site
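A small check makes the contrast between the two leaf sets concrete. The sketch below uses the four-gamete test, a standard check that is not named on the slide: under the assumption of at most one mutation per site, a pair of sites showing all four combinations 00, 01, 10, 11 cannot be explained by a tree alone and needs at least one recombination. The function name is hypothetical.

```python
from itertools import combinations

def four_gamete_conflicts(seqs):
    """Return the site pairs (i, j) at which all four gametes 00, 01, 10, 11 occur."""
    n_sites = len(seqs[0])
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(s[i], s[j]) for s in seqs}
        if len(gametes) == 4:  # binary data, so 4 distinct pairs means all gametes
            conflicts.append((i, j))
    return conflicts

mutations_only = ["00", "01", "10", "10"]       # S1..S4 from the first genealogy
with_recombination = ["00", "01", "10", "11"]   # S1..S4 from the second genealogy

print(four_gamete_conflicts(mutations_only))      # []
print(four_gamete_conflicts(with_recombination))  # [(0, 1)]
```

The first leaf set passes the test, so mutations alone suffice; the second fails at the only site pair, which is why the slide's second genealogy contains a recombination node.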

Page 7

Full-ARG Approaches

• First full ARG mapping method (Minichiello and Durbin)
– Use full plausible ARG, but heuristic
– Less complex disease model

• Our results (Wu, 2007)
– Sampling full ARGs with provable property, and work on more complex disease model
– Focus on parsimonious history
    • minARGs: ARGs that use the minimum number of recombinations
    • Near minimum ARGs
– Uniform sampling of minARGs

Page 8

Special Case: ARG with Only Input Sequences

• Self-derivability (SD) Problem: construct an ARG with only the input sequences

• In fact, such an ARG, if it exists, must be a minARG

• Runs in O(2^n) time

• Heuristics to extend to non-self-derivable data

Page 9

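Counting Self-derived ARGs

[Figure: input matrix with rows 00000, 01000, 01100, 01101, 11000, 00010, 11011, 00011. Taking 11011 as the last row to derive (multiplier 1) leaves a reduced matrix with N1 = 164. Taking 01101 as the last row to derive (multiplier 2) leaves a reduced matrix with N2 = 76.]

N = 164*1 + 76*2 = 316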
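The multipliers 1 and 2 in N = 164*1 + 76*2 can be reproduced with a short sketch. It rests on an assumption about what they mean: that each multiplier is the number of distinct (prefix parent, suffix parent) pairs among the other input rows that yield the last row by a single crossover. Derivation by mutation is ignored here, and the counts 164 and 76 for the reduced matrices are copied from the slide, not computed. The function names are hypothetical.

```python
def recombination_parents(target, others):
    """Ordered (prefix parent, suffix parent) pairs giving target with one crossover."""
    pairs = set()
    for k in range(1, len(target)):  # crossover between sites k-1 and k
        for a in others:
            if a[:k] != target[:k]:
                continue
            for b in others:
                if b != a and b[k:] == target[k:]:
                    pairs.add((a, b))  # same pair at several breakpoints counts once
    return pairs

seqs = ["00000", "01000", "01100", "01101", "11000", "00010", "11011", "00011"]

def ways(target):
    """Number of parent pairs among the remaining rows that derive target."""
    return len(recombination_parents(target, [s for s in seqs if s != target]))

w1, w2 = ways("11011"), ways("01101")
N1, N2 = 164, 76  # counts for the two reduced matrices, taken from the slide
print(w1, w2, N1 * w1 + N2 * w2)  # 1 2 316
```

Under this reading, 11011 has the single parent pair (11000, 00011), while 01101 can take its prefix from 01100 and its last site from either 00011 or 11011.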

Page 10

[Figure: input matrix with rows 00000, 01000, 01100, 01101, 11000, 00010, 11011, 00011, and its two reduced matrices. Count 164 with 11011 as the last row, count 76 with 01101 as the last row, total 316.]

Select 11011 with prob = 164/316 = 0.52, and 01101 with prob = 76*2/316 = 0.48

1. Random value Rnd = 0.3 < 0.52

2. Pick seq = 11011 as last row to derive

3. Move to reduced matrix
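The selection step can be sketched as a draw proportional to the counts. This is a minimal illustration using only the numbers on the slide; the function name and the dictionary layout are assumptions, not the author's code.

```python
def pick_last_row(weights, rnd):
    """Pick a row with probability proportional to its weight, given rnd in [0, 1)."""
    total = sum(weights.values())
    cumulative = 0.0
    for seq, weight in weights.items():
        cumulative += weight / total
        if rnd < cumulative:
            return seq
    return seq  # guard against floating-point round-off at the upper end

# Weight = (count for the reduced matrix) * (multiplier), as on the slide.
weights = {"11011": 164 * 1, "01101": 76 * 2}

print(pick_last_row(weights, 0.3))  # 11011, because 0.3 < 164/316
```

Repeating the same draw on each reduced matrix picks every complete derivation with equal probability, which is the usual way to turn exact counts into a uniform sample.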

Page 11

An ARG Represents a Set of Marginal Trees

• Clear separation of cases/controls: NOT expected for complex diseases!

Page 12

Disease Model (Zollner & Pritchard)

Disease mutations: Poisson Process

Two alleles: wild-type and mutant

[Figure: example tree with edge labels 0.05, 0.05, 0.05, 0.05, 0.1, 0.1, 0.05, 0.05]

Page 13

Disease Penetrance (Zollner & Pritchard)

P_{A,1}: probability that a mutant sequence becomes a case; P_{C,1} = 1.0 - P_{A,1}

P_{A,0}: probability that a wild-type sequence becomes a case; P_{C,0} = 1.0 - P_{A,0}

[Figure: tree with edge labels 0.05 and 0.1 and leaves marked Case or Control]

Page 14

Phenotype Likelihood (Zollner and Pritchard)

• Given a tree Tx at position x and the case/control phenotypes of its leaves, what is the probability Pr(phenotypes | Tx) of observing those phenotypes on Tx? (Zollner & Pritchard)

– Sum over all subsets of mutated edges

• Adopted in this work
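The sum over subsets of mutated edges can be written out by brute force for a tiny tree. The sketch below is an illustration under stated assumptions, not the method of Zollner and Pritchard or of this work: the tree topology and edge values are made up, each edge is assumed to carry a disease mutation independently with a given probability, and a leaf is taken to be mutant when any edge on its path to the root is mutated. The names `EDGES`, `LEAVES`, `p_a1` and `p_a0` are hypothetical.

```python
from itertools import product

# Hypothetical 4-leaf tree: node -> (parent, probability that the edge above
# the node carries a disease mutation). Topology and values are invented.
EDGES = {
    "u": ("root", 0.1), "v": ("root", 0.1),
    "a": ("u", 0.05), "b": ("u", 0.05),
    "c": ("v", 0.05), "d": ("v", 0.05),
}
LEAVES = ["a", "b", "c", "d"]

def is_mutant(leaf, mutated):
    """True if any edge between the leaf and the root is in the mutated set."""
    node = leaf
    while node in EDGES:
        if node in mutated:
            return True
        node = EDGES[node][0]
    return False

def phenotype_likelihood(phenotypes, p_a1, p_a0):
    """Sum over all edge subsets; phenotypes maps leaf -> 'case' or 'control'."""
    nodes = list(EDGES)
    total = 0.0
    for flags in product([False, True], repeat=len(nodes)):
        mutated = {n for n, f in zip(nodes, flags) if f}
        prob = 1.0
        for n, f in zip(nodes, flags):  # probability of this edge subset
            p = EDGES[n][1]
            prob *= p if f else 1.0 - p
        for leaf in LEAVES:             # penetrance term for each leaf
            p_case = p_a1 if is_mutant(leaf, mutated) else p_a0
            prob *= p_case if phenotypes[leaf] == "case" else 1.0 - p_case
        total += prob
    return total

observed = {"a": "case", "b": "case", "c": "control", "d": "control"}
print(phenotype_likelihood(observed, p_a1=0.8, p_a0=0.2))
```

The enumeration visits 2^6 edge subsets here, so it is only meant to show what is being summed, not to scale.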

Page 15

Expected Phenotype Likelihood

• Need for assessing statistical significance.
• Null model: randomly permute case/control labels.
• Our result: O(n^3) algorithm for computing expected value of phenotype likelihood.
– Exact, fully deterministic method.
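To make the null model concrete, the expected value can be computed by plain enumeration on a toy example. This is not the O(n^3) algorithm mentioned above; it is a brute-force stand-in that averages a likelihood over every placement of the case labels, which equals the average over all label permutations because each placement arises from the same number of permutations. The likelihood used here is a hypothetical one-edge model with made-up parameters.

```python
from itertools import combinations

def expected_likelihood(likelihood, leaves, n_cases):
    """Average likelihood over every placement of n_cases case labels."""
    values = []
    for cases in combinations(leaves, n_cases):
        labels = {leaf: ("case" if leaf in cases else "control") for leaf in leaves}
        values.append(likelihood(labels))
    return sum(values) / len(values)

def toy_likelihood(labels, m=0.1, p_a1=0.8, p_a0=0.2, clade=("a", "b")):
    """Hypothetical model: one edge above `clade` is mutated with probability m."""
    def leaf_prob(leaf, mutant):
        p_case = p_a1 if mutant else p_a0
        return p_case if labels[leaf] == "case" else 1.0 - p_case
    with_mutation = without_mutation = 1.0
    for leaf in labels:
        with_mutation *= leaf_prob(leaf, leaf in clade)
        without_mutation *= leaf_prob(leaf, False)
    return m * with_mutation + (1.0 - m) * without_mutation

leaves = ["a", "b", "c", "d"]
observed = {"a": "case", "b": "case", "c": "control", "d": "control"}
print(toy_likelihood(observed), expected_likelihood(toy_likelihood, leaves, 2))
```

In this toy setting the observed labelling, with both cases inside the mutated clade, scores above the permutation average, which is the kind of comparison the significance assessment relies on.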

Page 16

Diploid Penetrance

Diploid: two sequences per individual

Diploid penetrance:

P_{A,00}: probability that an individual with two wild-type sequences becomes a case

P_{A,01}: …, P_{A,11}: …

Case

Control

Efficient computation of phenotype likelihood: stated but unresolved in Zollner and Pritchard

Our result (Wu, 2007): computing phenotype likelihood with diploid penetrance is NP-hard

Page 17

Simulation Results

Comparison: TMARG (uniform), TMARG (pathway), LATAG, MARGARITA

[Figure, panel "50 ARGs per data": Uniform, Pathway, LATAG, MARGARITA; vertical axis from 0.1 to 0.24]

[Figure, panel "50/5000 ARGs per data": n50, n5000, LATAG, MARGARITA; vertical axis from 0.1 to 0.24]

Page 18

Acknowledgement

• Software available at: http://wwwcsif.cs.ucdavis.edu/~wuyu

• I want to thank
– Dan Gusfield
– Dan Brown
– Chuck Langley
– Yun S. Song