Detecting Copy Number Variation With Short Paired Reads

Post on 31-Jan-2016

DESCRIPTION

Detecting Copy Number Variation With Short Paired Reads. Department of Computer Science, University of Toronto. Genome Informatics 2009. Paul Medvedev, Marc Fiume, Misko Dzamba, Tim Smith, Adrian Dalca, Mike Brudno.

Transcript

Detecting Copy Number Variation With Short Paired Reads

Department of Computer Science, University of Toronto

Genome Informatics 2009

Paul Medvedev, Marc Fiume, Misko Dzamba, Tim Smith, Adrian Dalca, Mike Brudno

Copy Number Variants (CNVs)

• Large regions that appear a different number of times in different individuals

• CNVs are associated with a number of diseases

• Input
– reference human genome
– sequenced donor genome

• Output
– CNV annotations in the reference

Previous Approach

Using depth of coverage (Campbell et al. 2008; Chiang et al. 2009; Yoon et al. 2009):

[Figure: depth of coverage (DOC) along the reference (Ref), with regions marked as CNV. Rows as extracted: "DOC 1 2 1 0 1", "Ref 1 1 1 1 1", "0.8 2.3 0.5 0.5 1.7".]
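The depth-of-coverage idea can be illustrated with a minimal sketch. It assumes the coverage of each window has already been normalized so that 1.0 corresponds to one copy, and it simply rounds to the nearest integer; the function name, the window values, and the rounding rule are illustrative assumptions, not the procedure used in the cited papers.

```python
import math

def call_cnvs_by_doc(normalized_doc, ref_copies):
    """Estimate a copy count per window from normalized depth of coverage.

    Returns a list of (window_index, estimated_copies, kind) tuples, where
    kind is "gain", "loss", or None when the estimate matches the reference.
    Rounding to the nearest integer is an assumption made for illustration.
    """
    calls = []
    for i, (doc, ref) in enumerate(zip(normalized_doc, ref_copies)):
        est = math.floor(doc + 0.5)  # nearest integer copy count
        if est > ref:
            kind = "gain"
        elif est < ref:
            kind = "loss"
        else:
            kind = None
        calls.append((i, est, kind))
    return calls

# Hypothetical normalized coverage for five windows; the reference has one copy of each.
example_calls = call_cnvs_by_doc([0.9, 2.1, 1.0, 0.1, 1.2], [1, 1, 1, 1, 1])
for window, copies, kind in example_calls:
    print(window, copies, kind)
```

A sketch like this treats every window independently, which is why noisy coverage can produce unstable calls near window boundaries.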

Our Approach:

• Capture adjacency information about the donor genome in a graph.

• Use these adjacencies together with DOC

Donor Graph

Step 1: represent reference adjacencies

Step 2: represent donor adjacencies

[Figure: Ref and Donor sequences, with donor adjacencies added to the graph.]
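The two steps can be sketched as a small directed graph over reference segments. This is a simplified illustration: segments are plain integer indices, strand orientation is ignored, and the donor adjacencies are passed in directly rather than inferred from paired reads. The function name and the example adjacency are hypothetical.

```python
def build_donor_graph(num_segments, donor_adjacencies):
    """Build adjacency sets over segment indices 0..num_segments-1.

    Step 1 adds reference adjacencies (segment i followed by i + 1).
    Step 2 adds donor adjacencies, given as (source, destination) pairs,
    which would in practice come from discordantly mapped read pairs.
    """
    graph = {i: set() for i in range(num_segments)}
    for i in range(num_segments - 1):
        graph[i].add(i + 1)      # Step 1: reference adjacency
    for src, dst in donor_adjacencies:
        graph[src].add(dst)      # Step 2: donor adjacency
    return graph

# Hypothetical donor adjacency 2 -> 1, as a tandem duplication of segments 1..2 would create.
graph = build_donor_graph(5, [(2, 1)])
for segment in sorted(graph):
    print(segment, sorted(graph[segment]))
```

In this toy graph, a walk such as 0, 1, 2, 1, 2, 3, 4 visits segments 1 and 2 twice, which is how a gain shows up as a traversal count greater than one.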

Which walk is the donor?

Use depth-of-coverage:

We find a path that is “most faithful” to the DOC
– use a probabilistic model to score “faithfulness”
– use network flow to find the traversal counts of the walk with the maximum score

[Figure: donor graph annotated with DOC, Ref, and Path counts, with differing regions marked as CNV. Rows as extracted: "0.8 2.3 2.6 0.5 1.4 1.7 1.1", "Ref 1 2 1 1 1 1 1", "1111221Path".]
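One way to make “faithfulness” concrete is a per-segment likelihood of the observed read count given a proposed traversal count. The sketch below assumes a Poisson model with a fixed expected number of reads per copy; the slides do not state which model is used, so the model, the parameter `reads_per_copy`, and the read counts are all assumptions. It only scores candidate traversal counts and does not implement the network-flow search.

```python
import math

def poisson_log_pmf(k, lam):
    """Log of the Poisson probability of observing k events with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def walk_log_score(read_counts, traversal_counts, reads_per_copy, eps=0.01):
    """Sum of per-segment log-likelihoods for a proposed set of traversal counts.

    A segment traversed c times is expected to collect about reads_per_copy * c
    reads. eps keeps the mean positive for segments traversed zero times.
    """
    score = 0.0
    for k, c in zip(read_counts, traversal_counts):
        lam = reads_per_copy * max(c, eps)
        score += poisson_log_pmf(k, lam)
    return score

# Hypothetical read counts for seven segments, with about 10 reads expected per copy.
reads = [8, 23, 26, 5, 14, 17, 11]
reference_walk = [1, 1, 1, 1, 1, 1, 1]
candidate_walk = [1, 2, 2, 1, 1, 2, 1]
print(walk_log_score(reads, reference_walk, 10))
print(walk_log_score(reads, candidate_walk, 10))
```

Under these assumptions the candidate that traverses the high-coverage segments twice scores higher than the reference walk. Because each segment's cost depends only on its own traversal count, a score of this shape is the kind that can be optimized over a graph with a flow formulation.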

Preliminary Results

• NA18507 individual sampled with Illumina, hg18 reference

• Total of 3730 CNV calls

• 2165 losses, 1565 gains

Size Distribution

Preliminary Results

Sensitivity: Kidd et al.’s (2008) LOSS calls (141 calls)

Percentage of Kidd’s calls that overlap one of ours, and the same percentage after randomly shuffling our calls:

[Pie charts; legend: Just Loss, Both, Just Gain, None. Values as extracted: 58%, 6%, 1%, 35% and 88%, 6%, 0%, 6%.]
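The overlap percentage reported here can be computed along the following lines. This is a minimal sketch that assumes calls are (chromosome, start, end) tuples with half-open coordinates and that any shared base counts as an overlap; the slides do not state the overlap criterion, and the function names and intervals are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(query_calls, target_calls):
    """Fraction of query calls that overlap at least one target call."""
    if not query_calls:
        return 0.0
    hits = sum(
        1 for q in query_calls
        if any(overlaps(q, t) for t in target_calls)
    )
    return hits / len(query_calls)

# Hypothetical validation calls (query) and predicted calls (target).
validation = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
predicted = [("chr1", 150, 160), ("chr2", 300, 400)]
print(fraction_overlapping(validation, predicted))
```

The quadratic scan is fine for a few thousand calls; a sorted sweep or an interval tree would be the usual choice for larger sets.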

Specificity: Database of Genomic Variants (DGV)

Percent of our calls that overlap with DGV, and the same percentage after randomly shuffling our calls:

[Pie charts; legend: DGV Loss, DGV Both, DGV Gain, None. Values as extracted: 11%, 68%, 9%, 12% and 11%, 7%, 12%, 70%.]
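A shuffled baseline of this kind can be sketched by moving each call to a random position while keeping its length. The slides do not describe how the shuffling was done, so the scheme below (uniform placement on the same chromosome, no exclusion of gaps or of other calls) is an assumption, as are the function name and the chromosome lengths.

```python
import random

def shuffle_calls(calls, chrom_lengths, rng):
    """Relocate each (chrom, start, end) call uniformly at random on its chromosome.

    Call lengths are preserved. Each call is assumed to be no longer than
    its chromosome, and rng is a random.Random instance for reproducibility.
    """
    shuffled = []
    for chrom, start, end in calls:
        length = end - start
        new_start = rng.randrange(0, chrom_lengths[chrom] - length + 1)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled

# Hypothetical calls and chromosome lengths.
calls = [("chr1", 1000, 3000), ("chr2", 500, 1500)]
lengths = {"chr1": 1_000_000, "chr2": 800_000}
print(shuffle_calls(calls, lengths, random.Random(0)))
```

Comparing the overlap rate of the real calls against the rate for shuffled calls indicates how much of the agreement with the database could arise by chance.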

Conclusion

• Presented a method for detecting CNVs

• Combines
– depth-of-coverage
– paired-end mapping

• Improves
– compared to paired-end mapping: increased sensitivity in repeating regions (segmental duplications)
– compared to depth-of-coverage methods: better resolution (1Kb vs. 30Kb)

• Global optimization approach

