Sept2016 sv dnanexus_benchmarking

Post on 17-Jan-2017

The Global Network For Genomics™

Mix and Match: Assessing Structural Variation Calling with Varying Coverages and Algorithms

Andrew Carroll
Head of Science, DNAnexus

Global Cloud-based Platform

Secure, Scalable, and Collaborative solution for Genomics

• SECURE and compliant platform

• SHARE safely within the global network

• SCALE your analysis to any size necessary

Overview of Structural Variation

Calling with short reads is challenging

Alkan, Coe, and Eichler (2011)

• Difficult for reads to span events

• Mapping is hard in low complexity regions

• GC bias

• Rely on other signals:
  • Insert size
  • Clipping
  • Read orientation
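
The three signals named above (insert size, clipping, read orientation) can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `ReadPair` record rather than the API of any real aligner or caller, and the thresholds are illustrative defaults, not values from the talk. Real callers model the insert-size distribution statistically and combine evidence across many reads.

```python
from dataclasses import dataclass


# Hypothetical, simplified read-pair record (an assumption for illustration);
# real tools derive these fields from aligned reads.
@dataclass
class ReadPair:
    insert_size: int          # distance between the outer ends of the two mates
    soft_clipped_bases: int   # bases at a read end that did not align
    first_is_forward: bool    # strand of the leftmost mate
    second_is_forward: bool   # strand of the rightmost mate


def sv_signals(pair, mean_insert=400, sd_insert=50, min_clip=20):
    """Return the names of the SV evidence signals shown by one read pair.

    All thresholds are illustrative assumptions.
    """
    signals = []

    # Insert size: mates mapping much farther apart than the library
    # suggests hint at a deletion; much closer hints at an insertion.
    if pair.insert_size > mean_insert + 3 * sd_insert:
        signals.append("insert_size_large")
    elif pair.insert_size < mean_insert - 3 * sd_insert:
        signals.append("insert_size_small")

    # Clipping: a read that stops aligning partway may cross a breakpoint.
    if pair.soft_clipped_bases >= min_clip:
        signals.append("clipping")

    # Orientation: a standard paired-end library is expected to map as
    # forward (left mate) then reverse (right mate).
    if not (pair.first_is_forward and not pair.second_is_forward):
        signals.append("orientation")

    return signals
```

A pair with a normal insert size, no clipping, and forward/reverse orientation returns an empty list; any deviation adds the corresponding signal name.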

Short-Read Structural Variant Tools

• DELLY

• CREST

• Pindel

• BreakDancer

• LUMPY

• CNVnator

• Manta

• BreakSeq2

Calling SV with PacBio Data

Tools for PacBio Data

• PBHoney (Adam English)

• Sniffles (Fritz Sedlazeck)

• Parliament (Adam English)

Apps

Benchmarks – Part I: Only PacBio Data

Creation of Multi-Technology Truth Set

• SV calls were contributed for a variety of technologies (Illumina, PacBio, BioNano, 10X Genomics, Complete Genomics)

• Split confident call lists into deletions occurring in regions with tandem repeats and those not in regions with tandem repeats
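
The split described above can be sketched as an interval-overlap test. The slide does not say which overlap criterion was used, so "any shared base" is an assumption here, as are the function names and the `(chrom, start, end)` tuple layout.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def split_by_tandem_repeats(deletions, repeats):
    """Partition deletions into (in_repeats, not_in_repeats).

    Both arguments are lists of (chrom, start, end) tuples. A deletion counts
    as "in a tandem repeat region" if it overlaps any repeat on the same
    chromosome (an assumed criterion, not one stated in the talk).
    """
    in_repeats, not_in_repeats = [], []
    for chrom, start, end in deletions:
        hit = any(
            chrom == r_chrom and overlaps(start, end, r_start, r_end)
            for r_chrom, r_start, r_end in repeats
        )
        (in_repeats if hit else not_in_repeats).append((chrom, start, end))
    return in_repeats, not_in_repeats
```

This linear scan is fine for illustration; for genome-scale call lists an interval tree or a sorted sweep would be the more usual choice.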

Recall – PBHoney and Sniffles – Deletions
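
Recall here is the fraction of truth-set deletions that a caller recovers. The sketch below shows one way to compute it; the 50% reciprocal-overlap matching rule is a common convention but an assumption in this context, since the slides do not state the criterion used.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two fractions by which intervals a and b cover each other.

    a and b are (chrom, start, end) tuples; returns 0.0 when they are on
    different chromosomes or do not overlap.
    """
    (a_chrom, a_start, a_end), (b_chrom, b_start, b_end) = a, b
    if a_chrom != b_chrom:
        return 0.0
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def recall(truth, calls, min_overlap=0.5):
    """Fraction of truth events matched by at least one call.

    min_overlap is an assumed matching threshold, not a value from the talk.
    """
    if not truth:
        return 0.0
    found = sum(
        1
        for t in truth
        if any(reciprocal_overlap(t, c) >= min_overlap for c in calls)
    )
    return found / len(truth)
```

Running `recall` separately on the tandem-repeat and non-tandem-repeat truth lists, at each coverage level, would give the kind of per-category comparison the slide title suggests.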

Complementarity at 10-Fold Coverage

Benchmarks – Part II: Illumina + PacBio Data

Parliament Pipeline

Recall – Parliament (Assembly vs PacBio)

Ensemble Strategies

Call Overlap at 10-Fold Coverage

Full Combination Ensemble Strategies
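
One reading of "full combination" is that every subset of callers is evaluated as an ensemble. The sketch below follows that interpretation, which is an assumption rather than the method confirmed by the slides. It takes, for each tool, the set of truth-set event IDs that tool recovered (a hypothetical input format) and reports recall under a union strategy (keep a call if any tool makes it) and an intersection strategy (keep it only if all tools agree).

```python
from itertools import combinations


def ensemble_recall(recovered_by_tool, n_truth):
    """Recall of every non-empty tool combination.

    recovered_by_tool maps a tool name to the set of truth-set event IDs it
    recovered; n_truth is the truth-set size and must be greater than zero.
    Returns {combo: (union_recall, intersection_recall)}, where combo is a
    tuple of tool names in sorted order.
    """
    results = {}
    tools = sorted(recovered_by_tool)
    for size in range(1, len(tools) + 1):
        for combo in combinations(tools, size):
            sets = [recovered_by_tool[tool] for tool in combo]
            union = set().union(*sets)          # found by at least one tool
            shared = set.intersection(*sets)    # found by every tool
            results[combo] = (len(union) / n_truth, len(shared) / n_truth)
    return results
```

Union can only raise recall as tools are added, while intersection can only lower it; the trade-off is that union also accumulates each tool's false positives, which recall alone does not show.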

Conclusions

1. With Illumina data at 30-fold coverage, SV calling can be effective at PacBio data coverages as low as 3–5 fold

2. More PacBio data coverage appears to be consistently better across the investigated ranges (mostly thanks to PBHoney)

3. With only PacBio data, 10–15 fold gives good SV calling results with reasonable sequencing investment

4. Running both Sniffles and PBHoney gives best results, especially at lower (5–15 fold) PacBio data coverages

Thank you!

Genome in a Bottle: Justin Zook

Baylor College of Medicine: Adam English, Will Salerno

Schatz Lab: Fritz Sedlazeck

DNAnexus: Andrew Carroll, Singer Ma, Brett Hannigan, Yih-Chii Hwang, Marcus Kinsella, Abhiram Das, Samantha Zarate

QUESTIONS? CONTACT ME:

Andrew Carroll, PhD

acarroll@dnanexus.com
