GIAB Workshop Len Trigg & Sean Irvine
Sept2016 smallvar rtg

Jan 17, 2017

GenomeInABottle
Transcript
Page 1: Sept2016 smallvar rtg

GIAB Workshop
Len Trigg & Sean Irvine

Page 2: Sept2016 smallvar rtg

Phasing NA12878 by segregation in children

Page 3: Sept2016 smallvar rtg

Phasing NA12878 by segregation in children

● Joint calling of 17 member CEPH pedigree.
● Benefits:
  ○ High Mendelian consistency across all members.
  ○ (Near) full phasing of NA12878 (and NA12877) according to segregation in the 11 children.
● Latest run incorporates 300x Illumina reads for NA12878 RM8398 sample (other members ~30x).
● Calls that segregate well are more likely to be correct.
● Could look at phasing inconsistent calls in more detail.
  ○ Structural variants
  ○ Somatic variants

Page 4: Sept2016 smallvar rtg

Concordance of NA12878 with GIAB 3.2.2

Page 5: Sept2016 smallvar rtg
Page 6: Sept2016 smallvar rtg

NA24385, RTG on 10X Genomics Chromium

Page 7: Sept2016 smallvar rtg
Page 8: Sept2016 smallvar rtg

Unifying call sets

Different callers, different representations.
Different samples, different representations.

Given some number of call sets, represent the calls in as consistent a manner as possible.

● Incrementally accumulate alleles from call sets.
● Recode call sets using accumulated alleles.
● Harmonization rather than Canonicalization (chosen representation comes from within rather than externally specified).

Page 9: Sept2016 smallvar rtg

Example: chr20, NA12878

Page 10: Sept2016 smallvar rtg

Example: Harmonization of AJ trio

Page 11: Sept2016 smallvar rtg

Example from v3.3 AJ trio

3 non-Mendelian calls become consistent on recoding.
12 original alleles recoded into 6 alleles.

Original                   child  mother  father
1:73974514  GAACCC  G      .      0|1     .
1:73974515  A       T      0/1    .       .
1:73974516  ACCC    A      0/1    .       .
1:73974520  TC      T      .      0|1     .
1:73974521  CATA    C      0/1    .       .
1:73974524  A       C      .      0|1     .

Recoded                    child  mother  father
1:73974515  A       T      0/1    0/1     .
1:73974516  ACCC    A      0/1    0/1     .
1:73974521  CATA    C      0/1    0/1     .
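Why the recoding is legitimate can be checked by replaying each sample's original alleles onto the reference and comparing the resulting haplotypes. The sketch below is illustrative only and is not the RTG implementation. It assumes 1-based positions, and it assumes the 11 reference bases for 1:73974514-73974524 can be read off the REF alleles in the table above (`REF_SEQ`). The helper name `apply_variants` is hypothetical.

```python
# Reference bases for 1:73974514-73974524, inferred from the REF alleles
# shown on the slide (an assumption, not taken from a reference FASTA).
REF_START = 73974514
REF_SEQ = "GAACCCTCATA"

def apply_variants(ref_seq, ref_start, variants):
    """Replay non-overlapping (pos, ref, alt) variants onto ref_seq.

    Positions are 1-based genomic coordinates; ref_start is the coordinate
    of ref_seq[0]. Returns the alternate haplotype sequence.
    """
    out = []
    cursor = ref_start  # next reference coordinate not yet copied
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        offset = pos - ref_start
        if ref_seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF allele at {pos} does not match reference")
        out.append(ref_seq[cursor - ref_start:offset])  # untouched bases
        out.append(alt)                                 # substituted allele
        cursor = pos + len(ref)
    out.append(ref_seq[cursor - ref_start:])            # trailing bases
    return "".join(out)

# Original representation of the mother's variant haplotype.
mother = [
    (73974514, "GAACCC", "G"),
    (73974520, "TC", "T"),
    (73974524, "A", "C"),
]
# Original representation of the child's variant alleles.
child = [
    (73974515, "A", "T"),
    (73974516, "ACCC", "A"),
    (73974521, "CATA", "C"),
]

mother_hap = apply_variants(REF_SEQ, REF_START, mother)
child_hap = apply_variants(REF_SEQ, REF_START, child)
print(mother_hap, child_hap, mother_hap == child_hap)
```

Both call sets replay to the same sequence, so the mother's three calls can be rewritten using the child's three alleles without changing the haplotype they describe.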

Page 12: Sept2016 smallvar rtg

Notes and Limitations

● Recoding loses existing annotations. Could recover in simple cases, but not clear what to do when calls are moved, split, or combined as a result of the recoding.

● If a new call set needs to be added, the new sample can be accumulated incrementally, but the existing ones will need to be recoded.

● Final result is dependent on the order in which call sets are accumulated.

● Minimizes number of alleles (can in rare cases introduce Mendelian violations).

Page 13: Sept2016 smallvar rtg
Page 14: Sept2016 smallvar rtg

Phase Transfer

Another mode of operation for vcfeval. The phasing in one call set can be lifted over to another call set without losing annotations or changing the representation of calls.

v3.3 HG002/NA24385: 9.7%
RTG AJ trio 300x: 88.1%
phase-transferred: 90.2%

chr20 NA12878 GATK: 0%
RTG CEPH SP 37.7.0: 99.9%
89.0%
Illumina PG 8.0.1: 99.9%
phase-transferred: 90.8%

Page 15: Sept2016 smallvar rtg

Phase Transfer

During normal operation vcfeval ignores phasing information and tries each allele on each haplotype.

During phase transfer vcfeval will obey the phasing of one (or both) of the samples. Effectively restricts the matches that can be made. Ideally want at least one sample to be fully phased.

A special output mode is used to report the phasing found during the matching. Apart from the phasing, the calls are not changed and all the original annotations are retained.
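The idea can be illustrated with a toy brute-force matcher. This is a simplified sketch, not vcfeval's algorithm: it assumes a short region, heterozygous calls only needing orientation, 0-based positions, and two call sets for the same sample. The function names (`replay`, `haplotypes`, `transfer_phase`) are hypothetical, and the alleles are borrowed from the AJ trio example purely for illustration. The phased call set fixes the two target haplotypes; each unphased call is tried on each haplotype until the replayed sequences match, and the calls themselves are returned unchanged apart from the genotype separator and order.

```python
from itertools import product

def replay(ref_seq, variants):
    """Apply sorted, non-overlapping (pos, ref, alt) edits (0-based pos)."""
    out, cursor = [], 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor or ref_seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"bad or overlapping variant at {pos}")
        out.append(ref_seq[cursor:pos])
        out.append(alt)
        cursor = pos + len(ref)
    out.append(ref_seq[cursor:])
    return "".join(out)

def haplotypes(ref_seq, calls):
    """Build both haplotypes from calls (pos, ref, alt, gt) with phased gt."""
    haps = []
    for side in (0, 1):
        chosen = [(p, r, a) for p, r, a, gt in calls
                  if gt.split("|")[side] == "1"]
        haps.append(replay(ref_seq, chosen))
    return tuple(haps)

def transfer_phase(ref_seq, phased, unphased):
    """Phase `unphased` calls so their haplotypes equal those of `phased`.

    Returns the unphased calls with phased genotypes, or None if no
    orientation of the heterozygous calls reproduces the target haplotypes.
    """
    target = haplotypes(ref_seq, phased)
    het = [i for i, c in enumerate(unphased) if c[3] in ("0/1", "1/0")]
    for orientation in product(("0|1", "1|0"), repeat=len(het)):
        # Non-het genotypes (e.g. 1/1) only need the separator changed.
        candidate = [(p, r, a, gt.replace("/", "|"))
                     for p, r, a, gt in unphased]
        for i, gt in zip(het, orientation):
            p, r, a, _ = candidate[i]
            candidate[i] = (p, r, a, gt)
        if haplotypes(ref_seq, candidate) == target:
            return candidate
    return None

REF_SEQ = "GAACCCTCATA"  # assumed reference for the toy region
phased = [(0, "GAACCC", "G", "1|0"), (6, "TC", "T", "1|0"),
          (10, "A", "C", "1|0")]
unphased = [(1, "A", "T", "0/1"), (2, "ACCC", "A", "0/1"),
            (7, "CATA", "C", "0/1")]

result = transfer_phase(REF_SEQ, phased, unphased)
print(result)
```

A brute-force search over orientations is exponential in the number of heterozygous calls, so it only serves to show how phasing in one call set restricts the matches available to the other.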