Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly
Kim Wong/Thomas Keane
Vertebrate Resequencing Informatics
22nd March, 2011
http://svmerge.sourceforge.net
- Large DNA rearrangements (>100 bp)
- Frequent causes of disease
  - Referred to as genomic disorders
  - Mendelian diseases or complex traits such as behaviors
  - E.g. increase in gene dosage due to increase in copy number
- Prevalent in cancer genomes
- Many types of genomic structural variation (SV)
  - Insertions, deletions, copy number changes, inversions, translocations & complex events
- Comparative genomic hybridization (CGH) traditionally used for copy number discovery
  - CNVs of 1–50 kb in size have been under-ascertained
- Next-gen sequencing revolutionised field of SV discovery
  - Parallel sequencing of ends of large numbers of DNA fragments
  - Examine alignment distance of reads to discover presence of
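The read-pair idea on this slide can be sketched as follows. This is a minimal illustration, not the method of any particular SV caller: the `ReadPair` type, the `classify_pair` function, and the cutoff of 3 standard deviations are all assumptions introduced here. The sketch labels a pair as anomalous when the distance between its mapped ends differs strongly from the library's expected insert size.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    # Hypothetical minimal record; real pipelines would read this from a BAM file.
    name: str
    left_start: int   # leftmost aligned position of the pair on the reference
    right_end: int    # rightmost aligned position of the pair on the reference


def classify_pair(pair, mean_insert, sd_insert, n_sd=3.0):
    """Label a read pair by comparing its mapped span with the expected insert size."""
    span = pair.right_end - pair.left_start
    if span > mean_insert + n_sd * sd_insert:
        # Ends map further apart than the fragment length: sequence may be
        # missing in the sample relative to the reference.
        return "deletion_candidate"
    if span < mean_insert - n_sd * sd_insert:
        # Ends map closer together than expected: the sample may carry extra sequence.
        return "insertion_candidate"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        ReadPair("p1", 1000, 1310),
        ReadPair("p2", 5000, 6500),
        ReadPair("p3", 9000, 9100),
    ]
    for p in pairs:
        print(p.name, classify_pair(p, mean_insert=300, sd_insert=30))
```

A single anomalous pair is weak evidence on its own; callers of this kind typically require several pairs supporting the same event before reporting it.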
- Initially developed for mouse genomes project
- Several software packages currently available to discover SVs
  - Various approaches using information from anomalously mapped read pairs OR read depth analysis
- No single SV caller is able to detect the full range of structural variants
  - Paired-end mapping information, for example, cannot detect SVs where the read pairs do not flank the SV breakpoints
  - Insertion calls made using the split-mapping approach are also size-limited because the whole insertion breakpoint must be contained within a read
  - Read-depth approaches can identify copy number changes without the need for read-pair support, but cannot find copy number neutral events
- SVMerge is a meta SV calling pipeline that makes SV predictions with a collection of SV callers
  - Input is a BAM file per sample
  - Run callers individually + outputs sanitized into standard BED format
  - SV calls merged, and computationally validated using local de novo assembly
  - Primarily an SV discovery/calling + validation tool
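The merging step can be illustrated with a small sketch. The slides do not describe SVMerge's actual merging rules, so the logic below is an assumption: calls of the same SV type on the same chromosome are merged whenever their intervals overlap, and the names of the supporting callers are kept. The function `merge_calls` and the caller names `callerA`, `callerB`, and `callerC` are hypothetical.

```python
from collections import defaultdict


def merge_calls(calls):
    """Merge overlapping calls of the same SV type on the same chromosome.

    calls: iterable of (chrom, start, end, sv_type, caller) tuples using
    BED-style half-open coordinates.
    Returns a sorted list of (chrom, start, end, sv_type, caller_names).
    """
    groups = defaultdict(list)
    for chrom, start, end, sv_type, caller in calls:
        groups[(chrom, sv_type)].append((start, end, caller))

    merged = []
    for (chrom, sv_type), intervals in groups.items():
        intervals.sort()
        cur_start, cur_end, first_caller = intervals[0]
        callers = {first_caller}
        for start, end, caller in intervals[1:]:
            if start < cur_end:
                # Overlaps the running interval: extend it and record the caller.
                cur_end = max(cur_end, end)
                callers.add(caller)
            else:
                merged.append((chrom, cur_start, cur_end, sv_type, sorted(callers)))
                cur_start, cur_end, callers = start, end, {caller}
        merged.append((chrom, cur_start, cur_end, sv_type, sorted(callers)))
    return sorted(merged)


if __name__ == "__main__":
    calls = [
        ("chr1", 100, 500, "DEL", "callerA"),
        ("chr1", 450, 900, "DEL", "callerB"),
        ("chr1", 2000, 2500, "DEL", "callerA"),
        ("chr1", 120, 480, "INV", "callerC"),
    ]
    for call in merge_calls(calls):
        print(call)
```

Keeping the list of supporting callers with each merged interval makes it possible to see later which kinds of evidence (read pair, split read, read depth) back a given call.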
- Compared the overlap of the deletion, gain, and inversion calls against the curated Database of Genomic Variants (DGV)
  - Overlapped with calls in DGV at a rate significantly higher than expected by random chance
  - Deletions in DGV: 71% (NA18506), 81% (NA18507), and 71% (NA18508)
  - Copy number gains in DGV: 29% (NA18506), 32% (NA18507), and 36% (NA18508)
  - Inversions in DGV: 47% (NA18506), 69% (NA18507), and 51% (NA18508)
- Child calls not in DGV also called in the parents
  - Further 18% deletions, 32% inversions, 54% duplications
  - Estimated max. false positive rate of 11%, 21%, and 17%, respectively
- All child-only SV calls comprise 11% of the child's final SV calls
  - Considerable improvement from 'merged raw' (50% unique)
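The maximum false positive rates quoted above appear to follow from simple subtraction: whatever fraction of the child's calls is neither in DGV nor seen in a parent is the upper bound. The sketch below reproduces that arithmetic under two assumptions that the slides do not state explicitly: that NA18506 is the child, and that "copy number gains" and "duplications" refer to the same class of calls. The function name is hypothetical.

```python
def max_false_positive_rate(pct_in_dgv, pct_in_parents_only):
    """Upper bound on the false positive rate, in percent.

    Calls supported neither by DGV nor by a parent are treated as
    potential false positives.
    """
    return 100 - pct_in_dgv - pct_in_parents_only


# (percent of child calls in DGV, further percent also called in the parents)
child_support = {
    "deletions": (71, 18),
    "inversions": (47, 32),
    "duplications": (29, 54),
}

rates = {sv: max_false_positive_rate(*pcts) for sv, pcts in child_support.items()}
print(rates)  # {'deletions': 11, 'inversions': 21, 'duplications': 17}
```

These are upper bounds rather than measured error rates, since a call absent from both DGV and the parents may still be a genuine variant.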
- SVMerge primarily a discovery and validation tool
- Extensible pipeline so that calls from any method can be easily incorporated
- Developed primarily for mouse genomes project
- Successfully applied to human trio dataset
- Computational validation approach reduces false positives
- Complex SVs
  - Cataloging repeating combinations of multiple SV events in small loci
- 2011 development
  - Low coverage cross-population SV discovery
  - Genotyping existing SVs in new samples
  - Better support for heterozygous calls
  - Integration of SVMerge into Vert. Reseq. pipeline for UK10K