Clinical grade next-generation sequencing of UM
Elisha Roberson, Ph.D.
Depts. of Internal Medicine and Genetics
Washington University in St. Louis
Feb 23, 2016
All talk content can be tweeted and blogged
@thatdnaguy
Sanger ≠ next-gen sequencing
Sanger
• Consensus of a population of molecules
• ~600-800 bp sequence
• Very low error rate
• Low-throughput
• Targeted
• Expensive* (256 bp/$)
NGS
• Single molecules
• 35 bp – 3+ kb
• High error rates (1-15+%)
• High-throughput
• Can shotgun
• Cheap* (11.5 Mbp/$)
*Our lab’s current Sanger & HiSeq2500 costs
UM Clinical sequencing
• CLIA/CAP
  – Detect actionable somatic variants
    • Insurance pre-approval
    • Submit DNA
    • Wait a long time
    • Pay $7000+ / sample
• Research (discovery, epidemiology, etc.)
  – IRB-approved DNA collection
  – You sequence and interpret
My sequencing wishlist
• Tissue
  – Fresh tissue
  – Large amount of high-quality DNA
    • No PCR amplification
• Sequencing (whole genome)
  – Low-error, long reads
  – Paired-end
  – 30X or greater germline coverage
  – 60X or greater tumor coverage
• Bioinformatics
  – De novo assembly of both germline and tumor, compare to reference & each other
  – Genotyping algorithm that is pair-aware
NGS technologies
• ABI – SOLiD
  – Emulsion PCR, then sequencing by di-base ligation
• Illumina – Solexa
  – Single-molecule fluor sequencing
• Life Tech – Ion Torrent
  – Single-molecule semiconductor sequencing
• PacBio – SMRT
  – Zero-mode waveguide with single polymerase in-well
Where should I get DNA???
• Germline
  – NO blood
    • Sequencing single molecules!!!
  – Spit kits, washed skin biopsies
• Tumor tissue
  – Fresh primary tumor
  – Fresh metastatic tumor
  – FFPE?
Calculating sequencing depth
• NGS technology dependent
• How deeply do you want to see mosaicism?
• General guidelines
  – 30X germline
  – 60X or higher tumor
• BUT sequencing depth follows a Poisson distribution
  – i.e. 30X average coverage != all targets at 30X
% Target covered ≥30X at different average coverage
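The Poisson point can be checked with a short calculation. This is a minimal sketch, assuming per-base depth is exactly Poisson-distributed with the average coverage as its mean. Real libraries are usually more variable than that, so these numbers are best read as an optimistic upper bound. The function names are illustrative, not from any sequencing package.

```python
import math


def poisson_pmf(k, mean):
    """P(depth == k) for a Poisson distribution with the given mean."""
    # Work in log space so large means and factorials do not overflow.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def fraction_at_least(min_depth, mean_coverage):
    """Expected fraction of target bases with depth >= min_depth."""
    below = sum(poisson_pmf(k, mean_coverage) for k in range(min_depth))
    return 1.0 - below


if __name__ == "__main__":
    # Compare several average coverages against a 30X per-base threshold.
    for mean in (30, 40, 50, 60):
        frac = fraction_at_least(30, mean)
        print(f"{mean}X average: {frac:.1%} of target at >=30X")
```

Under this idealized model, a 30X average leaves only about half of the target at 30X or better, which is why the average has to be pushed well above the per-base depth you actually need.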
Coverage also varies by tissue*
*Plot available on Figshare
FASTQ Preprocessing
• Demultiplex samples
  – Discard reads with no index
• Convert to PHRED quality scale
  – -10 × log10(probability of base error)
• Remove adapter contamination
  – cutadapt
• Trim low-quality trailing bases and 3’ Ns
  – No 5’ trimming!!!
• Run FastQC!
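The PHRED conversion and 3’-only trimming can be sketched in a few lines. This is an illustration under stated assumptions, not cutadapt's algorithm: it assumes an ASCII offset of 33 (Sanger / Illumina 1.8+ encoding), and the `min_q=20` cutoff and all function names are hypothetical choices for the example.

```python
import math


def phred_from_error(p_error):
    """Q = -10 * log10(probability of base error)."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q):
    """Invert the PHRED scale back to an error probability."""
    return 10.0 ** (-q / 10.0)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality line into integer PHRED scores.

    offset=33 is an assumption; older Illumina files used 64.
    """
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, quality_string, min_q=20, offset=33):
    """Drop trailing bases that are N or below min_q.

    Only the 3' end is trimmed; the 5' end is never touched.
    """
    quals = decode_qualities(quality_string, offset)
    end = len(seq)
    while end > 0 and (seq[end - 1] in "Nn" or quals[end - 1] < min_q):
        end -= 1
    return seq[:end], quality_string[:end]
```

Leaving the 5’ end alone keeps each read's start coordinate intact, which matters later when duplicates are identified by alignment position.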
Alignment to reference
• Current human is GRCh37
• Repeat masking
  – Hard-masked repeats are N
  – Soft-masked repeats are lowercase
  – Prefer soft masking
• Ref aligners have generally low memory use
  – Mostly use Burrows-Wheeler transform
  – bowtie 1 & 2
  – bwa-aln & bwa-mem
  – Novoalign (high memory)
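For readers who have not seen the Burrows-Wheeler transform, here is a deliberately naive sketch built from sorted rotations. It only illustrates the transform itself. Production aligners build it from suffix arrays and search it through an FM-index, neither of which is shown. The `$` sentinel and the function names are assumptions for the example.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Quadratic in memory, so suitable for toy strings only.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Rebuild the original text by repeatedly prepending and sorting."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The row ending in the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

For example, `bwt("banana")` gives `"annb$aa"`, and `inverse_bwt` recovers `"banana"` from it. The transform tends to group identical characters, which is what makes the compressed index small enough for a whole genome to fit in modest memory.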
De novo assembly
• Most use De Bruijn graphs built from k-mers of sequence
• Mostly very high memory usage
  – Depends on depth and number of k-mers
  – Try running diginorm first (C. Titus Brown)
• Assemblers
  – ABySS
  – MIRA
  – SOAPdenovo
  – Velvet
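A toy De Bruijn graph shows why memory scales with the number of distinct k-mers. This is a minimal sketch, assuming error-free reads on one strand. Real assemblers also handle reverse complements, sequencing errors, and graph simplification, none of which is attempted here. The function names are illustrative.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer in a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph from reads.

    Nodes are (k-1)-mers. Each k-mer adds one edge from its prefix
    to its suffix; the stored integer is how often that edge was seen.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]][kmer[1:]] += 1
    # Return plain dicts so lookups of missing nodes do not add entries.
    return {node: dict(edges) for node, edges in graph.items()}


if __name__ == "__main__":
    graph = build_de_bruijn(["ACGT", "CGTA"], k=3)
    for node, edges in sorted(graph.items()):
        print(node, "->", edges)
```

Every distinct k-mer becomes an edge that has to be held in memory, so sequencing errors (which create novel k-mers) inflate the graph. Reducing redundant depth before assembly, as diginorm does, is one way to keep that in check.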
Post-processing
• Picard tools
  – Convert to BAM format
  – Add read-group tags
  – Mark duplicates
• Genome Analysis Toolkit (GATK)
  – Local realignment around indels
  – Base quality score recalibration
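The idea behind duplicate marking can be sketched as grouping reads that start at the same place on the same strand and keeping the best one. This is a simplified illustration, not Picard's implementation: it ignores mate pairs, soft clipping, and libraries, and the `Read` record and its fields are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record for this example."""
    name: str
    chrom: str
    five_prime_pos: int
    strand: str
    base_quality_sum: int


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads sharing chromosome, 5' position, and strand are treated as
    copies of one molecule; the highest-quality copy is kept unflagged.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.five_prime_pos, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.base_quality_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```

Duplicates are flagged rather than deleted, so downstream genotypers can skip them while the original data stay in the BAM file.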
Genotyping
• General
  – Genome Analysis Toolkit (GATK)
    • UnifiedGenotyper
  – Samtools
    • mpileup
• Somatic specific
  – MuTect
  – SomaticSniper
  – Virmid
Variant filtering strategies – sequential evolution
Initiating event
Mosaic primary
Mutation with metastatic advantage leaves the eye
Metastasis has primary mutations & metastatic mutation (maybe in primary?) & new mutations
Variant filtering strategies – parallel evolution
Initiating event
Mosaic primary
Metastasis has few primary mutations & metastatic mutation (not in primary) & new mutations
Very little overlap in mutations between primary and met!
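One way to tell the two models apart in practice is to subtract germline calls and then measure how much the primary and metastasis share. This is a minimal sketch using set operations, assuming each variant is represented as a hashable `(chrom, pos, ref, alt)` tuple. The function names and the Jaccard summary are illustrative choices, not part of any of the callers named in this talk.

```python
def somatic_only(tumor_variants, germline_variants):
    """Variants seen in the tumor but not in the matched germline."""
    return set(tumor_variants) - set(germline_variants)


def compare_primary_and_met(germline, primary, met):
    """Partition somatic variants into shared and tissue-specific sets."""
    primary_somatic = somatic_only(primary, germline)
    met_somatic = somatic_only(met, germline)
    shared = primary_somatic & met_somatic
    union = primary_somatic | met_somatic
    return {
        "shared": shared,
        "primary_only": primary_somatic - met_somatic,
        "met_only": met_somatic - primary_somatic,
        # Jaccard index: 1.0 means identical sets, 0.0 means no overlap.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

A high shared fraction is what sequential evolution predicts, while a shared set close to empty fits the parallel model. Mosaicism in the primary complicates this, because a variant that is present but under-sampled in the primary would wrongly land in `met_only`.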
Variant confirmation
• Sanger sequencing
• Fluidigm
• TaqMan
  – castPCR
• Sequenom
• Illumina GoldenGate
Discover in a focused set with sequencing
Type with these technologies in everything
Many thanks to