Nuts and Bolts of Next Generation Sequencing Analysis at ACCRE Thomas Stricker MD/PhD Assistant Professor, PMI 11/09/2017

Page 1:

Nuts and Bolts of Next Generation Sequencing Analysis at ACCRE

Thomas Stricker MD/PhD Assistant Professor, PMI

11/09/2017

Page 3:

Next Generation Sequencing

Illumina Sequencing Technology

DNA – the genetic code

• DNA is a double-stranded polymer of 4 bases (A, T, C, G)
• The order (sequence) of A, T, C, G is the genetic code
• A always pairs with T on the opposite strand, and C always pairs with G
• Enzymes called polymerases make copies of DNA by taking a single strand of DNA and then adding A, T, C, G according to the base-pairing rules
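The base-pairing rules above can be sketched in a few lines of Python. This is only an illustration of the A-T / C-G rule; the function names are hypothetical and not taken from the talk.

```python
# Base-pairing rules from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]
```

For example, `complement("ATCG")` gives `"TAGC"`, and taking the reverse complement twice returns the original strand.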

Sanger (mod by Lee Hood)

• Sequencing by synthesis
• Mix many copies of the same DNA molecule, polymerase, ATCGs, and a small amount of fluorescently labeled ATCG that are terminated
• Terminated bases stop extension
• Separate based on size

Page 4:

1.  In vitro amplification, ‘cloning’

2.  Flow cell based sequencing by synthesis

3.  A draft of the human genome

Illumina Sequencing Technology

What Happened?

Page 5:

Terms and Definitions

•  Single or paired end
•  Lane
•  Barcode or indexes
•  Library – The base unit of preparation. Pool of DNA molecules that are sequenced.
•  Sample – A single individual

Page 6:

Terms and Definitions

•  FASTA = sequence file
•  FASTQ = sequence + quality file
•  BAM = alignment file in binary
•  VCF = variant call file
•  Index = allows rapid lookup of FASTA, BAM, VCF
•  MAF = mutation annotation file
•  GTF/GFF = information about gene structure
•  Interval list = like BED, unique to Picard (use Picard BedToIntervalList to generate)
•  BED =
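To make "sequence + quality" concrete, here is a minimal FASTQ reader sketch. It assumes the common four-line record layout and Phred+33 quality encoding; neither detail is stated on the slide, and the record and function names are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream (assumed layout)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Each quality character encodes a Phred score as ASCII code minus offset.
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Made-up record for illustration only.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
```

A FASTA file carries only the name and sequence lines; the per-base quality string is what FASTQ adds.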

Page 7:

Contents of the human genome [CC-BY-SA-3.0] Steve Cook

Sequencing and Genome Size

S. cerevisiae = 12 million bp
C. elegans = 100 million bp
Drosophila m. = 130 million bp
D. rerio = 1.4 billion bp
M. musculus = 2.1 billion bp
H. sapiens = 3 billion bp
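One way to see why genome size matters for sequencing is to estimate how many reads a given average depth requires. The sketch below uses the genome sizes listed above with the standard approximation coverage = reads × read length / genome size. The 30x depth and 150 bp read length are illustrative assumptions, not values from the talk.

```python
# Genome sizes as listed on the slide, in base pairs.
GENOME_SIZES_BP = {
    "S. cerevisiae": 12_000_000,
    "C. elegans": 100_000_000,
    "Drosophila m.": 130_000_000,
    "D. rerio": 1_400_000_000,
    "M. musculus": 2_100_000_000,
    "H. sapiens": 3_000_000_000,
}


def reads_needed(genome_size_bp: int, mean_coverage: int, read_length_bp: int) -> int:
    """Reads required so that reads * read_length / genome_size >= mean_coverage."""
    total_bases = genome_size_bp * mean_coverage
    return -(-total_bases // read_length_bp)  # integer ceiling division


# Assumed depth (30x) and read length (150 bp), for illustration only.
human_reads = reads_needed(GENOME_SIZES_BP["H. sapiens"], 30, 150)
yeast_reads = reads_needed(GENOME_SIZES_BP["S. cerevisiae"], 30, 150)
```

Under these assumptions a human genome needs 600 million reads and yeast needs 2.4 million.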

Page 8:

Genomic DNA

Pros:
Large capture space
Copy Number and Translocations
Variants outside baits might be sampled

Cons:
Larger amount of DNA necessary (>250ng)
Sequencing inefficient
Library builds slow, add to turnaround time

Target Enrichment – Hybrid Capture Approach

Page 9:

ALK APC BRAF BRCA1 BRCA2

ERBB2 EGFR FLT3 HRAS IDH1 IDH2 JAK2 KIT KRAS MET MPL MTOR MYC

NF1 NRAS PDGFRA PIK3CA PTEN PTPN11 TP53

Pros:
Quick, efficient, low sample amounts (10ng)

Cons:
Number of genes limited
Uneven PCR = uneven coverage
No copy number or translocations
Variants outside amplicon not sampled

Target Enrichment – Amplicon Approach

Page 10:

RNAseq

•  Splice-aware aligners build splice junctions on the fly
•  Isoform-specific quantification
•  Ab initio gene/isoform discovery
•  Fusion genes
•  Genotype (SNVs only)

[Figure: diagram labeled Gene A, Gene B, Exon A, Exon B, Exon C]

Page 11:

(ChIP-seq)

• Method of identifying genome-wide binding sites for a particular TF, by purifying TF-bound DNA and sequencing it to locate genomic TF binding regions
• Can be used on both endogenous and exogenous proteins
• Endogenous detection depends on a reliable antibody for IP, although tagged proteins can be used for the same purpose

Page 12:

Contents of the human genome [CC-BY-SA-3.0] Steve Cook

Sequencing and Genome Size

S. cerevisiae = 12 million bp
C. elegans = 100 million bp
Drosophila m. = 130 million bp
D. rerio = 1.4 billion bp
M. musculus = 2.1 billion bp
H. sapiens = 3 billion bp

Page 13:

Genome and Gene Versions

•  Hg19/GRCh37 or GRCh38
•  RefSeq, GenBank, Ensembl…

Page 16:

FASTQ files and QC

Page 17:

Pre-processing Steps

•  1) QC raw data with fastqc
•  2) Trim to remove adaptors/low quality sequencing
•  3) QC clipped data with fastqc
•  4) Summarize output with multiqc
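The four steps above could be laid out as command lines roughly as follows. This is a sketch, not the author's script. The slide does not name a trimming tool, so cutadapt is used here purely as an assumption, and the directory layout, adapter sequence and quality cutoff are hypothetical. The commands are only built, not executed.

```python
from pathlib import Path
from typing import List


def preprocessing_commands(fastq: str, out_dir: str, adapter: str) -> List[List[str]]:
    """Build (but do not run) the four pre-processing commands for one FASTQ file."""
    out = Path(out_dir)
    clipped = out / "clip" / Path(fastq).name  # hypothetical location for trimmed reads
    return [
        # 1) QC raw data with fastqc (the output directory must already exist)
        ["fastqc", fastq, "-o", str(out / "qc_raw")],
        # 2) Trim adaptors and low-quality 3' ends (cutadapt is an assumed choice)
        ["cutadapt", "-a", adapter, "-q", "20", "-o", str(clipped), fastq],
        # 3) QC clipped data with fastqc
        ["fastqc", str(clipped), "-o", str(out / "qc_clip")],
        # 4) Summarize all reports found under out_dir with multiqc
        ["multiqc", str(out), "-o", str(out / "multiqc")],
    ]


# Hypothetical file name and adapter sequence, for illustration only.
commands = preprocessing_commands("sample_R1.fastq.gz", "results", "AGATCGGAAGAGC")
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once the output directories exist.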

Page 18:

FASTQ and QC

•  FASTQC
•  /home/strickt2/scripts/qc/fastqc.pl
•  Go to FASTQC and MultiQC summaries

Page 19:

RNAseq Expression Analysis

•  Splice-aware alignment – bam file
    –  Tophat2, STAR, RSEM
•  Picard and RSeQC alignment QC
•  Summarize QC
•  Assign reads to transcripts, produce count files
    –  HTSeq, featureCounts, Cufflinks
•  Collect into matrix
•  Differential Expression with DESeq2, limma/voom, edgeR, others, mostly in R
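The "collect into matrix" step can be sketched as below, assuming each sample's counts have already been read into a dictionary keyed by gene. The function name and the toy counts are hypothetical; real count files from HTSeq or featureCounts would need to be parsed first.

```python
from typing import Dict, List, Tuple


def collect_counts(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Merge per-sample gene counts into a genes x samples matrix.

    Genes missing from a sample are filled with 0.
    """
    samples = sorted(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    matrix = [[per_sample[s].get(gene, 0) for s in samples] for gene in genes]
    return genes, samples, matrix


# Toy counts for illustration only.
toy = {
    "S1": {"TP53": 10, "KRAS": 3},
    "S2": {"TP53": 7, "EGFR": 5},
}
genes, samples, matrix = collect_counts(toy)
```

Rows are genes and columns are samples, which is the orientation that differential expression tools such as DESeq2 generally expect as input.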

Page 20:

Splice-aware alignment

•  STAR -- 2 step
    –  /home/strickt2/scripts/rnaseq/star.step1.pl
•  Bam File
    –  /home/strickt2/SEQanalysis/samtools/bin/samtools view -H /data/strickt2/578/clip/578-AG-49.clip/Aligned.sortedByCoord.out.bam
•  Summarized metric QC
•  Featurecounts
    –  /data/strickt2/578/clip/new_count/final.transcript.table.txt

Page 21:

Genotyping -- Germline

Page 22:

IGV – Genotyping

Page 23:

Add Read Groups

•  Read Groups = set of reads generated from a single sequencing run
•  ID = flowcell ID + Lane
•  PU = platform unit = flowcell ID + Lane + Sample ID
•  PL = Illumina, Ion Torrent, etc.
•  LB = Library ID
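The read-group fields above map onto an `@RG` line in the BAM header. The sketch below assembles one from the slide's definitions. The "." separator, the function name and the example flowcell and library IDs are assumptions. The `SM` (sample) tag is not on the slide but is part of the SAM specification and is commonly required by downstream callers.

```python
def read_group_line(flowcell: str, lane: int, sample: str, library: str,
                    platform: str = "ILLUMINA") -> str:
    """Build a SAM @RG header line following the slide's ID/PU/PL/LB scheme."""
    rg_id = f"{flowcell}.{lane}"          # ID = flowcell ID + lane
    fields = [
        ("ID", rg_id),
        ("PU", f"{rg_id}.{sample}"),      # PU = flowcell ID + lane + sample ID
        ("PL", platform),                 # sequencing platform
        ("LB", library),                  # library ID
        ("SM", sample),                   # sample name (added; not on the slide)
    ]
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields)


# Hypothetical flowcell and library identifiers, for illustration only.
line = read_group_line("HABC123", 2, "578-AG-49", "lib1")
```

Reads from the same library sequenced on two lanes would get two different `ID` values but share one `LB`, which is what lets duplicate marking work per library.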

Page 24:

/home/strickt2/scripts/rnaseq_genotyping/final.stricker.merge.rg.mdup.pl

Page 25:

Genotyping -- Germline

https://software.broadinstitute.org/gatk/best-practices/

Page 26:

HaplotypeCaller / Joint Caller

https://software.broadinstitute.org/gatk/best-practices/

Page 27:

VCF – variant call file

•  /data/strickt2/3102/3102-CLA-67.3102.split.vcf
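For readers who have not opened a VCF before, this sketch parses the eight fixed columns of a single data line. The example line is made up and is not taken from the file named above; the function name is hypothetical.

```python
from typing import Any, Dict


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed, tab-separated columns of one VCF data line."""
    chrom, pos, var_id, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict: Dict[str, Any] = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue
        key, _, value = entry.partition("=")
        info_dict[key] = value if value else True  # flag entries carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),                     # may list several alternate alleles
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }


# Made-up record for illustration only.
record = parse_vcf_line("chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=20;QD=2.5;DB\n")
```

Header lines (those starting with `#`) are not handled here and would need to be skipped before calling the function.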

Page 28:

Filter Genotypes

•  Recalibrate (VQSR) or hard-filter
    –  Recalibration is recommended for germline human variants for 30 or more exomes, or model organisms for which the appropriate training sets exist
•  Recalibration has to be run twice – once for SNPs and once for indels
•  Train a model of variant statistics (QD (quality score over depth), MQ (RMS of mapping quality), SB (strand bias), etc.) for known variants
•  Use that model to determine the probability that other variants are true.

https://software.broadinstitute.org/gatk/best-practices/

Page 29:

VQSR – Recalibrate

https://software.broadinstitute.org/gatk/best-practices/

[Figure panels: SNPs, Indels]

Page 30:

VQSR -- Recalibrate

https://software.broadinstitute.org/gatk/best-practices/

[Figure panels: SNP, Indel]

Page 31:

Hard Filtering

https://software.broadinstitute.org/gatk/best-practices/

Page 32:

Hard Filtering

•  QD = QualByDepth = Variant confidence / Depth
•  FS = FisherStrand = Phred-scaled p-value from Fisher's exact test (FET) for strand bias
•  MQ = RMS Mapping Quality
•  MQRankSum = MappingQualityRankSumTest = Compare MQ for reads with ref vs. reads with alt via Mann-Whitney
•  ReadPosRankSum = ReadPosRankSumTest = Mann-Whitney for distance of variant from end of read. Variants only seen at the ends of reads are likely errors.
•  SOR = StrandOddsRatio = strand bias.
•  READ:
•  https://software.broadinstitute.org/gatk/documentation/article.php?id=3225
•  https://software.broadinstitute.org/gatk/documentation/article.php?id=6925 and https://software.broadinstitute.org/gatk/best-practices/
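Hard filtering amounts to comparing each annotation against a fixed cutoff. The sketch below uses the SNP thresholds commonly quoted as GATK starting points (QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0, SOR > 3.0). Treat these numbers as an assumption to check against the articles linked above, since the slide itself lists only the annotation names. The function name is hypothetical.

```python
from typing import Callable, Dict, List

# Each entry returns True when the annotation value FAILS the filter.
# Thresholds are assumed SNP starting points; verify against current GATK guidance.
SNP_HARD_FILTERS: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,               # low confidence per unit depth
    "FS": lambda v: v > 60.0,              # strong strand bias (Fisher's exact test)
    "MQ": lambda v: v < 40.0,              # poor RMS mapping quality
    "MQRankSum": lambda v: v < -12.5,      # alt reads map worse than ref reads
    "ReadPosRankSum": lambda v: v < -8.0,  # alt allele seen mostly near read ends
    "SOR": lambda v: v > 3.0,              # strand bias (odds ratio)
}


def failed_filters(annotations: Dict[str, float]) -> List[str]:
    """Return the names of the hard filters a variant fails.

    Annotations that are absent for a variant are skipped rather than failed.
    """
    return [
        name
        for name, fails in SNP_HARD_FILTERS.items()
        if name in annotations and fails(annotations[name])
    ]


# Made-up annotation values for illustration only.
passing = failed_filters({"QD": 15.0, "FS": 1.2, "MQ": 60.0, "SOR": 0.7})
failing = failed_filters({"QD": 1.5, "FS": 75.0, "MQ": 60.0})
```

A variant passes when the returned list is empty. Indels are usually filtered with a separate set of thresholds.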

Page 33:

Annotations

•  Annovar, VEP, others
•  Available data resources
    –  Human variation (dbSNP, ExAC, gnomAD, EVS, etc.)
    –  Clinical significance
        –  ClinSig, Cancer Hotspots, others
    –  Measures of conservation or deleteriousness
        •  GERP, CADD, SIFT, PolyPhen, PROVEAN, etc.
•  Annotation tables and Radar plots

Page 34:

Somatic Genotyping – Tumor-Normal Contamination

Page 35:

Tumor-Normal Contamination

Frampton et al. https://www.nature.com/articles/nbt.2696.

Page 36:

Somatic Mutation Calling

•  Tumor-Normal Pairs

Page 37:

Other Analyses

•  ChIPseq/ATAC-seq/etc.
•  Methylation/Bisulfite sequencing
•  Fusion Identification
•  Novel transcript/isoform identification
•  Copy Number Alterations
    –  Both Germline and Somatic
•  Allele-Specific expression

Page 38:

Fusion Identification

CCL14:PCGF2
USP22:PTPRD
NR2C2:RFTN1
ESR1:AKAP12
ESR1:C6orf211
PAK1:RP11-807H22.7
ESR1:C6orf97
THSD4:LRRC49
FOXK2:ZBTB40
CES7:FANCD2
ESR1:C6orf97
PBRM1:NKIRAS1
PTPRN2:CYP3A5
SPOP:SWAP70
DNAJA3:PTPN2
NR3C2:FSTL5
OSBPL2:CDH4
ROR1:LRP8
SSH1:FOXN4
NRIP1:AF127936.7
NCOR1:WDR16
CCL4L2:CCL4
ERLIN1:PTEN
FAM188A:PIP4K2A
PBX1:RP11-705O24.1
STK32B:CLEC16A
MLL3:ANKRD36

[Figure: panels A and B; diagram labeled Gene A, Gene B, Exon A, Exon B, Exon C]

Page 39:

The Future

•  Single Cell RNAseq
    –  ddSEQ, 10X Chromium (Jeff Rathmell), InDrop (Lau Lab), and SmartSeq2 (Mallal)
•  Synthetic Long Reads
    –  haplotyping and copy number/structural variant analysis
•  Spark-enabled GATK4
•  Cost effective exome and whole genome sequencing = data deluge