Genome-wide copy-number calling (CNAs not CNVs!) Dr Geoff Macintyre
Structural variation (SVs)
Copy-number variations
• Deletion: ABC → C
• Duplication: ABC → ABABC
Balanced rearrangements
• Inversion: ABC → BAC
• Translocation
Causes
• Replication errors
• Retrotransposition
• Repair errors
• Recombination errors
Why is copy-number important?
Ciriello et al (2015). Nature Genetics
The data
Genome-wide SNP allele frequencies
Some measure of the amount of DNA for a given locus (e.g. sequencing depth)
Array-based detection of SNPs (hybridisation)
SNP calling using Affymetrix arrays
Nucleic Acids Res. 2006;34(14):e100. doi: 10.1093/nar/gkl475
Further reading on genotyping
• Birdseed: http://www.nature.com/ng/journal/v40/n10/full/ng.237.html
• CRLMM: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3329223/
• HaplotypeCaller and UnifiedGenotyper: http://www.nature.com/ng/journal/v43/n5/full/ng.806.html
• VarScan: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2734323/
• GWAS primer: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4181332/
Basic workflow
Quantify signal
• Depth/intensity
• B-allele
Segment genome
• Hidden Markov model
• Smoothing methods
Call copy-number changes
• Threshold
• Cluster
• Probabilistic models
The data: logR
Sequencing: log2(depth)
Array:
R(θ)subject = normalised intensity of probes from sample
R(θ)expected = normalised intensity of probes from control
logR = log2(R(θ)subject / R(θ)expected)
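The log-ratio above can be sketched in a few lines. This is only an illustration: the depth values are made up, and the function name `log_r` is hypothetical rather than taken from any package.

```python
import math

def log_r(subject, expected):
    """log2 ratio of the subject signal to the expected (control) signal, per locus."""
    return [math.log2(s / e) for s, e in zip(subject, expected)]

# Hypothetical per-locus signal (read depth or normalised probe intensity).
subject_signal = [50.0, 100.0, 150.0, 100.0]
expected_signal = [100.0, 100.0, 100.0, 100.0]

ratios = log_r(subject_signal, expected_signal)
print([round(r, 3) for r in ratios])
```

Half the expected signal gives a logR of -1, equal signal gives 0, and 1.5 times the expected signal gives about 0.585.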
logR of HCC1143 cell line using Affymetrix SNP6
logR/depth normalisation
• Different proportions of GC in each region can produce a bias in the read depth (wave artifact).
• We can fit a LOESS model of depth against GC content and remove the effect.
[Figure: copy number ratio against GC content, LS041 sample (CNAnorm package)]
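The idea of removing a GC-dependent trend can be sketched as follows. Note the swap: instead of a LOESS fit, this sketch uses a cruder binned-median correction (each bin's depth is divided by the median depth of bins with the same rounded GC percentage). The function name and the data are hypothetical, and a real pipeline would fit a smooth curve instead.

```python
from collections import defaultdict
from statistics import median

def gc_correct(depth, gc_percent):
    """Binned-median GC correction (a simple stand-in for a LOESS fit).

    Each depth is divided by the median depth of all bins sharing the same
    rounded GC percentage, then rescaled to the overall median depth.
    Assumes every GC stratum has a non-zero median depth.
    """
    by_gc = defaultdict(list)
    for d, g in zip(depth, gc_percent):
        by_gc[round(g)].append(d)
    gc_median = {g: median(values) for g, values in by_gc.items()}
    overall = median(depth)
    return [d * overall / gc_median[round(g)] for d, g in zip(depth, gc_percent)]

# Hypothetical bins in which depth rises with GC content.
gc = [40.0, 40.0, 50.0, 50.0, 60.0, 60.0]
raw_depth = [80.0, 80.0, 100.0, 100.0, 120.0, 120.0]

corrected = gc_correct(raw_depth, gc)
print(corrected)
```

After correction the GC-driven trend in this toy example is flat, so any remaining differences between regions would reflect copy number rather than base composition.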
The data: B-allele frequency
θA = intensity of probe for allele A
θB = intensity of probe for allele B
BAF = θB/(θA + θB)
[Figure: BAF bands labelled by genotype: AA, AB, BB; AA, BB; AAB, ABB]
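As a small illustration, taking BAF to be the B-allele share of the total probe intensity, θB/(θA + θB), the genotype bands can be reproduced with made-up intensities. The function name `baf` and the values are hypothetical.

```python
def baf(theta_a, theta_b):
    """B-allele frequency: share of the total intensity coming from allele B."""
    total = theta_a + theta_b
    if total == 0:
        return None  # no signal at this SNP
    return theta_b / total

# Hypothetical idealised intensities for a few genotypes.
print(baf(1000.0, 0.0))     # AA
print(baf(500.0, 500.0))    # AB
print(baf(0.0, 1000.0))     # BB
print(baf(1000.0, 500.0))   # AAB
```

Real array data are noisier, so each genotype appears as a band around these values rather than a single point.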
BAF banding
• 1 band: background noise (0 copies).
• 2 bands: {A,B}, {AA,BB}, or {AAA,BBB}, … Copy numbers (0, j).
• 3 bands: {AA,AB,BB} or {AAAA,AABB,BBBB}, … Copy numbers (i, j) with i = j.
• 4 bands: {AAA,ABB,AAB,BBB} or {AAAA,ABBB,AAAB,BBBB} or {AAAAA,ABBBB,AAAAB,BBBBB}, … Copy numbers (i, j) with 0 < i < j.
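The band counts listed above follow from simple arithmetic, sketched here for a pure sample. The assumption is that (i, j) are the copy numbers of the two parental chromosomes: homozygous SNPs sit at BAF 0 or 1, and heterozygous SNPs sit at i/(i+j) or j/(i+j) depending on which chromosome carries the B allele. The function name `baf_bands` is hypothetical.

```python
from fractions import Fraction

def baf_bands(i, j):
    """Expected BAF band positions for allele-specific copy numbers (i, j).

    Assumes a pure sample with no noise. Returns None when no DNA is present,
    where the signal is a single diffuse band of background noise.
    """
    total = i + j
    if total == 0:
        return None
    bands = {Fraction(0), Fraction(1)}   # homozygous SNPs: all A or all B
    bands.add(Fraction(i, total))        # het SNPs with B on the i-copy chromosome
    bands.add(Fraction(j, total))        # het SNPs with B on the j-copy chromosome
    return sorted(bands)

for copies in [(0, 2), (1, 1), (2, 2), (1, 2), (1, 3)]:
    bands = baf_bands(*copies)
    print(copies, len(bands), [str(b) for b in bands])
```

With (0, j) the heterozygous positions collapse onto 0 and 1, giving two bands; with i = j they collapse onto 1/2, giving three; otherwise four distinct bands remain.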
BAF of HCC1143 cell line using Affymetrix SNP6
Segmentation: circular binary segmentation
Olshen et al., 2004.
• It can be used with array and sequencing data.
• Finds change points using a t-test under a permutation model.
• Bioconductor package DNAcopy.
https://academic.oup.com/biostatistics/article/5/4/557/275197/Circular-binary-segmentation-for-the-analysis-of?searchresult=1
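To make the change-point idea concrete, here is a minimal brute-force sketch in the spirit of circular binary segmentation. It is not the DNAcopy implementation. It assumes a single pooled variance for the whole segment (a simplification of the t-statistic), tries every arc in O(n²) time, and uses hypothetical defaults for `alpha` and `n_perm`.

```python
import random

def max_arc(x):
    """Return (statistic, i, j) for the arc x[i:j] that differs most from the rest."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    mu = total / n
    var = sum((v - mu) ** 2 for v in x) / (n - 1)
    best = (0.0, 0, n)
    if var == 0:
        return best
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the whole segment is not an arc
            inside = prefix[j] - prefix[i]
            diff = inside / k - (total - inside) / (n - k)
            stat = abs(diff) / (var * (1 / k + 1 / (n - k))) ** 0.5
            if stat > best[0]:
                best = (stat, i, j)
    return best

def segment(x, offset=0, alpha=0.01, n_perm=200, rng=None):
    """Recursively split x and return the sorted change-point indices."""
    rng = rng or random.Random(0)
    if len(x) < 4:
        return []
    stat, i, j = max_arc(x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permutation null: no positional structure
        if max_arc(shuffled)[0] >= stat:
            hits += 1
    if (hits + 1) / (n_perm + 1) >= alpha:
        return []  # best arc is not significant: stop splitting
    cuts = [c for c in (i, j) if 0 < c < len(x)]
    pieces = [0] + cuts + [len(x)]
    found = [offset + c for c in cuts]
    for lo, hi in zip(pieces, pieces[1:]):
        found += segment(x[lo:hi], offset + lo, alpha, n_perm, rng)
    return sorted(found)

# Hypothetical logR profile: a gained region between positions 30 and 50.
data_rng = random.Random(1)
logr = ([data_rng.gauss(0.0, 0.05) for _ in range(30)]
        + [data_rng.gauss(1.0, 0.05) for _ in range(20)]
        + [data_rng.gauss(0.0, 0.05) for _ in range(30)])
breakpoints = segment(logr)
print(breakpoints)
```

Testing arcs rather than single split points is what lets the method pick up an aberration in the middle of a chromosome in one step, since both of its ends are found together.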
Segmentation: hidden Markov models
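As a rough illustration of HMM-based segmentation, the sketch below decodes the most likely state path with the Viterbi algorithm. Everything numeric here is an assumption made for the example: three states, Gaussian emissions with the listed means and a shared standard deviation, and a fixed probability of staying in the same state. Real tools estimate such parameters from the data.

```python
import math

STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -0.5, "neutral": 0.0, "gain": 0.4}  # hypothetical logR means
SD = 0.2      # hypothetical emission standard deviation
STAY = 0.98   # hypothetical probability of staying in the same state

def log_emission(x, state):
    """Log density of logR value x under the state's Gaussian."""
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2 * math.pi))

def log_transition(a, b):
    """Log probability of moving from state a to state b."""
    return math.log(STAY if a == b else (1 - STAY) / (len(STATES) - 1))

def viterbi(logr):
    """Most likely state sequence for a list of logR values."""
    score = {s: math.log(1 / len(STATES)) + log_emission(logr[0], s) for s in STATES}
    back = []
    for x in logr[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log_transition(p, s))
            score[s] = prev[best] + log_transition(best, s) + log_emission(x, s)
            pointers[s] = best
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical probes: one outlier (0.6), then a run of four low values.
logr_values = [0.05, -0.02, 0.6, 0.01, 0.03, -0.45, -0.55, -0.5, -0.48, 0.02, -0.03]
path = viterbi(logr_values)
print(path)
```

The high stay probability is what makes the model ignore an isolated outlier while still switching state for a sustained run of shifted values.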
Copy-number calling: threshold based
Individual thresholds based on the variability of each sample:
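One way such a per-sample threshold could be derived is sketched below. The choices are assumptions for illustration, not a method stated in these slides: noise is estimated from the median absolute deviation of differences between neighbouring probes, and a segment is called when its mean exceeds `k` noise standard deviations, with `k` a hypothetical tuning parameter.

```python
from statistics import median

def noise_sd(logr):
    """Robust noise estimate from differences between neighbouring probes."""
    diffs = [b - a for a, b in zip(logr, logr[1:])]
    centre = median(diffs)
    mad = median([abs(d - centre) for d in diffs])
    # 1.4826 scales a MAD to a standard deviation for normal data;
    # dividing by sqrt(2) accounts for differencing two noisy probes.
    return 1.4826 * mad / 2 ** 0.5

def call_segments(segment_means, logr, k=2.0):
    """Label each segment mean as loss, neutral or gain using a sample-specific cutoff."""
    cutoff = k * noise_sd(logr)
    calls = []
    for m in segment_means:
        if m > cutoff:
            calls.append("gain")
        elif m < -cutoff:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls

# Hypothetical probe-level logR (pure alternating noise) and segment means.
probe_logr = [0.05, -0.05] * 10 + [0.05]
calls = call_segments([0.02, 0.45, -0.6], probe_logr)
print(round(2.0 * noise_sd(probe_logr), 3), calls)
```

A noisier sample yields a wider neutral zone, so the same segment mean may be called in a clean sample and left uncalled in a noisy one.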
Copy-number calling: cluster based
van de Wiel et al., 2007 (CGHcall Bioconductor package).
• The segmented means come from a mixture of six normal populations.
• The model is fit by the EM algorithm.
• Classification is reduced to 3 or 4 states (usually loss, gain, normal).
http://bioconductor.org/packages/release/bioc/html/CGHcall.html
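The mixture idea can be illustrated with a much-reduced sketch: a three-component one-dimensional Gaussian mixture fit by EM, rather than the six components described above. This is not CGHcall's algorithm. The starting means, the starting standard deviation and the floor `min_sd` are hypothetical choices for the example.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def fit_mixture(values, init_means, init_sd=0.2, min_sd=0.05, n_iter=50):
    """EM for a 1-D Gaussian mixture; returns means, sds, weights, responsibilities."""
    k = len(init_means)
    means, sds, weights = list(init_means), [init_sd] * k, [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate each component from its weighted values.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            if n_c < 1e-9:
                continue  # empty component: keep its previous parameters
            means[c] = sum(r[c] * x for r, x in zip(resp, values)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, values)) / n_c
            sds[c] = max(math.sqrt(var), min_sd)
            weights[c] = n_c / len(values)
    return means, sds, weights, resp

LABELS = ("loss", "normal", "gain")
# Hypothetical segment means.
seg_means = [-0.52, -0.48, -0.5, 0.01, -0.02, 0.03, 0.0, 0.02, 0.41, 0.39, 0.44]
means, sds, weights, resp = fit_mixture(seg_means, init_means=(-0.5, 0.0, 0.5))
seg_calls = [LABELS[max(range(3), key=lambda c: r[c])] for r in resp]
print([round(m, 2) for m in means], seg_calls)
```

Each segment is assigned to the component with the highest posterior probability, so the call boundaries adapt to where the sample's own clusters lie rather than to fixed cutoffs.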
Relative copy-number profile (ovarian cancer)
Method: QDNAseq
Scheinin I et al., 2014 (QDNAseq Bioconductor package).
• Divides genome into bins of equal size.
• Normalisation based on blacklisted regions, GC content, ...
• Segmentation with DNAcopy.
• Optional calling with CGHcall.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248318/
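The first step, counting reads in equal-sized bins and masking blacklisted regions, can be sketched as follows. This is not QDNAseq code: the function name, the 0-based half-open coordinates and the toy positions are all assumptions for the example.

```python
from collections import Counter

def bin_counts(read_starts, chrom_length, bin_size, blacklist=()):
    """Count read start positions per fixed-size bin on one chromosome.

    Positions are 0-based; blacklist intervals are half-open (start, end).
    Bins overlapping a blacklisted interval are reported as None.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in read_starts)
    result = []
    for b in range(n_bins):
        start = b * bin_size
        end = min(start + bin_size, chrom_length)
        masked = any(s < end and e > start for s, e in blacklist)
        result.append(None if masked else counts.get(b, 0))
    return result

# Hypothetical toy chromosome of length 50 with 10 bp bins.
starts = [5, 12, 18, 25, 27, 29, 41]
per_bin = bin_counts(starts, chrom_length=50, bin_size=10, blacklist=[(30, 40)])
print(per_bin)
```

The resulting per-bin counts are the depth signal that is then normalised, segmented and optionally called.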
Exercise 1
Problems: purity and heterogeneity
What are the effects on relative copy-number?
Absolute copy-number profile (ovarian cancer)
Method: Allele-Specific Copy number Analysis of Tumours (ASCAT)
[Figure: ASCAT model, annotated with: constant, tumour fraction, ploidy, B-allele copy-number, A-allele copy-number]
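The quantities named above can be tied together with a forward model. The sketch below follows the ASCAT equations as I understand them from Van Loo et al. (2010), so treat it as an approximation under stated assumptions rather than the package's code: `gamma` is the platform-dependent constant, `purity` is the tumour fraction, and `tumour_ploidy` is the average tumour copy number. All parameter names are hypothetical.

```python
import math

def ascat_expected(n_a, n_b, purity, tumour_ploidy, gamma=1.0):
    """Expected (logR, BAF) for allele-specific copy numbers (n_a, n_b).

    The normal cell fraction (1 - purity) is assumed to be diploid and
    heterozygous (one A copy and one B copy) at the SNP.
    """
    total = 2 * (1 - purity) + purity * (n_a + n_b)          # average copies at the locus
    average = 2 * (1 - purity) + purity * tumour_ploidy      # average copies genome-wide
    if total == 0:
        return float("-inf"), None  # homozygous deletion in a pure sample
    log_r = gamma * math.log2(total / average)
    baf = (1 - purity + purity * n_b) / total
    return log_r, baf

# Loss of the B allele (n_a=1, n_b=0) in a diploid tumour at two purities.
for purity in (1.0, 0.5):
    log_r, baf_value = ascat_expected(1, 0, purity, tumour_ploidy=2.0)
    print(purity, round(log_r, 3), round(baf_value, 3))
```

In this example, lowering purity from 1.0 to 0.5 pulls logR from -1 towards 0 and BAF from 0 towards 0.5, which is why relative profiles from impure samples look compressed.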
Further reading on copy-number
• Methods for CN detection (array data): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2697494/
• Tools for CN detection (sequence data): http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S11-S1
• PennCNV, a package for CNV calling: http://penncnv.openbioinformatics.org/en/latest/
• Large scale analysis of CNAs in cancer: http://www.nature.com/ng/journal/v45/n10/full/ng.2760.html
Exercise 2