Single Tumor-Normal Pair Parent-Specific Copy Number Analysis
Henrik Bengtsson
Department of Epidemiology & Biostatistics, UCSF
with:
Pierre Neuvial, Berkeley/CNRS
Adam Olshen, UCSF
Richard Olshen, Stanford
Venkatraman Seshan, MSKCC
Terry Speed, Berkeley/WEHI
Paul Spellman, LBNL/OHSU
[Figure: genome regions labeled NORMAL REGION, GAIN and COPY-NEUTRAL LOH]
“This presentation has been modified from its original version...” The slide content was formatted to fit the upper 3/4 of the screen at IPAM, so that the audience at the back could also see all of it.
Paired PSCBS
- H Bengtsson, P Neuvial, TP Speed, TumorBoost: Normalization of allele-specific tumor copy numbers from one single tumor-normal pair of genotyping microarrays, BMC Bioinformatics 2010.
- AB Olshen, H Bengtsson, P Neuvial, PT Spellman, RA Olshen, VE Seshan, Parent-specific copy number in paired tumor-normal studies using circular binary segmentation, Bioinformatics 2011.
Parent-specific copy numbers from a single tumor-normal pair of SNP arrays
1. Tumor-normal pair
2. Genotype normal
3. Normalize tumor using normal
4. Segment tumor CNs in two steps
5. Estimate PSCNs within segments
6. Call segments
Single nucleotide polymorphism
10-20 million known SNPs
Genotypes are observed at single loci
[Figure: alleles (A or B) on the paternal (♂) and maternal (♀) chromosomes at four SNP loci; genotypes AB, BB, AB, AA]
Genotypes and total copy numbers reflect the parent-specific copy numbers
[Figure: matched normal (diploid); genotypes AB, BB, AB, AA]
[Figure: tumor with gain; genotypes AAB, BBB, AB, AA; (C1,C2): (1,2) and (1,1)]
* Occam's razor: assume that the minimal number of events has occurred.
[Figure: tumor with deletion & copy-neutral LOH; (C1,C2): (0,2), (0,1) and (1,1)]
SNP microarrays quantify total and allele-specific copy numbers
Chip design:
[Figure: probes hybridizing to complementary sample DNA. For a T/C SNP, probe CGTGTAATTGAACC pairs with sample sequence GCACATTAACTTGG and probe CGTGTAGTTGAACC pairs with GCACATCAACTTGG.]
Together the SNPs of a region indicate the parent-specific copy numbers
(1 individual, many SNPs, 2 different regions)
[Figure: CB versus CA per SNP. NORMAL (1,1): clusters AA, AB, BB. GAIN (1,2): clusters AAA, AAB, ABB, BBB.]
Total CN: C = CA + CB
Total CNs and allele B fractions are easier to work with than ASCNs
Total CN: C = CA + CB
BAF: β = CB / C
NORMAL (1,1): genotypes AA, AB, BB have β = 0, 1/2, 1
GAIN (1,2): genotypes AAA, AAB, ABB, BBB have β = 0, 1/3, 2/3, 1
[Figure: C versus BAF for the NORMAL and GAIN regions]
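The mapping from allele-specific copy numbers to (C, β) can be checked with a few lines of Python. This is only an illustrative sketch: the function name `tcn_and_baf` and the genotype dictionaries are made up here, and exact fractions are used so that 1/3 and 2/3 print cleanly.

```python
from fractions import Fraction

def tcn_and_baf(ca, cb):
    """Return (C, beta) with C = CA + CB and beta = CB / C."""
    c = ca + cb
    if c == 0:
        raise ValueError("BAF is undefined when the total copy number is 0")
    return c, Fraction(cb, c)

# Genotypes written as (CA, CB) pairs for the two regions on the slide.
normal = {"AA": (2, 0), "AB": (1, 1), "BB": (0, 2)}
gain = {"AAA": (3, 0), "AAB": (2, 1), "ABB": (1, 2), "BBB": (0, 3)}

for name, region in (("NORMAL", normal), ("GAIN", gain)):
    for genotype, (ca, cb) in region.items():
        c, beta = tcn_and_baf(ca, cb)
        print(f"{name} {genotype}: C = {c}, BAF = {beta}")
```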
Total CNs and BAFs reflect the underlying parent-specific CNs
[Figure: total CN and BAF tracks along the genome for NORMAL (1,1), GAIN (1,2) and COPY-NEUTRAL LOH (0,2)]
Total CN: C = CA + CB (annotated levels: CN=2, CN=3)
Allele B fraction: β = CB / C (annotated levels: 0% B:s, 50% B:s, 100% B:s)
Matched tumor-normals
- With a matched normal it is easier! …because we can genotype the normal and find the heterozygous SNPs...
- Also, much greater SNRs
Heterozygous SNPs (not homozygous) are informative for PSCNs
1. Genotypes (AA, AB, BB) from BAFs of a matched normal
2a. Total CNs: C = CA + CB (all loci)
2b. Tumor BAFs: β = CB / C (SNPs only)
3. Decrease in heterozygosity: ρ = 2*|β - 1/2| (hets only)
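A minimal sketch of steps 1 and 3, assuming NumPy. The genotype caller below simply thresholds the normal BAFs at 1/3 and 2/3; those thresholds and the function names are assumptions made for illustration, not the genotyping method used in the talk.

```python
import numpy as np

def call_genotypes(beta_n, lo=1/3, hi=2/3):
    """Naive calls from normal BAFs: 0.0 = AA, 0.5 = AB, 1.0 = BB.
    The thresholds lo and hi are illustrative assumptions."""
    beta_n = np.asarray(beta_n, dtype=float)
    return np.where(beta_n < lo, 0.0, np.where(beta_n > hi, 1.0, 0.5))

def decrease_in_heterozygosity(beta_t, beta_n):
    """rho = 2*|beta_T - 1/2| at SNPs called heterozygous in the normal.
    Homozygous SNPs carry no PSCN information and are set to NaN."""
    beta_t = np.asarray(beta_t, dtype=float)
    is_het = call_genotypes(beta_n) == 0.5
    rho = np.full(beta_t.shape, np.nan)
    rho[is_het] = 2.0 * np.abs(beta_t[is_het] - 0.5)
    return rho

# Four SNPs: the 2nd and 4th are heterozygous in the normal.
beta_n = [0.02, 0.48, 0.97, 0.53]
beta_t = [0.05, 0.34, 0.95, 0.66]
rho = decrease_in_heterozygosity(beta_t, beta_n)
print(rho)
```

The two heterozygous SNPs sit on opposite sides of 1/2 in the tumor, yet both give the same DH, which is why DH rather than BAF is averaged within a segment.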
Total CNs & DHs segmentation gives us PSCN regions and estimates
Total CNs: C = CA + CB
Decrease in heterozygosity: ρ = 2*|β - 1/2| (hets only)
(i) Find change points
(ii) Estimate mean levels: C as avg(all loci), ρ as avg(hets only)
Per-segment PSCNs (C1,C2):
C1 = 1/2 * (1 - ρ) * C
C2 = C - C1
[Figure: segmented total CN and DH tracks for NORMAL (1,1), GAIN (1,2) and CN-LOH (0,2)]
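The per-segment formulas can be verified against the three regions on the slide. The sketch below assumes the segment means are already available; `pscn_from_segment` is a hypothetical helper name.

```python
def pscn_from_segment(mean_tcn, mean_dh):
    """Per-segment parent-specific CNs from the mean total CN (all loci)
    and the mean DH (hets only): C1 = 1/2 * (1 - rho) * C, C2 = C - C1."""
    c1 = 0.5 * (1.0 - mean_dh) * mean_tcn
    c2 = mean_tcn - c1
    return c1, c2

# Idealized (noise-free) segment means for the three regions.
regions = {
    "NORMAL": (2.0, 0.0),      # hets at beta = 1/2        -> rho = 0
    "GAIN": (3.0, 1.0 / 3.0),  # hets at beta = 1/3 or 2/3 -> rho = 1/3
    "CN-LOH": (2.0, 1.0),      # hets at beta = 0 or 1     -> rho = 1
}
for name, (c, rho) in regions.items():
    c1, c2 = pscn_from_segment(c, rho)
    print(f"{name}: (C1, C2) = ({c1:.2f}, {c2:.2f})")
```

Up to floating-point rounding this recovers (1,1), (1,2) and (0,2).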
It is hard to infer PSCNs reliably when signals are noisy
Actual data:
[Figure: noisy total CN and BAF tracks from a real sample]
Segmentation may fail…
Let’s improve this...
CalMaTe
M Ortiz-Estevez, A. Aramburu, H. Bengtsson, P. Neuvial, & A. Rubio. A calibration method to improve allele-specific copy number estimates from SNP microarrays (submitted).
Better allele-specific copy numbers in tumors without matched normals by borrowing across many samples
Features:
• Multiple (> 30) samples.
• Any SNP microarray platform.
• Bounded memory usage (< 1GB of RAM)
More: http://www.aroma-project.org/
The noise is due to SNP-specific effects that we can estimate and remove
Example: (CA,CB) for 310 samples per SNP: Systematic effects…
…are SNP specific!
[Figure: (CA,CB) scatter plots across the 310 samples for SNP #1072 and SNP #1053]
Allele B fractions (BAFs): The bias is greater than the noise
Example: (CA,CB) for 310 samples per SNP. TCN: between 2 arrays. BAF: within array.
[Figure: (CA,CB) scatter plot for SNP #1053]
CalMaTe
Multi-sample model (one per SNP): fit an affine transform across samples.
Multi-sample method for each SNP separately:
• Non-negative Matrix Factorization (NMF).
• Robustified against outliers (e.g. tumors).
• Special cases: only one or two genotype groups.
Related ideas: Illumina’s “Cluster Regression”, CRLMM CNs (*RLMM, …), …
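To make the per-SNP idea concrete, here is a deliberately simplified sketch, assuming NumPy. It is not the CalMaTe algorithm: it uses ordinary least squares instead of a robust NMF-based fit, it takes genotype calls as given, and it fits a 2x2 linear map without an offset term, because the diploid states (2,0), (1,1), (0,2) are collinear and cannot identify a full affine map on their own. All names and the distortion matrix are made up for illustration.

```python
import numpy as np

# Idealized (CA, CB) for each diploid genotype.
TRUE_STATE = {"AA": (2.0, 0.0), "AB": (1.0, 1.0), "BB": (0.0, 2.0)}

def fit_linear_calibration(observed, genotypes):
    """Least-squares 2x2 map W such that observed @ W ~ true (CA, CB).
    observed: (n_samples, 2) signals for ONE SNP across samples."""
    observed = np.asarray(observed, dtype=float)
    target = np.array([TRUE_STATE[g] for g in genotypes])
    coef, *_ = np.linalg.lstsq(observed, target, rcond=None)
    return coef

def apply_calibration(coef, observed):
    """Backtransform observed (CA, CB) signals with the fitted map."""
    return np.atleast_2d(np.asarray(observed, dtype=float)) @ coef

# Simulate one SNP: six reference samples seen through a made-up distortion.
genotypes = ["AA", "AB", "BB", "AB", "AA", "BB"]
truth = np.array([TRUE_STATE[g] for g in genotypes])
distortion = np.array([[0.9, 0.2], [0.1, 0.7]])  # hypothetical SNP effect
observed = truth @ distortion

coef = fit_linear_calibration(observed, genotypes)
# A tumor sample with a gain, (CA, CB) = (1, 2), seen through the same effect.
tumor_observed = np.array([1.0, 2.0]) @ distortion
calibrated = apply_calibration(coef, tumor_observed)
print(calibrated)
```

In this noise-free toy case the fitted map is the inverse of the distortion, so the tumor sample is moved back to (1, 2). Real data would need the robust fitting and special-case handling listed on the slide.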
Improved SNR of BAFs (and total CNs) when removing SNP-specific variation
Estimate & backtransform. Repeat for all 1,000,000 SNPs.
[Figure: chromosomal plot, before and after, for one of the 310 samples]
TumorBoost
H. Bengtsson, P. Neuvial, T.P. Speed TumorBoost: Normalization of allele-specific tumor copy numbers from one single tumor-normal pair of genotyping microarrays, BMC Bioinformatics, 2010.
Better allele-specific copy numbers in tumors with matched normals
Requirements:
• Matched tumor-normal pairs.
• A single pair is enough.
• Any SNP microarray platform.
• Bounded memory usage (< 1GB of RAM)
More: http://www.aroma-project.org/
The tumor “should be” close to its normal
When we have only a single tumor-normal pair:
(i) Normal should be at e.g. (1,1) …so let's move it there!
(ii) Adjust the tumor in a “similar” direction.
[Figure: one SNP, a tumor-normal pair, in the (CA,CB) plane before and after TumorBoost]
The tumor “should be” close to the normal: the data strongly agree!
For each genotype: Cor(βT, βN) ≈ 1
[Figure: NORMAL REGION; tracks of C, βT and βN, and βT versus βN with genotype clusters AA, AB, BB]
A shared SNP effect: systematic variation
The SNP effect can be estimated & removed for each SNP independently!
Observed allele B fractions: βN ∈ [0,1], βT ∈ [0,1]
Genotype calls (AA, AB, BB): βN,TRUE ∈ {0, 0.5, 1}
1. Estimate the SNP effect from the normal and its genotype call: δ = βN - βN,TRUE
2. Remove the SNP effect from the tumor: βT,TBN = βT - δ*
3. Repeat for all SNPs.
[Figure: βT versus βN with genotype clusters AA, AB, BB; the observed point (βN, βT) is moved to (βN,TRUE, βT,TBN) by δ along the normal axis and δ* along the tumor axis]
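The three steps can be sketched as follows, assuming NumPy. The slides do not define δ*; the sketch assumes δ* = b·δ with b = 1 for homozygous SNPs and, for heterozygous SNPs, b = βT/βN when βT < βN and b = (1 - βT)/(1 - βN) otherwise, which is how I recall the scaling in the TumorBoost paper. Treat that choice, and the function name `tumorboost`, as assumptions rather than the reference implementation.

```python
import numpy as np

def tumorboost(beta_t, beta_n, geno_n):
    """Remove the SNP effect, estimated in the normal, from the tumor BAFs.
    geno_n holds the genotype-implied normal BAFs, each in {0, 0.5, 1}."""
    beta_t = np.asarray(beta_t, dtype=float)
    beta_n = np.asarray(beta_n, dtype=float)
    geno_n = np.asarray(geno_n, dtype=float)

    delta = beta_n - geno_n      # step 1: SNP effect in the normal
    scale = np.ones_like(delta)  # b = 1 for homozygous SNPs (assumption)
    is_het = geno_n == 0.5
    down = is_het & (beta_t < beta_n)
    up = is_het & ~(beta_t < beta_n)
    # Assumed scaling for hets; requires 0 < beta_N < 1 at those SNPs.
    scale[down] = beta_t[down] / beta_n[down]
    scale[up] = (1.0 - beta_t[up]) / (1.0 - beta_n[up])

    return beta_t - scale * delta  # step 2: beta_T,TBN = beta_T - delta*

# Step 3: the function is vectorized, so it handles all SNPs at once.
beta_n = [0.05, 0.60, 0.93]   # observed normal BAFs
geno_n = [0.0, 0.5, 1.0]      # calls AA, AB, BB
beta_t = [0.08, 0.40, 0.90]   # observed tumor BAFs
print(tumorboost(beta_t, beta_n, geno_n))
```

With this scaling a heterozygous SNP whose tumor BAF equals its normal BAF lands exactly on 1/2, and the corrected values stay within [0,1] for heterozygous SNPs.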
TumorBoost removes the SNP effects from the tumor (only)
[Figure: tumor BAFs before and after TumorBoost across NORMAL (1,1), GAIN (1,2) and CN-LOH (0,2) regions]
Even with a single tumor-normal pair, we can greatly improve the SNR
Estimate & backtransform. Repeat for all 1,000,000 SNPs.
[Figure: (CA,CB) and BAFs before and after normalization]
TumorBoost => more distinct (CA,CB) - key for PSCN segmentation
TumorBoost:
- single-pair
- tumor-normals
- normal is not corrected
CalMaTe:
- multi-sample
[Figure: (CA,CB) for NORMAL (1,1), GAIN (1,2) and CN-LOH (0,2) regions: original, TumorBoost and CalMaTe]
TumorBoost and CalMaTe significantly improve power to detect change points
Assessment: 1 sample, 1 change point
[Figure: change-point detection in DH for original data, TumorBoost (single pair) and CalMaTe (multi-sample)]
Paired PSCBS
Parent-specific copy numbers from a single tumor-normal pair of SNP arrays
1. Tumor-normal pair
2. Genotype normal
3. Normalize tumor using normal
4. CBS segment tumor: (a) TCN, then (b) DH
5. Estimate PSCNs within segments
6. Call segments
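Step 4 can be illustrated with a toy two-step segmentation, assuming NumPy. This is not CBS: it swaps in plain least-squares binary segmentation with fixed gain thresholds, whereas CBS tests circular segments with permutation-based significance. The function names and the `min_gain_*` and `min_size` parameters are hypothetical.

```python
import numpy as np

def segment(y, min_gain, min_size=3, offset=0):
    """Recursive binary segmentation; returns change-point indices.
    A split is kept only if it reduces the sum of squares by > min_gain."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    total = np.sum((y - y.mean()) ** 2) if n else 0.0
    best_k, best_gain = None, min_gain
    for k in range(min_size, n - min_size + 1):
        left, right = y[:k], y[k:]
        rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if total - rss > best_gain:
            best_k, best_gain = k, total - rss
    if best_k is None:
        return []
    return (segment(y[:best_k], min_gain, min_size, offset)
            + [offset + best_k]
            + segment(y[best_k:], min_gain, min_size, offset + best_k))

def two_step_segment(tcn, dh, min_gain_tcn, min_gain_dh):
    """(a) Segment TCN over all loci, then (b) segment DH (hets only,
    NaN elsewhere) within each TCN segment."""
    tcn = np.asarray(tcn, dtype=float)
    dh = np.asarray(dh, dtype=float)
    bounds = [0] + segment(tcn, min_gain_tcn) + [len(tcn)]
    change_points = set(bounds[1:-1])
    for start, end in zip(bounds[:-1], bounds[1:]):
        hets = start + np.flatnonzero(~np.isnan(dh[start:end]))
        for k in segment(dh[hets], min_gain_dh):
            change_points.add(int(hets[k]))  # locus of first het after the split
    return sorted(change_points)

# Noise-free toy genome: NORMAL (1,1), CN-LOH (0,2), GAIN (1,2), 20 loci each.
tcn = np.array([2.0] * 40 + [3.0] * 20)
dh = np.array([0.0] * 20 + [1.0] * 20 + [1.0 / 3.0] * 20)
dh[1::2] = np.nan  # every other locus is homozygous in the normal
print(two_step_segment(tcn, dh, min_gain_tcn=1.0, min_gain_dh=0.5))
```

The copy-neutral LOH boundary at locus 20 is invisible in the total CN and is only found in the second step, which is the reason for segmenting DH within each TCN segment.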
Calling allelic balance and LOH
Calling allelic balance:
• Null: C1 = C2 (equivalent to DH = 0)
• The DH estimate is biased when DH is near 0, so the test needs an offset ΔAB.
• Reject the null if (α:th percentile of bootstrap-estimated DH) - ΔAB > 0.
• How do we choose ΔAB?
Calling LOH:
• Null: C1 > 0 (“not in LOH”)
• The C1 estimate is biased due to background (e.g. normal contamination), so the test needs an offset ΔLOH.
• Reject the null if ((1-α):th percentile of bootstrap-estimated C1) - ΔLOH < 0.
• How do we choose ΔLOH?
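The two decision rules can be sketched with a simple nonparametric bootstrap, assuming NumPy. How the talk's bootstrap is actually set up is not stated, so resampling loci within a segment is an assumption here, as are the function names and the offsets `delta_ab` and `delta_loh`, which the slides leave as open questions and which must be supplied by the caller.

```python
import numpy as np

def bootstrap_means(values, n_boot, rng):
    """Means of n_boot resamples (with replacement) of the values."""
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    return values[idx].mean(axis=1)

def call_segment(tcn, dh_hets, delta_ab, delta_loh,
                 alpha=0.05, n_boot=1000, seed=0):
    """Apply both tests to one segment.
    tcn: total CNs at all loci; dh_hets: DH at heterozygous SNPs only."""
    rng = np.random.default_rng(seed)
    dh_boot = bootstrap_means(dh_hets, n_boot, rng)
    tcn_boot = bootstrap_means(tcn, n_boot, rng)
    c1_boot = 0.5 * (1.0 - dh_boot) * tcn_boot

    # Null "allelic balance": reject if alpha-percentile of DH - offset > 0.
    imbalance = np.quantile(dh_boot, alpha) - delta_ab > 0
    # Null "not in LOH": reject if (1-alpha)-percentile of C1 - offset < 0.
    loh = np.quantile(c1_boot, 1.0 - alpha) - delta_loh < 0
    return {"allelic_imbalance": bool(imbalance), "loh": bool(loh)}

# Two toy segments with TCN = 2; the offsets below are arbitrary examples.
tcn = np.full(100, 2.0)
normal_like = call_segment(tcn, np.linspace(0.00, 0.10, 50), 0.15, 0.30)
loh_like = call_segment(tcn, np.linspace(0.85, 0.95, 50), 0.15, 0.30)
print(normal_like, loh_like)
```

Using the α:th percentile for DH and the (1-α):th percentile for C1 makes both tests conservative: a call is made only when nearly the whole bootstrap distribution clears the offset.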
Results
PSCBS works with any SNP array - similar results on Affymetrix and Illumina
[Figure: PSCBS results on Affymetrix GenomeWideSNP_6 and Illumina HumanHap550]
Other methods exist, e.g. Paired BAF segmentation
Paired BAF (Staaf et al., 2008) is a paired method.
Algorithm:
1. Genotype normal sample
2. Drop homozygous SNPs
3. Segment “mirrored BAF” (like DH)
4. Estimate parent-specific copy numbers
Paired PSCBS performs very well compared to other PSCN methods
Assessment of calls:
- Staaf simulated data set.
- Known regions.
- Different amounts of normal contamination.
- Keep FP rates at 0.0%.
- TP rate of calls.
Methods are available (www.aroma-project.org)
Normalization of ASCNs:
• Single tumor-normal pair: TumorBoost [aroma.light, aroma.cn]
• Multiple samples: CalMaTe [CalMaTe]
PSCN segmentation:
• Single tumor-normal pair: Paired PSCBS [PSCBS]
• No matched normals: <we’re working on it>
Everything is bounded in memory (< 1GB of RAM)
Conclusions
Paired PSCBS w/ TumorBoost:
• High quality tumor PSCNs
• Single tumor-normal pair
• No external references needed
• Any SNP microarray technology
• Algorithm is fast and bounded in memory
Future:
• Non-paired PSCBS
• Calibration of PSCN states (e.g. clonality & ploidy)…
Next: We need to calibrate (C1,C2) before calling! (ongoing work with Pierre Neuvial)
[Figure: (C1,C2) estimates before calibration (states unclear, marked “?”) and after calibration: (1,2), (0,2), (2,2), (2,2); annotated “clonality!”]
Extra slides
The power to detect a change point varies with type of change!
[Figure: change point from NORMAL REGION to GAIN; decrease in heterozygosity (DH) for original and TumorBoost data, and total CN (TCN)]
Illumina is “better” because it does this calibration; Affymetrix does not.
[Figure: Illumina (Human1M-Duo) and Affymetrix (GenomeWideSNP_6)]
Illumina and Affymetrix have similar noise levels after CalMaTe.
PSCNs can be estimated at each SNP if we know which SNPs are heterozygous
1. Genotypes (AA, AB, BB) from BAFs of a matched normal
2a. Total CNs: C = CA + CB (all loci)
2b. Tumor BAFs: β = CB / C (SNPs only)
3. Decrease in heterozygosity: ρ = 2*|β - 1/2| (hets only)