Supplementary appendix This appendix formed part of the original submission and has been peer reviewed. We post it as supplied by the authors. Supplement to: Lalonde E, Ishkanian AS, Sykes J, et al. Tumour genomic and microenvironmental heterogeneity for integrated prediction of 5-year biochemical recurrence of prostate cancer: a retrospective cohort study. Lancet Oncol 2014; published online Nov 13. http://dx.doi.org/10.1016/S1470-2045(14)71021-6.
Page 1 of 130
Supplementary Appendix
Tumour genomic and microenvironmental heterogeneity for integrated prediction of 5-year
biochemical recurrence of prostate cancer: a retrospective cohort study
Lalonde E*, Ishkanian AS*, Sykes J, et al. Lancet Oncology, November 2014.
Interaction between percent genome alteration and hypoxia (p 7)
100-loci DNA signature (p 7)
Prediction of metastasis (p 8)
Comparison of prognostic variables for biochemical recurrence (p 8)
Comparison of genomic prognostic signatures (p 8)
Table S1. Clinical characteristics of Toronto-IGRT cohort compared to MSKCC cohort (p 11)
Table S2. Clinical characteristics of Toronto-IGRT cohort compared to Cambridge cohort (p 12)
Table S3. CNAs (a) and genes involved in CNAs (b) per cohort (p 13)
Table S4. Genetic differences between Subtypes 2 and 3 (p 14)
Table S5. Enrichment of clinical variables across Subtypes in the Toronto-IGRT cohort (p 15)
Table S6. Enrichment of clinical variables across Subtypes in the MSKCC cohort (p 16)
Table S7. Enrichment of clinical variables in Subtypes with patients from the Toronto-IGRT and MSKCC cohorts (p 17)
Table S8. Multivariate Cox proportional hazard model for the genomic Subtypes in the pooled Toronto-IGRT and MSKCC cohorts for low-intermediate risk patients (p 18)
Table S9. Cox proportional hazard model for overall survival in the Toronto-IGRT cohort (p 19)
Table S10. Multivariate Cox proportional hazard model for PGA in each cohort (p 20)
Table S11. C-index analysis of PGA and clinical variables (p 21)
Table S12. RNA hypoxia signatures used in this study (p 22)
Table S13. Prognosis of combined PGA and RNA hypoxia scores in the RadP cohorts (p 23)
Table S14. C-index analysis of PGA and hypoxia in the pooled full RadP cohorts (p 24)
Table S15. Sensitivity analysis of hypoxic measurements, alone and with an interaction with time, in the IGRT cohort (p 25)
Figure S5. The top 30 most recurrent cytoband regions involved in copy number aberrations in each cohort (p 42)
Figure S6. Copy number profile of cohorts (p 43)
Figure S7. Prognostic CNAs in patient biopsies (p 44)
Figure S8. Genomic overview of Toronto-IGRT training cohort (p 45)
Figure S9. Copy number profiles of prostate cancer in the low- to high-risk patients (p 47)
Figure S10. Genomic Subtypes are prognostic (p 49)
Figure S11. PGA comparison between patients with deletions of CHD1 (p 50)
Figure S12. PGA operating point analysis (p 51)
Figure S13. PGA is prognostic for general and early failure in the two independent RadP cohorts (p 52)
Figure S14. Classification of metastatic Toronto-IGRT and MSKCC patients by PGA (p 54)
Figure S15. PGA differs significantly between patients of each genomic Subtype (p 55)
Figure S16. Tumour hypoxia estimates based on the Buffa RNA signature in the pooled RadP patients (p 56)
Figure S17. Tumour hypoxia estimates based on the West RNA signature in the pooled RadP patients (p 57)
Figure S18. Tumour hypoxia estimates based on the Winter RNA signature in the pooled RadP patients (p 58)
Figure S19. Hypoxia signature scores vs. PGA in the pooled RadP cohorts (p 59)
Figure S20. The prognostic effect of PGA and hypoxia in the pooled RadP cohorts (p 60)
Figure S21. Direct intra-tumour hypoxia measurements in the IGRT cohort (p 62)
Figure S22. Genomic profile of patients ranked according to increasing hypoxia (p 63)
Figure S23. Percentage of hypoxic measurements (HP20) as a function of clinical and genetic variables (p 64)
Figure S24. Supervised learning approach to biomarker development (p 66)
Figure S25. Classification abilities of the 100-loci DNA signature ("RF") or clinical variables in the RadP cohorts (p 68)
Figure S26. The 100-loci DNA signature is prognostic in two individual cohorts (p 70)
Figure S27. The Signature Risk Score is associated with clinical variables (p 71)
Figure S28. CNA-signature within Gleason score patient sub-groups (p 72)
Figure S29. CNA-signature within T-category patient sub-groups (p 73)
Figure S30. CNA-signature within PSA patient sub-groups (p 74)
Figure S31. CNA-signature at 18 months within each clinical risk group (p 75)
Figure S32. Classification of metastatic MSKCC patients (p 76)
Figure S35. Low-recurrence genes are important for prognosis (p 79)
Figure S36. Comparison of our CNA-signature to various known prognostic CNA biomarkers (p 80)
Figure S37. Relative importance of the 100 signature loci (p 81)
Figure S38. Functional analysis of CNA-signature (p 82)
Figure S39. Global PGA vs. signature-estimated PGA (p 84)
Figure S40. Example of clinical stratification of patients based on the PGA-Hypoxia index (p 86)
Figure S41. Example of clinical stratification of patients based on the 100-loci DNA signature (p 89)
Supplementary File 1. Full clinical annotation of Toronto-IGRT cohort (p 95)
Supplementary File 4. Annotation of 276 genes in the 100-loci DNA signature (p 101)
Supplementary File 5. Clinical characteristics and prognostic indices of Cambridge cohort (p 117)
Supplementary Methods
Supplementary files are available at labs.oicr.on.ca/boutros-lab/publications.
Toronto image-guided radiotherapy (IGRT) cohort (Training Set)
A cohort of 247 men with histologically confirmed adenocarcinoma of the prostate was studied in a prospective clinical study, as previously described; the study was approved by the University Health Network Research Ethics Board and registered (NCT00160979) in accordance with the criteria outlined by the International Committee of Medical Journal Editors.1 Informed consent was obtained from all patients. Briefly, from 1996 to 2006, flash-frozen, pre-treatment biopsies were obtained from patients who had chosen radical IGRT as primary treatment. The clinical target volume (CTV) encompassed the prostate gland alone. The planning target volume (PTV) was defined by a 10 mm margin around the CTV, except posteriorly, where the margin was 7 mm. All patients were treated with 6-field conformal or intensity-modulated radiotherapy, using fiducial gold seeds for daily set-up and quality assurance to preclude geographical misses.
There was sufficient tumour in the biopsies of 142 of these patients to permit microdissection. Of these 142 patients, 126 had information on long-term biochemical outcome and were treated with IGRT. The final cohort therefore included 126 patients, 55 of whom have experienced biochemical relapse (BCR) (Table S1; appendix p 11). Patients were followed at 6-month intervals after completing treatment, with clinical examination and PSA tests. Additional tests and the management of patients with recurrent disease were at the discretion of the treating physician. The median follow-up of surviving patients is 7.8 years after the end of treatment. The clinical information for each patient is provided in Supplementary File 1.
Measurement of focal tumour hypoxia in IGRT cohort (HP20 index)
To define the hypoxia of each individual prostate cancer, intra-glandular pO2 was measured before radiotherapy in all patients in the IGRT cohort using an ultrasound-guided transrectal needle-piezoelectrode technique, as previously described.2 Briefly, forty to eighty individual oxygen readings were obtained along 2 to 4 linear measurement tracks, 1.5 to 2 cm in length, through regions of the prostate likely to contain tumour (based on real-time Doppler ultrasound, digital rectal examination, and previous diagnostic biopsies). Tumour biopsies were taken directly parallel to this probe; one was fixed in formalin and another was flash-frozen in liquid nitrogen for genomic studies, as previously described.1,3 The flash-frozen biopsies used for aCGH analyses were therefore obtained from the same spatial locale as the pO2 measurements.
A sensitivity analysis was performed to assess the prognostic ability of various hypoxic thresholds. We compared the percentages of pO2 measurements less than 5, 10, and 20 mm Hg (HP5, HP10, and HP20, respectively) in terms of prognostic effect. In this sensitivity analysis (Table S15; appendix p 25), we examined all patients with hypoxic information (n = 247, i.e. all patients from the Milosevic et al. study) and the subset of patients used in this study (n = 126, with two patients missing hypoxic readings). We used Cox proportional hazard regression to model the effect of each hypoxic threshold alone and with an interaction with time, which was previously shown to be significant.2 Based on this analysis, HP20 showed the most promise as a prognostic indicator and was selected as the independent variable for all analyses investigating relationships between genomic instability and hypoxia in the IGRT cohort.
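The HPx indices reduce each patient's pooled pO2 readings to a single percentage. The Python sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, readings from all measurement tracks are pooled, and "less than" is taken as a strict inequality. The readings are illustrative values, not study data.

```python
def hypoxic_percentage(po2_readings, threshold_mmhg):
    """Percentage of pO2 readings (mm Hg) strictly below threshold_mmhg."""
    if not po2_readings:
        raise ValueError("at least one pO2 reading is required")
    below = sum(1 for value in po2_readings if value < threshold_mmhg)
    return 100.0 * below / len(po2_readings)


def hp_indices(po2_readings, thresholds=(5, 10, 20)):
    """Return {'HP5': ..., 'HP10': ..., 'HP20': ...} for one patient's pooled readings."""
    return {f"HP{t}": hypoxic_percentage(po2_readings, t) for t in thresholds}


# Illustrative readings only (not study data): 2 of 8 are < 5, 3 are < 10, 5 are < 20.
readings = [2.0, 4.5, 8.0, 12.0, 18.0, 25.0, 30.0, 45.0]
print(hp_indices(readings))  # {'HP5': 25.0, 'HP10': 37.5, 'HP20': 62.5}
```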
aCGH analysis
Frozen biopsies were embedded in optimal cutting temperature (OCT) compound at −80 °C and cut into 10 µm sections for manual microdissection and preparation of DNA samples, as previously described.1 Briefly, 300 ng of tumour and reference DNA were differentially labeled with Cyanine 3-dCTP and Cyanine 5-dCTP (Perkin Elmer Life Sciences). Reference DNA was obtained from a healthy (diploid) human male. The samples were then applied onto whole-genome tiling arrays containing 26,819 bacterial artificial chromosome (BAC)-derived amplified fragment pools spotted in duplicate on aldehyde-coated glass slides (SMRT v.2, BC Cancer Research Centre Array Facility, Vancouver). The log2 ratio of the Cyanine 3 to Cyanine 5 intensities was calculated for each spot. Data were filtered on both the standard deviation of replicate spots (data points with a standard deviation greater than 0.075 were removed) and the signal-to-noise ratio (data points with a signal-to-noise ratio less than 3 were removed). The raw data and normalized gene matrix have been deposited in NCBI's Gene Expression Omnibus with accession number GSE41120.
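A minimal Python sketch of the log2-ratio step and the two spot-level filters, assuming each clone's duplicate spots are recorded as hypothetical (cy3, cy5, snr) tuples. Using the sample standard deviation, and discarding a clone when fewer than two spots survive the signal-to-noise filter (so no replicate SD can be estimated), are assumptions, not documented details of the original pipeline.

```python
import math
import statistics

SD_MAX = 0.075   # replicate spots with a log2-ratio SD above this are removed
SNR_MIN = 3.0    # spots with a signal-to-noise ratio below this are removed


def filter_clones(replicates_by_clone):
    """replicates_by_clone: dict clone_id -> list of (cy3, cy5, snr) per spot.

    Returns dict clone_id -> mean log2(Cy3/Cy5) for clones that pass both filters.
    """
    kept = {}
    for clone, spots in replicates_by_clone.items():
        # Signal-to-noise filter, applied to each spot individually.
        good = [(cy3, cy5) for cy3, cy5, snr in spots if snr >= SNR_MIN]
        if len(good) < 2:
            continue  # assumption: two spots are needed to estimate a replicate SD
        ratios = [math.log2(cy3 / cy5) for cy3, cy5 in good]
        # Replicate-consistency filter on the sample SD of the log2 ratios.
        if statistics.stdev(ratios) > SD_MAX:
            continue
        kept[clone] = statistics.mean(ratios)
    return kept


spots = {
    "A": [(2000, 1000, 10), (2100, 1000, 12)],  # consistent duplicates: kept
    "B": [(1000, 1000, 10), (1300, 1000, 10)],  # SD of log2 ratios ~0.27: removed
    "C": [(1000, 1000, 2), (1000, 1000, 10)],   # one spot fails SNR: removed
}
print(filter_clones(spots))
```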
The resulting dataset was normalized using a stepwise normalization procedure.4 The genomic positions of clones were mapped to NCBI Genome Build 36.1 (hg18), released in March 2006. Areas of aberrant copy number were identified using a robust Hidden Markov Model, and every processed probe was classified as loss, neutral, or gain.5 The UCSC liftOver tool (version dated 2011-11-27) was used to map the copy number segments to the hg19 human genome build. Fragments overlapping centromeres, telomeres, or other gaps in the hg18 build were trimmed conservatively (regions were shortened rather than elongated). To generate contiguous CNA regions, probe-based CNA calls were collapsed with neighbouring probes on the same chromosome with the same copy number. CNA regions with only one supporting probe were removed. In addition, any CNAs found entirely within centromeres or telomeres, as defined by the UCSC 'gap' table, were removed. CNA regions were intersected with a merged and collapsed version of the RefSeq gene annotation (GRCh37/hg19) to generate gene-based CNA calls. This gene list was further filtered to match the published gene list from the radical prostatectomy (RadP) cohort (n = 17,603; Supplementary File 2). TMPRSS2-ERG fusion status (see the 'erg_acgh' column in Supplementary File 1) was defined by any deletion in the genomic region between the 3' end of ERG (chr21:39751950) and the 5' end of TMPRSS2 (chr21:42879992), as previously described.6
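The collapsing of probe calls into CNA regions, and the TMPRSS2-ERG rule, can be sketched in Python as follows. This is an illustration, not the authors' code: probe positions stand in for clone extents, liftOver and the centromere/telomere filters are omitted, the function names are hypothetical, and "any deletion in the genomic region" is read as any chr21 deletion overlapping the interval between the two stated coordinates.

```python
from itertools import groupby

ERG_3PRIME = 39751950       # chr21 coordinate of the 3' end of ERG (from the text)
TMPRSS2_5PRIME = 42879992   # chr21 coordinate of the 5' end of TMPRSS2 (from the text)


def collapse_probes(probes):
    """probes: (chrom, position, call) tuples sorted by chromosome and position,
    with call in {-1, 0, 1}. Consecutive probes on the same chromosome with the
    same call are merged; neutral runs and single-probe CNAs are dropped.
    Returns (chrom, start, end, call, n_probes) tuples."""
    segments = []
    for (chrom, call), run in groupby(probes, key=lambda p: (p[0], p[2])):
        run = list(run)
        if call != 0 and len(run) >= 2:
            segments.append((chrom, run[0][1], run[-1][1], call, len(run)))
    return segments


def tmprss2_erg_deletion(segments):
    """True if any chr21 deletion overlaps the ERG-TMPRSS2 interval."""
    return any(
        chrom == "chr21" and call == -1
        and start <= TMPRSS2_5PRIME and end >= ERG_3PRIME
        for chrom, start, end, call, _ in segments
    )


probes = [
    ("chr21", 39_000_000, 0),
    ("chr21", 40_000_000, -1),
    ("chr21", 41_000_000, -1),
    ("chr21", 43_000_000, 0),
    ("chr8", 1_000_000, 1),     # single-probe gain: removed
]
segments = collapse_probes(probes)
print(segments, tmprss2_erg_deletion(segments))
```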
MSKCC radical prostatectomy (RadP) cohort (Validation Set)
To validate results derived from the IGRT cohort, data for a second cohort of prostate cancer (CaP) patients treated by radical prostatectomy at the Memorial Sloan Kettering Cancer Center (MSKCC) were downloaded from the cBioPortal for Cancer Genomics.7,8 We selected 154 clinically staged T1-T4N0M0 primary tumours and classified patients as low, intermediate, or high risk according to NCCN guidelines.9
Patients with salvage RadP were excluded. Patient DNA had been hybridized to Agilent's 244k platform, generating ∼244,000 tumour-to-normal DNA intensity ratios. The normal samples used in this study were matched normal DNA when available, or otherwise pooled normal DNA. To obtain gene-based calls for each patient, we downloaded the output of RAE, as described in the original publication.7 CNA calls were collapsed from {-2, -1, 0, 1, 2} to {-1, 0, 1} to match the dynamic range of the IGRT cohort (Supplementary File 2). To calculate PGA (see below), we also downloaded the normalized and segmented data (.seg file) from cBioPortal. The segmented data consisted of regions of similar copy number status, each with a log-ratio. Thresholds of < -0.2 and > 0.2 were used to define deletions and amplifications, respectively, consistent with the cBioPortal methodology. As for the IGRT cohort, the copy number fragments were mapped to the hg19 human reference build using the liftOver tool and filtered as described above, and TMPRSS2-ERG fusion status was defined by any deletion in the genomic region between the 3' end of ERG (chr21:39751950) and the 5' end of TMPRSS2 (chr21:42879992). The median follow-up time for this cohort was 4.6 years, with 37 of 154 patients experiencing BCR (Table S1; appendix p 11). Given 37 events in this cohort and a 0.05 probability of a type I error, we have power of 0.56 and 0.92 to detect hazard ratios of 2.0 and 3.0, respectively.
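The two MSKCC harmonization steps, collapsing RAE gene calls to three levels and thresholding segment log-ratios at ±0.2, are small enough to show directly. The function names are hypothetical; treating values exactly at ±0.2 as neutral follows the strict inequalities in the text.

```python
def collapse_rae_call(call):
    """Map an RAE gene call in {-2, -1, 0, 1, 2} onto the IGRT range {-1, 0, 1}."""
    if call not in (-2, -1, 0, 1, 2):
        raise ValueError(f"unexpected RAE call: {call}")
    return (call > 0) - (call < 0)  # keep only the sign of the call


def call_segment(log_ratio, threshold=0.2):
    """Classify a .seg log-ratio: -1 (deletion), 0 (neutral) or 1 (amplification)."""
    if log_ratio < -threshold:
        return -1
    if log_ratio > threshold:
        return 1
    return 0


print([collapse_rae_call(c) for c in (-2, -1, 0, 1, 2)])        # [-1, -1, 0, 1, 1]
print([call_segment(x) for x in (-0.5, -0.2, 0.0, 0.2, 0.35)])  # [-1, 0, 0, 0, 1]
```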
Cambridge RadP cohort (Validation Set)
To further validate our prognostic indices, we obtained a second RadP cohort consisting of 117 low- to high-risk men treated in the UK (unpublished data; Ross-Adams et al.). Ethical approval for the use of samples and data collection was granted by the local Research Ethics Committee under ProMPT (Prostate Mechanisms for Progression and Treatment) 'Diagnosis, investigation and treatment of prostate disease' (MREC 01/4/061). The Cambridge cohort comprises matched tumour and benign tissues from 117 men with histologically confirmed prostate cancer at radical prostatectomy. Samples were prepared as previously described, and the minimum inclusion threshold for the percentage of tumour in samples was 40%.10 Comprehensive clinical (diagnostic) data were collected, including pre-operative and follow-up PSA, TNM staging, and Gleason score (Table S2; appendix p 12). The average age was 61 years (range 41-73). The median time to biochemical relapse is 2.8 years; we therefore focus on the 18-month biochemical relapse-free rate (bRFR) for this cohort when it is used alone. Given 26 events in this cohort and a 0.05 probability of a type I error, we have power of 0.42 and 0.80 to detect hazard ratios of 2.0 and 3.0, respectively.
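The power figures quoted for both RadP cohorts can be reproduced with Schoenfeld's approximation for a two-sided Cox/log-rank test, assuming a balanced (50:50) split between the compared groups. The appendix does not state which method or allocation was used, so this is a consistency check under that assumption, not a description of the original calculation; `cox_power` is a hypothetical name.

```python
import math
from statistics import NormalDist


def cox_power(events, hazard_ratio, alpha=0.05, allocation=0.5):
    """Approximate power of a two-sided Cox/log-rank test (Schoenfeld approximation).

    events: number of observed events (BCR)
    allocation: fraction of patients in one group (0.5 = balanced split, assumed)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    signal = math.sqrt(events * allocation * (1 - allocation)) * abs(math.log(hazard_ratio))
    return z.cdf(signal - z_alpha)


for cohort, d in (("MSKCC", 37), ("Cambridge", 26)):
    print(cohort, [round(cox_power(d, hr), 2) for hr in (2.0, 3.0)])
```

Running the loop prints MSKCC [0.56, 0.92] and Cambridge [0.42, 0.8], which match the values reported in the text.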
Total genomic DNA and RNA were extracted from each tumour and benign tissue core (Qiagen AllPrep). Copy number variation was assayed with Illumina HumanOmni2.5-8 BeadChip arrays (Aros Applied Biotechnology, Aarhus, Denmark) and pre-processed using OncoSNP.11 OncoSNP ranks copy number calls from 1 (most confident, typically larger) to 5 (least confident, typically smaller); see https://sites.google.com/site/oncosnp/user-guide/interpreting-oncosnp-output for details. We accepted copy number calls of rank 3 or less in order to include both broad and focal CNAs. Expression profiling was performed on Illumina HT12 arrays. Bead-level data were pre-processed to remove spatial artifacts, log2-transformed, and quantile normalized using the beadarray package in Bioconductor prior to analysis.12 The ComBat method, as implemented in the sva Bioconductor package (v3.2.1), was used to address batch effects in the expression data.13 To collapse the expression data to gene level, the probe with the largest inter-quartile range was used to represent each gene. The scores for each of the prognostic indices for the Cambridge cohort are supplied in Supplementary File 5.
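Choosing one probe per gene by largest inter-quartile range can be sketched in Python as below. The input layout (a dict of per-patient values per probe and a probe-to-gene map) and the probe and gene names are hypothetical; ties keep the first probe encountered, and the default ('exclusive') quantile method of `statistics.quantiles` may differ slightly from the quantile definition used in R.

```python
import statistics


def iqr(values):
    """Inter-quartile range using statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def collapse_to_genes(probe_values, probe_to_gene):
    """probe_values: dict probe_id -> expression values across patients.
    probe_to_gene: dict probe_id -> gene symbol.
    Returns dict gene -> the probe with the largest IQR for that gene."""
    best = {}
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unannotated probes are ignored
        spread = iqr(values)
        if gene not in best or spread > best[gene][1]:
            best[gene] = (probe, spread)
    return {gene: probe for gene, (probe, _) in best.items()}


# Hypothetical probes and genes for illustration.
probe_values = {
    "p1": [1, 2, 3, 4, 5, 6, 7, 8],
    "p2": [1, 1, 1, 1, 8, 8, 8, 8],
    "p3": [5, 5, 5, 5, 5, 5, 5, 5],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
print(collapse_to_genes(probe_values, probe_to_gene))  # {'GENE_A': 'p2', 'GENE_B': 'p3'}
```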
RNA hypoxia signatures
To evaluate hypoxia in the MSKCC and Cambridge cohorts, we used three previously published mRNA signatures of hypoxia (Table S12).14-16 The signatures were applied to the 108/154 MSKCC patients and 110/117 Cambridge patients with mRNA data available. To generate hypoxia scores, the abundance of each gene in each patient was compared with the median abundance of that gene within the cohort. Patients with abundance greater than the median received a gene score of 1, and patients with abundance lower than the median received a gene score of -1. The RNA hypoxia score for a patient p is the sum of the gene scores over the genes g in a signature of sig.size genes:
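\[
\text{RNA Hypoxia Score}_p = \sum_{g=1}^{\text{sig.size}} f(\text{gene}_{gp}),
\qquad
f(\text{gene}_{gp}) =
\begin{cases}
1, & \text{if } \text{gene}_{gp} > \operatorname{median}(g) \\
-1, & \text{if } \text{gene}_{gp} < \operatorname{median}(g)
\end{cases}
\]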
The RNA Hypoxia Scores were median-dichotomized to define low- or high-hypoxia tumours. This was repeated for all three hypoxia signatures. These signatures have not previously been evaluated in prostate cancer, and validation in prostate cancer is required to show that they indeed measure tumour hypoxia. Nonetheless, we use these promising signatures as a proxy for tumour hypoxia for the first time in prostate cancer, an approach later validated by our results from the IGRT cohort, in which we have direct intra-glandular hypoxia measurements at the site of biopsy.
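The RNA hypoxia scoring rule, followed by median dichotomization, can be sketched in Python as follows. Assumptions not stated in the text: values exactly at the gene median contribute 0 (the formula leaves this case undefined), signature genes absent from the expression data are skipped, and patients whose score equals the cohort median are labelled low. Gene names and values are hypothetical.

```python
import statistics


def hypoxia_scores(abundance, signature):
    """abundance: dict gene -> values, one per patient (same patient order for all genes).
    signature: gene symbols in one hypoxia signature.
    Returns one RNA hypoxia score per patient."""
    n_patients = len(next(iter(abundance.values())))
    scores = [0] * n_patients
    for gene in signature:
        if gene not in abundance:
            continue  # assumption: signature genes without data are skipped
        values = abundance[gene]
        med = statistics.median(values)
        for p, x in enumerate(values):
            if x > med:
                scores[p] += 1
            elif x < med:
                scores[p] -= 1
            # assumption: values equal to the median add 0
    return scores


def dichotomize_at_median(scores):
    """'high' if a patient's score is above the cohort median, otherwise 'low'."""
    med = statistics.median(scores)
    return ["high" if s > med else "low" for s in scores]


abundance = {"G1": [1, 2, 3, 4], "G2": [4, 3, 2, 1], "G3": [1, 5, 6, 2]}
scores = hypoxia_scores(abundance, ["G1", "G3"])
print(scores, dichotomize_at_median(scores))  # [-2, 0, 2, 0] ['low', 'low', 'high', 'low']
```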
Statistical methods
Clinical risk groups were determined using the NCCN classification system.9 The primary outcome was time to biochemical failure (BCR), defined for IGRT patients according to Roach et al.17 as a PSA rise of at least 2 ng/mL above the post-radiation nadir value, and for RadP patients as two consecutive PSA concentrations > 0.2 ng/mL or the triggering of salvage radiotherapy.18 Five-year biochemical relapse-free rates (RFR) were calculated using the Kaplan-Meier method. Additionally, 18-month relapse-free rates were compared to evaluate risk of prostate cancer-specific mortality.19 Cox proportional hazard models were fit when possible, adjusting for Gleason score and PSA levels. T category was not prognostic within the low-intermediate risk patients in any cohort and was thus not used in the models, except when all risk groups were used, in which case PSA, T category, and Gleason score were all included (Tables S1-S2; appendix pp 11-12). Proportional hazard assumptions were tested with the R function cox.zph. If a variable failed these assumptions, it was either stratified (e.g. PSA) or a log-rank test was used.
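For reference, a minimal standard-library Python version of the Kaplan-Meier estimate behind the 5-year (and 18-month) relapse-free rates. It is an illustrative sketch with a hypothetical function name and toy data, not the R survival implementation used in the study; relapses at a given time are counted before censorings at that time, the usual convention.

```python
def kaplan_meier_at(times, events, t_star):
    """Kaplan-Meier relapse-free probability at time t_star.

    times: follow-up time per patient (years); events: 1 = BCR observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    surv, at_risk, i = 1.0, len(data), 0
    while i < len(data) and data[i][0] <= t_star:
        t, relapses, removed = data[i][0], 0, 0
        # Gather every patient whose follow-up ends at this time.
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            removed += 1
            i += 1
        if relapses:
            surv *= 1 - relapses / at_risk  # product-limit step
        at_risk -= removed  # relapsed and censored patients leave the risk set
    return surv


times = [1, 2, 3, 4, 6, 7]
events = [1, 0, 1, 0, 1, 0]
print(kaplan_meier_at(times, events, 5.0))  # 5/6 * 3/4 = 0.625
print(kaplan_meier_at(times, events, 1.5))  # 18 months: 5/6
```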
Receiver operating characteristic (ROC) and C-index analyses were performed with the survivalROC (v1.0.3) and Hmisc (v3.14-4) packages, respectively. We used the survivalROC package to perform ROC analysis while accounting for data censoring, using Nearest Neighbour Estimation with default parameters at prediction times of 18 months and 5 years.27 In the univariate setting, the biomarkers were used as the predictor variable for ROC and C-index analyses. In the multivariate setting, we used the output of coxph models that include both the biomarker of interest and relevant clinical factors (PSA and Gleason score for low-intermediate risk models; PSA, Gleason score, and T category for full models). All statistical analyses were done in the open-source R software (version 3.0.2) using the survival package (version 2.37-4). A two-sided p-value of 0.05 was used to assess statistical significance, and the Benjamini-Hochberg false-discovery rate (FDR) method or the Bonferroni correction was applied to correct for multiple testing where appropriate.20
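To make the C-index concrete, here is a simplified Harrell's concordance index in plain Python. It is a sketch under stated assumptions, not Hmisc's implementation: the function name and data are hypothetical, tied risk scores count 0.5, and pairs with tied follow-up times are simply skipped, whereas Hmisc applies its own tie handling.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored data (simplified).

    A pair is usable when the patient with the shorter follow-up had an event;
    it is concordant when that patient also has the higher risk score.
    Tied risk scores count 0.5; pairs with tied follow-up times are skipped.
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risk = [0.9, 0.5, 0.7, 0.1]
print(harrell_c_index(times, events, risk))  # 4 of 5 usable pairs concordant: 0.8
```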
Cohort comparison
We use several subsets of the validation cohorts in our analyses. To clinically match the IGRT/training cohort, we focus on patients with low- or intermediate-risk disease ('Low+Int', n=124 for MSKCC and n=86 for Cambridge). To increase power and to verify prognosis in a more diverse cohort, we also consider the full cohort, which additionally includes 30 high-risk MSKCC patients, 26 high-risk Cambridge patients, and 5 Cambridge patients with unknown classification ('Full', n=271). Finally, to evaluate the RNA hypoxia signatures14-16 (above) and to compare our DNA-based signature with prognostic RNA indices (below), we consider the subset of the 271 RadP patients with both mRNA and CNA data (n=218: 108 MSKCC and 110 Cambridge).
The distributions of clinical variables in the IGRT and low+int RadP cohorts were compared with χ2 tests, separately for each validation cohort (Tables S1-S2; appendix pp 11-12). In addition, the number of CNAs and the number of genes involved in regions of CNA were compared between the IGRT cohort and each of the low+int and full MSKCC cohorts. The t-test, Mann-Whitney test, and F-test were used to determine whether the two cohorts differed in terms of mean, median, and standard deviation, respectively, for the number of CNAs or genes in CNAs per patient (Table S3, Figure S4; appendix pp 13, 40-41).
Univariate CNA prognosis
CNA recurrence, defined as the percentage of patients within a cohort harbouring a CNA in a specific gene, was calculated for the Toronto-IGRT and MSKCC cohorts, alone and combined. Each gene with at least 10% recurrence within a cohort, or 5% recurrence across both cohorts, was assessed for prognosis using a Cox proportional hazard model with adjustment for clinical variables. This analysis was repeated for each cohort separately and for both cohorts combined (see Supplementary File 3). If the numbers of gains and of deletions in a gene were both above the minimum recurrence threshold, a multi-level factor was used for the gene (copy-neutral patients as the reference group, compared with gains and deletions separately). If only one type of CNA (gain or deletion) was above the threshold, patients with the other CNA type were grouped with the copy-neutral patients. Finally, if each CNA type was below the recurrence threshold but the two together were above it, patients with gains and deletions were grouped together and compared with copy-neutral patients (i.e. CNAs vs. no CNAs). Multiple testing correction was applied with the false-discovery rate (FDR) method.
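The decision rules for how a gene enters the Cox model can be written as a small function. This Python sketch is an illustration with hypothetical scheme and label names; "at least" is read as ≥, and the Cox fit and FDR correction themselves are not shown.

```python
def encode_gene(calls, threshold):
    """Choose the comparison for one gene from its CNA calls (-1, 0, 1 per patient).

    threshold: minimum recurrence as a fraction (0.10 within a cohort, 0.05 pooled).
    Returns (scheme, labels); (None, None) means the gene is not tested.
    """
    n = len(calls)
    gain_frac = sum(c == 1 for c in calls) / n
    loss_frac = sum(c == -1 for c in calls) / n
    gain_ok, loss_ok = gain_frac >= threshold, loss_frac >= threshold
    if gain_ok and loss_ok:
        # Multi-level factor: gains and losses each compared with copy-neutral.
        labels = ["gain" if c == 1 else "loss" if c == -1 else "neutral" for c in calls]
        return "three-level", labels
    if gain_ok:
        # Losses are too rare, so loss patients join the neutral reference group.
        return "gain-vs-rest", ["gain" if c == 1 else "neutral" for c in calls]
    if loss_ok:
        return "loss-vs-rest", ["loss" if c == -1 else "neutral" for c in calls]
    if gain_frac + loss_frac >= threshold:
        # Neither type alone is recurrent enough, but together they are.
        return "any-CNA", ["CNA" if c != 0 else "neutral" for c in calls]
    return None, None


calls = [1, 1, -1, 0, 0, 0, 0, 0, 0, 0]   # 20% gains, 10% losses
for t in (0.10, 0.15, 0.25, 0.40):
    print(t, encode_gene(calls, t)[0])
```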
Unsupervised hierarchical clustering
To find the optimal number of subtypes, the R package ConsensusClusterPlus21 (v1.8.1) was run with 80% subsampling of patients from the Toronto-IGRT cohort for 1000 iterations, with the maximum number of subtypes set to 15. Ward clustering with Jaccard distance22 was used to cluster the genomic profiles of the patients (Figure S8A; appendix p 45). ConsensusClusterPlus also determines the subtype assignment for each patient. The genomic profile of a subtype is defined as the median copy number (CN) of each gene across the patients assigned to that subtype, rounded to the nearest integer. Patients from the MSKCC cohort were assigned to the subtype with the most similar CN profile (based on the Jaccard distance metric; Figures S9A-B; appendix p 47). The distributions of several variables of interest were compared across the four subtypes. For the categorical variables (Gleason score, T category, discretized hypoxia, ERG, and risk group), a deviance test was conducted with a Poisson regression model to determine whether there was a statistically significant interaction between each variable and the clustering. For the continuous variables (PSA, PGA), a Kruskal-Wallis test was used to compare the distribution of each variable across the four subtypes. These tests were repeated for the Toronto-IGRT and MSKCC cohorts combined and for each cohort separately (Tables S5-S7; appendix pp 15-17).
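The subtype-profile and nearest-subtype steps (not the consensus clustering itself) can be sketched in Python. The text does not say how Jaccard distance is applied to ternary copy-number profiles, so this sketch assumes each profile is treated as the set of altered (gene index, state) pairs; that choice, and all names and data, are hypothetical.

```python
import statistics


def subtype_profile(patient_profiles):
    """Per-gene median copy number across a subtype's patients, rounded to an integer.
    Note: round() maps exact .5 medians to the nearest even integer (here 0)."""
    return [round(statistics.median(col)) for col in zip(*patient_profiles)]


def altered_set(profile):
    """Represent a profile as the set of (gene index, state) pairs that are not neutral."""
    return {(i, c) for i, c in enumerate(profile) if c != 0}


def jaccard_distance(a, b):
    sa, sb = altered_set(a), altered_set(b)
    union = sa | sb
    if not union:
        return 0.0  # two copy-neutral profiles are treated as identical
    return 1 - len(sa & sb) / len(union)


def assign_subtype(patient, profiles):
    """Label of the subtype profile closest to the patient in Jaccard distance."""
    return min(profiles, key=lambda label: jaccard_distance(patient, profiles[label]))


profiles = {
    "S1": subtype_profile([[-1, 0, 0, 1], [-1, 0, 0, 1], [0, 0, 0, 1]]),  # [-1, 0, 0, 1]
    "S2": subtype_profile([[0, 1, 1, 0], [0, 1, 0, 0]]),                  # [0, 1, 0, 0]
}
print(assign_subtype([-1, 0, 1, 1], profiles))  # S1
```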
Percent genome alteration (PGA)
PGA was calculated as follows. Each region of copy number alteration was identified and its length (gain or loss) defined in base pairs. The cumulative number of altered base pairs was calculated by summing all regions of alteration per patient. The total number of altered base pairs was then divided by the number of base pairs covered on the array to give the proportion of each patient's genome that was altered. PGA does not account for which of the two homologous chromosome copies carries a CNA, and thus the denominator is approximately 3 billion base pairs (the haploid genome, vs. 6 billion for the diploid genome), depending on the platform used. PGA was treated both as a continuous variable and, for presentation in Kaplan-Meier analyses, as a binary variable dichotomized at the upper tertile of the Toronto-IGRT cohort. Mann-Whitney or Kruskal-Wallis tests were used to compare the PGA of patients grouped according to clinical variables and genomic subtypes.
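A Python sketch of the PGA arithmetic and the upper-tertile split. Merging overlapping segments before summing (so no base pair is counted twice) is an assumption; segments from a segmentation algorithm normally do not overlap. The function names, the toy covered length and the training values are hypothetical, and `statistics.quantiles` may place the tertile slightly differently from R's default.

```python
import statistics


def merge_intervals(intervals):
    """Merge overlapping or touching [start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def genome_fraction_altered(cna_segments, covered_bp):
    """cna_segments: (chrom, start, end) gains or losses in bp (half-open).
    covered_bp: base pairs covered by the platform (haploid scale, about 3e9).
    Returns the altered fraction; multiply by 100 for a percentage."""
    by_chrom = {}
    for chrom, start, end in cna_segments:
        by_chrom.setdefault(chrom, []).append((start, end))
    altered = sum(end - start
                  for intervals in by_chrom.values()
                  for start, end in merge_intervals(intervals))
    return altered / covered_bp


def high_pga(training_pga, values):
    """Flag values above the upper-tertile cut-off of the training (Toronto-IGRT) PGA."""
    cutoff = statistics.quantiles(training_pga, n=3)[1]
    return [v > cutoff for v in values]


segments = [("chr8", 0, 1_000_000), ("chr8", 500_000, 2_000_000), ("chr10", 0, 1_000_000)]
print(genome_fraction_altered(segments, 100_000_000))   # 3 Mb of 100 Mb altered
print(high_pga([1, 2, 3, 4, 5, 6, 7, 8], [5, 6, 7]))     # [False, False, True]
```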
Interaction between percent genome alteration and hypoxia
A Cox proportional hazard regression model with an interaction term between PGA and hypoxia was used to test for a synergistic effect between the two variables. Both variables were median-dichotomized to define patients with low vs. high values. For hypoxia, we used the three previously published RNA signatures in the RadP cohorts (Table S12; appendix p 22)14-16 and HP20 (a direct measurement of intra-tumour pO2; see above) in the Toronto-IGRT cohort.
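The covariates entering this interaction model can be built as below. This Python sketch only prepares the design columns; the column names and example values are hypothetical, values equal to the median are coded low (an assumption), and the Cox fit itself is not reproduced.

```python
import statistics


def interaction_covariates(pga, hypoxia):
    """Median-dichotomize PGA and hypoxia (1 = above the cohort median) and add
    their product, the PGA x hypoxia interaction term for a Cox model."""
    pga_med, hyp_med = statistics.median(pga), statistics.median(hypoxia)
    rows = []
    for g, h in zip(pga, hypoxia):
        g_high, h_high = int(g > pga_med), int(h > hyp_med)
        rows.append({"pga_high": g_high, "hypoxia_high": h_high,
                     "pga_x_hypoxia": g_high * h_high})
    return rows


pga = [0.01, 0.05, 0.10, 0.20]
hp20 = [10, 40, 5, 60]   # illustrative HP20 percentages
for row in interaction_covariates(pga, hp20):
    print(row)
```

In R's survival package the same model would be written along the lines of `coxph(Surv(time, bcr) ~ pga_high * hypoxia_high)`, where `*` expands to both main effects plus their interaction (variable names hypothetical); the interaction coefficient is the term that tests for synergy.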
100-loci DNA signature
A random forest23 with 1 million trees was trained on the Toronto-IGRT cohort and validated on the RadP cohorts using the R package randomForest (v4.6.7) (Figure S24; appendix pp 66-67). Given the copy number status of each patient (-1, 0, or 1), the random forest predicts the occurrence of biochemical relapse for that patient. To eliminate redundancy, neighbouring genes with identical copy numbers across all patients from both cohorts were collapsed into a single feature. This reduced our feature set approximately 3-fold, resulting in 5,355 collapsed features. Signature sizes of 3, 5, 10, 30, 50, 75, 100, 300, 500, and 1000 features were tested with a leave-one-out cross-validation approach in the IGRT cohort. To select which genes to include in a signature (i.e. to find the genes most informative for predicting BCR), a binomial logistic regression model was fit to each feature, and features were selected by p-value. The optimal signature size (100 features) was used to train on the entire Toronto-IGRT cohort and was validated with both RadP cohorts. Variable importance was assessed using the change in Gini score and the variable importance information generated during random forest training. The gene signature is obtained by mapping the selected collapsed features back to individual genes. The Signature Risk Score is the predicted score from the random forest (i.e. the proportion of trees that voted 'yes', where a 'yes' vote means the tree predicts that the patient will have biochemical relapse).
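The redundancy-collapsing step can be sketched in Python as follows. Refusing to merge across a chromosome boundary is an assumption (the text says only "neighbouring genes"), and the gene names and calls are hypothetical.

```python
def collapse_features(genes, cn):
    """genes: (chrom, gene) tuples in genomic order.
    cn: dict gene -> tuple of copy-number calls (-1, 0, 1), one per patient,
        with patients from both cohorts in the same fixed order for every gene.
    Returns features, each a list of adjacent same-chromosome genes whose calls
    are identical in every patient."""
    features = []
    prev_chrom = None
    for chrom, gene in genes:
        if features and chrom == prev_chrom and cn[gene] == cn[features[-1][-1]]:
            features[-1].append(gene)   # redundant with its neighbour: merge
        else:
            features.append([gene])     # start a new feature
        prev_chrom = chrom
    return features


genes = [("chr8", "A"), ("chr8", "B"), ("chr8", "C"), ("chr10", "D")]
cn = {"A": (1, 0, -1), "B": (1, 0, -1), "C": (0, 0, -1), "D": (0, 0, -1)}
print(collapse_features(genes, cn))  # [['A', 'B'], ['C'], ['D']]
```

Gene D is not merged with C despite identical calls, because under the stated assumption features never span two chromosomes.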
A bootstrap analysis was performed to evaluate how the identified signature compares with an empirical null distribution, as previously described.24,25 A null distribution was created by generating 1 million random sets of 100 features (sampled from the 5,355 collapsed regions) and repeating the random forest training and classification with the IGRT and pooled RadP cohorts, respectively. For each random feature set, the AUC and c-index of the resulting model in the pooled RadP cohorts were obtained.
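The comparison against the null distribution can be sketched in Python. The text does not say how the observed signature was ranked against the null, so the one-sided (1 + count) / (1 + n) empirical p-value below is an assumed convention; the random-forest retraining for each random set is replaced by placeholder AUCs, and the function names are hypothetical.

```python
import random


def random_feature_sets(n_features, set_size, n_sets, seed=0):
    """Draw n_sets random feature sets, each without replacement."""
    rng = random.Random(seed)
    return [rng.sample(range(n_features), set_size) for _ in range(n_sets)]


def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value: the share of random signatures scoring at least
    as well as the observed one, with +1 in numerator and denominator so the
    estimate is never exactly zero."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


sets = random_feature_sets(5355, 100, n_sets=3)   # the study used 1 million sets
# In the study, each set trains a random forest on IGRT and is scored on RadP;
# here the null AUCs are placeholders.
null_aucs = [0.50, 0.60, 0.70, 0.80]
print(empirical_p_value(0.70, null_aucs))  # (1 + 2) / (1 + 4) = 0.6
```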
Prediction of metastasis
We examined the potential of the 100-loci DNA signature to predict metastasis in the MSKCC cohort. The Cambridge cohort was excluded because metastasis information was not available. Since the time to metastasis is unknown for the MSKCC cohort, results regarding prognosis for metastasis are preliminary. For receiver operating characteristic (ROC) analysis, all patients were used, and the area under the ROC curve (AUC) was assessed with the pROC package, which does not take censoring into account.26 Given that the median follow-up time in the MSKCC cohort is 4.6 years, additional patients will eventually experience metastasis, so the model's ability to predict metastasis will be better understood with longer follow-up. To evaluate the accuracy of classification for metastasis, only patients with at least five years of follow-up or a reported metastatic event were considered (n = 74 of a possible 154). A Mann-Whitney U test was used to compare the scores of patients with and without metastasis.
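Because this AUC ignores censoring, it equals the Mann-Whitney U statistic divided by the number of metastatic/non-metastatic pairs. The Python sketch below shows that equivalence and the follow-up-based patient selection; the function names and values are hypothetical, and it is not the pROC implementation.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """Uncensored ROC AUC: probability that a patient with metastasis scores higher
    than one without (ties count 0.5), i.e. U / (n_pos * n_neg)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def evaluable(follow_up_years, metastasis, min_years=5.0):
    """Indices of patients with >= min_years follow-up or a reported metastasis."""
    return [i for i, (t, m) in enumerate(zip(follow_up_years, metastasis))
            if m or t >= min_years]


print(evaluable([6.0, 2.0, 3.0, 5.0], [0, 0, 1, 0]))   # [0, 2, 3]
print(auc_mann_whitney([0.9, 0.6], [0.5, 0.6, 0.2]))   # 5.5 / 6
```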
Comparison of prognostic variables for biochemical relapse
The prognostic ability of the random forest signature, PGA, and clinical variables was compared with a ROC analysis, in particular by comparing the AUC of each variable or combination of variables. The R package survivalROC (v1.0.3) was used to create ROC curves while accounting for censoring, using Nearest Neighbour Estimation with default parameters at a prediction time of 5 years.27 A permutation analysis was used to assign p-values to pairs of AUCs: sample-signature score pairings were randomly scrambled per marker 5000 times to build a distribution of the differences in AUC, and a p-value was obtained from the z-score of the difference in AUCs from the unscrambled sample-signature pairs. In addition, to assess the goodness of fit of models with vs. without PGA, the difference in deviance between models containing only the signature (or only the genomic subtypes) and the same models with PGA added was compared to a χ2 distribution with one degree of freedom.
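The permutation comparison of two AUCs and the one-degree-of-freedom deviance test can be sketched as below. This is a simplification under stated assumptions: it uses the ordinary (uncensored) AUC in place of the censoring-aware survivalROC estimate, and all function names are hypothetical. For each permutation, each marker's scores are shuffled independently across samples, the AUC difference is recorded, and the observed difference is converted to a z-score against that distribution. The χ2 tail with one degree of freedom is erfc(sqrt(x/2)).

```python
import math
import random
import statistics


def auc(scores, labels):
    # Probability that a relapsing patient scores higher than a non-relapsing one
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if a > b else (0.5 if a == b else 0.0) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def permutation_auc_test(scores_a, scores_b, labels, n_perm=5000, seed=1):
    """Return (observed AUC difference, z-score, two-sided p)."""
    rng = random.Random(seed)
    observed = auc(scores_a, labels) - auc(scores_b, labels)
    null_diffs = []
    for _ in range(n_perm):
        # Break the sample-score pairing separately for each marker
        shuffled_a = rng.sample(scores_a, len(scores_a))
        shuffled_b = rng.sample(scores_b, len(scores_b))
        null_diffs.append(auc(shuffled_a, labels) - auc(shuffled_b, labels))
    z = (observed - statistics.mean(null_diffs)) / statistics.stdev(null_diffs)
    return observed, z, math.erfc(abs(z) / math.sqrt(2.0))


def deviance_test_1df(deviance_without_pga, deviance_with_pga):
    """p-value for adding one term (e.g. PGA), referred to chi-square with 1 df."""
    stat = max(deviance_without_pga - deviance_with_pga, 0.0)
    return math.erfc(math.sqrt(stat / 2.0))
```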
Comparison of genomic prognostic signatures
We compared the AUC of our 100-loci DNA signature to 23 previously published RNA-based prognostic signatures for BCR in prostate cancer (Table S21; appendix p 31). To enable a fair comparison between the DNA and RNA signatures, we trained the RNA signatures with random forests and tested their performance on the same subset of the MSKCC cohort. In total, 108 MSKCC patients with localized disease have both mRNA and CNA information. To train the models with the RNA signatures, the GenomeDX prostate cancer database was used, which contains genome-wide mRNA abundance values from microarrays for primary tumour samples from the Mayo Clinic28,29, Cleveland Clinic30, Thomas Jefferson University31, New York University, Moffitt Cancer Center, Erasmus Medical Center32, Institute of Cancer Research33, and MSKCC7. All patients from the GenomeDX database except the MSKCC patients were used to train two models for each signature: one using only low- and intermediate-risk patients, and another using low- to high-risk patients, including some patients with node-positive disease. This results in training sets of 293 patients for the low-intermediate risk models and 1299 patients for the full-cohort models.
The methodology for the low-intermediate risk cohort and the low-high risk cohort is the same, with each model producing a set of prediction scores and AUCs, implemented in R (version 2.15.3). Every patient sample is normalized using SCAN (v1.0.0, customized for the HuEx arrays) at the probe selection region (PSR) level.34 Each gene in the signatures is summarized by taking the median expression of any PSR which falls within an exon of the gene. In the rare event that no PSR overlaps an exon, intronic PSRs are used instead. If no PSR is found within the gene's genomic region, the gene is not included in the remodeled signature.
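A sketch of the gene-level summarization rule, assuming PSRs and exons are given as closed genomic intervals on the gene's chromosome and that "falls within" means overlapping; the interval representation and function names are hypothetical, and the SCAN normalization itself is not reproduced.

```python
import statistics


def overlaps(start_a, end_a, start_b, end_b):
    # Closed intervals [start, end] share at least one base
    return start_a <= end_b and start_b <= end_a


def summarize_gene(psrs, exons, gene_start, gene_end):
    """psrs: list of (start, end, normalized expression) on the gene's chromosome.
    exons: list of (start, end). Returns the gene-level value, or None to drop the gene."""
    exonic = [value for start, end, value in psrs
              if any(overlaps(start, end, ex_start, ex_end) for ex_start, ex_end in exons)]
    if exonic:
        return statistics.median(exonic)
    # Rare fallback: no PSR overlaps an exon, so use PSRs inside the gene (intronic)
    intronic = [value for start, end, value in psrs
                if overlaps(start, end, gene_start, gene_end)]
    if intronic:
        return statistics.median(intronic)
    return None  # gene is left out of the remodeled signature
```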
All samples, excluding MSKCC, are used for training a random forest classifier (randomForest package v4.6-7) to predict biochemical relapse. The classifier's parameters are tuned using a 5 by 5 grid search over the mtry and nodesize parameters, and the best tuning parameters are selected by 10-fold cross-validation. Each tuned model was applied to the MSKCC patients to produce a risk score between 0 and 1 for the patient's likelihood of biochemical progression. In addition to the genomic models, a clinical model was created using pre-treatment PSA, T category, and diagnostic Gleason score; again, a random forest model was used and tuned as described above. The scores of the models were evaluated for their ability to predict biochemical relapse using survivalROC (v1.0.3),27 with AUCs calculated for prediction times of 5 years and 18 months. Confidence intervals were estimated via 500 bootstrap iterations. The AUCs for the 23 RNA signatures were compared to the AUC of our 100-loci DNA signature, using the 108 MSKCC patients with both mRNA and DNA information (Figure 4C-D and Figure S34; appendix p 78).
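The tuning loop described above has roughly the shape sketched below. This is a hypothetical outline rather than the authors' randomForest code: the candidate mtry and nodesize values are placeholders (the appendix does not list them), and `fit_and_score` stands in for training a random forest on the training folds and scoring the held-out fold with any performance measure where higher is better.

```python
import itertools
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search(n_samples, mtry_values, nodesize_values, fit_and_score, k=10, seed=0):
    """Return (best mean CV score, mtry, nodesize) over the full grid."""
    folds = k_fold_indices(n_samples, k, seed)
    best = None
    for mtry, nodesize in itertools.product(mtry_values, nodesize_values):
        fold_scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            fold_scores.append(fit_and_score(train_idx, test_idx, mtry, nodesize))
        mean_score = statistics.mean(fold_scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, mtry, nodesize)
    return best


# Hypothetical 5 x 5 grid; the actual candidate values are not reported.
MTRY_GRID = [2, 4, 6, 8, 10]
NODESIZE_GRID = [1, 3, 5, 7, 9]
```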
Supplementary Files
Supplementary File 1. Full clinical annotation of the Toronto-IGRT cohort. See pages 95-100. We have made the original Excel file available from NCBI’s Gene Expression Omnibus with accession number GSE41120 and at http://labs.oicr.on.ca/boutros-lab/publications.
Supplementary File 2. Gene-based CNA calls for the Toronto-IGRT and MSKCC cohorts after pre-processing. The data for the MSKCC cohort were downloaded from the cBio portal and filtered to match the Toronto-IGRT gene list and to retain only primary tumours from patients with localized disease and no adjuvant treatment (n = 154). Rows represent genes and columns represent patients. Values are ternarized to -1, 0, 1, representing deletion, neutral, and amplification states, respectively. All genomic positions refer to GRCh37 (hg19). Access data here: http://labs.oicr.on.ca/boutros-lab/publications/, and for the Toronto cohort only from NCBI’s Gene Expression Omnibus with accession number GSE41120.
Supplementary File 3. Univariate prognostic impact of each gene per cohort. For each gene with CNAs in 10% or more of the cohort, a Cox proportional hazard model adjusted for clinical variables was fit. For the Toronto-IGRT cohort (first tab), Gleason score and PSA (stratified at 10 ng/mL) were included in the Cox proportional hazard model. For the MSKCC cohort and the combined cohorts (second and third tab, respectively), all patients were used, and thus T category (T3 vs. T1-2) was also included in the model, in addition to Gleason score and PSA. Multiple testing correction was applied with the Benjamini-Hochberg false-discovery rate correction (see q-value column). All genomic positions refer to GRCh37 (hg19). Access data here: http://labs.oicr.on.ca/boutros-lab/publications/.
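The Benjamini-Hochberg adjustment behind the q-value column can be written in a few lines. This sketch should follow the same step-up procedure as R's p.adjust(method = "BH"), but it is illustrative rather than the code used to build the file.

```python
def benjamini_hochberg(p_values):
    """BH-adjusted p-values (q-values), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

For example, raw p-values 0.01, 0.04, 0.03 and 0.20 become q-values 0.04, 0.053, 0.053 and 0.20: the running minimum lets the 0.04 and 0.03 genes share the same adjusted value.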
Supplementary File 4. Complete annotation of the CNA signature involving 276 genes. Annotations include genomic positions (relative to GRCh37/hg19), prognostic information and frequency of CNAs (gains and deletions combined) in each genomic subtype. For the prognostic information, the Toronto-IGRT and MSKCC cohorts were combined, including high-risk MSKCC patients. See the ‘Univariate CNA prognosis’ section in the Supplementary methods. View data on pages 101-116 and access data here: http://labs.oicr.on.ca/boutros-lab/publications/
Supplementary File 5. Annotations and scores for the Cambridge cohort. The clinical annotation (first tab), the PGA per patient (second tab), the RNA hypoxia scores (third tab), and the CNA matrix for the 100-loci DNA signature (fourth tab) are provided. View data on pages 117-130 and access data here: http://labs.oicr.on.ca/boutros-lab/publications/
Supplementary Tables
Table S1. Clinical characteristics of Toronto-IGRT cohort compared to MSKCC cohort.
Clinical characteristics of localized prostate cancer patients treated by image-guided radiation therapy (IGRT; n = 126) or radical prostatectomy (RadP; low- to intermediate-risk n = 124; low- to high-risk n = 154). Shown are the hazard ratios and significance (Wald p-values) for the effect of the clinical prognostic factors (PSA, GS, T-category) on 5-year bRFR following either IGRT (Toronto-IGRT cohort) or RadP (MSKCC cohort) for T1-3N0M0 clinically-staged prostate cancers. Differences between the two cohorts in these clinical variables are also shown with χ2 tests or two-sided Mann-Whitney U-tests. The use of the clinical factors in multivariate models is also stated for models using only the low- to intermediate-risk patients (low-int model) and for the models including all patients (full model). Of note, when considering only the low- and intermediate-risk RadP patients, the IGRT cohort had significantly more patients with T2 and GS7 tumours and higher PSA (along with a longer median follow-up and an increased number of events compared to the RadP cohort).
Characteristic | Toronto-IGRT cohort, N (%) | MSKCC cohort, low-int risk, N (%) | Difference between cohorts, p (test) | MSKCC cohort, low-high risk, N (%) | Comparison used in statistical models
T-category
T1 | 45 (36%) | 68 (55%) | 0.0036 (χ2) | 79 (51%) | low-int model: NA; full model: T3 vs. T1-T2
T2 | 81 (64%) | 56 (45%) | | 66 (43%)
T3 | 0 (0%) | 0 (0%) | | 9 (5.8%)
Median follow-up time | 7.8 years | 4.8 years | p = 0.51 (log-rank) | 4.8 years | NA
Median time to BCR | 6.8 years | 4.4 years | p = 0.14 (log-rank) | 4.5 years | Response variable
NCCN classification
Low-risk | 19 (15%) | 58 (47%) | < 0.0001 (χ2) | 58 (38%)
Intermediate-risk | 107 (85%) | 66 (53%) | | 66 (43%)
High-risk | 0 (0%) | 0 (0%) | | 30 (19%)
Metastasis | 11 | 3 | p = 0.058 (χ2) | 11 | NA
Deaths | 12 | 7 | p = 0.36 (χ2) | 11 | NA
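As a worked check on the table, the T-category p-value (0.0036) is reproduced by a Pearson χ2 test with Yates' continuity correction on the 2 × 2 table of T1/T2 counts (dropping the all-zero T3 row), which is what R's chisq.test applies to 2 × 2 tables by default. The sketch below is our illustration, assuming this is how the p-value was obtained; the same function also reproduces the NCCN p-value (0.62) in Table S2.

```python
import math


def chisq_2x2_yates(a, b, c, d):
    """Pearson chi-square with Yates' correction for [[a, b], [c, d]]; returns (stat, p)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Shrink each |O - E| by 0.5 (never below zero), as chisq.test does
            stat += max(abs(observed[i][j] - expected) - 0.5, 0.0) ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p


# Table S1, T-category: Toronto-IGRT (45 T1, 81 T2) vs. MSKCC low-int (68 T1, 56 T2)
stat, p = chisq_2x2_yates(45, 81, 68, 56)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")  # chi-square = 8.47, p = 0.0036
```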
Table S2. Clinical characteristics of Toronto-IGRT cohort compared to Cambridge cohort.
Clinical characteristics of localized prostate cancer patients treated by image-guided radiation therapy (IGRT; n = 126) or radical prostatectomy (Cambridge; low-int n = 86; full n = 117). Shown are the hazard ratios and significance (Wald p-values) for the effect of the clinical prognostic factors (PSA, GS, T-category) on bRFR following either IGRT (Toronto-IGRT cohort, 5-year bRFR) or RadP (Cambridge cohort, 18-month bRFR) for T1-3N0M0 clinically-staged prostate cancers. Differences between the two cohorts in these clinical variables are also shown with χ2 tests or two-sided Mann-Whitney U-tests. The use of the clinical factors in multivariate models is also stated for models using only the low- to intermediate-risk patients (low-int model) and for the models including all patients (full model).
Characteristic | Toronto-IGRT cohort, N (%) | Cambridge cohort, low-int risk, N (%) | Difference between cohorts, p (test) | Cambridge cohort, low-high risk, N (%) | Comparison used in statistical models
T-category
T1 | 45 (36%) | 56 (65%) | < 0.0001 (χ2) | 67 (57%) | low-int model: NA; full model: T3 vs. T1-T2
T2 | 81 (64%) | 30 (35%) | | 35 (30%)
T3 | 0 (0%) | 0 (0%) | | 15 (13%)
HR (T2 vs. T1) | 0.82 | 2.3
p | 0.60 | 0.21 | 0.021 (log-rank)
Median time to BCR | 6.8 years | 2.8 years | p = 0.32 (log-rank) | 2.8 years | Response variable
NCCN classification
Low-risk | 19 (15%) | 16 (19%) | 0.62 (χ2) | 16 (14%)
Intermediate-risk | 107 (85%) | 70 (81%) | | 70 (60%)
High-risk | 0 (0%) | 0 (0%) | | 26 (22%)
NA | 0 (0%) | 0 (0%) | | 5 (4.3%)
Table S3. CNAs (a) and genes involved in CNAs (b) per cohort.
Events were compared at the CNA level (a) and at the gene-CNA level (b) for the Toronto-IGRT and MSKCC cohorts. Quartiles and Mann-Whitney U p-values are shown for each cohort and event type. Differences in the median number of CNAs per patient are observed between cohorts, but these differences are not present when comparing the frequencies of genes within CNA regions.
Table S4. Genetic differences between Subtypes 2 and 3.
The top 6/8 differential regions (χ2 test, q < 0.05) are shown. In Subtype-2, gain of 8q (including c-MYC as the top hit), gain of 3q, deletion of chromosome 13, and a deletion within 1q are more frequent. In contrast, 16q deletion is more commonly observed in Subtype-3. The χ2 test statistic and q-value are shown for representative genes in each region, and the number of additional genes from the region is noted (+X genes).
Table S5. Enrichment of clinical variables across Subtypes in the Toronto-IGRT cohort.
The rows of significant variables are in bold. The values in the Subtype columns depend on the type of variable: for continuous variables (PSA, PGA, and HP20), the median is shown; for binary variables (HP20 dichotomized and TMPRSS2-ERG), the proportion of patients with the event is shown; and for categorical variables (Gleason score, T category and Risk group), the mode is shown. To determine whether any variable was enriched in a Subtype, a Kruskal-Wallis test was used for the continuous variables and a deviance test for the categorical and binary variables. The number of degrees of freedom (df) in each test is shown.
Table S6. Enrichment of clinical variables across Subtypes in the MSKCC cohort.
The rows of significant variables are in bold. The values in the Subtype columns depend on the type of variable: for continuous variables (PSA and PGA), the median is shown; for binary variables (TMPRSS2-ERG), the proportion of patients with the event is shown; and for categorical variables (Gleason score, T category, and Risk group), the mode is shown. To determine whether any variable was enriched in a Subtype, a Kruskal-Wallis test was used for the continuous variables and a deviance test for the categorical and binary variables. The number of degrees of freedom (df) in each test is shown.
Risk group | Intermediate | Intermediate | High | Intermediate | 13 | 6 | 0.043
TMPRSS2-ERG | 0.33 | 0.39 | 0.39 | 0.27 | 2.4 | 3 | 0.49
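The p-values in these enrichment tables follow from each test statistic and its degrees of freedom through the upper tail of the χ2 distribution. The sketch below (illustrative, stdlib only, hypothetical names) uses the closed-form χ2 tail for integer df, which reproduces the two rows above (13 on 6 df gives 0.043; 2.4 on 3 df gives 0.49), and adds a Kruskal-Wallis statistic with average ranks for ties and the usual tie correction. The deviance test for categorical variables, which needs a fitted regression model, is not shown.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series in sqrt(x)
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))
    term = root * math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def kruskal_wallis(groups):
    """Kruskal-Wallis H (average ranks for ties, tie-corrected), df and p-value."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += average_rank
        ties = j - i + 1
        tie_term += ties ** 3 - ties
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h = (h - 3.0 * (n + 1)) / (1.0 - tie_term / (n ** 3 - n))
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)
```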
Table S7. Enrichment of clinical variables in Subtypes with patients from the Toronto-IGRT and MSKCC cohorts.
The rows of significant variables are in bold. The values in the Subtype columns depend on the type of variable: for continuous variables (PSA and PGA), the median is shown; for binary variables (TMPRSS2-ERG), the proportion of patients with the event is shown; and for categorical variables (Gleason score, T category, and Risk group), the mode is shown. To determine whether any variable was enriched in a Subtype, a Kruskal-Wallis test was used for the continuous variables and a deviance test for the categorical and binary variables. The number of degrees of freedom (df) in each test is shown.
Table S8. Multivariate Cox proportional hazard model for the genomic Subtypes in the pooled Toronto-IGRT and MSKCC cohorts for low-intermediate risk patients.
a: A multivariate Cox proportional hazard model was fit using low- to intermediate-risk patients from both cohorts, with the Subtypes as the main predictor of bRFR, and Gleason score and PSA as clinical covariates. Subtype-4 is used as the reference group for the Subtype variable, as it has the best prognosis. PSA is stratified at 10 ng/mL since it fails the proportional hazards assumption.
b: A second model was fit with the addition of continuous PGA as the only change. A likelihood-ratio test indicated that adding PGA did not significantly improve the model fit (p = 0.054), so the model without PGA was preferred.
Gleason Score 7 vs. 5-6 | 1.4 | 0.72 | 2.8 | 0.31
PGA (continuous, for a 10% increase in PGA) | 1.6 | 1.0 | 2.4 | 0.033
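Because a Cox model is log-linear in a continuous covariate, the hazard ratio per 10% increase in PGA is the tenth power of the per-1% hazard ratio (HR per k units = exp(k·β)). The short sketch below (hypothetical function name) converts between the two scales; applied to the rounded estimate above, HR 1.6 per 10% corresponds to roughly 1.048 per 1%. Confidence limits convert the same way, and values recomputed from rounded table entries are approximate.

```python
import math


def rescale_hazard_ratio(hr, from_units, to_units):
    """Convert an HR for a `from_units` increase into one for a `to_units` increase."""
    beta_per_unit = math.log(hr) / from_units  # Cox log-hazard coefficient per unit
    return math.exp(beta_per_unit * to_units)


per_1_percent = rescale_hazard_ratio(1.6, 10, 1)
ci_per_1_percent = (rescale_hazard_ratio(1.0, 10, 1), rescale_hazard_ratio(2.4, 10, 1))
```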
Table S9. Cox proportional hazard model for overall survival in the Toronto-IGRT cohort.
A multivariate Cox proportional hazard model was fit using only IGRT patients, with the Subtypes as the main predictor of overall survival, and Gleason score and PSA as clinical covariates. Subtype-4 is used as the reference group for the Subtype variable, as it has the best prognosis, it is the largest group, and it has at least one event. Subtype-3 has no deceased patients, so an HR cannot be calculated for this group. PSA is stratified at 10 ng/mL since it fails the proportional hazards assumption.
Variable | HR | Lower 95% CI | Upper 95% CI | p-value
Subtype-1 | 1.2 | 0.14 | 10 | 0.86
Subtype-2 | 4.2 | 1.2 | 15 | 0.030
Subtype-3 | NA | NA | NA | NA
Gleason Score 7 vs. 5-6 | 1.9 | 0.42 | 9.1 | 0.39
Table S10. Multivariate Cox proportional hazard model for PGA in each cohort.
a: Multivariate Cox proportional hazard models were fit to patients from all risk groups in each cohort, modeling continuous PGA (in 1% increments) as a predictor of bRFR and adjusting for PSA, Gleason score, and T category (T category only for the MSKCC and Cambridge cohorts). PSA is stratified at 10 ng/mL since it fails the proportional hazards assumption. Five-year bRFR is used for the Toronto-IGRT and MSKCC cohorts, and 18-month bRFR for the Cambridge cohort.
b: Models above are repeated for dichotomized PGA at the upper tertile of the Toronto-IGRT cohort (7.49%).
Variable | Toronto-IGRT: HR (95% CI), p | MSKCC: HR (95% CI), p | Cambridge: HR (95% CI), p
Percent Genome Altered (≥ 7.49 vs. < 7.49) | 4.5 (2.1-9.8), 1.3 × 10⁻⁴ | 3.4 (1.6-7.2), 0.0011 | 3.2 (1.1-9.0), 0.029
Gleason Score 7 vs. 5-6 | 0.74 (0.30-1.8), 0.49 | 3.1 (1.3-7.4), 0.012 | 6.3 (0.82-49), 0.077
Gleason Score 8-9 vs. 5-6 | NA | 3.9 (1.4-11), 0.011 | 6.1 (0.61-61), 0.12
T3 vs. T1-2 | NA | 4.7 (1.9-11), 7.2 × 10⁻⁴ | 2.3 (0.65-8.2), 0.19
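The dichotomization in part b uses a single cut-point derived in the training cohort (the upper tertile of Toronto-IGRT PGA, 7.49%) and applies that fixed value unchanged to the MSKCC and Cambridge patients. A minimal sketch, assuming R's default (type 7) quantile definition for the 2/3 quantile; the function names are hypothetical.

```python
import math


def quantile_type7(values, prob):
    """Sample quantile using linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * prob
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def dichotomize(values, threshold):
    # 1 = PGA >= threshold (high), 0 = PGA < threshold (low), as in "≥ 7.49 vs. < 7.49"
    return [1 if v >= threshold else 0 for v in values]


def upper_tertile_groups(training_pga, validation_pga):
    cut = quantile_type7(training_pga, 2.0 / 3.0)  # learned on the training cohort only
    return cut, dichotomize(validation_pga, cut)
```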
Table S11. C-index analysis of PGA and clinical variables.
The C-index was calculated for the binary (dichotomized at 7.49%) and continuous versions of percent genome alteration (PGA), with or without clinical variables, in the IGRT cohort (a), the low+int MSKCC cohort (b), the full MSKCC cohort (c), and the full Cambridge cohort (d). Five-year bRFR is used for the Toronto-IGRT and MSKCC cohorts, and 18-month bRFR for the Cambridge cohort.
PGA = Percentage Genome Alteration; NCCN = National Comprehensive Cancer Network; GS = Gleason Score
NCCN group | 0.64 | 0.53 | 0.75 | 0.013
PGA and NCCN | 0.69 | 0.56 | 0.81 | 0.0037

GS, T-category, and PSA | 0.73 | 0.62 | 0.84 | < 0.0001
PGA and GS/T/PSA | 0.72 | 0.61 | 0.84 | < 0.0001
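The C-index values in these tables measure how often, among comparable patient pairs, the patient with the higher predicted risk relapses first. A minimal sketch of Harrell's estimator for right-censored data follows; it is illustrative only and the function name is hypothetical. Pairs with tied follow-up times are skipped and tied risk scores count as one half; these conventions vary between implementations, so results can differ slightly from the R output used for the tables.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed relapse and
    patient j is still followed beyond that time; it is concordant when i
    also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```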
Table S12. RNA hypoxia signatures used in this study.
Descriptions of the three previously published RNA signatures for hypoxia that were tested in our prostate cancer cohorts. The number of genes in each signature is shown, along with the cancer types in which the signatures were found to be prognostic in the original publications.
Signature name | Number of genes | Cancer type | Reference
Buffa | 52 | Head and neck, lung, and breast cancers | Buffa FM, et al. Large meta-analysis of multiple cancers reveals a common, compact and highly prognostic hypoxia metagene. Br J Cancer 2010; 102: 428-35.
West | 26 | Laryngeal cancer | Eustace A, et al. A 26-gene hypoxia signature predicts benefit from hypoxia-modifying therapy in laryngeal cancer but not bladder cancer. Clin Cancer Res 2013; 19: 4879-88.
Winter | 99 | Head and neck and breast cancers | Winter SC, et al. Relation of a hypoxia metagene derived from head and neck cancer to prognosis of multiple cancers. Cancer Res 2007; 67: 3441-9.
Table S13. Prognosis of combined PGA and RNA hypoxia scores in the RadP cohorts
Cox proportional hazard models were fit using PGA and binary RNA hypoxia scores for the Buffa (a), West (b), and Winter (c) RNA hypoxia signatures. Both PGA and RNA hypoxia scores are median-dichotomized: patients with a score above the median ("+") are compared to patients with a score below the median ("-"). The full MSKCC and Cambridge cohorts were used (228 patients with mRNA abundance information).
Table S14. C-index analysis of PGA and hypoxia in the pooled full RadP cohorts.
The c-index of various Cox proportional hazard models was determined using different combinations of PGA and hypoxia for the Buffa (a), West (b), and Winter (c) signatures. When binary variables are used, patients are stratified by the median PGA value (3.84%) and the median RNA Signature Score.
Table S15. Sensitivity analysis of hypoxic measurements, alone and with an interaction with time, in the Toronto-IGRT cohort.
The percentages of intra-glandular pO2 measurements below 5%, 10%, and 20% (i.e. HP5, HP10, and HP20, respectively) are modeled as continuous variables, alone and with an interaction with time. Overall, intra-glandular pO2 measurements are prognostic and have a significant interaction with time. Additionally, HP20 shows the most promise for prognosis in prostate cancer. We considered the full patient cohort reported in a previous study2 (a; n = 247), and the subset of these patients used in this study (b; n = 126, although two patients have no hypoxia measurements).
a
Variable | HR | Lower 95% CI | Upper 95% CI | p-value
HP5 | 1.02 | 0.999 | 1.03 | 0.0646
HP5 x time | 1.00 | 0.999 | 1.00 | 0.0046
HP10 | 1.03 | 1.00 | 1.05 | 0.0117
HP10 x time | 1.00 | 0.999 | 1.00 | 0.008
HP20 | 1.04 | 1.01 | 1.07 | 0.0022
HP20 x time | 0.999 | 1.00 | 1.00 | 0.0002

b
Variable | HR | Lower 95% CI | Upper 95% CI | p-value
HP5 | 1.01 | 0.994 | 1.03 | 0.174
HP5 x time | 1.00 | 0.999 | 1.00 | 0.0242
HP10 | 1.02 | 0.995 | 1.04 | 0.130
HP10 x time | 1.00 | 0.999 | 1.00 | 0.0334
HP20 | 1.03 | 0.998 | 1.06 | 0.0720
HP20 x time | 1.00 | 0.999 | 1.00 | 0.0367
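HP5, HP10 and HP20 are simple per-patient summaries of the intra-glandular pO2 readings: the percentage of readings falling below the corresponding cutoff. A minimal sketch, assuming the readings and the cutoff are on the same scale and that "below" means strictly less than; the function name and readings are hypothetical, and the time-interaction terms of the Cox model are not shown.

```python
def hypoxic_percentage(po2_readings, cutoff):
    """Percentage of pO2 readings strictly below `cutoff` (HP5, HP10, HP20 for 5, 10, 20)."""
    if not po2_readings:
        return None  # no hypoxia measurements for this patient
    below = sum(1 for reading in po2_readings if reading < cutoff)
    return 100.0 * below / len(po2_readings)


readings = [2.0, 4.5, 8.0, 15.0, 19.9, 30.0, 42.0, 60.0]  # illustrative values only
hp5, hp10, hp20 = (hypoxic_percentage(readings, c) for c in (5, 10, 20))
```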
Table S16. The prognostic effect of PGA and hypoxia in the Toronto-IGRT cohort.
A Cox proportional hazard model was fit with median-dichotomized PGA and hypoxia, including an interaction term. The PGA threshold is the median of the Toronto-IGRT cohort (3.84%) and the hypoxia threshold is the median HP20 value (81.3%).
a: There is a significant interaction between PGA and hypoxia on bRFR in the Toronto-IGRT patients.
b: Modeling patients with low or high PGA separately shows that the effect of hypoxia on bRFR differs between these groups of patients.
a
Variable | HR | Lower 95% CI | Upper 95% CI | p-value
PGA ≥ 3.84 vs. PGA < 3.84 | 1.2 | 0.55 | 2.7 | 0.63
HP20 ≥ 81.3 vs. HP20 < 81.3 | 0.71 | 0.31 | 1.6 | 0.42
PGA x HP20 | 3.8 | 1.2 | 12 | 0.019

b
Variable | HR | Lower 95% CI | Upper 95% CI | p-value
Effect of hypoxia in low PGA | 0.9 | 0.35 | 2.3 | 0.82
Effect of hypoxia in high PGA | 2.5 | 1.1 | 5.4 | 0.024
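In the single interaction model of part a, hazard ratios for the four PGA/HP20 combinations follow from multiplying the reported hazard ratios. The worked sketch below does this with the rounded estimates; it implies a hypoxia effect of about 0.71 in low-PGA tumours and about 0.71 × 3.8 ≈ 2.7 in high-PGA tumours. Part b was estimated by modeling each PGA group separately, so its values (0.9 and 2.5) are not expected to equal these products exactly.

```python
import math

# Log-hazard coefficients recovered from the rounded hazard ratios in part a
BETA_PGA = math.log(1.2)           # PGA >= 3.84% vs. < 3.84%, within low HP20
BETA_HP20 = math.log(0.71)         # HP20 >= 81.3% vs. < 81.3%, within low PGA
BETA_INTERACTION = math.log(3.8)   # PGA x HP20 term


def hazard_ratio(high_pga, high_hp20):
    """HR versus the low-PGA / low-HP20 group (arguments are 0 or 1)."""
    linear_predictor = (BETA_PGA * high_pga + BETA_HP20 * high_hp20
                        + BETA_INTERACTION * high_pga * high_hp20)
    return math.exp(linear_predictor)


hypoxia_effect_low_pga = hazard_ratio(0, 1) / hazard_ratio(0, 0)   # about 0.71
hypoxia_effect_high_pga = hazard_ratio(1, 1) / hazard_ratio(1, 0)  # about 2.7
```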
Table S17. C-index analysis of PGA and hypoxia in the Toronto-IGRT cohort.
The c-index of various Cox proportional hazard models was determined using different combinations of PGA and hypoxia. When binary variables are used, patients are stratified by the median PGA value (3.84%) and the median HP20 value (81.3%).
Model | C-index | Lower 95% CI | Upper 95% CI | p
Hypoxia (continuous) | 0.54 | 0.46 | 0.63 | 0.27
Table S18. C-index analysis of 100-loci DNA signature and clinical variables.
The C-index was calculated for the binary and continuous versions of the 100-loci DNA signature, with or without clinical variables, in the low+int MSKCC cohort (a), the full MSKCC cohort (b), and the full Cambridge cohort (c). Five-year bRFR is used for the Toronto-IGRT and MSKCC cohorts, and 18-month bRFR for the Cambridge cohort.
NCCN = National Comprehensive Cancer Network; GS = Gleason Score; * binary; ** continuous
a
Model | C-index | Lower 95% CI | Upper 95% CI | p
Signature predicted group* | 0.61 | 0.50 | 0.73 | 0.055

NCCN group | 0.63 | 0.51 | 0.74 | 0.027
Signature and NCCN | 0.67 | 0.54 | 0.80 | 0.0066

GS, T-category, and PSA | 0.72 | 0.61 | 0.83 | < 0.0001
Signature and GS/T/PSA | 0.73 | 0.62 | 0.85 | < 0.0001
Table S19. Cox proportional hazard model for the CNA signature in the MSKCC cohort.
a: A multivariate Cox proportional hazard model was fit using the low-int MSKCC cohort, with the predicted random forest group as the main predictor of bRFR, and Gleason score as a clinical covariate. PSA is stratified at 10 ng/mL since it fails the proportional hazards assumption.
b: A multivariate Cox proportional hazard model was fit using the full MSKCC cohort, with the predicted random forest group as the main predictor of bRFR, and Gleason score, T category and PSA as clinical covariates. PSA is stratified at 10 ng/mL since it fails the proportional hazards assumption.
c: A second model was fit to the full MSKCC cohort with the addition of continuous PGA as the only change. A likelihood-ratio test indicated that adding PGA did not significantly improve the fit (p = 0.054), supporting the exclusion of PGA from the multivariate Cox model.
CNA Signature Risk Score | 2.8 | 1.4 | 6.0 | 0.0060
Gleason Score 7 vs. 5-6 | 2.6 | 1.1 | 6.4 | 0.033
Gleason Score 8-9 vs. 5-6 | 4.2 | 1.6 | 11 | 0.0045
T3 vs. T1-2 | 3.7 | 1.5 | 9.4 | 0.0047

c
Variable | HR | Lower 95% CI | Upper 95% CI | p-value
CNA Signature Risk Score | 2.5 | 1.0 | 5.9 | 0.042
Gleason Score 7 vs. 5-6 | 2.7 | 1.1 | 6.5 | 0.042
Gleason Score 8-9 vs. 5-6 | 4.0 | 1.5 | 11 | 0.0073
T3 vs. T1-2 | 3.9 | 1.6 | 9.9 | 0.0036
PGA (continuous, for a 10% increase in PGA) | 1.0 | 0.96 | 1.1 | 0.55
Table S20. Cox proportional hazard model for the CNA signature in the Cambridge cohort.
A multivariate Cox proportional hazard model was fit using the full Cambridge cohort, with the predicted random forest group as the main predictor of 18-month bRFR, and Gleason score, T category and PSA as clinical covariates. PSA is stratified at 10 ng/mL.
Variable | HR | Lower 95% CI | Upper 95% CI | p-value
CNA Signature Risk Score | 2.9 | 1.0 | 8.2 | 0.046
Gleason Score 7 vs. 5-6 | 5.5 | 0.72 | 42 | 0.099
Gleason Score 8-9 vs. 5-6 | 7.2 | 0.74 | 70 | 0.895
T3 vs. T1-2 | 3.2 | 0.95 | 11 | 0.061
Table S21. RNA signatures used in prognostic signature comparison.
Signatures are annotated by author, institution, and year. The number of genes in the signature is also indicated.
Author | Year | Institution | Genes | Reference
Singh | 2002 | Harvard | 29 | Singh D, Febbo PG, Ross K, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002; 1(2): 203-9.
Singh | 2002 | Harvard | 5 | Singh D, Febbo PG, Ross K, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002; 1(2): 203-9.
Glinsky | 2004 | Sidney Kimmel Cancer Center | 5 | Glinsky GV, Glinskii AB, Stephenson AJ, et al. Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest 2004; 113(6): 913-23.
Glinsky | 2004 | Sidney Kimmel Cancer Center | 4 | Glinsky GV, Glinskii AB, Stephenson AJ, et al. Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest 2004; 113(6): 913-23.
Glinsky | 2004 | Sidney Kimmel Cancer Center | 5 | Glinsky GV, Glinskii AB, Stephenson AJ, et al. Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest 2004; 113(6): 913-23.
Glinsky | 2005 | MSKCC | 11 | Glinsky GV, Berezovska O, Glinskii AB. Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. J Clin Invest 2005; 115(6): 1503-21.
Lapointe | 2004 | Stanford/Hopkins | 29 | Lapointe J, Li C, Higgins HP, et al. Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci U S A 2004; 101(3): 811-6.
Varambally | 2005 | MSKCC/Univ. Michigan | 23 | Varambally S, Yu J, Laxman B, et al. Integrative genomic and proteomic analysis of prostate cancer reveals signatures of metastatic progression. Cancer Cell 2005; 8(5): 393-406.
Stephenson | 2005 | MSKCC | 10 | Stephenson AJ, Smith A, Kattan MW, et al. Integration of gene expression profiling and clinical variables to predict prostate carcinoma recurrence after radical prostatectomy. Cancer 2005; 104(2): 290-8.
Bismar | 2006 | Dana Farber | 12 | Bismar TA, Demichelis F, Riva A, et al. Defining aggressive prostate cancer using a 12-gene model. Neoplasia 2006; 8(1): 59-68.
Bibikova | 2007 | UC San Diego | 31 | Bibikova M, Chudin E, Arsanjani A, et al. Expression signatures that correlated with Gleason score and relapse in prostate cancer. Genomics 2007; 89(6): 666-72.
Ramaswamy | 2003 | Dana Farber and MIT | 17 | Ramaswamy S, Ross KN, Lander ES, et al. A molecular signature of metastasis in primary solid tumors. Nat Genet 2003; 33(1): 49-54.
Saal | 2007 | Cold Spring Harbour | 246 | Saal LH, Johansson P, Holm K, et al. Poor prognosis in carcinoma is associated with a gene expression signature of aberrant PTEN tumor suppressor pathway activity. Proc Natl Acad Sci U S A 2007; 104(18): 7564-9.
Yu | 2007 | Univ. Michigan | 87 | Yu J, Yu J, Rhodes DR, et al. A polycomb repression signature in metastatic prostate cancer predicts cancer outcome. Cancer Res 2007; 67(22): 10657-63.
Yu | 2007 | Univ. Michigan | 14 | Yu J, Yu J, Rhodes DR, et al. A polycomb repression signature in metastatic prostate cancer predicts cancer outcome. Cancer Res 2007; 67(22): 10657-63.
Cuzick | 2011 | Scott and White/King’s College | 157 | Cuzick J, Swanson GP, Fisher G, et al. Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study. Lancet Oncol 2011; 12(3): 245-55.
Genomic Health | 2011 | Genomic Health | 17 | Knezevic D, Goddard A, Natraj N, et al. Analytical validation of the Oncotype DX prostate cancer assay - a clinical RT-PCR assay optimized for prostate needle biopsies. BMC Genomics 2013; 14: 690.
Long | 2010 | Emory/Sunnybrook, CA | 12 | Long Q, Johnson BA, Osunkoya AO, et al. Protein-coding and microRNA biomarkers of recurrence of prostate cancer following radical prostatectomy. Am J Pathol 2011; 179(1): 46-54.
Talantov | 2010 | Garvan Institute, AUS | 3 | Talantov D, Jatkoe TA, Bohm M, et al. Gene based prediction of clinically localized prostate cancer progression after radical prostatectomy. J Urol 2010; 184(4): 1521-8.
Wu | 2013 | Massachusetts General Hospital and Harvard Medical School | 32 | Wu CL, Schroeder BE, Ma XJ, et al. Development and validation of a 32-gene prognostic index for prostate cancer progression. Proc Natl Acad Sci U S A 2013; 110(15): 6121-6.
Irshad | 2013 | Columbia University Medical Center | 3 | Irshad S, Bansal M, Castillo-Martin M, et al. A molecular signature predictive of indolent prostate cancer. Sci Transl Med 2013; 5(202): 202ra122.
Irshad | 2013 | Columbia University Medical Center | 19 | Irshad S, Bansal M, Castillo-Martin M, et al. A molecular signature predictive of indolent prostate cancer. Sci Transl Med 2013; 5(202): 202ra122.
Agell | 2012 | Hospital del Mar-Mar Health Park Barcelona | 12 | Agell L, Hernandez S, Nonell L, et al. A 12-gene expression signature is associated with aggressive histological in prostate cancer. Am J Pathol 2012; 181(5): 1585-94.
Table S22. The prognostic effect of PGA estimated from the CNA-signature in the full RadP cohorts.
a: A Cox proportional hazard model adjusted for Gleason score, T-category and pre-treatment PSA shows that continuous PGA, as measured from the 276 genes in the CNA signature, is approximately as prognostic as global PGA (see Table S10; appendix p 20). Five-year and 18-month bRFR are used for the MSKCC and Cambridge cohorts, respectively.
b: Adding the 30 genes that maximize the correlation between global PGA and signature-based PGA in the IGRT cohort (see Figure S39; appendix p 85) improves the Cox model in (a), such that the hazard ratio of PGA exactly matches the hazard ratio of global PGA in the same cohort of patients.
a
Variable | Full MSKCC cohort: HR | Lower 95% CI | Upper 95% CI | p-value | Full Cambridge cohort: HR | Lower 95% CI | Upper 95% CI | p-value
PGA estimated from genes in CNA Signature | 1.04 | 1.01 | 1.07 | 0.014 | 1.02 | 0.99 | 1.04 | 0.21
Gleason Score 7 vs. 5-6 | 2.64 | 1.09 | 6.37 | 0.031 | 5.2 | 0.67 | 40 | 0.12
Gleason Score 8-9 vs. 5-6 | 5.21 | 2.02 | 13.5 | 0.00064 | 6.4 | 0.65 | 63 | 0.11
T3 vs. T1-2 | 5.11 | 2.15 | 12.1 | 0.00022 | 2.3 | 0.73 | 7.3 | 0.15

b
Variable | Full MSKCC cohort: HR | Lower 95% CI | Upper 95% CI | p-value | Full Cambridge cohort: HR | Lower 95% CI | Upper 95% CI | p-value
PGA estimated from genes in CNA Signature + 30 | 1.05 | 1.02 | 1.09 | 0.0052 | 1.04 | 1.00 | 1.08 | 0.033
Gleason Score 7 vs. 5-6 | 3.06 | 1.30 | 7.21 | 0.011 | 5.0 | 0.65 | 39 | 0.12
Gleason Score 8-9 vs. 5-6 | 5.54 | 2.12 | 14.5 | 0.00048 | 4.9 | 0.48 | 50 | 0.18
T3 vs. T1-2 | 4.36 | 1.80 | 10.6 | 0.0011 | 1.7 | 0.57 | 5.4 | 0.33
Table S23. Five-year survival rates for patients grouped by various biomarkers.
Five-year biochemical relapse-free rates (bRFR) for various groups of patients based on the developed markers, except for the Cambridge cohort, where 18-month bRFR was used. For the Hypoxia-PGA values in the RadP cohorts, the median values of the Buffa, West, and Winter signatures are indicated.
IGRT (%) | MSKCC Low-Int (%) | MSKCC Full (%) | Cambridge (18mo bRFR %) | IGRT+MSKCC Full (%) | Pooled RadP Low-Int (%) | Pooled RadP Full (%)
Signature+ with PGA - 59 48 64 85 77Signature- with PGA - 90 80 86 59 48
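These bRFR percentages are presumably Kaplan-Meier estimates of relapse-free survival at the landmark time (5 years, or 18 months for Cambridge); the appendix does not say so explicitly, so treat this as an assumption. A minimal stdlib sketch of the Kaplan-Meier estimate at a fixed time, with hypothetical names and toy data:

```python
def kaplan_meier_at(times, events, t):
    """Kaplan-Meier estimate of relapse-free survival S(t).

    times: follow-up in years; events: 1 = biochemical relapse, 0 = censored.
    """
    survival = 1.0
    relapse_times = sorted({ti for ti, e in zip(times, events) if e and ti <= t})
    for u in relapse_times:
        at_risk = sum(1 for ti in times if ti >= u)  # patients censored at u still count
        relapses = sum(1 for ti, e in zip(times, events) if e and ti == u)
        survival *= 1.0 - relapses / at_risk
    return survival


# Toy cohort (illustrative values): 5-year bRFR as a percentage
brfr_5y = 100 * kaplan_meier_at([1, 2, 3, 4, 6, 7], [1, 0, 1, 0, 1, 0], 5)
```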
Supplementary Figures
Figure S1. Workflow of analyses and cohorts used throughout the manuscript. Overview of the
datasets used and the bioinformatic processing of the CNA data
a, The CNA data from the Toronto-IGRT training cohort were derived from pre-treatment biopsies (n =
126) and signatures were validated in the MSKCC cBioPortal database and Cambridge cohorts using
clinically-staged localized RadP specimen (n = 154 and n = 117, respectively), when appropriate. The
Toronto-IGRT cohort has biopsies that were spatially-matched to simultaneous hypoxia measurements for
each patient. In the RadP cohorts, we used three previously published RNA signatures for hypoxia14-16
.
b, We derived four novel treatment-independent signatures of outcome in clinically-staged localized CaP.
In general, prognostic indices were developed in the Toronto-IGRT cohort, and tested in the RadP cohorts
(MSKCC and Cambridge). Our four prognostic indices are: 1) four genomic subtypes discovered by
unsupervised clustering; 2) the percentage of the genome altered (PGA; see Supplementary Methods) as a
proxy for genomic instability; 3) a combined PGA-hypoxia index; and 4) a CNA gene signature developed
with a random forest (a supervised machine learning approach).
c, A guide indicating where the tables and figures for each survival analysis in each cohort are located.
Main figures and tables are listed in red, and supplementary material in blue.
F = Figure; T = Table; S = Supplementary
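For readers reimplementing the PGA index in b, a minimal Python sketch follows. It assumes PGA is the share of assessed base pairs lying in gain or loss segments; the exact definition is given in the Supplementary Methods and may differ. The segment tuple layout and the function name percent_genome_altered are hypothetical.

```python
from typing import List, Tuple

# Hypothetical segment layout: (chromosome, start_bp, end_bp, call), end_bp exclusive,
# with call one of "gain", "loss" or "neutral".
Segment = Tuple[str, int, int, str]


def percent_genome_altered(segments: List[Segment], genome_length_bp: int) -> float:
    """Percentage of the assessed genome covered by gain or loss segments.

    Assumes non-overlapping segments; overlapping calls would be double-counted.
    """
    altered_bp = sum(
        end - start for _chrom, start, end, call in segments if call in ("gain", "loss")
    )
    return 100.0 * altered_bp / genome_length_bp


if __name__ == "__main__":
    demo = [("1", 0, 1_000, "gain"), ("2", 0, 3_000, "neutral"), ("3", 500, 1_500, "loss")]
    print(percent_genome_altered(demo, 10_000))  # 20.0
```

A patient would then be called PGA-high if this value exceeds the chosen cut-off, such as the 7.49% upper tertile used in the main analyses.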
Figure S2. Prognostic impact of clinical variables within each cohort. Log-rank tests were used to
determine the prognostic impact of Gleason score (left), T category (middle) and PSA (right) in each cohort:
a, Toronto-IGRT cohort
b, MSKCC low-int cohort
c, MSKCC full cohort
d, Cambridge low-int cohort
e, Cambridge full cohort
f, MSKCC-Cambridge pooled low-int cohort
g, MSKCC-Cambridge pooled full cohort
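The log-rank comparisons used in this and later figures can be illustrated with a short, dependency-free sketch of the two-group test. It is a textbook implementation, not the software used by the authors; the function name and the 0/1 encoding of groups and events are assumptions. The p-value uses the chi-squared (1 df) survival function, erfc(sqrt(x/2)).

```python
import math
from typing import List, Tuple


def logrank_test(times: List[float], events: List[int], groups: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times: follow-up per patient; events: 1 = biochemical recurrence, 0 = censored;
    groups: 0/1 label per patient (e.g. Gleason 6 vs Gleason 7).
    Returns (chi-squared statistic with 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d1 = sum(1 for tt, e, g in at_risk if tt == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-squared(1) variable: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In practice a tested routine (for example survdiff in R's survival package) would be preferred; the sketch only makes the observed-minus-expected calculation explicit.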
Figure S3. Comparison of biochemical recurrence and overall survival between cohorts.
a, No difference in BCR rates between the Toronto-IGRT, MSKCC low-int, and Cambridge low-int cohorts
(log-rank test).
b, No difference in overall survival rates between the IGRT and MSKCC (all risk groups) cohorts (log-rank
test). Information on overall survival is not available for the Cambridge cohort.
Figure S4. Comparison of number of CNAs and number of genes in CNAs between the training
(Toronto-IGRT) and validation (MSKCC) cohorts. The differences in means are tested by the Student's
t-test, the differences in medians by the Mann-Whitney test, and the differences in standard deviation by the
F-test. For complete statistics, see also Table S3 (appendix p 13). All risk groups for the MSKCC cohort
are used unless stated otherwise.
a, Number of genes in regions of CNAs per patient per cohort.
b, Number of patients with a CNA in each gene, per cohort.
c, Number of genes in regions of CNAs per cohort in Gleason 6 patients.
d, Number of genes in regions of CNAs per cohort in Gleason 7 patients.
e, Number of genes in regions of CNAs per cohort in low-risk patients.
f, Number of genes in regions of CNAs per cohort in intermediate-risk patients.
g, Statistical summary of a-f; comparison of mean, median, and standard deviation (“stdev”) of genes in
CNAs per cohort, across different sub-groups. The dots represent the difference between the Toronto-IGRT
cohort and MSKCC cohort (Toronto - MSKCC). Background colour represents the significance level of
each test (Bonferroni-adjusted p-values).
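The summary in panel g can be sketched as below: the Toronto minus MSKCC difference in mean, median and standard deviation, plus a Bonferroni adjustment of the resulting p-values. The tests themselves (Student's t-test, Mann-Whitney test and F-test) are not reimplemented here. The function names are hypothetical, and the inputs are assumed to be per-patient counts of genes in CNAs.

```python
import statistics
from typing import Dict, List


def summary_differences(toronto: List[float], mskcc: List[float]) -> Dict[str, float]:
    """Differences (Toronto - MSKCC) in mean, median and sample standard deviation."""
    return {
        "mean": statistics.mean(toronto) - statistics.mean(mskcc),
        "median": statistics.median(toronto) - statistics.median(mskcc),
        "stdev": statistics.stdev(toronto) - statistics.stdev(mskcc),
    }


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```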
Figure S5. The top 30 most recurrent cytoband regions involved in copy number aberrations in each
cohort. Cytobands are sorted by recurrence within each cohort. The chromosome of each region is
indicated by the coloured box after the cytoband name.
a, Toronto-IGRT training cohort.
b, MSKCC validation cohort.
Figure S6. Copy number profile of cohorts. The profiles are composed of the fraction of patients from
the cohort with a copy number aberration in each gene on chromosomes 1-22. Top panel labelled ‘All’
shows the two cohorts combined. Patients from all risk groups are used. The purple vertical lines indicate
Figure S7. Prognostic CNAs in patient biopsies. Biopsy-derived CNAs (loss, neutral or gain) in putative
prognostic genes for 126 low- and intermediate-risk CaP patients treated with IGRT (each row is an
individual patient). There is no relationship between any single gene and PGA (see black bars on right).
No single gene is altered in the majority of patients.
Figure S8. Genomic overview of the Toronto-IGRT training cohort.
a, CNA profile of the Toronto-IGRT cohort showing prognostic clinical covariates. PGA is dichotomized at
7.49%, the upper tertile of PGA in the Toronto-IGRT cohort (see text). Each column represents a gene,
sorted by chromosomal position for chromosomes 1-22. The rows (patients) were clustered by consensus
clustering using Ward's clustering with the Jaccard distance metric (see Supplementary Methods).
b, Most recurrent genes involved in copy number aberrations per genomic Subtype, based on the
Toronto-IGRT cohort only. The x-axis represents the fraction of patients in each Subtype with a CNA in the
denoted gene. The chromosome of each gene is indicated by the coloured box after the gene name.
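The distance and consensus steps in a can be sketched as follows, assuming each patient's CNA profile is reduced to a binary altered/not-altered vector (how gains and losses are actually encoded is described in the Supplementary Methods). The consensus matrix follows the resampling definition of Monti et al. (ref 21): for each pair of patients, the fraction of resamples containing both in which they fell in the same cluster. The Ward clustering that produces the per-resample labels is omitted (a library routine such as scipy.cluster.hierarchy.linkage with method="ward" could supply it); all names here are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance between two binary CNA profiles (1 = gene altered)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def consensus_matrix(n_patients: int, runs: List[Dict[int, Hashable]]) -> List[List[float]]:
    """Consensus matrix from resampling runs.

    Each run maps the patients drawn in that resample to their cluster label.
    Entry (i, j) = (# runs where i and j share a cluster) / (# runs where both were drawn).
    Pairs never drawn together are left at 0.0.
    """
    together = [[0] * n_patients for _ in range(n_patients)]
    sampled = [[0] * n_patients for _ in range(n_patients)]
    for labels in runs:
        drawn = sorted(labels)
        for i in drawn:
            sampled[i][i] += 1
            together[i][i] += 1
        for i, j in combinations(drawn, 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n_patients)]
        for i in range(n_patients)
    ]
```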
Figure S9. Copy number profiles of prostate cancer in low- to high-risk patients.
a, MSKCC cohort showing all risk groups and prognostic clinical covariates as in Figure S8a.
b, Full dataset profile for combined Toronto-IGRT and MSKCC cohorts showing all risk groups and
prognostic clinical covariates.
c, Copy number profile of the genomic subtypes in combined Toronto-IGRT and MSKCC cohorts.
Figure S10. Genomic Subtypes are prognostic. Genomic subtypes have significantly different
biochemical relapse rates (log-rank test) when considering a) all patients from Toronto-IGRT and MSKCC,
including high-risk patients; b) low-int patients from Toronto-IGRT and MSKCC at 18 months;
c) Toronto-IGRT patients only; and d) MSKCC patients only.
Figure S11. PGA comparison between patients with and without deletions of CHD1.
PGA differs significantly between patients with or without a deletion in CHD1 in both the Toronto-IGRT
and MSKCC cohorts. P values are from two-sided Mann-Whitney U tests. The grey line indicates the median
PGA value for the Toronto-IGRT cohort (3.84%).
Figure S12. PGA operating point analysis.
a, PGA thresholds from 0% to 20% in 0.1% increments were tested for prognosis in all three cohorts. The
vertical lines indicate the median (3.84%) and upper tertile (7.49%) PGA values for the Toronto-IGRT
cohort. The horizontal line is HR = 1. Multivariate Cox proportional hazards models adjusting for Gleason
score and pre-treatment PSA were fitted for each PGA threshold in each cohort separately. The low-int
MSKCC and Cambridge cohorts were used.
b, ROC analysis for low-int patients using each cohort alone, and the three cohorts pooled together.
c, ROC analysis for all patients using each cohort alone, and all three pooled together.
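The threshold scan behind the ROC analyses in b and c can be sketched with a simple calculation. This sketch treats 5-year biochemical recurrence as a binary label (patients censored earlier are assumed to have been excluded) and does not reproduce the Cox models in a. Calling PGA strictly greater than the threshold "high" is an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple


def sens_spec(pga: Sequence[float], relapsed: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when PGA > threshold is called high risk."""
    tp = sum(1 for x, y in zip(pga, relapsed) if x > threshold and y == 1)
    fn = sum(1 for x, y in zip(pga, relapsed) if x <= threshold and y == 1)
    tn = sum(1 for x, y in zip(pga, relapsed) if x <= threshold and y == 0)
    fp = sum(1 for x, y in zip(pga, relapsed) if x > threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_scan(pga, relapsed, start=0.0, stop=20.0, step=0.1) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, 1 - specificity) for thresholds start..stop inclusive."""
    n_steps = int(round((stop - start) / step))
    points = []
    for k in range(n_steps + 1):
        t = round(start + k * step, 10)  # avoid accumulating floating-point drift
        sens, spec = sens_spec(pga, relapsed, t)
        points.append((t, sens, 1.0 - spec))
    return points


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability a relapsing patient scores above a non-relapsing one (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```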
Figure S13. PGA is prognostic for general and early failure in the two independent RadP cohorts.
The following patient subgroups are stratified based on the upper tertile PGA value from the Toronto-IGRT
cohort:
a, MSKCC low- to intermediate-risk patients at 5 years.
b, MSKCC low- to intermediate-risk patients at 18 months.
c, MSKCC low- to high-risk patients at 5 years.
d, MSKCC low- to high-risk patients at 18 months.
e, Cambridge low- to high-risk patients at 5 years.
f, Cambridge low- to high-risk patients at 18 months.
g, Low- to intermediate-risk RadP patients from both MSKCC and Cambridge at 18 months. See main
Figure 3B for the 5-year curve.
h, RadP patients from all risk groups from both MSKCC and Cambridge at 5 years. See main Figure 3C for
the 18-month curve.
Figure S14. Classification of metastatic Toronto-IGRT and MSKCC patients by PGA. Toronto-IGRT
patients and MSKCC patients (all risk groups) who were censored before five years without an event were
removed (n = 100 removed; n = 180 remaining). The PGA threshold used to classify metastatic patients was
the upper tertile from the Toronto-IGRT cohort (7.49%, see text), which is the same threshold used for bRFR
predictions. A two-sided Mann-Whitney U test was used to compare the median PGA between patients who
did not (‘No’) or did (‘Yes’) develop metastasis by five years.
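A self-contained sketch of the two-sided Mann-Whitney U test used here and in several other captions follows. It uses the normal approximation with a tie correction and no continuity correction, so its p-values can differ slightly from those of R's wilcox.test or scipy.stats.mannwhitneyu; the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided test; returns (U for sample x, p-value from the tie-corrected normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    if sigma == 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, min(1.0, 2 * (1 - NormalDist().cdf(abs(z))))
```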
Figure S15. PGA differs significantly between patients of each genomic Subtype. The cohorts (first
Toronto-IGRT followed by MSKCC) are ordered within each Subtype. P values are determined by
Kruskal-Wallis tests. The combined statistic refers to pooling the two cohorts together.
Figure S16. Tumour hypoxia estimates based on the Buffa RNA signature in pooled RadP patients.
The gene signature was applied to the 108 MSKCC and 110 Cambridge RadP patients with mRNA and
CNA information (includes all risk groups).
No association was found between the hypoxia signature score and T-category (a), Gleason scores (b), or
pre-treatment PSA groups (c). P-values are based on Mann-Whitney U tests.
d, Patients are dichotomized by the median Hypoxia Signature Score of the 52 genes (see appendix p 6 for
details). Patients with scores above the median are considered positive (Signature +) for the hypoxic
signature.
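Dichotomizing patients by a hypoxia signature can be sketched as below. It assumes the expression values are already normalized and that each patient's score is the median expression across the signature genes; the exact scoring is described on appendix p 6 and may differ. Patients strictly above the cohort median are called Signature +. The same steps would apply to the 26-gene and 99-gene signatures, and the function names are hypothetical.

```python
from statistics import median
from typing import Dict, List


def hypoxia_scores(expression: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-patient score: median normalized expression across the signature genes."""
    return {patient: median(values) for patient, values in expression.items()}


def signature_positive(scores: Dict[str, float]) -> Dict[str, bool]:
    """Signature + if the patient's score is above the cohort median score."""
    cutoff = median(scores.values())
    return {patient: score > cutoff for patient, score in scores.items()}
```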
Figure S17. Tumour hypoxia estimates based on the West RNA signature in pooled RadP patients.
The gene signature was applied to the 108 MSKCC and 110 Cambridge RadP patients with mRNA and
CNA information (includes all risk groups).
No association was found between the hypoxia signature score and T-category (a), Gleason scores (b), or
pre-treatment PSA groups (c). P-values are based on Mann-Whitney U tests.
d, Patients are dichotomized by the median Hypoxia Signature Score of the 26 genes (see appendix p 6 for
details). Patients with scores above the median are considered positive (Signature +) for the hypoxic
signature.
Figure S18. Tumour hypoxia estimates based on the Winter RNA signature in pooled RadP patients.
The gene signature was applied to the 108 MSKCC and 110 Cambridge RadP patients with mRNA and
CNA information (includes all risk groups).
No association was found between the hypoxia signature score and T-category (a), Gleason scores (b), or
pre-treatment PSA groups (c). P-values are based on Mann-Whitney U tests.
d, Patients are dichotomized by the median Hypoxia Signature Score of the 99 genes (see appendix p 6 for
details). Patients with scores above the median are considered positive (Signature +) for the hypoxic
signature.
Figure S19. Hypoxia signature scores vs. PGA in the pooled RadP cohorts. The three hypoxia RNA
gene signatures were applied to the 108 MSKCC and 110 Cambridge patients with mRNA and CNA
information (includes all risk groups). No correlation (R = Pearson correlation; ρ = Spearman correlation)
is found between PGA and
a, the Buffa hypoxia signature score
b, the West hypoxia signature score
c, the Winter hypoxia signature score
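The two coefficients reported in this figure can be computed as follows: Pearson's R on the raw values, and Spearman's ρ as Pearson's R on mid-ranks (tied values share their average rank). This generic sketch omits p-values, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient R."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation rho (Pearson correlation of the mid-ranks)."""
    return pearson(midranks(x), midranks(y))
```

A monotone but non-linear relationship, such as y = x squared, gives ρ = 1 while R stays slightly below 1.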
Figure S20. The prognostic effect of PGA and hypoxia in the pooled RadP cohorts.
a, The Buffa signature applied to the full MSKCC cohort
b, The Buffa signature applied to the full Cambridge cohort
c, The West signature applied to the full MSKCC cohort
d, The West signature applied to the full Cambridge cohort
e, The Winter signature applied to the full MSKCC cohort
f, The Winter signature applied to the full Cambridge cohort
Figure S21. Direct intra-tumour hypoxia measurements in the IGRT cohort. Patients with high
hypoxia (i.e. HP20 greater than the median HP20) have moderately worse prognosis than patients with low
hypoxia.
Figure S22. Genomic profile of patients ranked according to increasing hypoxia. The hypoxia metric,
HP20, is based on the percentage of oxygen measurements less than 20 mm Hg (appendix p 4) determined
from pre-treatment biopsies of the IGRT cohort (n=126). The median HP20 value was 81.3% with a range
of 0% to 100%. There is no correlation between our genomic subtypes and HP20 (p=0.70, Kruskal-Wallis
test).
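HP20, and the median split used to define high hypoxia in Figure S21, can be sketched as follows, assuming each patient's pO2 readings (mm Hg) are available as a list. The data layout and function names are hypothetical.

```python
from statistics import median
from typing import Dict, List


def hp20(po2_mmhg: List[float]) -> float:
    """Percentage of a patient's pO2 measurements below 20 mm Hg."""
    return 100.0 * sum(1 for v in po2_mmhg if v < 20) / len(po2_mmhg)


def high_hypoxia(patients: Dict[str, List[float]]) -> Dict[str, bool]:
    """High hypoxia = HP20 strictly greater than the cohort median HP20."""
    values = {pid: hp20(readings) for pid, readings in patients.items()}
    cutoff = median(values.values())
    return {pid: v > cutoff for pid, v in values.items()}
```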
Figure S23. Percentage of hypoxic measurements (HP20) as a function of clinical and genetic
variables. No association was found between HP20 and
a, T category (T1 or T2); Mann-Whitney U test, p = 0.83
b, Pre-treatment PSA (>10 ng/mL or <10 ng/mL); Mann-Whitney U test, p = 0.29
c, Gleason score (6 or 7); Mann-Whitney U test, p = 0.42
d, Genomic subtypes; Kruskal-Wallis test, p = 0.70
e, Individual genes; a Mann-Whitney U test was applied to each gene with at least one CNA, comparing the
median HP20 in patients with vs. without CNAs. No genes remained significant after multiple testing
correction (FDR).
f, Percent genome alteration (PGA). Both Pearson (R) and Spearman (ρ) correlations between PGA and
HP20 as continuous variables are shown.
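The FDR correction in e can be illustrated with the Benjamini-Hochberg step-up procedure (ref 20). This minimal sketch returns adjusted p-values in input order; a gene would be called significant if its adjusted value falls below the chosen FDR level, which the caption does not state. The function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```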
Figure S24. Supervised learning approach to biomarker development. The IGRT cohort was used to
build the model. Genes were first ordered univariately according to their ability to model patient BCR
status. With a leave-one-out cross-validation (LOOCV) approach, nine different signature sizes (3, 5, 10, 50,
75, 100, 300, 500, and 1000) were trained and tested within the IGRT cohort. The signature size with the
best accuracy after cross-validation was selected. The top genes were then re-evaluated using the entire
IGRT cohort (i.e. no held-out samples) and tested on the RadP cohorts (only the MSKCC cohort is shown).
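The signature-size selection loop can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the random forest, and genes are ranked by the absolute difference in alteration frequency between relapsing and non-relapsing patients; both choices are assumptions, not the authors' method. The caption does not say whether the univariate ranking was repeated inside each fold; this sketch re-ranks within each fold, which keeps the held-out patient out of gene selection. All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def rank_genes(X: List[List[int]], y: List[int]) -> List[int]:
    """Genes ordered by |alteration frequency in relapsing - in non-relapsing patients|, descending."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def score(g: int) -> float:
        return abs(sum(r[g] for r in pos) / len(pos) - sum(r[g] for r in neg) / len(neg))

    return sorted(range(len(X[0])), key=score, reverse=True)


def nearest_centroid_predict(X, y, genes: List[int], sample: Sequence[int]) -> int:
    """Stand-in classifier: predict the class whose centroid on the chosen genes is closest."""
    def centroid(label: int) -> List[float]:
        rows = [row for row, lab in zip(X, y) if lab == label]
        return [sum(r[g] for r in rows) / len(rows) for g in genes]

    def dist(c: List[float]) -> float:
        return sum((sample[g] - cg) ** 2 for g, cg in zip(genes, c))

    return 1 if dist(centroid(1)) < dist(centroid(0)) else 0


def loocv_accuracy_by_size(X, y, sizes: List[int]) -> Dict[int, float]:
    """Leave-one-out accuracy for each signature size."""
    acc = {}
    for k in sizes:
        correct = 0
        for i in range(len(X)):
            X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
            genes = rank_genes(X_train, y_train)[:k]
            correct += nearest_centroid_predict(X_train, y_train, genes, X[i]) == y[i]
        acc[k] = correct / len(X)
    return acc
```

The selected size would then be max(sizes, key=acc.get), which returns the smallest size among ties when sizes are listed in increasing order.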
Figure S25. Classification abilities of the 100-loci DNA signature (“RF”) or clinical variables in the
RadP cohorts. Area under the receiver operating characteristic curve (AUC) was calculated using the survivalROC
package to account for censored events. The p-values for the improvement of the CNA signature over other
classifiers were determined by 5,000 permutations of patient-score pairs for each variable and z-test
comparing the true difference in AUC to this distribution. The 100-loci signature was compared to standard
clinical variables in the following cohorts:
a, Low- to intermediate-risk MSKCC patients only. The bootstrap analysis shows that the CNA signature
has superior discrimination potential compared to NCCN (p=0.00036), pre-treatment PSA (p<0.0001),
biopsy-based Gleason score (p=0.00052), and T category (p=0.00021).
b, Low- to high-risk MSKCC patients only. Based on the bootstrap analysis, the CNA signature has
superior discrimination potential compared to NCCN (p=0.012), pre-treatment PSA (p=0.00038), biopsy-
based Gleason score (p=0.00027), and T category (p<0.0001).
c, Low- to high-risk Cambridge patients only. Based on the bootstrap analysis, there is no evidence that the
CNA signature has superior discrimination potential compared to NCCN (p=0.39), pre-treatment PSA
(p=0.18), biopsy-based Gleason score (p=0.79), or T category (p=0.41).
d, Low- to high-risk pooled RadP patients (MSKCC and Cambridge). Based on the bootstrap analysis, the
CNA signature has superior discrimination potential compared to NCCN (p=0.051), pre-treatment PSA
(p=0.0061), biopsy-based Gleason score (p=0.0066), and T category (p=0.00093).
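A simplified version of this permutation comparison can be sketched as below. It makes two assumptions: AUC is the binary (censoring-free) version rather than the time-dependent AUC from survivalROC, and "permutations of patient-score pairs" is read as randomly swapping each patient's two scores, after rank-scaling both classifiers to [0, 1] so that swapped scores are comparable. The z-statistic compares the observed AUC difference with the permutation distribution. Function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Binary AUC: chance a relapsing patient outscores a non-relapsing one (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def to_unit_ranks(scores: Sequence[float]) -> List[float]:
    """Rank-scale scores to [0, 1]; ties share their average rank. AUC is unchanged."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = ((i + j) / 2) / (len(scores) - 1)
        i = j + 1
    return ranks


def paired_auc_permutation_test(a, b, labels, n_perm: int = 5000, seed: int = 0) -> Tuple[float, float, float]:
    """Returns (observed AUC(a) - AUC(b), z-statistic, two-sided p-value)."""
    a, b = to_unit_ranks(a), to_unit_ranks(b)
    observed = auc(a, labels) - auc(b, labels)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        pa, pb = [], []
        for sa, sb in zip(a, b):
            if rng.random() < 0.5:  # swap this patient's two scores
                sa, sb = sb, sa
            pa.append(sa)
            pb.append(sb)
        null.append(auc(pa, labels) - auc(pb, labels))
    z = (observed - mean(null)) / stdev(null)
    return observed, z, 2 * (1 - NormalDist().cdf(abs(z)))
```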
Figure S26. The 100-loci DNA signature is prognostic in two individual cohorts. The signature is
significantly associated with patient prognosis in a multivariate analysis including standard clinical risk
factors in the MSKCC low-int cohort at 5 years (a), in the MSKCC full cohort at 18 months (b), and in the
Cambridge full cohort at 18 months (c).
Figure S27. The Signature Risk Score is associated with clinical variables. The Signature Risk Scores
from the pooled RadP cohorts vary significantly between patients with different Gleason scores and
different NCCN risk groups, but not between patients with different clinical T-categories or pre-treatment
PSA values. Significance is based on two-sided Kruskal-Wallis tests.
Figure S28. CNA-signature within Gleason score patient sub-groups. The CNA-signature is effective at
stratifying RadP patients from both cohorts by risk of biochemical recurrence within 18 months when
considering only patients with a Gleason score less than 7, equal to 7, or greater than 7.
Figure S29. CNA-signature within T-category patient sub-groups. The CNA-signature is effective at
stratifying RadP patients from both cohorts by risk of biochemical recurrence within 18 months when
considering only patients with a T status of T1 or T3, but not T2.
Figure S30. CNA-signature within PSA patient sub-groups. The CNA-signature is effective at
stratifying RadP patients from both cohorts by risk of biochemical recurrence within 18 months when
considering only patients with a PSA concentration < 10 ng/mL, but not between 10 and 20 ng/mL or
greater than 20 ng/mL.
Figure S31. CNA-signature at 18 months within each clinical risk group.
The CNA-signature is effective at stratifying RadP patients from both cohorts by risk of biochemical
recurrence within 18 months when considering only patients with high-risk disease, but not low- or
intermediate-risk disease.
Figure S32. Classification of metastatic MSKCC patients. Since information on metastasis is not
available for the Cambridge cohort, this cohort is excluded. Additionally, since time to metastasis was not
available for the MSKCC cohort, MSKCC patients with follow-up time less than 5 years and without
metastasis are removed (n = 80 removed; n = 74 remaining). Patients are classified based on their score
from the random forest CNA-signature trained for bRFR (Signature Risk Score). The y-axis shows the
percentage of trees in the random forest (n = 1 million) which ‘voted’ that the patient would experience
biochemical recurrence. The x-axis shows the true status for each patient (metastasis or not), and the red
lines indicate the median score per class. In general, the patients who went on to develop metastasis show
a higher median score based on their pre-treatment tumour DNA. The grey line indicates the threshold (50%)
that was used to classify patients in terms of bRFR in the original random forest model. A two-sided
Mann-Whitney U test was used to determine whether the metastatic patients had a higher Signature Risk
Score than the non-metastatic patients.
Figure S33. Signature plurality analysis. Based on 1 million permutations with signatures of 100
randomly selected regions, our 100-loci DNA signature (indicated with the vertical black line) ranks in the
top 3% of tested signatures for the pooled RadP patients based on C-index (a) and AUC (b). Prognosis was
tested for 5-year biochemical recurrence.
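The plurality analysis amounts to asking how often a random signature of the same size scores at least as well as the real one. In the minimal sketch below, evaluate is a caller-supplied (hypothetical) function returning, for example, the C-index or AUC for a set of regions; a result of 0.03 or less corresponds to ranking in the top 3%. The default of 1000 random signatures is chosen for speed, whereas the analysis above used 1 million. The function name is hypothetical.

```python
import random
from typing import Callable, List, Sequence


def plurality_fraction(
    observed: float,
    evaluate: Callable[[List], float],
    pool: Sequence,
    size: int = 100,
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of random same-size signatures whose score is at least the observed score."""
    rng = random.Random(seed)
    candidates = list(pool)
    hits = 0
    for _ in range(n_random):
        signature = rng.sample(candidates, size)  # regions drawn without replacement
        if evaluate(signature) >= observed:
            hits += 1
    return hits / n_random
```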
Figure S34. Signature comparison to previously published RNA signatures for 18-month biochemical
recurrence-free survival. We compared the 18-month AUC of our signature (‘CNA_RF’) to 23 previously
published RNA signatures (see Supplementary Table S21, appendix p 31). Signatures are annotated by the
first author of the original publication, and where there are multiple signatures per paper, only the
best ranking signature is shown. We used the 108 MSKCC patients with both mRNA and CNA information
(all risk groups) in order to use a common validation cohort across all signatures. The clinical model is
based on clinical Gleason score, clinical tumour stage (T) and pre-treatment PSA. Our signature, trained on
126 low- to intermediate-risk patients, consistently ranks amongst the top signatures compared to:
a, the RNA signatures trained on 293 low- to intermediate-risk patients
b, the RNA signatures trained on 1299 low- to high-risk patients
Figure S35. Low-recurrence genes are important for prognosis. Many genes in the CNA-signature have
low recurrence.
a, CNA recurrence of the top 30 genes in the CNA-signature based on Gini score.
b, Patients with at least n (x-axis) genes involved in CNAs within the signature.
Figure S36. Comparison of the CNA-signature to various known prognostic single CNA biomarkers.
HRs are based on 5-year survival and are adjusted for Gleason score and PSA (stratified at 10 ng/mL), and
are shown by the circles, with the extent of the horizontal lines illustrating the 95% confidence intervals.
Here, the MSKCC cohort was restricted to the low- and intermediate-risk patients only to match the clinical
distribution of the IGRT cohort. Our CNA signature outperforms known univariate prognostic CNA
markers in the MSKCC cohort. The Combined cohort refers to the Toronto-IGRT and low-int MSKCC
cohorts pooled together.
Figure S37. Relative importance of the 100 signature loci. Higher importance scores (based on random
forest Gini scores) represent features (loci) that are better at distinguishing patients with vs. without
biochemical relapse, in the context of the model. Not all chromosomes were informative: only 14 of 22
chromosomes are represented in the signature (p < 0.0001, two-tailed chi-squared test).
Figure S38. Functional analysis of the CNA-signature.
a, Using 276 signature genes.
b, Repeated but excluding duplicate gene family members from a single CNA region. For example, in (a)
LIPF and LIPK are both included and in the same CNA/signature region, but in (b), only one of them is
included. The signature genes were compared to all other genes used in the aCGH analyses (n = 17,603).
Background shading refers to p-value, and the yellow spots represent the fold enrichment of CNA-signature
genes in the pathway compared to chance.
c, The Signature Risk Score varies significantly across the four genomic subtypes in the full MSKCC
cohort (Kruskal-Wallis test). The red lines indicate the median score per subtype.
Figure S39. Global PGA vs. signature-estimated PGA. PGA can be reasonably estimated from only the 276
genes of the CNA-signature (ρ: Spearman correlation, Pρ: p-value for Spearman correlation, B1: slope
of the line, PB1: p-value for slope of the line) in patients from all cohorts combined (a), as well as in the
IGRT (b), RadP (c) and Cambridge (d) cohorts alone. e, Adding 100 additional genes increased the
Spearman correlation. At each step, the gene that maximized the correlation between PGA and
signature-estimated PGA in the IGRT cohort was selected as the next gene to be added. Genes are added
sequentially, and the signature-estimated PGA is cumulative over all previously included genes.
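The greedy procedure in e can be sketched as follows, assuming signature-estimated PGA is the percentage of the currently included genes that carry a CNA in each patient (the caption does not define it, so this is an assumption). Starting from a given gene set, such as the signature genes, the candidate that maximizes the Spearman correlation with global PGA is added at each step. All names are hypothetical.

```python
import math
from typing import List, Sequence


def _midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation (Pearson on mid-ranks); 0.0 if either input is constant."""
    rx, ry = _midranks(x), _midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def signature_pga(altered: List[List[int]], genes: List[int]) -> List[float]:
    """Per-patient percentage of the given genes carrying a CNA (altered[patient][gene] is 0/1)."""
    return [100.0 * sum(row[g] for g in genes) / len(genes) for row in altered]


def greedy_gene_order(altered, global_pga, start, candidates, n_add) -> List[int]:
    """Genes added in order, each maximizing Spearman(global PGA, signature-estimated PGA)."""
    chosen = list(start)
    remaining = [g for g in candidates if g not in chosen]
    added = []
    for _ in range(min(n_add, len(remaining))):
        best = max(remaining, key=lambda g: spearman(global_pga, signature_pga(altered, chosen + [g])))
        chosen.append(best)
        added.append(best)
        remaining.remove(best)
    return added
```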
Figure S40. Example of clinical stratification of patients based on the PGA-Hypoxia index.
a, An overview of how clinical management could change for patients of all risk groups based on assessing
PGA and hypoxia status at the time of diagnosis. For each subgroup, the proposed treatment, the sample
size, and the 5-year bRFR are shown. Of 1000 patients, 360 patients would be placed into intensification
trials, and the remainder could opt for current protocols or be placed into treatment de-escalation trials. See
part b for a proposed clinical trial for low-int patients.
b, A proposed prospective randomized clinical trial of low-intermediate risk patients stratified by the
PGA-Hypoxia index. Patients will first be stratified by their PGA and hypoxia status (“PGA+ Hypoxia+” vs.
all other combinations (“Others”)). The primary outcome is a comparison of PGA+ Hypoxia+ patients treated
with additional therapy (ADT and new agents, see below) with PGA+ Hypoxia+ patients treated with local
therapy alone (“Intensification Trial” arm). We are aiming for an HR of 0.45 for the patients with additional
treatment, as ADT alone has an HR of 0.59 (refs 35, 36).
Two secondary outcomes will also be evaluated. First, we will assess the efficacy of the index by comparing
PGA+ Hypoxia+ patients with all other patients treated with local therapy only. Based on the pooled RadP
cohorts of our study, we expect an HR of 3.4 for the PGA+ Hypoxia+ patients (or 0.29 for the remaining
patients; see Figure 4). Finally, in a separate single-arm de-escalation study, we will determine whether
patients without both high PGA and high hypoxia can be effectively managed with active surveillance.
Generally, at 5 years, 30% of patients with localized disease managed with active surveillance have
progressed (ref 37); here, we aim to decrease this percentage to 15% of patients. An alpha of 0.05 is used
for each of the primary and secondary analyses. Patients will be randomized within each group; “Other”
patients have a 67% chance of receiving active surveillance rather than local therapy, and PGA+ Hypoxia+
patients have an equal probability of receiving local therapy alone or local therapy, ADT and new
therapeutic agents. Examples of new agents that could be tested in this trial include Enzalutamide,
Abiraterone, PARP inhibitors, and Metformin.
Figure S41. Example of clinical stratification of patients based on the 100-loci DNA signature.
a, An overview of how clinical management could change for patients of all risk groups based on the
100-loci DNA signature. For each subgroup, the proposed treatment, the sample size, and the 5-year bRFR
are shown. Of 1000 patients, 144 patients would be placed into intensification trials, and the remainder could
opt for current protocols or be placed into treatment de-escalation trials. See part b for a proposed clinical
trial for low-int patients.
b, A proposed prospective randomized clinical trial of low-intermediate risk patients stratified by the
100-loci DNA signature. Patients will first be stratified by the DNA signature (“Sig-” vs. “Sig+”). The primary
outcome is a comparison of Sig+ patients treated with additional therapy (ADT and new agents, see below)
with Sig+ patients treated with local therapy alone (“Intensification Trial” arm). We are aiming for an HR of
0.45 for the patients with additional treatment, as ADT alone has an HR of 0.59 (refs 35, 36).
Two secondary outcomes will also be evaluated. First, we will assess the efficacy of the signature by
comparing Sig+ to Sig- patients treated with local therapy only. Based on the pooled RadP cohorts of our
study, we expect an HR of 2.72 for the Sig+ patients (or 0.37 for the Sig- patients; see Figure 5). Finally, in
a separate single-arm de-escalation study, we will determine whether Sig- patients can be effectively
managed with active surveillance. Generally, at 5 years, 30% of patients with localized disease managed
with active surveillance have progressed (ref 37); here, we aim to decrease this percentage to 15% of
patients. An alpha of 0.05 is used for each of the primary and secondary analyses. Patients will be
randomized within each group; Sig- patients have a 67% chance of receiving active surveillance rather than
local therapy, and Sig+ patients have an equal probability of receiving local therapy alone or local therapy,
ADT and new therapeutic agents. Examples of new agents that could be tested in this trial include
Enzalutamide, Abiraterone, PARP inhibitors, statins, and Protein Kinase C inhibitors.
References
1. Ishkanian AS, Mallof CA, Ho J, Meng A, Albert M, Syed A, et al. High-resolution array CGH identifies novel regionsof genomic alteration in intermediate-risk prostate cancer. Prostate 2009; 69: 1091-100.
2. Milosevic M, Warde P, Menard C, Chung P, Toi A, Ishkanian AS, et al. Tumor hypoxia predicts biochemical failure fol-lowing radiotherapy for clinically localized prostate cancer. Clin. Cancer Res. 2012; 18: 2108-14.
3. Locke JA, Zafarana G, Ishkanian AS, Milosevic M, Thoms J, Have CL, et al. NKX3.1 haploinsufficiency is prognosticfor prostate cancer relapse following surgery or image-guided radiotherapy. Clin. Cancer Res. 2012; 18: 308-16.
4. Khojasteh M, Lam WL, Ward RK, MacAulay C. A stepwise framework for the normalization of array CGH data. BMCBioinformatics 2005; 6: 274.
5. Shah SP, Xuan X, DeLeeuw RJ, Khojasteh M, Lam WL, Ng R, et al. Integrating copy number polymorphisms into arrayCGH analysis using a robust HMM. Bioinformatics 2006; 22: e431-9.
6. Dal Pra A, Lalonde E, Sykes J, Warde F, Ishkanian AS, Meng A, et al. TMPRSS2-ERG status is not prognostic follow-ing prostate cancer radiotherapy: Implications for fusion status and DSB repair. Clin. Cancer Res. 2013; 19: 5202-9.
7. Taylor BS, Schultz N, Hieronymus H, Gopalan A, Xiao Y, Carver BS, et al. Integrative genomic profiling of human prostatecancer. Cancer Cell 2010; 18: 11-22.
8. Cerami E, Gao J, Dosgrusoz U, Gross BE, Sumer SO, Aksoy BU, et al. The cBio Cancer Genomics Portal: An openplatform for exploring multidimensional cancer genomics data. Cancer Discov. 2012; 2: 401.
9. Mohler JL, Bahnson RR, Boston B, Busby JE, D’Amico AV, Eastham JA, et al. Prostate Cancer, Version 3.2012 FeaturedUpdates to the NCCN Guidelines. J Natl Compr Canc Netw 2012; 10: 1081-7.
10. Warren AY, Whitaker HC, Haynes B, Sangan T, McDuffus LA, Kay JD, et al. Method for sampling tissue for researchwhich preserves pathological data in radical prostatectomy. Prostate 2013; 73: 194-202.
11. Yau C, Mouradov D, Jorissen RN, Colella S, Ghazala M, Steers G, et al. A statistical approach for detecting genomicaberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data. Genome Biol 2010; 11: R92.
12. Dunning MJ, Smith ML, Ritchie ME, Tavare S. et al. beadarray: R classes and methods for Illumina bead-based data.Bioinformatics 2007; 23: 2183-2184.
13. Johnson WE, Li C, Rabinovic A, Tavare S. Adjusting batch effects in microarray expression data using empirical Bayesmethods. Biostatistics 2007; 8: 118-127.
14. Eustace A, Mani N, Span PN, Joely JI, Taylor J, Betts GNJ, et al. A 26-gene hypoxia signature predicts benefit fromhypoxia-modifying therapy in laryngeal cancer but not bladder cancer. Clin. Cancer Res. 2013; 19: 4879-88.
15. Buffa FM, Harris AL, West CM, CJ Miller. Large meta-analysis of multiple cancers reveals a common, compact andhighly prognostic hypoxia metagene. Brit. J. Cancer 2010; 102: 428-35.
16. Winter SC, Buffa FM, Silva P, Crispin Miller, Valentine HR, Turley H, et al. Relation of a hypoxia metagene derivedfrom head and neck cancer to prognosis of multiple cancers. Cancer Res. 2007; 67: 3441-9.
17. Roach M, Hanks G, Thames HJ, Schellhammer P, Shipley WU, Sokol GH, et al. Defining biochemical failure follow-
Page 92 of 130
ing radiotherapy with or without hormonal therapy in men with clinically localized prostate cancer: recommendations of theRTOG-ASTRO Phoenix Consensus Conference. Int J Radiat Oncol Biol Phys 2006; 65: 965-74.
18. Mottet N, Bastian PJ, Bellmunt J, van den Bergh, RCN, Bolla M, van Casteren, NJ, et al.. Guidelines on prostate can-cer. Uroweb 2014. Available at:http://www.uroweb.org/gls/pdf/09%20Prostate%20Cancer_LRLV2.pdf. Accessed June 20 2014.
19. Buyyounouski MK, Pickles T, Kestin LL, Allison R, Williams SG. Validating the interval to biochemical failure for theidentification of potentially lethal prostate cancer. J Clin Oncol 2012; 30: 1857-63.
20. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRoy Stat Soc B 1995; 57: 289-300.
21. Monti S, Tamayo P, Mesirov J, Golub T. Consensus clustering: A resampling-based method for class discovery and vi-sualization of gene expression microarray data. Mach Learn 2003; 52: 91-118.
22. Jaccard P. Etude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Societe Vaudoisedes Sciences Naturelles 1901; 37: 547-79.
23. Breiman L. Random forest. Mach Learn 2001; 45: 5-32.
24. Boutros PC, Lau SK, Pintilie M, Shepherd FA, Der SD, Tsao MS, et al. Prognostic gene signatures for non-small-celllung cancer. Proc Natl Acad Sci U S A 2009; 106: 2824-8.
25. Starmans MHW, Fung G, Steck H, Wouters BG, Lambin P. A simple but highly effective approach to evaluate the prog-nostic performance of gene expression signatures. PLoS One 2011; 6: e28320.
26. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ toanalyze and compare ROC curves. BMC Bioinformatics 2011; 12: 77.
27. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biomet-rics 2004; 56: 337-44.
28. Erho N, Crisan A, Vergara IA, Mitra AP, Ghadessi M, Buerki C, et al. Discovery and validation of a prostate cancergenomic classifier that predicts early metastasis following radical prostatectomy. PLoS One 2013; 8: e66855.
29. Karnes JR, Bergstralh EJ, Davicioni E, Ghadessi M, Buerki C, Mitra AP, et al. Validation of a genomic classifier thatpredicts metastasis following radical prostatectomy in an at risk patient population. J Urol 2013; 190: 2047-53.
30. Magi-Galluzzi C, Li J, Stephenson AJ, et al. Independent validation of a genomic classifier in an at risk population of men conservatively managed after radical prostatectomy. Presented at SUO Annual Meeting, Bethesda, 2013.
31. Den R, Feng FY, Showalter TN, et al. The Decipher prostate cancer classifier predicts biochemical failure in patients following post-operative radiation therapy. Presented at SUO Annual Meeting, Bethesda, 2013.
32. Boormans JL, Korsten H, Ziel-van der Made AJ, van Leenders GJ, de vos CV, Jenster G, et al. Identification of TDRD1 as a direct target gene of ERG in primary prostate cancer. Int J Cancer 2013; 133: 335-45.
33. Jhavar S, Brewer D, Edwards S, Kote-Jarai Z, Attard G, Clark J, et al. Integration of ERG gene mapping and gene-expression profiling identifies distinct categories of human prostate cancer. BJU Int 2009; 103: 1256-69.
34. Piccolo SR, Withers MR, Francis OE, Bild AH, Johnson WE. Multiplatform single-sample estimates of transcriptional activation. Proc Natl Acad Sci U S A 2013; 110: 17778-83.
35. Zumsteg ZS, Spratt DE, Pei X, Yamada Y, Kalikstein A, Kuk D, et al. Short-term androgen-deprivation therapy improves prostate cancer-specific mortality in intermediate-risk prostate cancer patients undergoing dose-escalated external beam radiation therapy. Int J Radiat Oncol Biol Phys 2013; 85: 1012-7.
36. Jones CU, Hunt D, McGowan DG, Amin MB, Chetner MP, Bruner DW, et al. Radiotherapy and short-term androgen deprivation for localized prostate cancer. N Engl J Med 2011; 365: 107-18.
37. Klotz L, Zhang L, Lam A, Nam R, Mamedov A, Loblaw A. Clinical results of long-term follow-up of a large, active surveillance cohort with localized prostate cancer. J Clin Oncol 2010; 28: 126-31.
Not applicable Not applicable Not applicable 58.75 0.9167 2 0
Not applicable Not applicable Not applicable 68.92 0.7619 4 0
Not applicable Not applicable Not applicable 82.56 0.3214 3 1
Not applicable Not applicable Not applicable 75.27 0.7381 3 0
Not applicable Not applicable Not applicable 79.02 0.9524 2 0
Not applicable Not applicable Not applicable 75.2 0.869 3 0
Not applicable Not applicable Not applicable 68.07 0.9286 1 0
Not applicable Not applicable Not applicable 67.34 1 4 1
Not applicable Not applicable Not applicable 74.78 0.2857 2 1
Not applicable No Not applicable 77.35 1 3 0
Not applicable Not applicable Not applicable 70.3 0.9762 3 1
Not applicable Not applicable Not applicable 72.55 0.7143 4 0
Not applicable Not applicable Not applicable 70.95 0.8333 4 0
Not applicable Not applicable Not applicable 75.54 0.2619 4 0
Not applicable Not applicable Not applicable 76.63 0.0682 4 0
Not applicable Not applicable Not applicable 74.7 0.9286 4 1
Not applicable Not applicable Not applicable 70.44 0.9286 4 1
Yes Yes Not applicable 75.29 0.9762 1 0
Yes Yes Yes 73.76 0.9762 3 0
Not applicable Not applicable Not applicable 60.61 0.881 4 1
Not applicable Not applicable Not applicable 75.05 0.5476 4 0
Not applicable Not applicable Not applicable 71.74 0.5946 4 0
Not applicable Not applicable Not applicable 67.85 0.7857 4 0
Not applicable Not applicable Not applicable 76.65 0.9286 4 0
Yes Yes Yes 57.31 0.6923 2 0
Not applicable Not applicable Not applicable 68.94 0.5 2 0
No Indeterminate Not applicable 67.18 1 4 0
No Not applicable Not applicable 57.63 0.8571 4 0
Not applicable Not applicable Not applicable 63.42 0 4 0
Not applicable No Not applicable 64.12 0.9762 4 0
Not applicable Not applicable Not applicable 74.88 0.9048 2 0
Not applicable Not applicable Not applicable 74.3 0.1212 3 1
Not applicable Not applicable Not applicable 60.43 0.7857 4 1
No No Not applicable 73.79 0.9524 4 0
Not applicable Yes Not applicable 69.39 0.8571 2 1
Yes No Not applicable 75.94 0.7778 4 0
Not applicable Not applicable Not applicable 73.93 0.6905 3 1
Not applicable Not applicable Not applicable 69.5 0.7143 1 0
Not applicable Not applicable Not applicable 63.52 0.631 3 0
Not applicable Not applicable Not applicable 70.67 0.7529 3 1
Not applicable Not applicable Not applicable 69.54 0.5333 4 0
Not applicable Not applicable Not applicable 72.82 0.7262 3 0
Not applicable Not applicable Not applicable 69.96 0.6905 4 0
Not applicable Not applicable Not applicable 71.3 0.9167 4 0
Not applicable Not applicable Not applicable 66.96 0.369 1 0
Not applicable Not applicable Not applicable 69.85 0.6429 4 1
Not applicable Not applicable Not applicable 65.89 0.869 1 0
Not applicable Not applicable Not applicable 65.24 0.7831 3 0
Not applicable Not applicable Not applicable 75.66 0.5952 4 0
Not applicable Not applicable Not applicable 60.72 0.8621 4 0
No No No 69.75 0.25 2 0
Not applicable Not applicable Not applicable 69.02 0.8313 3 0
Not applicable Not applicable Not applicable 70.8 0.9615 4 0
Not applicable Not applicable Not applicable 70.09 0.7381 4 0
Not applicable Not applicable Not applicable 74.62 0.881 4 0
Not applicable Not applicable Not applicable 61.63 0.35 4 0
Yes No Not applicable 71.38 0 4 0
Yes Yes Not applicable 61.47 1 4 0
No No No 73.11 0.6667 4 0
Not applicable Not applicable Not applicable 70.75 0.8571 4 0
Not applicable Not applicable Not applicable 70.41 0.7738 4 0
Not applicable Not applicable Not applicable 70.78 0.9524 4 0
Not applicable Not applicable Not applicable 69.83 0.7976 3 1
Not applicable Not applicable Not applicable 69.4 0.1905 4 1
Not applicable Not applicable Not applicable 56.77 0.9643 4 0
No No Not applicable 72.41 0.75 1 0
Not applicable Not applicable Not applicable 67.8 0.9048 1 0
Not applicable Not applicable Not applicable 74.35 0.9405 1 0
Not applicable Not applicable Not applicable 74.78 0.7024 4 1
Not applicable Not applicable Not applicable 61.54 0.1071 4 0
Not applicable Not applicable Not applicable 71.23 0.8929 3 0
No Not applicable Not applicable 71.87 0.9762 2 0
Yes No Not applicable 60.96 1 4 0
Not applicable Not applicable Not applicable 76.45 0.9643 1 1
Not applicable Not applicable Not applicable 77.59 0.3452 4 0
Not applicable Not applicable Not applicable 68.61 0.9762 2 0
Yes No Not applicable 73.82 0.4133 1 0
Not applicable Not applicable Not applicable 72.05 0.8929 2 0
No No Not applicable 71.77 0.2976 2 0
Not applicable Not applicable Not applicable 75.18 0.4048 4 0
Not applicable Not applicable Not applicable 75.33 0.5595 4 0
No Yes Not applicable 73.67 0.8333 3 0
Not applicable Not applicable Not applicable 71.92 1 4 0
Not applicable Not applicable Not applicable 77.63 0.7976 1 0
Not applicable Not applicable Not applicable 66.26 0.9167 1 1
Not applicable Not applicable Not applicable 81.25 1 1 0
Not applicable Not applicable Not applicable 80.65 0.9286 1 0
Not applicable Not applicable Not applicable 72.49 0.5714 4 1
Not applicable Not applicable No 69.95 0.7381 4 1
Not applicable Not applicable Not applicable 74.67 0.631 4 0
Not applicable Not applicable Not applicable 55.43 0.9773 4 0
Not applicable Not applicable Not applicable 65.26 0.6905 2 0
No No Not applicable 71.58 1 3 0
Not applicable Not applicable Not applicable 75.93 0.75 4 0
Not applicable Not applicable Not applicable 80.49 0.7381 3 0
Not applicable Not applicable Not applicable 76.03 0.119 4 1
Not applicable Not applicable Not applicable 79.41 0.75 4 0
Not applicable Not applicable Not applicable 75.78 0.7262 4 0
Not applicable Not applicable Not applicable 70.41 0.8333 4 0
Not applicable Not applicable Not applicable 66.86 0.9365 2 0
Not applicable Not applicable Not applicable 63.71 0.4762 4 0
Not applicable Not applicable Not applicable 71.46 0.7143 4 0
Not applicable Not applicable Not applicable 70.29 0.381 4 1
No Indeterminate Suspicious 74.15 0.9286 4 0
No Not applicable Not applicable 73.54 0.9524 4 0
Not applicable Not applicable Not applicable 76.96 0.9286 4 0
Not applicable Not applicable Not applicable 77.58 NA 3 0
Not applicable Not applicable Not applicable 65.9 0.2286 4 0
Inconclusive Not applicable Not applicable 63.81 0.6905 4 0
Not applicable Not applicable Not applicable 77.94 0.9048 4 0
Not applicable Not applicable Not applicable 73.06 0.8571 4 0
No Not applicable Not applicable 76.14 0.9762 4 1
Not applicable Not applicable Not applicable 63.45 0.3571 2 0
No Not applicable Not applicable 70.84 0.9762 4 0
Not applicable Not applicable Not applicable 68.53 0.6667 1 1
No Yes Yes 74.06 0.8293 2 0
Not applicable Not applicable Not applicable 61.96 0.9048 4 0
Not applicable Not applicable Not applicable 72.42 0.8889 4 0
No No Not applicable 65.34 NA 2 1
Not applicable Not applicable Not applicable 79.53 0.9524 4 0
Not applicable Not applicable Not applicable 75.92 0.631 2 0
Not applicable Not applicable Not applicable 71.15 0.9286 4 1
Not applicable Not applicable Not applicable 75.74 0.7143 3 1
Not applicable Not applicable Not applicable 75.12 1 4 0
Not applicable Not applicable Not applicable 82.29 0.9762 4 0
Not applicable Not applicable Not applicable 74 1 4 0
Symbol | EntrezID | Chromosome | Start | End | Collapsed feature in CNA-signature | Frequency of a CNA in: Subtype-1 | Subtype-2 | Subtype-3 | Subtype-4
Number Of Tumours | Date of update | Date final PSA | Final PSA | Biochem Relapse | Date of BCR | PSA at BCR | Death | Date of death | Age at death
multi-focal 01-12-2013 23-08-2013 0.01 N N/A N/A N N/A N/A
multi-focal 28-01-2011 01-08-2013 0.1 N N/A N/A N N/A N/A
uni-focal 01-12-2013 23-08-2013 0.01 N N/A N/A N N/A N/A
multi-focal 01-12-2013 01-08-2013 0.19 Y 18-10-2008 0.25 N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.02 N N/A N/A N N/A N/A
uni-focal 01-12-2013 12-04-2013 0.04 Y 04-07-2008 9.8 N N/A N/A
multi-focal 01-12-2013 02-08-2013 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 05-08-2013 0.04 Y 09-02-2013 0.3 N N/A N/A
multi-focal 01-12-2013 20-10-2008 49.2 Y 08-09-2008 16.2 Y 26-09-2009 67
multi-focal 01-12-2013 01-08-2013 0.01 N N/A N/A N N/A N/A
multi-focal 01-12-2013 29-01-2013 0.05 N N/A N/A N N/A N/A
multi-focal 01-12-2013 27-01-2012 0.05 N N/A N/A N N/A N/A
multi-focal 28-01-2011 05-10-2010 0.1 N N/A N/A N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.02 Y 16-04-2009 0.11 N N/A N/A
multi-focal 01-12-2013 02-08-2013 0.01 N N/A N/A N N/A N/A
multi-focal 01-12-2013 04-08-2013 0.1 N N/A N/A N N/A N/A
multi-focal 01-12-2013 24-05-2012 0.1 N N/A N/A N N/A N/A
uni-focal 01-12-2013 23-08-2013 0.11 N N/A N/A N N/A N/A
uni-focal 01-12-2013 24-05-2012 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 26-06-2012 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 01-10-2012 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 05-08-2013 0.01 N N/A N/A N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.04 N N/A N/A N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 04-08-2013 0.02 N N/A N/A N N/A N/A
uni-focal 01-12-2013 02-08-2013 0 N N/A N/A N N/A N/A
multi-focal 01-12-2013 17-09-2012 0.1 N N/A N/A N N/A N/A
uni-focal 28-01-2010 13-10-2010 0.1 Y 13-10-2009 0.06 N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 14-08-2013 0.11 Y 14-08-2013 0.11 N N/A N/A
multi-focal 01-12-2013 01-02-2012 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.01 N N/A N/A N N/A N/A
multi-focal 01-12-2013 23-08-2013 5.8 Y 13-01-2012 1.9 N N/A N/A
multi-focal 01-12-2013 01-08-2013 0.01 N N/A N/A N N/A N/A
multi-focal 01-12-2013 02-09-2013 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 29-01-2013 0.05 N N/A N/A N N/A N/A
multi-focal 14-12-2010 24-05-2012 5.24 Y 03-02-2010 0.32 N N/A N/A
multi-focal 14-12-2010 04-08-2013 0.11 Y 15-06-2010 0.12 N N/A N/A
multi-focal 28-01-2011 05-08-2013 0 N N/A N/A N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 09-02-2013 0.13 Y 07-05-2009 0.04 N N/A N/A
multi-focal 28-01-2011 23-08-2013 0.2 Y 29-01-2013 0.3 N N/A N/A
uni-focal 01-12-2013 01-10-2013 0.02 N N/A N/A N N/A N/A
multi-focal 01-12-2013 08-09-2010 0.1 N N/A N/A N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.04 N N/A N/A N N/A N/A
uni-focal 01-12-2013 23-08-2013 0.03 N N/A N/A N N/A N/A
uni-focal 01-12-2013 02-08-2013 0 N N/A N/A N N/A N/A
multi-focal 01-12-2013 23-08-2013 0.01 N N/A N/A N N/A N/A
multi-focal 01-12-2013 01-08-2013 0.03 N N/A N/A N N/A N/A
multi-focal 01-12-2013 06-09-2013 0.02 N N/A N/A N N/A N/A
01-12-2013 21-05-2012 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 12-11-2012 0.01 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 23-08-2013 0.05 N N/A N/A UNKNOWN N/A
multi-focal 01-12-2013 31-07-2013 0.02 Y 20-10-2010 0.08 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 28-05-2012 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2013 0.1 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 01-10-2013 0.02 Y 25-07-2012 0.09 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 04-07-2012 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
uni-focal 01-12-2013 30-07-2013 0.01 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
uni-focal 01-12-2013 23-08-2013 0.27 Y 01-08-2011 1.7 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 07-09-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 10-09-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 11-07-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 26-06-2012 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 05-08-2013 0.04 Y 15-03-2011 0.12 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 29-01-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 15-08-2013 0.01 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 21-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 28-07-2013 0.03 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 08-08-2013 0.1 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 31-07-2013 0.01 Y 21-04-2011 0.37 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 02-02-2013 0.01 N N/A N/A UNKNOWN UNKNOWN N/A
uni-focal 01-12-2013 23-08-2013 0.03 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 05-10-2013 0.04 Y 29-01-2013 0.16 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2013 0.02 Y 21-05-2011 0.43 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2013 0.6 Y 20-07-2011 0.36 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2013 0.1 Y 08-07-2011 0.07 UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 02-08-2013 0 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 04-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2012 0.01 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 23-08-2013 0.2 N N/A N/A UNKNOWN UNKNOWN N/A
multi-focal 01-12-2013 17-05-2013 0.02 N N/A N/A N/A
01-12-2013 11-11-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 04-02-2013 0.01 N N/A N/A UNKNOWN UNKNOWN N/A
13-01-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 23-08-2013 0.1 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 02-02-2013 0.1 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 02-09-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 01-02-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 23-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 09-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 23-08-2013 0.11 N N/A 0.11 UNKNOWN UNKNOWN N/A
01-12-2013 25-02-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 23-09-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 22-08-2013 0.04 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 17-07-2013 0.02 Y 10-02-2012 0.08 UNKNOWN UNKNOWN N/A
01-12-2013 11-05-2012 0.02 Y 10-02-2012 0.28 UNKNOWN UNKNOWN N/A
01-12-2013 23-08-2013 0.2 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 26-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 23-08-2013 0.04 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 19-10-2012 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 13-10-2013 0.14 Y 13-10-2013 0.14 UNKNOWN UNKNOWN N/A
01-12-2013 11-05-2012 0.03 Y 11-05-2012 0.03 UNKNOWN UNKNOWN N/A
01-12-2013 23-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 07-02-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 16-08-2013 0.1 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 08-04-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 05-11-2013 0.1 Y 22-06-2012 1.26 UNKNOWN UNKNOWN N/A
01-12-2013 04-09-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 09-08-2013 0.03 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 10-08-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 24-05-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 06-09-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A
01-12-2013 05-07-2013 0.02 N N/A N/A UNKNOWN UNKNOWN N/A