The Cancer Genome Browser
Sofie Salama
COAT-PhD Summer School 2012
Jan 15, 2016
The Cancer Genome Browser
• OUTLINE
– Slide show to introduce the Cancer Genomics Browser
  • What’s there?
  • How to visualize the data?
  • Tools
– Live Demo
  • Basic setup
  • Breast cancer data
    – Using signatures
    – Microarray vs RNA-Seq
    – Comparing across datasets
  • GBM data
    – Genesets
    – What genes correlate with phenotypes?
– Playtime!
https://genome.ucsc.edu
UCSC Genome Browser
• Base level to full genome display capability
• ENCODE
• Human sequence variation
• Whole genome association studies
• Human genetic and disease related genome annotation
https://genome-cancer.ucsc.edu
Large-scale Medical Genomics Datasets
Visualizing high-throughput cancer genomics data raises new issues: data security and access control, sample cohorts, multiple analytes, and clinical and phenotypic information.
UCSC Cancer Genomics Browser
• Simultaneously display patient genomic and clinical data from a cohort of samples
• Base level to full genome display capability
• Multiple studies
• Growing list of published studies, including public-tier TCGA data
• Integrated with popular UCSC Genome Browser and its vast store of genomic information
Zhu J et al., Nature Methods, 2009; Sanborn JZ et al., Nucleic Acids Res., 2010
New UCSC Cancer Browser Portal
genome-cancer.ucsc.edu
User Interface: A portal to display high throughput data sets
Teresa Swatloski, Brian Craft, Mary Goldman
toggle on/off RefSeq genes
link to tumor image browser
link to human genome browser
user sign in
help menu
view in chromosome mode
select dataset to view
configure genesets
configure genomic signatures
view in gene mode
resize panels
position or gene search bar
User Interface Features
Teresa Swatloski, Brian Craft, Mary Goldman
Dataset selection showing TCGA breast cancer data
TCGA breast cancer datasets
• Gene expression, copy number, DNA methylation, RPPA, Paradigmlite
• TCGA clinical data
Teresa Swatloski
Genomic and phenotypic data heatmaps
Genomic data Clinical data
Individual dataset layout
[Figure: samples run vertically; genomic data (by genomic location / gene) is shown beside the clinical data for the same samples]
[Figure: a Genomics Heatmap (amplification / deletion) beside a Clinical Heatmap with the features sample_type (Solid tissue normal, Primary solid tumor, Metastatic) and days_to_last_followup]
• Multiple clinical features
• Clinical data encoded in color
Sample sorting determined by clinical data
• Sample (i.e. vertical) order is determined by the clinical data on the right
• Samples are always sorted by clinical features
• Ties are broken using subsequent clinical features
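The sorting rule described here can be sketched as a multi-key sort. This is only an illustration under stated assumptions: the sample records and the `sort_samples` helper are hypothetical, and the browser's own ordering of categorical values may differ from the plain alphabetical order used below.

```python
# Hypothetical sample records; the feature names follow the slide's examples.
samples = [
    {"id": "S1", "sample_type": "Primary solid tumor", "days_to_last_followup": 300},
    {"id": "S2", "sample_type": "Solid tissue normal", "days_to_last_followup": 120},
    {"id": "S3", "sample_type": "Primary solid tumor", "days_to_last_followup": 45},
    {"id": "S4", "sample_type": "Metastatic", "days_to_last_followup": 300},
]


def sort_samples(records, features):
    """Sort by the first feature; break ties with each subsequent feature."""
    return sorted(records, key=lambda r: tuple(r[f] for f in features))


ordered = sort_samples(samples, ["sample_type", "days_to_last_followup"])
order_ids = [r["id"] for r in ordered]
print(order_ids)
```

S1 and S3 share the same sample_type, so their relative order is decided by days_to_last_followup, the next feature in the list.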
Zoom in to See Individual Sample
drag zoom slider
genomic heatmap
clinical heatmap
heatmap view
adjust display coloring
configuration window for clinical variables, sample subgrouping and statistics
box plot summary view
proportions summary view
click to show dataset detail
remove dataset
Individual Dataset Control
Teresa Swatloski
Summary Views
Heatmap View - Amplified / Deleted Regions
Proportions Summary View
Box Plot Summary View
DNA Copy Number Profile Summary View
[Figure: TCGA CNV summary profiles for glioblastoma multiforme, breast carcinoma, and lung squamous cell, with EGFR and CDKN2A, CDKN2B labeled]
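As a rough illustration of what a proportions summary conveys, the sketch below computes the fraction of samples called amplified or deleted at one gene. The copy number values, the thresholds, and the `proportions` helper are all assumptions made for this example; the slides do not say how the browser makes these calls.

```python
def proportions(values, amp_threshold=0.5, del_threshold=-0.5):
    """Fraction of samples above / below hypothetical copy number thresholds."""
    n = len(values)
    amplified = sum(v > amp_threshold for v in values)
    deleted = sum(v < del_threshold for v in values)
    return {
        "amplified": amplified / n,
        "deleted": deleted / n,
        "neutral": (n - amplified - deleted) / n,
    }


# Hypothetical log-ratio copy number values for one gene across eight samples.
egfr_values = [1.8, 0.1, 2.3, -0.1, 0.9, 0.0, -0.7, 1.2]
summary = proportions(egfr_values)
print(summary)
```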
Genes View Mode
“Genes” Configuration
Currently displayed gene list
Three ways to add a gene list (marked 1, 2, 3), including typing or copying and pasting user-defined genes
Teresa Swatloski
Genes view to see the PAM50 intrinsic gene expression subtypes in TCGA breast data
[Figure: samples grouped by subtype (Basal, LumA, LumB, Her2-like, Normal-like) and by sample type (Tumor, Solid normal)]
PAM50: Parker et al., Journal of Clinical Oncology (2009)
Same thing with RNA-Seq Data
Online statistical tests compare two subgroups
Samples
Subgroup samples
p values
click to view detail and use the variable to subgroup samples
perform statistical tests to compare subgroup1 and subgroup2
subgroup 1
subgroup 2
variables used in defining subgroups
“Active Feature List” area
Sample subgroup configuration
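The slides do not say which statistical test the browser runs when comparing subgroup 1 and subgroup 2. As one possible illustration, the sketch below uses an exact two-sided permutation test on the difference of group means; the function name and the expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group1, group2):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    n2 = len(pooled) - n1
    observed = abs(mean(group1) - mean(group2))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way of relabeling n1 of the pooled values as "subgroup 1".
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[i] for i in idx)
        diff = abs(s1 / n1 - (total - s1) / n2)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical values of one gene in two sample subgroups.
subgroup1 = [2.1, 2.4, 1.9, 2.6]
subgroup2 = [0.3, 0.5, 0.1, 0.4]
p = permutation_p_value(subgroup1, subgroup2)
print(p)
```

Exhaustive enumeration is only practical for small subgroups; for cohort-sized data a sampled permutation test or a parametric test would be the usual substitute.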
Compare subgroups using the summary view
EGFR amplification in GBM is largely found in the non CpG island DNA methylator samples (non G-CIMP)
Methylator samples in GBM are largely proneural by gene expression, and are also from younger patients with better survival
Evaluate Genomic Signature on the Browser
A. Enter signature as an algebraic expression
B. Computed signatures online -> approximate prediction
Evaluate Genomic Signature on the Browser
• 21-gene signature predicts rate of recurrence at 10 years in ER+ patients treated with tamoxifen (TAM) (Paik 2004)
• Genomic signature online approximation: higher score -> higher likelihood of recurrence; lower score -> lower likelihood of recurrence
Evaluate Genomic Signature on the Browser
• Browser view of ER+ patients in a preoperative chemotherapy study dataset
• Signature score correlates with pathCR: the paradox that an ER+ patient who is more likely to have recurrent disease within 10 years when treated with TAM is also more likely to respond to chemotherapy
Genomic Signature Configuration
Current signatures
Three ways to add a genomic signature (marked 1, 2, 3), including entering the signature as an algebraic expression
Such as: +TP53 - 0.25*ERBB2
Teresa Swatloski
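To make the algebraic form concrete, the sketch below parses a linear expression such as +TP53 - 0.25*ERBB2 into per-gene weights and scores one sample as a weighted sum. This is an illustration only: `parse_signature`, `signature_score`, and the expression values are hypothetical, and the browser's own expression syntax may accept more than the signed linear terms handled here.

```python
import re

# One signed term: optional sign, optional "coefficient*", then a gene symbol.
TERM = re.compile(r"([+-]?)(?:(\d+(?:\.\d+)?)\*)?([A-Za-z][A-Za-z0-9]*)")


def parse_signature(expression):
    """Parse a linear signature like '+TP53-0.25*ERBB2' into {gene: weight}."""
    # Drop spaces and normalize a typographic en dash to a minus sign.
    text = expression.replace(" ", "").replace("\u2013", "-")
    weights = {}
    pos = 0
    while pos < len(text):
        match = TERM.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse signature at position {pos}: {text!r}")
        sign, coef, gene = match.groups()
        weight = float(coef) if coef else 1.0
        if sign == "-":
            weight = -weight
        weights[gene] = weights.get(gene, 0.0) + weight
        pos = match.end()
    return weights


def signature_score(weights, expression_by_gene):
    """Weighted sum of one sample's gene values."""
    return sum(w * expression_by_gene[g] for g, w in weights.items())


weights = parse_signature("+TP53 - 0.25*ERBB2")
sample = {"TP53": 1.2, "ERBB2": 2.0}  # hypothetical expression values
score = signature_score(weights, sample)
print(weights, score)
```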
Web API
Create a URL to specify a view to the cancer browser
• base: https://genome-cancer.ucsc.edu/hgHeatmap/#?
• data track(s): comma separated gene names
• display mode
• gene list: comma separated gene names
• chromosomal position
• genomic signature: e.g. +TP53-0.25*ERBB2
Examples
• dataset=vijver2002&pos=chr2:123767566-chr2:187943340
• dataset=ucsfNeveCGH&displayas=geneset&gene_list=TP53,ERBB2
Documentation
https://genome-cancer.soe.ucsc.edu/proj/site/help
Brian Craft, Mary Goldman
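A small helper can assemble such a URL from the base and the parameters. The sketch below assumes the parameter string is appended directly to the base, and it uses only the parameter names that appear in the two examples (dataset, pos, displayas, gene_list). The `build_view_url` function is hypothetical, and the parameter name for a genomic signature is not given in the slides, so it is left out.

```python
BASE_URL = "https://genome-cancer.ucsc.edu/hgHeatmap/#?"


def build_view_url(dataset, **params):
    """Join key=value pairs with '&' after the base, in the order given."""
    parts = [f"dataset={dataset}"]
    parts.extend(f"{key}={value}" for key, value in params.items())
    return BASE_URL + "&".join(parts)


# Chromosomal position view.
url_position = build_view_url("vijver2002", pos="chr2:123767566-chr2:187943340")

# Geneset view with a comma separated gene list.
url_genes = build_view_url(
    "ucsfNeveCGH",
    displayas="geneset",
    gene_list=",".join(["TP53", "ERBB2"]),
)

print(url_position)
print(url_genes)
```

The values are deliberately not percent-encoded, since the examples show raw colons and commas after the `#?` fragment marker.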
User Account and Security
Brian Craft
cgData: Cancer Genomic data specification
• Gene expression, copy number, RPPA, DNA methylation, siRNA viability, phenotypes, clinical data
• Supports a large-scale genomic data repository
  – Currently supports the Cancer Browser
  – Plan to support an automated data analysis pipeline
• “Solve” (address) the common data linking problem
• Meta-data tracking
• Once data are in this specification, ingestion into the UCSC Cancer Browser is automated
Kyle Ellrott
Cancer Browser Updates
• Current improved version launched January, 2012
• Monthly data freeze
• Latest freeze data viewable on the Cancer Browser within a few days
• July, 2012 – Added ability to download processed datasets and improved user interface for clinical features, subgrouping and statistics
Data freeze 2012-02-28 summary (sample number)
Summary
• Simultaneously display patient genomic and clinical data from a cohort of samples
• Multiple studies data visualization
• Base level to full genome, and genesets display capability
• cgData data repository driven
• Monthly data freeze and version control
• User account
• Project-specific access-control
• Single sign-on portal
• Provide web API for linking
[Diagram components]
DCC, Firehose
UCSC cgData Repository
UCSC Next-gen Sequencing Data Analysis
• DNA-seq (bambam, bridget)
• mutation, allele-specific copy number, structural rearrangement
• Combined RNA/DNA analysis
• RNA editing
converter
browser
pathway analysis
Clinical Predictors (TopModel)
Bam files
Mutation call comparison
PARADIGM pathway analysis
UCSC Cancer Genomics Browser
cBio
Acknowledgment
UCSC Cancer Genomics Group: Brian Craft, Teresa Swatloski, Mary Goldman, Kyle Ellrott, Erich Weiler, Chris Wilks, Singer Ma, Christopher Szeto, Sofie Salama, Mia Grifford, Sam Ng, Ted Goldstein, Dan Carlin, Daniel Zerbino, Melissa Cline, Mark Diekhans, Josh Stuart, David Haussler
Collaborators: The Cancer Genome Atlas, Stand Up To Cancer, Intl. Cancer Genomics Consortium, ISPY consortium, MSKCC, LINCS consortium, Christopher Benz (Buck Institute), Laura Esserman (UCSF), Joe Gray (OHSU), Eric Collisson (UCSF), Gordon Mills (MDACC), Rachel Schiff (BCM)
Funding Agencies: NCI/NIH, NHGRI, American Association for Cancer Research
cgData Packages
[Diagram: four data files, each shown with a meta-data file]
• genomic data A (CNV)
• genomic data B (RPPA)
• clinical data 1 (FFPE, timepoint)
• clinical data 2 (patient, age, ...)
Most likely your data files
Need to add meta-data file
cgData Packages
• idMap (TCGA BRCA)
• genomic data A (CNV)
• genomic data B (RPPA)
• clinical data 1 (FFPE, timepoint)
• clinical data 2 (patient, age, ...)
Example identifiers in the idMap:
• patient: TCGA-01-ABCD
• sample: TCGA-01-ABCD-01A
• aliquot: TCGA-01-ABCD-01A-EG
• aliquot: TCGA-01-ABCD-01A-JH
cgData Packages
• idMap (TCGA BRCA)
  – Identifiers used in data files
  – parent-child relationships
• genomic data A (CNV)
• genomic data B (RPPA)
• clinical data 1 (FFPE, timepoint)
• clinical data 2 (patient, age, ...)
• probeMap B, assembly (hg18)
• probeMap B (antibody)
Most likely already in UCSC cgData library
Most likely your data files
Need to add meta-data file
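The parent-child relationships an idMap records can be sketched as a child-to-parent lookup. The example below uses the patient, sample, and aliquot identifiers shown on the earlier slide; the dictionary layout and the `root_id` helper are assumptions made for illustration and are not the cgData file format.

```python
# Child -> parent links for the example identifiers (aliquot -> sample -> patient).
PARENT = {
    "TCGA-01-ABCD-01A-EG": "TCGA-01-ABCD-01A",
    "TCGA-01-ABCD-01A-JH": "TCGA-01-ABCD-01A",
    "TCGA-01-ABCD-01A": "TCGA-01-ABCD",
}


def root_id(identifier, parent=PARENT):
    """Follow parent links until reaching an identifier that has no parent."""
    seen = set()
    while identifier in parent:
        if identifier in seen:
            raise ValueError(f"cycle in id map at {identifier!r}")
        seen.add(identifier)
        identifier = parent[identifier]
    return identifier


# Two data files keyed by different aliquots resolve to the same patient,
# which is what lets genomic and clinical rows be linked.
patient_from_eg = root_id("TCGA-01-ABCD-01A-EG")
patient_from_jh = root_id("TCGA-01-ABCD-01A-JH")
print(patient_from_eg, patient_from_jh)
```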