
Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Aug 23, 2014


Integrative ‘-omics’ – wherein multiple types of high-throughput data are combined and analysed together – continues to grow in popularity for its potential to illuminate the basis of complex diseases. Our work explores different ways of combining such data to reveal insights into cancer biology.
Transcript
Page 1: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Integration and analysis of high-throughput data types for insights into complex disease

National Council of Women NSW

Olena Pchilka Branch of the Ukrainian Women’s Association NSW

SARAH-JANE SCHRAMM

Page 2: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Multiple approaches

Page 3: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Melanoma

› High and rising incidence

› Aggressive and therapy resistant, surgical resection is key

› Same stage disease can have markedly different survival outcomes

› Patient outcome predicted using clinical and histological features

› Limited predictive power for individual patients

Stages I & II, primary melanoma

Stage III, lymphatic drainage from primary (nodal metastases)

Stage IV, further dissemination (distant metastases)

Image adapted and reproduced from LANCET ONCOLOGY|Vol 8|2007

Page 4: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Research aims

› New prognostic markers

- To determine whether there are significant biomarker and pathway differences between melanomas of good and bad prognosis after resection of nodal metastatic disease;

› New therapeutic targets

- To identify and validate the principal regulatory pathway abnormalities that characterise metastatic (stage III and IV) melanomas;

- To investigate novel genomic drivers of melanoma tumour progression and outcome.

Page 5: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

What is Cancer Systems Biology?

“Recent findings point to daunting heterogeneity within individuals, and even within tumours over time…

…rummaging through that complexity is exactly what systems biologists do…

…Rather than focusing on one molecular pathway, this integrative approach blends many contexts, including DNA, RNA, proteins, signalling networks, cells, organs, whole organisms and even environmental factors.

This varied data mix requires scientists to build complex mathematical models of cancer, which in turn drive new research questions…”

Reprinted from NATURE|Vol 464|2010

Page 6: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

How does one do Cancer Systems Biology?

1. Collect and prepare different data types, such as:

› Gene expression microarray data

› MicroRNA expression array data

› Proteomic data

› Clinical data e.g., survival data

› Pathologic data e.g., subtypes

› Mutation data e.g., RNA/DNA-seq

2. Combine and interpret data with mathematical models

3. Validate the models

Slide adapted from Los Alamos q-bio Summer School, 2009
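
As a rough illustration of step 1, the sketch below aligns several data types on a shared patient identifier and keeps only patients that appear in every table. The patient IDs, field names and values are hypothetical and are not taken from the study; the function name `align_by_patient` is likewise an assumption made for this example.

```python
def align_by_patient(*tables):
    """Keep only patients present in every table and merge their records.

    Each table is a dict mapping patient ID -> dict of measurements.
    """
    if not tables:
        return {}
    # Patients shared by all data types.
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    merged = {}
    for pid in sorted(shared):
        record = {}
        for table in tables:
            record.update(table[pid])
        merged[pid] = record
    return merged


# Hypothetical toy inputs (illustrative values only).
expression = {
    "P1": {"HDAC1_expr": 7.2},
    "P2": {"HDAC1_expr": 5.9},
    "P3": {"HDAC1_expr": 6.4},
}
clinical = {"P1": {"survival_years": 4.6}, "P2": {"survival_years": 0.8}}
mutation = {
    "P1": {"BRAF_mut": True},
    "P2": {"BRAF_mut": False},
    "P4": {"BRAF_mut": True},
}

integrated = align_by_patient(expression, clinical, mutation)
print(sorted(integrated))  # patients with all three data types
```

Restricting to the intersection is the simplest choice; in practice, missing data types may instead be imputed or handled per analysis.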

Page 7: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

1. Collection and preparation of data

P Natl Acad Sci USA|Nov. 13|2009; Clin Cancer Res.|Vol. 14|2008; Clin Cancer Res.|Vol. 16|2010; JID|Vol 133|2013

Gene expression microarray data

Thank you to Drs Anna Campain, Vivek Jayaswal and Yee Hwa Yang, School of Mathematics & Statistics, The University of Sydney

Page 8: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

1. Collection and preparation of data

CLINICAL DATA

Tumour_DateBanked

Person_Sex

Person_DateBirth

Person_NumPrim

Person_DateLastFUDeath

Person_FUStatus

Person_StageatBank

Person_DateRelapse

Age_Analysis

Prognosis_TimeSinceLNMet

GENOTYPE

Tum_BRAFmut

Tum_NRASmut

Tum_FLT3mut

Tum_METmut

Tum_PIK3CAmut

Tum_PDGFRAmut

Tum_EGFRmut

PATHOLOGY - PRECEDING PRIMARY

Person_NumPrim

Prim_Worst

Prim_BestGuess

Prim_Date_Diag

Prognosis_TimeOverall

Prim_Site

Prim_Site_SunExp

Prim_Stage

Prim_TStage

Prim_NStage

Prim_Naevus

Prim_Breslow

Prim_Mitos

Prim_Clark

Prim_Histol

Prim_Regress

Prim_Ulc

Prim_Vasc

Prim_LymphInv

Prim_Satell

Sun_Damage_Score

NM

SSM

PATHOLOGY - METASTASES

Tum_NumNodesInv

Tum_MetSize

Tum_Extranodal

Tum_CellType

Tum_CellSize

Tum_Necrosis

Tum_Pigment

Tum_NonTumour%

Clinical, pathological, and mutation type data

Thank you to Prof. Richard Scolyer and his team at the Royal Prince Alfred Hospital, The University of Sydney

Page 9: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

1. Collection and preparation of data

› Human Protein Reference Database

- Keshava Prasad et al. 2009

› iRefWeb

- Turner et al. 2010

› BioGRID

- Chatr-aryamontri et al. 2013

› MetaCore

- From GeneGo Inc.

Hairball image generated using Cytoscape

(Smoot et al. 2011)

Protein-protein interaction data

Thanks to Simone Li and Drs Igy Pang and David Fung at the Systems Biology Initiative, the University of New South Wales

Page 10: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

2. Mathematical modeling and interpretation

NATURE BIOTECH.|Vol 27|2009

Page 11: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

2. Mathematical modeling and interpretation

Results – gene co-expression networks are significantly disturbed among patients with good and poor clinical outcomes

› A: Patients surviving >4yr post resection of metastatic disease

› B: Patients surviving <1yr post resection of metastatic disease

› C & D: Enlarged view (HDAC)

PIG. CELL & MEL. RES.|26(5):708-22|2013
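
One simple way to picture a "disturbed" hub is to compare how strongly the hub is co-expressed with its interaction partners in each outcome group. The sketch below does this with the average Pearson correlation. It is only an illustration under stated assumptions: the gene names ("HUB", "A", "B") and expression values are hypothetical, and this is not claimed to be the statistic used in the cited paper or in the VAN software.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_hub_correlation(expr, hub, partners):
    """Average correlation between a hub and its interaction partners."""
    return sum(pearson(expr[hub], expr[p]) for p in partners) / len(partners)


def hub_disturbance(expr_good, expr_poor, hub, partners):
    """Difference in average hub-partner correlation between two groups."""
    return (mean_hub_correlation(expr_good, hub, partners)
            - mean_hub_correlation(expr_poor, hub, partners))


# Hypothetical expression values: gene -> one value per patient.
good = {"HUB": [1, 2, 3, 4], "A": [2, 4, 6, 8], "B": [1, 3, 5, 7]}
poor = {"HUB": [1, 2, 3, 4], "A": [8, 6, 4, 2], "B": [7, 5, 3, 1]}

score = hub_disturbance(good, poor, "HUB", ["A", "B"])
print(round(score, 6))
```

A score near zero means the hub behaves similarly in both groups. Judging whether a non-zero score is significant would need a null distribution, for example from permuting the group labels.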

Page 12: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

2. Mathematical modeling and interpretation

Results – hubs are reproducibly ‘disturbed’ among good and poor outcomes

Gene symbol ID | Known drug target | Causally implicated in cancer(s) | Number of interaction partners (k) = 6-38 | Previously prognosis-associated | Previously progression-associated | Previously tumor thickness-associated | Protein type

AKT1 P P Protein Kinase

APPL1 P Protein

CCNA2 P P Protein

CDC25A P Phosphatase

CIITA P P Protein

CREBBP P Enzyme

CSNK2A1 Protein Kinase

FANCG P P Protein

GATA4 P Transcription Factor

GRAP2 P Protein

GRB2 Protein

HDAC1 P Enzyme

PIG. CELL & MEL. RES.|26(5):708-22|2013

Page 13: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

2. Mathematical modeling and interpretation

Results – hubs are reproducibly ‘disturbed’ among good and poor outcomes

Gene symbol ID | Known drug target | Causally implicated in cancer(s) | Number of interaction partners (k) = 6-38 | Previously prognosis-associated | Previously progression-associated | Previously tumor thickness-associated | Protein type

HIF1A P P P Transcription Factor

IKBKB P P Protein Kinase

IL16 Receptor Ligand

JAK1 P P Protein Kinase

KHDRBS1 P Protein

MYBL2 P Transcription Factor

NF2 P P Protein

PDZK1 P Protein

PIM1 P P P Protein Kinase

PSTPIP1 P Protein

PTPN11 P P Phosphatase

RAPGEF1 P Regulator

PIG. CELL & MEL. RES.|26(5):708-22|2013

Page 14: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

2. Mathematical modeling and interpretation

Results – hubs are reproducibly ‘disturbed’ among good and poor outcomes

Gene symbol ID | Known drug target | Causally implicated in cancer(s) | Number of interaction partners (k) = 6-38 | Previously prognosis-associated | Previously progression-associated | Previously tumor thickness-associated | Protein type

RBL1 P Protein

RBX1 P Enzyme

SMAD2 P Transcription Factor

SMAD7 P Protein

STAMBP P Metalloprotease

TGM2 P P Enzyme

TLE1 Protein

TNF P P P Receptor Ligand

› 9 are already known drug targets (although not in melanoma)

› 8 already causally implicated in other cancers

› 5 previously associated with melanoma progression or prognosis or indirectly associated via correlation with tumor thickness (more than would be expected by chance)

PIG. CELL & MEL. RES.|26(5):708-22|2013
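
The phrase "more than would be expected by chance" can be made concrete with an over-representation test. The slides do not say which test was used, so the hypergeometric upper tail below is an assumed choice, and the numbers in the usage line are hypothetical, not the study's counts.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the annotation."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    return sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    ) / total


# Hypothetical example: 10 genes, 3 annotated, 4 drawn, at least 2 hits.
p_value = hypergeom_upper_tail(population=10, annotated=3, drawn=4, hits=2)
print(p_value)
```

A small value indicates that the drawn set (here, standing in for the hub list) contains more annotated genes than random draws of the same size typically would.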

Page 15: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

2. Mathematical modeling and interpretation


Results – top ranking hubs are cancer-associated both individually (below) and as a gene set (data not shown)

PIG. CELL & MEL. RES.|26(5):708-22|2013

Page 16: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

2. Mathematical modeling and interpretation

Results - top ranking hubs can be used together to predict patient outcome

Cohort | Sample size (n good outcome; n poor outcome) | Classes compared | Class prediction error rate (LOOCV under KNN)

Mann | 47 (23;25) | survival >4yr with no sign of relapse or <1yr after surgical resection of stage III disease | 0.33

Bogunovic | 33 (23;10) | survival ≥1.5yr or <1.5yr since metastasis | 0.24

Jönsson | 54 (7;47) | overall survival | 0.20

John | 24 (10;14) | time taken to tumor progression from stage III to stage IV disease ≥2yr or <2yr | 0.29

• Comparison with standard-of-care prognostic markers

• Novel proposed prognostic biomarkers should be tested for improved performance relative to current biomarkers (McShane, Altman et al. 2005)

• We compared the prediction accuracy of our 32-hub classifier with the prediction accuracy of the four most statistically significant clinico-pathologic prognostic parameters in stage III melanomas: i.e., number of tumor-positive lymph nodes, tumor burden at the time of staging (microscopic v. macroscopic), presence or absence of primary tumor ulceration, and thickness of the primary melanoma (Balch, Gershenwald et al. 2009).

• The clinico-pathologic parameters gave a misclassification rate of 56% for our set of 48 patients, which is less accurate than the misclassification rate of 33% obtained for this cohort using the hub-based classifier.

PIG. CELL & MEL. RES.|26(5):708-22|2013
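
The class prediction error rates above come from leave-one-out cross-validation (LOOCV) with a k-nearest-neighbour (KNN) classifier. A minimal sketch of that procedure follows. The value k = 3, the Euclidean distance, and the toy two-feature samples are assumptions for illustration; the slides do not give the settings actually used.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loocv_error(samples, labels, k=3):
    """Hold out each sample in turn and predict it from all the others."""
    errors = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if knn_predict(rest_x, rest_y, x, k) != y:
            errors += 1
    return errors / len(samples)


# Hypothetical expression profiles (two features per patient).
samples = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (5, 5), (5, 6), (6, 5)]
labels = ["good", "good", "good", "poor", "poor", "poor", "poor"]

error_rate = loocv_error(samples, labels, k=3)
print(f"LOOCV error rate: {error_rate:.2f}")
```

In this toy set only the "poor" sample sitting inside the "good" cluster is misclassified. With small cohorts such as those in the table, LOOCV uses nearly all patients for training in every round, which is one reason it is a common choice there.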

Page 17: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

3. Validation with a view to a mechanism…?

› Exome sequencing data (Hodis et al. 2012)

› Calculation of functional mutation burden for ~16,000 genes (Broad Institute software)

› Functional mutation burden is significantly (P<0.05) higher in protein interaction partners of top-ranking hubs than would be expected by chance

› So, is functional mutation burden a pathogenic mechanism behind the differential network behaviour we observe between patients with good and poor clinical outcome?

› If so, can differential network behaviour act as a compass by indicating genomic areas (i.e., members of disturbed networks) that should be carefully scrutinized (including non-coding regions) for undiscovered and potentially targetable mutations?

› More work is needed!


An association between network-type and functional mutation burden

PIG. CELL & MEL. RES.|26(5):708-22|2013
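
A comparison against chance of the kind described on this slide can be sketched as a permutation test: compare the mean burden of the hub partners with the mean burden of many random gene sets of the same size. This is an assumed procedure for illustration only, not the Broad Institute software or the authors' analysis, and the burden values are hypothetical.

```python
import random


def burden_permutation_p(burden, gene_set, n_perm=10000, seed=0):
    """One-sided empirical P value: is the mean burden of `gene_set` higher
    than that of random same-size gene sets drawn from all genes?"""
    rng = random.Random(seed)
    genes = sorted(burden)
    size = len(gene_set)
    observed = sum(burden[g] for g in gene_set) / size
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, size)
        if sum(burden[g] for g in sample) / size >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)


# Hypothetical burden scores for 100 genes; partners are the top five.
burden = {f"g{i}": float(i) for i in range(100)}
partners = ["g95", "g96", "g97", "g98", "g99"]

p_high = burden_permutation_p(burden, partners, n_perm=2000)
print(p_high < 0.05)
```

Sampling random sets of matching size controls for set size only; a fuller analysis would probably also match on properties such as gene length or network degree.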

Page 18: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Summary and conclusions

• Used a large-scale, ‘systems biology’ approach to identify features of intracellular networks that are perturbed in poor-prognosis metastatic melanoma.

• Showed this to be consistent in a number of independent patient cohorts identifying:

• A portfolio of high priority potential targets for therapy, characterised by enrichment for cancer pathways, existing cancer drug targets, and functional mutation burden

• Gene expression of the 32 hubs forms a new, a priori-selected prognostic gene expression signature in the setting of metastatic melanoma: a critical turning point for many patients for which therapeutic options are very limited (but further validation needed).

• Present work is focussed on investigating our preliminary observation that network disturbances are associated with higher functional mutation burden

• Modelling and integration of different data types to answer clinically relevant questions is ongoing

Page 19: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Square One.

› Perform equivalent experiments using data from larger cohorts as well as other cancer types to see whether the observation can be repeated

› So,

- 1. Collect and prepare data: breast (TCGA), ovarian (Metabric), and melanoma (in-house/TCGA), lung (TCGA)

• Permissions and applications…yikes!

- 2. Mathematical modelling and interpretation

• In collaboration with The USYD Maths and Stats team (Yee Hwa Yang, Shila Ghazanfar, and John Ormerod)

• Software generated in-house and available externally (VAN, Jayaswal et al. 2013; MuText and InVex – Broad Institute, Hodis et al. 2012)

- 3. Validation…


An association between network phenotype and functional mutation burden?

Sincere thanks to Dr Yee Hwa Yang and Shila Ghazanfar for their essential collaboration in this work

Page 20: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Software spruik


VAN: identifying biologically perturbed networks using differential variability analysis

BMC RES. NOTES.|6(430)|2013, special thanks to Dr Vivek Jayaswal for his invaluable collaboration

Page 21: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Issues common among different integration approaches

• Power

• Handling of prior knowledge biases

• Visualisation

• Maintaining clinical relevance

• Computational search space

Page 22: Integration and analysis of high throughput data types for insights into complex disease - Sarah-Jane Schramm

Acknowledgements

› UNSW

› Marc Wilkins

- Simone Li

- Chi Nam Ignatius Pang

- David Fung

- Apurv Goel

- Natalie Twine

› USYD

› Graham Mann

- Gulietta Pupo & Varsha Tembe

› Swetlana Mactier

› Richard Scolyer (RPA)

› Yee Hwa Yang

- Anna Campain

- Vivek Jayaswal

- Kaushala Jayawardana

- Shila Ghazanfar

My contact details:

[email protected]

p. 0408 260 588