Integrative Multi-Scale Biomedical Informatics Joel Saltz MD, PhD Director Center for Comprehensive Informatics

Indiana 4 2011 Final Final

Dec 07, 2014


Joel Saltz

Presentation at the University of Indiana Spring 2011
Transcript
Page 1: Indiana 4 2011 Final Final

Integrative Multi-Scale Biomedical Informatics

Joel Saltz MD, PhD
Director, Center for Comprehensive Informatics

Page 2: Indiana 4 2011 Final Final

Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data

Run lots of different algorithms to derive same features

Run lots of algorithms to derive complementary features

Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms

Squeezing Information from Spatial Datasets

Page 3: Indiana 4 2011 Final Final

Center for Comprehensive Informatics

Outline

• Integrative biomedical informatics analysis: feature sets obtained from Pathology and Radiology studies

• Techniques, tools and methodologies for derivation, management and analysis of feature sets

Page 4: Indiana 4 2011 Final Final


INTEGRATIVE BIOMEDICAL INFORMATICS ANALYSIS

• Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology)
• Integration of anatomic/functional characterization with multiple types of “omic” information
• Create categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment

Page 5: Indiana 4 2011 Final Final

In Silico Center for Brain Tumor Research

Specific Aims:

1. Influence of necrosis/hypoxia on gene expression and genetic classification.

2. Molecular correlates of high-resolution nuclear morphometry.

3. Gene expression profiles that predict glioma progression.

4. Molecular correlates of MRI enhancement patterns.

Page 6: Indiana 4 2011 Final Final

Integration of heterogeneous multiscale information

• Coordinated initiatives: Pathology, Radiology, “omics”

• Exploit synergies between all initiatives to improve ability to forecast survival & response.

[Figure labels: Radiology Imaging; Pathologic Features; “Omic” Data; Patient Outcome]

Page 7: Indiana 4 2011 Final Final

Lee Cooper Carlos Moreno

Example: Pathology and Gene Expression Joint Predictors of Recurrence/Survival

Page 8: Indiana 4 2011 Final Final

FEATURE CHARACTERIZATION IN PATHOLOGY AND RADIOLOGY

• Role in In Silico Brain Tumor Research
• Algorithms
• Scaling Requirements

Page 9: Indiana 4 2011 Final Final

In Silico Center for Brain Tumor Research: Key Data Sets

REMBRANDT: Gene expression and genomics data set of all glioma subtypes

The Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and Radiology

Pathology and Radiology Images from Henry Ford Hospital, Emory, Thomas Jefferson U, MD Anderson and others

Page 10: Indiana 4 2011 Final Final

TCGA Research Network

Digital Pathology

Neuroimaging

Page 11: Indiana 4 2011 Final Final

Progression to GBM

Anaplastic Astrocytoma (WHO grade III)

Glioblastoma (WHO grade IV)

Page 12: Indiana 4 2011 Final Final

TCGA Neuropathology Attributes
120 TCGA specimens; 3 Reviewers

Presence and Degree of:

• Microvascular hyperplasia: complex/glomeruloid; endothelial hyperplasia
• Necrosis: pseudopalisading pattern; zonal necrosis
• Inflammation: macrophages/histiocytes; lymphocytes; neutrophils
• Differentiation: small cell component; gemistocytes; oligodendroglial; multi-nucleated/giant cells; epithelial metaplasia; mesenchymal metaplasia
• Other features: perineuronal/perivascular satellitosis; entrapped gray or white matter; micro-mineralization

Page 13: Indiana 4 2011 Final Final

Distinguishing Characteristic in Gliomas

Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images

Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities

Nuclear Qualities
• Oligodendroglioma: round shaped with smooth regular texture
• Astrocytoma: elongated with rough, irregular texture

Page 14: Indiana 4 2011 Final Final

Feature Extraction

TCGA Whole Slide Images

Jun Kong

Page 15: Indiana 4 2011 Final Final

Astrocytoma vs Oligodendroglioma

Overlap in genetics, gene expression, histology

• Assess nuclear size (area and perimeter), shape (eccentricity, circularity, major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).
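To make the feature list above concrete, here is a minimal sketch of how a few of these per-nucleus features could be computed from a segmented nucleus. It is not the pipeline used in the talk: the input format (a 0/1 mask plus a grayscale image as nested lists), the function name `nuclear_features`, and the edge-counting perimeter are all assumptions chosen to keep the example dependency-free.

```python
import math
from statistics import mean, pstdev

def nuclear_features(mask, intensity):
    """Compute a few size, shape, intensity and texture features for one nucleus.

    mask: 2D list of 0/1 values marking nucleus pixels (hypothetical input format).
    intensity: 2D list of grayscale values with the same shape as mask.
    """
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    area = len(pixels)

    # Perimeter: count pixel edges that face background or the image border.
    perimeter = 0
    for r, c in pixels:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                perimeter += 1

    # Extent ratio: nucleus area divided by its bounding-box area.
    rs = [r for r, _ in pixels]
    cs = [c for _, c in pixels]
    bbox_area = (max(rs) - min(rs) + 1) * (max(cs) - min(cs) + 1)

    values = [intensity[r][c] for r, c in pixels]
    mu, sigma = mean(values), pstdev(values)

    # Texture from the intensity histogram: energy and Shannon entropy.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    probs = [n / area for n in counts.values()]
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)

    # Skewness: third standardized moment (0 if all intensities are equal).
    skewness = mean([(v - mu) ** 3 for v in values]) / sigma ** 3 if sigma else 0.0

    return {
        "area": area, "perimeter": perimeter,
        "circularity": 4 * math.pi * area / perimeter ** 2,
        "extent": area / bbox_area,
        "mean": mu, "max": max(values), "min": min(values),
        "energy": energy, "entropy": entropy, "skewness": skewness,
    }
```

Eccentricity, axis lengths, Fourier shape descriptors and kurtosis are left out to keep the sketch short; they would follow the same pattern of reducing the pixel set to a single number per nucleus.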

Page 16: Indiana 4 2011 Final Final

Machine-based Classification of TCGA GBMs (J Kong)

• Whole slide scans from 14 TCGA GBMs (69 slides)
• 7 purely astrocytic in morphology; 7 with 2+ oligo component
• 399,233 nuclei analyzed for astro/oligo features
• Cases were categorized based on ratio of oligo/astro cells

TCGA Gene Expression Query: c-Met overexpression

Page 17: Indiana 4 2011 Final Final

[Figure: clustergram of selected features used in consensus clustering; vertical axis: feature indices, 10 to 110]

Nuclear Features Used to Classify GBMs

Page 18: Indiana 4 2011 Final Final

[Figure: consensus clustering of morphological signatures into clusters 1 to 4, with a silhouette plot (x-axis: silhouette value, 0 to 1; y-axis: cluster)]

Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients.

Nuclear Features Used to Classify GBMs
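The silhouette plot on this slide scores how well each sample fits its assigned cluster. As a rough illustration, the standard silhouette value s(i) = (b - a) / max(a, b) can be computed as below, where a is the mean distance to the sample's own cluster and b is the smallest mean distance to any other cluster. The Euclidean distance and the function name `silhouette_values` are assumptions; the slide does not say which distance was used, and the sketch expects at least two clusters.

```python
import math

def silhouette_values(points, labels):
    """Return the silhouette value s(i) = (b - a) / max(a, b) for every point."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    clusters = sorted(set(labels))
    result = []
    for i, p in enumerate(points):
        # Mean distance from point i to the other members of each cluster.
        mean_dist = {}
        for k in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == k and j != i]
            if members:
                mean_dist[k] = sum(dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:      # singleton cluster: silhouette taken as 0
            result.append(0.0)
            continue
        a = mean_dist[own]                                      # cohesion
        b = min(d for k, d in mean_dist.items() if k != own)    # separation
        result.append((b - a) / max(a, b))
    return result
```

Values near 1 indicate a sample that sits well inside its cluster; values near 0 or below suggest it lies between clusters or was assigned to the wrong one.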

Page 19: Indiana 4 2011 Final Final

Survival of morphological clusters

[Figure: survival curves for Cluster 1, Cluster 2, Cluster 3 and Cluster 4; x-axis: days, 0 to 3000; y-axis: survival, 0 to 1]

Page 20: Indiana 4 2011 Final Final

Survival of patients by molecular tumor subtype

[Figure: survival curves for the Proneural, Neural, Classical and Mesenchymal subtypes; x-axis: days, 0 to 3000; y-axis: survival, 0 to 1]
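Survival curves of this kind are commonly drawn with the Kaplan-Meier estimator. The slides do not state which estimator was used, so the following is only a sketch under that assumption; the function name and input layout are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time in days for each patient.
    observed: True if death was observed, False if the patient was censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, observed) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # deaths and censored patients both leave the risk set
    return curve
```

Running the function once per cluster (or per molecular subtype) gives one step curve per group, which is how the two figures above are organized.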

Page 21: Indiana 4 2011 Final Final

Articulate Physical Interpretations of Results

Page 22: Indiana 4 2011 Final Final

Images

Page 23: Indiana 4 2011 Final Final

Multiscale Systems Biology

Employ multi-resolution methods to characterize necrosis, angiogenesis and correlate these with “omics”

[Figure labels: No enhancement, Normal Vessels, Stable lesion | ? | Rim-enhancement, Vascular Changes, Rapid progression]

Page 24: Indiana 4 2011 Final Final

Correlation of Necrosis, Angiogenesis and “omics”

• GBMs display variable and regionally heterogeneous degrees of necrosis (asterisk) and angiogenesis

• These factors may impact gene expression profiles

Page 25: Indiana 4 2011 Final Final

Genes Correlated with Necrosis include Transcription Factors Identified as Regulators of the Mesenchymal Transition

Gene Symbol    SAM q-value (corrected p-value)
C/EBPB         < 0.000001
C/EBPD         < 0.000001
FOSL2          < 0.000001
STAT3          0.0047
RUNX1          0.0082

Carro MS, et al. Nature 463: 318-25, 2010

• Frozen sections from 88 GBM samples marked to identify regions of necrosis and angiogenesis

• Extent of both necrosis and angiogenesis calculated as a percentage of total tissue area

Page 26: Indiana 4 2011 Final Final

Feature Sets in Radiology (Adam Flanders, TJU; Dan Rubin, Stanford; Lori Dodd, NCI)

• Require standardized validated feature sets to describe de novo disease.

• Fundamental obstacle to new imaging criteria as treatment biomarkers is lack of standard terminology:
– To define a comprehensive set of imaging features of cancer
– For reporting imaging results
– To provide a more quantitative, reproducible basis for assessing baseline disease and treatment response

Page 27: Indiana 4 2011 Final Final

Defining Rich Set of Qualitative and Quantitative Image Biomarkers

• Community-driven ontology development project; collaboration with ASNR

• Imaging features (5 categories)
– Location of lesion
– Morphology of lesion margin (definition, thickness, enhancement, diffusion)
– Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio)
– Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling)
– Resection features (extent of nCE tissue, CE tissue, resected components)

Page 28: Indiana 4 2011 Final Final


Imaging Predictors of survival and molecular profiles in the TCGA Glioblastoma Data set

The TCGA glioma working group

Groups: Emory; TJU/CBIT/NCI; UVA/Northwestern; Henry Ford

Authors (numbers refer to the affiliations below): David A Gutman (1), Adam Flanders (3), Max Wintermark (8), Lisa Scarpace (4), Lee Cooper (1), Eric Huang (2), Manal Jilwan (8), Tom Mikkelsen (4), Scott N Hwang (1), Robert J Clifford (2), Prashant Raghavan (8), Chad A Holder (1), Dina Hammoud (3), Pat Mongkolwat (9), Doris Gao (1), John Freymann (7), Carlos Moreno (1), Justin Kirby (7), Arun Krishnan (1), Jun Kong (1), Carl Jaffe (6), Seena Dehkharghani (1), Joel Saltz (1), Dan Brat (1)

Affiliations:
1 Emory University Hospital, Atlanta, GA
2 National Cancer Institute, Bethesda, MD
3 Thomas Jefferson University Hospital, Philadelphia, PA
4 Henry Ford University Hospital, Detroit, Michigan
5 National Institutes of Health, Bethesda, MD
6 Boston University School of Medicine, Boston, MA
7 SAIC-Frederick, Inc., Frederick, MD
8 University of Virginia, Charlottesville, VA
9 Northwestern University, Chicago, IL

Page 29: Indiana 4 2011 Final Final


Assumed Dependence Between Features

[Figure: dependence graph over features F1, F3, F5, F6, F7, F9, F10, F11, F13, F14, F16, F18, F19, F20, F21, F22 and F24]

Feature legend:
F1: Tumor Location
F3: Eloq. Brain
F5: Prop. Enhance
F6: Prop. nCET
F7: Prop. Necrosis
F9: Distribution
F10: T1/FLAIR
F11: En. Marg. Thick.
F13: Def. Non. Marg.
F14: Prop. Edema
F16: Hemorrhage
F18: Pial Invasion
F19: Ependymal
F20: Cort. Involve.
F21: Deep WM Inv.
F22: nCET Cross. Mid.
F24: Satellites

NOTE: each feature omitted from this graph is independent of every other feature.

Slide thanks to Eric Huang, NIH

Page 30: Indiana 4 2011 Final Final


Estimation Problem Size Reduction

Can ignore seven of the features: size of contingency table reduced from 2.64 × 10^12 cells to 1.34 × 10^10 cells.

Collapsibility reduces size of contingency table even further:
• Any binary feature Fj connected to only one feature on the graph (i.e. given the feature Fj is connected to, Fj is independent of all other features) can also be ignored
• Eliminates need to deal with Hemorrhage (F16), Pial Invasion (F18), Cortical Involvement (F20), and Satellites (F24).
• Reduces size of contingency table to 1.68 × 10^9 cells.
• Additional analogous considerations can be used to reduce size of contingency table by more than two additional orders of magnitude

Slide thanks to Eric Huang, NIH
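The size reductions on this slide follow from the fact that a full contingency table has one cell per combination of feature levels, so its size is the product of the level counts, and every binary feature that can be ignored halves it. A small sketch of that arithmetic follows. The level counts below are invented for illustration and are NOT the real definitions of the imaging features; only the feature labels are taken from the slides.

```python
from math import prod

def table_cells(levels):
    """Number of cells in a full contingency table: product of level counts."""
    return prod(levels.values())

def drop_features(levels, ignored):
    """Return the level counts that remain after ignoring some features."""
    return {name: n for name, n in levels.items() if name not in ignored}

# Hypothetical level counts per feature (assumption, for illustration only).
levels = {"F1": 6, "F5": 5, "F7": 5, "F16": 2, "F18": 2, "F20": 2, "F24": 2}

full = table_cells(levels)
reduced = table_cells(drop_features(levels, {"F16", "F18", "F20", "F24"}))
```

With these made-up counts, ignoring the four binary features shrinks the table by a factor of 2^4 = 16.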

Page 31: Indiana 4 2011 Final Final


Correlative Imaging Results

• Minimal enhancing tumor (≤5%) strongly associated with Proneural classification (p=0.0006).

• >5% proportion of necrosis associated with the presence of microvascular hyperplasia in pathology slides (p=0.008).

• Greater maximum tumor dimension (T2 signal) associated with present/abundant microvascular hyperplasia (p=0.001).

< 5% Enhancement

Page 32: Indiana 4 2011 Final Final


Correlative Imaging Results

• TP53 mutant tumors had smaller mean tumor sizes (p=0.002) on T2-weighted or FLAIR images.

• EGFR mutant tumors were significantly larger than TP53 mutant tumors (p=0.0005).

• High level EGFR amplification was associated with >5% enhancement and >5% proportion of necrosis (p < 0.01).

> 5% Necrosis

Page 33: Indiana 4 2011 Final Final

• Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data

• Run lots of different algorithms to derive same features

• Run lots of algorithms to derive complementary features

• Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms

Squeezing Information from Spatial Datasets

Page 34: Indiana 4 2011 Final Final

Pipeline for Whole Slide Feature Characterization

• 10^10 pixels for each whole slide image
• 10 whole slide images per patient
• 10^8 image features per whole slide image
• 10,000 brain tumor patients
• 10^15 pixels
• 10^13 features
• Hundreds of algorithms
• Annotations and markups from dozens of humans
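The totals in the list above follow directly from the per-slide figures. A short worked calculation, using only the numbers stated on the slide (the variable names are my own):

```python
PIXELS_PER_SLIDE = 10**10     # pixels in one whole slide image
FEATURES_PER_SLIDE = 10**8    # image features derived from one whole slide image
SLIDES_PER_PATIENT = 10
PATIENTS = 10_000

total_slides = SLIDES_PER_PATIENT * PATIENTS
total_pixels = PIXELS_PER_SLIDE * total_slides
total_features = FEATURES_PER_SLIDE * total_slides
```

That is 10^5 slides, 10^15 pixels and 10^13 features before multiplying by the hundreds of algorithms that may each produce their own feature set.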

Page 35: Indiana 4 2011 Final Final

PAIS Database

Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide)

Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.

Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships

Page 36: Indiana 4 2011 Final Final

Data Models to Represent Feature Sets and Experimental Metadata

PAIS |pās| : Pathology Analytical Imaging Standards
• Provide semantically enabled data model to support pathology analytical imaging
• Data objects, comprehensive data types, and flexible relationships
• Object-oriented design, easily extensible
• Reuse existing standards
– Reuse relevant classes already defined in AIM
– Follow DICOM WG 26 metadata specifications on WSI reference
– Specimen information in DICOM Supplement 122 and caTissue
– Use caDSR for CDE and NCI Thesaurus for ontology concepts
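As a rough picture of what an object-oriented model of markups, annotations and provenance with spatial queries might look like, here is a heavily simplified sketch. It is not the PAIS schema: the class fields, the bounding-box containment test and the function `annotations_in_region` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Provenance:
    algorithm: str                                  # which algorithm produced the result
    parameters: Dict[str, float] = field(default_factory=dict)

@dataclass
class Markup:
    markup_id: int
    polygon: List[Tuple[float, float]]              # boundary vertices, pixel coordinates

    def bounding_box(self):
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)

@dataclass
class Annotation:
    markup: Markup
    features: Dict[str, float]                      # e.g. area, circularity
    provenance: Provenance

def annotations_in_region(annotations, region):
    """Return annotations whose markup bounding box lies inside region (x0, y0, x1, y1)."""
    rx0, ry0, rx1, ry1 = region
    hits = []
    for ann in annotations:
        x0, y0, x1, y1 = ann.markup.bounding_box()
        if rx0 <= x0 and ry0 <= y0 and x1 <= rx1 and y1 <= ry1:
            hits.append(ann)
    return hits
```

A linear scan like this would not scale to roughly a million markups per slide; that is the motivation for the spatial database support described on the PAIS Database slide.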

Page 37: Indiana 4 2011 Final Final


Pathology Imaging GIS

Segmentation

Feature extraction

Image analysis

[Figure: PAIS domain model (UML class diagram). Classes: PAIS, Project, Group, User, Collection, Subject, Patient, Specimen, AnatomicEntity, Equipment, ImageReference, MicroscopyImageReference, DICOMImageReference, TMAImageReference, WholeSlideImageReference, Region, Markup, GeometricShape, Surface, Field, Annotation, AnnotationReference, Observation, Calculation, Inference, Provenance]

Modeling and management of markup and annotation for querying and sharing through parallel RDBMS + spatial DBMS

PAIS model PAIS data management

On the fly data processing for algorithm validation/algorithm sensitivity studies, or discovery of preliminary results

HDFS data staging; MapReduce based queries
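To illustrate the MapReduce query style mentioned above, here is an in-memory sketch that counts markups per image tile. It imitates the map, shuffle and reduce phases in plain Python and does not use Hadoop or HDFS; the record layout (a dict with a `centroid` key), the tile size and the function names are assumptions.

```python
from collections import defaultdict

def map_markup(markup, tile_size=4096):
    """Map step: emit (tile key, 1) for the tile containing the markup centroid."""
    x, y = markup["centroid"]
    yield (int(x // tile_size), int(y // tile_size)), 1

def reduce_counts(key, values):
    """Reduce step: total number of markups that fell in one tile."""
    return key, sum(values)

def run_query(markups, tile_size=4096):
    groups = defaultdict(list)
    for m in markups:                                # map + shuffle (group by key)
        for key, value in map_markup(m, tile_size):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())   # reduce
```

Because each markup is mapped independently and each tile is reduced independently, the same two functions could in principle be distributed across many nodes.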

Page 38: Indiana 4 2011 Final Final

Generation and Analysis of Imaging Features

• In-transit data processing using filter/stream systems
• Semantic Workflows
• Hierarchical pipeline design with coarse and fine grained components
• Adaptivity and Quality of Service
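The filter/stream idea in the first bullet can be pictured as a chain of small processing stages, each consuming a stream of data items and emitting a new stream. The sketch below uses Python generators for this. It is not the DataCutter API; the tile layout, the background threshold and the stage names are hypothetical.

```python
def read_tiles(tiles):
    """Source filter: stream tiles one at a time."""
    for tile in tiles:
        yield tile

def drop_background(stream, threshold=200):
    """Filter: discard tiles whose mean intensity looks like blank background."""
    for tile in stream:
        if sum(tile["pixels"]) / len(tile["pixels"]) < threshold:
            yield tile

def extract_features(stream):
    """Filter: attach a simple feature (mean intensity) to each tile."""
    for tile in stream:
        yield {"id": tile["id"], "mean": sum(tile["pixels"]) / len(tile["pixels"])}

def run_pipeline(tiles):
    # Filters are connected by streams; each tile flows through without
    # the whole image ever being held by a single stage.
    return list(extract_features(drop_background(read_tiles(tiles))))
```

In a real filter/stream system the stages would run concurrently on different nodes, which is what makes in-transit processing possible.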

Page 39: Indiana 4 2011 Final Final

Same basic story in multiple domains

Page 40: Indiana 4 2011 Final Final

Classification using DataCutter Filter Stream Workflow

Page 41: Indiana 4 2011 Final Final

Slides’ Preparation

• 64990 x 59412 pixels in full resolution
• Original Size: 10.8 GB; Compressed Size: ≈ 833 MB

[Figure labels: 8x; 40x]
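The uncompressed size quoted above can be checked from the pixel dimensions. The calculation below assumes 3 bytes per pixel (8-bit RGB) and binary units; neither is stated on the slide, but together they reproduce the 10.8 figure.

```python
WIDTH, HEIGHT = 64990, 59412
BYTES_PER_PIXEL = 3           # assumption: 8-bit RGB, no alpha channel

raw_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
raw_gib = raw_bytes / 2**30   # size in units of 2^30 bytes

# Compression ratio, taking the compressed size as 833 units of 2^20 bytes.
compression_ratio = raw_gib * 1024 / 833
```

Under these assumptions the raw image is about 10.8 GB and the compressed file is roughly 13 times smaller.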

Page 42: Indiana 4 2011 Final Final

Computerized Classification System for Grading Neuroblastoma

• Background Identification
• Image Decomposition (Multi-resolution levels)
• Image Segmentation (EMLDA)
• Feature Construction (2nd order statistics, Tonal Features)
• Feature Extraction (LDA) + Classification (Bayesian)
• Multi-resolution Layer Controller (Confidence Region)

[Flowchart]
TRAINING: Training Tiles; Down-sampling; Segmentation; Feature Construction; Feature Extraction; Classifier Training
TESTING: Image Tile; Initialization (I = L); Background? (Yes/No); Label; Create Image I(L); Segmentation; Feature Construction; Feature Extraction; Classification; Within Confidence Region? (Yes/No); I > 1? (Yes/No); I = I - 1
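The testing branch of the flowchart describes a multi-resolution control loop: classify a tile at a coarse level first, and move to a finer level only if the result is not trusted. A minimal sketch of that control logic follows. The `classify_at` callback, the scalar confidence threshold standing in for the confidence-region test, and the convention that level 1 is full resolution are all assumptions.

```python
def classify_multiresolution(tile, classify_at, levels, min_confidence=0.9):
    """Classify a tile from the coarsest level toward full resolution.

    classify_at(tile, level) -> (label, confidence) is a hypothetical callback
    wrapping segmentation, feature construction, feature extraction and
    classification at one resolution level.
    """
    level = levels                       # I = L: start at the lowest resolution
    while True:
        label, confidence = classify_at(tile, level)
        if confidence >= min_confidence or level == 1:
            return label, level          # trusted result, or no finer level left
        level -= 1                       # I = I - 1: retry at a higher resolution
```

The appeal of the design is that most tiles can be labeled at a coarse level, so the expensive full-resolution analysis runs only on the ambiguous ones.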

Page 43: Indiana 4 2011 Final Final

Segmentation

A typical segmentation result of an image from undifferentiated class with components segmented by this method is shown. (a) Original image; (b) Partitioned image shown in color; (c) Nuclei; (d) Cytoplasm; (e) Neuropil; (f) Background component.

Page 44: Indiana 4 2011 Final Final


Semantic Workflows (Wings)

Collaborative Work with Yolanda Gil, Mary Hall

• A systematic strategy for composing application components into workflows
• Search for the most appropriate implementation of both components and workflows
• Component optimization
– Select among implementation variants of the same computation
– Derive integer values of optimization parameters
– Only search promising code variants and a restricted parameter space
• Workflow optimization
– Knowledge-rich representation of workflow properties

Page 45: Indiana 4 2011 Final Final


Adaptivity

Page 46: Indiana 4 2011 Final Final

Framework

• Description Module (Wings): Describe application workflow using semantics of workflow components

• Execution module (Pegasus, DataCutter, Condor): Maps to resources, generates and places fine grained filter/stream pipelines

• Tradeoff Module: Schedules execution based on application level QOS

Page 47: Indiana 4 2011 Final Final
Page 48: Indiana 4 2011 Final Final

Impact

Page 49: Indiana 4 2011 Final Final
Page 50: Indiana 4 2011 Final Final


Image Mining for Comparative Analysis of Expression Patterns in Tissue Microarray

(PI’s: Foran and Saltz)

Build reference library of expression signatures, integrate state-of-the-art multi-spectral imaging capability and build a deployable clinical decision support system for analyzing imaged specimens. Technologies and computational tools developed during the course of the project to be tested on a Grid-enabled, virtual laboratory established among strategic sites located at CINJ, Emory, RU, UPenn, OSU, and ASU.

Funded by NIH through grant #5R01LM009239-02

David J. Foran, Ph.D.

Page 51: Indiana 4 2011 Final Final

Center for Comprehensive Informatics: Integrative Biomedical Informatics Projects

• In Silico Study of Brain Tumors
• Minority Health Genomics and Translational Research Bio-Repository Database (MH-GRID)
• ACTSI Cardiovascular, Diabetes, Brain Tumor Registry
• Early Hospital Readmission
• CFAR (Center for AIDS Research) HIV/Cancer Project
• Radiation Therapy and Quantitative Imaging
• Integrative Analysis of Text and Discrete Data Related to Smoking Cessation and Asthma
• Semantic Query and Analysis of Integrative Datasets in Renal Transplant Clinical Studies (CTOT-C)

Page 52: Indiana 4 2011 Final Final

Atlanta Clinical and Translational Science Institute: Federated Data Warehouse System

• Develop integrative, federated ACTSI information warehouse
• Integrated clinical/imaging/“omic”/biomarker/tissue information should always be available
• A virtually centralized, big Atlanta-wide information warehouse that has all relevant data
• Patients seen and information gathered at any ACTSI site, specimens sent to any affiliated core, imaging carried out at any affiliated site
• E.g. gene expression, SNP, virtual slide images, hematology studies and CMV serologies for kidney transplant candidates accrued into Study X or Study Y between Feb 2011 and Jan 2012 who were on the kidney transplant waiting list as of November 1, 2010.
• Development efforts: Security, Web Portal, Common Data Elements & Vocabularies, Identifiers, High-performance Computing middleware, Testing framework.

Page 53: Indiana 4 2011 Final Final

ACTSI-wide Federated Data Warehouse

Page 54: Indiana 4 2011 Final Final

Thanks to:

• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)

• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads

• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others

• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz

• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone

• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)

• NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe

• ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-Ramos

• NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, Ewa Deelman, Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi

Page 55: Indiana 4 2011 Final Final

Thanks!