caGrid, Fog and Clouds
Joel Saltz MD, PhD
Director, Center for Comprehensive Informatics
Overview
• Introduction
• Tools
• Use Case
• Fog and Clouds
Fog Computing
Grid interoperable enterprise virtualization
caGrid, Enterprise and Clouds
Biomedical Middleware: caGrid, TRIAD, i2b2
caGrid Components
– Security (GAARDS)
– Language (metadata, ontologies)
– Semantic/Federated query
– Workflow
– Grid Service Graphical Development Toolkit (Introduce)
– DICOM, IHE compatibility
– Advertisement and Discovery
Preferred Name
Synonyms
Definition
Relationships
Concept Code
Vocabulary/Ontology
Interoperability
– Registered metadata
– Ontology concept codes used to annotate models
– XML schemas that define data structures also registered
– Thus both data semantics AND data structures are registered. That is how we achieve (relative) interoperability.
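The idea of registering both semantics and structure can be illustrated with a minimal Python sketch. This is not the caGrid API: the class names, the `interoperable` helper, the `EX:` concept codes and the attribute names are all hypothetical, chosen only to mirror the concept fields listed above (preferred name, synonyms, definition, relationships, concept code).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A vocabulary/ontology entry (fields mirror the slide; values are placeholders)."""
    code: str
    preferred_name: str
    definition: str
    synonyms: tuple = ()
    relationships: tuple = ()  # (relation, target_code) pairs


@dataclass(frozen=True)
class RegisteredAttribute:
    """A data-model attribute annotated with a concept code and a registered XML type."""
    name: str
    concept_code: str  # data semantics
    xml_type: str      # data structure


def interoperable(a, b, registry):
    """Assumed rule: same registered concept (semantics) and same XML type (structure)."""
    return (
        a.concept_code in registry
        and a.concept_code == b.concept_code
        and a.xml_type == b.xml_type
    )


# Hypothetical concept code; not taken from any real vocabulary.
gbm = Concept(
    code="EX:0001",
    preferred_name="Glioblastoma",
    definition="Placeholder definition text",
    synonyms=("GBM",),
)
registry = {gbm.code: gbm}

site_a = RegisteredAttribute("diagnosis", "EX:0001", "xs:string")
site_b = RegisteredAttribute("dx_term", "EX:0001", "xs:string")
site_c = RegisteredAttribute("diagnosis", "EX:0002", "xs:string")

print(interoperable(site_a, site_b, registry))  # names differ, meaning and structure agree
print(interoperable(site_a, site_c, registry))  # names agree, concept codes differ
```

Note that matching is done on the concept code and type rather than on the attribute name, which is why two sites with differently named fields can still be queried together.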
Use Case: Will Treatment work and if not, why not?
Avastin and Glioblastoma in RTOG-0825
Treatment: Radiation therapy and Avastin (anti-angiogenesis)
Predict and Explain: Genetic, gene expression, microRNA, Pathology, Imaging
RT, imaging, Pathology markup/annotations/query
Active Data workflows
Avastin and GBMs in RTOG-0825
Analysis of pre-treatment tissue to extract imaging and molecular biomarkers that are indicative of outcome/Avastin response.
Whole-genome mRNA and microRNA expression profiling of GBM tumor specimens to identify outcome/Avastin response biomarkers.
Analysis of the pathology imaging and diagnostic imaging registered with the therapy plan to extract any biomarkers that can indicate Avastin response.
Does advanced imaging (e.g., diffusion-weighted imaging) provide markers that can predict patient response?
RT, Diagnostic Imaging and Pathology: Support Human/Algorithm analyses, annotation, markup
Data management and display framework that integrates the pathology with the radiology, therapy treatment information and the clinical data. This involves integrating platforms that manage imaging data at ACRIN, pathology at UCSF and molecular data from Emory.
For the sake of quality control, reproducibility and data sharing, results of RT, imaging and Pathology observations and analyses need to be described in a well-defined manner
Finding: mass
Mass ID: 1
Margins: spiculated
Length: 2.3 cm
Width: 1.2 cm
Cavitary: Y
Calcified: N
Spatial relationships: Abuts pleural surface; invades aorta
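A well-defined description like the mass finding above could be captured as a typed record, which makes it straightforward to validate, serialize and query. The following is a minimal sketch only: the `MassFinding` class, its field names and the JSON layout are assumptions for illustration, not a standard annotation schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MassFinding:
    """Hypothetical structured form of the example finding."""
    mass_id: int
    margins: str
    length_cm: float
    width_cm: float
    cavitary: bool
    calcified: bool
    spatial_relationships: list

    def __post_init__(self):
        # Basic quality control: reject physically meaningless measurements.
        if self.length_cm <= 0 or self.width_cm <= 0:
            raise ValueError("dimensions must be positive")


finding = MassFinding(
    mass_id=1,
    margins="spiculated",
    length_cm=2.3,
    width_cm=1.2,
    cavitary=True,
    calcified=False,
    spatial_relationships=["abuts pleural surface", "invades aorta"],
)

# Serialize for sharing; a human reader or an algorithm can produce the same record.
record = json.dumps(asdict(finding), sort_keys=True)

# Structured fields make queries simple, e.g. spiculated masses longer than 2 cm.
findings = [finding]
matches = [f.mass_id for f in findings if f.margins == "spiculated" and f.length_cm > 2.0]
print(matches)
```

Because the record is explicit about units and types, the same query gives the same answer whether the markup came from a pathologist, a radiologist or an algorithm.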
Distinguish (and maybe redefine) astrocytic, oligodendroglial and oligoastrocytic tumors using TCGA and Rembrandt
Important since treatment and Outcome differ
• Link nuclear shape, texture to biological and clinical behavior
• How are nuclear shape and texture related to the gene expression categories defined by clustering analysis of Rembrandt data sets?
• Relate nuclear morphometry and gene expression to neuroimaging features (Vasari feature set)
• Genetic and gene expression correlates of high resolution nuclear morphometry and relation to MR features using Rembrandt and TCGA datasets.
Annotation and Markup of Pathology Data needs Human/Algorithm Cooperation
Astrocytoma vs Oligodendroglioma
• TCGA finds genetic, gene expression overlap
• Pathologists have also long seen overlap
• Relationship between Pathology, Molecular, Radiology
• Relationship to Outcome, treatment response
Example: Compute Intensive Workflow
Fog Computing
Attributes of Fog Computing
• For legal and organizational reasons, data location is constrained
• Virtual machines used to create software stacks that use caGrid + other middleware to expose data services (we regularly ship VMs with caGrid based software stacks)
• Enterprises such as Emory increasingly rely on virtualization architectures
• Mid-range active storage platforms (e.g., Emory: 1 PB archival, 100 TB fast disk, 2K cores, InfiniBand interconnect) will rely heavily on virtualization
Relationship between Fog and Cloud Computing
• Workflows have enterprise, caGrid and compute intensive components
• Interoperability between enterprise and caGrid software stacks major current NCI/ONR issue
• Given virtualization, HPC and large scale data requirements can be tackled with cloud approach
Issues
• Organizational, legal requirements create constraints on placement of datasets, where VMs can be run
• Mapping of VMs, datasets:
– Huge variation in communication, I/O bandwidth, compute capabilities in Fog/Cloud continua
– Multiple software stacks, heterogeneous hardware architectures
• Security: Fog to Cloud: multiple distinct but overlapping security related requirements and constraints
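The placement issue can be sketched as a small constrained-mapping problem: a VM may only run at a site that is permitted to hold every dataset it needs, and among permitted sites one prefers the most capable. The sketch below is illustrative only. The `Site` class, the site names, the dataset labels, the cloud core count and the "most cores wins" rule are all assumptions; a realistic mapper would also weigh bandwidth, I/O and software stack compatibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A place where VMs can run (hypothetical model)."""
    name: str
    cores: int
    allowed_datasets: frozenset  # datasets that legal/organizational rules permit here


def feasible_sites(required, sites):
    """Sites allowed to host every dataset the VM needs."""
    return [s for s in sites if required <= s.allowed_datasets]


def place(required, sites):
    """Pick the feasible site with the most cores, or None if no site qualifies."""
    candidates = feasible_sites(required, sites)
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.cores)


sites = [
    # Enterprise ("fog") site: may hold identified clinical data, modest compute.
    Site("enterprise", 2000, frozenset({"clinical", "pathology", "deidentified_imaging"})),
    # Public cloud: large compute, but only de-identified data may be placed there.
    Site("cloud", 100000, frozenset({"deidentified_imaging"})),
]

print(place({"clinical", "pathology"}, sites).name)  # constrained to the enterprise
print(place({"deidentified_imaging"}, sites).name)   # free to use the larger resource
print(place({"genomic"}, sites))                     # no permitted site
```

The constraint check runs before any performance consideration, which reflects the point that data location rules restrict where compute can go before capability is compared.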
Thank you