Vision and Infrastructure Behind the Cancer Biomedical Informatics Grid Peter A. Covitz, Ph.D. Director, Core Infrastructure National Cancer Institute Center for Bioinformatics

Dec 30, 2015


Moses Barnett

Transcript
Page 1: Vision and Infrastructure Behind the  Cancer Biomedical Informatics Grid

1

Vision and Infrastructure Behind the

Cancer Biomedical Informatics Grid

Peter A. Covitz, Ph.D.

Director, Core Infrastructure

National Cancer Institute Center for Bioinformatics

Page 2

The Center for Bioinformatics is the NCI’s strategic and tactical arm for research information management

We collaborate with both intramural and extramural groups

Mission to integrate and harmonize disparate research data

Production, service-oriented organization. Evaluated based upon customer and partner satisfaction.

Page 3

NCICB Operations teams

Systems and Hardware Support

Database Administration

Software Development

Quality Assurance

Technical Writing

Application Support and Training

caBIG Management

Page 4

National Cancer Institute 2015 Goal

Relieve suffering and death due to cancer by the year 2015

Page 5

Origins of caBIG

Need: Enable investigators and research teams nationwide to combine and leverage their findings and expertise in order to meet NCI 2015 Goal.

Strategy: Create scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network

Page 6

Scenario from caBIG Strategic Plan

A researcher involved in a phase II clinical trial of a new targeted therapeutic for brain tumors observes that cancers derived from one specific tissue progenitor appear to be strongly affected.

The trial has been generating proteomic and microarray data. The researcher would like to identify potential biochemical and signaling pathways that might be different between this cell type and other potential progenitors in cancer, deduce whether anything similar has been observed in other clinical trials involving agents known to affect these specific pathways, and identify any studies in model organisms involving tissues with similar pathway activity.

Page 7

caBIG Governance and Organization

Page 8

caBIG Governance Models

Feudalism

X Warlord culture offers little incentive to cooperate

Page 9

Governance Models

Forced Collectivization

X Centralized monolithic approach not flexible or scalable

Page 10

Governance Models

Federal Democracy

Alexander Hamilton, James Madison, John Jay

Federalist Papers

Balance between central management and local control. Best fit for caBIG Principles.

Page 11

caBIG Organization Structure

[Organization chart; labels recovered from the slide]

caBIG Oversight

General Contractor

Strategic Working Groups

Working Groups: Architecture; Vocabularies & Common Data Elements; Clinical Trial Mgmt; Integrative Cancer Research; Tissue Banks & Pathology Tools

Legend: = Project

Page 12

Interoperability: ability of a system to access and use the parts or equipment of another system

Syntactic interoperability

Semantic interoperability

Page 13

caBIG Compatibility Guidelines

[Diagram: guideline categories, one labeled SYNTACTIC and three labeled SEMANTIC]

Page 14

Model-Driven Architecture

Page 15

Page 16

MDA Approach

Analyze the problem space and develop the artifacts for each scenario
– Use Cases

Use Unified Modeling Language (UML) to standardize model representations and artifacts.

Design the system by developing artifacts based on the use cases
– Class Diagram – Information Model
– Sequence Diagram – Temporal Behavior

Use meta-model tools to generate the code
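
The final step, generating code from a model, can be illustrated with a minimal sketch. This is not the caCORE code generator: the `MODEL` dictionary, the `generate_class` helper, and the attribute types are hypothetical stand-ins for a class taken from a UML class diagram.

```python
# Hypothetical, simplified stand-in for one class of a UML information model.
MODEL = {
    "name": "Agent",
    "attributes": [("name", "str"), ("nSCNumber", "str")],
}


def generate_class(model):
    """Return Python source code for the class described by the model."""
    params = ", ".join(f"{attr}: {typ}" for attr, typ in model["attributes"])
    lines = [
        f"class {model['name']}:",
        f"    def __init__(self, {params}):",
    ]
    # One assignment per modeled attribute.
    lines += [f"        self.{attr} = {attr}" for attr, _ in model["attributes"]]
    return "\n".join(lines) + "\n"


source = generate_class(MODEL)

# Load the generated source so the class can be used immediately.
namespace = {"__name__": "generated"}
exec(source, namespace)
Agent = namespace["Agent"]

agent = Agent(name="Taxol", nSCNumber="007")
print(source)
print(agent.name, agent.nSCNumber)
```

The point of the sketch is that the class definition is derived from the model rather than written by hand, so a change to the model propagates to the code on the next generation run.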

Page 17

Limitations of MDA

Limited expressivity for semantics

No facility for runtime semantic metadata management

Page 18

caCORE

MDA plus a whole lot more!

Page 19

caCORE

Bioinformatics Objects

Enterprise Vocabulary

Common Data Elements

SECURITY

Page 20

Use Cases

Description

Actors

Basic Course

Alternative Course

Page 21

Bioinformatics Objects

Page 22

What do all those data classes and attributes actually mean, anyway?

Data descriptors or “semantic metadata” required

Computable, commonly structured, reusable units of metadata are “Common Data Elements” or CDEs.

NCI uses the ISO/IEC 11179 standard for metadata structure and registration

Semantics all drawn from Enterprise Vocabulary Service resources

Common Data Elements

Page 23

Preferred Name

Synonyms

Definition

Relationships

Concept Code

Enterprise Vocabulary Description Logic

Page 24

Semantic metadata example: Agent

<Agent>

<name>Taxol</name>

<nSCNumber>007</nSCNumber>

</Agent>

Page 25

Why do you need metadata?

Class/Attribute: Agent
Example Object Data: (none)
CIA Metadata: A sworn intelligence agent; a spy
NCI Metadata: Chemical compound administered to a human being to treat a disease or condition, or prevent the onset of a disease or condition

Class/Attribute: Agent nSCNumber
Example Object Data: 007
CIA Metadata: Identifier given to an intelligence agent by the National Security Council
NCI Metadata: Identifier given to chemical compound by the US Food and Drug Administration Nomenclature Standards Committee

Class/Attribute: Agent name
Example Object Data: Taxol
CIA Metadata: CIA code name given to intelligence agents
NCI Metadata: Common name of chemical compound used as an agent
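
A short sketch can make the comparison concrete: the Agent XML from the earlier slide parses identically no matter who reads it, and only the attached metadata says what the values mean. The definitions below are copied from the slide; the `METADATA` dictionary layout and the `describe` helper are assumptions made for illustration, not an NCI or caDSR interface.

```python
import xml.etree.ElementTree as ET

# The example object data from the "Semantic metadata example: Agent" slide.
DOC = "<Agent><name>Taxol</name><nSCNumber>007</nSCNumber></Agent>"

# Definitions taken from the slide; the dictionary structure is hypothetical.
METADATA = {
    "CIA": {
        "Agent": "A sworn intelligence agent; a spy",
        "Agent.name": "CIA code name given to intelligence agents",
        "Agent.nSCNumber": "Identifier given to an intelligence agent by "
                           "the National Security Council",
    },
    "NCI": {
        "Agent": "Chemical compound administered to a human being to treat "
                 "a disease or condition, or prevent the onset of a disease "
                 "or condition",
        "Agent.name": "Common name of chemical compound used as an agent",
        "Agent.nSCNumber": "Identifier given to chemical compound by the US "
                           "Food and Drug Administration Nomenclature "
                           "Standards Committee",
    },
}


def describe(xml_text, registry):
    """Pair each class/attribute in the XML with its registered definition."""
    root = ET.fromstring(xml_text)
    rows = [(root.tag, None, registry[root.tag])]
    for child in root:
        key = f"{root.tag}.{child.tag}"
        rows.append((key, child.text, registry[key]))
    return rows


for source in ("CIA", "NCI"):
    print(source)
    for key, value, definition in describe(DOC, METADATA[source]):
        print(f"  {key} = {value!r}: {definition}")
```

The same three elements come back with two incompatible readings, which is the ambiguity that registered semantic metadata is meant to remove.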

Page 26

Computable Interoperability

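[Diagram; labels recovered from the slide]

My model: Agent (name, nSCNumber, FDAIndID, CTEPName, IUPACName)

Your model: Drug (id, NDCCode, approver, approvalDate, fdaCode)

Concept code C1708 appears on both models, and concept code C1708:C41243 appears on both models.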
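
One way to read this slide is that two independently designed models become comparable once their elements carry shared concept codes. The sketch below assumes that reading. The class names and codes come from the slide, but which attribute carries C1708:C41243 in each model is not recoverable from the transcript, so the `nSCNumber` and `fdaCode` bindings are placeholders chosen for illustration.

```python
# Concept-code annotations per model element.
# Class-level codes follow the slide; attribute-level bindings are ASSUMED.
MY_MODEL = {
    "Agent": "C1708",
    "Agent.nSCNumber": "C1708:C41243",  # hypothetical binding
}
YOUR_MODEL = {
    "Drug": "C1708",
    "Drug.fdaCode": "C1708:C41243",  # hypothetical binding
}


def match_by_concept(mine, yours):
    """Return (my_element, your_element, code) for every shared concept code."""
    by_code = {}
    for element, code in yours.items():
        by_code.setdefault(code, []).append(element)
    return [
        (element, other, code)
        for element, code in mine.items()
        for other in by_code.get(code, [])
    ]


matches = match_by_concept(MY_MODEL, YOUR_MODEL)
for mine, yours, code in matches:
    print(f"{mine} <-> {yours} via {code}")
```

Nothing in the matching step looks at element names, so `Agent` and `Drug` align even though the two models share no vocabulary of their own.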

Page 27

Tying it all together: The caCORE semantic management framework

[Diagram; labels recovered from the slide]

Layers: Bioinformatics Objects; Common Data Elements; Enterprise Vocabulary (Desc. Logic)

CDEs and their Concept Codes:
2223333    C1708
2223866    C1708:C41243
2223869    C1708:C25393
2223870    C1708:C25683
2223871    C1708:C42614

Page 28

Cancer Data Standards Repository

ISO/IEC 11179 Registry for Common Data Elements – units of semantic metadata

Client for Enterprise Vocabulary: metadata constructed from controlled terminology and annotated with concept codes

Precise specification of Classes, Attributes, Data Types, Permissible Values: Strong typing of data objects.

Tools:
– UML Loader: automatically register UML models as metadata components
– CDE Curation: Fine tune metadata and constrain permissible values with data standards
– Form Builder: Create standards-based data collection forms
– CDE Browser: search and export metadata components
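
The idea of strongly typed data with permissible values can be sketched in a few lines. This is an illustration only: the `CommonDataElement` class, its field names, and the example identifier, concept code, and values are all hypothetical and do not reproduce the caDSR data model or the ISO/IEC 11179 metamodel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical, heavily simplified unit of semantic metadata."""
    public_id: str          # registry identifier (placeholder value below)
    concept_code: str       # link to controlled terminology (placeholder)
    data_type: type         # expected Python type of a value
    permissible_values: frozenset = frozenset()  # empty means unconstrained

    def validate(self, value):
        """Accept a value only if its type and content match the element."""
        if not isinstance(value, self.data_type):
            return False
        return not self.permissible_values or value in self.permissible_values


# Example element with made-up identifiers and values.
specimen_type = CommonDataElement(
    public_id="EXAMPLE-1",
    concept_code="EXAMPLE-CODE",
    data_type=str,
    permissible_values=frozenset({"tissue", "blood"}),
)

print(specimen_type.validate("tissue"))  # inside the permissible set
print(specimen_type.validate("saliva"))  # right type, value not permitted
print(specimen_type.validate(42))        # wrong data type
```

A form builder or data loader could run the same check on every field, which is what makes a shared registry of such elements useful across applications.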

Page 29

Common Security Module

SECURITY

Common Authorization Schema

Page 30

caCORE Architecture

[Architecture diagram; labels recovered from the slide]

Clients: Java Applications, HTTP Clients, SOAP Clients, Perl Clients

Interfaces: Java, SOAP, XML

Middleware: Web Application Server; APIs; Domain Objects [Gene, Disease, Agent, etc.]; Data Access Objects

Data: Biomedical Data, Enterprise Vocabulary, Common Data Elements, Authorization

Page 31

Development and Deployment

[Process diagram; labels recovered from the slide]

Activities: Use Cases, Design, Test Plans, Iterative Development, Modeling, Unit Testing, User Guides, System Testing, Staging, Packaging, PRODUCTION

Tiers: DEV | QA | STAGE | PROD

Page 32

caCORE Software Development Kit

Page 33

caCORE SDK Components

UML Modeling Tool (any with XMI export)

Semantic Connector (concept binding utility)

UML Loader (model registration in caDSR)

Codegen (middleware code generator)

Security Adaptor (Common Security Module)

caCORE SDK Generates a caBIG Silver-Compliant System

Page 34

Professional Documentation

Page 35

caBIG UML Models Completed and in the Works at Cancer Centers for Silver Systems

mzXML: mass spec proteomics data
scanFeatures: Proteomics
AML: Proteomics
statml: Statistical markup model
CAP: College of American Pathologists protocols for Breast, Lung, Prostate
GoMiner: Text mining tool for GO
caTISSUE: Tissue banking
protLIMS: Laboratory Information Management System for proteomics
BRIDG: Clinical Trials
caBIO: General bioinformatics
caDSR: ISO11179 metadata
EVS: Vocabulary
caMOD: Cancer Models
MAGE 1.2: Microarray data
CSM: Security
Common: Provenance, DBxrefs
caTIES: Pathology reports
gridPIR: Protein Information

Page 36

From Silver to Gold:

caGrid

Page 37

caBIG Use Cases

Advertisement
– Service Provider composes service metadata describing the service and publishes it to grid.

Discovery
– Researcher (or application developer) specifies search criteria describing a service of interest
– The researcher submits the discovery request to a discovery service, which identifies a list of services matching the criteria, and returns the list.

Query and Invocation
– Researcher (or application developer) instantiates the grid service and accesses its resources

Security
– Service Provider restricts access to service based upon authentication and authorization rules
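
The four use cases can be walked through with an in-memory sketch. It is a toy under stated assumptions, not the caGrid or Globus API: `ServiceRegistry`, `invoke`, the metadata keys, the user name, and the sample gene symbols are all invented for illustration, and authentication is reduced to trusting a supplied user name.

```python
class ServiceRegistry:
    """Hypothetical in-memory stand-in for a grid discovery service."""

    def __init__(self):
        self._entries = []

    def advertise(self, metadata, handler, authorized_users):
        # Advertisement: the provider publishes metadata describing the service.
        self._entries.append({
            "metadata": dict(metadata),
            "handler": handler,
            "authorized": set(authorized_users),
        })

    def discover(self, **criteria):
        # Discovery: return every service whose metadata matches all criteria.
        return [
            entry for entry in self._entries
            if all(entry["metadata"].get(k) == v for k, v in criteria.items())
        ]


def invoke(entry, user, query):
    # Security: the provider's authorization rule is checked before invocation.
    if user not in entry["authorized"]:
        raise PermissionError(f"{user} is not authorized for this service")
    # Query and Invocation: the caller accesses the service's resources.
    return entry["handler"](query)


SAMPLE_GENES = ["BRCA1", "TP53"]  # sample data for the sketch

registry = ServiceRegistry()
registry.advertise(
    {"name": "example-gene-service", "kind": "data"},
    handler=lambda q: [g for g in SAMPLE_GENES if q in g],
    authorized_users={"alice"},
)

matches = registry.discover(kind="data")
result = invoke(matches[0], "alice", "TP")
print(result)
```

Calling `invoke` with a user outside the authorized set raises `PermissionError`, which stands in for the provider-side access rules described under Security.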

Page 38

[Diagram; labels recovered from the slide: Gold; Silver (repeated); Cancer Center (repeated); NCI; OTHER caBIG SERVICE PROVIDERS; OTHER TOOLKITS]

Page 39

caGrid Service-Oriented Architecture

OGSA Compliant - Service Oriented Architecture

[Architecture diagram; labels recovered from the slide]

Layers and functions: Transport, Grid Communication Protocol, Service Description, Service, Workflow, Service Registry, Security, Metadata Management, Resource Management, Functions Management, ID Resolution, Schema Management

Technologies: GSI, CAS, myProxy, Globus, OGSA-DAI, Globus GRAM, Globus Toolkit, BPEL, Mobius, caCORE

Page 40

Service Data Elements

Service Data Elements (SDEs) describe services so clients can discover what they do

Two types of top-level grid services defined
– Data Services
– Analytical Services

Three models for SDEs have been designed
– Data service-specific
– Analytical Service-specific
– Common (all services)
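
The relationship between the three SDE models can be sketched as a common base plus two specializations. Every field name here (`name`, `provider`, `object_classes`, `input_types`, `output_types`) and every sample value is an assumption for illustration; the slide does not list the actual contents of the SDE models.

```python
from dataclasses import dataclass, field


@dataclass
class CommonServiceData:
    """Metadata assumed to apply to all services (hypothetical fields)."""
    name: str
    provider: str


@dataclass
class DataServiceData(CommonServiceData):
    """Data service-specific metadata: which object classes are exposed."""
    object_classes: list = field(default_factory=list)


@dataclass
class AnalyticalServiceData(CommonServiceData):
    """Analytical service-specific metadata: typed inputs and outputs."""
    input_types: list = field(default_factory=list)
    output_types: list = field(default_factory=list)


def data_services_offering(sdes, object_class):
    """Names of data services whose metadata lists the given object class."""
    return [
        sde.name for sde in sdes
        if isinstance(sde, DataServiceData) and object_class in sde.object_classes
    ]


# Made-up service descriptions for the sketch.
published = [
    DataServiceData("example-data", "Example Center", ["Gene", "Agent"]),
    AnalyticalServiceData("example-analysis", "Example Center",
                          input_types=["Gene"], output_types=["Pathway"]),
]

print(data_services_offering(published, "Agent"))
```

A client can filter on the common fields without knowing the service type, and only inspects the specialized fields once it knows which kind of service it is looking at.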

Page 41

Silver to Gold: Data Services

[Diagram; labels recovered from the slide: EVS, caGrid Infrastructure, caBIG Gold data service, Query Adaptor, Silver Data Service]

Page 42

Data Object Semantics, Metadata, and Schemas

Client and service APIs are object oriented, and operate over well-defined and curated data types

Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)

Object definitions draw from vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described

XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)

[Diagram; labels recovered from the slide]

Components: Client, Grid Client, Client API, Service, Grid Service, Service API, Core Services, Service Definition, Data Type Definitions, WSDL, XSD, Objects, XML, Object Definitions

Registries: Cancer Data Standards Repository, Enterprise Vocabulary Services, Global Model Exchange (GME)

Relationship labels: Registered In, Semantically Described In, Serialize To, Validates Against, Client Uses

Page 43

Analytical Services

Accept and emit strongly typed data objects that conform to Gold data service requirements

Analytical method implementation is defined by service provider

Toolkit to assist with creating a caGrid Analytical Service will come with caGrid 0.5 download

Page 44

Analytical Service Creation Wizard

Page 45

Method Implementation

Insert method code here

Page 46

Test bed Infrastructure

caGrid 0.5 Test Bed

Page 47

Acknowledgements

NCI: Andrew von Eschenbach, Anna Barker, Wendy Patterson, OC, DCTD, DCB, DCP, DCEG, DCCPS, CCR

Industry Partners: SAIC, BAH, Oracle, ScenPro, Ekagra, Apelon, Terrapin Systems, Panther Informatics

NCICB: Ken Buetow, Avinash Shanbhag, George Komatsoulis, Denise Warzel, Frank Hartel, Sherri De Coronado, Dianne Reeves, Gilberto Fragoso, Jill Hadfield, Sue Dubman, Leslie Derr

Page 48

Acknowledgements – caGrid

Georgetown: Baris Suzek, Scott Shung, Colin Freas, Nick Marcou, Arnie Miles, Cathy Wu, Robert Clarke

Duke: Patrick McConnell

UPMC: Rebecca Crawley, Kevin Mitchell

TerpSys: Gavin Brennan, Troy Smith, Wei Lu, Doug Kanoza

Ohio State Univ.: Scott Oster, Shannon Hastings, Steve Langella, Tahsin Kurc, Joel Saltz

SAIC: William Sanchez, Manav Kher, Rouwei Wu, Jijin Yan, Tara Akhavan

Panther Informatics: Brian Gilman, Nick Encina

Oracle: Ram Chilukuri

BAH: Arumani Manisundaram

NCICB: Avinash Shanbhag, George Komatsoulis, Denise Warzel, Frank Hartel

Page 49

caBIG Participant Community

9Star Research; Albert Einstein; Ardais; Argonne National Laboratory; Burnham Institute; California Institute of Technology-JPL; City of Hope; Clinical Trial Information Service (CTIS); Cold Spring Harbor; Columbia University-Herbert Irving; Consumer Advocates in Research and Related Activities (CARRA); Dartmouth-Norris Cotton; Data Works Development; Department of Veterans Affairs; Drexel University; Duke University; EMMES Corporation; First Genetic Trust; Food and Drug Administration; Fox Chase; Fred Hutchinson; GE Global Research Center; Georgetown University-Lombardi; IBM; Indiana University; Internet 2; Jackson Laboratory; Johns Hopkins-Sidney Kimmel; Lawrence Berkeley National Laboratory; Massachusetts Institute of Technology; Mayo Clinic; Memorial Sloan Kettering; Meyer L. Prentis-Karmanos; New York University; Northwestern University-Robert H. Lurie

Ohio State University-Arthur G. James/Richard Solove; Oregon Health and Science University; Roswell Park Cancer Institute; St Jude Children's Research Hospital; Thomas Jefferson University-Kimmel; Translational Genomics Research Institute; Tulane University School of Medicine; University of Alabama at Birmingham; University of Arizona; University of California Irvine-Chao Family; University of California, San Francisco; University of California-Davis; University of Chicago; University of Colorado; University of Hawaii; University of Iowa-Holden; University of Michigan; University of Minnesota; University of Nebraska; University of North Carolina-Lineberger; University of Pennsylvania-Abramson; University of Pittsburgh; University of South Florida-H. Lee Moffitt; University of Southern California-Norris; University of Vermont; University of Wisconsin; Vanderbilt University-Ingram; Velos; Virginia Commonwealth University-Massey; Virginia Tech; Wake Forest University; Washington University-Siteman; Wistar; Yale University

Page 50

From Village to City