Cancer Bioinformatics Infrastructure Objects (caBIO) Providing Innovative and Integrative Informatics Solutions Himanso Sahni (SAIC) Sharon Settnek (SAIC)


caBIO

• The cancer Bioinformatics Infrastructure Objects (caBIO) is an infrastructure that integrates internal and publicly available bioinformatics data spanning multiple scientific disciplines

• caBIO objects simulate the behavior of actual bioinformatics components such as genes, chromosomes, sequences, ontologies, trials, agents, etc.

• caBIO provides access to a variety of bioinformatics data sources, including UniGene, HomoloGene, LocusLink, RefSeq, BioCarta, GoldenPath (via DAS), and NCICB’s CGAP (Cancer Genome Anatomy Project) and GAI (Genetic Annotation Initiative) data repositories

• caBIO is “open source” and provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concern for implementation details or data management

caBIO Object Model

Model Extensions

Clinical Protocols

• A clinical protocols object model facilitates the integration of clinical data with genomic data

Animal Models

• An animal models object model supports queries between human and animal models of cancer

MAGE-OM Extension

• caBIO is currently being extended to include the MAGE-OM object model

• Microarray data from the NCICB Gene Expression Data Portal (GEDP) will be retrieved via the caBIO MAGE API

caBIO APIs

• A Java API is available for Java programmers

• A Simple Object Access Protocol (SOAP) API is provided for non-Java programmers

• An HTTP API is also available

– Developers can request XML or HTML (via XSLT)
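
The "HTML (via XSLT)" option can be pictured as a style sheet applied to an XML record. The following is a minimal sketch using the standard Java XSLT API, not caBIO's own code: the gene record and the style sheet are simplified, hypothetical stand-ins for whatever the server actually uses.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class GeneXmlToHtml {
    // Hypothetical, simplified gene record (assumption; real caBIO XML has more fields).
    static final String GENE_XML =
        "<gene><name>PTEN</name>"
      + "<title>phosphatase and tensin homolog</title></gene>";

    // Hypothetical style sheet that renders the record as a small HTML page.
    static final String STYLE_SHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"/gene\">"
      + "<html><body>"
      + "<h1><xsl:value-of select=\"name\"/></h1>"
      + "<p><xsl:value-of select=\"title\"/></p>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the style sheet to the XML and returns the resulting HTML text.
    static String toHtml(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(GENE_XML, STYLE_SHEET));
    }
}
```

The same XML record can serve both kinds of client: programs consume it directly, while browsers receive the transformed HTML.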

caBIO Applications

Cancer Molecular Analysis Project (CMAP)

Powered by caBIO!

Molecular Targets

• A collection of genes organized by pathways can be displayed, facilitating the evaluation of anomalies

Targeted Agents

• Researchers can retrieve information about agents linked to multiple targets and contexts

Clinical Trials

• Researchers can view detailed information about therapeutic trials associated with histology types and agents

• A Clinical Protocols Portal is available to allow researchers to search and submit clinical protocols affiliated with Specialized Programs of Research Excellence (SPOREs)

caBIO Architecture

• caBIO was designed using a J2EE architecture with client interfaces, server components, back-end objects and data sources

• Clients (browsers, applications) can receive information (HTML and XML) from back-end objects over HTTP

• Client applications can also communicate with back-end objects via Java RMI (Java applications)

• Non-Java based applications can communicate via SOAP or HTTP

• Server components communicate with back-end objects via Java RMI

• Back-end objects communicate directly with data sources (database, URLs, flat files)

• caBIO web services can be advertised to facilitate information sharing

– RDF can be used to advertise content to crawlers and agents

– A UDDI registry may be configured to advertise services

– caBIO services can be advertised via bioMOBY central

caBIO Architecture

[Diagram: caBIO architecture in four tiers: Clients, Presentation Layer, Object Layer, Data Sources. Labels shown: Browsers, Java Apps, Other Apps; HTML/HTTP, XML/HTTP, SOAP, RMI; Web Server, Servlet Container, JSPs, Servlets, UI Bean, XML Builder, XSLT Engine, SOAP Engine, XML Docs, DTDs, XSL Style Sheet; Domain Objects (Genes, Chromosomes, Libraries, Tissues, Clusters, Agents, Sequences, Diseases, Other), Object Managers, Data Access Objects, RDF; JDBC, HTTP, FTP; External Databases, URLs, Flat Files, caDSR, EVS]

Data Sources

[Diagram: caBIO data sources. External public databases shown around caBIO: CGAP Database, RefSeq, UniGene, LocusLink, HomoloGene, UCSC GoldenPath (DAS), BioCarta, CGAP/GAI, CTEP/SPOREs (Trials), GO (Gene Ontology). Data types shown: Gene Annotations, Reference Sequences, Genes, Sequences, Chromosomes, Homologs, Pathways, Gene Loci, LocusLink Summaries, SNPs]

caBIO Benefits

• Provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concerns for implementation details

• Lets developers obtain the information they need from a variety of data sources without having to manage the data themselves

• Manages the display of large volumes of data to assist in load balancing

• Provides an effective mechanism for performing complex queries that rely on diverse data sources

• Facilitates information sharing without managing linkages between multiple data sources

caBIO Usage

Facilitates solving complex queries such as:

Find me the Pathways, with Genes that are expressed in Tissues with a particular Histopathology that includes a particular Organ and a particular Disease.

Java Packages

• gov.nih.nci.caBIO.bean

– Contains domain objects to access genomic and biomedical components

• gov.nih.nci.caBIO.util.das

– Primary interface to the UCSC DAS

– Uses JAXB to convert DAS DTDs to objects

• gov.nih.nci.caBIO.evs

– Provides synonym search and concept-based search against the NCI’s Enterprise Vocabulary System (EVS)

• gov.nih.nci.caBIO.webservices

– Provides access to caBIO via SOAP

• gov.nih.nci.caBIO.servlet

– Provides access to caBIO via HTTP

• gov.nih.nci.caBIO.util

– Provides interface to caBIO utilities

Java API

Domain objects have companion SearchCriteria objects:

    // Describe the genes to look for: here, by gene symbol
    Gene myGene = new Gene();
    GeneSearchCriteria criteria = new GeneSearchCriteria();
    criteria.setSymbol("pTEN");

    // Run the search and unpack the matching Gene objects
    SearchResult result = myGene.search(criteria);
    Gene[] genes = (Gene[]) result.getResultSet();

caBIO supports nested SearchCriteria: SearchCriteria from one object type can be fed as parameters into SearchCriteria of another type.

Complex queries without any SQL

Traverse Relationships in Model

Find me the Pathways, with Genes that are expressed in Tissues with a particular Histopathology that includes a particular Organ and a particular Disease.

[Diagram: traversal of the object model from the INPUT objects (Disease, Organ) through Histopathology and Genes to the OUTPUT objects (Pathways)]

findPathway

    public Pathway[] findPathway(String disease, String organ) {
        // Input disease, organ; create SearchCriteria objects
        DiseaseSearchCriteria diseaseCriteria = new DiseaseSearchCriteria();
        OrganSearchCriteria organCriteria = new OrganSearchCriteria();
        HistopathologySearchCriteria histoCriteria = new HistopathologySearchCriteria();
        GeneSearchCriteria geneCriteria = new GeneSearchCriteria();
        PathwaySearchCriteria pathCriteria = new PathwaySearchCriteria();

        // Nest the SearchCriteria, then do the search
        diseaseCriteria.setName(disease);
        organCriteria.setName(organ);
        histoCriteria.putSearchCriteria(diseaseCriteria, CriteriaElement.AND);
        histoCriteria.putSearchCriteria(organCriteria, CriteriaElement.AND);
        geneCriteria.putSearchCriteria(histoCriteria, CriteriaElement.AND);
        pathCriteria.putSearchCriteria(geneCriteria, CriteriaElement.AND);

        Pathway myPathway = new Pathway();
        return myPathway.searchPathways(pathCriteria);
    }
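
To make the nesting idea concrete without a caBIO server, here is a small stand-alone sketch. The `Criteria` class is a hypothetical stand-in, not the real caBIO SearchCriteria API; it only mimics how criteria for one object type can hold criteria for another, and prints the resulting tree. The disease and organ values are made-up inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a caBIO SearchCriteria class (assumption, not the real API).
class Criteria {
    private final String objectType;
    private String name;                              // optional attribute constraint
    private final List<Criteria> nested = new ArrayList<>();

    Criteria(String objectType) { this.objectType = objectType; }

    Criteria setName(String name) { this.name = name; return this; }

    // Plays the role of putSearchCriteria(child, CriteriaElement.AND) in findPathway.
    Criteria and(Criteria child) { nested.add(child); return this; }

    // Renders the criteria tree as a readable expression, innermost criteria last.
    String describe() {
        List<String> parts = new ArrayList<>();
        if (name != null) parts.add("name=" + name);
        for (Criteria child : nested) parts.add(child.describe());
        if (parts.isEmpty()) return objectType;
        return objectType + "[" + String.join(" AND ", parts) + "]";
    }
}

class NestedCriteriaDemo {
    // Builds the same nesting order as findPathway: Disease and Organ inside
    // Histopathology, Histopathology inside Gene, Gene inside Pathway.
    static Criteria pathwayCriteria(String disease, String organ) {
        Criteria histo = new Criteria("Histopathology")
            .and(new Criteria("Disease").setName(disease))
            .and(new Criteria("Organ").setName(organ));
        Criteria gene = new Criteria("Gene").and(histo);
        return new Criteria("Pathway").and(gene);
    }

    public static void main(String[] args) {
        // Prints: Pathway[Gene[Histopathology[Disease[name=carcinoma] AND Organ[name=breast]]]]
        System.out.println(pathwayCriteria("carcinoma", "breast").describe());
    }
}
```

The outermost criteria object decides what comes back (Pathways); each nested level narrows the result through a related object type, which is why no SQL join has to be written by hand.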

findPathways: Query Results

Web Services: SOAP

http://cabio.nci.nih.gov/soap/services/index.html

SOAP API

Perl example:

    use SOAP::Lite;

    # Point the client at the caBIO gene service through the SOAP RPC router
    my $s = SOAP::Lite
        ->uri('urn:nci-gene-service')
        ->proxy('http://cabio.nci.nih.gov/soap/servlet/rpcrouter');

    # Search for genes by symbol and fetch the XML result
    my %searchCriteria = ();
    $searchCriteria{symbol} = "pTEN";
    my $som = $s->getGenes(SOAP::Data->type(map => \%searchCriteria));
    my $xmldoc = $som->result;

SOAP output with xlinks

    <?xml version="1.0" encoding="UTF-8" ?>
    <nci-core>
      <gov.nih.nci.caBIO.bean.Gene id="2221" xmlns:xlink="http://www.w3.org/1999/xlink/">
        <name>PTEN</name>
        <title>phosphatase and tensin homolog (mutated in multiple advanced cancers 1)</title>
        <dbCrossRefs>{LOCUS_LINK=5728, OMIM=601728, UNIGENE=10712}</dbCrossRefs>
        <Pathway xlink:href="http://lpgprot101.nci.nih.gov:5080/CORE/GetXML?operation=Pathway&GeneId=2221" />
        [Additional xlinks for ExpressionExperiment, Organ, Chromosome, GeneHomolog, Sequence, Gene Alias, Protein, SNP, and MapLocation]
      </gov.nih.nci.caBIO.bean.Gene>
      [2 Additional Genes with “PTEN” in their name]
      <searchResult>
        <hasMore>false</hasMore>
        <startsAt>1</startsAt>
        <endsAt>3</endsAt>
      </searchResult>
    </nci-core>
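
A client that receives this kind of response can collect the xlink references and follow them later to fetch related objects. The sketch below uses only the standard Java DOM parser; the sample document is a trimmed copy of the output above with the ampersand written as `&amp;` so that it parses as XML, and the class and method names are illustrative assumptions rather than part of caBIO.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XlinkExtractor {
    // Namespace URI exactly as declared in the sample response.
    static final String XLINK_NS = "http://www.w3.org/1999/xlink/";

    // Trimmed sample response (assumption: '&' escaped as '&amp;' in the raw XML).
    static final String SAMPLE =
        "<nci-core>"
      + "<gov.nih.nci.caBIO.bean.Gene id=\"2221\""
      + " xmlns:xlink=\"http://www.w3.org/1999/xlink/\">"
      + "<name>PTEN</name>"
      + "<Pathway xlink:href=\"http://lpgprot101.nci.nih.gov:5080/CORE/GetXML"
      + "?operation=Pathway&amp;GeneId=2221\"/>"
      + "</gov.nih.nci.caBIO.bean.Gene>"
      + "</nci-core>";

    // Returns one "ElementName -> URL" entry per element carrying an xlink:href.
    static List<String> extractLinks(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // needed to resolve the xlink prefix
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> links = new ArrayList<>();
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (element.hasAttributeNS(XLINK_NS, "href")) {
                    links.add(element.getLocalName() + " -> "
                        + element.getAttributeNS(XLINK_NS, "href"));
                }
            }
            return links;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        for (String link : extractLinks(SAMPLE)) {
            System.out.println(link);
        }
    }
}
```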

SOAP with returnHeavyXML

Data is now returned in full. Pathway object snippet:

    <gov.nih.nci.caBIO.bean.Pathway id="92">
      <name>ptenPathway</name>
      <displayValue>PTEN Dependent Cell Cycle Arrest and Apoptosis</displayValue>
      <pathwayDiagram>ptenPathway.svg</pathwayDiagram>
    </gov.nih.nci.caBIO.bean.Pathway>

HTTP API

Direct access to XML-formatted data via URLs:

http://cabio.nci.nih.gov/servlet/GetXML?operation=Gene&Symbol=pTEN

The URL parts, in order: Method (GetXML), Search (operation=Gene), Parameter (Symbol), Parameter Value (pTEN)

HTTP API

Direct access to SVG-formatted data via URLs:

http://cabio.nci.nih.gov:80/servlet/GetSVG?operation=Pathway&name=g2Pathway&GeneInfoLocation=/servlet/GetXML?operation=Gene&ielikes=.svg

URL parts labeled on the slide: Method, Search, Parameter, Parameter Value
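
Because the HTTP API is plain URLs, a request can be assembled from its parts. The helper below is a minimal sketch, assuming the `servlet/<Method>?operation=<Object>&<Parameter>=<Value>` pattern seen in the GetXML example; the class and method names are hypothetical, and the code only builds the string without contacting the server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CabioUrlBuilder {
    // Base address taken from the GetXML example URL.
    static final String BASE = "http://cabio.nci.nih.gov/servlet/";

    // Builds BASE + method + "?operation=..." followed by the search parameters.
    static String build(String method, String operation, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        query.add("operation=" + encode(operation));
        for (Map.Entry<String, String> param : params.entrySet()) {
            query.add(encode(param.getKey()) + "=" + encode(param.getValue()));
        }
        return BASE + method + "?" + query;
    }

    // URL-encodes one name or value so that spaces and symbols are safe in a query.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Symbol", "pTEN");
        // Prints: http://cabio.nci.nih.gov/servlet/GetXML?operation=Gene&Symbol=pTEN
        System.out.println(build("GetXML", "Gene", params));
    }
}
```

A `LinkedHashMap` keeps the parameters in insertion order, so the generated URL matches the order in which the search parameters were supplied.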

BIOgopher

• BIOgopher enables a researcher to perform complex queries against caBIO data sources

• Researchers can:

– Provide local data

– Create custom queries

– Design custom reports

Importing Local Data

• Researchers can import local experiment data in spreadsheet format

• Researchers can leverage imported data during the session

• Researchers can include imported data in defining custom queries and reports

Creating a Query

• Researchers can create a query or access an existing query within a session

• Researchers specify the caBIO object that will be the subject of the query

Specifying Search Criteria

• Researchers can dynamically specify search criteria

– Attributes of caBIO objects related to the chosen subject can be selected as search criteria

– Local data can be fetched for inclusion as search criteria

• Researchers can browse caBIO data for inclusion in search criteria values

Creating a Report

• Researchers can create and format reports based on the selected search criteria

• Reports can be viewed and exported as a spreadsheet

BIOgopher Architectural Details

• Leveraged the Model-View-Controller 2 (MVC 2) architecture

– Abstracted the presentation layer from spreadsheet manipulation, meta-data retrieval, query design, and report generation

• Developed a server-side N-dimensional query builder

– An object-cube was leveraged in support of object-mining

Presentation Layer

• Leverages the Jakarta Struts Project

Spreadsheet Manipulation

• Leverages the Apache POI Project

Meta-Data Layer

• Leverages NCICB’s caDSR

Query Design

• Leverages Java Swing components for trees, nodes and tables

Ontology Interface

caBIO Kernel

• Facilitates the creation of a federation of caBIO servers to share information between local data sources and the NCICB caBIO server

– Leverages the JXTA protocol for peer-to-peer communication

[Diagram: a BIOgopher client, a proxy, a local caBIO server and the NCICB caBIO server, with Object/DB bridge, data map and DSI components. Numbered steps as labeled on the diagram:]

1. Sends query and user info

2. Parses query and DSI, and authenticates user

3. Passes query to NCICB server

4. Parses query and DSI and authenticates user (if any)

5. Queries persistence layer

6. Returns objects to requestor

7. Queries data map

8. Queries persistence layer

8. Returns results

Future

• caBIO "kernel"

• Object-level Security Module (fine grain)

• Standard LDAP authentication

• Vocabulary Object to extend the EVS API

• caDSR objects, API

• MAGE-OM, API

• Animal Models-OM, API

• Extend pathway object model to support KEGG and BioCarta interactions

• Analytical tool handling, e.g. BLAST

Future

• New Data Sources

– Proteins

• PDB, PIR, BioJava for protein data

• OMIM - For link from proteins to diseases

– Agents - Access agent data from EVS and DCP

– Pharmacokinetics

– Histology, Tissue/Organ - Leverage EVS vocabulary (currently LASH)

– PubMed

Acknowledgements

• NCICB

– Kenneth Buetow

– Peter Covitz

– Carl Schaefer

– Robert Clifford

– Mike Edmonson

– Frank Hartel

– Sherri DeCoronado

• SAIC

– Scott Gustafson

– Mike Connolly

– Joshua Phillips

• Kevric (documentation)

– Diane Zimmerman

Visit our new and improved web site:

http://ncicb.nci.nih.gov/core/caBIO
