cancer Bioinformatics Infrastructure Objects (caBIO): Providing Innovative and Integrative Informatics Solutions. Himanso Sahni (SAIC), Sharon Settnek (SAIC)
Dec 18, 2015
caBIO
• The cancer Bioinformatics Infrastructure Objects (caBIO) is an infrastructure which integrates internal and publicly available bioinformatics data spanning multiple scientific disciplines
• caBIO objects simulate the behavior of actual bioinformatics components such as genes, chromosomes, sequences, ontologies, trials, agents, etc.
• caBIO provides access to a variety of bioinformatics data sources, including UniGene, HomoloGene, LocusLink, RefSeq, BioCarta, GoldenPath (via DAS), and NCICB’s CGAP (Cancer Genome Anatomy Project) and GAI (Genetic Annotation Initiative) data repositories
• caBIO is “open source” and provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concerns for implementation details and data management
caBIO Object Model
Model Extensions
• Clinical Protocols: a clinical protocols object model facilitates the integration of clinical data with genomic data
• Animal Models: an animal models object model supports queries between human and animal models of cancer
MAGE-OM Extension
• caBIO is currently being extended to include the MAGE-OM object model
• Microarray data from the NCICB Gene Expression Data Portal (GEDP) will be retrieved via the caBIO MAGE API
caBIO APIs
• A Java API is available for Java programmers
• A Simple Object Access Protocol (SOAP) API is provided for non-Java programmers
• An HTTP API is also available
– Developers can request XML or HTML (via XSLT)
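The "HTML (via XSLT)" option above can be illustrated with the standard javax.xml.transform API. This is only a sketch of the general technique: the XML payload, the element names, and the style sheet are made up for the example and are not caBIO's actual documents.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToHtmlSketch {
    // Hypothetical XML payload; element names are illustrative only.
    static final String GENE_XML =
        "<genes><gene><name>PTEN</name></gene></genes>";

    // Hypothetical style sheet that renders each gene name as a list item.
    static final String GENE_XSL =
        "<xsl:stylesheet version=\"1.0\" "
      + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/genes\"><ul>"
      + "<xsl:for-each select=\"gene\">"
      + "<li><xsl:value-of select=\"name\"/></li>"
      + "</xsl:for-each>"
      + "</ul></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the style sheet to the XML and returns the rendered markup.
    static String toHtml(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(GENE_XML, GENE_XSL));
    }
}
```

The same XML can feed several style sheets, which is presumably why the deck lists XML and HTML as two views of one HTTP API.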
caBIO Applications
Cancer Molecular Analysis Project (CMAP)
Powered by caBIO!
Molecular Targets
• A collection of genes organized by pathways can be displayed facilitating the evaluation of anomalies
Targeted Agents
• Researchers can retrieve information about agents linked to multiple targets and contexts
Clinical Trials
• Researchers can view detailed information about therapeutic trials associated with histology types and agents
• A Clinical Protocols Portal is available to allow researchers to search and submit clinical protocols affiliated with Specialized Programs of Research Excellence (SPOREs)
caBIO Architecture
• caBIO was designed using a J2EE architecture with client interfaces, server components, back-end objects and data sources
• Clients (browsers, applications) can receive information (HTML and XML) from back-end objects over HTTP
• Client applications can also communicate with back-end objects via Java RMI (Java applications)
• Non-Java based applications can communicate via SOAP or HTTP
• Server components communicate with back-end objects via Java RMI
• Back-end objects communicate directly with data sources (database, URLs, flat files)
• caBIO web services can be advertised to facilitate information sharing
– RDF can be used to advertise content to crawlers and agents
– A UDDI registry may be configured to advertise services
– caBIO services can be advertised via bioMOBY central
caBIO Architecture
[Diagram with four tiers: Clients, Presentation Layer, Object Layer, Data Sources.
Clients: Browsers, Java Apps, Other Apps; connections labeled HTML/HTTP, XML/HTTP, SOAP, RMI.
Presentation Layer: Web Server, Servlet Container, JSPs, Servlets, UI Bean, XML Builder, XSLT Engine, SOAP Engine, XML Docs, DTDs, XSL Style Sheet.
Object Layer: Domain Objects (Genes, Chromosomes, Libraries, Tissues, Clusters, Agents, Sequences, Diseases, Other), Object Managers, Data Access Objects; connections labeled JDBC, HTTP, FTP, RDF.
Data Sources: External Databases, URLs, Flat Files, caDSR, EVS.]
Data Sources
[Diagram: caBIO at the center, connected to External Public Databases.
Sources shown: CGAP Database, RefSeq, UniGene, LocusLink, HomoloGene, UCSC GoldenPath (via DAS), BioCarta, CGAP/GAI, CTEP/SPOREs, GO.
Content labels shown: Gene Annotations; Reference Sequences; Genes, Sequences; Chromosomes; Homologs; Pathways; Gene Loci, LocusLink Summaries; SNPs; Trials; Gene Ontology.]
caBIO Benefits
• Provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concerns for implementation details
• Lets developers obtain the information they need from a variety of data sources without handling data management themselves
• Manages the display of large volumes of data to assist in load balancing
• Provides an effective mechanism for performing complex queries that rely on diverse data sources
• Facilitates information sharing without managing linkages between multiple data sources
caBIO Usage
Facilitates solving complex queries such as:
Find me the Pathways, with Genes that are expressed in Tissues with a particular Histopathology that includes a particular Organ and a particular Disease.
Java Packages
• gov.nih.nci.caBIO.bean
– Contains domain objects to access genomic and biomedical components
• gov.nih.nci.caBIO.util.das
– Primary interface to the UCSC DAS
– Uses JAXB to convert DAS DTDs to objects
• gov.nih.nci.caBIO.evs
– Provides synonym search and concept based search to the NCI’s Enterprise Vocabulary System (EVS)
• gov.nih.nci.caBIO.webservices
– Provides access to caBIO via SOAP
• gov.nih.nci.caBIO.servlet
– Provides access to caBIO via HTTP
• gov.nih.nci.caBIO.util
– Provides interface to caBIO utilities
Java API
Gene myGene = new Gene();
GeneSearchCriteria criteria = new GeneSearchCriteria();
criteria.setSymbol("pTEN");
SearchResult result = myGene.search(criteria);
Gene[] genes = (Gene[]) result.getResultSet();
• Domain objects have companion SearchCriteria objects
• caBIO supports nested SearchCriteria: SearchCriteria from one object type can be fed as parameters into SearchCriteria of another type
• Complex queries without any SQL
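To make the nesting idea concrete, here is a small in-memory sketch of how a criteria object for one type might accept a criteria object for a related type. Every class here (SketchGene, SketchPathway, SketchGeneCriteria, SketchPathwayCriteria) is a hypothetical stand-in written for this example; none of it is the caBIO implementation, and the placeholder gene symbol is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in domain types for illustration only; NOT the caBIO beans.
class SketchGene {
    final String symbol;
    SketchGene(String symbol) { this.symbol = symbol; }
}

class SketchPathway {
    final String name;
    final List<SketchGene> genes;
    SketchPathway(String name, SketchGene... genes) {
        this.name = name;
        this.genes = Arrays.asList(genes);
    }
}

// Companion criteria for genes: matches on symbol, ignoring case.
class SketchGeneCriteria {
    private String symbol;
    void setSymbol(String symbol) { this.symbol = symbol; }
    boolean matches(SketchGene gene) {
        return symbol == null || symbol.equalsIgnoreCase(gene.symbol);
    }
}

// Companion criteria for pathways: accepts a nested gene criteria.
class SketchPathwayCriteria {
    private SketchGeneCriteria geneCriteria;
    void putSearchCriteria(SketchGeneCriteria nested) { this.geneCriteria = nested; }
    boolean matches(SketchPathway pathway) {
        if (geneCriteria == null) {
            return true; // no nested constraint, so every pathway qualifies
        }
        for (SketchGene gene : pathway.genes) {
            if (geneCriteria.matches(gene)) {
                return true; // one matching gene is enough
            }
        }
        return false;
    }
}

class NestedCriteriaSketch {
    // Returns the names of all pathways that satisfy the criteria.
    static List<String> search(List<SketchPathway> all, SketchPathwayCriteria criteria) {
        List<String> names = new ArrayList<>();
        for (SketchPathway pathway : all) {
            if (criteria.matches(pathway)) {
                names.add(pathway.name);
            }
        }
        return names;
    }

    // Builds a tiny data set and asks for pathways containing gene "pTEN".
    static List<String> demo() {
        List<SketchPathway> all = Arrays.asList(
            new SketchPathway("ptenPathway", new SketchGene("PTEN")),
            new SketchPathway("g2Pathway", new SketchGene("PLACEHOLDER1")));
        SketchGeneCriteria geneCriteria = new SketchGeneCriteria();
        geneCriteria.setSymbol("pTEN");
        SketchPathwayCriteria pathCriteria = new SketchPathwayCriteria();
        pathCriteria.putSearchCriteria(geneCriteria);
        return search(all, pathCriteria);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The caller never writes a join: the relationship between pathway and gene lives inside the criteria object, which is the point the slide makes about avoiding SQL.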
Traverse Relationships in Model
[Diagram: object model path linking Disease, Organ, Histopathology, Genes and Pathways, with INPUT and OUTPUT markers]
Find me the Pathways, with Genes that are expressed in Tissues with a particular Histopathology that includes a particular Organ and a particular Disease.
findPathway
Input disease, organ; create SearchCriteria objects:
public Pathway[] findPathway(String disease, String organ) {
    DiseaseSearchCriteria diseaseCriteria = new DiseaseSearchCriteria();
    OrganSearchCriteria organCriteria = new OrganSearchCriteria();
    HistopathologySearchCriteria histoCriteria = new HistopathologySearchCriteria();
    GeneSearchCriteria geneCriteria = new GeneSearchCriteria();
    PathwaySearchCriteria pathCriteria = new PathwaySearchCriteria();
findPathway
Nest the SearchCriteria, then do the search:
    diseaseCriteria.setName(disease);
    organCriteria.setName(organ);
    histoCriteria.putSearchCriteria(diseaseCriteria, CriteriaElement.AND);
    histoCriteria.putSearchCriteria(organCriteria, CriteriaElement.AND);
    geneCriteria.putSearchCriteria(histoCriteria, CriteriaElement.AND);
    pathCriteria.putSearchCriteria(geneCriteria, CriteriaElement.AND);
    Pathway myPathway = new Pathway();
    return myPathway.searchPathways(pathCriteria);
}
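The nesting that findPathway builds can be pictured as a tree of criteria. The sketch below models that tree with a hypothetical CriteriaNode class (an assumption for illustration, not a caBIO type) and prints its shape, which may help when checking that a query is nested in the intended order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a SearchCriteria object; not part of caBIO.
class CriteriaNode {
    private final String objectType;
    private String name; // optional attribute constraint
    private final List<CriteriaNode> nested = new ArrayList<>();

    CriteriaNode(String objectType) { this.objectType = objectType; }

    void setName(String name) { this.name = name; }

    // Mirrors the AND-only nesting used in findPathway.
    void putSearchCriteria(CriteriaNode child) { nested.add(child); }

    // Renders this node and everything nested beneath it.
    String describe() {
        StringBuilder sb = new StringBuilder(objectType);
        if (name != null) {
            sb.append("[name=").append(name).append(']');
        }
        if (!nested.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < nested.size(); i++) {
                if (i > 0) {
                    sb.append(" AND ");
                }
                sb.append(nested.get(i).describe());
            }
            sb.append(')');
        }
        return sb.toString();
    }
}

class FindPathwayTreeSketch {
    // Builds the same nesting order as findPathway and returns its outline.
    static String build(String disease, String organ) {
        CriteriaNode diseaseCriteria = new CriteriaNode("Disease");
        diseaseCriteria.setName(disease);
        CriteriaNode organCriteria = new CriteriaNode("Organ");
        organCriteria.setName(organ);

        CriteriaNode histoCriteria = new CriteriaNode("Histopathology");
        histoCriteria.putSearchCriteria(diseaseCriteria);
        histoCriteria.putSearchCriteria(organCriteria);

        CriteriaNode geneCriteria = new CriteriaNode("Gene");
        geneCriteria.putSearchCriteria(histoCriteria);

        CriteriaNode pathCriteria = new CriteriaNode("Pathway");
        pathCriteria.putSearchCriteria(geneCriteria);
        return pathCriteria.describe();
    }

    public static void main(String[] args) {
        System.out.println(build("someDisease", "someOrgan"));
    }
}
```

Reading the outline from the inside out follows the INPUT to OUTPUT direction of the query: Disease and Organ constrain Histopathology, which constrains Gene, which constrains Pathway.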
findPathways: Query Results
Web Services: SOAP
http://cabio.nci.nih.gov/soap/services/index.html
SOAP API
Perl Example
use SOAP::Lite;
# Bind to the gene service namespace and the SOAP endpoint
my $s = SOAP::Lite
    ->uri('urn:nci-gene-service')
    ->proxy("http://cabio.nci.nih.gov/soap/servlet/rpcrouter");
# Search criteria are passed as a map of attribute names to values
my %searchCriteria = ();
$searchCriteria{symbol} = "pTEN";
my $som = $s->getGenes(SOAP::Data->type(map => \%searchCriteria));
my $xmldoc = $som->result;
SOAP output with xlinks
<?xml version="1.0" encoding="UTF-8" ?>
<nci-core>
  <gov.nih.nci.caBIO.bean.Gene id="2221" xmlns:xlink="http://www.w3.org/1999/xlink/">
    <name>PTEN</name>
    <title>phosphatase and tensin homolog (mutated in multiple advanced cancers 1)</title>
    <dbCrossRefs>{LOCUS_LINK=5728, OMIM=601728, UNIGENE=10712}</dbCrossRefs>
    <Pathway xlink:href="http://lpgprot101.nci.nih.gov:5080/CORE/GetXML?operation=Pathway&GeneId=2221" />
    [Additional xlinks for ExpressionExperiment, Organ, Chromosome, GeneHomolog, Sequence, Gene Alias, Protein, SNP, and MapLocation]
  </gov.nih.nci.caBIO.bean.Gene>
  [2 Additional Genes with “PTEN” in their name]
  <searchResult>
    <hasMore>false</hasMore>
    <startsAt>1</startsAt>
    <endsAt>3</endsAt>
  </searchResult>
</nci-core>
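A client receiving output like this would typically collect the xlink:href values to fetch related objects, and check hasMore to decide whether to page. The sketch below does both with the JDK's DOM parser. The sample document is a trimmed, well-formed stand-in for the response shown on the slide: the host is replaced with example.org, the ampersand is escaped, and endsAt is adjusted to match the single gene included.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XlinkSketch {
    // Namespace URI exactly as it appears in the slide's sample output.
    static final String XLINK_NS = "http://www.w3.org/1999/xlink/";

    // Trimmed stand-in for a response; host and counts are illustrative.
    static final String SAMPLE =
        "<nci-core>"
      + "<gov.nih.nci.caBIO.bean.Gene id=\"2221\" "
      + "xmlns:xlink=\"http://www.w3.org/1999/xlink/\">"
      + "<name>PTEN</name>"
      + "<Pathway xlink:href=\"http://example.org/GetXML?"
      + "operation=Pathway&amp;GeneId=2221\"/>"
      + "</gov.nih.nci.caBIO.bean.Gene>"
      + "<searchResult><hasMore>false</hasMore>"
      + "<startsAt>1</startsAt><endsAt>1</endsAt></searchResult>"
      + "</nci-core>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getAttributeNS
            return factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    // Collects every xlink:href attribute in document order.
    static List<String> hrefs(String xml) {
        List<String> links = new ArrayList<>();
        NodeList all = parse(xml).getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (element.hasAttributeNS(XLINK_NS, "href")) {
                links.add(element.getAttributeNS(XLINK_NS, "href"));
            }
        }
        return links;
    }

    // Reads the paging flag from the searchResult block.
    static boolean hasMore(String xml) {
        NodeList flags = parse(xml).getElementsByTagName("hasMore");
        return flags.getLength() > 0
            && Boolean.parseBoolean(flags.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        System.out.println(hrefs(SAMPLE));
        System.out.println(hasMore(SAMPLE));
    }
}
```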
SOAP with returnHeavyXML
Data is now returned in full. Pathway object snippet:
<gov.nih.nci.caBIO.bean.Pathway id="92">
  <name>ptenPathway</name>
  <displayValue>PTEN Dependent Cell Cycle Arrest and Apoptosis</displayValue>
  <pathwayDiagram>ptenPathway.svg</pathwayDiagram>
</gov.nih.nci.caBIO.bean.Pathway>
HTTP API
Direct access to XML-formatted data via URLs:
http://cabio.nci.nih.gov/servlet/GetXML?operation=Gene&Symbol=pTEN
[URL parts labeled on the slide: Method, Search Parameter, Parameter Value]
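A client can assemble such a URL from an operation name and search parameters. The sketch below only builds the string and does not contact the server; the helper names (buildUrl, geneBySymbol) are made up for this example, and only the operation and Symbol parameters shown above are taken from the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class GetXmlUrlSketch {
    // Base address as shown on the slide.
    static final String BASE = "http://cabio.nci.nih.gov/servlet/GetXML";

    // Appends the operation and each search parameter as a query string.
    static String buildUrl(String operation, Map<String, String> params) {
        StringBuilder url = new StringBuilder(BASE)
            .append("?operation=").append(encode(operation));
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append('&').append(encode(param.getKey()))
               .append('=').append(encode(param.getValue()));
        }
        return url.toString();
    }

    // Percent-encodes a value so special characters survive in the URL.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Convenience wrapper for the gene-by-symbol query from the slide.
    static String geneBySymbol(String symbol) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Symbol", symbol);
        return buildUrl("Gene", params);
    }

    public static void main(String[] args) {
        System.out.println(geneBySymbol("pTEN"));
    }
}
```

A LinkedHashMap keeps parameters in insertion order, so the generated URL is stable and easy to compare against the examples in the deck.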
HTTP API
Direct access to SVG-formatted data via URLs:
http://cabio.nci.nih.gov:80/servlet/GetSVG?operation=Pathway&name=g2Pathway&GeneInfoLocation=/servlet/GetXML?operation=Gene&ielikes=.svg
[URL parts labeled on the slide: Method, Search Parameter, Parameter Value]
BIOgopher
• BIOgopher enables a researcher to perform complex queries against caBIO data sources
• Researchers can:
– Provide local data
– Create custom queries
– Design custom reports
Importing Local Data
• Researchers can import local experiment data in spreadsheet format
• Researchers can leverage imported data during the session
• Researchers can include imported data in defining custom queries and reports
Creating a Query
• Researchers can create a query or access an existing query within a session
• Researchers specify the caBIO object that will be the subject of the query
Specifying Search Criteria
• Researchers can dynamically specify search criteria
– Attributes of caBIO objects related to the chosen subject can be selected as search criteria
– Local data can be fetched for inclusion as search criteria
• Researchers can browse caBIO data for inclusion in search criteria values
Creating a Report
• Researchers can create and format reports based on the selected search criteria
• Reports can be viewed and exported as a spreadsheet
BIOgopher Architectural Details
• Leveraged the Model-View-Controller 2 (MVC 2) architecture
– Abstracted the presentation layer from spreadsheet manipulation, meta-data retrieval, query design, and report generation
• Developed a server-side N-dimensional query builder
– An object-cube was leveraged in support of object-mining
Presentation Layer
• Leverages the Jakarta Struts Project
Spreadsheet Manipulation
• Leverages the Apache POI Project
Meta-Data Layer
• Leverages NCICB’s caDSR
Query Design
• Leverages Java Swing components for trees, nodes and tables
Ontology Interface
caBIO Kernel
• Facilitates the creation of a federation of caBIO servers to share information between local data sources and the NCICB caBIO server
– Leverages the JXTA protocol for peer-to-peer communication
[Diagram components: BIOgopher Client, Proxy, Local caBIO Server, NCICB caBIO Server, DSI, Data map, Object/DB bridge. Numbered steps as labeled on the slide:]
1. Sends query and user info
2. Parses query and DSI, and authenticates user
3. Passes query to NCICB server
4. Parses query and DSI and authenticates user (if any)
5. Queries persistence layer
6. Returns objects to requestor
7. Queries data map
8. Queries persistence layer
8. Returns results
Future
• caBIO "kernel"
• Object-level Security Module (fine grain)
• Standard LDAP authentication
• Vocabulary Object to extend the EVS API
• caDSR objects, API
• MAGE-OM, API
• Animal Models-OM, API
• Extend pathway object model to support KEGG and BioCarta interactions
• Analytical tool handling, e.g. BLAST
Future
• New Data Sources
– Proteins
  • PDB, PIR, BioJava for protein data
  • OMIM - For link from proteins to diseases
– Agents - Access agent data from EVS and DCP
– Pharmacokinetics
– Histology, Tissue/Organ - Leverage EVS vocabulary (currently LASH)
– PubMed
Acknowledgements
• NCICB
– Kenneth Buetow
– Peter Covitz
– Carl Schaefer
– Robert Clifford
– Mike Edmonson
– Frank Hartel
– Sherri DeCoronado
• SAIC
– Scott Gustafson
– Mike Connolly
– Joshua Phillips
• Kevric (documentation)
– Diane Zimmerman
Visit our new and improved web site:
http://ncicb.nci.nih.gov/core/caBIO