The BioAssay Research Database: A Platform to Support the Collection, Management and Analysis of Chemical Biology Data. ACS National Meeting, New Orleans, April 7, 2013. @AskTheBARD http://bard.nih.gov
Page 1: The BioAssay Research Database

The BioAssay Research Database

A Platform to Support the Collection, Management and Analysis of Chemical Biology Data

ACS National Meeting, New Orleans, April 7, 2013

@AskTheBARD

http://bard.nih.gov

Page 2: The BioAssay Research Database

Direct Contributors

NIH Molecular Libraries – Glenn McFadden, Ajay Pillai

NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel Southall, Henrike Veith

Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva Dandapani, Andrea DeSouza, Dan Durkin, David Lahr, Jeri Levine, Judy McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil Walzer, Xiaorong Xiang

University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea, Larry Sklar, Oleg Ursu, Anna Waller, Jeremy Yang

University of Miami – Saminda Abeyruwan, Hande Küküc, Vance Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan Schürer, Uma Vempati, Ubbo Visser

Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley, Shaun Stauffer

Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass, Anthony Pinkerton, Derek Stonich

Scripps Research Institute – Yasel Cruz, Mark Southern

Page 3: The BioAssay Research Database

BARD: BioAssay Research Database

BARD’s mission is to enable novice and expert scientists to effectively utilize MLP data to generate new hypotheses.

•  Unique collaboration among NIH and academic centers with expertise in screening and software development
•  Developed as an open-source, industrial-strength platform to support public translational research
•  Provides an opportunity to address existing cheminformatics barriers
   o  Deploy predictive models
   o  Foster new methods to interpret chemical biology data
   o  Enable private data sharing
   o  Develop and adopt an Assay Data Standard with tools to:
      o  Annotate assays to minimum standards and definitions
      o  Integrate and extend existing ontologies for meaningful experiment descriptions
      o  Enable assay creation, registration and modification
   o  Provide an easy-to-use portal and an advanced desktop client

Page 4: The BioAssay Research Database

Engagement & Milestones

Summer 2011: MLP issues administrative supplement and call for proposals to create the Molecular Libraries Biological Database

January 2012: Inaugural meeting of MLPCN Stakeholders & NIH MLP PT

February 2012: Update on progress: data extraction & annotation, test platform selection, GUI design & test, Outreach

March 2012: BARD Program Kick-off

April 2012: Outreach strategy & tactic session at UNM w/ subteam

May – July 2012: Discussions with and reviews of Amgen, Vertex, Novartis, Sanofi assay registration and chem-bio information query systems

November 2012: Conducted multi-level usability interviews on BARD GUI & function w/ Dir. Computation, Informatics/Lab Mgr, TA Lead, Dir. Chem, Med chem, Db developer, Cmpd curator

January 2013: BARD Review by Ext. Sci Panel & Public alpha release (CAP, REST API, Web & Desktop clients)

March 2013: BARD limited beta-release – then transition to enabling science

Page 5: The BioAssay Research Database

BARD Technology Components

Define & Register Assays
•  Data Dictionary – std terms
•  Catalog of Assay Protocols

High Quality Data & Result Deposition
•  Calculations & Results
•  Project-experiment association

Query & Interpret Information
•  Intuitive Guided Queries
•  Cross Assay & SAR centric views
•  Advanced applications

Enable Hypothesis Generation (Novice to Expert)

Page 6: The BioAssay Research Database

Where Are We Today?

CAP, Data Dictionary, and Results Deposition

Data model created & populated

CAP UI with View and basic editing

Dictionary defined as OWL using Protégé

Annotations for 85% of MLPCN experiments & projects loaded via spreadsheet

~95% of PubChem result types mapped to BARD dictionary

~70% of PubChem columns mapped to BARD result types

Warehouse loaded with all PubChem AIDs and results

Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations

Manual annotation of AIDs ~70% completed by centers

Page 7: The BioAssay Research Database

The BARD Data Warehouse

•  Running on MySQL with replication
•  0.85 TB of data…
   – 151M result rows
   – 46M compound rows
•  Locally deployed at UNM
•  Planning to build better packaging
   – VM based deployment

Page 8: The BioAssay Research Database

Open Source As Far as Possible

[Architecture diagram: ETL, Database, Text Search Engine, Structure Search Engine and Caching Layer; Jersey webapps deployed on an HA application server cluster, served at http://bard.nih.gov/api]

Page 9: The BioAssay Research Database

The BARD Public API

•  Java, REST-like, read-only, deployed on Glassfish cluster
•  Different functionality hosted in different containers
   – Maintenance, security
   – Stability
   – Performance
•  Versioned
•  Fully documented

[Diagram: API in front of Text Search, Struct Search, Data Warehouse and Plugins]

Page 10: The BioAssay Research Database

API Resources

•  Extensive list of resources covering many data types
•  Each resource supports a variety of sub-resources
   – Usually linked to other resources

Page 11: The BioAssay Research Database

API Level of Detail

•  Supports different levels of detail
•  Allows clients to trade off detail for speed
•  Good for mobile apps

Page 12: The BioAssay Research Database

API Caching & Storage

•  Caching is enabled at resource level
•  The API supports ETags
   – Every request returns an ETag in the header
   – With If-None-Match, supports web caching
•  We also abuse ETags to support persistent references to collections
•  An ETag can refer to other ETags recursively
   – Allows clients to create and store arbitrarily complex collections
•  Not permanent, not infinite!
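The web-caching bullet above can be sketched from the client side. This is a minimal illustration of standard HTTP revalidation with `ETag` and `If-None-Match`, not BARD's own client code. The `ETagCache` class and the `fetch` callable (a stand-in for whatever HTTP library the client uses) are hypothetical names introduced here.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport signature (an assumption, not part of the BARD API):
# (url, request_headers) -> (status_code, response_headers, body)
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


class ETagCache:
    """Client-side cache that revalidates stored entries with If-None-Match."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, bytes]] = {}  # url -> (etag, body)

    def get(self, url: str) -> bytes:
        request_headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Ask the server to send the body only if the resource has changed.
            request_headers["If-None-Match"] = cached[0]

        status, response_headers, body = self._fetch(url, request_headers)

        if status == 304 and cached is not None:
            # 304 Not Modified: the stored copy is still valid.
            return cached[1]

        etag = response_headers.get("ETag")
        if etag is not None:
            # Remember the new representation for later revalidation.
            self._entries[url] = (etag, body)
        return body
```

A second `get` for the same URL sends the stored ETag, so an unchanged resource costs only a 304 response without a body. Because the slide notes that ETags are "not permanent, not infinite", a real client would also need to handle a stored reference expiring on the server.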

Page 13: The BioAssay Research Database

Annotating Data

[Diagram: BARD Assay Definition Hierarchy and BARD Dictionary & Term Hierarchy, annotated with Entrez, Uniprot, Gene Ontology, Disease Ontology, BioAssay Ontology, Unit Ontology and Chemical Ontology]

•  To best exploit the current data set, and encourage discoverability, we need to better structure the data
   – Annotate all assays to a minimum standard
   – Integrate and extend existing ontologies to support meaningful experiment descriptions
   – Develop processes and tools to enable assay registration

Page 14: The BioAssay Research Database

(Pseudo) Linked Data

•  Full text search enabled by Solr
   – Enables filtering, faceting, auto-suggest
   – Key entry point for users
   – Type ahead suggestions provide guidance
•  By virtue of manual associations of data types, we enable “linked data”
   – Allows searches to indicate what matched the query and how
   – Solr supports sophisticated scoring schemes
•  Doesn’t yet take advantage of ontologies
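To make the filtering and faceting bullets concrete, here is a small sketch that builds a query string from standard Solr request parameters (`q`, `fq`, `facet`, `facet.field`, `rows`). The helper name `solr_query_string` and the field name `detection_method` are assumptions for illustration only; the BARD Solr schema is not shown in the slides.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import urlencode


def solr_query_string(
    text: str,
    filters: Optional[Dict[str, str]] = None,
    facet_fields: Sequence[str] = (),
    rows: int = 10,
) -> str:
    """Build a URL-encoded Solr query string (illustrative helper)."""
    params = [("q", text), ("rows", str(rows))]

    # Each filter becomes its own fq clause, which narrows the result set
    # without affecting relevance scoring. Values containing double quotes
    # would need escaping; that is left out of this sketch.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))

    # Faceting asks Solr to return per-value counts for the named fields.
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)

    return urlencode(params)


# Hypothetical field name, chosen only to mirror "detection method type".
query = solr_query_string(
    "kinase",
    filters={"detection_method": "fluorescence"},
    facet_fields=["detection_method"],
)
```

The resulting string would be appended to a Solr select endpoint. The facet counts in the response are what a client could use to render filter checkboxes alongside the hits.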

Page 15: The BioAssay Research Database

Desktop Client

•  Support large datasets
•  Merge private & public data
•  Examine SAR

Page 16: The BioAssay Research Database

Web Client

Filter on annotations, such as detection method type

Google-like searching of: 4,000+ assays, 35M+ compounds, 300+ projects

Save items of interest for further analysis

Amazon-like Query Cart

Page 17: The BioAssay Research Database

Community Engagement

•  Sustained outreach efforts
   – 7 MLPCN sites participating
•  Facilitate access, driven by compelling use-cases and stakeholder feedback
   – Assay definition standard is a collaboration with industrial partners in addition to MLPCN
•  Publish APIs for data access, first-adopters
•  A ‘BARD App Store’: Enabling new approaches to data integration, mining
   – Promiscuity calculations
   – CYP450 prediction

Page 18: The BioAssay Research Database

Extending BARD with Plugins

•  BARD supports deployment of external code as part of core API
•  Plugins can access the data warehouse via direct calls
   – No need to go via REST API
•  Plugin resources can accept anything
   – Text, JSON, files, links, …
•  Plugin responses can be anything
   – Plain text, JSON, HTML, SVG, …

Page 19: The BioAssay Research Database

BARD Plugin Development

Plugins have to be deployable on the JVM

Page 20: The BioAssay Research Database

BARD - SMARTCyp

•  Predicts site of metabolism by CYP450 isoforms using 2D structures
•  Developed by Patrik Rydberg and co-workers
•  Released under LGPL
•  BARD plugin exposes two resources
   – Summary HTML view
   – Data view (JSON)

Page 21: The BioAssay Research Database

BARD - SMARTCyp

P. Rydberg et al, http://www.farma.ku.dk/smartcyp/

Page 22: The BioAssay Research Database

BARD - BADAPPLE

•  BioActivity Data Associative Promiscuity Pattern Learning Engine

•  Associations via scaffolds for chemical space navigation.

Example URI* and description

<base>/badapple/prom/cid/752424
   For compound with specified ID, return scaffold IDs and scores.

<base>/badapple/prom/cid/752424?expand=true
   Additional statistics, scaffold smiles, and inDrug flag.

<base>/badapple/prom/scafid/233
   For scaffold with specified ID, return statistics and smiles.
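The three example URIs follow one pattern, which a client could assemble as below. This is only a sketch: the function name `badapple_uri` is hypothetical, and `<base>` is left as a caller-supplied value because the slide does not spell it out. The slide shows `expand=true` only on the compound (`cid`) resource, so whether the scaffold resource accepts it is not confirmed.

```python
from urllib.parse import urlencode


def badapple_uri(base: str, kind: str, identifier: int, expand: bool = False) -> str:
    """Assemble a BADAPPLE plugin URI of the form shown on the slide.

    kind is "cid" (compound ID) or "scafid" (scaffold ID); base is whatever
    the deployment uses for <base>, which is not given in the slides.
    """
    if kind not in ("cid", "scafid"):
        raise ValueError(f"unsupported identifier kind: {kind!r}")

    uri = f"{base.rstrip('/')}/badapple/prom/{kind}/{identifier}"
    if expand:
        # The slide demonstrates this flag only for the cid resource.
        uri += "?" + urlencode({"expand": "true"})
    return uri


# "<base>" is kept as a literal placeholder, exactly as on the slide.
compact = badapple_uri("<base>", "cid", 752424)
detailed = badapple_uri("<base>", "cid", 752424, expand=True)
scaffold = badapple_uri("<base>", "scafid", 233)
```

Requesting the compact form first and the expanded form only when the extra statistics are needed follows the "trade off detail for speed" idea from the API Level of Detail slide.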

Page 23: The BioAssay Research Database

On the Horizon


•  Reproducibility
   – Be honest with me …
•  Private data in the context of public data
   – Local installs, molecule hashes
•  Mobile
   – Compounds as funny looking QR tags

Page 24: The BioAssay Research Database

Long-Term Path Forward

•  BARD is not just a data store – it’s a platform
   – Seamlessly interact with users’ preferred tools
   – Allows the community to tailor it to their needs
   – Serve as a meeting ground for experimental and computational methods
   – Enhance collaboration opportunities
   – Consider cloud deployment
•  Enhance the ability to translate data from individual experiments to systems level insight