Mats Dahlberg Research Informatics iNovacia AB, Sweden ChemAxon UGM, Budapest June 7 2006 BeeHive a datamining tool at Biovitrum and iNovacia
Page 1

Mats Dahlberg

Research Informatics

iNovacia AB, Sweden

ChemAxon UGM, Budapest June 7 2006

BeeHive, a datamining tool at

Biovitrum and iNovacia

Page 2

Research Informatics’ Philosophy

• All data in Oracle
  – Safe, pharma industry standard (e.g. many chemical cartridges: ChemAxon, MDL, Accelrys, ...)
  – "Data is our asset. Programs come and go."

• Integration through the database layer
  – ...but hidden from the users. Multiple front-ends allowed

• Applications rapidly adapted to users' needs
  – Close connection between developers and users
  – Workflow support requires full control over the code

• Unorthodox solutions are allowed
  – Sometimes quick and dirty development
  – Sometimes unstable code (but usually fixed quickly...)
  – Sometimes non-standard technical platform (e.g. Bee language)

Page 3

BeeHive

• Function
  – Main repository for ALL research data (almost)
  – Used by all project teams
  – Technical platform for various modules

• Features
  – Advanced on-the-fly join of DB tables
  – Versatile handling of lists (compounds, batches, projects ...) and queries
  – Data grouping ("One-line-per-compound")
  – Fully customisable through meta-data, easy to add new branches (CBT, ELN stats etc)
  – Structure searching through ChemAxon Oracle cartridge
  – Built on Bee language from MolSoft LLC, San Diego

• Status
  – Moved from MDL's cartridge in 2006
  – Business critical. Approx. 250 users throughout R&D

Page 4

The heart – just a SQL generator…

• Defines column types and cost for all joinable columns

• All possible joins are pre-calculated, a travelling salesman problem (more than 300 tables)
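
The slide gives no implementation details, so the following is only a minimal sketch of the idea of a cost-aware SQL generator. It assumes join metadata of the form "table pair, join column, cost" and uses Dijkstra's shortest-path search as a simpler stand-in for whatever route optimisation BeeHive actually performs. All table names, column names, and costs are hypothetical.

```python
import heapq

# Hypothetical join metadata: (table_a, table_b) -> (join column, cost).
JOINS = {
    ("compound", "batch"): ("compound_id", 1),
    ("batch", "assay_result"): ("batch_id", 2),
    ("compound", "assay_result"): ("compound_id", 5),
    ("assay_result", "target"): ("target_id", 1),
}


def build_graph(joins):
    """Turn the join metadata into an undirected adjacency list."""
    graph = {}
    for (a, b), (column, cost) in joins.items():
        graph.setdefault(a, []).append((b, column, cost))
        graph.setdefault(b, []).append((a, column, cost))
    return graph


def cheapest_join_path(graph, start, goal):
    """Dijkstra search; returns a list of (left, right, column) steps or None."""
    queue = [(0, start, [])]
    seen = set()
    while queue:
        cost, table, path = heapq.heappop(queue)
        if table == goal:
            return path
        if table in seen:
            continue
        seen.add(table)
        for neighbour, column, step_cost in graph.get(table, []):
            if neighbour not in seen:
                step = (table, neighbour, column)
                heapq.heappush(queue, (cost + step_cost, neighbour, path + [step]))
    return None


def generate_sql(start, path, columns):
    """Emit a SELECT statement that follows the chosen join path."""
    lines = ["SELECT " + ", ".join(columns), "FROM " + start]
    for left, right, column in path:
        lines.append(f"JOIN {right} ON {left}.{column} = {right}.{column}")
    return "\n".join(lines)


graph = build_graph(JOINS)
path = cheapest_join_path(graph, "compound", "target")
sql = generate_sql("compound", path, ["compound.compound_id", "target.target_id"])
```

With these made-up costs the route via batch (total cost 4) is preferred over the direct compound-to-assay_result join (total cost 6). Pre-calculating, as the slide describes, would amount to running the search once for every pair of tables and caching the results.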

Page 5

Meta data structure

• Define entities and clean up the dictionaries
  – Compound numbers, protein targets, batches, plasmids ...
  – One source for every entity: numbers can be validated, so no misspellings and improved data quality

• This is the core of integration, not a particular client or system

• None of this comes out-of-the-box!
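
As an illustration of the "one source for every entity" point, here is a minimal sketch of validating incoming identifiers against a single dictionary per entity type. The dictionary contents and identifier formats are hypothetical. In the setup described on the slides the dictionaries would live in Oracle rather than in memory.

```python
# Hypothetical entity dictionaries: one authoritative set per entity type.
ENTITY_DICTIONARY = {
    "compound": {"CPD-0001", "CPD-0002", "CPD-0003"},
    "target": {"TARGET-A", "TARGET-B"},
}


def validate(entity_type, values):
    """Split values into those present in the dictionary and those rejected."""
    known = ENTITY_DICTIONARY[entity_type]
    valid = [v for v in values if v in known]
    rejected = [v for v in values if v not in known]
    return valid, rejected


# "CPD-0O02" contains a letter O instead of a zero, a typical misspelling.
valid, rejected = validate("compound", ["CPD-0001", "CPD-0O02"])
```

Rejecting unknown identifiers at entry time is what keeps misspelled compound or target names out of the downstream tables.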

[Figure: a cross-database client, Prog 1 and Prog 2 on top of a backbone connecting "Chemistry" and "Biology". Labels in the diagram: Assays, Batch, pBV, Targets, Genes, ChemSpec, ActivityBase, BVT cpnd, BioPhysProp, Decisions, CIMS, bCOOL.]

Example from Biovitrum

Page 6

BeeHive Overview

Activity, solubility, chemist etc

Query builder with structural searching

Navigate through all tables

Page 7

Query builder

• All unique values in drop-down lists

• No hard-coded values

• Easy to spot errors

Page 8

Extraction of data for SAR analysis

• One compound per line

• Average IC50 and SD values

• Hill number from ActivityBase

• Structure pop-up window
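
The "one compound per line" grouping can be sketched as follows. This is only an illustration under stated assumptions: the input is a list of (compound id, IC50) pairs with hypothetical identifiers, the average is a plain arithmetic mean, and SD is the sample standard deviation. The slides do not say which averaging BeeHive uses.

```python
from collections import defaultdict
from statistics import mean, stdev


def one_line_per_compound(rows):
    """Collapse repeated measurements into one summary row per compound.

    rows: iterable of (compound_id, ic50) pairs.
    Returns a list of (compound_id, mean_ic50, sd_or_None, n), sorted by id.
    """
    grouped = defaultdict(list)
    for compound_id, ic50 in rows:
        grouped[compound_id].append(ic50)

    summary = []
    for compound_id in sorted(grouped):
        values = grouped[compound_id]
        # A standard deviation needs at least two measurements.
        sd = stdev(values) if len(values) > 1 else None
        summary.append((compound_id, mean(values), sd, len(values)))
    return summary


# Hypothetical measurements, in arbitrary concentration units.
measurements = [("CPD-1", 10.0), ("CPD-1", 14.0), ("CPD-2", 3.0)]
summary = one_line_per_compound(measurements)
```

Reporting the number of measurements next to the mean and SD makes it visible when a value rests on a single experiment.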

Page 9

Systems and applications: BeeHive Modules That Use JChem

• CIMS
  – Chemical Inventory Management System
  – Keeps track of all chemicals (bottle history, location, risk phrases etc)
  – Replaced previous MDL system
  – Fully barcoded (bottles, shelves, people...)
  – Has improved compliance, reagent availability and speed of inventory work

• Reagent Search
  – ACX database of chemical catalogues from CambridgeSoft
  – Cross-linked to CIMS
  – "Give me all amines under 250 Da and show in-house on top of the list"
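
The example query above ("all amines under 250 Da, in-house on top") can be sketched as a filter followed by a two-key sort. The record layout is hypothetical. In particular, the is_amine flag is assumed to have been computed beforehand (for instance by a substructure search) and in_house is assumed to come from the cross-link to the inventory.

```python
from dataclasses import dataclass


@dataclass
class Reagent:
    name: str
    mol_weight: float  # molecular weight in daltons
    is_amine: bool     # assumed precomputed, e.g. by a substructure search
    in_house: bool     # assumed to come from the inventory cross-link


def search_reagents(catalogue, max_weight=250.0):
    """Return amines lighter than max_weight, in-house reagents first."""
    hits = [r for r in catalogue if r.is_amine and r.mol_weight < max_weight]
    # False sorts before True, so "not in_house" puts in-house reagents on top;
    # ties are broken by ascending molecular weight.
    return sorted(hits, key=lambda r: (not r.in_house, r.mol_weight))


# Small made-up catalogue; the in-house flags are invented for illustration.
catalogue = [
    Reagent("aniline", 93.13, True, False),
    Reagent("benzylamine", 107.15, True, True),
    Reagent("benzene", 78.11, False, True),
    Reagent("hypothetical heavy amine", 300.0, True, True),
]
results = search_reagents(catalogue)
```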

Page 10

Reagent searching

Page 11

Systems and applications: BeeHive Modules (cont'd)

• ChemSpec
  – Registration of all new compounds
  – Structure based logic for new compounds and batches
  – BVT (iNo) number assignment
  – Connection point for analytical data and requests
  – Used by all medicinal and analytical chemists

Page 12

Page 13

What is next on the list?

• JChem calculated properties on all molecule databases
  – pKa, logP, logD, ...

• Generation of diverse screening sets on the fly (BCUT?)

• ...

Page 14

Summary - informatics

• Data sharing is crucial

• Excel is not enough!

• No database, no modelling

• Each organisation must define its meta data

• You need a database administrator

• Define the data structure first; applications can be improved gradually