
ICIC 2013 Conference Proceedings Sebastian Radestock

May 10, 2015


Making hidden data discoverable: How to build effective drug discovery engines?
Sebastian Radestock (Elsevier, Germany)
In a complex IT environment comprising dozens if not hundreds of databases, and likely as many user interfaces, it becomes difficult if not impossible to find all the relevant information needed to make informed decisions. Historical data get lost, non-normalized data cannot be compared, and maintenance becomes a nightmare. We will discuss a new approach to this issue, using examples and use cases that show how in-house and public data can be integrated in different ways to meet the individual needs of companies and help them keep their competitive edge.
Transcript
Page 1: ICIC 2013 Conference Proceedings Sebastian Radestock

DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE

DRUG DISCOVERY TODAY

[Diagram: drug discovery across biology and chemistry, with the steps therapeutic target, check chemical feasibility, synthesize or buy, test, check ADME/Tox, report, analyze SAR, and generate chemistry ideas; a knowledge survey draws on separate in-house and external sources: ELNs, DBs, flat files, docs, journals and patents, and known ligands.]

Page 2

MAKING HIDDEN DATA DISCOVERABLE: HOW TO BUILD EFFECTIVE DRUG DISCOVERY ENGINES?

Dr. Sebastian Radestock
Product Manager Reaxys
Elsevier Information Systems GmbH
Frankfurt am Main, Germany

Page 3

DRUG DISCOVERY TOMORROW

[Diagram: the drug discovery steps and their in-house and external data sources (ELNs, DBs, flat files, docs, journals and patents, known ligands), now connected through dedicated tools: a federated search system, a chemistry integrator, a biology integrator, an integrator for biomedical data, a literature management tool, and an integrator for drug safety.]

Page 4

DRUG DISCOVERY TOMORROW

TWO APPROACHES TO SOLVING THE CHALLENGE OF DATA ACCESS

[Diagram: federated model versus warehouse approach. In the federated model, the ELN handles both storage and capture. In the warehouse approach, the ELN handles capture only, and data is held in a separate storage layer. In both, the user works through an analysis system, the Chemistry Integrator.]

Page 5

EXAMPLE IMPLEMENTATION OF THE FEDERATED MODEL

[Diagram: federated model in which in-house systems handle storage and capture, and the user works through the Chemistry Integrator analysis system.]

Existing system for consolidating in-house structure and bioactivity data

Expansion of the existing system by integrating Reaxys
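The federated model can be illustrated with a minimal Python sketch. It assumes that each source system is wrapped by an adapter that answers queries at request time, and that hits from different sources share a common structure key. The names `InMemorySource`, `federated_search`, and the `KEY-…` identifiers are hypothetical and do not describe the actual Chemistry Integrator or Reaxys interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Hit:
    structure_key: str  # shared structure identifier (assumed, e.g. an InChIKey)
    source: str         # name of the system that returned the hit
    record: dict        # the record in that system's own format


class Source(Protocol):
    name: str

    def search(self, query: str) -> List[Hit]: ...


class InMemorySource:
    """Hypothetical adapter standing in for an in-house database or a Reaxys look-up."""

    def __init__(self, name: str, records: Dict[str, dict]) -> None:
        self.name = name
        self._records = records  # structure_key -> record

    def search(self, query: str) -> List[Hit]:
        # Queried live at request time; nothing is copied into a central store.
        q = query.lower()
        return [
            Hit(key, self.name, rec)
            for key, rec in self._records.items()
            if q in rec.get("name", "").lower()
        ]


def federated_search(query: str, sources: List[Source]) -> Dict[str, List[Hit]]:
    """Send the query to every source and group the hits by structure key."""
    grouped: Dict[str, List[Hit]] = {}
    for source in sources:
        for hit in source.search(query):
            grouped.setdefault(hit.structure_key, []).append(hit)
    return grouped


# Usage with placeholder records (no real data)
in_house = InMemorySource("in-house", {"KEY-A": {"name": "Compound A", "assay": "in-house result"}})
reaxys = InMemorySource("reaxys", {"KEY-A": {"name": "compound A"}, "KEY-B": {"name": "compound B"}})
results = federated_search("compound a", [in_house, reaxys])
for key, hits in results.items():
    print(key, [h.source for h in hits])  # KEY-A ['in-house', 'reaxys']
```

Adding a new source only requires a new adapter, which matches the "easy expansion" property of this architecture. Each query, however, still depends on every source being reachable.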

Page 6

THE REAXYS DATABASE

CONTAINS ALL PUBLISHED AND HISTORICAL CHEMISTRY DATA

• Compounds, substance property data, preparations, reactions, and bibliographic information…
  …from 400 core chemistry journals
  …from relevant chemistry patents
• Manual extraction of all the data
• Coverage of 500 property data fields, from basics like boiling point or melting point, via crystal data and magnetic properties, to spectra
• Altogether 750 million data points
• Historical chemistry data…
  …dating back to 1771

Scientific data model

Page 7

EXPANDED BIBLIOGRAPHIC CONTENT IN REAXYS

SUPPORTING MULTI-DISCIPLINARY RESEARCH

• Bibliographic data from 16,000 periodicals covering chemistry and related sciences have been loaded into Reaxys
• This goes beyond journals and patents; it includes conference proceedings, business articles, reviews, etc.

AGRICULTURAL AND BIOLOGICAL SCIENCES

BIOCHEMISTRY, GENETICS AND MOLECULAR BIOLOGY

CHEMICAL ENGINEERING

DENTISTRY

EARTH AND PLANETARY SCIENCES

ENERGY

ENGINEERING

ENVIRONMENTAL SCIENCE

IMMUNOLOGY AND MICROBIOLOGY

MATERIALS SCIENCE

MEDICINE

NEUROSCIENCE

PHARMACOLOGY, TOXICOLOGY AND PHARMACEUTICS

PHYSICS AND ASTRONOMY

VETERINARY

ETC.


Page 8

REAXYS-TREE AND AUTOMATIC INDEXING

THE NEXT STEP… COMING 2014

Title/Abstract:
The Biosynthesis of Aristeromycin. Conversion of Neplanocin A to Aristeromycin by a Novel Enzymatic Reduction
Partially purified cell-free extracts of the aristeromycin producer Streptomyces citricolor have been shown to catalyze the NADPH-dependent reduction of neplanocin A to aristeromycin. Stereochemical studies revealed that the reduction proceeds with anti-geometry and involves transfer of the 4 pro-R hydrogen atom of NADPH to the 6'β position of aristeromycin.

Reaxys Keywords: Aristeromycin – biosynthesis, enzymatic reduction, Neplanocin A

Step 1: Index for chemistry terms

Step 2: Translate chemical names into structures and make them searchable

Step 3: Add chemistry-relevant keywords and identified chemical entities
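The indexing idea can be illustrated with a minimal dictionary-based sketch. The slides do not describe how Reaxys actually performs automatic indexing, so the term dictionary, the placeholder structure IDs (`STRUCT-…`), and the `index_abstract` function are all hypothetical. The sketch only shows the three stages: an index of chemistry terms, translation of recognized names into structure references, and attachment of keywords.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Index of chemistry terms (hypothetical), mapped to placeholder structure IDs
CHEMICAL_DICTIONARY: Dict[str, str] = {
    "aristeromycin": "STRUCT-0001",
    "neplanocin a": "STRUCT-0002",
    "nadph": "STRUCT-0003",
}


@dataclass
class IndexedDocument:
    text: str
    entities: Dict[str, str] = field(default_factory=dict)  # recognized name -> structure ID
    keywords: List[str] = field(default_factory=list)


def index_abstract(text: str, dictionary: Dict[str, str]) -> IndexedDocument:
    doc = IndexedDocument(text)
    lowered = text.lower()  # dictionary keys are lowercase, so matching ignores case
    for name, structure_id in dictionary.items():
        # Whole-word match, so "nadph" also hits "NADPH-dependent"
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            # Translate the recognized name into a searchable structure reference
            doc.entities[name] = structure_id
    # Attach the recognized chemical entities as keywords
    doc.keywords = sorted(doc.entities)
    return doc


abstract = (
    "Partially purified cell-free extracts of the aristeromycin producer "
    "Streptomyces citricolor have been shown to catalyze the NADPH-dependent "
    "reduction of neplanocin A to aristeromycin."
)
doc = index_abstract(abstract, CHEMICAL_DICTIONARY)
print(doc.keywords)  # ['aristeromycin', 'nadph', 'neplanocin a']
```

A production system would need name-to-structure conversion rather than a fixed lookup table. The dictionary here only stands in for that stage.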

Page 9

FEDERATED SEARCH SYSTEM WITH REAXYS LOOK-UP

ACCESS TO REAXYS IS VIA THE REAXYS APPLICATION PROGRAMMING INTERFACE (API)

What should the API look like?

Page 10

THE REAXYS APPLICATION PROGRAMMING INTERFACE (API)

LESSONS LEARNED

• Customers want access to all data in Reaxys:
  • Substances and substance property data
  • Reactions and reaction details
  • Citations
• Customers want access to all functionality of Reaxys:
  • Exact structure and reaction searching, similarity and substructure searching
  • Factual queries
  • Further processing of hitsets
• The Reaxys API was designed around exchanging XML between the user and the Reaxys server via HTTP POST requests
• Security and usage tracking are key concerns:
  • Secure communication via HTTPS POST is supported
  • The Reaxys API is stateful, and login is required
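The interaction style described above (XML documents sent by HTTPS POST to a stateful server that requires a login) can be sketched as a small client. Everything schema-related here is an assumption: the element and attribute names (`request`, `action`, `sessionid`, `where`, `hits`), the endpoint URL, and the `XmlApiSession` class are hypothetical placeholders, not the real Reaxys API. An offline fake server stands in for the network so the sketch runs as written.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# A transport sends (url, XML bytes) and returns the response bytes
Transport = Callable[[str, bytes], bytes]


def https_post(url: str, body: bytes) -> bytes:
    """Send an XML document via HTTPS POST (real network call; not used in the demo)."""
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


class XmlApiSession:
    """Stateful XML-over-HTTPS client: log in once, then send the session ID with every call.

    Element and attribute names are hypothetical placeholders, not the Reaxys schema.
    """

    def __init__(self, url: str, transport: Transport = https_post) -> None:
        if not url.startswith("https://"):
            raise ValueError("only HTTPS endpoints are accepted")
        self.url = url
        self.transport = transport
        self.session_id: Optional[str] = None

    def _post(self, root: ET.Element) -> ET.Element:
        body = ET.tostring(root, encoding="utf-8", xml_declaration=True)
        return ET.fromstring(self.transport(self.url, body))

    def login(self, user: str, password: str) -> None:
        req = ET.Element("request", action="login")
        ET.SubElement(req, "user").text = user
        ET.SubElement(req, "password").text = password
        self.session_id = self._post(req).findtext("sessionid")
        if not self.session_id:
            raise RuntimeError("login failed")

    def query(self, where: str) -> ET.Element:
        if not self.session_id:
            raise RuntimeError("login is required before querying")
        req = ET.Element("request", action="query", sessionid=self.session_id)
        ET.SubElement(req, "where").text = where
        return self._post(req)


def fake_server(url: str, body: bytes) -> bytes:
    """Offline stand-in for the server so the sketch runs without network access."""
    req = ET.fromstring(body)
    if req.get("action") == "login":
        return b"<response><sessionid>demo-session</sessionid></response>"
    reply = ET.Element("response", sessionid=req.get("sessionid", ""))
    ET.SubElement(reply, "hits").text = "0"
    return ET.tostring(reply)


session = XmlApiSession("https://api.example.com/xml", transport=fake_server)
session.login("user", "secret")
answer = session.query("compound A")
print(answer.get("sessionid"), answer.findtext("hits"))  # demo-session 0
```

Injecting the transport keeps the XML handling testable offline. The session ID carried on every request is what makes the client stateful, and it also gives the server a handle for usage tracking.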

Page 11

LIMITATIONS OF THE FEDERATED MODEL

SOME DISADVANTAGES TO CONSIDER

[Diagram: federated model in which the ELN and other in-house systems handle storage and capture, and the user works through the Chemistry Integrator analysis system.]

Architecture allows easy expansion when a new data source becomes available

Performance and availability of the system are dependent on the source systems

No clean-up and normalization of the data
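The dependency on source systems can be made concrete with a small sketch. Here a federated query fans out in parallel, waits at most a fixed timeout, and reports which sources failed or timed out instead of silently returning incomplete results. The source functions and the `federated_query` helper are hypothetical, and the timeout policy is an illustrative design choice, not something the slides prescribe.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

SourceQuery = Callable[[str], List[str]]


def federated_query(
    query: str, sources: Dict[str, SourceQuery], timeout_s: float
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Query all sources in parallel and return (hits, unavailable source names).

    The response time is bounded by the slowest source or by timeout_s,
    whichever comes first; sources that fail or time out are reported.
    """
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    hits: Dict[str, List[str]] = {}
    unavailable = [futures[f] for f in not_done]
    for fut in done:
        if fut.exception() is None:
            hits[futures[fut]] = fut.result()
        else:
            unavailable.append(futures[fut])
    # Do not block on stragglers; cancel anything that has not started yet
    pool.shutdown(wait=False, cancel_futures=True)
    return hits, sorted(unavailable)


# Hypothetical source systems with different behaviour
def fast_source(q: str) -> List[str]:
    return [f"{q}: in-house record"]


def slow_source(q: str) -> List[str]:
    time.sleep(1.0)  # simulates an overloaded legacy database
    return [f"{q}: late record"]


def broken_source(q: str) -> List[str]:
    raise ConnectionError("source system is down")


hits, unavailable = federated_query(
    "compound A",
    {"in-house": fast_source, "legacy-db": slow_source, "flatfile-index": broken_source},
    timeout_s=0.3,
)
print(hits)         # results from the sources that answered in time
print(unavailable)  # sources whose data is missing from this answer
```

Note that nothing in this path cleans up or normalizes the returned records. Each source's data arrives in its own format, which is the third limitation listed above.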

Page 12

EXAMPLE IMPLEMENTATION OF THE WAREHOUSE APPROACH

[Diagram: warehouse approach in which in-house systems handle capture, structures are held in a central storage layer, and the user works through the Chemistry Integrator analysis system.]

A customer set up a system that contains structure data, bioactivity data, and IP information

All data was extracted, transformed, and loaded to fit into one unified data model (UDM)

Bioactivity data was normalized, and structures were de-duplicated
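The extract-transform-load steps of the warehouse approach can be sketched in a few lines. The sketch assumes two legacy systems that use different field names and concentration units, and it assumes that a canonical structure key (for example an InChIKey) has already been computed for every record. The field names, the `Activity` UDM row, and the placeholder values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Conversion factors to one common unit (nanomolar)
TO_NM = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


@dataclass
class Activity:
    """One bioactivity row in a hypothetical unified data model (UDM)."""
    structure_key: str   # canonical structure key, assumed precomputed (e.g. InChIKey)
    target: str
    ic50_nm: float
    source: str


def extract() -> Dict[str, List[dict]]:
    # Placeholder rows from two legacy systems with different field names and units
    return {
        "A": [{"key": "KEY-A", "target": "T1", "ic50": 250, "unit": "nM"}],
        "B": [
            {"inchikey": "KEY-A", "tgt": "T1", "value": 0.25, "units": "uM"},
            {"inchikey": "KEY-B", "tgt": "T1", "value": 3, "units": "uM"},
        ],
    }


def transform(raw: Dict[str, List[dict]]) -> List[Activity]:
    # Map each source's own schema onto the UDM and normalize units to nM
    rows = [Activity(r["key"], r["target"], r["ic50"] * TO_NM[r["unit"]], "A") for r in raw["A"]]
    rows += [Activity(r["inchikey"], r["tgt"], r["value"] * TO_NM[r["units"]], "B") for r in raw["B"]]
    return rows


def load(rows: List[Activity]) -> Dict[str, object]:
    # De-duplicate structures: one entry per structure key, listing where it was seen
    structures: Dict[str, Set[str]] = {}
    for row in rows:
        structures.setdefault(row.structure_key, set()).add(row.source)
    return {"structures": structures, "activities": rows}


udm = load(transform(extract()))
print(sorted(udm["structures"]))               # ['KEY-A', 'KEY-B']
print([a.ic50_nm for a in udm["activities"]])  # [250.0, 250.0, 3000.0]
```

After normalization, the two sources' measurements for `KEY-A` become directly comparable. The cost is that every source needs its own mapping in `transform`, which hints at why changes in data types are hard to accommodate.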

Page 13

PLATFORM FOR STRUCTURE AND BIOACTIVITY DATA

STRUCTURE DATA FROM REAXYS COMES FROM THE REAXYS STRUCTURE FLAT FILE

Page 14

LIMITATIONS OF THE WAREHOUSE APPROACH

SOME DISADVANTAGES TO CONSIDER

[Diagram: warehouse approach in which in-house systems handle capture, structures are held in a central storage layer, and the user works through the Chemistry Integrator analysis system.]

Long implementation time and associated high cost

Difficult to accommodate differences or changes in data types

Page 15

EXAMPLE IMPLEMENTATION OF A FLEXIBLE APPROACH

A CONTENT INTEGRATION SOLUTION THAT IS NOW AVAILABLE TO ALL REAXYS USERS

[Diagram: the federated model and the warehouse approach combined in the Chemistry Integrator, with structures held in multiple storage systems and pricing retrieved live.]

Structures from PubChem and eMolecules have been integrated into Reaxys

Real-time commercial availability and pricing information comes via an eMolecules API

Using multiple storage systems eliminates the need for one UDM
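Keeping sources in separate stores works only if records for the same substance can be linked. The following is a minimal sketch of such a crosslink index. It assumes that a shared structure key has been computed for every record, and it uses hypothetical store names, local IDs, and fields. Each store keeps its native format; only the index knows which records correspond.

```python
from collections import defaultdict
from typing import Dict, Tuple


class CrosslinkIndex:
    """Links records that describe the same substance across separate stores.

    Each store keeps its own schema and IDs; only the shared structure key
    (assumed here to be precomputed, e.g. an InChIKey) ties them together.
    """

    def __init__(self) -> None:
        self._by_key: Dict[str, Dict[str, str]] = defaultdict(dict)  # key -> {store: local ID}
        self._key_of: Dict[Tuple[str, str], str] = {}                # (store, local ID) -> key

    def register(self, store: str, local_id: str, structure_key: str) -> None:
        self._by_key[structure_key][store] = local_id
        self._key_of[(store, local_id)] = structure_key

    def corresponding(self, store: str, local_id: str) -> Dict[str, str]:
        """Return {other store: local ID} for the same substance."""
        key = self._key_of.get((store, local_id))
        if key is None:
            return {}
        return {s: i for s, i in self._by_key[key].items() if s != store}


# Separate stores, each in its own native format (placeholder IDs and fields)
stores = {
    "reaxys": {"RX-1": {"name": "compound A"}},
    "pubchem": {"PC-1": {"synonyms": ["compound A"]}},
    "emolecules": {"EM-1": {"suppliers": ["supplier X"]}},
}

index = CrosslinkIndex()
index.register("reaxys", "RX-1", "KEY-A")
index.register("pubchem", "PC-1", "KEY-A")
index.register("emolecules", "EM-1", "KEY-A")

links = index.corresponding("reaxys", "RX-1")
for store, local_id in links.items():
    print(store, stores[store][local_id])
```

Adding another source means registering its records in the index. No shared schema has to change, which is the property this slide contrasts with a single UDM.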

Page 16

The substance crosslinking icon lets users switch to corresponding substances in other data sources

Results are available from different data source tabs

Page 17

Filter for substances that are/aren’t contained in other data sources

Results are available from different data source tabs

Page 18

Filter for substances that are/aren’t contained in other data sources

Results are available from different data source tabs
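The "are/aren't contained in other data sources" filter can be sketched as a set-membership test. This assumes that hits and the other source's records share a structure key, and that the other source's keys are available as a set. The function name and the `KEY-…` values are hypothetical.

```python
from typing import Iterable, List, Set


def filter_by_presence(
    hit_keys: Iterable[str], other_source_keys: Set[str], present: bool = True
) -> List[str]:
    """Keep hits whose structure key is (present=True) or is not (present=False)
    contained in another data source; set membership keeps each check fast."""
    return [k for k in hit_keys if (k in other_source_keys) == present]


# Placeholder structure keys
reaxys_hits = ["KEY-A", "KEY-B", "KEY-C"]
pubchem_keys = {"KEY-A", "KEY-C"}

print(filter_by_presence(reaxys_hits, pubchem_keys))                 # ['KEY-A', 'KEY-C']
print(filter_by_presence(reaxys_hits, pubchem_keys, present=False))  # ['KEY-B']
```

The filter preserves the original hit order, so it can be applied to a ranked result list without re-sorting.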

Page 19

The commercial availability icon lets users check real-time pricing information from eMolecules

Page 20

PubChem property headers with direct links to PubChem

Page 21

FLEXIBLE APPROACH FOR INTEGRATION

LESSONS LEARNED

• Reaxys has proven to be extremely powerful as an analysis and database system
• Separating the data from different data sources into multiple storage systems is the way to go, provided that a powerful crosslinking mechanism is in place
• Some pieces of information that are subject to frequent updates should be integrated using the federated model
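The last lesson (stored data for stable content, live look-ups for frequently updated content) can be sketched as follows. Stable substance data is read from a local store, while a volatile value such as a price is fetched from the remote source at request time, with a short cache to limit calls. `LiveLookup`, `compound_view`, the 60-second TTL, and the fake vendor API are hypothetical; the real eMolecules interface is not shown here.

```python
import time
from typing import Callable, Dict, Tuple


class LiveLookup:
    """Federated access to frequently changing data (e.g. prices) with a short cache."""

    def __init__(self, fetch: Callable[[str], float], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # stand-in for a call to the remote source's API
        self._ttl = ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, float]] = {}  # key -> (fetched_at, value)

    def get(self, key: str) -> float:
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh enough to reuse
        value = self._fetch(key)
        self._cache[key] = (now, value)
        return value


def compound_view(key: str, local_store: Dict[str, dict], prices: LiveLookup) -> dict:
    # Stable data comes from the local store; volatile data is fetched live.
    return {**local_store[key], "price": prices.get(key)}


# Demo with a fake vendor API and a controllable clock (placeholder values)
calls = []


def fake_vendor_api(key: str) -> float:
    calls.append(key)
    return 100.0 + len(calls)  # pretend the price changes between calls


fake_now = [0.0]
prices = LiveLookup(fake_vendor_api, ttl_s=60.0, clock=lambda: fake_now[0])
local_store = {"KEY-A": {"name": "compound A"}}

first = compound_view("KEY-A", local_store, prices)  # fetched live
again = compound_view("KEY-A", local_store, prices)  # served from the cache
fake_now[0] = 61.0
later = compound_view("KEY-A", local_store, prices)  # cache expired, fetched again
print(first["price"], again["price"], later["price"], len(calls))  # 101.0 101.0 102.0 2
```

The TTL is a trade-off between freshness and load on the remote system. For genuinely real-time pricing it could be set very low or removed.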

Page 22

EXAMPLE IMPLEMENTATION OF A FLEXIBLE APPROACH

A CONTENT INTEGRATION SOLUTION THAT ELSEVIER BUILT FOR ROCHE

[Diagram: the flexible architecture extended with Roche in-house reactions captured from the ELN, alongside structure stores and live pricing, all accessed by the user through the Chemistry Integrator.]

Integration of Roche proprietary data on chemistry experiments

Reaxys with PubChem and eMolecules integrated

Page 23

INTEGRATION OF ROCHE IN-HOUSE DATA

THE SITUATION AT ROCHE YESTERDAY… AND TODAY

Page 24

https://reaxys.roche.com

Normalized Roche-specific reaction data

Data on references and/or experiments, including PDF links

Roche-specific data is included in the output (PDF, MS Word, etc.)

Filters on Roche-specific data fields

Up to four Roche reaction data sources are supported

Page 25

A Roche icon lets users switch to a Roche in-house repository

The reaction crosslinking icon lets users switch to corresponding reactions in other data sources

Page 26

Start building a synthesis tree by clicking the synthesize link

Page 27

The synthesis planner opens

The first step of the synthesis plan is selected from the Roche data source

Page 28

One step has been added

Add a second step

Page 29

New reactions are loaded

The second step of the synthesis plan is selected from Reaxys

Page 30

Another step has been added

Experimental details of the “mixed” synthesis plan are summarized in a table.
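The "mixed" synthesis plan walkthrough (a first step from a Roche source, a second from Reaxys, then a summary table) can be sketched as a simple data structure. The sketch assumes that the plan is built backwards from the target, as in a retrosynthetic tree, and simplifies the tree to a linear chain. `ReactionStep`, `SynthesisPlan`, and all identifiers and compound names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReactionStep:
    source: str             # data source the step was taken from, e.g. "Roche" or "Reaxys"
    reaction_id: str        # placeholder identifier
    starting_material: str
    product: str
    conditions: str


class SynthesisPlan:
    """A synthesis plan simplified to a linear chain, built backwards from the target."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.steps: List[ReactionStep] = []  # in the order they were added (target first)

    def add_step(self, step: ReactionStep) -> None:
        # Each new step must produce the starting material of the previously added step
        expected = self.steps[-1].starting_material if self.steps else self.target
        if step.product != expected:
            raise ValueError(f"step makes {step.product!r}, expected {expected!r}")
        self.steps.append(step)

    def summary_table(self) -> str:
        """Summarize the steps in forward (laboratory) order as a plain-text table."""
        rows = [("Step", "Source", "Reaction", "Conversion", "Conditions")]
        for n, s in enumerate(reversed(self.steps), start=1):
            rows.append((str(n), s.source, s.reaction_id,
                         f"{s.starting_material} -> {s.product}", s.conditions))
        widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
        return "\n".join(
            "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in rows
        )


# A "mixed" plan: one step from an in-house source, one from Reaxys (placeholder data)
plan = SynthesisPlan("target T")
plan.add_step(ReactionStep("Roche", "R-001", "intermediate I2", "target T", "see source"))
plan.add_step(ReactionStep("Reaxys", "RX-002", "intermediate I1", "intermediate I2", "see source"))
print(plan.summary_table())
```

Recording the source on every step is what lets a single table mix in-house and published reactions while still showing where each step came from.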

Page 31

INTEGRATION OF ROCHE IN-HOUSE DATA

CUSTOMER FEEDBACK

• Usability and acceptance tests by Roche showed:
  • Increased productivity of researchers at Roche
  • Increased discoverability of the Roche reaction content
• Reduced maintenance effort for Roche:
  • Legacy systems were decommissioned
  • Roche gets ongoing maintenance and functionality improvements from Elsevier
  • No compromise on security
• Flexible approach:
  • Additional data sources have been added

Page 32

DRUG DISCOVERY TOMORROW

[Diagram: the drug discovery steps across biology and chemistry and their in-house and external data sources (ELNs, DBs, flat files, docs, journals and patents, known ligands), connected through a federated search system; MedScan also appears in the diagram.]

Page 33

THANK YOU – QUESTIONS?

Dr. Sebastian Radestock, Product Manager Reaxys

Elsevier Information Systems GmbH, Frankfurt am Main, Germany
