Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:
How Semantic Technology is Helping Drive New Standards for Data Management
Eric Little, PhD
VP Data Science
Oliver Hesse
Director Lab Automation & Data Mgmt.
Bayer
Slide 2
The Current Situation in the Lab
Many challenges stand in the way of capturing, integrating and sharing data:
Data Silos
Incompatible instruments and software systems, proprietary data formats
Legacy architectures are brittle and rigid
SME knowledge resides in people’s heads; little common vocabulary
Data schemas are not explicitly understood
Lack of a common vision between business units and scientists
Slide 3
How do we change this situation? What did the music industry teach us?
Data in Standard Format
Metadata in a Standard Vocabulary: Regulatory Guidance, Methods, Recipes, SOPs, …
Vendor-Specific Formats
Process, Material, Equipment, Result
Slide 4
The Structure of Allotrope Is Unique
Member Companies
•Subject Matter Experts
•Project Funding
Secretariat
•Project Management
•Legal & Logistical Support
Professional Software Firm
•Framework Development
•Technical Leadership
Partner Network
•Requirements & Specifications
•Contributions, PoC Applications
Slide 5
The Structure of Allotrope Is Unique: Member Companies
AbbVie
Amgen
Baxter
Bayer
Biogen
Boehringer Ingelheim
Bristol-Myers Squibb
Eli Lilly
Genentech/Roche
GlaxoSmithKline
Merck & Co.
Pfizer
Slide 6
The Structure of Allotrope Is Unique: Partner Network
Abbott Informatics
ACD/Labs
Agilent
Biovia
Bruker
BSSN
Cytobank
EPAM
Fraunhofer IPA
Global Value Web
IDBS
LabAnswer
Labware
LEAP Technologies
Mestrelab Research
Mettler Toledo
PerkinElmer
Persistent Systems
Riffyn
Qualitest
Rondaxe
Sartorius
Sciex
Shimadzu
Synthace
TetraScience
Thermo Scientific
Transcriptic
Unchained Labs
Waters
Zifo
Erasmus Univ. Med Center
J. Paul Getty Trust
(UK) Science and Technology Facilities Council
University of Southampton
University of Strathclyde
Stanford University
Slide 7
The Allotrope Framework
Allotrope Data Format (ADF)
Allotrope Data Models (ADM)
Allotrope Foundation Ontologies (AFO)
Slide 8
The Allotrope Framework
Allotrope Data Format (ADF): Graph Instances
Allotrope Data Models (ADM): Constraints
Allotrope Foundation Ontologies (AFO): Classes and Properties
The ADF is populated by graph instances; these are structured by the ADM constraints; the AFO classes and properties provide the standardized vocabulary for them.
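To make the layering concrete, here is a toy illustration in plain Python, not the Allotrope API: graph instances are (subject, predicate, object) triples, a small "shape" stands in for an ADM-style constraint, and the hypothetical `ex:` terms stand in for AFO classes and properties. All names are assumptions chosen for the example.

```python
# Toy illustration (not the Allotrope API): graph instances as triples,
# checked against a simple constraint standing in for an ADM-style shape.
# All "ex:" terms are hypothetical placeholders for ontology vocabulary.
from dataclasses import dataclass

Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"
MEASUREMENT = "ex:Measurement"       # hypothetical class
HAS_SAMPLE = "ex:hasSample"          # hypothetical property
HAS_INSTRUMENT = "ex:hasInstrument"  # hypothetical property


@dataclass(frozen=True)
class Shape:
    """Minimal constraint: every node of target_class needs each required property."""
    target_class: str
    required: tuple[str, ...]


def validate(graph: list[Triple], shape: Shape) -> list[str]:
    """Return human-readable violations; an empty list means the graph conforms."""
    targets = {s for s, p, o in graph if p == RDF_TYPE and o == shape.target_class}
    violations = []
    for node in sorted(targets):
        present = {p for s, p, _ in graph if s == node}
        for prop in shape.required:
            if prop not in present:
                violations.append(f"{node} is missing {prop}")
    return violations


measurement_shape = Shape(MEASUREMENT, (HAS_SAMPLE, HAS_INSTRUMENT))

graph = [
    ("ex:m1", RDF_TYPE, MEASUREMENT),
    ("ex:m1", HAS_SAMPLE, "ex:sample42"),
    ("ex:m1", HAS_INSTRUMENT, "ex:hplc1"),
    ("ex:m2", RDF_TYPE, MEASUREMENT),
    ("ex:m2", HAS_SAMPLE, "ex:sample43"),
]

print(validate(graph, measurement_shape))
# ['ex:m2 is missing ex:hasInstrument']
```

The real framework works with RDF graphs and formal shape definitions; the sketch only mirrors the division of labor: the vocabulary names things, the constraints say which combinations are valid, and the instances carry the actual data.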
Slide 12
Allotrope Data Format (ADF)
Platform-independent file format based on HDF5, specifically designed to store and organize large amounts of scientific data.
Data Description (Semantic Graph Model): descriptive metadata about
• Method, instrument, sample, process, result, etc.
• Provenance, audit trail
• Data Cube, Data Package
Data Cubes (Universal Data Container): analytical data represented by one- or multidimensional arrays of homogeneous data structures.
Data Package (Virtual File System): data represented by arbitrary formats, incl. native instrument formats, images, PDF, video, etc.
APIs: Java & .NET class libraries
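The three parts of an ADF file can be pictured with a small in-memory model. This is a conceptual sketch only: the actual format is HDF5 accessed through the Java and .NET libraries, and every class and field name below is hypothetical.

```python
# Conceptual model of the three ADF parts. NOT the Allotrope API (which is
# HDF5 plus Java/.NET libraries); class and field names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class DataCube:
    """Homogeneous n-dimensional analytical data, e.g. a 2D chromatogram."""
    name: str
    dimensions: list[str]        # e.g. ["retention time [min]"]
    measures: list[str]          # e.g. ["absorbance [mAU]"]
    values: list[list[float]]    # one row per point: dimension values, then measure values

    def __post_init__(self) -> None:
        width = len(self.dimensions) + len(self.measures)
        if any(len(row) != width for row in self.values):
            raise ValueError(f"every row of {self.name} needs {width} values")


@dataclass
class AdfLikeFile:
    """Data Description (triples), Data Cubes, and a Data Package (virtual file system)."""
    description: list[tuple[str, str, str]] = field(default_factory=list)
    cubes: dict[str, DataCube] = field(default_factory=dict)
    package: dict[str, bytes] = field(default_factory=dict)  # path -> raw file content

    def add_cube(self, cube: DataCube) -> None:
        self.cubes[cube.name] = cube
        # Describe the cube in the semantic graph so it can be found later.
        self.description.append((f"cube:{cube.name}", "rdf:type", "ex:DataCube"))

    def add_file(self, path: str, content: bytes) -> None:
        self.package[path] = content
        self.description.append((f"file:{path}", "rdf:type", "ex:PackagedFile"))


adf = AdfLikeFile()
adf.add_cube(DataCube("chromatogram-2d", ["retention time [min]"], ["absorbance [mAU]"],
                      [[0.0, 0.1], [0.5, 12.3], [1.0, 3.4]]))
adf.add_file("raw/run1.vendor", b"\x00\x01native instrument bytes")
print(len(adf.description), sorted(adf.package))
# 2 ['raw/run1.vendor']
```

Keeping the native vendor file next to the harmonized cube is what lets one container serve both reprocessing and long-term archiving.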
Slide 13
Allotrope Data Format Example
Platform Independent File Format
Data Description: Request, Sample, Method, Run, Data & Results
Data Cubes: Chromatogram: 3D, Chromatogram: 2D
Data Package: Chromatogram 2D HDF, Chromatogram 2D HDF
Slide 14
The Foundation for Data Integrity & Analytics
Workflow: Plan Analysis → Prepare Samples → Submit Samples → Control Inst. → Acquire Data → Process Data → Analyze Data → Report Results → Store, Archive Data → Find & Reuse
Data along the workflow: Request, Report, Sample Prep Data, Instrument Instructions, Instrument Data, Processed Data, Analyzed Data, Reported Results, Stored Data, Analytical Method
Allotrope Foundation Ontologies (AFO)
Taxonomies: Material, Equipment, Process, Result, Properties
Stability, Batch Release, Solubility, …
HPLC, MS, NMR, …
Allotrope Data Models (ADM)
Stability Study, Batch Rel. Study, Solubility Study, …
HPLC-UV Experiment, MS Experiment, NMR Experiment, …
Slide 15
Solubility Testing Example *)
Instrument Level: Solid Dispense, Liquid Dispense, Conditioning, Centrifuge, Filter, Dilution, HPLC Analysis, Raman Analysis, xRPD Analysis, pH Analysis
Allotrope Foundation Taxonomies: Dispense Ontology, Conditioning Ontology, Centrifuge Ontology, Filter Ontology, Dilution Ontology, HPLC Ontology, Raman Ontology, xRPD Ontology, pH Ontology
Data Models (one per step): Solid Dispense, Liquid Dispense, Conditioning, Centrifuge, Filter, Dilution, HPLC, Raman, xRPD, pH
LIMS/ELN Level: Solubility Study Data and Metadata; Solubility Testing Ontology; Solubility Testing Data Model
*) Extensions planned after the initial public release
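The roll-up from instrument level to LIMS/ELN level can be sketched as grouping step-level records into one study record. Step names follow the slide; the record fields, the example values, and the "missing steps" check are illustrative assumptions, not part of any Allotrope model.

```python
# Sketch: step-level records (one per instrument-level data model) rolled up
# into a single solubility-study record for the LIMS/ELN level.
# Field names, values, and the "missing steps" check are hypothetical.
from dataclasses import dataclass

PREPARATION_STEPS = ["Solid Dispense", "Liquid Dispense", "Conditioning",
                     "Centrifuge", "Filter", "Dilution"]
ANALYSIS_STEPS = ["HPLC", "Raman", "xRPD", "pH"]


@dataclass(frozen=True)
class StepRecord:
    step: str                 # which step-level data model this record follows
    sample_id: str
    data: dict[str, float]    # step-specific values (units kept in the key)


def assemble_study(sample_id: str, records: list[StepRecord]) -> dict:
    """Group one sample's step records in workflow order and list planned steps still missing."""
    by_step = {r.step: r.data for r in records if r.sample_id == sample_id}
    planned = PREPARATION_STEPS + ANALYSIS_STEPS
    return {
        "sample_id": sample_id,
        "steps": {s: by_step[s] for s in planned if s in by_step},
        "missing_steps": [s for s in planned if s not in by_step],
    }


records = [
    StepRecord("Solid Dispense", "S1", {"mass [mg]": 5.1}),
    StepRecord("Liquid Dispense", "S1", {"volume [mL]": 1.0}),
    StepRecord("HPLC", "S1", {"peak area": 1532.0}),
    StepRecord("pH", "S1", {"pH": 6.8}),
]
study = assemble_study("S1", records)
print(study["missing_steps"])
# ['Conditioning', 'Centrifuge', 'Filter', 'Dilution', 'Raman', 'xRPD']
```

Because every step reports through its own data model, the study-level record can be assembled mechanically instead of by copying values into the LIMS/ELN.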
Slide 16
Allotrope Provisional Roadmap
Timeline: 4Q 16, 1Q 17, 2Q 17, 3Q 17, 4Q 17
ADM
ADM 1.0 – Initial Standardized Data Models + Certification + Governance: Scoping → ADM 1.0 Delivered → ADM 1.0 Tested → Public release → extensions
ADF
ADF 1.2 – Regulatory Compliance: ADF 1.2 Delivered → ADF 1.2 Tested
ADF 1.3 – Structural Robustness: Scoping → ADF 1.3 Delivered → ADF 1.3 Tested → Public release → maintenance
AFO
AFO 1.2 – Structural Robustness + Governance: Scoping → AFO 1.2 Delivered → AFO 1.2 Tested → Public release → extensions
Design
Slide 18
Bayer • Company Profile 2016
Full year sales: €46.3 billion**
115,176 employees*
307 subsidiaries
R&D expenses: €4.3 billion***
As of December 31, 2015 (including Covestro) / Employees: as of September 30, 2016 (including Covestro)
* excluding Covestro: 99,517 employees (in full-time equivalents)
** excluding Covestro: €34.3 billion
*** excluding Covestro: €4.0 billion
Slide 19
Strategic Areas of Interest: Leveraging Benefits of the Allotrope Framework
Bayer – Allotrope @ SmartData 2017
Allotrope Implementation Strategy
Analytical Method Management: Transfer Analytical Methods
Archiving: Reprocessable data, long-term readable format, data integrity
Instrument Integration: Electronic Workflows / ELN & LIMS
Taxonomies: as Reference / Master Data
Assets & Instrument Management: Internet of Things = live inventory
Data Lake: Post-analysis of data / data mining
External Collaboration: CRO Integration / Data & Method Exchange
Application Interfaces: LIMS Connectivity, e.g. to CDS
Slide 20
Taxonomies: Reference & Master Data as the Basis
Interview, Research, LIMS/ELN, Publish, Review
Instrument Taxonomies: HPLC / U-HPLC, HPLC-MS, Amino assays, ELISA, HTRF, Electrophoresis, Bioanalyzer, Capillary Electrophoresis, SDS-PAGE/Western Blot, iCIEF / iCE, qPCR, Spectrophotometer, Fortebio Octet/Blitz, Biacore, Mycoplasma, ACL, Multiplex fluorescent Immunoassay (Mfi), Microtiter plate readers
Potency Testing: Chromogenic Potency, Cell-based potency
Downstream Process Taxonomies: Tangential Flow Filtration (TFF), Prep. Chromatography
Slide 21
Analytical Method Management: From ‘Text’ to Machine Readable
Taken from: Weller HN, Nirschl DS, Paulson JL, Hoffman SL, Bullock WH. ACS Comb Sci. 2012;14(9):520-526. doi: 10.1021/co300075g.
Material, Process (method), Properties, Device, Results
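A machine-readable method can be pictured as one structured record covering the five categories above, serialized so that every workstation loads the same definition. This is a sketch under assumptions: the field names, example values, and the choice of JSON are hypothetical, not the cited paper's schema or an Allotrope format.

```python
# Sketch: the five categories (Material, Process/method, Properties, Device,
# Results) as one structured, serializable record. Field names and example
# values are hypothetical; JSON is an arbitrary choice for the illustration.
import json
from dataclasses import dataclass, asdict


@dataclass
class MethodStep:
    action: str
    parameters: dict[str, float]


@dataclass
class MachineReadableMethod:
    material: str
    process: list[MethodStep]
    properties: list[str]      # properties to be measured
    device: str
    results: list[str]         # result types the method reports

    def to_json(self) -> str:
        # asdict() also converts the nested MethodStep objects into dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "MachineReadableMethod":
        raw = json.loads(text)
        raw["process"] = [MethodStep(**step) for step in raw["process"]]
        return cls(**raw)


method = MachineReadableMethod(
    material="compound A in methanol",
    process=[MethodStep("inject", {"volume [uL]": 5.0}),
             MethodStep("gradient", {"duration [min]": 10.0})],
    properties=["purity", "identity"],
    device="HPLC-MS",
    results=["chromatogram", "mass spectrum"],
)
# Round trip: each workstation can load the same definition without retyping it.
assert MachineReadableMethod.from_json(method.to_json()) == method
```

The round-trip check is the point of the exercise: a text-based method must be re-read and transcribed by a person, whereas a structured record survives transfer between systems unchanged.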
Slide 22
Analytical Method Management
As-is: Interrupted Process for Setting up Analytics
LIMS: analysis MANUALLY assigned
TEXT-BASED method description, MANUALLY transcribed to HPLC-MS Workstation 1, HPLC-MS Workstation 2, HPLC-MS Workstation 3
Slide 23
Analytical Method Management: Our Vision
INTERNAL
DATA LAKE: Companies’ secret data, IP, knowledge, research results
LIMS DELIVERS work-related methods/data/information
Current work in LIMS/ELN/etc. triggers AUTOMATED SEARCH
Information Broker
Company’s analytical scientist
PUBLIC
RESEARCH: Published data, published scientific information, journals, patents
analytical_results.adf
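The information-broker idea can be sketched as a search that the current task triggers over internal and public sources, returning ranked hits to the scientist. Term-overlap (Jaccard) scoring is a stand-in assumption; the slide does not say how the broker matches content, and the documents and terms below are hypothetical.

```python
# Sketch of an "information broker": terms from the current LIMS/ELN task
# trigger a search over internal (data lake) and public sources, and ranked
# hits are delivered to the scientist. Jaccard term overlap is a stand-in
# assumption; all documents and terms are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str            # "internal" (data lake) or "public" (journals, patents)
    title: str
    terms: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of terms two sets have in common (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def broker_search(task_terms: set[str], corpus: list[Document],
                  min_score: float = 0.1) -> list[tuple[float, Document]]:
    """Return (score, document) pairs at or above min_score, best match first."""
    query = frozenset(task_terms)
    hits = [(jaccard(query, d.terms), d) for d in corpus]
    hits = [(s, d) for s, d in hits if s >= min_score]
    return sorted(hits, key=lambda h: h[0], reverse=True)


corpus = [
    Document("internal", "Solubility screen, compound A",
             frozenset({"solubility", "hplc", "compound-a"})),
    Document("public", "HPLC-MS method for related compounds",
             frozenset({"hplc-ms", "method", "compound-a"})),
    Document("internal", "Stability study, compound B",
             frozenset({"stability", "compound-b"})),
]

# Current work in the LIMS: an HPLC solubility test on compound A.
for score, doc in broker_search({"solubility", "hplc", "compound-a"}, corpus):
    print(f"{score:.2f} [{doc.source}] {doc.title}")
```

Plain string overlap misses "hplc" versus "hplc-ms"; shared taxonomy terms of the kind described earlier are what would let a broker treat such variants as related.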
Slide 25
What is Data Science?
At OSTHUS, Data Science has a special meaning:
Data Science is more than just statistical analysis
We combine math-based approaches (statistics) with logic-based approaches (semantics)
Conceptual + Computational
Semantics: provides the vocabularies, definitions, class structures, logical relationships and conceptual models
Statistics: provides computations, trending, analysis, and learning over time from the data itself
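A small example of combining the two: a taxonomy (semantics) decides which measurements belong together, and statistics then summarizes them. The class hierarchy and the values are hypothetical, chosen only to show the division of labor.

```python
# Sketch: a tiny taxonomy (semantics) decides which measurements belong
# together; statistics then summarizes them. Class names and values are
# hypothetical.
import statistics

# child class -> parent class
TAXONOMY = {
    "HPLC-UV": "Chromatography",
    "U-HPLC": "Chromatography",
    "Chromatography": "Analytical Technique",
    "NMR": "Spectroscopy",
    "Spectroscopy": "Analytical Technique",
}


def ancestors(cls: str) -> list[str]:
    """Walk up the taxonomy: the class itself, then each parent in turn."""
    chain = [cls]
    while chain[-1] in TAXONOMY:
        chain.append(TAXONOMY[chain[-1]])
    return chain


def summarize(measurements: list[tuple[str, float]], target: str) -> tuple[int, float, float]:
    """Count, mean, and sample standard deviation of values whose technique falls under target."""
    values = [v for technique, v in measurements if target in ancestors(technique)]
    if len(values) < 2:
        raise ValueError(f"need at least two values under {target!r}")
    return len(values), statistics.mean(values), statistics.stdev(values)


runs = [("HPLC-UV", 98.2), ("U-HPLC", 97.8), ("NMR", 99.1), ("HPLC-UV", 98.0)]
print(summarize(runs, "Chromatography"))   # three chromatography runs, mean about 98.0
```

The statistics code never needs to know that "HPLC-UV" and "U-HPLC" are both chromatography; that knowledge lives in the vocabulary and can be reused by any analysis.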
Slide 26
AT OSTHUS, LAB DATA SCIENCE IS BIG ANALYSIS
STATISTICAL, SEMANTICS, MACHINE LEARNING, REASONING
Slide 27
Machine Learning is Becoming Increasingly Valuable
Very little in one’s data is known with certainty, so abductive reasoning is needed
Capture what you can semantically
The rest can be gathered directly from the data (bottom up)
Hypotheses can be derived from SMEs and past patterns of success
The success of predictive systems often relies on testing the models
Semantics can help improve the accuracy of the model
Tests over time can reveal problems of fit (alignment)
“Shelf life” Example:
I have data over 2 years showing a shelf life of “x”
(I have some level of truth for this compound)
Now I take a similar compound “y”
What is its shelf life?
I can make a better guess based on previous reasoning (induction)
I make a best guess for the shelf life of “y”
Test the hypothesis on new data sets
Outcome:
1. Ability to understand and optimize in a shorter period of time
2. Taxonomies and ontologies can help understand the trend over time
Slide 28
Smart Data for Smart Labs in the 21st Century
Smart labs in the future will provide the enterprise with:
Integrated Data – common reference data structures (vocabularies)
Sharable Data – easier interaction across teams and business units
Scalability – Big data applications that can be highly elastic
Conceptual Representations – context and perspective are captured
Advanced Analytics – complex & automated problem-solving capabilities