On chemical structures, substances, nanomaterials and measurements
Post on 11-Aug-2014
NINA JELIAZKOVA
IdeaConsult Ltd.
Sofia, Bulgaria
www.ideaconsult.net
04/07/2023
I D E A C O N S U LT LT D .
CONTENT
Sharing experience about:
• OpenTox API and beyond
• Chemical structures
• Substance identity
• Experimental data challenges
• Protocols
• Nanomaterials
• Final thoughts
• EC FP7 2008-2011 OpenTox
• Distributed framework for predictive toxicology
• Building blocks: data, chemical structures, algorithms and models.
• Build models, apply models, validate models, access and query data in various ways;
• Tech: REST API, RDF
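The "build models, apply models" workflow over a REST API can be illustrated with a minimal sketch. This is not the OpenTox reference client: the base URL, the identifiers and the helper names are hypothetical, and the form parameters `dataset_uri` and `prediction_feature` are assumed to follow the OpenTox convention of passing resource URIs between services. The code only builds request descriptions and sends nothing over the network.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlencode

# Hypothetical service root; a real deployment has its own URL.
BASE = "https://example.org/opentox"


@dataclass
class Request:
    """A plain description of an HTTP request (nothing is sent)."""
    method: str
    url: str
    form: Dict[str, str] = field(default_factory=dict)

    def body(self) -> str:
        # Form-encode the parameters as they would appear in a POST body.
        return urlencode(self.form)


def resource(kind: str, ident: str) -> str:
    """Build the URI of a resource such as a dataset, algorithm or model."""
    return f"{BASE}/{kind}/{ident}"


def build_model(algorithm_id: str, dataset_id: str, feature_id: str) -> Request:
    """Describe a POST to an algorithm resource that trains a model."""
    return Request("POST", resource("algorithm", algorithm_id), {
        # Assumed parameter names; resources are referenced by URI.
        "dataset_uri": resource("dataset", dataset_id),
        "prediction_feature": resource("feature", feature_id),
    })


def apply_model(model_id: str, dataset_id: str) -> Request:
    """Describe a POST to a model resource that predicts for a dataset."""
    return Request("POST", resource("model", model_id), {
        "dataset_uri": resource("dataset", dataset_id),
    })
```

The point of the design is that every building block (data, algorithm, model) is addressed by a URI, so the output of one service can be handed to another service without copying the data itself.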
DATASETS, MODELS
Open Melting Point Dataset #33
PREDICTIONS
31 May 2013: the REACH deadline for registering substances (100 to 1000 tonnes per year)
http://ToxPredict.net access statistics
AMBIT http://ambit.sf.net
• AMBIT REST web services: OpenTox Application Programming Interface (API)
• Dataset web services: chemical search, data pooling, structure QA
• Computational web services: descriptor calculation, machine learning, structure optimisation, tautomers
• Web applications using AMBIT REST web services
• New 2014: Embeddable JS widgets
DATA CURATION EXAMPLE (DIISONONYL PHTHALATE)
DATA CURATION EXAMPLE (RN 25155-25-3)
DATA CURATION EXAMPLE (RN 25155-25-3)
European Chemicals Agency registration dossier
SUBSTANCE IDENTITY IN REACH
• Guidance for identification and naming of substances under REACH and CLP (118 pages)
• Substance characterization: “During the first 5 months of 2009, around 450 enquiries were received by ECHA, 23% of which were rejected on the grounds that the dossiers were incomplete (e.g. missing spectral data) or the substance identity had not been sufficiently described.”
http://echa.europa.eu/documents/10162/13643/substance_id_en.pdf
SUBSTANCE IDENTITY/COMPOSITION
“Only a limited number of tools are capable to provide easily accessible data on substance identity, composition together with chemical structures and high quality and detailed endpoint data”
SUBSTANCE ENDPOINT DATA
• OECD Harmonized Templates
• Well defined XML schema for > 100 endpoints
• Experimental protocols: OECD Guidelines
• BioPortal ontologies coverage of OECD guidelines: None
PROTOCOLS, SOP, INVESTIGATIONS, STUDY, ASSAYS
[Diagram: coordination action; systems biology; human based organ simulating device; endpoints & markers; human based specific target cells; integrated data analysis and servicing; computer modeling & estimation techniques. Label: SEP COACH]
Towards the replacement of in vivo repeated dose systemic toxicity testing
SEURAT-1: ~70 research groups from European universities, public research institutes and companies (more than 30% SMEs) http://www.seurat-1.eu/
http://toxbank.net/
FP7 Projects
TOXBANK DATA WAREHOUSE
GOALS
• Prediction of repeated dose toxicity
• Shared repository of know-how and experimental results from SEURAT-1 research activities and relevant public sources
Examples include:
• Protocol describing a method for long term maintenance of functional hepatocytes
• Results from a repeated dose 14 day transcriptomics study using acetaminophen and iPS-derived hepatocytes
TECHNOLOGICAL SOLUTIONS
• REST Web services API
• Protocol service
• Investigation service
• RDF data model
• ISA-TAB & ontologies
• ISA-TAB converted to RDF
• Stored in a triple store
• Chemical search (AMBIT)
Challenges:
• Diverse data types
• Changing research protocols
• Data formatting time consuming
• Data sharing - little incentive
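The "ISA-TAB converted to RDF" step can be pictured with a small sketch. This is an illustration under stated assumptions, not the ToxBank converter: the namespace and the way column headers become predicates are hypothetical, every value is emitted as a plain string literal, and only the Python standard library is used. It reads a tab-delimited table and prints N-Triples-style lines, one per non-empty cell.

```python
import csv
import io
import re

# Hypothetical namespace, used only for this illustration.
NS = "http://example.org/isa/"


def slug(text):
    """Turn a column header or name into a URI-safe token."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def escape_literal(value):
    """Escape backslashes, quotes and newlines for a quoted literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def rows_to_ntriples(tsv_text, subject_column="Sample Name"):
    """Emit one triple per non-empty cell, keyed by the subject column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = f"<{NS}sample/{slug(row[subject_column])}>"
        for column, value in row.items():
            if column == subject_column or not value:
                continue
            predicate = f"<{NS}{slug(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# A made-up two-line table using ISA-TAB style column headers.
EXAMPLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\n"
    "donor1\tHomo sapiens\thepatocyte culture\tsample1\n"
)

if __name__ == "__main__":
    for line in rows_to_ntriples(EXAMPLE):
        print(line)
```

A real conversion would map headers to ontology terms rather than to ad hoc predicates, which is where the "ISA-TAB & ontologies" bullet comes in.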
FP7 ENANOMAPPER PROJECT
• Develop an ontology and database unifying information about nanomaterial safety (in humans and the environment)
• Cover the full lifecycle from manufacturing to environmental decay or accumulation
• Pan-European project, 7 partners
• Ontology growth through community and re-use
Objective: Safety by Design
NANOINFORMATICS CHALLENGES
• nanoSMILES
• nanoInChI
• Nanomaterial identity - only through characterisation with multitude of experimental methods
• Experiments reproducibility; standards
• Experiments description (protocols, experimental details)
• Models: structure based cheminformatics doesn’t really work
• Common database? NO! But Yes! for an integrated search across databases! (requirement analysis feedback)
Nanomaterial “unique” challenge of identification?
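The "integrated search across databases" requirement can be sketched as a federated query: each database keeps its own data and exposes a search function, and a thin layer fans the query out and merges the hits. Everything here is hypothetical (the in-memory backends, the `name` field, the function names); it is a sketch of the idea, not the eNanoMapper implementation.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Backend = Callable[[str], Iterable[Record]]


def make_memory_backend(records: List[Record]) -> Backend:
    """Wrap a list of records as a searchable stand-in for a database."""
    def search(query: str) -> List[Record]:
        q = query.lower()
        return [r for r in records if q in r["name"].lower()]
    return search


def federated_search(query: str, backends: Dict[str, Backend]) -> List[Record]:
    """Query every backend and tag each hit with the database it came from."""
    hits: List[Record] = []
    for source, search in backends.items():
        try:
            results = list(search(query))
        except Exception:
            # One unavailable database should not break the whole search.
            continue
        for record in results:
            hits.append({**record, "source": source})
    return hits
```

Keeping the source label on every hit matters because the same material may be described differently in each database; the merged list shows where each description came from instead of hiding the disagreement.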
NANOMATERIAL ENDPOINT DATA
• Same data model as for substances (ISA-TAB inspired)
• NM specific measurement protocols
• Ontology support – under development eNanoMapper WP2 (Janna Hastings, Egon Willighagen)
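One way to picture a shared data model for substances and nanomaterials is a substance with a composition and a list of measurements, each tied to the protocol that produced it. The class and field names below are hypothetical and only loosely ISA-TAB inspired; this is a minimal sketch, not the AMBIT or eNanoMapper schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Protocol:
    """How a measurement was made."""
    name: str
    guideline: Optional[str] = None  # e.g. a test guideline identifier


@dataclass
class Measurement:
    """A single endpoint value together with its experimental context."""
    endpoint: str
    value: float
    unit: str
    protocol: Protocol
    conditions: Dict[str, str] = field(default_factory=dict)


@dataclass
class Component:
    """One part of a substance, e.g. a constituent, core or coating."""
    name: str
    role: str


@dataclass
class Substance:
    """A chemical substance or nanomaterial described by what was measured."""
    name: str
    composition: List[Component] = field(default_factory=list)
    measurements: List[Measurement] = field(default_factory=list)

    def endpoints(self) -> List[str]:
        """Sorted list of distinct endpoints with at least one measurement."""
        return sorted({m.endpoint for m in self.measurements})

    def by_endpoint(self, endpoint: str) -> List[Measurement]:
        """All measurements recorded for one endpoint."""
        return [m for m in self.measurements if m.endpoint == endpoint]
```

In this shape a nanomaterial differs from a conventional substance only in its components (for example a core and a coating) and in the protocols behind its measurements, which is what allows one model to serve both.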
NANOMATERIAL SEARCH
LESSONS LEARNED
What is more difficult:
1. Succeed in implementing a “moving target” API by a distributed team of developers.
2. Succeed in bringing together several wet lab teams to use a common tool/format for preparing and sharing experimental data.
1. OpenTox: Partners succeeded in creating 5 independent implementations of the OpenTox API through “rough consensus and running code”; most services are online and still being used 3 years after the OpenTox project was completed; the API is being used and extended in related projects.
2. In ToxBank we’ve resorted to taking the role of “data managers” in the SEURAT-1 cluster, a setup typical of most EU data projects.
WHY IS DATA FORMATTING AND SHARING SO DIFFICULT?
Thoughts about the technology aspects; not about the incentives to share
• Data format – the more flexible the format is, the more difficult the data preparation is;
• Tools typically need to understand both data modelling and the experimental setup;
• Preparing and sharing data requires additional effort, which is typically not within the scope of the research projects;
• Typical setup is “data managers” or “Excel templates”
Compare with the ease of sharing, liking and tagging pictures on social networks; liking and tagging essentially creates semantic knowledge!
GUESS THE AUTHOR
“This proposal concerns the management of general information about experiments at ???. It discusses the problems of loss of information about complex evolving systems and derives a solution based on a ???”
TIM BERNERS-LEE, 1989
“This proposal concerns the management of general information about accelerators and experiments at CERN. It discusses the problems of loss of information about complex evolving systems and derives a solution based on a distributed hypertext system.”
http://www.w3.org/History/1989/proposal.html
Non-Centralisation
Information systems start small and grow. They also start isolated and then merge. A new system must allow existing systems to be linked together without requiring any central control or coordination.
FINAL THOUGHTS
• Help researchers organize their own data locally;
• The cost of entering/recording data should be low;
• Easy to use tools;
• Formats – understandable or hidden behind user friendly tools;
• Non-centralisation;
• Added value: “The data-sharing environment must invite collaboration as well as facilitate it. Stakeholders have broad interests that go beyond retrieving existing data – they want to discover materials and forecast enhanced products”
http://www.nature.com/news/technology-sharing-data-in-materials-science-1.14224
THANK YOU
QUESTIONS?