A ChemAxon/KNIME-based tool for designing chemical libraries
Tim Parrott, Dart NeuroScience
Brock Luty, Dart NeuroScience
ChemAxon UGM, September 25, 2013
Dart NeuroScience
Small molecules to maintain cognitive vitality (LTM)
Currently about 200 FTEs with build-out expected at 260
Privately held LLC owned by a single individual
Scientific Computing
Scientific Computing collaborates with other DNS Departments to deliver solutions that simplify and
accelerate the drug discovery process.
We rely on our (non-traditional) knowledge and experience in both Science and Technology to develop
novel and efficient systems to meet this goal.
Scientific Computing Groups
Bioinformatics
Philip Cheung Doug Fenger
+ 1 FTE
Information Management
+ 1 Group Lead
John Jaeger Tim Parrott James Harr
Eileen Tompkins Heather Jones
Methods Development
Ron Blanford Daniel Garden
Kevin Neal Hari Muddana
+ 1 FTE
Computational Chemistry
*Tami Marrone Meg McCarrick
James Na Amy Shih Bill Sinko
Project Support - Modeling - SBDD/Library Design - Apply Methods - Pre-LO/LO/PCC
Data / Biz Analysis - Data Capture - Analytics - Data Access - QA/Scientific Support - Project Management
Software Development - Informatics Software Development - Developing new methods - Enterprise Scale Architecture - RIA (MVC) with SOA - Extensions for ELN, Spotfire, IJC, etc.
Project Support - Target ID - Expression Analysis / Pathways - Novel Software algorithms - Enterprise Software (with Methods)
Background
Dart NeuroScience (DNS)
200+ Scientists 50+ Chemists
Parallel Synthesis Group
About 20 chemists involved in the design and creation of chemical libraries
We need a chemical library design tool!
A Basic Chemical Library Design Tool
Enumerate Products
Calculate Properties
Analyze & Filter
Select Reactants
Design
Test
Analyze
Synthesize
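The enumerate, calculate, filter, and select steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the tool's implementation: the reactant lists, IDs, and the molecular-weight cutoff are hypothetical, amide coupling stands in for whichever reaction is chosen, and the only "property" is a molecular weight obtained by adding the reactant weights and subtracting one water.

```python
from itertools import product

WATER_MW = 18.015  # approximate mass lost per amide bond formed

# Hypothetical reactant lists: (id, name, approximate molecular weight)
ACIDS = [("A1", "acetic acid", 60.052), ("A2", "benzoic acid", 122.12)]
AMINES = [("B1", "methylamine", 31.057), ("B2", "aniline", 93.13)]


def enumerate_products(acids, amines):
    """Enumerate: every acid x amine pair gives one virtual amide.
    Calculate: product MW = acid MW + amine MW - water."""
    for (acid_id, _, acid_mw), (amine_id, _, amine_mw) in product(acids, amines):
        yield {
            "acid": acid_id,
            "amine": amine_id,
            "mw": round(acid_mw + amine_mw - WATER_MW, 3),
        }


def filter_products(products, max_mw):
    """Analyze & filter: keep products at or below the MW cutoff."""
    return [p for p in products if p["mw"] <= max_mw]


def select_reactants(products):
    """Select reactants: those that appear in at least one kept product."""
    acids = sorted({p["acid"] for p in products})
    amines = sorted({p["amine"] for p in products})
    return acids, amines


# Hypothetical cutoff chosen only to make the filter visible.
kept = filter_products(enumerate_products(ACIDS, AMINES), max_mw=140.0)
selected_acids, selected_amines = select_reactants(kept)
```

With this cutoff only the benzoic acid / aniline product is dropped, so all four reactants remain selected because each still contributes to a surviving product.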
Goals
Support Ease of Use
Productivity
Standardize calculations & reactions (services)
Simplify: wrap processes and minimize import/export operations
Enhance capabilities and speed by doing calculations remotely
Constraints
Limited IT/IM support
Chemists already on software overload
Approach
Chemical Property Calculations,
Reaction Enumeration
Data Pipelining
Visualization / Analytics
3D Scoring
Platforms
Architecture
Heavily invested in service-oriented architecture (REST-style API) with standardized DNS patterns
Domain CRUD (Create, Read, Update, Delete) GUIs written for specific entities using the MVC pattern (relying on Backbone.js and standardized DNS patterns)
Traditional stateless computational services (property calculation, enumeration, etc.)
Services can be based on scripts that use command-line applications (the primary use case). Services can also be written in KNIME and run in this architecture.
Move all the heavy lifting to the servers (automated parallelization).
KNIME as a service orchestration layer
Application
Service
Database
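A minimal sketch of the pattern described on this slide: a stateless computational service that wraps a command-line application, with the caller splitting its input into chunks and dispatching them in parallel. Everything named here is an assumption for illustration. The stand-in command is a tiny Python one-liner that reports the length of each input line; a real service would call a property calculator and sit behind a REST endpoint, neither of which is shown.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a command-line calculator: reads one record
# per line on stdin and writes "record<TAB>length" per line on stdout.
STAND_IN_CMD = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    s = line.strip()\n"
    "    print(f'{s}\\t{len(s)}')",
]


def call_service(records, cmd=STAND_IN_CMD):
    """Stateless call: the request carries everything the tool needs,
    and nothing is remembered between calls."""
    result = subprocess.run(
        cmd,
        input="\n".join(records) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the command-line tool fails
    )
    return [line.split("\t") for line in result.stdout.splitlines()]


def call_in_chunks(records, chunk_size=2, workers=4):
    """Split the input and run the chunks in parallel.
    Because each call is stateless, chunks can go to any worker."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(call_service, chunks))  # map preserves input order
    return [row for part in parts for row in part]


rows = call_in_chunks(["CCO", "c1ccccc1", "CC(=O)O"])
```

Keeping the service stateless is what makes the chunking safe: results depend only on the records sent, so the order of execution across workers does not matter.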
Brock’s Geeky Slide
Tool Overview
Selection & Configuration
Panel
Custom Nodes
Spotfire Export
Reactant Selection
Import curated classes of reactants
(CRUD Service)
Reactant Selection
Import list of Reagent Numbers
(CRUD Service)
Reactant Deduplication
Input Output
Need to identify and remove functionally equivalent reactants (Computational Service)
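The slide does not say how functional equivalence is decided. As one hedged illustration, assume two reactants are equivalent when they differ only by a counterion (for example a free base and its HCl salt) and that the input SMILES are already canonical. The sketch below keys each reactant on its largest dot-separated fragment and keeps the first reactant seen for each key. The function names and reagent IDs are hypothetical; a production service would use a cheminformatics toolkit for standardization rather than string handling.

```python
def parent_key(smiles):
    """Naive equivalence key: the longest dot-separated SMILES fragment.
    Assumes counterions are the shorter fragments and that the SMILES
    strings are already canonical."""
    return max(smiles.split("."), key=len)


def deduplicate(reactants):
    """Keep the first reactant for each key.
    `reactants` is a list of (reagent_id, smiles) tuples."""
    first_seen = {}
    for reagent_id, smiles in reactants:
        first_seen.setdefault(parent_key(smiles), (reagent_id, smiles))
    return list(first_seen.values())


# Hypothetical input: ethylamine, its hydrochloride salt, and benzoic acid.
reactants = [
    ("R1", "CCN"),
    ("R2", "CCN.Cl"),
    ("R3", "OC(=O)c1ccccc1"),
]
unique = deduplicate(reactants)
```

Here R2 is dropped because it shares a parent with R1, while R3 is kept.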
Reaction Selection
Reactions: A Look under the Hood
“Reactor” nodes can contain multi-step workflows. (Computational Services)
Server-Side Calculations: Clustering
Server-Side Calculations: OpenEye ROCS
ROCS output includes the Shape/Pose that scored best and the Tanimoto Score against that query. (Computational Service)
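For readers unfamiliar with the score: a shape Tanimoto compares the overlap volume of two molecules with their self-overlap volumes, Tanimoto = O_AB / (O_AA + O_BB - O_AB), giving 1.0 for identical shapes and values near 0 for little overlap. The sketch below applies that formula and picks the best-scoring query for one product. It does not call ROCS; the overlap volumes and query names are hypothetical inputs.

```python
def shape_tanimoto(overlap_ab, self_a, self_b):
    """Tanimoto coefficient from overlap volumes:
    O_AB / (O_AA + O_BB - O_AB)."""
    denominator = self_a + self_b - overlap_ab
    if denominator <= 0:
        raise ValueError("overlap volumes must give a positive denominator")
    return overlap_ab / denominator


def best_query(hits):
    """Return (query_name, score) for the best-scoring query.
    `hits` is a list of (query_name, overlap_ab, self_a, self_b)."""
    scored = [(shape_tanimoto(o_ab, o_aa, o_bb), name) for name, o_ab, o_aa, o_bb in hits]
    score, name = max(scored)
    return name, score


# Hypothetical overlap volumes for one product against two queries.
hits = [
    ("query_1", 60.0, 100.0, 80.0),
    ("query_2", 90.0, 100.0, 110.0),
]
name, score = best_query(hits)
```

In this made-up case query_2 wins with a score of 0.75 against 0.5 for query_1.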
Pausing Local Execution
Export to Spotfire
Selections made in Spotfire
Spotfire Selections returned to KNIME
New nodes with selected products & reactants appear in KNIME
Final Steps
The library design plan contains separate SDF files for the products and each reactant, along with a .csv file listing how many times each reactant is used. The zipped file is parsed on import into a chemist’s electronic laboratory notebook (ELN).
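The packaging step can be sketched as follows, assuming the SDF contents have already been generated elsewhere. The function names, the file name `reactant_usage.csv`, and the CSV column headers are hypothetical; the actual layout expected by the ELN import is not described in the slides.

```python
import csv
import io
import zipfile
from collections import Counter


def usage_counts(products):
    """Count how many products each reactant appears in.
    `products` is a list of (product_id, [reactant_id, ...])."""
    counts = Counter()
    for _product_id, reactant_ids in products:
        counts.update(reactant_ids)
    return counts


def build_plan_zip(products, sdf_files):
    """Return the bytes of a zip holding the given SDF files plus a
    CSV of reactant usage. `sdf_files` maps file name -> SDF text."""
    csv_buffer = io.StringIO()
    writer = csv.writer(csv_buffer)
    writer.writerow(["reactant_id", "times_used"])  # hypothetical headers
    for reactant_id, count in sorted(usage_counts(products).items()):
        writer.writerow([reactant_id, count])

    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_name, sdf_text in sdf_files.items():
            archive.writestr(file_name, sdf_text)
        archive.writestr("reactant_usage.csv", csv_buffer.getvalue())
    return zip_buffer.getvalue()


# Hypothetical two-product plan; the SDF text is a placeholder.
plan_products = [("P1", ["A1", "B1"]), ("P2", ["A1", "B2"])]
plan_bytes = build_plan_zip(plan_products, {"products.sdf": "placeholder\n"})
```

Building the archive in memory keeps the step free of temporary files, which suits a service that returns the zip directly to the caller.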
Stereochemical codes needed for registration are assigned based on structure. (Computational Service)
Load Library Design Plan into the Agilent ELN
Custom Forms for planning and products tables
Summary
• June 2011: Parallel Synthesis Group formed
• June 2012: First release of Library Design Tool (LDT)
• Sept 2012: Additional KNIME training
• November 2012: Second release (Clustering, ROCS)
• April 2013: Pausable Nodes, Deduplication
• August 2013: RN Lookup, Stereo Code Assigner
40 Total Reactions
Acknowledgments
Node Development: loki der quaeler
Services & Deployment: Ron Blanford, Karen Do, Kenny Leung, Zach Young, Daniel Garden
Testing and troubleshooting: Eileen Tompkins, Andrew Burritt, The SGC Team
Management & PM: Melanie Nelson, Heather Jones, Brock Luty