Java Solutions for Cheminformatics March 2005
Mar 26, 2015
About Us
Molecule Drawing and Visualization
Structure Searching
Cartridge
Structure Standardization
Molecular Predictions
Chemical Expressions
Screening
Clustering
Fragment Analysis
Virtual Synthesis
Current Developments
History
Formed:
1998 Budapest, Hungary
Skills base:
• Chemistry,
• Software development,
• Predictive tools
Aim:
Platform independent software for chemistry
Highlights
• 1998: Custom projects
• 1999: Java tools for sketching/viewing structures
• 2000: Structure database support
• 2001: Clustering and diversity analysis
• 2003: Pharmacophore screening, property predictions, reaction processing, fragmenting
• 2004: Cartridge technology, virtual synthesis, improved SMARTS support
People
Developers: 17 (7 PhD, 10 MSc)
Technical expertise
• Cheminformatics
• Synthetic and physico-chemistry
• Virtual drug design
• Java
• Web technology
Business Support: 3 (1 MSc, 2 BSc)
Commercial expertise
• Negotiation & contracting
• Relationship management
• Collaboration steering and development
• Strategic marketing
• Mutually beneficial (win-win) business relationships
Selected Application Areas
Global licenses
Custom development projects
Value added constructions
Websites/portal front and back end
Educational
Product development
Marvin (chemical drawing), 1998 to 2003:
• Applets, Molfiles, stereo support, Windows, Unix
• SMILES, SMARTS, PDB, Rgroups, isotopes, shortcuts, Marvin Beans
• Ball and stick, JPG, PNG, SVG, Cut&Paste with Isis/ChemDraw, 2D cleaning, (de)aromatization, reactions
• SDF, RDF, XYZ animations, CML, templates, compressed formats, Swing, 3D rendering
• Mac support, signed applets, Java Web Start, atom mapping
• Partial charge, pKa, logP, logD, 3D generation, radicals, Sgroups
Marvin, 2004:
• Marvin file format, enhanced stereo, enhanced SMARTS support, shapes, text boxes, multiple groups, TPSA, Donor/Acceptor...
JChem (structure database and cheminformatics toolkit), up to 2003:
• Oracle, MySQL, SQLServer, Access, hashed fingerprints, substructure and similarity searching
• DB2, PostgreSQL, Rgroup searching
• Reaction searching, reaction processing, pharmacophore analysis, screening, standardization, fragmentation
• Clustering, diversity
JChem, 2004:
• Cartridge, enhanced stereo searching, recursive SMARTS, chemical expressions, virtual synthesis…
Current Products Overview
Multiple Deployment Formats
• Applications
• Java Applets
• Signed Java Applets
• Java Web Start
• Java Beans
• Plugins
• JSP
Why ChemAxon?
• Sophisticated virtual chemistry technology
• Platform independence and Web (Java)
• High performance tools (speed, capacity)
• Client oriented development
• Comprehensive API for the developers
• Detailed documentation
• Competitive prices
• Fast and reliable support
Product Support
• Fast response to support questions: replies within 24 hours at most, with fast solutions as well
• Final and beta releases available online
• Detailed documents available online and extensive help bundled with the software
• Skilled, relevant human support (direct developer-to-developer contact)
• Product development based on support requests
"Developers supporting developers"
Molecule Drawing and Visualization
Operating Systems
• 100% pure Java
• Windows: 95, 98, Me, NT, 2000, XP
• Macintosh: OS 9, OS X
• Unix: Linux, Solaris, Irix, etc.
Web Browsers
• Internet Explorer
• Netscape
• Mozilla
• Safari
• Opera
Marvin
• Various file formats
• Isotopes, charges, radicals
• Alias, pseudo atoms
• Templates
• Abbreviated groups
• Reactions
• Atom maps
• R-groups
• Stereo bonds, stereo configurations (R/S, E/Z)
• Enhanced stereo (ABS/AND/OR)
• SMARTS properties (atoms, bonds, recursive SMARTS)
• Chemical error checking
• Generic atoms and bonds
• Atom lists and not lists
• 2D cleaning
• 3D cleaning
• Various 3D models
• Shapes, text boxes
• Plugins
Various File Formats
Isotopes, Charges, Radicals
Templates
Abbreviated Groups
R-groups
Reactions
Rendered 3D displays with MarvinSpace
Structure Cleaning
CC(C)NCC(O)COC1=C2C=C(C)NC2=CC=C1
[Figure: the same structure shown as topology, as a cleaned 2D drawing and as a 3D model]
Structure Searching
JChem Base Features
• Rapid fingerprint-based database scanning
• Sophisticated graph-based searching
• Integration with databases
  – Oracle
  – MS SQL Server
  – DB2
  – MySQL
  – PostgreSQL
  – InterBase
  – Access
• Custom standardization
• JChem Cartridge for searching in Oracle
• JSP integration
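The two search stages listed for JChem Base (fast fingerprint scanning, then graph-based matching) can be illustrated with a minimal sketch. It assumes fingerprints are plain bit sets stored as Python integers; the function names and the 4-bit toy data are hypothetical and do not reflect the JChem API. A structure can only contain the query as a substructure if every bit set in the query fingerprint is also set in its own fingerprint, so the screen discards most of the database before the slower graph search runs.

```python
def passes_screen(query_fp: int, target_fp: int) -> bool:
    """True if every bit set in the query fingerprint is also set in the target."""
    return (query_fp & target_fp) == query_fp


def screen(query_fp: int, database: dict[str, int]) -> list[str]:
    """Return the IDs of structures that survive the fingerprint screen."""
    return [mol_id for mol_id, fp in database.items() if passes_screen(query_fp, fp)]


# Toy 4-bit fingerprints (hypothetical); real hashed fingerprints are far longer.
database = {"mol_a": 0b1111, "mol_b": 0b0101, "mol_c": 0b0010}

# Only the surviving candidates would be passed on to graph-based matching.
candidates = screen(0b0101, database)
```

The screen can produce false positives but no false negatives, which is why a graph-based check is still needed for the candidates.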
Import with JChem Base Manager
Query Features
• Exact structure
• Substructure
• Atom lists and not lists
• Explicit hydrogens
• Generic atoms
• Generic bonds
• SMARTS atom properties
  – Aliphatic, aromatic
  – Hydrogen count
  – Connection count
  – Valence
  – Ring count
  – Smallest ring size
  – Recursive SMARTS
• Stereo atoms
• Stereo bonds
• R-group queries
  – R-groups
  – Occurrence
  – if / then conditions
  – RestH
• Reaction search
  – Transformation recognition
  – Component identification
  – Stereospecific reactions (inversion, retention)
• Diastereomers
  – Enhanced stereo groups (Abs, And, Or)
JChem Base JSP Integration
Thin client support: only a web browser and Java required
Cartridge Technology
JChem Cartridge for Oracle
Oracle can be extended to support chemical database operations using the JChem Cartridge for Oracle.
Examples:
Substructure search displaying IDs, SMILES codes and molecular weights:
SELECT cd_id, cd_smiles, cd_molweight
FROM my_structures
WHERE jc_contains(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') = 1;
Finding benzene derivatives that conform to Lipinski's rule of five:
SELECT count(*)
FROM my_structures
WHERE jc_compare(structure, 'c1ccccc1',
    'sep=!t:s!ctFilter:(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1;
Example: an Oracle search returning similar structures with logP > 1 that were acquired after April 14, 2002, with the results displayed in MarvinView.
Structure Standardization
Standardization
• Explicit hydrogens
• Aromatic bonds
• Mesomers
• Tautomers
• Counterions
Standardization Example
[Figure: structures before and after standardization]
Molecular Predictions
Calculator Plugins
Available Calculations
• Elemental analysis
• Charge distribution
• Polarizability
• pKa
• logP
• logD
• Polar surface area
• Hückel analysis
• H-bond donor-acceptor
• Major microspecies
• Refractivity
Calculation Interface
• Marvin GUI
• Command line
• Chemical Terms
• API
Elemental Analysis
Polar Surface Area
Partial Charge Distribution
Partial Charge Distribution Calculation
Partial Equalization of Orbital Electronegativities (PEOE)
Orbital electronegativity as defined by Mulliken
Orbital electronegativity of atom i: $\chi_i = a_t + b_t q_i + c_t q_i^2$, where $q_i$ is the partial charge of atom i
The partial charge of atom i is calculated iteratively, based on Gasteiger's method:
$\chi_i^{(0)} = a_t$, $q_i^{(0)} = 0$
$q_i^{(n+1)} = q_i^{(n)} + \sum_k (\chi_i^{(n)} - \chi_k^{(n)}) / \max(\chi_i, \chi_k)$, where k is the index of a neighbor of atom i
Polarizability
logP
$\log P = \sum_i f_i$, where $f_i$ is the atomic logP increment
logP Example
Validation of the logP prediction
logD
logD is computed using micro ionization constants (ki), micro partition coefficients (pi), and pH
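Ionization scheme for a molecule with three ionizable sites (1, 2 and 3). The microspecies index is given in parentheses; k1 to k7 are the micro ionization constants of the scheme:
• neutral species: 123 (0)
• mono-ionized species: 1+ (1), 2+ (2), 3- (3)
• di-ionized species: 1+2+ (4), 1+3- (5), 2+3- (6)
• tri-ionized species: 1+2+3- (7)
$\log D = \log \frac{\sum_{j=0}^{7} p_j c_j}{\sum_{j=0}^{7} c_j}$, where $c_j$ is the aqueous concentration of microspecies j (fixed by k1 to k7 and [H+]) and $p_j$ is its micro partition coefficient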
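As a worked illustration of how logD combines partition coefficients with the pH-dependent population of microspecies, here is a minimal sketch for the simplest possible case: a single acidic site, so only a neutral and an ionized species exist. It uses the general definition of the distribution coefficient (total concentration in the organic phase over total concentration in water); the function names and numeric values are assumptions for illustration, not the plugin's implementation or its three-site scheme.

```python
import math


def log_d(partition_coefficients, aqueous_fractions):
    """log10 of the population-weighted mean of the micro partition coefficients."""
    total = sum(aqueous_fractions)
    weighted = sum(p * x for p, x in zip(partition_coefficients, aqueous_fractions))
    return math.log10(weighted / total)


def monoprotic_acid_fractions(pka, ph):
    """Aqueous fractions [neutral, ionized] of an acid HA at a given pH."""
    ratio = 10 ** (ph - pka)  # [A-] / [HA] from the Henderson-Hasselbalch equation
    return [1 / (1 + ratio), ratio / (1 + ratio)]


# Hypothetical acid: pKa 4.5, neutral form with P = 100, ion assumed not to partition.
partition = [100.0, 0.0]
log_d_at_pka = log_d(partition, monoprotic_acid_fractions(4.5, 4.5))
log_d_acidic = log_d(partition, monoprotic_acid_fractions(4.5, 0.0))
```

At pH = pKa half of the compound is ionized, so logD falls below logP by log 2; at very low pH the neutral form dominates and logD approaches logP.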
logD Example
pKa
pKa Plugin - Microconstants
Micro ionization constants (log k) are calculated from regression equations that have three types of calculated parameters:
• Partial charges
• Polarizabilities
• Intramolecular interactions
pKa Plugin - Macroconstants
Macro ionization constants (pKa) are calculated from the microconstants (log k).
Ionization scheme for sites 1, 2 and 3: the neutral species 123 ionizes to the singly ionized species 1-, 2+ and 3-, then to the doubly ionized species 1-2+, 1-3- and 2+3-, and finally to 1-2+3-.
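The relation between micro and macro ionization constants can be sketched for the smallest non-trivial case, a molecule with two acidic sites. This is the textbook relation for a two-site acid, given here only as an illustration; the plugin's own three-site scheme and regression model are not reproduced, and the function and parameter names are hypothetical.

```python
import math


def macro_pkas(pk1, pk2, pk12):
    """Macroscopic pKa values of a two-site acid from its microconstants.

    pk1, pk2: micro pK of losing the proton at site 1 or site 2 first.
    pk12: micro pK of losing the site-2 proton after site 1 has ionized.
    """
    k1, k2, k12 = 10 ** -pk1, 10 ** -pk2, 10 ** -pk12
    # The thermodynamic cycle fixes the fourth microconstant: k1 * k12 = k2 * k21.
    k21 = k1 * k12 / k2
    ka1 = k1 + k2                      # first macro constant: either site may ionize
    ka2 = 1 / (1 / k12 + 1 / k21)      # second macro constant
    return -math.log10(ka1), -math.log10(ka2)


# Two identical, non-interacting sites with micro pK 4 (illustrative values).
pka1, pka2 = macro_pkas(4.0, 4.0, 4.0)
```

For two identical independent sites the macro pKa values split symmetrically around the micro pK by log 2, which is the purely statistical effect.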
Hydrogen Bonds in pKa Calculation
$\log k = a(q_i - q_k) + b$, where a and b are regression parameters
Intramolecular hydrogen bonds are also taken into account.
Validation of the pKa prediction
Chemical Expressions
Chemical Terms
Elements of the language
• Structure matching functions (describing functional groups, reaction sites, similarity…)
• Property calculations (partial charge distribution, pKa, logP, electrophilicity…)
• Arithmetic and logic operators
Chemical Terms examples
Searching:
match("olefine.mol") && !match("c1ccncc1") && (atomCount(16) == 0) || (mass() < 300);
Goal functions:
inhibitor = inhibitor.mol;
(similarity(inhibitor, pharmacophore_tanimoto) > 0.8) && (similarity(inhibitor, chemical_tanimoto) < 0.5);
Filtering:
(mass() <= 500) &&
(logP() <= 5) &&
(donorCount() <= 5) &&
(acceptorCount() <= 10);
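The filtering expression among the Chemical Terms examples encodes Lipinski's rule of five. A minimal sketch of the same logic follows, assuming the four properties have already been calculated elsewhere; the `Properties` class and the numeric values are hypothetical and only mirror the thresholds of the expression.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed molecular properties (hypothetical container)."""
    mass: float
    logp: float
    donor_count: int
    acceptor_count: int


def passes_rule_of_five(p: Properties) -> bool:
    """Same thresholds as the Chemical Terms filtering expression."""
    return (
        p.mass <= 500
        and p.logp <= 5
        and p.donor_count <= 5
        and p.acceptor_count <= 10
    )


# Illustrative property values, not calculated results.
small_molecule = Properties(mass=180.2, logp=1.2, donor_count=1, acceptor_count=4)
large_lipophile = Properties(mass=650.0, logp=6.1, donor_count=2, acceptor_count=8)
results = [passes_rule_of_five(small_molecule), passes_rule_of_five(large_lipophile)]
```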
Applications of Chemical Terms
• Structure searching: advanced query expressions
• Drug design: goal functions
• Pharmacophore analysis: pharmacophore definitions
• Virtual synthesis: reaction and synthesis rules
Screening
Pharmacophore Mapping
Pharmacophore types: hydrophobic (h), aromatic (r), acceptor (a), acceptor/donor (a/d), donor/cationic (d/c), donor/aromatic (d/r)
[Figure: the same structure colored by atom type and by pharmacophore type]
Topological Pharmacophore Fingerprint
[Figure: structure with each atom labelled by its pharmacophore type (h, r, d/+, r/d, d/a)]
Hypothesis Fingerprints
Hypothesis   Advantages                             Disadvantages
Minimum      strict selection of common features    very sensitive to one missing feature
Average      not that sensitive to outliers         less selective if actives are similar
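The trade-off between the two hypothesis types can be seen in a small sketch. It assumes each active compound is described by a fingerprint of feature counts of equal length and that a hypothesis is built position by position; the function names and the toy fingerprints are hypothetical.

```python
def minimum_hypothesis(fingerprints):
    """Keep, for each position, the smallest count found among the actives."""
    return [min(column) for column in zip(*fingerprints)]


def average_hypothesis(fingerprints):
    """Keep, for each position, the mean count over the actives."""
    return [sum(column) / len(column) for column in zip(*fingerprints)]


# Three toy active compounds; the third lacks the second feature.
actives = [
    [2, 1, 0],
    [3, 1, 1],
    [2, 0, 1],
]
min_hyp = minimum_hypothesis(actives)
avg_hyp = average_hypothesis(actives)
```

A single active that lacks a feature removes it from the minimum hypothesis entirely, while the average hypothesis only lowers its weight.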
Dissimilarity Metrics
Euclidean
• standard
• normalized
• weighted
• asymmetric
Tanimoto
• standard
• scaled
• asymmetric
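For orientation, the standard forms of the two metrics can be sketched as follows for fingerprints stored as lists of counts. The Tanimoto variant shown is the common dot-product generalization to count vectors, which is an assumption here; the normalized, weighted, scaled and asymmetric variants are not reproduced because their exact definitions are not given.

```python
import math


def euclidean(a, b):
    """Standard Euclidean distance between two count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto coefficient, using the dot-product form for count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denominator = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denominator == 0:  # both vectors are all zeros
        return 0.0
    return 1.0 - dot / denominator


fp_a = [1, 0, 1]
fp_b = [1, 1, 0]
d_euclid = euclidean(fp_a, fp_b)
d_tanimoto = tanimoto_dissimilarity(fp_a, fp_b)
```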
Screening Optimization
• 10,000 test compounds (from NCI): 9,700 for validation, 300 for optimization
• 50 active compounds (β-adrenoreceptor antagonists): 1/3 training set, 1/3 spikes, 1/3 query set
• Two stages: training and validation
Screening Validation: β2-adrenoreceptor antagonists
All compounds: 9,700. Known active compounds: 18. Hypothesis type: minimum.
                    before optimization   after optimization
all hits            2,476                 18
known active hits   15                    18
enrichment          3.27                  539.89
18 active compounds were mixed with 9,700 random NCI molecules, and the combined set was sorted by pharmacophore similarity.
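The enrichment figures can be reproduced with a short calculation. The definition below (hit rate among the selected compounds divided by the hit rate in the whole set) is an assumption, since the slides do not spell it out, but with a total of 9,700 + 18 = 9,718 compounds it gives the reported values of 3.27 and 539.89.

```python
def enrichment(active_hits, all_hits, actives, total):
    """Ratio of the active rate among hits to the active rate in the full set."""
    return (active_hits / all_hits) / (actives / total)


TOTAL = 9_700 + 18  # NCI compounds plus the known actives mixed into them

before = enrichment(active_hits=15, all_hits=2_476, actives=18, total=TOTAL)
after = enrichment(active_hits=18, all_hits=18, actives=18, total=TOTAL)
```

After optimization every hit is a known active, so the enrichment reaches its maximum possible value for this set, the inverse of the active rate in the full set.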
Active Hit Distribution: β2-adrenoreceptor antagonists
Screening Validation
10,000 NCI compounds
                         before optimization                    after optimization
family         actives   all hits   active hits   enrichment   all hits   active hits   enrichment
ACE            7         6,537      6             1.27         171        6             47.01
Angiotensin2   4         177        3             40.40        66         3             105.50
D2             5         417        5             22.90        31         5             269.08
delta          7         60         5             106.70       9          5             495.25
FTP            13        1,020      11            7.97         13         10            422.30
mGluR1         7         1,744      3             2.38         10         7             571.10
NPY Y5         49        6,370      38            1.18         145        45            47.12
thrombin       3         328        2             19.6         57         2             109.64
Optimized Screening: JSP Example
Optimized Screening: JSP Example Hits
Clustering
JKlustor
• Jarvis-Patrick
• Ward
Ward Clustering Features
• Ward's minimum variance method
• Murtagh's reciprocal nearest neighbor (RNN) algorithm
• O(n²) time complexity
• O(n) memory complexity
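Ward's minimum variance method merges, at each step, the two clusters whose union increases the total within-cluster variance the least. A minimal sketch of that merge cost is shown below; it is not the JKlustor implementation, does not include the RNN search that achieves the stated complexity, and uses hypothetical function names and toy 2D points.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def ward_merge_cost(cluster_a, cluster_b):
    """Increase in the within-cluster sum of squares caused by merging a and b."""
    size_a, size_b = len(cluster_a), len(cluster_b)
    center_a, center_b = centroid(cluster_a), centroid(cluster_b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
    return size_a * size_b / (size_a + size_b) * squared_distance


cluster_a = [[0.0, 0.0], [2.0, 0.0]]
cluster_b = [[5.0, 0.0]]
cost = ward_merge_cost(cluster_a, cluster_b)
```

Because the cost depends only on cluster sizes and centroids, a cluster can be summarized by those two quantities, which is consistent with the linear memory requirement listed above.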
Ward Pharmacophore Clustering Example
• 8 active compound sets
  – 5-HT3 antagonists
  – ACE inhibitors
  – angiotensin 2 antagonists
  – D2 antagonists
  – delta antagonists
  – FTP antagonists
  – mGluR1 antagonists
  – thrombin inhibitors
Ward Centroids
A Ward Cluster D2 antagonists
Maximum Common Substructure Clustering
Drug Design
Fragment Analysis
RECAP fragmentation example
[Figure: fragmented structure with attachment points labelled amide:1, amide:2, amine:1, amine:2, ether:1 and ether:2]
Virtual Synthesis
The Ideal Virtual Reaction
• Generic (simple)
  – the equation describes the transformation only
  – a few hundred generic reactions can form the basic armory of a preparative chemist
• Specific (complex)
  – chemo-: recognizes reactive and inactive functional groups
  – regio-: "knows" directing rules
  – stereo-: inversion/retention
• Customizable
  – to improve reaction model quality
Customizable Reaction Engine!
• Processing selective "smart" reactions
• Batch mode (sequential or combinatorial combinations)
• Reverse direction
• High performance (speed and capacity)
Reaction Modeling
Chemoselective Reaction Definition
REACTIVITY: !match(ratom(3), "[#6][N,O,S:1][N,O,S]", 1) && !match(ratom(3), "[N,O,S:1][C,P,S]=[N,O,S]", 1)
Reactants
2,920 amines, alcohols and thiols
369 isocyanates and isothiocyanates
Chemoselective Reaction Products
1,264,391 single site products
Regioselectivity (Markovnikov, Zaitsev)
r2: an elimination reaction definition with Zaitsev's rule.
r1: an addition reaction definition with the Markovnikov rule.
SELECTIVITY: hcount(ratom(2))
SELECTIVITY: -hcount(ratom(2))
Regioselective Reaction Example
Chlorine migration example in four steps by consecutive elimination and addition reactions: r2, r1, r2, r1.
Regioselectivity (SeAr)
Reaction definition for the electrophilic aromatic bromination of the benzene ring. The expression defines a regioselectivity rule for the major product.
SELECTIVITY: -charge(ratom(1))
TOLERANCE: 0.0045
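One way to read a selectivity rule with a tolerance is sketched below. The interpretation is an assumption: each candidate site gets a score from the selectivity expression, the highest score marks the major product, and sites scoring within the tolerance of the best are accepted as well. The partial charges are invented for illustration and are not calculated values.

```python
def major_sites(scores, tolerance):
    """Sites whose score lies within `tolerance` of the best score."""
    best = max(scores.values())
    return sorted(site for site, score in scores.items() if best - score <= tolerance)


# Invented partial charges on the ring carbons of a substituted benzene.
charges = {"ortho": -0.060, "meta": -0.050, "para": -0.058}

# Score = -charge, so the most negative carbon scores highest.
scores = {site: -charge for site, charge in charges.items()}

with_tolerance = major_sites(scores, tolerance=0.0045)
strict = major_sites(scores, tolerance=0.0)
```

With these numbers the tolerance admits both the ortho and para positions, while a zero tolerance keeps only the single best site.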
Regioselectivity (SeAr) Products
The virtual bromination of toluene with the above reaction definition gives the ortho and para isomers as the main products…
… and bromine is directed into the meta position in the case of nitrobenzene.
Regioselectivity (SeAr) Example Products
1,198 monobrominated main products (tolerance is set to zero)
Virtual Synthesis
Customizable Synthesis Engine!
• Multiple steps
• Flexible compound dispatching
• Synthesis rules
• Synthesis tree building
• Memory, file and database mode
• Graphical synthesis browser
• Building block coloring
Synthesis Example
Derek S. Tan, Michael A. Foley, Matthew D. Shair, Stuart L. Schreiber*, J. Am. Chem. Soc., 1998, 120, 8565-8566
Reaction steps: alkyne coupling, lactone aminolysis, esterification
Synthesis Definition
Component set definition
Set1: A
Set2: B1, B2, B3
Set3:
Set4: D1, D2
Set5:
Set6: F1, F2
Set7:
"Smart" reaction library
R1: alkyl iodide + alkyne >> alkyl-alkyne
R2: lactone + amine >> amide
R3: alcohol + carboxylic acid >> ester
Synthesis route definition
Step1: A + B → C (R1)
Step2: C + D → E (R2)
Step3: E + F → G (R3)
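The way the component sets and the route fit together can be sketched as a combinatorial enumeration. The sketch assumes that every pair of reactants in a step reacts and yields exactly one product, which a real "smart" reaction would not guarantee; products are represented by nested labels only, and `run_step` is a hypothetical helper, not part of any synthesis engine.

```python
from itertools import product


def run_step(reaction, reactants_1, reactants_2):
    """Combine every pair of reactants and label the product by its history."""
    return [f"{reaction}({x},{y})" for x, y in product(reactants_1, reactants_2)]


set1 = ["A"]
set2 = ["B1", "B2", "B3"]
set4 = ["D1", "D2"]
set6 = ["F1", "F2"]

set3 = run_step("R1", set1, set2)  # Step1: A + B gives C
set5 = run_step("R2", set3, set4)  # Step2: C + D gives E
set7 = run_step("R3", set5, set6)  # Step3: E + F gives G
```

Under these assumptions the initially empty sets fill up with 3, 6 and 12 products, one for each combination of building blocks.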
Synthesis Browser
Current Developments
Recent Developments
• Automatic searching of low-energy conformers
• Improved Oracle cartridge
• Structure searching combined with chemical calculations
• Exhaustive Synthesis for metabolism applications
• R-group decomposition
• Maximum common substructure search in molecule pairs and in libraries
Current Developments
• MarvinSpace, an OpenGL-based 3D molecule and surface visualisation engine for small molecules and macromolecules
• Instant JChem Base, a desktop and enterprise chemical database client with form builder
• IUPAC naming plugin
• Isoelectric point plugin
• Random Synthesis for building up a diverse virtual space of synthetically feasible compounds
• Extension of the reaction library
• Further descriptors in the Topology Analysis plugin
Future Plans
• Metabolic transformation library
• Diverse database of synthetically accessible compounds
• Search in Markush compounds
• Peptide builder
• Fragment-based activity analysis of compound libraries
• AnalogMaker (fragment based random evolutionary analog design)
• Retrosynthesis
Visit us
• Home page: www.chemaxon.com
• Forum: www.chemaxon.com/forum
• Animated demos and tutorials: www.chemaxon.com/demos
• Presentations and posters: www.chemaxon.com/conf
Máramaros köz 3/a, Budapest, 1037 Hungary
www.chemaxon.com
Thank you for your attention