What’s new in JChem back-end and Markush storage, search and enumeration. Szabolcs Csepregi, Solutions for Cheminformatics

Mar 26, 2015


What’s new in JChem back-end and Markush storage, search and enumeration

Szabolcs Csepregi

Solutions for Cheminformatics


Contents

• ChemAxon chemical database tools

• Main features of JChem Base, Cartridge

• Example interfaces: JSP, ASP, AJAX examples

• Integration with other CXN products

• Markush structure storage, search and enumeration

• Recent developments, plans


Chemical database products

JChem Base
– A library for adding chemical structures into relational database systems. Available in Java, JSP and .NET
– Open-source web application example is available.

JChem Cartridge for Oracle
– Extends Oracle SQL with chemical operators and index.
– SQL interface for ChemAxon functionality

Instant JChem
– An all-in-one desktop chemical database application.

JChem Web Services
– SOAP interface to JChem Base

JC4XL
– Excel integration (coming)


Compatibility and integration

Supported chemical file formats:
• SMILES
• MDL MOL/RXN/SDF/RDF (v2000 and v3000)
• CML, MRV
• IUPAC and traditional names
• InChI, mol2, PDB, etc.

Database engines:
• Oracle, MySQL, MS SQL Server, MS Access, PostgreSQL, IBM DB2, Derby, etc.

All operating systems through:
• Java API (JChem Base)
• .NET API (JChem Base + IKVM) – for Windows
• SQL (Cartridge)


Structure searching: features

• Substructure, Similarity, Full, Full fragment, etc. search types

• Wide range of query atoms

• Query properties

• R-group queries

• Full SMARTS support

• Coordination compounds

• Link nodes

• Pseudo atoms, Lone pairs

• Relative stereo

• Reaction search features

• Polymers

• Position variation

• Hit coloring ...

www.chemaxon.com/conf/Structural_Search.ppt


Structure searching: options

Some selected structure search options:
– Chemical Terms filter constraint
– Tautomer search
– Stereo on/off
– Ignore charge/isotope/radical/valence/polymers
– Vague bond matching modes: “or aromatic”; ignore bond types
– Inverse hit list
– Maximum search time / number of hits
– SQL SELECT statement for pre-filtering
– Ordering of results
– etc.


Structure search: performance

Test environment: JChem Base 5.2.0, Intel Quad Q6600 2.4GHz, 8GB RAM; Oracle 10.2.0.3

Compound registration:

Number of compounds    Elapsed time (duplicates not checked)    Elapsed time (duplicates checked)
10,000                 21 s                                     26 s
100,000                2 min                                    2 min 36 s
200,000                3 min 45 s                               5 min 5 s

Substructure search in PubChem (19.5 million compounds):

Query                              Number of hits    Search time
(structure drawing not preserved)  2                 0.81 s
(structure drawing not preserved)  93                0.79 s
(structure drawing not preserved)  5,855             1.457 s
(structure drawing not preserved)  142,950           11.076 s
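As a rough reading of the registration figures above, the elapsed times can be converted into throughput. The sketch below is only arithmetic on the numbers from the slide; the names `TIMINGS`, `throughput` and `summarize` are illustrative and not part of JChem.

```python
# Registration timings copied from the slide:
# (number of compounds, seconds without duplicate check, seconds with duplicate check)
TIMINGS = [
    (10_000, 21, 26),
    (100_000, 2 * 60, 2 * 60 + 36),
    (200_000, 3 * 60 + 45, 5 * 60 + 5),
]


def throughput(compounds: int, seconds: int) -> float:
    """Return registered compounds per second."""
    return compounds / seconds


def summarize(timings):
    """Return rows of (compounds, rate unchecked, rate checked, extra time in %)."""
    rows = []
    for n, plain, checked in timings:
        overhead = 100.0 * (checked - plain) / plain
        rows.append((n, throughput(n, plain), throughput(n, checked), overhead))
    return rows


if __name__ == "__main__":
    for n, fast, slow, overhead in summarize(TIMINGS):
        print(f"{n:>7,} compounds: {fast:4.0f}/s unchecked, {slow:4.0f}/s checked, "
              f"duplicate check adds {overhead:.0f}% time")
```

On these three data points the per-compound cost does not grow with table size, and duplicate checking adds roughly a quarter to a third of the elapsed time.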


Table types

Control allowed chemical structures and available operations

• Molecule

• Reaction

• Markush

• Query

• Any structure


Example web applications

Open source JSP, ASP examples
– Marvin applets are used for query drawing and structure visualization

AJAX example
– Back-end is JChem Web Services
– No Java is needed for browsing

Demo


Integration

Integration with other ChemAxon tools:
– Custom, uniform chemical representation. (Standardizer – see separate presentation today.)
– Automatically calculated properties by Chemical Terms calculated columns (Calculator plugins)
– Additional similarity calculations (Screen - JChem Base only)
– Tautomer handling:
  • Tautomer search
  • Tautomer duplicate filter table/index option
  • Custom tautomer transforms or canonical tautomer using Standardizer
– Query drawing and structure visualization (Marvin)

Provides the most consistent interface and back-end.


Integration

Additional Cartridge functionality
– JChem index (for non-JChem tables)
– Communication with Oracle optimizer
– Reaction based enumeration (Reactor)
– Format conversions – image generation also
– Markush enumeration (Calculator plugins)
– Property predictions through Chemical Terms (Calculator plugins)


Registration system

• New component for registration system is under development (API only)

• Main features:
– Customizable business logic
  • Multilevel duplication control
  • Customizable corporate registration ID
  • Handling of salts, batches, lots, samples, and mixtures
– Identification, split and registration of salt and solvent structures
– Storage of input structures in original format
– Mock registration (dry run)
– Pre-registration through a transitory area
– Basic, customizable implementation examples
  • Separate examples for chemists and registrars

• Web and Instant JChem interfaces will follow later


Handling of Markush structures


Markush structures

• Combinatorial Markush structure registration and search; features handled in search and enumeration:
– R-groups (nesting to any depth)
– Atom lists, bond lists
– Position variation bond
– Link nodes
– Repeating units
– Homology groups (aryl, alkyl, etc.)
  • Built-in
  • User-defined

• Compatible Markush enumeration plugin


Markush Enumeration

• Markush enumeration plugin
– Full enumeration
– Selected parts only
– Random enumeration
– Calculate library size: exact size of huge Markush libraries (arbitrary precision) or magnitude
– Scaffold alignment and coloring
– Markush code
– Optional example homology group enumeration
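The library-size item can be illustrated with a small calculation. In the simplest case, where R-group positions vary independently, alternatives at one position add up and independent positions multiply, and a nested R-group contributes the size of its own sub-library. The sketch below assumes a toy representation: `Fragment`, `library_size` and `magnitude` are made-up names, not the Markush enumeration plugin's API, and symmetry-related duplicates are ignored. Python integers have arbitrary precision, so the exact count does not overflow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fragment:
    """A toy Markush fragment: a label plus R-group positions.

    Each position maps to the list of alternative fragments that may be
    substituted there. A fragment without positions is a plain structure.
    """
    label: str
    rgroups: Dict[str, List["Fragment"]] = field(default_factory=dict)


def library_size(frag: Fragment) -> int:
    """Exact number of enumerated structures (no duplicate filtering)."""
    size = 1
    for alternatives in frag.rgroups.values():
        # Alternatives at one position add up; independent positions multiply.
        size *= sum(library_size(alt) for alt in alternatives)
    return size


def magnitude(size: int) -> int:
    """Order of magnitude, i.e. floor(log10(size)), for size >= 1."""
    return len(str(size)) - 1


halogens = [Fragment("F"), Fragment("Cl"), Fragment("Br")]
alkyls = [Fragment("methyl"), Fragment("ethyl")]
# R2 is either a plain alkyl or a phenyl that itself carries a halogen (nested R-group).
phenyl = Fragment("halophenyl", {"R3": halogens})
core = Fragment("scaffold", {"R1": halogens, "R2": alkyls + [phenyl]})

print(library_size(core))  # 3 * (1 + 1 + 3) = 15
```

Reporting only `magnitude(...)` is the cheaper summary when the exact figure has dozens of digits.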


Markush storage & search

• Available in JChem Base and Instant JChem

• No enumeration involved – can handle very complex Markush structures (tested up to 10^40, but no explicit limits were built in.)

• Substructure and Full structure search

• Basic query features supported

• Substructure hit visualization: “Markush structure reduction”
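To see why avoiding enumeration matters, consider a deliberately simplified full-structure check. If a Markush structure is modeled as a scaffold with independent substituent positions, deciding whether a specific compound is covered takes one lookup per position, while the number of structures an enumeration would generate is the product of the alternative counts. This is only a toy model under that assumption: `covers`, `enumeration_cost` and the string labels are hypothetical, and the real search works on chemical graphs, which this sketch does not attempt.

```python
from math import prod
from typing import Dict, Set

# Toy Markush: each R-group position maps to its set of allowed substituents.
Markush = Dict[str, Set[str]]


def covers(markush: Markush, compound: Dict[str, str]) -> bool:
    """True if the compound's substituent at every position is allowed.

    The cost is one set lookup per position; nothing is enumerated.
    """
    if compound.keys() != markush.keys():
        return False
    return all(compound[pos] in allowed for pos, allowed in markush.items())


def enumeration_cost(markush: Markush) -> int:
    """Number of structures a full enumeration would have to generate."""
    return prod(len(allowed) for allowed in markush.values())


# 40 positions with 10 alternatives each: 10**40 structures, but only 40 lookups.
markush = {f"R{i}": {f"sub{j}" for j in range(10)} for i in range(40)}
compound = {f"R{i}": "sub3" for i in range(40)}

print(covers(markush, compound), enumeration_cost(markush) == 10 ** 40)
```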


Markush demo


What’s new


What’s new: JChem Base

5.1
– Position variation in queries
– New fast & reliable tautomer duplicate search

5.2
– .NET API
– Polymer storage and search
– New query options and features including searching of attached data, group matching of undefined R-atoms, repeating units.
– Improved substructure search performance
– JChem Web Services
– New metrics for similarity search (Tversky, etc.) (5.2.2)
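For reference, the Tversky index generalizes the Tanimoto and Dice coefficients. For fingerprint bit sets A (query) and B (target) it is T = |A∩B| / (|A∩B| + α|A\B| + β|B\A|). The sketch below computes it on plain Python sets of bit positions; it illustrates the metric only and makes no claim about how JChem stores fingerprints or names its weight parameters.

```python
from typing import Set


def tversky(a: Set[int], b: Set[int], alpha: float, beta: float) -> float:
    """Tversky index of two fingerprints given as sets of 'on' bit positions."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    if denom == 0:
        # 0/0 case. Convention chosen for this sketch: two empty
        # fingerprints count as identical, anything else as dissimilar.
        return 1.0 if not a and not b else 0.0
    return common / denom


query = {1, 4, 7, 9}
target = {1, 4, 8, 9, 12}

print(tversky(query, target, 1.0, 1.0))  # alpha = beta = 1 gives Tanimoto
print(tversky(query, target, 0.5, 0.5))  # alpha = beta = 0.5 gives Dice
print(tversky(query, target, 1.0, 0.0))  # asymmetric: share of query bits present in target
```

With unequal weights the measure is asymmetric, which is what makes it useful for substructure-like similarity: setting β to 0 ignores bits that only the target has.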


What’s new: JChem Base

Polymer support details

• Polymer brackets and properties (type, connectivity, etc.) considered during search and registration

• Attached data search (optional) – attached to atoms/bonds/brackets

• Source- and structure-based representation equivalence is checked (but can be switched off)
– Addition to a double bond. E.g. polystyrene.
– Polymerization through elimination of water or HCl. E.g. polyester, polyamide.


What’s new: JChem Base

Polymer support details (cont.)

• Ladder type polymers

• Phase-shifting (for ht SRU) (can be switched off)

• End group matching:
– * atoms: unspecified end groups
– Search option to switch on/off end group matching

• Copolymer types: co, alt, rnd, blk, grf, xl, mer, mod

• Polymer mixtures

• New search options
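Phase-shifting in the list above refers to the fact that a head-to-tail structural repeating unit (SRU) can be bracketed at different points along the chain: -[A-B-C]n- and -[B-C-A]n- describe the same polymer. One way to picture the equivalence test is as a cyclic-rotation check on the sequence of backbone units. The sketch uses lists of string labels as a stand-in for real structures and the function name is hypothetical; it is not JChem's algorithm, which operates on molecular graphs.

```python
from typing import Sequence


def same_sru_up_to_phase(unit_a: Sequence[str], unit_b: Sequence[str]) -> bool:
    """True if unit_b is a cyclic rotation of unit_a (chain direction kept)."""
    if len(unit_a) != len(unit_b):
        return False
    n = len(unit_a)
    doubled = list(unit_a) + list(unit_a)
    target = list(unit_b)
    # Every rotation of unit_a appears as a window of length n in unit_a + unit_a.
    return any(doubled[i:i + n] == target for i in range(max(n, 1)))


# Poly(ethylene oxide) drawn with two different bracket positions.
print(same_sru_up_to_phase(["O", "CH2", "CH2"], ["CH2", "O", "CH2"]))
# A different backbone order is not a phase shift.
print(same_sru_up_to_phase(["A", "B", "C"], ["C", "B", "A"]))
```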


What’s new: Cartridge-specific

5.1
– Tautomer duplicate filtering index option
– Alter index option
– Improved import speed (5.1.3)
– Improved upgrade: no need to remove/recreate indices (5.1.4)

5.2
– Interactive installer
– Increased substructure search performance (5.2.2)
– Tversky similarity search (5.2.2)


What’s new: Markush

• New Features
– Homology groups
  • 19 built-in groups
  • Customizable:
    – Examples (for built-in groups, enumeration only)
    – Full user-defined homology groups defined by R-group definition
  • Marvin templates for easier sketching
– Import reagent files as R-groups
– Position variation and Repeating units


Plans


Plans: JChem Base & Cartridge

JChem Base

• Further speed improvements (SSS, similarity)

• New vague bond level options

• R-group decomposition integration

• Improved support for Screen molecular descriptors

Cartridge

• Screen molecular descriptors (BCUT, pharmacophore similarity, chemical hashed fingerprint, etc.) and metrics (Euclidean, Dice, etc.) for similarity search

• User-defined descriptor fingerprints

• Markush tables and search

• JChem Server, JChem cluster


Plans: Markush

– .VMN import (format used by Merged Markush Service & Derwent World Patent Index)
– Multiple graphical attachment points of R-groups
– Homology variation queries
– Overlap analysis of Markush structures
– Homology group properties (# of atoms, branching points, # of heteroatoms, etc.)
– Conditions for Markush variables


Summary

• JChem Base and Cartridge are comprehensive and efficient

• Markush structure storage, search and enumeration are now approaching coverage of the features used in patents

• Continuous development, improvements in the pipeline


Find out more

• Product descriptions & links: www.chemaxon.com/products.html

• Forum: www.chemaxon.com/forum

• Presentations and posters: www.chemaxon.com/conf

• Download: www.chemaxon.com/download.html