July 2009
Szilárd Dóránt
Scientific & technical Presentation
Pipeline Pilot Integration
The Component Collection: Quick facts
• Provides access to ChemAxon tools from Pipeline Pilot
• Free of charge
• Open source: Java sources are also included
• Available from ChemAxon or Accelrys
• Latest version: 1.6 (as of September 2009)
– Requires:
• JChem / Marvin 5.1.3 or newer
• Pipeline Pilot 6.1.1 or newer
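The minimum versions above can be checked with a numeric, component-by-component comparison. The following is a minimal sketch only; the class and method names are hypothetical and are not part of the component collection.

```java
/** Compares dotted version strings such as "5.1.3" numerically (hypothetical helper). */
final class VersionCheck {
    private VersionCheck() {}

    /** Returns a negative, zero, or positive value if a is older than, equal to, or newer than b. */
    static int compare(String a, String b) {
        String[] pa = a.trim().split("\\.");
        String[] pb = b.trim().split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "6.1" equals "6.1.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True if the installed version is at least the required minimum. */
    static boolean meetsMinimum(String installed, String minimum) {
        return compare(installed, minimum) >= 0;
    }
}
```

For example, `VersionCheck.meetsMinimum(installedJChem, "5.1.3")` and `VersionCheck.meetsMinimum(installedPipelinePilot, "6.1.1")` would both need to be true, where the two `installed...` strings are whatever the local installation reports.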
Available functionality (1/2)
• Standardizer: structure canonicalization
• Chemical Terms expressions for filtering and calculations (including logP, logD, pKa, HBD, HBA, Isoelectric point, PSA and more)
• Reactor: “smart” virtual reaction processing
• Maximum Common Substructure (MCS) based clustering
• IUPAC Name <-> Molecule conversion (both directions)
• JChem chemical database: insertion, search and retrieval of structures; create and drop structure tables
Available functionality (2/2)
• Marvin applets: structure visualization and editing
• Major microspecies (major protonation form)
• Microspecies distribution
• Burden eigenvalue descriptor (BCUT)
• MolConverter: conversion between the wide range of structure formats supported by ChemAxon
• Markush enumeration: enumeration of Markush (generic) structures (e.g. R-groups, link nodes, atom and bond lists and many more)
Structure to name and name to structure conversion
Clustering with LibMCS
Maximum Common Substructure (MCS) based clustering
Options:
• Size of smallest common substructure to consider
• Three levels of heuristics:
– Exact (no heuristics)
– Fast
– Very Fast
• Bond type, atom type, charge can optionally be ignored
• Disallow “breaking” rings (default)
More on LibMCS
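What "ignoring" atom type or charge means for matching can be illustrated with a toy atom comparison. This is only a sketch of the idea; `ToyAtom` and `MatchOptions` are hypothetical classes, and this is not the LibMCS algorithm or the ChemAxon API.

```java
/** Toy atom description (hypothetical, not a ChemAxon class). */
final class ToyAtom {
    final String element;
    final int charge;

    ToyAtom(String element, int charge) {
        this.element = element;
        this.charge = charge;
    }
}

/** Flags that relax the atom comparison, mirroring the "can optionally be ignored" options. */
final class MatchOptions {
    boolean ignoreAtomType = false;
    boolean ignoreCharge = false;

    /** Two atoms match if every criterion that is not ignored agrees. */
    boolean atomsMatch(ToyAtom a, ToyAtom b) {
        if (!ignoreAtomType && !a.element.equals(b.element)) {
            return false;
        }
        if (!ignoreCharge && a.charge != b.charge) {
            return false;
        }
        return true;
    }
}
```

With the default (strict) settings, a neutral N and a charged N do not match; with `ignoreCharge` set they do. Relaxing the criteria generally lets larger common substructures be found.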
Markush Enumeration
Enumeration of generic structures
• File input
• Enumeration type:
– Sequential
– Random
• Number of enumerated structures can be limited (per input structure)
• Valence filter
• Scaffold alignment
• Markush code generation. The scaffold ID can be:
– fetched from data field
– generated (prefix + number)
More on Markush Enumeration
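The difference between sequential and random enumeration, the per-structure limit, and a generated scaffold ID can be sketched on a toy template with R-group placeholders. Everything here is an assumption made for illustration: the `[R1]`, `[R2]` placeholder syntax, the class name, and the plain string substitution are not how the ChemAxon component represents or enumerates Markush structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy R-group enumerator working on plain strings (hypothetical, for illustration only). */
final class ToyEnumerator {
    private ToyEnumerator() {}

    /** Replaces the placeholders [R1], [R2], ... with the substituents selected by idx. */
    private static String fill(String template, List<List<String>> rGroups, int[] idx) {
        String result = template;
        for (int i = 0; i < idx.length; i++) {
            result = result.replace("[R" + (i + 1) + "]", rGroups.get(i).get(idx[i]));
        }
        return result;
    }

    private static boolean anyEmpty(List<List<String>> rGroups) {
        for (List<String> group : rGroups) {
            if (group.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    /** Walks all combinations in a fixed order (last R-group varies fastest), up to limit. */
    static List<String> sequential(String template, List<List<String>> rGroups, int limit) {
        List<String> out = new ArrayList<String>();
        if (anyEmpty(rGroups)) {
            return out;
        }
        int[] idx = new int[rGroups.size()];
        while (out.size() < limit) {
            out.add(fill(template, rGroups, idx));
            // Odometer-style increment of the index vector.
            int pos = idx.length - 1;
            while (pos >= 0 && ++idx[pos] == rGroups.get(pos).size()) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                break; // all combinations visited
            }
        }
        return out;
    }

    /** Draws count random combinations; in this toy version repeats are possible. */
    static List<String> random(String template, List<List<String>> rGroups, int count, long seed) {
        List<String> out = new ArrayList<String>();
        if (anyEmpty(rGroups)) {
            return out;
        }
        Random rnd = new Random(seed);
        int[] idx = new int[rGroups.size()];
        for (int n = 0; n < count; n++) {
            for (int i = 0; i < idx.length; i++) {
                idx[i] = rnd.nextInt(rGroups.get(i).size());
            }
            out.add(fill(template, rGroups, idx));
        }
        return out;
    }

    /** Generated scaffold ID: prefix + number. */
    static String scaffoldId(String prefix, int number) {
        return prefix + number;
    }
}
```

With the template `[R1]c1ccc([R2])cc1`, R1 in {C, O} and R2 in {F, Cl, Br}, sequential enumeration visits at most six structures, and a limit of 4 stops after the fourth. Random enumeration is the practical choice when the full library is too large to enumerate.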
Tautomerization
Component for tautomer generation
• Calculation modes:
– All tautomers
– Canonical tautomer
– Generic tautomer
– Major tautomer
– Dominant tautomer distribution
• Options:
– Protect aromaticity, charge, double bond stereo, tetrahedral stereo
– Exclude antiaromatic compounds
– Single fragment mode
– Consider pH at specific value
More on Tautomerization
Conformer generation
Component for 3D conformer generation
• Calculation modes:
– Multiple conformers
– Lowest energy conformer
• Options:
– Maximum number of conformers
– Diversity limit
– Optimization limit, hyperfine option
– Time limit
– Generate with explicit H atoms
– Energy in kcal/mol or kJ/mol, written into an arbitrary property
More on conformer generation
MolConverter
“Swiss army knife” for molecular format conversion
• Input and output can each be:
– File
– Property
– Pipeline Pilot Molecule
• Specified input format or auto-detection
• Various output formats or custom format string
• Option to halt or continue on error; error messages are put into a property
• 2D cleaning (coordinate generation) only when needed (default). Unconditional 2D or 3D cleaning or no cleaning can also be selected
More on supported file formats
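The "halt or continue on error" option can be sketched as follows. This is an illustration of the error-handling pattern only: the `Converter` interface, the record layout, and the property names are hypothetical and do not reflect the component's actual implementation or the ChemAxon API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a halt-or-continue error policy for record-by-record conversion. */
final class ErrorPolicyDemo {
    private ErrorPolicyDemo() {}

    /** Hypothetical stand-in for a real format conversion; throws on bad input. */
    interface Converter {
        String convert(String input);
    }

    /**
     * Converts each input into a record of properties. On failure, either rethrows
     * (halt) or stores the error message in the named property and moves on (continue).
     */
    static List<Map<String, String>> convertAll(List<String> inputs, Converter converter,
                                                boolean haltOnError, String errorProperty) {
        List<Map<String, String>> records = new ArrayList<Map<String, String>>();
        for (String input : inputs) {
            Map<String, String> record = new HashMap<String, String>();
            record.put("Input", input);
            try {
                record.put("Output", converter.convert(input));
            } catch (RuntimeException e) {
                if (haltOnError) {
                    throw e; // stop the whole run at the first failure
                }
                record.put(errorProperty, String.valueOf(e.getMessage()));
            }
            records.add(record);
        }
        return records;
    }
}
```

In continue mode, failed records still flow downstream and can be filtered later on the error property; in halt mode the first bad structure stops processing.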
Database Connection
• Provides a convenient way to define a JDBC connection parameter set within a protocol
• Other JChem Base components refer to this parameter set by a symbolic name (e.g. “myConnection”)
• Multiple instances may be used in a protocol if needed
• Each component creates its own JDBC connection to the database according to these parameters
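The idea of a named parameter set can be sketched as a small registry: the connection is defined once, and every component looks it up by its symbolic name and opens its own connection. The class names and the example URL are hypothetical; only `java.sql.DriverManager.getConnection(url, user, password)` is standard JDBC.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/** One JDBC parameter set (hypothetical holder class). */
final class ConnectionParams {
    final String jdbcUrl;
    final String user;
    final String password;

    ConnectionParams(String jdbcUrl, String user, String password) {
        this.jdbcUrl = jdbcUrl;
        this.user = user;
        this.password = password;
    }

    /** Each caller gets its own connection, opened from the same parameters. */
    Connection open() throws SQLException {
        return DriverManager.getConnection(jdbcUrl, user, password);
    }
}

/** Maps symbolic names such as "myConnection" to parameter sets. */
final class ConnectionRegistry {
    private final Map<String, ConnectionParams> byName = new HashMap<String, ConnectionParams>();

    void define(String name, ConnectionParams params) {
        byName.put(name, params);
    }

    ConnectionParams lookup(String name) {
        ConnectionParams params = byName.get(name);
        if (params == null) {
            throw new IllegalArgumentException("No connection defined under name: " + name);
        }
        return params;
    }
}
```

A protocol that talks to two databases would simply define two names, for example `"myConnection"` and a second name of its choosing, and each downstream component would refer to whichever one it needs.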
JChem Base Insert
Inserts structures into a JChem Base table
• Returns cd_id (primary key) values
• Two input modes:
– read structure source from a specified property
– if property not specified, uses Pipeline Pilot input molecule
• Insert into additional data fields
• Duplicate filtering option (using Pass and Fail ports)
More on JChem Base
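Duplicate filtering with Pass and Fail outcomes can be sketched with a toy in-memory table. This is an illustration under stated assumptions: duplicates are detected here by plain string equality (a real structure table compares chemical structures), the `cd_id` is simply the 1-based row number, and rejected records get `-1`. None of this is the JChem Base API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy structure table with optional duplicate filtering (hypothetical). */
final class ToyStructureTable {
    /** Outcome of one insert attempt: passed (inserted) or failed (duplicate rejected). */
    static final class Result {
        final boolean passed;
        final int cdId; // new primary key, or -1 if nothing was inserted

        Result(boolean passed, int cdId) {
            this.passed = passed;
            this.cdId = cdId;
        }
    }

    private final List<String> rows = new ArrayList<String>();
    private final Set<String> seen = new HashSet<String>();

    Result insert(String structureKey, boolean filterDuplicates) {
        if (filterDuplicates && seen.contains(structureKey)) {
            return new Result(false, -1); // would go to the Fail port
        }
        rows.add(structureKey);
        seen.add(structureKey);
        return new Result(true, rows.size()); // would go to the Pass port
    }

    int rowCount() {
        return rows.size();
    }
}
```

With filtering on, inserting the same key twice yields one Pass and one Fail; with filtering off, both inserts pass and receive distinct `cd_id` values.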
JChem Database Search
Structural search in a JChem Base table
• Wide range of search options
• Output can be primary key (cd_id) or Molecule
JChem Query Guide
JChem Base demo protocol
System information
Protocol for checking configuration
Displays the most important environment information in a text editor
Release history - major changes
• Version 1.6, August 2009
– New component: "ChemAxon 3D Conformers"
• Version 1.5, May 2009
– New components: "ChemAxon MolConverter", "ChemAxon Tautomerization", "ChemAxon Markush Enumeration"
• Version 1.4, November 2008
– New components: "LibMCS Clustering", "Molecule to IUPAC Name", "Molecule from IUPAC Name"
– Major upgrade of "ChemAxon Reactor" component
• Version 1.3, July 2008
– New component: "Chemical Terms Calculator"
• Version 1.2, March 2008
– New components: "ChemAxon Reactor", "Drop JChem Base Table", "Create JChem Base Table"
– Several components upgraded
Planned development
The node release cycle is fast and flexible. Please advise us on priorities and additional functionality for future node development.