Compound storage / retrieval with JChem Cartridge for Oracle
Péter Kovács
May, 2005
Mar 26, 2015
Contents
Purpose of JChem Cartridge
Constituents of the JChem Cartridge API
Normal Tables vs. JChem Tables
Architecture of JChem Cartridge
Purpose of JChem Cartridge
• Access JChem functionality using SQL:
SELECT count(*) FROM nci WHERE jc_contains(structure, 'Brc1cnc2ccccc12') = 1
Access JChem from any programming environment offering Oracle connectivity (Visual Basic, Java, Perl, PHP, Python, Apache mod_plsql...).
• Execute SQL queries efficiently using extensible indexes
Precompute chemical information on structures by creating jc_idxtype indexes:
CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype
The jc_idxtype implementation scans the indexed column for eligible structures in a single uninterrupted operation, the domain index scan.
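Because the operators are plain SQL, a client only needs to assemble a statement and hand it to its Oracle driver. The Python sketch below builds the substructure-count query shown above, passing the query structure as a bind variable. The helper name substructure_count_query and the bind name query are hypothetical, and it is an assumption (not confirmed by the slides) that jc_contains accepts a bind variable as its second argument.

```python
import re

# Conservative pattern for unquoted identifiers (an illustrative safety check,
# not a rule taken from the cartridge documentation).
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def substructure_count_query(table, column, query_structure):
    """Return (sql, binds) counting rows whose structure contains the query.

    Hypothetical helper: only the shape of the SQL comes from the slide.
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    sql = (
        f"SELECT count(*) FROM {table} "
        f"WHERE jc_contains({column}, :query) = 1"
    )
    return sql, {"query": query_structure}


sql, binds = substructure_count_query("nci", "structure", "Brc1cnc2ccccc12")
print(sql)
print(binds)
```

With a DB-API style driver, the pair would typically be passed on as cursor.execute(sql, binds); the sketch itself opens no database connection.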
Elements of the JChem Cartridge API
• Operators (jc_...) and their functional forms (jcf_...)
• Index parameters and default properties
• DML operators for JChem tables
• Support functions for user defined operators
Operators and their functional forms I.
Typical operator:
jc_<some-operation>(<target-structure-column>, <some-operand>)
Operator for substructure search:
jc_contains(<target-structure-column>, <query-structure>)
“Swiss-army-knife” search operator:
jc_compare(<target-structure-column>, <query-structure>, <options>)
Operators and their functional forms II.
Chemical Terms
The Lipinski rule expressed in Chemical Terms:
SELECT count(*) FROM nci_3m WHERE jc_compare(structure, 'O=C1ONC(N1c2ccccc2)-c3ccccc3','sep=! t:s!ctFilter:(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1
Presently, about 100 functions are available, including topological and physicochemical descriptors.
Users can define their own functions.
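The options argument of jc_compare in the query above appears to follow a simple layout: a separator declaration (sep=!), a space, and then name:value pairs joined by that separator. The Python sketch below assembles such a string; the helper name build_compare_options is hypothetical, and the layout is inferred from the example queries on these slides only, not from the cartridge documentation.

```python
def build_compare_options(options, sep="!"):
    """Assemble an options string in the 'sep=<c> name:value<c>name:value' layout.

    The layout is an assumption inferred from the slide examples.
    """
    if not sep:
        raise ValueError("separator must not be empty")
    for key, value in options:
        # A separator inside a key or value would make the string ambiguous.
        if sep in key or sep in str(value):
            raise ValueError(f"separator {sep!r} occurs inside option {key!r}")
    body = sep.join(f"{key}:{value}" for key, value in options)
    return f"sep={sep} {body}"


lipinski = (
    "(mass() <= 500) && (logP() <= 5) && "
    "(donorCount() <= 5) && (acceptorCount() <= 10)"
)
opts = build_compare_options([("t", "s"), ("ctFilter", lipinski)])
print(opts)
```

Rejecting values that contain the separator is a design choice of this sketch: it fails early instead of producing an options string that could be split in the wrong place.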
Operators and their functional forms III.
Chemical Terms and Query Filter:
SELECT id, purchase_date FROM compounds_instock WHERE jc_compare(structure, 'C(=S)([N][N])[S]', 'sep=! t:t!simThreshold:0.9!ctFilter:logp()>1!filterQuery:select rowid from compounds_instock where purchase_date > DATE ''2002-01-01''') = 1
Filter queries restrict the search to a subset of a table's rows while the performance-sensitive chemical computations still run in domain index scan mode.
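Since the filter query is itself SQL nested inside an SQL string literal, its own single quotes have to be doubled, as in DATE ''2002-01-01'' above. The Python sketch below shows that quoting step; the helper name sql_string_literal is hypothetical, and whether bind variables can replace the literal is not covered by the slides.

```python
def sql_string_literal(text):
    """Quote text as an SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


# The filter query as it would be written on its own (single quotes once).
filter_query = (
    "select rowid from compounds_instock "
    "where purchase_date > DATE '2002-01-01'"
)
# Options string copied from the slide example, with the filter appended.
options = (
    "sep=! t:t!simThreshold:0.9!ctFilter:logp()>1!filterQuery:" + filter_query
)
literal = sql_string_literal(options)
print(literal)
```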
Dynamic generation of static images:
SELECT jc_molconvertb(structure, 'png -2') FROM nci WHERE id = :1
Available image formats: png, jpeg, svg, powray, ppm
Index parameters
Index parameters affect:
• Fingerprint attributes
• Standardizer configuration
• Table space and storage options of the index table
• Generate index jcxnci using the structures in the table stfp_keys as structural keys:
CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype PARAMETERS('STRUCTURALFP_CONFIG=select structure from stfp_keys')
• Strip hydrogens and use Daylight-style aromatization during index creation:
CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype PARAMETERS('STD_CONFIG=dehydrogenize:optional..aromatize:d')
Default properties
Used for SQL statements where no information from JChem indexes is available.
Sample SQL statement without index information:
SELECT jc_contains('O=C1C=CNC=C1', 'n1ccccc1') FROM dual
Set default properties:
CALL jc_set_default_property('standardizerConfig', 'aromatize:d')
Supported Column Types
• VARCHAR2 columns
Targets and queries are VARCHAR2.
Operator names: jc_contains, jc_compare, jc_equals...
• BLOB columns
Targets and queries are BLOB.
Operator names: jc_containsb, jc_compareb, jc_equalsb...
Exceptions:
jc_molconvertb: takes VARCHAR2, returns BLOB
jc_molconvertbb: takes BLOB, returns BLOB
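The naming rule above (append b for the BLOB variant of an operator) can be captured in a small lookup. The Python sketch below is limited to the three operators named on this slide; the helper name search_operator_for is hypothetical, and operators not listed here may not follow the same pattern.

```python
# Operators named on the slide; others are deliberately not assumed.
KNOWN_OPERATORS = {"jc_contains", "jc_compare", "jc_equals"}

# Exceptions listed on the slide: (input type, return type) per converter.
MOLCONVERT_SIGNATURES = {
    "jc_molconvertb": ("VARCHAR2", "BLOB"),
    "jc_molconvertbb": ("BLOB", "BLOB"),
}


def search_operator_for(base_name, column_type):
    """Pick the operator variant matching the structure column's type."""
    if base_name not in KNOWN_OPERATORS:
        raise ValueError(f"operator not covered by this sketch: {base_name}")
    column_type = column_type.upper()
    if column_type == "VARCHAR2":
        return base_name
    if column_type == "BLOB":
        return base_name + "b"
    raise ValueError(f"unsupported column type: {column_type}")


print(search_operator_for("jc_contains", "VARCHAR2"))
print(search_operator_for("jc_compare", "BLOB"))
```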
Supported Table Types
• Regular table: nci_1k
CREATE INDEX jcxnci_1k...
Index table: jcxnci_1k_jcx (refers to the rowid of the base table, nci_1k)
• JChem table (generated by jcman or API): jc_nci_1k
CREATE INDEX jcxjc_nci_1k...
Regular Tables vs. JChem Tables
• Regular tables
– base table and index table are physically distinct
– index properties are specified as index parameters
• JChem tables
– base table and index table are physically the same
– most of the “index” properties are specified during table creation (jcman or Java API)
• Pros & Cons:
– inserts from outside the database are faster with JChem tables than with regular tables
– JChem tables require the Java API or the jcman command line tool for table creation, and the Java API or special cartridge functions for INSERTs, UPDATEs and DELETEs; standard SQL can be used with regular tables in all cases.
JChem Indexes with JChem Tables
• VARCHAR2
Index creation:
CREATE INDEX jcxjc_nci ON jc_nci(cd_smiles) INDEXTYPE IS jc_idxtype
Search:
SELECT jc_contains(cd_smiles, 'n1ccccc1') FROM jc_nci
• BLOB
Index creation:
CREATE INDEX jcxjc_nci ON jc_nci(cd_structure) INDEXTYPE IS jc_idxtype
Search:
SELECT jc_contains(cd_structure, 'n1ccccc1') FROM jc_nci
JChem Cartridge Architecture I.
Computation-intensive operations are performed in a separate JVM (currently Tomcat).
Advantages:
• fast execution (optimized native code)
• starting point for distributed architecture
[Figure: architecture diagram with the components Oracle, JChem Cartridge, JChem Server, JChem Core, JChem Base, JChem Streams and caches; search and update paths; HTTP and JDBC connections]
JChem Cartridge Architecture II.
Type Of Database Access
• Single Session: One SQL statement accesses the database through one single database session
– Used for inserting and updating to maintain strict transaction semantics
– Computation is still done in Tomcat (for performance)
• Dual Session: One SQL statement accesses the database through multiple database sessions
– Used for searching (for performance and code-reuse reasons)
– Only committed changes are seen
– “Ex Machina” mechanism to maintain user identity across sessions acting on behalf of the same operation
JChem Cartridge Architecture III.
Single Session Database Access
[Figure: single-session database access. Components shown: SQL Plus/any DB application, Oracle, JChem Cartridge, jc_insert, index tables, caches, JChem Server, JChem Streams Adapter, JChem Streams, JChem Core, Search Engine, Execution Engine; interaction steps numbered 1 to 11]
JChem Cartridge Architecture IV.
Dual Session Database Access
[Figure: dual-session database access. Components shown: SQL Plus/any DB application, Oracle, JChem Cartridge, jc_contains, index tables, caches, JChem Server, JChem Streams Adapter, JChem Streams, JChem Core, Search Engine, Execution Engine; interaction steps numbered 1 to 12]
JChem Cartridge Architecture V.
Dual Session “Semantics”
• Transaction context differs across sessions:
Changes must be committed before searches can see them
• Security context is disrupted across sessions:
Two options:
– Configure a “super user” with many privileges
– Use jchem_core_pkg.use_password(password VARCHAR2) for primary database sessions
Performance
Table containing 3003012 structures (12xNCI) in VARCHAR2 columns, on a 3 GHz dual Xeon with 2 GB system memory
Index creation (regular tables): 5801 seconds
Import w/o duplicate filtering (JChem-tables): 13104 seconds
Substructure search results (time in milliseconds):
Query structure                                         Hit count  JChem tables (ms)  Regular tables (ms)
C1CN1c2cnnc3c(cncc23)C4=CSC=C4                                  0                364                  374
O=C1ONC(N1c2ccccc2)c3ccccc3                                   204                456                  467
[#8]-c1c(N=N)c(cc2cc(ccc12)S([#8])(=O)=O)S([#8])(=O)=O       1188               1017                 1042
C(Sc1ncnc2ncnc12)c3ccccc3                                    1752                980                 1016
[#7]C1=CC=NC2=C1C=CC(Cl)=C2                                  4632               1987                 2074
c1ncc2ncnc2n1                                               49848              15873                16446
Clc1ccccc1                                                 274356              60459                63139
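To make the comparison between the two table types easier to read, the figures above can be turned into relative differences and throughput rates. The Python sketch below uses only numbers copied from this slide; the function and variable names are illustrative.

```python
# (query, hit count, JChem-table ms, regular-table ms), copied from the slide.
RESULTS = [
    ("C1CN1c2cnnc3c(cncc23)C4=CSC=C4", 0, 364, 374),
    ("O=C1ONC(N1c2ccccc2)c3ccccc3", 204, 456, 467),
    ("[#8]-c1c(N=N)c(cc2cc(ccc12)S([#8])(=O)=O)S([#8])(=O)=O", 1188, 1017, 1042),
    ("C(Sc1ncnc2ncnc12)c3ccccc3", 1752, 980, 1016),
    ("[#7]C1=CC=NC2=C1C=CC(Cl)=C2", 4632, 1987, 2074),
    ("c1ncc2ncnc2n1", 49848, 15873, 16446),
    ("Clc1ccccc1", 274356, 60459, 63139),
]
STRUCTURES = 3003012
INDEX_CREATION_S = 5801   # regular tables
IMPORT_S = 13104          # JChem tables, without duplicate filtering


def relative_overhead(jchem_ms, regular_ms):
    """Fraction by which the regular-table time exceeds the JChem-table time."""
    return regular_ms / jchem_ms - 1.0


overheads = [relative_overhead(j, r) for _, _, j, r in RESULTS]
index_rate = STRUCTURES / INDEX_CREATION_S
import_rate = STRUCTURES / IMPORT_S

for (query, hits, _, _), overhead in zip(RESULTS, overheads):
    print(f"{hits:>7} hits  {overhead:6.1%}  {query}")
print(f"index creation: {index_rate:.0f} structures/s")
print(f"import:         {import_rate:.0f} structures/s")
```

On these figures, every regular-table search is slower than its JChem-table counterpart by between roughly 2% and 5%.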
Future plans
Support more features already included or planned in JChem Base:
• Pharmacophore similarity search
• Custom descriptors (e.g. BCUT, scalar) and metrics in similarity search
• Coordination bond support
• Tautomeric search support
• Other S-groups
Summary
JChem Cartridge for Oracle provides access to the rich functionality of JChem Base in a flexible and efficient manner.
JChem Cartridge for Oracle uses creative solutions to broaden the applicability of JChem's core functions while preserving key benefits of the Java platform.
Links
• Documentation
– www.jchem.com/doc/admin/cartridge.html
– www.jchem.com/doc/guide/cartridge/index.html
• Forum
– www.chemaxon.hu/forum/forum7.html
• Brochure
– www.chemaxon.com/brochures/JChem_Cartridge.pdf
Máramaros köz 3/a, Budapest, 1037 Hungary
www.chemaxon.com
Thank you for your attention