SDF File analysis Creation, composition, checking
SDF File analysis

Creation, composition, checking

Concerning chemical table files

• Chemical table files are files that contain information about chemicals

• Various formats: Molfile, SDF, RGfiles, Rxnfiles, RDfiles, XDfiles and Clipboard

MDL Molfile

• A file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule

• Most cheminformatics packages and some computational chemistry programs can read it

• Standard version: V2000

• Contains a header and a connection table

MDL Molfile content

Generated by Molgen 5.0

11 9 0 0 0 0
-0.0666 -1.5989 0.0514 C 0 0 0 0 0 0 0 0 0 0 0 0 0
1.2913 -1.6184 -0.1221 C 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.9621 -1.2620 -0.9586 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.0783 1.8974 -0.4702 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.4844 1.6346 0.9333 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.5244 -1.8601 1.0528 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.7535 -1.3543 -1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.9833 -1.8974 0.7324 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.9833 -1.2177 -0.8648 H 0 0 0 0 0 0 0 0 0 0 0 0 0
0.8090 1.5332 -0.8167 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.3677 1.1615 1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 6 1 0 0 0 0
2 7 1 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
4 5 1 0 0 0 0
4 10 1 0 0 0 0
5 11 1 0 0 0 0
M END
$$$$

1-3 Header
  1 Molecule name
  2 User/Program/Date/etc. information
  3 Comment (blank)

4-25 Connection table (Ctab)
  4 Counts line: 11 atoms, 9 bonds, ..., V2000 standard
  5-15 Atom block (1 line for each atom): x, y, z, element, etc.
  16-24 Bond block (1 line for each bond): 1st atom, 2nd atom, type, etc.
  25 M END

26 $$$$ Record delimiter (only for SDF)
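The line layout above can be sketched as a small reader. This is only an illustration: real V2000 molfiles use fixed-width columns, whereas this sketch splits on whitespace, which is an assumption that holds for small molecules like this one. The function name `parse_molfile` and the two placeholder header lines are hypothetical; the atom and bond lines are taken from the slide with trailing zero fields trimmed.

```python
# Sample record: header lines are placeholders, data lines come from the slide
# (trailing zero-valued fields trimmed for brevity).
MOLFILE = """\
example molecule
  (program/date line)

 11  9  0  0  0  0
-0.0666 -1.5989 0.0514 C 0 0
1.2913 -1.6184 -0.1221 C 0 0
-0.9621 -1.2620 -0.9586 O 0 0
-0.0783 1.8974 -0.4702 O 0 0
-0.4844 1.6346 0.9333 O 0 0
-0.5244 -1.8601 1.0528 H 0 0
1.7535 -1.3543 -1.1238 H 0 0
1.9833 -1.8974 0.7324 H 0 0
-1.9833 -1.2177 -0.8648 H 0 0
0.8090 1.5332 -0.8167 H 0 0
-1.3677 1.1615 1.1238 H 0 0
1 2 2 0
1 3 1 0
1 6 1 0
2 7 1 0
2 8 1 0
3 9 1 0
4 5 1 0
4 10 1 0
5 11 1 0
M  END
"""


def parse_molfile(text):
    """Return (header, atoms, bonds) from a V2000-style molfile string."""
    lines = text.splitlines()
    header = lines[:3]                      # lines 1-3: name, program info, comment
    counts = lines[3].split()               # line 4: counts line
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    atoms = []
    for line in lines[4:4 + n_atoms]:       # atom block: x, y, z, element
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    bond_start = 4 + n_atoms
    for line in lines[bond_start:bond_start + n_bonds]:  # bond block
        first, second, order = (int(v) for v in line.split()[:3])
        bonds.append((first, second, order))

    # The connection table must be closed by "M  END".
    if lines[bond_start + n_bonds].split() != ["M", "END"]:
        raise ValueError("connection table is not terminated by M END")
    return header, atoms, bonds


header, atoms, bonds = parse_molfile(MOLFILE)
print(len(atoms), "atoms,", len(bonds), "bonds")
```

A check like this is a quick way to confirm that the counts line agrees with the number of atom and bond lines actually present.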

MDL SDF file

• SDF = structure-data file

• Wraps the molfile format
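Because an SDF is a sequence of molfile-based records, each closed by a `$$$$` line, a file can be split into records before any further checking. The sketch below assumes the whole file fits in memory; the function name `split_sdf_records` and the two tiny records are hypothetical placeholders, not real structures.

```python
def split_sdf_records(text):
    """Split SDF text into records, using '$$$$' lines as delimiters."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks its delimiter, if it has any content.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


# Two placeholder records (contents abbreviated, for illustration only).
sample = "isomer A\n...\nM  END\n$$$$\nisomer B\n...\nM  END\n$$$$\n"
records = split_sdf_records(sample)
print(len(records))
```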

SDF content §1 – molecular information

./MinCheck/C2_H6_N0_O3_F0_S0_1.log
OpenBabel04161413273D
Gaussian 09 # G3MP2B3 Opt(Cartesian,Tight,CalcAll,MaxStep=1,MaxCycles=300) QCISD
11 9 0 0 0 0 0 0 0 0999 V2000
0.4466 -1.5390 0.0292 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4790 -2.1676 -0.5273 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2693 -0.5704 -0.6322 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.3941 2.0659 0.3307 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.5836 1.3451 0.7668 O 0 0 0 0 0 0 0 0 0 0 0 0
0.1141 -1.7508 1.0446 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7979 -1.9482 -1.5413 H 0 0 0 0 0 0 0 0 0 0 0 0
2.0238 -2.9170 0.0345 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.0239 -0.2837 -0.0806 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0506 1.3459 -0.1697 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.2708 1.8377 0.2828 H 0 0 0 0 0 0 0 0 0 0 0 0
1 6 1 0 0 0 0
2 1 2 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
3 1 1 0 0 0 0
4 5 1 0 0 0 0
7 2 1 0 0 0 0
10 4 1 0 0 0 0
11 5 1 0 0 0 0
M END

1-3 Header
  1 Filename
  2 User/Program/Date/etc. information
  3 Command

4-25 Connection table (Ctab)
  4 Counts line: 11 atoms, 9 bonds, ..., V2000 standard
  5-15 Atom block (1 line for each atom): x, y, z, element, etc.
  16-24 Bond block (1 line for each bond): 1st atom, 2nd atom, type, etc.
  25 M END

SDF content §2 – input and calculated parameters

> <Scale factor> 0.96

> <Stoichiometry> C2H6O3

> <Charge> 0

> <Multiplicity> 1

> <Molecular mass> 78.03169

> <DegreeOfFreedom> 27

> <Permanent dipole moment(B3LYP, Debye)> 1.475

> <ABC(cm-1)> 14.133 1.731 1.655

> <Scaled freq(cm-1)> 49.1 59.1 80.1 182.8 222.6 335.5 460.0 529.6 663.0 762.0 812.3 911.3 928.1 944.3 1124.8 1287.3 1299.6 1321.8 1403.2 1483.7 1689.2 3041.9 3064.2 3147.0 3408.9 3472.7 3557.0

> <IR intensities(rel.)> 4.5 3.8 6.6 7.8 25.1 93.3 16.9 79.8 60.8 214.2 73.0 2.9 55.0 16.5 33.8 210.3 56.9 126.8 4.4 22.8 90.0 19.2 0.4 8.3 59.4 559.4 26.8

> <Temp(K)> 298.150

> <Pressure(atm)> 1.00000

> <DfHg_G3MP2B3(kJ/mol)> -269.7

> <Scaled S(J/molK)> 363.4

> <UNScaled CV(J/molK)> 98.9

Fields: Scale factor, Stoichiometry, Charge, Multiplicity, Molecular mass, DegreeOfFreedom, Permanent dipole moment, ABC(cm-1), Scaled freq(cm-1), IR intensities(rel.), Temp(K), Pressure(atm), DfHg_G3MP2B3(kJ/mol), Scaled S(J/molK), UNScaled CV(J/molK)
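The data items above can be read into a dictionary and cross-checked against each other. The sketch below assumes the usual on-disk SDF layout, where the `> <tag>` header sits on its own line, the value follows on the next line(s), and a blank line ends the item (the transcript shows tag and value flattened onto one line). The function name `parse_data_items` is hypothetical; the values are copied from the slide.

```python
import re

DATA_BLOCK = """\
> <Stoichiometry>
C2H6O3

> <DegreeOfFreedom>
27

> <ABC(cm-1)>
14.133 1.731 1.655

> <Scaled freq(cm-1)>
49.1 59.1 80.1 182.8 222.6 335.5 460.0 529.6 663.0 762.0 812.3 911.3 928.1 944.3 1124.8 1287.3 1299.6 1321.8 1403.2 1483.7 1689.2 3041.9 3064.2 3147.0 3408.9 3472.7 3557.0

"""


def parse_data_items(block):
    """Map each '> <tag>' header to the text of the lines that follow it."""
    items, tag = {}, None
    for line in block.splitlines():
        match = re.match(r">\s*<([^>]+)>", line)
        if match:                       # start of a new data item
            tag = match.group(1)
            items[tag] = []
        elif not line.strip():          # blank line closes the item
            tag = None
        elif tag is not None:           # value line of the current item
            items[tag].append(line.strip())
    return {name: "\n".join(values) for name, values in items.items()}


items = parse_data_items(DATA_BLOCK)
frequencies = [float(v) for v in items["Scaled freq(cm-1)"].split()]

# Consistency check: a non-linear molecule with N atoms has 3N - 6 vibrations.
n_atoms = 11
print(len(frequencies) == 3 * n_atoms - 6 == int(items["DegreeOfFreedom"]))
```

For this record the 27 listed frequencies match both the DegreeOfFreedom field and 3N - 6 for the 11 atoms in the connection table.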

SDF content §3 – molecular descriptors

> <MPD> 2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;

> <MNA> -C(-H(-C)-C(-H-H-C)-O(-H-C))-C(-H(-C)-H(-C)-C(-H-C-O))-O(-H(-O)-C(-H-C-O))-O(-H(-O)-O(-H-O))-O(-H(-O)-O(-H-O))-H(-C(-H-C-O))-H(-C(-H-H-C))-H(-C(-H-H-C))-H(-O(-H-C))-H(-O(-H-O))-H(-O(-H-O))

> <SMI> C(=C)O.OO

> <MolRT> 3

> <InChi> InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H

> <InChiKey> JJZZTHKXWWHOAE-UHFFFAOYSA-N

> <MCDL> CH;CHH;3OH[2,3;;;5]

$$$$

Fields: MPD, MNA, SMI, MolRT, InChi, InChiKey, MCDL

Molecular fragment schemes

• Developed in the ’50s

• Screens (structural keys, fingerprints) were developed in the ’70s

• Generally they are long strings that can be stored efficiently (compressed)

• Important role
  in providing efficient substructure searching capabilities in large chemical databases,
  in similarity searching,
  in clustering large data sets,
  in assessing chemical diversity,
  in conducting SAR and QSAR studies

Images of the optimized structure (depicted differently)

[Images: GaussView; ChemDraw; www.chemicalize.org (searched by InChI)]

MPD (MOLPRINT 2D)

• MPD = MOLPRINT 2D fingerprint format

• A molecular similarity searching technique based on atom environments

• Atom environments are count vectors of heavy atoms present at a topological distance from each heavy atom of a molecule

> <MPD> 2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;
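The MPD field can be unpacked into one atom environment per atom. The sketch below assumes that each whitespace-separated token has the form `origin_type;layer-count-type;...`, i.e. that every triple gives the topological distance (layer), the number of atoms found there, and their atom-type code. That reading is an assumption consistent with the connectivity of this molecule, not something stated on the slide, and `parse_mpd` is a hypothetical name.

```python
from collections import Counter

MPD = (
    "2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; "
    "9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; "
    "8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; "
    "13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; "
    "13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;"
)


def parse_mpd(field):
    """Return a list of (origin_type, [(layer, count, atom_type), ...])."""
    environments = []
    for token in field.split():
        parts = [p for p in token.split(";") if p]   # drop the empty tail
        origin = int(parts[0])
        triples = [tuple(int(v) for v in p.split("-")) for p in parts[1:]]
        environments.append((origin, triples))
    return environments


environments = parse_mpd(MPD)
print(len(environments), "atom environments")
print(Counter(origin for origin, _ in environments))
```

Comparing the parsed environments of two isomers shows which atoms have a different neighbourhood, which is the basis of the similarity search described above.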

MNA

• MNA = Multilevel Neighbourhood of Atoms

• 2D molecular fragments suitable for use in QSAR modelling

• Output: a complete descriptor fingerprint per molecule

• Fragment: starting at the origin, each atom is appended to the descriptor immediately followed by a parenthesized list of its neighbours

> <MNA> -C(-H(-C)-C(-H-H-C)-O(-H-C))-C(-H(-C)-H(-C)-C(-H-C-O))-O(-H(-O)-C(-H-C-O))-O(-H(-O)-O(-H-O))-O(-H(-O)-O(-H-O))-H(-C(-H-C-O))-H(-C(-H-H-C))-H(-C(-H-H-C))-H(-O(-H-C))-H(-O(-H-O))-H(-O(-H-O))
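Since every atom contributes one parenthesized fragment, the MNA field can be split into per-atom descriptors by tracking parenthesis depth. This is a minimal sketch under the assumption that each descriptor ends where the nesting returns to depth zero; `split_mna` is a hypothetical helper name.

```python
MNA = (
    "-C(-H(-C)-C(-H-H-C)-O(-H-C))-C(-H(-C)-H(-C)-C(-H-C-O))"
    "-O(-H(-O)-C(-H-C-O))-O(-H(-O)-O(-H-O))-O(-H(-O)-O(-H-O))"
    "-H(-C(-H-C-O))-H(-C(-H-H-C))-H(-C(-H-H-C))"
    "-H(-O(-H-C))-H(-O(-H-O))-H(-O(-H-O))"
)


def split_mna(field):
    """Split a concatenated MNA string into one descriptor per origin atom."""
    descriptors, depth, start = [], 0, 0
    for index, char in enumerate(field):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:                      # outermost parenthesis closed
                descriptors.append(field[start:index + 1].strip())
                start = index + 1
    return descriptors


descriptors = split_mna(MNA)
print(len(descriptors))
print(descriptors[0])
```

The number of descriptors should equal the atom count in the connection table, which gives one more consistency check for the record.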

SMILES (SMI)

• SMILES = Simplified Molecular Input Line Entry Specification

• A linear text format which can describe the connectivity and chirality of a molecule

• Specifically represents a valence model of a molecule, not a computer data structure, a mathematical abstraction, or an "actual substance"

> <SMI> C(=C)O.OO
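In SMILES a dot separates disconnected components, so the string above describes two molecules in one record. The short sketch below only splits on the dot; it assumes a simple SMILES with no ring-closure digits shared across components, and it does not validate the syntax.

```python
smiles = "C(=C)O.OO"

# "." marks a disconnection: each piece is one molecule of the complex.
components = smiles.split(".")
print(components)
```

The two pieces correspond to the two formula parts, C2H4O and H2O2, that appear in the InChI of the same record.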

MolRT

(easter egg, it’s molarity…)

InChI

• InChI = International Chemical Identifier

• A reliable computerized method to represent chemical identities

• A representation of the chemical structure with details

• Simple, but unique identifier for molecules (like a barcode)

• Different layers separated with delimiters (/)

Layers: main layer, charge layer, stereochemical layer, isotopic layer, fixed-H layer, reconnected layer

> <InChi> InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H
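The layered structure can be made visible by splitting the identifier at the `/` delimiters. The sketch below is a simplified reading: it treats the first segment as the version, the second as the formula, and keys every later segment by its leading letter. It assumes each prefix occurs once, which is true for this example but not for every InChI; `split_inchi` is a hypothetical name.

```python
def split_inchi(inchi):
    """Return (version, formula, {layer_prefix: layer_content})."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI":
        raise ValueError("not an InChI string")
    segments = body.split("/")
    version, formula = segments[0], segments[1]
    # Remaining segments start with a one-letter prefix, e.g. c (connections)
    # or h (hydrogens).
    layers = {segment[0]: segment[1:] for segment in segments[2:]}
    return version, formula, layers


version, formula, layers = split_inchi(
    "InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H"
)
print(version, formula)
for prefix, content in layers.items():
    print(prefix, content)
```

Within each layer the `;` separates the two components, so the connection layer `1-2-3;1-2` lists the skeleton of each molecule in turn.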

InChiKey

• A shortened, more web-search-friendly form of the InChI code

• Its length is fixed at 27 characters

• The first 14 characters represent the molecular skeleton/connectivity matrix

• The next block contains 8+1 characters
  the first 8 characters encode stereochemistry and isotopic substitution information
  +1 character defines the kind of InChIKey (S = standard, N = non-standard)

• Next character: the InChI version used

• Final character: protonation indicator

> <InChiKey> JJZZTHKXWWHOAE-UHFFFAOYSA-N
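The character positions listed above translate directly into string slices. The following sketch checks the overall 14-10-1 block shape and then cuts the key into the parts named on the slide; the function name `split_inchikey` and the dictionary keys are hypothetical labels.

```python
def split_inchikey(key):
    """Cut a 27-character InChIKey into the parts described above."""
    blocks = key.split("-")
    if len(key) != 27 or [len(block) for block in blocks] != [14, 10, 1]:
        raise ValueError("unexpected InChIKey shape")
    first, second, last = blocks
    return {
        "connectivity": first,        # first 14 characters
        "other_layers": second[:8],   # stereochemistry / isotopes
        "kind": second[8],            # S = standard, N = non-standard
        "version": second[9],         # InChI version character
        "protonation": last,          # final character
    }


parts = split_inchikey("JJZZTHKXWWHOAE-UHFFFAOYSA-N")
for name, value in parts.items():
    print(name, value)
```

Two isomers with the same connectivity but different stereochemistry would share the first block and differ only in the second, which makes the key convenient for a quick comparison.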

MCDL

• MCDL = Modular Chemical Descriptor Language; first published in 2001

• Developed for linear representation of structural and other chemical information for chemical databases

• Similar to InChI: both languages are modular; constitution, connectivity, and stereochemistry are represented by individual "modules"

• MCDL provides direct placement of hydrogen atoms, whereas InChI uses a separate block

> <MCDL> CH;CHH;3OH[2,3;;;5]

Other useful links and references

• Todeschini, R.; Consonni, V.: Molecular Descriptors for Chemoinformatics, 2nd, revised and enlarged edition, Wiley-VCH, Weinheim, 2009. ISBN 978-3-527-31852-0

• Bender A, Mussa HY, Glen RC, Reiling S.: Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance, J Chem Inf Comput Sci. 2004 Sep-Oct; 44(5):1708-18.

• Gakh AA, Burnett MN.: Modular Chemical Descriptor Language (MCDL): composition, connectivity, and supplementary modules, J Chem Inf Comput Sci. 2001 Nov-Dec; 41(6):1494-9.

• http://arxiv.org/ftp/arxiv/papers/1311/1311.3723.pdf

• http://openbabel.org/wiki/Multilevel_Neighborhoods_of_Atoms

• http://openbabel.org/wiki/SMILES

• http://www.daylight.com/meetings/summerschool98/course/dave/smiles-intro.html

• http://www.inchi-trust.org/ (and references therein)

• http://www.iupac.org/home/publications/e-resources/inchi/download.html (and references therein)

• http://www.chemspider.com/inchi-resolver/

Your objectives for today

• To check your .sdf file for two chosen isomers

• To collect all the codes

• To compare them with each other and find differences
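One possible way to carry out the comparison is to put the codes of each isomer into a dictionary and list only the fields that differ. In the sketch below the first record uses values from the slides, while the second record is a hypothetical isomer written only for illustration; `diff_fields` is likewise a hypothetical helper.

```python
def diff_fields(record_a, record_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    all_fields = sorted(set(record_a) | set(record_b))
    return {
        field: (record_a.get(field), record_b.get(field))
        for field in all_fields
        if record_a.get(field) != record_b.get(field)
    }


# First isomer: values taken from the slides.
isomer_a = {"Stoichiometry": "C2H6O3", "Charge": "0", "SMI": "C(=C)O.OO"}
# Second isomer: hypothetical values, for illustration only.
isomer_b = {"Stoichiometry": "C2H6O3", "Charge": "0", "SMI": "CC=O.OO"}

for field, (value_a, value_b) in diff_fields(isomer_a, isomer_b).items():
    print(field, value_a, value_b)
```

Fields that agree, such as the stoichiometry, drop out, so the output highlights exactly where the structural codes of the two isomers diverge.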

Thank you for your attention!