SMILES
• Simplified Molecular Input Line Entry System (SMILES)
• Widely used AND computationally efficient
• Uses atomic symbols and a set of intuitive rules
• Uses hydrogen-suppressed molecular graphs (HSMG)
Nov 17, 2015
SMILES Bonds
SINGLE      -   *
DOUBLE      =
TRIPLE      #
AROMATIC    :   *
* can be omitted
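To make the bond table concrete, here is a minimal Python sketch that maps each bond symbol to a bond order and lists the bonds a string writes out explicitly. The helper name `explicit_bonds` is hypothetical, and the order 1.5 for the aromatic bond is an assumed cheminformatics convention, not part of SMILES itself. Because single and aromatic bonds can be omitted, `C-C` and `CC` describe the same molecule even though only the first spells out its bond.

```python
# Bond symbols from the table, mapped to bond order.
# 1.5 for ":" is a common convention, assumed here for illustration.
BOND_ORDER = {"-": 1.0, "=": 2.0, "#": 3.0, ":": 1.5}


def explicit_bonds(smiles):
    """Orders of the bond symbols written out in a SMILES string.

    Text inside square brackets is skipped so that a charge sign such as
    the "-" in [OH-] is not mistaken for a single bond. Omitted single and
    aromatic bonds are, by definition, not reported.
    """
    orders = []
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in BOND_ORDER:
            orders.append(BOND_ORDER[ch])
    return orders


print(explicit_bonds("C-C"), explicit_bonds("CC"))  # [1.0] []
print(explicit_bonds("CC#N"))                       # [3.0]
```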
Butanols
• 2-Butanol
• iso-Butanol
• tert-Butanol
SMILES Branches
• Represented by enclosure in parentheses
• Can be nested or stacked
• Examples:
  CC(O)CC is 2-Butanol
  OCC(C)C is iso-Butanol
  OC(C)(C)C is tert-Butanol
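One way to see that the three branched strings describe isomers is to rebuild the molecular formula, filling in the hydrogens that the hydrogen-suppressed graph leaves out. The sketch below is a hypothetical helper, `molecular_formula`, restricted to acyclic, bracket-free organic-subset SMILES; it assumes the lowest standard valence for each element. A stack records the atom a branch grows from, so after `)` the chain resumes at that atom.

```python
import re
from collections import Counter

# Assumed default valences (lowest standard value only; pentavalent N,
# hexavalent S and similar cases are not handled in this sketch).
VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
           "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Two-letter symbols first, so "Cl" is not read as "C" plus "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]")


def molecular_formula(smiles):
    """Element counts, including implicit H, for an acyclic organic-subset SMILES."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"outside the scope of this sketch: {smiles!r}")
    atoms, used = [], []   # element symbol and bond-order sum per atom
    branch_points = []     # atoms to return to when a branch closes
    prev, order = None, 1
    for tok in tokens:
        if tok == "(":
            branch_points.append(prev)
        elif tok == ")":
            prev = branch_points.pop()
        elif tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        else:
            atoms.append(tok)
            used.append(0)
            here = len(atoms) - 1
            if prev is not None:
                used[prev] += order
                used[here] += order
            prev, order = here, 1
    counts = Counter(atoms)
    # Implicit hydrogens fill each atom up to its default valence.
    counts["H"] = sum(VALENCE[a] - u for a, u in zip(atoms, used))
    return counts


for name, smi in [("2-Butanol", "CC(O)CC"), ("iso-Butanol", "OCC(C)C"),
                  ("tert-Butanol", "OC(C)(C)C")]:
    print(name, smi, dict(sorted(molecular_formula(smi).items())))
```

All three strings come out as C4H10O, which is what makes them butanol isomers.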
SMILES Bonds
Ethene                    C=C
Chloroethene              ClC=C
1,1-Dichloroethene        ClC(Cl)=C
cis-1,2-Dichloroethene    ClC=CCl (cis geometry is not encoded here; the isomeric form is Cl/C=C\Cl)
Trichloroethene           ClC(Cl)=CCl
Perchloroethene           ClC(Cl)=C(Cl)Cl
SMILES Atoms
• Use normal chemical symbols
• Add punctuation symbols if necessary
• No superscripts or subscripts
SMILES Symbols
• A string of alphanumeric characters and certain punctuation symbols
• Terminates at the first space encountered when read left to right
• The ORGANIC SUBSET (atoms that may be written without square brackets):
  B, C, N, O, P, S, F, Cl, Br, I
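The two rules on this slide, reading stops at the first space and two-letter symbols such as Cl and Br must be recognized as single atoms, can be sketched as a small tokenizer. The function name `tokenize` and the token pattern are assumptions for illustration; the pattern covers the organic subset, lowercase aromatic atoms, bracket atoms, ring digits and the usual punctuation, and rejects anything else.

```python
import re

# Bracket atoms first, then two-letter symbols before one-letter ones,
# so "Cl" is never split into "C" and an unknown "l".
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|\d|[-=#:()/\\.]")


def tokenize(line):
    """Split the SMILES at the start of `line` into atom and symbol tokens."""
    smiles = line.split(" ", 1)[0]   # the notation ends at the first space
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


print(tokenize("ClC(Cl)=CCl trichloroethene"))
```

Anything after the space (here the name) is ignored, which is what allows a SMILES to be followed by a label on the same line.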
Other SMILES Atoms
• Aliphatic or nonaromatic carbon: C
• Atom in an aromatic ring: lowercase letter
• Designate ring closure with pairs of matching digits, e.g.
  c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas C1CCCCC1 is Cyclohexane
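The difference between the Kekulé benzene string and cyclohexane shows up directly in the implicit hydrogen count, once the ring-closure digit is treated as restoring the bond that was broken. The sketch below uses a hypothetical helper, `carbon_hydrogens`, and deliberately narrow assumptions: carbon-only, unbranched, uppercase SMILES, carbon valence 4, and every ring closure a single bond.

```python
import re

BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"C|\d|[-=#]")


def carbon_hydrogens(smiles):
    """Implicit H count for a carbon-only, unbranched SMILES with ring closures.

    Assumes carbon valence 4 and that every ring closure is a single bond.
    """
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"outside the scope of this sketch: {smiles!r}")
    used = []         # sum of bond orders at each carbon
    open_rings = {}   # ring digit -> index of the carbon that opened it
    order = 1         # order of the next chain bond
    for tok in tokens:
        if tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok.isdigit():
            here = len(used) - 1
            if tok in open_rings:
                # Closing digit: restore the single bond that was broken.
                there = open_rings.pop(tok)
                used[here] += 1
                used[there] += 1
            else:
                open_rings[tok] = here
        else:
            # A carbon atom: bond it to the previous carbon in the chain.
            used.append(0)
            if len(used) > 1:
                used[-2] += order
                used[-1] += order
            order = 1
    return sum(4 - u for u in used)


print(carbon_hydrogens("C1=CC=CC=C1"))  # benzene, Kekulé form: 6
print(carbon_hydrogens("C1CCCCC1"))     # cyclohexane: 12
```

Both strings have six carbons and one ring closure; only the double bonds separate C6H6 from C6H12.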
SMILES Charges
• Specify attached hydrogens and charges in square brackets
• The number of attached hydrogens is the symbol H followed by an optional digit
SMILES Charges
[H+]      proton
[OH-]     hydroxide anion
[OH3+]    hydronium cation
[Fe++]    iron(II) cation
[NH4+]    ammonium cation
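The bracket syntax above reads as element, optional H count, optional charge. A minimal parser for exactly that shape might look like the sketch below; the name `parse_bracket_atom` is hypothetical, and isotopes, chirality marks and atom classes, which can also appear inside brackets, are assumed away.

```python
import re

# Element, optional H count, optional charge ("+", "++", "-", "+2", ...).
# Isotopes, chirality and atom classes are not handled in this sketch.
BRACKET_ATOM = re.compile(r"\[([A-Z][a-z]?)(?:H(\d*))?(\+\d+|-\d+|\++|-+)?\]")


def parse_bracket_atom(text):
    """Return (element, attached hydrogens, formal charge) for a bracket atom."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported bracket atom: {text!r}")
    element, h_digits, charge_text = m.groups()
    if h_digits is None:
        hydrogens = 0                                  # no H written
    else:
        hydrogens = int(h_digits) if h_digits else 1   # bare "H" means one
    if charge_text is None:
        charge = 0
    else:
        sign = 1 if charge_text[0] == "+" else -1
        rest = charge_text[1:]
        # "+2" gives a digit; "++" counts the repeated signs.
        charge = sign * (int(rest) if rest.isdigit() else len(charge_text))
    return element, hydrogens, charge


for atom in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(atom, parse_bracket_atom(atom))
```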
SMILES Cyclic Structures
• Break one single or one aromatic bond in each ring
• Number in any order
• Designate ring-breaking atoms by the same digit following the atomic symbol
Cyclic Structures
• Numbers indicate the start and end of a ring
• The same number marks both the start and the end of the ring and is entered immediately after the start and end atoms
• Ring numbers are the digits 1 through 9 (numbers of 10 or more need a % prefix, e.g., %10)
• A number should appear only twice
• An atom can carry two consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2
Naphthalene
c12ccccc1cccc2
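The ring-number rules can be checked mechanically: the first occurrence of a digit opens a ring at the preceding atom and the next occurrence closes it. The sketch below, with the hypothetical helper `ring_closures`, reports which atoms (counted from 0) each digit joins. It handles single digits only and assumes a digit becomes free again after it closes, which the rule of using each number only twice keeps from arising.

```python
import re

TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|\d|[-=#:()/\\.]")


def ring_closures(smiles):
    """Pair ring-closure digits; return (opening atom, closing atom, digit) tuples."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"outside the scope of this sketch: {smiles!r}")
    open_rings = {}   # digit -> index of the atom where the ring was broken
    closures = []
    atom = -1         # index of the most recent atom
    for tok in tokens:
        if tok.isdigit():
            if tok in open_rings:
                closures.append((open_rings.pop(tok), atom, tok))
            else:
                open_rings[tok] = atom
        elif tok[0].isalpha() or tok[0] == "[":
            atom += 1
    if open_rings:
        raise ValueError(f"unclosed ring number(s): {sorted(open_rings)}")
    return closures


print(ring_closures("c12ccccc1cccc2"))  # naphthalene
```

For naphthalene, the first atom carries both numbers, so it closes one ring with atom 5 and the other with atom 9, giving the two fused six-membered rings.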
SMILES Conventions
• Avoid two consecutive left parentheses if possible
• Use as few branches as possible
• Tautomeric bonds are not designated; enter the appropriate tautomeric form
Further Restrictions
• A branch cannot begin a SMILES notation
• A branch cannot immediately follow a double- or triple-bond symbol
• Example: C=(CC)C is invalid, but C(=CC)C or C(CC)=C are valid SMILES
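Both restrictions are simple enough to test with string checks. The sketch below, using a hypothetical helper `branch_rule_violations`, looks only for these two patterns and is not a general SMILES validator.

```python
import re


def branch_rule_violations(smiles):
    """Check a SMILES string against the two branch restrictions only."""
    problems = []
    if smiles.startswith("("):
        problems.append("notation begins with a branch")
    if re.search(r"[=#]\(", smiles):
        problems.append("branch immediately follows a double or triple bond")
    return problems


for s in ["C=(CC)C", "C(=CC)C", "C(CC)=C"]:
    print(s, branch_rule_violations(s) or "ok")
```

Moving the bond symbol inside the parentheses, as in C(=CC)C, is what turns the invalid form into a valid one.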
SMILES Fragments
Nitro              N(=O)(=O)
Nitrate            ON(=O)(=O)
Nitrite            ON(=O)
Sulfonic acid      S(=O)(=O)O
Cyanide/Nitrile    C#N
Azide              N=N#N
Azido              N+=N-
SMILES Metals
[Al] [As] [Au] [Be] [Bi] [Cd] [Ca] [Fe] [Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb] [Sn] [Zn] [Zr]
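The metals are written in square brackets because they fall outside the organic subset. A minimal sketch of that rule for neutral atoms, using a hypothetical helper `atom_token`, is below. One consequence worth keeping in mind: a bracketed atom gets no implicit hydrogens, so [C] is a bare carbon atom, not methane.

```python
# Elements that may appear without square brackets (the organic subset).
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}


def atom_token(symbol):
    """Write a neutral atom: bare if in the organic subset, bracketed otherwise."""
    return symbol if symbol in ORGANIC_SUBSET else f"[{symbol}]"


print(" ".join(atom_token(s) for s in ["Na", "Cl", "Fe", "C", "Zn"]))
# [Na] Cl [Fe] C [Zn]
```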
Disconnected Structures
• Indicated by a dot
• Example: tetramethylammonium bromide
  C[N+](C)(C)C.[Br-]
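Because the dot only separates components, a salt can be split back into its ions and its overall charge checked by summing the charges in the bracket atoms. The helpers `components` and `net_charge` below are hypothetical names for a minimal sketch; they assume charges appear only at the end of bracket atoms, written as `+`, `++`, `-`, `+2` and so on.

```python
import re

# Trailing charge inside a bracket atom: "+", "++", "-", "+2", ...
CHARGE = re.compile(r"(\+\d+|-\d+|\++|-+)$")


def components(smiles):
    """Split a SMILES string into its disconnected parts at each dot."""
    return smiles.split(".")


def net_charge(smiles):
    """Sum the formal charges written in square-bracket atoms."""
    total = 0
    for content in re.findall(r"\[([^\]]*)\]", smiles):
        m = CHARGE.search(content)
        if m:
            text = m.group(1)
            sign = 1 if text[0] == "+" else -1
            total += sign * (int(text[1:]) if text[1:].isdigit() else len(text))
    return total


salt = "C[N+](C)(C)C.[Br-]"   # tetramethylammonium bromide
print(components(salt))        # ['C[N+](C)(C)C', '[Br-]']
print(net_charge(salt))        # 0
```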
Isomeric and Chiral SMILES
• Isomeric (cis/trans) configuration is indicated by forward and backward slashes: / \
• Examples:
  trans-1,2-dibromoethene: Br/C=C/Br (the direction of the slash continues)
  cis-1,2-dibromoethene: Br/C=C\Br (the direction of the slash reverses)
• Chirality is indicated by the @ symbol (written @ or @@)
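For the simple X/C=C/Y pattern in these examples, the slash rule reduces to comparing the two slash characters: the same character on both sides means trans, different characters mean cis. The sketch below, with the hypothetical helper `double_bond_geometry`, handles only that one pattern; substituents written in branches, as in C(/F)=C/F, are out of scope.

```python
import re

# Only the simple form X/C=C/Y (either slash at either position) is handled.
SIMPLE_ALKENE = re.compile(r"([^/\\]+)([/\\])C=C([/\\])([^/\\]+)")


def double_bond_geometry(smiles):
    """Return 'trans' or 'cis' for a simple isomeric SMILES such as Br/C=C/Br."""
    m = SIMPLE_ALKENE.fullmatch(smiles)
    if m is None:
        raise ValueError(f"outside the scope of this sketch: {smiles!r}")
    # Same slash on both sides: the direction continues, so trans.
    return "trans" if m.group(2) == m.group(3) else "cis"


print(double_bond_geometry("Br/C=C/Br"))   # trans
print(double_bond_geometry(r"Br/C=C\Br"))  # cis
```

Note that Br\C=C\Br is also trans: what matters is whether the direction continues, not which slash is used.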
Some Applications
• JMDraw/SMILESViewer (Christoph Steinbeck)
• JME Molecular Editor (Peter Ertl)
• STN Express (SMILES as output)
• Tripos (dbtranslate: SMILES to MOL)
• Marvin (Ferenc Csizmadia): http://chemaxon.com/marvin/
• CACTVS: http://www2.ccc.uni-erlangen.de/cactvs/
Another Application
SMILESCAS Database
• http://www.syrres.com/esc/smilecas.htm
• Over 103,000 SMILES notations
• Input a CAS Registry Number
• Leads to SMILES and thence to a structure search