Sustainable Futures / P2 Framework Manual 2012 EPA-748-B12-001
Appendix F. SMILES Notation Tutorial
This is a summary-level introduction to SMILES. Additional help is available from several online sources,
including http://www.epa.gov/ncct/dsstox/MoreonSMILES.html#Tutorials
What is SMILES?
SMILES is the "Simplified Molecular Input Line Entry System," which is used to translate a chemical's
three-dimensional structure into a string of symbols that is easily understood by computer software.
SMILES notations are used to enter chemical structures into the EPI Suite™ estimation programs and
ECOSAR. Additional examples of SMILES notations are available in the HELP files of EPI Suite™ and
ECOSAR. Software programs that translate a drawn chemical structure into SMILES are also available.
References:
Weininger, D. 1988. SMILES, a Chemical Language and Information System. 1.
Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput.
Sci. 28(1): 31-36.
Wiswesser, W.J. 1954. A Line-Formula Chemical Notation. New York:
Crowell.
The purpose of SMILES is to translate a structure diagram, such as that of
morphine (CAS RN 57-27-2) in the accompanying figure, into a linear
representation of the molecule that a computer program can interpret.
Here is one SMILES notation for morphine (CAS RN 57-27-2):
Oc1ccc2CC(N3C)C4C=CC(O)C5Oc1c2C45CC3
Representing Atoms
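Atoms are represented by their atomic symbols. Written alone, each symbol stands for the molecule shown:
C     methane (CH4)
N     ammonia (NH3)
O     water (H2O)
P     phosphine (PH3)
S     hydrogen sulfide (H2S)
Cl    hydrogen chloride (HCl)
Hydrogen atoms are normally not written; they are implied by each atom's normal valence.
Elements outside the common organic set (B, C, N, O, P, S, F, Cl, Br, I), such as metals, must be shown in brackets:
[Au]  elemental gold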
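The atom rules above can be illustrated with a short Python sketch that pulls the atom symbols out of a SMILES string. The function name `atom_tokens` and the regular expression are illustrative assumptions, not part of EPI Suite or ECOSAR; the pattern assumes the common organic set (B, C, N, O, P, S, F, Cl, Br, I), lowercase aromatic atoms, and bracketed atoms, and it ignores bonds, parentheses, and ring-closure digits.

```python
import re

# Assumed atom grammar for this sketch: a bracketed atom, a two-letter halogen,
# a one-letter organic-set atom, or a lowercase aromatic atom.
# "Cl" and "Br" are listed before the one-letter symbols so that "Cl" is not
# misread as carbon followed by a stray "l".
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")

def atom_tokens(smiles):
    """Return the atom tokens of a SMILES string in the order written.

    Bond symbols, branch parentheses, and ring-closure digits are skipped.
    Implied hydrogens are not listed because they are not written in the string.
    """
    return ATOM_PATTERN.findall(smiles)

if __name__ == "__main__":
    print(atom_tokens("CCl"))     # chloromethane: carbon, then chlorine
    print(atom_tokens("[Au]"))    # bracketed element
    morphine = "Oc1ccc2CC(N3C)C4C=CC(O)C5Oc1c2C45CC3"
    print(len(atom_tokens(morphine)))  # number of non-hydrogen atoms written
```

Applied to the morphine string given earlier, the sketch finds 21 non-hydrogen atoms, which is consistent with the molecular formula of morphine, C17H19NO3.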
Representing Bonds
Single, double, triple, and aromatic bonds are represented by the following symbols:
single -    double =    triple #    aromatic :
Normally single bonds and aromatic bonds do not need to be written in the SMILES notation.
Examples showing bonds are:
CC        ethane (CH3CH3)
C=C       ethylene (CH2=CH2)
COC       dimethyl ether (CH3OCH3)
CCO       ethanol (CH3CH2OH)
C=O       formaldehyde (CH2O)
O=C=O     carbon dioxide (CO2)
O=CO      formic acid (HCOOH)
C#N       hydrogen cyanide (HCN)
[H][H]    molecular hydrogen (H2)
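To make the role of implied hydrogens concrete, the following Python sketch computes a molecular formula from simple SMILES strings such as the examples above. It is a minimal illustration under stated assumptions, not the parser used by EPI Suite or ECOSAR: the name `molecular_formula` is hypothetical, only non-aromatic atoms without ring closures are handled, each unbracketed atom is assumed to take its lowest normal valence (C 4, N 3, O 2, and so on), and a bracketed atom such as [H] is assumed to carry no implied hydrogens.

```python
import re
from collections import Counter

# One token per match: bracketed atom, organic-set atom, bond symbol, or parenthesis.
TOKEN = re.compile(r"\[([A-Z][a-z]?)\]|(Cl|Br|[BCNOPSFI])|([-=#])|([()])")

# Assumed lowest normal valences for unbracketed atoms.
DEFAULT_VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
                   "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

def molecular_formula(smiles):
    """Return element counts (including implied H) for a ring-free SMILES."""
    atoms = []    # each entry: [element, bond orders used so far, is_bracketed]
    stack = []    # atoms to return to when a branch closes
    prev = None   # index of the atom the next atom bonds to
    order = 1     # a bond is single unless a bond symbol says otherwise
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unsupported SMILES syntax at position {pos}")
        bracket, organic, bond, paren = match.groups()
        if bond:
            order = BOND_ORDER[bond]
        elif paren == "(":
            stack.append(prev)
        elif paren == ")":
            prev = stack.pop()
        else:
            atoms.append([bracket or organic, 0, bracket is not None])
            current = len(atoms) - 1
            if prev is not None:
                atoms[prev][1] += order
                atoms[current][1] += order
            prev, order = current, 1
        pos = match.end()

    counts = Counter()
    for element, used, is_bracketed in atoms:
        counts[element] += 1
        if not is_bracketed:
            # Fill the remaining valence with implied hydrogens.
            counts["H"] += max(DEFAULT_VALENCE[element] - used, 0)
    return {element: n for element, n in counts.items() if n > 0}

if __name__ == "__main__":
    for s in ["CC", "C=C", "COC", "CCO", "C=O", "O=C=O", "O=CO", "C#N", "[H][H]"]:
        print(s, molecular_formula(s))
```

Running the sketch on the listed strings reproduces the formulas shown beside them, for example two carbons and six hydrogens for CC (ethane) and one carbon, one hydrogen, and one nitrogen for C#N (hydrogen cyanide).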
Bonds in Linear Structures
For linear structures, SMILES notation corresponds to conventional
diagrammatic notation except that hydrogens can be omitted. A SMILES
string may begin at any atom, so a molecule can have more than one valid
SMILES. Acetone (CAS RN 67-64-1) can be written in two equally correct ways.
In the structure diagram, the numbered asterisks indicate the atom at which
each SMILES string begins. The valid SMILES are:
1. CC(=O)C
2. O=C(C)C
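That the two acetone strings describe the same molecule can be checked with a small Python sketch. It is an illustration only, under stated assumptions: the function names `parse` and `same_molecule` are hypothetical, the parser handles only unbracketed, non-aromatic atoms without ring closures, and the comparison tries every atom-to-atom matching by brute force, which is practical only for very small molecules. Production software typically converts each string to a canonical form instead.

```python
import re
from itertools import permutations

# One token per match: organic-set atom, bond symbol, or parenthesis.
TOKEN = re.compile(r"(Cl|Br|[BCNOPSFI])|([-=#])|([()])")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

def parse(smiles):
    """Return (elements, bonds) for a ring-free SMILES string.

    elements is a list of atomic symbols in the order written;
    bonds maps frozenset({i, j}) to the bond order between atoms i and j.
    """
    elements, bonds, stack = [], {}, []
    prev, order, pos = None, 1, 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unsupported SMILES syntax at position {pos}")
        atom, bond, paren = match.groups()
        if bond:
            order = BOND_ORDER[bond]
        elif paren == "(":
            stack.append(prev)       # remember where the branch starts
        elif paren == ")":
            prev = stack.pop()       # return to the branch point
        else:
            elements.append(atom)
            current = len(elements) - 1
            if prev is not None:
                bonds[frozenset((prev, current))] = order
            prev, order = current, 1
        pos = match.end()
    return elements, bonds

def same_molecule(smiles_a, smiles_b):
    """True if both strings describe the same atoms and bonds."""
    elems_a, bonds_a = parse(smiles_a)
    elems_b, bonds_b = parse(smiles_b)
    if sorted(elems_a) != sorted(elems_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(elems_a)
    # Try every way of matching atoms of A to atoms of B.
    for perm in permutations(range(n)):
        if any(elems_a[i] != elems_b[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset(perm[i] for i in pair): bond_order
                  for pair, bond_order in bonds_a.items()}
        if mapped == bonds_b:
            return True
    return False

if __name__ == "__main__":
    print(same_molecule("CC(=O)C", "O=C(C)C"))  # the two acetone strings
    print(same_molecule("CC(=O)C", "CCC=O"))    # acetone versus propanal
```

The first comparison succeeds because both strings bond a central carbon to two carbons and to a doubly bonded oxygen. The second fails because propanal (CCC=O) has the same atoms but a different connectivity.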
[Figure: structure of morphine, CAS RN 57-27-2]
[Figure: structure of acetone, CAS RN 67-64-1; numbered asterisks 1 and 2 mark the starting atoms of the two SMILES strings]