SYBYL line notation
| Filename extension | |
|---|---|
| Type of format | chemical file format |
The SYBYL line notation (SLN) is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SLN differs from SMILES in several significant ways. SLN can specify molecules, molecular queries, and reactions in a single line notation, whereas SMILES handles these through language extensions. SLN supports relative stereochemistry and can distinguish mixtures of enantiomers from pure molecules whose stereochemistry is pure but unresolved. In SMILES, aromaticity is a property of both atoms and bonds, whereas in SLN it is a property of bonds.
Description
Like SMILES, SLN is a linear language for describing molecules. The two notations therefore have much in common despite SLN's many differences from SMILES, so this description compares SLN closely with SMILES and its extensions.
Attributes
Attributes, bracketed strings of additional data such as [key1=value1, key2...], are a core feature of SLN. They can be applied to atoms and bonds, and attributes that are not officially defined are available to users for private extensions.
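As a rough illustration of the basic key=value form, the following Python sketch turns an attribute block into a dictionary. The function name parse_attributes is hypothetical, and the sketch assumes entries are comma-separated and that values contain no commas of their own; it is not a complete SLN attribute parser.

```python
def parse_attributes(block: str) -> dict:
    """Parse an SLN-style attribute block such as "[I=13, charge=-1]" into a dict.

    Hypothetical helper. Entries without "=" (flags such as key2 in
    [key1=value1, key2]) map to True. Values containing commas are not handled.
    """
    block = block.strip()
    if not (block.startswith("[") and block.endswith("]")):
        raise ValueError("attribute block must be enclosed in square brackets")
    attrs = {}
    for entry in block[1:-1].split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = entry.partition("=")
        attrs[key.strip()] = value.strip() if sep else True
    return attrs


print(parse_attributes("[I=13, charge=-1, spin=d]"))
# {'I': '13', 'charge': '-1', 'spin': 'd'}
```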
When searching for molecules, comparison operators can be used in place of the usual equals sign, as in fcharge>-0.125. A ! preceding a key/value group inverts the result of the comparison.
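How such a query entry might be evaluated against an atom can be sketched as follows. This is an assumption-laden illustration: attribute_matches is a hypothetical name, only the operators =, < and > are supported (the text names comparison operators without listing them), and non-numeric values fall back to plain string comparison.

```python
import operator
import re

# Operator set assumed for this sketch only.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

# Optional "!", an attribute key, one operator, a value: "fcharge>-0.125", "!I=15"
QUERY = re.compile(r"^(?P<neg>!?)(?P<key>[A-Za-z]+)(?P<op>[=<>])(?P<value>.+)$")


def attribute_matches(query: str, attrs: dict) -> bool:
    """Evaluate one query entry against an atom's attribute values (hypothetical helper)."""
    m = QUERY.match(query)
    if m is None:
        raise ValueError(f"unrecognized query entry: {query!r}")
    key, op, value = m.group("key"), m.group("op"), m.group("value")
    if key not in attrs:
        result = False  # in this sketch an absent attribute never satisfies the comparison
    else:
        try:
            result = OPS[op](float(attrs[key]), float(value))
        except ValueError:
            # Non-numeric values: compare the strings instead
            result = OPS[op](str(attrs[key]), value)
    # A leading "!" inverts the outcome of the comparison
    return not result if m.group("neg") else result


print(attribute_matches("fcharge>-0.125", {"fcharge": 0.1}))  # True
```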
Entire molecules or reactions can also have attributes; these are enclosed in angle brackets (< and >) instead of square brackets.
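A small sketch of separating molecule-level attributes from the structure, assuming (this is not stated above) that the <...> block is appended at the end of the molecule string and uses the same comma-separated key=value entries as atom attributes. The function name split_molecule_attributes is hypothetical, and quoting rules for values are ignored.

```python
def split_molecule_attributes(sln: str) -> tuple:
    """Split "structure<key=value,...>" into (structure, attribute dict).

    Hypothetical helper; assumes the angle-bracket block closes the string.
    """
    if not sln.endswith(">") or "<" not in sln:
        return sln, {}
    start = sln.rindex("<")
    attrs = {}
    for entry in sln[start + 1:-1].split(","):
        key, sep, value = entry.strip().partition("=")
        if key:
            attrs[key] = value if sep else True
    return sln[:start], attrs


print(split_molecule_attributes("CH4<name=methane>"))  # ('CH4', {'name': 'methane'})
```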
Atoms
Anything that starts with an uppercase letter identifies an atom in SLN. Hydrogens are not added automatically, but single bonds to hydrogen can be abbreviated for organic compounds, so methane is written CH4 instead of C(H)(H)(H)H. The authors argue that explicit hydrogens make parsing more robust.
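Because hydrogens are never implied, the number of hydrogens in a simple SLN string can be read directly from the text. The sketch below assumes a plain string without wildcards or molecule-level <...> attributes; explicit_hydrogens is a hypothetical name.

```python
import re


def explicit_hydrogens(sln: str) -> int:
    """Count hydrogens written in a simple SLN string (hypothetical helper)."""
    without_attrs = re.sub(r"\[[^\]]*\]", "", sln)  # drop [...] attribute blocks
    total = 0
    # Element symbol (uppercase plus optional lowercase) and an optional count
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", without_attrs):
        if symbol == "H":
            total += int(count) if count else 1
    return total


print(explicit_hydrogens("CH4"), explicit_hydrogens("C(H)(H)(H)H"))  # 4 4
```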
Attributes defined for atoms include I= for the isotope mass number, charge= for formal charge, fcharge for partial charge, s= for stereochemistry, and spin= for radicals (with the values s, d, and t for singlet, doublet, and triplet respectively). A formal charge of charge=2 can be abbreviated as +2, and negative charges are abbreviated the same way; a bare "-" or "+" is also recognized as a charge of −1 or +1. An asterisk (*) is shorthand for spin=d. Stereochemistry on atoms is mostly tetrahedral, with R/S and D/L descriptors among those available; it can be explicit (E) or relative (R), or can specify a mixture (M) of stereoisomers at that atom. A normal/inverted (N/I) notation, equivalent to @@ and @ in SMILES, is also provided. Many additional attributes are provided for searching.
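The charge and radical shorthands can be expanded into explicit key/value pairs as in the following sketch. It assumes each shorthand appears as a single entry inside an atom's attribute brackets; normalize_atom_entry is a hypothetical name.

```python
def normalize_atom_entry(entry: str) -> tuple:
    """Expand an atom attribute shorthand into an explicit (key, value) pair."""
    if not entry:
        raise ValueError("empty attribute entry")
    if entry == "*":
        return ("spin", "d")  # radical (doublet) shorthand
    if entry in ("+", "-"):
        return ("charge", "1" if entry == "+" else "-1")
    if entry[0] in "+-" and entry[1:].isdigit():
        return ("charge", str(int(entry)))  # "+2" -> "2", "-2" -> "-2"
    key, sep, value = entry.partition("=")
    return (key, value) if sep else (key, True)


print([normalize_atom_entry(e) for e in ["+2", "-", "*", "I=13"]])
# [('charge', '2'), ('charge', '-1'), ('spin', 'd'), ('I', '13')]
```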
In addition to elemental atoms, SLN supports wildcard atoms: Any (matches any atom) and Hev (matches any heavy atom). It also has an extensive Markush syntax for specifying combinatorial libraries and RGROUP queries. SLN has several query atom types for matching groups of atoms; each is written as the group name followed by an optional positive integer.

| Group | Description |
|---|---|
| R | Matches a side chain. Matched atoms must not have any connection to the core. |
| X | Matches side chains and rings. |
| Rx | Matches side chains and rings; a ring closure must match a second Rx group. |
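A minimal sketch of recognizing the wildcard and group query atoms described above. The helper names classify_query_atom and wildcard_matches are hypothetical; the sketch only classifies tokens and does not implement Markush enumeration or R-group matching.

```python
import re

WILDCARDS = {"Any", "Hev"}
# Group name followed by an optional positive integer; "Rx" is tried before "R"
GROUP = re.compile(r"^(?P<name>Rx|R|X)(?P<index>[1-9]\d*)?$")


def classify_query_atom(token: str) -> tuple:
    """Return (kind, name, index) for a query-atom token (hypothetical helper)."""
    if token in WILDCARDS:
        return ("wildcard", token, None)
    m = GROUP.match(token)
    if m:
        index = m.group("index")
        return ("group", m.group("name"), int(index) if index else None)
    return ("element", token, None)


def wildcard_matches(token: str, element: str) -> bool:
    """Does a wildcard or plain element token match an element symbol?"""
    if token == "Any":
        return True
    if token == "Hev":
        return element != "H"  # heavy atom: anything but hydrogen
    return token == element


print(classify_query_atom("Rx2"), wildcard_matches("Hev", "H"))  # ('group', 'Rx', 2) False
```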
A mass number of 0 denotes the usual (most common) isotope, so N[I=0] is equivalent to N[I=14] and matches nitrogen-14, while N[!I=0] matches every other isotope.
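The I=0 convention can be sketched as a lookup of each element's most common isotope. The table MOST_ABUNDANT covers only a few elements and is an assumption of this example, as is the helper name isotope_matches; a real implementation would need data for every element.

```python
# Mass number of the most common isotope for a few elements (illustrative subset)
MOST_ABUNDANT = {"H": 1, "C": 12, "N": 14, "O": 16}


def isotope_matches(query: str, element: str, mass_number: int) -> bool:
    """Evaluate an I= or !I= query for one atom (hypothetical helper)."""
    negate = query.startswith("!")
    key, _, value = query.lstrip("!").partition("=")
    if key != "I":
        raise ValueError("only I= queries are handled here")
    wanted = int(value)
    if wanted == 0:
        wanted = MOST_ABUNDANT[element]  # I=0 stands for the usual isotope
    result = mass_number == wanted
    return not result if negate else result


print(isotope_matches("I=0", "N", 14), isotope_matches("!I=0", "N", 15))  # True True
```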
Bonds
SLN uses largely the same bond notation as SMILES, with -, =, #, and : for single, double, triple, and aromatic bonds. A period (.) denotes a zero-order bond, similarly to reaction SMILES, although + is preferred for separating distinct molecules.
Most single bonds are implicit, so ethane can be written CH3CH3 instead of CH3-CH3. Explicit single bonds are useful for three-center bonds.
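The bond symbols and the implicit single bond can be illustrated with a sketch that lists the bonds of an unbranched, ring-free chain. Treating an aromatic bond as order 1.5 in BOND_ORDER is a common convention assumed here, not something SLN prescribes, and chain_bonds is a hypothetical name.

```python
import re

# Bond symbol -> nominal order; 1.5 for aromatic is an assumed convention
BOND_ORDER = {"-": 1, "=": 2, "#": 3, ":": 1.5, ".": 0}

# An atom symbol with an optional hydrogen count, or an explicit bond symbol
TOKEN = re.compile(r"(?P<atom>[A-Z][a-z]?)\d*|(?P<bond>[-=#:.])")


def chain_bonds(sln: str) -> list:
    """List (atom, atom, bond symbol) for an unbranched, ring-free SLN chain."""
    bonds, previous, pending = [], None, "-"  # "-" when no symbol is written
    for m in TOKEN.finditer(sln):
        if m.group("bond"):
            pending = m.group("bond")
            continue
        symbol = m.group("atom")
        if symbol == "H":
            continue  # hydrogen count on the preceding atom, e.g. the H3 in CH3
        if previous is not None:
            bonds.append((previous, symbol, pending))
        previous, pending = symbol, "-"
    return bonds


print(chain_bonds("CH3CH3"), chain_bonds("CH2=CH2"))  # [('C', 'C', '-')] [('C', 'C', '=')]
```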
The s= attribute is also defined for double bonds, where it conveys stereochemistry in E–Z (E/Z) or cis–trans (c/t) notation. The N/I values are also available; they indicate that the "main" chain on either side of the double bond is trans or cis, respectively.
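The permitted values of the double-bond s= attribute can be captured in a simple lookup, as in this sketch. The helper describe_double_bond is hypothetical and only validates and labels the values listed above.

```python
# Double-bond s= values as described in the text
DOUBLE_BOND_STEREO = {
    "E": "E (entgegen)",
    "Z": "Z (zusammen)",
    "t": "trans",
    "c": "cis",
    "N": "main chain trans (normal)",
    "I": "main chain cis (inverted)",
}


def describe_double_bond(entry: str) -> str:
    """Describe a bond attribute such as "s=t" (hypothetical helper)."""
    key, _, value = entry.partition("=")
    if key != "s" or value not in DOUBLE_BOND_STEREO:
        raise ValueError(f"not a double-bond stereo attribute: {entry!r}")
    return DOUBLE_BOND_STEREO[value]


print(describe_double_bond("s=N"))  # main chain trans (normal)
```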
Rings
SLN writes rings more explicitly than SMILES: benzene, for example, is C[1]H:CH:CH:CH:CH:CH:@1. An atom is tagged as a ring anchor with a single numeric attribute (here [1]), and @1 later refers back to that atom, bonding the current atom to it and closing the ring.
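The anchor mechanism can be sketched as a small tokenizer that records which atom carries each numeric anchor and adds a closing bond when it meets @n. This assumes unbranched strings whose only attributes are numeric anchors; ring_bonds is a hypothetical name.

```python
import re

# Atom (optional [n] anchor, optional hydrogen count), bond symbol, or @n reference
TOKEN = re.compile(
    r"(?P<atom>[A-Z][a-z]?)(?:\[(?P<anchor>\d+)\])?\d*"
    r"|(?P<bond>[-=#:.])"
    r"|@(?P<ref>\d+)"
)


def ring_bonds(sln: str):
    """Return heavy atoms and (i, j, bond) tuples for an unbranched SLN ring."""
    atoms, bonds, anchors = [], [], {}
    pending = "-"  # single bond unless a symbol is written
    for m in TOKEN.finditer(sln):
        if m.group("bond"):
            pending = m.group("bond")
        elif m.group("ref"):
            # @n bonds the most recent atom back to the atom tagged [n]
            bonds.append((len(atoms) - 1, anchors[int(m.group("ref"))], pending))
            pending = "-"
        elif m.group("atom") != "H":  # skip hydrogens attached to the previous atom
            atoms.append(m.group("atom"))
            idx = len(atoms) - 1
            if m.group("anchor"):
                anchors[int(m.group("anchor"))] = idx
            if idx > 0:
                bonds.append((idx - 1, idx, pending))
            pending = "-"
    return atoms, bonds


atoms, bonds = ring_bonds("C[1]H:CH:CH:CH:CH:CH:@1")
print(bonds[-1])  # (5, 0, ':')
```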
Branching
Branches in SLN are written exactly as in SMILES, enclosed in parentheses. Propionic acid, for example, is CH3CH2C(=O)OH.
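Parenthesized branches can be handled with a stack: opening a branch remembers the current atom, and closing it resumes the main chain there. The sketch below assumes simple strings without rings or attributes; branch_bonds is a hypothetical name.

```python
import re

TOKEN = re.compile(r"(?P<atom>[A-Z][a-z]?)\d*|(?P<bond>[-=#:.])|(?P<open>\()|(?P<close>\))")


def branch_bonds(sln: str):
    """Return heavy atoms and (i, j, bond) tuples for a branched, ring-free SLN string."""
    atoms, bonds = [], []
    stack = []  # atoms to return to when a branch closes
    prev = None  # atom the next atom will bond to
    pending = "-"
    for m in TOKEN.finditer(sln):
        if m.group("bond"):
            pending = m.group("bond")
        elif m.group("open"):
            stack.append(prev)  # a branch starts from the current atom
        elif m.group("close"):
            prev = stack.pop()  # resume the main chain at the branch point
        elif m.group("atom") != "H":  # hydrogens attach to the previous atom
            atoms.append(m.group("atom"))
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev, pending = idx, "-"
    return atoms, bonds


print(branch_bonds("CH3CH2C(=O)OH"))
# (['C', 'C', 'C', 'O', 'O'], [(0, 1, '-'), (1, 2, '-'), (2, 3, '='), (2, 4, '-')])
```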
Reactions
SLN writes reactions with -> connecting the reactants and the products. Atoms can be mapped between reactants and products with [#num] attributes. The reaction center (rc) attribute can be added to bonds, and the chiral conversion (cc) attribute to atoms.
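A rough sketch of splitting a reaction into molecules and reading atom-map numbers. It assumes one -> per reaction, that + separates molecules except inside [...] brackets (where it may denote a charge), and that a map number is written as [#n] immediately after an atom. The helper names and the example reaction string are hypothetical.

```python
import re

# Assumed form of an atom-map attribute: [#n]
ATOM_MAP = re.compile(r"\[#(\d+)\]")


def split_top_level(text: str, sep: str = "+") -> list:
    """Split on sep, ignoring occurrences inside [...] attribute brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_reaction(sln: str):
    """Split an SLN reaction into lists of reactant and product strings."""
    reactants, arrow, products = sln.partition("->")
    if not arrow:
        raise ValueError("reaction must contain '->'")
    return split_top_level(reactants), split_top_level(products)


def atom_map_numbers(molecule: str) -> list:
    """Return the atom-map numbers appearing in one molecule string."""
    return [int(n) for n in ATOM_MAP.findall(molecule)]


r, p = parse_reaction("CH3C[#1](=O)OH+CH3O[#2]H->CH3C[#1](=O)O[#2]CH3+OH2")
print(r, p, atom_map_numbers(p[0]))
```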
Misc.
Several physical lines can be merged into one syntactic line by ending each line except the last with a backslash (\). This allows a long notation to be broken across multiple lines, for example by writing each molecule of a reaction on its own line.
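Joining continuation lines before parsing can be sketched as follows; join_continuations is a hypothetical helper, and trailing whitespace after the backslash is tolerated as an assumption of this example.

```python
def join_continuations(text: str) -> list:
    """Merge lines ending in a backslash into single logical lines."""
    logical, buffer = [], ""
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            buffer += stripped[:-1]  # drop the backslash and keep accumulating
        else:
            logical.append(buffer + line)
            buffer = ""
    if buffer:
        logical.append(buffer)  # input ended on a continuation line
    return logical


print(join_continuations("CH3CH2OH\\\n->CH2=CH2\\\n+OH2"))  # ['CH3CH2OH->CH2=CH2+OH2']
```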
See also
- Simplified molecular input line entry specification (SMILES notation)
- SMILES arbitrary target specification (SMARTS notation)
References
- Ash, Sheila; Cline, Malcolm A.; Homer, R. Webster; Hurst, Tad; Smith, Gregory B. (1997). "SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure Representation". J. Chem. Inf. Comput. Sci. 37: 71–79. doi:10.1021/ci960109j.
- Homer, R. Webster; Swanson, Jon; Jilek, Robert J.; Hurst, Tad; Clark, Robert D. (2008). "SYBYL Line Notation (SLN): A Single Notation To Represent Chemical Structures, Queries, Reactions, and Virtual Libraries". J. Chem. Inf. Model. 48 (12): 2294–2307. doi:10.1021/ci7004687.
Original source: https://en.wikipedia.org/wiki/SYBYL line notation.