SimpleChem

Summary: Simple markup for chemical formulas in PmWiki 2.x pages
Version: 20081108
Prerequisites: PmWiki 2.0
Status: Experimental
Maintainer: Schilkek
Categories: Markup

Questions answered by this recipe

Is there a simple way to write chemical formulas, such as (C6H5)2CHNH2, without having to include all those markup characters?

Description

The simplechem.zip extension offers simple inline markup that represents even complex chemical formulas with a minimum of markup characters. The recipe takes advantage of the structured format of chemical formulas and inserts the appropriate markup based on simple context rules. The result is surprisingly powerful, yet formulas remain easy for chemists to read while editing.

Notes

The goal of these markup rules is to allow authors to concentrate on the discussion of their chemistry, not the intricacies of marking up the chemical formulas. The chemical formulas used in discussions of chemical reaction schemes are often complex, and it is extremely convenient to write, for instance, "the amino-terminal polymer (H2N-PEO-NH2)", and have the subscripts and other special characters be automatically generated.

Installation

  1. Download simplechem.zip and extract it into the PmWiki root directory.
  2. Make sure the cookbook/simplechem.php file is readable by the Web server.
  3. In your local/config.php file, include the simplechem.php script:
  include_once('cookbook/simplechem.php');

Once installed, chemical formulas on all pages are automatically marked up according to the rules below.

Markup Rules and Syntax

Two types of rules are provided to support chemical formulas in text. A simple, transparent rule automatically subscripts numbers in text that appears to be a chemical formula. A more advanced rule is explicitly invoked by a leading ampersand (&) character, and offers much more powerful markup options.

Simple (transparent) chemical formula markup:

Any string that appears to be a common atom followed by a number is rewritten to subscript the number. Strings like CH3CH2CH2OH, polymer-NH2, and H2SO4 are automatically converted to chemical formulas.

Recognized atoms are C, H, O, N, P, S, Na, Si, Cl, and Br. These are the atoms most commonly encountered in biochemistry; extending the rule to other atoms is trivial.
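The transparent rule can be pictured as a single regular-expression substitution. The following is a minimal sketch in Python, written for illustration only: the recipe itself is a PHP script, and the names ATOMS, ATOM_NUMBER and subscript_atoms are hypothetical. The sketch assumes the output is HTML <sub> tags and ignores any extra context checks the real rule may perform.

```python
import re

# Atoms listed in the recipe text. Two-letter symbols come first so that
# "Na" or "Cl" is tried before the single letters "N" or "C".
ATOMS = ["Na", "Si", "Cl", "Br", "C", "H", "O", "N", "P", "S"]

# A recognized atom symbol immediately followed by one or more digits.
ATOM_NUMBER = re.compile(r"(" + "|".join(ATOMS) + r")(\d+)")


def subscript_atoms(text: str) -> str:
    """Wrap digits that directly follow a recognized atom in <sub> tags."""
    return ATOM_NUMBER.sub(lambda m: f"{m.group(1)}<sub>{m.group(2)}</sub>", text)


print(subscript_atoms("H2SO4"))     # H<sub>2</sub>SO<sub>4</sub>
print(subscript_atoms("CaCl2(s)"))  # CaCl<sub>2</sub>(s)
print(subscript_atoms("3NaOH"))     # 3NaOH (a leading coefficient is untouched)
```

Because the pattern only requires an atom followed by a digit, ordinary words such as "MP3" also match in this sketch.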

This markup makes it natural to write about (for instance) H2N-C6H4-COOH (p-aminobenzoic acid), or simple reactions like this:

3NaOH + H3PO4 ↔ Na3PO4 + 3H2O

Some examples of transparent chemical formula markup are:

Source text | Displays as
Na2PO4 | Na₂PO₄
H2N-PEO-NH2 | H₂N-PEO-NH₂
H3PO4 | H₃PO₄
CH3CN | CH₃CN
CaCl2(s) | CaCl₂(s)

The transparent method can be combined with explicit markup to extend the repertoire of chemical formulas that can be created:

Source text | Displays as
(C6H5)'_2_'NCH2CH3 | (C₆H₅)₂NCH₂CH₃
CH3CH((CH2)'_3_'NH2)NH3'^+^' | CH₃CH((CH₂)₃NH₂)NH₃⁺
The sulfate ion (SO4'^2–^') ... | The sulfate ion (SO₄²⁻) ...
FeCl3·6H2O | FeCl₃·6H₂O
Nitroxide radical (R'_2_'NO•) ... | Nitroxide radical (R₂NO•) ...

Clearly, writing the explicit markup quickly becomes cumbersome, and the resulting source is often hard to read and understand while editing.

More advanced chemistry markup:

For more complicated formulas, the author must signal the start of the formula text with a leading ampersand ('&') character. The formula continues until a character other than numbers, letters, or certain punctuation is encountered.

The text "&formula" is interpreted as follows:

  • Strings of numbers following letters, periods or parentheses in the chemical formula are subscripted.
  • The '.' character is replaced with an HTML middot (·) character, as in 'CuSO₄·6H₂O'.
  • The string '==' is converted to a triple bond (rendered as a mathematical "≡" character), as in 'CH₃C≡N'.
  • Trailing '*' characters are converted to HTML bullet (•) characters (free radicals, e.g. 'HO•').
  • Trailing '+' or '-' characters become ionic charges (e.g. the text 'SO4-- ion' becomes 'SO₄²⁻ ion').
  • Lowercase letters following the ')' character in polymer formulas, such as '(PEO)m(PPO)n', are automatically subscripted and italicized ('(PEO)ₘ(PPO)ₙ', with the m and n in italics).
  • Chemical "R-groups" with numerical indices are superscripted instead of subscripted. For example, 'R1-NH2CH2-R2' is rendered as 'R¹-NH₂CH₂-R²'.
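The rules in this list can be sketched as a short chain of substitutions. The Python function below is an illustration under stated assumptions, not the recipe's PHP code: render_formula is a hypothetical name, it receives only the text that follows the ampersand (finding where a formula ends is left out), it emits HTML tags, and it subscripts digits after letters and closing parentheses only, because the text does not spell out how digits after a period interact with the middot rule.

```python
import re

EN_DASH = "\u2013"  # assumed glyph for a negative charge


def render_formula(body: str) -> str:
    """Render the text after '&' as HTML, following the listed rules."""
    suffix = ""

    # Trailing run of '+' or '-' becomes an ionic charge ("--" -> 2-).
    charge = re.search(r"(\++|-+)$", body)
    if charge:
        run = charge.group(1)
        sign = "+" if run[0] == "+" else EN_DASH
        count = str(len(run)) if len(run) > 1 else ""
        suffix = f"<sup>{count}{sign}</sup>"
        body = body[: charge.start()]

    # Trailing '*' characters become bullets (free radicals).
    radical = re.search(r"\*+$", body)
    if radical:
        suffix = "\u2022" * len(radical.group(0)) + suffix
        body = body[: radical.start()]

    body = body.replace("==", "\u2261")  # triple bond
    body = body.replace(".", "\u00b7")   # middot for hydrates

    # R-group indices are superscripted; this runs before the generic
    # digit rule so the digits are already wrapped in tags.
    body = re.sub(r"R(\d+)", r"R<sup>\1</sup>", body)
    # A lowercase letter right after ')' is an italic subscript.
    body = re.sub(r"\)([a-z])", r")<sub><i>\1</i></sub>", body)
    # Remaining digits after a letter or ')' are subscripted.
    body = re.sub(r"(?<=[A-Za-z)])(\d+)", r"<sub>\1</sub>", body)
    return body + suffix


print(render_formula("SO4--"))       # SO<sub>4</sub><sup>2–</sup>
print(render_formula("FeCl3.6H2O"))  # FeCl<sub>3</sub>·6H<sub>2</sub>O
print(render_formula("R1-C==C-R2"))  # R<sup>1</sup>-C≡C-R<sup>2</sup>
```

The order of the steps matters in this sketch: charges and radicals are stripped first so that a trailing '-' is not mistaken for a bond, and the R-group rule runs before the generic subscript rule.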

Despite the relative simplicity of these rules, this markup scheme allows a wide range of formulas to be rendered with very minimal markup. See below for more illustrative examples.

Note: No effort is made to ensure that the formula conforms to feasible chemical bonding, nor to IUPAC or SMILES naming conventions. It is up to the author to ensure that the formula makes sense from a physicochemical standpoint... no carbon atoms with 5 substituents, please!

Some examples will help demonstrate the power of this markup:

Source text | Displays as
&Na3PO4 | Na₃PO₄
&H2PO4- | H₂PO₄⁻
&(C6H5)2NCH2CH3 | (C₆H₅)₂NCH₂CH₃
&CH3CH((CH2)3NH2)NH3+ | CH₃CH((CH₂)₃NH₂)NH₃⁺
Pluronics [&(PEO)n(PPO)m(PEO)n] ... | Pluronics [(PEO)ₙ(PPO)ₘ(PEO)ₙ] ...
The sulfate ion (&SO4--) ... | The sulfate ion (SO₄²⁻) ...
Iron (III) chloride, &FeCl3.6H2O, ... | Iron (III) chloride, FeCl₃·6H₂O, ...
Nitroxide radicals (&RNO*) ... | Nitroxide radicals (RNO•) ...
... ionic &Cu++, &SO4-- or &PO4--- ... | ... ionic Cu²⁺, SO₄²⁻ or PO₄³⁻ ...
The &Na+&Cl- crystal lattice... | The Na⁺Cl⁻ crystal lattice...
Laurate salts [&CH3(CH2)10C(=O)O-&Na+] ... | Laurate salts [CH₃(CH₂)₁₀C(=O)O⁻Na⁺] ...
Blue-colored &CuSO4.xH2O was added... | Blue-colored CuSO₄·xH₂O was added...
Dabsyl chloride [&(CH3)2N(C6H4)N=N(C6H4)SO2Cl] reacts with primary amines (-NH2). | Dabsyl chloride [(CH₃)₂N(C₆H₄)N=N(C₆H₄)SO₂Cl] reacts with primary amines (-NH₂).
NaN3 (sodium azide) has a toxicity similar to cyanides (e.g. &NaC==N). | NaN₃ (sodium azide) has a toxicity similar to cyanides (e.g. NaC≡N).
&-C-C==N bonds are common. The acetylides (&R1-C==C-R2) also exist. | -C-C≡N bonds are common. The acetylides (R¹-C≡C-R²) also exist.
H2O2 exposed to UV light can generate a hydroxyl radical (&HO*). Organic peroxides (&R1OOH) can create &R1O* radicals. | H₂O₂ exposed to UV light can generate a hydroxyl radical (HO•). Organic peroxides (R¹OOH) can create R¹O• radicals.
The absorbance (&A490) in aqueous solution... | The absorbance (A₄₉₀) in aqueous solution...

Note that the '+' or '-' symbols representing a charge must come at the end of a formula. To write several charges (e.g. an ionic pair), place two formula statements back to back:

Source text | Displays as
Zwitterionic amino acids (&H3N+-&CHR1-COO-) ... | Zwitterionic amino acids (H₃N⁺-CHR¹-COO⁻) ...
&Ca++&EDTA-- complexes are used for ... | Ca²⁺EDTA²⁻ complexes are used for ...
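Why two statements are needed can be seen in a small sketch of the charge rule alone. The Python below is illustrative, not the recipe's PHP code: render_trailing_charge and render_statements are hypothetical names, the output is assumed to be HTML <sup> tags, and subscripts and the other rules are left out.

```python
import re

EN_DASH = "\u2013"  # assumed glyph for a negative charge


def render_trailing_charge(formula: str) -> str:
    """Convert only a run of '+' or '-' at the very end of one formula."""
    m = re.search(r"(\++|-+)$", formula)
    if m is None:
        return formula
    run = m.group(1)
    sign = "+" if run[0] == "+" else EN_DASH
    count = str(len(run)) if len(run) > 1 else ""
    return f"{formula[:m.start()]}<sup>{count}{sign}</sup>"


def render_statements(source: str) -> str:
    """Treat every '&' as the start of a separate formula statement."""
    return "".join(render_trailing_charge(part) for part in source.split("&") if part)


# One statement: only the final charge is recognized.
print(render_trailing_charge("Ca++EDTA--"))  # Ca++EDTA<sup>2–</sup>
# Two merged statements: each one ends in its own charge.
print(render_statements("&Ca++&EDTA--"))     # Ca<sup>2+</sup>EDTA<sup>2–</sup>
```

In this sketch the '++' in the middle of a single statement is left as typed, which is why the ion pair is written as two formulas.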

Implementation considerations and limitations

Note that the markup must be applied after links are resolved; otherwise HTML markup is incorrectly added to link names, as in:

  • This [[CHEMSKETCH2 | link to CHEMSKETCH2]] doesn't work because the '2' in 'H2' is subscripted.

The simple markup rule is fairly specific, but occasionally it will incorrectly mark up an ordinary word. To prevent this, wrap the offending text in [= ... =].

For example, the string 'MP3' appears to be a simple chemical formula (because of the P followed by a digit). To prevent the subscript on the '3', just enclose the string in '[= ... =]' markers:

Source text | Displays as
I listen to my MP3 player all the time. | I listen to my MP₃ player all the time.
I listen to my [=MP3=] player all the time. | I listen to my MP3 player all the time.
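The effect of the escape can be sketched as "apply the rule everywhere except inside [= ... =]". PmWiki handles this escape itself; the Python below only mimics the visible result. The names apply_outside_escapes and subscript_p are hypothetical, and subscript_p is a stand-in for the transparent rule that covers only the atom P.

```python
import re

# Non-greedy match of an escaped span; group 1 is the protected text.
ESCAPED = re.compile(r"\[=(.*?)=\]", re.DOTALL)


def apply_outside_escapes(text: str, rule) -> str:
    """Apply `rule` to unescaped text and emit escaped spans verbatim."""
    out = []
    pos = 0
    for m in ESCAPED.finditer(text):
        out.append(rule(text[pos:m.start()]))  # normal text gets marked up
        out.append(m.group(1))                 # escaped text, markers dropped
        pos = m.end()
    out.append(rule(text[pos:]))
    return "".join(out)


def subscript_p(text: str) -> str:
    """Stand-in rule: subscript digits that follow the atom P."""
    return re.sub(r"P(\d+)", r"P<sub>\1</sub>", text)


print(apply_outside_escapes("my MP3 player", subscript_p))
# my MP<sub>3</sub> player
print(apply_outside_escapes("my [=MP3=] player", subscript_p))
# my MP3 player
```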

See also

Contributors

Initial implementation by Karl Schilke (Schilkek) for Oregon State University's School of Chemical, Biological and Environmental Engineering.

This script is heavily based on suggestions by Peter Bowers and John Rankin. The author is grateful for their help and code examples.

Comments

Hi Schilkek, thanks for this useful recipe. I found a conflict with a dozen colors while trying to set a background on a table (try it yourself with more wiki style colors). Would it be possible to overcome this problem, perhaps by enabling the recipe with a markup like (:simplechem:) only on the pages that use chemical formulas, rather than automatically on all pages of the site? Thank you. -- Frank (2013-4-9).
