[Jonathan wrote in #22](https://forum.azimuthproject.org/discussion/comment/18045/#Comment_18045):

> There's also a very enjoyable puzzle of this type from Douglas Hofstadter's Gödel, Escher, Bach, called the MU puzzle. The puzzle asks: given a set of four rules (akin to our "reactions" here), is it possible to get the string MU starting from the string MI? This is just another kind of reachability problem.

I know that some computational chemists actually use term rewrite rules for expressing reactions.
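
To make the rewrite-rule view concrete, here is a minimal sketch of the MU puzzle quoted above as a bounded reachability search. The function names `successors` and `reachable` and the length cap `max_len` are my own choices for illustration, not anything from Hofstadter or from a chemistry toolkit. The cap is only there to keep the search finite, so a negative result from it is evidence rather than proof.

```python
from collections import deque


def successors(s):
    """Return every string obtainable from s by one application of a MIU rule."""
    out = set()
    # Rule 1: xI -> xIU (a string ending in I may have U appended)
    if s.endswith("I"):
        out.add(s + "U")
    # Rule 2: Mx -> Mxx (everything after the leading M may be doubled)
    if s.startswith("M"):
        out.add(s + s[1:])
    # Rule 3: any occurrence of III may be replaced by U
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    # Rule 4: any occurrence of UU may be deleted
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def reachable(start, max_len):
    """Breadth-first search over strings of length <= max_len (an arbitrary cap)."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


found = reachable("MI", 8)
print("MU" in found)  # False
```

The actual impossibility argument is an invariant: the number of `I`s starts at 1 and is only ever doubled or reduced by 3, so it is never divisible by 3, while `MU` has zero `I`s. Conservation laws play a similar role for chemical rewrite rules.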

One such rule language is [Reaction SMARTS](http://www.rdkit.org/docs/RDKit_Book.html#reaction-smarts).

[Coley et al. (2017)](https://pubs.acs.org/doi/full/10.1021/acscentsci.7b00064) wrote about extracting Reaction SMARTS; they used 15,000 experimental reaction records from granted United States patents.

I am familiar with this corpus and a couple of others; this one has the most noise (it is OCR data) :(

The problem is very hard, even with perfect data. You need to conserve not only matter but also charge, and you need to account for [stereochemistry](https://en.wikipedia.org/wiki/Stereochemistry). In addition, it is important to account for reaction rates. I understand one technique is to use various motifs on either side of the reaction equation as features for a machine learning model of the Gibbs free energy.
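
As a rough illustration of the two simplest constraints, here is a sketch that checks whether a candidate reaction conserves atoms and charge. The data layout (a list of `(coefficient, element_counts, charge)` tuples per side) and the names `totals` and `is_balanced` are assumptions made up for this example. It says nothing about stereochemistry or rates, which are the genuinely hard parts.

```python
from collections import Counter


def totals(side):
    """Sum atom counts and net charge over one side of a reaction.

    side: list of (coefficient, element_counts, charge) tuples,
    a hypothetical layout chosen for this sketch.
    """
    atoms = Counter()
    charge = 0
    for coeff, elements, q in side:
        for element, n in elements.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def is_balanced(reactants, products):
    """True if both sides have the same atoms per element and the same net charge."""
    return totals(reactants) == totals(products)


# 2 H2 + O2 -> 2 H2O: atoms and charge both conserved
water = is_balanced(
    [(2, {"H": 2}, 0), (1, {"O": 2}, 0)],
    [(2, {"H": 2, "O": 1}, 0)],
)

# Fe2+ -> Fe3+ with no electron written: atoms conserved, charge not
iron = is_balanced([(1, {"Fe": 1}, 2)], [(1, {"Fe": 1}, 3)])

print(water, iron)  # True False
```

A filter like this could be used to reject obviously broken records in a noisy corpus before any rule extraction is attempted.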