Source: Deep Learning on Medium
1. Chemical Fingerprints
Chemical fingerprints [1] have long been used to represent chemical structures as numbers, which makes them suitable inputs to machine learning models. In short, a chemical fingerprint indicates the presence or absence of particular chemical features or substructures.
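To make the presence/absence idea concrete, here is a minimal sketch of a *keyed* fingerprint, assuming RDKit is installed. The four SMARTS patterns in `SUBSTRUCTURE_KEYS` are hypothetical, hand-picked for illustration, and are not a standard key set such as MACCS; each bit simply records whether the corresponding substructure occurs in the molecule.

```python
from rdkit import Chem

# Hypothetical, hand-picked substructure keys (illustration only, not a standard key set)
SUBSTRUCTURE_KEYS = {
    "hydroxyl": "[OX2H]",
    "carbonyl": "[CX3]=[OX1]",
    "benzene ring": "c1ccccc1",
    "halogen": "[F,Cl,Br,I]",
}
PATTERNS = [Chem.MolFromSmarts(smarts) for smarts in SUBSTRUCTURE_KEYS.values()]


def keyed_fingerprint(smiles):
    """Return one 0/1 bit per substructure key: 1 if the substructure is present."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return [int(mol.HasSubstructMatch(pattern)) for pattern in PATTERNS]


print(keyed_fingerprint("CCO"))        # ethanol       -> [1, 0, 0, 0]
print(keyed_fingerprint("CC(=O)O"))    # acetic acid   -> [1, 1, 0, 0]
print(keyed_fingerprint("Fc1ccccc1"))  # fluorobenzene -> [0, 0, 1, 1]
```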
A brief summary of chemical fingerprints is provided in another of my blog posts here.
Fingerprints can easily be computed in Python with RDKit [2], for instance for Atorvastatin, a drug that generated more than $100B in revenue over 2003–2013.
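A minimal sketch of computing such a fingerprint for Atorvastatin, assuming a recent RDKit that provides `rdFingerprintGenerator.GetMorganGenerator`. Morgan fingerprints are RDKit's ECFP-like implementation of the extended-connectivity fingerprints of Rogers and Hahn; the radius of 2 and the 2048-bit length are common choices, not settings taken from this post, and the SMILES string below omits stereochemistry for brevity.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Atorvastatin, stereochemistry omitted for brevity
ATORVASTATIN = "CC(C)c1c(C(=O)Nc2ccccc2)c(-c2ccccc2)c(-c2ccc(F)cc2)n1CCC(O)CC(O)CC(=O)O"


def morgan_fingerprint(smiles, radius=2, n_bits=2048):
    """Return an ECFP-like Morgan fingerprint as a 0/1 NumPy array."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    bitvect = generator.GetFingerprint(mol)
    fp = np.zeros(n_bits, dtype=np.uint8)
    fp[list(bitvect.GetOnBits())] = 1  # set the bits of substructures that were found
    return fp


fp = morgan_fingerprint(ATORVASTATIN)
print(fp.shape, "bits,", int(fp.sum()), "of them set")
```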
2. Graph Convolutions
A few years ago, researchers realized [3] that instead of computing a non-differentiable fingerprint, we can compute a differentiable one. Backpropagation can then train not only the deep-learning model but also the fingerprint-generating function itself. The promise is to learn richer molecular representations.
The core idea is to aggregate the features of neighboring nodes (atoms). This is where the name ‘graph convolution’ comes from: for each atom, we convolve (perform some sort of aggregation) over its neighboring atoms.
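The aggregation step can be sketched in a few lines of NumPy. `graph_conv` is a hypothetical, simplified helper (sum over the atom and its bonded neighbors, then a linear map and a ReLU), not the exact update rule of any particular paper or library; the two atom features used in the example (atomic number and attached hydrogens) are the ones this post uses later.

```python
import numpy as np


def graph_conv(h, edges, weight):
    """One simplified graph-convolution step.

    h:      (num_atoms, in_dim) atom feature matrix
    edges:  list of (i, j) bonds, treated as undirected
    weight: (in_dim, out_dim) learnable weight matrix
    """
    agg = h.copy()          # each atom keeps its own features (a self-loop)
    for i, j in edges:      # add neighbor features in both directions
        agg[i] += h[j]
        agg[j] += h[i]
    return np.maximum(agg @ weight, 0.0)  # linear map followed by ReLU


# Ethanol heavy atoms C-C-O, features = [atomic number, attached hydrogens]
h = np.array([[6.0, 3.0], [6.0, 2.0], [8.0, 1.0]])
bonds = [(0, 1), (1, 2)]
print(graph_conv(h, bonds, np.eye(2)))  # rows: [12, 5], [20, 6], [14, 3]
```

With the identity as weight matrix, each output row is just the atom's own features plus those of its neighbors, which makes the aggregation easy to check by hand.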
We can implement the neural graph fingerprint algorithm of Duvenaud et al. [3] using PyTorch-Geometric [4]. The implementation allows for batched training, since PyTorch-Geometric models a batch of molecules (graphs) as one big disconnected graph.
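The batching trick itself can be illustrated without PyTorch-Geometric. The hypothetical `batch_graphs` helper below mirrors PyTorch-Geometric's layout: atom features are concatenated, each graph's `edge_index` (shape `2 × num_edges`) is shifted by the number of atoms that come before it, and a `batch` vector records which molecule each atom belongs to. A per-molecule readout then becomes a scatter-sum over that vector (`sum_pool`, also hypothetical).

```python
import numpy as np


def batch_graphs(graphs):
    """Merge several (x, edge_index) graphs into one big disconnected graph."""
    xs, edge_blocks, batch = [], [], []
    offset = 0
    for graph_id, (x, edge_index) in enumerate(graphs):
        xs.append(x)
        edge_blocks.append(edge_index + offset)       # shift atom indices past earlier graphs
        batch.append(np.full(x.shape[0], graph_id))   # molecule id of every atom
        offset += x.shape[0]
    return np.concatenate(xs), np.concatenate(edge_blocks, axis=1), np.concatenate(batch)


def sum_pool(x, batch, num_graphs):
    """Sum atom features per molecule (a simple graph-level readout)."""
    out = np.zeros((num_graphs, x.shape[1]))
    np.add.at(out, batch, x)  # out[batch[k]] += x[k] for every atom k
    return out


# Features = [atomic number, attached hydrogens]; each bond stored in both directions
methanol = (np.array([[6.0, 3.0], [8.0, 1.0]]), np.array([[0, 1], [1, 0]]))
ethanol = (np.array([[6.0, 3.0], [6.0, 2.0], [8.0, 1.0]]), np.array([[0, 1, 1, 2], [1, 0, 2, 1]]))

x, edge_index, batch = batch_graphs([methanol, ethanol])
print(edge_index)                        # rows: [0 1 2 3 3 4] and [1 0 3 2 4 3]
print(batch)                             # [0 0 1 1 1]
print(sum_pool(x, batch, num_graphs=2))  # rows: [14, 4] and [20, 6]
```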
2a. Atom Features and bond connections (edge indices)
We will use these atom features:
a) Atomic number (which also determines the atom type)
b) The number of hydrogens attached to the atom.
These are basic features but sufficient for our purposes.
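A minimal sketch of extracting these two features and the bond connections with RDKit. The helper name `mol_to_graph` is hypothetical. It stores every bond in both directions, which is the usual way undirected graphs are encoded in a PyTorch-Geometric `edge_index`, and returns NumPy arrays that could be converted with `torch.tensor` before training.

```python
import numpy as np
from rdkit import Chem


def mol_to_graph(smiles):
    """Return (atom features, edge_index) for a molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # Per-atom features: [atomic number, number of attached hydrogens]
    x = np.array(
        [[atom.GetAtomicNum(), atom.GetTotalNumHs()] for atom in mol.GetAtoms()],
        dtype=np.float32,
    )
    src, dst = [], []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        src += [i, j]  # store each bond in both directions
        dst += [j, i]
    edge_index = np.array([src, dst], dtype=np.int64)  # shape (2, 2 * num_bonds)
    return x, edge_index


x, edge_index = mol_to_graph("CCO")  # ethanol
print(x)           # rows: [6, 3], [6, 2], [8, 1]
print(edge_index)  # rows: [0 1 1 2] and [1 0 2 1]
```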
Finally, we can define the model and test it to make sure it works.
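A sketch of the model in plain PyTorch, following the structure of Duvenaud et al.'s neural fingerprint algorithm: at each of `radius` layers, every atom sums its own and its neighbors' features, passes the result through a sigmoid layer, and adds a softmax over fingerprint bits to its molecule's fingerprint. Two assumptions differ from the paper and from a PyTorch-Geometric implementation: neighbor sums use `Tensor.index_add` instead of PyTorch-Geometric's message-passing classes, and one hidden weight matrix per layer replaces the paper's degree-specific weights. The usage lines run a randomly initialized model on ethanol; because each softmax sums to 1, the fingerprint entries should sum to (number of atoms) × (radius).

```python
import torch
import torch.nn as nn


class NeuralFingerprint(nn.Module):
    """Sketch of a neural graph fingerprint (after Duvenaud et al., 2015).

    Simplification: one hidden weight matrix per layer instead of one per atom degree.
    """

    def __init__(self, in_dim, hidden_dim, fp_dim, radius):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * radius
        self.hidden = nn.ModuleList([nn.Linear(dims[l], dims[l + 1]) for l in range(radius)])
        self.output = nn.ModuleList([nn.Linear(hidden_dim, fp_dim) for _ in range(radius)])
        self.fp_dim = fp_dim

    def forward(self, x, edge_index, batch, num_graphs):
        # edge_index: (2, num_edges), each bond stored in both directions
        # batch:      (num_atoms,), index of the molecule each atom belongs to
        src, dst = edge_index
        fp = x.new_zeros(num_graphs, self.fp_dim)
        h = x
        for hidden, output in zip(self.hidden, self.output):
            v = h.index_add(0, dst, h[src])          # own features + sum over bonded neighbors
            h = torch.sigmoid(hidden(v))             # smooth stand-in for hashing
            bits = torch.softmax(output(h), dim=-1)  # smooth stand-in for setting one bit
            fp = fp.index_add(0, batch, bits)        # add each atom's contribution to its molecule
        return fp


# Usage: a randomly initialized model on ethanol (heavy atoms C-C-O)
torch.manual_seed(0)
x = torch.tensor([[6.0, 3.0], [6.0, 2.0], [8.0, 1.0]])  # [atomic number, attached H]
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
batch = torch.zeros(3, dtype=torch.long)
model = NeuralFingerprint(in_dim=2, hidden_dim=16, fp_dim=32, radius=2)
fp = model(x, edge_index, batch, num_graphs=1)
print(fp.shape)         # torch.Size([1, 32])
print(fp.sum().item())  # close to 6.0: 3 atoms x 2 layers, each softmax sums to 1
```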
3. Learning fingerprints through backpropagation
An interesting finding in [3] was that randomly initialized neural fingerprints did as well as or better than conventional fingerprints at modeling chemical features.
If randomly initialized fingerprints do as well as conventional fingerprints, surely training them through backpropagation [5] ought to do even better?
That is the hope and promise, but we must make sure we don’t overfit to noise in small datasets. Let’s try our neural fingerprint on a real dataset: the BACE [6] regression dataset from DeepChem [7].
We now build a small MLP (multi-layer perceptron) on top of our neural fingerprint, with just one hidden layer of dimension 100.
Next, we define utility functions for training and validation, and finally the optimizer and the training loop.
Note that training the model updates both the neural fingerprint and the linear layers on top of it.
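An end-to-end sketch of the MLP head and the training loop in plain PyTorch. So that it runs on its own, it includes a compact neural fingerprint (one hidden weight matrix per layer rather than per atom degree, an assumption), and so that it runs offline it trains on hypothetical synthetic chain "molecules" whose target is simply the atom count, not on BACE. The hidden layer of size 100 follows the text; `fp_dim`, the learning rate, and the number of epochs are arbitrary choices. Because the optimizer receives `model.parameters()`, the fingerprint weights are updated together with the MLP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class NeuralFingerprint(nn.Module):
    """Compact neural graph fingerprint (one hidden weight matrix per layer)."""

    def __init__(self, in_dim, hidden_dim, fp_dim, radius):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * radius
        self.hidden = nn.ModuleList([nn.Linear(dims[l], dims[l + 1]) for l in range(radius)])
        self.output = nn.ModuleList([nn.Linear(hidden_dim, fp_dim) for _ in range(radius)])
        self.fp_dim = fp_dim

    def forward(self, x, edge_index, batch, num_graphs):
        src, dst = edge_index
        fp = x.new_zeros(num_graphs, self.fp_dim)
        h = x
        for hidden, output in zip(self.hidden, self.output):
            h = torch.sigmoid(hidden(h.index_add(0, dst, h[src])))         # aggregate + update
            fp = fp.index_add(0, batch, torch.softmax(output(h), dim=-1))  # per-molecule sum
        return fp


class FingerprintRegressor(nn.Module):
    """Neural fingerprint followed by an MLP with one hidden layer of size 100."""

    def __init__(self, in_dim, fp_dim=64, radius=2):
        super().__init__()
        self.fingerprint = NeuralFingerprint(in_dim, 32, fp_dim, radius)
        self.mlp = nn.Sequential(nn.Linear(fp_dim, 100), nn.ReLU(), nn.Linear(100, 1))

    def forward(self, x, edge_index, batch, num_graphs):
        return self.mlp(self.fingerprint(x, edge_index, batch, num_graphs)).squeeze(-1)


def make_toy_batch(num_graphs):
    """Hypothetical toy data: random chain 'molecules', target = number of atoms."""
    xs, src, dst, batch, y = [], [], [], [], []
    offset = 0
    for g in range(num_graphs):
        n = int(torch.randint(2, 8, (1,)))
        xs.append(torch.rand(n, 2))
        for a in range(offset, offset + n - 1):  # bond a -- a+1, stored both ways
            src += [a, a + 1]
            dst += [a + 1, a]
        batch += [g] * n
        y.append(float(n))
        offset += n
    data = (torch.cat(xs), torch.tensor([src, dst]), torch.tensor(batch), num_graphs)
    return data, torch.tensor(y)


def train_step(model, optimizer, data, y):
    model.train()
    optimizer.zero_grad()
    loss = F.mse_loss(model(*data), y)
    loss.backward()  # gradients flow into the MLP *and* the fingerprint layers
    optimizer.step()
    return loss.item()


@torch.no_grad()
def evaluate(model, data, y):
    model.eval()
    return F.mse_loss(model(*data), y).item()


train_data, train_y = make_toy_batch(64)
val_data, val_y = make_toy_batch(16)
model = FingerprintRegressor(in_dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # includes fingerprint weights

initial_val_loss = evaluate(model, val_data, val_y)
fp_weight_before = model.fingerprint.hidden[0].weight.detach().clone()
for epoch in range(200):
    train_loss = train_step(model, optimizer, train_data, train_y)
    if epoch % 50 == 0:
        print(f"epoch {epoch}: train MSE {train_loss:.3f}, val MSE {evaluate(model, val_data, val_y):.3f}")
final_val_loss = evaluate(model, val_data, val_y)
print(f"val MSE: {initial_val_loss:.3f} -> {final_val_loss:.3f}")
```

For simplicity this sketch trains full-batch; with a real dataset you would loop over mini-batches of molecules and keep a held-out validation split to watch for overfitting.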
4. Looking Forward, and Current State-of-the-Art
The dataset above had ~1000 molecules and a single target. Current datasets have 100k+ molecules and several hundred targets, so large-scale multitask supervised pretraining can be used to obtain very rich representations [8].
Furthermore, researchers are starting to use unsupervised graph pretraining techniques [9], following the success of unsupervised pretraining in NLP [10].
5. Summary
We saw that neural fingerprints can be used in place of conventional fingerprints. Randomly initialized neural fingerprints perform as well as or better than conventional fingerprints, and trained neural fingerprints have the potential to form richer representations, provided there is enough data and measures are taken to avoid overfitting.
6. Next Steps
Connect with me on LinkedIn and let me know if you end up using neural fingerprints.
[1] D. Rogers, M. Hahn. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742–754, 2010.
[2] G. Landrum. RDKit: Open-source cheminformatics. www.rdkit.org [accessed 11-April-2013].
[3] D. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gomez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, R. P. Adams. Convolutional networks on graphs for learning molecular fingerprints. arXiv preprint arXiv:1509.09292, 2015.
[4] M. Fey, J. E. Lenssen. Fast Graph Representation Learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428, 2019.
[5] D. Rumelhart, G. Hinton, R. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986. doi:10.1038/323533a0
[6] R. Vassar, D. M. Kovacs, R. Yan, P. C. Wong. The β-Secretase Enzyme BACE in Health and Alzheimer’s Disease: Regulation, Cell Biology, Function, and Therapeutic Potential. Journal of Neuroscience, 29(41):12787–12794, 2009.
[7] DeepChem: Deep-learning models for Drug Discovery and Quantum Chemistry. https://github.com/deepchem/deepchem [accessed 2017-09-27].
[8] B. Ramsundar et al. Massively Multitask Networks for Drug Discovery. arXiv preprint arXiv:1502.02072, 2015.
[9] W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, J. Leskovec. Strategies for Pre-training Graph Neural Networks. arXiv preprint arXiv:1905.12265, 2019.
[10] J. Devlin, M. Chang, K. Lee, K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018.