On Machine Learning aided drug design; Designing a Covid-19 drug from the perspective of a Data…

The original article was published in Becoming Human: Artificial Intelligence Magazine.

On Machine Learning aided drug design

Designing a Covid-19 drug from the perspective of a Data Scientist/Aerospace engineer


As with any other design problem, we’ll start by framing Covid-19 drug design as an optimization problem.

To that end, our objectives are:

  • Maximize drug potency: how effectively the drug disrupts the virus’s activity
  • Minimize the synthesis cost: how “easy/cheap” the drug is to manufacture

and our design space is:

  • the entire chemical space (all possible molecules)
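
The framing above can be written compactly as a two-objective problem. This is only a sketch in my own notation: the symbols \mathcal{M} (the chemical space), f_potency and f_cost (the two objective functions) are labels introduced here for illustration and do not come from any referenced paper.

```latex
\begin{aligned}
\max_{m \in \mathcal{M}} \quad & f_{\text{potency}}(m) \\
\min_{m \in \mathcal{M}} \quad & f_{\text{cost}}(m)
\end{aligned}
```

Because the two objectives generally conflict, there is usually no single molecule m that is best at both, which is why the answer to such a problem is a set of compromises rather than one design.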

To be able to solve the aforementioned optimization problem we need two things:

a) a set of predictive models that, given the representation of a molecule as input, predict our two objective functions. For the data scientists among us, this is the predictive analytics part of our solution.

b) an effective and efficient optimization algorithm able to operate on discontinuous (non-differentiable) design spaces (such as the chemical space) and able to solve multi-objective optimization problems. This is the prescriptive analytics part of our solution.

Predictive Analytics

To predict drug potency against Covid-19, we’ll use a Gated Graph Sequence Neural Network (GGNN, originally introduced in https://arxiv.org/abs/1511.05493) trained on the data published here to predict MM-GBSA-based binding free energy from chemical structure.

Figure 1; GGNN convergence plot

For the cost of synthesis, we’ll use the synthetic accessibility score proposed in https://www.ncbi.nlm.nih.gov/pubmed/20298526, an implementation of which is available in RDKit (http://rdkit.org/).
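
As a rough illustration, the synthetic accessibility score can be computed along the following lines. This is a sketch under assumptions, not the exact code behind this post: it assumes the RDKit installation ships the Contrib/SA_Score directory with its `sascorer` module (some installs do not), and the wrapper name `sa_score` is my own. The wrapper returns None when RDKit or that module is missing, so the snippet runs either way.

```python
import os
import sys
from typing import Optional


def sa_score(smiles: str) -> Optional[float]:
    """Synthetic accessibility score, roughly 1 (easy) to 10 (hard).

    Returns None if RDKit or its SA_Score contrib module is not available,
    or if the SMILES string cannot be parsed.
    """
    try:
        from rdkit import Chem
        from rdkit.Chem import RDConfig

        # Assumption: the contrib scripts are installed alongside RDKit.
        contrib_dir = os.path.join(RDConfig.RDContribDir, "SA_Score")
        if contrib_dir not in sys.path:
            sys.path.append(contrib_dir)
        import sascorer
    except ImportError:
        return None

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for unparsable SMILES
        return None
    return float(sascorer.calculateScore(mol))


# Ethanol as a trivial example input.
print(sa_score("CCO"))
```

A lower score means the molecule is estimated to be easier to synthesize, so this is the quantity the optimizer tries to minimize.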

Prescriptive Analytics

For our optimizer, we selected Evolutionary Algorithms (EAs) because of their well-known ability to solve multi-objective optimization problems, and because they do so without requiring gradient information (hence they can handle non-differentiable design spaces).

EAs utilize the main principles of Darwinian evolution, evolving better and better molecules as generations proceed. They do so by selecting parents based on their environmental fitness (demonstrated good behavior regarding the selected objectives) and breeding new individuals via crossover and mutation (see figure 2).

Figure 2; Crossover and Mutation as applied on Chemical space
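
To make selection, crossover and mutation concrete, here is a deliberately tiny sketch. It is not the molecular EA used for the results in this post: individuals are fixed-length strings over a made-up alphabet (`ALPHABET`), and `toy_fitness` is a hypothetical single objective standing in for the real potency and cost models. All function names are my own.

```python
import random
from typing import Callable, List

ALPHABET = "CNOF"  # hypothetical building blocks, not real chemistry


def crossover(a: str, b: str, rng: random.Random) -> str:
    """One-point crossover: prefix of parent a joined to suffix of parent b."""
    cut = rng.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]


def mutate(s: str, rng: random.Random, rate: float = 0.1) -> str:
    """Replace each symbol with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch for ch in s)


def tournament(pop: List[str], fitness: Callable[[str], float],
               rng: random.Random, k: int = 2) -> str:
    """Pick k random individuals and return the fittest as a parent."""
    return max(rng.sample(pop, k), key=fitness)


def evolve(pop: List[str], fitness: Callable[[str], float],
           generations: int, rng: random.Random) -> List[str]:
    """Run a simple generational EA that keeps the best individual (elitism)."""
    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [
            mutate(crossover(tournament(pop, fitness, rng),
                             tournament(pop, fitness, rng), rng), rng)
            for _ in range(len(pop) - 1)
        ]
        pop = [elite] + children
    return pop


def toy_fitness(s: str) -> float:
    """Hypothetical objective: count of 'N' symbols (higher is better)."""
    return float(s.count("N"))


rng = random.Random(0)
population = ["".join(rng.choice(ALPHABET) for _ in range(8)) for _ in range(20)]
final_population = evolve(population, toy_fitness, generations=15, rng=rng)
print(max(final_population, key=toy_fitness))
```

In the real problem the string operators would be replaced by operators that act on molecular structures, and the single toy fitness by the two predicted objectives, with parents selected on non-dominance rather than on a single score.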


The result of every multi-objective optimization takes the form of a Pareto front: the set of all the best (non-dominated) compromises between the objectives, meaning that no other candidate is at least as good on every objective and strictly better on one. Below is the computed Pareto front approximation for the Covid-19 drug design problem (after 40 generations of evolution). For comparison, the potency of lopinavir (a drug currently undergoing clinical trials for Covid-19) is marked with the dotted line.

Figure 3; Pareto Front after 40 Generations
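
For readers who want to see what “non-dominated” means operationally, here is a minimal sketch of extracting a Pareto front from a list of candidates. The (potency, cost) tuples and the function names are illustrative assumptions, and the numbers are made up; they are not results from this study.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (potency to maximize, cost to minimize)


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q on both objectives and better on one."""
    no_worse = p[0] >= q[0] and p[1] <= q[1]
    strictly_better = p[0] > q[0] or p[1] < q[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up candidates, purely for illustration.
candidates = [(0.9, 5.0), (0.7, 2.0), (0.6, 3.0), (0.9, 6.0), (0.2, 1.0)]
print(pareto_front(candidates))
```

If potency is represented by a binding free energy, where more negative values indicate stronger binding, flip its sign (or the comparison) before applying the same test.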


We’ve seen that reframing drug design from a simple (but expensive) screening exercise to a multi-objective optimization problem could be beneficial.

We’ve also seen that methods borrowed from Machine Learning and numerical optimization can be used when undertaking such a task.


The purpose of this post was to present a different way of thinking about the drug design problem and NOT to design a new compound.

The results of the optimization (the Pareto front) rely heavily on the quality of the predictive models used, models whose accuracy I have no way of validating! In fact, I suspect that accurate predictive modeling would involve chemistry/physics/quantum simulations, which would be vastly more computationally demanding than the simple GGNN used here.

To handle that extra computational cost, a very interesting next step would be to borrow a method typically used in aerospace engineering: Distributed Hierarchical Optimization. More about that in a following post.
