Artificial Intelligence in Drug Discovery?

Source: Deep Learning on Medium

Have you ever wondered how new drugs are discovered? We often hear people complain about overpriced drugs and how pharmaceutical companies are ripping people off. Why is that so? And with all the buzz about AI changing the world, can it be used to revolutionize this industry as well?

Of course it can! If you have ever wondered about these questions and were looking for answers, welcome to this series of blog posts on AI for drug discovery! Like you, I was fascinated by this niche application of AI and decided to dig deeper, but I was surprised by how scattered the information was. So here we are: I have researched all the what, why and how, and combined it into this one-stop knowledge center for all you folks out there who want to get started on applying deep learning to drug discovery, or even just people who want to read something interesting on a Sunday morning while sipping their coffee. (wink wink)

But before we start talking about how AI can help with drug discovery, it is essential to understand the process of discovering a new drug. So, what is the process exactly?

Drug Discovery Design Cycle:

An important thing to know about designing a new drug is that it is a mind-blowingly expensive process. Finding a promising molecule is expensive, and running all the tests imposed by health authorities like the FDA even more so. So most research actually goes into improving existing drugs, combining them and so on: it’s faster, cheaper, and the authorities require a lot less testing. Even when scientists do work on new drugs, they tend to work on ones similar to existing drugs, either similar molecules or different molecules working in a similar way.

The first part of designing a drug is identifying the target. Wait, what target? Well, to understand that, let’s briefly talk about how most diseases work… (in the crudest way possible)

Two things to remember:

  1. For our purposes, diseases can be broadly classified as bacterial, viral and cancerous.
  2. Many proteins in our body are enzymes. If we remember anything from school biology class, enzymes are catalysts: they speed up a chemical reaction without being permanently changed themselves.

Disease-causing bacteria and viruses also have enzymes which, upon entering the human body, interrupt its natural chain of chemical reactions and cause effects ranging from slight discomfort to death in serious cases.

An efficient way to treat this is to identify these bad proteins (enzymes) and block them from interfering with chemical reactions. These “bad proteins” are called “TARGETS”. How do we find these targets?

That’s what basic medical research is for! By studying how diseases work, we are later able to identify their vulnerable aspects.

The selected target must then be verified. For example, if you think HIV needs reverse transcriptase (a type of enzyme) to be infectious, genetically engineer HIV without that enzyme and check whether it is still infectious.

Some diseases have many targets to choose from. Bacterial infections are particularly easy: because bacterial cells are very different from human cells, it is comparatively simple to find vital enzymes in them that don’t have counterparts in humans and block them. That’s why antibiotics were so successful…

Viruses are far more difficult, because they simply reuse the host’s cellular machinery, and have only a few enzymes of their own. Even more difficult are cancer cells, which are genetically almost identical to normal cells.

How does a drug help?

Well, once we have successfully identified a few of these “targets” and verified them, there are two major ways of deactivating these “target” enzymes.

  1. Irreversible inhibition: a “drug molecule” permanently modifies the enzyme, usually by bonding to it covalently, so that it stops working for good (like acetylsalicylic acid, a.k.a. aspirin).
  2. Reversible inhibition: a “drug molecule” binds tightly, but not permanently, to the “target” enzyme, often blocking it or changing its shape, which in turn changes its behavior and essentially deactivates it (like ibuprofen).

In practice, the second method is usually preferred, since it tends to produce fewer side effects and is generally just as effective.

So, reversible inhibition! Problem solved, right?… Wrong!

Proteins, in general, are huge, complex structures. Most often there are certain spots within this structure, called “active sites”, which are (you guessed it!) where the chemical reactions actually happen. In order to effectively bind to and deactivate these huge structures, we need to pick drug molecules that have a high affinity for these “active sites”.

To add to our woes, proteins can exist in multiple different forms, and it’s hard to find out which one is the biologically active form. (This is one of the hottest areas of research!)

So in summary: after the “target” proteins have been identified, the structure of each protein has to be accurately understood and its “active sites” identified. Only then can we move on to selecting suitable “drug molecules” to target these sites and save the day.

Phew! Now that that’s done, let’s move on…

Now that we know where we need to send our tiny “drug” soldiers, the only thing left to do is choose the right kind of soldiers, I mean “molecules”. The catch?

There are, theoretically, millions and billions of types of molecules! How on earth are we going to select the ones we want?

Luckily, not all of them have the same binding affinity towards the target. Just like in life, where we attract some people while completely repelling others :P, some molecules have a much stronger binding affinity to the target than others.

Therefore, the fastest and crudest methods are used first to rule out the obviously bad matches. Then increasingly accurate and increasingly slow methods are applied until we are left with a reasonable number of molecules (in the range of a few hundred) which have the strongest binding affinity to the target. These molecules are called “HITS”.

This process is called high-throughput screening. It is also one of the first stages in the drug design process where we can use AI to reduce screening time and obtain higher-quality hits. We will look into this in much more detail in the next blog.
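
To make the “crude first, accurate later” idea a bit more concrete, here is a minimal Python sketch of such a funnel. Everything in it is an assumption made up for illustration: the toy library, the `true_affinity` numbers and the two scoring functions are hypothetical stand-ins, not real chemistry or a real screening tool.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[dict], float]


def screening_funnel(molecules: Sequence[dict],
                     stages: List[Tuple[Scorer, int]]) -> List[dict]:
    """Run the scorers in order (cheapest first) and, after each one,
    keep only the `keep` best-scoring molecules for the next stage."""
    survivors = list(molecules)
    for scorer, keep in stages:
        survivors = sorted(survivors, key=scorer, reverse=True)[:keep]
    return survivors


# Toy library of 1000 molecules. "true_affinity" is a made-up number
# (0 to 100) used only for illustration, not real chemistry.
library = [{"name": f"mol{i}", "true_affinity": (i * 37) % 101}
           for i in range(1000)]


def crude_score(mol: dict) -> float:
    """Stand-in for a fast, coarse method: only sees affinity in buckets of 20."""
    return mol["true_affinity"] // 20


def accurate_score(mol: dict) -> float:
    """Stand-in for a slow, precise method: here it simply reads the toy value."""
    return mol["true_affinity"]


# Crude filter first (1000 -> 300), then the accurate one (300 -> 10).
hits = screening_funnel(library, [(crude_score, 300), (accurate_score, 10)])
print([m["name"] for m in hits])
```

The point of the ordering is that the expensive scorer only ever sees the 300 molecules that survived the cheap one, instead of all 1000.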

Hit to Lead:

What the hell is a lead? I know, I know. Bear with me.

We have now narrowed down our potential set of drug candidates to a few hundred “hits”. That’s still quite a lot, and we need to narrow it down further. Which leads us to ask the question:

What does a good drug look like?

A good drug should definitely be able to treat some disease, but that’s only part of the story. It must be cheap to manufacture, reasonably stable for storage, and fit many other criteria, but the main issues are:

  1. Absorption: Drugs that can be administered orally are strongly preferred over other methods such as injection. In the case of oral drugs, it is extremely important for them to be well absorbed from the digestive tract. Drugs must also be able to pass through the cellular membrane from blood to cells, and in the case of drugs affecting the central nervous system, to pass the blood-brain barrier.
  2. Distribution: Drugs are usually needed only in some parts of the body, but they tend to be distributed unequally across various organs and tissues. It is important for a significant portion of the drug to reach the intended site. If the drug isn’t distributed well, its effectiveness drops and its side effects increase. This is probably most crucial in the case of cancer, as anti-cancer drugs tend to have severe side effects.
  3. Metabolism: The body doesn’t let foreign substances move around freely; it uses a wide range of methods to break them down. If drugs are metabolized too easily, their effectiveness will be low. It would be even worse if the products of such metabolism were harmful. Sometimes we actually want the drug to be metabolized, because the product is the active substance, not the original drug.
  4. Excretion: Drugs would be very dangerous if they could freely accumulate in the body and keep affecting it long after administration of the drug has ceased. Therefore they need to be able to easily exit the system.
  5. Toxicity: Drugs do have side effects, and that is not going to change. More often than not, drugs may cause nausea, dizziness, headaches, and an occasional allergic reaction, and many important drugs are significantly worse than that. If possible, side effects in new drugs should be less severe.

So coming back to answer “what the hell is a lead?”

With all of these things taken into account, a very diverse set of tests is applied, but basically we want to develop drugs that are “drug-like”, i.e. similar to successful drugs (using rules like Lipinski’s Rule of Five). But we don’t want “drug-like” leads. What we’re looking for are “lead-like” leads, i.e. ones similar to successful leads. Turning a lead into a drug candidate usually makes it bigger, more complex, and more hydrophobic, so we’re interested in leads that are smaller, simpler, and less hydrophobic than good drugs.

This usually brings the number of potential drug candidates down to single digits (say 5–9 on average). These “leads” then move on to the next round of the design process.
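
For the curious, the Rule of Five mentioned above is simple enough to sketch in a few lines of Python. The four Lipinski thresholds below are the standard ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with one violation tolerated). The “lead-like” cut-offs, on the other hand, are my own illustrative assumptions just to show the “smaller and less hydrophobic” idea; the `Descriptors` class and both example molecules are hypothetical, and the aspirin values are approximate.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient (hydrophobicity)
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def is_drug_like(d: Descriptors) -> bool:
    """Rule of Five: at most one violation is tolerated."""
    return lipinski_violations(d) <= 1


def is_lead_like(d: Descriptors,
                 max_weight: float = 350.0,
                 max_logp: float = 3.0) -> bool:
    """Stricter check for leads. The default cut-offs are illustrative
    assumptions, chosen only to be tighter than the drug-like limits."""
    return is_drug_like(d) and d.mol_weight <= max_weight and d.logp <= max_logp


# Approximate descriptor values for aspirin.
aspirin = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
# A made-up big, greasy molecule that breaks two of the limits.
big_greasy = Descriptors(mol_weight=720.0, logp=6.5, h_donors=2, h_acceptors=8)

print(is_drug_like(aspirin), is_lead_like(aspirin))
print(lipinski_violations(big_greasy), is_drug_like(big_greasy))
```

In a real project the descriptors would come from a cheminformatics toolkit rather than being typed in by hand.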

Lead Optimization:

By now we have a few promising molecules. It’s still not time for human testing. First, we want to optimize the leads. For each lead, small modifications are made to give it more “ideal drug”-like features, and each modified version is synthesized and tested. (Think of it this way: the leads we have are the soldiers we chose to infiltrate the enemy organization, and the optimization is like giving them the right type of weapons and gadgets and observing which one works best.) The most successful ones become drug candidates. Usually, a modification is the addition of some chemical group or the replacement of one group by another, so drug candidates tend to be bigger and more complex than leads.

A promising area of AI research is the use of generative networks to help researchers generate molecules that are more likely to become drug candidates with the desired attributes.

At this stage, it is also important to develop cheap and efficient ways of synthesizing these drugs, since they will be required in large quantities for the next stage: lab testing.

First Level of Testing (Animal Testing):


After many experiments with computers, test tubes, and cell cultures, we hopefully have a few promising drug candidates. However, no regulatory authority is going to let us proceed directly to human testing. The safety and efficacy of drugs must be tested on animals first. This is a very annoying part because it’s very expensive, and the results are only weakly correlated with results in humans. The most common test animals are mice (about 80%), with rats making up most of the rest; all others, including other rodents, primates, rabbits, dogs, etc., together make up less than 2%.

Rodents are reasonably cheap but very different from humans, so sometimes rodents with some human genes are used. Other animals are much more expensive, so they’re used mostly when rodents won’t do. For example, ordinary mice cannot be infected with HIV, so primates need to be used to test HIV drugs.

We eventually hope to reach a level of automation and intelligence where animal testing and even human testing will no longer be required. But at the moment, animal and human testing are irreplaceable and absolutely mandatory. In the drug development industry, failing early (during animal testing) is much cheaper than failing in later stages (human trials). Therefore, many drug candidates which don’t prove to be extremely effective are rejected in this phase. Very few drug candidates pass on to the next stage, which is human trials.
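
Why is failing early cheaper? A tiny back-of-the-envelope calculation helps. The sketch below computes the expected spend per candidate when a later stage is only paid for if the earlier stage was passed. All numbers are hypothetical and in arbitrary units (animal testing costs 1, human trials cost 10); they are assumptions chosen for easy arithmetic, not real industry figures.

```python
from typing import Sequence


def expected_cost(stage_costs: Sequence[float],
                  pass_rates: Sequence[float]) -> float:
    """Expected spend per candidate when stages run in order and a
    candidate only reaches a stage if it passed every earlier one."""
    total = 0.0
    reach_probability = 1.0  # every candidate enters the first stage
    for cost, pass_rate in zip(stage_costs, pass_rates):
        total += reach_probability * cost
        reach_probability *= pass_rate
    return total


# Hypothetical costs in arbitrary units: animal testing = 1, human trials = 10.
costs = [1.0, 10.0]

# Strict animal-testing filter: only 10% of candidates go on to human trials.
strict = expected_cost(costs, [0.1, 1.0])
# Lenient filter: 50% of candidates go on to human trials.
lenient = expected_cost(costs, [0.5, 1.0])

print(f"strict: {strict:.1f}, lenient: {lenient:.1f}")
```

With these made-up numbers, the strict filter costs about 2 units per candidate and the lenient one about 6, simply because fewer candidates ever reach the expensive stage.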

Human trials:

This is probably the single most important stage of drug design because real humans are involved. Therefore it is extremely important for the chosen drug candidates to be highly refined. Mistakes here are going to be very very expensive…

Human trials are usually done in three phases:

Phase 1:

Before we start to test the effects of the drug on diseases in humans, it needs to pass a regulatory stage called safety testing. Safety testing verifies that the drug has no unexpected adverse effects on a small group of healthy individuals (say 30; exact numbers vary a lot depending on the condition, so don’t worry too much about them).

Most drugs are expected to have some side effects, but they should all be documented. If an unexpected side effect is found, even a relatively insignificant one, the regulator is likely to require further testing at some earlier stage before proceeding any further.

Phase 2:

Hopefully the few drug candidates we have showed no unexpected behavior in Phase 1. Now it is time for a clinical trial on actual patients (say around 200 people) in highly controlled conditions.

This point, very late in drug development, is the first time the drug’s effectiveness is evaluated under realistic conditions. Unfortunately, many drugs fail here, and such late failures are very expensive. The tested drug is supposed to be more effective than existing drugs, have less severe side effects, be more widely applicable, and so on.

Phase 3:

If Phase 2 went well, the authorities may approve proceeding to Phase 3 clinical trials, that is, wider randomized testing of the drug on hundreds or even thousands of patients. At this point we have preliminary evidence that the drug is safe and effective, and the wider trials will provide information on interactions with other drugs or conditions and on less common side effects, and give a final confirmation that the drug is indeed safer and more effective.

After Phase 3 is completed, the company which developed the drug applies for registration. It would be extremely costly and painful to fail here; fortunately, it doesn’t happen that often.


After years of research and multiple levels of testing, we finally have a fresh FDA-approved drug on the market. While this whole process takes several years, scientists already start working on similar, more effective drugs once a drug passes the 2nd phase of human trials. Interestingly, this is another crucial part of the design cycle where AI can dramatically help.

Let’s find out in detail in the next blog.



There we have it: the whole design cycle of a drug. This is obviously an overly simplified version of the real process, which has a lot more detail to it. But it gives us enough domain knowledge to build a foundation and move on to learning how we can use deep learning and AI to solve issues in this process and improve it. This blog is already getting too big, long and boring, so we will stop here.

The next blog post will mainly focus on how artificial intelligence is currently being applied to greatly improve this long and expensive process of drug design.

If you are still here reading this, you have my respect! Thank you for taking the time to visit the blog. If you enjoyed reading it, please leave a comment and a like. See you in the next blog!