Using Deep Learning to predict properties of Therapeutic Peptides (Part — 1: Introduction)

Original article was published by Moid Hassan on Artificial Intelligence on Medium

(This is the first part of the blog. It briefly introduces the reader to the scope of deep learning in peptide drugs and to the history of peptide drugs in general. If you only want to read about our research, please proceed to Part 2.)

I) Introduction to peptide drugs and their history

One of the biggest scientific achievements of the 20th century was the discovery of insulin in 1921. This small protein, fewer than 100 amino acids long, revolutionized the treatment and prognosis of diabetes and earned Banting and Macleod the Nobel Prize in Medicine in 1923.

The significance of this disease and of the discovery must not be underestimated. Diabetes is one of the most studied diseases in the history of medicine; its first mentions trace back to a collection of Egyptian medical texts written around 1552 BC and to ancient Indian and Chinese textbooks. The Indian surgeon Sushruta and the physician Charaka, who lived around 400 AD, were even able to distinguish between what we now call type 1 and type 2 diabetes. They termed the disease “madhumeha” (literally, ‘honey urine’).

Before insulin was discovered, people with diabetes didn’t live for long; there wasn’t much that doctors could do for them. The most effective treatment was to put patients with diabetes on very strict diets with minimal carbohydrate intake. This could buy patients a few extra years but couldn’t save them.

But everything changed in January of 1922 when Leonard Thompson, a 14-year-old boy dying from diabetes in a Toronto hospital, became the first person to receive an injection of insulin. Within 24 hours, Leonard’s dangerously high blood glucose levels dropped to near-normal levels. As the news spread and the years passed, insulin became widely used, and people with diabetes benefited greatly from the drug and could live normal lives. The story of insulin demonstrates the huge potential peptides have as drugs.

Peptides approved and in active development by therapeutic area.

Since the discovery of insulin, however, the peptide drug market has remained fairly constant, with little growth. Insulin still dominates the market and remains the most successful and best-selling peptide drug in history. Other significant peptides, albeit few in number, have been discovered along the way and have found applications in various fields of medicine, especially in oncology and metabolic disease.

In recent times though, especially in the past decade, a resurgence in research and development of peptide drugs has been observed.

Why has this happened?

II) The renaissance of peptide drug discovery

Since the turn of the century, the synchronous and spectacular success of recombinant biologics has driven a reexamination of the peptide field for new opportunities, owing to the biological characteristics the two share and to scientific advances relevant to both areas. Advancing computing power and the emergence of large-scale, publicly available biological databases have also driven researchers and pharmaceutical companies to focus their efforts and resources on peptides once again.

Analysis of the market showed that the global peptide therapeutics market was valued at $25.35 billion in 2018 and is expected to reach $50.60 billion by the year 2026, with an average growth rate of 7.7%. Other major factors driving the growth of this market are the rising prevalence of metabolic disorders and the increasing pool of cancer patients. Technological advancements, which have significantly reduced the production cost of peptide drugs, are also boosting this market remarkably.

Let us now have a look at what peptide drugs actually are and why they are so unique.

III) Peptide drugs: Characteristics

Peptide drugs are characterized by their high specificity to their target sites, high biological activity, and low toxicity. They may not be ideal drug candidates because of their poor pharmacokinetic properties (such as low bioavailability, low cell permeation, in-vivo instability, and susceptibility to proteolytic degradation), but their therapeutic properties make them well worth investigating.

With their unique characteristics, peptides are a little difficult to work with as drug candidates but have huge potential if workarounds are found for the properties that limit their intended functionality. In the past decade, considerable efforts have been made by scientists to help further the development of peptide drugs. Numerous techniques to make them favorable as drug candidates have been developed to enable researchers to quickly look through millions of structures and find promising candidates.

One such technique that helps researchers find a peptide suited to their needs is the use of predictive tools for the in-vivo properties of peptides. The four most common and useful properties that researchers use to filter peptide candidates are absorption, distribution, metabolism, and excretion, frequently termed “ADME” properties.

With this in mind, let us move to the problem at hand.

IV) The Problem

Powerful in-silico tools that can predict these properties accurately and efficiently for small molecules are already in use. But the same cannot be said about peptides. The development of these tools requires large amounts of experimentally validated data. Data for small molecules is abundant, since their development as drugs has been a steady and continuous process. Peptides, however, have only recently emerged as drug candidates, so the data available for them is scarce.

Despite all these limitations, multiple tools and web servers have been developed for researchers working in this area. But even the best and most efficient of them are not at the level of their counterparts for small molecules. Most tools available for peptides require extensive amounts of evolutionary information as inputs and are based on simple mathematical models or machine learning techniques such as Support Vector Machines (SVMs), Random Forests, K-means Clustering, etc.

While these models can effectively predict some properties and give insight into the behavior of peptides in the human body, the fundamental fact remains that they are only simple “tools”. In real-life applications, to know the properties of a drug candidate, researchers would have to synthesize the drug first, calculate its basic properties, and then use those to predict advanced properties such as the ADME properties. But for quick and efficient filtering of potential drug candidates, tools of a much higher caliber, which can give researchers information about peptides based only on their sequences, are required.

V) Available solutions

Recent trends and developments in Big Data Analytics, Artificial Intelligence, Deep Learning, and machine learning techniques like Artificial Neural Networks have provided an alternative, and in many cases an upgrade, to the existing techniques.

These techniques do require very large amounts of data, but with growing interest in peptide drug discovery, more and more efforts are being made to collect and generate that data, and much more is expected to follow. In the past three to four years, tools for predicting peptide properties using artificial and deep neural networks have emerged and have shown promise of delivering the edge needed to push research in the field of peptides forward and to enable researchers to achieve better results.

The characteristic feature of these tools is the use of embeddings for protein sequences. The amino acid sequence is converted into an embedded representation, typically a numeric vector, which is then fed into a predictive model. The idea behind generating embeddings is to capture the meaning behind the proximity of certain amino acids to each other, which, we know, plays a huge role in the function of a protein. So, when working with only the sequences of proteins and nothing else, we need to make the most of them. Luckily, the sequence of amino acids provides a lot of information about the structure, the properties, and the functions of a protein. This is exactly the information that embeddings seek to capture. In the past three to four years, deep learning models that predict properties from those embeddings have been developed and have been shown to perform at least as well as pre-existing state-of-the-art computational models that use evolutionary information.
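To make the idea of an embedding a little more concrete, here is a minimal sketch in Python. It is not the method of any particular published tool: it assumes that a sequence is split into overlapping 3-residue "words" (k-mers), that each k-mer has its own vector, and that the vectors are averaged into one fixed-length representation. The function names, the vector size, and the randomly initialised table are all hypothetical. In a real tool the vectors would be learned from millions of sequences rather than drawn at random.

```python
import random

def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers, so neighbouring residues stay together."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_vocab(sequences, k=3):
    """Assign an integer index to every distinct k-mer seen in the given sequences."""
    vocab = {}
    for seq in sequences:
        for km in kmers(seq, k):
            vocab.setdefault(km, len(vocab))
    return vocab

def make_table(vocab_size, dim=4, seed=0):
    """Hypothetical embedding table: one random vector per k-mer.
    A trained model would learn these values instead of sampling them."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def embed(sequence, vocab, table, k=3, dim=4):
    """Average the vectors of all known k-mers into one fixed-length vector."""
    vectors = [table[vocab[km]] for km in kmers(sequence, k) if km in vocab]
    if not vectors:
        return [0.0] * dim  # sequence too short, or no known k-mers
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Toy sequences, used only for illustration.
toy_peptides = ["GIVEQCC", "FVNQHLC"]
vocab = build_vocab(toy_peptides)
table = make_table(len(vocab), dim=4)

print(kmers("GIVEQ"))                      # ['GIV', 'IVE', 'VEQ']
print(len(embed("GIVEQCC", vocab, table))) # 4
```

Whatever the length of the peptide, the output has the same size, which is what lets a downstream predictive model consume it.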

One-hot encoded sequences of DNA being fed into a model.
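The figure shows one-hot encoding for DNA; the same scheme carries over to peptides by using the 20 standard amino acids as the alphabet. The sketch below is a minimal, assumed implementation in plain Python: the names `one_hot` and `max_len`, and the choice to leave unknown residues and padding as all-zero rows, are illustrative decisions rather than the convention of any specific tool.

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(sequence, max_len=None):
    """Encode a peptide as one 20-dimensional 0/1 row per residue.

    Residues outside the standard alphabet become all-zero rows (an assumption).
    If max_len is given, the result is truncated or zero-padded to that length,
    so that every peptide yields a matrix of the same shape.
    """
    rows = []
    for aa in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    if max_len is not None:
        rows = rows[:max_len]
        rows += [[0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows))]
    return rows

encoded = one_hot("ACD", max_len=5)
print(len(encoded), len(encoded[0]))  # 5 20
```

Unlike an embedding, a one-hot matrix treats every amino acid as equally different from every other, which is one reason learned embeddings are attractive when data is available.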

Various kinds of models have been made publicly available. These models are trained on tens of millions of samples from publicly available databases and have been shown to have captured some meaning from the sequence of amino acids. More and more models and tools are expected to arrive in the future, helping researchers working in drug discovery in selecting new peptides and cutting down on the time required to screen them on the basis of their properties.

In the next part of this blog, we will have a look at our work in this area and how it helps solve a particular issue in the peptide drug discovery process.