How Data Science Can Help With Outbreaks Like Coronavirus

Original article can be found here (source): Artificial Intelligence on Medium

According to Wikipedia, Data Science is “an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data”. I have been working in this field for three years now, pursuing my PhD at the University of Piraeus, Greece. I have always been fascinated by the prospect of applying knowledge extracted from data to the health sector, and by the applications made possible by significant advances in hardware and algorithm design.

For a few months now, we have been facing an unprecedented situation with the coronavirus outbreak. As a result, the data science community is trying to take up its role and contribute to the fight against the disease the virus causes, COVID-19. In addition, scientists from different disciplines, as well as public organizations, acknowledge the importance of data analytics by open-sourcing the genome of the virus and other datasets in the hope of an AI-guided solution.

In the next sections, I would like to discuss what those contributions might be and express my personal opinion on the matter.


Visualizing a data set lets scientists gain insights and dig a little deeper: how features correlate, what modelling assumptions to make, and so on. Moreover, a picture is worth a thousand words, and visualizing a problem is a great way to explain complicated notions and justify intentions.

“Flattening the curve” is a trending phrase these days. Furthermore, exponential growth is a concept that our minds struggle to grasp. The best way to analyze both and demonstrate them to the public is by harnessing the power of visualization.

How to “flatten the curve”

To this end, Harry Stevens compiled a fantastic article on “Why outbreaks like coronavirus spread exponentially, and how to flatten the curve”. In this Washington Post article, he showcases why exponential growth has experts worried and how the disease is spreading following different simulation patterns. Finally, he illustrates how measures like forced quarantine and social distancing help alleviate the problem.
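To make the idea concrete, here is a minimal sketch of a textbook SIR (susceptible, infected, recovered) model. It is not the simulation used in the Washington Post article, and every number in it (population size, contact rates, recovery rate) is an assumption chosen purely for illustration. It only shows how lowering the contact rate, which is what social distancing aims for, lowers and delays the peak of infections.

```python
def simulate_sir(beta, gamma=0.1, population=1_000_000, initial_infected=10, days=500):
    """Discrete-time (one step per day) SIR model.

    beta:  assumed average number of infectious contacts per person per day
    gamma: assumed fraction of infected people who recover each day
    Returns the number of currently infected people on each day.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    infected_per_day = []
    for _ in range(days):
        infected_per_day.append(infected)
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
    return infected_per_day


# Illustrative values only: the second scenario halves the contact rate.
baseline = simulate_sir(beta=0.30)
distanced = simulate_sir(beta=0.15)

for label, curve in [("no measures", baseline), ("social distancing", distanced)]:
    peak = max(curve)
    print(f"{label}: peak of {peak:,.0f} infected on day {curve.index(peak)}")
```

With these assumed parameters, the second curve peaks later and much lower than the first, which is the “flattening” the phrase refers to.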

This article is a must-read and a great example of the power of visualization in educating the public and encouraging the right behavioural patterns.


When talking about research and pandemics, we mainly think of vaccines and medicines. Data science and machine learning could contribute to the discovery of novel drugs, but we are still far from realizing this potential.

Yet some prospects are worth pointing out: DeepMind’s efforts to predict the structure of proteins, and innovative ideas for screening large numbers of people for infection accurately and unobtrusively. Google’s DeepMind put the latest version of “AlphaFold”, a neural network that predicts protein structure, to the task and released structure predictions of several under-studied proteins associated with the coronavirus. Although the results have not been experimentally verified, there is hope that they may serve as a basis for future experimental work on developing therapeutics.

On a different path, in a new paper, Yunlu Wang et al. try to identify tachypnea (abnormally rapid breathing), which could indicate an infection, by observing patients’ respiratory patterns.

Demo video for the real-time abnormal respiratory patterns classification system

Using a depth camera and deep learning, the researchers were able to classify six clinically significant respiratory patterns (Eupnea, Tachypnea, Bradypnea, Biot’s, Cheyne-Stokes and Central-Apnea) with accuracy, precision, recall and F1 of 94.5%, 94.4%, 95.1% and 94.8%, respectively.
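For readers unfamiliar with these four metrics, the sketch below shows one common way to compute them for a multi-class classifier. The labels and predictions are made up for illustration and are not the paper's data; macro-averaging (a plain mean over classes) is also an assumption, since the averaging scheme used by the authors is not stated here.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true label, predicted label) -> count
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)

    precisions, recalls, f1_scores = [], [], []
    for c in labels:
        true_positives = pairs[(c, c)]
        predicted_as_c = sum(pairs[(t, c)] for t in labels)
        actually_c = sum(pairs[(c, p)] for p in labels)
        precision = true_positives / predicted_as_c if predicted_as_c else 0.0
        recall = true_positives / actually_c if actually_c else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


# Hypothetical toy data: ten breathing recordings across three of the six patterns.
y_true = ["eupnea"] * 4 + ["tachypnea"] * 4 + ["bradypnea"] * 2
y_pred = ["eupnea", "eupnea", "eupnea", "tachypnea",
          "tachypnea", "tachypnea", "tachypnea", "eupnea",
          "bradypnea", "bradypnea"]

accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Accuracy counts all correct predictions together, while the macro-averaged scores weight every class equally, so a rare pattern that is often misclassified pulls them down more than it pulls down accuracy.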


On the application front, Ping An, a Chinese insurance company, launched a COVID-19 smart image-reading system to help control the pandemic. The method leverages advances in computer vision and deep neural networks to diagnose the disease from Computed Tomography (CT) images. Many doctors have used CT images to diagnose COVID-19, and a system like this helps them make efficient and accurate predictions. More ideas like this are expected, since there is a movement to collect similar data from hospitals around the world.

On the other hand, the Network Science Institute uses more traditional modelling approaches to map the social network of the coronavirus. Historically, scientists trying to understand a virus and its associated epidemics focused on the virus itself, modelling its inner workings. But viruses need hosts to survive and personal contact to spread. Thus, in the past decade, scientists started incorporating social networks into their models to identify patterns of individual behaviour that could amplify or dampen the effects of imminent pandemics. The results of such modelling inform policy recommendations: for example, should we shut down schools, and what would the impact be?
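A toy version of this kind of model can be sketched in a few lines. This is not the Network Science Institute's model; the random contact network, the transmission probability, the infectious period and the share of contacts removed by a hypothetical school closure are all assumptions made up for illustration.

```python
import random


def random_contacts(n_people, n_edges, rng):
    """Draw n_edges distinct random contact pairs between people."""
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_people), 2)
        edges.add((min(a, b), max(a, b)))
    return sorted(edges)


def outbreak_size(n_people, edges, rng, p_transmit=0.05, days_infectious=5, n_seeds=10):
    """Simulate day-by-day spread over the contacts; return how many were ever infected."""
    neighbours = {person: [] for person in range(n_people)}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Map each currently infectious person to their remaining infectious days.
    days_left = {person: days_infectious for person in rng.sample(range(n_people), n_seeds)}
    ever_infected = set(days_left)
    while days_left:
        newly_infected = set()
        for person in days_left:
            for contact in neighbours[person]:
                if contact not in ever_infected and rng.random() < p_transmit:
                    newly_infected.add(contact)
        # One day passes: people on their last infectious day recover.
        days_left = {person: days - 1 for person, days in days_left.items() if days > 1}
        for person in newly_infected:
            days_left[person] = days_infectious
        ever_infected |= newly_infected
    return len(ever_infected)


rng = random.Random(42)
n_people = 2000
all_contacts = random_contacts(n_people, 8000, rng)  # about 8 contacts per person on average
# Hypothetical intervention: roughly 60% of contacts disappear (e.g. schools close).
kept_contacts = [edge for edge in all_contacts if rng.random() > 0.6]

size_open = outbreak_size(n_people, all_contacts, rng)
size_closed = outbreak_size(n_people, kept_contacts, rng)
print(f"ever infected with all contacts: {size_open}")
print(f"ever infected with reduced contacts: {size_closed}")
```

Real models use measured contact patterns (households, schools, workplaces, travel) rather than random pairs, but the mechanism is the same: the structure of who meets whom drives how far the disease spreads, so removing contacts changes the outcome.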


In the era of big data, data science and machine learning have a significant role to play in situations like this outbreak. Some of their applications are crucial for passing on the right message, while others have a major impact on the policies applied.

On the other hand, assistive applications are useful but still need a lot of work to realize their potential. For example, while a CT smart image-reading system can help doctors diagnose the disease even remotely, it suffers from specificity issues. Furthermore, it requires patients to have their chests scanned, making it less relevant in places where expensive medical equipment is unavailable. The same specificity issues arise for a system that tracks respiratory patterns in patients, as tachypnea is associated with many conditions. Moreover, researchers still struggle to bridge the gap between the vast amounts of training data needed and the scarce real-world data available.
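To see why specificity matters so much for screening, consider the short calculation below. The sensitivity, specificity and prevalence values are hypothetical and do not describe any of the systems mentioned above; the formula itself is just Bayes' rule applied to a positive test result.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person flagged as positive actually has the disease."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical numbers: 2% of screened people are infected, and the test
# catches 95% of them; only the specificity differs between the two cases.
ppv_low_specificity = positive_predictive_value(0.95, 0.80, 0.02)
ppv_high_specificity = positive_predictive_value(0.95, 0.99, 0.02)

print(f"specificity 80%: {ppv_low_specificity:.1%} of positives are real")
print(f"specificity 99%: {ppv_high_specificity:.1%} of positives are real")
```

When a condition is rare among the people screened, even a modest false-positive rate means most alarms are false, which is why a signal like tachypnea that has many causes is hard to use on its own.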

My name is Dimitris Poulopoulos and I’m a machine learning researcher at BigDataStack and PhD(c) at the University of Piraeus, Greece. I have worked on designing and implementing AI and software solutions for major clients such as the European Commission, Eurostat, IMF, the European Central Bank, OECD, and IKEA. If you are interested in reading more posts about Machine Learning, Deep Learning and Data Science, follow me on Medium, LinkedIn or @james2pl on Twitter.