Machine Learning to Predict Parkinson’s Disease

Original article was published by Frederik Bussler on Artificial Intelligence on Medium

2 percent of elderly people are affected by Parkinson’s, a debilitating neurodegenerative disease, but up to a quarter of cases are misdiagnosed¹.

As the Mayo Clinic writes, diagnosing Parkinson’s is no simple task². There’s no specific diagnostic test. A trained neurologist needs to review the patient’s medical history and symptoms and conduct a neurological and physical examination. Techniques like dopamine transporter scans can support the diagnosis, while blood tests and imaging tests help rule out other disorders.

The difficulty in making a diagnosis is amplified in developing countries, which have fewer medical resources. As with many other disorders, undiagnosed and untreated cases of Parkinson’s have likely increased during this pandemic. Parkinson’s may even emerge as a third wave of the pandemic³.

Fortunately, there are faster, easier ways to diagnose Parkinson’s that don’t rely on intensive, in-person visits. Parkinson’s patients exhibit characteristic vocal features, and machine learning can be used to capture these characteristics, effectively screening potential patients.

(Small) Data

Parkinson’s Disease affects speech with symptoms like difficulty articulating sounds, lowered volume, and reduced pitch range.

The University of Oxford collected 195 voice recordings from 31 people: 23 with Parkinson’s and 8 without. The columns include extracted features like the average vocal fundamental frequency and measurements of variations in frequency and amplitude.
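
To make "variations in frequency and amplitude" more concrete, here is a minimal Python sketch of one common way such features are defined: the average cycle-to-cycle change in pitch period (often called jitter) and in peak amplitude (often called shimmer), each relative to its mean. The numbers are made up for illustration, the function name local_variation is hypothetical, and the dataset’s own columns may be computed differently.

```python
from statistics import mean


def local_variation(values):
    """Mean absolute change between consecutive cycles, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical glottal cycle lengths in seconds (made-up, not from the dataset).
periods = [0.0080, 0.0082, 0.0079, 0.0081, 0.0080]
# Hypothetical peak amplitudes for the same cycles (made-up as well).
amplitudes = [0.50, 0.48, 0.52, 0.49, 0.51]

mean_f0 = mean(1.0 / p for p in periods)  # average fundamental frequency in Hz
jitter = local_variation(periods)         # cycle-to-cycle variation in period
shimmer = local_variation(amplitudes)     # cycle-to-cycle variation in amplitude

print(f"mean F0: {mean_f0:.1f} Hz, jitter: {jitter:.2%}, shimmer: {shimmer:.2%}")
```

A steadier voice gives values close to zero for both measures, while a more irregular one pushes them up.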

With just 195 rows of data, this is a great example of a “small data” use case. A pervasive myth is that AI needs big data. While deep neural network accuracy tends to increase with more data, not all AI is necessarily data-intensive. In another article, I showed how a dataset of under 100 rows could be used to predict 2020’s political instability.


We can upload the data as-is to the AutoML tool Apteo to make a predictive model for Parkinson’s Disease.

We select status as our KPI. It refers to the health status of the subject: 1 means they have Parkinson’s, and 0 means they’re healthy. All the other columns are used as attributes. In the background, a set of machine learning models is trained to predict status. We can also see how each attribute (that is, each audio feature) impacts status.
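
What happens in the background isn’t shown here, so treat the following as a rough stand-in rather than Apteo’s method. It is a minimal Python sketch of the same idea: hold out status as the target, use every other column as an attribute, fit a small logistic regression by gradient descent, and read the weights on standardized attributes as a crude signal of impact. The rows are made up, and every column name except status is hypothetical.

```python
import csv
import io
import math

# Made-up rows for illustration only; these are NOT values from the Oxford data.
# Every column name except `status` is hypothetical.
RAW = """mean_f0,jitter,shimmer,status
119.9,0.0078,0.044,1
122.4,0.0097,0.061,1
116.7,0.0105,0.052,1
162.6,0.0050,0.064,1
153.0,0.0074,0.039,1
197.1,0.0029,0.017,0
202.3,0.0024,0.016,0
128.0,0.0031,0.020,0
"""


def load(raw, target="status"):
    """Split CSV text into attribute names, attribute rows, and target labels."""
    rows = list(csv.DictReader(io.StringIO(raw)))
    names = [c for c in rows[0] if c != target]
    X = [[float(r[c]) for c in names] for r in rows]
    y = [int(r[target]) for r in rows]
    return names, X, y


def standardize(X):
    """Rescale each column to zero mean and unit variance so weights are comparable."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, row):
    return b + sum(wi * xi for wi, xi in zip(w, row))


def fit(X, y, lr=0.1, epochs=500):
    """Logistic regression trained with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            err = sigmoid(score(w, b, row)) - label
            w = [wi - lr * err * xi for wi, xi in zip(w, row)]
            b -= lr * err
    return w, b


def accuracy(w, b, X, y):
    hits = sum(int(sigmoid(score(w, b, r)) >= 0.5) == t for r, t in zip(X, y))
    return hits / len(y)


features, X, y = load(RAW)
Xs = standardize(X)
weights, bias = fit(Xs, y)
train_acc = accuracy(weights, bias, Xs, y)

# Larger absolute weight = stronger pull on the prediction (a rough reading only).
for name, w in sorted(zip(features, weights), key=lambda p: -abs(p[1])):
    print(f"{name:>8}: {w:+.2f}")
print(f"training accuracy: {train_acc:.2f}")
```

Accuracy on the training rows says little about how a model would do on new patients. With several recordings per person, a fairer check would hold out whole subjects rather than individual rows.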