Source: Deep Learning on Medium
Building an AI Algorithm to Catch Heart Disease
It’s America’s number-one killer. Not just that — it’s the world’s number one killer. Chances are you know somebody who battles it every single day. Or perhaps you remember somebody who, despite fighting as hard as they could, lost this very battle.
If you read the title, I’m sure you’ve already guessed what I’m talking about: heart disease.
It’s a painful condition, but it doesn’t have to be lethal. In fact, catching heart disease early can mean the difference between life and death.
The sooner a disease is detected, the more wiggle room both patient and doctor have to do something about it. With an early diagnosis, the patient has the freedom to fight heart disease with a few simple lifestyle changes instead of undergoing surgery after surgery after surgery.
And if the doctors catch the disease too late, after it has caused irreversible damage to the patient’s body, all those surgeries may be for nothing.
To make sure that we’re saving as many lives as possible, doctors should get the extra support they need early on. I think that one of the best ways that we can provide this support is by equipping doctors with powerful, artificially intelligent tools to make sure they can make a diagnosis as soon as possible.
The Problem: How Heart Disease Works
Coronary heart disease is a condition that occurs when there is a blockage in the arteries that carry blood and oxygen to your heart. For our purposes, I think the best way to understand the disease is to look at the measurements doctors use to detect it in real life. So, here’s an explanation of a few of the real-life features I used to train my algorithm:
cp. Chest pain type.
If your cells don’t get enough oxygen, they’ll die.
Thankfully, our body comes with a nifty little superhighway that can make sure every single cell in your body gets the oxygen it needs.
Your blood is what carries this oxygen. Once the blood delivers the oxygen to cells in need, it races back to your heart to be recycled. From there, the heart will pump it to the lungs, which is the perfect place for an oxygen reup. Then your heart pumps the renewed blood into your body, and the cycle can start all over again.
But if your heart itself doesn’t get enough oxygen-rich blood, it hurts, usually as pain or pressure in your chest. This type of pain is called angina.
There are a lot of different reasons this could happen, but it’s usually a symptom of a bigger problem. Like coronary heart disease.
chol. Serum cholesterol in mg/dL.
Cholesterol is a fat-like substance which is indispensable in small amounts. It makes hormones, keeps your cells intact, and helps produce a host of other biological doodads that help you digest food.
Cholesterol uses molecules called lipoproteins to get around. They travel through your blood to get to where your body needs them to be. But not all lipoproteins do the same thing. See for yourself:
- High density lipoproteins (HDL) are a complex family of particles that are associated with keeping your cholesterol at a healthy level. They’re known for carrying cholesterol back to the liver for removal, but they’re no one-trick pony. In most circumstances, it’s a good idea to have a lot of these guys.
- Low density lipoproteins (LDL) transport cholesterol from the liver to different tissues in the body. While in transit, that cholesterol can collect in the walls of your arteries and form plaques.
- Triglycerides are the most common type of fat in your body, coming from caloric excesses and stored in fat cells. They’re carried out of the liver by very low density lipoproteins (VLDL cholesterol), and just like LDLs, they can get stuck in your arteries and clog them up just as badly.
Doctors can measure all these elements at the same time with a serum cholesterol test. They get the final number by adding the HDL level, the LDL level, and 20% of the triglyceride level.
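To make that arithmetic concrete, here is a minimal Python sketch of the rule described above. The function name `total_cholesterol` and the sample lab values are made up for illustration; they are not taken from the dataset or from the original code.

```python
def total_cholesterol(hdl, ldl, triglycerides):
    """Estimate total serum cholesterol in mg/dL.

    Follows the rule from the text: HDL + LDL + 20% of triglycerides.
    """
    # Dividing by 5 is the same as taking 20%.
    return hdl + ldl + triglycerides / 5


# Hypothetical lab values in mg/dL, made up for illustration.
print(total_cholesterol(hdl=50, ldl=100, triglycerides=150))  # 180.0
```

With these sample numbers, the triglycerides contribute 30 mg/dL to the total.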
fbs. Is the fasting blood sugar greater than 120 mg/dl? 1 = true, 0 = false
Glucose is a fuel that powers cells. Just like oxygen, it gets to those cells through your blood. The foods you eat can impact this number pretty heavily, since glucose is in almost everything we eat.
If you want to get a good idea of how much sugar your body produces by itself, you’re gonna need to do a blood test after a period of not eating — also called fasting. This is called a fasting blood sugar test.
We used to think that things only started to get risky if your FBS was greater than 120 mg/dL. That was the cutoff for type 2 diabetes. These days, the Cleveland Clinic Foundation actually uses >90 mg/dL as the cutoff for coronary heart disease risk, since accounting for the pre-diabetic in-between range can be more accurate. In fact, an FBS of over 100 mg/dL can increase your risk of heart disease by three hundred percent!
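Since the fbs feature is just a yes-or-no flag, it can be sketched in a couple of lines of Python. This is only an illustration: the function name `encode_fbs` and the `cutoff` parameter are hypothetical, and the default of 120 mg/dL comes from the feature definition above.

```python
def encode_fbs(fasting_blood_sugar_mg_dl, cutoff=120):
    """Return 1 if fasting blood sugar is above the cutoff, else 0."""
    # Strictly greater than, matching "greater than 120 mg/dL" in the text.
    return 1 if fasting_blood_sugar_mg_dl > cutoff else 0


# Hypothetical readings in mg/dL, made up for illustration.
print(encode_fbs(135))             # 1
print(encode_fbs(95))              # 0
print(encode_fbs(95, cutoff=90))   # 1, using the lower cutoff mentioned above
```

Keeping the cutoff as a parameter makes it easy to try a lower threshold without touching the rest of the code.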
trestbps. Resting blood pressure in mm Hg on admission to the hospital.
Blood pressure is an incredibly important number to know, as it measures the amount of strain your blood puts onto your arteries. If your blood pressure is too high, your heart has to work harder to circulate blood around your body. If your blood is creating too much strain, it can create tiny tears on the insides of your arteries, giving LDLs space to clog them with plaque.
thal. Results of a thallium stress test.
A thallium stress test shows how blood is flowing to different regions of the heart while you’re at rest and while you exercise.
It involves injecting a small, harmless dose of thallium, a radioactive substance that acts as a biological tracer, into one of your veins and tracking it until it gets to your heart. A specialized camera can show the results on a diagram.
“Cold” spots where no thallium is present show that the given area is not getting enough blood. Spots that are supposed to be “hot” but stay cold both at rest and during exercise may indicate permanent damage, AKA fixed damage. If a spot is cold during exercise but the color comes back at rest, it may just indicate some blockage of the arteries, and that kind of damage can be reversed.
Now that we’ve learned a little bit more about how heart disease works, how can we prevent it from becoming fatal?
The Solution: Our K-Nearest Neighbors Algorithm!
K-Nearest Neighbors (KNN) algorithms can make predictions about data they’ve never seen before by comparing an unknown datapoint to the known datapoints closest to it. The k is a variable that stands in for the number of neighbors our algorithm uses.
To understand this a little better, let’s check out an example:
Picture a scatter plot where each datapoint represents one of our patients. Let’s say the blue points represent people without heart disease, and the red points represent people with heart disease. We have a patient that just came in, and they’re showing symptoms that put them right in the middle.
How would our trusty algorithm figure out whether or not they have heart disease? It’s simple! It would measure the distance between our new patient’s position and the other patients, then look at the ones that are closest.
Imagine two circles drawn around the new patient: the smaller one takes in their three closest neighbors and the bigger one takes in five. In both cases, the majority of the neighbors have heart disease, so the algorithm would predict that our new patient has heart disease too.
But computers aren’t rulers. Computers are oversized math machines. So how can a computer figure it out?
There are a number of different distance algorithms that we can use, depending on how many features our dataset has and how many datapoints are inside of it. The one we’re going to use in this case is our plain-Jane Euclidean distance formula that we thought we’d never use after the eighth grade.
We all remember the Pythagorean Theorem, right? a² + b² = c²? The formula that lets us find the hypotenuse of a right triangle. Here’s a refresher in case we don’t: if the two shorter sides have lengths a and b, the longest side (the hypotenuse) has length c.
Well, some Greek geek named Euclid figured out that you can use that same principle to find the distance between any two points by drawing an imaginary triangle between them. His formula is basically the same equation as Pythagoras’, solved for the hypotenuse: d = √((x2 − x1)² + (y2 − y1)²)
The two terms in parentheses, (x2 − x1) and (y2 − y1), help us find our lengths, a and b. We already know it’s a straight line between our points x2 and x1 because they only have one dimension: length. Same thing with our points y2 and y1! So, all we have to do to find their distance is find the difference between our finishing point and our starting point. This gives us the a and b from the original equation.
Next, square them and add them together, just like with our boy Pythagoras’ theorem.
Then all we have to do to find the Euclidean distance is take the square root of that sum.
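Here is a small Python sketch of that calculation for two points on a flat, two-feature plot. The function name `euclidean_distance` and the sample points are assumptions chosen for illustration, not the code linked from this post.

```python
import math


def euclidean_distance(p, q):
    """Distance between two 2D points given as (x, y) tuples."""
    x1, y1 = p
    x2, y2 = q
    a = x2 - x1  # horizontal leg of the imaginary triangle
    b = y2 - y1  # vertical leg of the imaginary triangle
    # Pythagoras: c = sqrt(a^2 + b^2)
    return math.sqrt(a ** 2 + b ** 2)


# A 3-4-5 right triangle makes the result easy to check by hand.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

The same idea extends to more than two features by adding one squared difference per feature before taking the square root.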
Our algorithm computes the distance between our new point and every known datapoint, then keeps the k nearest neighbors.
From there, the computer just counts how many of the neighbors have heart disease, and how many of the neighbors don’t. It draws boundaries based on the data it was trained upon, and then uses those boundaries to sort all the future datapoints.
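Putting the pieces together, the whole procedure can be sketched in plain Python. Treat this as a toy illustration under stated assumptions rather than the code behind this post: the names `knn_predict`, `patients`, and `labels` are hypothetical, and the two-feature datapoints are invented so the result is easy to check by hand.

```python
import math
from collections import Counter


def euclidean_distance(p, q):
    """Euclidean distance between two points with any number of features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(known_points, known_labels, new_point, k=3):
    """Predict a label for new_point by majority vote of its k nearest neighbors."""
    # Pair every known point's distance to the new point with its label.
    distances = [
        (euclidean_distance(point, new_point), label)
        for point, label in zip(known_points, known_labels)
    ]
    # Sort by distance and keep the labels of the k closest neighbors.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among those neighbors wins the vote.
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented toy data: three patients without heart disease, four with it.
patients = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
            (5.0, 5.0), (6.0, 5.5), (5.5, 6.5), (4.5, 5.0)]
labels = ["healthy", "healthy", "healthy",
          "disease", "disease", "disease", "disease"]

new_patient = (4.0, 4.0)
print(knn_predict(patients, labels, new_patient, k=3))  # disease
print(knn_predict(patients, labels, new_patient, k=5))  # disease
```

With two classes, an odd k avoids tied votes. Note that this sketch never draws boundaries explicitly; they emerge from where the vote flips as the new point moves around.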
(And here’s a link to see the code!)
- The most effective way to prevent a disease from killing a patient is to diagnose it early. Machine learning algorithms can help doctors catch it when it counts.
- KNN algorithms can classify unknown datapoints by learning from their neighbors. They’ll use the identities of the known datapoints closest to the unknown to determine the group our unknown belongs to.
- Anybody, especially you, could build one of these bad boys today!