Machine Learning on Graphs: Why Should You Care?

Original article was published on Artificial Intelligence on Medium

A basic overview of graphs and their intersection with machine learning.

A few years ago, “Balboa Creole French” was considered an endangered language [1]. Balboa Island is located in Newport Beach, California. People there speak their own modified version of French, because many French families moved to the island after the First World War and started to learn English, German, and Spanish until a new language was formed. Only around 20 people still speak it.

Of course, everything I just said is a complete hoax, but people did not realize it until someone actually went to the island to learn the language and found out that it had never existed in the first place (at least that’s what the rumors say).

Now, you might ask: what does this have to do with machine learning on graphs? Well, around four years ago, researchers at Stanford University [2] built classifiers that detect such hoaxes on Wikipedia with an accuracy of 86%, compared to the human-level accuracy of 66%!

The classifier they used was an ensemble of decision trees called Random Forests. The interesting part was how they crafted the features.

Graph Diagrams for Real and Hoax Wikipedia Articles

One of the key ideas in the paper is that real articles link more coherently than hoaxes. A Wikipedia article contains markup pointing to other Wikipedia articles. For a real article, the articles it points to tend to be linked to one another more than those a hoax points to, and this turned out to be a key factor in detecting Wikipedia hoaxes.
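To make the idea of "linking coherently" a bit more concrete, here is a minimal sketch of one way to score it: take the articles that a page links to and measure what fraction of those pairs also link to each other. This is my own illustration, similar in spirit to a local clustering coefficient, and not necessarily the exact feature used in the paper. The function name `outlink_coherence`, the article names, and the link data are all made up.

```python
from itertools import combinations


def outlink_coherence(article, links):
    """Fraction of pairs of articles linked from `article` that are
    themselves connected by a link in either direction.

    `links` maps an article title to the set of titles it links to.
    Illustrative measure only, not the paper's exact feature.
    """
    targets = sorted(links.get(article, set()) - {article})
    pairs = list(combinations(targets, 2))
    if not pairs:
        # Fewer than two outgoing links: no pairs to compare.
        return 0.0
    connected = sum(
        1
        for a, b in pairs
        if b in links.get(a, set()) or a in links.get(b, set())
    )
    return connected / len(pairs)


# Toy, made-up link data (hypothetical articles).
links = {
    "Real article": {"A", "B", "C"},
    "A": {"B", "C"},
    "B": {"C"},
    "C": {"A"},
    "Hoax article": {"X", "Y", "Z"},
    "X": {"Y"},
    "Y": set(),
    "Z": set(),
}

real_score = outlink_coherence("Real article", links)
hoax_score = outlink_coherence("Hoax article", links)
print(real_score, hoax_score)
```

In this toy data every pair of articles linked from the real page is connected, while only one of the three pairs linked from the hoax is. A score like this could then be fed to a classifier as one feature among many.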

Now, go to Google and type a question like “When did Leonardo Da Vinci die?”. You will get a lot of results for your search, but at the top, you will see a small box with the answer inside. How did Google know what we wanted?

Back in 2012, Google released its Knowledge Graph, which models entities in the world and the relationships between them as a graph. So the string you type is not treated as just a string; it is matched to a node in a huge graph. Leonardo Da Vinci is one node of this graph. Another node is May 2, 1519, which is his death date. There is a link connecting these two nodes, and the link’s name, or relation, is Date of Death.
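To picture the node-link-node structure, here is a minimal sketch that stores facts as (subject, relation, object) triples and looks one up. It is a toy under my own assumptions and says nothing about how Google actually stores or queries its Knowledge Graph. The helper names `build_index` and `lookup`, and every relation name other than Date of Death, are hypothetical.

```python
from collections import defaultdict

# Each fact is one link in the graph: (subject node, relation, object node).
# Only the first triple comes from the example above; the others are
# illustrative additions.
triples = [
    ("Leonardo da Vinci", "Date of Death", "May 2, 1519"),
    ("Leonardo da Vinci", "Occupation", "Painter"),
    ("Mona Lisa", "Created By", "Leonardo da Vinci"),
]


def build_index(triples):
    """Index triples by (subject, relation) for direct lookups."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def lookup(index, subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return index.get((subject, relation), [])


index = build_index(triples)
answer = lookup(index, "Leonardo da Vinci", "Date of Death")
print(answer)
```

The hard part in practice is the step this sketch skips: turning a free-form question into the right node and relation.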

Of course, querying this graph and finding ways to embed its nodes and relations is another story, which I will not tackle here!

Another interesting application of machine learning on graphs is predicting the side-effects of taking multiple drugs together. Many patients have to take more than one drug at a time. Each drug affects a certain set of proteins, so we can build a graph whose nodes are drugs and proteins, with an arrow from a drug to each protein it affects. We already know the effects of some drugs taken together. The problem is that we do not know the effects of all pairs of drugs: there are over 13,000 drugs, which means more than 84 million possible pairs, and running experiments for each pair is time-consuming.

Drug and Protein Graph

The other solution would be to use machine learning to predict these side-effects. In the graph, drugs are represented by triangles and proteins by circles. A link from a drug to a protein indicates that the protein is affected by this drug. A link between two drugs indicates that there is a side-effect if the two drugs are taken together. Notice how nausea occurs if drug #1 and drug #2 are taken together. What happens if drug #2 and drug #3 are taken together? This is a task called Link Prediction, where we aim to predict whether there is a link between two nodes by taking advantage of the other links in the graph! Several side-effects have been predicted this way, without running the time-consuming experiments.
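As a rough illustration of link prediction, here is a minimal sketch that scores every drug pair with no known side-effect by how much their protein targets overlap (Jaccard similarity). This is a simple hand-written heuristic standing in for a learned model, not the method used in the research. The drug names, protein names, and the function `rank_candidate_pairs` are all hypothetical.

```python
from itertools import combinations

# Hypothetical drug -> affected proteins links (made-up data).
drug_targets = {
    "drug_1": {"protein_a", "protein_b"},
    "drug_2": {"protein_b", "protein_c"},
    "drug_3": {"protein_c", "protein_d"},
    "drug_4": {"protein_e"},
}

# Known drug-drug links: pairs with an observed side-effect.
known_side_effects = {frozenset({"drug_1", "drug_2"}): "nausea"}


def jaccard(a, b):
    """Size of the overlap of two sets divided by the size of their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_pairs(drug_targets, known):
    """Score each drug pair without a known link, highest score first."""
    scores = []
    for d1, d2 in combinations(sorted(drug_targets), 2):
        if frozenset({d1, d2}) in known:
            continue  # this link is already observed, nothing to predict
        score = jaccard(drug_targets[d1], drug_targets[d2])
        scores.append((score, d1, d2))
    # Sort by descending score, breaking ties by drug name.
    scores.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scores


ranked = rank_candidate_pairs(drug_targets, known_side_effects)
for score, d1, d2 in ranked:
    print(f"{d1} + {d2}: {score:.2f}")
```

In this toy data, drug_2 and drug_3 come out on top because they share a protein target, which makes them the first pair worth checking. A real system would learn its scoring function from the known links instead of hard-coding one.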

To end, graphs have been gaining increased attention over the past couple of years, especially in the machine learning community. They are a language for describing complex data across various domains. Combined with machine learning, they have had a great impact on social networking, drug design, AI reasoning, and many more areas.

I have given a basic overview of applications of graphs in machine learning. I am thinking of publishing articles tackling both the theoretical and practical sides. I will cover basic graph theory, social networks, random graph models, spectral clustering, graph neural networks, and deep generative models for graphs. I will also accompany these articles with code implementations. But first, I need to know if there is an audience for this. If you are interested, please let me know what you think!

Thanks for your time!