Active and Semi-Supervised machine learning: Sep 28–Oct 9

Original article was published by Olga Petrova on Deep Learning on Medium


Last weekend was a particularly nerdy one for me: it included a trip to a planetarium on Saturday, followed by binge-watching the Stanford CS224W lectures (Machine Learning with Graphs). Highly recommended, by the way: you get all the satisfaction of binge-watching without any of the guilt. [Full disclosure: I am a firm believer in not beating yourself up for not making a dent in your to-do list, as long as the distraction made you happy. That's just good work/life balance! Please leave your Netflix recommendations in the comments.]

Speaking of machine learning with graphs, I’ve got another couple of preprints on this topic for you:

and

If you are completely new to graph-based machine learning but would rather not watch all sixteen hour-and-a-half-long lectures to get started, I recommend watching enough of lecture 1 to get an idea of what a graph is, then watching lectures 6, 7, and 8 to see how that idea ties in with machine learning models. Alternatively, here is my version of…

… graph-based Machine Learning in a nutshell

Think of each datapoint that you have as a node of a graph. Let's say each node corresponds to a person's Facebook profile. We will draw edges between (i.e. connect) the nodes of people who are friends with each other on Facebook. The nodes can, but do not have to, have features (e.g. the person's age, gender, self-described political views, and whether they own a cat). Now let's say you want to figure out whether a given person is likely to support a COVID-imposed lockdown or not: a binary classification task, in other words. (You can label the nodes of people who have explicitly voiced their support for, or disdain of, the lockdown in their profile, and train on those.) If you did not want to do any of that fancy graph stuff, you could simply train a classifier on the features that your training data has. However, you hypothesize that a person who is friends with lots of lockdown supporters is more likely to share those views than a person whose Facebook friend circle is decidedly against the stay-at-home orders. If only we could include this kind of relational information in our modeling… Wait, we actually can, as long as we feed our model the graph (nodes plus their connections to other nodes) instead of just the nodes.
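One classic, very simple way to let the friendship graph do some of the work is label propagation, a semi-supervised method: the few labeled profiles keep their labels, and every unlabeled profile repeatedly takes the average score of its friends. The sketch below is only an illustration of that idea, not a method taken from this article or any specific preprint. The toy graph, the names, the 1.0/0.0 encoding (supports/opposes) and the `label_propagation` helper are all hypothetical, and the sketch ignores node features entirely.

```python
from collections import defaultdict


def label_propagation(edges, seed_labels, num_iters=50):
    """Spread known labels over an undirected graph (hypothetical helper).

    seed_labels maps a node to 1.0 (supports the lockdown) or 0.0 (opposes it).
    Unlabeled nodes start at 0.5 and repeatedly take the mean score of their
    neighbors; labeled nodes stay clamped to their known value.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    scores = {node: seed_labels.get(node, 0.5) for node in neighbors}
    for _ in range(num_iters):
        new_scores = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                new_scores[node] = seed_labels[node]  # keep known labels fixed
            else:
                new_scores[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = new_scores
    return scores


# Hypothetical friendship graph: two tight friend circles joined by one edge.
edges = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),  # circle 1
    ("carol", "dave"),                                        # the bridge
    ("dave", "erin"), ("dave", "frank"), ("erin", "frank"),  # circle 2
]
seeds = {"alice": 1.0, "frank": 0.0}  # only two profiles state a view

scores = label_propagation(edges, seeds)
for person in ("bob", "carol", "dave", "erin"):
    print(person, round(scores[person], 2))
# bob 0.86
# carol 0.71
# dave 0.29
# erin 0.14
```

Bob and Carol end up leaning toward support and Dave and Erin toward opposition purely because of who they are friends with. A graph neural network would go further and combine these connections with the node features (age, cat ownership, and so on).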

Use the graph, Luke! Image credit: giphy.com

This might sound strange, but in a way, you have already been doing it. Think of image classification, for instance: a 1024 x 1024 image can be thought of as a grid, which is a very boring graph that happens to have a fixed size and the same topology everywhere except at the image boundary. Each node (pixel) has a fixed number of features (the values that define the pixel's color). Generalizing the familiar ML notions from a grid to a graph of arbitrary complexity is non-trivial, but it can be done.
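To make the grid analogy concrete, here is a minimal sketch. It assumes a grayscale image (one feature per pixel instead of several color channels) and 4-neighbor connectivity; the helper names `grid_graph` and `aggregate` are hypothetical. It builds the grid graph, checks that every pixel away from the boundary has the same number of neighbors, and runs one step of neighbor averaging, which in the interior of the grid acts like a small cross-shaped blur filter.

```python
from collections import Counter


def grid_graph(height, width):
    """Adjacency dict for a height x width image: each pixel is a node
    connected to its up, down, left and right neighbors."""
    adj = {}
    for r in range(height):
        for c in range(width):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    nbrs.append((rr, cc))
            adj[(r, c)] = nbrs
    return adj


def aggregate(adj, features):
    """One message-passing step: every node averages its own feature
    with its neighbors' features. Works for any adjacency dict."""
    return {
        node: (features[node] + sum(features[n] for n in nbrs)) / (1 + len(nbrs))
        for node, nbrs in adj.items()
    }


adj = grid_graph(4, 4)
# Same topology everywhere except the boundary:
# interior pixels have 4 neighbors, edge pixels 3, corners 2.
degree_counts = Counter(len(nbrs) for nbrs in adj.values())
print(sorted(degree_counts.items()))  # [(2, 4), (3, 8), (4, 4)]

# One grayscale feature per pixel: a single bright pixel on a dark image.
features = {node: 0.0 for node in adj}
features[(1, 1)] = 5.0
smoothed = aggregate(adj, features)
print(smoothed[(1, 1)], smoothed[(0, 1)], smoothed[(2, 2)])  # 1.0 1.25 0.0
```

Because `aggregate` only reads the adjacency dict, the same function would run unchanged on an irregular graph such as the Facebook friendship graph from earlier, where every node can have a different number of neighbors. Coping with that variability is a large part of what generalizing from grids to arbitrary graphs involves.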

For those already well-versed in machine learning with graphs, here is a more math-heavy graph-based preprint from about two weeks ago: