Original article was published by Mark Cleverley on Artificial Intelligence on Medium

Each node in this example graph contains three data points, forming a data vector. Let’s say we want to figure out the neighborhood information of the pink node above.

To gather “neighborhood average information” in a way analogous to convolutional filters, we can blend together (or simply average) neighboring nodes’ data to construct a node’s convolved data vector. The pink node’s propagated vector is then `0.7, 0.3, 0.7`.

A convolutional layer in a GCN might perform this process for each node before passing the ‘estimated’ graph through to the next layer. It gets *much* more complicated than this when we dive into multi-layered GCNs, but the same principles apply throughout.
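As a concrete sketch, this averaging step can be written in a few lines of NumPy. The graph and feature values below are hypothetical, chosen so that node 0 plays the role of the pink node above:

```python
import numpy as np

# Hypothetical 4-node graph: node 0 stands in for the "pink" node.
# features[i] is node i's 3-value data vector.
features = np.array([
    [0.5, 0.1, 0.9],   # node 0 (pink)
    [0.9, 0.5, 0.5],   # node 1
    [0.7, 0.1, 0.7],   # node 2
    [0.5, 0.3, 0.9],   # node 3
])

# adjacency[i][j] = 1 means an edge between nodes i and j.
adjacency = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])

def neighborhood_average(adjacency, features):
    """Replace each node's vector with the mean of its neighbors' vectors."""
    degree = adjacency.sum(axis=1, keepdims=True)  # number of neighbors per node
    return (adjacency @ features) / degree

propagated = neighborhood_average(adjacency, features)
print(propagated[0])  # pink node's averaged vector: [0.7 0.3 0.7]
```

A layer would apply this to every node at once (the matrix product already does), then hand the propagated features to the next layer.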

# Deep graph learning

There are two major tasks that graph convolutional networks (GCNs) perform:

- **Node classification:** based on labeled nodes, predict info about unlabeled nodes
- **Graph classification:** based on labeled graphs, predict info about new graphs

Wright gives an example of a node-classification GCN with a citation network, where each node is an academic paper and each edge is a formal citation between papers.

Graph classification is absolutely fascinating, since it’s the equivalent of image classification (or object recognition) that CNNs often perform.

Electroencephalogram (EEG) readings can be modeled as graphs and used to predict human emotion from brain activity. And after training on labeled molecules, a graph network can generate new candidate molecules to aid drug and material design.

This is *quite radical* to me, to be honest. Graphs have an irregular structure that changes with their information content: like water, if you increase the volume you probably won’t be able to keep the same shape.

Yet there we have it: by reconsidering the idea of structure, we can pass through every node in a graph and gather relational information, eventually making predictions of nodes (“filling in the blanks”) or making predictions of entire graphs (“inferring overall purpose”).
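The per-node averaging described earlier generalizes to the standard GCN propagation rule from Kipf and Welling, H′ = σ(D̂^(−1/2) Â D̂^(−1/2) H W), where Â is the adjacency matrix with self-loops added. A minimal NumPy sketch, using a hypothetical graph and randomly initialized weights in place of trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-node graph with 3 features per node.
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
H = rng.random((4, 3))   # node feature matrix
W = rng.random((3, 2))   # layer weights (learned in practice, random here)

def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops so a node keeps its own info
    d = A_hat.sum(axis=1)                 # degrees including self-loop
    D_inv_sqrt = np.diag(d ** -0.5)       # symmetric normalization
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

H1 = gcn_layer(A, H, W)   # one new feature row per node
print(H1.shape)  # (4, 2)
```

Stacking several such layers lets information propagate beyond immediate neighbors, which is where the multi-layered complexity mentioned earlier comes in.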

How we can encode this to a binary vector is another case entirely, but it’ll probably involve the adjacency matrix in some regard.
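One illustrative way (an assumption on my part, not a method from the article) is to flatten the adjacency matrix itself. For an undirected graph without self-loops, the upper triangle alone captures the structure:

```python
import numpy as np

# Edges of a hypothetical 4-node graph.
edges = [(0, 1), (0, 2), (0, 3)]
n = 4

# Build the symmetric binary adjacency matrix.
adjacency = np.zeros((n, n), dtype=int)
for i, j in edges:
    adjacency[i, j] = 1
    adjacency[j, i] = 1

# Flatten the upper triangle (no duplicates, no self-loops)
# into a fixed-length binary vector for this graph size.
binary_vector = adjacency[np.triu_indices(n, k=1)]
print(binary_vector)  # [1 1 1 0 0 0]
```

The catch is that this encoding is tied to a fixed node count and ordering, which is exactly the irregular-structure problem graph networks are built to sidestep.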