Graph Convolutional Network (GCN) with a case study.

Original article was published by Naveen kumar on Artificial Intelligence on Medium


Key Words: Graph Convolutional Network (GCN), Deep Learning, Artificial Intelligence.

Problems Solve: We can build a deep neural network model on unstructured data or graph using Graph Convolutional Network (GCN), generally it can’t do with Convolutional Neural Network (CNN).

Outlines:

1) Motivation

2) Introduction

3) How to convert a graph in a format that the computer can accept?

4) How to train graph using the neural network?

5) Deep Encoder

6) Dataset & Implementation

7) Results & Conclusions

1) Motivation:

Modern machine learning (ML) is working well on structured data. Structure data is like a grid pattern. Convolution neural network is designed for grids and sequential data.

A modern machine learning model.

Real data collected from different applications usually do not have a pre-defined data model or is not organized in a pre-defined manner. Analyzing messy, unstructured data and extracting useful data information is difficult. For the data collected in financial institutions, usually, it has additional topological structures and is amenable to be represented as a graph. For example, social networks, communication networks, financial systems, and payments networks. The graph structure can be built from the connection of each entity, such as financial institutions, customers, or computing centers. We consider how the different entity influences each other’s label through a label prediction problem based on the graph structure. Graph Neural Networks (GNNs) is a powerful tool that can mimic experts’ decision on node labeling.

structured data: It refers to data that has been organized to reside in a relational database with rows and columns.

unstructured data: unstructured data that is not organized to reside in a tabular form in a relational database. Unstructured data includes image, video, and audio, as well as text and tagged formats like XML, SGML, and JSON.

Graphical and fixed grid data.

2) Introduction:

Recently, Graph Neural Network (GNN) has gained increasing popularity in various domains, including social networks, knowledge graphs, recommender systems, and even life science. The power of GNN in modeling the dependencies between nodes in a graph enables the breakthrough in the research area related to graph analysis. Graphs nowadays become ubiquitous owing to the ability to model complex systems such as social relationships, biological molecules, and publication citations. The problem of classifying graph-structured data is fundamental in many areas. Besides, since there is a tremendous amount of unlabeled data in nature and labeling data is often expensive and time-consuming, it is often challenging and crucial to analyze graphs in a semi-supervised manner.

Graph Convolutional Network Architecture.

Graph: Before we get into GNN, let’s first understand what is Graph. In Computer Science, a graph is a data structure consisting of two components, vertices, and edges. A graph G can be well described by the set of vertices V and edges E it contains.

G = (V, E)

Edges can be either directed or undirected, depending on whether there exist directional dependencies between vertices.

A Directed Graph (wiki)

The vertices are often called nodes. In this article, these two terms are interchangeable. The black arrows on the edges represent the kind of relationship between the nodes. It shows whether a relationship is mutual or one-sided. The two different kinds of graphs are directed (connection direction matters between nodes) and undirected (connection order doesn’t matter). Directed graphs can be unidirectional or bidirectional in nature. Twitter followers are an example of a directed graph and Facebook friend is an example of an undirected graph.

Analogy: If a node is a person, then the node label is a person’s name, and node features are a person’s characteristics.

3) How to convert a graph in a format that the computer can accept?

i) Incidence Matrix (I): A matrix of size nxm where n is the number of nodes and m is edges of a graph. Entry is 1 if node & edges are incident (related), and 0 if not.

Incidence matrix for an undirected graph.
Incidence matrix for a directed graph.

ii) Adjacency Matrix (A): It is called a connection matrix. It is a square matrix. The elements of the matrix are 0 or 1, depending upon whether the nodes of the graph are adjacent or not. An adjacency matrix can be weighted, which means each edge has an associate value. So instead of 1, the value is put in the respective matrix co-ordinate.

Adjacent Matrix for a graph.

iii) Degree Matrix (D): A Diagonal matrix that contains information about the degree of each node, that is the number of edges attached to each node.

Degree Matrix for a graph.

iv) Laplacian Matrix (L): It is also known as the Beltrami operator. Laplacian is a measure of smoothness of a vertex i.e; how quickly it changes between adjacent vertex.

L = [ D-A ]

Laplacian Matrix for a graph.

4) How to train graph using the neural network?

1st Approach (Naive approach):

  • Take the adjacency matrix and feature matrix [A, X] Join adjacency matrix and features
  • Feed them into deep into (fully a connected) neural network.
The naive approach to train graph using a neural network.

Problems with Naive approach:

  • O(N) order parameters – Huge number of parameters.
  • Not applicable to graphs of different sizes (Not inductive learning).
  • Not invariant to node ordering.

2nd Approach: Use the above approach with the CNN network. Here GCN consists of:

  • Locality/Neighbourhood nodes
  • Aggregation — how neighboring nodes contribute to the target node with its weight matrix.
  • Stacking Layers — Describe the model, parameters, training, and How to fit the model?

Mathematical Terminology :

Assume we have a graph G:

  • V is the vertex set
  • A is the adjacency matrix (assume binary)
  • X ∈ R:×|=| is a matrix of node features
  • Node features:

– Social networks: User profile, User image

– Biological networks: Gene expression profiles, gene functional information

– No features:

# Indicator vectors (one-hot encoding of a node)

# Vector of constant 1: [1, 1, …, 1]

Node’s neighborhood defines a computation graph: Learn how to propagate information across the graph to compute node features.

Computational graph of feature node.

Generate node embeddings based on local network neighborhoods. And nodes aggregate information from their neighbors using neural networks.

Graph neural network for node A.

The model can be of arbitrary depth:

  • Nodes have embeddings at each layer
  • Layer-0 embedding of node A is its input feature, XA
  • Layer-K embedding gets information from nodes that are K hops away. Here K=[0,1].
Graph neural network of node A with features.

Average information from neighbors and apply a neural network.

Graph neural network of node A with features aggregations.

5) Deep Encoder :

  • Basic approach: Average neighbor messages and apply a neural network
  • 3 basic steps of the GCN-model during training.
Basic equations of GCN-model in training.

Here B and W parameter is trained in an iterative way. We can feed these embeddings into any loss function and run stochastic gradient descent to train the weight parameters.

Equivalent equation of GCN in vector form :

Vector equation GCN-model.

Limitations :

1) No self-loops in the adjacent matrix — resolve by adding the Identity matrix with the Adjacent matrix (A+I).

2) A is not normalized — resolve by in place of feature vectors, A is normalized by degree matrix(D) as inv(D)*A.

6) Dataset & Implementations: A case study of GCN on Pubmed diabetes dataset.

About Pubmed Diabetes dataset: The Pubmed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes (“Diabetes Mellitus, Experimental”, “Diabetes Mellitus Type 1”, “Diabetes Mellitus Type 2”). The citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF weighted word vector from a dictionary which consists of 500 unique words. Data consists of tab-delimited files where the first line describes the contents of the files and the second line describes the names and types of the attributes.

Code: Please see gcn_model.py.

7) Results & Conclusions :

  • Accuracy on testing dataset = 82.67 %.
  • The plot of comparison between training and validation data.
Comparison between training and validation data.
  • PCA visualization of GCN embeddings for PubMedDiabetes dataset.
PCA visualization of GCN embeddings for PubMedDiabetes dataset.
  • TSNE visualization of GCN embeddings for PubMedDiabetes dataset.
TSNE visualization of GCN embeddings for PubMedDiabetes dataset.

Conclusions: We have learned the basics of Graph Neural Networks, its mathematical terminology and implement using the StellarGraph python library. The power of GNN in modeling complex graph structures is truly astonishing. In view of its effectiveness, I believe, in the near future, GNN will play an important role in AI’s development.

References:

Thank You !!!

Please Clap it.

Name: Naveen

E-mail: naveenk.paymeindia@gmail.com

LinkedIn Id: https://www.linkedin.com/in/naveen-kumar-607037b5/