When GraphSAGE Meets Pinterest


In the previous stories, we covered Introduction to Graph Embeddings, Random Walk in Node Embeddings, and 4 Graph Neural Networks. I would like to further extend the series to PinSAGE, which was proposed by Ying et al. in 2018. Ying is one of the authors of GraphSAGE (Hamilton et al., 2017), and he and his team applied a modified version of GraphSAGE to the Pinterest dataset.

We will use the following undirected graph to walk through the PinSAGE (Ying et al., 2018) algorithm. First of all, we have an input graph with nodes A, B, C, D, E, and F. From each node’s perspective, that node is the root node while the surrounding nodes are its neighbor nodes.

Input Graph (Ying et al., 2018)
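
To make the walkthrough concrete, the sketch below encodes the example graph as a plain Python adjacency list. Only node A’s edges (to B, C, and D) are fixed by the discussion that follows; the edges among B, C, D, E, and F are hypothetical placeholders chosen so the toy graph is connected, and `graph`/`neighbors` are illustrative names, not anything from the paper.

```python
# The example graph as an adjacency list. Only the edges incident to
# node A (A-B, A-C, A-D) are given in the text; the edges involving
# nodes E and F are hypothetical, added so the toy graph is connected.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "E", "F"],
    "D": ["A"],
    "E": ["B", "C"],
    "F": ["C"],
}

def neighbors(node):
    """Return the neighbors of `node` in the undirected graph."""
    return graph[node]
```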

We will take node A as an example. The neighbor nodes of node A are nodes B, C, and D. PinSAGE (Ying et al., 2018) uses depth-2 convolutions, so we also need to aggregate nodes B, C, and D from their own neighbor nodes.

In the 1-hop layer (i.e., nodes B, C, and D), an aggregation function (i.e., CONVOLVE) aggregates the embeddings of all neighbor nodes. The aggregated neighborhood embedding is then combined with the target node’s own embedding (i.e., node A’s) through concatenation and a dense layer. Finally, we get node A’s representation.
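As a rough illustration of this step, here is a simplified CONVOLVE in NumPy: neighbor embeddings are transformed and pooled, then concatenated with the target node’s embedding, projected, and normalized. Bias terms and the paper’s exact pooling function are omitted, so treat this as a minimal sketch under those assumptions, not the authors’ implementation.

```python
import numpy as np

def convolve(h_self, h_neighbors, Q, W, alpha=None):
    """Simplified PinSAGE-style CONVOLVE step (sketch).

    h_self      -- (d,) embedding of the target node
    h_neighbors -- (n, d) embeddings of its sampled neighbors
    Q, W        -- layer weight matrices (biases omitted)
    alpha       -- optional (n,) importance weights for pooling
    """
    # Transform each neighbor embedding with a dense layer + ReLU.
    transformed = np.maximum(h_neighbors @ Q, 0.0)
    # Pool the transformed neighbors (importance-weighted mean).
    if alpha is None:
        alpha = np.full(len(h_neighbors), 1.0 / len(h_neighbors))
    pooled = alpha @ transformed
    # Combine the node's own embedding with the pooled neighborhood,
    # project through another dense layer, and L2-normalize.
    z = np.maximum(np.concatenate([h_self, pooled]) @ W, 0.0)
    return z / (np.linalg.norm(z) + 1e-8)
```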

How do we get the representations of nodes B, C, and D? We use the same technique as in the previous step, but with a different set of layer weights. You may notice that node A appears again in the second layer of neighbors. This is expected, as node A is itself a neighbor of nodes B, C, and D; if we removed it from the aggregation, we would ignore those neighbor relations.

Aggregation of Node A (Ying et al., 2018)
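
The depth-2 computation can then be expressed as a recursion over the `convolve` sketch above: each node’s layer-k representation is built from its own and its neighbors’ layer-(k-1) representations, with one weight pair per layer shared across nodes. The `embed` function and the `weights` dictionary are hypothetical names for illustration, and the random toy parameters exist only to make the snippet runnable.

```python
def embed(node, depth, features, weights):
    """Depth-`depth` representation of `node` (recursive sketch).

    weights[k] holds the (Q, W) pair for layer k; the same pair is
    reused by every node at that layer, so B, C, and D share weights.
    The recursion naturally pulls node A back in as a 2-hop neighbor
    of B, C, and D, exactly as described above.
    """
    if depth == 0:
        return features[node]          # raw input features
    h_self = embed(node, depth - 1, features, weights)
    h_neigh = np.stack([embed(v, depth - 1, features, weights)
                        for v in neighbors(node)])
    Q, W = weights[depth]
    return convolve(h_self, h_neigh, Q, W)

# Usage with random toy parameters (embedding size d = 8 throughout):
rng = np.random.default_rng(0)
d = 8
features = {n: rng.normal(size=d) for n in graph}
weights = {k: (rng.normal(size=(d, d)),       # Q for layer k
               rng.normal(size=(2 * d, d)))   # W for layer k
           for k in (1, 2)}
z_A = embed("A", 2, features, weights)        # node A's final representation
```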

Instead of using all neighbors, Ying et al. propose an importance-based random-walk approach that includes only the most important neighbors. This not only reduces the computational footprint but also takes the importance of each neighbor into account.

Overview of neighbor nodes (Ying et al., 2018)
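
A simple way to approximate this idea is to simulate short random walks from the target node and keep the most frequently visited nodes; the normalized visit counts can then serve as the importance weights (the `alpha` used during pooling above). The function name and parameter values below are illustrative defaults, not the paper’s settings.

```python
import random
from collections import Counter

def important_neighbors(start, num_walks=200, walk_len=2, top_t=3):
    """Pick the top-T neighbors of `start` by simulated random walks.

    A sketch of importance-based neighborhoods: nodes visited most
    often by short walks from `start` are kept, and their normalized
    visit counts act as importance weights for pooling.
    """
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_len):
            node = random.choice(graph[node])
            if node != start:
                visits[node] += 1
    total = sum(visits.values()) or 1
    return [(n, count / total) for n, count in visits.most_common(top_t)]

# e.g. important_neighbors("A") returns up to 3 (node, weight) pairs.
```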

Take Away

References

Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive Representation Learning on Large Graphs. NeurIPS.

Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., & Leskovec, J. (2018). Graph Convolutional Neural Networks for Web-Scale Recommender Systems. KDD.