Original article was published by Anirudh Shah on Becoming Human: Artificial Intelligence Magazine
Welcome to My Week in AI! Each week this blog will have the following parts:
- What I have done this week in AI
- An overview of an exciting and emerging piece of AI research
Unearthing new tools
This week I came across two tools that I wanted to share, as I think they would be useful additions to a data science toolkit. First is Elasticsearch, an open-source search and analytics engine that handles a range of data types, including textual, numerical and geospatial. It processes data quickly and scales well, which suits use cases such as log analytics, geospatial data analysis and application search. Second is Streamlit, a Python library for building interactive apps for machine learning. The app updates as soon as you make changes to the code, so you can view your changes in almost real time. Overall, it is a powerful tool for Data Scientists and Machine Learning Engineers who want to visualize data and display their results in an aesthetically pleasing and interactive manner.
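To give a feel for how Elasticsearch combines those data types, here is a minimal sketch of a search request body written as a Python dictionary. The `bool`, `match`, `range` and `geo_distance` clauses are part of the Elasticsearch Query DSL, but the field names (`message`, `response_ms`, `location`) and the helper `build_search_body` are hypothetical, and I am assuming `location` is mapped as a `geo_point`. The sketch only builds the JSON; it does not contact a cluster.

```python
import json


def build_search_body(text, min_response_ms, lat, lon, radius_km):
    """Build a Query DSL body mixing text, numeric and geospatial clauses.

    Field names (message, response_ms, location) are made up for illustration.
    """
    return {
        "query": {
            "bool": {
                # Full-text clause: analysed and relevance-scored.
                "must": [{"match": {"message": text}}],
                # Filter clauses: yes/no conditions that do not affect scoring.
                "filter": [
                    {"range": {"response_ms": {"gte": min_response_ms}}},
                    {
                        "geo_distance": {
                            "distance": f"{radius_km}km",
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                ],
            }
        }
    }


body = build_search_body("timeout", 500, 51.5, -0.12, 10)
print(json.dumps(body, indent=2))
```

A body like this would be sent to an index's `_search` endpoint; the text clause ranks the hits while the numeric and geospatial clauses simply narrow them down.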
Learning new skills
From my personal experience and from talking to colleagues, I believe that there is often a knowledge gap for people starting out as Machine Learning Engineers. Former Software Engineers often know how to deploy models, and former Data Scientists know how to train them, but not many of us have knowledge of both. In my research, I’ve found that many good, free resources are available for learning about training models, but I have not been able to find the same for deploying models. That is why I was so excited when a friend of mine recommended the Full Stack Deep Learning course, a free two-day online bootcamp in shipping deep learning projects. The course includes a project in which you have to deploy a deep learning system into production, and this hands-on experience is invaluable. This is definitely a course that I hope to complete over the next couple of weeks.
Efficient graph similarity search
This week’s research is about graph similarity search, a task often associated with identifying similar chemical molecules. In ‘SimGNN: A Neural Network Approach to Fast Graph Similarity Computation’, Bai et al. propose a fast and accurate method of determining the similarity between two graphs. They turn the task into a learning problem: a neural network-based function is trained to compute the similarity score between two graphs.¹
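To make "similarity score" concrete: as far as I understand the paper, the training targets are derived from graph edit distance (GED), which is normalized by the average number of nodes in the two graphs and then mapped into the range (0, 1] with an exponential. The sketch below shows that arithmetic only; the function name `ged_to_similarity` is my own, and the GED values are assumed to be given rather than computed.

```python
import math


def ged_to_similarity(ged, n_nodes_1, n_nodes_2):
    """Map a graph edit distance to a similarity score in (0, 1].

    Assumes the normalization nGED = GED / ((|G1| + |G2|) / 2)
    followed by exp(-nGED), as I read it from the paper.
    """
    normalized = ged / ((n_nodes_1 + n_nodes_2) / 2)
    return math.exp(-normalized)


# Identical graphs have GED 0, so the similarity is exactly 1.
print(ged_to_similarity(0, 5, 7))
# More edits between graphs of the same sizes means a lower score.
print(ged_to_similarity(3, 5, 7) > ged_to_similarity(6, 5, 7))
```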
Their method proceeds in the following manner:
1. The nodes are encoded based on the features and structural properties around them.
2. A learnable embedding function generates an embedding vector for each graph, providing a summary of the graph information using an attention mechanism.
3. In the interaction stage, the two graphs are compared at both levels: their graph-level embeddings are compared with each other, and their node-level embeddings are compared pairwise. Each comparison produces a set of interaction scores.
4. The two sets of interaction scores are passed into a fully connected layer to obtain a final similarity score.
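The four steps above can be sketched in NumPy as a single forward pass. This is a simplified illustration and not the authors' implementation: the weights are random and untrained, so the printed score is meaningless, the graph-level comparison uses only a bilinear term, and the sizes `D`, `K`, `BINS` and `F` as well as all function names are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
F, D, K, BINS = 3, 8, 4, 6  # feature size, embedding size, score counts (assumed)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def normalized_adjacency(adj):
    """Add self-loops, then apply symmetric degree normalization."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def node_embeddings(adj, feats, w_gcn):
    """Step 1: encode each node from its features and neighbourhood."""
    return np.maximum(normalized_adjacency(adj) @ feats @ w_gcn, 0.0)


def graph_embedding(h, w_att):
    """Step 2: attention-weighted sum of node embeddings."""
    context = np.tanh(h.mean(axis=0) @ w_att)  # global context vector
    weights = sigmoid(h @ context)             # one attention weight per node
    return weights @ h


def graph_level_scores(g1, g2, w_bilinear, bias):
    """Step 3a: K bilinear interaction scores between graph embeddings."""
    return np.maximum(np.einsum("i,kij,j->k", g1, w_bilinear, g2) + bias, 0.0)


def node_level_scores(h1, h2):
    """Step 3b: normalized histogram of pairwise node similarities."""
    sims = sigmoid(h1 @ h2.T)
    hist, _ = np.histogram(sims, bins=BINS, range=(0.0, 1.0))
    return hist / hist.sum()


def similarity(graph1, graph2, p):
    h1 = node_embeddings(*graph1, p["w_gcn"])
    h2 = node_embeddings(*graph2, p["w_gcn"])
    g1 = graph_embedding(h1, p["w_att"])
    g2 = graph_embedding(h2, p["w_att"])
    scores = np.concatenate([
        graph_level_scores(g1, g2, p["w_bilinear"], p["bias"]),
        node_level_scores(h1, h2),
    ])
    # Step 4: one fully connected layer squashed into (0, 1).
    return float(sigmoid(scores @ p["w_out"] + p["b_out"]))


params = {
    "w_gcn": rng.normal(scale=0.1, size=(F, D)),
    "w_att": rng.normal(scale=0.1, size=(D, D)),
    "w_bilinear": rng.normal(scale=0.1, size=(K, D, D)),
    "bias": np.zeros(K),
    "w_out": rng.normal(scale=0.1, size=K + BINS),
    "b_out": 0.0,
}

# Two toy graphs as (adjacency, node features): a triangle and a 4-node path.
triangle = (np.ones((3, 3)) - np.eye(3), np.ones((3, F)))
path_adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
path = (path_adj, np.ones((4, F)))

score = similarity(triangle, path, params)
print(score)
```

In a real model, every entry of `params` would be learned by minimizing the error between the predicted score and a ground-truth similarity for many graph pairs.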
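For the node encoding in the first step, the researchers used a Graph Convolutional Network, as these types of networks can be configured to be representation-invariant (the result does not depend on the order in which the nodes are listed) and inductive (the trained network can be applied to graphs it has not seen before). Both are important properties for this task.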
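Both properties can be checked on a toy example. The sketch below uses one standard graph-convolution layer with random, untrained weights; the function name `gcn_layer` and the toy graphs are my own. Reordering the nodes only reorders the output rows, so any pooled summary is unchanged, and the same weight matrix applies to a graph of a different size.

```python
import numpy as np


def gcn_layer(adj, feats, weights):
    """One graph-convolution layer: normalized neighbour averaging, then ReLU."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weights, 0.0)


rng = np.random.default_rng(1)
weights = rng.normal(size=(2, 4))  # shared by every graph

# A path graph 0-1-2-3 with random 2-dimensional node features.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 2))

# The same graph with its nodes listed in a different order.
perm = np.array([2, 0, 3, 1])
adj_p = adj[np.ix_(perm, perm)]
feats_p = feats[perm]

out = gcn_layer(adj, feats, weights)
out_p = gcn_layer(adj_p, feats_p, weights)

# Representation-invariant: rows are permuted, the pooled sum is identical.
print(np.allclose(out[perm], out_p))
print(np.allclose(out.sum(axis=0), out_p.sum(axis=0)))

# Inductive: the same weights work on an unseen 3-node triangle.
triangle = np.ones((3, 3)) - np.eye(3)
out_tri = gcn_layer(triangle, rng.normal(size=(3, 2)), weights)
print(out_tri.shape)
```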
On benchmark datasets, this method was consistently the most accurate for graph similarity search when compared with other state-of-the-art methods such as Beam and AttDegree. The authors also found that SimGNN was 46 times faster than Beam search on large graphs, another key performance marker in its favor. The total time taken included the time for training; if a pre-trained version of SimGNN were used and then fine-tuned to a specific dataset, this method would be even faster.
The researchers highlighted the potential applications of this method in bioinformatics, social network analysis, recommender systems and more, and I believe this is an exciting new method in the area of graph deep learning.
Next week I will be presenting more of my work in AI and discussing research on the use of AI in drug discovery. Thanks for reading and I appreciate any comments/feedback/questions.
¹ Bai, Y., Ding, H., Bian, S., Chen, T., Sun, Y., & Wang, W. (2019). SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. doi:10.1145/3289600.3290967
My Week in AI: Part 8 was originally published in Becoming Human: Artificial Intelligence Magazine on Medium.