This article was originally published in Artificial Intelligence on Medium.
Featuring data engineering with SQL, Microsoft Build Conference and some cutting-edge Image Segmentation research.
Welcome to My Week in AI! Each week this blog will have the following parts:
- An update on my work in AI
- An overview of an exciting and emerging piece of AI research
Refreshing SQL skills
This week I decided to focus on renewing my data engineering skills, as this is my least refined and practiced skillset. I started with the basics – SQL and databases – since I have not touched these areas in almost two years. I worked through the ‘Master SQL for Data Science’ learning pathway on LinkedIn Learning, which consists of seven courses covering SQL, NoSQL and Presto. I still maintain that SQL is the most intuitive programming language I have learned, so there was not much in the way of new topics for me in these courses. However, they were all well taught and comprehensive, and overall a useful SQL refresher.
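To illustrate the kind of query a refresher like this revisits, here is a toy example using Python’s built-in sqlite3 module; the table and data are invented for illustration:

```python
import sqlite3

# A throwaway in-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 20.0), ("alice", 50.0)],
)

# Aggregate spend per customer -- GROUP BY is one of the first
# constructs any SQL refresher revisits.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 80.0), ('bob', 20.0)]
```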
Attending Microsoft Build Conference
Another big portion of my week was spent attending talks at Microsoft’s Build Conference. One of the silver linings of events having to move online during the pandemic is that they have opened up to wider audiences. Whilst I was very interested in this event, it is one I would not otherwise have been able to attend, since it is normally held in Seattle. The conference offered a multitude of talks, ranging from cloud infrastructure and DevOps to AI and sustainability. I mainly attended the AI/ML and sustainability talks, and had two key takeaways:
There is an industry-wide push for transparency, ethics and accountability in AI.
Azure is a hugely powerful cloud platform.
In terms of ethical AI, two talks stood out to me. The first was titled: “Responsible ML: Protect Privacy and Confidentiality with ML” by Sarah Bird, and the second: “How to Explain Text Models with InterpretML — Deep Dive” by Minsoo Thigpen.
In the first, Bird spoke about the privacy- and security-related AI initiatives being developed at Microsoft, including the WhiteNoise project, which focuses on differential privacy. This is an open source project that you can plug into your machine learning pipeline, allowing you to add statistical noise to data so that sensitive datasets can be shared responsibly — the key facet of differential privacy. Two suggested applications were anonymizing individuals’ incomes when working with census data and anonymizing medical diagnosis data. Overall, this approach keeps the distribution of the data close to the original, so that model performance is minimally (if at all) compromised, whilst protecting individuals’ identities.
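To make the idea of calibrated statistical noise concrete, here is a minimal sketch of the Laplace mechanism, the textbook building block of differential privacy. This is a generic illustration, not the WhiteNoise API, and the incomes and clipping range are invented:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release a statistic with epsilon-differential privacy by
    adding Laplace noise with scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
# Hypothetical incomes, assumed clipped to a known range of 0..100_000.
incomes = np.array([52_000, 61_000, 48_000, 75_000, 58_000])

# For a mean over n people, one person can shift the result by at most
# (max income) / n, which bounds the query's sensitivity.
n = len(incomes)
sensitivity = 100_000 / n
private_mean = laplace_mechanism(incomes.mean(), sensitivity, epsilon=1.0, rng=rng)
print(round(private_mean))
```

The noisy mean can be published without revealing any single individual’s income, and with a large enough dataset the noise barely moves the statistic.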
In the second, Thigpen discussed InterpretML, another open source project by Microsoft that brings interpretability to NLP models. It currently works with BERT, RNNs and Scikit-Learn models, and has three explainers implemented (I won’t go into the details here): the Classical Text Explainer, the Unified Information Explainer and the Introspective Rationale Explainer. One of the applications presented was for hiring managers who use NLP to classify job applicants’ resumes. This package would give those managers insight into which key words in a resume led to a particular classification. What makes this valuable is that deep learning models are usually black boxes that are very difficult to interpret, but this project lets you peer inside them to some extent.
Another takeaway I have from the conference regards the proliferation, power and ease of use of Azure ML. My dad, who worked at Microsoft in the ’80s and ’90s and who keeps up to date on all things Microsoft, has told me several times that Azure is the cloud computing environment used by the majority of large companies, whilst AWS is favoured by smaller companies and startups. It’s true: during the Build conference I found out that 95% of the Fortune 500 use Azure — so in my opinion it’s definitely worth becoming familiar with both AWS and Azure.
Promising Image Segmentation Method
The research I’m presenting this week focuses on Image Segmentation, a Computer Vision task that consists of identifying objects within an image. In their paper ‘Leveraging Pretrained Image Classifiers for Language-Based Segmentation’, Golub, El-Kishky and Martín-Martín presented a novel way of segmenting an image with classes that were unseen during training¹. This is exciting because such tasks usually require a lot of training data covering all potential classes seen at test time, which is expensive and time-consuming to collect. Typically, two main groups of methods aim to overcome this problem: weakly supervised methods and few-shot semantic segmentation. However, both require some human-generated labels at test time.
The new method proposed by Golub et al. instead combines Computer Vision and NLP approaches, and critically does not need any human-generated labels during testing. Figure 1 shows their approach diagrammatically.
The input image is passed through a pretrained image classifier (they used an ImageNet-1k VGG-19), whilst the label is passed through a language semantic association model (they explored WordNet and Word2vec, eventually settling on WordNet). The outputs of the classifier and language model are combined to produce positive and negative saliency maps for the image, and these two maps are fed into a semantic segmentation model with attention. This model outputs the probability that each pixel belongs to each class label. In experiments, this approach achieved 3.2% higher accuracy than the previous state-of-the-art 5-shot approach.
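The data flow from the two saliency maps to per-pixel class probabilities can be sketched schematically; here random arrays stand in for the classifier- and WordNet-derived outputs, and a simple per-pixel softmax replaces the learned attention-based segmentation model the paper actually uses:

```python
import numpy as np

rng = np.random.default_rng(42)
H, W = 8, 8  # toy spatial resolution

# Stand-ins for the saliency maps derived from the image classifier
# and the language semantic association model.
positive_map = rng.random((H, W))  # evidence FOR the query class
negative_map = rng.random((H, W))  # evidence AGAINST it

# One naive way to turn the two maps into a per-pixel foreground
# probability: a softmax over the (positive, negative) pair at each
# pixel. The paper instead feeds both maps into a trained
# attention-based segmentation model; this only shows the data flow.
logits = np.stack([positive_map, negative_map], axis=0)
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
foreground = probs[0]       # P(pixel belongs to the query class)
mask = foreground > 0.5     # hard segmentation mask
print(mask.shape)  # (8, 8)
```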
I think the most fascinating conclusion of this work is that it can generalize to classes unseen during training. This means that you can achieve good results with less training data than required for other image segmentation techniques, and this approach may work well in the real world where you are likely to come across previously unseen objects. As someone with a keen interest in NLP I was intrigued by the way the researchers utilized WordNet in what is ostensibly a computer vision task. Transfer learning is another interest of mine, so the usage of the ImageNet classifier here also piqued my curiosity.