How Does Big Tech Use Machine Learning?

Original article was published by Rahul Bhatt on Artificial Intelligence on Medium


Let’s begin with a basic question: what is machine learning?

Machine learning is a subset of artificial intelligence that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.

  • Data scientists use machine learning to analyze data and to automate the analytical model building part of a system.
  • The system learns from large volumes of data, identifies patterns, and then makes predictions with minimal human intervention.
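
To make "learning from data" concrete, here is a minimal sketch of the idea at its simplest: fitting a straight line to a handful of points and using it to predict a new value. The data and the names `fit_line` and `predict` are made up for illustration; real systems use far larger datasets and richer models.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: the "experience" the model learns from.
hours_used = [1.0, 2.0, 3.0, 4.0]
purchases = [2.0, 4.0, 6.0, 8.0]

model = fit_line(hours_used, purchases)
print(predict(model, 5.0))
```

Nobody wrote the rule "purchases are twice the hours" into the program; the model recovered it from the examples, which is the sense in which it was not explicitly programmed.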

Popular uses of machine learning include Play Store and App Store recommendations, Google Maps, email filtering, Google Translate, Google Search, and more. Here are five machine learning use cases; this article looks at the first three in detail:

  1. App Store and Play Store Recommendations
  2. Transportation and commuting
  3. Email filtering of Gmail
  4. Google Search
  5. Chat bots for customer support queries and many more.

1. App Store and Play Store Recommendations: Google Play, in collaboration with DeepMind, uses three main models:

  • Candidate generator
  • Reranker
  • Model to optimize for multiple objectives

The team tried three different solutions. The first used an LSTM (long short-term memory) network, which brought notable accuracy gains but led to serving delays, since LSTMs are computationally expensive. The second replaced the LSTM with a Transformer model, an architecture used for sequence-to-sequence prediction that has produced strong results in NLP. This improved model performance but also increased the training cost. The third and final solution was an efficient additive attention model that works for any combination of sequence features while incurring low computational cost.
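
The article does not describe the internals of the attention model, so the following is only a generic sketch of additive attention pooling, not DeepMind's implementation. Each item in a sequence gets a score `v · tanh(W x + b)`, the scores are turned into weights with a softmax, and the output is the weighted average of the items. The parameters `W`, `b`, and `v` and the toy sequence are assumptions chosen for illustration.

```python
import math


def additive_attention_pool(sequence, W, b, v):
    """Pool a sequence of feature vectors into one vector.

    score_i = v . tanh(W x_i + b); weights = softmax(scores);
    output = sum_i weights_i * x_i. One pass over the sequence.
    """
    scores = []
    for x in sequence:
        hidden = [
            math.tanh(sum(w * x_j for w, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * h_i for v_i, h_i in zip(v, hidden)))

    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(sequence[0])
    pooled = [sum(w * x[d] for w, x in zip(weights, sequence)) for d in range(dim)]
    return pooled, weights


# Toy sequence of three 2-dimensional feature vectors (hypothetical values).
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [0.3, 0.8]]
b = [0.0, 0.0]
v = [1.0, 1.0]

pooled, weights = additive_attention_pool(sequence, W, b, v)
print(weights, pooled)
```

The cost of this pooling grows linearly with the sequence length, whereas Transformer self-attention compares every item with every other item, which is one plausible reason an additive scheme is cheaper to train and serve.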

The candidate generator is a deep retrieval model that can analyze more than a million apps and retrieve the most suitable ones. It learns from what the user has previously installed. It also learns a bias that favors popular apps, for example an app that has been downloaded 10 times more often than another. To correct this bias, the team added importance weighting to the model. Each app’s weight is based on its impression-to-install rate compared with the median impression-to-install rate across the Play Store, so an app with a below-median install rate gets an importance weight of less than one.
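
The importance weighting described above can be sketched as a simple ratio. The exact formula is not given in the source, so dividing each app's rate by the store-wide median is an assumption, and the app names and rates are invented.

```python
from statistics import median


def importance_weights(install_rates):
    """Map each app to its impression-to-install rate divided by the median rate.

    Assumed form of the weighting: below-median apps get a weight under 1,
    above-median apps get a weight over 1.
    """
    median_rate = median(install_rates.values())
    return {app: rate / median_rate for app, rate in install_rates.items()}


# Hypothetical impression-to-install rates (installs / impressions).
rates = {"app_a": 0.02, "app_b": 0.04, "app_c": 0.08}

for app, weight in importance_weights(rates).items():
    print(app, round(weight, 2))
```

Because the weight depends on the install rate rather than the raw number of installs, a niche app that is rarely shown but often installed when shown can still receive a weight above one.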

The reranker is a user preference model that predicts the user’s preferences for each app along multiple dimensions. Many recommendation systems treat ranking as a binary classification problem, which scores only one item at a time and fails to capture how apps might or might not be similar to each other. The reranker instead learns the relative importance of a pair of apps that were shown to the user at the same time. Each app in the pair is assigned a positive or negative label depending on whether the user chose to download it, and the model tries to minimize the number of inversions in the ranking, which improves the overall relative ranking of the apps.
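
To illustrate the pairwise idea, the sketch below counts inversions (pairs where an app the user did not install is scored at least as high as one they did) and computes a pairwise logistic loss, a common smooth stand-in for the inversion count. The source does not say which loss the reranker uses, so the loss function, the helper names, and the scores are all assumptions.

```python
import math


def labeled_pairs(scores, installed):
    """Pair every installed app's score with every non-installed app's score."""
    positives = [s for s, y in zip(scores, installed) if y]
    negatives = [s for s, y in zip(scores, installed) if not y]
    return [(p, n) for p in positives for n in negatives]


def count_inversions(scores, installed):
    """Number of pairs where the non-installed app is ranked at or above the installed one."""
    return sum(1 for p, n in labeled_pairs(scores, installed) if p <= n)


def pairwise_logistic_loss(scores, installed):
    """Mean of log(1 + exp(-(pos - neg))) over all pairs; lower is better."""
    pairs = labeled_pairs(scores, installed)
    return sum(math.log1p(math.exp(-(p - n))) for p, n in pairs) / len(pairs)


# Three apps shown together; the user installed only the second one.
installed = [False, True, False]
poor_scores = [0.9, 0.2, 0.6]    # installed app ranked last
better_scores = [0.1, 0.8, 0.3]  # installed app ranked first

print(count_inversions(poor_scores, installed))
print(count_inversions(better_scores, installed))
print(pairwise_logistic_loss(poor_scores, installed))
print(pairwise_logistic_loss(better_scores, installed))
```

The inversion count is not differentiable, which is why a training loop would typically minimize a smooth surrogate such as the logistic loss and report inversions only as an evaluation metric.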

Next, these predictions are fed into a multi-objective optimization model, whose solution gives the most suitable candidates for the user. The algorithm looks for a tradeoff between several metrics and picks suitable points along the tradeoff curve.
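
The source does not name the metrics or the optimization algorithm, so the following is only a generic sketch of what a tradeoff curve means. It keeps the candidates that are not beaten on both metrics at once (the Pareto front) and then picks one point on that curve with a weighted sum. The candidate names, the two metrics, and the weights are hypothetical.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated by any other candidate.

    Each candidate is (name, metric_a, metric_b); higher is better for both.
    """
    front = []
    for name, a, b in candidates:
        dominated = any(
            a2 >= a and b2 >= b and (a2 > a or b2 > b)
            for _, a2, b2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


def pick_by_weight(candidates, weight_a):
    """Pick the candidate maximizing weight_a * metric_a + (1 - weight_a) * metric_b."""
    best = max(candidates, key=lambda c: weight_a * c[1] + (1 - weight_a) * c[2])
    return best[0]


# Hypothetical candidate rankings scored on two competing metrics.
candidates = [
    ("ranking_1", 0.9, 0.2),
    ("ranking_2", 0.6, 0.6),
    ("ranking_3", 0.2, 0.9),
    ("ranking_4", 0.5, 0.5),  # worse than ranking_2 on both metrics
]

print(pareto_front(candidates))
print(pick_by_weight(candidates, weight_a=0.9))
print(pick_by_weight(candidates, weight_a=0.5))
```

Changing `weight_a` moves the choice along the curve, which is one simple way to express how much one objective matters relative to the other.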

Reference: DeepMind blog

2. How Uber uses Machine Learning: Uber went from 3 models to 10,000 models in production, and possibly more by now. Its data scientists can train models across GCP, TensorFlow, and Keras and serve all of these models in a robust way. Uber has to gather a large amount of data to find the best routes, predict changing market demand, respond to potential fraud, and so on. Michelangelo, Uber’s machine learning platform, is built on open source components such as HDFS, Spark, Samza, Cassandra, MLLib, XGBoost, and TensorFlow.

Fig.1. Michelangelo : Uber’s Machine Learning Platform

If you have heard of UberEATS, you may be interested to know that it has several models running on Michelangelo, covering meal delivery time predictions, search rankings, search autocomplete, and restaurant rankings.

Fig.2. The UberEATS app hosts an estimated delivery time feature powered by machine learning models built on Michelangelo.

You can learn more about Michelangelo from the official blog post referenced below.

Reference: Uber Engineering YouTube video, Michelangelo blog

3. Email filtering of Gmail: If you have been using Gmail for a while, you have probably noticed how good it is at filtering spam and flagging important emails. Google uses TensorFlow to help filter additional spam for Gmail users. With TensorFlow, it blocks around 100 million additional spam messages every day, including kinds of spam that are hard to identify, such as image-based spam, emails with embedded content, and messages from newly created domains that hide a low volume of spam within legitimate traffic. According to the Gmail team, only about 0.05% of spam reaches the inbox, which corresponds to an accuracy of more than 99%.

Reference: Google Cloud Blog, Gmail Blog

If you made it to the end and liked the article, consider checking the references for more insight into machine learning.

Thank you for reading 🙂