Original article was published by Jitendra Singh Balla on Deep Learning on Medium
How is Machine Learning used in the LinkedIn Recruiter Recommendation System
Let’s find out!
Job recruitment is a primary reason LinkedIn users are active on the platform. With more than 20 million companies listed on the site and 14 million open jobs, it's no surprise that 90% of recruiters regularly use LinkedIn.
In fact, a study found that 122 million people received an interview through LinkedIn, and 35.5 million were hired by a person they connected with on the site.
Heavy Usage of ML and Data Science:
In addition to nurturing one of the richest datasets in the world, LinkedIn constantly experiments with cutting-edge machine learning techniques and pushes the boundaries of research and development.
Specifically, LinkedIn Recruiter is the product that helps recruiters build and manage a talent pool that optimizes the chance of a successful hire.
The product has to handle arbitrarily complex queries and filters and deliver results that are relevant to the recruiter's specific criteria.
LinkedIn built a search stack called Galene on top of Lucene and contributed various plug-ins, including the capability to live-update the search index. The search index consists of two types of fields:
- The inverted field: a mapping from search terms to the list of entities (members) that contain them.
- The forward field: a mapping from entities (members) to metadata about them.
Both types of fields feed into the computation of the machine learning feature values used in search ranking, so keeping the data in these fields fresh is also highly important for those features.
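To make the two field types concrete, here is a minimal in-memory sketch of an index with an inverted field (term to members) and a forward field (member to metadata), plus a live update that keeps both in sync. The class `ToySearchIndex`, its methods, and the `years_experience` feature are hypothetical and only illustrate the idea; this is not Galene's actual data layout or API.

```python
from collections import defaultdict


class ToySearchIndex:
    """Hypothetical index with an inverted field and a forward field (not Galene's API)."""

    def __init__(self):
        # Inverted field: search term -> set of member ids that contain it.
        self.inverted = defaultdict(set)
        # Forward field: member id -> metadata about that member.
        self.forward = {}

    def upsert(self, member_id, terms, metadata):
        """Live-update a member: remove stale postings, then re-index fresh data."""
        old = self.forward.get(member_id)
        if old is not None:
            for term in old["terms"]:
                self.inverted[term].discard(member_id)
        for term in terms:
            self.inverted[term].add(member_id)
        self.forward[member_id] = {"terms": set(terms), **metadata}

    def retrieve(self, query_terms):
        """Use the inverted field to find members containing every query term."""
        postings = [self.inverted.get(t, set()) for t in query_terms]
        return set.intersection(*postings) if postings else set()

    def feature(self, member_id, name, default=0.0):
        """Use the forward field to read a feature value for ranking."""
        return self.forward.get(member_id, {}).get(name, default)


index = ToySearchIndex()
index.upsert(1, ["python", "spark"], {"years_experience": 4})
index.upsert(2, ["python"], {"years_experience": 7})
print(sorted(index.retrieve(["python"])))  # [1, 2]

# Member 2 updates their profile; without a live update, ranking would see stale data.
index.upsert(2, ["java"], {"years_experience": 8})
print(sorted(index.retrieve(["python"])))  # [1]
print(index.feature(2, "years_experience"))  # 8
```

The update path is why freshness matters: a feature read from a stale forward field, or a match against stale postings, would feed outdated values into the ranking model.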
The Ranking Model:
The Recruiter search experience is based on an architecture with two fundamental layers.
- L1: Dips into the talent pool to retrieve candidates and score/rank them. In this layer, candidate retrieval and ranking are done in a distributed fashion.
- L2: Refines the shortlisted candidates by applying more dynamic features fetched from external caches.
In more detail, a search request flows through the system as follows:
- The Galene broker system fans out the search query request to multiple search index partitions.
- Each partition retrieves the matched documents and applies the machine learning model to the retrieved candidates.
- Each partition ranks its subset of candidates; the broker then gathers the ranked candidates and returns them to the federator.
- The federator further ranks the retrieved candidates using additional ranking features that are dynamic or fetched from the cache. This is the L2 ranking layer.
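The flow above can be sketched in a few lines: each partition scores its own candidates (L1), the broker fans out in parallel and merges the per-partition top-k lists, and the federator reranks the shortlist with dynamic features from a cache (L2). The partitions, the `skill_match` and `open_to_work` features, the scoring functions, and the weight are all hypothetical stand-ins, not LinkedIn's actual models or features.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index partitions: candidate id -> static features available in L1.
PARTITIONS = [
    {"a": {"skill_match": 0.9}, "b": {"skill_match": 0.4}},
    {"c": {"skill_match": 0.7}, "d": {"skill_match": 0.2}},
]

# Hypothetical external cache of dynamic features, consulted only in L2.
DYNAMIC_CACHE = {"a": {"open_to_work": 0.0}, "c": {"open_to_work": 1.0}}


def l1_score(features):
    """Toy stand-in for the L1 model applied inside each partition."""
    return features["skill_match"]


def search_partition(partition, k):
    """L1 within one partition: score every matched candidate and keep the top k."""
    scored = [(l1_score(f), cid) for cid, f in partition.items()]
    return heapq.nlargest(k, scored)


def broker(partitions, k):
    """Fan the query out to all partitions in parallel, then merge their top-k lists."""
    with ThreadPoolExecutor() as pool:
        per_partition = list(pool.map(lambda p: search_partition(p, k), partitions))
    merged = [hit for hits in per_partition for hit in hits]
    return heapq.nlargest(k, merged)


def federator(l1_hits, cache, weight=0.5):
    """L2: rerank the shortlist by adding a cached dynamic feature to the L1 score."""
    def l2_score(hit):
        score, cid = hit
        return score + weight * cache.get(cid, {}).get("open_to_work", 0.0)
    return [cid for _, cid in sorted(l1_hits, key=l2_score, reverse=True)]


shortlist = broker(PARTITIONS, k=2)
print([cid for _, cid in shortlist])        # ['a', 'c']
print(federator(shortlist, DYNAMIC_CACHE))  # ['c', 'a']
```

The split keeps the expensive, per-document work distributed across partitions, while the federator only touches a short list, which is what makes it affordable to pull dynamic features from an external cache at that stage.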
Finding A Good Fit:
Another challenge in the LinkedIn Recruiter experience is matching candidates whose titles are related but not identical, such as “Data Scientist” and “Machine Learning Engineer”. This kind of correlation is hard to capture with Gradient Boosted Decision Trees (GBDT) alone. To address it, LinkedIn introduced representation learning techniques that derive semantic similarity features from network embeddings. With this model, search results are complemented by candidates with similar titles, based on their relevance to the query.
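As an illustration of how an embedding-based similarity feature can relate titles that share no words, the sketch below compares title vectors with cosine similarity and keeps titles above a threshold. The three-dimensional vectors and the 0.8 threshold are invented for the example; in practice the embeddings would be learned from the network (for instance with a network embedding method) rather than written by hand.

```python
import math

# Hypothetical title embeddings, hand-written for illustration only.
TITLE_EMBEDDINGS = {
    "Data Scientist": [0.9, 0.8, 0.1],
    "Machine Learning Engineer": [0.8, 0.9, 0.2],
    "Accountant": [0.1, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_titles(query_title, embeddings, threshold=0.8):
    """Return other titles whose similarity to the query title meets the threshold."""
    q = embeddings[query_title]
    scored = [(t, cosine(q, v)) for t, v in embeddings.items() if t != query_title]
    return sorted(
        [(t, s) for t, s in scored if s >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )


print([t for t, _ in similar_titles("Data Scientist", TITLE_EMBEDDINGS)])
# ['Machine Learning Engineer']
```

The same similarity score can also be passed to the ranking model as a feature, which gives a tree-based model such as GBDT a signal it could not easily derive from raw title strings.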
Machine Learning Methodologies Used:
- Non-linear modelling with Gradient Boosted Decision Trees.
- Context-aware ranking with pairwise learning-to-rank.
- Deep and representation learning.
- Large-scale Information Network Embedding (LINE).
- Entity-level personalization with Generalized Linear Mixed models (GLMix).
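To show what entity-level personalization means in a GLMix-style model, here is a minimal scoring sketch: a global (fixed-effect) coefficient vector shared by everyone, plus per-entity (random-effect) coefficient vectors added on top when they exist. Using per-recruiter and per-candidate effects, the feature layout `[bias, skill_match]`, and all coefficient values are assumptions made for illustration, not LinkedIn's actual configuration.

```python
import math


def dot(w, x):
    """Inner product of a coefficient vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def glmix_score(x, recruiter_id, candidate_id, global_w, recruiter_w, candidate_w):
    """GLMix-style logistic score: global fixed effect plus per-entity random effects.

    Entities with no learned coefficients contribute zero, so the score
    falls back to the global model for them.
    """
    zeros = [0.0] * len(x)
    logit = (
        dot(global_w, x)
        + dot(recruiter_w.get(recruiter_id, zeros), x)
        + dot(candidate_w.get(candidate_id, zeros), x)
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical features [bias, skill_match] and hypothetical learned coefficients.
x = [1.0, 0.5]
GLOBAL_W = [-1.0, 2.0]
RECRUITER_W = {"r1": [0.0, 1.0]}  # recruiter r1 weights skill match more heavily
CANDIDATE_W = {"c1": [0.2, 0.0]}  # candidate c1 has a small positive offset

print(round(glmix_score(x, "r1", "c1", GLOBAL_W, RECRUITER_W, CANDIDATE_W), 3))  # 0.668
print(glmix_score(x, "r2", "c9", GLOBAL_W, RECRUITER_W, CANDIDATE_W))  # 0.5 (global model only)
```

The fallback to the global model is the key design point: entities with plenty of interaction data get their own adjustments, while new or sparse ones still receive sensible scores.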