LinkedIn Introduces Pensieve: An Embedding Feature Platform Using Supervised Deep Learning Techniques

The original article was published on AI Magazine.


Source: https://engineering.linkedin.com/blog/2020/pensieve

The LinkedIn Talent Solutions and Careers team aims to build an efficient marketplace for job-seeking members and employers. LinkedIn achieves this by matching members to open job postings. A diverse set of products and recommendation modules has been built for LinkedIn Jobs and LinkedIn Recruiter, and AI models work in tandem with all of these products to produce the final results.

Significant features are created by leveraging supervised deep learning techniques: models are trained to produce entity embeddings. Representation learning (or feature learning) through deep learning algorithms has delivered state-of-the-art performance on the LinkedIn platform. Its computational cost is growing because network architectures have begun to use hundreds of millions of parameters, so the burden of entity embedding inference is pushed from request-time computation to nearline (stream) pre-computation, which has no strict SLA.
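
To make the latency trade-off concrete, here is a minimal sketch (all names are hypothetical, not Pensieve's API): the request path becomes a key-value lookup against pre-computed embeddings rather than an inline forward pass through a large network.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a key-value feature store holding
# pre-computed entity embeddings published by a nearline pipeline.
class FeatureStoreClient:
    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def put(self, entity_id: str, embedding: List[float]) -> None:
        self._store[entity_id] = embedding

    def get(self, entity_id: str) -> Optional[List[float]]:
        return self._store.get(entity_id)

def embedding_for_ranking(store: FeatureStoreClient,
                          entity_id: str) -> Optional[List[float]]:
    # Request path: an O(1) lookup with predictable latency, instead of
    # running a network with hundreds of millions of parameters inline.
    return store.get(entity_id)
```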

Leveraging this technique, LinkedIn introduces Pensieve, an embedding feature platform that pre-computes and publishes entity embeddings produced with supervised deep learning. Across Talent Solutions and Careers, these embeddings are used by ranking models in latency-sensitive applications.

The Pensieve platform can be divided into three pillars.

  • Offline Training Pipeline: The infrastructure streamlines training data generation and distributed training so that modelers can focus on applying deep learning theory in practice. The team focused on agile experimentation while scaling training to hundreds of millions of instances: millions of observations can be joined with any sparse features from the Frame Feature Marketplace through a few lines of configuration code, while TensorFlow on YARN (TonY) enables distributed training.
  • Pensieve Modeling: The team trained neural networks that take sparse features about the entities and encode them into semantic embeddings in a low-dimensional space; a sketch follows this list. Almost all iteration cycles are spent on applied research to improve embedding quality.
  • Embedding Serving Framework: After training, the neural networks are packaged for embedding serving. The team set up parallel offline and nearline embedding operating pipelines to support multi-model computation for A/B testing. These pipelines publish the pre-computed embeddings to the Feature Marketplace for consumption by other AI models.
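
As a rough illustration of the modeling pillar, the sketch below wires up a two-tower encoder in TensorFlow. Layer sizes, feature widths, and the dot-product training objective are assumptions for the example, not Pensieve's published architecture.

```python
import tensorflow as tf

def build_tower(name: str) -> tf.keras.Sequential:
    # Each tower narrows layer by layer (hence "pyramid"), ending in a
    # low-dimensional embedding.
    return tf.keras.Sequential(
        [
            tf.keras.layers.Dense(512, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(64),  # final embedding dimension
        ],
        name=name,
    )

member_tower = build_tower("member_pyramid")
job_tower = build_tower("job_pyramid")

# Sparse entity features, assumed here to be hashed into fixed-width vectors.
member_features = tf.keras.Input(shape=(10_000,), name="member_features")
job_features = tf.keras.Input(shape=(10_000,), name="job_features")

member_emb = member_tower(member_features)
job_emb = job_tower(job_features)

# The cosine similarity of the two embeddings scores member-job relevance.
score = tf.keras.layers.Dot(axes=1, normalize=True)([member_emb, job_emb])
output = tf.keras.layers.Activation("sigmoid")(score)

model = tf.keras.Model([member_features, job_features], output)
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Training such a network on member-job interaction labels shapes the low-dimensional outputs into semantic embeddings that can be compared directly.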

Model deployment

The final trained model is divided into two subgraphs:

1. The member pyramid

2. The job pyramid

The subgraphs are packaged, versioned, and distributed into the serving framework so that embeddings for members and job postings can be pre-computed independently.
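
Continuing the hypothetical two-tower sketch above, each pyramid can be exported on its own so the serving framework can version and load the subgraphs independently (paths and version numbers are illustrative):

```python
import tensorflow as tf

# Export each trained tower as a separate, versioned SavedModel.
tf.saved_model.save(member_tower, "pensieve/member_pyramid/3")
tf.saved_model.save(job_tower, "pensieve/job_pyramid/3")

# An offline or nearline worker then loads only the subgraph it needs:
job_encoder = tf.saved_model.load("pensieve/job_pyramid/3")
```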

System architecture

The system architecture was designed with the following goals in mind.

  • Efficient output: Minimizing the volume of entity embedding writes is an essential concern for downstream systems. Writes are reduced by batching multiple versions of embeddings into a single message, and because many data updates to an entity do not change its embedding value, those no-op updates can be skipped entirely (see the sketch after this list).
  • Experimentation velocity: The ability to quickly experiment with new embedding models is critical for productivity. Once an embedding model is ready, serving the new embedding version takes only a one-line change.
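
A minimal sketch of both write-minimization ideas, assuming a hypothetical message schema and publisher:

```python
import hashlib
import json
from typing import Mapping, Sequence

_last_published: dict = {}  # entity_id -> digest of the last published message

def maybe_publish(entity_id: str,
                  embeddings_by_version: Mapping[str, Sequence[float]],
                  publish) -> bool:
    """Batch all live embedding versions into one message; skip no-op updates."""
    message = {
        "entity": entity_id,
        "embeddings": {v: list(vec) for v, vec in embeddings_by_version.items()},
    }
    digest = hashlib.sha256(
        json.dumps(message, sort_keys=True).encode()).hexdigest()
    if _last_published.get(entity_id) == digest:
        return False  # the data update did not change any embedding: no write
    _last_published[entity_id] = digest
    publish(message)  # one message carries every version being A/B tested
    return True
```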

Because of its simple, expressive API, Apache Beam (running on Samza) is used for the nearline embedding pipeline.
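
A hedged sketch of the pipeline shape in Beam's Python API; `read_updates`, `encode`, and `write_to_feature_store` are hypothetical stand-ins, not LinkedIn's actual transforms:

```python
import apache_beam as beam

def run(read_updates, encode, write_to_feature_store):
    # read_updates is assumed to be a PTransform over a stream of entity
    # updates; encode runs the relevant pyramid for one entity.
    with beam.Pipeline() as p:
        (
            p
            | "ReadEntityUpdates" >> read_updates
            | "ComputeEmbedding" >> beam.Map(encode)
            | "PublishEmbedding" >> beam.Map(write_to_feature_store)
        )
```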

System optimization

It is essential to scale every nearline system sufficiently to handle the incoming message rate. The two major factors that delay message processing are as follows.

1. The processor is not tuned to keep up with the peak incoming message rate.

2. Failures or slowness in a downstream dependency.

For the first problem, the following optimizations were implemented.

  • The parallelization of run-loop stages across different tasks was increased by enlarging the thread pool of the job containers.
  • The JVM heap size was increased and heap resizing was disabled, reducing JVM pauses caused by garbage collection and heap expansion, respectively (illustrative flags below).
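
For illustration, heap settings like these are typically expressed as JVM options; the sizes here are made up, not LinkedIn's values:

```
# Pin the heap (equal -Xms/-Xmx) so it never expands, and disable the
# adaptive size policy to avoid GC-driven resizing; 8g is illustrative.
-Xms8g -Xmx8g -XX:-UseAdaptiveSizePolicy
```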

For the second issue, the team designed a multi-data-center strategy to tolerate downstream dependency failures; a sketch follows.
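
A minimal sketch of the failover idea, with hypothetical data center names and a hypothetical fetch function:

```python
# Try each data center in turn so one colo's outage does not stall
# the nearline processor.
DATACENTERS = ["dc-primary", "dc-secondary", "dc-tertiary"]

def fetch_with_failover(fetch, entity_id: str):
    last_error = None
    for dc in DATACENTERS:
        try:
            return fetch(dc, entity_id)
        except ConnectionError as err:
            last_error = err  # downstream unavailable in this colo; fail over
    raise last_error
```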
