KDD 19′: PinText: A Multitask Text Embedding System in Pinterest

Original article was published by Arthur Lee on Deep Learning on Medium



🤗 Recommendation system paper challenge (23/50)


🤔 What problem do they solve?

They had been using pre-trained embeddings, but they found a huge gap between the pre-trained embeddings and the ideal ones for their tasks.

Hence, they want to develop a single embedding that serves multiple tasks at large scale.

There are many scenarios where they apply text embeddings:

🤔 Design principles

storage cost:

Because the embeddings have to be kept in memory, a single all-in-one embedding saves storage cost.

memory cost:

They need to compute embeddings on the fly in realtime applications such as query embedding, because previously unseen queries arrive every day. However, the pre-trained model is trained at the character level and covers many languages they don't need, so it takes more memory than necessary.

supervised information:

The pre-trained embedding is trained on unsupervised data. Supervised data could guide model learning more efficiently.

throughput and latency:

They have to iterate fast when new data arrives and new experiment results are observed. At the inference stage, they need infrastructure support for distributed offline computation.

😎 System Overview

offline model training

They use Kafka to transport critical events, including impressions, clicks, hides, and repins, to their data warehouse.

After proper cleanup, they convert these events to a standard predefined data format so that Hadoop jobs can query it directly to sample training data.

After that, they feed the data into a multi-task deep learning model.

Distributed Offline Computation

After learning a word embedding dictionary, they derive an entity's embedding by averaging its word-level embeddings. They then run kNN search based on locality-sensitive hashing (LSH) [8] for retrieval.
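The averaging step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `entity_embedding`, the whitespace tokenizer, the zero-vector fallback for fully unknown text, and the toy 3-dimensional dictionary are all assumptions made for the example.

```python
def entity_embedding(text, word_vectors, dim):
    """Average the vectors of the known words in `text`.

    Assumption: words are split on whitespace and lowercased, and a zero
    vector is returned when no word is found in the dictionary.
    """
    known = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy dictionary with made-up values, for illustration only.
word_vectors = {
    "red": [1.0, 0.0, 0.0],
    "shoes": [0.0, 1.0, 0.0],
}

vec = entity_embedding("Red shoes", word_vectors, dim=3)
print(vec)  # [0.5, 0.5, 0.0]
```

Averaging keeps the online path cheap: only a dictionary lookup and a mean are needed, so the same logic can run in a batch job or in a realtime service.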

Kubernetes Cluster (K8S) for Embedding (entity -> embedding)

They use Docker to wrap all the embedding computation logic into an image, then schedule this image on a K8S cluster to compute text embeddings for large-scale inputs.

Hadoop (MapReduce job) for LSH tokens and kNN search

After each entity is mapped to a real-valued vector on the K8S cluster, they can run kNN search between a query set and a candidate set.

They use LSH [8] to keep the search tractable at large scale.

They also send the LSH tokens to the search backend servers to build an inverted index <key: LSH token, value: [entities with this token]>.
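To make the token and inverted-index idea concrete, here is a small sketch in the spirit of the p-stable scheme from [8], where each hash is floor((a · v + b) / w) with a Gaussian vector a. The number of hash functions, the bucket width, the token string format, and all function names are assumptions for illustration; the post does not give the settings used in production.

```python
import math
import random
from collections import defaultdict


def make_hashes(dim, num_hashes, bucket_width, seed=0):
    """Draw (a, b) pairs: a has N(0, 1) entries, b is uniform in [0, bucket_width]."""
    rng = random.Random(seed)
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
        for _ in range(num_hashes)
    ]


def lsh_tokens(vector, hashes, bucket_width):
    """One token per hash function: floor((a . v + b) / w), tagged with the hash index."""
    tokens = []
    for i, (a, b) in enumerate(hashes):
        projection = sum(x * y for x, y in zip(a, vector))
        bucket = math.floor((projection + b) / bucket_width)
        tokens.append(f"h{i}:{bucket}")
    return tokens


def build_inverted_index(entities, hashes, bucket_width):
    """Map each LSH token to the list of entity ids that carry it."""
    index = defaultdict(list)
    for entity_id, vector in entities.items():
        for token in lsh_tokens(vector, hashes, bucket_width):
            index[token].append(entity_id)
    return index


def candidates(query_vector, index, hashes, bucket_width):
    """Entities sharing at least one token with the query, most shared tokens first."""
    counts = defaultdict(int)
    for token in lsh_tokens(query_vector, hashes, bucket_width):
        for entity_id in index.get(token, []):
            counts[entity_id] += 1
    return sorted(counts, key=lambda e: (-counts[e], e))


hashes = make_hashes(dim=3, num_hashes=8, bucket_width=4.0)
entities = {"pin_a": [1.0, 0.0, 0.0], "pin_b": [0.0, 5.0, 5.0]}
index = build_inverted_index(entities, hashes, bucket_width=4.0)
result = candidates([1.0, 0.0, 0.0], index, hashes, bucket_width=4.0)
```

Because tokens are plain strings, the index has the same shape as a term-based inverted index, which is what lets a text search backend serve embedding lookups.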

online serving

Precomputed Key-value Map

They cache the results as ⟨query, list of pins⟩ pairs in an in-house serving system called Terrapin. Like Memcached or Redis, it supports realtime queries, and it also provides automatic data deployment management.

In this way, they delegate most of the heavy online search to offline batch jobs, which are much easier to maintain and much cheaper to scale up.

Realtime Embedding and Retrieval

Offline precomputation cannot cover unseen text inputs. Hence, they deploy the learned word embedding dictionary to an online service and compute vectors and LSH tokens on the fly.

Because they have built an inverted index of the candidate entities' LSH tokens, embedding-based retrieval works exactly the same way as retrieval based on raw textual terms, so it incurs no additional development overhead.
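The two serving paths can be pictured as a lookup with a fallback. The sketch below is only an illustration of that control flow: a plain dict stands in for the Terrapin key-value store, and `serve` and `realtime_retrieve` are hypothetical names, with the realtime path stubbed out.

```python
def serve(query, precomputed, realtime_retrieve):
    """Return (pins, path): the cached result if present, else realtime retrieval."""
    pins = precomputed.get(query)
    if pins is not None:
        return pins, "precomputed"
    return realtime_retrieve(query), "realtime"


# Stand-in for the precomputed <query, list of pins> map built offline.
precomputed = {"red shoes": ["pin_1", "pin_2"]}


def realtime_retrieve(query):
    # Placeholder for the online path: embed the query, compute its LSH
    # tokens, and look them up in the inverted index.
    return ["pin_9"]


print(serve("red shoes", precomputed, realtime_retrieve))  # (['pin_1', 'pin_2'], 'precomputed')
print(serve("blue hat", precomputed, realtime_retrieve))   # (['pin_9'], 'realtime')
```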

🤨 Experiments

Baseline models

  • fastText: English model pretrained on Wikipedia [4]
  • GloVe: 6B version pretrained by Stanford University [22]
  • word2vec: model pretrained on Google News [19]
  • ConceptNet: pretrained model by [28]


🥳 The Qualitative result of the model

🧐 Reference:

[4] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. TACL 5 (2017), 135–146.

[8] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry, Brooklyn, New York, USA, June 8–11, 2004. 253–262.

[19] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. CoRR abs/1301.3781 (2013). arXiv:1301.3781

[22] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar. 1532–1543.

[28] Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4–9, 2017, San Francisco, California, USA. 4444–4451.

🙃 Other related blogs:

CVPR19′ Complete the Look: Scene-based Complementary Product Recommendation

COLING’14: Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS’2017: Attention Is All You Need (Transformer)

KDD’19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC19′ Classification is a Strong Baseline for Deep Metric Learning

KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

WWW’17: Visual Discovery at Pinterest
