KDD ’19: PinText: A Multitask Text Embedding System in Pinterest
🤗 Recommendation system paper challenge (23/50)
🤔 What problem do they solve?
Previously, Pinterest used pre-trained embeddings. However, they found a huge gap between those pre-trained embeddings and the ideal ones for their use cases.
Hence, they set out to develop a single embedding that serves multiple tasks at large scale.
There are many scenarios in which they apply text embeddings, e.g., homefeed recommendation, related pins, and search.
🤔 Design principles
- Storage: because embeddings must be kept in memory, a single all-in-one embedding is preferable to one embedding per task, to save storage cost.
- Realtime computation: they need to compute embeddings on the fly in realtime applications such as query embedding, because unseen queries churn in every day. The pre-trained models, however, are trained at the character level and cover many languages they do not need.
- Supervision: pre-trained embeddings are trained on unsupervised data; supervised data can guide model learning more efficiently.
- Throughput and latency: they have to iterate fast when new data arrives and new experiment results come in. At the inference stage, they need infrastructure support for distributed offline computation.
😎 System Overview
Offline Model Training
They use Kafka to transport critical events, including impressions, clicks, hides, and repins, to their data warehouse.
With proper cleanup, these events are converted to a standard predefined data format so that Hadoop jobs can query against them directly to sample training data.
After that, the data is fed into the multitask deep learning model.
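To make this concrete, here is a minimal PyTorch-style sketch of a shared word embedding trained with a skip-gram-style negative-sampling loss over engagement pairs. The class name, dimensions, and loss details are illustrative assumptions, not taken from the paper; the "all-in-one" idea is that every task trains the same embedding table.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultitaskTextEmbedding(nn.Module):
    """Sketch: one shared word embedding table trained on several
    engagement tasks (e.g. homefeed, search, related pins)."""

    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        # A single shared table serves every task ("all-in-one").
        self.word_emb = nn.Embedding(vocab_size, dim)

    def embed(self, word_ids: torch.Tensor) -> torch.Tensor:
        # Entity embedding = average of its word-level embeddings.
        return self.word_emb(word_ids).mean(dim=-2)

    def forward(self, anchor_words, pos_words, neg_words):
        a = self.embed(anchor_words)   # (batch, dim)
        p = self.embed(pos_words)      # (batch, dim)
        n = self.embed(neg_words)      # (batch, k, dim)
        pos_score = (a * p).sum(-1)                             # (batch,)
        neg_score = torch.bmm(n, a.unsqueeze(-1)).squeeze(-1)   # (batch, k)
        # Negative-sampling loss; the overall objective sums this loss
        # over all tasks, which share the same embedding table.
        return -(F.logsigmoid(pos_score).mean()
                 + F.logsigmoid(-neg_score).mean())
```

In training, each surface would contribute its own ⟨anchor, engaged entity⟩ pairs sampled from the cleaned event data described above.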
Distributed Offline Computation
After learning a word embedding dictionary, they derive the embedding of an entity by averaging its word-level embeddings. Then they perform locality-sensitive hashing (LSH) based kNN search for retrieval purposes.
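The averaging step is simple; a minimal sketch, assuming the learned dictionary is a plain Python dict from word to vector and that out-of-vocabulary words are skipped:

```python
import numpy as np

def entity_embedding(words, emb_dict, dim=64):
    """Entity embedding = average of the word-level embeddings.
    Skipping OOV words is a simplifying assumption for this sketch."""
    vecs = [emb_dict[w] for w in words if w in emb_dict]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```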
Kubernetes Cluster (K8s) for Embedding Computation (entity → embedding)
They use Docker to wrap all the embedding computation logic into an image, then schedule this image on a K8s cluster to compute text embeddings for large-scale inputs.
Hadoop (MapReduce Jobs) for LSH Tokens and kNN Search
After each entity is mapped to a real-valued vector on the K8s cluster, they can run kNN search between a query set and a candidate set. They use LSH to make this search tractable at large scale.
They also send the LSH tokens to the search backend servers to build an inverted index ⟨key: LSH token, value: [entities with this token]⟩, as sketched below.
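The paper cites the p-stable LSH scheme of Datar et al.; the sketch below instead uses the simpler random-hyperplane (sign) variant for cosine similarity, with a single hash table, purely for illustration:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
DIM, N_BITS = 64, 16
planes = rng.normal(size=(N_BITS, DIM))  # random hyperplanes

def lsh_token(vec):
    # Each bit records which side of a hyperplane the vector lies on;
    # nearby vectors are likely to share the same token.
    bits = (planes @ vec) > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

# Inverted index <key: LSH token, value: [entities with this token]>.
candidates = {f"pin_{i}": rng.normal(size=DIM) for i in range(1000)}
index = defaultdict(list)
for entity, vec in candidates.items():
    index[lsh_token(vec)].append(entity)

# Approximate kNN: fetch the query's bucket, then re-rank by cosine.
query_vec = rng.normal(size=DIM)
bucket = index[lsh_token(query_vec)]
```

In practice one would use multiple hash tables (or shorter tokens) to trade recall against bucket size; a single 16-bit table is just the smallest possible example.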
Precomputed Key-value Map
They cache the results as ⟨query, list of pins⟩ pairs in an in-house serving system called Terrapin. Like Memcached or Redis, it supports realtime queries together with automatic data deployment management.
In this way, they delegate most of the heavy online search to offline batch jobs, which are much easier to maintain and much cheaper to scale up.
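Conceptually, serving then reduces to a key-value lookup. In the sketch below, a plain dict stands in for Terrapin, and the queries and pin IDs are made up:

```python
# Offline batch jobs precompute <query, list of pins> pairs.
precomputed = {
    "wedding dress": ["pin_12", "pin_87", "pin_903"],
    "home decor":    ["pin_5", "pin_44"],
}

def serve(query):
    # Online serving is a constant-time lookup; a miss (None) falls
    # back to the realtime embedding path described next.
    return precomputed.get(query)
```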
Realtime Embedding and Retrieval
Offline precomputation cannot cover unseen text inputs. Hence, they deploy the learned word embedding dictionary to an online service and compute vectors and LSH tokens on the fly.
Because they have already built an inverted index over the candidate entities’ LSH tokens, embedding-based retrieval works exactly the same way as raw term-based textual retrieval, which means no further development cost is incurred.
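Putting the pieces together, here is a hedged sketch of the realtime path, reusing entity_embedding, lsh_token, index, and precomputed from the sketches above; emb_dict stands for the deployed word embedding dictionary, and the whitespace tokenization is a naive placeholder:

```python
def realtime_retrieve(query_text, emb_dict):
    cached = precomputed.get(query_text)
    if cached is not None:              # fast path: offline precomputation
        return cached
    words = query_text.lower().split()  # naive tokenization, for illustration
    vec = entity_embedding(words, emb_dict)   # on-the-fly embedding
    # The query's LSH token hits the same inverted index used for raw
    # term-based retrieval, so no extra serving infrastructure is needed.
    return index.get(lsh_token(vec), [])
```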
They compare PinText against several pre-trained embedding baselines:
- fastText: pretrained English model trained on Wikipedia
- GloVe: the 6B-token version pretrained by Stanford University
- word2vec: model pretrained on Google News
- ConceptNet: model pretrained by Speer et al. (ConceptNet 5.5)
🥳 Qualitative results of the model
 Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. TACL 5 (2017), 135–146.
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry, Brooklyn, New York, USA, June 8–11, 2004. 253–262.
 Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. CoRR abs/1301.3781 (2013). arXiv:1301.3781
 Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar. 1532–1543.
 Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4–9, 2017, San Francisco, California, USA. 4444–4451.
🙃 Other related blogs:
WWW’17: Visual Discovery at Pinterest
ICCV: International Conference on Computer Vision
CVPR: Conference on Computer Vision and Pattern Recognition
Top Conference Paper Challenge: