Source: Deep Learning on Medium
Compare different Transformer implementation frameworks and choose the best framework for your own needs
Based on your preference for PyTorch or TensorFlow, I recommend using Fairseq or Tensor2Tensor, respectively.
If you are a researcher, Fairseq is flexible enough for customization. But if you are working on a real application and considering deployment, Tensor2Tensor is the better choice.
Transformer in Different Frameworks
A sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.
Transformer (self-attention) networks:
- Vaswani et al. (2017): Attention Is All You Need
- Ott et al. (2018): Scaling Neural Machine Translation
- Edunov et al. (2018): Understanding Back-Translation at Scale
- Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling
- Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade
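The building block shared by all of the models above is scaled dot-product self-attention from Vaswani et al. (2017). A minimal NumPy sketch of that computation (shapes and names are illustrative, not fairseq's API):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (seq_q, seq_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the key positions
    return weights @ V                             # weighted sum of value vectors

# Self-attention: queries, keys, and values all come from the same sequence
x = np.random.rand(5, 64)                          # 5 tokens, d_model = 64
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (5, 64)
```

Each output row is a convex combination of the value rows, which is why the output stays in the same range as the input.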
Fairseq exposes the Transformer encoder and decoder directly:
class fairseq.models.transformer.TransformerModel(encoder, decoder)
Transformer model from "Attention Is All You Need" (Vaswani et al., 2017).
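To get a feel for how such an encoder-decoder pair fits together, here is a sketch using plain PyTorch's built-in nn.Transformer rather than fairseq's classes (dimensions are the "base" hyperparameters from the paper; the random tensors are purely illustrative):

```python
import torch
import torch.nn as nn

# Encoder-decoder Transformer with the base hyperparameters
# from "Attention Is All You Need": d_model=512, 8 heads, 6 layers each
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 2, 512)  # (source length, batch, d_model)
tgt = torch.rand(7, 2, 512)   # (target length, batch, d_model)
out = model(src, tgt)         # decoder output, shape (target length, batch, d_model)
print(out.shape)
```

In a real model the random tensors would be replaced by token embeddings plus positional encodings, and the decoder would be run with a causal mask.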
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
You can try solving the problem with the different Transformer models and hyperparameter sets described in the respective papers:
- Standard transformer:
- Universal transformer:
- Adaptive universal transformer:
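In Tensor2Tensor these variants are selected with the --model and --hparams_set flags of t2t-trainer. The registry names below are my best recollection of the ones shipped with Tensor2Tensor and may differ across versions, so treat this as a sketch rather than copy-paste commands:

```shell
# Standard Transformer
t2t-trainer --problem=translate_ende_wmt32k \
  --model=transformer --hparams_set=transformer_base_single_gpu \
  --data_dir=$DATA_DIR --output_dir=$OUT_DIR

# Universal Transformer
t2t-trainer --problem=translate_ende_wmt32k \
  --model=universal_transformer --hparams_set=universal_transformer_base \
  --data_dir=$DATA_DIR --output_dir=$OUT_DIR

# Adaptive Universal Transformer (dynamic halting via ACT)
t2t-trainer --problem=translate_ende_wmt32k \
  --model=universal_transformer --hparams_set=adaptive_universal_transformer_base \
  --data_dir=$DATA_DIR --output_dir=$OUT_DIR
```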
An open-source (MIT-licensed) neural machine translation system. It is designed to be research-friendly for trying out new ideas in translation, summarization, image-to-text, morphology, and many other domains.
OpenNMT provides implementations in two popular deep learning frameworks:
- OpenNMT-py: an extensible and fast implementation benefiting from PyTorch's ease of use.
- OpenNMT-tf: a modular and stable implementation relying on the TensorFlow ecosystem.
An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
AllenNLP supports a Transformer encoder, implemented as StackedSelfAttentionEncoder (the dimensions below are example values; adjust them to your model):

from allennlp.modules.seq2seq_encoders import StackedSelfAttentionEncoder

encoder = StackedSelfAttentionEncoder(
    input_dim=256, hidden_dim=256, projection_dim=128,
    feedforward_hidden_dim=512, num_layers=6, num_attention_heads=8)
Recommend reading: Building Seq2Seq Machine Translation Models using AllenNLP
PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use, efficient, flexible, and scalable deep learning platform, originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet.
Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.
A widely used backend framework makes it more likely that your model can be used by many people. And if an organization stands behind a framework, that framework is likely to be maintained for a long time. So I collected the related information.
The readers of this article are mainly researchers and engineers, so I focus on debugging and deployment in the pros and cons.
Based on your preference for PyTorch or TensorFlow, I recommend using Fairseq or Tensor2Tensor, respectively. If you are a researcher, Fairseq is flexible enough for customization. But if you are working on a real application and considering deployment, Tensor2Tensor is the better choice.