Choose the Right Transformer Framework for You


Compare different Transformer implementation frameworks and choose the one that best fits your needs


TL;DR

Based on your preference for PyTorch or TensorFlow, I recommend Fairseq or Tensor2Tensor, respectively.

If you are a researcher, Fairseq is flexible enough for customization. But if you are working on a real-world application and considering deployment, Tensor2Tensor is the better choice.

Comparison of the different frameworks (summary table)

Transformer in Different Frameworks

Fairseq

A sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.

Transformer (self-attention) networks:

The encoder and decoder can be used directly:

class fairseq.models.transformer.TransformerModel(encoder, decoder)

Transformer model from “Attention Is All You Need” (Vaswani et al., 2017).

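For example, loading a trained Transformer and translating with it takes only a few lines. This is a minimal sketch assuming a model already trained with fairseq-train; the checkpoint and data paths are placeholders for your own setup.

from fairseq.models.transformer import TransformerModel

# Load a trained checkpoint (paths below are placeholders)
en2de = TransformerModel.from_pretrained(
    'checkpoints/',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='data-bin/wmt17_en_de',
    bpe='subword_nmt',
    bpe_codes='data-bin/wmt17_en_de/bpecodes',
)

print(en2de.translate('Hello world!'))  # e.g. 'Hallo Welt!'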

Tensor2Tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

You can try solving a problem with different Transformer models and hyperparameter sets, as described in the corresponding papers:

  • Standard transformer: --model=transformer --hparams_set=transformer_tiny
  • Universal transformer: --model=universal_transformer --hparams_set=universal_transformer_tiny
  • Adaptive universal transformer: --model=universal_transformer --hparams_set=adaptive_universal_transformer_tiny

Here is a walkthrough for training the Transformer model from “Attention Is All You Need” on WMT data.
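
Under the hood, the --model and --hparams_set flags are just names registered in Tensor2Tensor's registry, so the same configurations can be inspected from Python. Here is a rough sketch assuming the Tensor2Tensor Python API; hidden_size and num_hidden_layers are standard Transformer hyperparameters.

from tensor2tensor import models            # importing this module registers the bundled models
from tensor2tensor.utils import registry, trainer_lib

# Look up the same model and hyperparameter set the flags above refer to
model_cls = registry.model("transformer")
hparams = trainer_lib.create_hparams("transformer_tiny")
print(model_cls.__name__, hparams.hidden_size, hparams.num_hidden_layers)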

OpenNMT

An open-source (MIT) neural machine translation system. It is designed to be research friendly, to try out new ideas in translation, summarization, image-to-text, morphology, and many other domains.

OpenNMT provides implementations in two popular deep learning frameworks:

OpenNMT-py

Extensible and fast implementation benefiting from PyTorch's ease of use.

The transformer implementation code

OpenNMT-tf

Modular and stable implementation relying on the TensorFlow ecosystem.

AllenNLP

An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.

AllenNLP supports a Transformer encoder, which is implemented as StackedSelfAttentionEncoder:

from allennlp.modules.seq2seq_encoders import StackedSelfAttentionEncoder

encoder = StackedSelfAttentionEncoder(
    input_dim=EN_EMBEDDING_DIM,
    hidden_dim=HIDDEN_DIM,
    projection_dim=128,
    feedforward_hidden_dim=128,
    num_layers=1,
    num_attention_heads=8)
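
As a quick usage sketch (not from the original article), the encoder maps a batch of embedded tokens plus a padding mask to contextualized representations; EN_EMBEDDING_DIM and HIDDEN_DIM are assumed to be defined, e.g. both 256.

import torch

# Dummy batch: 2 sentences of 10 tokens, already embedded to EN_EMBEDDING_DIM
embedded = torch.randn(2, 10, EN_EMBEDDING_DIM)
mask = torch.ones(2, 10)          # 1 = real token, 0 = padding
output = encoder(embedded, mask)  # shape: (batch, seq_len, hidden_dim)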

Recommended reading: Building Seq2Seq Machine Translation Models using AllenNLP

PaddlePaddle

PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use, efficient, flexible and scalable deep learning platform, originally developed by Baidu scientists and engineers to apply deep learning to many products at Baidu.

Implementation of the Transformer model from “Attention Is All You Need”: English, Chinese

Sockeye

Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet.

Lingvo

Lingvo is a framework for building neural networks in TensorFlow, particularly sequence models.

Summary

A widely used backend framework makes it more likely that your model will be adopted by many people. And if an organization stands behind a framework, the framework is more likely to be maintained for a long time. So I collected this information for each framework.

The readers of this article are mainly researchers and engineers, so I focus on debugging and deployment when weighing the pros and cons.

Based on your preference for PyTorch or TensorFlow, I recommend Fairseq or Tensor2Tensor, respectively. If you are a researcher, Fairseq is flexible enough for customization. But if you are working on a real-world application and considering deployment, Tensor2Tensor is the better choice.