NLP News Cypher | 08.09.20

Original article was published by Quantum Stat on Artificial Intelligence on Medium

DeLighT Transformers

The DeLighT transformer library gives us a new look at the most popular model in NLP: the transformer. The new architecture reduces parameter count while making models deeper, which means it can match or beat the results of the traditional transformer architecture while being much lighter. As of right now, the architecture supports language modeling and machine translation. According to the authors, more tasks are on the way.



NLP Beyond English

Sebastian Ruder opines on the state of NLP and, specifically, on how our limitations with low-resource languages are a much bigger problem that we should be focusing on. His blog post discusses the areas affected by the lack of these datasets, from societal to cognitive. Below are the bullet points from the blog on what you can do to help.

What you can do

“Datasets: If you create a new dataset, reserve half of your annotation budget for creating the same size dataset in another language.

Evaluation: If you are interested in a particular task, consider evaluating your model on the same task in a different language. For an overview of some tasks, see NLP Progress or our XTREME benchmark.

Bender Rule: State the language you are working on.

Assumptions: Be explicit about the signals your model uses and the assumptions it makes. Consider which are specific to the language you are studying and which might be more general.

Language diversity: Estimate the language diversity of the sample of languages you are studying (Ponti et al., 2020).

Research: Work on methods that address challenges of low-resource languages. In the next post, I will outline interesting research directions and opportunities in multilingual NLP.”


Stanza Update

Stanford’s Stanza library has been updated to support the medical/clinical domain, including:

  1. Bio pipelines and NER models, which specialize in processing biomedical literature text;
  2. A clinical pipeline and NER models, which specialize in processing clinical text.
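For the curious, here is a rough sketch of how the two kinds of pipelines might be loaded. The package and NER model names (`craft`, `bionlp13cg`, `mimic`, `i2b2`) are recalled from Stanza's biomedical documentation and are assumptions to verify against the current model list; the helper names `load_pipeline` and `extract_entities` are hypothetical and not part of Stanza.

```python
# Package / NER model names are assumptions recalled from Stanza's
# biomedical model documentation; check them before relying on this.
BIOMEDICAL_CONFIGS = {
    # Biomedical literature: CRAFT syntactic models plus a BioNLP13CG NER model
    "bio": {"lang": "en", "package": "craft", "processors": {"ner": "bionlp13cg"}},
    # Clinical text: MIMIC syntactic models plus an i2b2 NER model
    "clinical": {"lang": "en", "package": "mimic", "processors": {"ner": "i2b2"}},
}


def load_pipeline(kind):
    """Download the models for `kind` ("bio" or "clinical") and build a pipeline."""
    import stanza  # imported lazily so this file loads even without Stanza installed

    config = BIOMEDICAL_CONFIGS[kind]
    stanza.download(**config)
    return stanza.Pipeline(**config)


def extract_entities(nlp, text):
    """Run the pipeline on `text` and return (entity text, entity type) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.type) for ent in doc.entities]


# Example usage (downloads models on first run):
#   nlp = load_pipeline("clinical")
#   print(extract_entities(nlp, "The patient was started on aspirin for chest pain."))
```

The configs live in a plain dictionary so that switching between the literature and clinical pipelines is a one-word change.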


PyKEEN: Knowledge Graph Embeddings Library

This new graph library comes packed with models and datasets. It also has a handy pipeline API that simplifies the initialization of models and datasets. At the moment there are 13 datasets and 23 models to play with.

Here’s a quick example:

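from pykeen.pipeline import pipeline
# Assumed, illustrative arguments: a TransE model on the built-in Nations dataset
result = pipeline(model='TransE', dataset='Nations')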




BigBird

We all know about BERT's hard limit of 512 tokens, and this annoyance is one of the main reasons BigBird was created.

This new design helps to scale performance “to much longer sequence lengths (8x) on standard hardware (∼16GB memory) for large size models.” 🧐

The cool thing about BigBird is that, because it leverages a sparse attention framework, it can do more with less: it has less memory overhead (even compared with other long-context models like the Longformer).
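To get a feel for why sparse attention saves memory, here is a toy sketch that counts how many query-key pairs are attended to when each token only looks at a local window, a few global tokens, and a few random tokens (the three ingredients BigBird combines). This is not the paper's block-sparse implementation; the function name, the default sizes, and the token-level (rather than block-level) pattern are simplifying assumptions.

```python
import random


def sparse_attention_pairs(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Return the set of (query, key) index pairs allowed by a toy sparse pattern."""
    rng = random.Random(seed)
    allowed = set()
    for q in range(seq_len):
        # Local sliding window: each token sees `window` neighbours on each side.
        for k in range(max(0, q - window), min(seq_len, q + window + 1)):
            allowed.add((q, k))
        # Global tokens: the first `num_global` tokens see, and are seen by, everyone.
        for g in range(min(num_global, seq_len)):
            allowed.add((q, g))
            allowed.add((g, q))
        # Random keys: a few extra positions sampled for each query.
        for k in rng.sample(range(seq_len), min(num_random, seq_len)):
            allowed.add((q, k))
    return allowed


def full_attention_pairs(seq_len):
    """Number of (query, key) pairs in ordinary dense attention."""
    return seq_len * seq_len


for n in (512, 4096):
    sparse = len(sparse_attention_pairs(n))
    print(f"seq_len={n}: sparse={sparse} pairs, full={full_attention_pairs(n)} pairs")
```

With these settings each query attends to a small, bounded number of keys, so the sparse count grows roughly linearly with sequence length while the full count grows quadratically.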

The paper shows how it performs in both encoder-only and encoder-decoder scenarios.

Performance-wise, it offers SOTA results on question answering and long-document summarization.



Someone helped to bridge the T5 model with ONNX (for inference speedup) 😎. You can get up to a 4X speed improvement over PyTorch as long as you keep your context length relatively short (under roughly 500 words).


Kite Auto Complete

For all the Jupyter notebook fans, Kite code autocomplete is now supported!


Dataset of the Week: SPLASH

What is it?

SPLASH is a dataset for the task of semantic parse correction with natural language feedback.

Where is it?

(dataset/code hasn’t dropped yet, be on the lookout)