NLP News Cypher | 12.29.19

Source: Deep Learning on Medium

Cue the tumbleweeds. ’Twas a slow week. The holiday slumber has tamed us all, so this week’s Cypher will be on the short end. 😔

First, we want to thank all the readers who have emailed to say how much they enjoy Cypher. We have a great time sharing what the NLP world has to offer (one week at a time). So thank you!

Second, in the upcoming weeks we’ll be taking a couple of projects live! We’re really excited, and hopefully these projects will make data capture easier for all developers. (Don’t tell anyone, it’s classified 😁😁)

Now…

This Week:

Faster Tokenizers for BERT and GPT-2

Living on the Edge

LIGHT: Fantasy Adventure Game

Data Labeling is a Business

From Stanford With Love

Faster Tokenizers for BERT and GPT-2

This is good news for anyone trying to cut inference time!
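A tokenizer turns raw text into the subword pieces a model actually consumes, so it runs on every request. BERT’s tokenizer uses WordPiece, which splits each word greedily into the longest vocabulary entries it can find and marks word-internal pieces with a “##” prefix (GPT-2 uses byte-level BPE instead, which this sketch does not cover). Below is a minimal pure-Python sketch of that greedy lookup, assuming a tiny hypothetical vocabulary (`VOCAB`) and simple whitespace pre-splitting. It is not Hugging Face’s implementation and says nothing about how their speedups were achieved.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces, greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # word-internal pieces carry a "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # nothing matched: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"token", "##izer", "##s", "un", "##aff", "##able"}


def tokenize(text, vocab=VOCAB):
    """Lowercase (uncased-style), split on whitespace, then WordPiece each word."""
    out = []
    for word in text.lower().split():
        out.extend(wordpiece(word, vocab))
    return out


print(tokenize("Tokenizers unaffable"))
# ['token', '##izer', '##s', 'un', '##aff', '##able']
```

The inner loop is why naive tokenizers can be slow: each word may be rescanned many times, and that work is repeated for every input the model serves.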

Living on the Edge

If you are brave enough to tame a large AI model on an edge device (i.e., on the device itself rather than on a server), this article is for you. The good news: the Hugging Face folks have written up their foray into this problem. Their article covers deploying GPT-2 (text generation) on Android devices. To make the model fit, they reduced its size and accepted a slight drop in output quality. Their GitHub is linked below.
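Shrinking a model trades some output quality for size. The Hugging Face write-up is the place to see exactly what they did; as a generic illustration of that trade-off, here is a sketch of symmetric per-tensor 8-bit weight quantization, one common size-reduction technique. The weights are random stand-ins (an assumption, not GPT-2’s), and `quantize_int8` and `dequantize` are hypothetical helpers written for this example.

```python
import array
import random


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]  # stand-in weights

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(array.array("f", weights).tobytes())  # 4 bytes per value
int8_bytes = len(array.array("b", q).tobytes())        # 1 byte per value
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fp32: {fp32_bytes} B, int8: {int8_bytes} B")  # 40000 B vs 10000 B
print(f"max abs error: {max_err:.6f} (bounded by scale/2 = {scale / 2:.6f})")
```

Storage drops 4x, and each restored weight is off by at most half a quantization step. In a real network those small per-weight errors accumulate across layers, which is one way a slight drop in output quality can arise.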

How AI makes its way onto mobile will be an interesting trend to track as the industry matures.

GitHub:

LIGHT: Fantasy Adventure Game

Facebook’s conversational AI platform ParlAI (pronounced “par-lay”) has a really cool AI model that is not only entertaining but highly educational. Check this out: they embedded a bunch of dialogue pertaining to “663 locations, 3462 objects and 1755 character types, described entirely in natural language.” In total, they collected 11,000 episodes of character interactions that you, as a developer, can prompt. That is, you choose a location and characters, then start a dialogue under those conditions, and… it’s really cool. If you’re bored over New Year’s, try it out!

Data Labeling is a Business

If you’ve worked on an AI project, you’ve probably dealt with data labeling (a.k.a. the Excel blinking contest). Everything you ever wanted to know about this industry (yes, it’s an industry) is now available in a report.

Highlights:

● The market for AI- and machine-learning-relevant data preparation solutions was over $500M in 2018, growing to $1.2B by the end of 2023.
● Data preparation and engineering tasks represent over 80% of the time consumed in most AI and machine learning projects.
● The market for third-party data labeling solutions was $150M in 2018, growing to over $1B by 2023.
● For every $1 spent on third-party data labeling, $5 is spent on internal data labeling efforts: over $750M in 2018, growing to over $2B by the end of 2023.
● For every $1 spent on third-party data labeling solutions, $2 is spent on internal data efforts to support or enhance those labeling efforts.
● AI projects relating to object/image recognition, autonomous vehicles, and text and image annotation are the most common workloads for data labeling.
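The projections are easier to compare as annual growth rates. Here is a small sketch computing the compound annual growth rate, CAGR = (end / start)^(1 / years) - 1, for the three segments that have both a 2018 and a 2023 figure. It treats the report’s “over $X” numbers as exact, so the rates are approximate; `cagr` and `segments` are names made up for this example.

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


# Headline figures from the highlights above, in millions of USD, 2018 -> 2023 (5 years).
# "Over $X" values are treated as exact, so the resulting rates are approximate.
segments = {
    "AI/ML data preparation": (500, 1200),
    "Third-party data labeling": (150, 1000),
    "Internal data labeling": (750, 2000),
}

for name, (start, end) in segments.items():
    print(f"{name}: {cagr(start, end, 5):.1%} per year")

# Sanity check on the 1X:5X ratio for 2018: 5 * $150M = $750M.
assert 5 * segments["Third-party data labeling"][0] == segments["Internal data labeling"][0]
```

Taken at face value, that is roughly 19% a year for data preparation, 46% for third-party labeling, and 22% for internal labeling, so the third-party segment is projected to grow fastest, from the smallest base.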

Full Report:

LINK

From Stanford With Love

Hey, catch up on all things NLU straight from Stanford. This lecture series is highly intuitive and convenient if you want a good read-through of the past and present state of natural language understanding: