Original article was published on Artificial Intelligence on Medium
You may have already used the Tacotron model found in the Super Duper NLP Repo for text-to-speech experimentation. Well, now NVIDIA has released Flowtron, and it comes with its own controllable style modulation. In fact, if you listen to the keynote narration in the Huang video above, Flowtron is the model being used. If interested, check out their blog page showing various style demos alongside Tacotron 2.
Those JSON records can get convoluted really quickly, especially if two objects share the same key, like “name” for a person and “name” for a company. Below is a quick guide to getting through nested JSON data, with a function for isolating the right key, giving all of us a new hope. 😁
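To make the idea concrete, here is a minimal sketch of one way to isolate the right key. It is not the function from the linked guide: the helper name `find_key` and the sample record are hypothetical, made up for illustration. The helper walks the nested structure and returns each match together with its path, so the path can be used to tell a person's “name” from a company's “name”.

```python
import json


def find_key(data, target, path=()):
    """Yield (path, value) for every occurrence of `target` in nested JSON data.

    Hypothetical helper for illustration; `path` is a tuple of the dict keys
    and list indexes that lead to the match.
    """
    if isinstance(data, dict):
        for key, value in data.items():
            current = path + (key,)
            if key == target:
                yield current, value
            # Keep descending: the same key may appear again further down.
            yield from find_key(value, target, current)
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_key(item, target, path + (index,))


# Made-up record where "name" is used for both the company and its people.
record = json.loads("""
{
  "company": {
    "name": "Acme",
    "employees": [
      {"name": "Leia", "role": "engineer"},
      {"name": "Luke", "role": "analyst"}
    ]
  }
}
""")

matches = list(find_key(record, "name"))

# Use the path to keep only the names that sit under "employees".
people = [value for path, value in matches if "employees" in path]
print(people)
```

Returning the path alongside the value is the design choice that matters here: a plain search for “name” would return all three values with no way to tell them apart.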
Visualizing AI Model Training
The title says it all. This is a step-by-step guide (with a Colab) for combining Weights & Biases visualizations with Hugging Face’s Transformers library. In this example, DistilBERT is trained on the CoLA dataset to observe the Matthews correlation coefficient metric:
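For reference, the Matthews correlation coefficient for binary labels (CoLA sentences are labeled acceptable or unacceptable) can be computed by hand from the confusion matrix. The sketch below is illustrative and not taken from the guide: the function name `mcc_binary` and the toy labels are assumptions. In practice `sklearn.metrics.matthews_corrcoef` does the same job.

```python
import math


def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels (illustrative sketch)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion matrix is empty.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy labels, made up for illustration: 1 = acceptable, 0 = unacceptable.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

score = mcc_binary(y_true, y_pred)
print(round(score, 4))  # 7 / 15, about 0.4667
```

The score ranges from -1 to 1, with 0 meaning no better than chance, which makes it more informative than accuracy on imbalanced label sets.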
Searching over a large number of documents can often lead to multi-hop problems: answering a question accurately may require searching multiple areas of a knowledge base. In this work, the authors at CMU attempt to traverse documents as if they were a graph without actually converting them into one (the documents are left in their original state), which is easier to build than a knowledge graph and offers a major speed boost.
How does it perform?
On the MetaQA task:
the model outperforms the previous best by 6% and 9% on 2-hop and 3-hop questions, respectively, while being up to 10x faster. 🔥🔥🔥🔥
On HotpotQA:
the method trades off some accuracy (F1 score of 43 vs. 61) to deliver a 100x improvement in speed.
16-core CPU Demo:
Text-to-Speech on CPUs
A new text-to-speech model from Facebook can generate one second of audio in 500 milliseconds on a CPU. In addition, they’ve included style embeddings that allow the AI voice to take on assistant, soft, fast, projected, and formal styles!
There are speech demos in the link below. The bad news is that this does not appear to be open-sourced. 😌
T5 Inspires BART to Question
Open-domain QA, made famous by DrQA, usually involves a 2-stage approach where you first search over an external knowledge base (e.g. Wikipedia) and then use a second model to extract the answer from the retrieved text. For closed-domain QA, like the SQuAD task, the downstream task involves feeding a general pre-trained model a passage of text and a question, and the model is tasked with finding the answer span in the text. However, in this repo using the BART-large model, Sewon Min uses a model pre-trained on the knowledge itself and then fine-tuned to answer questions! This style, called open-domain closed-book, was inspired by and described in the T5 paper below. Straight fire 🔥🔥.
Paper based on T5:
Colab of the Week: T5 Tuning 🔥🔥
Learn to use T5 for review classification, emotion classification and commonsense inference!
CMU’s ML Video Collection
From Graham Neubig, this great collection offers 24 lecture videos for your machine learning edification. You know the collection is good when attention is discussed in the 7th video 😁. In these videos we get everything from search trees and document-level models to machine reading and NLG:
Dataset of the Week: Street View Text (SVT)
What is it?
The dataset contains street-scene images with annotations, used for the scene text recognition task.