Google has open-sourced the PRADO model it built in 2019, and encourages the community to use it as a jumping-off point for new model architectures.
With fewer than 200K parameters, PRADO reached “state of the art” performance.
There is a need for NLP models that can be run on-device rather than in data centers.
Their new model pQRNN advances NLP performance with a minimal model size.
“The novelty of pQRNN is in how it combines a simple projection operation with a quasi-RNN encoder for fast, parallel processing.”
According to the QRNN paper that the author links to:
“…quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels.”
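In other words, the gates of a QRNN layer are computed by convolutions that run in parallel over the whole sequence, and only a cheap elementwise pooling step runs sequentially. As a rough illustration, here is a minimal NumPy sketch of a single QRNN layer with the paper's “fo-pooling” variant; all names and dimensions are my own, not taken from pQRNN's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrnn_layer(x, w_z, w_f, w_o, k=2):
    """Minimal QRNN layer with fo-pooling.

    x:             (T, d_in) input sequence
    w_z/w_f/w_o:   (k * d_in, d_out) weights of the causal convolution
    Returns        (T, d_out) hidden states.
    """
    T, d_in = x.shape
    d_out = w_z.shape[1]

    # Causal convolution: each timestep sees the current and k-1 previous
    # inputs. This part applies in parallel across timesteps.
    padded = np.vstack([np.zeros((k - 1, d_in)), x])
    windows = np.stack([padded[t:t + k].ravel() for t in range(T)])  # (T, k*d_in)

    z = np.tanh(windows @ w_z)   # candidate values
    f = sigmoid(windows @ w_f)   # forget gate
    o = sigmoid(windows @ w_o)   # output gate

    # Minimalist recurrent pooling: the only sequential step, and it is
    # elementwise (parallel across channels), unlike an LSTM's matrix ops.
    c = np.zeros(d_out)
    h = np.empty((T, d_out))
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * z[t]  # fo-pooling
        h[t] = o[t] * c
    return h

# Toy usage: a random sequence of 5 timesteps, 8 input / 16 output channels.
rng = np.random.default_rng(0)
T, d_in, d_out, k = 5, 8, 16, 2
h = qrnn_layer(rng.normal(size=(T, d_in)),
               rng.normal(size=(k * d_in, d_out)) * 0.1,
               rng.normal(size=(k * d_in, d_out)) * 0.1,
               rng.normal(size=(k * d_in, d_out)) * 0.1)
print(h.shape)  # (5, 16)
```

The only remaining loop is elementwise over channels, which is what lets QRNNs avoid the expensive sequential matrix multiplications of an LSTM.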
How does it work in Kaliamoorthi’s model?
According to the author:
- “Normally, the text input to NLP models is first processed into a form that is suitable for the neural network, by segmenting text into pieces (tokens) that correspond to values in a predefined universal dictionary (a list of all possible tokens).”
- “The neural network then uniquely identifies each segment using a trainable parameter vector, which comprises the embedding table.”
- “[The way] in which text is segmented has a significant impact on the model performance, size, and latency.”
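To make this conventional pipeline concrete, here is a minimal sketch of dictionary-based segmentation followed by an embedding-table lookup. The toy vocabulary, names, and sizes are hypothetical, not from any of the models discussed here.

```python
import numpy as np

# Hypothetical toy vocabulary; a real "universal dictionary" holds tens of
# thousands of tokens, and the embedding table scales with it.
vocab = {"<unk>": 0, "the": 1, "movie": 2, "was": 3, "great": 4}
d_embed = 4

# One trainable parameter vector per token: the embedding table.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_embed))

def embed(text):
    """Segment text into tokens, then look each one up in the embedding table."""
    token_ids = [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]
    return embedding_table[token_ids]  # shape: (num_tokens, d_embed)

print(embed("The movie was great").shape)  # (4, 4)
```

Since the embedding table grows with the dictionary, vocabulary size quickly dominates the parameter count, which is exactly the cost that projection-based models like PRADO and pQRNN are designed to avoid.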
The figure below, made by Kaliamoorthi, shows the spectrum of approaches used by the NLP community, along with their pros and cons.
In short, not all NLP models need to know every possible segment; most tasks can be solved by knowing only a small subset of them.
“Hence, allowing the network to determine the most relevant segments for a given task results in better performance.”
There is a difference in complexity to consider:
PRADO, the previous model that pQRNN builds on, was designed to: “…learn clusters of text segments from words rather than word pieces or characters, which enabled it to achieve good performance on low-complexity NLP tasks. Since word units are more meaningful, and yet the most relevant words for most tasks are reasonably small, many fewer model parameters are needed to learn such a reduced subset of relevant word clusters.”
The pQRNN model has three building blocks according to Kaliamoorthi:
- A projection operator that converts tokens in text to a sequence of ternary vectors.
- A dense bottleneck layer.
- A stack of Quasi-Recurrent Neural Network (QRNN) encoders.
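To tie the three blocks together, here is a rough end-to-end sketch that reuses qrnn_layer from the earlier snippet. The hash-based ternary projection below is only an illustrative stand-in for pQRNN's actual projection operation, and every name and dimension is assumed rather than taken from the released code.

```python
import hashlib
import numpy as np

def ternary_projection(tokens, n_features=128):
    """Block 1: map each token to a fixed ternary feature vector in {-1, 0, 1}.

    The projection is deterministic and parameter-free, so no embedding table
    has to be stored. This hash of (token, feature index) is a simplified
    stand-in for pQRNN's real projection operator.
    """
    vecs = np.empty((len(tokens), n_features))
    for i, tok in enumerate(tokens):
        for j in range(n_features):
            h = hashlib.md5(f"{tok}|{j}".encode()).digest()[0]
            vecs[i, j] = (h % 3) - 1  # -> -1, 0, or 1
    return vecs

def pqrnn_sketch(text, w_bottleneck, qrnn_weights):
    tokens = text.lower().split()
    projected = ternary_projection(tokens)         # block 1: projection
    hidden = np.tanh(projected @ w_bottleneck)     # block 2: dense bottleneck
    for w_z, w_f, w_o in qrnn_weights:             # block 3: stacked QRNN encoders
        hidden = qrnn_layer(hidden, w_z, w_f, w_o) # from the earlier sketch
    return hidden

# Toy usage with two stacked QRNN layers of matching sizes.
rng = np.random.default_rng(0)
d_b, d_h, k = 64, 64, 2
w_bottleneck = rng.normal(size=(128, d_b)) * 0.1
qrnn_weights = [tuple(rng.normal(size=(k * d_b, d_h)) * 0.1 for _ in range(3))
                for _ in range(2)]
print(pqrnn_sketch("the movie was great", w_bottleneck, qrnn_weights).shape)  # (4, 64)
```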
The full architecture is illustrated as follows: