Source: Microsoft Research
In the larger quest to make the Internet of Things (IoT) a reality for people everywhere, building devices that can be both ultrafunctional and beneficent isn’t a simple matter. Particularly in the arena of resource-constrained, real-time scenarios, the hurdles are significant. The challenges for devices that require quick responsiveness—say, smart implants that warn of impending epileptic seizures or smart spectacles providing navigation for low-vision people—are multifold. Small form factors and tiny microcontrollers mean that the training and prediction, via machine learning, that would make these devices smart and helpful must take place in the cloud, requiring significant amounts of data to be amassed and uploaded in real time. This introduces very real hurdles in areas such as connectivity, bandwidth, latency, power, and even privacy and security. For an individual prone to seizures, enjoying a swim in the community center pool, timing (latency) is everything, and the ability to leave the house for an entire day on a single charge (power) is survival itself. In the case of smart spectacles, constantly uploading video to the cloud, too, would soon cause bandwidth, latency, and power concerns and almost certainly introduce privacy issues.
The solution, then, would seem to lie in moving the machine learning and prediction algorithms that currently reside in the cloud onto the devices themselves. And yet the hardware capacity of such devices is severely constrained: IoT endpoints often have just 2 KB of RAM and 32 KB of flash memory.
“We are trying to change the IoT paradigm fundamentally.” – Manik Varma, Principal Researcher, Microsoft Research India
The EdgeML team at Microsoft Research India has been examining this challenge from the point of view of machine learning and is building a library of ML algorithms—the EdgeML library—intended to have a range of both traditional ML algorithms, as well as deep learning algorithms, including the use of recurrent neural networks (RNNs) that could be used to build such devices and tackle some of these applications. RNNs are powerful deep learning models in how they make use of sequential information and incorporate context from previous inputs; just as humans don’t start thinking from scratch every second, RNNs are networks with loops in them that allow information to persist.
Squeezing RNN models and code into a few kilobytes could allow RNNs to be deployed on billions of IoT devices, potentially transforming many existing challenges for individuals and communities across myriad life scenarios. Downsizing the RNN also could significantly reduce the prediction time and energy consumption and make RNNs feasible for real-time applications such as wake-word detection, predictive maintenance, and human activity recognition.
The problem is that RNN training becomes inaccurate and unstable as the time interval over which the sensor signal is analyzed increases. And in the resource-constrained, real-time applications described above, RNN model size and prediction time are additional concerns.
In FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network—being presented at the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) in Montreal, Canada—Aditya Kusupati, Prateek Jain, and Manik Varma of Microsoft Research India, along with Manish Singh of the Indian Institute of Technology Delhi and Kush Bhatia and Ashish Kumar of the University of California, Berkeley, introduce innovative new architectures for efficient RNN training and prediction on severely resource-constrained IoT devices too tiny to hold existing RNN models.
FastGRNN stands for Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network algorithm, designed to address the twin RNN limitations of inaccurate training and inefficient prediction. It turns out that FastGRNN matches the accuracies and training times of state-of-the-art unitary and gated RNNs but has significantly lower prediction costs. Models range from 1 to 6 KB for real-world applications.
“We asked ourselves, how can we get machine learning to actually run on such severely resource-constrained microcontrollers and IoT devices,” recalled India’s EdgeML team member and Principal Researcher Manik Varma. “The traditional IoT paradigm has been that these devices have been too weak to do AI, so everyone thought that all the data had to be sent to the cloud and all the decision making would happen there. But, unfortunately, this traditional paradigm cannot address lots of critical scenarios where you need to make decisions on the device itself.”
The team set out to conquer the four challenge areas presented by localizing machine learning on the microcontroller itself: bandwidth, latency, power, and privacy/security.
They started this project roughly two years ago and turned heads at ICML 2017 when they published two papers (Bonsai and ProtoNN) showing how they had managed to deploy traditional machine learning on the world’s tiniest devices: microcontrollers smaller than a grain of rice, such as the ARM Cortex M0 with just 2 KB of RAM, and minuscule IoT boards, such as the Arduino Pro Mini, based on an 8-bit Atmel ATmega328P microcontroller operating at 8 MHz without any floating point support in hardware, with 2 KB of RAM and 32 KB of read-only flash memory.
It may have been the first time anyone in the world had gone so small with machine learning—and it got some serious attention.
“We are tackling critical scenarios beyond the pale of the traditional IoT paradigm, where it is not feasible to transmit sensor data to the cloud due to latency, bandwidth, energy, privacy, or security concerns and where the decision making needs to happen locally on the IoT edge or endpoint device itself.” – Manik Varma
Intent on building upon the success, the team intensified its focus on the more challenging problem of deep learning. In an IoT world, almost everything happening takes the form of a time series. Think of the case of a moisture sensor embedded in the soil on a farm taking periodic readings on water moisture at a specific location; based on a series of chronological readings, it would make a decision on whether to irrigate that particular location. The state of the art for analyzing time series is RNNs. And so, they started looking at leading RNNs as a way of solving the size and resource problem.
But RNNs have a couple of issues. One is that they are not very easy to train. Most RNNs and other deep learning methods are trained with gradient-descent-type algorithms. Unfortunately, in the case of RNNs, the gradients are not very stable. They explode in some directions (to infinity) and vanish (to zero) in others. This has been a problem for RNNs since the time they were developed. Researchers have come up with many ways to solve this problem. One is unitary RNNs, which restrict the range of the state transition matrix’s singular values. Unfortunately, they also increase the model size, as they require a larger number of hidden units to make up for the loss in expressive power. Therefore, unitary RNNs are not ideal for these tiny devices, where you want to conserve memory and make predictions as quickly as possible.
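The exploding and vanishing behavior described above can be illustrated with a toy calculation. This is a minimal sketch, not the paper's analysis: it assumes a purely linear recurrence with a diagonal transition matrix, so that backpropagating through `steps` time steps multiplies the gradient along each direction by that direction's singular value raised to the power `steps`. The function name `gradient_scale` is made up for illustration.

```python
def gradient_scale(singular_values, steps):
    """Factor by which a gradient is scaled along each direction after
    backpropagating through `steps` steps of a linear recurrence whose
    transition matrix is diagonal with the given entries (toy assumption)."""
    return [s ** steps for s in singular_values]


# One direction slightly above 1, one exactly 1, one slightly below 1.
scales = gradient_scale([1.1, 1.0, 0.9], steps=100)
for s, scale in zip([1.1, 1.0, 0.9], scales):
    print(f"singular value {s}: gradient scaled by {scale:.3e}")
```

After 100 steps, the direction with singular value 1.1 has grown by a factor of more than ten thousand, while the direction with 0.9 has shrunk to a few hundred-thousandths. Only the direction with singular value exactly 1 is preserved, which is the property unitary RNNs enforce.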
Gated RNNs, another idea that researchers have experimented with to address this issue, stabilize training by adding extra parameters. Gated RNNs can deliver state-of-the-art prediction accuracies, but the models themselves are sometimes even larger than unitary RNNs.
The EdgeML team came up with another approach.
“What we realized is, if you take the standard RNN architecture and just add a simple residual connection, it stabilizes the RNN training and it does so provably,” said Varma. “It only has two extra scalar parameters—as compared to an RNN—and it gets you better accuracy than any of the unitary methods proposed so far.”
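The residual idea Varma describes can be sketched in a few lines. This is a hedged illustration rather than the team's code: it assumes the update takes the form of a standard RNN candidate state blended with the previous state through two scalars, here called `alpha` and `beta`, and it assumes a tanh nonlinearity. All function and parameter names are made up for the example.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def residual_rnn_step(W, U, b, alpha, beta, x, h_prev):
    """One step of a standard RNN with a weighted residual connection.

    candidate = tanh(W x + U h_prev + b)      (the usual RNN update)
    h_new     = alpha * candidate + beta * h_prev
    alpha and beta are the two extra scalar parameters (assumed form).
    """
    pre = [wx + uh + bi
           for wx, uh, bi in zip(matvec(W, x), matvec(U, h_prev), b)]
    candidate = [math.tanh(p) for p in pre]
    return [alpha * c + beta * h for c, h in zip(candidate, h_prev)]


# Tiny example: 2 inputs, 2 hidden units, arbitrary illustrative weights.
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.0, 0.4], [-0.3, 0.0]]
b = [0.0, 0.1]
h = [0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h = residual_rnn_step(W, U, b, alpha=0.1, beta=0.9, x=x, h_prev=h)
print(h)
```

With a small `alpha` and a `beta` close to 1, each step changes the hidden state only slightly, which gives an intuition for why such a connection can keep gradients better behaved over long sequences.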
Based on this insight, they then modified the residual connection slightly by converting it to a gate. “This achieved an accuracy that matches the state of the art in LSTMs, GRUs, and so on, but with a model that is two to four times smaller,” explained Varma. The result was a gated RNN that was fast, accurate, stable, and tiny: FastGRNN.
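Turning the residual connection into a gate can be sketched as follows. This is an illustrative reading, not the released implementation: it assumes the gate and the candidate state share the same `W` and `U` matrices and differ only in their bias vectors, with two scalars (`zeta` and `nu` here) weighting the candidate. The names and the example weights are assumptions made for the sketch.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gated_step(W, U, b_z, b_h, zeta, nu, x, h_prev):
    """One gated update that reuses W and U for both gate and candidate.

    z     = sigmoid(W x + U h_prev + b_z)        (gate)
    cand  = tanh(W x + U h_prev + b_h)           (candidate state)
    h_new = (zeta * (1 - z) + nu) * cand + z * h_prev
    """
    shared = [wx + uh for wx, uh in zip(matvec(W, x), matvec(U, h_prev))]
    gate = [sigmoid(s + bz) for s, bz in zip(shared, b_z)]
    cand = [math.tanh(s + bh) for s, bh in zip(shared, b_h)]
    return [(zeta * (1.0 - z) + nu) * c + z * h
            for z, c, h in zip(gate, cand, h_prev)]


def run_sequence(params, inputs, hidden_size):
    """Feed a list of input vectors through the cell, starting from zeros."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = gated_step(*params, x, h)
    return h


# Illustrative parameters: 1 input feature, 2 hidden units.
params = ([[0.7], [-0.4]],             # W
          [[0.2, 0.0], [0.0, 0.2]],    # U
          [0.0, 0.0],                  # b_z
          [0.1, -0.1],                 # b_h
          1.0, 0.0)                    # zeta, nu
final_state = run_sequence(params, [[0.3], [0.5], [-0.2]], hidden_size=2)
print(final_state)
```

Because the gate reuses the matrices already needed for the candidate state, the only extra parameters over the residual version in this sketch are a second bias vector, which is consistent with the small model sizes the article reports.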
To compress this model even further, the researchers then took all the FastGRNN matrices and made them low rank, sparse, and quantized. This reduced the size by a factor of 10.
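The three compression ideas named above (low rank, sparsity, quantization) can each be illustrated generically. The sketch below is not the procedure from the paper: it assumes a low-rank matrix is stored as two thin factors, sparsity is obtained by keeping the largest-magnitude entries, and quantization is plain symmetric 8-bit rounding. The sizes in the example are arbitrary and do not reproduce the factor of 10 reported by the team.

```python
def low_rank_param_count(rows, cols, rank):
    """Parameters needed to store a rows x cols matrix as two thin
    factors of shapes rows x rank and rank x cols."""
    return rank * (rows + cols)


def keep_top_k(values, k):
    """Zero out all but the k largest-magnitude entries."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]),
                   reverse=True)
    keep = set(order[:max(k, 0)])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def quantize_int8(values):
    """Map floats to integers in [-127, 127] plus one float scale."""
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / 127.0 if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale for q in quantized]


# Illustrative sizes only: a 32 x 32 hidden-to-hidden matrix at rank 4.
dense_params = 32 * 32
factored_params = low_rank_param_count(32, 32, rank=4)
print(dense_params, factored_params)

weights = [0.5, -2.0, 0.1, 1.5]
sparse = keep_top_k(weights, k=2)
quantized, scale = quantize_int8(sparse)
print(sparse, quantized, dequantize(quantized, scale))
```

Each step trades a little precision for a smaller footprint: fewer parameters from factoring, fewer nonzero values from sparsity, and one byte instead of four per stored value from quantization. A real deployment would also need to store the positions of the nonzero entries, which this sketch ignores.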
“Based on this, we were able to build a wake-word detector for Cortana using a 1 KB model and fit it on the Arduino boards,” said Varma.
The team’s code is available online for free on GitHub.
Accommodating real world IoT off the cloud
The real-life applications brought into the realm of the possible by FastGRNN are seemingly unlimited, with ideas cropping up across smart health care, precision agriculture, augmenting of abilities for people with special needs, and even space exploration. The EdgeML team is prototyping a smart cane for low-vision people.
“We’re focusing on getting the machine learning algorithms as compact as possible. Our hope is that if you can fit them onto the tiniest microcontroller, then any other microcontroller can also run them,” said Senior Researcher and EdgeML teammate Prateek Jain.
Hence the EdgeML team’s smart cane prototype, which can interpret gestures and then interact with the user’s phone. A twirl of the cane gets the user’s phone to report the present location. A double swipe gets the cane to answer the owner’s phone. A fall detector for the blind or the elderly, for example, could instruct the owner’s phone to call for help.
Smart spectacles for people with low vision represent another example of on-the-spot, real-time training and prediction that could transform lives, and one deeply significant to Varma, who is himself low-vision. “It would be enormously helpful to have a camera on my glasses that would tell me what’s happening in the world, who I am looking at, and so on,” he said. “You can’t send the whole video stream to the cloud; it would be too costly and there wouldn’t be enough bandwidth.” And again, latency is an issue if the spectacles are to be depended upon to warn the wearer of hazards when walking on the street.
Privacy is paramount, and it is a problem that miniaturizing deep learning addresses. “You don’t want your visual or voice data being streamed to the cloud all the time. That would be creepy, everything you say or see in your home being recorded and sent to the public cloud,” said Varma. With the EdgeML team’s methodology, voice detection runs locally, and the data is not sent to the cloud at all.
“We spent a lot of time talking to many different Microsoft product groups, startups, scientists, and the government trying to figure out applications,” recalled Jain.
An interesting application the team came across was in astronomy and space exploration. Resource scarcity, specifically of energy, is a huge issue in spacecraft and machines that are sent into deep space. Another is the fact that satellites and probes collect an enormous amount of data via telescopes, cameras, and other sophisticated sensors; yet astonishingly, only a minuscule fraction of the data that is sensed is ever seen by a human being. If energy-efficient, low-latency machine learning were available on the sensors themselves, the on-chip algorithms could learn what data is most interesting and then determine which data should be sent for human analysis.
Varma has been invited as a Visiting Miller Research Professor at UC Berkeley to work on some of these problems. “It’s one of the great things about Microsoft Research, the amount of freedom and support you get for blue skies research, to take risks and to collaborate with people inside and outside Microsoft,” he smiled. In the case of FastGRNN, we may be looking at the stars.