EDGE AI — The future of AI

Original article was published by Yash Agrawal on Deep Learning on Medium

In the last few years, the growth of Artificial Intelligence has been exponential. One of the main reasons for this is the availability of huge amounts of data for processing.

But there is still a huge amount of data, far more than what is currently available, to which we don't have access. It either resides on edge devices or is private: it can't be shared with the public or uploaded to the cloud. Such data includes medical reports, personal photos, videos, etc. This acts as a barrier to many possible applications, including user personalization. Edge AI solves this problem.

So what is Edge AI?

Edge AI means that AI algorithms are processed locally on a hardware device (such as a mobile phone, IoT device, embedded device, microcontroller, etc.) instead of in the cloud. The algorithms use the data (photos, videos, sensor data, etc.) created on the device.

A device using Edge AI does not need to be connected to the internet to work properly; it can process data and make decisions independently, without a connection.

According to Gartner, 91% of today's data is processed in centralized data centers, but by 2022, about 74% of all data will need analysis and action at the edge.

There are several benefits of Edge AI.

Privacy — Privacy is a big concern these days; consumers are increasingly conscious of where their data is located. Edge AI lets companies process user data locally on devices, which means there is no need to send the data to the cloud for processing. This allows companies to deliver more AI-enabled personalized features.

Security — Since the data is processed locally on devices and not in the cloud, the chances of a data breach are reduced.

Latency — Since the data is processed locally, there is no need to send it to the cloud, which improves latency. This enables algorithms to work in real time.

Bandwidth — According to an article on Forbes, about 2.5 quintillion bytes of data are created each day, and that pace is only accelerating. Sending this amount of data requires high bandwidth, which increases cost, but with Edge AI one doesn't need to send the data to the cloud, which saves money.

Getting into Edge AI

Edge devices have less memory and less computational power, so one can't directly run heavy models such as VGG-16, YOLO, ResNet, etc. on them. To run these models, one either has to modify the hardware (the edge devices) or the software (the model).

This post will focus on the software side, as modifying hardware requires deep knowledge and understanding of these devices. That said, many companies are working on the hardware side, such as Intel, Google, Xnor, and Habana Labs; you can check them out.

To make models compatible with edge devices, one needs to reduce the model size by compressing it. There are many techniques one can use to compress a model; some of the important ones are:

Pruning — Pruning means removing the less important weights. Generally, weights whose values are below a threshold are removed from the network. There are many ways to choose this threshold; the article by Jacob Gildenblat explains them. After pruning, we re-train the model to compensate for the loss in accuracy. The table below shows the result from the paper Pruning Filters for Efficient ConvNets.
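As an illustration, magnitude-based pruning can be sketched in a few lines of NumPy. This is a toy example: the weight matrix and threshold below are made up for demonstration, and real frameworks prune inside the training loop.

```python
import numpy as np

# Hypothetical weight matrix of a small layer (illustration only).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=(4, 4))

def magnitude_prune(w, threshold):
    """Zero out weights whose absolute value falls below the threshold."""
    mask = np.abs(w) >= threshold
    return w * mask, mask

pruned, mask = magnitude_prune(weights, threshold=0.5)
sparsity = 1.0 - mask.mean()  # fraction of weights removed
print(f"sparsity: {sparsity:.0%}")
```

In practice the surviving weights are then fine-tuned (re-trained) with the mask held fixed, which is what recovers the accuracy lost during pruning.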

Quantization — Quantization refers to reducing the number of bits used to represent a number. In deep learning, the predominant numerical format has so far been 32-bit floating point (FP32). However, the desire to reduce the bandwidth and compute requirements of deep learning models has driven research into lower-precision numerical formats. It has been extensively demonstrated that weights and activations can be represented with 8-bit integers (INT8) without incurring a significant loss in accuracy.
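A minimal sketch of symmetric linear quantization from FP32 to INT8, in NumPy. The weight values here are hypothetical; production toolchains additionally handle calibration, zero points, and per-channel scales.

```python
import numpy as np

# Toy FP32 weights; a real layer would have thousands of values.
weights = np.array([-1.2, -0.4, 0.0, 0.7, 1.5], dtype=np.float32)

def quantize_int8(w):
    """Symmetric linear quantization of FP32 values to INT8."""
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
print(q)                                     # INT8 codes
print(np.max(np.abs(recovered - weights)))   # worst-case rounding error
```

Storing INT8 instead of FP32 cuts the model size by roughly 4x, and the rounding error per weight is bounded by half the scale.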

A few other techniques include:

  1. Federated Learning
  2. Low-rank Factorization
  3. Knowledge Distillation
  4. Singular Value Decomposition
  5. Tensor Decomposition
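Low-rank factorization and singular value decomposition from the list above can be sketched together: a large weight matrix is replaced by the product of two thin matrices obtained from a truncated SVD, cutting the parameter count. The matrix size and rank below are arbitrary, chosen only for illustration.

```python
import numpy as np

# Hypothetical 256x256 weight matrix; real layers are often much larger.
rng = np.random.default_rng(42)
W = rng.normal(size=(256, 256)).astype(np.float32)

def low_rank_factorize(W, rank):
    """Approximate W as A @ B using a rank-truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # shape (m, rank)
    B = Vt[:rank, :]            # shape (rank, n)
    return A, B

rank = 32
A, B = low_rank_factorize(W, rank)
original_params = W.size
factored_params = A.size + B.size
print(f"parameters: {original_params} -> {factored_params}")
```

A matrix multiply with W then becomes two smaller multiplies (x @ A) @ B, trading a small approximation error for fewer parameters and fewer operations.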

A few papers related to model compression: