Source: Deep Learning on Medium
Moving the execution of deep neural networks (DNNs) closer to end devices, enabling fast, robust and scalable DNN inference.
The applications of deep neural networks (DNNs) have experienced unprecedented growth in recent years. But there is a problem: the inference computations of large DNN models are too intensive to run on a single end device. To solve this problem, this article introduces a novel platform called Intelligence Distribution Network (IDN) that is specifically designed to make DNN inference fast, robust, and scalable. IDN spreads the inference computation over a distributed computing hierarchy of multiple devices, edges, and the cloud, which lets applications support larger models and improves fault tolerance.
At its core, IDN is a distributed peer-to-peer network of computing devices that moves the execution of deep neural network inference closer to the end devices. By leveraging and combining the capabilities of spare computing resources of nearby devices, edges, and the cloud, IDN provides high-performance low-latency inference to meet the growing demand from Artificial Intelligence (AI) applications.
IDN’s peer-to-peer network transparently distributes and routes model execution from the target inference application to nearby participating nodes, which in turn cooperate to distribute the load and return a response quickly and efficiently. Because IDN shifts the execution of deep learning inference from the cloud to the edge and end devices, anyone with spare computing resources can become a node in the network and, by hosting DNN models and performing inference computation, a participant in our society’s transition towards Artificial Intelligence (AI). By moving intelligence algorithms closer to the end user and making them more accessible, IDN lives up to its name as an Intelligence Distribution Network.
Here is how IDN achieves its core benefits: making inference 1) fast, 2) robust, and 3) scalable.
IDN enables fast inference by moving the execution closer to the end devices running the AI applications. Beyond this gain from latency reduction, IDN can further speed up inference by augmenting the target model with branches via BranchyNet. BranchyNet adds branches (exit points) at various depths of a DNN model so that execution can terminate early once enough confidence has accumulated, making the remaining computation unnecessary (see Fast Inference for Deep Learning Models for more details).
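The early-exit idea can be sketched in a few lines. This is a minimal illustration, not IDN's or BranchyNet's actual code. It assumes the entropy of the softmax output as the confidence measure (low entropy means high confidence), and the names `early_exit_inference`, `stages`, `exit_heads`, and `thresholds`, as well as the toy stages, are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a more confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_inference(x, stages, exit_heads, thresholds):
    """Run the stages in order and stop at the first confident exit.

    Returns (probabilities, index of the exit taken). The last exit is
    always taken if no earlier exit is confident enough.
    """
    if not stages or len(stages) != len(exit_heads):
        raise ValueError("need one exit head per stage")
    hidden = x
    last = len(stages) - 1
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        hidden = stage(hidden)
        probs = softmax(head(hidden))
        if i == last or entropy(probs) < thresholds[i]:
            return probs, i


# Toy model (assumption): each stage doubles the features and each
# exit head reads the features directly as class scores.
def double(features):
    return [2.0 * v for v in features]


def identity(features):
    return features


stages = [double, double, double]
exit_heads = [identity, identity, identity]
thresholds = [0.2, 0.2, 0.2]  # the last threshold is never consulted

_, easy_exit = early_exit_inference([3.0, 0.0], stages, exit_heads, thresholds)
_, hard_exit = early_exit_inference([1.0, 0.0], stages, exit_heads, thresholds)
print(easy_exit, hard_exit)
```

In this toy setup the clear-cut input leaves at the first exit, while the more ambiguous input needs one more stage before its entropy drops below the threshold.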
IDN enables robust inference by sending each inference request to multiple peers in parallel. Further, when a peer is known to be inactive, the request is automatically rerouted to another available peer. Because of this built-in redundancy, if any request or response is lost or corrupted, the inference result can still be obtained from other nodes. This enables robust and reliable inference for your applications.
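A rough sketch of this redundancy, using in-process functions as stand-ins for remote peers: the same request goes to every peer at once, and the first successful answer wins. The function `redundant_infer` and the three toy peers are hypothetical and are not part of any IDN API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def redundant_infer(peers, payload, timeout=5.0):
    """Send the same request to every peer in parallel.

    Returns (peer_name, result) for the first peer that answers
    successfully. Raises RuntimeError if every peer fails.
    """
    pool = ThreadPoolExecutor(max_workers=len(peers))
    try:
        futures = {pool.submit(call, payload): name for name, call in peers.items()}
        failures = []
        for future in as_completed(futures, timeout=timeout):
            try:
                return futures[future], future.result()
            except Exception as exc:  # lost, corrupted, or refused request
                failures.append((futures[future], repr(exc)))
        raise RuntimeError(f"all peers failed: {failures}")
    finally:
        pool.shutdown(wait=False)  # do not block on slower peers


# Toy peers (assumption): plain functions instead of network calls.
def offline_peer(payload):
    raise ConnectionError("peer is inactive")


def slow_peer(payload):
    time.sleep(0.2)
    return "label for " + payload


def fast_peer(payload):
    time.sleep(0.01)
    return "label for " + payload


peers = {"offline": offline_peer, "slow": slow_peer, "fast": fast_peer}
name, result = redundant_infer(peers, "frame-0")
print(name, result)
```

The failure of the offline peer is simply recorded and skipped, so the caller still gets a result as long as at least one peer responds.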
IDN enables scalable inference by letting the end device send inference requests to any number of nearby peers. Any available peer near your end device can host your application’s inference computations. Because not every inference request has to hit your server, your application scales better.
Inference on IDN
1) Computation is distributed across peer devices, edges, and the cloud
2) Peers can execute computation in parallel
3) There are multiple output (exit) points
4) Each output returns a confidence value
5) Each output can be progressively enhanced by combining it with previous outputs
6) The inference can terminate early if the output has high enough confidence
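Points 4 to 6 can be illustrated with a small sketch. The article does not say how outputs are combined, so this example assumes a running average of the class probabilities from each exit point and uses the largest averaged probability as the confidence. The function name `combine_progressively` and the sample values are hypothetical.

```python
def combine_progressively(exit_outputs, threshold=0.9):
    """Average the outputs of successive exit points and stop early.

    exit_outputs: iterable of probability lists, one per exit point.
    Returns (combined probabilities, confidence, number of exits used).
    """
    combined = None
    used = 0
    for probs in exit_outputs:
        used += 1
        if combined is None:
            combined = list(probs)
        else:
            # Incremental mean: fold the new output into the running average.
            combined = [c + (p - c) / used for c, p in zip(combined, probs)]
        if max(combined) >= threshold:
            break  # confident enough; later exits are never consumed
    if combined is None:
        raise ValueError("at least one exit output is required")
    return combined, max(combined), used


# Made-up outputs from three exit points for a two-class problem.
outputs = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
combined, confidence, used = combine_progressively(outputs, threshold=0.65)
print(combined, confidence, used)
```

Because the function accepts any iterable, it can be fed a generator that only computes the next exit when asked, so stopping early also skips the remaining computation.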
Below is an example of how you can use IDN to move the execution of your DNN models closer to end devices to realize these important gains.
Consider a scenario in which you define a model descriptor with three parts: 1) the identity of the model, 2) the path where the model can be accessed, and 3) the type of the model. Here, we host the model on IPFS, and the model type, torchjs/cuda, indicates a model exported from Torch that should run on an NVIDIA CUDA GPU. Other model types can be used as well, including WebDNN, which allows nodes to execute the inference computation entirely in the browser.
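As a rough illustration, such a descriptor could be modeled as a small record. The field names `id`, `path`, and `type` and the JSON encoding are assumptions made for this sketch, since the IDN library is not yet released, and the IPFS path is a placeholder rather than a real address.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelDescriptor:
    """Hypothetical model descriptor: identity, location, and type."""
    id: str    # identity of the model
    path: str  # where peers can fetch the model
    type: str  # how the model should be executed

    def to_json(self):
        """Serialize the descriptor so it can be sent to peers."""
        return json.dumps(asdict(self), sort_keys=True)


descriptor = ModelDescriptor(
    id="resnet18",
    path="/ipfs/<model-hash>",  # placeholder, not a real IPFS address
    type="torchjs/cuda",        # Torch export, run on an NVIDIA CUDA GPU
)
print(descriptor.to_json())
```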
With IDN, everyone will be able to tap into the power of Artificial Intelligence via their browsers. Web developers can integrate intelligence into their web applications with ease without being limited by compute resources. Website visitors can participate and share compute resources for DNN inference.
Once the model is defined, IDN sends initModel requests to peers, which triggers them to download the model and prepare it for inference requests. After the model is initialized, we can now send inference requests to the peers and wait for the response event to come back. The response event consists of outputs, the confidences for those outputs, and a finish callback function to terminate the ongoing computation early.
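The request flow described here can be simulated in-process. This is only a sketch of the described behavior, not the IDN API: `SimulatedPeer`, `init_model`, `infer`, and the event keys `outputs`, `confidences`, and `finish` are hypothetical names chosen to mirror the description, and the confidence values are made up.

```python
class SimulatedPeer:
    """Stand-in for a remote IDN peer (hypothetical, runs in-process)."""

    def __init__(self, exits):
        self.exits = exits  # one (outputs, confidences) pair per exit point
        self.model = None

    def init_model(self, descriptor):
        # A real peer would download the model from descriptor["path"] and load it.
        self.model = descriptor

    def infer(self, frame, on_response):
        """Emit one response event per exit point until finish() is called.

        The frame is ignored here because the outputs are canned.
        Returns the number of events that were emitted.
        """
        if self.model is None:
            raise RuntimeError("init_model must be called before infer")
        state = {"finished": False}

        def finish():
            state["finished"] = True

        emitted = 0
        for outputs, confidences in self.exits:
            if state["finished"]:
                break  # the caller asked to stop; skip the deeper exits
            on_response({"outputs": outputs, "confidences": confidences, "finish": finish})
            emitted += 1
        return emitted


received = []


def on_response(event):
    """Keep each output and stop the peer once it is confident enough."""
    received.append(event["outputs"])
    if max(event["confidences"]) >= 0.9:
        event["finish"]()


peer = SimulatedPeer(exits=[(["cat"], [0.6]), (["cat"], [0.95]), (["cat"], [0.99])])
peer.init_model({"id": "resnet18", "path": "/ipfs/<model-hash>", "type": "torchjs/cuda"})
emitted = peer.infer("frame-0", on_response)
print(emitted, received)
```

Here the handler calls `finish` on the second event, so the third exit point is never evaluated.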
Below is an example of a video processed on IDN with the ResNet18 model. The video is 224 by 224 pixels, 4 seconds long, and runs at 30 fps; each frame was passed through ResNet18 before being rendered and displayed.
As the example video demonstrates, the initial results of running inference on IDN are promising. With 3 nearby peers, the result is comparable to executing the inference locally.
This post is edited by our editor in chief Marcus Comiter.
If you have comments or ideas, please let us know below. Please tell us how this could be useful in your project or how you could use it. We are writing more content on IDN and will be releasing an IDN library for everyone to experiment with in the near future. Please stay tuned. Thank you!