Comparative Analysis: Raven Protocol vs. Conventional Methods

How Raven Protocol differentiates itself from other deep learning distribution methods and frameworks.

Raven Protocol was built in recognition of limited resource availability, with the aim of creating an accessible and sustainable architecture for any type of business model, specifically for those individuals and businesses that want to harness the power of Artificial Intelligence and Machine Learning. To come straight to the point: Deep Learning is the most advanced and still largely uncharted form of Machine Learning, one that many are apprehensive of applying owing to the simple non-availability of, wait for it… compute power.

Deep Neural Networks have opened new gateways in image recognition, natural language processing, speech recognition and computer vision. Deep Learning ‘survives’ on vast amounts of data, extracting millions of parameters from it to identify structure and patterns. Needless to say, this is a computationally intensive process. Various experiments on optimising the performance of these calculations and on speeding up DNN training have led to distributed training and the parallelisation of networks, which can reduce the time taken to train a model significantly (by around 40 times). With the availability of in-house CPU-GPU compute power, the results so far have been impressive.

But consider the cases where GPU resources for training a model are simply unavailable, or where the compute demand is hard to meet because a very large model requires abundant resources to train. This calls for innovative methods of performing DL training. Traditional methods involve data and model parallelism, which, combined with distributed systems, only partially quench that demand.

Data Parallelism

Fig. 1.0. Data Parallelism in Conventional Distribution Method

Data Parallelism is sought in distributed training when the data cannot be contained in one single machine, and also to achieve faster training. The data is cut into smaller chunks to be utilised on different machines (ref. Fig 1.0), and for this the model is replicated on each of those systems. The model replicas are trained individually on their data shards (Bn), and the weights are then collated at the Parameter Server to produce the final model. In practice, this method introduces a lot of latency into the overall execution.
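The data-parallel flow above can be sketched in a few lines. This is a hypothetical illustration (not Raven's or any framework's actual code), using a plain linear model: every "worker" holds a full replica of the weights and one shard of the data, and a parameter server averages the gradients.

```python
import numpy as np

# Hypothetical sketch of data parallelism: model replicas, data shards,
# and a parameter server that collates gradients. All names are illustrative.

def shard(X, y, n_workers):
    """Cut the dataset into equal chunks, one shard per worker."""
    return list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))

def worker_gradient(w, X_shard, y_shard):
    """Each model replica computes the MSE gradient on its own shard."""
    err = X_shard @ w - y_shard
    return 2 * X_shard.T @ err / len(y_shard)

def parameter_server_step(w, grads, lr=0.1):
    """Collate: average the workers' gradients and update the shared weights."""
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
for _ in range(200):
    grads = [worker_gradient(w, Xs, ys) for Xs, ys in shard(X, y, n_workers=4)]
    w = parameter_server_step(w, grads)

print(np.round(w, 2))  # converges towards true_w
```

The synchronisation point at `parameter_server_step` is exactly where the latency mentioned above accumulates: every round must wait for the slowest worker.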

Model Parallelism

Researchers came up with a different method to overcome the limitations of data parallelism: splitting the model architecture itself across machines over the intranet. This method of distribution is called Model Parallelism. Here, the dataset is kept in one system or storage that is accessible across the machines, on each of which a split of the architecture is kept ready to be trained.

Fig. 1.1. Model Parallelism in Conventional Distribution Method

But even with this method, each system participating in the training needs to be equipped with sophisticated compute resources such as advanced GPUs. It therefore has its own limitation in terms of scalability, and the communication between machines becomes a latency bottleneck in the network (ref. Fig. 1.1).
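A minimal sketch of the idea, under the assumption of a two-layer network split across two machines: each machine holds only its own layer's weights, and the intermediate activations (not the weights) travel between them.

```python
import numpy as np

# Hypothetical model-parallel sketch: a two-layer network split across
# two "machines". Only activations cross the machine boundary.

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(4, 8))   # layer 1 lives on machine A
W2 = rng.normal(scale=0.5, size=(8, 2))   # layer 2 lives on machine B

def machine_a_forward(x):
    """Machine A computes its split and ships the activation onward."""
    return np.maximum(x @ W1, 0.0)        # ReLU

def machine_b_forward(h):
    """Machine B finishes the forward pass on the received activation."""
    return h @ W2

x = rng.normal(size=(3, 4))               # a small input batch
h = machine_a_forward(x)                  # network hop 1
out = machine_b_forward(h)                # network hop 2
print(out.shape)  # (3, 2)
```

Every forward and backward pass incurs those network hops, which is why latency grows with the number of splits and why each split still needs capable hardware.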

Distributed Training In Raven

Raven Protocol performs distributed training of deep learning models using a shared network of compute power within a blockchain environment. The dearth of an economical supply of ample compute power for individuals and businesses to perform resource-intensive DL training brought forth this concept of gathering compute resources from the willing public. It is, in essence, crowd-sourcing compute power from sources as modest as the smartphone in your pocket or the PC on your desk. Being set on a blockchain provides additional security and anonymity while the training is distributed across multiple devices over the Internet. It also brings new revenue opportunities to the contributors and partners who come forward to grow the ecosystem, in the form of a constant source of income from such DL trainings.

Hence, to stay true to the initial objective of Raven Protocol, optimising and speeding up the training of DNNs remained on the agenda, and we have come up with an alternative solution to this problem.

Dynamic Graph Computation

All these frameworks operate on tensors and build the computation as a Directed Acyclic Graph. In most of the current popular deep learning frameworks, including TensorFlow (before Eager Execution), the computational graph is static in nature. Frameworks like PyTorch, however, are dynamic, giving researchers and developers a lot more room to exercise their creativity and imagination.

A major difference between a static and a dynamic computation graph is that in the former the model optimisation is preset and the data is substituted into placeholder tensors, whereas in the latter the nodes of the network are executed without any need for placeholder tensors. Dynamic computation holds a very distinguishable advantage in cases like language modelling, where the shapes of the tensors vary over the course of training. The benefit of a dynamic graph is its concurrency: it is robust enough to handle contributors being added or removed, which makes the whole Raven training sustainable.
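The language-modelling case can be made concrete with a small illustration. In a dynamic graph, the "graph" is just ordinary control flow, so a sequence of any length builds exactly as many computation steps as it needs, with no fixed-shape placeholders declared up front. This is a hypothetical numpy sketch of that idea, not any framework's internals.

```python
import numpy as np

# Hypothetical sketch: an RNN-style encoder whose computation graph is
# decided at run time by the length of the input sequence.

rng = np.random.default_rng(2)
W_h = rng.normal(scale=0.1, size=(8, 8))  # recurrent weights
W_x = rng.normal(scale=0.1, size=(5, 8))  # input weights

def encode(sequence):
    """One graph node per time step; the loop length is the sequence length."""
    h = np.zeros(8)
    for x_t in sequence:          # dynamic: no placeholder of fixed shape
        h = np.tanh(x_t @ W_x + h @ W_h)
    return h

short = rng.normal(size=(3, 5))   # 3 time steps
long = rng.normal(size=(11, 5))   # 11 time steps
print(encode(short).shape, encode(long).shape)  # both (8,), no padding needed
```

A static graph would instead require padding both sequences to a common placeholder shape before execution.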

Raven takes both the data and model parallelisation approaches and combines them into a different model of distribution.

Raven is thus capable of eliminating the latency and scalability issues of both approaches, distributing the training of ever deeper neural networks and their larger datasets by getting rid of the added dependency on model replication. The data is still sharded, into even smaller snippets. The model stays intact at the Master Node, while the heavy lifting is distributed as the tiniest snippets of data subsets over the network of contributors. The resulting gradients, computed at the contributor nodes, are sent back to the Master Node.
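A loose sketch of that flow, under stated assumptions: this is not Raven's actual implementation, only an illustration in which the Master Node keeps the single copy of the model, fans out tiny data snippets, and folds the returned gradients back in.

```python
import numpy as np

# Hypothetical sketch of the Raven-style flow described above:
# one model at the master, tiny data snippets at the contributors,
# gradients returned to the master. All names are illustrative.

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 2))
y = X @ np.array([3.0, -1.0])

def contributor(w, snippet_X, snippet_y):
    """A contributor device computes a gradient on one tiny snippet."""
    err = snippet_X @ w - snippet_y
    return 2 * snippet_X.T @ err / len(snippet_y)

master_w = np.zeros(2)                      # the only copy of the model
snippets = list(zip(np.array_split(X, 30), np.array_split(y, 30)))

for _ in range(50):                         # training rounds
    for sX, sy in snippets:                 # snippets fan out to contributors
        grad = contributor(master_w, sX, sy)
        master_w -= 0.05 * grad             # master folds the gradient back in

print(np.round(master_w, 2))
```

Because only small snippets and gradient vectors cross the network, no contributor ever needs to hold, or be capable of holding, a replica of the full model.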

This makes a ton of difference, as it is far easier to pass small calculations from machine to machine than to create multiple replicas of a complicated model.

A majority of people are still oblivious of the extensive struggle that a small section of the AI community faces in making AI an easy and accessible affair for all. This stems from the realisation that AI is here, and will become part of our lives in ways we may not yet fathom. AI companies, and companies seeking to implement AI into their systems to bring about new ways of improving life with it, find themselves crippled when trying to fully explore their ideas. Raven aims to help such individuals and companies exploit the full potential of AI, economically.

Source: Deep Learning on Medium