Neural Networks Training with Approximate Logarithmic Computations

Fig 1: Hardware Accelerated Learning

Neural network training is expensive in terms of both computation and memory accesses; it is roughly three to five times more computationally expensive than inference. Together, these two factors contribute significantly to the net power requirements when training a neural network on edge devices (devices connected at the edge of the internet: wearables, smartphones, self-driving cars, etc.). To make both real-time training and inference possible on such edge devices, reducing computation is of paramount importance. Many solutions to this problem have been proposed, such as sparsity-, pruning-, and quantization-based methods; we propose yet another: designing end-to-end training in a logarithmic number system (LNS). Note that:

  • For this to work, all significant neural network operations need to be defined in LNS.
  • In LNS, multiplication reduces to addition, but addition itself becomes computationally expensive.
  • Hence we resort to Approximate Logarithmic Computations, with the intuition that the noise tolerance of back-propagation can absorb the uncertainty introduced by our log-domain operations.

Fig 2: An LNS-DNN-MLP where the neurons have logarithmic activation functions and the activations and weights are in fixed-point LNS

The mapping between real numbers and logarithmic numbers is given as follows.
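With $s_x$ denoting the sign of $x$ and $\tilde{x}_\ell$ its log-magnitude (the notation is introduced here for convenience; the representation itself is the standard LNS one):

$$\tilde{x} \equiv (s_x,\ \tilde{x}_\ell), \qquad s_x = \operatorname{sign}(x), \qquad \tilde{x}_\ell = \log_2 \lvert x \rvert .$$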

This kind of logarithmic mapping implies that the basic operations of the vector space ℝ, namely addition and multiplication, need to be redefined. In the log domain, multiplication trades its computational complexity for that of addition.
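Concretely, the log-magnitude of a product is just the sum of the log-magnitudes, and the sign bit is an XOR of the operand sign bits (a standard LNS identity):

$$\widetilde{(x \cdot y)}_\ell = \tilde{x}_\ell + \tilde{y}_\ell, \qquad s_{x \cdot y} = s_x \oplus s_y .$$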

That is, multiplications are eliminated and replaced with additions. Addition, on the other hand, is tricky. The exact form that log-domain addition takes is given below:
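In its usual LNS form, for two operands of the same sign with $\tilde{x}_\ell \ge \tilde{y}_\ell$ and $d = \tilde{y}_\ell - \tilde{x}_\ell \le 0$ (the same $d$ and $\Delta$ plotted in Fig 3 below):

$$\widetilde{(x + y)}_\ell = \tilde{x}_\ell + \Delta(d), \qquad \Delta(d) = \log_2\!\left(1 + 2^{d}\right).$$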

The Δ term introduces non-linearity and a substantial amount of extra computation when performing addition in the log domain.

A graphical representation of Δ with respect to d shows the non-linearity clearly.

Fig 3: The correction term Δ as a function of d
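To make the cost of Δ concrete, the short NumPy sketch below compares the exact correction term with one very simple, shift-friendly approximation (a Mitchell-style log2(1 + z) ≈ z). It only illustrates the general idea of approximating Δ; the specific approximation used in the paper may differ.

```python
import numpy as np

def delta_exact(d):
    """Exact LNS addition correction: delta(d) = log2(1 + 2**d), for d <= 0."""
    return np.log2(1.0 + np.exp2(d))

def delta_mitchell(d):
    """Illustrative shift-friendly approximation (an assumption, not the paper's
    scheme): log2(1 + z) ~= z for 0 <= z < 1, hence delta(d) ~= 2**d for d <= 0."""
    return np.exp2(d)

d = np.linspace(-8.0, 0.0, 161)
err = np.abs(delta_exact(d) - delta_mitchell(d))
print("max |exact - approx| on [-8, 0]:", err.max())  # roughly 0.086
```

Whatever approximation is chosen, it only replaces this one scalar function; the surrounding LNS multiply/add logic stays the same.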

Subtraction can be defined in a similar fashion:
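For two numbers of the same sign with $\tilde{x}_\ell \ge \tilde{y}_\ell$, the commonly quoted LNS form uses an analogous correction term:

$$\widetilde{(x - y)}_\ell = \tilde{x}_\ell + \nabla(d), \qquad \nabla(d) = \log_2\!\left(1 - 2^{d}\right), \qquad d = \tilde{y}_\ell - \tilde{x}_\ell < 0 ,$$

and the sign of the result is $s_x$. Note that ∇(d) diverges as d → 0, i.e. when the two operands have nearly equal magnitude, which makes it even harder to approximate than Δ.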

Since a single multiplication is now just an addition, we can intuitively deduce that exponentiation also becomes simpler in the log domain:
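Indeed, raising to a power simply scales the log-magnitude (again a standard LNS identity):

$$\widetilde{(x^{n})}_\ell = n \cdot \tilde{x}_\ell ,$$

so, for instance, squaring becomes a left shift of the fixed-point log-magnitude and a square root becomes a right shift.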