Breakthrough of The Decade in Evolutionary Algorithms for Deep Neural Networks


[Figure: Schematic of Gradient Descent]

What is Gradient Descent?

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.

The Gradient of a Function gives the Direction in which the Function ‘Increases’ most rapidly. Hence the ‘Negative’ of the Gradient gives the Direction in which the Function ‘Decreases’ most rapidly.
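
To make this concrete, here is a minimal gradient descent sketch in Python/NumPy-free form (an illustration, not code from the article): it minimizes the toy function f(x) = (x - 3)^2, and the starting point, learning rate and number of steps are arbitrary illustrative choices.

# Gradient descent on f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
# Stepping against the gradient moves x towards the minimum at x = 3.

def grad(x):
    return 2.0 * (x - 3.0)

x = 10.0                 # arbitrary starting point
learning_rate = 0.1      # arbitrary step size

for step in range(100):
    x -= learning_rate * grad(x)   # move in the negative gradient direction

print(x)   # converges towards 3.0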

What is Stochastic Gradient Descent?

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate of it (calculated from a randomly selected subset of the data).
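
As a rough illustration of the idea (again, not the article's code), the sketch below fits a linear model with SGD: each update uses the gradient of a randomly sampled mini-batch rather than the full data set. The data, batch size and learning rate are made-up illustrative choices.

import numpy as np

# SGD sketch for linear regression y ~ X @ w: each step estimates the
# gradient from a random mini-batch instead of the whole data set.

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
learning_rate = 0.05     # arbitrary choice
batch_size = 32          # arbitrary choice

for step in range(2000):
    idx = rng.integers(0, len(X), size=batch_size)       # random subset
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)       # mini-batch gradient estimate
    w -= learning_rate * grad

print(w)   # approaches true_w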

What is Backpropagation?

In machine learning, specifically deep learning, backpropagation is an algorithm widely used in the training of feedforward neural networks for supervised learning; generalizations exist for other artificial neural networks (ANNs), and for functions generally. Backpropagation efficiently computes the gradient of the loss function with respect to the weights of the network for a single input-output example. This makes it feasible to use gradient methods for training multi-layer networks, updating weights to minimize loss; commonly one uses gradient descent or variants such as stochastic gradient descent. The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight by the chain rule, iterating backwards one layer at a time from the last layer to avoid redundant calculations of intermediate terms in the chain rule.
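
Here is a compact sketch of what that looks like in code (an illustration, not the article's implementation): a two-layer network with a sigmoid hidden layer and squared-error loss, where the gradients are computed last layer first and the intermediate term dL/dh is reused, exactly as the chain-rule description above says. The layer sizes and the choice of activation are arbitrary.

import numpy as np

# Backpropagation sketch for a tiny 2-layer network:
#   h = sigmoid(W1 @ x), y_hat = W2 @ h, loss = 0.5 * (y_hat - y)^2
# Gradients are computed from the last layer backwards, reusing dL/dh.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # one input example
y = 1.0                         # its target
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

# Forward pass
h = sigmoid(W1 @ x)             # hidden activations
y_hat = W2 @ h                  # prediction, shape (1,)

# Backward pass (chain rule, last layer first)
d_yhat = y_hat - y              # dL/dy_hat
dW2 = np.outer(d_yhat, h)       # dL/dW2
d_h = W2.T @ d_yhat             # dL/dh, the reused intermediate term
d_pre = d_h * h * (1.0 - h)     # through the sigmoid derivative
dW1 = np.outer(d_pre, x)        # dL/dW1

# One gradient descent update on the weights
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2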

The Problems w/ Back Propagation!

The Vanishing and Exploding Gradient problem: there are situations where the Gradient becomes almost Zero or shoots towards Infinity, and training effectively stalls or diverges.
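
To give a sense of the vanishing case (a simple numerical illustration, not the article's): the derivative of the sigmoid is at most 0.25, so a gradient backpropagated through many sigmoid layers is repeatedly scaled by small factors and shrinks towards zero; the exploding case is the mirror image, with repeated factors larger than one. The pre-activation value and layer count below are arbitrary.

import numpy as np

# Vanishing gradient illustration: backpropagating through many sigmoid
# layers multiplies the gradient by sigma'(z) <= 0.25 at each layer.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.5        # arbitrary pre-activation value
grad = 1.0
for layer in range(30):
    s = sigmoid(z)
    grad *= s * (1.0 - s)      # local derivative of the sigmoid

print(grad)    # roughly 1e-19: effectively zero after 30 layers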

There is no definitive answer to what the best learning rate is, or how to choose one.

There is no definitive answer to the best way to design a neural network for a given task; everything is based on guesses and heuristics. We need a way to do all of this simply and easily.

Besides, the Back Propagation Algorithm is very computationally intensive.

Back Propagation is extremely difficult and cumbersome for Complex Deep Neural Networks, and to solve real-world problems we have to build ever more complex DNNs.

Also, the Back Propagation Algorithm is inherently Pseudo-Parallel. This is a huge bottleneck. We need something which is truly/natively parallel, so that we can create bigger, more complex DNNs and solve more and more real-world problems.

We need to be able to do all of this faster. Every major breakthrough in Deep Learning today takes 1–2 years of hard work by specialist Ph.D.s and postdoctoral fellows, and huge amounts of money spent on computational infrastructure.

We need to simplify everything so that even relatively inexperienced people can get reasonable results.

So What Have We Achieved? What is our Breakthrough?

We eliminated Back Propagation Completely. We only use Forward Propagation.

Our algorithm is hence Truly/Natively Parallel, Linearly Scalable, and Extremely Fast and Efficient.

It is also Complete (Deterministic): it will definitely (eventually) find the best Neural Network Parameters.

It is very simple to implement under the hood…

  • So even inexperienced people can get great results
  • We can design and create exponentially more complex DNNs
  • We can solve Real World Problems Efficiently and Accurately

All of this is easy to orchestrate, and AutoML can be built on top of it.

Comparison with Genetic Algorithms (GA) for DNN’s

In the field of deep learning, deep neural networks (DNNs) with many layers and millions of connections are now routinely trained through stochastic gradient descent (SGD). Many assume that the ability of SGD to efficiently compute gradients is essential to this capability. However, neural networks can also be optimized through neuroevolution, i.e. evolutionary algorithms, which has proven to be an effective way to train deep neural networks for reinforcement learning (RL) problems.
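
For context, here is a minimal sketch of what neuroevolution looks like: a population of weight vectors is evaluated, the fittest are kept, and mutated copies form the next generation; no gradients and no backpropagation are involved. This is a generic genetic-algorithm loop on a toy regression task, not Automatski's algorithm, and the population size, mutation scale and fitness function are illustrative assumptions.

import numpy as np

# Neuroevolution sketch: evolve the weights of a (trivial, linear) model
# by selection and mutation only, with no gradients and no backpropagation.
# Fitness is the negative mean squared error on a toy regression task.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_w = np.array([1.0, -1.0, 2.0, 0.5])
y = X @ true_w

def fitness(w):
    return -np.mean((X @ w - y) ** 2)

pop_size, n_elite, sigma = 50, 10, 0.1      # illustrative hyperparameters
population = [rng.normal(size=4) for _ in range(pop_size)]

for generation in range(200):
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:n_elite]               # keep the fittest individuals
    offspring = [elites[rng.integers(n_elite)] + sigma * rng.normal(size=4)
                 for _ in range(pop_size - n_elite)]
    population = elites + offspring         # next generation

best = max(population, key=fitness)
print(best)    # approaches true_w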

GA methods give very poor results in comparison to the globally optimal, theoretically best possible solution.

You can still read more about Neuroevolution, a.k.a. GA methods to train Neural Networks, here…

But for us @ Automatski

It’s time to light a cigar. Sorry OpenAI et al., we have a breakthrough that’s unmatched by anything that’s out there.

About us

This is our website http://automatski.com