Deep Learning Demystified



Machine Learning is one of the hottest buzzwords today, and Deep Learning is an extremely important branch of this field. Deep Learning deals with algorithms inspired by the most complex computational device known to man: the human brain. These algorithms scale well, which means we can, in theory, expect performance to improve as we feed in more data.

AI vs ML vs DL

There is a common misconception that Artificial Intelligence, Machine Learning and Deep Learning all mean the same thing. They are in fact different, although closely related to each other.

Venn Diagram showing relationship between AI, ML and DL

This picture depicts the differences between these three fields.

Artificial Intelligence is the field of science which deals with emulating human intelligence on “artificial” machines.

Machine Learning is a subset of AI, and provides a means to achieve artificial intelligence.

Deep Learning is a subset of ML, and implements learning by modelling the way our brains learn.

Deep Learning: A brief history

Development of Deep Learning (1943-Present)

Deep Learning has been in existence for quite a while now, and can be traced back to as early as 1943, when Walter Pitts and Warren McCulloch created a computational model which mimicked the neural networks in our brain. Since then, an immense amount of research and development has been done, from the development of the backpropagation algorithm to the advent of Convolutional Neural Networks (CNNs) and models like Generative Adversarial Networks (GANs) today. Deep Learning has come a long way, and today makes a strong case to be the present and future of technology.

What’s changed?

By looking at the timeline above, we can observe a very interesting point: Deep Learning has been in existence since the 1940s, but has been extensively used only now, in the 21st century. Why is this so?

This sudden resurgence of an existing methodology is analogous to old wine in new bottles. Three major reasons contribute to this, namely:

  1. Explosion of Data
  2. Speed and power of computation
  3. Low cost of computing

These advances have enabled us to begin utilizing the true potential of these scalable algorithms.

What is Deep Learning?

So the question that arises is this: what is Deep Learning?

Deep Learning is a branch of Machine Learning that deals with learning patterns in data extremely efficiently.

It does so by utilizing algorithms which learn in a way that mimics the way our brains work. This means that given large input data, algorithms are able to understand it better and respond accordingly.

Models built using Deep Learning can achieve state-of-the-art accuracy, and have sometimes even surpassed human-level performance on specific tasks.

Hype or Reality?

Now we need to consider whether Deep Learning is all hype, or whether it actually has a meaningful place in today’s world.

This is a chart depicting how Google has been using Deep Learning in their projects.

As we can see, there were almost no listed directories in 2012, but there were close to 5000 listed directories by the end of Q4 2016. As of today, Google uses Deep Learning in all of its services, and has invested around $3.9 billion in the development of AI.

This makes a very strong case for Deep Learning and AI to be a reality which is going to dominate over the years to come.

Deep Learning: Basics

Now let’s get into some technical details.

A central idea of Deep Learning is to eliminate the need for Feature Engineering. Feature Engineering refers to deciding by hand which data points or features to look at when trying to infer a pattern from the given data. This manual approach works for simple datasets with only a few parameters to consider, but does not scale to most complex real-world datasets, which require in-depth analysis.

Feature Engineering eliminated in Deep Learning

Deep Learning handles this problem quite elegantly, as the learning algorithm works out what to look at on its own and accounts for the relevant features in that way. This reduces the amount of pre-processing time considerably, as there is no need to perform feature extraction. This particular property of deep learning has enabled solutions to many complex problems.

The Neuron

Since Deep Learning models the working of the human brain, it’s only natural that the simplest units of a deep learning model mimic our neurons.

These artificial neurons are trained to filter and detect specific features (patterns). Each one receives a weighted input, transforms it using an activation function (usually non-linear, to accommodate complex data) and passes the result along its outgoing connections towards the output.
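To make the description above concrete, here is a minimal sketch of a single artificial neuron in plain Python. The sigmoid activation, the bias term and all the numbers are assumptions picked for illustration; the post itself does not prescribe any of them.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a non-linear activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)

# Hypothetical inputs, weights and bias, chosen only for illustration.
output = neuron(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.3, -0.2], bias=0.1)
print(output)
```

Training a network amounts to adjusting `weights` and `bias` so that outputs like this one move closer to the desired values.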

Biological Neuron vs Artificial Neuron

Architecture

Image classifier

The above image represents an artificial neural network model developed to classify input images, where the layers of the image classifier are mapped to the regions of our brain responsible for performing this task. The input image gets transformed and weighted at each stage, and the output of the final layer is the prediction.

As we can see, a deep neural network consists of a hierarchy of layers.

Each layer transforms the input data into a more abstract form. This helps the model isolate the relevant data points to look at and discard the features which do not affect the output of the classification.

The output layer combines the transformations of all previous layers and provides a net result or prediction.

Simple Neural Network

The previous image showed a deep neural network, but this figure shows a much simpler implementation.

The general architecture of a simple artificial neural network includes one input layer, one output layer and one or more hidden layers in between. These middle layers are called hidden because we cannot see the incoming and outgoing weights and feature matrices. We are only able to look at the input and the prediction that comes out of the output layer. Each of these layers is made up of individual units called nodes or neurons.

There is a progressive extraction of higher-level features as the data moves through the network, and the final output layer makes a decision about what exactly the input is trying to show us. The higher the number of layers, the higher the feature-learning capacity. However, increasing the number of layers is not always the way to get a better prediction. The architecture of such neural networks depends on their application and on the data being fed into them.
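The layered structure described above can be sketched in a few lines of plain Python. This is only an illustration under assumed choices: the layer sizes (3 inputs, 4 hidden neurons, 2 outputs), the sigmoid activation and the random, untrained weights are all made up for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One layer: every neuron takes a weighted sum of all inputs, then applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def random_layer(n_inputs, n_neurons, rng):
    """Create untrained random weights and zero biases for one layer (illustrative only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def forward(inputs, layers):
    """Pass the input through each layer in turn; the last activation is the prediction."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation

rng = random.Random(0)
# 3 inputs -> hidden layer of 4 neurons -> output layer of 2 neurons
layers = [random_layer(3, 4, rng), random_layer(4, 2, rng)]
prediction = forward([0.2, 0.7, -0.5], layers)
print(prediction)
```

Adding another entry to `layers` makes the network deeper. Since the weights here are random, the prediction is meaningless until the network has been trained.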

Training Process

Now we need to look at how the model learns.

Learning is done by generating an error signal which measures the difference between the actual and the predicted data. This error signal is used to correct the values of the parameters (weights) to increase the accuracy of the prediction.

The input data is propagated in a forward direction, but the errors are propagated and corrected in a backward direction.

One cycle of forward propagation of the input and backward propagation of the error over the entire training dataset is called one epoch. The model learns by running multiple epochs, reducing its error and improving its predictions as it goes.

The error itself is measured by a function called the cost function.
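Here is a small sketch of that training loop, assuming the simplest possible model (a single linear neuron), a mean squared error cost function and plain gradient descent. The toy data, the learning rate and the number of epochs are made up for the example; real networks apply the same idea across many layers.

```python
def predict(x, w, b):
    return w * x + b

def cost(data, w, b):
    """Mean squared error between the predictions and the actual targets."""
    return sum((predict(x, w, b) - y) ** 2 for x, y in data) / len(data)

def train(data, learning_rate=0.05, epochs=2000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Forward pass: compute the error signal for every example.
        errors = [predict(x, w, b) - y for x, y in data]
        # Backward pass: gradients of the cost with respect to each parameter.
        grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, data)) / n
        grad_b = 2 * sum(errors) / n
        # Update: move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (made up for this example).
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w, b = train(data)
print(w, b, cost(data, w, b))
```

After training, `w` and `b` should end up close to 2 and 1, the values used to generate the data, and the cost should be close to zero.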

Generative Adversarial Networks (GANs)

A very interesting application of Deep Learning is in the use of Generative Adversarial Networks.

GANs are essentially a class of algorithms implemented by a system of two neural network models, the Generator and the Discriminator, contesting against each other in a zero-sum game framework.

Working of GANs

GANs work in an adversarial setting. There are two main components: a Generator and a Discriminator.

The Generator takes random noise as input and tries to generate a data sample. In the figure above, the Generator G takes an input z sampled from the probability distribution p(z) and generates data G(z), which is then sent to the Discriminator D. The Generator tries its best to fool the Discriminator.

The job of the Discriminator is to predict whether the data it receives is authentic or fake (generated). Its input x comes either from pdata(x), our real data distribution, or from the Generator. D(x) then solves this classification problem using a sigmoid function, giving an output in the range 0 to 1, where 0 represents completely fake and 1 represents completely authentic.

The Generator tries its best to trick the Discriminator, and the Discriminator tries its best to catch this deceit. This continuous game leads to some really interesting results, and this is why GANs are one of the most exciting and promising new ideas in the field of Deep Learning.

The most exciting thing about GANs is that we can now generate data, as opposed to just visualizing and inferring from it.
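The tug of war described above can be illustrated with the loss terms from the standard GAN formulation, which is assumed here since the post does not spell it out: the Discriminator tries to maximize log D(x) + log(1 - D(G(z))), while the Generator tries to minimize log(1 - D(G(z))). The sketch below evaluates these terms for a single real and a single generated sample; the Discriminator outputs used are hypothetical numbers, not the result of training real networks.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when D scores real data near 1 and generated data near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The term log(1 - D(G(z))): the Generator wants this as low as possible."""
    return math.log(1.0 - d_fake)

# Hypothetical Discriminator outputs, chosen only for illustration.
# Case 1: the Discriminator is winning (generated sample scored as mostly fake).
d_loss_winning = discriminator_loss(d_real=0.9, d_fake=0.1)
g_loss_losing = generator_loss(d_fake=0.1)

# Case 2: the Generator is fooling the Discriminator (generated sample scored as mostly real).
d_loss_fooled = discriminator_loss(d_real=0.9, d_fake=0.8)
g_loss_fooling = generator_loss(d_fake=0.8)

print(d_loss_winning, g_loss_losing)
print(d_loss_fooled, g_loss_fooling)
```

When the Generator fools the Discriminator, the Generator's loss drops and the Discriminator's loss rises. One side's gain is the other side's loss, which is what makes the game zero-sum.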

Applications of GANs

GANs can be applied to solve many interesting problems, such as image retrieval for historical archives, translation of text to images and generation of datasets. An extremely important application could be in the field of medicine.

Drug Discovery

GANs are usually applied to image or video data. However, some researchers from Insilico Medicine have proposed an approach to drug discovery using GANs.

Their proposed approach goes something like this. The Generator is first trained on samples of drug candidates for a specific disease, drawn from a database of existing drugs.

Once trained, the Generator should output a drug candidate which could pass as a cure for the disease being sampled. The Discriminator can then be used to determine whether this candidate is authentic or fake, that is, whether it actually cures the specific disease or not.

With this approach, we can, in theory, generate drug candidates for previously incurable diseases.

Model Success Criteria

There are a few rules of thumb which help any model you try to develop become successful. They are:

1. Large dataset of good quality: A wise man once said, “A model is only as good as the data it can see.” This is extremely noteworthy, as a well-structured dataset can be the difference between an accurate prediction and a completely inaccurate one.

2. Measurable and describable goals: We must be able to define some cost function to optimize. This cost function should reflect the goal we want to reach. If the wrong cost function is chosen, the prediction is bound to be inaccurate.

3. Enough computing power: Deep Learning algorithms involve a large number of complex matrix operations and calculations. This is extremely taxing on computing resources, and adequate GPU and CPU power is needed to train and test models effectively.

4. Specific tasks: Deep Learning works brilliantly for specific tasks involving fundamental units (e.g. pixels, words) which have very little meaning on their own, but start to carry useful meaning when combined in some fashion.

Tools in use: All Open Source

There are many tools used by developers to implement Deep Learning algorithms, and almost all of them are open source, making them extremely accessible to anyone interested in developing their own models for any purpose. Given below is a chart of the tools currently in use, along with popularity ratings:

Open Source Tools

Takeaways

The key takeaways from this post are as follows:

1. We have machines today that are capable of learning to represent the world from experience.

2. Deep Learning isn’t magic! It’s just statistics in a black box, but it’s extremely effective at learning patterns.

3. We still haven’t figured out how to emulate creativity and human empathy.

4. Transitioning from research to consumer products will make the tools we use every day work better, faster and smarter.

Source: Deep Learning on Medium