Deep Learning: A Comprehensive Introduction

Let’s Get an Intuition About Today’s Most Exciting Field of Science

https://criptotendencia.com/2019/12/04/deep-learning-es-utilizado-para-predecir-el-precio-de-bitcoin/

Deep Learning

Welcome, Data Enthusiast! Throughout this very first story, we will cover:

  • Deep Learning Introduction: from Machine Learning to Deep Learning
  • Historical context
  • Our working environment: Google Colab
  • Gradient Descent

So, without further ado, let’s jump into it!

Deep Learning Introduction

From Machine Learning to Deep Learning

https://www.pinterest.es/pin/745134700823698580/

In this section we will answer the following questions:

  • What is Deep Learning?
  • What differentiates it from Machine Learning?
  • And from Artificial Intelligence?

Artificial Intelligence

AI is about getting machines to perform tasks that require human intelligence. It can be divided into two fields:

  • General AI: it consists of giving machines all our capabilities and senses. For example, C-3PO or Terminator. But do not worry, we do not expect to see anything similar to Skynet in the short term.
  • Narrow AI: it consists of providing machines with the ability to perform a certain task, such as recognizing faces, traffic signs, speech, etc. It is in this field that great progress is currently being made.

Machine Learning

Machine Learning is a sub-field within Artificial Intelligence. Basically, it consists of using a large amount of data to extract useful information for people.

For example, imagine we have the latest NBA results and we want to be able to predict the outcome of the next game between the Lakers and the Bucks.

  • First, we would need to scrape and parse the data, clean it up, remove incomplete entries, study the distribution of the variables and choose the characteristics or attributes (feature engineering) that would allow us to predict this result as accurately as possible.
  • Once we were clear about the best characteristics to use, we would need to find the best possible algorithm for our dataset.

The feature engineering (selection of the features) part, which may seem simple at first, is the most complicated and time-consuming step and, moreover, requires a high degree of knowledge about the problem in question and about feature extraction techniques.
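
To make this workflow concrete, here is a minimal sketch of the data-preparation step, assuming a hypothetical CSV of game results (the file name and column names are made up for illustration):

```python
# Hypothetical sketch of the data-preparation step described above.
# The file name and column names are made up for illustration.
import pandas as pd

games = pd.read_csv("nba_results.csv")  # load the latest results
games = games.dropna()                  # remove incomplete entries

# Feature engineering: hand-pick the attributes we believe are predictive.
features = games[["home_win_pct", "away_win_pct", "avg_point_diff"]]
target = games["home_team_won"]
```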

And this is where Deep Learning comes in.

Deep Learning

Deep Learning is a subfield of Machine Learning, and it solves the above problem of feature selection. Deep Learning, also called hierarchical learning, achieves this by applying a learning method that recognizes subtle patterns in the data, then building a mathematical model and integrating it as a final classifier.

The magic is that we no longer need to go crazy looking for the best characteristics or attributes for each problem; instead, this is done automatically by our algorithm.

And this is both the good and the bad of Deep Learning: it enables the automatic learning of very complex tasks, but it is a complete ‘black box’. In other words, we have no clue what is going on inside the algorithm during learning.

This is why, although Deep Learning models have proven to be extremely powerful, they are not yet widespread across industry. When pitching a predictive model’s output to the head officers of an organization, it is key to be able to explain why the model is making its predictions.

Therefore, currently, Machine Learning algorithms such as Decision Trees and Random Forests are far more popular than Convolutional Neural Networks or GANs.

Finally, it is important to have a large amount of data to be able to use ML or DL techniques. Moreover, the higher the quality of the data, the better our models will behave.

Let’s read what one of today’s most iconic Deep Learning speakers and teachers has to say about it.

“I think AI is akin to building a rocket ship. You need a huge engine and a lot of fuel. If you have a large engine and a tiny amount of fuel, you won’t make it to orbit. If you have a tiny engine and a ton of fuel, you can’t even lift off. To build a rocket you need a huge engine and a lot of fuel.”

Andrew Ng (source: Wired)

And Why Deep Learning?

Deep Learning receives its name because it uses a sequential method of transforming the data, one transformation at each step (it goes deep), until it manages to differentiate or classify the data as we want.
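
As a sketch of this idea, here is a minimal stack of layers in Keras (assuming TensorFlow is installed; the layer sizes and the 10 input features are made up for illustration). Each Dense layer applies one transformation to the data, and stacking several of them is what makes the network deep:

```python
from tensorflow import keras

# Each layer applies one transformation to the data; stacking several of
# them is what makes the network "deep".
model = keras.Sequential([
    keras.Input(shape=(10,)),                     # 10 input features (made up)
    keras.layers.Dense(64, activation="relu"),    # transformation 1
    keras.layers.Dense(32, activation="relu"),    # transformation 2
    keras.layers.Dense(1, activation="sigmoid"),  # final classifier
])
model.summary()
```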

https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html

Historical Context

https://noeliagorod.com/2019/06/26/deep-learning-timeline/

You might think that DL and Neural Networks are a fad that has just come out. Nothing could be further from the truth! They have been around for a long time.

The first Artificial Neural Network (ANN) was built in 1958. The issue was that there wasn’t enough computing power available to make significant progress until the past few years, when high-performance GPUs became popular (thank you, gamers 🙂 ).

The key milestones in Deep Learning since its beginnings are:

1950: Alan Turing creates the Turing Test
1952: Arthur Samuel creates the first program that learns to play checkers, improving game after game
1956: John McCarthy coins the term “Artificial Intelligence” to refer to this new field
1958: Frank Rosenblatt designs the Perceptron, the first artificial neural network.
1967: The field of pattern recognition is born with the appearance of the “Nearest Neighbor” algorithm.

The following period is known as the AI Winter, when advances were so few and far between that there was a real risk of the field disappearing.

Some of the most relevant milestones, until the field’s resurgence in 2006, are:

1979: The Stanford Cart is created, a robot capable of automatically navigating a room by avoiding obstacles
1981: Gerald DeJong introduces “Explanation Based Learning”, the precursor of Machine Learning: the computer analyzed training data and created rules to discard less important data
1985: Terry Sejnowski invents NETtalk, an algorithm capable of learning to pronounce words the way a child would
1990s: The Machine Learning paradigm shifts from a knowledge-driven to a data-driven approach. Researchers begin to draw conclusions from large amounts of data.
1997: Deep Blue defeats Kasparov for the first time.

Since the end of this period, there has been an incredible boom in the Machine Learning and Deep Learning fields.

2006: Deep architectures appear, coined as Deep Learning by Geoffrey Hinton
2011: IBM’s Watson beats its human competitors on the Jeopardy! quiz show
2012: Geoffrey Hinton’s team wins the ImageNet contest with a deep neural network
2012: Google X creates Google Brain, capable of detecting cats in videos
2014: Facebook develops DeepFace, capable of recognizing human faces with an accuracy of 97.25%, only 0.28% below human performance
2014: Google buys DeepMind, a British startup that had created an algorithm capable of learning to play Atari games simply by watching the screen
2015: Elon Musk and Sam Altman create OpenAI to promote the good use of AI
2016: Google DeepMind’s AlphaGo wins at the game of Go for the first time, producing creative moves

Google Colab

The environment that we will use throughout this series is Google Colab. We have chosen it because of its ease of use and accessibility: anyone with a Google account can create a notebook and start using it.

Google Colab will be both our notebook and our IDE, as it allows us to take advantage of free NVIDIA K80 GPUs (wonderful, isn’t it?), which is where we will execute and train our models.

To use the GPU, all we need to do is go to Edit in the toolbar, then Notebook Settings, and select Python 3 and GPU. We save these settings and we are ready to code!
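
Once the runtime is configured, a quick sanity check confirms the GPU is visible (TensorFlow comes preinstalled on Colab):

```python
# Quick sanity check that the GPU runtime is active.
# TensorFlow comes preinstalled on Colab.
import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))  # should list at least one GPU
```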

Gradient Descent

The mechanism that enables a neural network to learn is Gradient Descent. Neural networks are trained by updating a series of weights; the more they train, the more precise these weights become and the better the solution obtained.

In other words, neural networks are a very ingenious and powerful method of optimizing functions. And they carry out this optimization through a technique called Gradient Descent, which is what has largely allowed us to train networks with millions of parameters and be able to obtain results in a reasonable time.

What is Gradient Descent?

Imagine that we are at the top of a mountain and we want to get down, but there is impenetrable fog, so we are not able to see anything.

How would we go down the mountain?

https://wallpaperaccess.com/snowy-mountain

One solution would be to take a small step in each of the 4 possible directions (North, South, East and West), choose the one in which we descend the most, and take the step in that direction.

We could carry out this process again and again, and we would eventually find ourselves at the bottom.

The problem with this approach is that it is tremendously inefficient: it would take a lot of time. A more efficient way would be to determine the slope at each point and always move in the direction of steepest descent.

This can be done by calculating the derivative. The derivative of a function is defined as:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

The first derivative of a function measures how quickly the function changes, that is, how much it grows or shrinks. So what we can do is calculate the slope at each point we reach and keep moving down that slope until we arrive at a minimum.
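
To see this definition in action, here is a tiny sketch that approximates the derivative numerically with a small h and compares it against the exact value:

```python
def f(x):
    return x ** 2  # a simple example function

def slope(f, x, h=1e-6):
    # Finite-difference approximation of the limit definition above.
    return (f(x + h) - f(x)) / h

print(slope(f, 3.0))  # ~6.0, matching the exact derivative f'(x) = 2x
```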

This simple mechanism is the core of neural networks. In short, gradient descent is an optimization algorithm that allows us to minimize any function, as long as it is differentiable (that is, as long as we can calculate its derivatives).
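
Here is a minimal sketch of the idea: starting from an arbitrary point, we repeatedly take a small step against the gradient until we settle near a minimum.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    # Walk downhill: at each step, move against the gradient (the slope).
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Minimize f(x) = x**2, whose derivative is 2x; the minimum is at x = 0.
minimum = gradient_descent(grad=lambda x: 2 * x, x0=5.0)
print(minimum)  # very close to 0.0
```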

https://www.pinterest.es/pin/572520171380230872/

In practice, gradient descent is used to minimize the cost function: the error the Neural Network obtains when comparing the predicted value with the actual one.

It does this by adjusting, at each iteration, the weights (the coefficients of the model) by a certain amount, so that the error becomes as small as possible. Thanks to the current power of GPUs, the algorithms are able to perform this task really fast, obtaining pretty good results in a very short time.
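
As a toy illustration of this weight-update loop (a single coefficient rather than a full neural network, with made-up data), here is a model fitted by minimizing the mean squared error:

```python
import numpy as np

# Toy data: y = 3x plus a little noise (the true coefficient is 3).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + rng.normal(scale=0.1, size=100)

w = 0.0  # the model's single weight
learning_rate = 0.5
for _ in range(200):
    error = w * x - y              # predicted value minus actual value
    grad = 2 * np.mean(error * x)  # gradient of the mean squared error w.r.t. w
    w -= learning_rate * grad      # adjust the weight against the gradient
print(w)  # close to 3.0
```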

Coming back to our mountain analogy, current technology is like a pair of skis: it allows us not only to get to the bottom of the mountain quickly, but also to have a good time along the way.

There are several gradient descent variations (vanilla gradient descent, batch gradient descent …) but all are based on what we have just explained.

Conclusion

This article has served as an introduction to Deep Learning. We have learned what it is, what differentiates it from Machine Learning and Artificial Intelligence, and gained an intuition of how neural networks learn.