Original article was published on Deep Learning on Medium
Beginner’s Guide to Deep Learning Concepts
Learning from experience and remembering what we have learned are skills our brain takes care of. So has anyone wondered whether a machine can think and learn like us? In a limited sense, yes: using algorithms, machines can learn from data, and for some specific tasks they can even outperform humans. This is called “Machine Learning”.
Deep Learning is a subset of Machine Learning, which is itself a subset of AI. Deep Learning can be seen as an extension of Machine Learning.
Machine Learning: Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed.
Deep Learning: Deep learning is an approach loosely inspired by how the human brain processes data. It is used for decision making, predictions, image recognition and so on. This is where neurons come in: in the brain they are the cells that transmit signals, and the neural networks we will look at later in this article are built from simplified versions of them.
Before learning Deep Learning, it is useful to understand two simpler models: “Linear Regression” and “Logistic Regression”.
Linear Regression: Linear Regression is a Machine Learning algorithm that predicts the value of a dependent variable (y) from a given independent variable (x). In other words, this regression technique finds a linear relationship between x (input) and y (output). The equation is of the form “y = ax + b”.
Geometrically, linear regression fits the straight line that passes as close as possible to the data points, usually by minimizing the sum of squared differences between the predicted and actual values of y. (Separating data points into classes is a classification task, which is what logistic regression below is for.)
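As a minimal sketch, assuming NumPy and some made-up example data, the line y = ax + b can be fitted with the closed-form least-squares solution. The helper name `fit_line` is hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    # Design matrix with a column of ones so that b is learned as the intercept
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

# Hypothetical example data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
a, b = fit_line(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 2.00, b = 1.00
```

Least squares is the usual choice for linear regression; it can also be solved iteratively with gradient descent, which is the approach neural networks use.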
Logistic Regression: Logistic Regression is a supervised classification algorithm used to predict the probability of a target variable. It is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The linear transformation (ax + b) is passed through an activation function, the sigmoid, and the sigmoid’s output (a value between 0 and 1) is the predicted probability.
Linear Regression vs Logistic Regression :
Linear Regression doesn’t use any activation function, whereas in Logistic Regression the linear transformation (ax + b) is passed through the sigmoid activation function to generate the output. As a result, linear regression outputs an unbounded real number, while logistic regression outputs a probability between 0 and 1.
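The difference can be sketched in a few lines of NumPy. The parameters a and b below are hypothetical values shared by both models purely for comparison, and the 0.5 threshold for turning a probability into a class is a common convention, not something fixed by the model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, shared by both models for comparison
a, b = 2.0, -1.0
x = np.array([-3.0, 0.5, 4.0])

linear_output = a * x + b                 # linear regression: any real number
logistic_output = sigmoid(a * x + b)      # logistic regression: probability in (0, 1)
predicted_class = (logistic_output >= 0.5).astype(int)  # threshold at 0.5

print(linear_output)     # [-7.  0.  7.]
print(predicted_class)   # [0 1 1]
```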
Now we will go into the main concepts of Deep Learning, which include neurons, neural networks, weights, bias, the cost function, back propagation, hyperparameters, gradient descent and so on.
1) Neurons: Neurons are the basic building blocks of neural networks. A neuron is simply a unit that receives information and passes it on to the next layer. In a neural network, the input data is first passed to the neurons of the input layer, which is where the network starts.
In the architecture shown above, the input layer and the hidden layers have nodes; these nodes are the neurons, which take in data and pass it on.
2) Neural Network: A neural network is a collection of neurons organized into layers: an input layer, one or more hidden layers, and an output layer.
3) Bias: Bias is an additional parameter, alongside the weights, that is added to the weighted sum of the inputs (X * W) to adjust the neuron’s output. The bias is a constant offset (it does not depend on the input, but it is learned during training) that helps the model fit the given data as well as possible.
4) Weights: A weight is a parameter in a neural network that transforms the input data. Weights are usually denoted by ‘W’; there is no fixed notation, but ‘W’ is the common choice. We multiply the input data by the weights and add the bias. If ‘X’ is the input data, ‘W’ the weights and ‘b’ the bias, the value passed to the hidden layer is “activation function(X * W + b)”, where the activation function can be tanh, sigmoid and so on. Typically we initialize the weights and bias with random numbers and then improve them iteratively through a procedure called back propagation, which we will look at below.
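A minimal sketch of “activation function(X * W + b)” for one layer, assuming NumPy, tanh as the activation, and hypothetical layer sizes. Here the biases start at zero, a common alternative to random initialization; the weights are small random numbers.

```python
import numpy as np

def layer_forward(X, W, b, activation=np.tanh):
    """Compute activation(X * W + b) for a whole batch of inputs at once."""
    return activation(X.dot(W) + b)

rng = np.random.default_rng(0)
n_inputs, n_hidden = 3, 4                               # hypothetical layer sizes
W = rng.standard_normal((n_inputs, n_hidden)) * 0.1     # small random initial weights
b = np.zeros(n_hidden)                                  # biases often start at zero
X = rng.standard_normal((5, n_inputs))                  # 5 hypothetical samples, one per row

hidden = layer_forward(X, W, b)
print(hidden.shape)  # (5, 4): one output per sample per hidden neuron
```

Each row of X is one sample, so the matrix product computes the weighted sums for all samples and all hidden neurons in one step.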
In a neural network, the model is trained using a procedure called back propagation.
If you have trained models with a machine learning library, you usually just call a method of the model’s API such as “train”; for neural networks, that method uses back propagation internally. If you look at the source code of such a method, you will find the back propagation code inside it.
5) Cost Function: The cost function measures how well a neural network performs on its training samples compared with the expected outputs. It is the sum (or average) of the individual errors. There are various formulas for the cost function; some of them are:
Cross-entropy (multiclass): Cost = -sum(True Labels * log(Predicted Labels))
Negative log-likelihood (binary cross-entropy): Cost = -sum(True Labels * log(Predicted Labels) + (1 - True Labels) * log(1 - Predicted Labels))
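Both costs translate directly into NumPy. This is a sketch assuming one-hot encoded labels for the multiclass case and 0/1 labels for the binary case; the small clipping constant `EPS` is an added assumption that keeps log(0) from producing infinities.

```python
import numpy as np

EPS = 1e-12  # avoids log(0) when a prediction is exactly 0 or 1

def cross_entropy(true_labels, predicted):
    """Multiclass cost: -sum(true * log(predicted)), labels one-hot encoded."""
    return -np.sum(true_labels * np.log(np.clip(predicted, EPS, 1.0)))

def binary_cross_entropy(true_labels, predicted):
    """Binary cost (negative log-likelihood) for 0/1 labels."""
    p = np.clip(predicted, EPS, 1 - EPS)
    return -np.sum(true_labels * np.log(p) + (1 - true_labels) * np.log(1 - p))

# Predicting 0.5 for both samples costs 2 * ln 2
print(binary_cross_entropy(np.array([1, 0]), np.array([0.5, 0.5])))  # ≈ 1.386
```

The cost is 0 for perfect predictions and grows as the predicted probability of the true label approaches 0.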
6) Back Propagation: As mentioned above, neural networks are trained with back propagation; now let’s discuss the main idea behind it.
Back propagation is the procedure for training the model in which the randomly initialized weights and biases are adjusted, using the derivatives of the cost with respect to the weights and biases, so that the cost decreases towards its minimum. When everything in your code is correct and the hyperparameters you have chosen suit your network, the cost will decrease over the training iterations as in the figure below.
7) Hyper Parameters: Hyperparameters are settings that we choose before training instead of learning them from the data. The four main ones covered here are:
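i) Learning Rate: The learning rate is a hyperparameter that controls how much we adjust the weights of the network at each step. In back propagation, the learning rate multiplies the derivatives of the cost with respect to the weights and the bias, and the result is subtracted so that the parameters move downhill on the cost: w -= Learning rate * derivative_w, b -= Learning rate * derivative_b. (The update formulas below use += because their error term (Target - predicted) already carries the minus sign.) For good performance, the learning rate has to be small enough for the cost to decrease steadily; if it is too small, however, training becomes very slow.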
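The effect of the learning rate is easiest to see on a hypothetical one-parameter toy cost, (w - 3)^2, whose derivative is 2 * (w - 3). This sketch runs plain gradient descent with two learning rates: a small one that settles at the minimum and a large one that overshoots further on every step.

```python
def gradient_descent(learning_rate, steps=100, w=0.0):
    """Minimize the toy cost (w - 3)^2, whose derivative is 2 * (w - 3)."""
    for _ in range(steps):
        derivative_w = 2 * (w - 3)
        w -= learning_rate * derivative_w   # step against the gradient
    return w

print(gradient_descent(0.1))   # very close to the minimum at w = 3
print(gradient_descent(1.1))   # diverges: each step overshoots by more than the last
```

With learning rate 0.1, the distance to the minimum shrinks by a factor of 0.8 per step; with 1.1, it grows by a factor of 1.2 and flips sign, which is the same blow-up that turns a network’s cost into ‘INF’ or ‘NaN’.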
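ii) Regularization: One of the problems when fitting data is overfitting, where the model fits the training data so closely that it performs poorly on new data. This problem can be reduced with regularization, for example by adding a penalty on large weights to the cost.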
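One common form of regularization is L2 regularization (weight decay), sketched below under the assumption that the penalty is lam * sum(w^2); `lam` is a hypothetical name for the regularization-strength hyperparameter. The penalty raises the cost of large weights, and its derivative adds 2 * lam * w to each weight’s gradient.

```python
import numpy as np

def l2_regularized_cost(data_cost, weights, lam):
    """Add the L2 penalty lam * sum(w^2) to an existing cost value."""
    return data_cost + lam * np.sum(weights ** 2)

def l2_regularized_gradient(data_gradient, weights, lam):
    """Gradient of the penalized cost: the penalty contributes 2 * lam * w."""
    return data_gradient + 2 * lam * weights

w = np.array([3.0, -4.0])
print(l2_regularized_cost(1.0, w, lam=0.1))   # 1.0 + 0.1 * 25 = 3.5
```

Because the extra gradient term always points towards zero, gradient descent keeps shrinking the weights slightly, which discourages overly complex fits.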
iii) Hidden Layers: A hidden layer in a neural network is a layer between the input layer and the output layer, where neurons take in a set of weighted inputs and produce an output through an activation function. The number of hidden layers, and the number of neurons in each, are hyperparameters.
iv) Activation Function: An activation function is a non-linear transformation that produces a neuron’s output. The linear transformation, i.e. the weighted sum plus bias, is passed through the activation function; without it, a stack of layers would collapse into a single linear transformation and could not represent highly complex mappings. Some of the activation functions used in neural networks are Tanh, Sigmoid, ReLU and Softmax.
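To see why the non-linearity matters, this sketch (assuming NumPy and randomly chosen hypothetical weights) checks that two layers with no activation function between them compute exactly the same thing as one linear layer, because (X W1 + b1) W2 + b2 = X (W1 W2) + (b1 W2 + b2).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))
W1, b1 = rng.standard_normal((3, 5)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((5, 2)), rng.standard_normal(2)

# Two layers with no activation function in between...
two_linear_layers = (X.dot(W1) + b1).dot(W2) + b2
# ...equal a single linear layer with W = W1 W2 and b = b1 W2 + b2
one_linear_layer = X.dot(W1.dot(W2)) + (b1.dot(W2) + b2)

print(np.allclose(two_linear_layers, one_linear_layer))  # True
```

Putting a non-linear function such as tanh around X.dot(W1) + b1 breaks this identity, which is what lets deeper networks model more than a straight-line relationship.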
Sigmoid Function Formula: y = 1 / (1 + e^-x)
Softmax Function Formula: a = exp(x), y = a / a.sum(axis = 1, keepdims = True)
where x is the data, let’s say the weighted sum + bias, with one row per sample.
ReLU (Rectified Linear Unit) Activation Function: y = max(0, x)
Tanh (hyperbolic tangent) Activation Function: y = tanh(x) = (e^x - e^-x) / (e^x + e^-x)
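The four formulas map onto short NumPy functions. This is a sketch assuming a 2-D input with one row per sample (needed for the row-wise softmax). The softmax subtracts each row’s maximum before exponentiating, which gives the same result but avoids overflow; `tanh` follows the formula literally for illustration, while `np.tanh` is the safer built-in for large inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Literal formula; np.tanh is preferable in practice (no overflow for large |x|)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    # Subtracting each row's max leaves the result unchanged but keeps exp() finite
    a = np.exp(x - x.max(axis=1, keepdims=True))
    return a / a.sum(axis=1, keepdims=True)

x = np.array([[-2.0, 0.0, 2.0]])
print(relu(x))                 # [[0. 0. 2.]]
print(softmax(x).sum(axis=1))  # [1.]
```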
We adjust the weights and biases during back propagation as follows, for a network with one hidden layer whose output layer uses sigmoid (or softmax) with the cross-entropy cost above; in that case the error at the output is simply (Target - predicted).
For updating w2, which is the weight from the hidden layer to the output layer:
w2 += Learning Rate * Z.T.dot(Target - predicted)
For updating b2, which is the bias from the hidden layer to the output layer:
b2 += Learning Rate * (Target - predicted).sum(axis = 0)
For updating w1, which is the weight from the input layer to the hidden layer, first compute dZ (using the value of w2 from before its update):
For Sigmoid Activation Function: dZ = (Target - predicted).dot(w2.T) * Z * (1 - Z)
For Tanh Activation Function: dZ = (Target - predicted).dot(w2.T) * (1 - Z * Z)
For ReLU Activation Function: dZ = (Target - predicted).dot(w2.T) * (Z > 0)
(With a single output neuron and a 1-D w2, use numpy.outer(Target - predicted, w2) in place of (Target - predicted).dot(w2.T).)
dZ is the error at the hidden nodes. So the final weight update formula is:
w1 += Learning Rate * X.T.dot(dZ)
For updating b1, which is the bias from the input layer to the hidden layer:
b1’s dZ is the same as w1’s dZ, with its respective activation function.
So the final bias update formula is:
b1 += Learning Rate * dZ.sum(axis = 0)
where X is the input, Z is the output of the neurons in the hidden layer, Target is the actual labels and predicted is the predicted labels.
These updates of the weights and biases are nothing but “gradient descent”.
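Putting the update rules together, here is a minimal sketch of a one-hidden-layer network trained on the XOR problem. It assumes NumPy, a sigmoid hidden layer, a single sigmoid output neuron (so w2 is 1-D and numpy.outer is used), and the binary cross-entropy cost. The function name `train_xor` and its default hyperparameters are hypothetical choices for illustration, not tuned values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_xor(hidden=4, learning_rate=0.1, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Target = np.array([0, 1, 1, 0], dtype=float)
    w1 = rng.standard_normal((2, hidden))   # random initial weights
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal(hidden)
    b2 = 0.0
    costs = []
    for _ in range(epochs):
        Z = sigmoid(X.dot(w1) + b1)           # hidden layer output, shape (4, hidden)
        predicted = sigmoid(Z.dot(w2) + b2)   # output probability, shape (4,)
        p = np.clip(predicted, 1e-12, 1 - 1e-12)
        costs.append(-np.sum(Target * np.log(p) + (1 - Target) * np.log(1 - p)))
        error = Target - predicted
        dZ = np.outer(error, w2) * Z * (1 - Z)  # error at hidden nodes, uses old w2
        w2 += learning_rate * Z.T.dot(error)
        b2 += learning_rate * error.sum(axis=0)
        w1 += learning_rate * X.T.dot(dZ)
        b1 += learning_rate * dZ.sum(axis=0)
    return costs, predicted

costs, predicted = train_xor()
print(costs[0], costs[-1])   # the cost should be lower after training
print(predicted.round(2))
```

Note that dZ is computed before w2 is updated, since the error at the hidden nodes must be propagated through the weights that produced the current prediction. Whether the network fully solves XOR can depend on the random initialization and hyperparameters.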
So the question that comes to mind is: how do we choose the hyperparameters? There is no particular method that finds them for you. Depending on the input and the type of classification you are doing, you have to use trial and error for the learning rate, the number of hidden layers and so on. The configuration of hyperparameters that gives the best accuracy and classification rate (ideally measured on data held out from training) is the one to use.
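The trial-and-error process can be automated with a simple search: train once per candidate value and keep the one with the lowest error on held-out data. This sketch assumes NumPy, hypothetical noisy linear data, a plain gradient-descent linear model standing in for a network, and an arbitrary list of candidate learning rates; `train_linear` and `choose_learning_rate` are hypothetical helper names.

```python
import numpy as np

def train_linear(x, y, learning_rate, epochs=200):
    """Fit y = a*x + b with gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        error = (a * x + b) - y
        a -= learning_rate * 2 * np.mean(error * x)
        b -= learning_rate * 2 * np.mean(error)
    return a, b

def choose_learning_rate(x_train, y_train, x_val, y_val, learning_rates):
    """Try each learning rate and keep the one with the lowest validation error."""
    val_error = {}
    for lr in learning_rates:
        a, b = train_linear(x_train, y_train, lr)
        val_error[lr] = np.mean((a * x_val + b - y_val) ** 2)
    best = min(val_error, key=val_error.get)
    return best, val_error

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 2 * x + 1 + rng.normal(0, 0.1, 100)   # hypothetical noisy data
best, val_error = choose_learning_rate(x[:80], y[:80], x[80:], y[80:], [0.001, 0.01, 0.1])
print(best, val_error)
```

The same loop extends to other hyperparameters (for example, the number of hidden neurons) by iterating over combinations, at the cost of one training run per combination.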
For binary classification we use the “Sigmoid Activation Function” in the output layer, and for multiclass classification we use the “Softmax Activation Function”.
As mentioned above, the learning rate should be small: if it is too high, the cost can blow up to ‘INF’ or become ‘NaN’.
Some of the popularly used Neural Network Architectures are:
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Long Short-Term Memory (LSTM)
- Boltzmann Machines
- Deep Belief Networks
If you want to understand how these neural networks work, you need to practice and write code; without practice they are difficult to understand. So I have put together a list of free datasets for you to practice with.
You can also see how neural networks behave with different hyperparameters in Google’s Neural Network Playground (TensorFlow Playground), an interactive visualization of neural networks that runs in your browser.
FINAL NOTES :
I have explained the concepts in simple terms so that they don’t get complicated, are easier to understand, and hopefully get you interested in learning more. I hope I contributed some knowledge and that you have learnt something! If so, I am very happy to have published this article. Any sort of feedback is appreciated. Thanks, folks!