Source: Deep Learning on Medium
Neural Network All Fundamentals
I understand it now and you should too — !!!
Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated.
A neural network is composed of four principal objects:
- Layers: all the learning occurs in the layers. There are three types of layers: 1) input, 2) hidden and 3) output
- Features and labels: the input data to the network (features) and the output from the network (labels)
- Loss function: the metric used to estimate the performance of the learning phase
- Optimizer: improves the learning by updating the knowledge (the weights and biases) in the network
Let’s go deeper to understand how things work:
Weights are used to connect each neuron in one layer to every neuron in the next layer. A weight determines the strength of the connection between two neurons: if we increase the input, the weight tells us how much influence that change has on the output. A weight near zero means that changing this input will barely change the output. Some training methods push such weights to exactly zero in order to simplify the network.
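To make the effect of a weight concrete, here is a minimal sketch of a single neuron's weighted sum. The function name `weighted_sum` and the numbers are made up for illustration; they are not from any library.

```python
def weighted_sum(inputs, weights):
    """Return the sum of each input multiplied by its connection weight."""
    return sum(x * w for x, w in zip(inputs, weights))


# Hypothetical weights: the first connection is strong, the second is near zero.
weights = [0.8, 0.001]

base = weighted_sum([1.0, 1.0], weights)
bump_first = weighted_sum([2.0, 1.0], weights)   # raise input 1 by one unit
bump_second = weighted_sum([1.0, 2.0], weights)  # raise input 2 by one unit

# How much the output moved in each case.
change_first = bump_first - base
change_second = bump_second - base
print(change_first, change_second)
```

Raising the first input moves the output by roughly its weight, while the same change on the second input is almost invisible, which is why near-zero weights are good candidates for removal.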
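Bias in machine learning, in the statistical sense, is defined as the phenomenon of observing results that are systematically prejudiced due to faulty assumptions. However, without assumptions, an algorithm would perform no better on a task than if the result were chosen at random, a principle that Wolpert formalized in 1996 into what we call the No Free Lunch theorem. Inside a neural network, the same word also names the bias term b: a learnable constant added to a neuron's weighted sum of inputs, which shifts the neuron's output independently of the input.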
A layer is where all the learning takes place. Inside a layer there is a finite, though often very large, number of weights (one for each connection between neurons). A typical neural network is built from densely connected layers (also called fully connected layers), which means every input to the layer is connected to every output of the layer.
A typical neural network takes a vector of inputs and a scalar that contains the label. The most comfortable setup is binary classification with only two classes: 0 and 1.
Features and Labels
A feature is a piece of information that you draw from the data, and the label is the tag you want to assign to the input based on the features you draw from it. Features help in assigning a label. Thus, the better the features, the more accurately you will be able to assign a label to the input.
Activation functions are mathematical equations that determine the output of each neuron, and therefore of the neural network. The function is attached to each neuron in the network and determines whether it should be activated (“fired”) or not, based on whether the neuron’s input is relevant for the model’s prediction. Some activation functions also normalize the output of each neuron to a range between 0 and 1 (sigmoid) or between -1 and 1 (hyperbolic tangent); others, such as the Rectified Linear Unit, are not bounded above.
Some examples of activation functions: sigmoid, hyperbolic tangent (tanh) and Rectified Linear Unit (ReLU).
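As a rough sketch, three common activation functions (the ones named later in this article) can be written in a few lines of plain Python. This is only an illustration using the standard `math` module, not how a deep learning library implements them.

```python
import math


def sigmoid(z):
    """Squash z into the open interval (0, 1).

    Note: for very large negative z, math.exp(-z) can overflow;
    that is acceptable for a small sketch like this one.
    """
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squash z into the open interval (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Pass positive values through unchanged and clip negatives to 0."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```

Sigmoid and tanh are bounded, while ReLU is only bounded below, so the “normalizing” effect depends on which function you pick.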
A loss function is used to optimize the parameter values in a neural network model. Loss functions map a set of parameter values for the network onto a scalar value that indicates how well those parameters accomplish the task the network is intended to do.
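For the binary classification setup mentioned earlier (classes 0 and 1), one common choice of loss is binary cross-entropy. The sketch below assumes the network has already turned its parameters into predicted probabilities; the function name, the `eps` clipping value and the sample numbers are illustrative assumptions.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Return the mean binary cross-entropy as a single scalar.

    y_true holds labels (0 or 1); y_pred holds predicted probabilities.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the prediction so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1]
good = binary_cross_entropy(labels, [0.9, 0.1, 0.8])  # mostly right
bad = binary_cross_entropy(labels, [0.2, 0.7, 0.3])   # mostly wrong
print(good, bad)
```

The predictions that agree with the labels get the smaller scalar, which is exactly the signal the training process tries to push down.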
Optimization algorithms help us minimize (or maximize) an objective function E(x) (another name for the error function), which is simply a mathematical function of the model’s internal learnable parameters. Those parameters are used to compute the target values (Y) from the set of predictors (X) used in the model. For example, the weights (W) and the bias (b) values of a neural network are its internal learnable parameters: they are used to compute the output values, and the training process learns and updates them in the direction of the optimal solution, i.e. the one that minimizes the loss.
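The simplest optimizer is plain gradient descent: move each parameter a small step against its gradient. Here is a minimal sketch on a one-weight, one-bias linear model with a mean squared error objective. The model, the data and the learning rate are assumptions chosen to keep the example tiny; real networks apply the same update to many more parameters.

```python
def mse(w, b, xs, ys):
    """Mean squared error E of the model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent_step(w, b, xs, ys, learning_rate):
    """Return (w, b) after one step against the gradient of the MSE."""
    m = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / m
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / m
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
loss_before = mse(w, b, xs, ys)
for _ in range(2000):
    w, b = gradient_descent_step(w, b, xs, ys, learning_rate=0.05)
loss_after = mse(w, b, xs, ys)
print(w, b, loss_before, loss_after)
```

After enough steps the learned W and b approach the values that generated the data, and the loss shrinks accordingly.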
In forward propagation, the input X provides the initial information, which then propagates to the hidden units at each layer and finally produces the output ŷ. Choosing the architecture of the network means determining its depth, its width, and the activation functions used on each layer. Depth is the number of hidden layers. Width is the number of units (nodes) on each hidden layer; both are defined over the hidden layers because we control neither the input layer nor the output layer dimensions. There are quite a few activation functions to choose from, such as Rectified Linear Unit, sigmoid, hyperbolic tangent, etc. In practice, deeper networks often outperform shallower networks with more hidden units per layer, so it is usually worth trying a deeper network, although with diminishing returns.
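The forward pass described above can be sketched in plain Python as a loop over layers. The helper names (`dense`, `forward`) and every weight value below are hypothetical; the network has two inputs, one hidden layer of width 3 and a single sigmoid output.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def forward(x, layers):
    """Propagate x through every layer in order and return the output."""
    a = x
    for weights, biases, activation in layers:
        a = dense(a, weights, biases, activation)
    return a


# Hypothetical network: depth 1 (one hidden layer), width 3.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], relu),
    ([[0.7, -0.5, 0.2]], [0.05], sigmoid),
]

y_hat = forward([1.0, 2.0], layers)
print(y_hat)
```

Making the network deeper means appending more entries to `layers`; making it wider means adding rows to a hidden layer's weight list.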
Back-propagation allows the information to flow from the cost backward through the network in order to compute the gradient. To do that, we loop over the nodes starting at the final node, in reverse topological order, and compute the derivative of the final node output with respect to each edge’s tail node. Doing so tells us which parameters are responsible for the most error, so that we can change them in the direction that reduces it. The derivatives of the cost with respect to each layer’s weights W^l and bias b^l are what we need to write the back-propagation functions. Since b^l is always a vector, the sum in its derivative runs across each row (since each column is an example).
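As a hedged illustration of those derivatives, the sketch below back-propagates through a single sigmoid output layer trained with mean binary cross-entropy, using the same layout as the text (each column of X is one example). It relies on the standard result that, for this particular pairing, the error at the layer simplifies to A - Y. The function names and the data are made up, and a finite-difference check is included to compare against the analytic gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, X):
    """A = sigmoid(W X + b); X has one row per feature, one column per example."""
    n_features, m = len(X), len(X[0])
    return [
        [sigmoid(sum(W[i][k] * X[k][j] for k in range(n_features)) + b[i])
         for j in range(m)]
        for i in range(len(W))
    ]


def cost(W, b, X, Y):
    """Mean binary cross-entropy over the m examples."""
    A = forward(W, b, X)
    m = len(X[0])
    return -sum(
        Y[i][j] * math.log(A[i][j]) + (1 - Y[i][j]) * math.log(1 - A[i][j])
        for i in range(len(A)) for j in range(m)
    ) / m


def backward(W, b, X, Y):
    """Return (dW, db), the gradients of the cost for this layer."""
    A = forward(W, b, X)
    n_features, m = len(X), len(X[0])
    # Error at the layer: for sigmoid + cross-entropy it is A - Y.
    dZ = [[A[i][j] - Y[i][j] for j in range(m)] for i in range(len(A))]
    # dW[i][k] = (1/m) * sum over examples j of dZ[i][j] * X[k][j]
    dW = [[sum(dZ[i][j] * X[k][j] for j in range(m)) / m
           for k in range(n_features)] for i in range(len(W))]
    # b is a vector, so each entry sums one row of dZ across its columns.
    db = [sum(row) / m for row in dZ]
    return dW, db


X = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]  # 2 features x 3 examples
Y = [[1, 0, 1]]                           # 1 output unit x 3 examples
W = [[0.2, -0.4]]
b = [0.1]

dW, db = backward(W, b, X, Y)

# Finite-difference estimate of d(cost)/d(b[0]) for comparison.
eps = 1e-6
numeric_db = (cost(W, [b[0] + eps], X, Y) - cost(W, [b[0] - eps], X, Y)) / (2 * eps)
print(db[0], numeric_db)
```

In a deeper network the same three quantities (the layer error, dW and db) are computed for every layer, moving from the output back toward the input.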
I hope I was able to clarify neural networks a little for you. I will be writing many more explanations of algorithms, because why not 🙂