Neural Networks in Deep Learning and Image Classification

Original article was published on Deep Learning on Medium



The term “neural network” in machine learning is inspired by the biological neuron. The diagram above shows labeled hidden layers and a final output layer. A real neural network can have many more hidden layers than the diagram shows.

Each node can be thought of as a ‘cell’ with ‘nerves’ that connect it to nodes in the next layer. Each of these connections has a ‘weight’ associated with it. Once an image has been passed through the network, each node holds a specific value.

The first time the computer sees an image, for example during training, it sends the pixel data through the neural network and produces an output: the network’s own numeric representation of what the image is. In our case, a black-and-white number recognition system, we train the machine to recognize the image as a four (4).

Data flows through the network as follows. Each node in a hidden layer takes the values from the previous layer, multiplies them by the connection weights, adds up the results and passes the sum through an activation function such as ReLU (rectified linear unit). The resulting values become the inputs to the next layer, which applies its own weights and ReLU in the same way. This continues until the output layer is reached.

The mathematical process and flow of these calculations is known as “forward propagation.”
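To make the flow above concrete, here is a minimal sketch of forward propagation in Python with NumPy. The layer sizes (784 input pixels for a 28x28 image, two hidden layers of 16 nodes, 10 outputs), the random weights, the bias terms and the function names are all assumptions made for illustration; the article does not specify any of them.

```python
import numpy as np

def relu(x):
    """ReLU activation: negative values become 0, positive values pass through."""
    return np.maximum(0.0, x)

def forward(pixels, layers):
    """Push one flattened image through every layer in turn.

    `layers` is a list of (weights, biases) pairs, one pair per layer.
    """
    activation = pixels
    for i, (weights, biases) in enumerate(layers):
        # Weighted sum of the previous layer's values (plus an assumed bias).
        z = weights @ activation + biases
        # ReLU on hidden layers; the output layer is left as raw scores.
        activation = relu(z) if i < len(layers) - 1 else z
    return activation

rng = np.random.default_rng(0)

# Hypothetical sizes: 784 pixels in, two hidden layers, 10 digit scores out.
sizes = [784, 16, 16, 10]
layers = [
    (rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

image = rng.random(784)          # stand-in for real pixel data
scores = forward(image, layers)  # one score per digit 0-9
print(scores.shape)
```

With untrained random weights the scores are meaningless; the point is only to show how each layer's output becomes the next layer's input.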

Once the data reaches the output layer, we apply a loss function to the output. The loss function measures how far the network’s answer is from the correct one, so the smaller the loss, the better the classification of the number. The process of searching for the minimum of the loss function is called gradient descent: the weights are repeatedly nudged in the direction that lowers the loss. At a minimum, the slope of the line tangent to the loss curve is 0.
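Gradient descent is easier to see on a single weight. The sketch below uses a made-up loss curve, (w - 3)^2, whose minimum is known to sit at w = 3; the curve, the starting point and the learning rate are all assumptions chosen for illustration, not values from a real network.

```python
def loss(w):
    """A toy loss curve with its minimum at w = 3."""
    return (w - 3.0) ** 2

def slope(w):
    """Derivative of the toy loss: the slope of the tangent line at w."""
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting weight
learning_rate = 0.1  # step size, picked by hand for this example

for step in range(100):
    # Move against the slope, i.e. downhill on the loss curve.
    w -= learning_rate * slope(w)

print(w, loss(w), slope(w))
```

After enough steps w settles very close to 3, where the slope is close to 0. A real network does the same thing, only with many weights at once.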

Once the loss has been calculated, the output layer sends feedback to the previous layer, which adjusts its factors and weights to better approximate the classification of the image. The adjustment then moves on to the layer before that, where factors and weights are readjusted according to the loss. This happens layer by layer until the input layer is reached. The entire flow of adjustments from the output layer back through the hidden layers to the input layer is called “back propagation.”

The neural network then does forward propagation again and calculates a new loss, gradient descent takes another step toward a local minimum, and back propagation takes place again. The cycle of forward propagation, loss calculation and back propagation continues. With each pass, the neural network changes its factors and weights in order to reduce the loss. In general, the lower the loss, the more accurate our model becomes in its identification.
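The whole cycle (forward propagation, loss, back propagation, weight update) fits in a few lines for a very small network. The sketch below is an illustration under stated assumptions: a four-row toy data set (the XOR pattern) stands in for images, the network has one hidden layer of 8 ReLU nodes, the loss is mean squared error, and the variable names, bias terms and learning rate are hypothetical choices rather than anything from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for images: 4 inputs of 2 "pixels" each, labels 0 or 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer of 8 ReLU nodes and a single output node.
W1 = rng.normal(0.0, 0.5, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(8, 1))
b2 = np.zeros(1)

learning_rate = 0.1
losses = []

for epoch in range(2000):
    # Forward propagation.
    z1 = X @ W1 + b1
    h = np.maximum(0.0, z1)  # ReLU
    pred = h @ W2 + b2

    # Loss: mean squared error between prediction and label.
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Back propagation: apply the chain rule from the output layer backwards.
    d_pred = 2.0 * err / err.size
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_h = d_pred @ W2.T
    d_z1 = d_h * (z1 > 0)    # ReLU passes gradient only where it was active
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Gradient descent step: move every weight against its gradient.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(losses[0], losses[-1])
```

Printing the first and last loss shows the effect of the repeated cycle: the loss at the end of training is lower than at the start.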

We can do this for the digits 0–9. Black and white is simpler because each pixel needs only a single value. A color image needs three values per pixel, so the pixel data is stacked in three layers (red, green and blue) and becomes a three-dimensional tensor. Such a network is also harder for an inexperienced practitioner to train and manage.
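The difference between the two kinds of pixel data shows up directly in the array shapes. This short sketch assumes a 28x28 image size, which the article does not state, and uses empty arrays as placeholders for real images.

```python
import numpy as np

height, width = 28, 28  # assumed image size, for illustration only

gray = np.zeros((height, width))       # one value per pixel
color = np.zeros((height, width, 3))   # three values per pixel: R, G, B

print(gray.shape, gray.size)    # (28, 28) 784
print(color.shape, color.size)  # (28, 28, 3) 2352

# Flattened, these become the values fed to the input layer.
gray_input = gray.reshape(-1)
color_input = color.reshape(-1)
```

Under this assumption the color image feeds three times as many values into the input layer as the black-and-white one.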

During training, thousands of images of handwritten numbers are passed through our neural network, and forward and back propagation tune the factors and weights until the output becomes accurate and our model has “learned.”

In many instances, this classic example of image recognition achieves an accuracy rate of 96%. State-of-the-art models can identify the numbers with up to 99.5% accuracy.

This neural network, however, only does what you trained it to do. It has no sense of what the markings mean to humans in real language. It cannot draw the number either; it only sorts the image into the classes you defined.

In closing, image recognition really started becoming popular and fruitful in the 1980s. It is now an established part of machine learning and AI. Its applications include medical diagnostics, and it has come a long way in facial recognition. The future looks bright for this technology.