Source: Deep Learning on Medium
An Artificial Neural Network (ANN) is made of many interconnected neurons:
Each neuron takes in some floating point numbers (e.g. 1.0, 0.5, -1.0) and multiplies them by some other floating point numbers (e.g. 0.7, 0.6, 1.4) known as weights (1.0 * 0.7 = 0.7, 0.5 * 0.6 = 0.3, -1.0 * 1.4 = -1.4). The weights act as a mechanism to focus on, or ignore, certain inputs. The weighted inputs then get summed together (e.g. 0.7 + 0.3 + -1.4 = -0.4) along with a bias value (e.g. -0.4 + -0.1 = -0.5).
The summed value (x) is now transformed into an output value (y) according to the neuron’s activation function (y = f(x)). Some popular activation functions are shown below:
For example, with the Leaky Rectified Linear Unit (Leaky ReLU) activation function, -0.5 becomes -0.05: y = f(x) = f(-0.5) = max(0.1 * -0.5, -0.5) = max(-0.05, -0.5) = -0.05
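To make the idea of an activation function concrete, here is a minimal sketch of a few commonly used ones. The selection (sigmoid, tanh, ReLU, Leaky ReLU) and the function names are my own assumption rather than a list taken from this tutorial; only the Leaky ReLU with a slope of 0.1 comes from the worked example above.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real number into the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # Passes positive values through and clips negative values to 0
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.1):
    # Like ReLU, but negative values are scaled by a small slope instead of zeroed
    return np.maximum(slope * x, x)

x = -0.5  # the summed value from the worked example
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, float(f(x)))
```

With x = -0.5, the Leaky ReLU line should reproduce the -0.05 calculated by hand.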
The neuron’s output value (e.g. -0.05) is often an input for another neuron.
However, one of the first ANNs was known as the perceptron and it consisted of only a single neuron.
The output of the perceptron’s (only) neuron acts as the final prediction.
Let's code our own perceptron:
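import numpy as np

class Neuron:
    def __init__(self, n_inputs, bias = 0., weights = None):
        self.b = bias
        # use the given weights, otherwise start from random ones
        if weights is not None: self.ws = np.array(weights)
        else: self.ws = np.random.rand(n_inputs)
    def __call__(self, xs):
        # weighted sum of the inputs plus the bias, passed through the activation
        return self._f(xs @ self.ws + self.b)
    def _f(self, x):
        # Leaky ReLU
        return max(x*.1, x)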
(Note: we have not included any learning algorithm in our example above — we shall cover learning algorithms in another tutorial)
perceptron = Neuron(n_inputs = 3, bias = -0.1, weights = [0.7, 0.6, 1.4])
perceptron([1.0, 0.5, -1.0])
So why do we need so many neurons in an ANN if any one will suffice (as a classifier)?
Unfortunately, individual neurons are unable to classify non-linearly separable data because they can only ever learn a linear decision boundary.
However, by combining neurons together, we essentially combine their decision boundaries. Therefore, an ANN composed of many neurons is able to learn non-linear decision boundaries.
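A classic illustration of this is XOR, which no single neuron can classify because its two classes cannot be separated by one straight line. The sketch below combines three neurons to compute it. The weights are hand-picked rather than learned, and I use a simple step activation instead of the Leaky ReLU from earlier to keep the logic easy to follow; both choices, and all the names, are assumptions made for illustration.

```python
import numpy as np

def step_neuron(xs, ws, b):
    # A single neuron with a step activation:
    # it fires (1) when the weighted sum plus the bias is positive
    return int(np.dot(xs, ws) + b > 0)

def xor_network(x1, x2):
    # Two hidden neurons, each with its own linear decision boundary
    h_or = step_neuron([x1, x2], [1.0, 1.0], -0.5)     # fires if at least one input is 1
    h_nand = step_neuron([x1, x2], [-1.0, -1.0], 1.5)  # fires unless both inputs are 1
    # The output neuron fires only when both hidden neurons fire
    return step_neuron([h_or, h_nand], [1.0, 1.0], -1.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", xor_network(x1, x2))
```

Each hidden neuron draws one straight line, and the output neuron keeps only the region between the two lines, which is exactly where XOR is 1.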
Neurons are connected together according to a specific network architecture. Though there are different architectures, nearly all of them contain layers. (NB: Neurons in the same layer do not connect with one another)
There is typically an input layer (containing a number of neurons equal to the number of input features in the data), an output layer (containing a number of neurons equal to the number of classes) and a hidden layer (containing any number of neurons).
There can be more than one hidden layer to allow the neural net to learn more complex decision boundaries (Any neural net with more than one hidden layer is considered a deep neural net).
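As a rough sketch of how values flow through such layers, the code below passes one input through a small network with two hidden layers. The layer sizes, the random weights and the helper names (`make_layer`, `forward`) are hypothetical and chosen only for illustration. The input layer simply passes its features through, and for simplicity the same Leaky ReLU is applied at every layer, including the output.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.1):
    return np.maximum(slope * x, x)

def make_layer(n_in, n_out):
    # One weight per connection, and one bias per neuron in the layer
    return rng.normal(size=(n_in, n_out)), np.zeros(n_out)

# Hypothetical sizes: 2 input features, two hidden layers of 4 neurons, 3 outputs
sizes = [2, 4, 4, 3]
layers = [make_layer(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(xs):
    activations = np.asarray(xs, dtype=float)
    for weights, biases in layers:
        # Every neuron in a layer sees all the outputs of the previous layer,
        # but nothing from the other neurons in its own layer
        activations = leaky_relu(activations @ weights + biases)
    return activations

output = forward([1.0, 0.5])
print(output.shape)
```

Storing each layer as a single weight matrix means a whole layer of neurons is evaluated with one matrix multiplication.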
Let's build a deep NN to paint this picture:
Let's download the image and load its pixels into an array:
from PIL import Image
image = Image.open('twitter-logo.jpg?w=100')
import numpy as np
image_array = np.asarray(image)
Now teaching our ANN to paint is a supervised learning task, so we need to create a labelled training set (Our training data will have inputs and expected output labels for each input). The training inputs will have 2 values (the x,y coordinates of each pixel) and the training outputs will have 3 values (the r,g,b colour values for each pixel). Our ANN will essentially learn to associate certain colours to certain regions of the picture.
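training_inputs,training_outputs = [],[]
for row,rgbs in enumerate(image_array):
    for column,rgb in enumerate(rgbs):
        # input: the pixel's coordinates
        training_inputs.append([row,column])
        # output: the pixel's colour, normalised to the range 0-1
        r,g,b = rgb
        training_outputs.append([r/255,g/255,b/255])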
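The pairing of pixel coordinates with colours described above can also be built without explicit loops. Here is a minimal sketch using a tiny made-up image in place of the real picture. The stand-in `image_array` and the use of `np.indices` are my own choices, and scaling the colours to the range 0-1 is an assumption based on the normalised values the network predicts later on.

```python
import numpy as np

# Hypothetical stand-in for the loaded picture: a tiny 2x3 RGB image
image_array = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
                        [[255, 255, 255], [0, 0, 0], [128, 128, 128]]], dtype=np.uint8)

height, width, _ = image_array.shape

# One (row, column) pair per pixel, in the same row-by-row order as a nested loop
rows, columns = np.indices((height, width))
training_inputs = np.stack([rows.ravel(), columns.ravel()], axis=1)

# The matching colours, scaled from 0..255 down to 0..1
training_outputs = image_array.reshape(-1, 3) / 255

print(training_inputs.shape, training_outputs.shape)
```

Both arrays have one row per pixel, so the i-th input lines up with the i-th output.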
Now let's create our ANN:
- It should have 2 neurons in the input layer (since there are 2 values to take in: x & y coordinates).
- It should have 3 neurons in the output layer (since there are 3 values to learn: r, g, b).
- The number of hidden layers and the number of neurons in each hidden layer are two hyperparameters to experiment with (as well as the number of epochs we will train it for, the activation function, etc) — I’ll use 10 hidden layers with 100 neurons in each hidden layer (making this a deep neural network)
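from sklearn.neural_network import MLPRegressor
ann = MLPRegressor(hidden_layer_sizes= tuple(100 for _ in range(10)))
# train the network on the pixel coordinates and their colours
ann.fit(training_inputs, training_outputs)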
The trained network can now predict the normalised rgb values for any coordinates (e.g. x,y = 1,1).
ann.predict([[1,1]])
array([[0.95479563, 0.95626562, 0.97069882]])
Let's use the ANN to predict the rgb values for every coordinate, and display the predicted rgb values for the entire image to see how well it did (qualitatively):
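predicted_outputs = ann.predict(training_inputs)
predicted_image_array = np.zeros_like(image_array)
i = 0
for row,rgbs in enumerate(predicted_image_array):
    for column in range(len(rgbs)):
        # predictions can fall slightly outside 0-1, so clip before scaling back to 0-255
        r,g,b = np.clip(predicted_outputs[i], 0, 1)
        predicted_image_array[row][column] = [r*255,g*255,b*255]
        i += 1
# in a notebook, this displays the predicted picture
Image.fromarray(predicted_image_array)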
Try changing the hyperparameters to get better results. Enjoy!
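One way to experiment is to loop over a few candidate settings and compare how closely each network fits the picture. The sketch below does this on a small synthetic image so that it runs quickly. The image, the three candidate architectures and the `max_iter` and `random_state` values are arbitrary assumptions, not recommendations.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-in for the picture: a 16x16 image, left half dark, right half light
rows, columns = np.indices((16, 16))
inputs = np.stack([rows.ravel(), columns.ravel()], axis=1)
outputs = np.repeat((columns.ravel() >= 8).astype(float)[:, None], 3, axis=1)

scores = {}
for hidden_layer_sizes in [(10,), (50, 50), (100, 100, 100)]:
    ann = MLPRegressor(hidden_layer_sizes=hidden_layer_sizes, max_iter=500, random_state=0)
    ann.fit(inputs, outputs)
    # score() returns R^2 on the given data; closer to 1 means a closer fit
    scores[hidden_layer_sizes] = ann.score(inputs, outputs)

for sizes, score in scores.items():
    print(sizes, round(score, 3))
```

The score here is measured on the training pixels themselves, which suits this task because the goal is to reproduce the picture rather than to generalise to unseen data.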
You can find all the code in this tutorial on a notebook which can be run via Google Colab here: https://colab.research.google.com/github/mohammedterry/ANNs/blob/master/ML_ANN.ipynb