Source: Deep Learning on Medium
Welcome to part 9 of the self-driving car course. This blog course introduces the world of self-driving cars: how they work, their pros and cons, and the companies building them.
What we learned
In part 1 of this section, we implemented graphs, forward propagation, learning and loss, and linear transforms in our Miniflow script.
What’s coming up
In this lab, you’ll continue to build Miniflow, our own version of TensorFlow! We’ll cover:
- Sigmoid function
- Gradient descent
- Stochastic gradient descent
Neural networks take advantage of alternating transforms and activation functions to better categorize outputs. The sigmoid function is among the most common activation functions.
Linear transforms are great for simply shifting values, but neural networks often require a more nuanced transform. For instance, one of the original designs for an artificial neuron, the perceptron, exhibits binary output behavior. Perceptrons compare a weighted input to a threshold. When the weighted input exceeds the threshold, the perceptron is activated and outputs 1, otherwise, it outputs 0.
You could model a perceptron’s behavior as a step function:
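One way to write that step function in NumPy, as a minimal sketch. The function name `step` and the `threshold` argument are illustrative choices, not part of Miniflow:

```python
import numpy as np


def step(weighted_input, threshold=0.0):
    """Perceptron-style activation: 1 where the weighted input exceeds
    the threshold, 0 everywhere else (hypothetical helper)."""
    weighted_input = np.asarray(weighted_input, dtype=float)
    return np.where(weighted_input > threshold, 1.0, 0.0)


# An input exactly at the threshold does not exceed it, so it maps to 0.
outputs = step([-2.0, 0.0, 0.5, 3.0], threshold=0.0)
print(outputs)  # [0. 0. 1. 1.]
```

The output jumps straight from 0 to 1 at the threshold, with no values in between.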
Activation, the idea of binary output behavior, generally makes sense for classification problems. For example, if you ask the network to hypothesize whether a handwritten image is a ‘9’, you’re effectively asking for a binary output: yes, this is a ‘9’, or no, this is not a ‘9’. A step function is the starkest form of a binary output, which is great, but it is discontinuous at the threshold and flat everywhere else, so its derivative says nothing about how to adjust the weights. That is a serious problem, because differentiation is what makes gradient descent possible.
The sigmoid function, Equation (3), replaces thresholding with a beautiful S-shaped curve that mimics the activation behavior of a perceptron while being differentiable:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{3}$$

As a bonus, the sigmoid function has a very simple derivative that can be calculated from the sigmoid function itself, as shown in Equation (4):

$$\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \tag{4}$$
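The claim that the sigmoid's derivative equals sigmoid(x) * (1 - sigmoid(x)) can be checked numerically. The sketch below compares that expression against a central finite difference. The helper names `sigmoid` and `sigmoid_derivative` and the step size `h` are assumptions made for illustration, not Miniflow code:

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative(x):
    """Derivative written in terms of the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)


x = np.linspace(-5.0, 5.0, 11)
h = 1e-5  # finite-difference step (illustrative choice)

# Central difference approximation of the slope at each point.
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
max_error = np.max(np.abs(numeric - sigmoid_derivative(x)))
print(max_error)  # should be tiny if the two agree
```

Because the derivative needs only the sigmoid's own output, a node can reuse the value it already computed on the forward pass.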
Notice that the sigmoid function only has one parameter. Remember that sigmoid is an activation function (non-linearity), meaning it takes a single input and performs a mathematical operation on it.
Conceptually, the sigmoid function makes decisions. When given weighted features from some data, it indicates whether or not the features contribute to a classification. In that way, a sigmoid activation works well following a linear transformation. As it stands right now with random weights and bias, the sigmoid node’s output is also random. The process of learning through backpropagation and gradient descent, which you will implement soon, modifies the weights and bias such that activation of the sigmoid node begins to match expected outputs.
Now that I’ve given you the equation for the sigmoid function, I want you to add it to the Miniflow library. To do so, you’ll want to use np.exp (see the NumPy documentation) to make your life much easier.
You’ll be using Sigmoid in conjunction with Linear. Here’s how it should look:
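import numpy as np

class Sigmoid(Node):  # assumes the `Node` base class built in part 1
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        """This method is separate from `forward` because it
        will be used with `backward` as well.
        `x`: A numpy array-like object.
        """
        return 1. / (1. + np.exp(-x))  # the `.` ensures that `1` is a float

    def forward(self):
        input_value = self.inbound_nodes[0].value  # the node's single input
        self.value = self._sigmoid(input_value)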
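To see Sigmoid working in conjunction with Linear, here is a self-contained sketch of a forward pass. The `Node`, `Input`, and `Linear` classes are simplified stand-ins written for this example, assuming an interface like the one from part 1. They are not the actual Miniflow source, and the input values are arbitrary:

```python
import numpy as np


class Node:
    """Minimal stand-in for Miniflow's base node (assumed interface)."""

    def __init__(self, inbound_nodes=None):
        self.inbound_nodes = inbound_nodes or []
        self.value = None

    def forward(self):
        raise NotImplementedError


class Input(Node):
    """Holds a fixed value; nothing to compute on the forward pass."""

    def __init__(self, value):
        Node.__init__(self)
        self.value = value

    def forward(self):
        pass


class Linear(Node):
    """Computes X . W + b from its three inbound nodes."""

    def __init__(self, X, W, b):
        Node.__init__(self, [X, W, b])

    def forward(self):
        X, W, b = (n.value for n in self.inbound_nodes)
        self.value = np.dot(X, W) + b


class Sigmoid(Node):
    """Applies the sigmoid element-wise to its single inbound node."""

    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        return 1. / (1. + np.exp(-x))

    def forward(self):
        self.value = self._sigmoid(self.inbound_nodes[0].value)


X = Input(np.array([[-1., -2.], [-1., -2.]]))
W = Input(np.array([[2., -3.], [2., -3.]]))
b = Input(np.array([-3., -5.]))

linear = Linear(X, W, b)
activation = Sigmoid(linear)

# Run the nodes in dependency order: Linear first, then Sigmoid.
for node in (linear, activation):
    node.forward()

print(linear.value)      # weighted inputs: -9 and 4 in each row
print(activation.value)  # each entry squashed into the range (0, 1)
```

The large negative weighted input (-9) lands near 0 and the positive one (4) lands near 1, which is the soft version of the perceptron's on/off behavior.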
It may have seemed strange that _sigmoid was a separate method. As seen in the derivative of the sigmoid function, Equation (4), the sigmoid function is actually a part of its own derivative. Keeping _sigmoid separate means you won’t have to implement it twice for forward and backward propagations.
This is exciting! At this point, you have used weights and biases to compute outputs. And you’ve used an activation function to categorize the output. As you may recall, neural networks improve the accuracy of their outputs by modifying weights and biases in response to training against labeled datasets.
There are many techniques for defining the accuracy of a neural network, all of which center on the network’s ability to produce values that come as close as possible to known correct values. People use different names for this accuracy measurement, often terming it loss or cost. I’ll use the term cost most often.
Next, you will calculate the cost using the mean squared error (MSE). It looks like so:
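In its usual form, MSE averages the squared difference between each known correct value y and the network's output a over all m examples: C = (1/m) * sum((y_i - a_i)^2). A minimal NumPy sketch of that definition follows. The function name `mse` and the sample values are assumptions for illustration, not the Miniflow node itself:

```python
import numpy as np


def mse(y, a):
    """Mean squared error between targets `y` and outputs `a`
    (hypothetical helper)."""
    # Flatten both to column vectors so shapes (m,) and (m, 1)
    # cannot broadcast into an (m, m) matrix by accident.
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    return np.mean(np.square(y - a))


y = np.array([1.0, 2.0, 3.0])   # known correct values
a = np.array([4.5, 5.0, 10.0])  # network outputs

cost = mse(y, a)
print(cost)  # (3.5**2 + 3**2 + 7**2) / 3
```

A cost of 0 means every output matches its target exactly. Larger errors are penalized disproportionately because each difference is squared.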