Source: Deep Learning on Medium

Welcome to part 9 of the **Self-driving car course**. This blog course introduces the world of self-driving cars: how they work, their pros and cons, and which companies are building them.

**What we learned**

In part 1 of this section, we implemented graphs, forward propagation, learning and loss, and linear transforms in our Miniflow script.

**What’s coming up**

In this lab, you’ll continue to build Miniflow, our own version of TensorFlow! We’ll cover:

- Sigmoid function
- Cost
- Gradient descent
- Backpropagation
- Stochastic gradient descent

### Sigmoid Function

Neural networks alternate linear transforms with activation functions to better categorize outputs. The sigmoid function is among the most common activation functions.

Linear transforms are great for simply *shifting* values, but neural networks often require a more nuanced transform. For instance, one of the original designs for an artificial neuron, the perceptron, exhibits binary output behavior. Perceptrons compare a weighted input to a threshold. When the weighted input exceeds the threshold, the perceptron is **activated** and outputs 1; otherwise, it outputs 0.

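You could model a perceptron’s behavior as a step function of the weighted input $z$ and the threshold $\theta$:

$$\text{output} = \begin{cases} 1 & \text{if } z > \theta \\ 0 & \text{otherwise} \end{cases}$$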
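A perceptron’s thresholding fits in a few lines of Python. This is a minimal illustration rather than Miniflow code: the function name `perceptron`, the weights, and the threshold value are all hypothetical.

```python
import numpy as np


def perceptron(inputs, weights, threshold):
    """Hypothetical perceptron: output 1 if the weighted input exceeds the threshold, else 0."""
    weighted_input = np.dot(inputs, weights)
    return 1 if weighted_input > threshold else 0


# Made-up example values
weights = np.array([0.5, 0.5])
threshold = 0.7
print(perceptron(np.array([1.0, 1.0]), weights, threshold))  # weighted input 1.0 > 0.7, prints 1
print(perceptron(np.array([1.0, 0.0]), weights, threshold))  # weighted input 0.5 <= 0.7, prints 0
```

Note that the output jumps straight from 0 to 1 as the weighted input crosses the threshold; there is no in-between value.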

Activation, the idea of binary output behavior, generally makes sense for classification problems. For example, if you ask the network to hypothesize whether a handwritten image is a ‘9’, you’re effectively asking for a binary output: *yes*, this is a ‘9’, or *no*, this is not a ‘9’. A step function is the starkest form of a binary output, which is great, but step functions are not continuous, and their derivative is zero everywhere except at the threshold, where it is undefined. That is *very bad*, because gradient descent relies on derivatives to decide how to adjust the weights.
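To see why differentiability matters, the sketch below estimates slopes numerically with a central finite difference. The helper names `step`, `sigmoid`, and `numerical_slope`, the threshold of 0, and the step size `h` are illustrative assumptions. Away from the threshold, the step function’s slope is exactly zero, so gradient descent would get no signal about which way to move the weights; the sigmoid’s slope is small but nonzero.

```python
import numpy as np


def step(x):
    # 1.0 where x exceeds the threshold (0 here), 0.0 elsewhere
    return np.where(x > 0, 1.0, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def numerical_slope(f, x, h=1e-5):
    # Central finite-difference estimate of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


points = np.array([-2.0, -0.5, 0.5, 2.0])  # all away from the threshold
print(numerical_slope(step, points))     # all zeros: no gradient signal
print(numerical_slope(sigmoid, points))  # small but nonzero everywhere
```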

The sigmoid function, Equation (3), replaces thresholding with a smooth S-shaped curve that mimics the activation behavior of a perceptron while being differentiable:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{3}$$

As a bonus, the sigmoid function has a very simple derivative that can be calculated from the sigmoid function itself, as shown in Equation (4):

$$\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \tag{4}$$
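The identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ can be checked numerically. In this sketch (function names are hypothetical), `sigmoid_prime` computes the derivative by reusing `sigmoid`, and the result is compared against a finite-difference estimate over a range of inputs.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_prime(x):
    # Reuses sigmoid itself: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)


x = np.linspace(-5.0, 5.0, 11)
h = 1e-5
finite_diff = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.allclose(sigmoid_prime(x), finite_diff))  # True
print(sigmoid_prime(0.0))  # 0.25, the sigmoid's steepest slope
```

Because the derivative only needs the sigmoid’s own output, a network that has already computed $\sigma(x)$ during the forward pass gets the derivative for the cost of one subtraction and one multiplication.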

Notice that the sigmoid function takes only one argument. Remember that sigmoid is an *activation* function (a *non-linearity*), meaning it takes a single input and performs a mathematical operation on it. Applied to an array, it acts on each element independently.

Conceptually, the sigmoid function makes decisions. When given weighted features from some data, it indicates whether or not the features contribute to a classification. In that way, a sigmoid activation works well following a linear transformation. As it stands right now with random weights and bias, the sigmoid node’s output is also random. The process of learning through backpropagation and gradient descent, which you will implement soon, modifies the weights and bias such that activation of the sigmoid node begins to match expected outputs.
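As a sketch of a sigmoid following a linear transformation, the NumPy snippet below pushes a small batch through $XW + b$ and then squashes the result into the range $(0, 1)$. The shapes, the seed, and the random values are made up for illustration; with untrained random weights the outputs carry no meaning yet, which is the point of the paragraph above.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))   # 4 examples, 3 features each (made-up data)
W = rng.normal(size=(3, 2))   # random, untrained weights: 3 inputs -> 2 outputs
b = rng.normal(size=2)        # random, untrained bias, one per output

Z = X @ W + b                 # linear transform, shape (4, 2)
A = 1.0 / (1.0 + np.exp(-Z))  # sigmoid applied element-wise, shape (4, 2)

print(A.shape)                     # (4, 2)
print(((A > 0) & (A < 1)).all())   # every activation lies strictly between 0 and 1
```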

Now that I’ve given you the equation for the sigmoid function, I want you to add it to the Miniflow library. To do so, use NumPy’s `np.exp` (see its documentation), which computes the exponential element-wise on arrays and will make your life much easier.

You’ll be using `Sigmoid` in conjunction with `Linear`. Here’s how it should look:

```python
import numpy as np


class Sigmoid(Node):  # `Node` is the base class built in part 1
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        """
        This method is separate from `forward` because it
        will be used with `backward` as well.

        `x`: A numpy array-like object.
        """
        return 1. / (1. + np.exp(-x))  # the `.` ensures that `1` is a float

    def forward(self):
        # Apply the sigmoid to the value of the single inbound node
        input_value = self.inbound_nodes[0].value
        self.value = self._sigmoid(input_value)
```
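To try `Sigmoid` end to end, the sketch below wires it behind a `Linear` node and runs a forward pass. It assumes the part 1 classes `Node`, `Input`, and `Linear` behave roughly like the simplified stand-ins defined here (same `inbound_nodes`, `value`, and `forward` interface); those stand-ins are hypothetical, and the topological sort from part 1 is replaced by a hand-ordered list.

```python
import numpy as np


class Node:
    # Simplified stand-in for the part 1 base class (assumption)
    def __init__(self, inbound_nodes=None):
        self.inbound_nodes = inbound_nodes or []
        self.value = None

    def forward(self):
        raise NotImplementedError


class Input(Node):
    # Holds a value fed in from outside; forward() does nothing
    def forward(self):
        pass


class Linear(Node):
    # Computes X @ W + b from three inbound nodes
    def __init__(self, X, W, b):
        Node.__init__(self, [X, W, b])

    def forward(self):
        X, W, b = (n.value for n in self.inbound_nodes)
        self.value = np.dot(X, W) + b


class Sigmoid(Node):
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        return 1. / (1. + np.exp(-x))

    def forward(self):
        self.value = self._sigmoid(self.inbound_nodes[0].value)


# Build the graph: Input nodes -> Linear -> Sigmoid
X, W, b = Input(), Input(), Input()
f = Linear(X, W, b)
g = Sigmoid(f)

X.value = np.array([[-1., -2.], [-1., -2.]])
W.value = np.array([[2., -3.], [2., -3.]])
b.value = np.array([-3., -5.])

# Run forward in dependency order (stand-in for part 1's topological sort)
for node in [X, W, b, f, g]:
    node.forward()

print(g.value)
```

Here the linear node produces `[-9, 4]` for each row, and the sigmoid maps those to roughly `1.23e-04` and `0.982`: one output is firmly "off" and the other firmly "on".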

It may have seemed strange that `_sigmoid` was a separate method. As the derivative in Equation (4) shows, the sigmoid function is actually *a part of its own derivative*. Keeping `_sigmoid` separate means you won’t have to implement it twice for forward and backward propagation.

This is exciting! At this point, you have used weights and biases to compute outputs. And you’ve used an activation function to categorize the output. As you may recall, neural networks improve the **accuracy** of their outputs by modifying weights and biases in response to training against labeled datasets.

### Cost

There are many techniques for measuring the accuracy of a neural network, all of which center on the network’s ability to produce values that come as close as possible to known correct values. In practice this is usually framed as an error to minimize, and people often call it **loss** or **cost**. I’ll use the term *cost* most often.

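Next, you will calculate the cost using the mean squared error (MSE). It looks like so:

$$C = \frac{1}{m} \sum_{i=1}^{m} \left(y_i - a_i\right)^2$$

Here $m$ is the number of examples, $y_i$ is the known correct output for example $i$, and $a_i$ is the network’s output for that example.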
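A minimal NumPy sketch of the mean squared error, assuming `y` holds the known correct values and `a` the network’s outputs for the same examples (the function name `mse` and the sample values are illustrative). Reshaping both arrays into columns guards against a subtle bug: subtracting an array of shape `(m, 1)` from one of shape `(m,)` would broadcast to an `(m, m)` matrix instead of `m` differences.

```python
import numpy as np


def mse(y, a):
    # Reshape both to column vectors so (m,) and (m, 1) inputs don't broadcast to (m, m)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    diff = y - a
    return np.mean(diff ** 2)


y = np.array([1, 2, 3])       # known correct values (made up)
a = np.array([4.5, 5, 10])    # network outputs (made up)
print(mse(y, a))  # (3.5**2 + 3**2 + 7**2) / 3
```

A cost of zero would mean every output matches its target exactly; larger values mean the outputs are, on average, further off.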
