Original article was published by Bhanu Soni on Artificial Intelligence on Medium

# The TensorFlow Way (Part 2)

## Loss Functions

# Implementing Loss Functions

Loss functions are very important to machine learning algorithms. They measure the distance between the model outputs and the target (truth) values.

## Getting ready…

In order to optimize our machine learning algorithms, we need to evaluate their outcomes. Evaluating outcomes in TensorFlow depends on specifying a loss function. **A loss function tells TensorFlow how good or bad the predictions are compared to the desired result.** In most cases, we will have a set of data and a target on which to train our algorithm. **The loss function compares the target to the prediction and gives a numerical distance between the two.**

## How to do it…

## Loss functions for regression

Regression loss functions are used when predicting a continuous dependent variable. To start, we will create a sequence of our predictions and a target as a tensor. We will output the results across 500 x-values between -1 and 1.

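```python
import matplotlib.pyplot as plt
import tensorflow as tf

# TensorFlow 1.x graph-and-session API; later snippets reuse this session.
sess = tf.Session()

x_vals = tf.linspace(-1., 1., 500)
target = tf.constant(0.)
```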

## L2 norm loss

The **L2 norm loss is also known as the Euclidean loss function**. It is just the square of the distance to the target. Here we compute the loss as if the target is zero. The L2 norm is a useful loss function because it is smoothly curved near the target: its gradient shrinks as the prediction approaches the target, so algorithms converge more slowly the closer they get.

```python
l2_y_vals = tf.square(target - x_vals)
l2_y_out = sess.run(l2_y_vals)
```

TensorFlow has a built-in form of the L2 norm, called `tf.nn.l2_loss()`. This function is actually half the L2 norm above: it sums the squared elements, divides by 2, and returns a single scalar rather than one value per element.
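To make the difference between the two forms concrete, here is a minimal plain-Python sketch that needs no TensorFlow. The helper names `squared_error` and `half_sum_of_squares` are hypothetical, chosen for illustration; the second mirrors the documented behaviour of `tf.nn.l2_loss`, which reduces its input to `sum(t ** 2) / 2`.

```python
# Plain-Python sketch; helper names are illustrative, not library functions.

def squared_error(target, predictions):
    """Elementwise squared distance between each prediction and the target."""
    return [(target - p) ** 2 for p in predictions]


def half_sum_of_squares(values):
    """Scalar reduction sum(v ** 2) / 2, as documented for tf.nn.l2_loss."""
    return sum(v ** 2 for v in values) / 2.0


predictions = [-1.0, -0.5, 0.0, 0.5, 1.0]
target = 0.0

elementwise = squared_error(target, predictions)
residuals = [target - p for p in predictions]
scalar = half_sum_of_squares(residuals)

print(elementwise)  # one loss value per prediction
print(scalar)       # a single number for the whole batch
```

The elementwise form is the one plotted later in this article; the scalar form is what you would typically hand to an optimizer.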

## L1 norm loss

The **L1 norm loss is also known as the absolute loss function**. Instead of squaring the difference, we take the absolute value. The L1 norm is better for outliers than the L2 norm because it is not as steep for larger values. **One issue to be aware of is that the L1 norm is not smooth at the target and this can result in algorithms not converging well**. It appears as follows:

```python
l1_y_vals = tf.abs(target - x_vals)
l1_y_out = sess.run(l1_y_vals)
```
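The two claims above (gentler growth for outliers, but a kink at the target) can be checked with a small plain-Python sketch. The function names are hypothetical, and returning 0 for the L1 gradient exactly at the target is just one subgradient convention assumed here.

```python
# Plain-Python sketch comparing L1 and L2 losses and their gradients
# with respect to the prediction. Names are illustrative only.

def l1_loss(target, prediction):
    return abs(target - prediction)


def l2_loss(target, prediction):
    return (target - prediction) ** 2


def l1_grad(target, prediction):
    """Slope of the L1 loss: always -1 or +1 away from the target."""
    if prediction > target:
        return 1.0
    if prediction < target:
        return -1.0
    return 0.0  # assumed convention at the kink, where the slope is undefined


def l2_grad(target, prediction):
    """Slope of the L2 loss: shrinks smoothly to zero at the target."""
    return 2.0 * (prediction - target)


target = 0.0
for prediction in [0.1, 1.0, 10.0]:
    print(prediction, l1_loss(target, prediction), l2_loss(target, prediction))

# Just either side of the target the L1 slope flips sign at full magnitude,
# while the L2 slope is already close to zero.
print(l1_grad(target, -1e-6), l1_grad(target, 1e-6))
print(l2_grad(target, -1e-6), l2_grad(target, 1e-6))
```

A prediction that is off by 10 costs 10 under L1 but 100 under L2, which is why a few outliers can dominate an L2 objective.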

## Pseudo-Huber

**Pseudo-Huber loss** is a continuous and smooth approximation to the Huber loss function. This loss function attempts to take the best of the L1 and L2 norms by being **convex (approximately quadratic) near the target and less steep for extreme values**. The form depends on an extra parameter, delta, which dictates how steep it will be. We will plot two forms, delta1 = 0.25 and delta2 = 5, to show the difference:

```python
# tf.multiply replaces tf.mul, which was removed in TensorFlow 1.0.
delta1 = tf.constant(0.25)
phuber1_y_vals = tf.multiply(tf.square(delta1),
                             tf.sqrt(1. + tf.square((target - x_vals) / delta1)) - 1.)
phuber1_y_out = sess.run(phuber1_y_vals)

delta2 = tf.constant(5.)
phuber2_y_vals = tf.multiply(tf.square(delta2),
                             tf.sqrt(1. + tf.square((target - x_vals) / delta2)) - 1.)
phuber2_y_out = sess.run(phuber2_y_vals)
```
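The formula used here is `delta**2 * (sqrt(1 + (residual / delta)**2) - 1)`. A quick plain-Python sketch (the function name `pseudo_huber` is an illustrative assumption) shows the two regimes: roughly `residual**2 / 2` for small residuals, and roughly linear with slope `delta` for large ones.

```python
import math

# Plain-Python sketch of the pseudo-Huber formula; the name is illustrative.

def pseudo_huber(target, prediction, delta):
    residual = target - prediction
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


# Small residual relative to delta: behaves like residual**2 / 2 (L2-like).
small = pseudo_huber(0.0, 0.01, delta=5.0)
print(small, 0.01 ** 2 / 2)

# Large residual relative to delta: grows about linearly, slope close to delta.
far = pseudo_huber(0.0, 100.0, delta=0.25)
farther = pseudo_huber(0.0, 101.0, delta=0.25)
print(farther - far)  # close to 0.25
```

A small delta therefore makes the loss switch to its linear, outlier-tolerant regime sooner, while a large delta keeps it quadratic over a wider range.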

## Classification loss

Classification loss functions are used to evaluate loss when predicting categorical outcomes.

To begin, we redefine our predictions (`x_vals`) and target. We will save the outputs and plot them in the next section. Use the following:

```python
x_vals = tf.linspace(-3., 5., 500)
target = tf.constant(1.)
targets = tf.fill([500,], 1.)
```

## Hinge loss

**Hinge loss** is mostly used for support vector machines but can be used in neural networks as well. It is meant to **compute a loss between two target classes**, 1 and -1. In the following code, we use the target value 1, so the closer our predictions are to 1, the lower the loss value; predictions at or above 1 incur no loss:

```python
# tf.multiply replaces tf.mul, which was removed in TensorFlow 1.0.
hinge_y_vals = tf.maximum(0., 1. - tf.multiply(target, x_vals))
hinge_y_out = sess.run(hinge_y_vals)
```
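As a quick sanity check of the formula `max(0, 1 - target * prediction)`, here is a plain-Python sketch with a hypothetical helper `hinge_loss`, evaluated for both target classes.

```python
# Plain-Python sketch of the hinge loss; the helper name is illustrative.

def hinge_loss(target, prediction):
    """Hinge loss for a target in {-1, 1} and a real-valued prediction."""
    return max(0.0, 1.0 - target * prediction)


# Target class 1: no loss once the prediction reaches 1.
for prediction in [-3.0, 0.0, 1.0, 3.0]:
    print(prediction, hinge_loss(1.0, prediction))

# Target class -1: the picture is mirrored.
print(hinge_loss(-1.0, -2.0), hinge_loss(-1.0, 0.5))
```

The loss grows linearly on the wrong side of the margin and is flat at zero on the right side, which gives the curve the shape its name refers to.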

## Cross-entropy loss

**Cross-entropy loss** for a binary case is also sometimes **referred to as the logistic loss function.** It comes about when we are predicting the two classes 0 or 1. We wish to measure a distance from the actual class ( 0 or 1 ) to the predicted value, which is usually a real number between 0 and 1. To measure this distance, we can use the cross-entropy formula from information theory, as follows:

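```python
# Only defined for predictions strictly between 0 and 1; other x-values yield NaN.
xentropy_y_vals = (- tf.multiply(target, tf.log(x_vals))
                   - tf.multiply((1. - target), tf.log(1. - x_vals)))
xentropy_y_out = sess.run(xentropy_y_vals)
```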
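The formula is `-target * log(p) - (1 - target) * log(1 - p)`, which only makes sense for a predicted probability `p` strictly between 0 and 1. The plain-Python sketch below uses a hypothetical helper, `binary_cross_entropy`, and clips `p` with a small `eps`; the clipping is an assumption added here as a common safeguard and is not part of the code above.

```python
import math

# Plain-Python sketch of binary cross-entropy. The helper name and the
# clipping constant eps are illustrative assumptions.

def binary_cross_entropy(target, probability, eps=1e-7):
    # Keep the probability inside (0, 1) so that both logarithms are defined.
    p = min(max(probability, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# With target 1, the loss falls as the predicted probability approaches 1.
for probability in [0.1, 0.5, 0.9]:
    print(probability, binary_cross_entropy(1.0, probability))

# Without clipping, a probability of exactly 0 would require log(0).
print(binary_cross_entropy(1.0, 0.0))
```

A prediction of 0.5 costs `log(2)` regardless of the target, which is a handy reference point when reading cross-entropy values.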

## Sigmoid cross-entropy loss

**Sigmoid cross-entropy loss** is very similar to the previous loss function, except that we transform the x-values by the sigmoid function before we put them in the cross-entropy loss, as follows:

```python
# TensorFlow 1.x requires keyword arguments for this function.
xentropy_sigmoid_y_vals = tf.nn.sigmoid_cross_entropy_with_logits(
    logits=x_vals, labels=targets)
xentropy_sigmoid_y_out = sess.run(xentropy_sigmoid_y_vals)
```
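The TensorFlow documentation for this function gives the numerically stable form `max(x, 0) - x * z + log(1 + exp(-abs(x)))` for a logit `x` and label `z`. The plain-Python sketch below compares that form with the naive "sigmoid, then cross-entropy" computation; the helper names are hypothetical and the code is only meant to illustrate the equivalence.

```python
import math

# Plain-Python sketch; helper names are illustrative, not TensorFlow functions.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def naive_sigmoid_xent(logit, label):
    """Apply the sigmoid, then the cross-entropy formula directly."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def stable_sigmoid_xent(logit, label):
    """Equivalent form that never takes the log of a value rounded to zero."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# The two forms agree for moderate logits.
for logit in [-3.0, 0.0, 2.0, 5.0]:
    print(logit, naive_sigmoid_xent(logit, 1.0), stable_sigmoid_xent(logit, 1.0))

# For an extreme logit the naive form would need log(0); the stable form is fine.
print(stable_sigmoid_xent(1000.0, 0.0))
```

This is also why the function takes unscaled logits rather than probabilities: working on the logit directly lets it avoid the rounding problems of the two-step computation.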

## Weighted cross-entropy loss

Weighted cross-entropy loss is a weighted version of the sigmoid cross-entropy loss. We provide a weight on the positive target. For example, we will weight the positive target by 0.5, as follows:

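```python
weight = tf.constant(0.5)
# Keyword arguments avoid mixing up the argument order, which changed in TensorFlow 1.0.
xentropy_weighted_y_vals = tf.nn.weighted_cross_entropy_with_logits(
    targets=targets, logits=x_vals, pos_weight=weight)
xentropy_weighted_y_out = sess.run(xentropy_weighted_y_vals)
```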
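The effect of the weight is easiest to see in the formula itself: the positive-class term `-label * log(sigmoid(x))` is multiplied by `pos_weight`, while the negative-class term is left alone. Below is a naive plain-Python sketch of that formula, kept simple for readability rather than numerical robustness; the helper names are hypothetical.

```python
import math

# Naive plain-Python sketch of weighted sigmoid cross-entropy.
# Helper names are illustrative; moderate logits only.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_sigmoid_xent(logit, label, pos_weight):
    p = sigmoid(logit)
    positive_term = pos_weight * label * -math.log(p)
    negative_term = (1.0 - label) * -math.log(1.0 - p)
    return positive_term + negative_term


# With a positive label, a weight of 0.5 halves the unweighted loss.
print(weighted_sigmoid_xent(0.0, 1.0, pos_weight=1.0))
print(weighted_sigmoid_xent(0.0, 1.0, pos_weight=0.5))

# With a negative label, the weight has no effect.
print(weighted_sigmoid_xent(0.0, 0.0, pos_weight=0.5))
```

Since every target in this article's example is 1, the weighted curve is simply the sigmoid cross-entropy curve scaled by 0.5.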

## Softmax cross-entropy loss

**Softmax cross-entropy loss** operates on non-normalized outputs. This function is used to **measure a loss when there is only one target category instead of multiple**. Because of this, the function transforms the outputs into a probability distribution via the softmax function and then computes the loss function from a true probability distribution, as follows:

```python
unscaled_logits = tf.constant([[1., -3., 10.]])
target_dist = tf.constant([[0.1, 0.02, 0.88]])
# TensorFlow 1.x requires keyword arguments for this function.
softmax_xentropy = tf.nn.softmax_cross_entropy_with_logits(
    logits=unscaled_logits, labels=target_dist)
print(sess.run(softmax_xentropy))
# [ 1.16012561]
```
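The printed value can be reproduced by hand. The following plain-Python sketch (helper names are hypothetical) computes a log-softmax of the logits and then the cross-entropy against the target distribution, using the same numbers as above.

```python
import math

# Plain-Python sketch; helper names are illustrative, not TensorFlow functions.

def log_softmax(logits):
    """Log of the softmax, computed with the usual max-shift for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def softmax_cross_entropy(logits, target_dist):
    """Cross-entropy between a target distribution and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(target_dist, log_softmax(logits)))


loss = softmax_cross_entropy([1.0, -3.0, 10.0], [0.1, 0.02, 0.88])
print(round(loss, 4))  # 1.1601, matching the TensorFlow output above
```

Most of the loss comes from the 0.1 and 0.02 of target mass placed on categories that the logits consider very unlikely.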

## Sparse softmax cross-entropy loss

**Sparse softmax cross-entropy loss** is the same as previously, except instead of the target being a probability distribution, it is an index of which category is true. Instead of a sparse all-zero target vector with one value of one, we just pass in the index of which category is the true value, as follows:

```python
unscaled_logits = tf.constant([[1., -3., 10.]])
sparse_target_dist = tf.constant([2])
# TensorFlow 1.x requires keyword arguments for this function.
sparse_xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=unscaled_logits, labels=sparse_target_dist)
print(sess.run(sparse_xentropy))
# [ 0.00012564]
```
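Passing an index is equivalent to passing a one-hot target distribution, because every zero entry drops out of the sum. The plain-Python sketch below (hypothetical helper names again) checks this on the same logits.

```python
import math

# Plain-Python sketch; helper names are illustrative, not TensorFlow functions.

def log_softmax(logits):
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def sparse_softmax_cross_entropy(logits, true_index):
    """Loss when the target is the index of the single true category."""
    return -log_softmax(logits)[true_index]


def dense_softmax_cross_entropy(logits, target_dist):
    """Loss when the target is a full probability distribution."""
    return -sum(t * lp for t, lp in zip(target_dist, log_softmax(logits)))


logits = [1.0, -3.0, 10.0]
sparse_loss = sparse_softmax_cross_entropy(logits, 2)
one_hot_loss = dense_softmax_cross_entropy(logits, [0.0, 0.0, 1.0])

print(sparse_loss)
print(one_hot_loss)
```

The loss is tiny here because the logit of the true category (10) is far larger than the others, so the softmax already puts almost all of its probability on index 2.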

# How it works…

Use matplotlib to plot the regression loss functions:

```python
# Assumes x_vals still holds the regression range (-1 to 1) defined earlier.
x_array = sess.run(x_vals)
plt.plot(x_array, l2_y_out, 'b-', label='L2 Loss')
plt.plot(x_array, l1_y_out, 'r--', label='L1 Loss')
plt.plot(x_array, phuber1_y_out, 'k-.', label='P-Huber Loss (0.25)')
plt.plot(x_array, phuber2_y_out, 'g:', label='P-Huber Loss (5.0)')
plt.ylim(-0.2, 0.4)
plt.legend(loc='lower right', prop={'size': 11})
plt.show()
```

Use matplotlib to plot the various classification loss functions:

```python
# Assumes x_vals holds the classification range (-3 to 5) defined earlier.
x_array = sess.run(x_vals)
plt.plot(x_array, hinge_y_out, 'b-', label='Hinge Loss')
plt.plot(x_array, xentropy_y_out, 'r--', label='Cross Entropy Loss')
plt.plot(x_array, xentropy_sigmoid_y_out, 'k-.', label='Cross Entropy Sigmoid Loss')
plt.plot(x_array, xentropy_weighted_y_out, 'g:', label='Weighted Cross Entropy Loss (x0.5)')
plt.ylim(-1.5, 3)
plt.legend(loc='lower right', prop={'size': 11})
plt.show()
```

# Summary

Table summarizing the different loss functions that we have described

*Most of the classification loss functions described here are for two-class predictions. This can be extended to multiple classes via summing the cross-entropy terms over each prediction/target.*