Original article was published on Deep Learning on Medium
We used gradient descent as our optimization strategy for linear regression.
Consider a situation where you are walking along the graph below and are currently at the ‘black’ cross at the top. Your aim is to reach the minimum, i.e., the ‘green’ dot, but from your position you are unable to see it.
Possible actions would be:
You might step upward or downward. Once you decide which way to go, you can take a bigger or smaller step toward your destination; the size of that step is called the ‘Learning Rate’.
‘Wi’ is the initial weight of the plane, and we take the derivative of the error (how far the points are from the plane) with respect to the weight.
Negative sign: moves the weights toward the minimum error.
Derivative: the rate of change of the error.
The gradient points in the direction where the error function increases the most; therefore, the (-) sign makes us move in the direction where the function decreases the most.
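The update rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's original code; it assumes a toy dataset of the form y = 2x + 1 with some noise, and fits a line by repeatedly stepping against the gradient of the mean squared error:

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=50)

w, b = 0.0, 0.0        # initial weight Wi and bias
learning_rate = 0.01   # the step size from the analogy above

for _ in range(5000):
    y_pred = w * x + b
    error = y_pred - y
    # Derivatives of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # The minus sign moves us *against* the gradient, i.e. downhill
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should end up close to the true slope 2 and intercept 1
```

A larger learning rate would reach the valley in fewer steps but risks overshooting it; a smaller one is safer but slower, which is exactly the trade-off in the walking analogy.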
How do you find the error, then, and what are the metrics used to measure it?
The final step is to evaluate the performance of the algorithm. This step is particularly important for comparing how well different algorithms perform on a particular dataset. For regression algorithms, three evaluation metrics are commonly used:
MAE (Mean Absolute Error): the average of the absolute differences between predictions and actual values.
MSE (Mean Squared Error): the average of the squared differences.
RMSE (Root Mean Squared Error): the square root of MSE, which puts it back on the same scale as MAE.
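All three metrics are a one-liner each with numpy. A small sketch, using made-up prediction values purely for illustration:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # actual values (example data)
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # model predictions (example data)

mae = np.mean(np.abs(y_true - y_pred))   # Mean Absolute Error
mse = np.mean((y_true - y_pred) ** 2)    # Mean Squared Error
rmse = np.sqrt(mse)                      # Root Mean Squared Error, same scale as MAE

print(mae, mse, rmse)
```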
Deciding which loss function to use
If the outliers represent anomalies that are important for the business and should be detected, then we should use MSE.
On the other hand, if we believe the outliers just represent corrupted data, then we should choose MAE as our loss.
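The reason behind this choice is easy to see numerically: squaring magnifies large errors, so a single outlier inflates MSE far more than MAE. A quick sketch with made-up residuals:

```python
import numpy as np

residuals = np.array([0.1, -0.2, 0.1, 0.0])  # typical small errors (example data)
with_outlier = np.append(residuals, 10.0)    # one corrupted point added

mae_clean = np.mean(np.abs(residuals))
mae_out = np.mean(np.abs(with_outlier))
mse_clean = np.mean(residuals ** 2)
mse_out = np.mean(with_outlier ** 2)

print(mae_out / mae_clean)  # MAE grows by a modest factor
print(mse_out / mse_clean)  # MSE blows up by a far larger factor
```

Because MSE reacts so strongly, a model trained on it will bend toward outliers, which is what you want if they are real anomalies and what you don't want if they are noise.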
I hope this article helped you and that you learned something.
Please let me know if you have any suggestions.