Original article was published on Artificial Intelligence on Medium
Logistic Regression Explained
Logistic regression explained as simply as possible.
In linear regression, the Y variable is always continuous. If the Y variable is categorical, you cannot use the linear regression model.
So what would you do when the Y is a categorical variable with 2 classes?
Logistic regression can be used to model and solve such problems, which are also called binary classification problems.
Logistic regression is a supervised learning algorithm whose goal is contrary to its name: rather than predicting a continuous value, it classifies data points into two classes. It is a linear model that outputs a probability, which is then converted into a binary label.
The boundary separating the two classes is a hyperplane (a line when there are only two features). The farther a data point lies from the hyperplane, the more confident the model is that the point belongs to the class on that side.
The goal is to find the separating hyperplane that classifies both classes as well as possible.
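To make the idea of distance from the hyperplane concrete, here is a minimal sketch in plain Python. The weights `w`, the bias `b`, and the sample points are made up for illustration and are not taken from the article; the function simply computes the signed distance of a point from the hyperplane w·x + b = 0.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0.

    Positive values lie on one side of the boundary, negative values on
    the other; a larger magnitude means the point is farther away.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return score / norm

# Hypothetical boundary for illustration: x1 + x2 - 1 = 0
w, b = [1.0, 1.0], -1.0

near = signed_distance(w, b, [0.6, 0.6])   # just past the boundary
far = signed_distance(w, b, [3.0, 3.0])    # far on the same side
other = signed_distance(w, b, [0.0, 0.0])  # on the opposite side
```

Here `far` is larger than `near`, so the model would be more confident about the second point, while the negative sign of `other` places it on the opposite side of the boundary.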
A key point to note here is that Y can have 2 classes only and not more than that. If Y has more than 2 classes, it would become a multi-class classification and you can no longer use the vanilla logistic regression for that.
Here are some examples of binary classification problems:
- Spam Detection: Predicting if an email is Spam or not
- Credit Card Fraud: Predicting if a given credit card transaction is fraud or not
- Health: Predicting if a given mass of tissue is benign or malignant
We use the sigmoid function because it is non-linear and its output lies strictly between 0 and 1. That makes it well suited to models that must output a probability: since a probability always lies between 0 and 1, the sigmoid maps any real-valued prediction to a valid probability.
The function is differentiable, so we can find the slope of the sigmoid curve at any point.
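As a small illustration, the sigmoid and its slope can be sketched in a few lines of plain Python. The function names are my own, and the two-branch form is just one common way to avoid overflow for large negative inputs.

```python
import math

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite the formula so exp() never overflows.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_slope(z):
    """Derivative of the sigmoid at z: s * (1 - s), where s = sigmoid(z)."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z), sigmoid_slope(z))
```

The slope is largest at z = 0 and shrinks toward zero as z moves far in either direction, which is where the curve flattens out.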
If the output computed by the sigmoid function is ≥ 0.5, we assign the point to class 1.
If the output of the sigmoid function is < 0.5, we assign the point to class 0.
Choosing Optimal Parameters
Our ultimate goal is to choose the best parameters (the weights of the model).
If P is the probability of an object belonging to class 1, then (1 − P) is the probability of it belonging to class 0, since the two probabilities must sum to 1.
We combine the probabilities for both classes into a single likelihood, which we intend to maximize.
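For reference, this likelihood is usually written as below. The notation is an assumption introduced here, not taken from the article: n training examples (x_i, y_i) with labels y_i in {0, 1}, weights w, bias b, and P_i the predicted probability that example i belongs to class 1, so that 1 − P_i is the probability of class 0.

```latex
\begin{aligned}
P_i &= \sigma(w^\top x_i + b) = \frac{1}{1 + e^{-(w^\top x_i + b)}} \\
L(w, b) &= \prod_{i=1}^{n} P_i^{\,y_i} \, (1 - P_i)^{\,1 - y_i} \\
\log L(w, b) &= \sum_{i=1}^{n} \left[ y_i \log P_i + (1 - y_i) \log (1 - P_i) \right]
\end{aligned}
```

In practice it is the log-likelihood that gets maximized, because the sum is easier to differentiate than the product and has the same maximizer.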
We’ve successfully derived the updated parameters.
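One common way to carry out such updates is gradient ascent on the log-likelihood. The sketch below is a plain-Python illustration under that assumption; the function name `fit_logistic`, the learning rate, the number of epochs, and the toy data are all made up and are not the article's own derivation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by gradient ascent on the log-likelihood."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = yi - p  # gradient of the log-likelihood w.r.t. the score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step in the direction that increases the average log-likelihood.
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

# Toy one-feature data: small values are class 0, large values are class 1.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
p_low = sigmoid(w[0] * 1.0 + b)   # predicted P(class 1) for x = 1.0
p_high = sigmoid(w[0] * 3.5 + b)  # predicted P(class 1) for x = 3.5
```

After training, `p_low` falls below 0.5 and `p_high` rises above it, so the learned boundary sits between the two groups.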
Multiclass Logistic Regression
But what if we want more than two outputs from logistic regression? For that we can use a one-vs-rest model.
For illustration, suppose the output can be a dog, a cat, or one of 10 other classes. Logistic regression is a binary model, so our approach is to train a vanilla logistic regression for dog vs. all other classes. If the predicted output is dog, we are done; if the test image belongs to some other class, we repeat the process with the next model, cat vs. all other classes, and so on.
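A minimal sketch of one-vs-rest with scikit-learn follows. It uses the bundled Iris dataset (three flower classes) as a stand-in for the dog/cat example, which is my substitution. Note one difference from the description above: `OneVsRestClassifier` trains one binary model per class and then picks the class whose model is most confident, rather than stopping at the first model that answers "yes".

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Three-class dataset standing in for the dog / cat / other example.
X, y = load_iris(return_X_y=True)

# Wrap a binary logistic regression so that one copy is trained per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

n_binary_models = len(ovr.estimators_)  # one "class k vs. rest" model per class
predictions = ovr.predict(X)
train_accuracy = ovr.score(X, y)
print(n_binary_models, train_accuracy)
```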
Logistic Regression using scikit-learn
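Here is a minimal sketch of binary logistic regression in scikit-learn. The dataset choice (the bundled breast cancer data, echoing the benign/malignant example earlier), the train/test split, and the scaling step are my assumptions rather than the article's original code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary target: in this dataset 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling the features first helps the solver converge.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

probabilities = model.predict_proba(X_test)[:, 1]  # P(class 1) per sample
predictions = model.predict(X_test)                # labels using a 0.5 cutoff
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

`predict_proba` returns the probabilities produced by the sigmoid, while `predict` applies the 0.5 threshold to turn them into class labels.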
Advantages of Logistic Regression
- High efficiency
- Low Variance
- Can be easily updated with new data using stochastic gradient descent.
Disadvantages of Logistic Regression
- Doesn’t handle a large number of categorical variables well.
- Requires transformation of non-linear features.
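The last point can be illustrated with a small sketch. The synthetic two-circles dataset and the degree-2 polynomial expansion are my own choices for illustration: the raw features cannot be separated by a straight line, but adding squared terms lets the same linear model draw a circular boundary.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two concentric rings: no straight line can separate them.
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)

# Logistic regression on the raw (x1, x2) features.
raw_model = LogisticRegression().fit(X, y)
raw_accuracy = raw_model.score(X, y)

# Same model after adding x1^2, x1*x2 and x2^2 as extra features.
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(max_iter=1000),
).fit(X, y)
poly_accuracy = poly_model.score(X, y)

print(f"raw: {raw_accuracy:.2f}, with squared features: {poly_accuracy:.2f}")
```

On this data the raw model does little better than guessing, while the transformed model separates the rings almost perfectly.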
Features of Logistic Regression
- The target is a discrete variable.
- Predicted values are the probabilities of the target classes.
Hopefully, this article has not only increased your understanding of logistic regression but also shown you that it is not difficult and is already at work in our daily lives.
As always, thank you so much for reading, and please share this article if you found it useful! 🙂