Logistic Regression in less than 5 minutes using Python

Original article was published on Artificial Intelligence on Medium

Logistic regression is one of the most important Machines learning algorithms after Linear regression.

Suppose we are interested in the factors that influence whether a political candidate wins an election or not. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether or not the candidate is an incumbent. These are the explanatory or independent variables.

Logistic regression is a supervised classification algorithm. It is a discriminative algorithm, meaning it tries to find boundaries between two classes. It models the probabilities of one class.

Logistic Regression gives the probability associated with each category or each individual outcome. The probability function is joined with the linear equation using a probability distribution. In Logistic Regression, we use binomial distribution where we work on two category problems types of logistic regression

1- Binary: The categorical response has only two 2 possible outcomes(Pass/Fail)
2- Multi: Three or more categories without ordering(Cats, Dogs, Sheep)
3- Ordinal: Three or more categories with ordering (Low, Medium, High)

Sigmoid activation
In order to map predicted values to probabilities, we use the sigmoid function. The function maps any real value into another value between 0 and 1. In machine learning, we use sigmoid to map predictions to probabilities.

Hypothesis representation

The goal of the logistic regression algorithm is to determine what class a new input should fall into. Here is an example application. See, line fitting does not make sense for this application. We need discrete classification into yes or no categories.

Decision Boundary

To predict which class a data belongs, a threshold can be set. Based on this threshold, the obtained estimated probability is classified into classes.
Say, if predicted_value ≥ 0.5, then classify an email as spam else as not spam.
Decision boundary can be linear or non-linear. Polynomial order can be increased to get complex decision boundaries

Cost function

It is a function that measures the performance of a Machine Learning model for given data. Cost Function quantifies the error between predicted values and expected values and presents it in the form of a single real number. Depending on the problem Cost Function can be formed in many different ways.
instead of Mean Squared Error, we use a cost function called Cross-Entropy, also known as Log Loss. Cross-entropy loss can be divided into two separate cost functions: one for y=1 and one for y=0.

Linear VS Logistic

We can call a Logistic Regression a Linear Regression model but the Logistic Regression uses a more complex cost function, this cost function can be defined as the ‘Sigmoid function’ or also known as the ‘logistic function’ instead of a linear function.

The output of linear regression:

The output of logistic regression(sigmoid curve):

Implementation using Python

Evaluation of the model using other metrics, If you are interested in learning the metrics visit:

Conclusion:

I hope I was able to clarify it a little to you Logistic Regression it is one of the basic Algorithms, I will be uploading a lot of more explanation of algorithms because why not 🙂

Those are my personal research, if you have any comments please reach out to me.

Welcome to my medium page

Github, LinkedIn, Zahra Elhamraoui, Upwork

References :

[1] Wikipedia

[2] Logistic Regression

[3]Logistic regression