Original article can be found here (source): Artificial Intelligence on Medium
Logistic Regression: Odds and Log Odds pattern for equidistant Observations
In logistic regression, we typically predict a variable that is categorical and takes a binary value. Examples include predicting whether a customer will churn or whether a patient has cancer. The model assigns a probability to each observation, and the observation is then categorized as a yes or no case (1 or 0, as we code it in ML) depending on whether that probability exceeds a chosen threshold.
Logistic Function (Sigmoid Function)
As we are trying to predict the probability of an event, the logistic function should return a value between 0 and 1. The Sigmoid function does exactly that: it maps any real-valued input into the range (0, 1). For a single predictor, it is given as:

$$P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
where X is the value of the independent (predictor) variable, and β0 and β1 are the model coefficients. Even for given values of β0 and β1, this equation is not very intuitive: it is not clear how much the probability changes when X increases by a certain amount. To understand the relationship between X and P better, we will rewrite this equation in terms of odds and log odds.
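To make the point about intuition concrete, here is a minimal sketch that evaluates the sigmoid at equally spaced values of X. The intercept `BETA0 = -2.0` is a hypothetical value chosen only for illustration, and `BETA1 = 0.06` matches the coefficient used later in this article. The function name `sigmoid_probability` is also just an illustrative choice.

```python
import math

def sigmoid_probability(x, beta0, beta1):
    """Probability of the event for a single predictor x under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Hypothetical intercept; the slope matches the value used later in the article.
BETA0, BETA1 = -2.0, 0.06

xs = [10, 20, 30, 40, 50]
probs = [sigmoid_probability(x, BETA0, BETA1) for x in xs]

# Change in probability between consecutive, equally spaced values of X.
steps = [b - a for a, b in zip(probs, probs[1:])]

for x, p in zip(xs, probs):
    print(f"X = {x:>2}  P = {p:.4f}")
print("Change in P per step of 10:", [round(s, 4) for s in steps])
```

The changes in P differ from step to step even though X moves by the same amount each time, which is why the probability scale is hard to reason about directly.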
Let’s recall that the odds of an event are the ratio of the probability that the event will occur to the probability that it will not occur, i.e. P / (1 - P).
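Using the sigmoid function, it can be derived that

$$\text{odds} = \frac{P}{1 - P} = e^{\beta_0 + \beta_1 X}$$

Taking the natural log of both sides gives the log odds:

$$\ln(\text{odds}) = \beta_0 + \beta_1 X$$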
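The derivation can be checked numerically. The sketch below computes the probability from the sigmoid, converts it to odds and log odds, and compares the log odds with the linear expression β0 + β1X. The coefficient `BETA0 = -2.0` and the point `x = 25.0` are hypothetical values picked for illustration.

```python
import math

def probability(x, beta0, beta1):
    """Sigmoid probability for a single predictor x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds_from_probability(p):
    """Odds = P(event occurs) / P(event does not occur)."""
    return p / (1.0 - p)

# Hypothetical coefficients and evaluation point, for illustration only.
BETA0, BETA1 = -2.0, 0.06
x = 25.0

p = probability(x, BETA0, BETA1)
odds = odds_from_probability(p)
log_odds = math.log(odds)
linear_predictor = BETA0 + BETA1 * x

print(f"P = {p:.4f}, odds = {odds:.4f}, log odds = {log_odds:.4f}")
print(f"beta0 + beta1 * X = {linear_predictor:.4f}")
```

Up to floating-point rounding, the log odds and the linear expression agree, so the log odds are a straight-line function of X.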
Now, let’s say X takes the values X1, X2, X3, …, Xn. The log odds at these points can be denoted log odds 1, log odds 2, log odds 3, …, log odds n.
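Let’s see how the odds change when the predictor variable X changes. Subtracting the log odds at two consecutive points gives

$$\ln(\text{odds}_2) - \ln(\text{odds}_1) = (\beta_0 + \beta_1 X_2) - (\beta_0 + \beta_1 X_1) = \beta_1 (X_2 - X_1)$$

If we denote the ratio odds 2 / odds 1 by R, then we can say that

$$\ln(R) = \beta_1 (X_2 - X_1), \quad \text{i.e.} \quad R = e^{\beta_1 (X_2 - X_1)}$$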
This shows that if X increases in equal steps, the odds are multiplied by the same constant factor R at every step. We can check this with numbers. We will assume that

$$\beta_1 = 0.06$$

Let’s say X takes the values 10, 20, 30, 40, …, i.e. the difference between two consecutive values of X is 10. In this case,

$$R = e^{0.06 \times 10} = e^{0.6} \approx 1.82$$

which means that the odds are multiplied by a factor of about 1.82 from one observation to the next when the observations increase by a constant value of 10.
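The same check can be written as a short script. It uses the article's slope of 0.06 and spacing of 10. The intercept `BETA0 = -2.0` is a hypothetical value, since the ratio of consecutive odds does not depend on it.

```python
import math

BETA0 = -2.0  # hypothetical intercept, cancels out of the odds ratio
BETA1 = 0.06  # slope assumed in the article

def odds(x):
    """Odds of the event at x: exp(beta0 + beta1 * x)."""
    return math.exp(BETA0 + BETA1 * x)

xs = [10, 20, 30, 40, 50]  # equidistant observations, spacing 10
odds_values = [odds(x) for x in xs]

# Ratio of odds and difference of log odds between consecutive observations.
ratios = [b / a for a, b in zip(odds_values, odds_values[1:])]
log_odds_diffs = [math.log(b) - math.log(a)
                  for a, b in zip(odds_values, odds_values[1:])]

print("Odds ratios:         ", [round(r, 4) for r in ratios])
print("Log odds differences:", [round(d, 4) for d in log_odds_diffs])
```

Every consecutive ratio comes out the same (about 1.82), and every log odds difference equals 0.06 × 10 = 0.6.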
You can also work backwards and find the distance between two observations that puts their odds in a given proportion. Say you want to know how the observations should be spaced so that the odds double from one observation to the next. Here R = 2, and the coefficient stays the same, i.e. β1 = 0.06. We know that

$$X_2 - X_1 = \frac{\ln(R)}{\beta_1}$$

Putting in the above values, we get X2 - X1 = ln(2)/0.06 = 0.6931/0.06 ≈ 11.55, or roughly 11.5.
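As a small sketch of this inverse calculation, the helper below returns the spacing in X needed for a target odds ratio. The function name `spacing_for_odds_ratio` is a hypothetical name introduced here, and the slope 0.06 is the article's assumed coefficient.

```python
import math

def spacing_for_odds_ratio(target_ratio, beta1):
    """Step in X that multiplies the odds by target_ratio: ln(R) / beta1."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    if beta1 == 0:
        raise ValueError("beta1 must be non-zero, otherwise X does not affect the odds")
    return math.log(target_ratio) / beta1

BETA1 = 0.06  # slope assumed in the article

for ratio in (2, 3):
    step = spacing_for_odds_ratio(ratio, BETA1)
    print(f"Odds ratio {ratio}: step in X = {step:.2f}")
```

The required spacing depends only on the slope and the target ratio, not on the intercept or on where along X you start.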
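Thus, if the observations increase by a constant 11.5, the odds roughly double from one observation to the next. We can verify this with an equation. Let’s say that the equation for log odds is:

$$\ln(\text{odds}) = \beta_0 + 0.06\,X$$

The intercept β0 cancels when we take the ratio of two odds, so its value does not affect the result.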
When X increases by 11.5, the ratio of odds is e^(0.06 × 11.5) = e^0.69 ≈ 1.99, i.e. nearly 2.
Similarly, when X increases by 18.32, the ratio of odds is e^(0.06 × 18.32) ≈ 3.00, i.e. nearly 3.
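These two figures can be verified for several intercepts and starting points at once. The intercepts and starting values of X below are hypothetical and chosen only to show that the ratio does not depend on them. The slope 0.06 and the steps 11.5 and 18.32 come from the article.

```python
import math

BETA1 = 0.06  # slope assumed in the article

def odds(x, beta0, beta1=BETA1):
    """Odds of the event at x: exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)

def odds_ratio_after_step(x, step, beta0):
    """Ratio of the odds at x + step to the odds at x."""
    return odds(x + step, beta0) / odds(x, beta0)

# Hypothetical intercepts and starting points, for illustration only.
checks = []
for beta0 in (-3.0, 0.0, 1.5):
    for x in (0.0, 25.0):
        checks.append((beta0, x,
                       odds_ratio_after_step(x, 11.5, beta0),
                       odds_ratio_after_step(x, 18.32, beta0)))

for beta0, x, r_double, r_triple in checks:
    print(f"beta0 = {beta0:>4}, X = {x:>4}: "
          f"step 11.5 -> {r_double:.3f}, step 18.32 -> {r_triple:.3f}")
```

The ratios stay close to 2 and 3 in every row, regardless of the intercept or the starting value of X.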
Hence, we can conclude that if the observations are equidistant, their log odds follow a linear pattern (each step adds the same constant, β1 times the spacing), whereas their odds follow a multiplicative pattern (each step multiplies by the same constant factor R).