Original article was published on Deep Learning on Medium
Challenges with machine learning Algorithms:
The conventional machine-learning algorithm can’t classify well for the imbalanced dataset.
Standard classifier algorithms like Decision Tree, Logistic Regression they are biased towards the classes. They tend to predict only the majority class. And the feature of minority class treated as noise and it’ll get ignored. Therefore there is a high probability to predict only a majority class over the minority class.
We are going to use DummyClassifier, LogisticClassifier, and RandomForestClassifer
2. LogisticClassifier :
3. RandomForest Classifier :
Also, let’s print the classification matrix for each algorithm.
As we can see we are getting good results by using RandomForest over logistic Classification.
Also, Let’s try with Deep learning :
So, to test how Neural network works on an imbalanced dataset created a small model.
The confusion matrix for this approach is as below:
We can see our model, complete bias towards the majority class. and to increase the accuracy and reduce the error it’s not considering the minority class.
Now let’s see how we can use the Sampling techniques to reduce the class imbalance problem.