Original article was published on Artificial Intelligence on Medium
To wrap up, let's explore the characteristics of the most common Boosting models out there.
Different Boosting Models
Short for Adaptive Boosting, AdaBoost works by the exact process described before: training weak models sequentially, making predictions, and updating the weights of the misclassified samples and of the corresponding weak models.
It is mostly used with Decision Tree stumps: decision trees with just a root node and two leaf nodes, where only one feature of the data is evaluated. By taking into account only one feature of our data to make predictions, each stump is a very weak model on its own. However, by combining many of them, a very robust and accurate ensemble model can be built.
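As a quick sketch of this, scikit-learn's AdaBoostClassifier already uses a depth-1 decision tree (a stump) as its default weak learner; the dataset below is a made-up toy one, purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy classification dataset, just for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The default base learner is a decision tree with max_depth=1 (a stump),
# so each weak model splits on a single feature.
ensemble = AdaBoostClassifier(n_estimators=50, random_state=42)
ensemble.fit(X_train, y_train)

print(f"Test accuracy: {ensemble.score(X_test, y_test):.2f}")
```

Even though each individual stump barely beats random guessing, the weighted combination of 50 of them typically classifies this toy data quite accurately.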
If you want to know more about AdaBoost, check out the following video by StatQuest.
Very similar to AdaBoost, Gradient Boosting Machines (GBMs) train weak learners sequentially, adding more and more estimators; however, instead of adapting the weights of the data, each new estimator tries to predict the residual errors made by the previous ones.
Because of this, we no longer have sample weights, and all the weak models have the same amount of say or importance. Again, decision trees are most often used as the base predictors; however, they are not stumps, but bigger, fixed-size trees. GBMs use a learning rate to take small steps towards better results, conceptually similar to what is done in Gradient Descent.
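The residual-fitting loop can be sketched by hand with plain scikit-learn regression trees; the data and hyperparameters below are made up for illustration, not a reference implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem: noisy sine wave
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant prediction
trees = []

for _ in range(100):
    residuals = y - prediction          # errors left by the ensemble so far
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residuals)              # each new tree predicts the residuals
    # Only take a small step in the direction of the new tree's predictions
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print(f"Final training MSE: {np.mean((y - prediction) ** 2):.4f}")
```

Each tree gets the same shrinkage factor (the learning rate), which is why, unlike in AdaBoost, no model has more say than another.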
Again, if you want to dive deeper, check out the video by StatQuest.
Short for eXtreme Gradient Boosting, XGBoost, like Gradient Boosting, fits its trees to the residuals of the previous trees' predictions; however, instead of using conventional, fixed-size decision trees, it uses a different kind of tree: XGBoost trees, we could call them.
It builds these trees by calculating similarity scores between the observations that end up in a leaf node. XGBoost also allows for regularisation, reducing the possible overfitting of the individual trees and therefore of the overall ensemble model.
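As a rough sketch of the similarity-score idea (following the StatQuest formulation for regression, where a leaf's score is the squared sum of its residuals divided by the number of residuals plus the regularisation term lambda), with made-up residual values:

```python
import numpy as np

def similarity(residuals, lam):
    # Similarity score of a leaf: (sum of residuals)^2 / (n + lambda)
    return np.sum(residuals) ** 2 / (len(residuals) + lam)

# Hypothetical residuals left over from a previous round of predictions
residuals = np.array([-10.5, 6.5, 7.5, -7.5])
lam = 1.0  # regularisation term; larger values shrink the scores

root = similarity(residuals, lam)           # (-4)^2 / (4 + 1) = 3.2
left, right = residuals[:1], residuals[1:]  # one candidate split
# Gain of the split: how much better the children separate the residuals
gain = similarity(left, lam) + similarity(right, lam) - root

print(f"root similarity = {root}, split gain = {gain:.4f}")
```

Splits with higher gain are preferred, and increasing lambda shrinks every similarity score, which makes low-gain splits easier to prune away, reducing overfitting.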
Lastly, XGBoost is optimised to push the limits of the computational resources available to boosted tree algorithms, making it a very fast, high-performance algorithm in terms of both time and computation.
You can watch the following video, XGBoost Part 1: Regression, for a deeper look at what XGBoost is all about.
Light Gradient Boosting Machines, known by the short name LightGBM, are yet another round of improvements to Gradient Boosting algorithms. Instead of the level-wise growth strategy used for the decision trees in XGBoost, LightGBM uses a leaf-wise growth strategy, giving it the chance to achieve a higher error reduction per split than other tree-based algorithms. Compared to XGBoost, LightGBM is also generally faster, especially on large data sets.
You can learn more about it on its official documentation page.
Conclusion and additional Resources
That is it! As always, I hope you enjoyed the post, and that I managed to help you understand what boosting is, how it works, and why it is so powerful.
Here you can find some additional resources in case you want to learn more about the topic:
If you want to learn more about Machine Learning and Artificial Intelligence follow me on Medium, and stay tuned for my next posts! Also, you can check out this repository for more resources on Machine Learning and AI!
- Cover Image from Unsplash.
- All other images are self made.