What Makes LightGBM Light

Original article was published by Soner Yıldırım on Artificial Intelligence on Medium

The speed matters.

Gradient boosted decision trees (GBDT) is such a well-performing algorithm that it has served as a base for many state-of-the-art algorithms such as XGBoost, LightGBM, and CatBoost.

In this post, we will focus on what makes LightGBM light and fast. LightGBM was created by researchers at Microsoft with the aim of building a more efficient implementation of GBDT than the ones already in use.

Let’s start by briefly discussing how the GBDT algorithm works, and then turn to what makes LightGBM special.

Gradient boosting means sequentially combining weak learners so that each new learner fits the residuals from the previous step. Thus, each new learner improves the overall model. The final model aggregates the results from each step, yielding a strong learner. In the case of GBDT, the weak learners are decision trees.
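This residual-fitting loop can be sketched in a few lines. The example below is a minimal illustration for regression with squared error, using shallow scikit-learn trees as weak learners; the learning rate, tree depth, and number of rounds are illustrative choices, not LightGBM defaults.

```python
# A minimal sketch of gradient boosting for regression with squared error.
# Each new tree fits the residuals left by the current ensemble prediction.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from the mean prediction
trees = []
for _ in range(100):
    residuals = y - prediction          # each new learner fits the residuals
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

# The ensemble's training error shrinks as learners are added.
mse = np.mean((y - prediction) ** 2)
print(f"training MSE: {mse:.4f}")
```

After 100 rounds the training error approaches the noise level of the data, which is exactly the "each step improves the overall model" behavior described above.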

The weak learner of GBDT, a decision tree, learns by splitting the observations (i.e. data instances) based on feature values. The algorithm looks for the split that yields the highest information gain.

It turns out that finding the best split is the most time-consuming part of training a decision tree. Prior implementations of GBDT use either a pre-sorted or a histogram-based algorithm to find the best split.

  • Pre-sorted: Feature values are pre-sorted and all possible split points are evaluated.
  • Histogram-based: Continuous features are divided into discrete bins and feature histograms are created.

The histogram-based algorithm is more efficient than the pre-sorted one, but both get slower as the dataset grows, both in observations and in features.
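To make the histogram idea concrete, here is a sketch of histogram-based split finding for a single feature, assuming a squared-error objective: the continuous feature is bucketed into a fixed number of bins, per-bin sufficient statistics are accumulated, and only the bin boundaries are evaluated as candidate splits (rather than every distinct feature value, as in the pre-sorted approach). The bin count and gain formula are illustrative assumptions, not LightGBM's exact internals.

```python
# Sketch: histogram-based best-split search over one continuous feature.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=1000)           # one continuous feature
y = np.where(x < 4.0, 1.0, -1.0) + rng.normal(0, 0.1, size=1000)

n_bins = 32
edges = np.linspace(x.min(), x.max(), n_bins + 1)
bins = np.digitize(x, edges[1:-1])          # bin index in [0, n_bins - 1]

# Per-bin sufficient statistics: sum of targets and instance counts.
grad_sum = np.bincount(bins, weights=y, minlength=n_bins)
count = np.bincount(bins, minlength=n_bins)

# Scan the n_bins - 1 boundaries; gain = GL^2/nL + GR^2/nR - G^2/n.
G, n = grad_sum.sum(), count.sum()
best_gain, best_edge = -np.inf, None
gl = nl = 0.0
for b in range(n_bins - 1):
    gl += grad_sum[b]; nl += count[b]
    gr, nr = G - gl, n - nl
    if nl == 0 or nr == 0:
        continue
    gain = gl**2 / nl + gr**2 / nr - G**2 / n
    if gain > best_gain:
        best_gain, best_edge = gain, edges[b + 1]

print(f"best split near x = {best_edge:.2f}")
```

The recovered boundary lands near the true cutoff at x = 4.0, while only 31 candidate splits were evaluated instead of up to 999 for a pre-sorted scan. The loss of split precision is bounded by the bin width.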

LightGBM started off with the histogram-based algorithm since it is the more efficient one.

The problem with the histogram-based algorithm is that all the data instances are scanned to find the best split with respect to the information gain, and this is done for each feature. Thus, its complexity is dominated by the number of data instances times the number of features.

To overcome this issue, LightGBM uses two techniques:

  • GOSS (Gradient-based One-Side Sampling)
  • EFB (Exclusive Feature Bundling)

Let’s go into detail about what these techniques do and how they make LightGBM “light”.
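As a quick preview of GOSS, the sampling idea from the LightGBM paper can be sketched as follows: keep the fraction of instances with the largest gradient magnitudes (they are the most under-trained), randomly sample a small fraction of the rest, and up-weight the sampled small-gradient instances so the data distribution stays approximately unchanged. The function name and the rate values below are illustrative, not LightGBM's API.

```python
# Sketch of Gradient-based One-Side Sampling (GOSS): keep the top_rate
# fraction by |gradient|, sample an other_rate fraction of the remainder,
# and up-weight the latter by (1 - top_rate) / other_rate.
import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    n = len(gradients)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    order = np.argsort(np.abs(gradients))[::-1]   # largest |gradient| first
    top_idx = order[:n_top]                       # always kept
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)

    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(n_top + n_other)
    weights[n_top:] = (1 - top_rate) / other_rate  # compensate undersampling
    return idx, weights

gradients = np.random.default_rng(2).normal(size=10_000)
idx, w = goss_sample(gradients, rng=np.random.default_rng(3))
print(len(idx))  # only 30% of the instances are scanned per split
```

With these rates, each split is found on 30% of the data while the total sample weight still matches the full dataset, which is where much of the speed-up comes from.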