Interview Guide to Boosting Algorithms: Part-1




To start with, here are a few questions that will develop your interest in this particular domain.

Questions:

  1. Is AdaBoost less or more prone to over-fitting?
  2. What are some misconceptions around over-fitting?
  3. What is the difference between Gradient boosting and AdaBoost?
  4. In what situations or use cases do we prefer the XGBoost algorithm?
  5. Where do boosting algorithms fit in the world of AI/ML?

Just read the above questions and then think about them for a minute.

Read through to the end and you’ll find the answers yourself.


Table of contents to guide you through this article:

  • A Short Introduction to Boosting
  • The strength of weak learnability
  • Central Idea of Boosting algorithms
  • Is AdaBoost less or more prone to over-fitting?
  • Some misconceptions around over-fitting
  • What is the difference between Gradient boosting and AdaBoost?

A Short Introduction to Boosting:

Boosting is a general method for improving the accuracy of any given learning algorithm.
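For example, here is a minimal sketch of boosting in practice, assuming scikit-learn; the synthetic dataset and the hyper-parameters below are illustrative choices, not anything prescribed by the article.

```python
# A minimal sketch of boosting in practice, assuming scikit-learn is installed.
# The synthetic dataset and hyper-parameters are illustrative, not from the article.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Boosting turns a base learning algorithm (decision stumps by default here)
# into a more accurate combined model.
clf = AdaBoostClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```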

The strength of weak learnability:

A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output a hypothesis that is correct on all but an arbitrarily small fraction of the instances.

The concept class is weakly learnable if the learner can produce a hypothesis that performs only slightly better than random guessing.
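A decision stump (a one-level decision tree) is the classic example of such a weak learner: on its own it usually beats random guessing only modestly. The comparison below is a sketch under assumed data and settings.

```python
# Sketch of a "weak" vs. a "strong" learner; dataset and seed are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)

stump = DecisionTreeClassifier(max_depth=1)    # weak learner: a decision stump
deep_tree = DecisionTreeClassifier()           # a much stronger learner

print("stump accuracy:    ", cross_val_score(stump, X, y).mean())
print("deep tree accuracy:", cross_val_score(deep_tree, X, y).mean())
# Weak learnability only asks that the stump's accuracy be slightly above 0.5.
```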


PAC Learning:

The requirements for a concept class to be PAC learnable are quite stringent.

The learning algorithm has to work for all target concepts in the class, for all input distributions, and for any setting of the accuracy (ε) and confidence (δ) parameters.
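Stated symbolically (a standard textbook formulation, with notation assumed rather than taken from the article): for every target concept c in the class, every input distribution D, and every accuracy ε and confidence δ, the learner must, with probability at least 1 − δ, output a hypothesis h whose error under D is at most ε.

```latex
% Standard PAC-learnability condition (notation assumed, not from the article)
\Pr\big[\,\mathrm{err}_D(h) \le \epsilon\,\big] \ge 1 - \delta,
\qquad \text{where } \mathrm{err}_D(h) = \Pr_{x \sim D}\big[h(x) \ne c(x)\big].
```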

Central Idea of Boosting algorithms:

The central idea of the boosting approach is the following.

Initially, we can use a weak learning algorithm that gives us a hypothesis that performs slightly better than random guessing.

We could repeatedly run this weak learning algorithm, though it may return the same hypothesis.

However, if we modify the distribution over the examples so that the hypothesis already returned is no longer useful, i.e., under the new distribution it has error exactly 1/2, then the weak learning algorithm is forced to provide us with a different hypothesis.


By doing this repeatedly, we can combine several hypotheses to produce one that has low error.

All boosting algorithms make use of this high-level approach.
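Below is a rough sketch of that high-level loop in the style of AdaBoost; the choice of decision stumps as weak learners, the weight-update rule, and all names are illustrative assumptions rather than details spelled out in the article.

```python
# Rough AdaBoost-style sketch of the boosting loop described above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, n_rounds=50):
    """Illustrative boosting loop; labels y are expected in {-1, +1}."""
    n = len(y)
    dist = np.full(n, 1.0 / n)      # start from the uniform distribution
    learners, alphas = [], []
    for _ in range(n_rounds):
        # 1. Run the weak learner on the current distribution.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=dist)
        pred = stump.predict(X)

        # 2. Its weighted error must be below 1/2 (the weak-learning guarantee).
        err = np.sum(dist[pred != y])
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))

        # 3. Modify the distribution so this hypothesis is no longer useful:
        #    up-weight the examples it got wrong, down-weight the rest.
        dist *= np.exp(-alpha * y * pred)
        dist /= dist.sum()

        learners.append(stump)
        alphas.append(alpha)

    # 4. Combine the hypotheses into one weighted-majority classifier.
    def predict(X_new):
        votes = sum(a * h.predict(X_new) for a, h in zip(alphas, learners))
        return np.sign(votes)
    return predict
```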

Is AdaBoost less or more prone to over-fitting?

In practical experience, AdaBoost is quite robust to over-fitting, and LPBoost (Linear Programming Boosting) even more so, because its objective function requires a sparse combination of weak learners, which is a form of capacity control. The main factors that influence over-fitting are:

  • The “strength” of the “weak” learners: If you use very simple weak learners, such as decision stumps (1-level decision trees), then the algorithms are much less prone to over-fitting. If you use more complicated weak learners (such as deeper decision trees or even hyper-planes), you’ll find that over-fitting occurs much more rapidly.
  • The noise level in the data: AdaBoost is particularly prone to over-fitting on noisy datasets. In this setting the regularized forms (RegBoost, AdaBoostReg, LPBoost, QPBoost) are preferable.
  • The dimensionality of the data: We know that, in general, we experience over-fitting more in high-dimensional spaces (“the curse of dimensionality”), and AdaBoost can also suffer in that respect, as it is simply a linear combination of classifiers that themselves suffer from the problem. Whether it is as prone as other classifiers is hard to determine.
  • Of course, we can use heuristic methods such as a validation set or k-fold cross-validation to set the stopping parameter (or other parameters in the different variants), as we would for any other classifier (see the sketch after this list).
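To make two of these points concrete (base-learner strength and validation-based stopping), here is a hedged sketch assuming scikit-learn; the dataset, the injected label noise, and every hyper-parameter are invented for the example.

```python
# Illustrative sketch: stumps vs. deeper base trees, with a validation set
# used to pick the stopping round. All settings are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.2,  # label noise
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for depth in (1, 6):  # decision stumps vs. more complex base trees
    model = AdaBoostClassifier(
        # older scikit-learn versions use base_estimator= instead of estimator=
        estimator=DecisionTreeClassifier(max_depth=depth),
        n_estimators=300,
        random_state=0,
    ).fit(X_train, y_train)

    # staged_score yields validation accuracy after each boosting round,
    # so the round where it peaks is a simple stopping heuristic.
    val_scores = list(model.staged_score(X_val, y_val))
    best_round = int(np.argmax(val_scores)) + 1
    print(f"max_depth={depth}: best validation accuracy "
          f"{max(val_scores):.3f} at round {best_round}")
```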

Some misconceptions around over-fitting:

It has sometimes been wrongly claimed that AdaBoost is better than Random Forest when it comes to over-fitting.

In fact, according to the theory (see the original Random Forest paper by Breiman), Random Forest does not over-fit as more trees are added, so long as its individual weak classifiers do not themselves over-fit the data.
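One way to probe this claim for yourself, rather than take it on faith, is to watch how test accuracy evolves as more trees or boosting rounds are added. The sketch below, with an artificially noisy dataset and made-up settings, is one possible setup, not a definitive experiment.

```python
# Illustrative sketch: test accuracy as trees / boosting rounds are added.
# Dataset, noise level, and all settings are assumptions for the example.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.3,  # noisy labels
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
ada = AdaBoostClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
ada_curve = list(ada.staged_score(X_test, y_test))  # accuracy after each round

for n in (10, 100, 500):
    rf.set_params(n_estimators=n)
    rf.fit(X_train, y_train)  # warm_start=True only grows the existing forest
    print(f"RF  with {n:3d} trees:  test accuracy {rf.score(X_test, y_test):.3f}")
    print(f"Ada after {n:3d} rounds: test accuracy {ada_curve[n - 1]:.3f}")
```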

What is the difference between Gradient boosting and AdaBoost?

  • Both methods use a set of weak learners.
  • They try to boost these weak learners into a strong learner.
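Although both methods share this high-level recipe, they differ in how each round identifies the shortcomings of the current model: AdaBoost re-weights the training examples (up-weighting the misclassified ones), whereas gradient boosting fits each new weak learner to the negative gradient of the loss, i.e., the residuals. The sketch below illustrates that residual-fitting step for squared loss; the function names, tree depth, and learning rate are illustrative assumptions.

```python
# Minimal residual-fitting loop in the spirit of gradient boosting (squared loss).
# Function names, depths, and the learning rate are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1):
    pred = np.full(len(y), float(np.mean(y)))   # start from a constant prediction
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                    # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, float(np.mean(y)), learning_rate

def gb_predict(X_new, trees, base_pred, learning_rate):
    return base_pred + learning_rate * sum(t.predict(X_new) for t in trees)
```

AdaBoost, by contrast, never looks at residuals: it changes the example weights so that the next weak learner concentrates on the previously misclassified points, as in the loop sketched earlier.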