# Interview Guide to Boosting Algorithms: Part-1

Original article was published on Artificial Intelligence on Medium

To start with, here are a few questions that will develop your interest in this particular domain.

# Questions:

1. Is AdaBoost less or more prone to over-fitting?
2. What are some misconceptions around over-fitting?
3. In what situations/use-cases do we prefer the XGBoost algorithm?
4. Where do boosting algorithms fit in the world of AI/ML?

• A Short Introduction to Boosting
• The strength of weak learnability
• Central Idea of Boosting algorithms
• Is AdaBoost less or more prone to over-fitting?
• Some misconceptions around over-fitting

# A Short Introduction to Boosting:

Boosting is a general method for improving the accuracy of any given learning algorithm.

# The strength of weak learnability:

A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner can, with high probability, output a hypothesis that is correct on all but an arbitrarily small fraction of the instances.

The concept class is weakly learnable if the learner can produce a hypothesis that performs only slightly better than random guessing.

# PAC Learning:

The requirements for a concept class to be PAC learnable are quite stringent.

The learning algorithm has to work for all target concepts in the class, for all input distributions, and for any setting of the accuracy (ε) and confidence (δ) parameters.
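Stated formally (with ε for accuracy and δ for confidence, following the standard definition), the requirement looks like this:

```latex
% A concept class C is (strongly) PAC learnable if there is an algorithm A
% such that for every concept c in C, every distribution D over the inputs,
% and all epsilon, delta in (0, 1/2), given polynomially many examples,
% A outputs a hypothesis h satisfying
\Pr\big[\mathrm{err}_{\mathcal{D}}(h) \le \varepsilon\big] \ge 1 - \delta
% Weak learnability relaxes the accuracy requirement to
% err_D(h) <= 1/2 - gamma for some fixed gamma > 0,
% i.e., only slightly better than random guessing.
```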

# Central Idea of Boosting algorithms:

The central idea of the boosting approach is the following.

Initially, we can use a weak learning algorithm that gives us a hypothesis that performs slightly better than random guessing.

We could repeatedly run this weak learning algorithm, though it may return the same hypothesis.

However, if we modify the distribution so that the hypothesis already returned is no longer informative, i.e., under the new distribution it has error exactly 1/2, then the weak learning algorithm is forced to provide us with a different hypothesis.

By doing this repeatedly, we can combine several hypotheses to produce one that has low error.

All boosting algorithms make use of this high-level approach.
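This high-level loop can be sketched in a few lines. The following is a minimal AdaBoost-style implementation (not the only boosting variant, but the canonical instance of the idea): each round fits a decision stump on the current distribution over examples, then re-weights so that the stump's mistakes gain weight, forcing the next stump to be different. The dataset here is a synthetic one purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data; labels in {-1, +1} as in the classical AdaBoost formulation.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
y = 2 * y - 1

n_rounds = 25
weights = np.full(len(X), 1.0 / len(X))  # start with a uniform distribution
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a weak learner (a decision stump) on the current distribution.
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted error of this stump, and its vote weight in the ensemble.
    err = np.sum(weights[pred != y])
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))

    # Re-weight the distribution: mistakes gain weight, correct
    # examples lose weight, so the next stump must do something new.
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

def predict(X_new):
    """The combined hypothesis: a weighted majority vote of the stumps."""
    scores = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.sign(scores)

accuracy = np.mean(predict(X) == y)
```

Each individual stump is only slightly better than random on its distribution, yet the weighted vote attains low training error, which is exactly the "weak learners combined into a strong one" idea described above.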

# Is AdaBoost less or more prone to over-fitting?

In practical experience, AdaBoost is quite robust to over-fitting, and LPBoost (Linear Programming Boosting) even more so, because its objective function requires a sparse combination of weak learners, which is a form of capacity control. The main factors that influence over-fitting are:

• The “strength” of the “weak” learners: If you use very simple weak learners, such as decision stumps (1-level decision trees), then the algorithms are much less prone to over-fitting. Whenever you use more complicated weak learners (such as deeper decision trees or even hyper-planes), you’ll find that over-fitting occurs much more rapidly.
• The noise level in the data: AdaBoost is particularly prone to over-fitting on noisy datasets. In this setting the regularized forms (RegBoost, AdaBoostReg, LPBoost, QPBoost) are preferable.
• The dimensionality of the data: We know that in general, we experience over-fitting more in high dimensional spaces (“the curse of dimensionality”), and AdaBoost can also suffer in that respect, as it is simply a linear combination of classifiers which themselves suffer from the problem. Whether it is as prone as other classifiers is hard to determine.
• Of course, we can use heuristic methods such as a validation set or k-fold cross-validation to set the stopping parameter (the number of boosting rounds, or other parameters in the different variants), just as we would for any other classifier.

# Some misconceptions around over-fitting:

AdaBoost is often wrongly interpreted as being better than Random Forest with respect to over-fitting.

In fact, according to theory (see the original Random Forest paper by Breiman), adding more trees does not make a Random Forest over-fit: its generalization error converges as trees are added, so the forest over-fits only to the extent that its individual weak classifiers over-fit the data.