The Power of Ensemble Methods in Machine Learning


Bagging

Bagging, short for bootstrap aggregating, means combining the predictions of several weak learners. We can think of it as training weak learners in parallel and averaging their predictions to produce the overall prediction. The most common algorithm that uses the bagging method is the random forest.
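
As a concrete illustration, here is a minimal sketch of bagging with scikit-learn's BaggingClassifier. The dataset and the shallow-tree base learner are just illustrative choices, and note that in scikit-learn versions before 1.2 the first argument is named base_estimator rather than estimator:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 50 weak learners is a shallow decision tree trained on a
# bootstrap sample; their predictions are aggregated by majority vote.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),
    n_estimators=50,
    random_state=42,
)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))
```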

The base estimator of random forests is decision tree which partitions data by iteratively asking questions. Random forests are built by combining several decision trees with bagging method. If used for a classification problem, the overall prediction is based on majority vote of the results received from each decision tree. For regression, the prediction of a leaf node is the mean value of the target values in that leaf. Random forest regression takes mean value of the results from decision trees.
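
A minimal sketch of both variants in scikit-learn follows; the iris and diabetes datasets are used purely as stand-ins:

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: each tree votes, and the majority class wins.
X_clf, y_clf = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_clf, y_clf)
print(clf.predict(X_clf[:3]))

# Regression: the forest prediction is the mean of the trees' predictions.
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_reg, y_reg)
print(reg.predict(X_reg[:3]))
```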

The success of a random forest depends heavily on using uncorrelated decision trees. If the trees are the same or very similar, the overall result will not be much better than the result of a single decision tree. Random forests obtain uncorrelated trees through bootstrapping and feature randomness.

Bootstrapping is randomly selecting samples from the training data with replacement; the resulting subsets are called bootstrap samples.

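A small NumPy sketch makes the process concrete; the ten-element array is just a stand-in for a training set:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)  # stand-in for ten training samples

for i in range(3):
    # Each bootstrap sample has the same size as the original data but is
    # drawn with replacement, so duplicates and omissions are expected.
    sample = rng.choice(data, size=len(data), replace=True)
    print(f"bootstrap sample {i}: {sorted(sample)}")
```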

Feature randomness is achieved by randomly selecting the candidate features considered at each split of each decision tree in the forest. The number of features considered can be controlled with the max_features parameter.
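
For example, a forest can be configured like this (the iris dataset is again just illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# "sqrt" considers roughly sqrt(n_features) candidate features at each split;
# an integer count or a float fraction of the features can be passed instead.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X, y)
```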


Bootstrap samples and feature randomness provide the random forest model with uncorrelated trees.

Hyperparameters are key parts of learning algorithms that affect the performance and accuracy of a model. Two critical hyperparameters of random forests are max_depth and n_estimators.

max_depth: The maximum depth of a tree. The depth of a tree starts from 0 (i.e. the depth of the root node is zero). If not specified, the model keeps splitting until all leaves are pure or until all leaves contain fewer than min_samples_split samples. Increasing the depth more than necessary creates the risk of overfitting.
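
A rough sketch of that risk, comparing shallow and unrestricted trees on a held-out set (the breast cancer dataset is only a stand-in):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for depth in [2, 5, None]:  # None lets trees grow until leaves are pure
    forest = RandomForestClassifier(max_depth=depth, random_state=42)
    forest.fit(X_train, y_train)
    # Deep trees tend to push training accuracy toward 1.0 while the
    # test accuracy gains flatten out or reverse.
    print(depth, forest.score(X_train, y_train), forest.score(X_test, y_test))
```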

n_estimators: The number of trees in the forest. Up to a point, the result improves as the number of trees increases. However, after some point, adding more trees does not improve the model. Keep in mind that adding more trees always means more computation time.
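
A quick sketch of these diminishing returns, again with an illustrative built-in dataset:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for n in [10, 50, 100, 500]:
    forest = RandomForestClassifier(n_estimators=n, random_state=42)
    forest.fit(X_train, y_train)
    # Test accuracy typically plateaus while training time keeps growing.
    print(n, forest.score(X_test, y_test))
```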