Random forest simple and intuitive way with python

Original article was published on Artificial Intelligence on Medium

Decision tree learning is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems, it is an acyclic graph that can be used to make decisions. In each branching node of the graph, a specific feature j of the feature vector is examined. If the value of the feature is below a specific threshold, then the left branch is followed; otherwise, the right branch is followed. As the leaf node is reached, the decision is made about the class to which the example belongs.

You can find my post about decision Tree :

Random forest is a supervised learning algorithm. The “forest” it builds, is an ensemble of decision trees, usually trained with the “bagging” method. The general idea of the bagging method is that a combination of learning models increases the overall result. One big advantage of a random forest is that it can be used for both classification and regression problems, which form the majority of current machine learning systems. Let’s look at the random forest in classification since classification is sometimes considered the building block of machine learning. Below you can see how a random forest would look like with two trees

The image can be found here

How are Random Forests trained?

Random Forests are trained via the bagging method. Bagging or Bootstrap Aggregating, consists of randomly sampling subsets of the training data, fitting a model to these smaller data sets, and aggregating the predictions. This method allows several instances to be used repeatedly for the training stage given that we are sampling with replacement. Tree bagging consists of sampling subsets of the training set, fitting a Decision Tree to each, and aggregating their result.

the image can be found here

The Random Forest method introduces more randomness and diversity by applying the bagging method to the feature space. That is, instead of searching greedily for the best predictors to create branches, it randomly samples elements of the predictor space, thus adding more diversity and reducing the variance of the trees at the cost of equal or higher bias. This process is also known as “feature bagging” and it is this powerful method that leads to a more robust model.

How to make predictions with Random Forests?

Remember that in a Decision Tree a new instance goes from the root node to the bottom until it is classified in a leaf node. In the Random Forests algorithm, each new data point goes through the same process, but now it visits all the different trees in the ensemble, which are were grown using random samples of both training data and features. Depending on the task at hand, the functions used for aggregation will differ. For Classification problems, it uses the mode or most frequent class predicted by the individual trees (also known as a majority vote), whereas for Regression tasks, it uses the average prediction of each tree.

The weaknesses of Random forest methods :

Although Random forest is a powerful and accurate method used in Machine Learning, you should always cross-validate your model as there may be overfitting. Also, despite its robustness, the Random Forest algorithm is slow, as it has to grow many trees during the training stage and as we already know, this is a greedy process.

Python Implementation


Random forest is a great algorithm to train early in the model development process, to see how it performs. Its simplicity makes building a “bad” random forest a tough proposition.

I hope I was able to clarify it a little to you Random forest it is one of the basic Algorithms, I will be uploading a lot of more explanation of algorithms because why not 🙂

Those are my personal research, if you have any comments please reach out to me.

Github, LinkedIn, Zahra Elhamraoui, Upwork

References :

[1] How are Random Forests trained?

[2] Random Forest