Decision Tree Regression and its Mathematical Implementation

Original article was published on Deep Learning on Medium


What is a Decision Tree?
A decision tree is one of the most popular and powerful algorithms for both regression and classification. It falls under the category of non-parametric supervised learning.
It breaks a data set down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed.
A decision tree typically starts with a single node, which branches into possible outcomes. Each of those outcomes leads to additional nodes, which branch off into further possibilities. This gives it a tree-like shape.

Let’s understand the decision tree. (In this example we have to predict whether a person is fit or unfit, using decision parameters such as age, whether they exercise in the morning, and whether they eat a lot of pizza.)

Simple Decision Tree

Now we have a decision tree to work with.

Classification Rules:

Classification rules are the paths from the root to each leaf: every scenario is taken into consideration, and a class variable is assigned to each.

Class Variable:

Each leaf node is assigned a class variable. A class variable is the final output that leads to our decision.

Let us derive the classification rules from the Decision Tree created:

  1. If a person’s age is below 30 years and they eat lots of pizza → UNFIT
  2. If a person’s age is below 30 years and they don’t eat lots of pizza → FIT
  3. If a person’s age is above 30 years and they exercise in the morning → FIT
  4. If a person’s age is above 30 years and they don’t exercise in the morning → UNFIT
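The four rules above can be sketched as plain conditionals. This is only an illustrative sketch; the function name and arguments are hypothetical, not from the article:

```python
def is_fit(age, eats_lots_of_pizza, exercises_in_morning):
    """Return 'FIT' or 'UNFIT' by following the decision tree's four rules.

    Ages of exactly 30 are treated as 'above 30' here, since the
    article's rules only distinguish below vs. above.
    """
    if age < 30:
        # Rules 1 and 2: for under-30s, the decision depends on pizza habits.
        return "UNFIT" if eats_lots_of_pizza else "FIT"
    # Rules 3 and 4: otherwise, the decision depends on morning exercise.
    return "FIT" if exercises_in_morning else "UNFIT"

print(is_fit(25, eats_lots_of_pizza=True, exercises_in_morning=False))   # rule 1: UNFIT
print(is_fit(40, eats_lots_of_pizza=False, exercises_in_morning=True))   # rule 3: FIT
```

Notice that each rule is one root-to-leaf path through the tree, which is exactly what the classification rules enumerate.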

Important Terminologies:

1. Root Node: It represents the entire population or sample, and it further gets divided into two or more homogeneous sets.
2. Splitting: The process of dividing a node into two or more sub-nodes.
3. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
4. Leaf/Terminal Node: Nodes that do not split are called leaf or terminal nodes.
5. Pruning: Removing sub-nodes of a decision node is called pruning. You can think of it as the opposite of splitting.
6. Branch/Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
7. Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.


Entropy: A measure of the impurity (randomness) of a set of examples, computed as Entropy = −Σ pᵢ log₂(pᵢ), where pᵢ is the proportion of class i. A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values: the lower the entropy of a subset, the more homogeneous it is.

Information Gain: The information gain is based on the decrease in entropy after a data set is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

The information gain for a split on attribute A is the entropy of the parent set minus the size-weighted sum of the entropies of the resulting subsets: Gain(S, A) = Entropy(S) − Σ (|Sv| / |S|) · Entropy(Sv).

Levels of ENTROPY

If the data is completely homogeneous, the entropy is 0; if the data is split evenly (50–50%), the entropy is 1.
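The entropy and information-gain formulas above can be checked with a short, self-contained sketch (the function names are illustrative):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels: -sum(p * log2(p))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# A 50-50 split of two classes has entropy exactly 1.
print(entropy(["fit", "unfit"] * 2))

# Splitting a mixed set into two pure subsets recovers the full entropy as gain.
parent = ["fit", "fit", "unfit", "unfit"]
print(information_gain(parent, [["fit", "fit"], ["unfit", "unfit"]]))
```

A completely homogeneous list (e.g. four "fit" labels) gives entropy 0, matching the statement above.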


There are a couple of algorithms out there to build a decision tree; we only talk about one of them here:

CART (Classification and Regression Trees) → uses the Gini index (for classification) as its metric.
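For regression, CART uses a different split criterion: instead of the Gini index, it typically minimizes the variance of the target values within each branch. A rough pure-Python sketch of choosing the best threshold on a single feature (the function names and toy data are illustrative, not from the article):

```python
def variance(ys):
    """Population variance of the target values in a node."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_split(xs, ys):
    """Try every midpoint between sorted x values and return the threshold
    that minimizes the size-weighted variance of the two branches."""
    pairs = sorted(zip(xs, ys))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data with an obvious jump in the target between x = 2 and x = 3.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
print(best_split(xs, ys))  # the best threshold lands at 2.5, between the clusters
```

Growing a full regression tree just repeats this search recursively on each branch until a stopping condition (depth, node size, or minimum improvement) is met.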


Training a decision tree consists of iteratively splitting the current data into two branches. Say we have the following data points.


Right now, we have one branch with 5 blues and 5 greens. Let’s make a split at x = 2. This is a perfect split! It breaks our dataset perfectly into two branches:

● Left branch, with 5 blues.

● Right branch, with 5 greens.

Split at x=2


What if we’d made a split at x = 1.5?
This imperfect split breaks our dataset into these branches:

● Left branch, with 4 blues.

● Right branch, with 1 blue and 5 greens.

Split at x=1.5

The formula for Gini impurity is G = 1 − Σ pᵢ², where pᵢ is the fraction of points in class i within a branch; a perfectly pure branch has G = 0.
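We can verify the two splits above with a short sketch (illustrative function names; class counts are given as [blues, greens]):

```python
def gini(counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions in a branch."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(branches):
    """Average the impurity of each branch, weighted by branch size."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * gini(b) for b in branches)

# Perfect split at x = 2: 5 blues left, 5 greens right -> impurity 0.
print(weighted_gini([[5, 0], [0, 5]]))

# Imperfect split at x = 1.5: 4 blues left; 1 blue and 5 greens right.
print(round(weighted_gini([[4, 0], [1, 5]]), 3))
```

The imperfect split scores about 0.167: the left branch is pure (impurity 0), but the mixed right branch has impurity 10/36 ≈ 0.278, weighted by its 6 of 10 points. Training picks the split with the lowest weighted impurity, so x = 2 wins.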

Now let’s understand the working of a DECISION TREE through an example:

Suppose we have a basket full of different categories of fruit, and we have to separate each fruit according to its category.

In the figure above we can see that the fruits of different categories are now separated.


In this post, we read about decision trees in detail and gained insights into how they work and the mathematics behind them. They’re widely used and well supported by popular libraries.