Decision Tree Classification v1 (Supervised Learning)

Source: Deep Learning on Medium


Decision Tree Analysis is a general, predictive modelling tool that has applications spanning several different areas. In general, decision trees are constructed via an algorithmic approach that identifies ways to split a data set based on different conditions. It is one of the most widely used and practical methods for supervised learning.

A decision tree is a tree-like graph in which nodes represent the places where we pick an attribute and ask a question, edges represent the answers to the question, and the leaves represent the actual output or class label. Decision trees are used for non-linear decision making with simple linear decision surfaces.

Decision trees classify the examples by sorting them down the tree from the root to some leaf node, with the leaf node providing the classification to the example. Each node in the tree acts as a test case for some attribute, and each edge descending from that node corresponds to one of the possible answers to the test case. This process is recursive and is repeated for every subtree rooted at the new nodes.
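To make this sorting process concrete, here is a minimal Python sketch of how an example walks down a tree from the root to a leaf. The tree shown is illustrative (it matches the buys_computer example discussed in this article, but the node layout here is an assumption for demonstration):

```python
class Node:
    """An internal node tests one attribute; a leaf holds a class label."""
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # attribute tested at this node
        self.branches = branches or {}  # answer -> child Node (an edge per answer)
        self.label = label              # class label if this is a leaf

def classify(node, example):
    """Sort an example down the tree from the root to some leaf node."""
    while node.label is None:             # internal node: keep descending
        answer = example[node.attribute]  # ask the node's question
        node = node.branches[answer]      # follow the edge matching the answer
    return node.label                     # the leaf provides the classification

# An illustrative tree: the root tests 'age'; middle_aged is a pure "yes" leaf.
tree = Node(attribute="age", branches={
    "youth":       Node(attribute="student", branches={
                       "yes": Node(label="yes"), "no": Node(label="no")}),
    "middle_aged": Node(label="yes"),
    "senior":      Node(attribute="credit_rating", branches={
                       "fair": Node(label="yes"), "excellent": Node(label="no")}),
})

print(classify(tree, {"age": "middle_aged"}))             # -> yes
print(classify(tree, {"age": "youth", "student": "no"}))  # -> no
```

Each call to `classify` is exactly the recursive test-and-descend process described above: one attribute test per node, one edge per possible answer.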

Let me make it simpler:

Let us take the data set of an electronics store, where each record describes a customer (by age, income, student, and credit_rating) and tells us whether that customer buys a computer or not.

Observing the above database, we get to know that the class label attribute buys_computer has two distinct values (namely, {yes, no}); therefore, there are two distinct classes (i.e., m = 2).

There are 9 tuples of class yes and 5 tuples of class no. Let us find the information gain.

Mathematically, the expected information (entropy) needed to classify a tuple in D is expressed as:

Info(D) = -Σ pᵢ log₂(pᵢ), summed over the m classes,

where pᵢ is the probability that an arbitrary tuple in D belongs to class Cᵢ.

So, calculating the expected information for the above dataset:

Info(D) = -(9/14) log₂(9/14) - (5/14) log₂(5/14) = 0.940 bits
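As a quick check, the expected-information formula can be computed directly in Python (a small sketch; the counts 9 and 5 are the class distribution of the dataset):

```python
from math import log2

def info(counts):
    """Expected information Info(D) = -sum(p_i * log2(p_i)) over the classes."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# 9 "yes" tuples and 5 "no" tuples, as in the dataset above.
print(round(info([9, 5]), 3))  # -> 0.94
```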

Next, we need to compute the expected information requirement for each attribute.

Let’s start with the attribute age. We need to look at the distribution of yes and no tuples for each category of age.

For the age category “youth”, there are two yes tuples and three no tuples. For the category “middle_aged”, there are four yes tuples and zero no tuples, and for the category “senior”, there are three yes tuples and two no tuples.


The expected information needed to classify a tuple in D if the tuples are partitioned according to age is

Info_age(D) = (5/14) × I(2,3) + (4/14) × I(4,0) + (5/14) × I(3,2) = 0.694 bits

Hence, the gain in information from such a partitioning would be

Gain(age) = Info(D) - Info_age(D) = 0.940 - 0.694 = 0.246 bits

Similarly, we can compute Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and Gain(credit_rating) = 0.048 bits. Because age has the highest information gain among the attributes, it is selected as the splitting attribute.
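The per-attribute gain computation can be sketched in Python. The 14-tuple training set below is reconstructed from the counts quoted in the text (9 yes / 5 no overall; youth 2/3, middle_aged 4/0, senior 3/2), so treat the exact rows as an assumption. Note that computing with unrounded entropies gives 0.247 and 0.152 where the text's rounded intermediate arithmetic gives 0.246 and 0.151:

```python
from collections import Counter
from math import log2

# Reconstructed 14-tuple training set: each row is
# (age, income, student, credit_rating, buys_computer).
DATA = [
    ("youth",       "high",   "no",  "fair",      "no"),
    ("youth",       "high",   "no",  "excellent", "no"),
    ("middle_aged", "high",   "no",  "fair",      "yes"),
    ("senior",      "medium", "no",  "fair",      "yes"),
    ("senior",      "low",    "yes", "fair",      "yes"),
    ("senior",      "low",    "yes", "excellent", "no"),
    ("middle_aged", "low",    "yes", "excellent", "yes"),
    ("youth",       "medium", "no",  "fair",      "no"),
    ("youth",       "low",    "yes", "fair",      "yes"),
    ("senior",      "medium", "yes", "fair",      "yes"),
    ("youth",       "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no",  "excellent", "yes"),
    ("middle_aged", "high",   "yes", "fair",      "yes"),
    ("senior",      "medium", "no",  "excellent", "no"),
]
ATTRS = ["age", "income", "student", "credit_rating"]

def info(labels):
    """Info(D): expected information (entropy) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def gain(rows, attr_index):
    """Gain(A) = Info(D) - Info_A(D) for the attribute at attr_index."""
    labels = [r[-1] for r in rows]
    partitions = {}                       # attribute value -> labels in partition
    for r in rows:
        partitions.setdefault(r[attr_index], []).append(r[-1])
    expected = sum(len(p) / len(rows) * info(p) for p in partitions.values())
    return info(labels) - expected

for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(DATA, i):.3f} bits")
# age has the highest gain, so it is selected as the splitting attribute.
```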

The root node, N, is labelled with age, and branches are grown for each of the attribute’s values. The tuples are then partitioned accordingly.

Notice that the tuples falling into the partition for age = middle_aged all belong to the same class. Because they all belong to class “yes”, a leaf should therefore be created at the end of this branch and labelled “yes”.

The same process is then repeated recursively for the tuples in each remaining partition (age = youth and age = senior), selecting the highest-gain attribute within each partition, until every partition is pure or no attributes are left to test.
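To tie the steps together, here is a sketch of the full recursive construction, in the style of the ID3 algorithm using information gain. As before, the 14-tuple training set is reconstructed from the counts quoted in the text, so treat the exact rows as an assumption:

```python
from collections import Counter
from math import log2

# Reconstructed 14-tuple training set:
# (age, income, student, credit_rating, buys_computer).
DATA = [
    ("youth",       "high",   "no",  "fair",      "no"),
    ("youth",       "high",   "no",  "excellent", "no"),
    ("middle_aged", "high",   "no",  "fair",      "yes"),
    ("senior",      "medium", "no",  "fair",      "yes"),
    ("senior",      "low",    "yes", "fair",      "yes"),
    ("senior",      "low",    "yes", "excellent", "no"),
    ("middle_aged", "low",    "yes", "excellent", "yes"),
    ("youth",       "medium", "no",  "fair",      "no"),
    ("youth",       "low",    "yes", "fair",      "yes"),
    ("senior",      "medium", "yes", "fair",      "yes"),
    ("youth",       "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no",  "excellent", "yes"),
    ("middle_aged", "high",   "yes", "fair",      "yes"),
    ("senior",      "medium", "no",  "excellent", "no"),
]
ATTRS = ["age", "income", "student", "credit_rating"]

def info(labels):
    """Expected information (entropy) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, attrs):
    """Pick the attribute with the highest information gain on these rows."""
    labels = [r[-1] for r in rows]
    def gain(i):
        parts = {}
        for r in rows:
            parts.setdefault(r[i], []).append(r[-1])
        return info(labels) - sum(len(p) / len(rows) * info(p)
                                  for p in parts.values())
    return max(attrs, key=lambda a: gain(ATTRS.index(a)))

def build(rows, attrs):
    """Grow the tree recursively: stop when a partition is pure or attrs run out."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:                      # pure partition -> leaf
        return labels[0]
    if not attrs:                                  # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attrs)
    i = ATTRS.index(attr)
    children = {}
    for value in {r[i] for r in rows}:             # one branch per attribute value
        subset = [r for r in rows if r[i] == value]
        children[value] = build(subset, [a for a in attrs if a != attr])
    return (attr, children)

root, branches = build(DATA, ATTRS)
print(root)                    # -> age (highest gain at the root)
print(branches["middle_aged"]) # -> yes (pure partition becomes a leaf)
```

Running this reproduces the walkthrough above: age is chosen at the root, the middle_aged branch terminates immediately in a “yes” leaf, and the youth and senior partitions are split further on their own highest-gain attributes.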

Thank You !