Decision Trees: An Easy and Intuitive Way with Python

The original article was published on Artificial Intelligence on Medium.

Decision tree learning is a supervised learning algorithm (it has a pre-defined target variable) that is mostly used in classification problems. A decision tree is an acyclic graph that can be used to make decisions. At each branching node of the graph, a specific feature j of the feature vector is examined. If the value of the feature is below a specific threshold, the left branch is followed; otherwise, the right branch is followed. When a leaf node is reached, a decision is made about the class to which the example belongs. For example, the figure below shows a decision tree for PlayTennis:

A decision tree for the concept PlayTennis.
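The branching logic described above can be sketched with a toy tree. This is purely illustrative: the node structure, feature indices, and thresholds below are made up, not taken from the PlayTennis figure.

```python
# A minimal sketch of how a decision tree routes an example:
# at each branching node, one feature is compared to a threshold.
# Node layout, features, and thresholds here are illustrative only.

class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, label=None):
        self.feature = feature      # index j of the feature to examine
        self.threshold = threshold  # split threshold at this node
        self.left = left            # followed when x[feature] < threshold
        self.right = right          # followed otherwise
        self.label = label          # class label, set only at a leaf

def predict(node, x):
    """Follow branches until a leaf is reached, then return its class."""
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label

# A tiny hand-built tree: split on feature 0 at 0.5, then on feature 1 at 2.0
tree = Node(feature=0, threshold=0.5,
            left=Node(label="No"),
            right=Node(feature=1, threshold=2.0,
                       left=Node(label="No"),
                       right=Node(label="Yes")))

print(predict(tree, [0.8, 3.0]))  # → Yes (both features pass their thresholds)
```

Real implementations learn the thresholds from data; here they are fixed by hand just to show the traversal.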

Types of Decision Trees

The type of a decision tree is based on the type of target variable we have. There are two types:

  1. Categorical Variable Decision Trees: A decision tree with a categorical target variable is called a categorical variable decision tree.
  2. Continuous Variable Decision Trees: A decision tree with a continuous target variable is called a continuous variable decision tree.
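In scikit-learn terms, these two types correspond to `DecisionTreeClassifier` and `DecisionTreeRegressor`. A minimal sketch, assuming scikit-learn is installed (the tiny datasets below are invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]  # a toy single-feature dataset

# Categorical target -> categorical variable decision tree
y_class = ["no", "no", "yes", "yes"]
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[3]]))   # predicts a class label, e.g. "yes"

# Continuous target -> continuous variable decision tree
y_reg = [1.0, 1.5, 3.2, 3.9]
reg = DecisionTreeRegressor().fit(X, y_reg)
print(reg.predict([[0]]))   # predicts a numeric value, e.g. 1.0
```

The estimator API is identical in both cases; only the target type (and the internal splitting criterion) differs.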

Terminology

Root Node: It represents the entire population or sample, and it is further divided into two or more homogeneous sets.
Splitting: The process of dividing a node into two or more sub-nodes.
Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
Leaf/Terminal Node: Nodes that do not split are called leaf or terminal nodes.


How do Decision Trees work?


Decision trees use multiple algorithms to decide how to split a node into two or more sub-nodes. The creation of sub-nodes increases the homogeneity of the resultant sub-nodes. In other words, the purity of the node increases with respect to the target variable. The algorithm considers splits on all available variables and then selects the split that results in the most homogeneous sub-nodes.
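One common purity measure (used by the CART algorithm) is Gini impurity. A minimal sketch of how two candidate splits might be compared, using invented toy labels:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: each child weighted by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Candidate split A separates the classes perfectly (homogeneous children)
print(weighted_gini(["yes", "yes"], ["no", "no"]))  # → 0.0

# Candidate split B mixes the classes (impure children)
print(weighted_gini(["yes", "no"], ["yes", "no"]))  # → 0.5

# The split with the lower weighted impurity (A) is selected.
```

Other criteria, such as information gain based on entropy (used by ID3/C4.5), follow the same pattern: score every candidate split and keep the best one.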

Strengths and Weaknesses of the Decision Tree approach

The strengths of decision tree methods are:

  • Decision trees perform classification without requiring a lot of calculations.
  • Decision trees are capable of generating understandable rules.
  • Decision trees are able to manage both continuous and categorical variables.

The weaknesses of decision tree methods are:

  • Decision trees are less suitable for estimation tasks where the goal is to predict the value of a continuous attribute.
  • Decision trees are subject to errors in classification problems with many classes and a relatively small number of training examples.

Python Implementation

You can find the code and the data set in my GitHub Repository, or you can adapt the code for your own dataset.
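Since the repository's code is not reproduced here, a minimal end-to-end sketch using scikit-learn's bundled Iris dataset (not the data set from the repository) might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a small benchmark dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# max_depth limits tree growth, a simple guard against overfitting
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

Swapping in your own dataset only requires replacing `X` and `y`; the fit/predict workflow stays the same.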

Conclusion:

I hope I was able to clarify Decision Trees a little for you; it is one of the basic algorithms. I will be uploading many more explanations of algorithms, because why not 🙂

This is my personal research; if you have any comments, please reach out to me.

Github, LinkedIn, Zahra Elhamraoui, Upwork
