Ensemble learning Simplified

Source: Artificial Intelligence on Medium

In line with my earlier blog, “Reinforcement learning Simplified”, which received an excellent reception, I now have the basics and algorithms of ensemble learning explained using analogies. Please don’t forget to share your inputs and valuable feedback.

An ensemble, in general, is a group of things that is viewed as a whole, valued for what the group produces together rather than for what each member contributes individually. Ensemble methods follow a divide-and-conquer approach to improve performance.

We will start on the specific algorithms with an introduction to the famous concept of the Wisdom of Crowds.

Wisdom of Crowds

Imperfect judgments, when aggregated in the right way, result in a collective intelligence and thus a superior outcome. The Wisdom of Crowds is all about this collective intelligence. The term crowd is usually associated with irrationality, and with the common perception that some influence sways the behavior of the crowd, as in mobs and cults. However, crowd behavior need not always be negative, and it works well when the goal is to pool intellect. The key concept of the Wisdom of Crowds is that decisions made by a group of people are often more robust and accurate than those made by individuals. Ensemble learning methods in machine learning have exploited this idea effectively to improve the efficiency and accuracy of their results.

The idea behind the Wisdom of Crowds goes back to an observation made by Galton in 1906. He once attended a farmers’ fair where there was a contest to guess the weight of an ox once it was butchered and dressed. About 800 contestants took part, and the closest guess won the prize. He chose to collect all the responses and analyze them. When he took the average of the guesses, he was shocked to find that it was very close to the actual value. This collective guess was better than that of the contestant who won the prize, and also better than the guesses of the cattle experts. The democracy of thoughts was a clear winner. For such a useful output, it is important that each contestant has his/her own strong source of information, that each contestant’s guess is independent and not influenced by a neighbor’s guess, and that there is an error-free mechanism to consolidate the guesses across the group. So in short, this is not an easy process. Another important point is that the aggregated guess was superior to any individual expert’s guess.

Some basic everyday examples include:

  • Google search results that usually have the most popular pages listed at the top
  • In a game like “Who wants to be a billionaire”, the audience poll is used for questions the contestant cannot answer. Usually, the answer with the most votes from the crowd is taken to be the right answer.

The results of the Wisdom of Crowds approach are not guaranteed. The following are the basic criteria for an optimal result using this approach:

  1. Aggregation: There needs to be a foolproof way of consolidating individual responses into a collective response or judgment. Without it, the core purpose of gathering views or responses from across the board is defeated.
  2. Independence: Within the crowd, there needs to be discipline to keep one entity’s response from influencing the rest of the crowd. Any influence would skew the responses, thus impacting the accuracy.
  3. Decentralization: Individual responses each have their own source and draw on local, limited knowledge.
  4. Diversity of opinion: It is important that each person has a response of their own, formed in isolation; an unusual response is still acceptable.
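To make the aggregation and independence criteria concrete, here is a minimal simulation sketch. The true weight, the noise level, and the shared bias are made-up numbers for illustration only (they are not Galton's data), and averaging is just one possible way to aggregate.

```python
import random
import statistics

def crowd_errors(true_value, guesses):
    """Return (error of the averaged guess, mean error of the individual guesses)."""
    crowd_guess = statistics.mean(guesses)
    crowd_error = abs(crowd_guess - true_value)
    mean_individual_error = statistics.mean([abs(g - true_value) for g in guesses])
    return crowd_error, mean_individual_error

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_WEIGHT = 1200.0     # hypothetical weight of the ox
N_CONTESTANTS = 800

# Independent crowd: every guess has its own random error around the truth.
independent = [TRUE_WEIGHT + rng.gauss(0, 75) for _ in range(N_CONTESTANTS)]

# Influenced crowd: the same guesses, but everyone is pulled the same way
# (a hypothetical shared bias, e.g. a loud neighbor overestimating).
SHARED_BIAS = 60.0
influenced = [g + SHARED_BIAS for g in independent]

ind_crowd, ind_individual = crowd_errors(TRUE_WEIGHT, independent)
inf_crowd, inf_individual = crowd_errors(TRUE_WEIGHT, influenced)

print(f"independent crowd: average-guess error {ind_crowd:.1f}, "
      f"typical individual error {ind_individual:.1f}")
print(f"influenced crowd:  average-guess error {inf_crowd:.1f}, "
      f"typical individual error {inf_individual:.1f}")
```

Independent errors largely cancel out when averaged, while a shared influence survives the averaging, which is why independence matters as much as the aggregation step itself.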

The word ensemble means grouping. To build an ensemble classifier, we first build a set of classifiers from the training data, then aggregate the predictions made by these classifiers, and use the aggregate to predict the class label of a new record.
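The aggregation step can be sketched in a few lines. In this illustration the three classifiers are hand-written rules standing in for trained models, and the record fields (links, caps_ratio, text) are hypothetical; majority voting is one common way to aggregate, not the only one.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(classifiers, record):
    """Ask every classifier for a label, then aggregate the answers by voting."""
    return majority_vote([clf(record) for clf in classifiers])

# Hypothetical stand-ins for classifiers built from training data.
classifiers = [
    lambda r: "spam" if r["links"] > 3 else "ham",
    lambda r: "spam" if r["caps_ratio"] > 0.5 else "ham",
    lambda r: "spam" if "free" in r["text"].lower() else "ham",
]

record = {"links": 5, "caps_ratio": 0.2, "text": "Free tickets inside"}
label = ensemble_predict(classifiers, record)
print(label)  # two of the three rules vote "spam"
```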

The Ensemble Process

Technically, the core building blocks include a training set, an inducer, and an ensemble generator. The inducer defines a classifier for each of the sample training datasets. The ensemble generator creates the classifiers, along with a combiner or aggregator that consolidates the responses across the classifiers. These building blocks and the relationships between them give us the following properties, which we will use to categorize the ensemble methods covered in the next section:

  1. Usage of a combiner: This property defines the relationship between the ensemble generator and the combiner
  2. Dependency between classifiers: This property defines the degree to which the classifiers are dependent on each other
  3. Generating diversity: This property defines the procedure used to ensure diversity across classifiers
  4. The size of the ensemble: This property denotes the number of classifiers used in the ensemble
  5. Cross inducers: This property defines how the classifiers leverage the inducers. There are cases where the classifiers are built to work with a certain set of inducers
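The building blocks can be laid out as a small sketch. Everything here is an assumption chosen for illustration: the inducer is a one-feature threshold rule (a decision stump), diversity comes from bootstrap sampling of the training set, the classifiers are independent of each other, and the combiner is a majority vote. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def stump_inducer(sample):
    """Inducer: learn a threshold classifier from (x, label) pairs with labels 0/1."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        # Count mistakes if we predict 1 for x >= threshold and 0 otherwise.
        errors = sum((1 if x >= threshold else 0) != y for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    threshold = best[1]
    return lambda x: 1 if x >= threshold else 0

def generate_ensemble(training_set, inducer, size, rng):
    """Ensemble generator: one classifier per bootstrap sample of the training set."""
    classifiers = []
    for _ in range(size):
        sample = [rng.choice(training_set) for _ in training_set]
        classifiers.append(inducer(sample))
    return classifiers

def combine(classifiers, x):
    """Combiner: consolidate the classifiers' responses by majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy training set: small values are class 0, large values are class 1.
training_set = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
ensemble = generate_ensemble(training_set, stump_inducer, size=5,
                             rng=random.Random(0))

print(combine(ensemble, 1), combine(ensemble, 9))
```

In terms of the properties above, the size of the ensemble is the size argument, and swapping stump_inducer for another function changes the inducer without touching the generator or the combiner.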

In summary, building model ensembles first involves building experts and letting them provide a response/vote. The expected benefit is an improvement in prediction performance, delivered through a single global structure. However, any interim results produced might end up being difficult to analyze.

Let’s look more closely at why an aggregated/combined classifier performs better. Consider three classifiers that each have an error rate (ε) of 0.35, or an accuracy of 0.65. For each of the classifiers, the probability that its prediction is wrong is 35%.

Given here is the truth table denoting the error rate of 0.35 (35%) and the accuracy rate of 0.65 (65%):
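The arithmetic behind this example can be checked with a short sketch. It assumes a two-class problem, a majority vote as the combiner, and classifiers whose errors are independent of each other; these are assumptions added here for the calculation. The code enumerates every right/wrong combination and adds up the probability of the cases where most classifiers are wrong.

```python
from itertools import product

def majority_error(error_rate, n_classifiers):
    """Probability that a majority of independent classifiers is wrong."""
    total = 0.0
    # Each outcome is a tuple of flags, True meaning that classifier is wrong.
    for outcome in product([True, False], repeat=n_classifiers):
        n_wrong = sum(outcome)
        prob = (error_rate ** n_wrong) * ((1 - error_rate) ** (n_classifiers - n_wrong))
        if n_wrong > n_classifiers / 2:
            total += prob
    return total

ensemble_error = majority_error(0.35, 3)
print(f"single classifier error: 0.35, majority-vote error: {ensemble_error:.5f}")
```

With three classifiers the majority is wrong when two or three of them are wrong: 3 × 0.35² × 0.65 + 0.35³ = 0.28175, which is lower than the 0.35 error rate of any single classifier.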

Supervised ensemble methods

In the case of supervised learning methods, the input is always labeled data. The combining by learning methods include boosting, stacked generalization, and rule ensembles. The combining by consensus methods include bagging, random forests, and random decision trees. The following shows the process flow for combining by learning, followed by another for combining by consensus:

There are a few techniques, such as bagging, wagging, and boosting, that I will cover in separate blogs for simplicity’s sake.

Unsupervised ensemble methods

Among unsupervised ensemble learning methods, one of the consensus-based ensembles is the clustering ensemble. The next diagram depicts the working of the clustering-based ensemble:

For a given unlabeled dataset D = {x1, x2, …, xn}, a clustering ensemble computes a set of clusterings C = {C1, C2, …, Ck}, each of which maps the data to clusters. A consensus-based unified clustering is then formed. The following diagram depicts this flow:
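One way to form the consensus can be sketched as follows. This is an assumed consensus function, not the only one: two points end up in the same consensus cluster if more than half of the input clusterings put them together, and such links are then chained. The function name and the toy labels are hypothetical.

```python
from itertools import combinations

def consensus_clustering(clusterings, threshold=0.5):
    """Merge points that share a cluster in more than `threshold` of the clusterings."""
    n_points = len(clusterings[0])
    parent = list(range(n_points))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(n_points), 2):
        together = sum(labels[i] == labels[j] for labels in clusterings)
        if together / len(clusterings) > threshold:
            parent[find(i)] = find(j)

    # Relabel the tree roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_points)]

# Three clusterings of six points. The label values differ between clusterings;
# only which points are grouped together matters.
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
consensus = consensus_clustering(clusterings)
print(consensus)
```

The third clustering disagrees with the other two about a couple of points, but the consensus follows the grouping that the majority supports.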

Note: This is an updated version and excerpt from one of my publications on Machine learning.