
A Simple Introduction to Ensemble Learning

Table of Contents

  1. What is Ensemble Machine Learning?
  2. Ensemble Learning Techniques
    1. Max Voting
    2. Averaging
    3. Weighted Averaging
  3. Advanced Techniques
    1. Stacking
    2. Blending
    3. Bagging
    4. Boosting

What is Ensemble Machine Learning?

Ensemble learning is an approach under supervised machine learning where we use the combined results of several supervised learning models. Let's try to understand it with an example. Say a person has written an article on an interesting topic and wants preliminary feedback before publishing it. He considers the following possible ways.

  • Ask five friends to rate the article: This way, he gets a fair idea of the article's quality, since at least some of them will rate it honestly. However, there is also a chance that none of them is a subject matter expert on the article's topic.
  • Ask 50 people to rate the article: Here, he includes all his friends plus some strangers, so the feedback is more generalized and diversified. Of the approaches he considered, this one is likely to give the most reliable feedback on his work.

Ensemble Learning Techniques

The following techniques are used to perform ensemble learning:

Simple techniques

  • Max Voting: We generally use this method for classification problems. For each data point, multiple models give their outcome, and each outcome is counted as a vote. The result that wins the majority of votes becomes the final prediction. Returning to the article example, suppose four of the five friends rate it as follows:
  • Friend 2 = 4
  • Friend 3 = 5
  • Friend 4 = 2
  • Friend 5 = 3
  • Averaging: Just as in max voting, all the models make a prediction for each data point, but here the final result is the average of the results from all the models. The averaging method is mostly applied to regression problems. A minimal code sketch of both techniques follows this list.
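To make both simple techniques concrete, here is a minimal sketch in plain Python; the model predictions are made-up toy values for illustration, not the output of real trained models.

```python
from collections import Counter

# Hypothetical class labels predicted by three classifiers for five data points
clf_preds = [
    [1, 0, 1, 1, 0],  # model 1
    [1, 1, 1, 0, 0],  # model 2
    [0, 0, 1, 1, 1],  # model 3
]

# Max voting: for each data point, the most common prediction wins
max_vote = [Counter(votes).most_common(1)[0][0] for votes in zip(*clf_preds)]
print("Max voting:", max_vote)  # [1, 0, 1, 1, 0]

# Hypothetical numeric predictions by three regressors for two data points
reg_preds = [
    [4.0, 3.5],  # model 1
    [5.0, 4.0],  # model 2
    [3.0, 4.5],  # model 3
]

# Averaging: the final result is the mean of the individual predictions
avg = [sum(vals) / len(vals) for vals in zip(*reg_preds)]
print("Averaging:", avg)  # [4.0, 4.0]
```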

Advanced techniques

  • Stacking: If the above-discussed methods are the basic ensemble learning methods, then the methods from here on can be considered advanced ensemble learning. Stacking is a method where several learners are combined in layers: a new model learns from the predictions of the base models. Decision tree, KNN and SVM algorithms are typical examples of base models used in stacking. A stacked ensemble takes the following steps to give its final result, illustrated by the code sketch after the list:

  1. The base models are trained on the training set.
  2. The trained base models make predictions on a validation set and on the test set.
  3. The validation set, together with the predictions made on it, is used as features to train a new model (the meta-learner).
  4. The new model and the predictions on the test data are used to make the final prediction.
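To illustrate, here is a minimal stacking sketch using scikit-learn's StackingClassifier, which handles the train/validation split of the steps above via internal cross-validation; the synthetic dataset and the particular base models are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base learners: their out-of-fold predictions become the
# features of a new training set for the meta-learner
base_models = [
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True, random_state=42)),
]

# Meta-learner trained on the base models' predictions (5-fold CV)
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(),
                           cv=5)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```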
  • Bagging: Bagging is an advanced form of ensemble learning in which multiple models each give their individual result on a sub-part of the data, and these results are combined into a final outcome. Since multiple models have a high chance of giving the same results when their inputs are similar, bootstrapping (random sampling with replacement) comes into the picture to break this condition: it creates various subsets of the whole data, and multiple models are then trained on those subsets. The bagging technique works in the steps below, followed by a short code sketch.

  1. Bootstrapping creates multiple subsets of the original dataset.
  2. A base model is assigned to learn from each subset.
  3. The final prediction comes out as the combined result from all the models.
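Here is a minimal bagging sketch using scikit-learn's BaggingClassifier, which performs the bootstrapping and the combination of results internally; the decision-tree base model and the synthetic dataset are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 10 trees is trained on a bootstrap sample
# (a random subset drawn with replacement) of the training data;
# their votes are combined into the final prediction.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        bootstrap=True, random_state=42)
bag.fit(X_train, y_train)
print("Bagging accuracy:", bag.score(X_test, y_test))
```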
  • Boosting: Boosting is a sequential technique where each new model tries to correct the errors of the previous ones, so that many weak learners combine into one strong learner. It works in the following steps (a code sketch follows the list):

  1. At the initial stage, all the data points have equal weight.
  2. A base model is trained on a subset of the data and makes predictions on the whole dataset.
  3. Errors are calculated by comparing the predicted values with the actual values.
  4. Incorrectly predicted data points are given higher weights.
  5. Another base model is trained on the reweighted data and makes predictions on the dataset.
  6. Steps 3 to 5 are repeated until the errors stop improving or a set number of models has been trained; the final strong learner is the weighted combination of all the base models.
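As a concrete example, here is a minimal sketch using scikit-learn's AdaBoostClassifier, one common boosting algorithm that reweights misclassified points much as in the steps above; the synthetic dataset and parameters are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost fits 50 shallow trees in sequence; after each round,
# misclassified points get higher weights so the next tree
# focuses on them, and the final prediction is a weighted vote.
boost = AdaBoostClassifier(n_estimators=50, learning_rate=1.0,
                           random_state=42)
boost.fit(X_train, y_train)
print("Boosting accuracy:", boost.score(X_test, y_test))
```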

Final words

In this article, we covered a basic introduction to ensemble machine learning. Using an example, we tried to understand how it works, and we learned about the different ensemble learning techniques, such as max voting, averaging, stacking, bagging and boosting. In our next articles, we will discuss models based on ensemble learning techniques.

About DSW

Data Science Wizards (DSW) is an Artificial Intelligence and Data Science start-up that offers platforms, solutions, and consulting services for using data strategically, helping enterprises make data-driven decisions through AI and data analytics.