What Is Generalization In Machine Learning?

Before talking about generalization in machine learning, it helps to understand what supervised learning is. In supervised learning, a model is given a set of labeled training data, that is, examples for which the outcome is already known, and it learns from them to make predictions. During training, the model’s predictions are compared with the known outcomes, and the model’s parameters are adjusted to reduce the mismatch between the two. In general, the more representative training data the model has access to, the better its predictions become. The aim of training is to develop the model’s ability to generalize successfully.
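As a minimal sketch of that compare-and-adjust loop, the Python below fits a straight line to a handful of labeled points by gradient descent on mean squared error. The data, the function name `train_linear`, the learning rate, and the step count are all assumptions chosen for illustration; the article does not prescribe a particular model or optimizer.

```python
# Labeled training data: each input x comes with a known outcome y.
# (Illustrative values generated from y = 2x + 1.)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def train_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Compare the model's predictions with the known outcomes.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Adjust the parameters to reduce the mismatch.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train_linear(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
```

On this noise-free toy data the learned parameters should end up close to the values that generated it. Real data is noisy, which is where the question of generalization comes in.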

What is generalization?

The term ‘generalization’ refers to a model’s ability to respond properly to new, previously unseen data drawn from the same distribution as the data used to build the model. In other words, generalization describes how well a model makes correct predictions on new data after being trained on a training set.

How well a model is able to generalize is the key to its success. If a model fits its training data too closely, picking up noise and incidental details along with the underlying pattern, it will generalize poorly. It will make correct predictions for the training data set but erroneous predictions when it’s given new data, which makes the model ineffective. This is known as overfitting. The opposite problem, underfitting, happens when the model is too simple, or has not been trained enough, to capture the underlying pattern. In cases of underfitting, the model fails to make accurate predictions even on the training data, which makes it just as useless as an overfit model.
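The contrast between the two failure modes can be illustrated with a small experiment. The sketch below uses k-nearest-neighbour regression, which is not a method this article names; it is swapped in here only because a single setting, `k`, moves the model from memorizing the training set (`k=1`) to ignoring the input entirely (`k` equal to the training set size). The synthetic sine data, the noise level, and all function names are assumptions made for illustration.

```python
import math
import random


def make_data(n, rng):
    """Sample n noisy points from y = sin(x) with x in [0, 6]."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 6.0)
        data.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))
    return data


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of a k-NN model built on `train`, scored on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(40, rng)   # data the model learns from
test = make_data(200, rng)   # unseen data from the same distribution

for k in (1, 5, len(train)):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  "
          f"test MSE={mse(train, test, k):.3f}")
```

With `k=1` every training point is its own nearest neighbour, so the training error is zero while the error on unseen data is not: the overfitting pattern. With `k` equal to the training set size the model predicts the same average everywhere, so it does poorly on both sets: the underfitting pattern. An intermediate `k` would typically sit between the two extremes.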

The ideal solution

You would ideally want to choose a model that sits at the sweet spot between overfitting and underfitting. To find it, you can track the performance of a machine learning algorithm over time as it learns from a set of training data. You can plot both the skill on the training data and the skill on a test dataset that you’ve held back from the training process. As the algorithm learns, the error on the training data decreases, and at first so does the error on the test dataset. If you train the model for too long, the error on the training dataset keeps decreasing as the model overfits. At the same time, because the model’s ability to generalize is declining, the error on the test dataset starts to rise again. The sweet spot is the point just before the error on the test dataset begins to rise, where the model shows good skill on both the training dataset and the unseen test dataset.
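One common way to act on this idea is early stopping: watch the held-out error after each pass over the training data and keep the model from the point where that error was lowest. The sketch below is a hedged illustration, not a definitive implementation. The function name `early_stopping_epoch`, the `patience` parameter, and both error curves are hypothetical; the numbers are made up to show the shape described above and are not measured results.

```python
def early_stopping_epoch(held_out_errors, patience=2):
    """Return the index of the epoch with the lowest held-out error.

    Scanning stops once the error has failed to improve for `patience`
    consecutive epochs, so later epochs are never inspected.
    """
    best_epoch, best_error = 0, float("inf")
    waited = 0
    for epoch, err in enumerate(held_out_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Made-up curves for illustration only: training error keeps falling,
# while held-out error falls and then rises again.
train_errors = [0.90, 0.60, 0.42, 0.30, 0.22, 0.16, 0.11, 0.08]
held_out_errors = [0.95, 0.68, 0.52, 0.45, 0.43, 0.47, 0.55, 0.66]

stop = early_stopping_epoch(held_out_errors)
print(f"sweet spot at epoch {stop}: "
      f"train={train_errors[stop]}, held-out={held_out_errors[stop]}")
```

The `patience` setting is a design choice: real error curves are noisy, so stopping at the very first uptick can end training too early. Note also that if the held-out set is used to choose the stopping point, it no longer gives an unbiased estimate of final performance, which is one reason to keep a separate validation dataset.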

To detect and limit overfitting in a machine learning algorithm, two additional techniques that you can use are:

Using a resampling method, such as k-fold cross-validation, to estimate the accuracy of the model
Holding back a validation dataset
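The two techniques can be combined, as in the sketch below: a validation set is held back first, and k-fold cross-validation (one resampling method) is run only on the remaining data. Everything here is an assumption for illustration, including the helper names, the toy data, and the deliberately trivial "model" that always predicts the mean of its training targets. Any fitting routine with the same `fit` and `error` shape could be passed in instead.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, error):
    """Average the held-out error over k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held = set(fold)
        # Train on everything outside the fold, score on the fold itself.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        scores.append(error(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / len(scores)


def fit_mean(train_x, train_y):
    """Toy model: remember the mean of the training targets."""
    return sum(train_y) / len(train_y)


def mean_squared_error(model, xs, ys):
    """Score the toy model, which predicts the same value for every input."""
    return sum((model - y) ** 2 for y in ys) / len(ys)


xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

# Hold back a validation set first; it is not touched during resampling.
order = list(range(len(xs)))
random.Random(1).shuffle(order)
val_idx, dev_idx = order[:4], order[4:]
dev_x, dev_y = [xs[i] for i in dev_idx], [ys[i] for i in dev_idx]
val_x, val_y = [xs[i] for i in val_idx], [ys[i] for i in val_idx]

cv_error = cross_validate(dev_x, dev_y, k=4, fit=fit_mean, error=mean_squared_error)
val_error = mean_squared_error(fit_mean(dev_x, dev_y), val_x, val_y)
print(f"cross-validated MSE={cv_error:.2f}, validation MSE={val_error:.2f}")
```

The cross-validated score can guide choices made during development, while the validation set, used once at the end, gives a check on data that played no part in those choices.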

So, when training a machine learning model, keep generalization in view: judge the model by its estimated accuracy on unseen data, not by its accuracy on the training data.
