What is Model-based Learning?

[ML0to100] — S1E12

Another way to generalize* from a set of examples is to build a model of these examples and then use that model to make predictions.

*Generalization usually refers to an ML model’s ability to perform well on new, unseen data rather than just the data it was trained on.

For example, a new instance would be classified as a triangle even though the majority of its most similar instances are squares (1 triangle vs 2 squares), because that is what the model’s decision boundary dictates.

Example-

Suppose you want to know if money makes people happy, so you download the Better Life Index data from the OECD’s website and stats about gross domestic product (GDP) per capita from the IMF’s website. Then you join the tables and sort by GDP per capita.
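If you want to follow along in Python, the joining step might look roughly like the pandas sketch below. The file names and column names here are assumptions for illustration, not the exact files served by the OECD or the IMF.

```python
# A minimal sketch of the data preparation (file and column names are assumed).
import pandas as pd

oecd_bli = pd.read_csv("oecd_bli.csv")              # per-country "Life satisfaction" scores
gdp_per_capita = pd.read_csv("gdp_per_capita.csv")  # per-country "GDP per capita" figures

# Join the two tables on the country name and sort by GDP per capita
country_stats = pd.merge(oecd_bli, gdp_per_capita, on="Country")
country_stats = country_stats.sort_values(by="GDP per capita")
print(country_stats[["Country", "GDP per capita", "Life satisfaction"]].head())
```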

(Table: an excerpt of the joined data)
(Figure: life satisfaction plotted against GDP per capita for these countries)

Although the data is noisy (i.e., partly random), it looks like life satisfaction goes up more or less linearly as the country’s GDP per capita increases.

So you decide to model life satisfaction as a linear function of GDP per capita.

This step is called model selection: you selected a linear model of life satisfaction with just one attribute, GDP per capita.

life_satisfaction = θ0 + θ1 × GDP_per_capita

This model has two model parameters, θ0 and θ1. By tweaking these parameters, you can make your model represent any linear function, as the sketch below illustrates.
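To make the formula concrete, here is a tiny Python sketch of the model; the function name and the example parameter values are purely illustrative.

```python
# The linear model from the formula above: life_satisfaction = θ0 + θ1 × GDP_per_capita
def predict_life_satisfaction(gdp_per_capita, theta0, theta1):
    """Return the life satisfaction predicted by the linear model."""
    return theta0 + theta1 * gdp_per_capita

# Different parameter values represent different linear functions
print(predict_life_satisfaction(20_000, theta0=4.0, theta1=5e-5))  # 5.0
print(predict_life_satisfaction(20_000, theta0=5.0, theta1=2e-5))  # 5.4
```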

Before you can use your model, you need to determine the parameter values θ0 and θ1.

So you need to specify a performance measure. You can define either a-

— Utility function (or fitness function) that measures how good your model is, or

— Cost function that measures how bad it is.

For Linear Regression problems, we typically use a cost function that measures the distance between the linear model’s predictions and the training examples; the objective is to minimize this distance.
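One common choice of cost function is the mean squared error between the model’s predictions and the training targets; the sketch below assumes that choice, and the toy data values are made up for illustration.

```python
# A sketch of a mean squared error cost function for the linear model
import numpy as np

def mse_cost(theta0, theta1, gdp, life_satisfaction):
    predictions = theta0 + theta1 * gdp          # model predictions for each country
    return np.mean((predictions - life_satisfaction) ** 2)

# Toy training data (illustrative values only)
gdp = np.array([10_000.0, 30_000.0, 50_000.0])
life_satisfaction = np.array([5.3, 6.3, 7.3])
print(mse_cost(4.85, 4.91e-5, gdp, life_satisfaction))  # small value means a good fit
```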

This is where the Linear Regression algorithm comes in: you feed it your training examples, and it finds the parameter values that make the linear model fit your data best. This is called training the model.

In our case, the algorithm finds that the optimal parameter values are

θ0 = 4.85 and θ1 = 4.91 × 10⁻⁵.
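A minimal training sketch using scikit-learn’s LinearRegression is shown below; the four data points are illustrative placeholders rather than the real OECD/IMF figures, so the fitted values will differ from the ones above.

```python
# Training a linear model: the algorithm picks θ0 and θ1 that best fit the data
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10_000.0], [30_000.0], [50_000.0], [70_000.0]])  # GDP per capita
y = np.array([5.3, 6.3, 7.3, 8.3])                              # life satisfaction

model = LinearRegression()
model.fit(X, y)
print(model.intercept_, model.coef_[0])  # θ0 and θ1 found by training
```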

(Figure: the linear model that fits the training data best)

Terminology- “model” can refer to-

— type of model (e.g., Linear Regression)

— fully specified model architecture (e.g., Linear Regression with one input and one output)

— final trained model ready to be used for predictions (e.g., Linear Regression with one input and one output, using θ0 = 4.85 and θ1 = 4.91 × 10⁻⁵)

You are finally ready to run the model to make predictions-

Say you want to know how happy Indians are (the OECD data does not have the answer).

You can use your model to make a good prediction:

You look up India’s GDP per capita, find $2,009.98, and then apply your model to find that life satisfaction is likely to be somewhere around

4.85 + 2,009.98 × 4.91 × 10⁻⁵ ≈ 4.95
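The same prediction in code, reusing the parameter values found during training:

```python
# Applying the trained linear model to India's GDP per capita
theta0, theta1 = 4.85, 4.91e-5   # parameters found during training
gdp_india = 2_009.98             # GDP per capita looked up for India (USD)

prediction = theta0 + theta1 * gdp_india
print(round(prediction, 2))      # ≈ 4.95
```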