Building Blocks of Machine Learning.

Source: Deep Learning on Medium

Building Blocks of Machine Learning.

I have taught the foundations of machine learning to few thousands of students at all levels (Bsc to Post-Docs). Many students who start on machine learning struggle with the concepts of features, labels and hypothesis spaces which are the conceptual building blocks of many machine learning methods.

At first sight, these concepts appear quite abstract and mathematical. However, it turns out that subscribing to these concepts is very useful in practice. Let me try to convince you why this is the case.

When facing a particular application (like choosing a T-shirt based on its ecological footprint) there is no unique correct choice for what you consider as data points, features, labels and the hypothesis space.

In the very end, these choices are up to you (= the machine learning engineer/data analyst) and coming up with a useful definition of data points, features, labels and hypothesis space might be the most challenging step towards developing a machine learning solution for the application at hand.

As soon as you have found a useful definition for the features And labels of
a data point you can rather easily apply ready-made implementations of machine learning algorithms, e.g., using some Python library.
Data Points. You should define data points such that you have many of them available. Most machine learning methods rely on statistical principles. Statistics works best when you have a large population of many data points such that you can average over them.

Features. Efficient machine learning requires to characterize data points
with few quantities or measurements that we call “features”. In principle,
we can use any quantity as a feature as long as we can easily compute or
measure that quantity (like red, green and blue values of image pixels).

Labels. In many applications, we are interested in some property of a data
point (such as the amount of water used for producing a T-shirt) that is not
easy to determine. Therefore, We learn a predictor or hypothesis map that
reads in the features of the data point and delivers an estimate/approximation/guess
of the label.

Hypothesis Space. In principle, we could use as predictor any map or function that maps a given value of the data points feature to the predicted label. However, given limited computational resources, we need to restrict ourselves to a smaller subset of possible predictor functions. This smaller subset of predictor functions is called “hypothesis space”.

As features and labels, also the hypothesis space is a design choice. There is
no unique correct choice for the hypothesis space. The hypothesis space should be chosen such that its predictor functions can be computed efficiently with the available computational resources.

One example of a hypothesis space could be given by the set of all functions or
maps implemented by the Python function

def some_predictor(x):
hat_y = …. # compute predicted label
return hat_y # return predicted label

We could define a hypothesis space by all such Python functions that require less than 1 millisecond on a particular computer. Another (larger) hypothesis space is obtained by all such Python functions that require no more than 2 milliseconds.