In both supervised and unsupervised machine learning, most algorithms centre on minimising (or, equivalently, maximising) some objective function. This function is supposed to represent what the model knows, or can get right. However, as one might expect, the objective function does not always reflect exactly what we want.
The objective function presents two main problems: (1) how do we minimise it? (the answer is up for debate, and there is lots of interesting research on efficient optimisation of non-convex functions); and (2) assuming we can minimise it perfectly, is it the correct thing to be minimising?
It is point (2) that is the focus of this post.
Let’s take the example of linear regression with square loss. We train a linear regression model, predicting the outcome for each observation as the inner product of the learned weights with that observation’s feature vector, and we penalise the squared difference between prediction and outcome. …
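As a concrete sketch of this setup (on synthetic data, with illustrative variable names), the square loss has a closed-form least-squares minimiser, so "minimising the objective perfectly" is actually achievable here:

```python
import numpy as np

# A minimal sketch of square-loss linear regression on synthetic data.
rng = np.random.default_rng(0)

# Outcomes are a noisy linear function of the features.
X = rng.normal(size=(100, 3))            # one row of features per observation
true_w = np.array([2.0, -1.0, 0.5])      # weights used to generate the data
y = X @ true_w + 0.1 * rng.normal(size=100)

# The learned weights minimise the square loss (1/n) * sum_i (<w, x_i> - y_i)^2,
# which has the closed-form least-squares solution computed here.
w = np.linalg.lstsq(X, y, rcond=None)[0]

# Predictions are inner products of the learned weights with each feature vector.
y_hat = X @ w
square_loss = np.mean((y_hat - y) ** 2)
```

Because the noise is small, the fitted weights land close to `true_w` and the square loss is near zero; the question the post raises is whether that small loss is the quantity we actually care about.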