Irreducible Error: A quick guide.

In machine learning, it's important to understand prediction errors. Previously, we looked at the bias-variance tradeoff; now we'll look at irreducible error.

Photo by Pixabay from Pexels

What is Irreducible Error? 🤷‍♂️

Consider an example: we want to predict a value Y based on a set of independent variables X1, X2, X3, ..., Xn, which we'll refer to as Xset. For our predictions to be good, Xset needs to contain the variables that actually drive the outcome of Y. But in practice, there will nearly always be some influential variables affecting Y that are omitted from Xset.

Therefore, irreducible error arises from the fact that Xset doesn't completely determine Y: there are variables outside of Xset that still have some effect on Y.
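For the mathematically inclined, here's how this is usually written in textbooks (this is the standard decomposition, connecting back to the bias-variance tradeoff from last time). Assuming Y is generated by some true function f of Xset plus a zero-mean noise term ε, the expected squared error of any model f̂ splits into a reducible part and an irreducible part:

```latex
Y = f(X_{\text{set}}) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0

\mathbb{E}\!\left[\big(Y - \hat{f}(X_{\text{set}})\big)^2\right]
  = \underbrace{\mathrm{Bias}\big[\hat{f}\big]^2 + \mathrm{Var}\big[\hat{f}\big]}_{\text{reducible}}
  + \underbrace{\mathrm{Var}[\varepsilon]}_{\text{irreducible}}
```

No matter how clever f̂ gets, the Var(ε) term stays.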

This error can't be reduced by making the model better, because it comes from information that is missing from our data (Xset), not from the model itself. 😯

To minimize prediction error, we need to understand its source.

The only way to reduce the portion of prediction error caused by irreducible error is to identify these outside influences and incorporate them as predictors. In practice, that isn't always simple: you may not be able to measure them, or you might end up with too many predictors.
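To make this concrete, here's a minimal simulation sketch (using NumPy and scikit-learn; the variables and coefficients are made up for illustration). A model that only sees x1 hits an error floor set by the omitted variable x2, and adding x2 as a predictor removes exactly that part of the error:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical data-generating process: Y depends on x1 AND x2,
# plus pure noise that no predictor could ever explain.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # an influential variable we "forgot" to measure
noise = rng.normal(scale=0.5, size=n)
y = 3 * x1 + 2 * x2 + noise

# Model trained on Xset = {x1} only: x2's effect acts as irreducible error.
model_small = LinearRegression().fit(x1.reshape(-1, 1), y)
mse_small = mean_squared_error(y, model_small.predict(x1.reshape(-1, 1)))

# Model trained on {x1, x2}: the missing-variable error disappears,
# but the noise still sets a floor of about 0.5 ** 2 = 0.25.
X_full = np.column_stack([x1, x2])
model_full = LinearRegression().fit(X_full, y)
mse_full = mean_squared_error(y, model_full.predict(X_full))

print(f"MSE with x1 only:   {mse_small:.2f}")  # ~4.25 = 2**2 * Var(x2) + 0.25
print(f"MSE with x1 and x2: {mse_full:.2f}")   # ~0.25, the noise floor
```

Notice that even the full model never gets below the variance of the noise term. That last bit is the truly irreducible part.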

So, if there are irreducible errors, are there reducible errors? 🧐

Absolutely, yes. Everything has an opposite.

Reducible errors, in a simple sentence, are errors that can be reduced by improving the model (e.g., through feature engineering or a better choice of algorithm) and are not the result of variables missing from our predictor set.
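Here's a quick illustration of a reducible error (again a hypothetical sketch with made-up data). A plain linear model underfits a quadratic signal, and a simple piece of feature engineering, adding x² as a predictor, removes that bias:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(5_000, 1))
y = x[:, 0] ** 2 + rng.normal(scale=0.3, size=5_000)  # quadratic signal + noise

# A straight line underfits the curve: this error is reducible (it's bias).
linear = LinearRegression().fit(x, y)
print(f"linear MSE:    {mean_squared_error(y, linear.predict(x)):.2f}")

# Feature engineering: add x^2 as a predictor. The bias disappears and
# the MSE falls to roughly the noise floor (0.3 ** 2 = 0.09).
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
quadratic = LinearRegression().fit(x_poly, y)
print(f"quadratic MSE: {mean_squared_error(y, quadratic.predict(x_poly)):.2f}")
```

The gap between the two MSEs is the reducible part; the noise floor that remains is the irreducible part.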

Thank you, and see you next time 🥳