Addressing the “Black Box” Issue in Machine Learning

Machine learning models have taken the world by storm in recent decades. Their ability to identify patterns and generate predictions that often outperform traditional statistical techniques is remarkable.

However, despite these advantages, many remain sceptical. One of the main drawbacks of machine learning models is their lack of transparency and interpretability.

In other words, although machine learning models can generate accurate predictions, that accuracy often comes at the cost of complexity, which makes it hard to inspect and understand the logic behind those predictions.

Our goal in this article is to unpack and address the issue of black-box models by answering two fundamental questions:

  • Which features in the data did the model consider most important?
  • How does each feature affect the model’s predictions, both in a big-picture sense and on a case-by-case basis?

To help us answer those questions, we will explore four techniques and discuss how each of them can be used to create more transparency in model predictions:

  • Feature importance
  • Permutation importance
  • Partial dependence plots
  • SHAP values
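To give a flavour of what these techniques measure, here is a minimal sketch of the idea behind permutation importance: shuffle one feature at a time and see how much the model's error grows. It is an illustration only, not the code from the reference notebook. The `model` function, the synthetic data, and the helper names are all hypothetical and use only the Python standard library.

```python
import random

def model(row):
    # Hypothetical "trained" model (an assumption for illustration):
    # relies heavily on feature 0, weakly on feature 1, ignores feature 2.
    return 3.0 * row[0] + 0.5 * row[1]

def mean_squared_error(rows, targets):
    # Average squared difference between model predictions and targets.
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, n_features, seed=0):
    # For each feature, shuffle its column and record the increase in error
    # relative to the unshuffled baseline.
    rng = random.Random(seed)
    baseline = mean_squared_error(rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled_rows = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(shuffled_rows, targets) - baseline)
    return importances

# Synthetic, noise-free data: the targets are exactly the model's outputs,
# so the baseline error is zero.
data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
targets = [model(r) for r in rows]

importances = permutation_importance(rows, targets, n_features=3)
for j, score in enumerate(importances):
    print(f"feature {j}: increase in error = {score:.3f}")
```

Under these assumptions, the feature the model leans on most shows the largest increase in error when shuffled, while the feature the model ignores shows none. In practice a library implementation would normally be used on a real fitted model.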

So, if you are ready to peel back the layers and examine exactly how your model uses input data to make predictions, let’s begin!

The reference notebook to this article can be found here.


Jason Chong – Actuarial Science Graduate & Aspiring Data Scientist

