🤓[DS Concept]: Partial Dependence Plots
"Why did the machine learning model refuse to show its partial dependence plot? It was too self-conscious about its high variance!"
The partial dependence plot (PDP or PD plot for short) shows the marginal effect one or two features have on the predicted outcome of a machine learning model (J. H. Friedman 2001). A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex. For example, when applied to a linear regression model, partial dependence plots always show a linear relationship.
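Concretely, for a model f, a feature set S of interest and the remaining features C, the partial dependence function is estimated by fixing S at a value x_S, plugging in every observed value of the other features from the data, and averaging the predictions: PD_S(x_S) ≈ (1/n) · Σᵢ f(x_S, x_C⁽ⁱ⁾). That averaging also explains the linear-model remark above: averaging a linear function over the other features only shifts it by a constant, so the curve in x_S stays a straight line.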
🤔 What is "Partial Dependence Plots"?
Explain it like I'm a CEO:
Partial Dependence Plots (PDP) are a way to visualize how much each input variable in a machine learning model affects the output. They help you see which inputs move the predictions the most, and in which direction, so you can sanity-check the model and focus attention where it matters.
Why do I care about Partial Dependence Plots?
Let's say you're the CEO of a healthcare startup that uses machine learning to predict patient outcomes. You want to make sure your model is accurate and reliable, so you can provide the best possible care to your patients. Partial Dependence Plots can help you identify which variables are most important in predicting patient outcomes, so you can prioritize those variables and improve the accuracy of your model.
How can I apply Partial Dependence Plots?
Let's say you have a machine learning model that predicts house prices based on the number of bedrooms, bathrooms, and square footage. You can use Partial Dependence Plots to visualize the impact of each input variable on the model's predictions. For example, you might see that the predicted price rises sharply with square footage while the bedroom and bathroom counts barely matter, which tells you which inputs the model is really relying on.
🤓 For the experts
Three principles to remember and master:
PDPs visualize the effect of each input variable on the model's predictions: for each value of a feature, they show the average prediction when that feature is fixed at that value and all other features are left at their observed values. This tells you which inputs move the predictions the most, and in which direction.
PDPs show how input variables interact with each other: a two-way PDP computed over a pair of features displays their joint effect on the prediction, which helps you spot relationships that single-feature plots miss (see the short sketch after this list).
PDPs are useful for identifying feature importance and feature interactions: knowing not only which features matter but also how the prediction moves as they change makes it easier to debug, explain and refine your model.
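To make the interaction point concrete, here is a minimal sketch of one-way and two-way PDPs using scikit-learn's PartialDependenceDisplay (the synthetic dataset and the feature indices are arbitrary choices for illustration, not part of the housing example below):
# one-way PDPs for two features plus a two-way PDP showing their joint effect
# (synthetic regression data, used purely for illustration)
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
X_demo, y_demo = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)
gbr = GradientBoostingRegressor(random_state=0).fit(X_demo, y_demo)
# [0] and [1] give single-feature plots; (0, 1) gives the joint (interaction) plot
PartialDependenceDisplay.from_estimator(gbr, X_demo, features=[0, 1, (0, 1)])
plt.show()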
Christoph Molnar’s free book is amazing, check out his PDP chapter.
📖 A bit of history
Partial dependence plots were introduced by Jerome H. Friedman in his 2001 paper on gradient boosting machines, the one cited at the top of this post. Trevor Hastie and Robert Tibshirani, Friedman's long-time Stanford collaborators and co-authors of The Elements of Statistical Learning, helped popularize them. Both are well-known statisticians and machine learning experts who have made significant contributions to the field, and both are professors of statistics at Stanford University.
Let’s code it
Here's the Python code to generate partial dependence plots using the scikit-learn package:
from sklearn.inspection import partial_dependence
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
import pandas as pd
# load the Boston Housing dataset (note: load_boston was removed in scikit-learn 1.2, so an older version is assumed)
X, y = load_boston(return_X_y=True)
# create the Gradient Boosting Regressor model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
# fit the model to the data
model.fit(X, y)
# compute the partial dependence of the prediction on the first feature
pd_results = partial_dependence(model, X, features=[0], kind="average", grid_resolution=20)
# put the grid values and the averaged predictions into a DataFrame
pd_results2 = pd.DataFrame.from_dict({k: v[0] for k, v in pd_results.items()})
# plot the average prediction against the feature values
pd_results2.plot(x='values', y='average')
plt.show()
The code above loads the Boston Housing dataset, creates a GradientBoostingRegressor model and fits it to the data. Then, it computes the partial dependence for the first feature using the partial_dependence function from the sklearn.inspection module. Finally, it shows the plot using the plt.show() function from the matplotlib package.
The resulting plot shows how the predicted target variable changes as a function of the selected features, while keeping all other features constant. This can help us understand how important each feature is in determining the target variable and can be used to identify non-linear relationships between features and the target variable.
Read more: PartialDependenceDisplay.
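For completeness, here is a minimal sketch of that display API, reusing the model and X fitted above (it assumes scikit-learn >= 1.0, where PartialDependenceDisplay.from_estimator is available):
# the same one-way PDP for the first feature, drawn via the display API linked above
from sklearn.inspection import PartialDependenceDisplay
PartialDependenceDisplay.from_estimator(model, X, features=[0], feature_names=load_boston().feature_names)
plt.show()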
R code
library(pdp)
library(randomForest)
data(Boston, package = "MASS")
# create a random forest model
model <- randomForest(medv ~ ., data = Boston)
# generate the (joint) partial dependence for the crim and age features
partial_plots <- partial(model, pred.var = c("crim", "age"))
# plot the partial dependence surface
plotPartial(partial_plots)
The code above loads the Boston Housing dataset and creates a random forest model using the randomForest function from the randomForest package. It then computes the joint partial dependence of the prediction on the crim and age features using the partial function from the pdp package. Finally, it plots the result with the plotPartial function, also from the pdp package.
As with the Python example, the resulting plot shows how the prediction changes as the selected features vary, while all other features are held constant.
Code it by hand, bro
Using the deprecated plot_partial_dependence() function:
from sklearn.inspection import partial_dependence, PartialDependenceDisplay, plot_partial_dependence
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
import pandas as pd
# load the Boston Housing dataset
X, y = load_boston(return_X_y=True)
# create the Gradient Boosting Regressor model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
# fit the model to the data
model.fit(X, y)
# way #1: the deprecated convenience function returns a PartialDependenceDisplay object
display = plot_partial_dependence(model, X, [0, 2], feature_names=load_boston().feature_names)
plt.show()
Using Pandas
# Way #2: compute the partial dependence for the first feature and plot it with pandas
pd_results = partial_dependence(model, X, features=[0], kind="average", grid_resolution=20)
pd_results2 = pd.DataFrame.from_dict({k: v[0] for k, v in pd_results.items()})
pd_results2.plot(x='values', y='average')
plt.show()
Code it by hand
It’s just the average prediction over the whole dataset, computed with the feature of interest fixed at each point of a grid of values.
# define the feature of interest
import numpy as np
feature_idx = 0
# create a grid of values spanning the observed range of the feature
feature_values = np.linspace(np.min(X[:, feature_idx]), np.max(X[:, feature_idx]), num=100)
# initialize an array to store the average predicted target for each grid value
target_values = np.zeros_like(feature_values)
# for each grid value: fix the feature at that value for EVERY row, predict, and average
for i, val in enumerate(feature_values):
    X_test = np.copy(X)
    X_test[:, feature_idx] = val
    target_values[i] = model.predict(X_test).mean()
# plot the partial dependence curve
plt.plot(feature_values, target_values)
plt.xlabel(load_boston().feature_names[feature_idx])
plt.ylabel('Average predicted target')
plt.show()
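As a quick sanity check (reusing model, X and feature_idx from above), the hand-rolled averages should match what partial_dependence computes on its own grid when it is forced to use the brute-force method, which mirrors the manual loop:
# recompute the averages on scikit-learn's grid and compare with the library result
ref = partial_dependence(model, X, features=[feature_idx], kind="average", grid_resolution=20, method="brute")
grid = ref["values"][0]  # note: this key is called 'grid_values' in scikit-learn >= 1.3
manual = []
for v in grid:
    X_mod = np.copy(X)
    X_mod[:, feature_idx] = v
    manual.append(model.predict(X_mod).mean())
print(np.allclose(manual, ref["average"][0]))  # should print True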
Drop your knowledge below!
Thoughts? Reactions? Come on, is anyone even reading this far? Don’t leave me hanging, friends.