23 changes: 14 additions & 9 deletions summary_examples/blogpost1.py
@@ -18,7 +18,7 @@
# We construct a synthetic problem for which a linear model is not able to predict better than
# the empty model, i.e., the model predicting the observed average of the target.
#
-# As we can see, the model assigns a zero coefficient to features 1 and 2, indicating that it can learn
+# As we can see, the model assigns a zero coefficient to features $X_1$ and $X_2$, indicating that it can learn
# nothing from them. We might want to conclude that these features hold no information on the target.
# However, since the performance of our model is really bad on external data, we should not draw any conclusion
# on the underlying process and focus on improving the model first.
@@ -40,10 +40,11 @@

linear_regressor = LassoCV(random_state=rng)
linear_regressor.fit(X_train, y_train)
+# maybe a dataframe feature | coef will be better looking ?
Owner:

Yes, this looks nicer for instance:

print("Coefficients of the linear model:")
print(
    pd.DataFrame(
        {f"x{idx}": f"{linear_regressor.coef_[idx]:.3f}" for idx in range(X.shape[1])},
        index=["Coefficient"],
    )
)

And for the second model:

print("Coefficients of the linear model:")
print(
    pd.DataFrame(
        {
            f"{feature_names[idx]}": f"{linear_regressor.coef_[idx]:.3f}"
            for idx in np.argsort(linear_regressor.coef_)[::-1]
        },
        index=["Coefficient"],
    )
)

Author:

Oh I was thinking something even simpler like:

pd.options.display.precision = 3
pd.DataFrame({"feature":feature_names, "coef":linear_regressor.coef_})

Author:

If you want ordering by coef:

pd.DataFrame({"feature":feature_names, "coef":linear_regressor.coef_}).sort_values(by="coef")
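For completeness, a runnable sketch of the suggestion above; the data and `feature_names` here are stand-ins made up for illustration, not the blog post's actual variables:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

# Toy stand-in for the blog post's data: only x0 drives the target.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 0.1 * rng.normal(size=200)

linear_regressor = LassoCV(random_state=0).fit(X, y)
feature_names = [f"x{i}" for i in range(X.shape[1])]

# The one-liner under discussion: a feature/coef table, sorted by coefficient.
pd.options.display.precision = 3
coef_table = pd.DataFrame(
    {"feature": feature_names, "coef": linear_regressor.coef_}
).sort_values(by="coef")
print(coef_table)
```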

print("Features with non zero coef:")
print(
    [
-        f"Feature {idx}, coef: {linear_regressor.coef_[idx]:.2f}"
+        f"x{idx} coef={linear_regressor.coef_[idx]:.2f}"
        for idx in range(3)
        if linear_regressor.coef_[idx] != 0
    ]
@@ -63,9 +64,10 @@
#
# Discarding $X_1$ at the previous step would have been a mistake: it is used by the properly specified
# model. It was ignored by the simpler model as its impact on the target is only through its square and its
-# interaction with $X_0$. We can now say that $X_1$ is important for the underlying process. Some features involving
-# $X_2$ are receiving low but non zero coefficients in the second model. It is not clear from just these coefficients
-# if $X_2$ has a low impact on the target or if the model is overfitting on it.
+# interaction with $X_0$. We can now say that $X_1$ is important for the underlying process. Some features involving
+# $X_2$ are receiving low but non zero coefficients in the second model. In our synthetic case, we know
+# that the target $Y = X_0 + (X_0+X_1)^2 + \text{noise}$ does not depend on $X_2$, so these small nonzero coefficients
Owner:

I like this clarification.
I use + \mathcal{N}(0, \sigma^2) later for the noise so we should pick one to be consistent.
I think + \text{noise} might be better since we don't need to introduce \sigma in this case (which I did not do).

+# are only finite sample errors.


# %%
@@ -161,14 +163,14 @@
ax1.set_xlabel("Importance")
ax1.set_title("Train Set")

-idx_test = np.argsort(rf_pi_test.importances_mean)
+# same ordering for the barplots
ax2.barh(
    range(n_features),
-    rf_pi_test.importances_mean[idx_test],
-    xerr=rf_pi_test.importances_std[idx_test],
+    rf_pi_test.importances_mean[idx_train],
+    xerr=rf_pi_test.importances_std[idx_train],
)
ax2.set_yticks(range(n_features))
-ax2.set_yticklabels(np.array(feature_names)[idx_test])
+ax2.set_yticklabels(np.array(feature_names)[idx_train])
ax2.set_xlabel("Importance")
ax2.set_title("Test Set")

@@ -196,6 +198,8 @@
# features to improve the quality of our model. Recursive Feature Elimination with Cross
# Validation (`RFECV`) provides a good way to trim down irrelevant features.
# [Justify that permutation importance is sensible by citing Reyero Lobo et al. ?]
+# [Yes ! Maybe explain that j irrelevant means X_j \perp Y | X_{-j}, and that PFI
+# (in the optimal setting) is able to detect such irrelevant features]
Owner:

Ok, I'll think of something.
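To make that suggestion concrete: if feature $j$ is irrelevant in the sense $X_j \perp Y \mid X_{-j}$, permuting it cannot degrade the predictions of a well-fit model, so its permutation importance on held-out data should be near zero. A minimal sketch with made-up data (not the post's example):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(600, 3))
# x2 is independent of the target given the other features.
y = X[:, 0] + X[:, 1] ** 2 + 0.3 * rng.normal(size=600)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Permutation importance on the test set: near zero for the irrelevant x2.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)
```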


from sklearn.feature_selection import RFECV

@@ -298,6 +302,7 @@ def permutation_importance_getter(model, feature_indices, X_val, y_val, random_s
)

# %% [markdown]
+# [I feel the summary/recap is a bit dry, maybe give more details?]
Owner:

Yeah I'll improve it, it's kind of a placeholder for now

# We showed here that we should be careful with feature importance: it can lead to misleading
# conclusions when the model is not optimal. We also showed how to use feature importance to
# perform feature selection and how that can lead to a better model.