Getting Feature Importance in Scikit-learn

Understanding which features are most influential in predicting your target variable is crucial for interpreting your machine learning model and improving its performance. In this guide, we’ll explore how to get feature importance using various methods in Scikit-learn (sklearn), a powerful Python library for machine learning. We’ll cover tree-based feature importance, permutation importance, and coefficients for linear models.

Introduction to Feature Importance

Feature importance refers to techniques that assign a score to input features based on their utility in predicting the target variable. This helps in understanding the data better and reducing the dimensionality by focusing on the most important features.

Why Feature Importance Matters

  • Model Interpretation: Helps in understanding how the model makes predictions.
  • Dimensionality Reduction: Reduces the number of features, leading to simpler and faster models.
  • Improved Performance: Enhances model performance by focusing on the most relevant features.
  • Feature Selection: Assists in selecting the most predictive features for building robust models.

Tree-Based Feature Importance

Tree-based models like Decision Trees, Random Forests, and Gradient Boosting have built-in feature importance scores based on how much a feature reduces the impurity (Gini or entropy).

Decision Trees Feature Importance

In a decision tree, feature importance is calculated based on the reduction in the criterion (like Gini impurity or entropy) used for splitting the nodes. The greater the reduction, the higher the importance.
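
All of the examples in this guide assume you already have a feature DataFrame X, a target y, and a train/test split. As a minimal, self-contained setup (using the breast cancer dataset purely as an example), something like the following works:

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Example data: any tabular dataset with named feature columns works here
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Hold out a test set; the permutation importance example uses it later
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)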

from sklearn.tree import DecisionTreeClassifier

# Train a Decision Tree model
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

# Get feature importances
importances = dt.feature_importances_
features = X.columns

# Display feature importances
for feature, importance in zip(features, importances):
    print(f"Feature: {feature}, Importance: {importance:.4f}")

Random Forest Feature Importance

Random Forest is an ensemble method that fits multiple decision trees and averages their predictions. Feature importance in Random Forest is calculated as the mean decrease in impurity (MDI) across all trees. Keep in mind that MDI can overstate the importance of high-cardinality features, so it is worth cross-checking against permutation importance.

from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Train the model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Get feature importances
importances = rf.feature_importances_
feature_names = X.columns

# Create a DataFrame for visualization
importances_df = pd.DataFrame({"feature_names": feature_names, "importances": importances})
importances_df.sort_values(by="importances", ascending=False, inplace=True)

# Plot feature importances
sns.barplot(x=importances_df["importances"], y=importances_df["feature_names"])
sns.despine(left=True, bottom=True)
plt.show()

Gradient Boosting Feature Importance

Gradient Boosting builds an ensemble of trees sequentially, with each tree trying to correct the errors of the previous one. Feature importance is also based on the reduction in impurity.

from sklearn.ensemble import GradientBoostingClassifier

# Train the model
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_train, y_train)

# Get feature importances
importances = gb.feature_importances_
features = X.columns

# Display feature importances
for feature, importance in zip(features, importances):
    print(f"Feature: {feature}, Importance: {importance:.4f}")

Permutation Feature Importance

Permutation feature importance is a model-agnostic technique that measures the decrease in model performance when a single feature value is randomly shuffled. This method can be applied to any model, not just tree-based ones.

Permutation Importance Example

from sklearn.inspection import permutation_importance

# Compute permutation importance on the held-out test set
r = permutation_importance(rf, X_test, y_test, n_repeats=30, random_state=42)

# Display the features sorted by mean importance
for i in r.importances_mean.argsort()[::-1]:
    print(f"{X.columns[i]}: {r.importances_mean[i]:.3f} +/- {r.importances_std[i]:.3f}")

Permutation importance is particularly useful because it provides a direct measure of feature importance in terms of model performance degradation.

Advantages of Permutation Importance

  • Model-Agnostic: Can be applied to any fitted estimator, not just tree-based models (see the sketch after this list).
  • Reflects Real Impact: Shows the impact of a feature on the model’s performance.
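
To see the model-agnostic part in action, here is a minimal sketch (assuming the X_train/X_test split from the setup above) that computes permutation importance for a k-nearest-neighbors classifier, a model with no built-in importance attribute:

from sklearn.inspection import permutation_importance
from sklearn.neighbors import KNeighborsClassifier

# KNN exposes neither feature_importances_ nor coef_, but permutation importance still applies
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

result = permutation_importance(knn, X_test, y_test, n_repeats=10, random_state=42)
for i in result.importances_mean.argsort()[::-1]:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.3f}")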

Disadvantages of Permutation Importance

  • Computationally Intensive: Requires many model evaluations (one per feature, per repeat), which can make it slow on high-dimensional data.
  • Misleading with Correlated Features: Shuffling one of two correlated features lets the model fall back on the other, understating the importance of both.

Feature Importance in Linear Models

Linear models like Linear Regression, Ridge, Lasso, and Logistic Regression provide coefficients that can be interpreted as feature importance scores. These coefficients indicate the direction and magnitude of the relationship between each feature and the target variable.

Linear Regression Feature Importance

from sklearn.linear_model import LinearRegression
import pandas as pd
import matplotlib.pyplot as plt

# Train the model
lr = LinearRegression()
lr.fit(X_train, y_train)

# Get feature importances (coefficients)
coefficients = lr.coef_
features = X.columns

# Create a DataFrame for visualization
coefficients_df = pd.DataFrame({"feature_names": features, "coefficients": coefficients})
coefficients_df.sort_values(by="coefficients", ascending=False, inplace=True)

# Plot feature importances
plt.figure(figsize=(10, 6))
plt.barh(coefficients_df["feature_names"], coefficients_df["coefficients"])
plt.xlabel("Coefficient Value")
plt.ylabel("Feature")
plt.title("Feature Importance in Linear Regression")
plt.show()

This approach provides a straightforward interpretation of feature importance based on the model's coefficients. Note, however, that raw coefficients are only comparable when the features are on the same scale, as the sketch below shows.
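
Because raw coefficients depend on feature scale, a common pattern is to standardize the features first so that the coefficients become directly comparable. A minimal sketch, assuming the same X_train and y_train as above:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features so the coefficients land on a comparable scale
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)

# Rank features by absolute standardized coefficient
coefs = pipe.named_steps["linearregression"].coef_
for i in np.argsort(np.abs(coefs))[::-1]:
    print(f"{X.columns[i]}: {coefs[i]:.4f}")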

Ridge and Lasso Regression

Ridge and Lasso are regularized versions of Linear Regression that include a penalty term to prevent overfitting. The coefficients still indicate feature importance, but they are shrunk by the penalty; Lasso in particular can drive coefficients exactly to zero, effectively performing feature selection.

from sklearn.linear_model import Ridge, Lasso

# Ridge Regression
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
ridge_importances = ridge.coef_

# Lasso Regression
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
lasso_importances = lasso.coef_

# Display Ridge and Lasso importances
print("Ridge Importances:")
print(ridge_importances)
print("Lasso Importances:")
print(lasso_importances)

Feature Importance for Classification Models

For classification tasks, models like Logistic Regression can be used to determine feature importance. Each coefficient represents the change in the log-odds of the positive class for a one-unit increase in the corresponding feature.

Logistic Regression Feature Importance

from sklearn.linear_model import LogisticRegression

# Train the model (max_iter raised so the solver converges on unscaled data)
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Get feature importances (coefficients)
coefficients = log_reg.coef_[0]
features = X.columns

# Display feature importances
for feature, importance in zip(features, coefficients):
    print(f"Feature: {feature}, Importance: {importance:.4f}")

Practical Applications and Insights

Improving Model Performance

By focusing on the most important features, you can simplify your model, reduce overfitting, and potentially improve performance.
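
As a rough sketch of this workflow (assuming the fitted Random Forest rf and the X, y data from the setup above), you could keep only the ten highest-ranked features and compare cross-validated scores:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Pick the 10 features with the highest impurity-based importance
top_idx = np.argsort(rf.feature_importances_)[::-1][:10]
top_features = X.columns[top_idx]

# Compare cross-validated accuracy on all features vs. the top 10
full_score = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5).mean()
top_score = cross_val_score(RandomForestClassifier(random_state=42), X[top_features], y, cv=5).mean()
print(f"All features: {full_score:.3f}, top 10: {top_score:.3f}")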

Understanding the Data

Feature importance helps in gaining insights into which features have the most influence on the target variable, aiding in better understanding and communication of model results.

Feature Selection

Using feature importance, you can select a subset of features that contribute the most to the model’s predictive power, leading to more efficient and interpretable models.
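
Scikit-learn packages this pattern as SelectFromModel, which keeps the features whose importance exceeds a threshold. A minimal sketch, reusing the training data from the setup above:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Keep features whose importance is above the median importance
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="median",
)
selector.fit(X_train, y_train)

selected = X.columns[selector.get_support()]
print(f"Selected {len(selected)} of {X.shape[1]} features:")
print(list(selected))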

Advanced Techniques

SHAP (SHapley Additive exPlanations)

SHAP values provide a unified measure of feature importance, considering the contribution of each feature across all possible subsets of features.

import shap

# Train a model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Create a SHAP explainer (for tree models this uses the fast tree algorithm)
explainer = shap.Explainer(model, X_train)
shap_values = explainer(X_test)

# Plot the SHAP summary; for a binary classifier the explanation carries one
# slice per class, so select the positive class before plotting
shap.summary_plot(shap_values[..., 1], X_test)

LIME (Local Interpretable Model-agnostic Explanations)

LIME explains individual predictions by approximating the model locally with an interpretable model.

from lime import lime_tabular

# Create a LIME explainer around the training data
explainer = lime_tabular.LimeTabularExplainer(
    X_train.values,
    feature_names=list(X.columns),
    class_names=["class_0", "class_1"],
    discretize_continuous=True,
)

# Explain a single prediction (explain_instance expects a 1D array, hence .values)
i = 0  # index of the instance to explain
exp = explainer.explain_instance(X_test.iloc[i].values, model.predict_proba)
exp.show_in_notebook()

Conclusion

Determining feature importance is a critical step in understanding your model and improving its performance. Whether you use tree-based methods, permutation importance, or coefficients from linear models, Scikit-learn offers robust tools to help you extract and visualize feature importance.

By focusing on the most influential features, you can build simpler, faster, and more interpretable models that perform well on your data. Integrating these methods into your data science workflow will enhance your ability to make informed decisions and achieve better results.
