This gives a simple example of explaining a linear logistic regression sentiment analysis model with SHAP. Note that for a linear model the SHAP value of feature $i$ for the prediction $f(x)$ (assuming feature independence) is just $\phi_i = \beta_i \cdot (x_i - E[x_i])$. Since we are explaining a logistic regression model, the units of the SHAP values are in log-odds space.
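As a quick worked example of that formula (a minimal sketch with made-up coefficients, unrelated to the dataset below), the attribution for each feature is just its coefficient times its deviation from the background mean:

import numpy as np

beta = np.array([0.5, -2.0])   # hypothetical model coefficients
x = np.array([1.0, 3.0])       # the instance being explained
x_mean = np.array([0.2, 2.5])  # E[x], estimated from the background data

phi = beta * (x - x_mean)      # SHAP values: [0.4, -1.0]
# phi sums to f(x) - E[f(x)], so the attributions exactly account
# for the difference between this prediction and the average one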
The dataset we use is the classic IMDB movie review dataset (Maas et al., 2011). When explaining the model, it is interesting to see how words that are absent from the text are sometimes just as important as those that are present.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

import shap

shap.initjs()  # load the JS visualization code into the notebook
corpus, y = shap.datasets.imdb()  # movie review texts and binary sentiment labels
corpus_train, corpus_test, y_train, y_test = train_test_split(corpus, y, test_size=0.2, random_state=7)

vectorizer = TfidfVectorizer(min_df=10)  # ignore words that appear in fewer than 10 documents
X_train = vectorizer.fit_transform(corpus_train)
X_test = vectorizer.transform(corpus_test)
model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")  # liblinear supports the l1 penalty
model.fit(X_train, y_train)
explainer = shap.LinearExplainer(model, X_train, feature_perturbation="interventional")  # "interventional" assumes feature independence
shap_values = explainer.shap_values(X_test)
X_test_array = X_test.toarray() # we need to pass a dense version for the plotting functions
shap.summary_plot(shap_values, X_test_array, feature_names=vectorizer.get_feature_names_out())
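As a sanity check (an optional sketch, not part of the original walkthrough), the SHAP values for each sample should sum to the model's log-odds output minus the explainer's expected value, since that is exactly what the linear formula above guarantees:

# local accuracy: sum_i phi_i + E[f(x)] == f(x) in log-odds units
# decision_function returns the raw log-odds of the positive class
log_odds = model.decision_function(X_test)
print(np.allclose(shap_values.sum(axis=1) + explainer.expected_value, log_odds))  # expect True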
Remember that higher means more likely to be positive, so in the plots below the red features are pushing the prediction toward a positive review, while the blue features are pushing it toward a negative one. It is interesting to see how what is not present in the text (like bad=0 below) is often just as important as what is present. Remember that the feature values are TF-IDF scores. It is also interesting that "and" is the most important feature of the text, perhaps because it captures some high-level notion of the text's structure (having lots of "and"s apparently indicates a more positive review).
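Because the values live in log-odds space, an explanation can be turned back into a probability with the logistic sigmoid; the sketch below (using scipy's expit for convenience) recovers the predicted probability of a positive review for the first test sample:

from scipy.special import expit  # the logistic sigmoid 1 / (1 + exp(-x))

log_odds = explainer.expected_value + shap_values[0, :].sum()
print("P(positive) =", expit(log_odds))
# should match model.predict_proba(X_test[0])[0, 1]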
ind = 0
shap.force_plot(
    explainer.expected_value, shap_values[ind, :], X_test_array[ind, :],
    feature_names=vectorizer.get_feature_names_out()
)
print("Positive" if y_test[ind] else "Negative", "Review:")
print(corpus_test[ind])
ind = 1
shap.force_plot(
    explainer.expected_value, shap_values[ind, :], X_test_array[ind, :],
    feature_names=vectorizer.get_feature_names_out()
)
print("Positive" if y_test[ind] else "Negative", "Review:")
print(corpus_test[ind])
ind = 2
shap.force_plot(
    explainer.expected_value, shap_values[ind, :], X_test_array[ind, :],
    feature_names=vectorizer.get_feature_names_out()
)
print("Positive" if y_test[ind] else "Negative", "Review:")
print(corpus_test[ind])