
發布時間：2023/12/14


Paper：LIME之《Why Should I Trust You? Explaining the Predictions of Any Classifier為什么要相信你？解釋任何分類器的預測》翻譯與解讀

Paper：《"Why Should I Trust You?": Explaining the Predictions of Any Classifier》翻譯與解讀

ABSTRACT

1.INTRODUCTION

2.THE CASE FOR EXPLANATIONS

Desired Characteristics for Explainers解釋者所需的特征

3.LOCAL INTERPRETABLE MODEL-AGNOSTIC EXPLANATIONS局部可解釋且與模型無關的解釋

3.1 Interpretable Data Representations可解釋的數據表示

3.2 Fidelity-Interpretability Trade-off保真度-可解釋性權衡

3.3 Sampling for Local Exploration局部勘探取樣

3.4 Sparse Linear Explanations稀疏線性解釋

3.5 Example 1: Text classification with SVMs使用 SVM 進行文本分類

4.SUBMODULAR PICK FOR EXPLAINING MODELS用于解釋模型的子模塊選擇

5.1 Experiment Setup實驗設置

5.2 Are explanations faithful to the model?解釋是否忠實于模型？

5.3 Should I trust this prediction?我應該相信這個預測嗎？

5.4 Can I trust this model?我可以相信這個模型嗎？

6.EVALUATION WITH HUMAN SUBJECTS用人類受試者評估

6.1 Experiment setup實驗設置

6.2 Can users select the best classifier?用戶可以選擇最好的分類器嗎？

6.3 Can non-experts improve a classifier?非專家可以改進分類器嗎？

6.4 Do explanations lead to insights?解釋會帶來洞察力嗎？

8.CONCLUSION AND FUTURE WORK結論和未來工作

Acknowledgements致謝

REFERENCES

Paper：《"Why Should I Trust You?": Explaining the Predictions of Any Classifier》翻譯與解讀

來源

arXiv:1602.04938v3 [cs.LG] 2016年8月9日
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Cite as: arXiv:1602.04938 [cs.LG]

(or arXiv:1602.04938v3 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.1602.04938

作者

Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

原文

https://arxiv.org/abs/1602.04938

ABSTRACT

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.

In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.

盡管被廣泛采用，機器學習模型仍然被看作是一種黑匣子。然而，理解預測背后的原因對于評估信任是非常重要的，尤其是如果一個人計劃根據預測采取行動，或在選擇是否部署一個新模型時，這是至關重要的。這樣的理解也提供了對模型的洞察，可以用來將一個不可信的模型或預測轉換為一個可信的模型或預測。

在這項工作中，我們提出了一種新的解釋技術LIME，通過在預測附近局部地學習一個可解釋模型，以一種可解釋和忠實的方式解釋任何分類器的預測。我們還提出了一種解釋模型的方法，通過以非冗余的方式呈現具有代表性的個體預測及其解釋，將該任務表述為子模塊（submodular）優化問題。我們通過解釋文本（例如隨機森林）和圖像分類（例如神經網絡）的不同模型來展示這些方法的靈活性。在各種需要信任的場景中，我們通過模擬和人類受試者的新實驗展示了解釋的效用：決定是否應該信任一個預測，在模型之間選擇，改進一個不可信的分類器，以及確定為什么一個分類器不應該被信任。

1.INTRODUCTION

Machine learning is at the core of many recent advances in science and technology. Unfortunately, the important role of humans is an oft-overlooked aspect in the field. Whether humans are directly using machine learning classifiers as tools, or are deploying models within other products, a vital concern remains: if the users do not trust a model or a prediction, they will not use it. It is important to differentiate between two different (but related) definitions of trust: (1) trusting a prediction, i.e. whether a user trusts an individual prediction sufficiently to take some action based on it, and (2) trusting a model, i.e. whether the user trusts a model to behave in reasonable ways if deployed. Both are directly impacted by how much the human understands a model’s behaviour, as opposed to seeing it as a black box.

機器學習是近年來許多科學技術進步的核心。不幸的是，人類的重要作用在該領域經常被忽視。無論人類是直接使用機器學習分類器作為工具，還是在其他產品中部署模型，一個至關重要的問題仍然存在：如果用戶不信任模型或預測，他們將不會使用它。區分兩個不同（但相關）的信任定義很重要：（1）信任一個預測，即用戶是否充分信任一個個人預測，從而采取一些基于該預測的行動，以及（2）信任一個模型，即用戶是否信任一個模型在部署后能夠以合理的方式行事。兩者都直接受到人類對模型行為的理解程度的影響，而不是將其視為黑匣子。

Determining trust in individual predictions is an important problem when the model is used for decision making. When using machine learning for medical diagnosis [6] or terrorism detection, for example, predictions cannot be acted upon on blind faith, as the consequences may be catastrophic.

Apart from trusting individual predictions, there is also a need to evaluate the model as a whole before deploying it “in the wild”. To make this decision, users need to be confident that the model will perform well on real-world data, according to the metrics of interest. Currently, models are evaluated using accuracy metrics on an available validation dataset. However, real-world data is often significantly different, and further, the evaluation metric may not be indicative of the product’s goal. Inspecting individual predictions and their explanations is a worthwhile solution, in addition to such metrics. In this case, it is important to aid users by suggesting which instances to inspect, especially for large datasets.

當模型用于決策時，確定個體預測的可信度是一個重要問題。例如，當使用機器學習進行醫學診斷 [6] 或恐怖主義檢測時，不能盲目相信預測，因為后果可能是災難性的。

除了信任個體的預測外，還需要在“自然場景下”部署模型之前對模型進行整體評估。為了做出這個決定，用戶需要確信模型將根據所關注的指標在現實世界數據上表現良好。目前，模型是在可用的驗證數據集上使用精度指標進行評估的。然而，現實世界的數據往往有很大的不同，此外，評估指標可能并不代表產品的目標。除了這些指標之外，檢查個體預測及其解釋是一個有價值的解決方案。在這種情況下，通過建議檢查哪些實例來幫助用戶是很重要的，尤其是對于大型數據集。

In this paper, we propose providing explanations for individual predictions as a solution to the “trusting a prediction” problem, and selecting multiple such predictions (and explanations) as a solution to the “trusting the model” problem. Our main contributions are summarized as follows.

(1)、LIME, an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model.

(2)、SP-LIME, a method that selects a set of representative instances with explanations to address the “trusting the model” problem, via submodular optimization.

(3)、Comprehensive evaluation with simulated and human subjects, where we measure the impact of explanations on trust and associated tasks. In our experiments, non-experts using LIME are able to pick which classifier from a pair generalizes better in the real world. Further, they are able to greatly improve an untrustworthy classifier trained on 20 newsgroups, by doing feature engineering using LIME. We also show how understanding the predictions of a neural network on images helps practitioners know when and why they should not trust a model.

在本文中，我們提出為個體預測提供解釋作為“相信預測”問題的解決方案，并選擇多個此類預測（和解釋）作為“信任模型”問題的解決方案。我們的主要貢獻總結如下。

(1)、LIME，一種算法，可以通過一個可解釋的模型局部逼近，以忠實的方式解釋任何分類器或回歸器的預測。

(2)、SP-LIME，一種通過子模塊優化—Submodular選擇一組具有解釋的代表性實例來解決“信任模型”問題的方法。

(3)、通過模擬實驗和人類實驗進行綜合評估，衡量解釋對信任和相關任務的影響。在我們的實驗中，使用 LIME 的非專家能夠從一對分類器中挑選出哪個分類器在現實世界中的泛化效果更好。此外，通過使用 LIME 進行特征工程，他們能夠極大地改進在 20 個新聞組上訓練的不可信分類器。我們還展示了理解神經網絡對圖像的預測如何幫助從業者知道他們何時以及為什么不應該信任模型。

2.THE CASE FOR EXPLANATIONS

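Figure 1 shows a medical diagnosis model on the left: given a patient's basic symptoms, it outputs the diagnosis "Flu". The LIME explainer then produces, as shown under the magnifying glass, the explanation behind the model's diagnosis. Green marks features that support the prediction and red marks features that count against it. The final choice is handed back to the doctor, who can use these explanations to decide whether to accept the model's prediction.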

Figure 1: Explaining individual predictions. A model predicts that a patient has the flu, and LIME highlights the symptoms in the patient's history that led to the prediction. Sneeze and headache are portrayed as contributing to the "flu" prediction, while "no fatigue" is evidence against it. With these, a doctor can make an informed decision about whether to trust the model's prediction.	圖 1：解釋個體預測。模型預測患者患有流感（flu），LIME 強調了導致這個預測的病人病史中的癥狀。打噴嚏和頭痛被描繪為有助于“流感”預測，而“無疲勞”是反對它的證據。有了這些，醫生可以就是否信任模型的預測做出明智的決定。
By “explaining a prediction”, we mean presenting textual or visual artifacts that provide qualitative understanding of the relationship between the instance’s components (e.g. words in text, patches in an image) and the model’s prediction. We argue that explaining predictions is an important aspect in getting humans to trust and use machine learning effectively, if the explanations are faithful and intelligible.	通過“解釋預測”，我們的意思是呈現文本或視覺工件，以提供對實例組件（例如文本中的單詞、圖像中的補丁）與模型預測之間關系的定性理解。我們認為，如果解釋是忠實和可理解的，那么解釋預測是讓人類信任和有效使用機器學習的一個重要方面。
The process of explaining individual predictions is illustrated in Figure 1. It is clear that a doctor is much better positioned to make a decision with the help of a model if intelligible explanations are provided. In this case, an explanation is a small list of symptoms with relative weights – symptoms that either contribute to the prediction (in green) or are evidence against it (in red). Humans usually have prior knowledge about the application domain, which they can use to accept (trust) or reject a prediction if they understand the reasoning behind it. It has been observed, for example, that providing explanations can increase the acceptance of movie recommendations [12] and other automated systems [8]. Every machine learning application also requires a certain measure of overall trust in the model. Development and evaluation of a classification model often consists of collecting annotated data, of which a held-out subset is used for automated evaluation. Although this is a useful pipeline for many applications, evaluation on validation data may not correspond to performance “in the wild”, as practitioners often overestimate the accuracy of their models [20], and thus trust cannot rely solely on it. Looking at examples offers an alternative method to assess truth in the model, especially if the examples are explained. We thus propose explaining several representative individual predictions of a model as a way to provide a global understanding.	解釋個體預測的過程如圖 1 所示。很明顯，如果提供了可理解的解釋，醫生可以更好地借助模型做出決定。在這種情況下，解釋是一小部分具有相對權重的癥狀——這些癥狀要么有助于預測（綠色），要么是反對預測的證據（紅色）。人們通常有關于應用領域的先驗知識，如果他們理解預測背后的原因，他們可以使用這些知識來接受(信任)或拒絕預測。例如，據觀察，提供解釋可以提高電影推薦 [12] 和其他自動化系統 [8] 的接受度。每個機器學習應用程序還需要對模型有一定程度的整體信任。分類模型的開發和評估通常包括收集帶注釋的數據，其中的一個留出子集用于自動評估。盡管這對于許多應用程序來說是一個有用的流程，但對驗證數據的評估可能并不對應于“自然場景下”的性能，因為從業者經常高估他們模型的準確性 [20]，因此信任不能完全依賴于它。查看示例提供了評估模型真實性的另一種方法，特別是在對示例進行了解釋的情況下。因此，我們建議解釋一個模型的幾個有代表性的個體預測，作為一種提供全局理解的方式。
There are several ways a model or its evaluation can go wrong. Data leakage, for example, defined as the unintentional leakage of signal into the training (and validation) data that would not appear when deployed [14], potentially increases accuracy. A challenging example cited by Kaufman et al. [14] is one where the patient ID was found to be heavily correlated with the target class in the training and validation data. This issue would be incredibly challenging to identify just by observing the predictions and the raw data, but much easier if explanations such as the one in Figure 1 are provided, as patient ID would be listed as an explanation for predictions. Another particularly hard to detect problem is dataset shift [5], where training data is different than test data (we give an example in the famous 20 newsgroups dataset later on). The insights given by explanations are particularly helpful in identifying what must be done to convert an untrustworthy model into a trustworthy one – for example, removing leaked data or changing the training data to avoid dataset shift.	模型或其評估可能會出錯的方式有多種。例如，數據泄漏被定義為無意中將部署時不會出現的信號泄漏到訓練(和驗證)數據中 [14]，這可能會提高準確性。Kaufman 等人 [14] 引用的一個具有挑戰性的例子是，在訓練和驗證數據中發現患者 ID 與目標類別高度相關。僅通過觀察預測和原始數據來識別這個問題將非常具有挑戰性，但如果提供圖 1 中的解釋則容易得多，因為患者 ID 將被列為預測的解釋。另一個特別難以檢測的問題是數據集偏移 [5]，其中訓練數據不同于測試數據（我們稍后會在著名的 20 個新聞組數據集中給出一個示例）。解釋給出的見解特別有助于確定必須做什么才能將一個不值得信任的模型轉換為值得信任的模型——例如，刪除泄露的數據或更改訓練數據以避免數據集偏移。
Machine learning practitioners often have to select a model from a number of alternatives, requiring them to assess the relative trust between two or more models. In Figure 2, we show how individual prediction explanations can be used to select between models, in conjunction with accuracy. In this case, the algorithm with higher accuracy on the validation set is actually much worse, a fact that is easy to see when explanations are provided (again, due to human prior knowledge), but hard otherwise. Further, there is frequently a mismatch between the metrics that we can compute and optimize (e.g. accuracy) and the actual metrics of interest such as user engagement and retention. While we may not be able to measure such metrics, we have knowledge about how certain model behaviors can influence them. Therefore, a practitioner may wish to choose a less accurate model for content recommendation that does not place high importance in features related to “clickbait” articles (which may hurt user retention), even if exploiting such features increases the accuracy of the model in cross validation. We note that explanations are particularly useful in these (and other) scenarios if a method can produce them for any model, so that a variety of models can be compared.	機器學習從業者通常必須從多個備選模型中選擇一個模型，要求他們評估兩個或多個模型之間的相對信任度。在圖 2 中，我們展示了如何結合準確性，使用單個預測解釋來選擇模型。在這種情況下，在驗證集上具有更高精度的算法實際上糟糕得多，當提供解釋時很容易看到這一事實(同樣，由于人類先驗知識)，否則很難看出。此外，我們可以計算和優化的指標（例如準確性）與實際感興趣的指標（例如用戶參與度和留存率）之間經常存在不匹配。雖然我們可能無法衡量這些指標，但我們知道特定的模型行為如何影響它們。因此，從業者可能希望選擇一個不那么精確的內容推薦模型，該模型不會高度重視與“標題黨”文章相關的特征(這可能會損害用戶留存率)，即使利用這些特征會提高模型在交叉驗證中的準確性。我們注意到，如果一種方法可以為任何模型生成解釋，那么解釋在這些（和其他）場景中特別有用，以便可以比較各種模型。
As the figure on the left shows, this is a document classification example that decides whether a document is about "Christianity" or "Atheism". Both classifications are correct. On closer inspection, however, Algorithm 2 bases its decision mainly on the words Posting and Host, which have little to do with Atheism itself; despite its high accuracy, it still cannot be trusted. Explanations of this kind therefore give us a reason to select or reject a model.
Figure 2: Explaining individual predictions of competing classifiers trying to determine if a document is about "Christianity" or "Atheism". The bar chart represents the importance given to the most relevant words, also highlighted in the text. Color indicates which class the word contributes to (green for "Christianity", magenta for "Atheism").	圖 2：解釋相互競爭的分類器的個體預測，這些分類器試圖確定文檔是關于“基督教”還是“無神論”。條形圖表示對最相關單詞的重要性，也在文本中突出顯示。顏色表示該詞屬于哪個類別（綠色代表“基督教”，洋紅色代表“無神論”）。

Desired Characteristics for Explainers解釋者所需的特征

We now outline a number of desired characteristics from explanation methods.

An essential criterion for explanations is that they must be interpretable, i.e., provide qualitative understanding between the input variables and the response. We note that interpretability must take into account the user’s limitations. Thus, a linear model [24], a gradient vector [2] or an additive model [6] may or may not be interpretable. For example, if hundreds or thousands of features significantly contribute to a prediction, it is not reasonable to expect any user to comprehend why the prediction was made, even if individual weights can be inspected. This requirement further implies that explanations should be easy to understand, which is not necessarily true of the features used by the model, and thus the “input variables” in the explanations may need to be different than the features. Finally, we note that the notion of interpretability also depends on the target audience. Machine learning practitioners may be able to interpret small Bayesian networks, but laymen may be more comfortable with a small number of weighted features as an explanation. Another essential criterion is local fidelity. Although it is often impossible for an explanation to be completely faithful unless it is the complete description of the model itself, for an explanation to be meaningful it must at least be locally faithful, i.e. it must correspond to how the model behaves in the vicinity of the instance being predicted. We note that local fidelity does not imply global fidelity: features that are globally important may not be important in the local context, and vice versa. While global fidelity would imply local fidelity, identifying globally faithful explanations that are interpretable remains a challenge for complex models.

我們現在從解釋方法中概述一些期望的特征。

解釋的一個基本標準是它們必須是可解釋的，即在輸入變量和響應之間提供定性的理解。我們注意到，可解釋性必須考慮到用戶的限制。因此，線性模型 [24]、梯度向量 [2] 或加性模型 [6] 可能是可解釋的，也可能是不可解釋的。例如，如果數百或數千個特征對預測有顯著貢獻，那么期望任何用戶理解為什么做出預測是不合理的，即使可以檢查各個權重。這一要求進一步意味著解釋應該易于理解，而模型使用的特征不一定如此，因此解釋中的“輸入變量”可能需要與特征不同。最后，我們注意到可解釋性的概念還取決于目標受眾。機器學習從業者可能能夠解釋小型貝葉斯網絡，但外行可能更愿意使用少量加權特征作為解釋。另一個基本標準是局部保真度。盡管一個解釋通常不可能完全可靠，除非它是模型本身的完整描述，但要使一個解釋有意義，它必須至少在局部是可靠的，也就是說，它必須與模型在被預測的實例附近的行為相對應。我們注意到局部保真度并不意味著全局保真度：全局重要的特征在局部上下文中可能并不重要，反之亦然。雖然全局保真度意味著局部保真度，但確定可解釋的全局保真解釋仍然是復雜模型的挑戰。

While there are models that are inherently interpretable [6, 17, 26, 27], an explainer should be able to explain any model, and thus be model-agnostic (i.e. treat the original model as a black box). Apart from the fact that many state-of-the-art classifiers are not currently interpretable, this also provides flexibility to explain future classifiers.

In addition to explaining predictions, providing a global perspective is important to ascertain trust in the model. As mentioned before, accuracy may often not be a suitable metric to evaluate the model, and thus we want to explain the model. Building upon the explanations for individual predictions, we select a few explanations to present to the user, such that they are representative of the model.

雖然有些模型本質上是可解釋的 [6, 17, 26, 27]，但解釋器應該能夠解釋任何模型，因此與模型無關（即將原始模型視為黑盒）。除了許多最先進的分類器目前無法解釋的事實之外，這也為解釋未來的分類器提供了靈活性。

除了解釋預測之外，提供全局視角對于確定對模型的信任也很重要。如前所述，準確性通常可能不是評估模型的合適指標，因此我們想解釋模型。基于對單個預測的解釋，我們選擇一些解釋來呈現給用戶，這樣它們就能代表模型。

3.LOCAL INTERPRETABLE MODEL-AGNOSTIC EXPLANATIONS局部可解釋且與模型無關的解釋

We now present Local Interpretable Model-agnostic Explanations (LIME). The overall goal of LIME is to identify an interpretable model over the interpretable representation that is locally faithful to the classifier.

我們現在提出局部可解釋且與模型無關的解釋（LIME）。LIME 的總體目標是在可解釋表示上識別一個局部忠實于分類器的可解釋模型。

3.1 Interpretable Data Representations可解釋的數據表示

Before we present the explanation system, it is important to distinguish between features and interpretable data representations. As mentioned before, interpretable explanations need to use a representation that is understandable to humans, regardless of the actual features used by the model. For example, a possible interpretable representation for text classification is a binary vector indicating the presence or absence of a word, even though the classifier may use more complex (and incomprehensible) features such as word embeddings. Likewise for image classification, an interpretable representation may be a binary vector indicating the “presence” or “absence” of a contiguous patch of similar pixels (a super-pixel), while the classifier may represent the image as a tensor with three color channels per pixel. We let $x \in \mathbb{R}^d$ denote the original representation of an instance being explained, and we use $x' \in \{0,1\}^{d'}$ to denote a binary vector for its interpretable representation.

在我們介紹解釋系統之前，區分特征和可解釋的數據表示是很重要的。如前所述，可解釋的解釋需要使用人類可以理解的表示，而不管模型使用的實際特征如何。例如，文本分類的一個可能的可解釋表示是一個二進制向量，表示一個單詞的存在或不存在，即使分類器可能使用更復雜（和難以理解）的特征，如單詞嵌入。同樣地，對于圖像分類，可解釋的表示可以是表示相似像素（超像素）的連續補丁“存在”或“不存在”的二進制向量，而分類器可以表示圖像為一個張量，每個像素有三個顏色通道。我們用 $x \in \mathbb{R}^d$ 表示被解釋實例的原始表示，用 $x' \in \{0,1\}^{d'}$ 表示其可解釋表示的二進制向量。
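A minimal sketch of the text case described above, assuming whitespace tokenization. The helper names `to_interpretable` and `to_original` are hypothetical and are not taken from the paper or from any library; they only illustrate how a binary vector $x'$ over the words of one document relates to the text that the classifier actually receives.

```python
def to_interpretable(text):
    """Return (vocabulary, x_prime) for one document.

    vocabulary: the distinct words of the document, in order of first appearance.
    x_prime: a binary vector with one entry per vocabulary word; it is all ones
    for the original instance, because every word is present.
    """
    vocabulary = list(dict.fromkeys(text.split()))
    x_prime = [1] * len(vocabulary)
    return vocabulary, x_prime


def to_original(text, vocabulary, z_prime):
    """Map a binary vector z' back to a document by deleting the absent words."""
    present = {word for word, bit in zip(vocabulary, z_prime) if bit == 1}
    return " ".join(word for word in text.split() if word in present)


doc = "the host posted the message"
vocab, x_prime = to_interpretable(doc)
print(vocab)                                   # ['the', 'host', 'posted', 'message']
print(to_original(doc, vocab, [1, 0, 1, 1]))   # the posted the message
```

The classifier itself may turn the recovered text into any features it likes (for example embeddings); only the explanation is phrased in terms of word presence.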

3.2 Fidelity-Interpretability Trade-off保真度-可解釋性權衡

Formally, we define an explanation as a model $g \in G$, where $G$ is a class of potentially interpretable models, such as linear models, decision trees, or falling rule lists [27], i.e. a model $g \in G$ can be readily presented to the user with visual or textual artifacts. The domain of $g$ is $\{0,1\}^{d'}$, i.e. $g$ acts over absence/presence of the interpretable components. As not every $g \in G$ may be simple enough to be interpretable, we let $\Omega(g)$ be a measure of complexity (as opposed to interpretability) of the explanation $g \in G$. For example, for decision trees $\Omega(g)$ may be the depth of the tree, while for linear models, $\Omega(g)$ may be the number of non-zero weights.

形式上，我們將解釋定義為模型 $g \in G$，其中 $G$ 是一類潛在的可解釋模型，例如線性模型、決策樹或下降規則列表 [27]，即模型 $g \in G$ 可以很容易地通過視覺或文本形式呈現給用戶。$g$ 的定義域是 $\{0,1\}^{d'}$，即 $g$ 作用于可解釋組件的缺失/存在。由于并非每個 $g \in G$ 都足夠簡單以至于可以解釋，因此我們讓 $\Omega(g)$ 作為解釋 $g \in G$ 的復雜性（相對于可解釋性）的度量。例如，對于決策樹，$\Omega(g)$ 可能是樹的深度，而對于線性模型，$\Omega(g)$ 可能是非零權重的數量。
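The two complexity measures mentioned above can be sketched as follows. This is only an illustration: the function names are hypothetical, and representing a decision tree as nested `(left, right)` tuples is an assumption made to keep the example short.

```python
def omega_linear(weights):
    """Complexity of a linear explanation: the number of non-zero weights."""
    return sum(1 for w in weights if w != 0)


def omega_tree(node):
    """Complexity of a tree explanation: its depth.

    A tree is a nested (left, right) tuple; anything that is not a tuple is a leaf.
    """
    if not isinstance(node, tuple):
        return 0
    left, right = node
    return 1 + max(omega_tree(left), omega_tree(right))


print(omega_linear([0.0, 0.7, 0.0, -0.2]))         # 2
print(omega_tree((("flu", "no flu"), "no flu")))   # 2
```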

Let the model being explained be denoted $f: \mathbb{R}^d \to \mathbb{R}$. In classification, $f(x)$ is the probability (or a binary indicator) that $x$ belongs to a certain class. We further use $\pi_x(z)$ as a proximity measure between an instance $z$ to $x$, so as to define locality around $x$. Finally, let $\mathcal{L}(f, g, \pi_x)$ be a measure of how unfaithful $g$ is in approximating $f$ in the locality defined by $\pi_x$. In order to ensure both interpretability and local fidelity, we must minimize $\mathcal{L}(f, g, \pi_x)$ while having $\Omega(g)$ be low enough to be interpretable by humans. The explanation produced by LIME is obtained by the following:

$$\xi(x) = \operatorname*{argmin}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g) \qquad (1)$$

This formulation can be used with different explanation families $G$, fidelity functions $\mathcal{L}$, and complexity measures $\Omega$. Here we focus on sparse linear models as explanations, and on performing the search using perturbations.

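讓正在解釋的模型表示為 $f: \mathbb{R}^d \to \mathbb{R}$。在分類中，$f(x)$ 是 $x$ 屬于某個類別的概率（或二元指標）。我們進一步使用 $\pi_x(z)$ 作為實例 $z$ 到 $x$ 之間的接近度度量，以定義 $x$ 周圍的局部性。最后，讓 $\mathcal{L}(f, g, \pi_x)$ 衡量 $g$ 在 $\pi_x$ 定義的局部性中逼近 $f$ 的不忠實程度。為了確保可解釋性和局部保真度，我們必須最小化 $\mathcal{L}(f, g, \pi_x)$，同時讓 $\Omega(g)$ 低到足以被人類解釋。LIME 產生的解釋是通過以下方式獲得的：$\xi(x) = \operatorname*{argmin}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)$，即式 (1)。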

該公式可以與不同的解釋族 G、保真度函數 L 和復雜度度量 Ω 一起使用。在這里，我們專注于稀疏線性模型作為解釋，并使用擾動執行搜索。

3.3 Sampling for Local Exploration局部勘探取樣

We want to minimize the locality-aware loss $\mathcal{L}(f, g, \pi_x)$ without making any assumptions about $f$, since we want the explainer to be model-agnostic. Thus, in order to learn the local behavior of $f$ as the interpretable inputs vary, we approximate $\mathcal{L}(f, g, \pi_x)$ by drawing samples, weighted by $\pi_x$. We sample instances around $x'$ by drawing nonzero elements of $x'$ uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample $z' \in \{0,1\}^{d'}$ (which contains a fraction of the nonzero elements of $x'$), we recover the sample in the original representation $z \in \mathbb{R}^d$ and obtain $f(z)$, which is used as a label for the explanation model. Given this dataset $\mathcal{Z}$ of perturbed samples with the associated labels, we optimize Eq. (1) to get an explanation $\xi(x)$. The primary intuition behind LIME is presented in Figure 3, where we sample instances both in the vicinity of $x$ (which have a high weight due to $\pi_x$) and far away from $x$ (low weight from $\pi_x$). Even though the original model may be too complex to explain globally, LIME presents an explanation that is locally faithful (linear in this case), where the locality is captured by $\pi_x$. It is worth noting that our method is fairly robust to sampling noise since the samples are weighted by $\pi_x$ in Eq. (1). We now present a concrete instance of this general framework.

我們希望在不對 $f$ 做任何假設的情況下最小化局部感知損失 $\mathcal{L}(f, g, \pi_x)$，因為我們希望解釋器與模型無關。因此，為了學習 $f$ 在可解釋輸入變化時的局部行為，我們通過抽取樣本來近似 $\mathcal{L}(f, g, \pi_x)$，并以 $\pi_x$ 加權。我們通過隨機均勻地抽取 $x'$ 的非零元素來對 $x'$ 周圍的實例進行采樣（其中抽取的次數也是均勻采樣的）。給定一個擾動樣本 $z' \in \{0,1\}^{d'}$（其中包含 $x'$ 的一部分非零元素），我們恢復原始表示中的樣本 $z \in \mathbb{R}^d$ 并獲得 $f(z)$，它用作解釋模型的標簽。給定這個帶有相關標簽的擾動樣本數據集 $\mathcal{Z}$，我們優化式 (1) 得到解釋 $\xi(x)$。LIME 背后的主要直覺如圖 3 所示，我們在 $x$ 附近（由于 $\pi_x$ 具有高權重）和遠離 $x$（由于 $\pi_x$ 具有低權重）對實例進行采樣。盡管原始模型可能過于復雜而無法全局解釋，但 LIME 提供了一種局部忠實的解釋（在這種情況下為線性），其中局部性由 $\pi_x$ 捕獲。值得注意的是，我們的方法對采樣噪聲相當魯棒，因為樣本在式 (1) 中由 $\pi_x$ 加權。我們現在介紹這個通用框架的一個具體實例。
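The sampling step described above can be sketched with the standard library alone. The function name `sample_around` is hypothetical, and the sketch assumes that $x'$ has at least one nonzero element and that a perturbed sample keeps at least one of them; the paper does not spell out these edge cases.

```python
import random


def sample_around(x_prime, num_samples, rng):
    """Draw perturbed binary samples z' around x' (assumes x' has at least one 1)."""
    nonzero = [i for i, bit in enumerate(x_prime) if bit == 1]
    samples = []
    for _ in range(num_samples):
        # The number of elements to keep is itself drawn uniformly ...
        k = rng.randint(1, len(nonzero))
        # ... and then that many nonzero positions of x' are chosen uniformly.
        keep = set(rng.sample(nonzero, k))
        samples.append([1 if i in keep else 0 for i in range(len(x_prime))])
    return samples


rng = random.Random(0)
x_prime = [1, 1, 0, 1, 1]
Z_prime = sample_around(x_prime, num_samples=5, rng=rng)
for z_prime in Z_prime:
    print(z_prime)
```

Each `z_prime` only switches off components that are present in `x_prime`; it never switches on a component that the original instance lacks. Recovering $z$ in the original representation and calling $f(z)$ then yields the labels for the explanation model.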

3.4 Sparse Linear Explanations稀疏線性解釋

For the rest of this paper, we let $G$ be the class of linear models, such that $g(z') = w_g \cdot z'$. We use the locally weighted square loss as $\mathcal{L}$, as defined in Eq. (2), where we let $\pi_x(z) = \exp(-D(x, z)^2 / \sigma^2)$ be an exponential kernel defined on some distance function $D$ (e.g. cosine distance for text, L2 distance for images) with width $\sigma$.	對于本文的其余部分，我們讓 $G$ 是線性模型的類別，使得 $g(z') = w_g \cdot z'$。我們使用局部加權平方損失作為 $\mathcal{L}$，如式 (2) 中所定義，其中我們讓 $\pi_x(z) = \exp(-D(x, z)^2 / \sigma^2)$ 是定義在某個距離函數 $D$（例如文本的余弦距離，圖像的 L2 距離）上的指數核，寬度為 $\sigma$。
$$\mathcal{L}(f, g, \pi_x) = \sum_{z, z' \in \mathcal{Z}} \pi_x(z) \, \big(f(z) - g(z')\big)^2 \qquad (2)$$
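Here $z'$ is the interpretable version of $z$ (it can be thought of as the instance with only some of its features present). For text, the interpretable version is a bag of words, of which the explanation keeps at most $K$; for images, it is a vector of super-pixels, again limited to $K$ in the explanation. $K$ is a constant set by hand. The procedure is given as pseudocode in Algorithm 1 of the paper.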
For text classification, we ensure that the explanation is interpretable by letting the interpretable representation be a bag of words, and by setting a limit $K$ on the number of words, i.e. $\Omega(g) = \infty \, \mathbb{1}[\|w_g\|_0 > K]$. Potentially, $K$ can be adapted to be as big as the user can handle, or we could have different values of $K$ for different instances. In this paper we use a constant value for $K$, leaving the exploration of different values to future work. We use the same $\Omega$ for image classification, using “super-pixels” (computed using any standard algorithm) instead of words, such that the interpretable representation of an image is a binary vector where 1 indicates the original super-pixel and 0 indicates a grayed out super-pixel. This particular choice of $\Omega$ makes directly solving Eq. (1) intractable, but we approximate it by first selecting $K$ features with Lasso (using the regularization path [9]) and then learning the weights via least squares (a procedure we call K-LASSO in Algorithm 1). Since Algorithm 1 produces an explanation for an individual prediction, its complexity does not depend on the size of the dataset, but instead on time to compute $f(x)$ and on the number of samples $N$. In practice, explaining random forests with 1000 trees using scikit-learn (http://scikit-learn.org) on a laptop with $N = 5000$ takes under 3 seconds without any optimizations such as using gpus or parallelization. Explaining each prediction of the Inception network [25] for image classification takes around 10 minutes.	對于文本分類，我們通過讓可解釋表示為詞袋并對詞的數量設置上限 $K$ 來確保解釋是可解釋的，即 $\Omega(g) = \infty \, \mathbb{1}[\|w_g\|_0 > K]$。潛在地，$K$ 可以適應用戶可以處理的最大大小，或者我們可以為不同的實例設置不同的 $K$ 值。在本文中，我們為 $K$ 使用一個常數值，將不同值的探索留給未來的工作。我們使用相同的 $\Omega$ 進行圖像分類，使用“超像素”（使用任何標準算法計算）而不是單詞，這樣圖像的可解釋表示是二進制向量，其中 1 表示原始超像素，0 表示變灰超像素。$\Omega$ 的這種特殊選擇使得直接求解式 (1) 難以處理，但我們首先通過 Lasso 選擇 $K$ 個特征（使用正則化路徑 [9]）然后通過最小二乘法學習權重（我們在算法 1 中稱為 K-LASSO 的過程）來近似它。由于算法 1 對單個預測產生了解釋，因此其復雜性不取決于數據集的大小，而是取決于計算 $f(x)$ 的時間和樣本數 $N$。在實踐中，在筆記本電腦上使用 scikit-learn (http://scikit-learn.org) 以 $N = 5000$ 解釋具有 1000 棵樹的隨機森林需要不到 3 秒的時間，而無需任何優化，例如使用 gpus 或并行化。解釋 Inception 網絡 [25] 對圖像分類的每個預測大約需要 10 分鐘。
Any choice of interpretable representations and $G$ will have some inherent drawbacks. First, while the underlying model can be treated as a black-box, certain interpretable representations will not be powerful enough to explain certain behaviors. For example, a model that predicts sepia-toned images to be retro cannot be explained by presence or absence of super pixels. Second, our choice of $G$ (sparse linear models) means that if the underlying model is highly non-linear even in the locality of the prediction, there may not be a faithful explanation. However, we can estimate the faithfulness of the explanation on $\mathcal{Z}$, and present this information to the user. This estimate of faithfulness can also be used for selecting an appropriate family of explanations from a set of multiple interpretable model classes, thus adapting to the given dataset and the classifier. We leave such exploration for future work, as linear explanations work quite well for multiple black-box models in our experiments.	任何可解釋的表示和 $G$ 的選擇都會有一些固有的缺點。首先，雖然底層模型可以被視為一個黑盒，但某些可解釋的表示不足以解釋某些行為。例如，預測棕褐色圖像是復古的模型不能用超像素的存在或缺失來解釋。其次，我們對 $G$（稀疏線性模型）的選擇意味著，如果基礎模型即使在預測的局部也是高度非線性的，則可能沒有忠實的解釋。但是，我們可以估計解釋在 $\mathcal{Z}$ 上的忠實度，并將此信息呈現給用戶。這種忠實度估計也可用于從一組多個可解釋模型類中選擇適當的解釋族，從而適應給定的數據集和分類器。我們將這種探索留給未來的工作，因為在我們的實驗中，線性解釋對于多個黑盒模型非常有效。
Figure 3: Toy example to present intuition for LIME. The black-box model’s complex decision function f (unknown to LIME) is represented by the blue/pink background, which cannot be approximated well by a linear model. The bold red cross is the instance being explained. LIME samples instances, gets predictions using f, and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the learned explanation that is locally (but not globally) faithful.	圖 3：可視化LIME 直覺的玩具示例。黑盒模型的復雜決策函數 f（LIME 未知）由藍色/粉紅色背景表示，線性模型無法很好地近似。加粗的紅叉是正在解釋的實例。LIME 對實例進行采樣，使用 f 獲取預測，并根據與所解釋的實例的接近程度對它們加權(這里用大小表示)。虛線是局部（但不是全局）忠實的學習解釋。
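The following is a sketch of the K-LASSO step under stated assumptions, not the authors' implementation. It uses scikit-learn's `lars_path` to walk the Lasso regularization path and `LinearRegression` with `sample_weight` for the weighted least-squares fit. Everything else is made up for illustration: the `black_box` function, the kernel width `sigma=2.0`, the use of L2 distance in the interpretable space, independent coin flips for the perturbations, and the shortcut of feeding $z'$ directly to the black box instead of mapping it back to the original representation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, lars_path


def exponential_kernel(distances, sigma):
    """pi_x(z) = exp(-D(x, z)^2 / sigma^2)."""
    return np.exp(-(distances ** 2) / sigma ** 2)


def k_lasso(Z_prime, labels, weights, K):
    """Pick at most K features on the Lasso path, then fit weighted least squares."""
    # lars_path has no sample_weight argument, so the weights are folded into
    # the data: center with the weighted mean, then scale rows by sqrt(weight).
    root_w = np.sqrt(weights)
    Zc = (Z_prime - np.average(Z_prime, axis=0, weights=weights)) * root_w[:, None]
    yc = (labels - np.average(labels, weights=weights)) * root_w
    _, _, coefs = lars_path(Zc, yc, method="lasso")
    # Walk the path from its least regularized end and stop at the first
    # solution that uses no more than K features.
    selected = np.array([], dtype=int)
    for column in coefs.T[::-1]:
        nonzero = np.flatnonzero(column)
        if len(nonzero) <= K:
            selected = nonzero
            break
    model = LinearRegression()
    model.fit(Z_prime[:, selected], labels, sample_weight=weights)
    return selected, model.coef_


def black_box(z):
    """Hypothetical classifier: depends mostly on components 0 and 3."""
    return 1 / (1 + np.exp(-(2 * z[:, 0] - 1.5 * z[:, 3] + 0.1 * z[:, 1] * z[:, 2])))


rng = np.random.default_rng(0)
d_prime, N = 6, 1000
x_prime = np.ones(d_prime)
Z_prime = rng.integers(0, 2, size=(N, d_prime))
weights = exponential_kernel(np.linalg.norm(Z_prime - x_prime, axis=1), sigma=2.0)
selected, w = k_lasso(Z_prime, black_box(Z_prime), weights, K=2)
print(selected, w)
```

Selecting features on the path and refitting without a penalty keeps the reported weights free of Lasso shrinkage, which matters when the weights themselves are shown to the user.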

3.5 Example 1: Text classification with SVMs使用 SVM 進行文本分類

3.6 Example 2: Deep networks for images用于圖像的深度網絡

When using sparse linear explanations for image classifiers, one may wish to just highlight the super-pixels with positive weight towards a specific class, as they give intuition as to why the model would think that class may be present.

We explain the prediction of Google’s pre-trained Inception neural network [25] in this fashion on an arbitrary image (Figure 4a). Figures 4b, 4c, 4d show the superpixels explanations for the top 3 predicted classes (with the rest of the image grayed out), having set K = 10. What the neural network picks up on for each of the classes is quite natural to humans - Figure 4b in particular provides insight as to why acoustic guitar was predicted to be electric: due to the fretboard. This kind of explanation enhances trust in the classifier (even if the top predicted class is wrong), as it shows that it is not acting in an unreasonable manner.

當對圖像分類器使用稀疏線性解釋時，人們可能希望僅突出顯示對特定類別具有正權重的超像素，因為它們可以直觀地說明模型為什么認為該類別可能存在。

我們以這種方式在任意圖像上解釋了谷歌預訓練的 Inception 神經網絡 [25] 的預測（圖 4a）。圖 4b、4c、4d 顯示了前 3 個預測類別的超像素解釋（圖像的其余部分變灰），設置 K = 10。神經網絡為每個類別拾取的內容對人類來說非常自然：圖 4b 特別提供了關于為什么原聲吉他被預測為電吉他的見解：由于指板。這種解釋增強了對分類器的信任（即使最上面的預測類是錯誤的），因為它表明它并沒有以不合理的方式行事。

Figure 4: Explaining an image classification prediction made by Google's Inception neural network. The top 3 classes predicted are "Electric Guitar" (p = 0.32), "Acoustic guitar" (p = 0.24) and "Labrador" (p = 0.21).

圖 4：解釋由 Google 的 Inception 神經網絡進行的圖像分類預測。預測的前 3 類是“電吉他”(p = 0.32)、“木吉他”(p = 0.24) 和“拉布拉多”(p = 0.21)。
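A small sketch of how such a highlighted image can be produced once an explanation is available. It assumes the super-pixel segmentation is already given as an integer label map (the paper leaves the segmentation algorithm open); the function name `mask_superpixels`, the gray level `0.5`, and the toy weights are all hypothetical.

```python
import numpy as np


def mask_superpixels(image, segments, z_prime, gray=0.5):
    """Return a copy of image in which super-pixels with z'[k] == 0 are grayed out.

    image: float array of shape (H, W, 3).
    segments: int array of shape (H, W); segments[r, c] is the super-pixel id.
    """
    out = image.copy()
    for seg_id, bit in enumerate(z_prime):
        if bit == 0:
            out[segments == seg_id] = gray
    return out


image = np.ones((4, 4, 3))                    # toy all-white image
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])           # four square super-pixels
weights = np.array([0.4, -0.1, 0.0, 0.2])     # hypothetical explanation weights
z_prime = (weights > 0).astype(int)           # keep only positive evidence
highlighted = mask_superpixels(image, segments, z_prime)
print(highlighted[:, :, 0])
```

The same function also serves the sampling step: passing a perturbed $z'$ instead of the thresholded weights yields the image $z$ that is fed to the classifier.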

4.SUBMODULAR PICK FOR EXPLAINING MODELS用于解釋模型的子模塊選擇

Although an explanation of a single prediction provides some understanding into the reliability of the classifier to the user, it is not sufficient to evaluate and assess trust in the model as a whole. We propose to give a global understanding of the model by explaining a set of individual instances. This approach is still model agnostic, and is complementary to computing summary statistics such as held-out accuracy.

Even though explanations of multiple instances can be insightful, these instances need to be selected judiciously, since users may not have the time to examine a large number of explanations. We represent the time/patience that humans have by a budget B that denotes the number of explanations they are willing to look at in order to understand a model. Given a set of instances X, we define the pick step as the task of selecting B instances for the user to inspect.

盡管對單個預測的解釋為用戶提供了對分類器可靠性的一些理解，但這不足以評估模型整體的可信度。我們建議通過解釋一組單獨的實例來給出對模型的全局理解。這種方法仍然與模型無關，并且是對計算匯總統計數據（例如留出集準確率）的補充。

盡管對多個實例的解釋可能是深刻的，但需要明智地選擇這些實例，因為用戶可能沒有時間檢查大量的解釋。我們用預算 B 表示人類所擁有的時間/耐心，預算 B 表示他們為了理解模型而愿意查看的解釋數量。給定一組實例 X，我們將挑選步驟定義為選擇 B 個實例供用戶檢查的任務。

The pick step is not dependent on the existence of explana- tions - one of the main purpose of tools like Modeltracker [1] and others [11] is to assist users in selecting instances them- selves, and examining the raw data and predictions. However, since looking at raw data is not enough to understand predic- tions and get insights, the pick step should take into account the explanations that accompany each prediction. Moreover, this method should pick a diverse, representative set of expla- nations to show the user – i.e. non-redundant explanations that represent how the model behaves globally.

Given the explanations for a set of instances?X?(|X|?=?n), we construct an?n?×?dl?explanation matrix?W?that represents the local importance of the interpretable components for each instance. When using linear models as explanations,?for an instance?xi?and explanation?gi?=?ξ(xi),?we set?Wij?=|wgij?|. Further, for each component (column)?j?in?W, we?let?Ij?denote the?global?importance of that component in the explanation space. Intuitively, we want?I?such that

features that explain many different instances have higher importance scores. In Figure?5, we show a toy example?W, with?n?=?dl?= 5, where?W?is binary (for simplicity). The?importance function?I?should score feature f2 higher than?feature f1, i.e.?I2?>?I1,?since feature f2 is used to explain more instances. Concretely for the text applications, we set?

For images, I must measure something that is comparable across the super-pixels in different images, such as color histograms or other features of super-pixels; we leave further exploration of these ideas for future work.

While we want to pick instances that cover the important components, the set of explanations must not be redundant in the components they show the users, i.e. avoid selecting instances with similar explanations. In Figure 5, after the second row is picked, the third row adds no value, as the user has already seen features f2 and f3, while the last row exposes the user to completely new features. Selecting the second and last row results in the coverage of almost all the features. We formalize this non-redundant coverage intuition in Eq. (3), where we define coverage as the set function $c$ that, given $W$ and $I$, computes the total importance of the features that appear in at least one instance in a set $V$:

$c(V, W, I) = \sum_{j=1}^{d'} \mathbb{1}_{[\exists i \in V : W_{ij} > 0]} \, I_j \qquad (3)$

選擇步驟不依賴于解釋的存在——像Modeltracker [1] 和其他 [11] 等工具的主要目的之一是幫助用戶自己選擇實例，并檢查原始數據和預測。然而，由于查看原始數據不足以理解預測和獲得洞察力，因此選擇步驟應考慮每個預測附帶的解釋。此外，這種方法應該選擇一組多樣化的、有代表性的解釋來向用戶展示——即非冗余的解釋，這些解釋代表了模型的全局行為。

給定一組實例X (|X| = n)的解釋，我們構造了一個n × dl解釋矩陣W，表示每個實例的可解釋組件的局部重要性。當使用線性模型作為解釋時，對于一個實例xi和解釋gi = ξ(xi)，我們設Wij =|wgij |。進一步，對于W中的每個分量(列)j，我們讓Ij表示該分量在解釋空間中的全局重要性。直覺上，我們想要這樣

解釋許多不同實例的特征具有更高的重要性分數。在圖?5 中，我們展示了一個玩具示例 W，其中 n = dl = 5，其中 W 是二進制的（為簡單起見）。重要性函數 I 應該對特征 f2 評分高于特征 f1，即 I2 > I1，因為特征 f2 用于解釋更多實例。具體對于文本應用，我們設置

?對于圖像，我必須測量一些特征。我們將這種非冗余覆蓋直覺形式化，這種直覺在不同圖像中的超像素之間具有可比性，例如顏色直方圖或超像素的其他特征；我們將這些想法的進一步探索留給未來的工作。

雖然我們希望選擇涵蓋重要組件的實例，但解釋集在它們向用戶展示的組件中不能是多余的，即避免選擇具有相似解釋的實例。在圖?5中，選擇第二行后，第三行沒有增加任何價值，因為用戶已經看到了特征 f2 和 f3——而最后一行向用戶展示了全新的特征。選擇第二行和最后一行會導致幾乎所有特征的覆蓋。我們將這種非冗余覆蓋直覺形式化，它在不同圖像中的超像素之間具有可比性，在方程式中(3)，我們將覆蓋率定義為集合函數 c，它在給定 W 和 I 的情況下計算出現在集合 V 中至少一個實例中的特征的總重要性。

The pick problem, defined in Eq. (4), consists of finding the set $V$, $|V| \le B$, that achieves the highest coverage:

$Pick(W, I) = \operatorname{argmax}_{V, |V| \le B} \; c(V, W, I) \qquad (4)$

The problem in Eq. (4) is maximizing a weighted coverage function, and is NP-hard [10]. Let $c(V \cup \{i\}, W, I) - c(V, W, I)$ be the marginal coverage gain of adding an instance $i$ to a set $V$. Due to submodularity, a greedy algorithm that iteratively adds the instance with the highest marginal coverage gain to the solution offers a constant-factor approximation guarantee of $1 - 1/e$ to the optimum [15]. We outline this approximation in Algorithm 2, and call it submodular pick.

式(4)中的問題是加權覆蓋函數的最大化，是NP-hard[10]。設 c(V?∪{i},?W, I)?c(V,?W, I) 是將實例 i 添加到集合 V 的邊際覆蓋增益。由于子模塊性，一個貪心算法迭代地將具有最高邊際覆蓋增益的實例添加到解決方案中，提供了一個常數因子近似保證1-1/e 到最優值 [15]。我們在算法 2 中概述了這種近似，并將其稱為子模選取—submodular pick。
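The greedy procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are made up for this example, and the 5 × 5 binary matrix is a hypothetical stand-in written to match the description of Figure 5 (its exact entries are an assumption). Importance is taken as the square root of each column sum, and coverage as the total importance of the columns touched by the picked rows.

```python
import math


def global_importance(W):
    """I_j = sqrt(sum_i |W_ij|) for each column j of the explanation matrix."""
    n_cols = len(W[0])
    return [math.sqrt(sum(abs(row[j]) for row in W)) for j in range(n_cols)]


def coverage(V, W, I):
    """Total importance of the columns that are non-zero in at least one row of V."""
    return sum(I[j] for j in range(len(I)) if any(abs(W[i][j]) > 0 for i in V))


def submodular_pick(W, budget):
    """Greedily add the row with the largest marginal coverage gain."""
    I = global_importance(W)
    picked = []
    while len(picked) < min(budget, len(W)):
        base = coverage(picked, W, I)
        best_i, best_gain = None, -1.0
        for i in range(len(W)):
            if i in picked:
                continue
            gain = coverage(picked + [i], W, I) - base
            # Strict ">" keeps the earliest row when two rows tie.
            if gain > best_gain:
                best_i, best_gain = i, gain
        picked.append(best_i)
    return picked


# Hypothetical toy matrix (rows = documents, columns = features f1..f5).
TOY_W = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]

picked_rows = submodular_pick(TOY_W, budget=2)
print(picked_rows)  # 0-based row indices
```

With this toy matrix the sketch picks the second and the last row (indices 1 and 4), which together cover every feature except f1, in line with the caption of Figure 5.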

Figure 5: Toy example W. Rows represent instances (documents) and columns represent features (words). Feature f2 (dotted blue) has the highest importance. Rows 2 and 5 (in red) would be selected by the pick procedure, covering all but feature f1.

圖 5：Toy案例 W。行表示實例（文檔），列表示特征（單詞）。特征 f2（藍色虛線）的重要性最高。選擇程序將選擇第 2 行和第 5 行（紅色），涵蓋除特征 f1 之外的所有內容。

5. SIMULATED USER EXPERIMENTS模擬用戶實驗

In this section, we present simulated user experiments to evaluate the utility of explanations in trust-related tasks. In particular, we address the following questions: (1) Are the explanations faithful to the model, (2) Can the explanations aid users in ascertaining trust in predictions, and (3) Are the explanations useful for evaluating the model as a whole. Code and data for replicating our experiments are available at https://github.com/marcotcr/lime-experiments.

在本節中，我們展示了模擬用戶實驗，以評估解釋在信任相關任務中的效用。特別是，我們解決了以下問題：（1）解釋是否忠實于模型，（2）解釋能否幫助用戶確定對預測的信任，以及（3）解釋是否對評估整個模型有用。用于復制我們的實驗的代碼和數據可在 https://github.com/marcotcr/lime-experiments 獲得。

5.1 Experiment Setup實驗設置

We use two sentiment analysis datasets (books and DVDs, 2000 instances each) where the task is to classify product reviews as positive or negative [4]. We train decision trees (DT), logistic regression with L2 regularization (LR), nearest neighbors (NN), and support vector machines with RBF kernel (SVM), all using bag of words as features. We also include random forests (with 1000 trees) trained with the average word2vec embedding [19] (RF), a model that is impossible to interpret without a technique like LIME. We use the implementations and default parameters of scikit-learn, unless noted otherwise. We divide each dataset into train (1600 instances) and test (400 instances).

To explain individual predictions, we compare our proposed approach (LIME), with parzen [2], a method that approximates the black box classifier globally with Parzen windows, and explains individual predictions by taking the gradient of the prediction probability function. For parzen, we take the K features with the highest absolute gradients as explanations. We set the hyper-parameters for parzen and LIME using cross validation, and set N = 15,000. We also compare against a greedy procedure (similar to Martens and Provost [18]) in which we greedily remove features that contribute the most to the predicted class until the prediction changes (or we reach the maximum of K features), and a random procedure that randomly picks K features as an explanation. We set K to 10 for our experiments.

For experiments where the pick procedure applies, we either do random selection (random pick, RP) or the procedure described in §4 (submodular pick, SP). We refer to pick-explainer combinations by adding RP or SP as a prefix.

我們使用兩個情緒分析數據集（書籍和 DVD，每個 2000 個實例），其中任務是將產品評論分類為正面或負面 [4]。我們訓練決策樹 (DT)、帶有 L2 正則化 (LR) 的邏輯回歸、最近鄰 (NN) 和帶有 RBF 內核 (SVM) 的支持向量機，所有這些都使用詞袋作為特征。我們還包括用平均 word2vec 嵌入 [19] (RF) 訓練的隨機森林（有 1000 棵樹），如果沒有像 LIME 這樣的技術，這種模型是無法解釋的。除非另有說明，否則我們使用 scikit-learn 的實現和默認參數。我們將每個數據集分為訓練（1600 個實例）和測試（400 個實例）。

為了解釋個體預測，我們將我們提出的方法 (LIME) 與 parzen [2] 進行比較，parzen [2] 是一種使用 Parzen 窗口全局近似黑盒分類器的方法，并通過采用預測概率函數的梯度來解釋個體預測。對于 parzen，我們以絕對梯度最高的 K 個特征作為解釋。我們使用交叉驗證設置 parzen 和 LIME 的超參數，并設置 N = 15, 000。我們還與貪婪程序（類似于 Martens 和 Provost [18]）進行比較，在該程序中，我們貪婪地刪除了對預測類別，直到預測發生變化（或者我們達到 K 個特征的最大值），以及一個隨機選擇 K 個特征作為解釋的隨機過程。我們將 K 設置為 10 進行實驗。

對于采用挑選程序的實驗，我們要么進行隨機選擇(random pick, RP)，要么采用§4中描述的子模塊選擇(submodular pick, SP)。我們通過添加RP或SP作為前綴來表示挑選-解釋組合。
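The greedy baseline from the setup above can be illustrated with a small sketch. It is only one possible reading of the description: the function name, the set-of-words instance representation, the 0.5 decision threshold, and the toy scoring model are all assumptions made for this example and are not taken from the paper's code.

```python
def greedy_explanation(predict_proba, features, k=10, threshold=0.5):
    """Remove, one at a time, the feature whose removal most lowers the
    probability of the originally predicted class. Stop when the predicted
    class flips or when k features have been removed."""
    remaining = set(features)
    predicted_positive = predict_proba(remaining) >= threshold

    def class_prob(feats):
        p = predict_proba(feats)
        return p if predicted_positive else 1.0 - p

    removed = []
    while remaining and len(removed) < k:
        # sorted() only makes tie-breaking deterministic.
        best = min(sorted(remaining), key=lambda f: class_prob(remaining - {f}))
        remaining.discard(best)
        removed.append(best)
        if (predict_proba(remaining) >= threshold) != predicted_positive:
            break
    return removed


# Hypothetical black box: a clipped additive score over the words present.
TOY_WEIGHTS = {"great": 0.30, "boring": -0.20, "plot": 0.05}


def toy_predict_proba(feats):
    score = 0.40 + sum(TOY_WEIGHTS.get(f, 0.0) for f in sorted(feats))
    return min(1.0, max(0.0, score))


explanation = greedy_explanation(toy_predict_proba, {"great", "plot", "the"})
print(explanation)
```

In this toy case removing the single word "great" already flips the prediction, so the explanation contains one feature. Note that the returned list carries no per-feature weights, which is why Section 5.3 treats greedy explanations differently from LIME and parzen.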

5.2 Are explanations faithful to the model?解釋是否忠實于模型？

Figure 6: Recall on truly important features for two interpretable classifiers on the books dataset.
Figure 7: Recall on truly important features for two interpretable classifiers on the DVDs dataset.

圖 6：回憶書籍數據集上兩個可解釋分類器的真正重要特征。
圖 7：回憶 DVD 數據集上兩個可解釋分類器的真正重要特征。

We measure faithfulness of explanations on classifiers that are by themselves interpretable (sparse logistic regression and decision trees). In particular, we train both classifiers such that the maximum number of features they use for any instance is 10, and thus we know the gold set of features that are considered important by these models. For each prediction on the test set, we generate explanations and compute the fraction of these gold features that are recovered by the explanations. We report this recall averaged over all the test instances in Figures 6 and 7. We observe that the greedy approach is comparable to parzen on logistic regression, but is substantially worse on decision trees since changing a single feature at a time often does not have an effect on the prediction. The overall recall by parzen is low, likely due to the difficulty in approximating the original high-dimensional classifier. LIME consistently provides > 90% recall for both classifiers on both datasets, demonstrating that LIME explanations are faithful to the models.

我們測量對本身可解釋的分類器（稀疏邏輯回歸和決策樹）的解釋的忠實度。特別是，我們訓練兩個分類器，使它們用于任何實例的最大特征數為 10，因此我們知道這些模型認為重要的特征的黃金集。對于測試集上的每個預測，我們生成解釋并計算由解釋恢復的這些黃金特征的比例。我們報告了圖 6 和圖 7 中所有測試實例的平均召回率。我們觀察到，貪心方法在邏輯回歸上與 parzen 相當，但在決策樹上要差得多，因為一次更改單個特征通常沒有對預測的影響。 parzen 的整體召回率很低，可能是由于難以逼近原始高維分類器。 LIME 在兩個數據集上始終為兩個分類器提供 > 90% 的召回率，這表明 LIME 解釋忠實于模型。
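The recall measure used above is simple enough to write out. The following is a minimal sketch with hypothetical function names and made-up feature sets; it only shows how the per-instance fraction and its average over test instances would be computed.

```python
def explanation_recall(gold_features, explained_features):
    """Fraction of the gold (truly used) features recovered by an explanation."""
    gold = set(gold_features)
    if not gold:
        raise ValueError("gold feature set must not be empty")
    return len(gold & set(explained_features)) / len(gold)


def mean_recall(pairs):
    """Average recall over (gold, explained) pairs, one pair per test instance."""
    scores = [explanation_recall(gold, explained) for gold, explained in pairs]
    return sum(scores) / len(scores)


# Made-up example: two test instances.
pairs = [
    ({"a", "b", "c", "d"}, ["a", "c", "x"]),  # 2 of 4 gold features recovered
    ({"e", "f"}, ["e", "f", "y"]),            # 2 of 2 gold features recovered
]
average = mean_recall(pairs)
print(average)
```

Features in the explanation that are not in the gold set (such as "x" and "y" here) do not lower recall; the measure only asks how much of the gold set was found.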

5.3 Should I trust this prediction?我應該相信這個預測嗎？

In order to simulate trust in individual predictions, we first randomly select 25% of the features to be “untrustworthy”, and assume that the users can identify and would not want to trust these features (such as the headers in 20 newsgroups, leaked data, etc.). We thus develop oracle “trustworthiness” by labeling test set predictions from a black box classifier as “untrustworthy” if the prediction changes when untrustworthy features are removed from the instance, and “trustworthy” otherwise. In order to simulate users, we assume that users deem predictions untrustworthy from LIME and parzen explanations if the prediction from the linear approximation changes when all untrustworthy features that appear in the explanations are removed (the simulated human “discounts” the effect of untrustworthy features). For greedy and random, the prediction is mistrusted if any untrustworthy features are present in the explanation, since these methods do not provide a notion of the contribution of each feature to the prediction. Thus for each test set prediction, we can evaluate whether the simulated user trusts it using each explanation method, and compare it to the trustworthiness oracle.

Using this setup, we report the F1 on the trustworthy predictions for each explanation method, averaged over 100 runs, in Table 1. The results indicate that LIME dominates others (all results are significant at p = 0.01) on both datasets, and for all of the black box models. The other methods either achieve a lower recall (i.e. they mistrust predictions more than they should) or lower precision (i.e. they trust too many predictions), while LIME maintains both high precision and high recall. Even though we artificially select which features are untrustworthy, these results indicate that LIME is helpful in assessing trust in individual predictions.

為了模擬對個體預測的信任，我們首先隨機選擇 25% 的特征為“不可信”，并假設用戶可以識別并且不想信任這些特征（例如 20 個新聞組中的標題、泄露的數據）， ETC）。因此，我們通過將來自黑盒分類器的測試集預測標記為“不可信”來開發預言機“可信度”，如果在從實例中刪除不可信特征時預測發生變化，否則為“可信”。為了模擬用戶，我們假設用戶認為來自 LIME 和 parzen 解釋的預測是不可信的，如果當所有出現在解釋中的不可信特征都被移除時線性近似的預測發生變化（模擬人“折扣”不可信的影響特征）。對于貪婪和隨機，如果解釋中存在任何不可信的特征，則預測是不可信的，因為這些方法沒有提供每個特征對預測的貢獻的概念。因此，對于每個測試集預測，我們可以使用每種解釋方法評估模擬用戶是否信任它，并將其與可信度預言機進行比較。

使用此設置，我們在表?1 中報告了每種解釋方法的可信預測的 F1，平均超過 100 次運行。結果表明 LIME 在兩個數據集上均優于其他方法（所有結果在 p = 0.01 時均顯著），并且對于所有黑盒模型。其他方法要么實現較低的召回率（即他們對預測的不信任程度超過了應有的程度），要么實現了較低的精度（即他們相信太多的預測），而 LIME 則保持了高精度和高召回率。盡管我們人為地選擇了哪些特征是不可信的，但這些結果表明 LIME 有助于評估個人預測的信任度。
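The simulated-user rules above can be sketched as follows. The names, the dictionary representation of a linear explanation, and the 0.5 threshold on the linear approximation's output are assumptions made for illustration; the numbers are invented.

```python
def linear_prediction(intercept, weights, threshold=0.5):
    """Class predicted by a linear explanation whose features are all present."""
    return int(intercept + sum(weights.values()) >= threshold)


def user_trusts_linear(intercept, weights, untrustworthy):
    """LIME/parzen rule: trust unless dropping the untrustworthy features
    that appear in the explanation changes the linear prediction."""
    kept = {f: w for f, w in weights.items() if f not in untrustworthy}
    return linear_prediction(intercept, weights) == linear_prediction(intercept, kept)


def user_trusts_feature_list(features, untrustworthy):
    """Greedy/random rule: mistrust if any untrustworthy feature is listed."""
    return not any(f in untrustworthy for f in features)


def f1_on_trustworthy(oracle, simulated):
    """F1 of the simulated user's 'trust' decisions against the oracle labels."""
    tp = sum(1 for o, s in zip(oracle, simulated) if o and s)
    fp = sum(1 for o, s in zip(oracle, simulated) if not o and s)
    fn = sum(1 for o, s in zip(oracle, simulated) if o and not s)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


untrustworthy = {"header_x"}
minor = user_trusts_linear(0.3, {"good": 0.4, "header_x": 0.1}, untrustworthy)
major = user_trusts_linear(0.3, {"good": 0.1, "header_x": 0.4}, untrustworthy)
listed = user_trusts_feature_list(["good", "header_x"], untrustworthy)
score = f1_on_trustworthy([True, True, False, True], [True, False, False, True])
print(minor, major, listed, score)
```

The contrast between `minor` and `listed` shows the point made in the text: with feature weights, a prediction that barely depends on an untrustworthy feature can still be trusted, whereas a bare feature list forces the simulated user to mistrust it.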

Table 1: Average F1 of trustworthiness for different explainers on a collection of classifiers and datasets.

Figure 8: Choosing between two classifiers, as the number of instances shown to a simulated user is varied. Averages and standard errors from 800 runs.

表 1：不同解釋器在分類器和數據集集合上的平均可信度 F1。

圖 8：在兩個分類器之間進行選擇，因為向模擬用戶顯示的實例數量是不同的。 800 次測試的平均值和標準誤差。

5.4 Can I trust this model?我可以相信這個模型嗎？

In the final simulated user experiment, we evaluate whether the explanations can be used for model selection, simulating the case where a human has to decide between two competing models with similar accuracy on validation data. For this purpose, we add 10 artificially “noisy” features. Specifically, on training and validation sets (80/20 split of the original training data), each artificial feature appears in 10% of the examples in one class, and 20% of the other, while on the test instances, each artificial feature appears in 10% of the examples in each class. This recreates the situation where the models use not only features that are informative in the real world, but also ones that introduce spurious correlations. We create pairs of competing classifiers by repeatedly training pairs of random forests with 30 trees until their validation accuracy is within 0.1% of each other, but their test accuracy differs by at least 5%. Thus, it is not possible to identify the better classifier (the one with higher test accuracy) from the accuracy on the validation data.

The goal of this experiment is to evaluate whether a user can identify the better classifier based on the explanations of B instances from the validation set. The simulated human marks the set of artificial features that appear in the B explanations as untrustworthy, following which we evaluate how many total predictions in the validation set should be trusted (as in the previous section, treating only marked features as untrustworthy). Then, we select the classifier with fewer untrustworthy predictions, and compare this choice to the classifier with higher held-out test set accuracy.

在最終的模擬用戶實驗中，我們評估解釋是否可用于模型選擇，模擬人類必須在驗證數據上具有相似準確性的兩個競爭模型之間做出決定的情況。為此，我們添加了 10 個人為的“嘈雜”特征。具體來說，在訓練集和驗證集（原始訓練數據的 80/20 分割）上，每個人為特征出現在一個類中 10% 的示例中，在另一個類中出現 20%，而在測試實例上，每個人為特征出現在每個班級的 10% 的例子中。這重現了模型不僅使用在現實世界中提供信息的特征，而且還使用引入虛假相關性的特征的情況。我們通過重復訓練具有 30 棵樹的隨機森林對來創建競爭分類器對，直到它們的驗證準確度在彼此的 0.1% 以內，但它們的測試準確度至少相差 5%。因此，不可能從驗證數據的準確度中識別出更好的分類器（具有更高測試準確度的分類器）。

本實驗的目的是評估用戶是否可以根據驗證集中對 B 個實例的解釋來識別更好的分類器。模擬人將出現在 B 解釋中的人工特征集標記為不可信，然后我們評估驗證集中有多少總預測應該被信任（如上一節中，僅將標記的特征視為不可信）。然后，我們選擇具有較少不可信預測的分類器，并將該選擇與具有較高保留測試集準確度的分類器進行比較。
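One way to picture the artificial features is the sketch below. It assumes independent Bernoulli sampling per example and a fixed assignment of the 10% and 20% rates to classes 0 and 1; the paper does not spell out either detail, so both are assumptions, as are the function names and the tie-breaking rule in the selection step.

```python
import random


def add_noisy_features(labels, n_features=10, rates=(0.10, 0.20), seed=0):
    """Return one row of binary artificial features per example.
    rates[y] is the probability that a feature is present for class y."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < rates[y] else 0 for _ in range(n_features)]
        for y in labels
    ]


def choose_classifier(untrusted_a, untrusted_b):
    """Pick the classifier with fewer untrustworthy predictions (ties go to A)."""
    return "A" if untrusted_a <= untrusted_b else "B"


def presence_rate(rows):
    return sum(map(sum, rows)) / (len(rows) * len(rows[0]))


labels = [0] * 2000 + [1] * 2000

# Training/validation: the features correlate with the class (spurious signal).
train_rows = add_noisy_features(labels, rates=(0.10, 0.20), seed=0)
# Test: same rate in both classes, so the correlation disappears.
test_rows = add_noisy_features(labels, rates=(0.10, 0.10), seed=1)

print(presence_rate(train_rows[:2000]), presence_rate(train_rows[2000:]))
print(presence_rate(test_rows[:2000]), presence_rate(test_rows[2000:]))
print(choose_classifier(untrusted_a=37, untrusted_b=52))
```

A classifier that leans on these features looks fine on validation data but loses accuracy on the test split, which is the gap the simulated user is asked to detect from explanations alone.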

We present the accuracy of picking the correct classifier as B varies, averaged over 800 runs, in Figure 8. We omit SP-parzen and RP-parzen from the figure since they did not produce useful explanations, performing only slightly better than random. LIME is consistently better than greedy, irrespective of the pick method. Further, combining submodular pick with LIME outperforms all other methods, in particular it is much better than RP-LIME when only a few examples are shown to the users. These results demonstrate that the trust assessments provided by SP-selected LIME explanations are good indicators of generalization, which we validate with human experiments in the next section.

我們在圖 8 中展示了隨著 B 變化而選擇正確分類器的準確度，平均超過 800 次運行。我們從圖中省略了 SP-parzen 和 RP-parzen，因為它們沒有產生有用的解釋，只比隨機的表現好一點。無論選擇哪種方法，LIME都比greedy要好。此外，將 submodular pick 與 LIME 相結合優于所有其他方法，特別是當僅向用戶顯示幾個示例時，它比 RP-LIME 好得多。這些結果表明，SP 選擇的 LIME 解釋提供的信任評估是很好的泛化指標，我們在下一節中通過人體實驗驗證了這一點。

6.EVALUATION WITH HUMAN SUBJECTS用人類受試者評估

In this section, we recreate three scenarios in machine learning that require trust and understanding of predictions and models. In particular, we evaluate LIME and SP-LIME in the following settings: (1) Can users choose which of two classifiers generalizes better (§6.2), (2) based on the explanations, can users perform feature engineering to improve the model (§6.3), and (3) are users able to identify and describe classifier irregularities by looking at explanations (§6.4).

在本節中，我們在機器學習中重新創建了三個需要信任和理解預測和模型的場景。特別是，我們在以下設置中評估 LIME 和 SP-LIME：（1）用戶可以選擇兩個分類器中哪一個更好地泛化（第 6.2 節），（2）基于解釋，用戶可以執行特征工程來改進模型（ § 6.3) 和 (3) 是用戶能夠通過查看解釋來識別和描述分類器違規行為（§ 6.4）。

6.1 Experiment setup實驗設置

For experiments in §6.2 and §6.3, we use the “Christianity” and “Atheism” documents from the 20 newsgroups dataset mentioned beforehand. This dataset is problematic since it contains features that do not generalize (e.g. very informative header information and author names), and thus validation accuracy considerably overestimates real-world performance.

In order to estimate the real world performance, we create a new religion dataset for evaluation. We download Atheism and Christianity websites from the DMOZ directory and human curated lists, yielding 819 webpages in each class. High accuracy on this dataset by a classifier trained on 20 newsgroups indicates that the classifier is generalizing using semantic content, instead of placing importance on the data specific issues outlined above. Unless noted otherwise, we use SVM with RBF kernel, trained on the 20 newsgroups data with hyper-parameters tuned via cross-validation.

對于 §6.2 和 §6.3 中的實驗，我們使用前面提到的 20 個新聞組數據集中的“Christianity”和“Atheism”文檔。這個數據集是有問題的，因為它包含不能泛化的特征（例如非常豐富的標題信息和作者姓名），因此驗證準確性大大高估了現實世界的表現。

為了估計真實世界的表現，我們創建了一個新的宗教數據集進行評估。我們從 DMOZ 目錄和人工管理列表下載無神論和基督教網站，每個班級產生 819 個網頁。在 20 個新聞組上訓練的分類器對該數據集的高精度表明該分類器正在使用語義內容進行泛化，而不是重視上述數據特定問題。除非另有說明，否則我們使用帶有 RBF 內核的 SVM，對 20 個新聞組數據進行訓練，并通過交叉驗證調整超參數。

6.2 Can users select the best classifier?用戶可以選擇最好的分類器嗎？

In this section, we want to evaluate whether explanations can help users decide which classifier generalizes better, i.e., which classifier would the user deploy “in the wild”. Specifically, users have to decide between two classifiers: SVM trained on the original 20 newsgroups dataset, and a version of the same classifier trained on a “cleaned” dataset where many of the features that do not generalize have been manually removed. The original classifier achieves an accuracy score of 57.3% on the religion dataset, while the “cleaned” classifier achieves a score of 69.0%. In contrast, the test accuracy on the original 20 newsgroups split is 94.0% and 88.6%, respectively, suggesting that the worse classifier would be selected if accuracy alone is used as a measure of trust.

?在本節中，我們要評估解釋是否可以幫助用戶決定哪個分類器可以更好地概括，即用戶將“自然場景下”部署哪個分類器。具體來說，用戶必須在兩個分類器之間做出選擇：在原始 20 個新聞組數據集上訓練的 SVM，以及在“清理”數據集上訓練的同一分類器的版本，其中許多不能泛化的特征已被手動刪除.原始分類器在宗教數據集上的準確度得分為 57.3%，而“清潔”分類器的得分為 69.0%。相比之下，原始 20 個新聞組拆分的測試準確度分別為 94.0% 和 88.6%——這表明如果僅使用準確度作為信任度衡量標準，則會選擇較差的分類器。

We recruit human subjects on Amazon Mechanical Turk: by no means machine learning experts, but instead people with basic knowledge about religion. We measure their ability to choose the better algorithm by seeing side-by-side explanations with the associated raw data (as shown in Figure 2). We restrict both the number of words in each explanation (K) and the number of documents that each person inspects (B) to 6. The position of each algorithm and the order of the instances seen are randomized between subjects. After examining the explanations, users are asked to select which algorithm will perform best in the real world. The explanations are produced by either greedy (chosen as a baseline due to its performance in the simulated user experiment) or LIME, and the instances are selected either by random (RP) or submodular pick (SP). We modify the greedy step in Algorithm 2 slightly so it alternates between explanations of the two classifiers. For each setting, we repeat the experiment with 100 users.

The results are presented in Figure 9. Note that all of the methods are good at identifying the better classifier, demonstrating that the explanations are useful in determining which classifier to trust, while using test set accuracy would result in the selection of the wrong classifier. Further, we see that the submodular pick (SP) greatly improves the user’s ability to select the best classifier when compared to random pick (RP), with LIME outperforming greedy in both cases.

我們在 Amazon Mechanical Turk 上招募人類受試者——絕不是機器學習專家，而是具有宗教基本知識的人。我們通過查看相關原始數據的并排解釋來衡量他們選擇更好算法的能力（如圖 2 所示）。我們將每個解釋中的單詞數 (K) 和每個人檢查的文檔數 (B) 都限制為 6。每個算法的位置和看到的實例的順序在受試者之間是隨機的。在檢查了解釋之后，用戶被要求選擇哪種算法在現實世界中表現最好。解釋由貪心（由于其在模擬用戶實驗中的表現而被選為基線）或 LIME 產生，并且通過隨機（RP）或子模塊選擇（SP）選擇實例。我們稍微修改算法 2 中的貪心步驟，使其在兩個分類器的解釋之間交替。對于每個設置，我們對 100 個用戶重復實驗。

結果如圖 9 所示。請注意，所有方法都擅長識別更好的分類器，這表明這些解釋對于確定要信任的分類器很有用，而使用測試集的準確性會導致選擇錯誤的分類器。此外，我們看到與隨機選擇 (RP) 相比，子模塊選擇 (SP) 極大地提高了用戶選擇最佳分類器的能力，而 LIME 在這兩種情況下都優于貪心。

Figure 9: Average accuracy of human subject (with standard errors) in choosing between two classifiers.

Figure 10: Feature engineering experiment. Each shaded line represents the average accuracy of subjects in a path starting from one of the initial 10 subjects. Each solid line represents the average across all paths per round of interaction.

圖 9：人類受試者在兩個分類器之間進行選擇時的平均準確度（帶有標準誤差）。

圖 10：特征工程實驗。每條陰影線代表從最初 10 個主題之一開始的路徑中主題的平均準確度。每條實線代表每輪交互中所有路徑的平均值。

6.3 Can non-experts improve a classifier?非專家可以改進分類器嗎？

If one notes that a classifier is untrustworthy, a common task in machine learning is feature engineering, i.e. modifying the set of features and retraining in order to improve generalization. Explanations can aid in this process by presenting the important features, particularly for removing features that the users feel do not generalize.

We use the 20 newsgroups data here as well, and ask Amazon Mechanical Turk users to identify which words from the explanations should be removed from subsequent training, for the worse classifier from the previous section (§6.2). In each round, the subject marks words for deletion after observing B = 10 instances with K = 10 words in each explanation (an interface similar to Figure 2, but with a single algorithm). As a reminder, the users here are not experts in machine learning and are unfamiliar with feature engineering, thus are only identifying words based on their semantic content. Further, users do not have any access to the religion dataset; they do not even know of its existence. We start the experiment with 10 subjects. After they mark words for deletion, we train 10 different classifiers, one for each subject (with the corresponding words removed). The explanations for each classifier are then presented to a set of 5 users in a new round of interaction, which results in 50 new classifiers. We do a final round, after which we have 250 classifiers, each with a path of interaction tracing back to the first 10 subjects.

如果有人注意到分類器不可信，那么機器學習中的一項常見任務是特征工程，即修改特征集并重新訓練以提高泛化能力。解釋可以通過呈現重要特征來幫助這個過程，特別是對于刪除用戶認為不能概括的特征。

我們在這里也使用了 20 個新聞組數據，并要求 Amazon Mechanical Turk 用戶識別解釋中的哪些單詞應該從后續訓練中刪除，以獲取上一節中更差的分類器（第 6.2 節）。在每一輪中，受試者在觀察 B = 10 個實例（每個解釋中的 K = 10 個單詞）后將單詞標記為刪除（界面類似于圖 2，但使用單一算法）。提醒一下，這里的用戶不是機器學習專家，也不熟悉特征工程，因此只是根據語義內容來識別單詞。此外，用戶無權訪問宗教數據集——他們甚至不知道它的存在。我們從 10 個受試者開始實驗。在他們標記要刪除的單詞后，我們訓練了 10 個不同的分類器，每個主題一個（刪除了相應的單詞）。然后在新一輪交互中將每個分類器的解釋呈現給一組 5 個用戶，從而產生 50 個新分類器。我們進行最后一輪，之后我們有 250 個分類器，每個分類器都有一條交互路徑可以追溯到前 10 個主題。

The explanations and instances shown to each user are produced by SP-LIME or RP-LIME. We show the average accuracy on the religion dataset at each interaction round for the paths originating from each of the original 10 subjects (shaded lines), and the average across all paths (solid lines) in Figure 10. It is clear from the figure that the crowd workers are able to improve the model by removing features they deem unimportant for the task. Further, SP-LIME outperforms RP-LIME, indicating selection of the instances to show the users is crucial for efficient feature engineering.

Each subject took an average of 3.6 minutes per round of cleaning, resulting in just under 11 minutes to produce a classifier that generalizes much better to real world data. Each path had on average 200 words removed with SP, and 157 with RP, indicating that incorporating coverage of important features is useful for feature engineering. Further, out of an average of 200 words selected with SP, 174 were selected by at least half of the users, while 68 by all the users. Along with the fact that the variance in the accuracy decreases across rounds, this high agreement demonstrates that the users are converging to similar correct models. This evaluation is an example of how explanations make it easy to improve an untrustworthy classifier; in this case easy enough that machine learning knowledge is not required.

向每個用戶顯示的說明和實例由 SP-LIME 或 RP-LIME 生成。我們展示了宗教數據集在每個交互輪次中源自原始 10 個主題（陰影線）的路徑的平均準確度，以及圖 10 中所有路徑的平均值（實線）。從圖中可以清楚地看出，群眾工作者能夠通過刪除他們認為對任務不重要的特征來改進模型。此外，SP-LIME 優于 RP-LIME，這表明選擇實例以向用戶展示對于高效的特征工程至關重要。

每個受試者每輪清潔平均需要 3.6 分鐘，因此只需不到 11 分鐘即可生成一個分類器，該分類器可以更好地泛化到現實世界的數據。每條路徑用 SP 平均刪除了 200 個單詞，用 RP 刪除了 157 個單詞，這表明合并重要特征的覆蓋對于特征工程是有用的。此外，在使用 SP 選擇的平均 200 個單詞中，至少有一半的用戶選擇了 174 個，而所有用戶選擇了 68 個。除了準確度的差異在各輪之間減小的事實外，這種高度一致性表明用戶正在收斂到相似的正確模型。該評估是解釋如何使改進不可信分類器變得容易的一個例子——在這種情況下很容易，不需要機器學習知識。
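The counts quoted in this section follow from simple arithmetic, written out here as a small check. The variable names are illustrative; the figures (10 initial subjects, 5 users per classifier, 3 rounds, 3.6 minutes per round) are the ones stated in the text.

```python
initial_subjects = 10        # subjects in the first round
users_per_classifier = 5     # users shown each classifier in later rounds
rounds = 3
minutes_per_round = 3.6      # average cleaning time per subject per round

# Number of classifiers produced in each round: 10, then 10*5, then 10*5*5.
classifiers_per_round = [
    initial_subjects * users_per_classifier ** r for r in range(rounds)
]

# Total human time along one path of interaction (one subject per round).
minutes_per_path = rounds * minutes_per_round

print(classifiers_per_round)
print(round(minutes_per_path, 1))
```

Three rounds along a single path add up to 10.8 minutes, which is the "just under 11 minutes" mentioned above.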

6.4 Do explanations lead to insights?解釋會帶來洞察力嗎？

Often artifacts of data collection can induce undesirable correlations that the classifiers pick up during training. These issues can be very difficult to identify just by looking at the raw data and predictions. In an effort to reproduce such a setting, we take the task of distinguishing between photos of Wolves and Eskimo Dogs (huskies). We train a logistic regression classifier on a training set of 20 images, hand selected such that all pictures of wolves had snow in the background, while pictures of huskies did not. As the features for the images, we use the first max-pooling layer of Google’s pre-trained Inception neural network [25]. On a collection of additional 60 images, the classifier predicts “Wolf” if there is snow (or light background at the bottom), and “Husky” otherwise, regardless of animal color, position, pose, etc. We trained this bad classifier intentionally, to evaluate whether subjects are able to detect it.

The experiment proceeds as follows: we first present a balanced set of 10 test predictions (without explanations), where one wolf is not in a snowy background (and thus the prediction is “Husky”) and one husky is (and is thus predicted as “Wolf”). We show the “Husky” mistake in Figure 11a. The other 8 examples are classified correctly. We then ask the subject three questions: (1) Do they trust this algorithm to work well in the real world, (2) why, and (3) how do they think the algorithm is able to distinguish between these photos of wolves and huskies. After getting these responses, we show the same images with the associated explanations, such as in Figure 11b, and ask the same questions.

通常，數據收集的假象會誘導分類器在訓練過程中獲得不必要的相關性。僅通過查看原始數據和預測可能很難識別這些問題。為了重現這樣的場景，我們進行了區分狼和愛斯基摩狗(哈士奇)照片的任務。我們在包含 20 張圖像的訓練集上訓練邏輯回歸分類器，這些圖像是手工選擇的，使得所有狼的圖片背景中都有雪，而哈士奇的圖片則沒有。作為圖像的特征，我們使用谷歌預訓練的 Inception 神經網絡 [25] 的第一個最大池化層。在另外 60 張圖像的集合中，如果有雪（或底部有淺色背景），分類器會預測“狼”，否則會預測“哈士奇”，而不管動物的顏色、位置、姿勢等。我們有意訓練了這個糟糕的分類器，以評估受試者是否能夠檢測到它。

實驗進行如下：我們首先提出了一組平衡的 10 個測試預測（沒有解釋），其中一只狼不在雪背景中(因此預測為“哈士奇”)，另一只哈士奇在雪背景中(因此預測為“狼”)。我們在圖 11a 中展示了“Husky”錯誤。其他 8 個示例分類正確。然后我們問受試者三個問題：（1）他們是否相信這個算法在現實世界中運行良好，（2）為什么，以及（3）他們如何認為這個算法能夠區分這些狼和哈士奇的照片。在得到這些回復后，我們會展示帶有相關解釋的相同圖像，如圖 11b 所示，并提出相同的問題。

Since this task requires some familiarity with the notion of spurious correlations and generalization, the set of subjects for this experiment were graduate students who have taken at least one graduate machine learning course. After gathering the responses, we had 3 independent evaluators read their reasoning and determine if each subject mentioned snow, background, or equivalent as a feature the model may be using. We pick the majority to decide whether the subject was correct about the insight, and report these numbers before and after showing the explanations in Table 2.

Before observing the explanations, more than a third trusted the classifier, and a little less than half mentioned the snow pattern as something the neural network was using, although all speculated on other patterns. After examining the explanations, however, almost all of the subjects identified the correct insight, with much more certainty that it was a determining factor. Further, the trust in the classifier also dropped substantially. Although our sample size is small, this experiment demonstrates the utility of explaining individual predictions for getting insights into classifiers, knowing when not to trust them and why.

由于這項任務需要對偽關聯和泛化的概念有一定的熟悉，所以這個實驗的實驗對象是至少上過一門機器學習課程的研究生。在收集了回答后，我們讓3名獨立的評估人員閱讀他們的推理，并確定每個受試者是否提到了雪、背景或等效物作為模型可能使用的特征。我們選擇大多數來決定主題對洞察力的判斷是否正確，并在表?2中顯示解釋之前和之后報告這些數字。

在觀察解釋之前，超過三分之一的人相信分類器，不到一半的人提到雪花圖案是神經網絡正在使用的東西，盡管所有人都推測其他圖案。然而，在檢查了這些解釋之后，幾乎所有的受試者都確定了正確的洞察力，并且更加確定這是一個決定性因素。此外，對分類器的信任度也大幅下降。盡管我們的樣本量很小，但這個實驗證明了解釋個體預測對深入了解分類器的效用，知道什么時候不相信它們以及為什么不相信它們。

Figure 11: Raw data and explanation of a bad model's prediction in the “Husky vs Wolf” task.

Table 2: “Husky vs Wolf” experiment results.

圖 11：“Husky vs Wolf”任務中不良模型預測的原始數據和解釋。

表 2：“Husky vs Wolf”實驗結果。

7.RELATED WORK

The problems with relying on validation set accuracy as the primary measure of trust have been well studied. Practitioners consistently overestimate their model’s accuracy [20], propagate feedback loops [23], or fail to notice data leaks [14]. In order to address these issues, researchers have proposed tools like Gestalt [21] and Modeltracker [1], which help users navigate individual instances. These tools are complementary to LIME in terms of explaining models, since they do not address the problem of explaining individual predictions. Further, our submodular pick procedure can be incorporated in such tools to aid users in navigating larger datasets.

Some recent work aims to anticipate failures in machine learning, specifically for vision tasks [3, 29]. Letting users know when the systems are likely to fail can lead to an increase in trust, by avoiding “silly mistakes” [8]. These solutions either require additional annotations and feature engineering that is specific to vision tasks or do not provide insight into why a decision should not be trusted. Furthermore, they assume that the current evaluation metrics are reliable, which may not be the case if problems such as data leakage are present. Other recent work [11] focuses on exposing users to different kinds of mistakes (our pick step). Interestingly, the subjects in their study did not notice the serious problems in the 20 newsgroups data even after looking at many mistakes, suggesting that examining raw data is not sufficient. Note that Groce et al. [11] are not alone in this regard; many researchers in the field have unwittingly published classifiers that would not generalize for this task. Using LIME, we show that even non-experts are able to identify these irregularities when explanations are present. Further, LIME can complement these existing systems, and allow users to assess trust even when a prediction seems “correct” but is made for the wrong reasons.

依賴驗證集準確性作為信任的主要衡量標準的問題已經得到了很好的研究。實踐者總是高估他們模型的準確性[20]，傳播反饋循環[23]，或者沒有注意到數據泄漏[14]。為了解決這些問題，研究人員提出了 Gestalt [21] 和 Modeltracker [1] 等工具，它們可以幫助用戶導航各個實例。這些工具在解釋模型方面是 LIME 的補充，因為它們沒有解決解釋單個預測的問題。此外，我們的子模塊挑選過程可以合并到這些工具中，以幫助用戶導航更大的數據集。

最近的一些工作旨在預測機器學習中的失敗，特別是視覺任務 [3, 29]。通過避免“愚蠢的錯誤”[8]，讓用戶知道系統何時可能會失敗可以增加信任。這些解決方案要么需要特定于視覺任務的額外注釋和特征工程，要么不提供關于為什么不應該信任決策的洞察力。此外，他們假設當前的評估指標是可靠的，如果存在數據泄漏等問題，情況可能并非如此。最近的其他工作 [11] 側重于讓用戶暴露在不同類型的錯誤（我們的選擇步驟）。有趣的是，他們研究的受試者即使在查看了許多錯誤之后也沒有注意到 20 個新聞組數據中的嚴重問題，這表明檢查原始數據是不夠的。請注意，Groce 等人 [11] 在這方面并不是唯一的，該領域的許多研究人員都無意中發表了不會對這一任務進行概括的分類器。使用 LIME，我們表明即使是非專家也能夠在存在解釋時識別這些不規則現象。此外，LIME 可以補充這些現有系統，并允許用戶評估信任，即使一個預測看起來是“正確的”，但卻是基于錯誤的原因做出的。

Recognizing the utility of explanations in assessing trust, many have proposed using interpretable models [27], especially for the medical domain [6, 17, 26]. While such models may be appropriate for some domains, they may not apply equally well to others (e.g. a supersparse linear model [26] with 5-10 features is unsuitable for text applications). Interpretability, in these cases, comes at the cost of flexibility, accuracy, or efficiency. For text, EluciDebug [16] is a full human-in-the-loop system that shares many of our goals (interpretability, faithfulness, etc.). However, they focus on an already interpretable model (Naive Bayes). In computer vision, systems that rely on object detection to produce candidate alignments [13] or attention [28] are able to produce explanations for their predictions. These are, however, constrained to specific neural network architectures or incapable of detecting “non object” parts of the images. Here we focus on general, model-agnostic explanations that can be applied to any classifier or regressor that is appropriate for the domain, even ones that are yet to be proposed.

認識到解釋在評估信任中的效用，許多人提出使用可解釋模型?[27]，特別是在醫學領域?[6,17,26]。雖然此類模型可能適用于某些領域，但它們可能不同樣適用于其他領域（例如，具有 5-10 個特征的超稀疏線性模型 [26] 不適合文本應用程序）。在這些情況下，可解釋性是以靈活性、準確性或效率為代價的。對于文本，EluciDebug [16] 是一個完整的人在環系統，它共享我們的許多目標（可解釋性、忠實度等）。然而，他們專注于一個已經可以解釋的模型（樸素貝葉斯）。在計算機視覺中，依賴對象檢測來產生候選對齊 [13] 或注意力 [28] 的系統能夠對其預測產生解釋。然而，這些受限于特定的神經網絡架構或無法檢測圖像的“非目標”部分。在這里，我們專注于通用的、與模型無關的解釋，這些解釋可以應用于適用于該領域的任何分類器或回歸器——即使是尚未提出的分類器或回歸器。

A common approach to model-agnostic explanation is learning a potentially interpretable model on the predictions of the original model [2, 7, 22]. Having the explanation be a gradient vector [2] captures a similar locality intuition to that of LIME. However, interpreting the coefficients on the gradient is difficult, particularly for confident predictions (where gradient is near zero). Further, these explanations approximate the original model globally, thus maintaining local fidelity becomes a significant challenge, as our experiments demonstrate. In contrast, LIME solves the much more feasible task of finding a model that approximates the original model locally. The idea of perturbing inputs for explanations has been explored before [24], where the authors focus on learning a specific contribution model, as opposed to our general framework. None of these approaches explicitly take cognitive limitations into account, and thus may produce non-interpretable explanations, such as gradients or linear models with thousands of non-zero weights. The problem becomes worse if the original features are nonsensical to humans (e.g. word embeddings). In contrast, LIME incorporates interpretability both in the optimization and in our notion of interpretable representation, such that domain and task specific interpretability criteria can be accommodated.

8. CONCLUSION AND FUTURE WORK

In this paper, we argued that trust is crucial for effective human interaction with machine learning systems, and that explaining individual predictions is important in assessing trust. We proposed LIME, a modular and extensible approach to faithfully explain the predictions of any model in an interpretable manner. We also introduced SP-LIME, a method to select representative and non-redundant predictions, providing a global view of the model to users. Our experiments demonstrated that explanations are useful for a variety of models in trust-related tasks in the text and image domains, with both expert and non-expert users: deciding between models, assessing trust, improving untrustworthy models, and getting insights into predictions.
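As a rough illustration of how SP-LIME can pick representative, non-redundant predictions, the following sketch greedily maximizes feature coverage over a matrix of explanation weights. It assumes the explanations have already been computed and stacked into a matrix `W_expl` (one row per instance, one column per interpretable feature); the function name `submodular_pick` and the toy matrix are hypothetical, and the importance score (square root of the summed absolute weights) follows the text-domain definition given in Section 4.

```python
import numpy as np

def submodular_pick(W_expl, budget):
    """Greedily choose up to `budget` instances whose explanations cover
    the most globally important features (assumed precomputed weights)."""
    W_abs = np.abs(np.asarray(W_expl, dtype=float))
    importance = np.sqrt(W_abs.sum(axis=0))      # global importance per feature
    covered = np.zeros(W_abs.shape[1], dtype=bool)
    picked = []
    for _ in range(min(budget, W_abs.shape[0])):
        best_gain, best_i = -1.0, None
        for i in range(W_abs.shape[0]):
            if i in picked:
                continue
            # Marginal gain: importance of features this row newly covers.
            gain = importance[(W_abs[i] > 0) & ~covered].sum()
            if gain > best_gain:
                best_gain, best_i = gain, i
        picked.append(best_i)
        covered |= W_abs[best_i] > 0
    return picked

# Toy explanation matrix: rows 0 and 1 explain the same two features.
W_toy = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
print(submodular_pick(W_toy, budget=2))
```

With a budget of two, the greedy step skips the second row because it adds no new features once the first row is chosen, which is the non-redundancy property the paragraph above refers to.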

There are a number of avenues of future work that we would like to explore. Although we describe only sparse linear models as explanations, our framework supports the exploration of a variety of explanation families, such as decision trees; it would be interesting to see a comparative study on these with real users. One issue that we do not mention in this work is how to perform the pick step for images, and we would like to address this limitation in the future. The domain and model agnosticism enables us to explore a variety of applications, and we would like to investigate potential uses in speech, video, and medical domains, as well as recommendation systems. Finally, we would like to explore theoretical properties (such as the appropriate number of samples) and computational optimizations (such as using parallelization and GPU processing), in order to provide the accurate, real-time explanations that are critical for any human-in-the-loop machine learning system.

Acknowledgements

We would like to thank Scott Lundberg, Tianqi Chen, and Tyler Johnson for helpful discussions and feedback. This work was supported in part by ONR awards #W911NF-13-1-0246 and #N00014-13-1-0023, and in part by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.

REFERENCES

[1] S. Amershi, M. Chickering, S. M. Drucker, B. Lee, P. Simard, and J. Suh. Modeltracker: Redesigning performance analysis tools for machine learning. In Human Factors in Computing Systems (CHI), 2015.

[2] D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller. How to explain individual classification decisions. Journal of Machine Learning Research, 11, 2010.

[3] A. Bansal, A. Farhadi, and D. Parikh. Towards transparent systems: Semantic characterization of failure modes. In European Conference on Computer Vision (ECCV), 2014.

[4] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Association for Computational Linguistics (ACL), 2007.

[5] J. Q. Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence. Dataset Shift in Machine Learning. MIT, 2009.

[6] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Knowledge Discovery and Data Mining (KDD), 2015.

[7] M. W. Craven and J. W. Shavlik. Extracting tree-structured representations of trained networks. Neural Information Processing Systems (NIPS), pages 24–30, 1996.

[8] M. T. Dzindolet, S. A. Peterson, R. A. Pomranky, L. G. Pierce, and H. P. Beck. The role of trust in automation reliance. Int. J. Hum.-Comput. Stud., 58(6), 2003.

[9] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2004.

[10] U. Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4), July 1998.

[11] A. Groce, T. Kulesza, C. Zhang, S. Shamasunder, M. Burnett, W.-K. Wong, S. Stumpf, S. Das, A. Shinsel, F. Bice, and K. McIntosh. You are the only possible oracle: Effective test selection for end users of interactive machine learning systems. IEEE Trans. Softw. Eng., 40(3), 2014.

[12] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Conference on Computer Supported Cooperative Work (CSCW), 2000.

[13] A. Karpathy and F. Li. Deep visual-semantic alignments for generating image descriptions. In Computer Vision and Pattern Recognition (CVPR), 2015.

[14] S. Kaufman, S. Rosset, and C. Perlich. Leakage in data mining: Formulation, detection, and avoidance. In Knowledge Discovery and Data Mining (KDD), 2011.

[15] A. Krause and D. Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press, February 2014.

[16] T. Kulesza, M. Burnett, W.-K. Wong, and S. Stumpf. Principles of explanatory debugging to personalize interactive machine learning. In Intelligent User Interfaces (IUI), 2015.

[17] B. Letham, C. Rudin, T. H. McCormick, and D. Madigan. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Annals of Applied Statistics, 2015.

[18] D. Martens and F. Provost. Explaining data-driven document classifications. MIS Q., 38(1), 2014.

[19] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Neural Information Processing Systems (NIPS), 2013.

[20] K. Patel, J. Fogarty, J. A. Landay, and B. Harrison. Investigating statistical machine learning as a tool for software development. In Human Factors in Computing Systems (CHI), 2008.

[21] K. Patel, N. Bancroft, S. M. Drucker, J. Fogarty, A. J. Ko, and J. Landay. Gestalt: Integrated support for implementation and analysis in machine learning. In User Interface Software and Technology (UIST), 2010.

[22] I. Sanchez, T. Rocktäschel, S. Riedel, and S. Singh. Towards extracting faithful and descriptive representations of latent variable models. In AAAI Spring Symposium on Knowledge Representation and Reasoning (KRR): Integrating Symbolic and Neural Approaches, 2015.

[23] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, and J.-F. Crespo. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS), 2015.

[24] E. Štrumbelj and I. Kononenko. An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research, 11, 2010.

[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR), 2015.

[26] B. Ustun and C. Rudin. Supersparse linear integer models for optimized medical scoring systems. Machine Learning, 2015.

[27] F. Wang and C. Rudin. Falling rule lists. In Artificial Intelligence and Statistics (AISTATS), 2015.

[28] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning (ICML), 2015.

[29] P. Zhang, J. Wang, A. Farhadi, M. Hebert, and D. Parikh. Predicting failures of vision systems. In Computer Vision and Pattern Recognition (CVPR), 2014.
