transformers-interpret


Name: transformers-interpret
Version: 0.10.0
Summary: Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
Upload time: 2023-04-06 23:05:29
Author: Charles Pierse
Requires Python: >=3.7,<4.0

<p align="center">
    <a id="transformers-intepret" href="#transformers-intepret">
        <img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/tight%401920x_transparent.png" alt="Transformers Intepret Title" title="Transformers Intepret Title" width="600"/>
    </a>
</p>

<p align="center"> Explainability for any 🤗 Transformers models in 2 lines.</p>

<h1 align="center"></h1>

<p align="center">
    <a href="https://opensource.org/licenses/Apache-2.0">
        <img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"/>
    <a href="https://github.com/cdpierse/transformers-interpret/actions/workflows/unit_tests.yml">
        <img src="https://github.com/cdpierse/transformers-interpret/actions/workflows/unit_tests.yml/badge.svg">
    </a>
            <a href="https://github.com/cdpierse/transformers-interpret/releases">
        <img src="https://img.shields.io/pypi/v/transformers_interpret?label=version"/>
    </a>
        <a href="https://pepy.tech/project/transformers-interpret">
        <img src="https://static.pepy.tech/personalized-badge/transformers-interpret?period=total&units=abbreviation&left_color=black&right_color=brightgreen&left_text=Downloads">
    </a>
</p>

Transformers Interpret is a model explainability tool designed to work exclusively with the 🤗 [transformers][transformers] package.

In line with the philosophy of the Transformers package, Transformers Interpret allows any transformers model to be explained in just two lines. Explainers are available for both text and computer vision models. Visualizations are also available in notebooks and as savable PNG and HTML files.

Check out the Streamlit [demo app here](https://share.streamlit.io/cdpierse/transformers-interpret-streamlit/main/app.py).

## Install

```posh
pip install transformers-interpret
```

## Quick Start

### Text Explainers

<details><summary>Sequence Classification Explainer and Pairwise Sequence Classification</summary>

<p>
Let's start by initializing a transformers model and tokenizer, and running them through the `SequenceClassificationExplainer`.

For this example we are using `distilbert-base-uncased-finetuned-sst-2-english`, a distilbert model finetuned on a sentiment analysis task.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# With both the model and tokenizer initialized we are now able to get explanations on an example text.

from transformers_interpret import SequenceClassificationExplainer
cls_explainer = SequenceClassificationExplainer(
    model,
    tokenizer)
word_attributions = cls_explainer("I love you, I like you")
```

Which will return the following list of tuples:

```python
>>> word_attributions
[('[CLS]', 0.0),
 ('i', 0.2778544699186709),
 ('love', 0.7792370723380415),
 ('you', 0.38560088858031094),
 (',', -0.01769750505546915),
 ('i', 0.12071898121557832),
 ('like', 0.19091105304734457),
 ('you', 0.33994871536713467),
 ('[SEP]', 0.0)]
```

Positive attribution numbers indicate a word contributes positively towards the predicted class, while negative numbers indicate a word contributes negatively towards the predicted class. Here we can see that **I love you** receives the most positive attribution.

You can use `predicted_class_index` if you want to know what the predicted class actually is:

```python
>>> cls_explainer.predicted_class_index
array(1)
```

And if the model has label names for each class, we can see these too using `predicted_class_name`:

```python
>>> cls_explainer.predicted_class_name
'POSITIVE'
```

#### Visualize Classification attributions

Sometimes the numeric attributions can be difficult to read, particularly when there is a lot of text. To help with that we also provide the `visualize()` method, which utilizes Captum's built-in visualization library to create an HTML file highlighting the attributions.

If you are in a notebook, calls to the `visualize()` method will display the visualization in-line. Alternatively you can pass a filepath in as an argument and an HTML file will be created, allowing you to view the explanation HTML in your browser.

```python
cls_explainer.visualize("distilbert_viz.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/distilbert_example.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/distilbert_example.png" width="80%" height="80%" align="center"/>
</a>

#### Explaining Attributions for Non Predicted Class

Attribution explanations are not limited to the predicted class. Let's test a more complex sentence that contains mixed sentiments.

In the example below we pass `class_name="NEGATIVE"` as an argument, indicating that we would like the attributions to be explained for the **NEGATIVE** class regardless of what the actual prediction is. Effectively, because this is a binary classifier, we are getting the inverse attributions.

```python
cls_explainer = SequenceClassificationExplainer(model, tokenizer)
attributions = cls_explainer("I love you, I like you, I also kinda dislike you", class_name="NEGATIVE")
```

In this case `predicted_class_name` still returns a prediction of the **POSITIVE** class, because the model has made the same prediction; nonetheless, we are interested in the attributions for the negative class regardless of the predicted result.

```python
>>> cls_explainer.predicted_class_name
'POSITIVE'
```

But when we visualize the attributions, we can see that the words "**...kinda dislike**" contribute to a prediction of the **NEGATIVE** class.

```python
cls_explainer.visualize("distilbert_negative_attr.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/distilbert_example_negative.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/distilbert_example_negative.png" width="80%" height="80%" align="center" />
</a>

Getting attributions for different classes is particularly insightful for multiclass problems as it allows you to inspect model predictions for a number of different classes and sanity-check that the model is "looking" at the right things.

For a detailed explanation of this example please check out this [multiclass classification notebook](notebooks/multiclass_classification_example.ipynb).
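
As a rough illustration (a minimal sketch, not taken from the notebook), you could loop over every label in the model's config and collect the attributions per class:

```python
# Minimal sketch: collect attributions for every class of the model.
# Assumes `model`, `tokenizer`, and `SequenceClassificationExplainer` from the
# examples above; label names are read from the model's own config.
cls_explainer = SequenceClassificationExplainer(model, tokenizer)

text = "I love you, I like you, I also kinda dislike you"
per_class_attributions = {
    class_name: cls_explainer(text, class_name=class_name)
    for class_name in model.config.id2label.values()
}

for class_name, attributions in per_class_attributions.items():
    token, score = max(attributions, key=lambda pair: pair[1])
    print(f"{class_name}: most positively attributed token is '{token}' ({score:.3f})")
```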

### Pairwise Sequence Classification

The `PairwiseSequenceClassificationExplainer` is a variant of the `SequenceClassificationExplainer` designed to work with classification models that expect the input sequence to be two inputs separated by the model's separator token. Common examples of this are [NLI models](https://arxiv.org/abs/1705.02364) and [Cross-Encoders](https://www.sbert.net/docs/pretrained_cross-encoders.html), which are commonly used to score the similarity of two inputs to one another.

This explainer calculates pairwise attributions for two passed inputs `text1` and `text2` using the model
and tokenizer given in the constructor.

Also, since a common use case for pairwise sequence classification is to compare the similarity of two inputs, models of this nature typically have a single output node rather than one per class. The pairwise explainer has some useful utility functions to make interpreting single-node outputs clearer.

By default, for models that output a single node, the attributions are with respect to the inputs pushing the score closer to 1.0; if you instead want the attributions with respect to scores closer to 0.0, you can pass `flip_sign=True`. For similarity-based models this is useful: the model might predict a score close to 0.0 for the two inputs, in which case we flip the attribution signs to explain why the two inputs are dissimilar.

Let's start by initializing a cross-encoder model and tokenizer from the suite of [pre-trained cross-encoders](https://www.sbert.net/docs/pretrained_cross-encoders.html) provided by [sentence-transformers](https://github.com/UKPLab/sentence-transformers).

For this example we are using `"cross-encoder/ms-marco-MiniLM-L-6-v2"`, a high-quality cross-encoder trained on the [MS MARCO dataset](https://github.com/microsoft/MSMARCO-Passage-Ranking), a passage ranking dataset for question answering and machine reading comprehension.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from transformers_interpret import PairwiseSequenceClassificationExplainer

model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")

pairwise_explainer = PairwiseSequenceClassificationExplainer(model, tokenizer)

# The pairwise explainer requires two string inputs. Given the nature of this model,
# we pass a query string and a context string. The question we are asking of our model is
# "does this context contain a valid answer to our question?"; the higher the score, the better the fit.

query = "How many people live in Berlin?"
context = "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."
pairwise_attr = pairwise_explainer(query, context)
```

Which returns the following attributions:

```python
>>> pairwise_attr
[('[CLS]', 0.0),
 ('how', -0.037558652124213034),
 ('many', -0.40348581975409786),
 ('people', -0.29756140282349425),
 ('live', -0.48979015417391764),
 ('in', -0.17844527885888117),
 ('berlin', 0.3737346097442739),
 ('?', -0.2281428913480142),
 ('[SEP]', 0.0),
 ('berlin', 0.18282430604641564),
 ('has', 0.039114659489254834),
 ('a', 0.0820056652212297),
 ('population', 0.35712150914643026),
 ('of', 0.09680870840224687),
 ('3', 0.04791760029513795),
 (',', 0.040330986539774266),
 ('520', 0.16307677913176166),
 (',', -0.005919693904602767),
 ('03', 0.019431649515841844),
 ('##1', -0.0243808667024702),
 ('registered', 0.07748341753369632),
 ('inhabitants', 0.23904087299731255),
 ('in', 0.07553221327346359),
 ('an', 0.033112821611999875),
 ('area', -0.025378852244447532),
 ('of', 0.026526373859562906),
 ('89', 0.0030700151809002147),
 ('##1', -0.000410387092186983),
 ('.', -0.0193147139126114),
 ('82', 0.0073800833347678774),
 ('square', 0.028988305990861576),
 ('kilometers', 0.02071182933829008),
 ('.', -0.025901070914318036),
 ('[SEP]', 0.0)]
```

#### Visualize Pairwise Classification attributions

Visualizing the pairwise attributions is no different from the sequence classification explainer. We can see that in both the `query` and the `context` there is a lot of positive attribution for the word `berlin`, as well as for the words `population` and `inhabitants` in the `context`, which are good signs that our model understands the textual context of the question asked.

```python
pairwise_explainer.visualize("cross_encoder_attr.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/pairwise_cross_encoder_example.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/pairwise_cross_encoder_example.png" width="100%" height="100%" align="center" />
</a>

If we were more interested in highlighting the input attributions that pushed the model away from the positive class of this single-node output, we could pass:

```python
pairwise_attr = pairwise_explainer(query, context, flip_sign=True)
```

This simply inverts the sign of the attributions, ensuring that they are with respect to the model outputting 0 rather than 1.

</details>

<details><summary>MultiLabel Classification Explainer</summary>
<p>

This explainer is an extension of the `SequenceClassificationExplainer` and is thus compatible with all sequence classification models from the Transformers package. The key change in this explainer is that it calculates attributions for each label in the model's config and returns a dictionary of word attributions w.r.t. each label. The `visualize()` method also displays a table of attributions, with attributions calculated per label.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import MultiLabelClassificationExplainer

model_name = "j-hartmann/emotion-english-distilroberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


cls_explainer = MultiLabelClassificationExplainer(model, tokenizer)


word_attributions = cls_explainer("There were many aspects of the film I liked, but it was frightening and gross in parts. My parents hated it.")
```

This produces a dictionary of word attributions, mapping each label to a list of tuples of each word and its attribution score.

<details><summary>Click to see word attribution dictionary</summary>

```python
>>> word_attributions
{'anger': [('<s>', 0.0),
           ('There', 0.09002208622000409),
           ('were', -0.025129709879675187),
           ('many', -0.028852677974079328),
           ('aspects', -0.06341968013631565),
           ('of', -0.03587626320752477),
           ('the', -0.014813095892961287),
           ('film', -0.14087587475098232),
           ('I', 0.007367876912617766),
           ('liked', -0.09816592066307557),
           (',', -0.014259517291745674),
           ('but', -0.08087144668471376),
           ('it', -0.10185214349220136),
           ('was', -0.07132244710777856),
           ('frightening', -0.4125361737439814),
           ('and', -0.021761663818889918),
           ('gross', -0.10423745223600908),
           ('in', -0.02383646952201854),
           ('parts', -0.027137622525091033),
           ('.', -0.02960415694062459),
           ('My', 0.05642774605113695),
           ('parents', 0.11146648216326158),
           ('hated', 0.8497975489280364),
           ('it', 0.05358116678115284),
           ('.', -0.013566277162080632),
           ('', 0.09293256725788422),
           ('</s>', 0.0)],
 'disgust': [('<s>', 0.0),
             ('There', -0.035296263203072),
             ('were', -0.010224922196739717),
             ('many', -0.03747571761725605),
             ('aspects', 0.007696321643436715),
             ('of', 0.0026740873113235107),
             ('the', 0.0025752851265661335),
             ('film', -0.040890035285783645),
             ('I', -0.014710007408208579),
             ('liked', 0.025696806663391577),
             (',', -0.00739107098314569),
             ('but', 0.007353791868893654),
             ('it', -0.00821368234753605),
             ('was', 0.005439709067819798),
             ('frightening', -0.8135974168445725),
             ('and', -0.002334953123414774),
             ('gross', 0.2366024374426269),
             ('in', 0.04314772995234148),
             ('parts', 0.05590472194035334),
             ('.', -0.04362554293972562),
             ('My', -0.04252694977895808),
             ('parents', 0.051580790911406944),
             ('hated', 0.5067406070057585),
             ('it', 0.0527491071885104),
             ('.', -0.008280280618652273),
             ('', 0.07412384603053103),
             ('</s>', 0.0)],
 'fear': [('<s>', 0.0),
          ('There', -0.019615758046045408),
          ('were', 0.008033402634196246),
          ('many', 0.027772367717635423),
          ('aspects', 0.01334130725685673),
          ('of', 0.009186049991879768),
          ('the', 0.005828877177384549),
          ('film', 0.09882910753644959),
          ('I', 0.01753565003544039),
          ('liked', 0.02062597344466885),
          (',', -0.004469530636560965),
          ('but', -0.019660439408176984),
          ('it', 0.0488084071292538),
          ('was', 0.03830859527501167),
          ('frightening', 0.9526443954511705),
          ('and', 0.02535156284103706),
          ('gross', -0.10635301961551227),
          ('in', -0.019190425328209065),
          ('parts', -0.01713006453323631),
          ('.', 0.015043169035757302),
          ('My', 0.017068079071414916),
          ('parents', -0.0630781275517486),
          ('hated', -0.23630028921273583),
          ('it', -0.056057044429020306),
          ('.', 0.0015102052077844612),
          ('', -0.010045048665404609),
          ('</s>', 0.0)],
 'joy': [('<s>', 0.0),
         ('There', 0.04881772670614576),
         ('were', -0.0379316152427468),
         ('many', -0.007955371089444285),
         ('aspects', 0.04437296429416574),
         ('of', -0.06407011137335743),
         ('the', -0.07331568926973099),
         ('film', 0.21588462483311055),
         ('I', 0.04885724513463952),
         ('liked', 0.5309510543276107),
         (',', 0.1339765195225006),
         ('but', 0.09394079060730279),
         ('it', -0.1462792330432028),
         ('was', -0.1358591558323458),
         ('frightening', -0.22184169339341142),
         ('and', -0.07504142930419291),
         ('gross', -0.005472075984252812),
         ('in', -0.0942152657437379),
         ('parts', -0.19345218754215965),
         ('.', 0.11096247277185402),
         ('My', 0.06604512262645984),
         ('parents', 0.026376541098236207),
         ('hated', -0.4988319510231699),
         ('it', -0.17532499366236615),
         ('.', -0.022609976138939034),
         ('', -0.43417114685294833),
         ('</s>', 0.0)],
 'neutral': [('<s>', 0.0),
             ('There', 0.045984598036642205),
             ('were', 0.017142566357474697),
             ('many', 0.011419348619472542),
             ('aspects', 0.02558593440287365),
             ('of', 0.0186162232003498),
             ('the', 0.015616416841815963),
             ('film', -0.021190511300570092),
             ('I', -0.03572427925026324),
             ('liked', 0.027062554960050455),
             (',', 0.02089914209290366),
             ('but', 0.025872618597570115),
             ('it', -0.002980407262316265),
             ('was', -0.022218157611174086),
             ('frightening', -0.2982516449116045),
             ('and', -0.01604643529040792),
             ('gross', -0.04573829263548096),
             ('in', -0.006511536166676108),
             ('parts', -0.011744224307968652),
             ('.', -0.01817041167875332),
             ('My', -0.07362312722231429),
             ('parents', -0.06910711601816408),
             ('hated', -0.9418903509267312),
             ('it', 0.022201795222373488),
             ('.', 0.025694319747309045),
             ('', 0.04276690822325994),
             ('</s>', 0.0)],
 'sadness': [('<s>', 0.0),
             ('There', 0.028237893283377526),
             ('were', -0.04489910545229568),
             ('many', 0.004996044977269471),
             ('aspects', -0.1231292680125582),
             ('of', -0.04552690725956671),
             ('the', -0.022077819961347042),
             ('film', -0.14155752357877663),
             ('I', 0.04135347872193571),
             ('liked', -0.3097732540526099),
             (',', 0.045114660009053134),
             ('but', 0.0963352125332619),
             ('it', -0.08120617610094617),
             ('was', -0.08516150809170213),
             ('frightening', -0.10386889639962761),
             ('and', -0.03931986389970189),
             ('gross', -0.2145059013625132),
             ('in', -0.03465423285571697),
             ('parts', -0.08676627134611635),
             ('.', 0.19025217371906333),
             ('My', 0.2582092561303794),
             ('parents', 0.15432351476960307),
             ('hated', 0.7262186310977987),
             ('it', -0.029160655114499095),
             ('.', -0.002758524253450406),
             ('', -0.33846410359182094),
             ('</s>', 0.0)],
 'surprise': [('<s>', 0.0),
              ('There', 0.07196110795254315),
              ('were', 0.1434314520711312),
              ('many', 0.08812238369489701),
              ('aspects', 0.013432396769890982),
              ('of', -0.07127508805657243),
              ('the', -0.14079766624810955),
              ('film', -0.16881201614906485),
              ('I', 0.040595668935112135),
              ('liked', 0.03239855530171577),
              (',', -0.17676382558158257),
              ('but', -0.03797939330341559),
              ('it', -0.029191325089641736),
              ('was', 0.01758013584108571),
              ('frightening', -0.221738963726823),
              ('and', -0.05126920277135527),
              ('gross', -0.33986913466614044),
              ('in', -0.018180366628697),
              ('parts', 0.02939418603252064),
              ('.', 0.018080129971003226),
              ('My', -0.08060162218059498),
              ('parents', 0.04351719139081836),
              ('hated', -0.6919028585285265),
              ('it', 0.0009574844165327357),
              ('.', -0.059473118237873344),
              ('', -0.465690452620123),
              ('</s>', 0.0)]}
```

</details>
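
Since the returned object is a plain dictionary mapping each label to a list of (token, score) tuples, it is easy to post-process. For example, here is a minimal sketch (assuming the `word_attributions` dict above) that prints the most positively attributed token per label:

```python
# Minimal sketch: for each label, find the token with the strongest positive attribution.
# Assumes the `word_attributions` dictionary produced above.
for label, attributions in word_attributions.items():
    token, score = max(attributions, key=lambda pair: pair[1])
    print(f"{label:>8}: '{token}' ({score:.3f})")
```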

#### Visualize MultiLabel Classification attributions

Sometimes the numeric attributions can be difficult to read, particularly when there is a lot of text. To help with that we also provide the `visualize()` method, which utilizes Captum's built-in visualization library to create an HTML file highlighting the attributions. For this explainer, attributions are shown w.r.t. each label.

If you are in a notebook, calls to the `visualize()` method will display the visualization in-line. Alternatively you can pass a filepath in as an argument and an HTML file will be created, allowing you to view the explanation HTML in your browser.

```python
cls_explainer.visualize("multilabel_viz.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/multilabel_example.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/multilabel_example.png" width="80%" height="80%" align="center"/>
</a>

</details>

<details><summary>Zero Shot Classification Explainer</summary>

_Models used with this explainer must have been trained on an NLI classification downstream task and have a label in the model's config called either "entailment" or "ENTAILMENT"._

This explainer allows attributions to be calculated for zero-shot classification models. To achieve this we use the same methodology employed by Hugging Face for zero-shot classification, which works by exploiting the "entailment" label of NLI models. Here is a [link](https://arxiv.org/abs/1909.00161) to a paper explaining more about it. A list of NLI models guaranteed to be compatible with this explainer can be found on the [model hub](https://huggingface.co/models?filter=pytorch&pipeline_tag=zero-shot-classification).

Let's start by initializing a transformers sequence classification model and tokenizer trained specifically on an NLI task, and passing them to the `ZeroShotClassificationExplainer`.

For this example we are using `facebook/bart-large-mnli`, a checkpoint of a bart-large model trained on the [MNLI dataset](https://huggingface.co/datasets/multi_nli). This model typically predicts whether a sentence pair is an entailment, neutral, or a contradiction; however, for zero-shot we only look at the entailment label.

Notice that we pass our own custom labels `["finance", "technology", "sports"]` to the class instance. Any number of labels can be passed, including as few as one. Whichever label scores highest for entailment can be accessed via `predicted_label`; however, the attributions themselves are calculated for every label. If you want the attributions for one particular label, it is recommended to pass in just that label, so that the attributions are guaranteed to be calculated w.r.t. it.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import ZeroShotClassificationExplainer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")


zero_shot_explainer = ZeroShotClassificationExplainer(model, tokenizer)


word_attributions = zero_shot_explainer(
    "Today apple released the new Macbook showing off a range of new features found in the proprietary silicon chip computer. ",
    labels = ["finance", "technology", "sports"],
)

```

Which will return the following dict of attribution tuple lists for each label:

```python
>>> word_attributions
{'finance': [('<s>', 0.0),
  ('Today', 0.0),
  ('apple', -0.016100065046282107),
  ('released', 0.3348383988281792),
  ('the', -0.8932952916127369),
  ('new', 0.14207183688642497),
  ('Mac', 0.016309545780430777),
  ('book', -0.06956802041125129),
  ('showing', -0.12661404114316252),
  ('off', -0.11470154900720078),
  ('a', -0.03299250484912159),
  ('range', -0.002532332125100561),
  ('of', -0.022451943898971004),
  ('new', -0.01859870581213379),
  ('features', -0.020774327263810944),
  ('found', -0.007734346326330102),
  ('in', 0.005100588658589585),
  ('the', 0.04711084622588314),
  ('proprietary', 0.046352064964644286),
  ('silicon', -0.0033502000158946127),
  ('chip', -0.010419324929115785),
  ('computer', -0.11507972995022273),
  ('.', 0.12237840300907425)],
 'technology': [('<s>', 0.0),
  ('Today', 0.0),
  ('apple', 0.22505152647747717),
  ('released', -0.16164146624851905),
  ('the', 0.5026975657258089),
  ('new', 0.052589263167955536),
  ('Mac', 0.2528325960993759),
  ('book', -0.06445090203729663),
  ('showing', -0.21204922293777534),
  ('off', 0.06319714817612732),
  ('a', 0.032048012090796815),
  ('range', 0.08553079346908955),
  ('of', 0.1409201107994034),
  ('new', 0.0515261917112576),
  ('features', -0.09656406466213506),
  ('found', 0.02336613296843605),
  ('in', -0.0011649894272190678),
  ('the', 0.14229640664777807),
  ('proprietary', -0.23169065661847646),
  ('silicon', 0.5963924257008087),
  ('chip', -0.19908474233975806),
  ('computer', 0.030620295844734646),
  ('.', 0.1995076958535378)],
 'sports': [('<s>', 0.0),
  ('Today', 0.0),
  ('apple', 0.1776618164760026),
  ('released', 0.10067773539491479),
  ('the', 0.4813466937627506),
  ('new', -0.018555244191949295),
  ('Mac', 0.016338241133536224),
  ('book', 0.39311969562943677),
  ('showing', 0.03579210145504227),
  ('off', 0.0016710813632476176),
  ('a', 0.04367940034297261),
  ('range', 0.06076859006993011),
  ('of', 0.11039711284328052),
  ('new', 0.003932416031994724),
  ('features', -0.009660883377622588),
  ('found', -0.06507586539836184),
  ('in', 0.2957812911667922),
  ('the', 0.1584106228974514),
  ('proprietary', 0.0005789280604917397),
  ('silicon', -0.04693795680472678),
  ('chip', -0.1699508539245465),
  ('computer', -0.4290823663975582),
  ('.', 0.469314992542427)]}
```

We can find out which label was predicted with:

```python
>>> zero_shot_explainer.predicted_label
'technology'
```
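
As noted above, if you only care about a single label you can pass just that one label so the attributions are guaranteed to be calculated w.r.t. it; a minimal sketch:

```python
# Minimal sketch: restrict attributions to a single candidate label, as suggested above.
word_attributions = zero_shot_explainer(
    "Today apple released the new Macbook showing off a range of new features found in the proprietary silicon chip computer. ",
    labels=["technology"],
)
```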

#### Visualize Zero Shot Classification attributions

For the `ZeroShotClassificationExplainer` the `visualize()` method returns a table similar to that of the `SequenceClassificationExplainer`, but with attributions for every label.

```python
zero_shot_explainer.visualize("zero_shot.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/zero_shot_example.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/zero_shot_example.png" width="100%" height="100%" align="center" />
</a>

</details>

<details><summary>Question Answering Explainer</summary>

Let's start by initializing a transformers' Question Answering model and tokenizer, and running it through the `QuestionAnsweringExplainer`.

For this example we are using `bert-large-uncased-whole-word-masking-finetuned-squad`, a bert model finetuned on SQuAD.

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
from transformers_interpret import QuestionAnsweringExplainer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

qa_explainer = QuestionAnsweringExplainer(
    model,
    tokenizer,
)

context = """
In Artificial Intelligence and machine learning, Natural Language Processing relates to the usage of machines to process and understand human language.
Many researchers currently work in this space.
"""

word_attributions = qa_explainer(
    "What is natural language processing ?",
    context,
)
```

Which will return the following dict containing word attributions for both the predicted start and end positions for the answer.

```python
>>> word_attributions
{'start': [('[CLS]', 0.0),
  ('what', 0.9177170660377296),
  ('is', 0.13382234898765258),
  ('natural', 0.08061747350142005),
  ('language', 0.013138062762511409),
  ('processing', 0.11135923869816286),
  ('?', 0.00858057388924361),
  ('[SEP]', -0.09646373141894966),
  ('in', 0.01545633993975799),
  ('artificial', 0.0472082598707737),
  ('intelligence', 0.026687249355110867),
  ('and', 0.01675371260058537),
  ('machine', -0.08429502436554961),
  ('learning', 0.0044827685126163355),
  (',', -0.02401013152520878),
  ('natural', -0.0016756080249823537),
  ('language', 0.0026815068421401885),
  ('processing', 0.06773157580722854),
  ('relates', 0.03884601576992908),
  ('to', 0.009783797821526368),
  ('the', -0.026650922910540952),
  ('usage', -0.010675019721821147),
  ('of', 0.015346787885898537),
  ('machines', -0.08278008270160107),
  ('to', 0.12861387892768839),
  ('process', 0.19540146386642743),
  ('and', 0.009942879959615826),
  ('understand', 0.006836894853320319),
  ('human', 0.05020451122579102),
  ('language', -0.012980795199301),
  ('.', 0.00804358248127772),
  ('many', 0.02259009321498161),
  ('researchers', -0.02351650942555469),
  ('currently', 0.04484573078852946),
  ('work', 0.00990399948294476),
  ('in', 0.01806961211334615),
  ('this', 0.13075899776164499),
  ('space', 0.004298315347838973),
  ('.', -0.003767904539347979),
  ('[SEP]', -0.08891544093454595)],
 'end': [('[CLS]', 0.0),
  ('what', 0.8227231947501547),
  ('is', 0.0586864942952253),
  ('natural', 0.0938903563379123),
  ('language', 0.058596976016400674),
  ('processing', 0.1632374290269829),
  ('?', 0.09695686057123237),
  ('[SEP]', -0.11644447033554006),
  ('in', -0.03769172371919206),
  ('artificial', 0.06736158404049886),
  ('intelligence', 0.02496399001288386),
  ('and', -0.03526028847762427),
  ('machine', -0.20846431491771975),
  ('learning', 0.00904892847529654),
  (',', -0.02949905488474854),
  ('natural', 0.011024507784743872),
  ('language', 0.0870741751282507),
  ('processing', 0.11482449622317169),
  ('relates', 0.05008962090922852),
  ('to', 0.04079118393166258),
  ('the', -0.005069048880616451),
  ('usage', -0.011992752445836278),
  ('of', 0.01715183316135495),
  ('machines', -0.29823535624026265),
  ('to', -0.0043760160855057925),
  ('process', 0.10503217484645223),
  ('and', 0.06840313586976698),
  ('understand', 0.057184000619403944),
  ('human', 0.0976805947708315),
  ('language', 0.07031163646606695),
  ('.', 0.10494566513897102),
  ('many', 0.019227154676079487),
  ('researchers', -0.038173913797800885),
  ('currently', 0.03916641120002003),
  ('work', 0.03705371672439422),
  ('in', -0.0003155975107591203),
  ('this', 0.17254932354022232),
  ('space', 0.0014311439625599323),
  ('.', 0.060637932829867736),
  ('[SEP]', -0.09186286505530596)]}
```

We can get the text span for the predicted answer with:

```python
>>> qa_explainer.predicted_answer
'usage of machines to process and understand human language'
```

#### Visualize Question Answering attributions

For the `QuestionAnsweringExplainer` the `visualize()` method returns a table with two rows. The first row represents the attributions for the answer's start position and the second row represents the attributions for the answer's end position.

```python
qa_explainer.visualize("bert_qa_viz.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/bert_qa_explainer.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/bert_qa_explainer.png" width="120%" height="120%" align="center" />
</a>

</details>

<details><summary>Token Classification (NER) explainer</summary>

_This is currently an experimental explainer under active development and is not yet fully tested. The explainer's API is subject to change, as are the attribution methods; if you find any bugs please let me know._

Let's start by initializing a transformers Token Classification model and tokenizer, and running it through the `TokenClassificationExplainer`.

For this example we are using `dslim/bert-base-NER`, a bert model finetuned on the CoNLL-2003 Named Entity Recognition dataset.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers_interpret import TokenClassificationExplainer

model = AutoModelForTokenClassification.from_pretrained('dslim/bert-base-NER')
tokenizer = AutoTokenizer.from_pretrained('dslim/bert-base-NER')

ner_explainer = TokenClassificationExplainer(
    model,
    tokenizer,
)

sample_text = "We visited Paris last weekend, where Emmanuel Macron lives."

word_attributions = ner_explainer(sample_text, ignored_labels=['O'])

```

In order to reduce the number of attributions that are calculated, we tell the explainer to ignore tokens whose predicted label is `'O'`. We could also tell the explainer to ignore certain indexes by providing a list via the `ignored_indexes` parameter.
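
For example, a minimal sketch of the `ignored_indexes` variant (the index values here are purely illustrative):

```python
# Minimal sketch: skip attributions for specific token positions instead of labels.
# The indexes below are illustrative only.
word_attributions = ner_explainer(sample_text, ignored_indexes=[0, 1, 2, 3])
```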

This returns the following dict, including the predicted label and the attributions for each token, except those which were predicted as `'O'`:

```python
>>> word_attributions
{'paris': {'label': 'B-LOC',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.014352325471387907),
   ('visited', 0.32915222186559123),
   ('paris', 0.9086791784795596),
   ('last', 0.15181203147624034),
   ('weekend', 0.14400210630677038),
   (',', 0.01899744327012935),
   ('where', -0.039402005463239465),
   ('emmanuel', 0.061095284002642025),
   ('macro', 0.004192922551105228),
   ('##n', 0.09446355513057757),
   ('lives', -0.028724312616455003),
   ('.', 0.08099007392937585),
   ('[SEP]', 0.0)]},
 'emmanuel': {'label': 'B-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.006933030636686712),
   ('visited', 0.10396962390436904),
   ('paris', 0.14540758744233165),
   ('last', 0.08024018944451371),
   ('weekend', 0.10687970996804418),
   (',', 0.1793198466387937),
   ('where', 0.3436407835483767),
   ('emmanuel', 0.8774892642652167),
   ('macro', 0.03559399361048316),
   ('##n', 0.1516315604785551),
   ('lives', 0.07056441327498127),
   ('.', -0.025820924624605487),
   ('[SEP]', 0.0)]},
 'macro': {'label': 'I-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', 0.05578067326280157),
   ('visited', 0.00857021283406586),
   ('paris', 0.16559056506114297),
   ('last', 0.08285256685903823),
   ('weekend', 0.10468727443796395),
   (',', 0.09949509071515888),
   ('where', 0.3642458274356929),
   ('emmanuel', 0.7449335213978788),
   ('macro', 0.3794625659183485),
   ('##n', -0.2599031433800762),
   ('lives', 0.20563450682196147),
   ('.', -0.015607017319486929),
   ('[SEP]', 0.0)]},
 '##n': {'label': 'I-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', 0.025194121717285252),
   ('visited', -0.007415022865239864),
   ('paris', 0.09478357303107598),
   ('last', 0.06927939834474463),
   ('weekend', 0.0672008033510708),
   (',', 0.08316907214363504),
   ('where', 0.3784915854680165),
   ('emmanuel', 0.7729352621546081),
   ('macro', 0.4148652759139777),
   ('##n', -0.20853534512145033),
   ('lives', 0.09445057087678274),
   ('.', -0.094274985907366),
   ('[SEP]', 0.0)]},
 '[SEP]': {'label': 'B-LOC',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.3694351403796742),
   ('visited', 0.1699038407402483),
   ('paris', 0.5461587414992369),
   ('last', 0.0037948102770307517),
   ('weekend', 0.1628100955702496),
   (',', 0.4513093410909263),
   ('where', -0.09577409464161038),
   ('emmanuel', 0.48499459835388914),
   ('macro', -0.13528905587653023),
   ('##n', 0.14362969934754344),
   ('lives', -0.05758007024257254),
   ('.', -0.13970977266152554),
   ('[SEP]', 0.0)]}}
```

#### Visualize NER attributions

For the `TokenClassificationExplainer` the `visualize()` method returns a table with as many rows as there are tokens.

```python
ner_explainer.visualize("bert_ner_viz.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/bert_ner_explainer.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/bert_ner_explainer.png" width="120%" height="120%" align="center" />
</a>

For more details about how the `TokenClassificationExplainer` works, you can check the notebook [notebooks/ner_example.ipynb](notebooks/ner_example.ipynb).

</details>

### Vision Explainers

<details><summary> Image Classification Explainer </summary>

<p>

The `ImageClassificationExplainer` is designed to work with all models from the Transformers library that are trained for image classification (Swin, ViT, etc.). It provides attributions for every pixel in an image, which can be easily visualized using the explainer's built-in `visualize` method.

Initialising an image classification explainer is very simple: all you need is an image classification model finetuned or trained to work with Hugging Face, and its feature extractor.

For this example we are using `google/vit-base-patch16-224`, a Vision Transformer (ViT) model pre-trained on ImageNet-21k that predicts from 1000 possible classes.

```python
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
from transformers_interpret import ImageClassificationExplainer
from PIL import Image
import requests

model_name = "google/vit-base-patch16-224"
model = AutoModelForImageClassification.from_pretrained(model_name)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# With both the model and feature extractor initialized we are now able to get explanations on an image, we will use a simple image of a golden retriever.
image_link = "https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F47%2F2020%2F08%2F16%2Fgolden-retriever-177213599-2000.jpg"

image = Image.open(requests.get(image_link, stream=True).raw)

image_classification_explainer = ImageClassificationExplainer(model=model, feature_extractor=feature_extractor)

image_attributions = image_classification_explainer(
    image
)

print(image_attributions.shape)
```

Which will print the shape of the attributions tensor:

```python
torch.Size([1, 3, 224, 224])
```
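
The returned attributions are a plain torch tensor with one value per colour channel per pixel, so you can also work with them directly outside the built-in `visualize` method. A minimal sketch, assuming the `image_attributions` tensor from above:

```python
# Minimal sketch: collapse per-channel attributions into a single 224x224 map.
# Assumes `image_attributions` from the example above, shape (1, 3, 224, 224).
attribution_map = image_attributions.squeeze(0).sum(dim=0)  # -> (224, 224)
print(attribution_map.shape)
print(attribution_map.abs().max())  # magnitude of the strongest single-pixel attribution
```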

#### Visualizing Image Attributions

Because we are dealing with images, visualization is even more straightforward than with text models.

Attributions can be easily visualized using the `visualize` method of the explainer. There are currently 4 supported visualization methods.

- `heatmap` - a heatmap of positive and negative attributions is drawn using the dimensions of the image.
- `overlay` - the heatmap is overlaid on a grayscale version of the original image.
- `masked_image` - the absolute value of the attributions is used to create a mask over the original image.
- `alpha_scaling` - sets the alpha channel (transparency) of each pixel equal to the normalized attribution value.

#### Heatmap

```python
image_classification_explainer.visualize(
    method="heatmap",
    side_by_side=True,
    outlier_threshold=0.03

)
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/heatmap_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/heatmap_sbs.png" width="100%" height="100%" align="center"/>
</a>

#### Overlay

```python
image_classification_explainer.visualize(
    method="overlay",
    side_by_side=True,
    outlier_threshold=0.03

)
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/overlay_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/overlay_sbs.png" width="100%" height="100%" align="center"/>
</a>


#### Masked Image

```python
image_classification_explainer.visualize(
    method="masked_image",
    side_by_side=True,
    outlier_threshold=0.03

)
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/masked_image_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/masked_image_sbs.png" width="100%" height="100%" align="center"/>
</a>


#### Alpha Scaling

```python
image_classification_explainer.visualize(
    method="alpha_scaling",
    side_by_side=True,
    outlier_threshold=0.03

)
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/alpha_scaling_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/alpha_scaling_sbs.png" width="100%" height="100%" align="center"/>

</a>
</details>

## Future Development

This package is still in active development and there is much more planned. For a 1.0.0 release we're aiming to have:

- Clean and thorough documentation website
- ~~Support for Question Answering models~~
- ~~Support for NER models~~
- ~~Support for Zero Shot Classification models.~~
- ~~Ability to show attributions for multiple embedding type, rather than just the word embeddings.~~
- Support for SentenceTransformer embedding models and other image embeddings
- Additional attribution methods
- ~~Support for vision transformer models~~
- In depth examples
- ~~A nice logo~~ (thanks @Voyz)
- and more... feel free to submit your suggestions!

## Contributing

If you would like to make a contribution please check out our [contribution guidelines](https://github.com/cdpierse/transformers-interpret/blob/master/CONTRIBUTING.md).

## Questions / Get In Touch

The maintainer of this repository is [@cdpierse](https://github.com/cdpierse).

If you have any questions, suggestions, or would like to make a contribution (please do 😁), feel free to get in touch at charlespierse@gmail.com

I'd also highly suggest checking out [Captum](https://captum.ai/) if you find model explainability and interpretability interesting.

This package stands on the shoulders of the incredible work being done by the teams at [Pytorch Captum](https://captum.ai/) and [Hugging Face](https://huggingface.co/) and would not exist if not for the amazing job they are both doing in the fields of model interpretability and ML respectively.

## Reading and Resources

<details><summary>Captum</summary>

<p>

All of the attributions within this package are calculated using PyTorch's explainability package [Captum](https://captum.ai/). See below for some useful links related to Captum.

- [Captum Algorithm Overview](https://captum.ai/docs/algorithms)
- [Bert QA Example](https://captum.ai/tutorials/Bert_SQUAD_Interpret), an implementation achieved purely using Captum.
- [API Reference](https://captum.ai/api/)
- [Model Interpretability with Captum - Narine Kokhilkyan (Video)](https://www.youtube.com/watch?v=iVSIFm0UN9I)

</details>

<details><summary>Attribution</summary>

<p>

Integrated Gradients (IG) and a variation of it, Layer Integrated Gradients (LIG), are the core attribution methods on which Transformers Interpret is currently built. Below are some useful resources, including the original paper and some videos explaining the inner mechanics, followed by the IG formula for reference. If you are curious about what is going on inside of Transformers Interpret, I highly recommend checking out at least one of these resources.

- [Axiomatic Attributions for Deep Networks](https://arxiv.org/abs/1703.01365), the original paper (2017) in which Integrated Gradients was specified.
- [Fiddler AI YouTube video on IG](https://www.youtube.com/watch?v=9AaDc35JYiI)
- [Henry AI Labs YouTube Primer on IG](https://www.youtube.com/watch?v=MB8KYX5UzKw)
- [Explaining Explanations: Axiomatic Feature Interactions for Deep Networks](http://export.arxiv.org/abs/2002.04138), a more recent paper (2020) extending the work of the original paper.
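
For reference, the attribution IG assigns to input dimension $i$ is the path integral of the model's gradients along the straight line from a baseline $x'$ to the input $x$ (this is the standard definition from the 2017 paper, not anything specific to this package):

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1} \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha$$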

</details>

<details><summary>Miscellaneous</summary>

**Captum Links**

Below are some links I used to help me get this package together using Captum. Thank you to @davidefiocco for your very insightful GIST.

- [Link to useful GIST on captum](https://gist.github.com/davidefiocco/3e1a0ed030792230a33c726c61f6b3a5)
- [Link to runnable colab of captum with BERT](https://colab.research.google.com/drive/1snFbxdVDtL3JEFW7GNfRs1PZKgNHfoNz)

[transformers]: https://huggingface.co/transformers/

</details>

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "transformers-interpret",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7,<4.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "Charles Pierse",
    "author_email": "charlespierse@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/fa/3c/23798b5fda387840439aa1a0452719a1bf0f11d430da37131ad99560f2b9/transformers-interpret-0.10.0.tar.gz",
    "platform": null,
    "description": "<p align=\"center\">\n    <a id=\"transformers-intepret\" href=\"#transformers-intepret\">\n        <img src=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/tight%401920x_transparent.png\" alt=\"Transformers Intepret Title\" title=\"Transformers Intepret Title\" width=\"600\"/>\n    </a>\n</p>\n\n<p align=\"center\"> Explainability for any \ud83e\udd17 Transformers models in 2 lines.</p>\n\n<h1 align=\"center\"></h1>\n\n<p align=\"center\">\n    <a href=\"https://opensource.org/licenses/Apache-2.0\">\n        <img src=\"https://img.shields.io/badge/License-Apache%202.0-blue.svg\"/>\n    <a href=\"https://github.com/cdpierse/transformers-interpret/actions/workflows/unit_tests.yml\">\n        <img src=\"https://github.com/cdpierse/transformers-interpret/actions/workflows/unit_tests.yml/badge.svg\">\n    </a>\n            <a href=\"https://github.com/cdpierse/transformers-interpret/releases\">\n        <img src=\"https://img.shields.io/pypi/v/transformers_interpret?label=version\"/>\n    </a>\n        <a href=\"https://pepy.tech/project/transformers-interpret\">\n        <img src=\"https://static.pepy.tech/personalized-badge/transformers-interpret?period=total&units=abbreviation&left_color=black&right_color=brightgreen&left_text=Downloads\">\n    </a>\n</p>\n\nTransformers Interpret is a model explainability tool designed to work exclusively with the \ud83e\udd17 [transformers][transformers] package.\n\nIn line with the philosophy of the Transformers package Transformers Interpret allows any transformers model to be explained in just two lines. Explainers are available for both text and computer vision models. Visualizations are also available in notebooks and as savable png and html files.\n\nCheck out the streamlit [demo app here](https://share.streamlit.io/cdpierse/transformers-interpret-streamlit/main/app.py)\n\n## Install\n\n```posh\npip install transformers-interpret\n```\n\n## Quick Start\n\n### Text Explainers\n\n<details><summary>Sequence Classification Explainer and Pairwise Sequence Classification</summary>\n\n<p>\nLet's start by initializing a transformers' model and tokenizer, and running it through the `SequenceClassificationExplainer`.\n\nFor this example we are using `distilbert-base-uncased-finetuned-sst-2-english`, a distilbert model finetuned on a sentiment analysis task.\n\n```python\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nmodel_name = \"distilbert-base-uncased-finetuned-sst-2-english\"\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\n# With both the model and tokenizer initialized we are now able to get explanations on an example text.\n\nfrom transformers_interpret import SequenceClassificationExplainer\ncls_explainer = SequenceClassificationExplainer(\n    model,\n    tokenizer)\nword_attributions = cls_explainer(\"I love you, I like you\")\n```\n\nWhich will return the following list of tuples:\n\n```python\n>>> word_attributions\n[('[CLS]', 0.0),\n ('i', 0.2778544699186709),\n ('love', 0.7792370723380415),\n ('you', 0.38560088858031094),\n (',', -0.01769750505546915),\n ('i', 0.12071898121557832),\n ('like', 0.19091105304734457),\n ('you', 0.33994871536713467),\n ('[SEP]', 0.0)]\n```\n\nPositive attribution numbers indicate a word contributes positively towards the predicted class, while negative numbers indicate a word contributes negatively towards the predicted class. 
Here we can see that **I love you** gets the most attention.\n\nYou can use `predicted_class_index` in case you'd want to know what the predicted class actually is:\n\n```python\n>>> cls_explainer.predicted_class_index\narray(1)\n```\n\nAnd if the model has label names for each class, we can see these too using `predicted_class_name`:\n\n```python\n>>> cls_explainer.predicted_class_name\n'POSITIVE'\n```\n\n#### Visualize Classification attributions\n\nSometimes the numeric attributions can be difficult to read particularly in instances where there is a lot of text. To help with that we also provide the `visualize()` method that utilizes Captum's in built viz library to create a HTML file highlighting the attributions.\n\nIf you are in a notebook, calls to the `visualize()` method will display the visualization in-line. Alternatively you can pass a filepath in as an argument and an HTML file will be created, allowing you to view the explanation HTML in your browser.\n\n```python\ncls_explainer.visualize(\"distilbert_viz.html\")\n```\n\n<a href=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/distilbert_example.png\">\n<img src=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/distilbert_example.png\" width=\"80%\" height=\"80%\" align=\"center\"/>\n</a>\n\n#### Explaining Attributions for Non Predicted Class\n\nAttribution explanations are not limited to the predicted class. Let's test a more complex sentence that contains mixed sentiments.\n\nIn the example below we pass `class_name=\"NEGATIVE\"` as an argument indicating we would like the attributions to be explained for the **NEGATIVE** class regardless of what the actual prediction is. Effectively because this is a binary classifier we are getting the inverse attributions.\n\n```python\ncls_explainer = SequenceClassificationExplainer(model, tokenizer)\nattributions = cls_explainer(\"I love you, I like you, I also kinda dislike you\", class_name=\"NEGATIVE\")\n```\n\nIn this case, `predicted_class_name` still returns a prediction of the **POSITIVE** class, because the model has generated the same prediction but nonetheless we are interested in looking at the attributions for the negative class regardless of the predicted result.\n\n```python\n>>> cls_explainer.predicted_class_name\n'POSITIVE'\n```\n\nBut when we visualize the attributions we can see that the words \"**...kinda dislike**\" are contributing to a prediction of the \"NEGATIVE\"\nclass.\n\n```python\ncls_explainer.visualize(\"distilbert_negative_attr.html\")\n```\n\n<a href=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/distilbert_example_negative.png\">\n<img src=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/distilbert_example_negative.png\" width=\"80%\" height=\"80%\" align=\"center\" />\n</a>\n\nGetting attributions for different classes is particularly insightful for multiclass problems as it allows you to inspect model predictions for a number of different classes and sanity-check that the model is \"looking\" at the right things.\n\nFor a detailed explanation of this example please checkout this [multiclass classification notebook.](notebooks/multiclass_classification_example.ipynb)\n\n### Pairwise Sequence Classification\n\nThe `PairwiseSequenceClassificationExplainer` is a variant of the the `SequenceClassificationExplainer` that is designed to work with classification models that expect the input sequence to be two inputs separated by a models' separator token. 
Common examples of this are [NLI models](https://arxiv.org/abs/1705.02364) and [Cross-Encoders ](https://www.sbert.net/docs/pretrained_cross-encoders.html) which are commonly used to score two inputs similarity to one another.\n\nThis explainer calculates pairwise attributions for two passed inputs `text1` and `text2` using the model\nand tokenizer given in the constructor.\n\nAlso, since a common use case for pairwise sequence classification is to compare two inputs similarity - models of this nature typically only have a single output node rather than multiple for each class. The pairwise sequence classification has some useful utility functions to make interpreting single node outputs clearer.\n\nBy default for models that output a single node the attributions are with respect to the inputs pushing the scores closer to 1.0, however if you want to see the\nattributions with respect to scores closer to 0.0 you can pass `flip_sign=True`. For similarity\nbased models this is useful, as the model might predict a score closer to 0.0 for the two inputs\nand in that case we would flip the attributions sign to explain why the two inputs are dissimilar.\n\nLet's start by initializing a cross-encoder model and tokenizer from the suite of [pre-trained cross-encoders ](https://www.sbert.net/docs/pretrained_cross-encoders.html)provided by [sentence-transformers](https://github.com/UKPLab/sentence-transformers).\n\nFor this example we are using `\"cross-encoder/ms-marco-MiniLM-L-6-v2\"`, a high quality cross-encoder trained on the [MSMarco dataset](https://github.com/microsoft/MSMARCO-Passage-Ranking) a passage ranking dataset for question answering and machine reading comprehension.\n\n```python\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\n\nfrom transformers_interpret import PairwiseSequenceClassificationExplainer\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\"cross-encoder/ms-marco-MiniLM-L-6-v2\")\ntokenizer = AutoTokenizer.from_pretrained(\"cross-encoder/ms-marco-MiniLM-L-6-v2\")\n\npairwise_explainer = PairwiseSequenceClassificationExplainer(model, tokenizer)\n\n# the pairwise explainer requires two string inputs to be passed, in this case given the nature of the model\n# we pass a query string and a context string. 
The question we are asking of our model is \"does this context contain a valid answer to our question\"\n# the higher the score the better the fit.\n\nquery = \"How many people live in Berlin?\"\ncontext = \"Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.\"\npairwise_attr = pairwise_explainer(query, context)\n```\n\nWhich returns the following attributions:\n\n```python\n>>> pairwise_attr\n[('[CLS]', 0.0),\n ('how', -0.037558652124213034),\n ('many', -0.40348581975409786),\n ('people', -0.29756140282349425),\n ('live', -0.48979015417391764),\n ('in', -0.17844527885888117),\n ('berlin', 0.3737346097442739),\n ('?', -0.2281428913480142),\n ('[SEP]', 0.0),\n ('berlin', 0.18282430604641564),\n ('has', 0.039114659489254834),\n ('a', 0.0820056652212297),\n ('population', 0.35712150914643026),\n ('of', 0.09680870840224687),\n ('3', 0.04791760029513795),\n (',', 0.040330986539774266),\n ('520', 0.16307677913176166),\n (',', -0.005919693904602767),\n ('03', 0.019431649515841844),\n ('##1', -0.0243808667024702),\n ('registered', 0.07748341753369632),\n ('inhabitants', 0.23904087299731255),\n ('in', 0.07553221327346359),\n ('an', 0.033112821611999875),\n ('area', -0.025378852244447532),\n ('of', 0.026526373859562906),\n ('89', 0.0030700151809002147),\n ('##1', -0.000410387092186983),\n ('.', -0.0193147139126114),\n ('82', 0.0073800833347678774),\n ('square', 0.028988305990861576),\n ('kilometers', 0.02071182933829008),\n ('.', -0.025901070914318036),\n ('[SEP]', 0.0)]\n```\n\n#### Visualize Pairwise Classification attributions\n\nVisualizing the pairwise attributions is no different to the sequence classification explaine. We can see that in both the `query` and `context` there is a lot of positive attribution for the word `berlin` as well the words `population` and `inhabitants` in the `context`, good signs that our model understands the textual context of the question asked.\n\n```python\npairwise_explainer.visualize(\"cross_encoder_attr.html\")\n```\n\n<a href=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/pairwise_cross_encoder_example.png\">\n<img src=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/pairwise_cross_encoder_example.png\" width=\"100%\" height=\"100%\" align=\"center\" />\n</a>\n\nIf we were more interested in highlighting the input attributions that pushed the model away from the positive class of this single node output we could pass:\n\n```python\npairwise_attr = explainer(query, context, flip_sign=True)\n```\n\nThis simply inverts the sign of the attributions ensuring that they are with respect to the model outputting 0 rather than 1.\n\n</details>\n\n<details><summary>MultiLabel Classification Explainer</summary>\n<p>\n\nThis explainer is an extension of the `SequenceClassificationExplainer` and is thus compatible with all sequence classification models from the Transformers package. The key change in this explainer is that it caclulates attributions for each label in the model's config and returns a dictionary of word attributions w.r.t to each label. 
</details>

<details><summary>MultiLabel Classification Explainer</summary>
<p>

This explainer is an extension of the `SequenceClassificationExplainer` and is thus compatible with all sequence classification models from the Transformers package. The key change in this explainer is that it calculates attributions for each label in the model's config and returns a dictionary of word attributions w.r.t. each label. The `visualize()` method also displays a table of attributions, with attributions calculated per label.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import MultiLabelClassificationExplainer

model_name = "j-hartmann/emotion-english-distilroberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


cls_explainer = MultiLabelClassificationExplainer(model, tokenizer)


word_attributions = cls_explainer("There were many aspects of the film I liked, but it was frightening and gross in parts. My parents hated it.")
```

This produces a dictionary of word attributions mapping labels to a list of tuples for each word and its attribution score.

<details><summary>Click to see word attribution dictionary</summary>

```python
>>> word_attributions
{'anger': [('<s>', 0.0),
           ('There', 0.09002208622000409),
           ('were', -0.025129709879675187),
           ('many', -0.028852677974079328),
           ('aspects', -0.06341968013631565),
           ('of', -0.03587626320752477),
           ('the', -0.014813095892961287),
           ('film', -0.14087587475098232),
           ('I', 0.007367876912617766),
           ('liked', -0.09816592066307557),
           (',', -0.014259517291745674),
           ('but', -0.08087144668471376),
           ('it', -0.10185214349220136),
           ('was', -0.07132244710777856),
           ('frightening', -0.4125361737439814),
           ('and', -0.021761663818889918),
           ('gross', -0.10423745223600908),
           ('in', -0.02383646952201854),
           ('parts', -0.027137622525091033),
           ('.', -0.02960415694062459),
           ('My', 0.05642774605113695),
           ('parents', 0.11146648216326158),
           ('hated', 0.8497975489280364),
           ('it', 0.05358116678115284),
           ('.', -0.013566277162080632),
           ('', 0.09293256725788422),
           ('</s>', 0.0)],
 'disgust': [('<s>', 0.0),
             ('There', -0.035296263203072),
             ('were', -0.010224922196739717),
             ('many', -0.03747571761725605),
             ('aspects', 0.007696321643436715),
             ('of', 0.0026740873113235107),
             ('the', 0.0025752851265661335),
             ('film', -0.040890035285783645),
             ('I', -0.014710007408208579),
             ('liked', 0.025696806663391577),
             (',', -0.00739107098314569),
             ('but', 0.007353791868893654),
             ('it', -0.00821368234753605),
             ('was', 0.005439709067819798),
             ('frightening', -0.8135974168445725),
             ('and', -0.002334953123414774),
             ('gross', 0.2366024374426269),
             ('in', 0.04314772995234148),
             ('parts', 0.05590472194035334),
             ('.', -0.04362554293972562),
             ('My', -0.04252694977895808),
             ('parents', 0.051580790911406944),
             ('hated', 0.5067406070057585),
             ('it', 0.0527491071885104),
             ('.', -0.008280280618652273),
             ('', 0.07412384603053103),
             ('</s>', 0.0)],
 'fear': [('<s>', 0.0),
          ('There', -0.019615758046045408),
          ('were', 0.008033402634196246),
          ('many', 0.027772367717635423),
          ('aspects', 0.01334130725685673),
          ('of', 0.009186049991879768),
          ('the', 0.005828877177384549),
          ('film', 0.09882910753644959),
          ('I', 0.01753565003544039),
          ('liked', 0.02062597344466885),
          (',', -0.004469530636560965),
          ('but', -0.019660439408176984),
          ('it', 0.0488084071292538),
          ('was', 0.03830859527501167),
          ('frightening', 0.9526443954511705),
          ('and', 0.02535156284103706),
          ('gross', -0.10635301961551227),
          ('in', -0.019190425328209065),
          ('parts', -0.01713006453323631),
          ('.', 0.015043169035757302),
          ('My', 0.017068079071414916),
          ('parents', -0.0630781275517486),
          ('hated', -0.23630028921273583),
          ('it', -0.056057044429020306),
          ('.', 0.0015102052077844612),
          ('', -0.010045048665404609),
          ('</s>', 0.0)],
 'joy': [('<s>', 0.0),
         ('There', 0.04881772670614576),
         ('were', -0.0379316152427468),
         ('many', -0.007955371089444285),
         ('aspects', 0.04437296429416574),
         ('of', -0.06407011137335743),
         ('the', -0.07331568926973099),
         ('film', 0.21588462483311055),
         ('I', 0.04885724513463952),
         ('liked', 0.5309510543276107),
         (',', 0.1339765195225006),
         ('but', 0.09394079060730279),
         ('it', -0.1462792330432028),
         ('was', -0.1358591558323458),
         ('frightening', -0.22184169339341142),
         ('and', -0.07504142930419291),
         ('gross', -0.005472075984252812),
         ('in', -0.0942152657437379),
         ('parts', -0.19345218754215965),
         ('.', 0.11096247277185402),
         ('My', 0.06604512262645984),
         ('parents', 0.026376541098236207),
         ('hated', -0.4988319510231699),
         ('it', -0.17532499366236615),
         ('.', -0.022609976138939034),
         ('', -0.43417114685294833),
         ('</s>', 0.0)],
 'neutral': [('<s>', 0.0),
             ('There', 0.045984598036642205),
             ('were', 0.017142566357474697),
             ('many', 0.011419348619472542),
             ('aspects', 0.02558593440287365),
             ('of', 0.0186162232003498),
             ('the', 0.015616416841815963),
             ('film', -0.021190511300570092),
             ('I', -0.03572427925026324),
             ('liked', 0.027062554960050455),
             (',', 0.02089914209290366),
             ('but', 0.025872618597570115),
             ('it', -0.002980407262316265),
             ('was', -0.022218157611174086),
             ('frightening', -0.2982516449116045),
             ('and', -0.01604643529040792),
             ('gross', -0.04573829263548096),
             ('in', -0.006511536166676108),
             ('parts', -0.011744224307968652),
             ('.', -0.01817041167875332),
             ('My', -0.07362312722231429),
             ('parents', -0.06910711601816408),
             ('hated', -0.9418903509267312),
             ('it', 0.022201795222373488),
             ('.', 0.025694319747309045),
             ('', 0.04276690822325994),
             ('</s>', 0.0)],
 'sadness': [('<s>', 0.0),
             ('There', 0.028237893283377526),
             ('were', -0.04489910545229568),
             ('many', 0.004996044977269471),
             ('aspects', -0.1231292680125582),
             ('of', -0.04552690725956671),
             ('the', -0.022077819961347042),
             ('film', -0.14155752357877663),
             ('I', 0.04135347872193571),
             ('liked', -0.3097732540526099),
             (',', 0.045114660009053134),
             ('but', 0.0963352125332619),
             ('it', -0.08120617610094617),
             ('was', -0.08516150809170213),
             ('frightening', -0.10386889639962761),
             ('and', -0.03931986389970189),
             ('gross', -0.2145059013625132),
             ('in', -0.03465423285571697),
             ('parts', -0.08676627134611635),
             ('.', 0.19025217371906333),
             ('My', 0.2582092561303794),
             ('parents', 0.15432351476960307),
             ('hated', 0.7262186310977987),
             ('it', -0.029160655114499095),
             ('.', -0.002758524253450406),
             ('', -0.33846410359182094),
             ('</s>', 0.0)],
 'surprise': [('<s>', 0.0),
              ('There', 0.07196110795254315),
              ('were', 0.1434314520711312),
              ('many', 0.08812238369489701),
              ('aspects', 0.013432396769890982),
              ('of', -0.07127508805657243),
              ('the', -0.14079766624810955),
              ('film', -0.16881201614906485),
              ('I', 0.040595668935112135),
              ('liked', 0.03239855530171577),
              (',', -0.17676382558158257),
              ('but', -0.03797939330341559),
              ('it', -0.029191325089641736),
              ('was', 0.01758013584108571),
              ('frightening', -0.221738963726823),
              ('and', -0.05126920277135527),
              ('gross', -0.33986913466614044),
              ('in', -0.018180366628697),
              ('parts', 0.02939418603252064),
              ('.', 0.018080129971003226),
              ('My', -0.08060162218059498),
              ('parents', 0.04351719139081836),
              ('hated', -0.6919028585285265),
              ('it', 0.0009574844165327357),
              ('.', -0.059473118237873344),
              ('', -0.465690452620123),
              ('</s>', 0.0)]}
```

</details>

#### Visualize MultiLabel Classification attributions

Sometimes the numeric attributions can be difficult to read, particularly in instances where there is a lot of text. To help with that we also provide the `visualize()` method, which uses Captum's in-built visualization library to create an HTML file highlighting the attributions. For this explainer, attributions are shown w.r.t. each label.

If you are in a notebook, calls to the `visualize()` method will display the visualization in-line. Alternatively you can pass a filepath in as an argument and an HTML file will be created, allowing you to view the explanation HTML in your browser.

```python
cls_explainer.visualize("multilabel_viz.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/multilabel_example.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/multilabel_example.png" width="80%" height="80%" align="center"/>
</a>
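The returned dictionary is also easy to post-process directly. As a rough sketch in plain Python (using the `word_attributions` dict from above), you could pull out the most positively attributed word for each emotion label:

```python
# For each label, find the word with the highest positive attribution score.
# `word_attributions` maps label -> list of (word, score) tuples, as shown above.
for label, scores in word_attributions.items():
    word, score = max(scores, key=lambda pair: pair[1])
    print(f"{label}: {word} ({score:.3f})")
# e.g. both 'anger' and 'sadness' are driven most strongly by the word 'hated'.
```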
</details>

<details><summary>Zero Shot Classification Explainer</summary>

_Models using this explainer must have been previously trained on NLI classification downstream tasks and have a label in the model's config called either "entailment" or "ENTAILMENT"._

This explainer allows attributions to be calculated for zero shot classification models. To achieve this we use the same methodology employed by Hugging Face: zero shot classification works by exploiting the "entailment" label of NLI models. Here is a [link](https://arxiv.org/abs/1909.00161) to a paper explaining more about it. A list of NLI models guaranteed to be compatible with this explainer can be found on the [model hub](https://huggingface.co/models?filter=pytorch&pipeline_tag=zero-shot-classification).

Let's start by initializing a transformers' sequence classification model and tokenizer trained specifically on an NLI task, and passing it to the `ZeroShotClassificationExplainer`.

For this example we are using `facebook/bart-large-mnli`, a checkpoint for a bart-large model trained on the [MNLI dataset](https://huggingface.co/datasets/multi_nli). This model typically predicts whether a sentence pair is an entailment, neutral, or a contradiction; however, for zero-shot we only look at the entailment label.

Notice that we pass our own custom labels `["finance", "technology", "sports"]` when calling the explainer. Any number of labels can be passed, including as few as one. Whichever label scores highest for entailment can be accessed via `predicted_label`; however, the attributions themselves are calculated for every label. If you want to see the attributions for a particular label it is recommended to pass in just that one label, and the attributions are then guaranteed to be calculated w.r.t. that label (see the short example after the visualization below).

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import ZeroShotClassificationExplainer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")


zero_shot_explainer = ZeroShotClassificationExplainer(model, tokenizer)


word_attributions = zero_shot_explainer(
    "Today apple released the new Macbook showing off a range of new features found in the proprietary silicon chip computer. ",
\",\n    labels = [\"finance\", \"technology\", \"sports\"],\n)\n\n```\n\nWhich will return the following dict of attribution tuple lists for each label:\n\n```python\n>>> word_attributions\n{'finance': [('<s>', 0.0),\n  ('Today', 0.0),\n  ('apple', -0.016100065046282107),\n  ('released', 0.3348383988281792),\n  ('the', -0.8932952916127369),\n  ('new', 0.14207183688642497),\n  ('Mac', 0.016309545780430777),\n  ('book', -0.06956802041125129),\n  ('showing', -0.12661404114316252),\n  ('off', -0.11470154900720078),\n  ('a', -0.03299250484912159),\n  ('range', -0.002532332125100561),\n  ('of', -0.022451943898971004),\n  ('new', -0.01859870581213379),\n  ('features', -0.020774327263810944),\n  ('found', -0.007734346326330102),\n  ('in', 0.005100588658589585),\n  ('the', 0.04711084622588314),\n  ('proprietary', 0.046352064964644286),\n  ('silicon', -0.0033502000158946127),\n  ('chip', -0.010419324929115785),\n  ('computer', -0.11507972995022273),\n  ('.', 0.12237840300907425)],\n 'technology': [('<s>', 0.0),\n  ('Today', 0.0),\n  ('apple', 0.22505152647747717),\n  ('released', -0.16164146624851905),\n  ('the', 0.5026975657258089),\n  ('new', 0.052589263167955536),\n  ('Mac', 0.2528325960993759),\n  ('book', -0.06445090203729663),\n  ('showing', -0.21204922293777534),\n  ('off', 0.06319714817612732),\n  ('a', 0.032048012090796815),\n  ('range', 0.08553079346908955),\n  ('of', 0.1409201107994034),\n  ('new', 0.0515261917112576),\n  ('features', -0.09656406466213506),\n  ('found', 0.02336613296843605),\n  ('in', -0.0011649894272190678),\n  ('the', 0.14229640664777807),\n  ('proprietary', -0.23169065661847646),\n  ('silicon', 0.5963924257008087),\n  ('chip', -0.19908474233975806),\n  ('computer', 0.030620295844734646),\n  ('.', 0.1995076958535378)],\n 'sports': [('<s>', 0.0),\n  ('Today', 0.0),\n  ('apple', 0.1776618164760026),\n  ('released', 0.10067773539491479),\n  ('the', 0.4813466937627506),\n  ('new', -0.018555244191949295),\n  ('Mac', 0.016338241133536224),\n  ('book', 0.39311969562943677),\n  ('showing', 0.03579210145504227),\n  ('off', 0.0016710813632476176),\n  ('a', 0.04367940034297261),\n  ('range', 0.06076859006993011),\n  ('of', 0.11039711284328052),\n  ('new', 0.003932416031994724),\n  ('features', -0.009660883377622588),\n  ('found', -0.06507586539836184),\n  ('in', 0.2957812911667922),\n  ('the', 0.1584106228974514),\n  ('proprietary', 0.0005789280604917397),\n  ('silicon', -0.04693795680472678),\n  ('chip', -0.1699508539245465),\n  ('computer', -0.4290823663975582),\n  ('.', 0.469314992542427)]}\n```\n\nWe can find out which label was predicted with:\n\n```python\n>>> zero_shot_explainer.predicted_label\n'technology'\n```\n\n#### Visualize Zero Shot Classification attributions\n\nFor the `ZeroShotClassificationExplainer` the visualize() method returns a table similar to the `SequenceClassificationExplainer` but with attributions for every label.\n\n```python\nzero_shot_explainer.visualize(\"zero_shot.html\")\n```\n\n<a href=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/zero_shot_example.png\">\n<img src=\"https://github.com/cdpierse/transformers-interpret/blob/master/images/zero_shot_example.png\" width=\"100%\" height=\"100%\" align=\"center\" />\n</a>\n\n</details>\n\n<details><summary>Question Answering Explainer</summary>\n\nLet's start by initializing a transformers' Question Answering model and tokenizer, and running it through the `QuestionAnsweringExplainer`.\n\nFor this example we are using `bert-large-uncased-whole-word-masking-finetuned-squad`, 
</details>

<details><summary>Question Answering Explainer</summary>

Let's start by initializing a transformers' Question Answering model and tokenizer, and running it through the `QuestionAnsweringExplainer`.

For this example we are using `bert-large-uncased-whole-word-masking-finetuned-squad`, a BERT model finetuned on SQuAD.

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
from transformers_interpret import QuestionAnsweringExplainer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

qa_explainer = QuestionAnsweringExplainer(
    model,
    tokenizer,
)

context = """
In Artificial Intelligence and machine learning, Natural Language Processing relates to the usage of machines to process and understand human language.
Many researchers currently work in this space.
"""

word_attributions = qa_explainer(
    "What is natural language processing ?",
    context,
)
```

Which will return the following dict containing word attributions for both the predicted start and end positions of the answer.

```python
>>> word_attributions
{'start': [('[CLS]', 0.0),
  ('what', 0.9177170660377296),
  ('is', 0.13382234898765258),
  ('natural', 0.08061747350142005),
  ('language', 0.013138062762511409),
  ('processing', 0.11135923869816286),
  ('?', 0.00858057388924361),
  ('[SEP]', -0.09646373141894966),
  ('in', 0.01545633993975799),
  ('artificial', 0.0472082598707737),
  ('intelligence', 0.026687249355110867),
  ('and', 0.01675371260058537),
  ('machine', -0.08429502436554961),
  ('learning', 0.0044827685126163355),
  (',', -0.02401013152520878),
  ('natural', -0.0016756080249823537),
  ('language', 0.0026815068421401885),
  ('processing', 0.06773157580722854),
  ('relates', 0.03884601576992908),
  ('to', 0.009783797821526368),
  ('the', -0.026650922910540952),
  ('usage', -0.010675019721821147),
  ('of', 0.015346787885898537),
  ('machines', -0.08278008270160107),
  ('to', 0.12861387892768839),
  ('process', 0.19540146386642743),
  ('and', 0.009942879959615826),
  ('understand', 0.006836894853320319),
  ('human', 0.05020451122579102),
  ('language', -0.012980795199301),
  ('.', 0.00804358248127772),
  ('many', 0.02259009321498161),
  ('researchers', -0.02351650942555469),
  ('currently', 0.04484573078852946),
  ('work', 0.00990399948294476),
  ('in', 0.01806961211334615),
  ('this', 0.13075899776164499),
  ('space', 0.004298315347838973),
  ('.', -0.003767904539347979),
  ('[SEP]', -0.08891544093454595)],
 'end': [('[CLS]', 0.0),
  ('what', 0.8227231947501547),
  ('is', 0.0586864942952253),
  ('natural', 0.0938903563379123),
  ('language', 0.058596976016400674),
  ('processing', 0.1632374290269829),
  ('?', 0.09695686057123237),
  ('[SEP]', -0.11644447033554006),
  ('in', -0.03769172371919206),
  ('artificial', 0.06736158404049886),
  ('intelligence', 0.02496399001288386),
  ('and', -0.03526028847762427),
  ('machine', -0.20846431491771975),
  ('learning', 0.00904892847529654),
  (',', -0.02949905488474854),
  ('natural', 0.011024507784743872),
  ('language', 0.0870741751282507),
  ('processing', 0.11482449622317169),
  ('relates', 0.05008962090922852),
  ('to', 0.04079118393166258),
  ('the', -0.005069048880616451),
  ('usage', -0.011992752445836278),
  ('of', 0.01715183316135495),
  ('machines', -0.29823535624026265),
  ('to', -0.0043760160855057925),
  ('process', 0.10503217484645223),
  ('and', 0.06840313586976698),
  ('understand', 0.057184000619403944),
  ('human', 0.0976805947708315),
  ('language', 0.07031163646606695),
  ('.', 0.10494566513897102),
  ('many', 0.019227154676079487),
  ('researchers', -0.038173913797800885),
  ('currently', 0.03916641120002003),
  ('work', 0.03705371672439422),
  ('in', -0.0003155975107591203),
  ('this', 0.17254932354022232),
  ('space', 0.0014311439625599323),
  ('.', 0.060637932829867736),
  ('[SEP]', -0.09186286505530596)]}
```

We can get the text span for the predicted answer with:

```python
>>> qa_explainer.predicted_answer
'usage of machines to process and understand human language'
```

#### Visualize Question Answering attributions

For the `QuestionAnsweringExplainer` the `visualize()` method returns a table with two rows. The first row represents the attributions for the answer's start position and the second row represents the attributions for the answer's end position.

```python
qa_explainer.visualize("bert_qa_viz.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/bert_qa_explainer.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/bert_qa_explainer.png" width="120%" height="120%" align="center" />
</a>

</details>

<details><summary>Token Classification (NER) explainer</summary>

_This is currently an experimental explainer under active development and is not yet fully tested. The explainer's API is subject to change, as are the attribution methods; if you find any bugs please let me know._

Let's start by initializing a transformers' Token Classification model and tokenizer, and running it through the `TokenClassificationExplainer`.

For this example we are using `dslim/bert-base-NER`, a BERT model finetuned on the CoNLL-2003 Named Entity Recognition dataset.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers_interpret import TokenClassificationExplainer

model = AutoModelForTokenClassification.from_pretrained('dslim/bert-base-NER')
tokenizer = AutoTokenizer.from_pretrained('dslim/bert-base-NER')

ner_explainer = TokenClassificationExplainer(
    model,
    tokenizer,
)

sample_text = "We visited Paris last weekend, where Emmanuel Macron lives."

word_attributions = ner_explainer(sample_text, ignored_labels=['O'])

```

In order to reduce the number of attributions that are calculated, we tell the explainer to ignore tokens whose predicted label is `'O'`.
We could also tell the explainer to ignore tokens at certain positions by passing a list via the `ignored_indexes` parameter.

This will return the following dict, including the predicted label and the attributions for each token, except those predicted as 'O':

```python
>>> word_attributions
{'paris': {'label': 'B-LOC',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.014352325471387907),
   ('visited', 0.32915222186559123),
   ('paris', 0.9086791784795596),
   ('last', 0.15181203147624034),
   ('weekend', 0.14400210630677038),
   (',', 0.01899744327012935),
   ('where', -0.039402005463239465),
   ('emmanuel', 0.061095284002642025),
   ('macro', 0.004192922551105228),
   ('##n', 0.09446355513057757),
   ('lives', -0.028724312616455003),
   ('.', 0.08099007392937585),
   ('[SEP]', 0.0)]},
 'emmanuel': {'label': 'B-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.006933030636686712),
   ('visited', 0.10396962390436904),
   ('paris', 0.14540758744233165),
   ('last', 0.08024018944451371),
   ('weekend', 0.10687970996804418),
   (',', 0.1793198466387937),
   ('where', 0.3436407835483767),
   ('emmanuel', 0.8774892642652167),
   ('macro', 0.03559399361048316),
   ('##n', 0.1516315604785551),
   ('lives', 0.07056441327498127),
   ('.', -0.025820924624605487),
   ('[SEP]', 0.0)]},
 'macro': {'label': 'I-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', 0.05578067326280157),
   ('visited', 0.00857021283406586),
   ('paris', 0.16559056506114297),
   ('last', 0.08285256685903823),
   ('weekend', 0.10468727443796395),
   (',', 0.09949509071515888),
   ('where', 0.3642458274356929),
   ('emmanuel', 0.7449335213978788),
   ('macro', 0.3794625659183485),
   ('##n', -0.2599031433800762),
   ('lives', 0.20563450682196147),
   ('.', -0.015607017319486929),
   ('[SEP]', 0.0)]},
 '##n': {'label': 'I-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', 0.025194121717285252),
   ('visited', -0.007415022865239864),
   ('paris', 0.09478357303107598),
   ('last', 0.06927939834474463),
   ('weekend', 0.0672008033510708),
   (',', 0.08316907214363504),
   ('where', 0.3784915854680165),
   ('emmanuel', 0.7729352621546081),
   ('macro', 0.4148652759139777),
   ('##n', -0.20853534512145033),
   ('lives', 0.09445057087678274),
   ('.', -0.094274985907366),
   ('[SEP]', 0.0)]},
 '[SEP]': {'label': 'B-LOC',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.3694351403796742),
   ('visited', 0.1699038407402483),
   ('paris', 0.5461587414992369),
   ('last', 0.0037948102770307517),
   ('weekend', 0.1628100955702496),
   (',', 0.4513093410909263),
   ('where', -0.09577409464161038),
   ('emmanuel', 0.48499459835388914),
   ('macro', -0.13528905587653023),
   ('##n', 0.14362969934754344),
   ('lives', -0.05758007024257254),
   ('.', -0.13970977266152554),
   ('[SEP]', 0.0)]}}
```

#### Visualize NER attributions

For the `TokenClassificationExplainer` the `visualize()` method returns a table with as many rows as explained tokens.

```python
ner_explainer.visualize("bert_ner_viz.html")
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/bert_ner_explainer.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/bert_ner_explainer.png" width="120%" height="120%" align="center" />
</a>
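The nested dictionary is also straightforward to post-process. As a rough sketch in plain Python (names are ours, not part of the library), you could print the predicted label for each explained token together with its most influential input token:

```python
# Summarize the NER attributions: for each explained token, show its predicted
# label and the input token with the largest absolute attribution (ignoring specials).
for token, result in word_attributions.items():
    scores = [(t, s) for t, s in result["attribution_scores"] if t not in ("[CLS]", "[SEP]")]
    top_token, top_score = max(scores, key=lambda pair: abs(pair[1]))
    print(f"{token} -> {result['label']} (driven mostly by '{top_token}', {top_score:.2f})")
```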
For more details about how the `TokenClassificationExplainer` works, you can check the notebook [notebooks/ner_example.ipynb](notebooks/ner_example.ipynb).

</details>

### Vision Explainers

<details><summary> Image Classification Explainer </summary>

<p>

The `ImageClassificationExplainer` is designed to work with all models from the Transformers library that are trained for image classification (Swin, ViT etc). It provides attributions for every pixel in the image, which can be easily visualized using the explainer's built-in `visualize` method.

Initialising an image classification explainer is very simple: all you need is an image classification model finetuned or trained to work with Hugging Face, and its feature extractor.

For this example we are using `google/vit-base-patch16-224`, a Vision Transformer (ViT) model pre-trained on ImageNet-21k that predicts from 1000 possible classes.

```python
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
from transformers_interpret import ImageClassificationExplainer
from PIL import Image
import requests

model_name = "google/vit-base-patch16-224"
model = AutoModelForImageClassification.from_pretrained(model_name)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# With both the model and feature extractor initialized we are now able to get explanations on an image; we will use a simple image of a golden retriever.
image_link = "https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F47%2F2020%2F08%2F16%2Fgolden-retriever-177213599-2000.jpg"

image = Image.open(requests.get(image_link, stream=True).raw)

image_classification_explainer = ImageClassificationExplainer(model=model, feature_extractor=feature_extractor)

image_attributions = image_classification_explainer(
    image
)

print(image_attributions.shape)
```

Which will print the shape of the returned attribution tensor:

```python
>>> torch.Size([1, 3, 224, 224])
```
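Since the attributions come back as a standard tensor with the same shape as the model input, you can also post-process them yourself before using the built-in visualizations. A small sketch (assuming PyTorch is available, which the explainer already requires) that collapses the channel dimension into a single saliency map:

```python
import torch

# Collapse the channel dimension to get a single (224, 224) saliency map,
# then normalize it to [0, 1] for easier inspection or custom plotting.
saliency = image_attributions.abs().sum(dim=1).squeeze(0)
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min())
print(saliency.shape)  # torch.Size([224, 224])
```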
#### Visualizing Image Attributions

Because we are dealing with images, visualization is even more straightforward than with text models.

Attributions can be easily visualized using the `visualize` method of the explainer. There are currently 4 supported visualization methods.

- `heatmap` - a heatmap of positive and negative attributions is drawn using the dimensions of the image.
- `overlay` - the heatmap is overlaid on a grayscale version of the original image.
- `masked_image` - the absolute value of attributions is used to create a mask over the original image.
- `alpha_scaling` - sets the alpha channel (transparency) of each pixel equal to its normalized attribution value.

#### Heatmap

```python
image_classification_explainer.visualize(
    method="heatmap",
    side_by_side=True,
    outlier_threshold=0.03
)
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/heatmap_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/heatmap_sbs.png" width="100%" height="100%" align="center"/>
</a>

#### Overlay

```python
image_classification_explainer.visualize(
    method="overlay",
    side_by_side=True,
    outlier_threshold=0.03
)
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/overlay_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/overlay_sbs.png" width="100%" height="100%" align="center"/>
</a>

#### Masked Image

```python
image_classification_explainer.visualize(
    method="masked_image",
    side_by_side=True,
    outlier_threshold=0.03
)
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/masked_image_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/masked_image_sbs.png" width="100%" height="100%" align="center"/>
</a>

#### Alpha Scaling

```python
image_classification_explainer.visualize(
    method="alpha_scaling",
    side_by_side=True,
    outlier_threshold=0.03
)
```

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/alpha_scaling_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/alpha_scaling_sbs.png" width="100%" height="100%" align="center"/>
</a>

</details>

## Future Development

This package is still in active development and there is much more planned. For a 1.0.0 release we're aiming to have:

- Clean and thorough documentation website
- ~~Support for Question Answering models~~
- ~~Support for NER models~~
- ~~Support for Zero Shot Classification models.~~
- ~~Ability to show attributions for multiple embedding types, rather than just the word embeddings.~~
- Support for SentenceTransformer embedding models and other image embeddings
- Additional attribution methods
- ~~Support for vision transformer models~~
- In depth examples
- ~~A nice logo~~ (thanks @Voyz)
- and more... feel free to submit your suggestions!
## Contributing

If you would like to make a contribution please check out our [contribution guidelines](https://github.com/cdpierse/transformers-interpret/blob/master/CONTRIBUTING.md).

## Questions / Get In Touch

The maintainer of this repository is [@cdpierse](https://github.com/cdpierse).

If you have any questions, suggestions, or would like to make a contribution (please do 😁), feel free to get in touch at charlespierse@gmail.com

I'd also highly suggest checking out [Captum](https://captum.ai/) if you find model explainability and interpretability interesting.

This package stands on the shoulders of the incredible work being done by the teams at [Pytorch Captum](https://captum.ai/) and [Hugging Face](https://huggingface.co/) and would not exist if not for the amazing job they are both doing in the fields of ML and model interpretability respectively.

## Reading and Resources

<details><summary>Captum</summary>

<p>

All of the attributions within this package are calculated using PyTorch's explainability package [Captum](https://captum.ai/). See below for some useful links related to Captum.

- [Captum Algorithm Overview](https://captum.ai/docs/algorithms)
- [Bert QA Example](https://captum.ai/tutorials/Bert_SQUAD_Interpret) - an implementation achieved purely using Captum.
- [API Reference](https://captum.ai/api/)
- [Model Interpretability with Captum - Narine Kokhilkyan (Video)](https://www.youtube.com/watch?v=iVSIFm0UN9I)

</details>

<details><summary>Attribution</summary>

<p>

Integrated Gradients (IG), and a variation of it, Layer Integrated Gradients (LIG), are the core attribution methods on which Transformers Interpret is currently built. Below are some useful resources, including the original paper and some video links explaining the inner mechanics. If you are curious about what is going on inside of Transformers Interpret I highly recommend checking out at least one of these resources.

- [Axiomatic Attributions for Deep Networks](https://arxiv.org/abs/1703.01365) - the original paper [2017] in which Integrated Gradients was specified.
- [Fiddler AI YouTube video on IG](https://www.youtube.com/watch?v=9AaDc35JYiI)
- [Henry AI Labs YouTube Primer on IG](https://www.youtube.com/watch?v=MB8KYX5UzKw)
- [Explaining Explanations: Axiomatic Feature Interactions for Deep Networks](http://export.arxiv.org/abs/2002.04138) - a more recent paper [2020] extending the work of the original paper.

</details>

<details><summary>Miscellaneous</summary>

**Captum Links**

Below are some links I used to help me get this package together using Captum. Thank you to @davidefiocco for your very insightful GIST.

- [Link to useful GIST on captum](https://gist.github.com/davidefiocco/3e1a0ed030792230a33c726c61f6b3a5)
- [Link to runnable colab of captum with BERT](https://colab.research.google.com/drive/1snFbxdVDtL3JEFW7GNfRs1PZKgNHfoNz)

[transformers]: https://huggingface.co/transformers/

</details>
    "bugtrack_url": null,
    "license": "",
    "summary": "Model explainability that works seamlessly with \ud83e\udd17 transformers. Explain your transformers model in just 2 lines of code.",
    "version": "0.10.0",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a84c19098012fa903e9cea195fc381db0504f73fe3fcf9ccf4338f4b2b157092",
                "md5": "21a37a54108a6adff3624ff685f99494",
                "sha256": "851f44370d5977392bde820d2caa742e22a312dcbff038c79c0771e11037ba2c"
            },
            "downloads": -1,
            "filename": "transformers_interpret-0.10.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "21a37a54108a6adff3624ff685f99494",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7,<4.0",
            "size": 45798,
            "upload_time": "2023-04-06T23:05:32",
            "upload_time_iso_8601": "2023-04-06T23:05:32.416228Z",
            "url": "https://files.pythonhosted.org/packages/a8/4c/19098012fa903e9cea195fc381db0504f73fe3fcf9ccf4338f4b2b157092/transformers_interpret-0.10.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fa3c23798b5fda387840439aa1a0452719a1bf0f11d430da37131ad99560f2b9",
                "md5": "c04b32122f570327e6fd59c3a3c8186f",
                "sha256": "bf0e312ba5fd416249ec8fc383e57181b632a7a64dbf51c7ac78d7002140ade7"
            },
            "downloads": -1,
            "filename": "transformers-interpret-0.10.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c04b32122f570327e6fd59c3a3c8186f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7,<4.0",
            "size": 63177,
            "upload_time": "2023-04-06T23:05:29",
            "upload_time_iso_8601": "2023-04-06T23:05:29.861140Z",
            "url": "https://files.pythonhosted.org/packages/fa/3c/23798b5fda387840439aa1a0452719a1bf0f11d430da37131ad99560f2b9/transformers-interpret-0.10.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-06 23:05:29",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "transformers-interpret"
}
        