| Field | Value |
|---|---|
| Name | asreview-insights |
| Version | 1.5 |
| Summary | Insights and plotting tool for the ASReview project |
| home_page | None |
| upload_time | 2025-02-03 13:05:30 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.7 |
| license | Apache-2.0 |
# ASReview Insights
This official extension to [ASReview
LAB](https://github.com/asreview/asreview) extends the software with tools for
[plotting](#plot-types) and extracting the [statistical results](#metrics) of
several [performance metrics](#performance-metrics). The extension is
especially useful in combination with the [simulation
functionality](https://asreview.readthedocs.io/en/latest/simulation_overview.html)
of ASReview LAB.
## Installation
ASReview Insights can be installed from PyPI:
``` bash
pip install asreview-insights
```
After installation, check if the `asreview-insights` package is listed as an
extension. Use the following command:
```bash
asreview --help
```
It should list the `plot` and `metrics` subcommands.
## Performance metrics
The ASReview Insights extension is useful for measuring the performance of
active learning models on collections of binary labeled text. The extension
can be used after performing a simulation study that involves mimicking the
screening process with a specific model. As it is already known which records
are labeled relevant, the simulation can automatically reenact the screening
process as if a screener were using active learning. The performance of one or
multiple models can be measured by different metrics and the ASReview Insights
extension can plot or compute the values for such metrics from ASReview
project files. [O'Mara-Eves et al.
(2015)](https://doi.org/10.1186/2046-4053-4-5) provide a comprehensive
overview of different metrics used in the field of active learning. Below we
describe the metrics available in the software.
### Recall
The recall is the proportion of relevant records that have been found at a
certain point during the screening phase. It is sometimes also called the
proportion of Relevant Records Found (RRF) after screening X% of the total
records. For example, the RRF@10 is the recall (i.e., the proportion of the
total number of relevant records found) after screening 10% of the records
available in the dataset.
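As an illustration, the RRF can be read off directly from the labels in
screening order. The sketch below uses a hypothetical label sequence (not part
of the package) to compute RRF@10.
```python
# Hypothetical example: `labels` is the 0/1 relevance of records in the order
# they were screened during a simulation.
import numpy as np

labels = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])

def rrf(labels, proportion_screened):
    n_screened = int(proportion_screened * len(labels))
    return labels[:n_screened].sum() / labels.sum()

print(rrf(labels, 0.10))  # recall after screening 10% of the records
```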
### Confusion matrix
The confusion matrix consists of the True Positives (TP), False Positives (FP),
True Negatives (TN), and False Negatives (FN). Definitions are provided in the
following table, evaluated at a certain level of recall (r%).
| | Definition | Calculation |
|----------------------|----------------------------------------------------------------------------------------|---------------------------------|
| True Positives (TP) | The number of relevant records found at recall level | Relevant Records * r% |
| False Positives (FP) | The number of irrelevant records reviewed at recall level | Records Reviewed – TP |
| True Negatives (TN) | The number of irrelevant records correctly not reviewed at recall level | Irrelevant Records – FP |
| False Negatives (FN) | The number of relevant records not reviewed at recall level (missing relevant records) | Relevant Records – TP |
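The calculations in the table translate directly into code. The sketch below is
a hypothetical illustration (the input counts are the ones used in the example
dataset further on).
```python
# Hypothetical illustration of the table above: confusion matrix at recall r.
def confusion_matrix_at_recall(n_relevant, n_irrelevant, n_reviewed, r):
    tp = round(n_relevant * r)   # relevant records found at recall level
    fp = n_reviewed - tp         # irrelevant records reviewed
    tn = n_irrelevant - fp       # irrelevant records correctly not reviewed
    fn = n_relevant - tp         # relevant records not reviewed (missed)
    return tp, fp, tn, fn

print(confusion_matrix_at_recall(n_relevant=100, n_irrelevant=1900,
                                 n_reviewed=1100, r=0.95))
# (95, 1005, 895, 5)
```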
### Work saved over sampling
The Work Saved over Sampling (WSS) is a measure of "the work saved over and
above the work saved by simple sampling for a given level of recall" [(Cohen
et al., 2006)](https://doi.org/10.1197/jamia.m1929). It is defined as the
proportion of records a screener does **not** have to screen compared to
random reading after providing the prior knowledge used to train the first
iteration of the model. The WSS is typically measured at a recall of 0.95
(WSS@95), reflecting the proportion of records saved by using active learning
at the cost of failing to identify 5% of the relevant publications.
[Kusa et al. (2023)](https://doi.org/10.1016/j.iswa.2023.200193) propose to
normalize the WSS for class imbalance (denoted as the nWSS). Moreover, Kusa et
al. showed that nWSS is equal to the True Negative Rate (TNR). The TNR is the
proportion of irrelevant records that were correctly not reviewed at a given
level of recall. The nWSS is useful to compare performance in terms of work saved
across datasets and models while controlling for dataset class imbalance.
The following table provides a hypothetical dataset example:
| Dataset characteristics | Example value |
|-------------------------|-------------------|
| Total records | 2000 |
| Records Reviewed | 1100 |
| Relevant Records | 100 |
| Irrelevant Records | 1900 |
| Class imbalance | 5% |
With this information, the following metrics can be calculated:
| Metric | Example value |
|----------|-------------------|
| TP | 95 |
| FP | 1100 – 95 = 1005 |
| TN | 1900 – 1005 = 895 |
| FN | 100 – 95 = 5 |
| TNR95% | 895 / 1900 = 0.47 |
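The work-saved metrics follow from the same counts. The sketch below checks the
table values, using the standard WSS definition from Cohen et al. (2006),
WSS@r = (TN + FN)/N − (1 − r), and the TNR as defined above; the numbers are
the hypothetical ones from the table.
```python
# Hypothetical counts from the example table above.
n_total, n_irrelevant = 2000, 1900
tn, fn = 895, 5
r = 0.95

wss = (tn + fn) / n_total - (1 - r)  # Cohen et al. (2006): work saved over sampling
tnr = tn / n_irrelevant              # nWSS = True Negative Rate at 95% recall
print(round(wss, 2), round(tnr, 2))  # 0.4 0.47
```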
### Extra relevant found
A variation is the Extra Relevant records Found (ERF), which is the proportion
of relevant records found after correcting for the number of relevant records
found via random screening (assuming a uniform distribution of relevant
records).
The following plot illustrates the differences between the metrics Recall
(y-axis), WSS (blue line), and ERF (red line). The dataset contains 1,000
hypothetical records with labels. The stepped line on the diagonal is the
naive labeling approach (screening randomly sorted records).

### Time to discovery
Both recall and WSS are sensitive to the position of the cutoff value and the
distribution of the data. Moreover, the WSS makes assumptions about the
acceptable recall level whereas this level might depend on the research
question at hand. Therefore, [Ferdinands et al.
(2020)](https://doi.org/10.1186/s13643-023-02257-7) proposed two new metrics:
(1) the Time to Discovery (TD) of a relevant record, the fraction of records
that needs to be screened before that record is detected; and (2) the Average
Time to Discovery (ATD), an indicator of how many records need to be screened
on average to find all relevant records in the dataset. The TD metric enables you to
pinpoint hard-to-find papers. The ATD, on the other hand, measures performance
throughout the entire screening process, eliminating reliance on arbitrary
cut-off values, and can be used to compare different models.
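As an illustration (not the package's internal implementation), the TD of each
relevant record can be read off as its position in the screening order, and the
ATD is the average of those positions. The sketch below uses a hypothetical
label sequence and expresses TD as a fraction of the dataset (the `metrics`
output shown later reports these in absolute label actions).
```python
import numpy as np

# Hypothetical screening order; 1 marks a relevant record.
labels = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 0])

positions = np.flatnonzero(labels) + 1  # screening position of each relevant record
td = positions / len(labels)            # time to discovery, as fraction of records
atd = td.mean()                         # average time to discovery
print(td, atd)                          # [0.1 0.3 0.4 0.9] 0.425
```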
### Loss
The Loss metric evaluates the performance of an active learning model by
quantifying how closely it approximates the ideal screening process. This
quantification is then normalized between the ideal curve and the worst possible
curve.
While metrics like WSS, Recall, and ERF evaluate the performance at specific
points on the recall curve, the Loss metric provides an overall measure of
performance.
To compute the loss, we start with three key concepts:
1. **Optimal AUC**: This is the area under a "perfect recall curve," where
relevant records are identified as early as possible. Mathematically, it is
computed as $Nx \times Ny - \frac{Ny \times (Ny - 1)}{2}$, where $Nx$ is the
total number of records, and $Ny$ is the number of relevant records.
2. **Worst AUC**: This represents the area under a worst-case recall curve,
where all relevant records appear at the end of the screening process. This
is calculated as $\frac{Ny \times (Ny + 1)}{2}$.
3. **Actual AUC**: This is the area under the recall curve produced by the model
during the screening process. It can be obtained by summing up the cumulative
recall values for the labeled records.
The normalized loss is calculated by taking the difference between the optimal
AUC and the actual AUC, divided by the difference between the optimal AUC and
the worst AUC.
$$\text{Normalized Loss} = \frac{Ny \times \left(Nx - \frac{Ny - 1}{2}\right) -
\sum \text{Cumulative Recall}}{Ny \times (Nx - Ny)}$$
The lower the loss, the closer the model is to the perfect recall curve,
indicating higher performance.

In this figure, the green area between the recall curve and the perfect recall line is the lost performance, which is then normalized for the total area (green and red combined).
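A minimal sketch of this calculation, assuming `labels` holds the 0/1 relevance
of the records in the order they were screened (an illustration of the formula
above, not the package's implementation):
```python
import numpy as np

def normalized_loss(labels):
    labels = np.asarray(labels)
    nx = len(labels)                           # total number of records
    ny = labels.sum()                          # number of relevant records
    actual_auc = np.cumsum(labels).sum()       # area under the recall curve (counts)
    optimal_auc = nx * ny - ny * (ny - 1) / 2  # perfect recall curve
    worst_auc = ny * (ny + 1) / 2              # all relevant records at the end
    return (optimal_auc - actual_auc) / (optimal_auc - worst_auc)

print(normalized_loss([1, 1, 0, 1, 0, 0, 0, 0]))  # small value = close to perfect
```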
## Basic usage
The ASReview Insights package extends ASReview LAB with two new subcommands
(see `asreview --help`): [`plot`](#plot) and [`metrics`](#metrics). The plots
and metrics are derived from an ASReview project file. The ASReview file
(extension `.asreview`) can be
[exported](https://asreview.readthedocs.io/en/latest/manage.html#export-project)
from ASReview LAB after a
[simulation](https://asreview.readthedocs.io/en/latest/simulation_overview.html),
or it is generated from running a [simulation via the command
line](https://asreview.readthedocs.io/en/latest/simulation_cli.html).
For example, an ASReview file can be generated with:
```bash
asreview simulate benchmark:van_de_schoot_2017 -s sim_van_de_schoot_2017.asreview --init_seed 535
```
To use the most basic options of the ASReview Insights extension, run
```bash
asreview plot recall YOUR_ASREVIEW_FILE.asreview
```
where `recall` is the type of the plot, or
```bash
asreview metrics sim_van_de_schoot_2017.asreview
```
More options are described in the sections below. All options can be
obtained via `asreview plot --help` or `asreview metrics --help`.
## `plot`
### Plot types
#### `recall`
The recall is an important metric to study the performance of active learning
algorithms in the context of information retrieval. ASReview Insights
offers a straightforward command line interface to plot a "recall curve". The
recall curve is the recall at any moment in the active learning process.
To plot the recall curve, you need an ASReview file (extension `.asreview`). To
plot the recall, use this syntax (replace `YOUR_ASREVIEW_FILE.asreview` with
your ASReview file name):
```bash
asreview plot recall YOUR_ASREVIEW_FILE.asreview
```
The following plot is the result of simulating the [`PTSD data`](https://doi.org/10.1038/s42256-020-00287-7) via
the benchmark platform (command `asreview simulate
benchmark:van_de_schoot_2017 -s sim_van_de_schoot_2017.asreview`).

On the vertical axis, you find the recall (i.e., the proportion of the relevant
records) after every labeling decision. The horizontal axis shows the
proportion of the total number of records in the dataset. The steeper the recall
curve, the higher the performance of active learning when compared to random
screening. The recall curve can also be used to estimate stopping criteria, see
the discussions in [#557](https://github.com/asreview/asreview/discussions/557) and [#1115](https://github.com/asreview/asreview/discussions/1115).
#### `wss`
The Work Saved over Sampling (WSS) metric is a useful metric to study the
performance of active learning algorithms compared with a naive (random order)
approach at a given level of recall. ASReview Insights offers a
straightforward command line interface to plot the WSS at any level of recall.
To plot the WSS curve, you need an ASReview file (extension `.asreview`). To
plot the WSS, use this syntax (replace `YOUR_ASREVIEW_FILE.asreview` with your
ASReview file name):
```bash
asreview plot wss YOUR_ASREVIEW_FILE.asreview
```
The following plot is the result of simulating the [`PTSD data`](https://doi.org/10.1038/s42256-020-00287-7) via
the benchmark platform (command `asreview simulate
benchmark:van_de_schoot_2017 -s sim_van_de_schoot_2017.asreview`).

On the vertical axis, you find the WSS after every labeling decision. The
recall is displayed on the horizontal axis. As shown in the figure, the
WSS is linearly related to the recall.
#### `erf`
The Extra Relevant Records Found (ERF) is a derivative of the recall and presents
the proportion of relevant records found after correcting for the number of
relevant records found via random screening (assuming a uniform distribution
of relevant records).
To plot the ERF curve, you need an ASReview file (extension `.asreview`). To
plot the ERF, use this syntax (replace `YOUR_ASREVIEW_FILE.asreview` with your
ASReview file name):
```bash
asreview plot erf YOUR_ASREVIEW_FILE.asreview
```
The following plot is the result of simulating the [`PTSD data`](https://doi.org/10.1038/s42256-020-00287-7) via
the benchmark platform (command `asreview simulate
benchmark:van_de_schoot_2017 -s sim_van_de_schoot_2017.asreview`).

On the vertical axis, you find the ERF after every labeling decision. The
horizontal axis shows the proportion of total number of records in the
dataset. The steep increase of the ERF in the beginning of the process is
related to the steep recall curve.
### Plotting CLI
Optional arguments for the command line are `--priors` to include prior
knowledge, `--x_absolute` and `--y_absolute` to use absolute axes.
See `asreview plot -h` for all command line arguments.
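For example, to include the prior knowledge and use absolute numbers of records
on both axes (combining the flags named above):
```bash
asreview plot recall YOUR_ASREVIEW_FILE.asreview --priors --x_absolute --y_absolute
```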
### Plotting multiple files
It is possible to show the curves of multiple files in one plot. Use this
syntax (replace `YOUR_ASREVIEW_FILE_1` and `YOUR_ASREVIEW_FILE_2` with the
ASReview files that you want to include in the plot):
```bash
asreview plot recall YOUR_ASREVIEW_FILE_1.asreview YOUR_ASREVIEW_FILE_2.asreview
```
### Plotting API
For more advanced features, you can use the Python API.
The advantage is that you can tweak every single element of the plot in the
way you like. The following examples show how the Python API can be used. They
make use of matplotlib extensively. See the [Introduction to
Matplotlib](https://matplotlib.org/stable/tutorials/introductory/usage.html)
for examples on using the API.
The following example shows how to plot the recall with the API and save the
result. The plot is saved using the matplotlib API.
```python
import matplotlib.pyplot as plt
from asreview import open_state
from asreviewcontrib.insights.plot import plot_recall
with open_state("example.asreview") as s:
    fig, ax = plt.subplots()
    plot_recall(ax, s)
    fig.savefig("example.png")
```
Other options are `plot_wss` and `plot_erf`.
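The other plot functions follow the same pattern. As an illustrative variation
(the output file name and figure layout here are made up for this example), the
WSS and ERF curves can be drawn side by side:
```python
import matplotlib.pyplot as plt
from asreview import open_state
from asreviewcontrib.insights.plot import plot_erf, plot_wss

with open_state("example.asreview") as s:
    fig, (ax_wss, ax_erf) = plt.subplots(1, 2, figsize=(10, 4))
    plot_wss(ax_wss, s)   # WSS curve on the left panel
    plot_erf(ax_erf, s)   # ERF curve on the right panel
    fig.savefig("example_wss_erf.png")
```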
#### Example: Customize plot
It's straightforward to customize the plots if you are familiar with
`matplotlib`. The following example shows how to update the title of the plot.
```python
import matplotlib.pyplot as plt
from asreview import open_state
from asreviewcontrib.insights.plot import plot_wss
with open_state("example.asreview") as s:
    fig, ax = plt.subplots()
    plot_wss(ax, s)
    plt.title("WSS with custom title")
    fig.savefig("example_custom_title.png")
```

#### Example: Prior knowledge
It's possible to include prior knowledge in your plot. By default, prior
knowledge is excluded from the plot.
```python
import matplotlib.pyplot as plt
from asreview import open_state
from asreviewcontrib.insights.plot import plot_wss
with open_state("example.asreview") as s:
    fig, ax = plt.subplots()
    plot_wss(ax, s, priors=True)
```
#### Example: Relative versus absolute axes
By default, all axes in ASReview Insights are relative. The API can be used to
change this behavior. The arguments are identical for each plot function.
```python
import matplotlib.pyplot as plt
from asreview import open_state
from asreviewcontrib.insights.plot import plot_wss
with open_state("example.asreview") as s:
    fig, ax = plt.subplots()
    plot_wss(ax, s, x_absolute=True, y_absolute=True)
    fig.savefig("example_absolute_axis.png")
```

#### Example: Adjusting the random and optimal recalls
By default, each plot will have a curve representing optimal performance, and a
curve representing random sampling performance. Both curves can be removed from
the graph.
```python
import matplotlib.pyplot as plt
from asreview import open_state
from asreviewcontrib.insights.plot import plot_recall
with open_state("example.asreview") as s:
    fig, ax = plt.subplots()
    plot_recall(ax, s, show_random=False, show_optimal=False)
    fig.savefig("example_without_curves.png")
```

#### Example: Legend for multiple curves in one plot
If you have multiple curves in one plot, you can customize the legend:
```python
import matplotlib.pyplot as plt
from asreview import open_state
from asreviewcontrib.insights.plot import plot_recall
fig, ax = plt.subplots()

with open_state("tests/asreview_files/sim_van_de_schoot_2017_1.asreview") as s1:
    with open_state("tests/asreview_files/"
                    "sim_van_de_schoot_2017_logistic.asreview") as s2:
        plot_recall(ax,
                    [s1, s2],
                    legend_values=["Naive Bayes", "Logistic"],
                    legend_kwargs={'loc': 'lower center'})

fig.savefig("docs/example_multiple_lines.png")
```

## `metrics`
The `metrics` subcommand in ASReview Insights can be used to compute metrics
at given values. The easiest way to compute metrics for an ASReview project
file is with the following command on the command line:
```
asreview metrics sim_van_de_schoot_2017.asreview
```
which results in
```
"asreviewVersion": "1.0",
"apiVersion": "1.0",
"data": {
"items": [
{
"id": "recall",
"title": "Recall",
"value": [
[
0.1,
1.0
],
[
0.25,
1.0
],
[
0.5,
1.0
],
[
0.75,
1.0
],
[
0.9,
1.0
]
]
},
{
"id": "wss",
"title": "Work Saved over Sampling",
"value": [
[
0.95,
0.8913851624373686
]
]
},
{
"id": "loss",
"title": "Loss",
"value": 0.01707543880041846
},
{
"id": "erf",
"title": "Extra Relevant record Found",
"value": [
[
0.1,
0.9047619047619048
]
]
},
{
"id": "atd",
"title": "Average time to discovery",
"value": 101.71428571428571
},
{
"id": "td",
"title": "Time to discovery",
"value": [
[
3898,
22
],
[
284,
23
],
[
592,
25
],
...
[
2382,
184
],
[
5479,
224
],
[
3316,
575
]
]
},
{
"id": "tp",
"title": "True Positives",
"value": [
[
0.95,
39
],
[
1.0,
42
]
]
},
{
"id": "fp",
"title": "False Positives",
"value": [
[
0.95,
122
],
[
1.0,
517
]
]
},
{
"id": "tn",
"title": "True Negatives",
"value": [
[
0.95,
6023
],
[
1.0,
5628
]
]
},
{
"id": "fn",
"title": "False Negatives",
"value": [
[
0.95,
3
],
[
1.0,
0
]
]
},
{
"id": "tnr",
"title": "True Negative Rate (Specificity)",
"value": [
[
0.95,
0.980146
],
[
1.0,
0.915867
]
]
}
]
}
}
```
Each available item has two values. The first value is the value at which the
metric is computed; in the plots above, this is the x-axis. The second value
is the result of the metric. Some metrics are computed for multiple values.
| Metric | Position 1 | Position 2 | Default |
|---|---|---|---|
| `recall` | Labels | Recall | 0.1, 0.25, 0.5, 0.75, 0.9 |
| `wss` | Recall | Work Saved over Sampling at recall | 0.95 |
| `erf` | Labels | ERF | 0.10 |
| `atd` | Average time to discovery (in label actions) | - | - |
| `td` | Row number (starting at 0) | Number of records labeled | - |
| `cm` | Recall | Confusion matrix values at recall | 0.95, 1 |
### Override default values
It is possible to override the default values of `asreview metrics`. See
`asreview metrics -h` for more information or see the example below.
```
asreview metrics sim_van_de_schoot_2017.asreview --wss 0.9 0.95
```
```
{
    "asreviewVersion": "1.0",
    "apiVersion": "1.0",
    "data": {
        "items": [
            {
                "id": "recall",
                "title": "Recall",
                "value": [
                    [0.1, 1.0],
                    [0.25, 1.0],
                    [0.5, 1.0],
                    [0.75, 1.0],
                    [0.9, 1.0]
                ]
            },
            {
                "id": "wss",
                "title": "Work Saved over Sampling",
                "value": [
                    [0.9, 0.8474220139001132],
                    [0.95, 0.8913851624373686]
                ]
            },
            {
                "id": "erf",
                "title": "Extra Relevant record Found",
                "value": [
                    [0.1, 0.9047619047619048]
                ]
            },
            {
                "id": "atd",
                "title": "Average time to discovery",
                "value": 101.71428571428571
            },
            {
                "id": "td",
                "title": "Time to discovery",
                "value": [
                    [3898, 22],
                    [284, 23],
                    [592, 25],
                    ...
                    [2382, 184],
                    [5479, 224],
                    [3316, 575]
                ]
            }
        ]
    }
}
```
### Save metrics to file
Metrics can be saved to a file in the JSON format. Use the flag `-o` or
`--output`.
```
asreview metrics sim_van_de_schoot_2017.asreview -o my_file.json
```
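The saved file follows the same structure as the example output above, so it
can be processed with standard tools. A hypothetical follow-up in Python:
```python
import json

# Load the metrics file written by the command above and index the items by id.
with open("my_file.json") as f:
    metrics = json.load(f)

items = {item["id"]: item["value"] for item in metrics["data"]["items"]}
print(items["wss"])  # e.g. [[0.95, 0.89...]]
```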
### Metrics CLI
Optional arguments for the command line are `--priors` to include prior
knowledge, `--x_absolute` and `--y_absolute` to use absolute axes.
See `asreview metrics -h` for all command line arguments.
### Metrics API
Metrics are easily accessible with the ASReview Insights API.
Compute the recall after reading half of the dataset:
```python
from asreview import open_state
from asreviewcontrib.insights.metrics import recall
with open_state("example.asreview") as s:
    print(recall(s, 0.5))
```
Other available metrics include `wss` and `erf`.
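Assuming the other metric functions follow the same call pattern as `recall`,
they can be used like this:
```python
from asreview import open_state
from asreviewcontrib.insights.metrics import erf, wss

with open_state("example.asreview") as s:
    print(wss(s, 0.95))  # Work Saved over Sampling at recall 0.95
    print(erf(s, 0.10))  # Extra Relevant records Found after screening 10%
```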
#### Example: Prior knowledge
It's possible to include prior knowledge in your metric. By default, prior
knowledge is excluded from the metric.
```python
from asreview import open_state
from asreviewcontrib.insights.metrics import recall
with open_state("example.asreview") as s:
    print(recall(s, 0.5, priors=True))
```
## License
This extension is published under the [Apache-2.0 license](/LICENSE).
## Contact
This extension is part of the ASReview project ([asreview.ai](https://asreview.ai)). It is maintained by the
maintainers of ASReview LAB. See [ASReview
LAB](https://github.com/asreview/asreview) for contact information and more
resources.
Raw data
```json
{
"_id": null,
"home_page": null,
"name": "asreview-insights",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "ASReview LAB developers <asreview@uu.nl>",
"download_url": "https://files.pythonhosted.org/packages/66/8b/d2f986dc181138404f2e10997c9abeb662b1cead1731a4f21b0970c82bb8/asreview_insights-1.5.tar.gz",
"platform": null,
"description": "# ASReview Insights\n\n[](https://badge.fury.io/py/asreview-insights) [](https://pepy.tech/project/asreview-insights)    [](https://zenodo.org/badge/latestdoi/235795131)\n\n\nThis official extension to [ASReview\nLAB](https://github.com/asreview/asreview) extends the software with tools for\n[plotting](#plot-types) and extracting the [statistical results](#metrics) of\nseveral [performance metrics](#performance-metrics). The extension is\nespecially useful in combination with the [simulation\nfunctionality](https://asreview.readthedocs.io/en/latest/simulation_overview.html)\nof ASReview LAB.\n\n## Installation\n\nASReview Insights can be installed from PyPI:\n\n``` bash\npip install asreview-insights\n```\n\nAfter installation, check if the `asreview-insights` package is listed as an\nextension. Use the following command:\n\n```bash\nasreview --help\n```\n\nIt should list the 'plot' subcommand and the 'metrics' subcommand.\n\n## Performance metrics\n\nThe ASReview Insights extension is useful for measuring the performance of\nactive learning models on collections of binary labeled text. The extension\ncan be used after performing a simulation study that involves mimicking the\nscreening process with a specific model. As it is already known which records\nare labeled relevant, the simulation can automatically reenact the screening\nprocess as if a screener were using active learning. The performance of one or\nmultiple models can be measured by different metrics and the ASReview Insights\nextension can plot or compute the values for such metrics from ASReview\nproject files. [O'Mara-Eves et al.\n(2015)](https://doi.org/10.1186/2046-4053-4-5) provides a comprehensive\noverview of different metrics used in the field of actrive learning. Below we\ndescribe the metrics available in the software.\n\n### Recall\n\nThe recall is the proportion of relevant records that have been found at a\ncertain point during the screening phase. It is sometimes also called the\nproportion of Relevant Record Found (RRF) after screening an X% of the total\nrecords. For example, the RRF@10 is the recall (i.e., the proportion of the\ntotal number of relevant records) at screening 10% of the total number of\nrecords available in the dataset.\n\n### Confusion matrix\n\nThe confusion matrix consist of the True Positives (TP), False Positives (FP),\nTrue Negatives (TN), and False Negatives (FN). Definitions are provided in the\nfollowing table retrieved at a certain recall (r%).\n\n| | Definition | Calculation |\n|----------------------|----------------------------------------------------------------------------------------|---------------------------------|\n| True Positives (TP) | The number of relevant records found at recall level | Relevant Records * r% |\n| False Positives (FP) | The number of irrelevant records reviewed at recall level | Records Reviewed \u2013 TP |\n| True Negatives (TN) | The number of irrelevant records correctly not reviewed at recall level | Irrelevant Records \u2013 FP |\n| False Negatives (FN) | The number of relevant records not reviewed at recall level (missing relevant records) | Relevant Records \u2013 TP |\n\n### Work saved over sampling\n\nThe Work Saved over Sampling (WSS) is a measure of \"the work saved over and\nabove the work saved by simple sampling for a given level of recall\" [(Cohen\net al., 2006)](https://doi.org/10.1197/jamia.m1929). 
It is defined as the\nproportion of records a screener does **not** have to screen compared to\nrandom reading after providing the prior knowledge used to train the first\niteration of the model. The WSS is typically measured at a recall of .95\n(WSS@95), reflecting the proportion of records saved by using active learning\nat the cost of failing to identify .05 of relevant publications.\n\n[Kusa et al. (2023)](https://doi.org/10.1016/j.iswa.2023.200193) propose to\nnormalize the WSS for class imbalance (denoted as the nWSS). Moreover, Kusa et\nal. showed that nWSS is equal to the True Negative Rate (TNR). The TNR is the\nproportion of irrelevant records that were correctly not reviewed at level of\nrecall. The nWSS is useful to compare performance in terms of work saved\nacross datasets and models while controlling for dataset class imbalance.\n\nThe following table provides a hypothetical dataset example:\n\n| Dataset characteristics | Example value |\n|-------------------------|-------------------|\n| Total records | 2000 |\n| Records Reviewed | 1100 |\n| Relevant Records | 100 |\n| Irrelevant Records | 1900 |\n| Class imbalance | 5% |\n\nWith this information, the following metrics can be calculated:\n\n| Metric | Example value |\n|----------|-------------------|\n| TP | 95 |\n| FP | 1100 \u2013 95 = 1005 |\n| TN | 1900 \u2013 1005 = 895 |\n| FN | 100 \u2013 95 = 5 |\n| TNR95% | 895 / 1900 = 0.47 |\n\n\n### Extra relevant found\n\nA variation is the Extra Relevant records Found (ERF), which is the proportion\nof relevant records found after correcting for the number of relevant records\nfound via random screening (assuming a uniform distribution of relevant\nrecords).\n\nThe following plot illustrates the differences between the metrics Recall\n(y-axis), WSS (blue line), and ERF (red line). The dataset contains 1.000\nhypothetical records with labels. The stepped line on the diagonal is the\nnaive labeling approach (screening randomly sorted records).\n\n\n\n### Time to discovery\n\nBoth recall and WSS are sensitive to the position of the cutoff value and the\ndistribution of the data. Moreover, the WSS makes assumptions about the\nacceptable recall level whereas this level might depend on the research\nquestion at hand. Therefore, [Ferdinands et al.\n(2020)](https://doi.org/10.1186/s13643-023-02257-7) proposed two new metrics:\n(1) the Time to Discover a relevant record as the fraction of records needed\nto screen to detect this record (TD); and (2) the Average Time to Discover\n(ATD) as an indicator of how many records need to be screened on average to\nfind all relevant records in the dataset. The TD metric enables you to\npinpoint hard-to-find papers. The ATD, on the other hand, measures performance\nthroughout the entire screening process, eliminating reliance on arbitrary\ncut-off values, and can be used to compare different models.\n\n### Loss\nThe Loss metric evaluates the performance of an active learning model by\nquantifying how closely it approximates the ideal screening process. This\nquantification is then normalized between the ideal curve and the worst possible\ncurve.\n\nWhile metrics like WSS, Recall, and ERF evaluate the performance at specific\npoints on the recall curve, the Loss metric provides an overall measure of\nperformance.\n\nTo compute the loss, we start with three key concepts:\n\n1. **Optimal AUC**: This is the area under a \"perfect recall curve,\" where\n relevant records are identified as early as possible. 
Mathematically, it is\n computed as $Nx \\times Ny - \\frac{Ny \\times (Ny - 1)}{2}$, where $Nx$ is the\n total number of records, and $Ny$ is the number of relevant records.\n\n2. **Worst AUC**: This represents the area under a worst-case recall curve,\n where all relevant records appear at the end of the screening process. This\n is calculated as $\\frac{Ny \\times (Ny + 1)}{2}$.\n\n3. **Actual AUC**: This is the area under the recall curve produced by the model\n during the screening process. It can be obtained by summing up the cumulative\n recall values for the labeled records.\n\nThe normalized loss is calculated by taking the difference between the optimal\nAUC and the actual AUC, divided by the difference between the optimal AUC and\nthe worst AUC.\n\n$$\\text{Normalized Loss} = \\frac{Ny \\times \\left(Nx - \\frac{Ny - 1}{2}\\right) -\n\\sum \\text{Cumulative Recall}}{Ny \\times (Nx - Ny)}$$\n\nThe lower the loss, the closer the model is to the perfect recall curve,\nindicating higher performance.\n\n\n\nIn this figure, the green area between the recall curve and the perfect recall line is the lost performance, which is then normalized for the total area (green and red combined).\n\n## Basic usage\n\nThe ASReview Insights package extends ASReview LAB with two new subcommands\n(see `asreview --help`): [`plot`](#plot) and [`metrics`](#metrics). The plots\nand metrics are derived from an ASReview project file. The ASReview file\n(extension `.asreview`) can be\n[exported](https://asreview.readthedocs.io/en/latest/manage.html#export-project)\nfrom ASReview LAB after a\n[simulation](https://asreview.readthedocs.io/en/latest/simulation_overview.html),\nor it is generated from running a [simulation via the command\nline](https://asreview.readthedocs.io/en/latest/simulation_cli.html).\n\nFor example, an ASReview can be generated with:\n\n\n```python\nasreview simulate benchmark:van_de_schoot_2017 -s sim_van_de_schoot_2017.asreview --init_seed 535\n```\n\nTo use the most basic options of the ASReview Insights extension, run\n\n```bash\nasreview plot recall YOUR_ASREVIEW_FILE.asreview\n```\nwhere `recall` is the type of the plot, or\n\n```bash\nasreview metrics sim_van_de_schoot_2017.asreview\n```\n\nMore options are described in the sections below. All options can be\nobtained via `asreview plot --help` or `asreview metrics --help`.\n\n## `Plot`\n\n### Plot types\n\n#### `recall`\n\nThe recall is an important metric to study the performance of active learning\nalgorithms in the context of information retrieval. ASReview Insights\noffers a straightforward command line interface to plot a \"recall curve\". The\nrecall curve is the recall at any moment in the active learning process.\n\nTo plot the recall curve, you need a ASReview file (extension `.asreview`). To\nplot the recall, use this syntax (Replace `YOUR_ASREVIEW_FILE.asreview` by\nyour ASReview file name.):\n\n```bash\nasreview plot recall YOUR_ASREVIEW_FILE.asreview\n```\n\nThe following plot is the result of simulating the [`PTSD data`](https://doi.org/10.1038/s42256-020-00287-7) via\nthe benchmark platform (command `asreview simulate\nbenchmark:van_de_schoot_2017 -s sim_van_de_schoot_2017.asreview`).\n\n\n\nOn the vertical axis, you find the recall (i.e, the proportion of the relevant\nrecords) after every labeling decision. The horizontal axis shows the\nproportion of total number of records in the dataset. The steeper the recall\ncurve, the higher the performance of active learning when comparted to random\nscreening. 
The recall curve can also be used to estimate stopping criteria, see\nthe discussions in [#557](https://github.com/asreview/asreview/discussions/557) and [#1115](https://github.com/asreview/asreview/discussions/1115).\n\n\n```bash\nasreview plot recall YOUR_ASREVIEW_FILE.asreview\n```\n\n#### `wss`\n\nThe Work Saved over Sampling (WSS) metric is a useful metric to study the\nperformance of active learning alorithms compared with a naive (random order)\napproach at a given level of recall. ASReview Insights offers a\nstraightforward command line interface to plot the WSS at any level of recall.\n\nTo plot the WSS curve, you need a ASReview file (extension `.asreview`). To\nplot the WSS, use this syntax (Replace `YOUR_ASREVIEW_FILE.asreview` by your\nASReview file name.):\n\n```bash\nasreview plot wss YOUR_ASREVIEW_FILE.asreview\n```\n\nThe following plot is the result of simulating the [`PTSD data`](https://doi.org/10.1038/s42256-020-00287-7) via\nthe benchmark platform (command `asreview simulate\nbenchmark:van_de_schoot_2017 -s sim_van_de_schoot_2017.asreview`).\n\n\n\nOn the vertical axis, you find the WSS after every labeling decision. The\nrecall is displayed on the horizontal axis. As shown in the figure, the\nWSS is linearly related to the recall.\n\n\n#### `erf`\n\nThe Extra Relevant Records found is a derivative of the recall and presents\nthe proportion of relevant records found after correcting for the number of\nrelevant records found via random screening (assuming a uniform distribution\nof relevant records).\n\nTo plot the ERF curve, you need a ASReview file (extension `.asreview`). To\nplot the ERF, use this syntax (Replace `YOUR_ASREVIEW_FILE.asreview` by your\nASReview file name.):\n\n\n```bash\nasreview plot erf YOUR_ASREVIEW_FILE.asreview\n```\nThe following plot is the result of simulating the [`PTSD data`](https://doi.org/10.1038/s42256-020-00287-7) via\nthe benchmark platform (command `asreview simulate\nbenchmark:van_de_schoot_2017 -s sim_van_de_schoot_2017.asreview`).\n\n\n\nOn the vertical axis, you find the ERF after every labeling decision. The\nhorizontal axis shows the proportion of total number of records in the\ndataset. The steep increase of the ERF in the beginning of the process is\nrelated to the steep recall curve.\n\n### Plotting CLI\n\nOptional arguments for the command line are `--priors` to include prior\nknowledge, `--x_absolute` and `--y_absolute` to use absolute axes.\n\nSee `asreview plot -h` for all command line arguments.\n\n### Plotting multiple files\nIt is possible to show the curves of multiple files in one plot. Use this\nsyntax (replace `YOUR_ASREVIEW_FILE_1` and `YOUR_ASREVIEW_FILE_2` by the\nasreview_files that you want to include in the plot):\n\n```bash\nasreview plot recall YOUR_ASREVIEW_FILE_1.asreview YOUR_ASREVIEW_FILE_2.asreview\n```\n\n### Plotting API\n\nTo make use of the more advanced features, you can make use of the Python API.\nThe advantage is that you can tweak every single element of the plot in the\nway you like. The following examples show how the Python API can be used. They\nmake use of matplotlib extensively. See the [Introduction to\nMatplotlib](https://matplotlib.org/stable/tutorials/introductory/usage.html)\nfor examples on using the API.\n\nThe following example show how to plot the recall with the API and save the\nresult. 
The plot is saved using the matplotlib API.\n\n```python\nimport matplotlib.pyplot as plt\nfrom asreview import open_state\n\nfrom asreviewcontrib.insights.plot import plot_recall\n\nwith open_state(\"example.asreview\") as s:\n\n fig, ax = plt.subplots()\n\n plot_recall(ax, s)\n\n fig.savefig(\"example.png\")\n```\n\nOther options are `plot_wss` and `plot_erf`.\n\n#### Example: Customize plot\n\nIt's straightforward to customize the plots if you are familiar with\n`matplotlib`. The following example shows how to update the title of the plot.\n\n```python\nimport matplotlib.pyplot as plt\nfrom asreview import open_state\n\nfrom asreviewcontrib.insights.plot import plot_wss\n\nwith open_state(\"example.asreview\") as s:\n\n fig, ax = plt.subplots()\n plot_wss(ax, s)\n\n plt.title(\"WSS with custom title\")\n\n fig.savefig(\"example_custom_title.png\")\n```\n\n\n\n#### Example: Prior knowledge\n\nIt's possible to include prior knowledge in your plot. By default, prior\nknowledge is excluded from the plot.\n\n```python\nimport matplotlib.pyplot as plt\nfrom asreview import open_state\n\nfrom asreviewcontrib.insights.plot import plot_wss\n\nwith open_state(\"example.asreview\") as s:\n\n fig, ax = plt.subplots()\n plot_wss(ax, s, priors=True)\n\n```\n\n#### Example: Relative versus absolute axes\n\nBy default, all axes in ASReview Insights are relative. The API can be used to\nchange this behavior. The arguments are identical for each plot function.\n\n```python\nimport matplotlib.pyplot as plt\nfrom asreview import open_state\n\nfrom asreviewcontrib.insights.plot import plot_wss\n\nwith open_state(\"example.asreview\") as s:\n\n fig, ax = plt.subplots()\n plot_wss(ax, s, x_absolute=True, y_absolute=True)\n\n fig.savefig(\"example_absolute_axis.png\")\n```\n\n\n\n#### Example: Adjusting the random and optimal recalls\n\nBy default, each plot will have a curve representing optimal performance, and a\ncurve representing random sampling performance. Both curves can be removed from\nthe graph.\n\n```python\nimport matplotlib.pyplot as plt\nfrom asreview import open_state\n\nfrom asreviewcontrib.insights.plot import plot_recall\n\nwith open_state(\"example.asreview\") as s:\n\n fig, ax = plt.subplots()\n plot_recall(ax, s, show_random=False, show_optimal=False)\n\n fig.savefig(\"example_without_curves.png\")\n```\n\n\n\n\n#### Example: Legend for multiple curves in one plot\n\nIf you have multiple curves in one plot, you can customize the legend:\n\n```python\nimport matplotlib.pyplot as plt\n\nfrom asreview import open_state\nfrom asreviewcontrib.insights.plot import plot_recall\n\n\nfig, ax = plt.subplots()\n\nwith open_state(\"tests/asreview_files/sim_van_de_schoot_2017_1.asreview\") as s1:\n with open_state(\"tests/asreview_files/\"\n \"sim_van_de_schoot_2017_logistic.asreview\") as s2:\n plot_recall(ax,\n [s1, s2],\n legend_values=[\"Naive Bayes\", \"Logistic\"],\n legend_kwargs={'loc': 'lower center'})\n\nfig.savefig(\"docs/example_multiple_lines.png\")\n\n```\n\n\n## `metrics`\n\nThe `metrics` subcommand in ASReview Insights can be used to compute metrics\nat given values. 
The easiest way to compute metrics for a ASReview project\nfile is with the following command on the command line:\n\n```\nasreview metrics sim_van_de_schoot_2017.asreview\n```\n\nwhich results in\n\n```\n \"asreviewVersion\": \"1.0\",\n \"apiVersion\": \"1.0\",\n \"data\": {\n \"items\": [\n {\n \"id\": \"recall\",\n \"title\": \"Recall\",\n \"value\": [\n [\n 0.1,\n 1.0\n ],\n [\n 0.25,\n 1.0\n ],\n [\n 0.5,\n 1.0\n ],\n [\n 0.75,\n 1.0\n ],\n [\n 0.9,\n 1.0\n ]\n ]\n },\n {\n \"id\": \"wss\",\n \"title\": \"Work Saved over Sampling\",\n \"value\": [\n [\n 0.95,\n 0.8913851624373686\n ]\n ]\n },\n {\n \"id\": \"loss\",\n \"title\": \"Loss\",\n \"value\": 0.01707543880041846\n },\n {\n \"id\": \"erf\",\n \"title\": \"Extra Relevant record Found\",\n \"value\": [\n [\n 0.1,\n 0.9047619047619048\n ]\n ]\n },\n {\n \"id\": \"atd\",\n \"title\": \"Average time to discovery\",\n \"value\": 101.71428571428571\n },\n {\n \"id\": \"td\",\n \"title\": \"Time to discovery\",\n \"value\": [\n [\n 3898,\n 22\n ],\n [\n 284,\n 23\n ],\n [\n 592,\n 25\n ],\n ...\n [\n 2382,\n 184\n ],\n [\n 5479,\n 224\n ],\n [\n 3316,\n 575\n ]\n ]\n },\n {\n \"id\": \"tp\",\n \"title\": \"True Positives\",\n \"value\": [\n [\n 0.95,\n 39\n ],\n [\n 1.0,\n 42\n ]\n ]\n },\n {\n \"id\": \"fp\",\n \"title\": \"False Positives\",\n \"value\": [\n [\n 0.95,\n 122\n ],\n [\n 1.0,\n 517\n ]\n ]\n },\n {\n \"id\": \"tn\",\n \"title\": \"True Negatives\",\n \"value\": [\n [\n 0.95,\n 6023\n ],\n [\n 1.0,\n 5628\n ]\n ]\n },\n {\n \"id\": \"fn\",\n \"title\": \"False Negatives\",\n \"value\": [\n [\n 0.95,\n 3\n ],\n [\n 1.0,\n 0\n ]\n ]\n },\n {\n \"id\": \"tnr\",\n \"title\": \"True Negative Rate (Specificity)\",\n \"value\": [\n [\n 0.95,\n 0.980146\n ],\n [\n 1.0,\n 0.915867\n ]\n ]\n }\n ]\n }\n}\n```\n\nEach available item has two values. The first value is the value at which the\nmetric is computed. In the plots above, this is the x-axis. The second value\nis the results of the metric. Some metrics are computed for multiple values.\n\n| Metric | Description pos. 1 | Description pos. 2 | Default |\n|---|---|---|---|\n| `recall` | Labels | Recall | 0.1, 0.25, 0.5, 0.75, 0.9 |\n| `wss` | Recall | Work Saved over Sampling at recall | 0.95 |\n| `erf` | Labels | ERF | 0.10 |\n| `atd` | Average time to discovery (in label actions) | - | - |\n| `td` | Row number (starting at 0) | Number of records labeled | - |\n| `cm` | Recall | Confusion matrix values at recall | 0.95, 1 |\n\n\n### Override default values\n\nIt is possible to override the default values of `asreview metrics`. 
See\n`asreview metrics -h` for more information or see the example below.\n\n```\nasreview metrics sim_van_de_schoot_2017.asreview --wss 0.9 0.95\n```\n\n```\n{\n \"asreviewVersion\": \"1.0\",\n \"apiVersion\": \"1.0\",\n \"data\": {\n \"items\": [\n {\n \"id\": \"recall\",\n \"title\": \"Recall\",\n \"value\": [\n [\n 0.1,\n 1.0\n ],\n [\n 0.25,\n 1.0\n ],\n [\n 0.5,\n 1.0\n ],\n [\n 0.75,\n 1.0\n ],\n [\n 0.9,\n 1.0\n ]\n ]\n },\n {\n \"id\": \"wss\",\n \"title\": \"Work Saved over Sampling\",\n \"value\": [\n [\n 0.9,\n 0.8474220139001132\n ],\n [\n 0.95,\n 0.8913851624373686\n ]\n ]\n },\n {\n \"id\": \"erf\",\n \"title\": \"Extra Relevant record Found\",\n \"value\": [\n [\n 0.1,\n 0.9047619047619048\n ]\n ]\n },\n {\n \"id\": \"atd\",\n \"title\": \"Average time to discovery\",\n \"value\": 101.71428571428571\n },\n {\n \"id\": \"td\",\n \"title\": \"Time to discovery\",\n \"value\": [\n [\n 3898,\n 22\n ],\n [\n 284,\n 23\n ],\n [\n 592,\n 25\n ],\n ...\n [\n 2382,\n 184\n ],\n [\n 5479,\n 224\n ],\n [\n 3316,\n 575\n ]\n ]\n }\n ]\n }\n}\n```\n\n### Save metrics to file\n\nMetrics can be saved to a file in the JSON format. Use the flag `-o` or\n`--output`.\n\n```\nasreview metrics sim_van_de_schoot_2017.asreview -o my_file.json\n```\n\n### Metrics CLI\n\nOptional arguments for the command line are `--priors` to include prior\nknowledge, `--x_absolute` and `--y_absolute` to use absolute axes.\n\nSee `asreview metrics -h` for all command line arguments.\n\n### Metrics API\n\nMetrics are easily accesible with the ASReview Insights API.\n\nCompute the recall after reading half of the dataset.\n\n```python\n\nfrom asreview import open_state\nfrom asreviewcontrib.insights.metrics import recall\n\nwith open_state(\"example.asreview\") as s:\n\n print(recall(s, 0.5))\n```\n\nOther metrics are available like `wss` and `erf`.\n\n#### Example: Prior knowledge\n\nIt's possible to include prior knowledge to your metric. By default, prior\nknowledge is excluded from the metric.\n\n```python\n\nfrom asreview import open_state\nfrom asreviewcontrib.insights.metrics import recall\n\nwith open_state(\"example.asreview\") as s:\n\n print(recall(s, 0.5, priors=True))\n```\n\n## License\n\nThis extension is published under the [MIT license](/LICENSE).\n\n## Contact\n\nThis extension is part of the ASReview project ([asreview.ai](https://asreview.ai)). It is maintained by the\nmaintainers of ASReview LAB. See [ASReview\nLAB](https://github.com/asreview/asreview) for contact information and more\nresources.\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Insights and plotting tool for the ASReview project",
"version": "1.5",
"project_urls": {
"homepage": "https://asreview.ai",
"repository": "https://github.com/asreview/asreview-insights"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "4563fdb9e80e5c45e7a14eda01c6c51e0e8364c402499e984d9ffea9d3e3921f",
"md5": "3dc992de4b0d15a65d6b93e91748c46c",
"sha256": "b7c1bd356a8b225569cee9e5d15f349e71e798aa52d700b0f2934d6472193f41"
},
"downloads": -1,
"filename": "asreview_insights-1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3dc992de4b0d15a65d6b93e91748c46c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 21219,
"upload_time": "2025-02-03T13:05:26",
"upload_time_iso_8601": "2025-02-03T13:05:26.474974Z",
"url": "https://files.pythonhosted.org/packages/45/63/fdb9e80e5c45e7a14eda01c6c51e0e8364c402499e984d9ffea9d3e3921f/asreview_insights-1.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "668bd2f986dc181138404f2e10997c9abeb662b1cead1731a4f21b0970c82bb8",
"md5": "2bc35c3c1b966a3a7a3f00f22246df3b",
"sha256": "8f5aec52bc51f34f9a975785e8eb2bd7119506ea4cf99686453e38c09280c231"
},
"downloads": -1,
"filename": "asreview_insights-1.5.tar.gz",
"has_sig": false,
"md5_digest": "2bc35c3c1b966a3a7a3f00f22246df3b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 24384581,
"upload_time": "2025-02-03T13:05:30",
"upload_time_iso_8601": "2025-02-03T13:05:30.093115Z",
"url": "https://files.pythonhosted.org/packages/66/8b/d2f986dc181138404f2e10997c9abeb662b1cead1731a4f21b0970c82bb8/asreview_insights-1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-03 13:05:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "asreview",
"github_project": "asreview-insights",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "asreview-insights"
}
```