autoqrels

Name	autoqrels
Version	0.0.1
home_page	https://github.com/seanmacavaney/autoqrels
Summary	a tool for automatically inferring query relevance assessments (qrels)
upload_time	2024-08-22 07:20:41
maintainer	None
docs_url	None
author	Sean MacAvaney
requires_python	>=3.6
license	None
requirements	ir_measures ir_datasets transformers more_itertools pandas smashed

# autoqrels

`autoqrels` is a tool for automatically inferring query relevance assessments (qrels).

Currently, it supports the one-shot labeling approach (1SL) presented in *[MacAvaney and
Soldaini, One-Shot Labeling for Automatic Relevance Estimation, SIGIR 2023](https://arxiv.org/pdf/2302.11266.pdf)*.

This package adheres to the [`ir-measures`](https://ir-measur.es/) API, which means it can
be directly used by various tools, such as [PyTerrier](https://pyterrier.readthedocs.io/).

## Getting started

You can install `autoqrels` using pip:

```bash
pip install autoqrels
```

You can also work with the repository locally:

```bash
git clone https://github.com/seanmacavaney/autoqrels.git
cd autoqrels
python setup.py develop
```

## API

The primary interface in `autoqrels` is `autoqrels.Labeler`. A `Labeler` exposes a
method, `infer_qrels(run, qrels)`, which returns a new set of qrels that covers the
provided run:

 - `run` is a Pandas DataFrame with the columns `query_id` (str), `doc_id` (str), and `score` (float)
 - `qrels` is a Pandas DataFrame with the columns `query_id` (str), `doc_id` (str), and `relevance` (int)
 - The return value is a Pandas DataFrame with the columns `query_id` (str), `doc_id` (str), and `relevance` (float)
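
The input and output shapes described above can be sketched with plain pandas. In this minimal sketch, `ConstantLabeler` is a hypothetical stand-in (it is not part of `autoqrels`) that only mirrors the `infer_qrels(run, qrels)` signature: it keeps the known labels and assigns a fixed value to every unjudged document. It also assumes that "covers the provided run" means one output row per `(query_id, doc_id)` pair in the run.

```python
import pandas as pd

# A run: one row per retrieved document, with the system's score.
run = pd.DataFrame({
    "query_id": ["q1", "q1", "q1"],
    "doc_id": ["d1", "d2", "d3"],
    "score": [12.5, 11.0, 9.75],
})

# Known qrels: here, a single judged document for q1.
qrels = pd.DataFrame({
    "query_id": ["q1"],
    "doc_id": ["d1"],
    "relevance": [1],
})


class ConstantLabeler:
    """Hypothetical stand-in, NOT part of autoqrels: keeps known labels and
    assigns a fixed relevance to every unjudged document in the run."""

    def __init__(self, default=0.0):
        self.default = default

    def infer_qrels(self, run, qrels):
        # Left-join the run onto the known qrels so that every
        # (query_id, doc_id) pair in the run appears in the output.
        merged = run[["query_id", "doc_id"]].merge(
            qrels, on=["query_id", "doc_id"], how="left"
        )
        # Unjudged pairs have NaN relevance after the join; fill them in
        # and return relevance as float, matching the documented output.
        merged["relevance"] = merged["relevance"].fillna(self.default).astype(float)
        return merged


inferred = ConstantLabeler(default=0.0).infer_qrels(run, qrels)
print(inferred)
```

A real labeler would replace the constant with a model-based estimate; the column names and types are the only parts taken from the description above.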

`Labeler`s also expose several measure definitions compatible with `ir_measures`:
[`labeler.SDCG@k`](https://ir-measur.es/en/latest/measures.html#sdcg),
[`labeler.RBP(p=persistence)`](https://ir-measur.es/en/latest/measures.html#rbp),
[`labeler.P@k`](https://ir-measur.es/en/latest/measures.html#p).
These measures compute the corresponding effectiveness scores over the provided qrels,
supplemented with the labeler's inferred qrels. See the [ir-measures documentation](https://ir-measur.es/)
for more details.
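
To make two of these measures concrete, here is a small worked sketch of the textbook definitions of P@k and RBP applied to the labels of one ranked list. It is an illustration only, not the code used by `autoqrels` or `ir_measures`; the function names, the `min_rel` threshold, and the example labels are assumptions made for this sketch.

```python
def precision_at_k(relevances, k, min_rel=1.0):
    """P@k: fraction of the top-k results whose label is at least min_rel."""
    top = relevances[:k]
    return sum(1 for r in top if r >= min_rel) / k


def rbp(relevances, p=0.8):
    """RBP: (1 - p) * sum of gain_i * p**(i - 1) over ranks i = 1, 2, ..."""
    # enumerate() starts at 0, which matches the exponent i - 1 for rank i.
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevances))


# Labels of one ranked list, best-scored document first (hypothetical values).
ranked_labels = [1.0, 0.0, 1.0, 0.0]

p_at_4 = precision_at_k(ranked_labels, k=4)  # 2 relevant of 4 -> 0.5
rbp_half = rbp(ranked_labels, p=0.5)         # 0.5 * (1 + 0.25) -> 0.625
print(p_at_4, rbp_half)
```

The persistence `p` controls how quickly the weight of lower ranks decays: a small `p` concentrates almost all weight on the first few results.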

We'll now explore the available `Labeler` implementations.

### `autoqrels.oneshot`: 1SL (One-shot Labeling)

**Reproduction: See repro instructions in [`repro/oneshot`](repro/oneshot).**

One-shot labelers work over a single known relevant document per query. An error
is raised if multiple relevant documents are provided.
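
Because of this constraint, it can help to check qrels before handing them to a one-shot labeler. The following is a minimal sketch of such a check; `check_one_relevant_per_query` and its `min_rel` threshold are hypothetical and not part of `autoqrels`, and the exception type raised by the library itself may differ.

```python
import pandas as pd


def check_one_relevant_per_query(qrels, min_rel=1):
    """Raise ValueError if any query has more than one relevant document.

    Hypothetical helper, not part of autoqrels; min_rel is an assumed
    threshold for what counts as relevant.
    """
    relevant = qrels[qrels["relevance"] >= min_rel]
    # Count distinct relevant documents for each query.
    counts = relevant.groupby("query_id")["doc_id"].nunique()
    offending = counts[counts > 1]
    if len(offending) > 0:
        raise ValueError(
            f"queries with multiple relevant documents: {list(offending.index)}"
        )


ok = pd.DataFrame(
    {"query_id": ["q1", "q2"], "doc_id": ["d1", "d7"], "relevance": [1, 1]}
)
bad = pd.DataFrame(
    {"query_id": ["q1", "q1"], "doc_id": ["d1", "d2"], "relevance": [1, 1]}
)

check_one_relevant_per_query(ok)  # one relevant document per query: no error
try:
    check_one_relevant_per_query(bad)
    raised = False
except ValueError:
    raised = True  # q1 has two relevant documents
```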

Example:

```python
import autoqrels
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019')
duot5 = autoqrels.oneshot.DuoT5(dataset=dataset, cache_path='data/duot5.cache.json.gz')
# measures:
duot5.SDCG@10
duot5.P@10
duot5.RBP
```

## Citation

If you use this work, please cite:

```bibtex
@inproceedings{autoqrels,
  author = {MacAvaney, Sean and Soldaini, Luca},
  title = {One-Shot Labeling for Automatic Relevance Estimation},
  booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2023},
  url = {https://arxiv.org/abs/2302.11266}
}
```

Raw data

{
    "_id": null,
    "home_page": "https://github.com/seanmacavaney/autoqrels",
    "name": "autoqrels",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Sean MacAvaney",
    "author_email": "sean.macavaney@glasgow.ac.uk",
    "download_url": "https://files.pythonhosted.org/packages/6d/75/2c8b6c658f624c44d8308e28da08fe267fb30c9e6a7ff7007312e0da5db1/autoqrels-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "a tool for automatically inferring query relevance assessments (qrels)",
    "version": "0.0.1",
    "project_urls": {
        "Homepage": "https://github.com/seanmacavaney/autoqrels"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "868665406d3116ab5ef32b757465ac4de93019220bf29357454645026254c1ac",
                "md5": "c96e13e70f7d9765b5541e4db2dcf595",
                "sha256": "9fb17ba2e459853c1d45a17729421e5ccbf74267c79e8f946240eb49f190b938"
            },
            "downloads": -1,
            "filename": "autoqrels-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c96e13e70f7d9765b5541e4db2dcf595",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 17489,
            "upload_time": "2024-08-22T07:20:39",
            "upload_time_iso_8601": "2024-08-22T07:20:39.523291Z",
            "url": "https://files.pythonhosted.org/packages/86/86/65406d3116ab5ef32b757465ac4de93019220bf29357454645026254c1ac/autoqrels-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6d752c8b6c658f624c44d8308e28da08fe267fb30c9e6a7ff7007312e0da5db1",
                "md5": "6a92c09d17c4f7c330c57dc13718f448",
                "sha256": "4ec3b2bd5eaa1dce2b3755a1e67e9864fd5c4ce9fe5bfebb0ba754884df4ef07"
            },
            "downloads": -1,
            "filename": "autoqrels-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "6a92c09d17c4f7c330c57dc13718f448",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 14001,
            "upload_time": "2024-08-22T07:20:41",
            "upload_time_iso_8601": "2024-08-22T07:20:41.108698Z",
            "url": "https://files.pythonhosted.org/packages/6d/75/2c8b6c658f624c44d8308e28da08fe267fb30c9e6a7ff7007312e0da5db1/autoqrels-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-22 07:20:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "seanmacavaney",
    "github_project": "autoqrels",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "ir_measures",
            "specs": [
                [
                    ">=",
                    "0.3.2"
                ]
            ]
        },
        {
            "name": "ir_datasets",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "more_itertools",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "smashed",
            "specs": []
        }
    ],
    "lcname": "autoqrels"
}
