pyterrier-adaptive


| Field | Value |
| --- | --- |
| Name | pyterrier-adaptive |
| Version | 0.2.0 |
| Summary | PyTerrier implementation of Adaptive Re-Ranking using a Corpus Graph (CIKM 2022) |
| Upload time | 2024-12-03 14:56:40 |
| Requires Python | `>=3.8` |
| Requirements | python-terrier, npids, pyterrier-alpha, torch, pandas, tqdm, more_itertools |

# pyterrier_adaptive

[PyTerrier](http://github.com/terrier-org/pyterrier) implementation of [Adaptive Re-Ranking with a Corpus Graph](https://arxiv.org/abs/2208.08942) (CIKM 2022).

## Getting Started

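Install the latest development version from GitHub with pip (release 0.2.0 is also published on PyPI as `pyterrier-adaptive`):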

```bash
pip install --upgrade git+https://github.com/terrierteam/pyterrier_adaptive.git
```

Try the examples in Google Colab! [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/terrierteam/pyterrier_adaptive/blob/master/examples/example.ipynb)

Basic example over the MS MARCO passage corpus (using the [pyterrier_t5](https://github.com/terrierteam/pyterrier_t5) and [pyterrier_pisa](https://github.com/terrierteam/pyterrier_pisa) plugins):

```python
import pyterrier as pt
pt.init()
from pyterrier_t5 import MonoT5ReRanker
from pyterrier_pisa import PisaIndex
from pyterrier_adaptive import GAR, CorpusGraph

dataset = pt.get_dataset('irds:msmarco-passage')
retriever = PisaIndex.from_dataset('msmarco_passage').bm25()
scorer = pt.text.get_text(dataset, 'text') >> MonoT5ReRanker(verbose=False, batch_size=16)
graph = CorpusGraph.from_dataset('msmarco_passage', 'corpusgraph_bm25_k16').to_limit_k(8)

pipeline = retriever >> GAR(scorer, graph) >> pt.text.get_text(dataset, 'text')

pipeline.search('clustering hypothesis information retrieval')
# qid                                        query    docno  rank       score  iteration                                               text
#   1  clustering hypothesis information retrieval  2180710     0   -0.017059          0  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  8430269     1   -0.166563          1  Clustering is the grouping of a particular set...
#   1  clustering hypothesis information retrieval  1091429     2   -0.208345          1  Clustering is a fundamental data analysis meth...
#   1  clustering hypothesis information retrieval  2180711     3   -0.341018          5  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  6031959     4   -0.367014          5  Cluster analysis or clustering is the task of ...
#  ..                                          ...      ...   ...         ...        ...                                                ...
#                iteration column indicates which GAR batch the document was scored in ^
#                even=initial retrieval   odd=corpus graph    -1=backfilled
```
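
The `iteration` column in the output above reflects how GAR alternates between two sources of candidates. The following is a toy, pure-Python sketch of that alternation, not the library's implementation: `toy_gar`, the tiny `neighbours` graph, and the `relevance` scores are all hypothetical stand-ins, and backfilled documents are given a score of `None` here purely for illustration.

```python
def toy_gar(initial, neighbours, score, batch_size=2, budget=4):
    """Toy adaptive re-ranking loop; returns (docno, score, iteration) tuples."""
    pool = list(initial)  # first-stage results, best first
    frontier = {}         # unscored neighbour -> best score among its scored sources
    scored = {}           # docno -> (score, iteration)
    iteration = 0
    while len(scored) < budget and (pool or frontier):
        if iteration % 2 == 0:
            candidates = pool  # even iteration: take from the initial retrieval
        else:
            # odd iteration: take neighbours of the best documents scored so far
            candidates = sorted(frontier, key=frontier.get, reverse=True)
        batch = candidates[:min(batch_size, budget - len(scored))]
        for docno in batch:
            scored[docno] = (score(docno), iteration)
        # scored documents leave both candidate sources
        pool = [d for d in pool if d not in scored]
        for docno in batch:
            frontier.pop(docno, None)
        # unscored neighbours of the batch join the frontier
        for docno in batch:
            for n in neighbours.get(docno, []):
                if n not in scored:
                    frontier[n] = max(frontier.get(n, float('-inf')), scored[docno][0])
        iteration += 1
    ranked = sorted(scored.items(), key=lambda kv: kv[1][0], reverse=True)
    results = [(d, s, it) for d, (s, it) in ranked]
    # anything left in the initial pool is backfilled, unscored, at the end
    results += [(d, None, -1) for d in pool]
    return results

# Hypothetical data: d7, d8 and d9 are not in the initial ranking.
relevance = {'d1': 0.9, 'd2': 0.2, 'd3': 0.1, 'd4': 0.05, 'd7': 0.7, 'd8': 0.3, 'd9': 0.8}
neighbours = {'d1': ['d9', 'd2'], 'd2': ['d1', 'd8'], 'd9': ['d1', 'd7']}
results = toy_gar(['d1', 'd2', 'd3', 'd4'], neighbours, lambda d: relevance[d])
for row in results:
    print(row)
```

In this toy run, `d9` and `d8` never appear in the initial ranking; they are reached only through the graph neighbours of `d1` and `d2`, so they carry an odd iteration number, while `d3` and `d4` fall outside the scoring budget and are backfilled with `-1`.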

Evaluation on a test collection ([TREC DL19](https://ir-datasets.com/msmarco-passage#msmarco-passage/trec-dl-2019)):

```python
from pyterrier.measures import *
dataset = pt.get_dataset('irds:msmarco-passage/trec-dl-2019/judged')
pt.Experiment(
    [retriever, retriever >> scorer, retriever >> GAR(scorer, graph)],
    dataset.get_topics(),
    dataset.get_qrels(),
    [nDCG, MAP(rel=2), R(rel=2)@1000],
    names=['bm25', 'bm25 >> monot5', 'bm25 >> GAR(monot5)']
)
#                name      nDCG  AP(rel=2)  R(rel=2)@1000
#                bm25  0.602325   0.303099       0.755495
#      bm25 >> monot5  0.696293   0.481259       0.755495
# bm25 >> GAR(monot5)  0.724501   0.489978       0.825952
```
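
In the table above, `bm25 >> monot5` has exactly the same recall as `bm25`, while GAR's is higher. A re-ranker only reorders a fixed candidate set, so a set-based measure such as recall cannot change, whereas GAR can bring in documents that the first stage missed. The sketch below illustrates this with made-up document ids; `recall` is a hypothetical helper, not the measure implementation used by `pt.Experiment`.

```python
def recall(retrieved, relevant):
    """Fraction of relevant documents that appear anywhere in the retrieved list."""
    return len(set(retrieved) & set(relevant)) / len(relevant)

relevant = {'d1', 'd7', 'd9'}                 # hypothetical relevance judgments
first_stage = ['d4', 'd1', 'd2', 'd3']        # hypothetical first-stage ranking
reranked = sorted(first_stage)                # any reordering of the same documents
adaptive = ['d1', 'd9', 'd2', 'd7']           # same budget, but drawn partly from graph neighbours

print(recall(first_stage, relevant))
print(recall(reranked, relevant))
print(recall(adaptive, relevant))
```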

## Reproduction

Detailed instructions to come!

## Building a Corpus Graph

You can construct a $k$-nearest-neighbour corpus graph using any retriever transformer and a corpus iterator.

Example:

```python
import pyterrier as pt
from pyterrier_adaptive import CorpusGraph
from pyterrier_pisa import PisaIndex
dataset = pt.get_dataset('irds:msmarco-passage')

# Build the index needed for BM25 retrieval (if it doesn't already exist)
idx = PisaIndex('msmarco-passage.pisa', threads=45) # adjust for your resources
if not idx.built():
    idx.index(dataset.get_corpus_iter())

# Build the corpus graph
K = 16 # number of nearest neighbours
graph16 = CorpusGraph.from_retriever(
    idx.bm25(num_results=K+1), # K+1 needed because retriever will return original document
    dataset.get_corpus_iter(),
    'msmarco-passage.gbm25.16',
    k=K)
```
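
To make the `K+1` comment above concrete, here is a minimal sketch of the idea behind a nearest-neighbour corpus graph: each document's text is issued as a query, and the document itself is dropped from its own result list. `toy_retriever` (term overlap rather than BM25) and `build_toy_graph` are hypothetical helpers for illustration and are not part of `pyterrier_adaptive`.

```python
def toy_retriever(query_text, corpus, num_results):
    """Rank documents by the number of terms shared with the query (stand-in for BM25)."""
    query_terms = set(query_text.split())
    scored = [(len(query_terms & set(text.split())), docno) for docno, text in corpus.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties broken by docno
    return [docno for _, docno in scored[:num_results]]

def build_toy_graph(corpus, k):
    """Map every docno to its k most similar other documents."""
    graph = {}
    for docno, text in corpus.items():
        # ask for k+1 results because the document usually retrieves itself
        hits = toy_retriever(text, corpus, num_results=k + 1)
        graph[docno] = [d for d in hits if d != docno][:k]
    return graph

corpus = {
    'a': 'clustering groups similar documents',
    'b': 'clustering similar documents helps retrieval',
    'c': 'retrieval ranks documents',
    'd': 'neural rankers score documents',
}
graph = build_toy_graph(corpus, k=2)
print(graph)
```

Requesting only `k` results would leave `k - 1` usable neighbours whenever the document matches itself, which is why the retriever is asked for one extra result.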

You can load a corpus graph using `CorpusGraph.load(path)`, and simulate lower $k$ values
using `.to_limit_k(k)`:

```python
graph16 = CorpusGraph.load('msmarco-passage.gbm25.16')
graph8 = graph16.to_limit_k(8)
```
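
Conceptually, limiting $k$ amounts to keeping a prefix of each neighbour list, assuming neighbours are stored best-first. The sketch below illustrates this on a plain dictionary; `limit_k` is a hypothetical helper, and the library's on-disk graph format may differ.

```python
def limit_k(graph, k):
    """Return a new graph that keeps only the first k neighbours of each document."""
    return {docno: nbrs[:k] for docno, nbrs in graph.items()}

# Hypothetical 4-neighbour graph, neighbours listed best-first.
graph4 = {'a': ['b', 'c', 'd', 'e'], 'b': ['a', 'c', 'e', 'd']}
graph2 = limit_k(graph4, 2)
print(graph2)
```

Because the smaller graph is derived from the larger one, a single graph built with a large $k$ is enough to experiment with several smaller values.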

## Citation

Adaptive Re-Ranking with a Corpus Graph. Sean MacAvaney, Nicola Tonellotto and Craig Macdonald. In Proceedings of CIKM 2022.

```bibtex
@inproceedings{gar2022,
  title = {Adaptive Re-Ranking with a Corpus Graph},
  booktitle = {Proceedings of ACM CIKM},
  author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald},
  year = 2022
}
```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pyterrier-adaptive",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Sean MacAvaney <sean.macavaney@glasgow.ac.uk>",
    "keywords": null,
    "author": null,
    "author_email": "Sean MacAvaney <sean.macavaney@glasgow.ac.uk>",
    "download_url": "https://files.pythonhosted.org/packages/a9/4a/f0b4e5d9955fd9a6a8587562b499e0ed56feb18d317588e20557fe9d008f/pyterrier_adaptive-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "PyTerrier implementation of Adaptive Re-Ranking using a Corpus Graph (CIKM 2022)",
    "version": "0.2.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/terrierteam/pyterrier_adaptive/issues",
        "Repository": "https://github.com/terrierteam/pyterrier_adaptive"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3d707ddc21a6e1248ff51662dda07e2ce96a605bb99787766f903be17917c09f",
                "md5": "a5b57d1d2af63542f25a913ef866575a",
                "sha256": "26cb085120a7d67216072e4b3986a85e13f2a57f89ed1fd99ac25066f231b827"
            },
            "downloads": -1,
            "filename": "pyterrier_adaptive-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a5b57d1d2af63542f25a913ef866575a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 13254,
            "upload_time": "2024-12-03T14:56:39",
            "upload_time_iso_8601": "2024-12-03T14:56:39.225656Z",
            "url": "https://files.pythonhosted.org/packages/3d/70/7ddc21a6e1248ff51662dda07e2ce96a605bb99787766f903be17917c09f/pyterrier_adaptive-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a94af0b4e5d9955fd9a6a8587562b499e0ed56feb18d317588e20557fe9d008f",
                "md5": "5bc4d996d014da299528ebbddf3fd359",
                "sha256": "6c67f09e13048d18affbb9cc2817725314ac944c0cb36fe2540d742e191b56d9"
            },
            "downloads": -1,
            "filename": "pyterrier_adaptive-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "5bc4d996d014da299528ebbddf3fd359",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 13942,
            "upload_time": "2024-12-03T14:56:40",
            "upload_time_iso_8601": "2024-12-03T14:56:40.834107Z",
            "url": "https://files.pythonhosted.org/packages/a9/4a/f0b4e5d9955fd9a6a8587562b499e0ed56feb18d317588e20557fe9d008f/pyterrier_adaptive-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-03 14:56:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "terrierteam",
    "github_project": "pyterrier_adaptive",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "python-terrier",
            "specs": []
        },
        {
            "name": "npids",
            "specs": [
                [
                    ">=",
                    "0.0.2"
                ]
            ]
        },
        {
            "name": "pyterrier-alpha",
            "specs": [
                [
                    ">=",
                    "0.6.2"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "more_itertools",
            "specs": []
        }
    ],
    "lcname": "pyterrier-adaptive"
}
        