# pyterrier_adaptive
[PyTerrier](http://github.com/terrier-org/pyterrier) implementation of [Adaptive Re-Ranking with a Corpus Graph](https://arxiv.org/abs/2208.08942) (CIKM 2022).
## Getting Started
Install with pip:
```bash
pip install --upgrade git+https://github.com/terrierteam/pyterrier_adaptive.git
```
Try the examples in Google Colab: [Open in Colab](https://colab.research.google.com/github/terrierteam/pyterrier_adaptive/blob/master/examples/example.ipynb)

Basic example over the MS MARCO passage corpus (using the [pyterrier_t5](https://github.com/terrierteam/pyterrier_t5) and [pyterrier_pisa](https://github.com/terrierteam/pyterrier_pisa) plugins):
```python
import pyterrier as pt
pt.init()
from pyterrier_t5 import MonoT5ReRanker
from pyterrier_pisa import PisaIndex
from pyterrier_adaptive import GAR, CorpusGraph
dataset = pt.get_dataset('irds:msmarco-passage')
retriever = PisaIndex.from_dataset('msmarco_passage').bm25()
scorer = pt.text.get_text(dataset, 'text') >> MonoT5ReRanker(verbose=False, batch_size=16)
graph = CorpusGraph.from_dataset('msmarco_passage', 'corpusgraph_bm25_k16').to_limit_k(8)
pipeline = retriever >> GAR(scorer, graph) >> pt.text.get_text(dataset, 'text')
pipeline.search('clustering hypothesis information retrieval')
# qid  query                                        docno    rank  score      iteration  text
# 1    clustering hypothesis information retrieval  2180710  0     -0.017059  0          Cluster analysis or clustering is the task of ...
# 1    clustering hypothesis information retrieval  8430269  1     -0.166563  1          Clustering is the grouping of a particular set...
# 1    clustering hypothesis information retrieval  1091429  2     -0.208345  1          Clustering is a fundamental data analysis meth...
# 1    clustering hypothesis information retrieval  2180711  3     -0.341018  5          Cluster analysis or clustering is the task of ...
# 1    clustering hypothesis information retrieval  6031959  4     -0.367014  5          Cluster analysis or clustering is the task of ...
# ..   ...                                          ...      ...   ...        ...        ...
# The iteration column indicates which GAR batch the document was scored in:
# even = initial retrieval, odd = corpus graph, -1 = backfilled
```
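To make the `iteration` column concrete, here is a minimal, pure-Python sketch of the alternating procedure: batches are drawn in turn from the initial ranking (even iterations) and from the corpus-graph neighbours of already-scored documents (odd iterations) until a scoring budget is spent, and whatever remains of the initial ranking is backfilled with iteration `-1`. It is an illustration only, not the code behind `GAR`; the function `toy_gar`, its parameters, the toy data, and the backfill scoring rule are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple


def toy_gar(
    initial: Dict[str, float],
    graph: Dict[str, List[str]],
    score: Callable[[str], float],
    budget: int,
    batch_size: int,
) -> List[Tuple[str, float, int]]:
    """Toy adaptive re-ranking; returns (docno, score, iteration) in ranked order."""
    pool = dict(initial)             # unscored first-stage results -> first-stage score
    frontier: Dict[str, float] = {}  # unscored graph neighbours -> best score of a scored source
    scored: Dict[str, Tuple[float, int]] = {}
    iteration = 0
    while len(scored) < budget and (pool or frontier):
        # Even iterations draw from the initial ranking, odd ones from the graph frontier.
        source = pool if iteration % 2 == 0 else frontier
        size = min(batch_size, budget - len(scored))
        batch = sorted(source, key=lambda d: source[d], reverse=True)[:size]
        for docno in batch:
            s = score(docno)
            scored[docno] = (s, iteration)
            pool.pop(docno, None)
            frontier.pop(docno, None)
            # Queue unscored neighbours, prioritised by the score of the document that led to them.
            for nb in graph.get(docno, []):
                if nb not in scored:
                    frontier[nb] = max(frontier.get(nb, float("-inf")), s)
        iteration += 1
    ranked = sorted(((d, s, it) for d, (s, it) in scored.items()),
                    key=lambda r: r[1], reverse=True)
    # Backfill (assumed rule): leftover first-stage results are placed below all scored documents.
    floor = min((s for _, s, _ in ranked), default=0.0)
    leftovers = sorted(pool, key=lambda d: pool[d], reverse=True)
    ranked += [(d, floor - 1 - i, -1) for i, d in enumerate(leftovers)]
    return ranked


# Hypothetical toy data: first-stage scores, a neighbour graph, and "true" re-ranker scores.
initial = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
graph = {"d1": ["d4"], "d2": ["d1"], "d4": ["d5"]}
relevance = {"d1": 0.9, "d2": 0.1, "d3": 0.2, "d4": 0.8, "d5": 0.7}

results = toy_gar(initial, graph, relevance.__getitem__, budget=4, batch_size=1)
for docno, s, it in results:
    print(docno, round(s, 3), it)
```

In this toy run, `d4` and `d5` are absent from the initial ranking and are only reached through the graph (odd iterations), while `d3` is never scored because the budget runs out, so it is backfilled.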
Evaluation on a test collection ([TREC DL19](https://ir-datasets.com/msmarco-passage#msmarco-passage/trec-dl-2019)):
```python
from pyterrier.measures import *
dataset = pt.get_dataset('irds:msmarco-passage/trec-dl-2019/judged')
pt.Experiment(
    [retriever, retriever >> scorer, retriever >> GAR(scorer, graph)],
    dataset.get_topics(),
    dataset.get_qrels(),
    [nDCG, MAP(rel=2), R(rel=2)@1000],
    names=['bm25', 'bm25 >> monot5', 'bm25 >> GAR(monot5)']
)
# name                 nDCG      AP(rel=2)  R(rel=2)@1000
# bm25                 0.602325  0.303099   0.755495
# bm25 >> monot5       0.696293  0.481259   0.755495
# bm25 >> GAR(monot5)  0.724501  0.489978   0.825952
```
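The relative gains of GAR over plain re-ranking can be read off the reported scores with a few lines of arithmetic. The snippet below only reuses the numbers from the results table above; the helper `relative_gain` is a hypothetical name introduced for this example.

```python
# Scores copied from the results table above, in the order nDCG, AP(rel=2), R(rel=2)@1000.
measures = ("nDCG", "AP(rel=2)", "R(rel=2)@1000")
reported = {
    "bm25 >> monot5": (0.696293, 0.481259, 0.755495),
    "bm25 >> GAR(monot5)": (0.724501, 0.489978, 0.825952),
}


def relative_gain(baseline: float, system: float) -> float:
    """Relative improvement of `system` over `baseline` (0.05 means +5%)."""
    return (system - baseline) / baseline


gains = {
    measure: relative_gain(base, gar)
    for measure, base, gar in zip(
        measures, reported["bm25 >> monot5"], reported["bm25 >> GAR(monot5)"]
    )
}
for measure, gain in gains.items():
    print(f"{measure}: {gain:+.1%}")
```

The gains come to roughly 4% in nDCG, 2% in AP and 9% in recall. Recall is the measure to watch: plain re-ranking only reorders the BM25 candidates, so its R@1000 is identical to BM25's, whereas GAR can score documents that were not in the initial ranking.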
## Reproduction
Detailed instructions to come!
## Building a Corpus Graph
You can construct a $k$-nearest-neighbour corpus graph, in which each document is linked to its $k$ most similar documents, using any retriever transformer and a corpus iterator.
Example:
```python
import pyterrier as pt
from pyterrier_adaptive import CorpusGraph
from pyterrier_pisa import PisaIndex

dataset = pt.get_dataset('irds:msmarco-passage')

# Build the index needed for BM25 retrieval (if it doesn't already exist)
idx = PisaIndex('msmarco-passage.pisa', threads=45)  # adjust for your resources
if not idx.built():
    idx.index(dataset.get_corpus_iter())

# Build the corpus graph
K = 16  # number of nearest neighbours
graph16 = CorpusGraph.from_retriever(
    idx.bm25(num_results=K+1),  # K+1 because the retriever also returns the source document itself
    dataset.get_corpus_iter(),
    'msmarco-passage.gbm25.16',
    k=K)
```
```
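Conceptually, building the graph amounts to using each document's text as a query, retrieving `K+1` results, and dropping the document itself. The sketch below shows that idea on a four-document toy corpus with a term-overlap scorer standing in for BM25. `toy_retrieve`, `build_toy_graph` and the corpus are hypothetical and say nothing about how `CorpusGraph.from_retriever` stores or computes the graph internally.

```python
from typing import Dict, List


def toy_retrieve(query_text: str, corpus: Dict[str, str], num_results: int) -> List[str]:
    """Rank documents by the number of terms shared with the query (stand-in for BM25)."""
    query_terms = set(query_text.lower().split())
    overlap = {
        docno: len(query_terms & set(text.lower().split()))
        for docno, text in corpus.items()
    }
    # Highest overlap first; ties broken by docno so the output is deterministic.
    ranked = sorted(overlap, key=lambda d: (-overlap[d], d))
    return [d for d in ranked if overlap[d] > 0][:num_results]


def build_toy_graph(corpus: Dict[str, str], k: int) -> Dict[str, List[str]]:
    """Map each docno to its (up to) k nearest neighbours, excluding itself."""
    graph = {}
    for docno, text in corpus.items():
        # Ask for k+1 results because the document will normally retrieve itself.
        hits = toy_retrieve(text, corpus, num_results=k + 1)
        graph[docno] = [h for h in hits if h != docno][:k]
    return graph


corpus = {
    "a": "graph based adaptive ranking",
    "b": "adaptive ranking with neural models",
    "c": "neural models for question answering",
    "d": "cooking pasta at home",
}
toy_graph = build_toy_graph(corpus, k=1)
print(toy_graph)
```

Document `d` shares no terms with the rest of the corpus, so it ends up with an empty neighbour list in this toy setting.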
You can load a corpus graph with `CorpusGraph.load(path)` and simulate lower values of $k$ with `.to_limit_k(k)`:
```python
graph16 = CorpusGraph.load('msmarco-passage.gbm25.16')
graph8 = graph16.to_limit_k(8)
```
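One way to picture limiting $k$: if each neighbour list is stored in descending order of similarity (an assumption of this sketch), a smaller $k$ just keeps the first $k$ entries of every list. The `limit_k` helper and the toy graph below are hypothetical and are not the library's implementation.

```python
from typing import Dict, List


def limit_k(graph: Dict[str, List[str]], k: int) -> Dict[str, List[str]]:
    """Keep only the first k neighbours of every document (lists assumed sorted by similarity)."""
    return {docno: neighbours[:k] for docno, neighbours in graph.items()}


toy_graph4 = {"d1": ["d7", "d3", "d9", "d2"], "d2": ["d1"]}
toy_graph2 = limit_k(toy_graph4, 2)
print(toy_graph2)
```

This is why a graph built once with a large $k$ can be reused to experiment with smaller values without rebuilding it.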
## Citation
Adaptive Re-Ranking with a Corpus Graph. Sean MacAvaney, Nicola Tonellotto and Craig Macdonald. In Proceedings of CIKM 2022.
```bibtex
@inproceedings{gar2022,
  title = {Adaptive Re-Ranking with a Corpus Graph},
  booktitle = {Proceedings of ACM CIKM},
  author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald},
  year = 2022
}
```