chatnoir-pyterrier


- Name: chatnoir-pyterrier
- Version: 3.1.2
- Summary: Use the ChatNoir search engine in PyTerrier.
- Upload time: 2025-01-09 16:02:05
- Requires Python: >=3.8

[![PyPi](https://img.shields.io/pypi/v/chatnoir-pyterrier?style=flat-square)](https://pypi.org/project/chatnoir-pyterrier/)
[![CI](https://img.shields.io/github/actions/workflow/status/chatnoir-eu/chatnoir-pyterrier/ci.yml?branch=main&style=flat-square)](https://github.com/chatnoir-eu/chatnoir-pyterrier/actions/workflows/ci.yml)
[![Code coverage](https://img.shields.io/codecov/c/github/chatnoir-eu/chatnoir-pyterrier?style=flat-square)](https://codecov.io/github/chatnoir-eu/chatnoir-pyterrier/)
[![Python](https://img.shields.io/pypi/pyversions/chatnoir-pyterrier?style=flat-square)](https://pypi.org/project/chatnoir-pyterrier/)
[![Google Colab](https://img.shields.io/badge/example-open%20in%20colab-informational?style=flat-square)](https://colab.research.google.com/github/chatnoir-eu/chatnoir-pyterrier/blob/main/examples/search.ipynb)
[![Issues](https://img.shields.io/github/issues/chatnoir-eu/chatnoir-pyterrier?style=flat-square)](https://github.com/chatnoir-eu/chatnoir-pyterrier/issues)
[![Commit activity](https://img.shields.io/github/commit-activity/m/chatnoir-eu/chatnoir-pyterrier?style=flat-square)](https://github.com/chatnoir-eu/chatnoir-pyterrier/commits)
[![Downloads](https://img.shields.io/pypi/dm/chatnoir-pyterrier?style=flat-square)](https://pypi.org/project/chatnoir-pyterrier/)
[![License](https://img.shields.io/github/license/chatnoir-eu/chatnoir-pyterrier?style=flat-square)](LICENSE)

# 🔍 chatnoir-pyterrier

Use the ChatNoir REST API in PyTerrier to retrieve or re-rank documents from large corpora such as ClueWeb09, ClueWeb12, ClueWeb22, or MS MARCO.

Powered by the [`chatnoir-api`](https://pypi.org/project/chatnoir-api/) package.

## Installation

Install the package from PyPI:

```shell
pip install chatnoir-pyterrier
```

## Usage

You can use the `ChatNoirRetrieve` PyTerrier module in any PyTerrier pipeline, just as you would use `BatchRetrieve`.

```python
from chatnoir_pyterrier import ChatNoirRetrieve, Feature

chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir.search("python library")
```

### Features

ChatNoir provides an extensive set of extra features, such as the full text, or the page rank and spam rank (only available for some indices).
These features can be included in the result data frame and used in subsequent PyTerrier re-ranking stages:

```python
from chatnoir_pyterrier import ChatNoirRetrieve, Feature

chatnoir_msmarco_snippet = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir_msmarco_snippet.search("python library")

chatnoir_cw09_page_spam_rank = ChatNoirRetrieve(index="clueweb09", features=Feature.PAGE_RANK | Feature.SPAM_RANK)
chatnoir_cw09_page_spam_rank.search("python library")
```
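The `|` in `Feature.PAGE_RANK | Feature.SPAM_RANK` combines several features into a single argument, which is how Python's `enum.Flag` behaves. The stand-in below is a hypothetical re-creation for illustration only: the member names are copied from the examples above, but the member values and the `requested_fields` helper are assumptions, not the package's actual definition.

```python
from enum import Flag, auto
from typing import List


class Feature(Flag):
    """Hypothetical stand-in for chatnoir_pyterrier.Feature (values are assumed)."""

    SNIPPET_TEXT = auto()
    PAGE_RANK = auto()
    SPAM_RANK = auto()


def requested_fields(features: Feature) -> List[str]:
    """Hypothetical helper: names of all single features contained in a combined flag."""
    return [
        member.name
        for member in (Feature.SNIPPET_TEXT, Feature.PAGE_RANK, Feature.SPAM_RANK)
        if member in features  # bitwise containment check provided by enum.Flag
    ]


combined = Feature.PAGE_RANK | Feature.SPAM_RANK
print(requested_fields(combined))  # ['PAGE_RANK', 'SPAM_RANK']
print(Feature.SNIPPET_TEXT in combined)  # False
```

Because flags compose with `|`, one `features` argument can request several extra fields at once, and a consumer can test for each one individually.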

### Advanced usage

Please check out our [sample notebook](examples/search.ipynb) or [open it in Google Colab](https://colab.research.google.com/github/chatnoir-eu/chatnoir-pyterrier/blob/main/examples/search.ipynb).

We also provide a [hands-on guide](examples/search_touche_2023.ipynb) for the Touché 2023 shared tasks.

<!-- ## Citation

If you use this package, please cite the [paper](https://webis.de/publications.html#bevendorff_2018)
from the [ChatNoir](https://github.com/chatnoir-eu) authors. 
You can use the following BibTeX information for citation:

```bibtex
@InProceedings{bevendorff:2018,
  address =               {Berlin Heidelberg New York},
  author =                {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =             {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =                {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  month =                 mar,
  publisher =             {Springer},
  series =                {Lecture Notes in Computer Science},
  site =                  {Grenoble, France},
  title =                 {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                  2018
}
``` -->

### Experiments

With chatnoir-pyterrier, it is easy to run benchmarks on shared tasks that are based on large document collections.
We demonstrate this by running ChatNoir retrieval on all supported TREC, CLEF, and NTCIR shared tasks available in ir_datasets.

First install the experiment dependencies:

```shell
pip install -e ".[experiment]"
```

To run the experiments, first create the runs by submitting a Ray job:

```shell
ray job submit --runtime-env examples/ray-runtime-env.yml --no-wait -- python examples/experiment.py 
```

This creates the runs for all shared tasks in parallel and saves them to a cache.
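The experiment script distributes its work with Ray. As a rough, standard-library-only illustration of the same idea (compute every shared task's run in parallel and reuse runs that are already cached), consider the sketch below. The task names, the `run_task` function, and the one-JSON-file-per-task cache layout are all hypothetical and do not reflect how `examples/experiment.py` is actually implemented.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Dict, List

Run = List[Dict[str, object]]


def run_task(task: str) -> Run:
    """Hypothetical placeholder for retrieving a run for one shared task."""
    return [{"qid": "1", "docno": f"{task}-doc", "rank": 1}]


def cached_run(task: str, cache_dir: Path) -> Run:
    """Load a run from the cache, or compute and store it on a cache miss."""
    cache_file = cache_dir / f"{task}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    run = run_task(task)
    cache_file.write_text(json.dumps(run))
    return run


def run_all(tasks: List[str], cache_dir: Path) -> Dict[str, Run]:
    """Create (or load) the runs for all tasks in parallel threads."""
    with ThreadPoolExecutor() as pool:
        runs = pool.map(lambda task: cached_run(task, cache_dir), tasks)
        return dict(zip(tasks, runs))


with TemporaryDirectory() as tmp:
    runs = run_all(["task-a", "task-b"], Path(tmp))
    print(sorted(runs))  # ['task-a', 'task-b']
```

Keying the cache by task means an interrupted batch can simply be resubmitted: finished tasks are loaded from disk and only the missing ones are recomputed.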

After creating the runs, the [`experiment.ipynb`](examples/experiment.ipynb) notebook can be used to analyze the results.

## Development

To build this package and contribute to its development, you need the `build`, `setuptools`, and `wheel` packages:

```shell
pip install build setuptools wheel
```

(On most systems, these packages are already installed.)

### Development installation

Install package and test dependencies:

```shell
pip install -e ".[test]"
```

### Testing

Configure the ChatNoir API key for testing:

```shell
export CHATNOIR_API_KEY="<API_KEY>"
```

Verify your changes against the test suite:

```shell
ruff check .                   # Linting and code style
mypy .                         # Static typing
bandit -c pyproject.toml -r .  # Security
pytest .                       # Unit tests
```

Please also add tests for your newly developed code.
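Tests that call the live ChatNoir API need the key configured above. One way to keep the suite usable without a key is to skip such tests when `CHATNOIR_API_KEY` is unset. The sketch below uses the standard library's `unittest` to stay self-contained (with pytest, `pytest.mark.skipif` serves the same purpose); the test class and its assertion are hypothetical and not part of the existing test suite.

```python
import os
import unittest

# Key configured via `export CHATNOIR_API_KEY=...`; None if it is not set.
API_KEY = os.environ.get("CHATNOIR_API_KEY")


@unittest.skipUnless(API_KEY, "CHATNOIR_API_KEY is not set")
class TestChatNoirLive(unittest.TestCase):
    """Hypothetical live-API tests; every test is skipped when no key is set."""

    def test_api_key_is_not_blank(self) -> None:
        assert API_KEY is not None  # guaranteed by the skip condition above
        self.assertTrue(API_KEY.strip())
```

Skipped tests are reported as skips rather than failures, so contributors without a key can still run the offline checks.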

### Build wheels

Wheels for this package can be built with:

```shell
python -m build
```

## Support

If you hit any problems using this package, please file an [issue](https://github.com/chatnoir-eu/chatnoir-pyterrier/issues/new).
We're happy to help!

## License

This repository is released under the [MIT license](LICENSE).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "chatnoir-pyterrier",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Jan Heinrich Merker <heinrich.merker@uni-jena.de>",
    "download_url": "https://files.pythonhosted.org/packages/70/a1/2d674aa129c3b86760464ac30b8559a9e11b01c228d3ba4db2db3c6eae0f/chatnoir_pyterrier-3.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Use the ChatNoir search engine in PyTerrier.",
    "version": "3.1.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/chatnoir-eu/chatnoir-pyterrier/issues",
        "Homepage": "https://github.com/chatnoir-eu/chatnoir-pyterrier"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5c96d19d379fc126714258b994ebeda07fcb3a0b4b053a225ba0a750762ef00e",
                "md5": "132fca96209952da3ce0ed0b1c26a256",
                "sha256": "845a0dbefb5d1507bc560183d55952f527550cf0b38073ad0f5549f8d184a444"
            },
            "downloads": -1,
            "filename": "chatnoir_pyterrier-3.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "132fca96209952da3ce0ed0b1c26a256",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 31963,
            "upload_time": "2025-01-09T16:02:01",
            "upload_time_iso_8601": "2025-01-09T16:02:01.941000Z",
            "url": "https://files.pythonhosted.org/packages/5c/96/d19d379fc126714258b994ebeda07fcb3a0b4b053a225ba0a750762ef00e/chatnoir_pyterrier-3.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "70a12d674aa129c3b86760464ac30b8559a9e11b01c228d3ba4db2db3c6eae0f",
                "md5": "6d4dec5bcd79b10b6300b52ad8837005",
                "sha256": "a6aadd5c62ab70746954f5379043b7024263233aa654de6b42f0f8f182487228"
            },
            "downloads": -1,
            "filename": "chatnoir_pyterrier-3.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "6d4dec5bcd79b10b6300b52ad8837005",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 34611,
            "upload_time": "2025-01-09T16:02:05",
            "upload_time_iso_8601": "2025-01-09T16:02:05.021985Z",
            "url": "https://files.pythonhosted.org/packages/70/a1/2d674aa129c3b86760464ac30b8559a9e11b01c228d3ba4db2db3c6eae0f/chatnoir_pyterrier-3.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-09 16:02:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chatnoir-eu",
    "github_project": "chatnoir-pyterrier",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "chatnoir-pyterrier"
}
        