colbert-server


Name: colbert-server
Version: 0.2.1
Summary: Run ColBERT Wikipedia server
Upload time: 2025-10-30 21:34:33
Author: Niels van Galen Last
Requires Python: >=3.13
Keywords: colbert, retrieval, search, wikipedia, ir
# colbert-server

[![uv tool](https://img.shields.io/badge/uv-tool-3b82f6?logo=uv&logoColor=white)](https://docs.astral.sh/uv/)
[![python 3.13+](https://img.shields.io/badge/python-3.13+-3776AB?logo=python&logoColor=white)](https://www.python.org/)
[![license MIT](https://img.shields.io/badge/license-MIT-7c3aed.svg)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/colbert-server.svg?color=3776AB&label=pypi)](https://pypi.org/project/colbert-server/)
[![Downloads](https://img.shields.io/pypi/dm/colbert-server.svg?color=8b5cf6&label=downloads)](https://pypi.org/project/colbert-server/)
[![CI](https://github.com/nielsgl/colbert-server/actions/workflows/ci.yml/badge.svg)](https://github.com/nielsgl/colbert-server/actions/workflows/ci.yml)
[![dataset](https://img.shields.io/badge/dataset-huggingface-ff9a00?logo=huggingface&logoColor=white)](https://huggingface.co/datasets/nielsgl/colbert-wiki2017)
[![API Flask](https://img.shields.io/badge/api-flask-000000?logo=flask&logoColor=white)](https://flask.palletsprojects.com/)

CLI tooling to fetch the ColBERT Wikipedia 2017 dataset and run a lightweight Flask API on top of the ColBERT v2 searcher.

```text
$> I wrote this because the ColBERT server is down and I couldn't try one of the tutorials from DSPy.
$> I only tested this on my MacBook, so please open an issue if you have problems or feature requests.
```

## Features

- One-command install and execution via `uv tool`.
- Automatically downloads either ready-to-serve indexes/collection or the original archives.
- Optional archive extraction flow for offline usage.
- Caches ColBERT queries for fast, repeated lookups.
- Exposes a simple `/api/search` endpoint for programmatic access.

## Installation

```bash
uv tool install colbert-server
```

This registers a `colbert-server` executable in your `uv` toolchain.

Check the installed version at any time:

```bash
colbert-server --version
```

Or if you just want to run it:

```bash
uvx colbert-server --help
```

## Running the server

### Use data from the Hugging Face cache (recommended quick start)

```bash
colbert-server serve --from-cache
```

This downloads only the `collection/` and `indexes/` folders from
[`nielsgl/colbert-wiki2017`](https://huggingface.co/datasets/nielsgl/colbert-wiki2017),
resolves the on-disk paths from the Hugging Face cache, and starts the server.
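If you want to warm that cache ahead of time (for example on a machine with better connectivity), a minimal sketch using `huggingface_hub` is shown below. The `allow_patterns` globs are an assumption based on the folder names above, and the CLI may resolve paths differently internally.

```python
# Sketch: pre-populate the Hugging Face cache with the same folders that
# `--from-cache` serves from. The glob patterns assume the collection/ and
# indexes/ layout described above.
from huggingface_hub import snapshot_download

snapshot_path = snapshot_download(
    repo_id="nielsgl/colbert-wiki2017",
    repo_type="dataset",
    allow_patterns=["collection/*", "indexes/*"],
)
print(f"Snapshot cached at: {snapshot_path}")
```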

### Provide existing local assets

```bash
colbert-server serve \
  --index-root /path/to/indexes \
  --index-name wiki17.nbits.local \
  --collection-path /path/to/collection/wiki.abstracts.2017/collection.tsv
```

Use this mode when you already have ColBERT indexes and a collection TSV locally.

### Download archives first, then serve

```bash
colbert-server serve \
  --download-archives /tmp/wiki-assets \
  --extract \
  --port 8894
```

This fetches the archive files into `/tmp/wiki-assets/archives`, extracts them into
`/tmp/wiki-assets`, auto-detects the resulting layout (e.g. `wiki17.nbits.local`),
and starts the Flask server on port `8894`.

## API usage

Once running, the server listens on the host/port provided (defaults to `0.0.0.0:8893`)
and serves ColBERT search results via:

```
GET /api/search?query=<text>&k=<top-k>
```

Example request:

```
http://127.0.0.1:8893/api/search?query=halloween+movie&k=3
```

The JSON response includes the ranked passages, their scores, and normalized probabilities.
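A minimal Python client for this endpoint might look like the sketch below; it relies only on the `query` and `k` parameters documented above and prints the JSON payload as-is rather than assuming specific field names.

```python
# Minimal sketch of a client for /api/search. Only the documented `query`
# and `k` parameters are assumed; the response JSON is printed as-is.
import requests

resp = requests.get(
    "http://127.0.0.1:8893/api/search",
    params={"query": "halloween movie", "k": 3},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```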

## Managing dataset archives only

If you just want the raw archive bundles in a local directory:

```bash
colbert-server download-archives ./downloads --extract
```

Add `--extract-to /desired/path` to unpack into a different directory. You can later reuse
the extracted paths with the `serve` command’s `--index-root` and `--collection-path` flags.

## Alternative / Manual Method

If you don't want to use the `uv` tool, you can set things up manually as follows:

1. Add the dependencies to your project: `uv add colbert-ai flask faiss-cpu torch`
2. Download both the index and the collection archives from the `archives` directory of the Hugging Face dataset and unzip them.
3. Copy the `standalone.py` script from this repository and edit the `INDEX_ROOT` and `COLLECTION_PATH` variables.
4. Run the server with `uv run standalone.py` and <tada.wav> (a rough sketch of such a script is shown below).
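For reference, here is a rough sketch of what such a standalone script boils down to, assuming the standard `colbert-ai` `Searcher` API. The paths and the experiment name are placeholders, and the actual `standalone.py` in this repository may differ in its details.

```python
# Rough sketch of a standalone server: load a ColBERT Searcher and expose it
# over Flask. Paths and names below are placeholders.
from colbert import Searcher
from colbert.infra import Run, RunConfig
from flask import Flask, jsonify, request

INDEX_ROOT = "/path/to/indexes"  # placeholder
INDEX_NAME = "wiki17.nbits.local"
COLLECTION_PATH = "/path/to/collection/wiki.abstracts.2017/collection.tsv"

with Run().context(RunConfig(experiment="wiki")):
    searcher = Searcher(
        index=INDEX_NAME,
        index_root=INDEX_ROOT,
        collection=COLLECTION_PATH,
    )

app = Flask(__name__)

@app.route("/api/search")
def search():
    query = request.args.get("query", "")
    k = int(request.args.get("k", 10))
    # searcher.search returns parallel lists of passage ids, ranks, and scores.
    pids, ranks, scores = searcher.search(query, k=k)
    results = [
        {"pid": pid, "rank": rank, "score": score, "text": searcher.collection[pid]}
        for pid, rank, score in zip(pids, ranks, scores)
    ]
    return jsonify({"query": query, "topk": results})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8893)
```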

## Development tips

- Requires Python 3.13+ (or adjust the `pyproject.toml` requirement to match your interpreter).
- Run `colbert-server --help` or `colbert-server serve --help` to inspect available options.
- The dataset helpers live under `colbert_server/data.py`; server configuration sits in `colbert_server/server.py`.
- GitHub Actions runs lint/tests on every push; see `.github/workflows/ci.yml` for details.
- Publishing uses the `.github/workflows/publish.yml` workflow. Once your PyPI/TestPyPI trusted publishers are set up, bump the version in `pyproject.toml`, create a `vX.Y.Z` tag, and push it to trigger the release.
- The CLI pings PyPI at most once per day and nudges you if a newer version exists. Set `COLBERT_SERVER_DISABLE_UPDATE_CHECK=1` to disable this behaviour.

Happy searching! 🧠📚

            
