# Lateness - Modern ColBERT for Late Interaction
A Python package for Modern ColBERT (late interaction) embeddings with native multi-vector support for efficient retrieval using the Qdrant vector database.
## Features
- **Dual Backend Architecture**: ONNX for fast retrieval, PyTorch for GPU indexing
- **Native Multi-Vector Support**: Optimized for Qdrant's MaxSim comparator
- **Smart Installation**: Lightweight retrieval or heavy indexing based on your needs
- **Production Ready**: Separate deployment targets for different workloads
## Quick Start
### Installation
```bash
# Lightweight retrieval (ONNX + Qdrant)
pip install lateness
# Heavy indexing (PyTorch + Transformers + ONNX + Qdrant)
pip install lateness[index]
```
### Basic Usage
**Default Installation (ONNX Backend):**
```python
# pip install lateness
from lateness import ModernColBERT
colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")
# Output:
# 🚀 Using ONNX backend (default; for GPU-accelerated indexing, install lateness[index] and set LATENESS_USE_TORCH=true)
# 🔄 Downloading model: prithivida/modern_colbert_base_en_v1
# ✅ ONNX ColBERT loaded with providers: ['CPUExecutionProvider']
# Query max length: 256, Document max length: 300
```
**Index Installation (PyTorch Backend):**
```python
# pip install lateness[index]
import os
os.environ['LATENESS_USE_TORCH'] = 'true'
from lateness import ModernColBERT
colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")
# Output:
# 🚀 Using PyTorch backend (LATENESS_USE_TORCH=true)
# 🔄 Downloading model: prithivida/modern_colbert_base_en_v1
# Loading model from: /root/.cache/huggingface/hub/models--prithivida--modern_colbert_base_en_v1/...
# ✅ PyTorch ColBERT loaded on cuda
# Query max length: 256, Document max length: 300
```
**Complete Example with Qdrant:**
For a complete working example with Qdrant integration, environment setup, and testing instructions, see the [examples/qdrant folder](./examples/qdrant/).
The examples include:
- Environment setup and testing
- Local Qdrant server management
- Complete indexing and retrieval workflows
- Both ONNX and PyTorch backend examples
## Architecture
### Two Deployment Models
**Retrieval Service (Lightweight)**
```bash
pip install lateness
```
- ONNX backend (fast CPU inference)
- Qdrant integration
- ~50MB total dependencies
- Perfect for user-facing search APIs (see the sketch below)
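
As a concrete illustration of this deployment target, here is a minimal search endpoint built on the documented `QdrantRetriever` API. FastAPI, the running Qdrant server, and the pre-indexed `documents` collection are assumptions of the sketch, not dependencies of `lateness` itself:

```python
# Minimal user-facing search API (sketch).
# Assumes: pip install lateness fastapi uvicorn, a Qdrant server on
# localhost:6333, and an already-indexed "documents" collection.
from fastapi import FastAPI
from qdrant_client import QdrantClient
from lateness import QdrantRetriever

app = FastAPI()
client = QdrantClient("localhost", port=6333)
retriever = QdrantRetriever(client, "documents")

@app.get("/search")
def search(q: str, top_k: int = 10):
    # The default ONNX backend is used here, so no environment variable is needed.
    return retriever.search_simple(q, top_k=top_k)
```

If saved as `app.py`, this can be served with `uvicorn app:app`.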
**Indexing Service (Heavy)**
```bash
pip install lateness[index]
```
- PyTorch backend (GPU acceleration)
- Full Transformers support
- ~2GB+ dependencies
- Perfect for batch document processing (see the sketch below)
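
The indexing side typically runs as a one-off batch job on a GPU machine. The sketch below only uses calls shown in the Qdrant Integration section; the corpus is a placeholder you would replace with your own loader:

```python
# Batch indexing job (sketch). Assumes pip install lateness[index],
# a CUDA-capable machine, and a Qdrant server on localhost:6333.
import os
os.environ["LATENESS_USE_TORCH"] = "true"  # opt into the PyTorch backend before importing lateness

from qdrant_client import QdrantClient
from lateness import QdrantIndexer

client = QdrantClient("localhost", port=6333)
indexer = QdrantIndexer(client, "documents")
indexer.create_collection()

# Placeholder corpus; swap in your own document loader here.
documents = [
    "ColBERT scores queries against documents with token-level MaxSim.",
    "Qdrant stores multi-vector points for late-interaction retrieval.",
]
indexer.index_documents_simple(documents)
```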
### Backend Selection
The package uses environment variables for backend control:
- **Default behavior** → ONNX backend (CPU retrieval)
- **`LATENESS_USE_TORCH=true`** → PyTorch backend (GPU indexing)
**Note:** PyTorch backend requires `pip install lateness[index]` to install PyTorch dependencies.
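
In deployments, the variable is usually set at process launch rather than in code; for example (the script names are placeholders):

```bash
# Default: ONNX backend for the retrieval service
python serve_search.py

# PyTorch backend for the indexing job (requires pip install lateness[index])
LATENESS_USE_TORCH=true python run_indexing.py
```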
## API Reference
### ModernColBERT
```python
from lateness import ModernColBERT
# Initialize
colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")
# Encode queries
query_embeddings = colbert.encode_queries(["What is AI?"])
# Encode documents
doc_embeddings = colbert.encode_documents(["AI is artificial intelligence"])
# Compute similarity
scores = ModernColBERT.compute_similarity(query_embeddings, doc_embeddings)
```
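
`compute_similarity` returns late-interaction relevance scores. Conceptually, MaxSim scoring for a single query/document pair can be sketched with NumPy as below, assuming L2-normalized token embeddings shaped `[num_tokens, dim]`; this is an illustration, not the package's internal implementation:

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction score: sum over query tokens of their best document-token match."""
    sim = query_emb @ doc_emb.T          # token-level similarity matrix [q_tokens, d_tokens]
    return float(sim.max(axis=1).sum())  # MaxSim per query token, summed over the query

query_emb = np.random.rand(4, 128).astype(np.float32)  # e.g. 4 query tokens
doc_emb = np.random.rand(9, 128).astype(np.float32)    # e.g. 9 document tokens
print(maxsim(query_emb, doc_emb))
```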
### Qdrant Integration
```python
from lateness import QdrantIndexer, QdrantRetriever
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)
# Indexing
documents = ["AI is artificial intelligence", "ColBERT uses late interaction"]
indexer = QdrantIndexer(client, "documents")
indexer.create_collection()
indexer.index_documents_simple(documents)
# Retrieval
retriever = QdrantRetriever(client, "documents")
results = retriever.search_simple("query", top_k=10)
```
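
Under the hood, `create_collection` presumably sets up a multi-vector collection compatible with Qdrant's MaxSim comparator. For reference, an equivalent manual layout using `qdrant-client` directly would look roughly like this (the per-token embedding size of 128 is an assumption):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)
client.create_collection(
    collection_name="documents_manual",
    vectors_config=models.VectorParams(
        size=128,                       # assumed per-token embedding dimension
        distance=models.Distance.COSINE,
        multivector_config=models.MultiVectorConfig(
            comparator=models.MultiVectorComparator.MAX_SIM,  # late-interaction scoring in Qdrant
        ),
    ),
)
```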
## License
Apache License 2.0
## Contributing
Contributions welcome! Please check our [contributing guidelines](CONTRIBUTING.md).