# muvera-rs
An unofficial Rust implementation of **MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings**.
[crates.io](https://crates.io/crates/muvera-rs)
[docs.rs](https://docs.rs/muvera-rs)
[License](LICENSE)
## Disclaimer
- This is an unofficial implementation and is not affiliated with Google Research or the original authors.
- Some parts of this README were generated by AI.
## Overview
MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.
### The Algorithm
Multi-vector models (like ColBERT) produce one embedding per token, so each query or document is represented by a set of vectors, which yields superior retrieval quality compared to single-vector models. However, multi-vector retrieval is computationally expensive because scoring uses Chamfer similarity, which compares every query token against every document token.
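For reference, Chamfer similarity (often called MaxSim in the ColBERT literature) can be sketched in a few lines of NumPy; `chamfer_similarity` below is an illustrative helper, not part of this crate's API:

```python
import numpy as np

def chamfer_similarity(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """For each query token, take its best (max) dot product against all
    document tokens, then sum those maxima over the query tokens."""
    # (num_query_tokens, num_doc_tokens) matrix of pairwise dot products
    sims = query_tokens @ doc_tokens.T
    return float(sims.max(axis=1).sum())

q = np.random.randn(4, 8).astype(np.float32)
d = np.random.randn(6, 8).astype(np.float32)
score = chamfer_similarity(q, d)
```

The cost is a full `num_query_tokens × num_doc_tokens` similarity matrix per candidate document, which is exactly what MuVERA avoids at retrieval time.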
MuVERA solves this by:
1. **Fixed Dimensional Encoding (FDE)**: Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
2. **Asymmetric Encoding**: Uses sum aggregation for queries and average aggregation for documents
3. **Single-Vector MIPS**: Enables the use of highly-optimized Maximum Inner Product Search algorithms
4. **Theoretical Guarantees**: Provides ε-approximation guarantees for Chamfer similarity
The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.
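To make the mechanism concrete, here is a deliberately simplified NumPy sketch of the FDE idea: SimHash bucketing via random hyperplanes, with sum aggregation for queries and average aggregation for documents. It omits the paper's repetitions, empty-bucket filling, and final projection, and `encode_fde` here is an illustrative toy, not this crate's API (note the crate's constructor takes a bucket count directly, whereas this sketch derives `2^k` buckets from `k` hyperplanes):

```python
import numpy as np

def encode_fde(tokens, planes, mode="sum"):
    """Toy FDE: SimHash each token into one of 2^k buckets using k random
    hyperplanes, aggregate the tokens in each bucket, and concatenate."""
    k, dim = planes.shape
    bits = (tokens @ planes.T) > 0                  # (num_tokens, k) sign bits
    ids = bits.astype(int) @ (1 << np.arange(k))    # bucket index per token
    fde = np.zeros((1 << k, dim), dtype=tokens.dtype)
    counts = np.zeros(1 << k)
    for tok, b in zip(tokens, ids):
        fde[b] += tok
        counts[b] += 1
    if mode == "avg":
        filled = counts > 0
        fde[filled] /= counts[filled][:, None]
    return fde.ravel()                              # fixed length: 2^k * dim

rng = np.random.default_rng(42)
planes = rng.standard_normal((3, 8)).astype(np.float32)
query = rng.standard_normal((5, 8)).astype(np.float32)
doc = rng.standard_normal((7, 8)).astype(np.float32)

query_fde = encode_fde(query, planes, "sum")   # queries: sum aggregation
doc_fde = encode_fde(doc, planes, "avg")       # documents: average aggregation
approx = float(query_fde @ doc_fde)            # single dot product stands in
                                               # for the full Chamfer score
```

Because both sides end up as one fixed-length vector, the expensive token-pair scoring collapses into a single inner product that any MIPS index can serve.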
## Features
- ✅ **Core FDE Implementation**: Complete Fixed Dimensional Encoding with configurable buckets
- ✅ **Batch Processing**: Efficient parallel encoding for large datasets
- ✅ **Type Safety**: Generic implementation supporting `f32` and `f64`
- ✅ **Memory Efficient**: Uses `ndarray` for optimized vector operations
- ✅ **Deterministic**: Reproducible results with seed-based randomization
- ✅ **Comprehensive Testing**: Extensive test suite covering edge cases
## Installation
### Rust
Add to your `Cargo.toml`:
```toml
[dependencies]
muvera-rs = "0.1.0"
```
### Python
Install from PyPI:
```bash
pip install muvera
```
Or install from source:
```bash
pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings
```
## Testing the Python Bindings
After building with maturin, you can test the Python bindings interactively:
```python
import numpy as np
import muvera
embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, 3, "mean")
print(result)
```
Or save the above as a script and run with `python my_test.py`.
## Usage
### Python Example
```python
import numpy as np
import muvera
# Create token embeddings (num_tokens, embedding_dim)
embeddings = np.random.randn(32, 768).astype(np.float32)
# Encode with mean aggregation
buckets = 3
result = muvera.encode_fde(embeddings, buckets, "mean")
print(f"FDE result shape: {result.shape}")
# Encode with max aggregation
result_max = muvera.encode_fde(embeddings, buckets, "max")
print(f"FDE max result shape: {result_max.shape}")
```
### Rust Example
```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array2;

fn main() {
    // Create encoder with 128 buckets for 768-dimensional embeddings
    let encoder = FDEEncoder::new(128, 768, 42);

    // Example token embeddings (num_tokens, embedding_dim)
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();

    // Encode query (sum aggregation)
    let query_fde = encoder.encode_query(tokens.view());

    // Encode document (average aggregation)
    let doc_fde = encoder.encode_doc(tokens.view());
}
```
### Batch Processing
```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array3;

fn encode_batch() {
    let encoder = FDEEncoder::new(128, 768, 42);

    // Batch of token embeddings (batch_size, num_tokens, embedding_dim)
    let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();

    // Encode entire batch
    let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
    println!("Encoded batch shape: {:?}", batch_fdes.shape());
}
```
### Custom Aggregation
```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array2;

fn custom_encoding() {
    let encoder = FDEEncoder::new(128, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();

    // Use custom aggregation mode
    let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
    let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}
```
## API Reference
### `FDEEncoder<T>`
The main encoder struct that implements the Fixed Dimensional Encoding algorithm.
#### Constructor
```rust
pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
```
- `buckets`: Number of hash buckets (hyperplanes)
- `dim`: Dimensionality of input token embeddings
- `seed`: Random seed for reproducible hyperplane initialization
#### Methods
- `encode(tokens, mode)`: Encode single multi-vector
- `batch_encode(tokens, mode)`: Encode batch of multi-vectors (parallel)
- `encode_query(tokens)`: Encode query with sum aggregation
- `encode_doc(tokens)`: Encode document with average aggregation
- `encode_query_batch(tokens)`: Batch encode queries
- `encode_doc_batch(tokens)`: Batch encode documents
### `Aggregation`
Enum defining aggregation modes:
- `Sum`: Sum all tokens in each bucket
- `Avg`: Average all tokens in each bucket
## Performance Considerations
- **Buckets**: More buckets improve the Chamfer-similarity approximation, but FDE size (and memory) grows linearly with the bucket count
- **Batch Size**: Use batch encoding for multiple vectors to leverage parallelism
- **Memory**: FDE vectors are `buckets * dim` in size
- **Precision**: `f32` is typically sufficient and faster than `f64`
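As a quick sanity check on the memory point above, the FDE footprint for the parameters used in the examples (128 buckets, 768-dimensional embeddings, `f32`) works out as follows:

```python
# FDE footprint per encoded vector (f32 = 4 bytes per element)
buckets, dim = 128, 768
fde_len = buckets * dim        # number of floats in one FDE
fde_bytes = fde_len * 4        # bytes per FDE
fde_kib = fde_bytes / 1024     # per-vector footprint in KiB
```

So each encoded query or document costs 98,304 floats (384 KiB), regardless of how many tokens it originally had.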
## References
### Original Paper
- **MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings**
Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni
[arXiv:2405.19504](https://arxiv.org/pdf/2405.19504)
### Google Research Blog
- **MuVERA: Making Multi-Vector Retrieval as Fast as Single-Vector Search**
[Google Research Blog](https://research.google/blog/muvera-making-multi-vector-retrieval-as-fast-as-single-vector-search/)
### Original Implementation
- **Google Graph Mining Repository**
[github.com/google/graph-mining/tree/main/sketching/point_cloud](https://github.com/google/graph-mining/tree/main/sketching/point_cloud)
## Future Work
### Planned Features
1. **Benchmark Suite**
- Integration with BEIR datasets (MS MARCO, etc.)
- Memory usage and compression analysis
2. **Advanced Features**
- **BLAS Support**
- **Product Quantization (PQ)**: 32x compression with minimal quality loss
- **Final Projections**: Dimensionality reduction techniques
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.