muvera 0.2.0

- **Homepage**: https://github.com/NewBornRustacean/muvera-rs
- **Summary**: An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
- **Upload time**: 2025-07-12 14:29:41
- **Author**: NewBornRustacean <huiseomkim@gmail.com>
- **Requires Python**: >=3.8
- **License**: MIT
- **Keywords**: muvera, multi-vector, embedding, fde, retrieval

# muvera-rs

An unofficial Rust implementation of **MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings**.

[![Crates.io](https://img.shields.io/crates/v/muvera-rs)](https://crates.io/crates/muvera-rs)
[![Documentation](https://docs.rs/muvera-rs/badge.svg)](https://docs.rs/muvera-rs)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

## Disclaimer

- This is an unofficial implementation and is not affiliated with Google Research or the original authors.
- Some parts of this README were generated by AI.

## Overview

MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.

### The Algorithm

Multi-vector models (like ColBERT) produce an embedding for every query/document token, achieving superior retrieval performance compared to single-vector models. However, multi-vector retrieval is computationally expensive because Chamfer similarity must be scored over every query-document token pair.

MuVERA solves this by:

1. **Fixed Dimensional Encoding (FDE)**: Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
2. **Asymmetric Encoding**: Uses sum aggregation for queries and average aggregation for documents
3. **Single-Vector MIPS**: Enables the use of highly-optimized Maximum Inner Product Search algorithms
4. **Theoretical Guarantees**: Provides ε-approximation guarantees for Chamfer similarity

The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.
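
For concreteness, the Chamfer similarity that FDE approximates is (using the paper's notation, where $Q$ and $D$ are the sets of query and document token embeddings):

```math
\mathrm{CHAMFER}(Q, D) = \sum_{q \in Q} \max_{d \in D} \langle q, d \rangle
```

and the guarantee above is that the single-vector inner product $\langle F_q(Q), F_{doc}(D) \rangle$ approximates $\mathrm{CHAMFER}(Q, D)$ up to the ε-approximation error mentioned above.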

## Features

- ✅ **Core FDE Implementation**: Complete Fixed Dimensional Encoding with configurable buckets
- ✅ **Batch Processing**: Efficient parallel encoding for large datasets
- ✅ **Type Safety**: Generic implementation supporting `f32` and `f64`
- ✅ **Memory Efficient**: Uses `ndarray` for optimized vector operations
- ✅ **Deterministic**: Reproducible results with seed-based randomization
- ✅ **Comprehensive Testing**: Extensive test suite covering edge cases

## Installation

### Rust

Add to your `Cargo.toml`:

```toml
[dependencies]
muvera-rs = "0.1.0"
```

### Python

Install from PyPI:

```bash
pip install muvera
```

Or install from source:

```bash
pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings
```

## Testing the Python Bindings

After building with maturin, you can test the Python bindings interactively:

```python
import numpy as np
import muvera

embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, 128, "mean")  # (embeddings, buckets, aggregation)
print(result)
```

Or save the above as a script and run with `python my_test.py`.

## Usage

### Python Example

```python
import numpy as np
import muvera

# Create token embeddings (num_tokens, embedding_dim)
embeddings = np.random.randn(32, 768).astype(np.float32)

# Encode with mean aggregation
buckets = 3
result = muvera.encode_fde(embeddings, buckets, "mean")
print(f"FDE result shape: {result.shape}")

# Encode with max aggregation
result_max = muvera.encode_fde(embeddings, buckets, "max")
print(f"FDE max result shape: {result_max.shape}")
```

### Rust Example

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::{Array2, ArrayView2};

fn main() {
    // Create encoder with 128 buckets for 768-dimensional embeddings
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Example token embeddings (num_tokens, embedding_dim)
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Encode query (sum aggregation)
    let query_fde = encoder.encode_query(tokens.view());
    
    // Encode document (average aggregation)
    let doc_fde = encoder.encode_doc(tokens.view());
}
```

### Batch Processing

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::{Array3, ArrayView3};

fn encode_batch() {
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Batch of token embeddings (batch_size, num_tokens, embedding_dim)
    let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();
    
    // Encode entire batch
    let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
    
    println!("Encoded batch shape: {:?}", batch_fdes.shape());
}
```

### Custom Aggregation

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array2;

fn custom_encoding() {
    let encoder = FDEEncoder::new(128, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Use custom aggregation mode
    let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
    let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}
```

## API Reference

### `FDEEncoder<T>`

The main encoder struct that implements the Fixed Dimensional Encoding algorithm.

#### Constructor
```rust
pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
```

- `buckets`: Number of hash buckets (hyperplanes) 
- `dim`: Dimensionality of input token embeddings
- `seed`: Random seed for reproducible hyperplane initialization

#### Methods

- `encode(tokens, mode)`: Encode single multi-vector
- `batch_encode(tokens, mode)`: Encode batch of multi-vectors in parallel (see the sketch below)
- `encode_query(tokens)`: Encode query with sum aggregation
- `encode_doc(tokens)`: Encode document with average aggregation
- `encode_query_batch(tokens)`: Batch encode queries
- `encode_doc_batch(tokens)`: Batch encode documents

### `Aggregation`

Enum defining aggregation modes:
- `Sum`: Sum all tokens in each bucket
- `Avg`: Average all tokens in each bucket
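
A minimal sketch combining `batch_encode` with an explicit `Aggregation` mode, assuming (per the method list above and the batch example earlier) that `batch_encode` takes a 3-D view of token embeddings plus the mode:

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array3;

fn main() {
    let encoder = FDEEncoder::new(128, 768, 42);

    // Batch of token embeddings (batch_size, num_tokens, embedding_dim).
    let batch = Array3::from_shape_vec((10, 32, 768), vec![0.1f32; 10 * 32 * 768]).unwrap();

    // Explicit aggregation mode, instead of the encode_query_batch /
    // encode_doc_batch convenience methods.
    let fdes_sum = encoder.batch_encode(batch.view(), Aggregation::Sum);
    let fdes_avg = encoder.batch_encode(batch.view(), Aggregation::Avg);

    println!("sum: {:?}, avg: {:?}", fdes_sum.shape(), fdes_avg.shape());
}
```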

## Performance Considerations

- **Buckets**: More buckets = better approximation but higher memory usage
- **Batch Size**: Use batch encoding for multiple vectors to leverage parallelism
- **Memory**: Each FDE vector has `buckets * dim` components (see the estimate below)
- **Precision**: `f32` is typically sufficient and faster than `f64`
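
For a back-of-the-envelope estimate using the illustrative parameters from the examples above (`buckets = 128`, `dim = 768`, which are examples rather than recommended defaults): each FDE has 128 × 768 = 98,304 components, roughly 384 KiB per vector as `f32` and twice that as `f64`.

```rust
// Rough FDE memory estimate, using the illustrative parameters from the examples above.
fn main() {
    let (buckets, dim) = (128usize, 768usize);
    let components = buckets * dim; // 98_304 values per FDE
    let bytes = components * std::mem::size_of::<f32>(); // 393_216 bytes = 384 KiB
    println!("{components} components, {} KiB per FDE (f32)", bytes / 1024);
}
```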

## References

### Original Paper
- **MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings**  
  Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni  
  [arXiv:2405.19504](https://arxiv.org/pdf/2405.19504)

### Google Research Blog
- **MuVERA: Making Multi-Vector Retrieval as Fast as Single-Vector Search**  
  [Google Research Blog](https://research.google/blog/muvera-making-multi-vector-retrieval-as-fast-as-single-vector-search/)

### Original Implementation
- **Google Graph Mining Repository**  
  [github.com/google/graph-mining/tree/main/sketching/point_cloud](https://github.com/google/graph-mining/tree/main/sketching/point_cloud)

## Future Work

### Planned Features

1. **Benchmark Suite**
   - Integration with BEIR datasets (MS MARCO, etc.)
   - Memory usage and compression analysis

2. **Advanced Features**
   - **BLAS Support**: optional BLAS backend for faster linear algebra
   - **Product Quantization (PQ)**: 32x compression with minimal quality loss
   - **Final Projections**: Dimensionality reduction techniques
   

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.



            
