maxsim-cpu


- **Name**: maxsim-cpu
- **Version**: 0.1.0
- **Summary**: Fast CPU implementation of MaxSim scoring
- **Upload time**: 2025-07-10 11:14:39
- **Author**: Benjamin Clavié, Mixedbread Team
- **Requires Python**: >=3.8
- **License**: Apache-2.0
- **Keywords**: maxsim, similarity, ranking, cpu, simd, performance

# maxsim-cpu

`maxsim-cpu` is a high-performance CPU implementation of MaxSim scoring for late-interaction (ColBERT, ColPali) workflows.

It is a Python library written in Rust, powered by `libxsmm` on x86 CPUs and Apple Accelerate on ARM Macs. At the moment, it only supports Linux x86 machines and ARM Macs.

`maxsim-cpu` is built to run exclusively on CPU, and its speed-ups scale with the core count of the scoring machine. It is designed for situations where indexing/scoring machines do not have access to GPUs, and achieves roughly 2-3x speed-ups on ARM Macs and 5x speed-ups on Linux CPUs over common PyTorch MaxSim implementations.

It also implements just-in-time batching and padding for variable-length documents, greatly reducing padding overhead and needless computation.
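
For readers new to late interaction, the quantity being computed can be sketched in plain NumPy. This is the standard ColBERT-style MaxSim definition, not the library's optimized kernel, and the function name `maxsim_reference` is purely illustrative:

```python
import numpy as np


def maxsim_reference(query: np.ndarray, doc: np.ndarray) -> float:
    """Reference MaxSim: for each query token, take its best-matching
    document token (max dot product), then sum over query tokens."""
    sim = query @ doc.T  # [num_query_tokens, doc_len]
    return float(sim.max(axis=1).sum())


# Tiny hand-checkable example with unit-norm 2-D vectors.
query = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
doc = np.array([[1.0, 0.0], [0.6, 0.8]], dtype=np.float32)

# Query token 0 matches doc token 0 (1.0); query token 1 matches doc token 1 (0.8).
score = maxsim_reference(query, doc)
print(score)
```

With normalized vectors, each dot product is a cosine similarity, which is why the library expects normalized inputs.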

## Getting Started

Pre-built wheels are available on PyPI for Python 3.9 through 3.13 and can be installed in the usual way:

```bash
uv pip install maxsim-cpu # You may use vanilla pip install but why would you? If you're sophisticated, you could use `uv add` too!
```

Once installed, the simple API exposes two functions. For uniform-length inputs, use `maxsim_scores`:

```python
import numpy as np
import maxsim_cpu

# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32)  # [num_query_tokens, dim]

# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)

docs = np.random.randn(1000, 512, 128).astype(np.float32)  # [num_docs, doc_len, dim]
# Normalize document embeddings along the embedding dimension
docs /= np.linalg.norm(docs, axis=2, keepdims=True)

# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores(query, docs)  # Returns [num_docs] scores
```
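
Since the call returns one score per document, ranking is an ordinary array operation. A minimal sketch, using stand-in scores rather than real library output; `top_k` is a hypothetical helper and not part of `maxsim-cpu`:

```python
import numpy as np


def top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest scores, best first."""
    k = min(k, scores.shape[0])
    # argpartition selects the k largest without sorting the whole array;
    # only those k candidates are then sorted.
    candidates = np.argpartition(-scores, k - 1)[:k]
    return candidates[np.argsort(-scores[candidates])]


# Stand-in for the [num_docs] array returned by the scoring call.
scores = np.array([12.5, 30.1, 7.0, 22.4, 30.0], dtype=np.float32)
best = top_k(scores, 3)
print(best)
```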

For variable length inputs, you should use the alternate `maxsim_scores_variable`:

```python
import numpy as np
import maxsim_cpu

# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32)  # [num_query_tokens, dim]

# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)

# Create variable-length documents as a list
docs = [
    np.random.randn(np.random.randint(50, 800), 128).astype(np.float32)  # Variable length docs
    for _ in range(1000)
]
# Normalize each document's token embeddings
docs = [d / np.linalg.norm(d, axis=1, keepdims=True) for d in docs]

# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores_variable(query, docs)  # Returns [num_docs] scores
```
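
To see why batching matters for variable-length documents, the sketch below counts how many padded token slots are needed when every document is padded to the global maximum versus when documents are sorted by length and padded per batch. It illustrates the general idea only: it is a toy under stated assumptions, not the batching strategy `maxsim-cpu` actually implements, and `batch_size=32` is an arbitrary illustrative value.

```python
import numpy as np


def padded_slots(lengths: np.ndarray, batch_size: int, sort: bool) -> int:
    """Total token slots allocated when each batch is padded to its longest doc."""
    if sort:
        lengths = np.sort(lengths)
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += int(batch.max()) * len(batch)
    return total


rng = np.random.default_rng(0)
lengths = rng.integers(50, 800, size=1000)  # same length range as the example above

real_tokens = int(lengths.sum())
global_pad = int(lengths.max()) * len(lengths)  # one tensor padded to the longest doc
bucketed = padded_slots(lengths, batch_size=32, sort=True)

print(f"real tokens:       {real_tokens}")
print(f"pad to global max: {global_pad}")
print(f"sorted, per-batch: {bucketed}")
```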

## Platform Requirements

- **macOS**: Apple Silicon (M1+)
- **Linux**: x86_64 with AVX2 (Intel Haswell 2013+, AMD Excavator 2015+)

We currently do not support Windows or take advantage of AVX512 instructions, nor do we optimise caching for specific CPUs. Contributions/PRs in this direction are welcome!
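
The requirements above can be checked programmatically. A minimal sketch, assuming a Linux system exposes its CPU feature flags in `/proc/cpuinfo`; `is_supported` and `linux_cpu_flags` are illustrative helpers and not part of `maxsim-cpu`:

```python
import platform
from pathlib import Path


def is_supported(system: str, machine: str, cpu_flags: set) -> bool:
    """Mirror the two supported targets listed above."""
    machine = machine.lower()
    if system == "Darwin":
        return machine == "arm64"
    if system == "Linux":
        return machine in {"x86_64", "amd64"} and "avx2" in cpu_flags
    return False


def linux_cpu_flags() -> set:
    """Read CPU feature flags from /proc/cpuinfo (empty set if unavailable)."""
    try:
        for line in Path("/proc/cpuinfo").read_text().splitlines():
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


supported = is_supported(platform.system(), platform.machine(), linux_cpu_flags())
print("supported platform:", supported)
```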

## Building

We use `maturin` as our build system. 

#### Linux

The easy way to build `maxsim-cpu` from source on Linux is as follows:

```bash
# Install necessary system deps
apt-get install libssl-dev libopenblas-dev -y
apt-get install pkg-config -y
# Install tooling
uv pip install maturin patchelf numpy
# Install libxsmm
git clone git@github.com:libxsmm/libxsmm.git
(cd libxsmm && make STATIC=1 && make)  # subshell, so we stay in the parent directory
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
RUSTFLAGS="-L native=$(pwd)/../libxsmm/lib" maturin build --release --features use-libxsmm
```

Step by step:
- Installs OpenSSL and OpenBLAS, which are required for compiling, as well as pkg-config so they can be found easily.
- Installs the Python build tooling (`maturin`, `patchelf`, `numpy`).
- Clones and builds `libxsmm`, on which most of the performance depends.
- Installs Rust and loads its environment.
- Clones this repository and builds it.

You can skip any step whose dependencies are already present on your machine.

#### Mac

On Mac, the build is simpler, assuming you use Homebrew:
```bash
# Install maturin
uv pip install maturin
# Install patchelf
brew install patchelf
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
maturin build --release -q
```
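
`maturin build` writes a wheel rather than installing it; by default the wheel lands in `target/wheels/`. A small sketch for locating the newest one so it can be passed to `uv pip install` (the helper name is illustrative, and the directory assumes maturin's default output location):

```python
from pathlib import Path
from typing import Optional


def newest_wheel(wheel_dir: str = "target/wheels") -> Optional[Path]:
    """Return the most recently modified .whl in wheel_dir, or None if there is none."""
    wheels = sorted(Path(wheel_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


wheel = newest_wheel()
if wheel is None:
    print("no wheel found; run the build first")
else:
    print(f"uv pip install {wheel}")
```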

## Performance

For documents of uniform length, `maxsim-cpu` on Linux is slower than JAX on 4-core machines and either somewhat faster or somewhat slower at 8 cores, depending on the CPU; on ARM Macs it is always faster than the alternatives. For variable document lengths (evaluated with lengths drawn uniformly between 128 and 1536 tokens), `maxsim-cpu` is consistently fast thanks to more efficient batching.
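
Results like these depend heavily on the machine, so it can be worth measuring on your own hardware. A minimal timing sketch with a NumPy baseline follows; the baseline, the shapes, and the helper names are illustrative assumptions and are not the benchmark setup used for the figures in this section.

```python
import time

import numpy as np


def numpy_maxsim(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Baseline: batched MaxSim in NumPy for uniform-length documents."""
    sim = np.einsum("qd,nld->nql", query, docs)  # [num_docs, q_len, doc_len]
    return sim.max(axis=2).sum(axis=1)


def best_time(fn, *args, repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


rng = np.random.default_rng(0)
query = rng.standard_normal((32, 128)).astype(np.float32)
docs = rng.standard_normal((100, 256, 128)).astype(np.float32)

print(f"NumPy baseline: {best_time(numpy_maxsim, query, docs) * 1e3:.1f} ms")
# To compare, time maxsim_cpu.maxsim_scores(query, docs) with the same helper.
```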

### Mac M4 Ultra

![Mac M4 Ultra performance](speedup_comparisons/maxsim_speedup_mac.png)


### Linux AMD EPYC

#### 32 core limit performance

![Linux AMD EPYC 32 core performance](speedup_comparisons/maxsim_speedup_32cores.png)

#### 16 core limit performance

*Our performance appears to have been hindered during benchmarking by a Rayon configuration issue when limiting the available cores. We are leaving the results as-is for now, but performance is expected to be considerably better on an actual 16-core CPU.*

![Linux AMD EPYC 16 core performance](speedup_comparisons/maxsim_speedup_16cores.png)


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "maxsim-cpu",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "maxsim, similarity, ranking, cpu, simd, performance",
    "author": "Benjamin Clavi\u00e9, Mixedbread Team",
    "author_email": null,
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Fast CPU implementation of MaxSim scoring",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/mixedbread-ai/maxsim-cpu/issues",
        "Homepage": "https://github.com/mixedbread-ai/maxsim-cpu"
    },
    "split_keywords": [
        "maxsim",
        " similarity",
        " ranking",
        " cpu",
        " simd",
        " performance"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "aa6ac21e917ec4464a4d5bebc7518461ed19dddd857a62e86a306b6df2fafa8d",
                "md5": "8f37ec3efd4477255b9e2e6dd41c718c",
                "sha256": "ebbdc8e7a7093dcdea4ff0f5f15f6aec430356db989f9ffd5f3d7eedac162dc2"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp310-cp310-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "8f37ec3efd4477255b9e2e6dd41c718c",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.8",
            "size": 216873,
            "upload_time": "2025-07-10T11:14:39",
            "upload_time_iso_8601": "2025-07-10T11:14:39.857835Z",
            "url": "https://files.pythonhosted.org/packages/aa/6a/c21e917ec4464a4d5bebc7518461ed19dddd857a62e86a306b6df2fafa8d/maxsim_cpu-0.1.0-cp310-cp310-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ed297225a3dee685a5d59c15ce94556a9597e1ee8157979550087918d6df1895",
                "md5": "7f976211f4aa3876c2f11575cc10e390",
                "sha256": "9ff45504832f4982ae78a5433d62a38ae73059a23275dadddb766fb8662c87da"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7f976211f4aa3876c2f11575cc10e390",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.8",
            "size": 1152943,
            "upload_time": "2025-07-10T11:14:41",
            "upload_time_iso_8601": "2025-07-10T11:14:41.225692Z",
            "url": "https://files.pythonhosted.org/packages/ed/29/7225a3dee685a5d59c15ce94556a9597e1ee8157979550087918d6df1895/maxsim_cpu-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9d01afe521f19a731b636139bcd3a1086a1583c3ffac364f2a79268156a8297f",
                "md5": "0884002e993bbd7803d5819a9ab138d2",
                "sha256": "c2e35bfc618aaa36e232aba94580d9a3e94f4bf2e4a3c8f98d1d96aaabc31741"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp311-cp311-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "0884002e993bbd7803d5819a9ab138d2",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.8",
            "size": 216873,
            "upload_time": "2025-07-10T11:14:42",
            "upload_time_iso_8601": "2025-07-10T11:14:42.786011Z",
            "url": "https://files.pythonhosted.org/packages/9d/01/afe521f19a731b636139bcd3a1086a1583c3ffac364f2a79268156a8297f/maxsim_cpu-0.1.0-cp311-cp311-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0446151b0d1b671a76baffb260e3be9da2867f1a7e5c2136df99834f2783b256",
                "md5": "68e352095f2d35767ef2d45dd59782fb",
                "sha256": "374eedf3e2682ce3fd8a3642b43e75d3f1d9622a3c1ddd978565001d913e8562"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl",
            "has_sig": false,
            "md5_digest": "68e352095f2d35767ef2d45dd59782fb",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.8",
            "size": 1152944,
            "upload_time": "2025-07-10T11:14:43",
            "upload_time_iso_8601": "2025-07-10T11:14:43.803589Z",
            "url": "https://files.pythonhosted.org/packages/04/46/151b0d1b671a76baffb260e3be9da2867f1a7e5c2136df99834f2783b256/maxsim_cpu-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "056e0bcbb7f3e8ef348a94105c9405004d5e8a28fc5d63542ce2096c9c7b2dbd",
                "md5": "e143815f7bb3ba7d39fd74fd96659ba0",
                "sha256": "8f6860e0f814aa207db202076909409ad2eb73a8de9bf2032ae3335466d2a478"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp312-cp312-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "e143815f7bb3ba7d39fd74fd96659ba0",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.8",
            "size": 216876,
            "upload_time": "2025-07-10T11:14:45",
            "upload_time_iso_8601": "2025-07-10T11:14:45.067212Z",
            "url": "https://files.pythonhosted.org/packages/05/6e/0bcbb7f3e8ef348a94105c9405004d5e8a28fc5d63542ce2096c9c7b2dbd/maxsim_cpu-0.1.0-cp312-cp312-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "32d25dfa3b69e0394c4f68139967764181b7cf242c4c3e72d4b5a29bcdf59866",
                "md5": "d5b4a767526edf855ba873119d30f994",
                "sha256": "28b1471a2a80204b35ea5852c0971f59880555e591364553e58ff4dc386e1143"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl",
            "has_sig": false,
            "md5_digest": "d5b4a767526edf855ba873119d30f994",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.8",
            "size": 1152943,
            "upload_time": "2025-07-10T11:14:46",
            "upload_time_iso_8601": "2025-07-10T11:14:46.357491Z",
            "url": "https://files.pythonhosted.org/packages/32/d2/5dfa3b69e0394c4f68139967764181b7cf242c4c3e72d4b5a29bcdf59866/maxsim_cpu-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9c05e281f900690f9035168697a30a64dce73b65d219548db86e20beec5163db",
                "md5": "eb9067e1fb37fcb6f5f12f701092de90",
                "sha256": "db99d8a23da0e59bdfb765ff9a33af394f9f80144f5e6d0a999adea8f16e10a2"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp313-cp313-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "eb9067e1fb37fcb6f5f12f701092de90",
            "packagetype": "bdist_wheel",
            "python_version": "cp313",
            "requires_python": ">=3.8",
            "size": 216875,
            "upload_time": "2025-07-10T11:14:47",
            "upload_time_iso_8601": "2025-07-10T11:14:47.599920Z",
            "url": "https://files.pythonhosted.org/packages/9c/05/e281f900690f9035168697a30a64dce73b65d219548db86e20beec5163db/maxsim_cpu-0.1.0-cp313-cp313-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "70dfe1720b9684bca7afabd461150ab2429f2685689dae9ab923830ad280569f",
                "md5": "3d73c951077911896d1360aea694c335",
                "sha256": "1dcb010b1fa62d89807a13c995e7aebe8bf0307da155b4ad7b622a26e9dd3498"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp313-cp313-manylinux_2_34_x86_64.whl",
            "has_sig": false,
            "md5_digest": "3d73c951077911896d1360aea694c335",
            "packagetype": "bdist_wheel",
            "python_version": "cp313",
            "requires_python": ">=3.8",
            "size": 1152944,
            "upload_time": "2025-07-10T11:14:48",
            "upload_time_iso_8601": "2025-07-10T11:14:48.848739Z",
            "url": "https://files.pythonhosted.org/packages/70/df/e1720b9684bca7afabd461150ab2429f2685689dae9ab923830ad280569f/maxsim_cpu-0.1.0-cp313-cp313-manylinux_2_34_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "464f31acdbacbe4420b12564df2d928a190fa88798289288576a663f9c92dd1e",
                "md5": "2c93f6e89ece714a43a2aff57456e424",
                "sha256": "bd638a191518e0e32bb08793476184aa96c54304a6a344f5ee2182d30e2dd38c"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp39-cp39-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "2c93f6e89ece714a43a2aff57456e424",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.8",
            "size": 217100,
            "upload_time": "2025-07-10T11:14:49",
            "upload_time_iso_8601": "2025-07-10T11:14:49.834460Z",
            "url": "https://files.pythonhosted.org/packages/46/4f/31acdbacbe4420b12564df2d928a190fa88798289288576a663f9c92dd1e/maxsim_cpu-0.1.0-cp39-cp39-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1fb3afe378aa2f87fa5ab43a143757bcd7cf3cf55e1b1bc89b6756dff38d55df",
                "md5": "b7f3e8f2b958d23ffa380bfc2633b2a8",
                "sha256": "34807dc3c631f90139c49dccdb3baf4ffa62f064688bcf8eb961d77580785460"
            },
            "downloads": -1,
            "filename": "maxsim_cpu-0.1.0-cp39-cp39-manylinux_2_34_x86_64.whl",
            "has_sig": false,
            "md5_digest": "b7f3e8f2b958d23ffa380bfc2633b2a8",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.8",
            "size": 1153234,
            "upload_time": "2025-07-10T11:14:51",
            "upload_time_iso_8601": "2025-07-10T11:14:51.076443Z",
            "url": "https://files.pythonhosted.org/packages/1f/b3/afe378aa2f87fa5ab43a143757bcd7cf3cf55e1b1bc89b6756dff38d55df/maxsim_cpu-0.1.0-cp39-cp39-manylinux_2_34_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-10 11:14:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mixedbread-ai",
    "github_project": "maxsim-cpu",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "maxsim-cpu"
}
        