# maxsim-cpu
`maxsim-cpu` is a high-performance CPU implementation of MaxSim scoring for late-interaction (ColBERT, ColPali) workflows.
It is a Python library written in Rust and powered by `libxsmm` on x86 CPUs and Apple Accelerate on ARM Macs. It only supports Linux x86 machines and ARM Macs at the moment.
`maxsim-cpu` is built to run exclusively on CPU, and achieves speed-ups that scale with core count on the scoring machine. It's designed to be used in situations where index/scoring machines do not have access to GPUs, and achieves ~2-3x speed-ups on ARM Macs and 5x speed-ups on Linux CPUs over common PyTorch MaxSim implementations.
It also implements effective just-in-time batching and padding for variable-length documents, greatly reducing padding overhead and needless computations.
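To give a feel for why batching strategy matters with variable-length documents, here is a minimal NumPy sketch that measures how many token slots are wasted on padding when documents are batched in arbitrary order versus grouped by similar length. This is only an illustration of the general idea: `padding_overhead` is a hypothetical helper, and the sketch does not describe how `maxsim-cpu` batches internally.

```python
import numpy as np

def padding_overhead(lengths, batch_size):
    """Fraction of token slots that are padding when batching in the given order."""
    lengths = np.asarray(lengths)
    padded_slots = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        # Every document in a batch is padded up to the longest one in that batch.
        padded_slots += int(batch.max()) * len(batch)
    return 1.0 - lengths.sum() / padded_slots

rng = np.random.default_rng(0)
lengths = rng.integers(50, 800, size=1000)  # same length range as the README example

naive = padding_overhead(lengths, batch_size=32)              # arbitrary order
bucketed = padding_overhead(np.sort(lengths), batch_size=32)  # similar lengths together
print(f"padding overhead: naive={naive:.1%}, length-sorted={bucketed:.1%}")
```

Grouping documents of similar length keeps each batch's maximum close to its typical length, which is where the savings come from.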
## Getting Started
Pre-built wheels are available on PyPI for Python 3.9 through 3.13 and can be installed in the usual way:
```bash
uv pip install maxsim-cpu # You may use vanilla pip install but why would you? If you're sophisticated, you could use `uv add` too!
```
Once installed, the simple API exposes two functions. For uniform-length inputs, you may use:
```python
import numpy as np
import maxsim_cpu
# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32) # [num_query_tokens, dim]
# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)
docs = np.random.randn(1000, 512, 128).astype(np.float32) # [num_docs, doc_len, dim]
# Normalize document embeddings along the embedding dimension
docs /= np.linalg.norm(docs, axis=2, keepdims=True)
# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores(query, docs) # Returns [num_docs] scores
```
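For readers who want to see what these scores represent, the following is a plain NumPy sketch of the standard ColBERT-style MaxSim definition: for each query token, take the maximum similarity over the document's tokens, then sum over query tokens. `maxsim_reference` is a hypothetical helper written for illustration, and it is an assumption that `maxsim_scores` follows exactly this definition; it may be handy as a slow baseline for sanity checks.

```python
import numpy as np

def maxsim_reference(query, docs):
    """Reference MaxSim for uniform-length documents.

    query: [num_query_tokens, dim], docs: [num_docs, doc_len, dim]
    """
    # sim[d, t, q] is the dot product between document token t and query token q.
    sim = docs @ query.T                  # [num_docs, doc_len, num_query_tokens]
    # Best-matching document token per query token, summed over query tokens.
    return sim.max(axis=1).sum(axis=1)    # [num_docs]

rng = np.random.default_rng(0)
query = rng.standard_normal((32, 128)).astype(np.float32)
query /= np.linalg.norm(query, axis=1, keepdims=True)
docs = rng.standard_normal((10, 64, 128)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=2, keepdims=True)

scores = maxsim_reference(query, docs)
print(scores.shape)  # (10,)
```

With normalized vectors each dot product is a cosine similarity, so every score lies between `-num_query_tokens` and `num_query_tokens`.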
For variable-length inputs, you should use the alternate `maxsim_scores_variable`:
```python
import numpy as np
import maxsim_cpu
# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32) # [num_query_tokens, dim]
# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)
# Create variable-length documents as a list
docs = [
    np.random.randn(np.random.randint(50, 800), 128).astype(np.float32)  # Variable length docs
    for _ in range(1000)
]
# Normalize each document's token embeddings
docs = [d / np.linalg.norm(d, axis=1, keepdims=True) for d in docs]
# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores_variable(query, docs) # Returns [num_docs] scores
```
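The variable-length case can be sketched the same way in plain NumPy, scoring each document on its own so that no padding is involved. As before, `maxsim_reference_variable` is a hypothetical helper for illustration, and it is an assumption that `maxsim_scores_variable` computes the same quantity.

```python
import numpy as np

def maxsim_reference_variable(query, docs):
    """Reference MaxSim for a list of documents with different lengths.

    query: [num_query_tokens, dim], docs: list of [doc_len_i, dim] arrays
    """
    scores = [
        # [doc_len_i, num_query_tokens] -> max over document tokens -> sum over query tokens
        (doc @ query.T).max(axis=0).sum()
        for doc in docs
    ]
    return np.asarray(scores, dtype=np.float32)

rng = np.random.default_rng(0)
query = rng.standard_normal((32, 128)).astype(np.float32)
query /= np.linalg.norm(query, axis=1, keepdims=True)

docs = []
for length in rng.integers(50, 800, size=20):
    doc = rng.standard_normal((length, 128)).astype(np.float32)
    docs.append(doc / np.linalg.norm(doc, axis=1, keepdims=True))

scores = maxsim_reference_variable(query, docs)
print(scores.shape)  # (20,)
```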
## Platform Requirements
- **macOS**: Apple Silicon (M1+)
- **Linux**: x86_64 with AVX2 (Intel Haswell 2013+, AMD Excavator 2015+)
We currently do not support Windows or take advantage of AVX512 instructions, nor do we optimise caching for specific CPUs. Contributions/PRs in this direction are welcome!
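If you want to check these requirements programmatically before installing, a small standard-library sketch along these lines may help. `is_supported_platform` and `_read_linux_cpu_flags` are hypothetical helpers, not part of `maxsim-cpu`, and reading `/proc/cpuinfo` for the `avx2` flag is one common way to detect AVX2 on Linux rather than the method the library itself uses.

```python
import platform

def _read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags listed in /proc/cpuinfo, or an empty set."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def is_supported_platform(system=None, machine=None, cpu_flags=None):
    """Check the README's platform requirements; arguments default to this machine."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system == "Darwin":
        return machine == "arm64"  # Apple Silicon
    if system == "Linux" and machine in ("x86_64", "amd64"):
        if cpu_flags is None:
            cpu_flags = _read_linux_cpu_flags()
        return "avx2" in cpu_flags
    return False  # Windows and other platforms are not supported

print("supported:", is_supported_platform())
```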
## Building
We use `maturin` as our build system.
#### Linux
The easy way to build `maxsim-cpu` from source on Linux is as follows:
```bash
# Install necessary system deps
apt-get install libssl-dev libopenblas-dev -y
apt-get install pkg-config -y
# Install tooling
uv pip install maturin patchelf numpy
# Install libxsmm
git clone git@github.com:libxsmm/libxsmm.git && cd libxsmm && make STATIC=1 && make && cd ..
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
RUSTFLAGS="-L native=$(pwd)/../libxsmm/lib" maturin build --release --features use-libxsmm
```
Step by step:
- Installs OpenSSL and OpenBLAS, which are required for compiling, as well as pkg-config so they can be found easily.
- Installs the Python tooling (`maturin`, `patchelf` and `numpy`).
- Clones `libxsmm`, on which most of the performance depends, and builds it.
- Installs Rust and enables its environment.
- Clones this repository and finally builds it.
You may modify the script and remove any step depending on which dependencies are already present on your machine.
#### Mac
On Mac, the installation is simplified, assuming you use Homebrew:
```bash
# Install maturin
uv pip install maturin
# Install patchelf
brew install patchelf
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
maturin build --release -q
```
## Performance
For documents of uniform length, performance on Linux is slower than JAX on 4-core machines, either somewhat faster or slower at 8 cores depending on the CPU, and always faster than alternatives on ARM Macs. For variable document lengths (evaluated with lengths drawn from a uniform distribution between 128 and 1536 tokens), `maxsim-cpu` is consistently fast thanks to more efficient batching.
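To run a comparable measurement on your own hardware, a small timing harness like the one below can serve as a starting point. It is a sketch under assumptions, not the benchmark script behind the reported numbers: `make_variable_docs`, `best_time` and `numpy_baseline` are hypothetical helpers, and only the 128 to 1536 token length range comes from the description above. Any scoring function taking `(query, docs)` can be passed to `best_time` in place of the NumPy baseline.

```python
import time
import numpy as np

def make_variable_docs(num_docs, dim=128, min_len=128, max_len=1536, seed=0):
    """Normalized random documents with lengths drawn uniformly from [min_len, max_len]."""
    rng = np.random.default_rng(seed)
    docs = []
    for length in rng.integers(min_len, max_len + 1, size=num_docs):
        doc = rng.standard_normal((length, dim)).astype(np.float32)
        docs.append(doc / np.linalg.norm(doc, axis=1, keepdims=True))
    return docs

def best_time(fn, *args, repeats=5):
    """Best wall-clock time in seconds over several runs of fn(*args)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

def numpy_baseline(query, docs):
    """Plain NumPy MaxSim, used here only as a slow point of comparison."""
    return np.array([(doc @ query.T).max(axis=0).sum() for doc in docs])

rng = np.random.default_rng(1)
query = rng.standard_normal((32, 128)).astype(np.float32)
query /= np.linalg.norm(query, axis=1, keepdims=True)
docs = make_variable_docs(num_docs=20)

elapsed = best_time(numpy_baseline, query, docs)
print(f"NumPy baseline: {elapsed * 1e3:.2f} ms for {len(docs)} documents")
```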
### Mac M4 Ultra

### Linux AMD EPYC
#### 32 core limit performance

#### 16 core limit performance
*It seems our performance was hindered during benchmarking due to a Rayon config issue when limiting the available cores. Leaving reporting as-is for now but performance is expected to be considerably better on an actual 16-core CPU.*

Raw data
{
"_id": null,
"home_page": null,
"name": "maxsim-cpu",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "maxsim, similarity, ranking, cpu, simd, performance",
"author": "Benjamin Clavi\u00e9, Mixedbread Team",
"author_email": null,
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Fast CPU implementation of MaxSim scoring",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/mixedbread-ai/maxsim-cpu/issues",
"Homepage": "https://github.com/mixedbread-ai/maxsim-cpu"
},
"split_keywords": [
"maxsim",
" similarity",
" ranking",
" cpu",
" simd",
" performance"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "aa6ac21e917ec4464a4d5bebc7518461ed19dddd857a62e86a306b6df2fafa8d",
"md5": "8f37ec3efd4477255b9e2e6dd41c718c",
"sha256": "ebbdc8e7a7093dcdea4ff0f5f15f6aec430356db989f9ffd5f3d7eedac162dc2"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp310-cp310-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "8f37ec3efd4477255b9e2e6dd41c718c",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.8",
"size": 216873,
"upload_time": "2025-07-10T11:14:39",
"upload_time_iso_8601": "2025-07-10T11:14:39.857835Z",
"url": "https://files.pythonhosted.org/packages/aa/6a/c21e917ec4464a4d5bebc7518461ed19dddd857a62e86a306b6df2fafa8d/maxsim_cpu-0.1.0-cp310-cp310-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ed297225a3dee685a5d59c15ce94556a9597e1ee8157979550087918d6df1895",
"md5": "7f976211f4aa3876c2f11575cc10e390",
"sha256": "9ff45504832f4982ae78a5433d62a38ae73059a23275dadddb766fb8662c87da"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl",
"has_sig": false,
"md5_digest": "7f976211f4aa3876c2f11575cc10e390",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.8",
"size": 1152943,
"upload_time": "2025-07-10T11:14:41",
"upload_time_iso_8601": "2025-07-10T11:14:41.225692Z",
"url": "https://files.pythonhosted.org/packages/ed/29/7225a3dee685a5d59c15ce94556a9597e1ee8157979550087918d6df1895/maxsim_cpu-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9d01afe521f19a731b636139bcd3a1086a1583c3ffac364f2a79268156a8297f",
"md5": "0884002e993bbd7803d5819a9ab138d2",
"sha256": "c2e35bfc618aaa36e232aba94580d9a3e94f4bf2e4a3c8f98d1d96aaabc31741"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp311-cp311-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "0884002e993bbd7803d5819a9ab138d2",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.8",
"size": 216873,
"upload_time": "2025-07-10T11:14:42",
"upload_time_iso_8601": "2025-07-10T11:14:42.786011Z",
"url": "https://files.pythonhosted.org/packages/9d/01/afe521f19a731b636139bcd3a1086a1583c3ffac364f2a79268156a8297f/maxsim_cpu-0.1.0-cp311-cp311-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0446151b0d1b671a76baffb260e3be9da2867f1a7e5c2136df99834f2783b256",
"md5": "68e352095f2d35767ef2d45dd59782fb",
"sha256": "374eedf3e2682ce3fd8a3642b43e75d3f1d9622a3c1ddd978565001d913e8562"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl",
"has_sig": false,
"md5_digest": "68e352095f2d35767ef2d45dd59782fb",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.8",
"size": 1152944,
"upload_time": "2025-07-10T11:14:43",
"upload_time_iso_8601": "2025-07-10T11:14:43.803589Z",
"url": "https://files.pythonhosted.org/packages/04/46/151b0d1b671a76baffb260e3be9da2867f1a7e5c2136df99834f2783b256/maxsim_cpu-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "056e0bcbb7f3e8ef348a94105c9405004d5e8a28fc5d63542ce2096c9c7b2dbd",
"md5": "e143815f7bb3ba7d39fd74fd96659ba0",
"sha256": "8f6860e0f814aa207db202076909409ad2eb73a8de9bf2032ae3335466d2a478"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp312-cp312-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "e143815f7bb3ba7d39fd74fd96659ba0",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.8",
"size": 216876,
"upload_time": "2025-07-10T11:14:45",
"upload_time_iso_8601": "2025-07-10T11:14:45.067212Z",
"url": "https://files.pythonhosted.org/packages/05/6e/0bcbb7f3e8ef348a94105c9405004d5e8a28fc5d63542ce2096c9c7b2dbd/maxsim_cpu-0.1.0-cp312-cp312-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "32d25dfa3b69e0394c4f68139967764181b7cf242c4c3e72d4b5a29bcdf59866",
"md5": "d5b4a767526edf855ba873119d30f994",
"sha256": "28b1471a2a80204b35ea5852c0971f59880555e591364553e58ff4dc386e1143"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl",
"has_sig": false,
"md5_digest": "d5b4a767526edf855ba873119d30f994",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.8",
"size": 1152943,
"upload_time": "2025-07-10T11:14:46",
"upload_time_iso_8601": "2025-07-10T11:14:46.357491Z",
"url": "https://files.pythonhosted.org/packages/32/d2/5dfa3b69e0394c4f68139967764181b7cf242c4c3e72d4b5a29bcdf59866/maxsim_cpu-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9c05e281f900690f9035168697a30a64dce73b65d219548db86e20beec5163db",
"md5": "eb9067e1fb37fcb6f5f12f701092de90",
"sha256": "db99d8a23da0e59bdfb765ff9a33af394f9f80144f5e6d0a999adea8f16e10a2"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp313-cp313-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "eb9067e1fb37fcb6f5f12f701092de90",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.8",
"size": 216875,
"upload_time": "2025-07-10T11:14:47",
"upload_time_iso_8601": "2025-07-10T11:14:47.599920Z",
"url": "https://files.pythonhosted.org/packages/9c/05/e281f900690f9035168697a30a64dce73b65d219548db86e20beec5163db/maxsim_cpu-0.1.0-cp313-cp313-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "70dfe1720b9684bca7afabd461150ab2429f2685689dae9ab923830ad280569f",
"md5": "3d73c951077911896d1360aea694c335",
"sha256": "1dcb010b1fa62d89807a13c995e7aebe8bf0307da155b4ad7b622a26e9dd3498"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp313-cp313-manylinux_2_34_x86_64.whl",
"has_sig": false,
"md5_digest": "3d73c951077911896d1360aea694c335",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.8",
"size": 1152944,
"upload_time": "2025-07-10T11:14:48",
"upload_time_iso_8601": "2025-07-10T11:14:48.848739Z",
"url": "https://files.pythonhosted.org/packages/70/df/e1720b9684bca7afabd461150ab2429f2685689dae9ab923830ad280569f/maxsim_cpu-0.1.0-cp313-cp313-manylinux_2_34_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "464f31acdbacbe4420b12564df2d928a190fa88798289288576a663f9c92dd1e",
"md5": "2c93f6e89ece714a43a2aff57456e424",
"sha256": "bd638a191518e0e32bb08793476184aa96c54304a6a344f5ee2182d30e2dd38c"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp39-cp39-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "2c93f6e89ece714a43a2aff57456e424",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.8",
"size": 217100,
"upload_time": "2025-07-10T11:14:49",
"upload_time_iso_8601": "2025-07-10T11:14:49.834460Z",
"url": "https://files.pythonhosted.org/packages/46/4f/31acdbacbe4420b12564df2d928a190fa88798289288576a663f9c92dd1e/maxsim_cpu-0.1.0-cp39-cp39-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "1fb3afe378aa2f87fa5ab43a143757bcd7cf3cf55e1b1bc89b6756dff38d55df",
"md5": "b7f3e8f2b958d23ffa380bfc2633b2a8",
"sha256": "34807dc3c631f90139c49dccdb3baf4ffa62f064688bcf8eb961d77580785460"
},
"downloads": -1,
"filename": "maxsim_cpu-0.1.0-cp39-cp39-manylinux_2_34_x86_64.whl",
"has_sig": false,
"md5_digest": "b7f3e8f2b958d23ffa380bfc2633b2a8",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.8",
"size": 1153234,
"upload_time": "2025-07-10T11:14:51",
"upload_time_iso_8601": "2025-07-10T11:14:51.076443Z",
"url": "https://files.pythonhosted.org/packages/1f/b3/afe378aa2f87fa5ab43a143757bcd7cf3cf55e1b1bc89b6756dff38d55df/maxsim_cpu-0.1.0-cp39-cp39-manylinux_2_34_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-10 11:14:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mixedbread-ai",
"github_project": "maxsim-cpu",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "maxsim-cpu"
}