# minivan
![Tests](https://github.com/aismlv/minivan/actions/workflows/test_and_lint.yml/badge.svg)
[![codecov](https://codecov.io/gh/aismlv/minivan/branch/main/graph/badge.svg?token=5J503UR8O7)](https://codecov.io/gh/aismlv/minivan)
[![PyPI version](https://badge.fury.io/py/minivan-tools.svg?)](https://pypi.org/project/minivan-tools/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
`minivan` is an exact nearest neighbor search Python library for those times when "approximate" just won't cut it (or is simply overkill).
## Installation
Install `minivan` using `pip`:
```bash
pip install minivan-tools
```
## Usage
Create a new index:
```python
from minivan import Index
import numpy as np
# Create an index with 128-dimensional embeddings and dot product metric
index = Index(dim=128, metric="dot_product")
# Add embeddings to the index
embeddings = [np.random.rand(128) for _ in range(3)]
index.add_items([1, 2, 3], embeddings)
# Delete embeddings from the index
index.delete_items([3])
```
Query the index for the nearest neighbor:
```python
query_embedding = np.random.rand(128)
result = index.query(query_embedding, k=1)
print(result)  # Prints [(index, similarity)] for the nearest neighbor
```
Save the index for future use:
```python
# Save to disk ("index_file" is a placeholder path; substitute your own)
filepath = "index_file"
index.save(filepath)
# Load from a saved file
new_index = Index.from_file(filepath)
```
## matmul vs ANN
Because NumPy relies on BLAS and other optimisations, brute-force search is fast enough for many real-world applications. You might not need an approximate nearest neighbour library, and can take the simpler approach, when:
- Your document set is not in the multiple millions
- You're in the experimentation phase and want to iterate on the index rapidly with fast build times
- Your application requires the best accuracy
- You want to avoid the need to fine-tune hyperparameters (which can affect [performance and latency](https://github.com/erikbern/ann-benchmarks) quite a lot)
See a [quick benchmark](https://github.com/aismlv/minivan/blob/main/experiments/benchmark/README.md) for an illustration.
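
To make the brute-force idea concrete, here is a minimal sketch of exact dot-product search in plain NumPy. It is an illustration only and not `minivan`'s actual implementation; the function name `brute_force_top_k` and the array sizes are assumptions made for the example.

```python
import numpy as np


def brute_force_top_k(embeddings: np.ndarray, query: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Return (row, score) pairs for the k rows with the largest dot product.

    Hypothetical helper for illustration; not part of the minivan API.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # A single matrix-vector product scores every stored embedding at once.
    scores = embeddings @ query
    k = min(k, scores.shape[0])
    # argpartition selects the k largest scores without sorting all of them...
    top = np.argpartition(scores, -k)[-k:]
    # ...so only those k candidates need to be sorted, best first.
    top = top[np.argsort(scores[top])[::-1]]
    return [(int(i), float(scores[i])) for i in top]


rng = np.random.default_rng(0)
embeddings = rng.random((10_000, 128), dtype=np.float32)
query = rng.random(128, dtype=np.float32)
print(brute_force_top_k(embeddings, query, k=3))
```

Every embedding is scored, so the result is exact by construction, and there is no index build step or hyperparameter to tune.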