minivan-tools


| Field | Value |
|---|---|
| Name | minivan-tools |
| Version | 0.1.5 |
| home_page | https://github.com/aismlv/minivan |
| Summary | Exact nearest neighbor search library for those times when "approximate" just won't cut it (or is simply overkill) |
| upload_time | 2023-05-10 22:53:56 |
| author | aismlv |
| requires_python | >=3.9,<3.12 |
| license | Apache-2.0 |
| keywords | nearest neighbor search |
| requirements | No requirements were recorded. |

# minivan

![Tests](https://github.com/aismlv/minivan/actions/workflows/test_and_lint.yml/badge.svg)
[![codecov](https://codecov.io/gh/aismlv/minivan/branch/main/graph/badge.svg?token=5J503UR8O7)](https://codecov.io/gh/aismlv/minivan)
[![PyPI version](https://badge.fury.io/py/minivan-tools.svg?)](https://pypi.org/project/minivan-tools/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

`minivan` is an exact nearest neighbor search Python library for those times when "approximate" just won't cut it (or is simply overkill).

## Installation

Install `minivan` using `pip`:

```bash
pip install minivan-tools
```

## Usage
Create a new index:
```python
from minivan import Index
import numpy as np

# Create an index with 128-dimensional embeddings and dot product metric
index = Index(dim=128, metric="dot_product")

# Add embeddings to the index
embeddings = [np.random.rand(128) for _ in range(3)]
index.add_items([1, 2, 3], embeddings)

# Delete embeddings from the index
index.delete_items([3])
```

Query the index for the nearest neighbor:
```python
query_embedding = np.random.rand(128)
result = index.query(query_embedding, k=1)

print(result)  # [(index, similarity)] for the nearest neighbor
```

Save the index for future use:
```python
# Save to disk (`filepath` is a path of your choice)
index.save(filepath)

# Load from a saved file
new_index = Index.from_file(filepath)
```

## matmul vs ANN

Because numpy uses BLAS and other optimisations, brute-force search is fast enough for many real-world applications. You might not need an approximate nearest neighbour (ANN) library, and can go with a simpler approach, in cases such as these:

- Your document set does not reach multiple millions of items
- You're in the experimentation phase and want to iterate on the index rapidly with fast build times
- Your application requires the best possible accuracy
- You want to avoid the need to fine-tune hyperparameters (which can affect [performance and latency](https://github.com/erikbern/ann-benchmarks) quite a lot)

See a [quick benchmark](https://github.com/aismlv/minivan/blob/main/experiments/benchmark/README.md) for an illustration.
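
To make the matmul idea concrete, here is a minimal sketch of exact top-k search with the dot product metric in plain numpy. It is not `minivan`'s actual implementation: the function name `brute_force_top_k`, the random data, and the use of `np.argpartition` are assumptions made for illustration, and the sketch assumes a non-empty embedding matrix.

```python
import numpy as np


def brute_force_top_k(embeddings: np.ndarray, query: np.ndarray, k: int):
    """Return [(row, score)] for the k rows with the highest dot product.

    Hypothetical helper for illustration; not part of the minivan API.
    """
    # One matrix-vector product scores every stored embedding at once.
    scores = embeddings @ query
    k = min(k, scores.shape[0])
    # argpartition selects the k best rows without sorting the whole array;
    # the selected rows come back in no particular order.
    top = np.argpartition(-scores, k - 1)[:k]
    # Sort only those k candidates by descending score.
    top = top[np.argsort(-scores[top])]
    return [(int(i), float(scores[i])) for i in top]


# Illustrative random data: 10,000 embeddings with 128 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.random((10_000, 128))
query = rng.random(128)

neighbours = brute_force_top_k(embeddings, query, k=5)
print(neighbours)
```

Every stored vector is scored, so the result is exact by construction, and there is no index to build or hyperparameter to tune: adding items amounts to appending rows to the matrix.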


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/aismlv/minivan",
    "name": "minivan-tools",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<3.12",
    "maintainer_email": "",
    "keywords": "nearest neighbor search",
    "author": "aismlv",
    "author_email": "adilzhan.ismailov@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/cb/d2/2fc7281200351d7e6246ef430fce42b34d6554bbc3105e712685f48c8344/minivan_tools-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Exact nearest neighbor search library for those times when \"approximate\" just won't cut it (or is simply overkill)",
    "version": "0.1.5",
    "project_urls": {
        "Homepage": "https://github.com/aismlv/minivan",
        "Repository": "https://github.com/aismlv/minivan"
    },
    "split_keywords": [
        "nearest",
        "neighbor",
        "search"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "34ce1f116c36656031c2bf1ed8a5d8bf59072aac1f052a22311dc126f53ba612",
                "md5": "e4d155be63f98348c20af0fa60a86d74",
                "sha256": "504884d2d4ac9422d21ffad034693c91c74806842a1e8022ebd4347fc1fc0587"
            },
            "downloads": -1,
            "filename": "minivan_tools-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e4d155be63f98348c20af0fa60a86d74",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<3.12",
            "size": 4676,
            "upload_time": "2023-05-10T22:53:55",
            "upload_time_iso_8601": "2023-05-10T22:53:55.063664Z",
            "url": "https://files.pythonhosted.org/packages/34/ce/1f116c36656031c2bf1ed8a5d8bf59072aac1f052a22311dc126f53ba612/minivan_tools-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cbd22fc7281200351d7e6246ef430fce42b34d6554bbc3105e712685f48c8344",
                "md5": "4a0bc121628e355f932dee1dc35487ef",
                "sha256": "7f50b86257569fe667babd03f88b0c9066e148b13debe65c6f15676ac97af0be"
            },
            "downloads": -1,
            "filename": "minivan_tools-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "4a0bc121628e355f932dee1dc35487ef",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<3.12",
            "size": 4154,
            "upload_time": "2023-05-10T22:53:56",
            "upload_time_iso_8601": "2023-05-10T22:53:56.766890Z",
            "url": "https://files.pythonhosted.org/packages/cb/d2/2fc7281200351d7e6246ef430fce42b34d6554bbc3105e712685f48c8344/minivan_tools-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-10 22:53:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "aismlv",
    "github_project": "minivan",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "minivan-tools"
}
        