GitHub repository <https://github.com/QunBB/fastannoy>
# FastAnnoy
This library is a [pybind11](https://github.com/pybind/pybind11) port of [spotify/annoy](https://github.com/spotify/annoy).
# Installation
To install from [PyPI](https://pypi.python.org/pypi/fastannoy), run `pip install fastannoy`.
## Install from source code
- Clone this repository.
- From the directory that contains the clone, run `pip install ./fastannoy`.
# Background
First of all, thanks to spotify/annoy for its awesome work: it provides an efficient implementation of Approximate Nearest Neighbors search. However, I found that batch search was missing, so the initial purpose of this project was to add batch search.
In addition, the Python interface is written with pybind11, and it turned out to perform better than the original.
# Usage
All basic interfaces are the same as in [spotify/annoy](https://github.com/spotify/annoy?tab=readme-ov-file#full-python-api).
```python
from fastannoy import AnnoyIndex
import random
f = 40 # Length of item vector that will be indexed
t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)
t.build(10) # 10 trees
t.save('test.ann')
# ...
u = AnnoyIndex(f, 'angular')
u.load('test.ann') # super fast, will just mmap the file
print(u.get_nns_by_item(0, 100)) # will find the 100 nearest neighbors
"""
[0, 17, 389, 90, 363, 482, ...]
"""
print(u.get_nns_by_vector([random.gauss(0, 1) for _ in range(f)], 100)) # will find the 100 nearest neighbors by vector
"""
[378, 664, 296, 409, 14, 618]
"""
```
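Because the index returns approximate neighbors, it can be useful to compare a result against an exact search on a small sample. The sketch below is plain Python and does not call fastannoy. `angular_distance`, `brute_force_nns`, and `recall` are hypothetical helper names introduced here for illustration, and the distance formula follows annoy's documented definition of its angular metric, sqrt(2(1 - cos(u, v))).

```python
import math


def angular_distance(a, b):
    """Annoy-style angular distance: sqrt(2 * (1 - cos(a, b))).

    Assumes both vectors are non-zero and have the same length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = dot / (norm_a * norm_b)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def brute_force_nns(vectors, query, n):
    """Return the ids (list positions) of the n exact nearest vectors."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: angular_distance(vectors[i], query))
    return ranked[:n]


def recall(approx_ids, exact_ids):
    """Fraction of the exact neighbors that also appear in approx_ids."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)
```

For example, `recall(u.get_nns_by_item(0, 10), brute_force_nns(vectors, vectors[0], 10))` would give the fraction of exact neighbors that the index found, assuming `vectors` holds the same vectors that were added to the index under ids `0..len(vectors) - 1`.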
## Batch Search
The batch version of `get_nns_by_item` is `get_batch_nns_by_items`. Its first argument should be a list of item ids (ints).
Likewise, the batch version of `get_nns_by_vector` is `get_batch_nns_by_vectors`. Its first argument should be a list of vectors, each a list of floats.
The batch search implementation supports multiple threads, controlled by the `n_threads` argument (default: 1).
```python
# will find the 100 nearest neighbors
print(u.get_batch_nns_by_items([0, 1, 2], 100))
"""
[[0, 146, 858, 64, 833, 350, 70, ...],
[1, 205, 48, 396, 382, 149, 305, 125, ...],
[2, 898, 503, 618, 23, 959, 244, 10, 445, ...]]
"""
print(u.get_batch_nns_by_vectors([
[random.gauss(0, 1) for _ in range(f)]
for _ in range(3)
], 100))
"""
[[862, 604, 495, 638, 3, 246, 778, 486, ...],
[260, 722, 215, 709, 49, 248, 539, 126, 8, ...],
[288, 764, 965, 320, 631, 505, 350, 821, 540, ...]]
"""
```
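When there are many queries, one option is to split them into fixed-size batches and pass each batch to `get_batch_nns_by_vectors`. The helper below is a minimal sketch, not part of fastannoy: `search_in_batches` and its `batch_size` parameter are hypothetical names, and it assumes that `n_threads` can be passed as a keyword argument.

```python
def search_in_batches(index, vectors, n, batch_size=10, n_threads=1):
    """Query `index` batch by batch and return one neighbor list per vector.

    `index` is any object exposing get_batch_nns_by_vectors(vectors, n, ...).
    """
    results = []
    for start in range(0, len(vectors), batch_size):
        batch = vectors[start:start + batch_size]
        # Assumption: n_threads is accepted by keyword, as named in the text above.
        results.extend(
            index.get_batch_nns_by_vectors(batch, n, n_threads=n_threads)
        )
    return results
```

With the index `u` from the usage example, a call such as `search_in_batches(u, queries, 100, batch_size=10, n_threads=5)` would mirror the batch size and thread count used in the benchmark row for batch search.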
# Benchmark
The results below were measured on my MacBook with the [test script](https://github.com/QunBB/fastannoy/blob/main/examples/performance_test.py), so focus on the relative time consumption of fastannoy and annoy rather than on the absolute numbers.
| | fastannoy | annoy |
| ------------------------------------------------------- | -------------- | -------------- |
| **500,000 items with 128 dimensions** | | |
| - build+add_item | 13.810 seconds | 19.633 seconds |
| - 50,000 searches | 20.613 seconds | 39.760 seconds |
| - 5,000 batch searches with batch size 10 and 5 threads | 6.542 seconds | n/a |
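
The relative differences can be read off the table as simple ratios. The snippet below is only arithmetic on the reported timings. It assumes that the batch row (5,000 batches of 10 queries) covers the same 50,000 queries as the single-query search row, which the table does not state explicitly. `speedup` is a hypothetical helper name.

```python
# Timings in seconds, copied from the benchmark table.
timings = {
    "build+add_item": {"fastannoy": 13.810, "annoy": 19.633},
    "50,000 searches": {"fastannoy": 20.613, "annoy": 39.760},
}
BATCH_SEARCH_SECONDS = 6.542  # fastannoy, batch size 10, 5 threads


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


speedups = {
    name: speedup(row["annoy"], row["fastannoy"]) for name, row in timings.items()
}
# Assumption: both rows answer the same 50,000 queries.
batch_speedup = speedup(timings["50,000 searches"]["fastannoy"], BATCH_SEARCH_SECONDS)

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")                      # 1.42x and 1.93x
print(f"batch vs. single-query: {batch_speedup:.2f}x")  # 3.15x
```

Under that assumption, multi-threaded batch search is about three times faster than issuing the same queries one at a time with fastannoy.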
Raw data
{
"_id": null,
"home_page": "https://github.com/QunBB/fastannoy",
"name": "fastannoy",
"maintainer": null,
"docs_url": null,
"requires_python": ">=2.7",
"maintainer_email": null,
"keywords": "nns, approximate nearest neighbor search",
"author": "Qun",
"author_email": "myqun20190810@163.com",
"download_url": "https://files.pythonhosted.org/packages/75/dc/6d538f5e9c7ef89a5ef43214971e8df97fba77f11c5980696f4092ebda8f/fastannoy-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Faster version of Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk.",
"version": "1.1.1",
"project_urls": {
"Homepage": "https://github.com/QunBB/fastannoy"
},
"split_keywords": [
"nns",
" approximate nearest neighbor search"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e74bf946fecd0cfc0137df171c0456a14f9cd33b57c0a17dcd030a969f06aa4b",
"md5": "cfb7528a3b33d313ef006f04f47bf23b",
"sha256": "26f4dd3775a024bde8f4a3f49a45696871cd197e4615c54ec6d6f0a3ddc96f1c"
},
"downloads": -1,
"filename": "fastannoy-1.1.1-cp39-cp39-macosx_10_12_x86_64.whl",
"has_sig": false,
"md5_digest": "cfb7528a3b33d313ef006f04f47bf23b",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=2.7",
"size": 147152,
"upload_time": "2024-08-14T13:19:51",
"upload_time_iso_8601": "2024-08-14T13:19:51.823302Z",
"url": "https://files.pythonhosted.org/packages/e7/4b/f946fecd0cfc0137df171c0456a14f9cd33b57c0a17dcd030a969f06aa4b/fastannoy-1.1.1-cp39-cp39-macosx_10_12_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "75dc6d538f5e9c7ef89a5ef43214971e8df97fba77f11c5980696f4092ebda8f",
"md5": "a90f368a1dd24f3dca398cf250d5e318",
"sha256": "0848b21d697748cbac167103ac0d57985f370b4dbd6f2875c650f92502063f76"
},
"downloads": -1,
"filename": "fastannoy-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "a90f368a1dd24f3dca398cf250d5e318",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=2.7",
"size": 26044,
"upload_time": "2024-08-14T13:19:53",
"upload_time_iso_8601": "2024-08-14T13:19:53.162446Z",
"url": "https://files.pythonhosted.org/packages/75/dc/6d538f5e9c7ef89a5ef43214971e8df97fba77f11c5980696f4092ebda8f/fastannoy-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-14 13:19:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "QunBB",
"github_project": "fastannoy",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "fastannoy"
}