fastannoy


- Name: fastannoy
- Version: 1.1.1
- Home page: https://github.com/QunBB/fastannoy
- Summary: Faster version of Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk.
- Upload time: 2024-08-14 13:19:53
- Author: Qun
- Requires Python: >=2.7
- License: Apache License 2.0
- Keywords: nns, approximate nearest neighbor search
- Requirements: none recorded

   GitHub repository <https://github.com/QunBB/fastannoy>

# FastAnnoy

This library is a [pybind11](https://github.com/pybind/pybind11) port of [spotify/annoy](https://github.com/spotify/annoy).

# Installation 

To install from [PyPI](https://pypi.python.org/pypi/fastannoy), run `pip install fastannoy`.

## Install from source code

- clone this repository
- `pip install ./fastannoy`

# Background

First of all, thanks to spotify/annoy for its excellent work: it provides an efficient implementation of approximate nearest neighbor search. However, it lacks batch search, so the initial purpose of this project was to add it.

In addition, the Python interface is written with pybind11, which turned out to give better performance.

# Usage

All basic interfaces are the same as in [spotify/annoy](https://github.com/spotify/annoy?tab=readme-ov-file#full-python-api).

```python
from fastannoy import AnnoyIndex
import random

f = 40  # Length of item vector that will be indexed

t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

t.build(10) # 10 trees
t.save('test.ann')

# ...

u = AnnoyIndex(f, 'angular')
u.load('test.ann') # super fast, will just mmap the file
print(u.get_nns_by_item(0, 100)) # will find the 100 nearest neighbors
"""
[0, 17, 389, 90, 363, 482, ...]
"""

print(u.get_nns_by_vector([random.gauss(0, 1) for _ in range(f)], 100)) # will find the 100 nearest neighbors by vector
"""
[378, 664, 296, 409, 14, 618, ...]
"""
```
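Because the index returns approximate neighbors, it can be useful to compare its results against an exact search on a small sample. The following is a minimal sketch of such a check in plain Python. It assumes the `'angular'` metric is the one documented by spotify/annoy, `sqrt(2 * (1 - cos(u, v)))`; the helper names `angular_distance`, `exact_nns`, and `recall` are illustrative and not part of fastannoy.

```python
import math


def angular_distance(u, v):
    """Angular distance as documented by spotify/annoy: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)  # raises ZeroDivisionError for all-zero vectors
    cos = max(-1.0, min(1.0, cos))  # clamp floating-point rounding errors
    return math.sqrt(2.0 * (1.0 - cos))


def exact_nns(vectors, query, n):
    """Brute force: ids (list positions) of the n vectors closest to query."""
    order = sorted(range(len(vectors)),
                   key=lambda i: angular_distance(vectors[i], query))
    return order[:n]


def recall(approx_ids, exact_ids):
    """Fraction of the exact neighbors that also appear in the approximate result."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)
```

If the vectors passed to `add_item` are kept in a list ordered by item id, `recall(u.get_nns_by_vector(q, 10), exact_nns(vectors, q, 10))` gives a rough quality estimate for a query `q`.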

## Batch Search

`get_batch_nns_by_items` is the batch version of `get_nns_by_item`. Its first argument should be a list of item ids (`list[int]`).

Likewise, `get_batch_nns_by_vectors` is the batch version of `get_nns_by_vector`. Its first argument should be a list of vectors (`list[list[float]]`).

The batch search implementation supports multiple threads: set the `n_threads` argument, which defaults to 1.

```python
# will find the 100 nearest neighbors

print(u.get_batch_nns_by_items([0, 1, 2], 100))
"""
[[0, 146, 858, 64, 833, 350, 70, ...], 
[1, 205, 48, 396, 382, 149, 305, 125, ...], 
[2, 898, 503, 618, 23, 959, 244, 10, 445, ...]]
"""

print(u.get_batch_nns_by_vectors([
    [random.gauss(0, 1) for _ in range(f)]
    for _ in range(3)
], 100))
"""
[[862, 604, 495, 638, 3, 246, 778, 486, ...], 
[260, 722, 215, 709, 49, 248, 539, 126, 8, ...], 
[288, 764, 965, 320, 631, 505, 350, 821, 540, ...]]
"""
```
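To make the relationship between the single and batch calls concrete, here is a small sketch of two wrappers that issue the same queries either one at a time or in one batch. The wrapper names are hypothetical, and passing `n_threads` as a keyword argument is an assumption; if the binding only accepts it positionally, adjust the call accordingly.

```python
def nns_one_by_one(index, items, n):
    """Baseline: one get_nns_by_item call per query item."""
    return [index.get_nns_by_item(item, n) for item in items]


def nns_batched(index, items, n, n_threads=1):
    """The same queries in a single get_batch_nns_by_items call.

    Assumption: n_threads is accepted as a keyword argument.
    """
    return index.get_batch_nns_by_items(items, n, n_threads=n_threads)
```

With the index `u` loaded earlier, `nns_batched(u, [0, 1, 2], 100, n_threads=4)` would answer three queries in one call. Whether it returns exactly the same ids as `nns_one_by_one(u, [0, 1, 2], 100)` is worth checking on your own data.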

# Benchmark

The results below were measured on my MacBook with the [test script](https://github.com/QunBB/fastannoy/blob/main/examples/performance_test.py), so focus on the relative time difference between fastannoy and annoy rather than on the absolute numbers.

|                                                         | fastannoy      | annoy          |
| ------------------------------------------------------- | -------------- | -------------- |
| **500,000 items with 128 dimensions**                   |                |                |
| - build+add_item                                        | 13.810 seconds | 19.633 seconds |
| - 50,000 searches                                       | 20.613 seconds | 39.760 seconds |
| - 5,000 batch searches with batch size 10 and 5 threads | 6.542 seconds  | n/a            |
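Since the absolute timings depend on the machine, the ratios are the more useful figures. The short sketch below computes them from the numbers in the table above. The batch comparison assumes that the batch row covers the same total number of queries as the single-search row.

```python
# Timings in seconds, copied from the benchmark table.
timings = {
    "build+add_item": {"fastannoy": 13.810, "annoy": 19.633},
    "single search": {"fastannoy": 20.613, "annoy": 39.760},
}
batch_search_fastannoy = 6.542  # batch size 10, 5 threads


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return slow / fast


speedups = {name: speedup(t["annoy"], t["fastannoy"]) for name, t in timings.items()}
batch_vs_single = speedup(timings["single search"]["fastannoy"], batch_search_fastannoy)

for name, ratio in speedups.items():
    print(f"{name}: fastannoy is {ratio:.2f}x faster than annoy")
print(f"batch search: {batch_vs_single:.2f}x faster than single search in fastannoy")
```

On these numbers, fastannoy is roughly 1.4x faster for building, 1.9x faster for single searches, and batching with 5 threads gives about a further 3x.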




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/QunBB/fastannoy",
    "name": "fastannoy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=2.7",
    "maintainer_email": null,
    "keywords": "nns, approximate nearest neighbor search",
    "author": "Qun",
    "author_email": "myqun20190810@163.com",
    "download_url": "https://files.pythonhosted.org/packages/75/dc/6d538f5e9c7ef89a5ef43214971e8df97fba77f11c5980696f4092ebda8f/fastannoy-1.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "Faster version of Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk.",
    "version": "1.1.1",
    "project_urls": {
        "Homepage": "https://github.com/QunBB/fastannoy"
    },
    "split_keywords": [
        "nns",
        " approximate nearest neighbor search"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e74bf946fecd0cfc0137df171c0456a14f9cd33b57c0a17dcd030a969f06aa4b",
                "md5": "cfb7528a3b33d313ef006f04f47bf23b",
                "sha256": "26f4dd3775a024bde8f4a3f49a45696871cd197e4615c54ec6d6f0a3ddc96f1c"
            },
            "downloads": -1,
            "filename": "fastannoy-1.1.1-cp39-cp39-macosx_10_12_x86_64.whl",
            "has_sig": false,
            "md5_digest": "cfb7528a3b33d313ef006f04f47bf23b",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=2.7",
            "size": 147152,
            "upload_time": "2024-08-14T13:19:51",
            "upload_time_iso_8601": "2024-08-14T13:19:51.823302Z",
            "url": "https://files.pythonhosted.org/packages/e7/4b/f946fecd0cfc0137df171c0456a14f9cd33b57c0a17dcd030a969f06aa4b/fastannoy-1.1.1-cp39-cp39-macosx_10_12_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "75dc6d538f5e9c7ef89a5ef43214971e8df97fba77f11c5980696f4092ebda8f",
                "md5": "a90f368a1dd24f3dca398cf250d5e318",
                "sha256": "0848b21d697748cbac167103ac0d57985f370b4dbd6f2875c650f92502063f76"
            },
            "downloads": -1,
            "filename": "fastannoy-1.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "a90f368a1dd24f3dca398cf250d5e318",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=2.7",
            "size": 26044,
            "upload_time": "2024-08-14T13:19:53",
            "upload_time_iso_8601": "2024-08-14T13:19:53.162446Z",
            "url": "https://files.pythonhosted.org/packages/75/dc/6d538f5e9c7ef89a5ef43214971e8df97fba77f11c5980696f4092ebda8f/fastannoy-1.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-14 13:19:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "QunBB",
    "github_project": "fastannoy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "fastannoy"
}
        