# ANNB: Approximate Nearest Neighbor Benchmark
[PyPI](https://pypi.python.org/pypi/annb)
Note: This is a work in progress. The API/CLI is not stable yet.
## Installation
```bash
pip install annb
# install any vector search index/client you need for the benchmark,
# e.g. install faiss to run a faiss index benchmark
```
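For example, to benchmark faiss indexes you first need a faiss build; the package names below are the community wheels on PyPI:
```bash
# CPU-only faiss build from PyPI; use faiss-gpu instead for CUDA-enabled builds
pip install faiss-cpu
```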
## Usage
### CLI Usage
#### Run Benchmark
##### start your first benchmark with a random dataset
Just run `annb-test` to start your first benchmark with a random dataset.
```bash
annb-test
```
It will produce a result like this:
```plain
❯ annb-test
... some logs ...
BenchmarkResult:
  attributes:
    query_args: [{'nprobe': 1}]
    topk: 10
    jobs: 1
    loop: 5
    step: 10
    name: Test
    dataset: .annb_random_d256_l2_1000.hdf5
    index: Test
    dim: 256
    metric_type: MetricType.L2
    index_args: {'index': 'ivfflat', 'nlist': 128}
    started: 2023-08-14 13:03:40

  durations:
    training: 1 items, 1000 total, 1490.03266ms
    insert: 1 items, 1000 total, 132.439627ms
    query:
      nprobe=1,recall=0.2173 -> 1000 items, 18.615083ms, 53719.878659686874qps, latency=0.18615083ms, p95=0.31939ms, p99=0.41488ms
```
This is a simple benchmark run with the default index (faiss) on a random L2 dataset.
If you want to generate more data or use different specifications for the dataset, see the options below; a combined example follows the list:
- `--index-dim` the dimension of the index, default is 256
- `--index-metric-type` index metric type, `l2` or `ip`, default is `l2`
- `--topk TOPK` topk used for queries, default is 10
- `--step STEP` the query step; by default annb queries 10 items per query, and you can set it to 0 to query all items in one query (similar to batch mode in ann-benchmarks)
- `--batch` batch mode, an alias for `--step 0`
- `--count COUNT` the total number of items in the dataset, default is 1000
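For example, here is a hypothetical combination of the flags above that generates a larger inner-product dataset and queries it in one batch:
```bash
# Generate a random 128-dim inner-product dataset with 10000 items
# and query all of them in a single batch (--batch is an alias for --step 0)
annb-test --index-dim 128 --index-metric-type ip --count 10000 --batch
```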
##### run benchmark with a specific dataset
You can also run benchmarks against the ann-benchmarks [datasets](https://github.com/erikbern/ann-benchmarks#data-sets). Download them locally and pass the file with the `--dataset` option.
```bash
annb-test --dataset sift-128-euclidean.hdf5
```
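For example, fetching the SIFT dataset first (the URL follows the hosting pattern in the ann-benchmarks README at the time of writing; check there if it has moved):
```bash
# Download the dataset locally, then point annb-test at the file
wget http://ann-benchmarks.com/sift-128-euclidean.hdf5
annb-test --dataset sift-128-euclidean.hdf5
```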
##### run benchmark with query args
You may benchmark with different query args, e.g. different `nprobe` values for the faiss ivfflat index, via the `--query-args` option.
```bash
annb-test --query-args nprobe=10 --query-args nprobe=20
```
This will output a result like:
```plain
durations:
  training: 1 items, 1000 total, 1548.84968ms
  insert: 1 items, 1000 total, 143.402532ms
  query:
    nprobe=1,recall=0.2173 -> 1000 items, 20.074236ms, 49815.09632545916qps, latency=0.20074235999999998ms, p95=0.332276ms, p99=0.455525ms
    nprobe=10,recall=0.5221 -> 1000 items, 49.141931ms, 20349.2207092961qps, latency=0.49141931ms, p95=0.722628ms, p99=0.818012ms
    nprobe=20,recall=0.6861 -> 1000 items, 69.284072ms, 14433.331805324606qps, latency=0.69284072ms, p95=1.126946ms, p99=1.350359ms
```
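The same sweep also works against a real dataset; here is a hypothetical run combining the flags shown above:
```bash
# Sweep nprobe against a downloaded ann-benchmarks dataset
annb-test --dataset sift-128-euclidean.hdf5 \
  --query-args nprobe=1 --query-args nprobe=10 --query-args nprobe=20
```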
##### run multiple benchmarks with a config file
You may run multiple benchmarks with different indexes and datasets. Use the `--run-file` option to run benchmarks from a config file.
Below is an example config file:
config.yaml
```yaml
default:
  index_factory: annb.anns.faiss.indexes.index_under_test_factory
  index_factory_args: {}
  index_name: Test
  dataset: gist-960-euclidean.hdf5
  topk: 10
  step: 10
  jobs: 1
  loop: 2
  result: output.pth

runs:
  - name: faiss-gist960-gpu-ivfflat
    index_args:
      gpu: yes
      index: ivfflat
      nlist: 1024
    query_args:
      - nprobe: 1
      - nprobe: 16
      - nprobe: 256
  - name: faiss-gist960-gpu-ivfpq8
    index_args:
      gpu: yes
      index: ivfpq
      nlist: 1024
    query_args:
      - nprobe: 1
      - nprobe: 16
      - nprobe: 256
```
Explanation of the above config file:
- The `default` section holds the default config for all benchmarks.
- The config keys are generally the same as the options of the `annb-test` command, e.g. `index_factory` corresponds to `--index-factory`.
- You can define multiple benchmarks in the `runs` section; each run's config overrides the defaults. In this example, gist-960-euclidean.hdf5 is set as the default dataset, so all benchmarks use it, while each run has its own index and query args. The index args define two benchmark series, ivfflat (nlist=1024) and ivfpq (nlist=1024), and each series is queried with nprobe=1, 16, and 256. That makes 6 benchmarks in total: each series runs 3 benchmarks with different nprobe values.
- By the default setting, results are saved to output.pth. In fact, each benchmark series is saved to a separate file, so this example produces two files, `output-1.pth` and `output-2.pth`. You can view them with `annb-report`.
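To execute all the runs defined above:
```bash
annb-test --run-file config.yaml
```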
##### more options
You could use `annb-test --help` to see more options.
```bash
annb-test --help
```
#### Check Benchmark Results
The `annb-report` command is used to view benchmark results as plain or CSV text, or to export them to a chart graphic.
```bash
annb-report --help
```
##### examples for viewing/exporting benchmark results
View benchmark results as plain text:
```bash
annb-report output.pth
```
View benchmark results as CSV text:
```bash
annb-report output.pth --format csv
```
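If the plain and CSV reports are written to standard output, which this sketch assumes, the CSV can be redirected to a file for further processing:
```bash
# Redirect the CSV report to a file (assumes the report is printed to stdout)
annb-report output.pth --format csv > results.csv
```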
Export benchmark results to a chart graphic (multiple series):
```bash
annb-report output.pth --format png --output output.png output-1.pth output-2.pth
```