<div align="center">
<h1>nano-VectorDB</h1>
<p><strong>A simple, easy-to-hack Vector Database</strong></p>
<p>
<img src="https://img.shields.io/badge/python->=3.9.11-blue">
<a href="https://pypi.org/project/nano-vectordb/">
<img src="https://img.shields.io/pypi/v/nano-vectordb.svg">
</a>
<a href="https://codecov.io/github/gusye1234/nano-vectordb" >
<img src="https://codecov.io/github/gusye1234/nano-vectordb/graph/badge.svg?token=3ACScwuv4h"/>
</a>
</p>
</div>
🌬️ A vector database implementation with a single dependency (`numpy`).

🎁 It can answer a query over `100,000` vectors in about 100 milliseconds.

🏃 It's good enough for your prototypes, and maybe more.

🏃 Supports naive [multi-tenancy](#multi-tenancy).
## Install
**Install from PyPI**
```shell
pip install nano-vectordb
```
**Install from source**
```shell
# clone this repo first
git clone https://github.com/gusye1234/nano-vectordb
cd nano-vectordb
pip install -e .
```
## Quick Start
**Generate fake data**:
```python
from nano_vectordb import NanoVectorDB
import numpy as np

data_len = 100_000
fake_dim = 1024
fake_embeds = np.random.rand(data_len, fake_dim)

# Any extra fields you want stored with each vector (one example field here)
ANYFIELDS = {"any_field": "any_value"}
fakes_data = [{"__vector__": fake_embeds[i], **ANYFIELDS} for i in range(data_len)]
```
Each data item is a dict and can carry any fields you like, but two keys are reserved:

- `__id__`: optional. If passed, `NanoVectorDB` uses your id; otherwise it generates one.
- `__vector__`: required. Your embedding, as an `np.ndarray`.
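The two reserved keys can be illustrated with a hypothetical `prepare_record` helper. This is only a sketch: the id scheme below (an MD5 hash of the vector's float32 bytes) is an assumption, not necessarily how `NanoVectorDB` generates ids.

```python
import hashlib
import struct

def prepare_record(record: dict) -> dict:
    """Hypothetical helper: enforce the two reserved keys of a data item."""
    if "__vector__" not in record:
        # `__vector__` is mandatory: every item must carry its embedding
        raise ValueError("each data item needs a '__vector__' field")
    out = dict(record)
    if "__id__" not in out:
        # Assumed id scheme for this sketch: MD5 of the vector packed as float32
        vec = out["__vector__"]
        raw = struct.pack(f"{len(vec)}f", *vec)
        out["__id__"] = hashlib.md5(raw).hexdigest()
    return out

item = prepare_record({"__vector__": [0.1, 0.2, 0.3], "any_field": "any_value"})
print(item["__id__"])  # a generated 32-character hex id
kept = prepare_record({"__id__": "doc-1", "__vector__": [0.1, 0.2, 0.3]})
print(kept["__id__"])  # doc-1
```

User-supplied ids pass through untouched, so re-upserting the same `__id__` can target an existing record.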
### Init a DB
```python
vdb = NanoVectorDB(fake_dim, storage_file="fool.json")
```
If `fool.json` already exists, `NanoVectorDB` loads the saved index from it automatically, so the next time you initialize `vdb` with the same `storage_file` your data is restored.
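A minimal sketch of this load-on-init pattern, using a hypothetical `TinyStore` class that keeps vectors as plain JSON lists. The library's actual on-disk layout is not described here, and the dimension check is this sketch's own choice.

```python
import json
import os
import tempfile

class TinyStore:
    """Hypothetical store illustrating the load-on-init / save pattern."""

    def __init__(self, dim: int, storage_file: str):
        self.dim = dim
        self.storage_file = storage_file
        self.records = {}
        # If the file already exists, restore the saved state from it
        if os.path.exists(storage_file):
            with open(storage_file, "r", encoding="utf-8") as f:
                saved = json.load(f)
            if saved["dim"] != dim:
                raise ValueError("dimension mismatch with stored index")
            self.records = saved["records"]

    def save(self) -> None:
        # Create or overwrite the storage file with the current state
        with open(self.storage_file, "w", encoding="utf-8") as f:
            json.dump({"dim": self.dim, "records": self.records}, f)

path = os.path.join(tempfile.mkdtemp(), "fool.json")
db = TinyStore(3, storage_file=path)
db.records["a"] = [0.1, 0.2, 0.3]
db.save()
reloaded = TinyStore(3, storage_file=path)  # picks up the saved record
print(reloaded.records)  # {'a': [0.1, 0.2, 0.3]}
```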
### Upsert
```python
r = vdb.upsert(fakes_data)
print(r["update"], r["insert"])
```
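The returned `r` holds the ids that were updated and the ids that were newly inserted (the Get/Delete example passes `r["insert"]` straight to `get` and `delete`). A minimal sketch of that update-vs-insert bookkeeping over a plain dict, with a hypothetical `upsert` function:

```python
def upsert(store: dict, items: list) -> dict:
    """Hypothetical sketch: insert new ids, overwrite existing ones, report which was which."""
    report = {"update": [], "insert": []}
    for item in items:
        item_id = item["__id__"]
        # An id already in the store is an update; otherwise it is an insert
        key = "update" if item_id in store else "insert"
        report[key].append(item_id)
        store[item_id] = item
    return report

store = {"a": {"__id__": "a", "__vector__": [1.0, 0.0]}}
r = upsert(store, [
    {"__id__": "a", "__vector__": [0.0, 1.0]},  # existing id -> update
    {"__id__": "b", "__vector__": [1.0, 1.0]},  # new id -> insert
])
print(r["update"], r["insert"])  # ['a'] ['b']
```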
### Query
```python
# query with embedding
vdb.query(np.random.rand(fake_dim))
# with optional arguments: return at most 5 results, keeping only those scoring better than 0.01
vdb.query(np.random.rand(fake_dim), top_k=5, better_than_threshold=0.01)
```
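Conceptually, a query like this can be answered by a brute-force scan: score every stored vector against the query, drop weak matches, and keep the best `top_k`. The sketch below assumes cosine similarity and reads `better_than_threshold` as "score >= threshold"; both are assumptions, and `brute_force_query` is a hypothetical name. It is written in plain Python for readability, whereas the library itself relies on `numpy` to reach the speeds quoted above.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def brute_force_query(records, vector, top_k=10, better_than_threshold=None):
    """Hypothetical sketch: score every record, drop weak matches, keep the best top_k."""
    scored = [(r["__id__"], cosine(r["__vector__"], vector)) for r in records]
    if better_than_threshold is not None:
        # keep only results whose similarity reaches the threshold
        scored = [(i, s) for i, s in scored if s >= better_than_threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

records = [
    {"__id__": "x", "__vector__": [1.0, 0.0]},
    {"__id__": "y", "__vector__": [0.0, 1.0]},
    {"__id__": "z", "__vector__": [1.0, 1.0]},
]
print(brute_force_query(records, [1.0, 0.0], top_k=2))  # x scores 1.0, then z (~0.707)
print(brute_force_query(records, [1.0, 0.0], better_than_threshold=0.8))  # only x
```

A full scan costs time proportional to the number of vectors times their dimension on every query, which is why it suits prototypes rather than very large collections.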
#### Conditional filter
```python
# only consider records whose "any_field" equals "any_value"
vdb.query(np.random.rand(fake_dim), filter_lambda=lambda x: x["any_field"] == "any_value")
```
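`filter_lambda` receives a stored data dict (the example reads `x["any_field"]`) and decides whether that record stays in play. A minimal sketch of such a metadata filter, with a hypothetical `filtered_candidates` helper; whether the library filters before or after scoring is an assumption here, and this sketch filters first.

```python
def filtered_candidates(records, filter_lambda=None):
    """Hypothetical sketch: keep only records for which filter_lambda(record) is truthy."""
    if filter_lambda is None:
        return list(records)
    return [r for r in records if filter_lambda(r)]

records = [
    {"__id__": "1", "__vector__": [0.1, 0.9], "any_field": "any_value"},
    {"__id__": "2", "__vector__": [0.8, 0.2], "any_field": "other"},
    {"__id__": "3", "__vector__": [0.5, 0.5]},  # no "any_field" at all
]
kept = filtered_candidates(records, lambda x: x.get("any_field") == "any_value")
print([r["__id__"] for r in kept])  # ['1']
```

Using `x.get("any_field")` rather than `x["any_field"]` avoids a `KeyError` when some records lack the field.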
### Save
```python
# will create/overwrite 'fool.json'
vdb.save()
```
### Get, Delete
```python
# get and delete the inserted data
print(vdb.get(r["insert"]))
vdb.delete(r["insert"])
```
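`get` and `delete` both take a list of ids, such as `r["insert"]` from the upsert result. A minimal sketch over a plain dict; the hypothetical `get` and `delete` functions below skip or ignore unknown ids, which is an assumption about how missing ids are treated.

```python
def get(store: dict, ids: list) -> list:
    """Hypothetical sketch: return stored records for the given ids, skipping unknown ids."""
    return [store[i] for i in ids if i in store]

def delete(store: dict, ids: list) -> None:
    """Hypothetical sketch: remove the given ids; ids that are not present are ignored."""
    for i in ids:
        store.pop(i, None)

store = {
    "a": {"__id__": "a", "__vector__": [1.0, 0.0]},
    "b": {"__id__": "b", "__vector__": [0.0, 1.0]},
}
inserted = ["a", "b"]
print(len(get(store, inserted)))  # 2
delete(store, inserted)
print(get(store, inserted))       # []
```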
### Additional Data
```python
vdb.store_additional_data(a=1, b=2, c=3)
print(vdb.get_additional_data())
```
## Multi-Tenancy
If you need several vector databases, use `MultiTenantNanoVDB` to manage them:
```python
from nano_vectordb import NanoVectorDB, MultiTenantNanoVDB
multi_tenant = MultiTenantNanoVDB(1024)
tenant_id = multi_tenant.create_tenant()
# each tenant is a NanoVectorDB, so you can call upsert, query, get, ... on it
tenant: NanoVectorDB = multi_tenant.get_tenant(tenant_id)
# housekeeping: delete a tenant, check whether a tenant exists
multi_tenant.delete_tenant(tenant_id)
multi_tenant.contain_tenant(tenant_id)
# save it
multi_tenant.save()
```
`MultiTenantNanoVDB` uses a queue to cap how many tenant DBs are kept in memory at once; you can adjust the cap with `max_capacity`:
```python
# At most `max_capacity` NanoVectorDB instances are kept in memory.
multi_tenant = MultiTenantNanoVDB(1024, max_capacity=1)
```
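The README only says that a queue bounds how many tenants stay in memory. A minimal sketch of that idea, assuming first-in-first-out eviction and that an evicted tenant is persisted before it is dropped (both assumptions; `BoundedTenantCache`, `load`, and `unload` are hypothetical names):

```python
from collections import OrderedDict
from typing import Callable

class BoundedTenantCache:
    """Hypothetical sketch: keep at most `max_capacity` tenant DBs in memory (FIFO eviction)."""

    def __init__(self, max_capacity: int,
                 load: Callable[[str], dict],
                 unload: Callable[[str, dict], None]):
        if max_capacity < 1:
            raise ValueError("max_capacity must be at least 1")
        self.max_capacity = max_capacity
        self.load = load        # hypothetical: read a tenant from disk
        self.unload = unload    # hypothetical: persist a tenant before it leaves memory
        self._queue = OrderedDict()  # tenant_id -> db, oldest first

    def get(self, tenant_id: str) -> dict:
        if tenant_id in self._queue:
            return self._queue[tenant_id]
        # make room by evicting the tenant that entered the queue first
        while len(self._queue) >= self.max_capacity:
            old_id, old_db = self._queue.popitem(last=False)
            self.unload(old_id, old_db)
        db = self.load(tenant_id)
        self._queue[tenant_id] = db
        return db

disk = {"t1": {"name": "t1"}, "t2": {"name": "t2"}}
evicted = []
cache = BoundedTenantCache(1, load=lambda t: disk[t], unload=lambda t, db: evicted.append(t))
cache.get("t1")
cache.get("t2")  # capacity is 1, so t1 is unloaded first
print(evicted)   # ['t1']
```

An LRU policy would instead call `self._queue.move_to_end(tenant_id)` on each cache hit, so frequently used tenants stay resident.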
## Benchmark
> Embedding Dim: 1024. Device: MacBook M3 Pro
- Saving an index with `100,000` vectors generates a JSON file of roughly 520M.
- Inserting `100,000` vectors takes roughly `2` s.
- A query over `100,000` vectors takes roughly `0.1` s.
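As a rough sanity check on the file size, assume (this is not stated above) that the vectors are saved as raw float32 bytes, base64-encoded into the JSON. The arithmetic then lands close to the reported figure:

```python
n_vectors = 100_000
dim = 1024
bytes_per_float32 = 4

raw_bytes = n_vectors * dim * bytes_per_float32   # 409,600,000 bytes of float32 data
base64_bytes = 4 * ((raw_bytes + 2) // 3)         # base64 emits 4 chars per 3 input bytes
print(f"{base64_bytes / 2**20:.1f} MiB")          # 520.8 MiB
```

Ids and any extra fields add a little on top of this.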
Raw data
{
"_id": null,
"home_page": "https://github.com/gusye1234/nano-vectordb",
"name": "nano-vectordb",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": "JianbaiYe",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/cb/ff/ed9ff1c4e5b0418687c17d02fdc453c212e7550c62622914ba0243c106bc/nano_vectordb-0.0.4.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A simple, easy-to-hack Vector Database implementation",
"version": "0.0.4.3",
"project_urls": {
"Homepage": "https://github.com/gusye1234/nano-vectordb"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9bd8f1876f59916da0a2147e63066650c46bf7992828a9e92f1b4e3b695f1fb0",
"md5": "89a2412ad1d2705125ad9e4b839db010",
"sha256": "1b70401a54c02fabf76515b5dfb630076434547ed3c6861828ee8771b6dd7c19"
},
"downloads": -1,
"filename": "nano_vectordb-0.0.4.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "89a2412ad1d2705125ad9e4b839db010",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 5590,
"upload_time": "2024-11-11T12:50:48",
"upload_time_iso_8601": "2024-11-11T12:50:48.900336Z",
"url": "https://files.pythonhosted.org/packages/9b/d8/f1876f59916da0a2147e63066650c46bf7992828a9e92f1b4e3b695f1fb0/nano_vectordb-0.0.4.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "cbffed9ff1c4e5b0418687c17d02fdc453c212e7550c62622914ba0243c106bc",
"md5": "24cbf8f8f34b058754901c9ecd570587",
"sha256": "3d13074476f2b739e51261974ed44aa467725579966219734c03502c929ed3b5"
},
"downloads": -1,
"filename": "nano_vectordb-0.0.4.3.tar.gz",
"has_sig": false,
"md5_digest": "24cbf8f8f34b058754901c9ecd570587",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 6332,
"upload_time": "2024-11-11T12:50:50",
"upload_time_iso_8601": "2024-11-11T12:50:50.584216Z",
"url": "https://files.pythonhosted.org/packages/cb/ff/ed9ff1c4e5b0418687c17d02fdc453c212e7550c62622914ba0243c106bc/nano_vectordb-0.0.4.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-11 12:50:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "gusye1234",
"github_project": "nano-vectordb",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "nano-vectordb"
}