nano-vectordb


- Name: nano-vectordb
- Version: 0.0.4.3
- Home page: https://github.com/gusye1234/nano-vectordb
- Summary: A simple, easy-to-hack Vector Database implementation
- Upload time: 2024-11-11 12:50:50
- Author: JianbaiYe
- Requires Python: >=3.9

<div align="center">
  <h1>nano-VectorDB</h1>
  <p><strong>A simple, easy-to-hack Vector Database</strong></p>
  <p>
    <img src="https://img.shields.io/badge/python->=3.9.11-blue">
    <a href="https://pypi.org/project/nano-vectordb/">
      <img src="https://img.shields.io/pypi/v/nano-vectordb.svg">
    </a>
    <a href="https://codecov.io/github/gusye1234/nano-vectordb" > 
 <img src="https://codecov.io/github/gusye1234/nano-vectordb/graph/badge.svg?token=3ACScwuv4h"/> 
 </a>
  </p>
</div>




🌬️ A vector database implementation with a single dependency (`numpy`).

🎁 It can query `100,000` vectors and return results in roughly 100 milliseconds.

🏃 It's okay for your prototypes, maybe even more.

🏃 Supports naive [multi-tenancy](#Multi-Tenancy).



## Install

**Install from PyPI**

```shell
pip install nano-vectordb
```

**Install from source**

```shell
# clone this repo first
git clone https://github.com/gusye1234/nano-vectordb.git
cd nano-vectordb
pip install -e .
```



## Quick Start

**Faking your data**:

```python
from nano_vectordb import NanoVectorDB
import numpy as np

data_len = 100_000
fake_dim = 1024
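fake_embeds = np.random.rand(data_len, fake_dim)

# Any extra fields you want to store next to each vector (placeholder values).
ANYFIELDS = {"any_field": "any_value"}
fakes_data = [{"__vector__": fake_embeds[i], **ANYFIELDS} for i in range(data_len)]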
```

You can add any fields to a record, but two keys are reserved:

- `__id__`: Optional. If passed, `NanoVectorDB` will use your id; otherwise a generated id will be used.
- `__vector__`: Required. Your embedding, as an `np.ndarray`.
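
As a minimal sketch of how such records can be assembled: the `make_record` helper below is hypothetical (it is not part of `nano_vectordb`), and the field names `title` and `lang` are arbitrary placeholders.

```python
import numpy as np


def make_record(vector, record_id=None, **fields):
    """Build one record dict. `make_record` is a hypothetical helper."""
    record = {"__vector__": np.asarray(vector), **fields}
    if record_id is not None:
        # Only set `__id__` when the caller wants to control the id.
        record["__id__"] = record_id
    return record


dim = 8
with_id = make_record(np.random.rand(dim), record_id="doc-1", title="hello")
without_id = make_record(np.random.rand(dim), lang="en")
```

The second record leaves `__id__` out, so the database is expected to generate one for it.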

### Init a DB

```python
vdb = NanoVectorDB(fake_dim, storage_file="fool.json")
```

The next time you initialize `vdb` with `fool.json`, `NanoVectorDB` will load the index from it automatically.
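
The load-if-present behaviour can be pictured with a toy store. This is only a sketch of the general pattern, not the library's code; `TinyStore` and its `data` attribute are made-up names.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical store that reloads its state when the storage file already exists."""

    def __init__(self, storage_file):
        self.storage_file = storage_file
        self.data = {}
        if os.path.exists(storage_file):
            with open(storage_file, encoding="utf-8") as f:
                self.data = json.load(f)

    def save(self):
        with open(self.storage_file, "w", encoding="utf-8") as f:
            json.dump(self.data, f)


path = os.path.join(tempfile.mkdtemp(), "fool.json")
first = TinyStore(path)
first.data["greeting"] = "hello"
first.save()

second = TinyStore(path)  # same file, so the saved state comes back
print(second.data)        # {'greeting': 'hello'}
```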

### Upsert

```python
r = vdb.upsert(fakes_data)
print(r["update"], r["insert"])
```
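
To make the shape of the result concrete, here is a toy upsert over a plain dict. It assumes the result maps `"update"` and `"insert"` to lists of ids, which is consistent with `r["insert"]` being passed to `vdb.get` and `vdb.delete` later in this README; the `upsert` function itself is an illustration, not the library's implementation.

```python
def upsert(store, records):
    """Toy upsert over a dict keyed by id; reports which ids were updated or inserted."""
    report = {"update": [], "insert": []}
    for record in records:
        key = record["__id__"]
        report["update" if key in store else "insert"].append(key)
        store[key] = record
    return report


store = {}
first = upsert(store, [{"__id__": "a", "text": "v1"}, {"__id__": "b", "text": "v1"}])
second = upsert(store, [{"__id__": "a", "text": "v2"}, {"__id__": "c", "text": "v1"}])
print(first)   # {'update': [], 'insert': ['a', 'b']}
print(second)  # {'update': ['a'], 'insert': ['c']}
```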

### Query

```python
# query with an embedding
vdb.query(np.random.rand(fake_dim))

# query with optional arguments
vdb.query(np.random.rand(fake_dim), top_k=5, better_than_threshold=0.01)
```
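
For intuition, a brute-force search of this kind can be sketched in plain `numpy`. This sketch is not taken from the library's source: it assumes cosine similarity as the metric and treats `better_than_threshold` as a minimum similarity, and `top_k_cosine` is a hypothetical function name.

```python
import numpy as np


def top_k_cosine(matrix, query, top_k=10, better_than_threshold=None):
    """Return (row_index, similarity) pairs, best match first."""
    # Normalise rows and the query so a dot product equals cosine similarity.
    matrix_n = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = matrix_n @ query_n
    # Sort descending and keep the best `top_k` rows.
    order = np.argsort(scores)[::-1][:top_k]
    results = [(int(i), float(scores[i])) for i in order]
    if better_than_threshold is not None:
        # Drop matches below the cutoff (assumed meaning of the threshold).
        results = [(i, s) for i, s in results if s >= better_than_threshold]
    return results


rng = np.random.default_rng(0)
vectors = rng.random((1000, 64))
hits = top_k_cosine(vectors, vectors[42], top_k=5, better_than_threshold=0.01)
```

Querying with a stored vector returns that vector's own row first, with a similarity of about 1.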

#### Conditional filter

```python
vdb.query(np.random.rand(fake_dim), filter_lambda=lambda x: x["any_field"] == "any_value")
```
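
The predicate receives one stored record and returns a truthy value to keep it. A small sketch of the idea on plain dicts follows; `filtered_top_k` and the precomputed `similarity` field are hypothetical, and whether the library filters before or after ranking is not stated here.

```python
def filtered_top_k(records, score, filter_lambda=None, top_k=3):
    """Keep records accepted by `filter_lambda`, then return the `top_k` best by `score`."""
    candidates = [r for r in records if filter_lambda is None or filter_lambda(r)]
    return sorted(candidates, key=score, reverse=True)[:top_k]


records = [
    {"__id__": "a", "lang": "en", "similarity": 0.91},
    {"__id__": "b", "lang": "fr", "similarity": 0.97},
    {"__id__": "c", "lang": "en", "similarity": 0.42},
]
hits = filtered_top_k(
    records,
    score=lambda r: r["similarity"],
    filter_lambda=lambda r: r["lang"] == "en",
)
print([r["__id__"] for r in hits])  # ['a', 'c']
```

If some records may lack the field, writing the predicate with `x.get("any_field")` avoids a `KeyError`.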

### Save

```python
# will create/overwrite 'fool.json'
vdb.save()
```

### Get, Delete

```python
# get and delete the inserted data
print(vdb.get(r["insert"]))
vdb.delete(r["insert"])
```

### Additional Data

```python
vdb.store_additional_data(a=1, b=2, c=3)
print(vdb.get_additional_data())
```



## Multi-Tenancy

If you need several vector databases, you can manage them with `MultiTenantNanoVDB`:

```python
from nano_vectordb import NanoVectorDB, MultiTenantNanoVDB

multi_tenant = MultiTenantNanoVDB(1024)
tenant_id = multi_tenant.create_tenant()

# tenant is a NanoVectorDB, you can upsert, query, get... on this.
tenant: NanoVectorDB = multi_tenant.get_tenant(tenant_id)

# some chores:
multi_tenant.delete_tenant(tenant_id)
multi_tenant.contain_tenant(tenant_id)

# save it
multi_tenant.save()
```

`MultiTenantNanoVDB` uses a queue to limit how many vector databases are held in memory. You can adjust the limit with `max_capacity`:

```python
# At most `max_capacity` NanoVectorDB instances are kept in memory.
multi_tenant = MultiTenantNanoVDB(1024, max_capacity=1)
```
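
One way to picture a queue-bounded set of in-memory databases is a first-in, first-out cache. The sketch below is an assumption about the general idea, not the library's actual eviction logic; `BoundedTenantCache` and `load_tenant` are hypothetical names.

```python
from collections import deque


class BoundedTenantCache:
    """Hypothetical FIFO cache: keeps at most `max_capacity` tenants in memory."""

    def __init__(self, max_capacity, loader):
        self.max_capacity = max_capacity
        self.loader = loader    # called to (re)load a tenant that is not in memory
        self.in_memory = {}
        self.order = deque()    # tenant ids, oldest first

    def get(self, tenant_id):
        if tenant_id not in self.in_memory:
            if len(self.order) >= self.max_capacity:
                evicted = self.order.popleft()
                # A real store would persist the tenant before dropping it.
                del self.in_memory[evicted]
            self.in_memory[tenant_id] = self.loader(tenant_id)
            self.order.append(tenant_id)
        return self.in_memory[tenant_id]


loads = []


def load_tenant(tenant_id):
    loads.append(tenant_id)
    return {"tenant": tenant_id}


cache = BoundedTenantCache(max_capacity=1, loader=load_tenant)
cache.get("t1")
cache.get("t2")  # evicts t1
cache.get("t1")  # t1 has to be loaded again
print(loads)                  # ['t1', 't2', 't1']
print(list(cache.in_memory))  # ['t1']
```

A small `max_capacity` keeps memory use low at the cost of reloading tenants more often.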



## Benchmark

> Embedding Dim: 1024. Device: MacBook M3 Pro

- Saving an index with `100,000` vectors generates a JSON file of roughly 520 MB.
- Inserting `100,000` vectors takes roughly `2`s.
- Querying `100,000` vectors takes roughly `0.1`s.
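
The file size can be sanity-checked with a back-of-envelope calculation. It rests on two assumptions that are not stated in this README: that vectors are stored as 32-bit floats, and that the matrix is written into the JSON file as base64 text.

```python
n_vectors, dim = 100_000, 1024
bytes_per_float = 4  # assumption: float32 storage

raw_bytes = n_vectors * dim * bytes_per_float
# Base64 emits 4 output characters for every 3 input bytes (rounded up).
base64_bytes = 4 * ((raw_bytes + 2) // 3)

print(f"raw matrix:  {raw_bytes / 2**20:.0f} MiB")     # raw matrix:  391 MiB
print(f"base64 text: {base64_bytes / 2**20:.0f} MiB")  # base64 text: 521 MiB
```

Under those assumptions the estimate lands close to the reported figure, before counting the extra fields stored with each record.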

            
