<div align="center">
<a href="https://github.com/BirchKwok/MinVectorDB"><img src="https://github.com/BirchKwok/MinVectorDB/blob/main/pic/logo.png" alt="MinVectorDB" style="max-width: 20%; height: auto;"></a>
<h3>A pure Python-implemented, lightweight, server-optional, multi-end compatible, vector database deployable locally or remotely.</h3>
<p>
<a href="https://badge.fury.io/py/MinVectorDB"><img src="https://badge.fury.io/py/MinVectorDB.svg" alt="PyPI version"></a>
<a href="https://pypi.org/project/MinVectorDB/"><img src="https://img.shields.io/pypi/pyversions/MinVectorDB" alt="PyPI - Python Version"></a>
<a href="https://pypi.org/project/MinVectorDB/"><img src="https://img.shields.io/pypi/l/MinVectorDB" alt="PyPI - License"></a>
<a href="https://github.com/BirchKwok/MinVectorDB/actions/workflows/python-tests.yml"><img src="https://github.com/BirchKwok/MinVectorDB/actions/workflows/python-tests.yml/badge.svg" alt="Python testing"></a>
<a href="https://github.com/BirchKwok/MinVectorDB/actions/workflows/docker-tests.yml"><img src="https://github.com/BirchKwok/MinVectorDB/actions/workflows/docker-tests.yml/badge.svg" alt="Docker build"></a>
</p>
</div>
⚡ **Server-optional, simple parameters, simple API.**
⚡ **Fast, memory-efficient, easily scales to millions of vectors.**
⚡ **Supports cosine similarity and L2 distance, uses FLAT for exhaustive search or IVF-FLAT for inverted indexing.**
⚡ **Friendly caching technology stores recently queried vectors for accelerated access.**
⚡ **Based on a generic Python software stack, platform-independent, highly versatile.**
> **WARNING**: MinVectorDB is actively being updated, and API backward compatibility is not guaranteed. You should use version numbers as a strong constraint during deployment to avoid unnecessary feature conflicts and errors.
> **Although our goal is to enable brute force search or inverted indexing on billion-scale vectors, we currently still recommend using it on a scale of millions of vectors or less for the best experience.**
*MinVectorDB* is a vector database implemented purely in Python, designed to be lightweight, server-optional, and easy to deploy locally or remotely. It offers straightforward and clear Python APIs, aiming to lower the entry barrier for using vector databases. In response to user needs and to enhance its practicality, we are planning to introduce new features, including but not limited to:
- **Optimizing Global Search Performance**: We are focusing on algorithm and data structure enhancements to speed up searches across the database, enabling faster retrieval of vector data.
- **Enhancing Cluster Search with Inverted Indexes**: Utilizing inverted index technology, we aim to refine the cluster search process for better search efficiency and precision.
- **Refining Clustering Algorithms**: By improving our clustering algorithms, we intend to offer more precise and efficient data clustering to support complex queries.
- **Facilitating Vector Modifications and Deletions**: We will introduce features to modify and delete vectors, allowing for more flexible data management.
MinVectorDB focuses on achieving 100% recall, prioritizing recall accuracy over high-speed search performance. This approach ensures that users can reliably retrieve all relevant vector data, making MinVectorDB particularly suitable for applications that require responses within hundreds of milliseconds.
- [x] **Now supports HTTP API and Python local code API.**
- [X] **Now supports Docker deployment.**
- [X] **Now supports vector id and field filtering.**
- [X] **Now supports transaction management; if a commit fails, it will automatically roll back.**
## Prerequisites
- [x] Python version >= 3.9
- [x] One of the following operating systems: Windows, macOS, or Ubuntu (or another Linux distribution). The latest version of the system is recommended; older versions should also work but have not been tested.
- [x] Memory >= 4GB, Free Disk >= 4GB.
## Install Client API package (Mandatory)
```shell
pip install MinVectorDB
```
## If you wish to use Docker (Optional)
**You must first [install Docker](https://docs.docker.com/engine/install/) on the host machine.**
```shell
docker pull birchkwok/minvectordb:latest
```
## Quick Start
```python
import min_vec
print("MinVectorDB version is: ", min_vec.__version__)
```
    MinVectorDB version is: 0.3.4
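
Because backward compatibility is not guaranteed between releases, it can help to verify the installed version at startup. The following is a minimal sketch using only the standard library; the helper name `check_pinned` and the constant `PINNED_VERSION` are hypothetical and not part of MinVectorDB.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: the release this README was written against.
PINNED_VERSION = "0.3.4"


def check_pinned(package: str, pinned: str) -> bool:
    """Return True only if `package` is installed at exactly `pinned`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        # The package is not installed at all.
        return False
    return installed == pinned


if __name__ == "__main__":
    if not check_pinned("MinVectorDB", PINNED_VERSION):
        print(f"Warning: expected MinVectorDB=={PINNED_VERSION}")
```

Pinning the same version in your requirements file (for example `MinVectorDB==0.3.4`) gives the same guarantee at install time.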
## Initialize Database
MinVectorDB now supports HTTP API and Python native code API.
The HTTP API mode requires starting an HTTP server beforehand. You can start it directly, run it in Docker, or deploy it remotely:
- Start directly
For direct startup, the default port is 7637. Run the following command in the terminal to start the service:
```shell
min_vec run --host localhost --port 7637
```
- Within Docker
With Docker, run the following command in the terminal to start the service:
```shell
docker run -p 7637:7637 birchkwok/minvectordb:latest
```
- Remote deployment
To deploy remotely, either map the container to port 80 of the remote host or open access to port 7637 on the host.
For example:
```shell
docker run -p 80:7637 birchkwok/minvectordb:latest
```
- Test whether the API is available
Open http://localhost:7637 directly in a browser.
If the container is mapped to port 80, use http://localhost instead.
If the container is mapped to port 80 of a remote host, access it at http://your_host_ip
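
The same availability check can be scripted instead of done in a browser. This is a minimal sketch using only the standard library; the helper name `is_reachable` is hypothetical, and the sketch makes no assumption about what the server returns at its root URL, only that it answers.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, whatever the status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, but with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    print(is_reachable("http://localhost:7637"))
```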
```python
from min_vec import MinVectorDB
# Use the HTTP API mode, which is suitable for production environments.
my_db = MinVectorDB("http://localhost:7637")
# Or use the Python native code API by specifying the database root directory.
# my_db = MinVectorDB('my_vec_db')  # Native mode is selected when root_path does not start with http or https
# The Python native code API is recommended only for CI/CD testing or single-user local use.
```
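
The rule for choosing between the two modes can be pictured as a simple prefix test. The sketch below is an illustration of that rule only, not MinVectorDB's actual implementation; the function name `is_remote` is hypothetical.

```python
def is_remote(root_path: str) -> bool:
    """Guess which mode a root_path selects, based on the rule in the README.

    Assumption: a path starting with "http" (which also covers "https")
    selects the HTTP API mode; anything else is a local directory.
    """
    return root_path.startswith("http")


print(is_remote("http://localhost:7637"))  # HTTP API mode
print(is_remote("my_vec_db"))              # Python native code API
```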
### Create a collection
**`WARNING`**
When you request a collection with the `require_collection` method and set the `drop_if_exists` parameter to True, all content of an existing collection with that name is deleted.
A safer way to open an existing collection is the `get_collection` method. Use `require_collection` only when you need to create a new collection or reinitialize an existing one.
```python
collection = my_db.require_collection("test_collection", dim=4, drop_if_exists=True, scaler_bits=8, description="demo collection")
```
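
To avoid wiping data by accident, one option is to wrap the two methods in a small helper that only creates a collection when it is missing. This is a sketch under assumptions: `open_collection` is a hypothetical helper, and calling `get_collection` with just the collection name is assumed rather than confirmed by this README. `show_collections` returning a list of names and the `dim` and `drop_if_exists` parameters are taken from the examples in this document.

```python
def open_collection(db, name, dim):
    """Open `name` without dropping it; create it only if it does not exist.

    `db` is expected to behave like a MinVectorDB instance.
    """
    if name in db.show_collections():
        # Assumption: get_collection accepts the collection name alone.
        return db.get_collection(name)
    return db.require_collection(name, dim=dim, drop_if_exists=False)
```

With the database from the previous step, this would be called as `open_collection(my_db, "test_collection", dim=4)`.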
#### Show database collections
```python
my_db.show_collections_details()
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>chunk_size</th>
<th>description</th>
<th>dim</th>
<th>distance</th>
<th>dtypes</th>
<th>index_mode</th>
<th>initialize_as_collection</th>
<th>n_clusters</th>
<th>n_threads</th>
<th>scaler_bits</th>
<th>use_cache</th>
<th>warm_up</th>
</tr>
<tr>
<th>collections</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>test_collection</th>
<td>100000</td>
<td>demo collection</td>
<td>4</td>
<td>cosine</td>
<td>float32</td>
<td>IVF-FLAT</td>
<td>True</td>
<td>16</td>
<td>10</td>
<td>8</td>
<td>True</td>
<td>False</td>
</tr>
</tbody>
</table>
</div>
#### Update description
```python
collection.update_description("test2")
my_db.show_collections_details()
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>chunk_size</th>
<th>description</th>
<th>dim</th>
<th>distance</th>
<th>dtypes</th>
<th>index_mode</th>
<th>initialize_as_collection</th>
<th>n_clusters</th>
<th>n_threads</th>
<th>scaler_bits</th>
<th>use_cache</th>
<th>warm_up</th>
</tr>
<tr>
<th>collections</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>test_collection</th>
<td>100000</td>
<td>test2</td>
<td>4</td>
<td>cosine</td>
<td>float32</td>
<td>IVF-FLAT</td>
<td>True</td>
<td>16</td>
<td>10</td>
<td>8</td>
<td>True</td>
<td>False</td>
</tr>
</tbody>
</table>
</div>
### Add vectors
Inserted vectors are not persisted until they are committed. Either call the collection's `commit` function manually after inserting, or insert inside the `insert_session` context manager, which runs `commit` for you when the block exits.
```python
with collection.insert_session():
    # Each vector is stored under the id passed in (1 to 10 here).
    id = collection.add_item(vector=[0.01, 0.34, 0.74, 0.31], id=1, field={'field': 'test_1', 'order': 0})
    id = collection.add_item(vector=[0.36, 0.43, 0.56, 0.12], id=2, field={'field': 'test_1', 'order': 1})
    id = collection.add_item(vector=[0.03, 0.04, 0.10, 0.51], id=3, field={'field': 'test_2', 'order': 2})
    id = collection.add_item(vector=[0.11, 0.44, 0.23, 0.24], id=4, field={'field': 'test_2', 'order': 3})
    id = collection.add_item(vector=[0.91, 0.43, 0.44, 0.67], id=5, field={'field': 'test_2', 'order': 4})
    id = collection.add_item(vector=[0.92, 0.12, 0.56, 0.19], id=6, field={'field': 'test_3', 'order': 5})
    id = collection.add_item(vector=[0.18, 0.34, 0.56, 0.71], id=7, field={'field': 'test_1', 'order': 6})
    id = collection.add_item(vector=[0.01, 0.33, 0.14, 0.31], id=8, field={'field': 'test_2', 'order': 7})
    id = collection.add_item(vector=[0.71, 0.75, 0.91, 0.82], id=9, field={'field': 'test_3', 'order': 8})
    id = collection.add_item(vector=[0.75, 0.44, 0.38, 0.75], id=10, field={'field': 'test_1', 'order': 9})
# If you do not use insert_session, call commit manually to submit the data:
# collection.commit()
```
```python
# or use the bulk_add_items function
# with collection.insert_session():
# ids = collection.bulk_add_items([([0.01, 0.34, 0.74, 0.31], 0, {'field': 'test_1', 'order': 0}),
# ([0.36, 0.43, 0.56, 0.12], 1, {'field': 'test_1', 'order': 1}),
# ([0.03, 0.04, 0.10, 0.51], 2, {'field': 'test_2', 'order': 2}),
# ([0.11, 0.44, 0.23, 0.24], 3, {'field': 'test_2', 'order': 3}),
# ([0.91, 0.43, 0.44, 0.67], 4, {'field': 'test_2', 'order': 4}),
# ([0.92, 0.12, 0.56, 0.19], 5, {'field': 'test_3', 'order': 5}),
# ([0.18, 0.34, 0.56, 0.71], 6, {'field': 'test_1', 'order': 6}),
# ([0.01, 0.33, 0.14, 0.31], 7, {'field': 'test_2', 'order': 7}),
# ([0.71, 0.75, 0.91, 0.82], 8, {'field': 'test_3', 'order': 8}),
# ([0.75, 0.44, 0.38, 0.75], 9, {'field': 'test_1', 'order': 9})])
# print(ids) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```
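
The stage-then-commit pattern behind `insert_session` can be illustrated with a toy in-memory class. This is not MinVectorDB's implementation: `ToyCollection` is a hypothetical stand-in that only shows how a context manager can commit on a clean exit and discard staged items when something goes wrong.

```python
from contextlib import contextmanager


class ToyCollection:
    """In-memory stand-in that stages items until commit is called."""

    def __init__(self):
        self.committed = {}  # items visible to queries
        self._staged = {}    # items added but not yet committed

    def add_item(self, vector, id, field=None):
        self._staged[id] = (list(vector), dict(field or {}))
        return id

    def commit(self):
        self.committed.update(self._staged)
        self._staged.clear()

    def rollback(self):
        self._staged.clear()

    @contextmanager
    def insert_session(self):
        try:
            yield self
        except Exception:
            # Discard everything staged in this session, then re-raise.
            self.rollback()
            raise
        else:
            self.commit()


toy = ToyCollection()
with toy.insert_session():
    toy.add_item([0.01, 0.34, 0.74, 0.31], id=1, field={"field": "test_1"})
print(sorted(toy.committed))
```

The practical consequence is the same as in the examples above: items added outside a session stay invisible until `commit` is called.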
### Query
The default similarity measure for query is cosine. You can specify cosine or L2 to obtain the similarity measure you need.
```python
collection.query(vector=[0.36, 0.43, 0.56, 0.12], k=10)
```
    (array([ 2, 9, 1, 4, 6, 5, 10, 7, 8, 3]),
     array([1. , 0.92355633, 0.86097705, 0.85727406, 0.81551266,
            0.813797 , 0.78595245, 0.7741583 , 0.6871773 , 0.34695023]))
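
To make the two similarity measures concrete, here is a plain-Python sketch of an exhaustive (FLAT-style) search over the ten sample vectors. It illustrates the math only and is not how MinVectorDB computes results internally; the function names are hypothetical.

```python
import math

# The ten sample vectors inserted above, keyed by id.
VECTORS = {
    1: [0.01, 0.34, 0.74, 0.31],
    2: [0.36, 0.43, 0.56, 0.12],
    3: [0.03, 0.04, 0.10, 0.51],
    4: [0.11, 0.44, 0.23, 0.24],
    5: [0.91, 0.43, 0.44, 0.67],
    6: [0.92, 0.12, 0.56, 0.19],
    7: [0.18, 0.34, 0.56, 0.71],
    8: [0.01, 0.33, 0.14, 0.31],
    9: [0.71, 0.75, 0.91, 0.82],
    10: [0.75, 0.44, 0.38, 0.75],
}


def cosine_similarity(a, b):
    """Dot product divided by the product of the norms; higher is closer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance; lower is closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(query, vectors, k):
    """Score every vector against the query and keep the k most similar."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


top3 = flat_search([0.36, 0.43, 0.56, 0.12], VECTORS, k=3)
print([vid for vid, _ in top3])
```

Because every stored vector is compared with the query, an exhaustive search of this kind cannot miss a true nearest neighbor, at the cost of time that grows linearly with the collection size.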
The `query_report_` attribute is the report of the most recent query. When multiple queries are conducted simultaneously, this attribute will only save the report of the last completed query result.
```python
print(collection.query_report_)
```
    * - MOST RECENT QUERY REPORT -
    | - Collection Shape: (10, 4)
    | - Query Time: 0.13898 s
    | - Query Distance: cosine
    | - Query K: 10
    | - Top 10 Results ID: [ 2 9 1 4 6 5 10 7 8 3]
    | - Top 10 Results Similarity: [1. 0.92355633 0.86097705 0.85727406 0.81551266 0.813797
     0.78595245 0.7741583 0.6871773 0.34695023]
    * - END OF REPORT -
### Use Filter
Using the Filter class to filter results can maximize recall.
The Filter class now supports `must`, `any`, and `must_not` parameters, all of which only accept list-type argument values.
The filtering conditions in `must` must be met, those in `must_not` must not be met.
After filtering with `must` and `must_not` conditions, the conditions in `any` will be considered, and at least one of the conditions in `any` must be met.
If there is a conflict between the conditions in `any` and those in `must` or `must_not`, the conditions in `any` will be ignored.
```python
import operator
from min_vec.core_components.filter import Filter, FieldCondition, MatchField, IDCondition, MatchID
collection.query(
    vector=[0.36, 0.43, 0.56, 0.12],
    k=10,
    query_filter=Filter(
        must=[
            FieldCondition(key='field', matcher=MatchField('test_1')),  # Support for filtering fields
        ],
        any=[
            FieldCondition(key='order', matcher=MatchField(8, comparator=operator.ge)),
            IDCondition(MatchID([1, 2, 3, 4, 5])),  # Support for filtering IDs
        ],
        must_not=[
            IDCondition(MatchID([8])),
            FieldCondition(key='order', matcher=MatchField(8, comparator=operator.ge)),
        ]
    )
)
print(collection.query_report_)
```
    * - MOST RECENT QUERY REPORT -
    | - Collection Shape: (10, 4)
    | - Query Time: 0.09066 s
    | - Query Distance: cosine
    | - Query K: 10
    | - Top 10 Results ID: [2 1]
    | - Top 10 Results Similarity: [1. 0.86097705]
    * - END OF REPORT -
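
The `must`, `must_not`, and `any` semantics can be traced by hand on the sample data. The sketch below re-implements them in plain Python to show why only ids 1 and 2 survive the filter above. It is an illustration, not the library's code: the helpers `field_is`, `id_in`, and `apply_filter` are hypothetical, and the rule that conflicting `any` conditions are ignored is not modeled.

```python
import operator

# id -> field data, as inserted in the "Add vectors" step.
RECORDS = {
    1: {"field": "test_1", "order": 0},
    2: {"field": "test_1", "order": 1},
    3: {"field": "test_2", "order": 2},
    4: {"field": "test_2", "order": 3},
    5: {"field": "test_2", "order": 4},
    6: {"field": "test_3", "order": 5},
    7: {"field": "test_1", "order": 6},
    8: {"field": "test_2", "order": 7},
    9: {"field": "test_3", "order": 8},
    10: {"field": "test_1", "order": 9},
}


def field_is(key, value, comparator=operator.eq):
    """Condition: comparator(stored_value, value) holds for the given key."""
    return lambda rid, fields: comparator(fields[key], value)


def id_in(ids):
    """Condition: the record id is one of `ids`."""
    allowed = set(ids)
    return lambda rid, fields: rid in allowed


def apply_filter(records, must=(), must_not=(), any_of=()):
    """Keep ids that pass all `must`, no `must_not`, and at least one `any_of`."""
    kept = []
    for rid, fields in records.items():
        if not all(cond(rid, fields) for cond in must):
            continue
        if any(cond(rid, fields) for cond in must_not):
            continue
        if any_of and not any(cond(rid, fields) for cond in any_of):
            continue
        kept.append(rid)
    return kept


result = apply_filter(
    RECORDS,
    must=[field_is("field", "test_1")],
    any_of=[field_is("order", 8, operator.ge), id_in([1, 2, 3, 4, 5])],
    must_not=[id_in([8]), field_is("order", 8, operator.ge)],
)
print(result)
```

Step by step: `must` keeps ids 1, 2, 7, and 10; `must_not` removes id 10 because its `order` is 9; `any` then keeps only ids 1 and 2, since id 7 matches neither `any` condition. This agrees with the two ids in the query report.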
### Drop a collection
`WARNING: This operation cannot be undone`
```python
print("Collection list before dropping:", my_db.show_collections())
status = my_db.drop_collection("test_collection")
print("Collection list after dropped:", my_db.show_collections())
```
    Collection list before dropping: ['test_collection']
    {'status': 'success', 'params': {'collection_name': 'test_collection', 'exists': False}}
    Collection list after dropped: []
## Drop the database
`WARNING: This operation cannot be undone`
```python
my_db.drop_database()
my_db
```
    MinVectorDB remote server at http://localhost:7637 does not exist.
```python
my_db.database_exists()
```
    {'status': 'success', 'params': {'exists': False}}
## What's Next
- [Collection's operations](https://github.com/BirchKwok/MinVectorDB/blob/main/tutorials/collections.ipynb)
- [Add vectors to collection](https://github.com/BirchKwok/MinVectorDB/blob/main/tutorials/add_vectors.ipynb)
- [Using different indexing methods](https://github.com/BirchKwok/MinVectorDB/blob/main/tutorials/index_mode.ipynb)
- [Using different distance metric functions](https://github.com/BirchKwok/MinVectorDB/blob/main/tutorials/distance.ipynb)
- [Diversified queries](https://github.com/BirchKwok/MinVectorDB/blob/main/tutorials/queries.ipynb)
Raw data
{
"_id": null,
"home_page": "https://github.com/BirchKwok/MinVectorDB",
"name": "MinVectorDB",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "vector database",
"author": "Birch Kwok",
"author_email": "birchkwok@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a1/fd/ed05cbdf6a07dd27c09f2e76d8a4bf2a01c05ca44ac7f5247276b251fac3/minvectordb-0.3.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A pure Python-implemented, lightweight, server-optional, multi-end compatible, vector database deployable locally or remotely.",
"version": "0.3.4",
"project_urls": {
"Homepage": "https://github.com/BirchKwok/MinVectorDB"
},
"split_keywords": [
"vector",
"database"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "03d2a6ba8b607135dd11984d9466da48921ba1f64432317dd629b1a877bd4bd3",
"md5": "1a2694abf9a502ccddfa4102d1016b7e",
"sha256": "4abbd58cda1cfe2e7cd6e20f48869258996c033134cc3e66b917e98b8a163055"
},
"downloads": -1,
"filename": "MinVectorDB-0.3.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1a2694abf9a502ccddfa4102d1016b7e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 117767,
"upload_time": "2024-05-09T03:09:09",
"upload_time_iso_8601": "2024-05-09T03:09:09.016641Z",
"url": "https://files.pythonhosted.org/packages/03/d2/a6ba8b607135dd11984d9466da48921ba1f64432317dd629b1a877bd4bd3/MinVectorDB-0.3.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a1fded05cbdf6a07dd27c09f2e76d8a4bf2a01c05ca44ac7f5247276b251fac3",
"md5": "a615ce907faec57dd186886a9b3e84c9",
"sha256": "1c8ffce2c24aecbd9c015362165eed3550fb2fdc77dd76602a093255f74391ac"
},
"downloads": -1,
"filename": "minvectordb-0.3.4.tar.gz",
"has_sig": false,
"md5_digest": "a615ce907faec57dd186886a9b3e84c9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 52459,
"upload_time": "2024-05-09T03:09:12",
"upload_time_iso_8601": "2024-05-09T03:09:12.010088Z",
"url": "https://files.pythonhosted.org/packages/a1/fd/ed05cbdf6a07dd27c09f2e76d8a4bf2a01c05ca44ac7f5247276b251fac3/minvectordb-0.3.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-09 03:09:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "BirchKwok",
"github_project": "MinVectorDB",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "minvectordb"
}