# SemanticStore
<p align="center">
<img src="https://img.shields.io/badge/license-MIT-blue.svg" />
<img src="https://img.shields.io/badge/version-v0.1.0-red" alt="Alpha Version">
<a href="https://github.com/pragneshbarik/semantic-store/txtai">
<img src="https://img.shields.io/github/last-commit/pragneshbarik/semantic-store.svg?style=flat&color=blue" alt="GitHub last commit"/>
</a>
<img src="https://img.shields.io/github/contributors/pragneshbarik/semantic-store" />
<a href="https://github.com/pragneshbarik/semantic-store/txtai/issues">
<img src="https://img.shields.io/github/issues/pragneshbarik/semantic-store.svg?style=flat&color=success" alt="GitHub issues"/>
</a>
<img src="https://img.shields.io/badge/discord-join-blue?style=flat&logo=discord&logocolor=white" alt="Join Discord"/>
</p>
<p align="center">
<img src = "https://github.com/pragneshbarik/semantic-store/assets/65221256/3c47be22-28e0-4ece-80de-e8a7bfa111bf" width='60%'/>
</p>
# What is SemanticStore
A no-nonsense key-value vector database built around faiss. It provides a Pythonic interface for insertion, retrieval, update, and deletion.
## Getting Started
Follow these steps to get started with SemanticStore:
1. **Install into environment**
```shell
pip install semantic-store
```
# Overview of KV
KV is a no-nonsense key-value vector database built around faiss, with a Pythonic interface for insertion, retrieval, update, and deletion. It requires only numpy and faiss as additional dependencies.
## Getting Started with KV
1. **CRUD Operations**
KV provides an interface similar to that of a Python dictionary.
```python
from semanticstore import KV
# IF PRESENT LOAD DB, ELSE CREATE NEW
kv = KV('path/of/data_base', num_dimensions = 2)
```
```py
# CREATE
kv['foo'] = {'vector':[1.0, 3.4], 'payload' : {'title' : 'hero'}}
kv['star'] = {'vector': [1.0, 1.0],'payload': 'angel'}
kv[2] = {'vector': [3.0, 5.0],'payload': [1, 2, 5]}
```
```py
# READ
print(kv['foo'])
>> {'vector':[1.0, 3.4], 'payload' : {'title' : 'hero'}}
```
```py
# UPDATE
kv['foo'] = {'vector':[-1.0, -3.4], 'payload' : {'subtitle' : 'villain'}}
```
```py
# DELETE
del kv['foo']
kv.remove('star')
```
```py
# FIND
kv.find('bar')
>> False
```
```py
# COMMIT
kv.commit() # Flush changes to disk
```
```py
# CLOSE
kv.close() # Unlocks and frees the database
```
2. **Vector Operations**
KV provides the following vector operations:
**1. Nearest neighbor search:** Nearest neighbor search finds the stored vectors closest to a given query vector within a large database of vectors.
<p align="center"> <img src = 'assets/image-4.png' />
</p>
```python
# kv[query_vector][top_k]
kv[[1.0, 2.1]][2]
# OR
# kv.search(query, top_k)
kv.search(query=[1.0, 2.1], top_k=2)
# Returns results sorted by distance (nearest first)
>> [{'key': 'star',
'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
'distance': 1.2099998},
{'key': 'foo',
'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}},
'distance': 1.6900005}]
```
Slicing is also supported, which can come in handy when you want to skip the closest results.
```python
# kv[query_vector][truncate_offset : top_k]
kv[[1.0, 3.1]][1:2]
>> [{'key': 'star',
'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
'distance': 4.4099994}]
```
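The distances printed above are consistent with squared Euclidean (L2) distance: for the query `[1.0, 2.1]` and `star` at `[1.0, 1.0]`, the value is (2.1 - 1.0)² = 1.21. The brute-force sketch below reproduces that ranking in plain Python on the same three records. It is only an illustrative stand-in, not how SemanticStore works internally (the library is built around faiss); `squared_l2` and `nearest_neighbors` are hypothetical helper names introduced here.

```python
# Illustrative stand-in for the search above; NOT SemanticStore's implementation.

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(records, query, top_k):
    """Return the top_k records closest to query, nearest first."""
    scored = [
        {"key": key, "value": value, "distance": squared_l2(value["vector"], query)}
        for key, value in records.items()
    ]
    scored.sort(key=lambda hit: hit["distance"])
    return scored[:top_k]


# The same three records as in the CREATE example.
records = {
    "foo": {"vector": [1.0, 3.4], "payload": {"title": "hero"}},
    "star": {"vector": [1.0, 1.0], "payload": "angel"},
    "2": {"vector": [3.0, 5.0], "payload": [1, 2, 5]},
}

hits = nearest_neighbors(records, query=[1.0, 2.1], top_k=2)
for hit in hits:
    print(hit["key"], round(hit["distance"], 2))
# star 1.21
# foo 1.69
```

Slicing the sorted list behaves like the `[1:2]` example: `nearest_neighbors(records, [1.0, 3.1], 3)[1:2]` skips the closest hit (`foo`) and returns `star` with a squared distance of about 4.41.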
**2. Range Search:** Range search finds all items that fall within a specified range or region of a multidimensional space.
> Can be used in RAG and HyDE to limit the response of an LLM to the region between two contexts.
![Alt text](assets/image-6.png)
```python
# CASE 1 : kv[query_vector : radius]
kv[[1.0, 2.1] : 5.0]
# Results are not sorted
>> [{'key': 'foo',
'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}},
'distance': 1.6900005},
{'key': 'star',
'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
'distance': 1.2099998},
{'key': '2',
'value': {'vector': [3.0, 5.0], 'payload': [1, 2, 5]},
'distance': 12.410001}]
```
```python
# CASE 2 : kv[initial_vector : final_vector]
kv[[1.0, 2.1] : [3, 5]]
# Results are not sorted
>> [{'key': 'foo',
'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}},
'distance': 1.6900005},
{'key': 'star',
'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
'distance': 1.2099998}]
```
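The two cases above can be mimicked in plain Python. This sketch rests on assumptions inferred only from the printed outputs, not from the library's source: the reported `distance` is a squared L2 distance, the radius in CASE 1 is compared as `radius ** 2`, and CASE 2 uses the distance between the initial and final vectors as the radius with a strict `<` comparison (which would explain why key `'2'` is absent). The helper names `squared_l2` and `within` are hypothetical.

```python
# Illustrative sketch only; the comparison rules are ASSUMPTIONS inferred
# from the example outputs, not confirmed SemanticStore behaviour.

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within(records, query, squared_limit):
    """Return every record strictly closer to query than squared_limit (unsorted)."""
    hits = []
    for key, value in records.items():
        distance = squared_l2(value["vector"], query)
        if distance < squared_limit:
            hits.append({"key": key, "value": value, "distance": distance})
    return hits


records = {
    "foo": {"vector": [1.0, 3.4], "payload": {"title": "hero"}},
    "star": {"vector": [1.0, 1.0], "payload": "angel"},
    "2": {"vector": [3.0, 5.0], "payload": [1, 2, 5]},
}

# CASE 1: fixed radius of 5.0 around the query (assumed to mean 5.0 ** 2 squared).
case1 = within(records, [1.0, 2.1], 5.0 ** 2)
print([hit["key"] for hit in case1])  # ['foo', 'star', '2']

# CASE 2: radius assumed to be the distance from the initial to the final vector.
case2 = within(records, [1.0, 2.1], squared_l2([1.0, 2.1], [3, 5]))
print([hit["key"] for hit in case2])  # ['foo', 'star']
```

Under these assumptions the record at `[3.0, 5.0]` sits exactly on the boundary in CASE 2, so the strict comparison leaves it out, matching the output shown above.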
**3. Advanced data filtering:** KV supports advanced data filtering using JMESPath expressions, allowing you to filter search results based on specific criteria. Learn more about JMESPath [here](https://jmespath.org/tutorial.html).
```python
# CASE 2 : kv[initial_vector : final_vector]
kv[[1.0, 2.1] : [3, 5]]
>> [{'key': 'foo',
'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}},
'distance': 1.6900005},
{'key': 'star',
'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
'distance': 1.2099998}]
kv[[1.0, 2.1] : [3, 5]].filter('[].payload.title')
>> 'hero'
```
You can chain multiple jmespath filters for granular control.
```py
# Wrap the chain in parentheses so it can span multiple lines.
(
    kv.search(query, top_k)
    .filter('<filter1>')
    .filter('<filter2>')
    .filter('<filter3>')
    .fetch()  # Fetch returns the final search object.
)
```
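To show why a chain like the one above composes cleanly, here is a minimal sketch of a chainable result object in which each `filter` call returns a new wrapper and `fetch` hands back the remaining hits. This is not SemanticStore's code: the `SearchResult` class is hypothetical, and plain Python predicates stand in for JMESPath expressions so the example runs without extra dependencies.

```python
# Hypothetical sketch of a chainable filter API; NOT SemanticStore's implementation.
# Plain Python predicates stand in for JMESPath expressions.

class SearchResult:
    """Wraps a list of hits and supports chained filtering."""

    def __init__(self, hits):
        self._hits = list(hits)

    def filter(self, predicate):
        """Return a new SearchResult keeping only hits where predicate(hit) is true."""
        return SearchResult(hit for hit in self._hits if predicate(hit))

    def fetch(self):
        """Return the remaining hits as a plain list."""
        return list(self._hits)


hits = [
    {"key": "foo", "value": {"vector": [1.0, 3.4], "payload": {"title": "hero"}}, "distance": 1.69},
    {"key": "star", "value": {"vector": [1.0, 1.0], "payload": "angel"}, "distance": 1.21},
    {"key": "2", "value": {"vector": [3.0, 5.0], "payload": [1, 2, 5]}, "distance": 12.41},
]

result = (
    SearchResult(hits)
    .filter(lambda hit: hit["distance"] < 2.0)                      # keep close hits
    .filter(lambda hit: isinstance(hit["value"]["payload"], dict))  # keep dict payloads
    .fetch()
)
print([hit["key"] for hit in result])  # ['foo']
```

Because every `filter` returns a fresh object rather than mutating the original, intermediate results stay reusable and the order of filters is easy to reason about.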
## Contributing
Contributions are welcome! If you'd like to enhance SemanticStore or fix issues, please follow these steps:
1. Fork the repository.
2. Create a branch: `git checkout -b feature/your-feature` or `git checkout -b fix/your-fix`.
3. Commit your changes: `git commit -m 'Add some feature'` or `git commit -m 'Fix some issue'`.
4. Push to the branch: `git push origin feature/your-feature` or `git push origin fix/your-fix`.
5. Open a pull request.
Raw data
{
"_id": null,
"home_page": "",
"name": "semantic-store",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "embedded database,vector,database,semantic,search",
"author": "Pragnesh Barik",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/68/58/fe7d2cd1ffbf2431b9e6091763038671681d4c4592a09193c58852e02425/semantic-store-0.0.15.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "An embedded vector database for semantic data storage and retrieval",
"version": "0.0.15",
"project_urls": null,
"split_keywords": [
"embedded database",
"vector",
"database",
"semantic",
"search"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "22ba6a821617cec19dd83d7fe7565aa43836b1072cd2e6503999c1f8d1744a21",
"md5": "32b6eb32428686004c4181e4c7a352e4",
"sha256": "2ec2f8d590bc2695e12ac788ead849d40b6e868d4cf3e4fb910f65b3cd6bc276"
},
"downloads": -1,
"filename": "semantic_store-0.0.15-py3-none-any.whl",
"has_sig": false,
"md5_digest": "32b6eb32428686004c4181e4c7a352e4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 26685,
"upload_time": "2023-10-02T06:04:15",
"upload_time_iso_8601": "2023-10-02T06:04:15.237768Z",
"url": "https://files.pythonhosted.org/packages/22/ba/6a821617cec19dd83d7fe7565aa43836b1072cd2e6503999c1f8d1744a21/semantic_store-0.0.15-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6858fe7d2cd1ffbf2431b9e6091763038671681d4c4592a09193c58852e02425",
"md5": "d591875e82bebc8faf48fb81b8774ea0",
"sha256": "93ea32b8e773f265daf3cdc3d0cc58495c9a759958c70f53d38c6b281fc766f6"
},
"downloads": -1,
"filename": "semantic-store-0.0.15.tar.gz",
"has_sig": false,
"md5_digest": "d591875e82bebc8faf48fb81b8774ea0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 24714,
"upload_time": "2023-10-02T06:04:18",
"upload_time_iso_8601": "2023-10-02T06:04:18.189529Z",
"url": "https://files.pythonhosted.org/packages/68/58/fe7d2cd1ffbf2431b9e6091763038671681d4c4592a09193c58852e02425/semantic-store-0.0.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-02 06:04:18",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "semantic-store"
}