semantic-store

- Name: semantic-store
- Version: 0.0.15
- Summary: An embedded vector database for semantic data storage and retrieval
- Upload time: 2023-10-02 06:04:18
- Author: Pragnesh Barik
- Keywords: embedded database, vector, database, semantic, search

# SemanticStore

<p align="center">
   <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT" />
   <img src="https://img.shields.io/badge/version-v0.1.0-red" alt="Alpha Version">
    <a href="https://github.com/pragneshbarik/semantic-store">
        <img src="https://img.shields.io/github/last-commit/pragneshbarik/semantic-store.svg?style=flat&color=blue" alt="GitHub last commit"/>
    </a>
   <img src="https://img.shields.io/github/contributors/pragneshbarik/semantic-store" alt="GitHub contributors" />
    <a href="https://github.com/pragneshbarik/semantic-store/issues">
        <img src="https://img.shields.io/github/issues/pragneshbarik/semantic-store.svg?style=flat&color=success" alt="GitHub issues"/>
    </a>
        <img src="https://img.shields.io/badge/discord-join-blue?style=flat&logo=discord&logoColor=white" alt="Join Discord"/>
</p>


<p align="center">
<img src = "https://github.com/pragneshbarik/semantic-store/assets/65221256/3c47be22-28e0-4ece-80de-e8a7bfa111bf" width='60%'/>
</p>


# What is SemanticStore?
A no-nonsense key-value vector database built around faiss. It provides a Pythonic interface for insertion, retrieval, update, and deletion.


## Getting Started

Follow these steps to get started with SemanticStore:

1. **Install into environment**

```shell
pip install semantic-store
```
# Overview of KV

KV is the key-value vector store described above. It exposes insertion, retrieval, update, and deletion through a Pythonic interface, and its only additional requirements are numpy and faiss.

## Getting Started with KV

1. **CRUD Operations**

KV offers an interface similar to a Python dictionary.
```python
from semanticstore import KV

# Load the database at this path if it exists; otherwise create a new one
kv = KV('path/of/data_base', num_dimensions=2)
```
```py
# CREATE
kv['foo'] = {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}}
kv['star'] = {'vector': [1.0, 1.0], 'payload': 'angel'}
kv[2] = {'vector': [3.0, 5.0], 'payload': [1, 2, 5]}
```
```py
# READ
print(kv['foo'])
>> {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}}
```
```py
# UPDATE (assign to an existing key)
kv['foo'] = {'vector': [-1.0, -3.4], 'payload': {'subtitle': 'villain'}}
```
```py
# DELETE (either form works)
del kv['foo']
kv.remove('star')
```
```py
# FIND: check whether a key exists
kv.find('bar')
>> False
```
```py
# COMMIT
kv.commit()  # Flush changes to disk
```
```py
# CLOSE
kv.close()  # Unlock and free the database
```
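To make the dictionary analogy concrete, here is a minimal in-memory sketch of the same operations. `TinyKV` is a hypothetical class, not part of semantic-store. It keeps entries in a plain dict and stores keys as strings, because the search output further down shows the integer key `2` coming back as `'2'`. Its `commit()` simply dumps everything to a JSON file. The real `KV` is built around faiss, so treat this only as an illustration of the interface.

```python
import json
import os


class TinyKV:
    """Hypothetical in-memory stand-in for KV's dictionary-style interface.

    Not the semantic-store implementation: entries live in a plain dict and
    commit() writes them to a JSON file.
    """

    def __init__(self, path, num_dimensions):
        self.path = path
        self.num_dimensions = num_dimensions
        self._items = {}
        # If a file already exists at `path`, load it; otherwise start empty
        if os.path.exists(path):
            with open(path) as f:
                self._items = json.load(f)

    def __setitem__(self, key, value):
        # Reject vectors whose length does not match the store's dimensionality
        if len(value['vector']) != self.num_dimensions:
            raise ValueError('vector has the wrong number of dimensions')
        # Keys are stored as strings so they survive a JSON round trip
        self._items[str(key)] = value

    def __getitem__(self, key):
        return self._items[str(key)]

    def __delitem__(self, key):
        del self._items[str(key)]

    def remove(self, key):
        # Same effect as `del kv[key]`
        del self[key]

    def find(self, key):
        # True if the key is present, False otherwise
        return str(key) in self._items

    def commit(self):
        # Persist the current in-memory state to disk
        with open(self.path, 'w') as f:
            json.dump(self._items, f)
```

Changes made after `commit()` stay in memory until the next commit, so reopening the file only shows the last committed state.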

2. **Vector Operations**

KV provides the following vector operations:

**1. Nearest neighbor search:** Nearest neighbor search returns the stored vectors closest to a given query vector, ranked by distance.

<p align="center"><img src="assets/image-4.png" alt="Nearest neighbor search" />
</p>

```python
# kv[query_vector][top_k]
kv[[1.0, 2.1]][2]

# OR

# kv.search(query, top_k)
kv.search(query=[1.0, 2.1], top_k=2)

# Results are sorted by ascending distance (squared L2)
>> [{'key': 'star',
  'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
  'distance': 1.2099998},
 {'key': 'foo',
  'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}},
  'distance': 1.6900005}]
```
Search results also support slicing, for example to skip the closest matches:
```python
# kv[query_vector][truncate_offset : top_k]
kv[[1.0, 3.1]][1:2]

>> [{'key': 'star',
  'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
  'distance': 4.4099994}]
```
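The `distance` values above are squared Euclidean (L2) distances. For `'star'`, (2.1 - 1.0)² = 1.21, and for `'foo'`, (3.4 - 2.1)² = 1.69. This is also the quantity faiss's `IndexFlatL2` reports. Below is a brute-force sketch of the same search under that assumption. `knn_search` is a hypothetical helper, not the library's code. Slicing is assumed to behave like Python list slicing on the sorted results, which matches the `[1:2]` example.

```python
def squared_l2(a, b):
    """Squared Euclidean distance; matches the distances in the README output."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_search(items, query, top_k):
    """Exhaustive nearest-neighbor search over {key: {'vector', 'payload'}}.

    Returns up to top_k results, sorted by ascending squared L2 distance,
    in the same {'key', 'value', 'distance'} shape as the README output.
    """
    scored = [
        {'key': key, 'value': value, 'distance': squared_l2(query, value['vector'])}
        for key, value in items.items()
    ]
    scored.sort(key=lambda r: r['distance'])
    return scored[:top_k]
```

With the three example entries, `knn_search(items, [1.0, 2.1], 2)` ranks `'star'` before `'foo'`. Slicing the sorted list with `[1:2]` for the query `[1.0, 3.1]` skips `'foo'` (distance 0.09) and returns only `'star'`.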

**2. Range Search:** Range search returns every item that falls within a specified region of the vector space, rather than a fixed number of nearest neighbors.

> This can be used in RAG and HyDE pipelines to keep an LLM's response bounded between two contexts.


![Range search](assets/image-6.png)
```python
# CASE 1 : kv[query_vector : radius]
kv[[1.0, 2.1] : 5.0]

#  Results are not sorted
>> [{'key': 'foo',
  'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}},
  'distance': 1.6900005},
  {'key': 'star',
  'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
  'distance': 1.2099998},
  {'key': '2',
  'value': {'vector': [3.0, 5.0], 'payload': [1, 2, 5]},
  'distance': 12.410001}]
```
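Below is a brute-force sketch of radius search. The output above returns key `'2'`, with a squared distance of 12.41, for a radius of 5.0. This sketch therefore assumes the radius is compared against the plain Euclidean distance (√12.41 ≈ 3.52 < 5.0), while the reported distance stays squared. `radius_search` is hypothetical, and that comparison rule is inferred from the example rather than documented behavior.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def radius_search(items, query, radius):
    """Return every item whose Euclidean distance to `query` is at most `radius`.

    Assumption: `radius` is a plain (non-squared) distance, so it is squared
    before comparing; the reported 'distance' is squared L2, as in the README.
    """
    results = []
    for key, value in items.items():
        d2 = squared_l2(query, value['vector'])
        if d2 <= radius ** 2:  # equivalent to sqrt(d2) <= radius
            results.append({'key': key, 'value': value, 'distance': d2})
    return results  # insertion order, not sorted by distance
```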

```python
# CASE 2 : kv[initial_vector : final_vector]
kv[[1.0, 2.1] : [3, 5]]

#  Results are not sorted
>> [{'key': 'foo',
  'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}},
  'distance': 1.6900005},
  {'key': 'star',
  'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
  'distance': 1.2099998}]
```
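The README does not spell out how `kv[initial_vector : final_vector]` selects its results. One reading is consistent with the output above: use `initial_vector` as the center and the distance to `final_vector` as an exclusive radius. Under that reading, key `'2'` sits exactly at `[3, 5]` and is left out. The sketch below encodes that interpretation. `between_search` is hypothetical, and the rule is an assumption, not the library's documented behavior.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def between_search(items, initial, final):
    """Hypothetical reading of kv[initial : final].

    Keep items strictly closer to `initial` than `final` is.
    """
    limit = squared_l2(initial, final)
    results = []
    for key, value in items.items():
        d2 = squared_l2(initial, value['vector'])
        if d2 < limit:  # strict: a point exactly at `final` is excluded
            results.append({'key': key, 'value': value, 'distance': d2})
    return results
```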

**3. Advanced data filtering:** KV supports filtering search results with [JMESPath](https://jmespath.org/tutorial.html) expressions, so you can narrow results down by specific criteria.

```python
# Range search between two vectors (as in CASE 2 above)
kv[[1.0, 2.1] : [3, 5]]

>> [{'key': 'foo',
  'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}},
  'distance': 1.6900005},
  {'key': 'star',
  'value': {'vector': [1.0, 1.0], 'payload': 'angel'},
  'distance': 1.2099998}]

kv[[1.0, 2.1] : [3, 5]].filter('[].payload.title')
>> 'hero'
```

Multiple JMESPath filters can be chained for finer-grained control.

```py
# Parentheses let the chain span several lines
results = (
    kv.search(query, top_k)
    .filter('<filter1>')
    .filter('<filter2>')
    .filter('<filter3>')
    .fetch()  # fetch() returns the final search result
)
```
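The chaining pattern works because each `filter()` call returns a new result object and `fetch()` unwraps it at the end. Here is a minimal sketch of that design. `SearchResult` is a hypothetical class, not the library's. It also substitutes plain Python predicates for JMESPath expressions and keeps or drops whole items, whereas a JMESPath expression can also reshape the data.

```python
class SearchResult:
    """Hypothetical chainable search result.

    Each filter() returns a new SearchResult, so calls can be chained;
    fetch() returns the underlying list of result dicts.
    """

    def __init__(self, items):
        self._items = list(items)

    def filter(self, predicate):
        # Keep only the items for which predicate(item) is truthy
        return SearchResult(item for item in self._items if predicate(item))

    def fetch(self):
        return self._items


# Usage with results shaped like the README output
results = SearchResult([
    {'key': 'foo', 'value': {'vector': [1.0, 3.4], 'payload': {'title': 'hero'}}, 'distance': 1.69},
    {'key': 'star', 'value': {'vector': [1.0, 1.0], 'payload': 'angel'}, 'distance': 1.21},
])
titled = (
    results
    .filter(lambda r: isinstance(r['value']['payload'], dict))
    .filter(lambda r: 'title' in r['value']['payload'])
    .fetch()
)
```

Returning a new object from each step keeps earlier results unchanged, so a partially filtered result can be reused with different follow-up filters.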



## Contributing
Contributions are welcome! If you'd like to improve SemanticStore or fix an issue, please follow these steps:

1. Fork the repository.
2. Create a branch: `git checkout -b feature/your-feature` or `git checkout -b fix/your-fix`.
3. Commit your changes: `git commit -m 'Add some feature'` or `git commit -m 'Fix some issue'`.
4. Push the branch: `git push origin feature/your-feature` or `git push origin fix/your-fix`.
5. Open a pull request.

            
