VDBforGenAI


- Name: VDBforGenAI
- Version: 0.23
- Home page: https://github.com/JakubJDolezal/VDBforGenAI
- Summary: A simple package for generating and querying Vector Databases for Generative AI as well as for any other reason
- Upload time: 2023-05-03 21:50:48
- Author: Jakub Dolezal
- Requires Python: >=3.6
- Keywords: vector database, generative ai

# VDBforGenAI

VDBforGenAI is a Python package for building vector databases of text for use in natural language processing applications.

## Usage

To use VDBforGenAI, first install the package and its dependencies:

```commandline
pip install git+https://github.com/JakubJDolezal/VDBforGenAI.git
```
Next, create an instance of the `VectorDatabase` class by passing in a list of strings that represent the context you care about. Each string can contain multiple sentences.


## Minimal example
Instantiate a database, then tell it which directory to load:
```python
from VDBforGenAI.VectorDatabase import VectorDatabase

vdb = VectorDatabase(splitting_choice="length")
vdb.load_all_in_directory('./ExampleFolder')


```
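
The `splitting_choice="length"` argument suggests that loaded documents are cut into pieces of bounded size before they are embedded. The package's own splitting logic is not shown in this README, so the following is only a minimal sketch of length-based chunking under that assumption. `split_by_length`, `max_chars`, and `overlap` are hypothetical names and are not part of VDBforGenAI.

```python
def split_by_length(text, max_chars=200, overlap=0):
    """Cut text into consecutive pieces of at most max_chars characters.

    Hypothetical helper for illustration only; not VDBforGenAI's API.
    Consecutive pieces share `overlap` characters.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)]


text = "Parma ham goes well with melon and figs."
chunks = split_by_length(text, max_chars=16)
print(chunks)
```

A small overlap is a common way to avoid cutting a relevant sentence exactly at a chunk boundary, at the cost of storing some text twice.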
Once you have a `VectorDatabase` instance, use the `get_context_from_entire_database` method to retrieve the context most similar to a given input text.

```python
context = vdb.get_context_from_entire_database('What does parma ham go well with?')

print(context)
```
This retrieves the piece of text most similar to "What does parma ham go well with?" from your indexed directory.
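
Conceptually, this kind of retrieval represents the query and every stored piece of text as vectors, then returns the text whose vector lies closest to the query's. The sketch below illustrates that idea with hand-written toy vectors and cosine similarity in plain Python. It is not the package's implementation; `cosine_similarity`, `most_similar`, and the vectors themselves are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_vector, stored):
    """Return the (text, vector) pair whose vector best matches the query."""
    return max(stored, key=lambda item: cosine_similarity(query_vector, item[1]))


# Toy "embeddings"; a real encoder would compute these from the text.
stored = [
    ("Parma ham pairs well with melon.", [0.9, 0.1, 0.0]),
    ("The invoice is due on Friday.", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of the question

best_text, _ = most_similar(query_vector, stored)
print(best_text)
```
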
You can also restrict the search to a specific directory level and a directory on that level:
```python
context_selection = vdb.get_index_and_context_from_selection('Who made this?', 2, 'SubfolderOfLies')
```
The directory level and value structure is stored in `vdb.dlv`:
```python
print(vdb.dlv)
```
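
The exact layout of `vdb.dlv` is not described in this README. As a rough mental model only, a level-and-directory selection could be served by a mapping from directory level to directory name to the indices of the texts stored under that directory. The sketch below builds such a mapping from file paths; `build_level_index`, the level numbering (starting at 0), and the example paths are all hypothetical, and the real structure may differ.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_level_index(paths):
    """Map level -> directory name -> indices of the paths under that directory.

    Level 0 is the first path component. Hypothetical illustration only;
    this is not how VDBforGenAI is known to store its structure.
    """
    index = defaultdict(lambda: defaultdict(list))
    for i, path in enumerate(paths):
        directories = PurePosixPath(path).parts[:-1]  # drop the file name
        for level, name in enumerate(directories):
            index[level][name].append(i)
    return index


paths = [
    "ExampleFolder/Recipes/ham.txt",
    "ExampleFolder/Recipes/Italian/melon.txt",
    "ExampleFolder/SubfolderOfLies/claims.txt",
]
level_index = build_level_index(paths)
print(level_index[1]["Recipes"])
```

With a mapping like this, a search limited to one directory only needs to compare the query against the vectors at the listed indices.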


## Dependencies

VDBforGenAI has the following dependencies:
```
faiss-cpu
transformers
torch
numpy
PyPDF2
docx
python-docx
```


Contributions are welcome! If you have any suggestions or issues, please create an issue or pull request on the GitHub repository.

## License

VDBforGenAI is licensed under the MIT License.

# More Usage
## How to add new strings



## Passing an encoder and tokenizer from Hugging Face's Transformers library


```python
from transformers import AutoTokenizer, AutoModel
from VDBforGenAI.VectorDatabase import VectorDatabase

# Initialize the tokenizer and encoder
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoder = AutoModel.from_pretrained('bert-base-uncased')

# Initialize the VectorDatabase with the custom encoder and tokenizer
vdb = VectorDatabase(encoder=encoder, tokenizer=tokenizer)
```
Similarly, you can pass your own encoder as a torch model, provided that you also supply a tokenizer and that the model's 0th output is the encoding.

            

        