mocker-db


- Name: mocker-db
- Version: 0.2.5
- Summary: A mock handler for simulating a vector database.
- Upload time: 2024-09-21 02:44:21
- Author: Kyrylo Mordan
- License: MIT
- Keywords: aa-paa-tool

# Mocker db

MockerDB is a Python module that provides a mock, vector-database-like solution built around
the Python dictionary data type. It contains the methods needed to interact with this 'database':
embedding, searching, and persisting.

```python
from mocker_db import MockerDB, MockerConnector, SentenceTransformerEmbedder
```

### 1. Inserting values into the database

MockerDB can be used as an ephemeral database where everything is kept in memory, or it can be persisted to disk, with one file for the database and another for embeddings storage.

The embedder is set to sentence_transformer by default and runs locally. Custom embedders that connect to an API or use other open-source models can be used as long as they expose the same interface.
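
As an illustration of what a custom embedder could look like, here is a minimal sketch. The README does not spell out the required interface, so the class name `HashEmbedder`, its `dim` parameter, and the `embed` method are all assumptions made for this example; check the source of `SentenceTransformerEmbedder` for the method names MockerDB actually calls.

```python
import hashlib


class HashEmbedder:
    """Toy embedder (hypothetical interface): deterministic vectors from a hash.

    It carries no semantic meaning and is only useful for tests that need
    stable, cheap embeddings without downloading a model.
    """

    def __init__(self, dim=8):
        # SHA-256 yields 32 bytes, so the vector cannot be longer than that.
        self.dim = min(dim, 32)

    def _embed_one(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale each byte into the range [0, 1].
        return [digest[i] / 255.0 for i in range(self.dim)]

    def embed(self, text):
        # Accept either a single string or a list of strings.
        if isinstance(text, str):
            return self._embed_one(text)
        return [self._embed_one(t) for t in text]


embedder = HashEmbedder(dim=4)
print(len(embedder.embed("Sample text 1")))
```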


```python
# Initialization
handler = MockerDB(
    # optional
    embedder_params = {'model_name_or_path' : 'paraphrase-multilingual-mpnet-base-v2',
                        'processing_type' : 'batch',
                        'tbatch_size' : 500},
    use_embedder = True,
    embedder = SentenceTransformerEmbedder,
    persist = True
)
# Initialize empty database
handler.establish_connection(
    # optional for persist
    file_path = "./mock_persist",
    embs_file_path = "./mock_embs_persist",
)
```
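
To make the two-file persistence idea concrete, here is a minimal sketch that writes records to one file and embeddings to another. It is not MockerDB's implementation: the on-disk format is not described in this README, so JSON and the helper names `persist` and `load` are assumptions chosen for illustration.

```python
import json
import os
import tempfile


def persist(data, embeddings, file_path, embs_file_path):
    """Write records and embeddings to two separate files (JSON is an assumption)."""
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    with open(embs_file_path, "w", encoding="utf-8") as f:
        json.dump(embeddings, f)


def load(file_path, embs_file_path):
    """Read both files back and return (data, embeddings)."""
    with open(file_path, encoding="utf-8") as f:
        data = json.load(f)
    with open(embs_file_path, encoding="utf-8") as f:
        embeddings = json.load(f)
    return data, embeddings


with tempfile.TemporaryDirectory() as tmp:
    data = {"id-1": {"text": "Sample text 1"}}
    embeddings = {"Sample text 1": [0.1, 0.2, 0.3]}
    paths = (os.path.join(tmp, "mock_persist"), os.path.join(tmp, "mock_embs_persist"))
    persist(data, embeddings, *paths)
    restored = load(*paths)
    print(restored == (data, embeddings))
```

Keeping embeddings in their own file means they can be reused as a cache even when the record file is rebuilt.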


```python
# Insert Data
values_list = [
    {"text": "Sample text 1",
     "text2": "Sample text 1"},
    {"text": "Sample text 2",
     "text2": "Sample text 2"}
]
handler.insert_values(values_list, "text")
print(f"Items in the database {len(handler.data)}")
```

    Items in the database 2
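
Conceptually, inserting into a dictionary-backed store could look like the sketch below. This is an illustration of the idea, not MockerDB's code: the key scheme, the stored layout, the `embed_field` parameter name, and the `toy_embed` stand-in are all assumptions.

```python
import uuid


def insert_values(store, values_list, embed_field, embed):
    """Add each record to `store` under a fresh key, embedding one chosen field."""
    for record in values_list:
        key = uuid.uuid4().hex  # hypothetical key scheme
        store[key] = {
            "data": dict(record),
            "embedded_field": embed_field,
            "embedding": embed(record[embed_field]),
        }
    return store


def toy_embed(text):
    # Stand-in for a real model: character count and word count.
    return [float(len(text)), float(len(text.split()))]


values_list = [
    {"text": "Sample text 1", "text2": "Sample text 1"},
    {"text": "Sample text 2", "text2": "Sample text 2"},
]
store = insert_values({}, values_list, "text", toy_embed)
print(f"Items in the store {len(store)}")
```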


### 2. Searching and retrieving values from the database

Several search options are available and can be used together or separately:

- simple filter
- filter with keywords
- LLM filter
- search based on similarity

- get all keys


```python
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1",
    }
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
```

    [{'text': 'Sample text 1...', 'text2': 'Sample text 1...'}]
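
The simple filter can be thought of as exact matching on the given fields. The sketch below shows that idea over a plain list of dictionaries; it assumes every criterion must match exactly, which is consistent with the output above but is not taken from MockerDB's source. The helper names are hypothetical.

```python
def matches(record, filter_criteria):
    """True when every key in filter_criteria has exactly that value in record."""
    return all(record.get(key) == value for key, value in filter_criteria.items())


def simple_filter(records, filter_criteria):
    """Return the records that satisfy all criteria, in their original order."""
    return [record for record in records if matches(record, filter_criteria)]


records = [
    {"text": "Sample text 1", "text2": "Sample text 1"},
    {"text": "Sample text 2", "text2": "Sample text 2"},
]
print(simple_filter(records, {"text": "Sample text 1"}))
```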


- get all keys with keywords search


```python
results = handler.search_database(
    query = "text",
    # when keyword_check_keys is provided, filter_criteria is used to pass keywords
    filter_criteria = {
        "text" : ["1"],
    },
    keyword_check_keys = ['text'],
    # percentage of filter keyword allowed to be different
    keyword_check_cutoff = 1,
    return_keys_list=['text']
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
```

    [{'text': 'Sample text 1...'}]


- get all keys except text2


```python
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1",
    },
    return_keys_list=["-text2"])
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
```

    [{'text': 'Sample text 1...'}]


- get all keys + distance


```python
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["+&distance"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
```

    [{'text': 'Sample text 1...', 'text2': 'Sample text 1...', '&distance': '0.6744726...'}]


- get distance


```python
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["&distance"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
```

    [{'&distance': '0.6744726...'}]
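
The README does not say which metric produces `&distance`. Purely as an illustration of how a distance between a query embedding and a stored embedding can be computed, the sketch below uses cosine distance; that choice and the function name are assumptions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
stored = {"same direction": [2.0, 0.0], "orthogonal": [0.0, 3.0]}
# Rank stored vectors from closest to farthest.
ranked = sorted(stored, key=lambda name: cosine_distance(query, stored[name]))
print(ranked)
```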


- get all keys + embeddings


```python
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["+embedding"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
```

    [{'text': 'Sample text 1...', 'text2': 'Sample text 1...', 'embedding': '[-4.94665056e-02 -2.38676026e-...'}]


- get embeddings


```python
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["embedding"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

```

    [{'embedding': '[-4.94665056e-02 -2.38676026e-...'}]


- get embeddings and embedded field


```python
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["embedding", "+&embedded_field"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

```

    [{'embedding': '[-4.94665056e-02 -2.38676026e-...', '&embedded_field': 'text...'}]
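
The `return_keys_list` conventions shown in the examples above can be summarised in a small sketch. The rules below are inferred from the printed outputs only (plain names select exactly those keys, a `+` prefix adds a key, a `-` prefix drops one, and without plain names the selection starts from all stored keys); they are not taken from MockerDB's source, and `select_keys` is a hypothetical helper.

```python
def select_keys(record, extras, return_keys_list):
    """Pick output keys for one search hit.

    record: the stored fields; extras: computed fields such as '&distance'.
    """
    available = {**record, **extras}
    plain = [k for k in return_keys_list if not k.startswith(("+", "-"))]
    added = [k[1:] for k in return_keys_list if k.startswith("+")]
    removed = {k[1:] for k in return_keys_list if k.startswith("-")}

    # With explicit plain names start from exactly those,
    # otherwise start from every stored field.
    base = plain if plain else list(record)
    wanted = [k for k in base + added if k not in removed]
    return {k: available[k] for k in wanted if k in available}


record = {"text": "Sample text 1", "text2": "Sample text 1"}
extras = {"&distance": 0.67, "embedding": [0.1, 0.2], "&embedded_field": "text"}
print(select_keys(record, extras, ["-text2"]))
print(select_keys(record, extras, ["embedding", "+&embedded_field"]))
```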


### 3. Removing values from the database


```python
print(f"Items in the database {len(handler.data)}")
handler.remove_from_database(filter_criteria = {"text": "Sample text 1"})
print(f"Items left in the database {len(handler.data)}")

```

    Items in the database 2
    Items left in the database 1
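
Removal by filter is the mirror image of a filtered search: every record that matches the criteria is dropped. A minimal sketch over a plain list of dictionaries, assuming exact matching on all given fields (the helper name is hypothetical and this is not MockerDB's code):

```python
def remove_matching(records, filter_criteria):
    """Return the records that do NOT match every criterion."""
    def is_match(record):
        return all(record.get(k) == v for k, v in filter_criteria.items())

    return [record for record in records if not is_match(record)]


records = [
    {"text": "Sample text 1", "text2": "Sample text 1"},
    {"text": "Sample text 2", "text2": "Sample text 2"},
]
remaining = remove_matching(records, {"text": "Sample text 1"})
print(f"Items left {len(remaining)}")
```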


### 4. Embedding text


```python
results = handler.embed_texts(
    texts = [
    "Short. Variation 1: Short.",
    "Another medium-length example, aiming to test the variability in processing different lengths of text inputs. Variation 2: processing lengths medium-length example, in inputs. to variability aiming test of text different the Another"
  ]
)

print(str(results)[0:300] + "...")
```

    {'embeddings': [[0.04973424971103668, -0.43570247292518616, -0.014545125886797905, -0.03648979589343071, -0.04165348783135414, -0.04544278606772423, -0.07025150209665298, 0.10043243318796158, -0.20846229791641235, 0.15596869587898254, 0.11489829421043396, -0.13442179560661316, -0.02425091527402401, ...


### 5. Using MockerDB API

A remote MockerDB instance can be used through almost the same methods as the local one.


```python
# Initialization
handler = MockerDB(
    skip_post_init=True
)
# Connect to the remote database
handler.establish_connection(
     # optional for connecting to api
    connection_details = {
        'base_url' : "http://localhost:8000/mocker-db"
    }
)
```


```python
# Insert Data
values_list = [
    {"text": "Sample text 1",
     "text2": "Sample text 1"},
    {"text": "Sample text 2",
     "text2": "Sample text 2"}
]
handler.insert_values(values_list, "text")
```

    HTTP Request: POST http://localhost:8000/mocker-db/insert "HTTP/1.1 200 OK"





    {'status': 'success', 'message': ''}
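
The request logs in this section show which HTTP method and path each call uses. The sketch below only collects those pairs and builds the full URL from `base_url`; the request payloads are not shown in this README, so none are assumed here. The `ENDPOINTS` table and `endpoint` helper are hypothetical names.

```python
# Method and path per handler call, as they appear in the request logs.
ENDPOINTS = {
    "insert_values": ("POST", "insert"),
    "search_database": ("POST", "search"),
    "embed_texts": ("POST", "embed"),
    "show_handlers": ("GET", "active_handlers"),
}


def endpoint(base_url, call_name):
    """Return (http_method, full_url) for one handler call."""
    method, path = ENDPOINTS[call_name]
    return method, base_url.rstrip("/") + "/" + path


print(endpoint("http://localhost:8000/mocker-db", "insert_values"))
```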



MockerAPI can keep multiple handlers in memory at a time. They can be listed together with their item counts and memory estimates.


```python
handler.show_handlers()
```

    HTTP Request: GET http://localhost:8000/mocker-db/active_handlers "HTTP/1.1 200 OK"





    {'results': [{'handler': 'default',
       'items': 4,
       'memory_usage': 1.3744659423828125}],
     'status': 'success',
     'message': '',
     'handlers': ['default'],
     'items': [4],
     'memory_usage': [1.3744659423828125]}




```python
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1",
    }
)

results
```

    HTTP Request: POST http://localhost:8000/mocker-db/search "HTTP/1.1 200 OK"





    {'status': 'success',
     'message': '',
     'handler': 'default',
     'results': [{'other_field': 'Additional data', 'text': 'Example text 1'},
      {'other_field': 'Additional data', 'text': 'Example text 2'},
      {'text': 'Sample text 1', 'text2': 'Sample text 1'},
      {'text': 'Sample text 2', 'text2': 'Sample text 2'}]}
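
In the outputs shown in this README, a local search is iterated as a list of records, while the remote search returns a dictionary with the records under `'results'`. Code that has to work with both could normalise the return value first. The helper below is a hypothetical convenience, not part of mocker_db.

```python
def result_rows(response):
    """Return the list of records from either a local or a remote search result."""
    if isinstance(response, dict):
        # Remote shape: {'status': ..., 'message': ..., 'handler': ..., 'results': [...]}
        return response.get("results", [])
    # Local shape: already an iterable of records.
    return list(response)


remote = {
    "status": "success",
    "message": "",
    "handler": "default",
    "results": [{"text": "Sample text 1", "text2": "Sample text 1"}],
}
local = [{"text": "Sample text 1", "text2": "Sample text 1"}]
print(result_rows(remote) == result_rows(local))
```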




```python
results = handler.embed_texts(
    texts = [
    "Short. Variation 1: Short.",
    "Another medium-length example, aiming to test the variability in processing different lengths of text inputs. Variation 2: processing lengths medium-length example, in inputs. to variability aiming test of text different the Another"
  ],
    # optional
    embedding_model = "intfloat/multilingual-e5-base"
)

print(str(results)[0:500] + "...")
```

    HTTP Request: POST http://localhost:8000/mocker-db/embed "HTTP/1.1 200 OK"


    {'status': 'success', 'message': '', 'handler': 'cache_mocker_intfloat_multilingual-e5-base', 'embedding_model': 'intfloat/multilingual-e5-base', 'embeddings': [[-0.021023565903306007, 0.03461984172463417, -0.01310338918119669, 0.03071131743490696, 0.023395607247948647, -0.04054545238614082, -0.015805143862962723, -0.02682858146727085, 0.01583343744277954, 0.01763748936355114, 0.0008703064522705972, -0.011133715510368347, 0.11296682059764862, 0.015158131718635559, -0.0466904453933239, -0.0481428...
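
In the response above, the handler name looks like the model name with a `cache_mocker_` prefix and the `/` replaced by `_`. The sketch below reproduces that one observed example; the rule is inferred from a single response and may not hold for other model names.

```python
def cache_handler_name(embedding_model):
    """Guess the cache handler name for a model (rule inferred from one example)."""
    return "cache_mocker_" + embedding_model.replace("/", "_")


print(cache_handler_name("intfloat/multilingual-e5-base"))
```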



```python
handler.show_handlers()
```

    HTTP Request: GET http://localhost:8000/mocker-db/active_handlers "HTTP/1.1 200 OK"





    {'results': [{'handler': 'default',
       'items': 4,
       'memory_usage': 1.3749237060546875},
      {'handler': 'cache_mocker_intfloat_multilingual-e5-base',
       'items': 2,
       'memory_usage': 1.3611679077148438}],
     'status': 'success',
     'message': '',
     'handlers': ['default', 'cache_mocker_intfloat_multilingual-e5-base'],
     'items': [4, 2],
     'memory_usage': [1.3749237060546875, 1.3611679077148438]}
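
The `show_handlers` response carries the same information twice: once as a list of dictionaries under `'results'` and once as the parallel lists `'handlers'`, `'items'`, and `'memory_usage'`. A short sketch of turning the parallel lists into rows, using the response above as literal input (the unit of `memory_usage` is not stated in this README, so none is printed):

```python
response = {
    "results": [
        {"handler": "default", "items": 4, "memory_usage": 1.3749237060546875},
        {"handler": "cache_mocker_intfloat_multilingual-e5-base",
         "items": 2, "memory_usage": 1.3611679077148438},
    ],
    "status": "success",
    "message": "",
    "handlers": ["default", "cache_mocker_intfloat_multilingual-e5-base"],
    "items": [4, 2],
    "memory_usage": [1.3749237060546875, 1.3611679077148438],
}

# Zip the parallel lists into one row per handler.
rows = list(zip(response["handlers"], response["items"], response["memory_usage"]))
for name, items, memory in rows:
    print(f"{name}: {items} items, memory estimate {memory:.3f}")
```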



            
