mdbrtools

Name	mdbrtools
Version	0.1.1
home_page	http://github.com/mongodb-labs/mdbrtools
Summary	Collection of tools for schema parsing and workload generation used by MongoDB Research
upload_time	2025-01-29 02:28:22
maintainer	None
docs_url	None
author	Thomas Rueckstiess, Alana Huang, Robin Vujanic
requires_python	None
license	MIT
keywords	mongodb research tools schema queries workload
requirements	pymongo tqdm pandas

# mdbrtools

This package contains experimental tools for schema analysis and query workload generation used by MongoDB Research (MDBR).

## Disclaimer

This tool is not officially supported or endorsed by MongoDB Inc. The code is released for use "AS IS" without warranties of any kind, including, but not limited to, its installation, use, or performance. Do not run this tool in critical production systems.

## Installation

#### Installation with pip

This tool requires Python 3.x and pip on your system. To install `mdbrtools`, run the following command:

```bash
pip install mdbrtools
```

#### Installation from source

Clone the repository from GitHub. From the top-level directory, run:

```bash
pip install -e .
```

This installs an _editable_ development version of `mdbrtools` in your current Python environment.
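
One way to confirm that either installation method worked is to ask Python for the installed version. The sketch below uses only the standard library; `installed_version` is a hypothetical helper for illustration, not part of `mdbrtools`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if mdbrtools is installed in this environment, otherwise None.
print(installed_version("mdbrtools"))
```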

## Usage

See the `./notebooks` directory for more detailed examples of schema parsing and workload generation.

### Schema Parsing

Schema parsing operates on a list of Python dictionaries.

```python
from mdbrtools.schema import parse_schema
from pprint import pprint

docs = [
    {"_id": 1, "mixed_field": "world", "missing_field": False},
    {"_id": 2, "mixed_field": 123},
    {"_id": 3, "mixed_field": False, "missing_field": True},
]

schema = parse_schema(docs)
pprint(dict(schema))
```

Converting the schema object to a dictionary yields a summary of each field: the types observed for it and a counter for each type.

```
{'_id': [{'counter': 3, 'type': 'int'}],
 'missing_field': [{'counter': 2, 'type': 'bool'}],
 'mixed_field': [{'counter': 1, 'type': 'str'},
                 {'counter': 1, 'type': 'int'},
                 {'counter': 1, 'type': 'bool'}]}
```

For access to types, values and uniqueness information, see the examples in [`./notebooks/schema_parsing.ipynb`](./notebooks/schema_parsing.ipynb).
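
The dictionary output above can be inspected with plain Python. The following is a minimal sketch that flags fields with more than one type and fields that are absent from some documents. It works on a copy of the printed output rather than on a live schema object, and it assumes that each `counter` is the number of documents in which the field had that type (which matches the sample output, but may not hold for nested or array fields). The `summarize` helper is hypothetical and not part of `mdbrtools`.

```python
# Dictionary form of the schema, copied from the output above.
schema_dict = {
    "_id": [{"counter": 3, "type": "int"}],
    "missing_field": [{"counter": 2, "type": "bool"}],
    "mixed_field": [
        {"counter": 1, "type": "str"},
        {"counter": 1, "type": "int"},
        {"counter": 1, "type": "bool"},
    ],
}
num_docs = 3  # number of documents that were parsed


def summarize(schema_dict, num_docs):
    """Return (fields with several types, fields missing from some documents)."""
    mixed, missing = [], []
    for field, type_entries in schema_dict.items():
        # More than one entry means the field was seen with several types.
        if len(type_entries) > 1:
            mixed.append(field)
        # Assumption: counters sum to the number of documents containing the field.
        if sum(entry["counter"] for entry in type_entries) < num_docs:
            missing.append(field)
    return mixed, missing


mixed, missing = summarize(schema_dict, num_docs)
print("mixed types:", mixed)
print("sometimes missing:", missing)
```

For the sample documents this reports `mixed_field` as having mixed types and `missing_field` as absent from at least one document.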

### Workload Generation

Workload generation takes either a list of Python dictionaries or a `MongoCollection` object as input.

```python
from mdbrtools.workload import Workload

docs = [
    {"_id": 1, "mixed_field": "world", "missing_field": False},
    {"_id": 2, "mixed_field": 123},
    {"_id": 3, "mixed_field": False, "missing_field": True},
]

workload = Workload()
workload.generate(docs, num_queries=5)

for query in workload:
    print(query.to_mql())
```

The generated MQL queries look like the following. Operators are chosen randomly, so your output may differ:

```python
{'missing_field': True}
{'missing_field': {'$exists': False}, '_id': {'$gte': 3}}
{'_id': {'$gt': 3}, 'mixed_field': False, 'missing_field': {'$exists': False}}
{'mixed_field': {'$gte': 'world'}, '_id': 3, 'missing_field': {'$ne': False}}
{'mixed_field': 'world'}
```

The workload generator supports a number of different constraints on the queries:

- min. and max. number of predicates per query
- allowing only certain fields
- which query operators are allowed for which data types
- control over the weights by which operators are randomly chosen
- min. and max. query selectivity constraints

See the notebook under [`./notebooks/workload_generation.ipynb`](./notebooks/workload_generation.ipynb) for examples.
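
To make the selectivity constraint concrete, the sketch below computes the selectivity of the sample queries against the sample documents, assuming that selectivity means the fraction of documents a query matches. It is not how `mdbrtools` evaluates queries: `matches` and `selectivity` are hypothetical helpers that cover only equality, `$exists`, `$ne`, and the four range operators, and they treat values of different Python types as never comparable (a simplification of MongoDB's type rules, which for example do compare integers with floats).

```python
import operator

# Range operators handled by this sketch (a small subset of MQL).
COMPARE = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def same_type(a, b):
    # Simplification: only values of the exact same Python type are comparable.
    return type(a) is type(b)


def matches(doc, query):
    """Return True if the document satisfies every predicate in the query."""
    for field, condition in query.items():
        present = field in doc
        value = doc.get(field)
        if not isinstance(condition, dict):
            # A plain value is an equality predicate.
            if not (present and same_type(value, condition) and value == condition):
                return False
            continue
        for op, operand in condition.items():
            if op == "$exists":
                ok = present == operand
            elif op == "$ne":
                # $ne also matches documents where the field is absent.
                ok = not (present and same_type(value, operand) and value == operand)
            elif op in COMPARE:
                ok = present and same_type(value, operand) and COMPARE[op](value, operand)
            else:
                raise ValueError(f"operator {op} is not covered by this sketch")
            if not ok:
                return False
    return True


def selectivity(docs, query):
    """Fraction of documents matched by the query."""
    return sum(matches(doc, query) for doc in docs) / len(docs)


docs = [
    {"_id": 1, "mixed_field": "world", "missing_field": False},
    {"_id": 2, "mixed_field": 123},
    {"_id": 3, "mixed_field": False, "missing_field": True},
]

queries = [
    {"missing_field": True},
    {"missing_field": {"$exists": False}, "_id": {"$gte": 3}},
    {"_id": {"$gt": 3}, "mixed_field": False, "missing_field": {"$exists": False}},
    {"mixed_field": {"$gte": "world"}, "_id": 3, "missing_field": {"$ne": False}},
    {"mixed_field": "world"},
]

for query in queries:
    print(f"{selectivity(docs, query):.2f}  {query}")
```

Under these assumptions, the first and last queries each match one of the three documents, and the other three match none. A minimum selectivity constraint would rule out such empty-result queries.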

## Tests

To execute the unit tests, run from the top-level directory:

```bash
python -m unittest discover ./tests
```

## License

MIT, see [LICENSE](./LICENSE).

Raw data

{
    "_id": null,
    "home_page": "http://github.com/mongodb-labs/mdbrtools",
    "name": "mdbrtools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "mongodb research tools schema queries workload",
    "author": "Thomas Rueckstiess, Alana Huang, Robin Vujanic",
    "author_email": "thomas+mdbrtools@mongodb.com",
    "download_url": "https://files.pythonhosted.org/packages/5b/1c/b871fa8894102b476a34e928ec1a83210d5b8a9d5fd175a1a5b35839709a/mdbrtools-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Collection of tools for schema parsing and workload generation used by MongoDB Research",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "http://github.com/mongodb-labs/mdbrtools"
    },
    "split_keywords": [
        "mongodb",
        "research",
        "tools",
        "schema",
        "queries",
        "workload"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1172050551524d314fc3f9a6f5e632f5c5bd0448f816727e6edd09ad04a4ce2c",
                "md5": "4effbf22f029ef4a54f3889a33d634d4",
                "sha256": "e7de812155e13f7f8a07516cc5d9a74af4d35fffbbb430e9e4f161a51a2160a0"
            },
            "downloads": -1,
            "filename": "mdbrtools-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4effbf22f029ef4a54f3889a33d634d4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 25293,
            "upload_time": "2025-01-29T02:28:20",
            "upload_time_iso_8601": "2025-01-29T02:28:20.830136Z",
            "url": "https://files.pythonhosted.org/packages/11/72/050551524d314fc3f9a6f5e632f5c5bd0448f816727e6edd09ad04a4ce2c/mdbrtools-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5b1cb871fa8894102b476a34e928ec1a83210d5b8a9d5fd175a1a5b35839709a",
                "md5": "c05eb5529d1ac3f961a1e63d47858a3c",
                "sha256": "e22eaa86b5b595cf6ce83321417561fca6412e4c7b40d17137d9066d87f2cdd8"
            },
            "downloads": -1,
            "filename": "mdbrtools-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "c05eb5529d1ac3f961a1e63d47858a3c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 23697,
            "upload_time": "2025-01-29T02:28:22",
            "upload_time_iso_8601": "2025-01-29T02:28:22.958809Z",
            "url": "https://files.pythonhosted.org/packages/5b/1c/b871fa8894102b476a34e928ec1a83210d5b8a9d5fd175a1a5b35839709a/mdbrtools-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-29 02:28:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mongodb-labs",
    "github_project": "mdbrtools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pymongo",
            "specs": [
                [
                    "==",
                    "4.7.2"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.4"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        }
    ],
    "lcname": "mdbrtools"
}
