pyrsona


- Name: pyrsona
- Version: 1.0
- Home page: https://github.com/johnbullnz/pyrsona
- Upload time: 2024-08-07 04:01:29
- Author: John
- Requires Python: <4.0,>=3.9
- License: MIT

# pyrsona

<img align="left" src="https://github.com/johnbullnz/pyrsona/actions/workflows/python.yml/badge.svg"><br>

Text data file validation and structure management using the [pydantic](https://pydantic-docs.helpmanual.io/) and [parse](https://github.com/r1chardj0n3s/parse) Python packages.


## Installation

Install using `pip install pyrsona`.


## A Simple Example

For the text file `example.txt`:

```
operator name: Jane Smith
country: NZ
year: 2022

ID,Time,Duration (sec),Reading
1,20:04:05,12.2,2098
2,20:05:00,2.35,4328
```

The following *pyrsona* file structure model can be defined:

```python
from pyrsona import BaseStructure
from pydantic import BaseModel
from datetime import time


class ExampleStructure(BaseStructure):

    structure = (
        "operator name: {operator_name}\n"
        "country: {country}\n"
        "year: {}\n"
        "\n"
        "ID,Time,Duration (sec),Reading\n"
    )

    class meta_model(BaseModel):
        operator_name: str
        country: str

    class row_model(BaseModel):
        id: int
        time: time
        duration_sec: float
        value: float
```

The `read()` method can then be used to read the file, parse its contents and validate the meta data and table rows:

```python
meta, table_rows, structure_id = ExampleStructure.read("example.txt")

print(meta)
#> {'operator_name': 'Jane Smith', 'country': 'NZ'}

print(table_rows)
#> [{'id': 1, 'time': datetime.time(20, 4, 5), 'duration_sec': 12.2, 'value': 2098.0},
# {'id': 2, 'time': datetime.time(20, 5), 'duration_sec': 2.35, 'value': 4328.0}]

print(structure_id)
#> ExampleStructure
```

**What's going on here:**

- The `structure` class attribute defines the basic file structure: the meta data lines and the table header line. Any variable text of interest is replaced with curly brackets and a field name, e.g. `'{operator_name}'`, while any variable text that should be ignored is replaced with empty curly brackets, e.g. `'{}'`. The `structure` definition must reproduce every space, tab and newline character exactly for a file to match it. The named fields in the `structure` definition are passed to `meta_model`.

- `meta_model` is simply a [pydantic model](https://pydantic-docs.helpmanual.io/usage/models/) with field names that match the named fields in the `structure` definition. All values sent to `meta_model` will be strings and these will be converted to the field types defined in `meta_model`. Custom [pydantic validators](https://pydantic-docs.helpmanual.io/usage/validators/) can be included in the `meta_model` definition as per standard pydantic models.

- `row_model` is also a [pydantic model](https://pydantic-docs.helpmanual.io/usage/models/). This time the field names do not need to match the header line in the `structure` definition; however, the `row_model` fields do need to be provided in the **same order as the table columns**. This allows the table column names to be customised/standardised where the user does not control the file structure itself. Again, custom [pydantic validators](https://pydantic-docs.helpmanual.io/usage/validators/) can be included in the `row_model` definition if required.
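
The matching and positional mapping described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not *pyrsona*'s implementation: `re` stands in for the *parse* package, the helper name `template_to_regex` is hypothetical, and the values are left as strings because the conversion step belongs to the pydantic models.

```python
import re

STRUCTURE = (
    "operator name: {operator_name}\n"
    "country: {country}\n"
    "year: {}\n"
    "\n"
    "ID,Time,Duration (sec),Reading\n"
)

TEXT = (
    "operator name: Jane Smith\n"
    "country: NZ\n"
    "year: 2022\n"
    "\n"
    "ID,Time,Duration (sec),Reading\n"
    "1,20:04:05,12.2,2098\n"
    "2,20:05:00,2.35,4328\n"
)

# Field names in the same order as the table columns (not the header text).
ROW_FIELDS = ["id", "time", "duration_sec", "value"]


def template_to_regex(template):
    """Hypothetical helper: '{name}' becomes a named group, '{}' a wildcard."""
    pattern = ""
    # The capture group makes re.split keep the placeholders in the result.
    for part in re.split(r"(\{\w*\})", template):
        if part == "{}":
            pattern += ".+?"
        elif re.fullmatch(r"\{\w+\}", part):
            pattern += f"(?P<{part[1:-1]}>.+?)"
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


match = template_to_regex(STRUCTURE).match(TEXT)

# Named fields only; the '{}' wildcard (the year) is discarded.
meta = match.groupdict()

# Everything after the matched header is table data, mapped by position.
rows = [
    dict(zip(ROW_FIELDS, line.split(",")))
    for line in TEXT[match.end():].splitlines()
]

print(meta)
print(rows[0])
```

Because the mapping is positional, renaming `Reading` to `value` needs no change to the file itself, only to the order-preserving list of field names.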


## Another Example

Should the file structure change at some point in the future, a new model can be created based on the original model. The new model is referred to as a *sub-model*, and the original model is its *parent* model.

Given the slightly modified file structure of `new_example.txt`:

```
operator name: Jane Smith
country: NZ
city: Auckland
year: 2022

ID,Time,Duration (sec),Reading
1,20:04:05,12.2,2098
2,20:05:00,2.35,4328
```

Attempting to parse this file using the original `ExampleStructure` model will raise a `PyrsonaError` because of the added `'city: Auckland'` line. To parse the file successfully and capture the new `'city'` field, the following *sub-model* should be defined.

```python
from pyrsona import BaseStructure
from pydantic import BaseModel
from datetime import time


class NewExampleStructure(ExampleStructure):

    structure = (
        "operator name: {operator_name}\n"
        "country: {country}\n"
        "city: {city}\n"
        "year: {}\n"
        "\n"
        "ID,Time,Duration (sec),Reading\n"
    )

    class meta_model(BaseModel):
        operator_name: str
        country: str
        city: str
```

`ExampleStructure` is still used as the entry point; however, *pyrsona* will attempt to parse the file using any *sub-models* that exist (in this case `NewExampleStructure`) before using `ExampleStructure` itself.

```python
meta, table_rows, structure_id = ExampleStructure.read("new_example.txt")

print(meta)
#> {'operator_name': 'Jane Smith', 'country': 'NZ', 'city': 'Auckland'}

print(table_rows)
#> [{'id': 1, 'time': datetime.time(20, 4, 5), 'duration_sec': 12.2, 'value': 2098.0},
# {'id': 2, 'time': datetime.time(20, 5), 'duration_sec': 2.35, 'value': 4328.0}]

print(structure_id)
#> NewExampleStructure
```

**What's going on here:**

- A new *pyrsona* file structure model is defined as a subclass of the original `ExampleStructure` model, so `structure`, `meta_model` and `row_model` are inherited from `ExampleStructure` unless they are redefined. Subclassing also provides a single entry point (i.e. `ExampleStructure.read()`) for reading the different file versions.

- `structure` and `meta_model` are redefined to include the new `"city: Auckland"` meta data line. Alternatively, the original `meta_model` in `ExampleStructure` could have been updated to include an *optional* `city` field.
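
The single-entry-point behaviour can be pictured as trying each candidate structure in turn and returning the first one that matches. The sketch below is a standard-library illustration only: the regular expressions stand in for the two `structure` definitions, `read_meta` is a hypothetical helper, and `ValueError` is used in place of `PyrsonaError`.

```python
import re

# (structure_id, pattern) pairs: sub-model first, parent model last.
CANDIDATES = [
    (
        "NewExampleStructure",
        re.compile(
            r"operator name: (?P<operator_name>[^\n]+)\n"
            r"country: (?P<country>[^\n]+)\n"
            r"city: (?P<city>[^\n]+)\n"
            r"year: [^\n]+\n"
        ),
    ),
    (
        "ExampleStructure",
        re.compile(
            r"operator name: (?P<operator_name>[^\n]+)\n"
            r"country: (?P<country>[^\n]+)\n"
            r"year: [^\n]+\n"
        ),
    ),
]


def read_meta(text):
    """Return (meta, structure_id) from the first candidate that matches."""
    for structure_id, pattern in CANDIDATES:
        match = pattern.match(text)
        if match:
            return match.groupdict(), structure_id
    # Stand-in for the error raised when no model can parse the file.
    raise ValueError("no structure matched the text")


old_text = "operator name: Jane Smith\ncountry: NZ\nyear: 2022\n"
new_text = "operator name: Jane Smith\ncountry: NZ\ncity: Auckland\nyear: 2022\n"

print(read_meta(old_text))
print(read_meta(new_text))
```

The older file falls through to the parent structure, while the newer file is claimed by the sub-model, so callers never need to know which version they are reading.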


## Post-processors

It is sometimes necessary to modify some of the data following parsing by the `meta_model` and `row_model`. Two post-processing methods are available for this purpose.

Using the `ExampleStructure` class above, `meta_postprocessor` and `table_postprocessor` static methods are defined for post-processing the meta data and table_rows, respectively:

```python
class ExampleStructure(BaseStructure):

    # Lines omitted for brevity

    @staticmethod
    def meta_postprocessor(meta):
        meta["version"] = 3
        return meta

    @staticmethod
    def table_postprocessor(table_rows, meta):
        # Add a cumulative total and delete the "id" field.
        # Each row is a dict, so it can be modified in place.
        total = 0
        for row in table_rows:
            total += row["value"]
            row["total"] = total
            del row["id"]
        return table_rows
```

The meta data and table_rows are now run through the post-processing stages before being returned, resulting in the following changes:

 - A new *version* field is added to the meta data.
 - The *id* field is deleted from the table_rows and a cumulative total field is added.

```python
meta, table_rows, structure_id = ExampleStructure.read("example.txt")

print(meta)
#> {'operator_name': 'Jane Smith', 'country': 'NZ', 'version': 3}

print(table_rows)
#> [{'time': datetime.time(20, 4, 5), 'duration_sec': 12.2, 'value': 2098.0,
# 'total': 2098.0}, {'time': datetime.time(20, 5), 'duration_sec': 2.35, 'value': 4328.0,
# 'total': 6426.0}]

print(structure_id)
#> ExampleStructure
```

### Array data in field

Sometimes the table rows contain array data that is not easily converted to a pydantic model. In this case, the `row_model` can be omitted and the `table_postprocessor` method can be used to convert the table rows into a more suitable format.

```python
class ExampleStructure(BaseStructure):

    structure = (
        "operator name: {operator_name}\n"
        "country: {country}\n"
        "year: {}\n"
        "\n"
        "ID,Time,Duration (sec),Reading\n"
    )

    class meta_model(BaseModel):
        operator_name: str
        country: str

    @staticmethod
    def table_postprocessor(table_rows, meta):

        class row_model(BaseModel):
            id: int
            array_data: list[str]

        ids = [row[0] for row in table_rows]
        array_data = [row[1:] for row in table_rows]

        table_rows = [
            row_model(id=row_id, array_data=row_array_data).dict()
            for row_id, row_array_data in zip(ids, array_data)
        ]

        return table_rows
```

With no `row_model` defined, each table row is a list of strings. The `table_postprocessor` method can then be used to convert the data into a more suitable format using custom logic.

```python
print(table_rows)
#> [{'id': 1, 'array_data': ['20:04:05', '12.2', '2098']},
# {'id': 2, 'array_data': ['20:05:00', '2.35', '4328']}]
```


## Extra details


### All meta lines MUST be included

While the *parse* package allows a wildcard `'{}'` to match across several lines, this can cause a named field to be unexpectedly swallowed by the wildcard section. *pyrsona* therefore checks the named field values for a newline character `'\n'` and fails if one is found, so every meta data line must appear in the `structure` definition.
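
The reason for the newline check can be shown with a small standard-library sketch. It is an approximation: a lazy regular expression with `re.DOTALL` stands in for *parse*'s matching, and `find_multiline_fields` is a hypothetical helper, not part of *pyrsona*.

```python
import re

# A structure that forgets the "country" line present in the file.
pattern = re.compile(
    r"operator name: (?P<operator_name>.+?)\nyear: .+?\n",
    flags=re.DOTALL,  # lets a field stretch across line breaks
)

text = "operator name: Jane Smith\ncountry: NZ\nyear: 2022\n"

# The match succeeds, but 'operator_name' has absorbed the 'country' line.
fields = pattern.match(text).groupdict()


def find_multiline_fields(fields):
    """Return the names of fields whose value contains a newline."""
    return [name for name, value in fields.items() if "\n" in value]


bad_fields = find_multiline_fields(fields)
print(repr(fields["operator_name"]))
print(bad_fields)
```

Rejecting any field value that contains a newline turns this silent mis-parse into an explicit failure.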


### Sub-sub-models

Calling the `read()` method will first build a list of *pyrsona* file structure models from the *parent* model down. 

Any *sub-models* of the *parent* model will themselves be checked for *sub-models*, meaning that every model in the tree below the *parent* model will be used when attempting to parse a file.

Each branch of models will be ordered bottom-up so that the deepest nested model in a branch will be used first. The *parent* model will be the final model used if all others fail.
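
The ordering described above corresponds to a depth-first, post-order walk of the class tree. The sketch below assumes that sub-models are discovered through Python's built-in `__subclasses__()` method, which may not be how *pyrsona* does it; the classes and the `ordered_models` helper are hypothetical stand-ins.

```python
class Parent:
    """Stand-in for the parent structure model."""


class A(Parent):
    """First sub-model."""


class A1(A):
    """Sub-sub-model of A."""


class B(Parent):
    """Second sub-model."""


def ordered_models(model):
    """Return the deepest models of each branch first and `model` itself last."""
    ordered = []
    for sub_model in model.__subclasses__():
        ordered.extend(ordered_models(sub_model))
    ordered.append(model)
    return ordered


names = [model.__name__ for model in ordered_models(Parent)]
print(names)
#> ['A1', 'A', 'B', 'Parent']
```

Trying the most specific models first means that a newer, stricter structure gets the chance to claim a file before a more general ancestor does.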

### Model names

The `read()` method returns a `structure_id` variable that matches the model name. This `structure_id` can be useful when creating automated tests that sit alongside the *pyrsona* models as it provides a mechanism for confirming that a text file was parsed using the expected *pyrsona* model where multiple *sub-models* exist.

As the number of *sub-models* grows, a naming convention becomes more important. One option is to name each *sub-model* with a random hexadecimal value prefixed with a single underscore (in case the value begins with a digit), e.g. `'_a4c15356'`. The leading underscore is removed from the model name when the `structure_id` value is returned.
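
One way to follow this naming convention is sketched below. The helpers `random_model_name` and `structure_id_for` are hypothetical, a plain class stands in for a real `BaseStructure` model, and the underscore handling only mirrors the behaviour described above rather than reproducing *pyrsona*'s code.

```python
import secrets


def random_model_name():
    """Return an underscore followed by 8 random hexadecimal digits."""
    return "_" + secrets.token_hex(4)


def structure_id_for(model):
    """Return the class name without a single leading underscore."""
    name = model.__name__
    return name[1:] if name.startswith("_") else name


class ExampleStructure:
    """Stand-in for a parent structure model."""


# type() builds a sub-class with the generated name; in practice the name
# would be generated once and written into the source file by hand.
generated_name = random_model_name()
SubModel = type(generated_name, (ExampleStructure,), {})

print(generated_name)
print(structure_id_for(SubModel))
```

An automated test can then compare the returned `structure_id` with the expected hexadecimal value to confirm which model parsed a given file.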


### *parse* formats

The *parse* package allows format specifications to be included alongside the fields, e.g. `'{year:d}'`. While including these format types in the structure definition is valid, more complex conversions can be made using `meta_model`. Keeping all format conversions in `meta_model` means they are defined in one place.


            

        