pytransflow


- Name: pytransflow
- Version: 0.1.1
- Home page: None
- Summary: Simple library for record-level processing using flows of transformations defined as YAML files.
- Upload time: 2024-07-19 20:43:32
- Maintainer: Vladimir Sivcevic
- Docs URL: None
- Author: Vladimir Sivcevic
- Requires Python: `<4.0.0,>=3.8.11`
- License: MIT
- Keywords: flow, transformation, record processing, data, pipelines
- Requirements: No requirements were recorded.

# pytransflow

A simple library for record-level processing using flows of transformations defined as YAML files.

## Overview

pytransflow lets you process records by defining a flow of transformations.
Each flow has its own configuration, defined in a YAML file, which can be as simple as:

```yaml
description: A simple test flow
instant_fail: True
fail_scenarios:
  percentage_of_failed_records: 90
variables:
  a: B
transformations:
  - prefix:
      field: a
      value: test
      condition: "@a/c/d/e == !:a"
      ignore_errors:
        - output_already_exists
      output_datasets:
        - k
  - add_field:
      name: test/a/b
      value: { "a": "b" }
      input_datasets:
        - k
      output_datasets:
        - x
        - z
```
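To make the dataset routing in the configuration above easier to follow, here is a minimal sketch in plain Python. It does not use pytransflow and is not how the library works internally. The routing-related keys of the YAML are written out as a dictionary, and the helper `dataset_routes` (a hypothetical name) lists which datasets each transformation reads from and writes to. The fallback dataset name `default` is an assumption made for illustration; the README does not state what happens when no dataset is listed.

```python
# Plain-Python mirror of the YAML flow above (only the routing-related keys).
flow_config = {
    "transformations": [
        {"prefix": {"field": "a", "value": "test", "output_datasets": ["k"]}},
        {
            "add_field": {
                "name": "test/a/b",
                "value": {"a": "b"},
                "input_datasets": ["k"],
                "output_datasets": ["x", "z"],
            }
        },
    ]
}

# Assumption for this sketch: the name used when a step lists no dataset.
ASSUMED_DEFAULT = "default"


def dataset_routes(config):
    """Return (transformation, inputs, outputs) for each step, in order."""
    routes = []
    for step in config["transformations"]:
        # Each list item is a single-key mapping: {transformation_name: params}.
        (name, params), = step.items()
        inputs = params.get("input_datasets", [ASSUMED_DEFAULT])
        outputs = params.get("output_datasets", [ASSUMED_DEFAULT])
        routes.append((name, inputs, outputs))
    return routes


routes = dataset_routes(flow_config)
for name, inputs, outputs in routes:
    print(f"{name}: {inputs} -> {outputs}")
```

Read this way, `prefix` writes its records to dataset `k`, and `add_field` picks them up from `k` and writes the result to both `x` and `z`.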

Processing is initiated using the `Flow` class:

```python
from pprint import pprint

from pytransflow.core import Flow

records = [...]  # placeholder: the input records to process

flow = Flow(name="<flow-name>")
flow.process(records)
pprint(flow.datasets)  # End result
pprint(flow.failed_records)  # Failed records
```

Refer to the [Getting Started](https://github.com/VladimirSiv/pytransflow/wiki/Getting-Started)
wiki page for additional examples and a guided introduction, or read the
[blog post](https://www.vladsiv.com/pytransflow/) that introduces the library.

## Features

The following are some of the features that pytransflow provides:

- Define processing flows using YAML files
- Use all kinds of flow configurations to fine-tune the flow
- Leverage [pydantic](https://github.com/pydantic/pydantic)'s features for data validation
- Apply transformations only if a defined condition is met
- Build your own library of transformations
- Use multiple input and output datasets
- Ignore specific errors during processing
- Set conditions for output datasets
- Track failed records
- Define flow fail scenarios
- Process records in parallel
- Use flow-level variables

For more information on these features and how to use them, please refer to the
[Wiki Page](https://github.com/VladimirSiv/pytransflow/wiki).
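
Several of the features listed above (conditional transformations, failed-record tracking, and fail scenarios) can be pictured with a small conceptual sketch in plain Python. This is not pytransflow's API or implementation: `run_step` and `add_prefix` are hypothetical helpers, passing a record through unchanged when its condition is not met is an assumption, and so is comparing the failure percentage with `>=`. The threshold of 90 is borrowed from `percentage_of_failed_records` in the sample YAML.

```python
def run_step(records, transform, condition=None):
    """Apply `transform` to each record; collect failures instead of raising."""
    processed, failed = [], []
    for record in records:
        if condition is not None and not condition(record):
            # Assumption: a record that does not meet the condition is kept as-is.
            processed.append(record)
            continue
        try:
            # Work on a copy so a failing record is reported unmodified.
            processed.append(transform(dict(record)))
        except Exception as exc:
            failed.append({"record": record, "error": type(exc).__name__})
    return processed, failed


def add_prefix(record):
    """Prefix field `a` with 'test'; raises KeyError if `a` is missing."""
    record["a"] = "test" + record["a"]
    return record


records = [
    {"a": "x", "run": True},
    {"a": "y", "run": False},  # condition not met, passes through
    {"b": 1, "run": True},     # no field `a`, ends up in `failed`
]
processed, failed = run_step(records, add_prefix, condition=lambda r: r["run"])

# Fail scenario: mark the whole flow as failed past a percentage threshold.
failed_pct = 100 * len(failed) / len(records)
flow_failed = failed_pct >= 90

print(processed)
print(failed)
print(flow_failed)
```

Collecting failures in a separate list rather than raising on the first error is what makes a percentage-based fail scenario possible: the decision to fail the flow can be made after all records have been seen.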

## License

MIT


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pytransflow",
    "maintainer": "Vladimir Sivcevic",
    "docs_url": null,
    "requires_python": "<4.0.0,>=3.8.11",
    "maintainer_email": "vladsiv@protonmail.com",
    "keywords": "Flow, Transformation, Record Processing, Data, Pipelines",
    "author": "Vladimir Sivcevic",
    "author_email": "vladsiv@protonmail.com",
    "download_url": "https://files.pythonhosted.org/packages/df/49/5e1765e95cdf57141e1760d3b10a61046fcb0a64221e7cdd79bedba21b4b/pytransflow-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Simple library for record-level processing using flows of transformations defined as YAML files.",
    "version": "0.1.1",
    "project_urls": null,
    "split_keywords": [
        "flow",
        " transformation",
        " record processing",
        " data",
        " pipelines"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e8b4f4184aaad49869f67d90f79493b48873b147df4dce2d1dda1994370fc8ea",
                "md5": "446c5eb51cb67ae69a77098418ffc224",
                "sha256": "9ae60f9552489865a277b3df2fd26fbe144579f4a44510532143a5c4aa7ee775"
            },
            "downloads": -1,
            "filename": "pytransflow-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "446c5eb51cb67ae69a77098418ffc224",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0.0,>=3.8.11",
            "size": 44457,
            "upload_time": "2024-07-19T20:43:31",
            "upload_time_iso_8601": "2024-07-19T20:43:31.239856Z",
            "url": "https://files.pythonhosted.org/packages/e8/b4/f4184aaad49869f67d90f79493b48873b147df4dce2d1dda1994370fc8ea/pytransflow-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "df495e1765e95cdf57141e1760d3b10a61046fcb0a64221e7cdd79bedba21b4b",
                "md5": "5e8a62873aad248e2d3e70dc215a9c18",
                "sha256": "059224e7d7216b337b56ca4b23eeb4bd36efdcc2cada9c160eb94b1722e9db9c"
            },
            "downloads": -1,
            "filename": "pytransflow-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "5e8a62873aad248e2d3e70dc215a9c18",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0.0,>=3.8.11",
            "size": 24554,
            "upload_time": "2024-07-19T20:43:32",
            "upload_time_iso_8601": "2024-07-19T20:43:32.587784Z",
            "url": "https://files.pythonhosted.org/packages/df/49/5e1765e95cdf57141e1760d3b10a61046fcb0a64221e7cdd79bedba21b4b/pytransflow-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-19 20:43:32",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "pytransflow"
}
        