# pytransflow
A simple library for record-level processing using flows of transformations defined as YAML files.
## Overview
pytransflow lets you process records by defining a flow of transformations.
Each flow is configured in a YAML file, which can be as simple as:
```yaml
description: A simple test flow
instant_fail: True
fail_scenarios:
  percentage_of_failed_records: 90
variables:
  a: B
transformations:
  - prefix:
      field: a
      value: test
      condition: "@a/c/d/e == !:a"
      ignore_errors:
        - output_already_exists
      output_datasets:
        - k
  - add_field:
      name: test/a/b
      value: { "a": "b" }
      input_datasets:
        - k
      output_datasets:
        - x
        - z
```
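The `name: test/a/b` entry in the configuration above suggests that field names can be slash-separated paths into nested records. The following is only a rough sketch of that idea, assuming a path maps to nested dictionaries; `set_nested` is a hypothetical helper written for illustration and is not part of pytransflow, so consult the wiki for the library's actual path semantics.

```python
from typing import Any, Dict


def set_nested(record: Dict[str, Any], path: str, value: Any) -> Dict[str, Any]:
    """Hypothetical helper: write `value` at a slash-separated `path`."""
    keys = path.split("/")
    node = record
    for key in keys[:-1]:
        # Create intermediate dictionaries as needed.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return record


record = {"a": "B"}
set_nested(record, "test/a/b", {"a": "b"})
print(record)  # {'a': 'B', 'test': {'a': {'b': {'a': 'b'}}}}
```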
Processing is initiated using the `Flow` class:
```python
from pprint import pprint

from pytransflow.core import Flow

records = [...]  # Placeholder for your input records

flow = Flow(name="<flow-name>")  # Name of the flow to run
flow.process(records)
pprint(flow.datasets)  # End result
pprint(flow.failed_records)  # Failed records
```
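To make the idea of record-level processing more concrete, here is a minimal pure-Python sketch of running each record through an ordered list of transformations and collecting the records that fail. It is not how pytransflow is implemented; `add_prefix` and `run_flow` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Transformation = Callable[[Record], Record]


def add_prefix(field: str, value: str) -> Transformation:
    """Hypothetical transformation: prepend `value` to a string field."""
    def apply(record: Record) -> Record:
        updated = dict(record)  # Shallow copy so the input stays untouched
        updated[field] = f"{value}{record[field]}"  # KeyError if field is missing
        return updated
    return apply


def run_flow(
    records: List[Record], transformations: List[Transformation]
) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Apply every transformation to every record, tracking failures."""
    processed: List[Record] = []
    failed: List[Tuple[Record, str]] = []
    for record in records:
        try:
            current = record
            for transform in transformations:
                current = transform(current)
            processed.append(current)
        except Exception as exc:
            # Keep the original record together with the reason it failed.
            failed.append((record, repr(exc)))
    return processed, failed


processed, failed = run_flow([{"a": "x"}, {"b": "y"}], [add_prefix("a", "test")])
print(processed)
print(failed)
```

In this sketch the second record has no `a` field, so it ends up in `failed` while the first record is transformed normally.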
Refer to the [Getting Started](https://github.com/VladimirSiv/pytransflow/wiki/Getting-Started)
wiki page for additional examples and guided initial steps, or check out the blog post that
introduces this library: [pytransflow](https://www.vladsiv.com/pytransflow/).
## Features
The following are some of the features that pytransflow provides:
- Define processing flows using YAML files
- Fine-tune a flow through its configuration options
- Leverage [pydantic](https://github.com/pydantic/pydantic)'s features for data validation
- Apply transformations only if a defined condition is met
- Build your own library of transformations
- Use multiple input and output datasets
- Ignore specific errors during processing
- Set conditions for output datasets
- Track failed records
- Define flow fail scenarios
- Process records in parallel
- Use flow-level variables, and more
For more information on these features and how to use them, please refer to the
[Wiki Page](https://github.com/VladimirSiv/pytransflow/wiki).
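
As an illustration of the fail-scenario feature, the `percentage_of_failed_records: 90` setting from the Overview configuration can be read as a threshold on the share of failed records. The sketch below assumes the flow fails once that share reaches or exceeds the threshold; both this comparison rule and the function name `exceeds_failure_threshold` are assumptions made for illustration, not pytransflow's documented behavior.

```python
def exceeds_failure_threshold(failed: int, total: int, threshold_percent: int) -> bool:
    """Hypothetical check: has the failed share reached the threshold?"""
    if total == 0:
        # No records processed, so nothing can have failed.
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return failed * 100 >= threshold_percent * total


print(exceeds_failure_threshold(9, 10, 90))  # True
print(exceeds_failure_threshold(8, 10, 90))  # False
```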
## License
MIT
Raw data
{
"_id": null,
"home_page": null,
"name": "pytransflow",
"maintainer": "Vladimir Sivcevic",
"docs_url": null,
"requires_python": "<4.0.0,>=3.8.11",
"maintainer_email": "vladsiv@protonmail.com",
"keywords": "Flow, Transformation, Record Processing, Data, Pipelines",
"author": "Vladimir Sivcevic",
"author_email": "vladsiv@protonmail.com",
"download_url": "https://files.pythonhosted.org/packages/df/49/5e1765e95cdf57141e1760d3b10a61046fcb0a64221e7cdd79bedba21b4b/pytransflow-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Simple library for record-level processing using flows of transformations defined as YAML files.",
"version": "0.1.1",
"project_urls": null,
"split_keywords": [
"flow",
" transformation",
" record processing",
" data",
" pipelines"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e8b4f4184aaad49869f67d90f79493b48873b147df4dce2d1dda1994370fc8ea",
"md5": "446c5eb51cb67ae69a77098418ffc224",
"sha256": "9ae60f9552489865a277b3df2fd26fbe144579f4a44510532143a5c4aa7ee775"
},
"downloads": -1,
"filename": "pytransflow-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "446c5eb51cb67ae69a77098418ffc224",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0.0,>=3.8.11",
"size": 44457,
"upload_time": "2024-07-19T20:43:31",
"upload_time_iso_8601": "2024-07-19T20:43:31.239856Z",
"url": "https://files.pythonhosted.org/packages/e8/b4/f4184aaad49869f67d90f79493b48873b147df4dce2d1dda1994370fc8ea/pytransflow-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "df495e1765e95cdf57141e1760d3b10a61046fcb0a64221e7cdd79bedba21b4b",
"md5": "5e8a62873aad248e2d3e70dc215a9c18",
"sha256": "059224e7d7216b337b56ca4b23eeb4bd36efdcc2cada9c160eb94b1722e9db9c"
},
"downloads": -1,
"filename": "pytransflow-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "5e8a62873aad248e2d3e70dc215a9c18",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0.0,>=3.8.11",
"size": 24554,
"upload_time": "2024-07-19T20:43:32",
"upload_time_iso_8601": "2024-07-19T20:43:32.587784Z",
"url": "https://files.pythonhosted.org/packages/df/49/5e1765e95cdf57141e1760d3b10a61046fcb0a64221e7cdd79bedba21b4b/pytransflow-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-19 20:43:32",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "pytransflow"
}