Name | framelink JSON |
Version | 0.2.2 JSON |
download | |
home_page | |
Summary | |
upload_time | 2023-05-06 06:12:05 |
maintainer | |
docs_url | None |
author | |
requires_python | <4.0,>=3.8 |
license | |
keywords | data dag orchastration dataframe |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
[![Version](https://img.shields.io/pypi/v/framelink)](https://pypi.org/project/framelink/) [![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/gittoby/framelink/lint_test_build.yml)](https://github.com/GitToby/framelink) [![GitHub Release Date](https://img.shields.io/github/release-date/GitToby/framelink)](https://github.com/GitToby/framelink) [![codecov](https://codecov.io/gh/GitToby/framelink/branch/master/graph/badge.svg?token=Uh8viFPyOG)](https://codecov.io/gh/GitToby/framelink) [![PyPi downloads](https://img.shields.io/pypi/dm/framelink)](https://pypi.org/project/framelink/) Framelink is a simple wrapper thats designed to provide context into pandas, polars and other Dataframe engines. See roadmap below for future of the project. **This project is still in prerelease, consider the API unstable. Any usage should be pinned.** ```bash pip install framelink ``` ## Goals Framelink should provide a way for collaborating teams to write python or SQL models to see their data flow easily and get the a whole load of stuff for free! - **Simple to write** - writing models should be no harder than a function implementation but provide a dependency tree, schemas & model metadata. - **Simple to run** - writing models should be agnostic of running models, once the models are written execution wrappers with diagnostics, tracing & lineage should be easy to derive for the execution platform any team is running without having any special requirements for running locally. - **Scheduler agnostic** - we are not making a new airflow, dagster etc. Framelink serves to add metadata to a project for free. ## Concepts - A **Pipeline** is a DAG of _models_ that can be executed in a particular way. - A **Model** is a definition of sourcing data and, potentially, a transform. It's an ETL in its most basic form. - A **Frame** is a result of a _model_ run. ## Features - [x] Model links & DAG + diagramming - [x] Context logging per model - [x] Diagramming and tracking of the model DAG - [x] Caches and auto-persistence - [ ] Dynamic sourcing for models - [x] Cli to run a project - [ ] Transpiler for popular DAG execution environments ## Example ```python from pathlib import Path import pandas as pd import polars as pl from framelink.core import FramelinkPipeline, FramelinkSettings from framelink.storage.core import PickleStorage, NoStorage settings = FramelinkSettings( default_storage=PickleStorage(Path(__file__).parent / "data") ) pipeline = FramelinkPipeline(settings=settings) @pipeline.model() def src_frame_1(_: FramelinkPipeline) -> pd.DataFrame: return pd.DataFrame(data={ "name": ["amy", "peter"], "age": [31, 12], }) @pipeline.model(storage=NoStorage()) def src_frame_2(_: FramelinkPipeline) -> pd.DataFrame: return pd.DataFrame(data={ "name": ["amy", "peter", "helen"], "fave_food": ["oranges", "chocolate", "water"], }) @pipeline.model() def merge_model(ctx: FramelinkPipeline) -> pl.DataFrame: res_1 = ctx.ref(src_frame_1) res_2 = ctx.ref(src_frame_2) key = "name" ctx.log.info(f"Merging both sources on {key}") return pl.from_pandas(res_1).join(pl.from_pandas(res_2), on=key) # build with implicit context r_1 = pipeline.build(merge_model) print(r_1) # shape: (2, 3) # ┌───────┬─────┬───────────┐ # │ name ┆ age ┆ fave_food │ # │ --- ┆ --- ┆ --- │ # │ str ┆ i64 ┆ str │ # ╞═══════╪═════╪═══════════╡ # │ amy ┆ 31 ┆ oranges │ # │ peter ┆ 12 ┆ chocolate │ # └───────┴─────┴───────────┘ print(merge_model.upstreams) # {<src_frame_2 at 0x1477c2c90>, <src_frame_1 at 0x144f0ab50>} print(src_frame_1.downstreams) # {<merge_model at 0x1477c2910>} print(pipeline.model_names) # ['merge_model', 'src_frame_1', 'src_frame_2'] print(list(pipeline.topological_sorted_nodes())) # [(<src_frame_1 at 0x144f0ab50>, <src_frame_2 at 0x1477c2c90>), (<merge_model at 0x1477c2910>,)] # if you have the graphing options engaged. pipeline.graph_plt() # will draw you a matplotlib of the DAG dot = pipeline.graph_dot() # will provide a DOT language representation of the DAG ``` ## Feature Roadmap This could change... ### v0.2.0 - [x] Model links & DAG implemented - [x] Context logger available - [x] Diagramming and tracking of the model DAG ### v0.3.0 - [ ] Cleaner graph results - [ ] Merging of multiple framelink pipelines enabling - [ ] Orchestration passthrough and local execution. - [x] Caches and auto-persistence - [x] Dynamic sourcing for models - [ ] model overrides for CLI and python runtimes. - [x] Cli to run a project ### v0.4.0 - [ ] SQL models & dbt, sqlmesh compatability - [ ] Open Tracing integration
{ "_id": null, "home_page": "", "name": "framelink", "maintainer": "", "docs_url": null, "requires_python": "<4.0,>=3.8", "maintainer_email": "", "keywords": "data,DAG,orchastration,dataframe", "author": "", "author_email": "Toby Devlin <toby@tobydevlin.com>", "download_url": "https://files.pythonhosted.org/packages/29/1e/4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341/framelink-0.2.2.tar.gz", "platform": null, "description": "[![Version](https://img.shields.io/pypi/v/framelink)](https://pypi.org/project/framelink/)\n[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/gittoby/framelink/lint_test_build.yml)](https://github.com/GitToby/framelink)\n[![GitHub Release Date](https://img.shields.io/github/release-date/GitToby/framelink)](https://github.com/GitToby/framelink)\n[![codecov](https://codecov.io/gh/GitToby/framelink/branch/master/graph/badge.svg?token=Uh8viFPyOG)](https://codecov.io/gh/GitToby/framelink)\n[![PyPi downloads](https://img.shields.io/pypi/dm/framelink)](https://pypi.org/project/framelink/)\n\nFramelink is a simple wrapper thats designed to provide context into pandas, polars and other Dataframe engines. See\nroadmap below for future of the project.\n\n**This project is still in prerelease, consider the API unstable. Any usage should be pinned.**\n\n```bash\npip install framelink\n```\n\n## Goals\n\nFramelink should provide a way for collaborating teams to write python or SQL models to see their data flow easily and get the a whole load of stuff for free!\n\n- **Simple to write** - writing models should be no harder than a function implementation but provide a dependency tree,\n schemas & model metadata.\n- **Simple to run** - writing models should be agnostic of running models, once the models are written execution\n wrappers with diagnostics, tracing & lineage should be easy to derive for the execution platform any team is running without having any special requirements for running locally.\n- **Scheduler agnostic** - we are not making a new airflow, dagster etc. Framelink serves to add metadata to a project\n for free.\n\n## Concepts\n\n- A **Pipeline** is a DAG of _models_ that can be executed in a particular way.\n- A **Model** is a definition of sourcing data and, potentially, a transform. It's an ETL in its most basic form.\n- A **Frame** is a result of a _model_ run.\n\n## Features\n\n- [x] Model links & DAG + diagramming\n- [x] Context logging per model\n- [x] Diagramming and tracking of the model DAG\n- [x] Caches and auto-persistence\n- [ ] Dynamic sourcing for models\n- [x] Cli to run a project\n- [ ] Transpiler for popular DAG execution environments\n\n## Example\n\n```python\nfrom pathlib import Path\n\nimport pandas as pd\nimport polars as pl\n\nfrom framelink.core import FramelinkPipeline, FramelinkSettings\nfrom framelink.storage.core import PickleStorage, NoStorage\n\nsettings = FramelinkSettings(\n default_storage=PickleStorage(Path(__file__).parent / \"data\")\n)\n\npipeline = FramelinkPipeline(settings=settings)\n\n\n@pipeline.model()\ndef src_frame_1(_: FramelinkPipeline) -> pd.DataFrame:\n return pd.DataFrame(data={\n \"name\": [\"amy\", \"peter\"],\n \"age\": [31, 12],\n })\n\n\n@pipeline.model(storage=NoStorage())\ndef src_frame_2(_: FramelinkPipeline) -> pd.DataFrame:\n return pd.DataFrame(data={\n \"name\": [\"amy\", \"peter\", \"helen\"],\n \"fave_food\": [\"oranges\", \"chocolate\", \"water\"],\n })\n\n\n@pipeline.model()\ndef merge_model(ctx: FramelinkPipeline) -> pl.DataFrame:\n res_1 = ctx.ref(src_frame_1)\n res_2 = ctx.ref(src_frame_2)\n key = \"name\"\n ctx.log.info(f\"Merging both sources on {key}\")\n return pl.from_pandas(res_1).join(pl.from_pandas(res_2), on=key)\n\n\n# build with implicit context\nr_1 = pipeline.build(merge_model)\nprint(r_1)\n# shape: (2, 3)\n# \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n# \u2502 name \u2506 age \u2506 fave_food \u2502\n# \u2502 --- \u2506 --- \u2506 --- \u2502\n# \u2502 str \u2506 i64 \u2506 str \u2502\n# \u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561\n# \u2502 amy \u2506 31 \u2506 oranges \u2502\n# \u2502 peter \u2506 12 \u2506 chocolate \u2502\n# \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n\nprint(merge_model.upstreams)\n# {<src_frame_2 at 0x1477c2c90>, <src_frame_1 at 0x144f0ab50>}\n\nprint(src_frame_1.downstreams)\n# {<merge_model at 0x1477c2910>}\n\nprint(pipeline.model_names)\n# ['merge_model', 'src_frame_1', 'src_frame_2']\n\nprint(list(pipeline.topological_sorted_nodes()))\n# [(<src_frame_1 at 0x144f0ab50>, <src_frame_2 at 0x1477c2c90>), (<merge_model at 0x1477c2910>,)]\n\n# if you have the graphing options engaged.\npipeline.graph_plt() # will draw you a matplotlib of the DAG\ndot = pipeline.graph_dot() # will provide a DOT language representation of the DAG\n```\n\n## Feature Roadmap\n\nThis could change...\n\n### v0.2.0\n\n- [x] Model links & DAG implemented\n- [x] Context logger available\n- [x] Diagramming and tracking of the model DAG\n\n### v0.3.0\n\n- [ ] Cleaner graph results\n- [ ] Merging of multiple framelink pipelines enabling\n- [ ] Orchestration passthrough and local execution.\n- [x] Caches and auto-persistence\n- [x] Dynamic sourcing for models\n- [ ] model overrides for CLI and python runtimes.\n- [x] Cli to run a project\n\n### v0.4.0\n\n- [ ] SQL models & dbt, sqlmesh compatability\n- [ ] Open Tracing integration\n", "bugtrack_url": null, "license": "", "summary": "", "version": "0.2.2", "project_urls": { "github": "https://github.com/GitToby/framelink" }, "split_keywords": [ "data", "dag", "orchastration", "dataframe" ], "urls": [ { "comment_text": "", "digests": { "blake2b_256": "645f0a8fe3aea2797ff1c6aa25e381aa9e152be5c42b9f791b32b2b6cd6530b5", "md5": "0827f027585a7bf300491074a5369bdf", "sha256": "23cf8e3ccd0275d580046fcc140e76b4b95b1742de848ee36cc50263ef53e046" }, "downloads": -1, "filename": "framelink-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "0827f027585a7bf300491074a5369bdf", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "<4.0,>=3.8", "size": 12165, "upload_time": "2023-05-06T06:12:02", "upload_time_iso_8601": "2023-05-06T06:12:02.524722Z", "url": "https://files.pythonhosted.org/packages/64/5f/0a8fe3aea2797ff1c6aa25e381aa9e152be5c42b9f791b32b2b6cd6530b5/framelink-0.2.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "blake2b_256": "291e4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341", "md5": "59240e9ebd58b0f93e68ce1fcef086e7", "sha256": "4b34199bc101c7b6691a23341e153d3690de4ae5050960bbaa2c3bd9a4e68f46" }, "downloads": -1, "filename": "framelink-0.2.2.tar.gz", "has_sig": false, "md5_digest": "59240e9ebd58b0f93e68ce1fcef086e7", "packagetype": "sdist", "python_version": "source", "requires_python": "<4.0,>=3.8", "size": 157975, "upload_time": "2023-05-06T06:12:05", "upload_time_iso_8601": "2023-05-06T06:12:05.243725Z", "url": "https://files.pythonhosted.org/packages/29/1e/4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341/framelink-0.2.2.tar.gz", "yanked": false, "yanked_reason": null } ], "upload_time": "2023-05-06 06:12:05", "github": true, "gitlab": false, "bitbucket": false, "codeberg": false, "github_user": "GitToby", "github_project": "framelink", "travis_ci": false, "coveralls": false, "github_actions": true, "lcname": "framelink" }