| Field | Value |
| --- | --- |
| Name | framelink |
| Version | 0.2.2 |
| home_page | |
| Summary | |
| upload_time | 2023-05-06 06:12:05 |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | <4.0,>=3.8 |
| license | |
| keywords | data, dag, orchastration, dataframe |
| VCS | https://github.com/GitToby/framelink |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
[Version](https://pypi.org/project/framelink/)
[GitHub Workflow Status](https://github.com/GitToby/framelink)
[GitHub Release Date](https://github.com/GitToby/framelink)
[codecov](https://codecov.io/gh/GitToby/framelink)
[PyPi downloads](https://pypi.org/project/framelink/)
Framelink is a simple wrapper designed to provide context into pandas, polars and other DataFrame engines. See the
roadmap below for the future of the project.
**This project is still in prerelease; consider the API unstable. Any usage should be pinned.**
```bash
pip install framelink
```
## Goals
Framelink should give collaborating teams a way to write Python or SQL models, see their data flow easily, and get a whole load of stuff for free!
- **Simple to write** - writing a model should be no harder than implementing a function, yet it should provide a dependency tree,
schemas & model metadata.
- **Simple to run** - writing models should be agnostic of running them. Once the models are written, execution
wrappers with diagnostics, tracing & lineage should be easy to derive for whatever execution platform a team uses, without any special requirements for running locally.
- **Scheduler agnostic** - we are not making a new Airflow, Dagster etc. Framelink serves to add metadata to a project
for free.
## Concepts
- A **Pipeline** is a DAG of _models_ that can be executed in a particular way.
- A **Model** is a definition of sourcing data and, potentially, a transform. It's an ETL in its most basic form.
- A **Frame** is a result of a _model_ run.
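To make the three terms concrete, here is a minimal, framework-free sketch. It does not use Framelink's API and is not how Framelink is implemented: `TinyPipeline` and its `model`, `ref` and `build` methods are hypothetical names chosen for illustration, and the links are recorded at run time purely to keep the example short.

```python
from typing import Any, Callable, Dict, List, Set


class TinyPipeline:
    """Hypothetical stand-in for a pipeline: a registry of models and their links."""

    def __init__(self) -> None:
        self.models: Dict[str, Callable[["TinyPipeline"], Any]] = {}
        self.upstreams: Dict[str, Set[str]] = {}
        self._running: List[str] = []  # stack of models currently being built

    def model(self, fn: Callable[["TinyPipeline"], Any]) -> Callable[["TinyPipeline"], Any]:
        # Register a model: a plain function that sources or transforms data.
        self.models[fn.__name__] = fn
        self.upstreams.setdefault(fn.__name__, set())
        return fn

    def ref(self, fn: Callable[["TinyPipeline"], Any]) -> Any:
        # Record an edge "fn -> currently running model", then build the upstream.
        if self._running:
            self.upstreams[self._running[-1]].add(fn.__name__)
        return self.build(fn)

    def build(self, fn: Callable[["TinyPipeline"], Any]) -> Any:
        # Run a model; the value it returns plays the role of a frame.
        self._running.append(fn.__name__)
        try:
            return fn(self)
        finally:
            self._running.pop()


pipeline = TinyPipeline()


@pipeline.model
def people(_: TinyPipeline) -> list:
    return [{"name": "amy", "age": 31}, {"name": "peter", "age": 12}]


@pipeline.model
def adults(ctx: TinyPipeline) -> list:
    return [row for row in ctx.ref(people) if row["age"] >= 18]


frame = pipeline.build(adults)
print(frame)
print(pipeline.upstreams)
```

Here `people` and `adults` are the models, the list returned by `pipeline.build(adults)` is the frame, and the `upstreams` mapping is the DAG that the pipeline holds.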
## Features
- [x] Model links & DAG + diagramming
- [x] Context logging per model
- [x] Diagramming and tracking of the model DAG
- [x] Caches and auto-persistence
- [ ] Dynamic sourcing for models
- [x] CLI to run a project
- [ ] Transpiler for popular DAG execution environments
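As an illustration of what "caches and auto-persistence" can mean in practice, the sketch below stores a model's result with the standard-library `pickle` module and reuses it on the next call. It is an assumption-laden toy, not Framelink's storage layer: `PickleCache` and `get_or_build` are hypothetical names, and a plain dict stands in for a frame.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, List


class PickleCache:
    """Hypothetical pickle-backed store; not Framelink's PickleStorage."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def get_or_build(self, name: str, build: Callable[[], Any]) -> Any:
        path = self.directory / f"{name}.pkl"
        if path.exists():
            # Cache hit: load the persisted result instead of rebuilding it.
            with path.open("rb") as fh:
                return pickle.load(fh)
        # Cache miss: run the model and persist its result for next time.
        frame = build()
        with path.open("wb") as fh:
            pickle.dump(frame, fh)
        return frame


calls: List[int] = []


def expensive_model() -> dict:
    calls.append(1)  # count how often the model body actually runs
    return {"name": ["amy", "peter"], "age": [31, 12]}


with tempfile.TemporaryDirectory() as tmp:
    cache = PickleCache(Path(tmp) / "data")
    first = cache.get_or_build("expensive_model", expensive_model)
    second = cache.get_or_build("expensive_model", expensive_model)

print(first == second, len(calls))
```

The second call is served from disk, so the model body runs only once. A real storage layer would also need an invalidation rule, which this sketch leaves out.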
## Example
```python
from pathlib import Path

import pandas as pd
import polars as pl

from framelink.core import FramelinkPipeline, FramelinkSettings
from framelink.storage.core import PickleStorage, NoStorage

settings = FramelinkSettings(
    default_storage=PickleStorage(Path(__file__).parent / "data")
)

pipeline = FramelinkPipeline(settings=settings)


@pipeline.model()
def src_frame_1(_: FramelinkPipeline) -> pd.DataFrame:
    return pd.DataFrame(data={
        "name": ["amy", "peter"],
        "age": [31, 12],
    })


@pipeline.model(storage=NoStorage())
def src_frame_2(_: FramelinkPipeline) -> pd.DataFrame:
    return pd.DataFrame(data={
        "name": ["amy", "peter", "helen"],
        "fave_food": ["oranges", "chocolate", "water"],
    })


@pipeline.model()
def merge_model(ctx: FramelinkPipeline) -> pl.DataFrame:
    res_1 = ctx.ref(src_frame_1)
    res_2 = ctx.ref(src_frame_2)
    key = "name"
    ctx.log.info(f"Merging both sources on {key}")
    return pl.from_pandas(res_1).join(pl.from_pandas(res_2), on=key)


# build with implicit context
r_1 = pipeline.build(merge_model)
print(r_1)
# shape: (2, 3)
# ┌───────┬─────┬───────────┐
# │ name  ┆ age ┆ fave_food │
# │ ---   ┆ --- ┆ ---       │
# │ str   ┆ i64 ┆ str       │
# ╞═══════╪═════╪═══════════╡
# │ amy   ┆ 31  ┆ oranges   │
# │ peter ┆ 12  ┆ chocolate │
# └───────┴─────┴───────────┘

print(merge_model.upstreams)
# {<src_frame_2 at 0x1477c2c90>, <src_frame_1 at 0x144f0ab50>}

print(src_frame_1.downstreams)
# {<merge_model at 0x1477c2910>}

print(pipeline.model_names)
# ['merge_model', 'src_frame_1', 'src_frame_2']

print(list(pipeline.topological_sorted_nodes()))
# [(<src_frame_1 at 0x144f0ab50>, <src_frame_2 at 0x1477c2c90>), (<merge_model at 0x1477c2910>,)]

# if you have the graphing options engaged.
pipeline.graph_plt()  # will draw you a matplotlib of the DAG
dot = pipeline.graph_dot()  # will provide a DOT language representation of the DAG
```
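The grouped output of `topological_sorted_nodes()` above puts models that can run together in the same tuple. The sketch below reproduces that grouping, plus a DOT rendering, for the same three model names using only the standard library. It is an illustration and not Framelink's implementation: the `upstreams` mapping and the helper names `topological_generations` and `to_dot` are hypothetical, and `graphlib` needs Python 3.9 or newer.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Each model name maps to the set of models it reads from (its upstreams).
upstreams: Dict[str, Set[str]] = {
    "src_frame_1": set(),
    "src_frame_2": set(),
    "merge_model": {"src_frame_1", "src_frame_2"},
}


def topological_generations(graph: Dict[str, Set[str]]) -> List[Tuple[str, ...]]:
    """Group nodes into batches whose upstreams are all in earlier batches."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    generations: List[Tuple[str, ...]] = []
    while sorter.is_active():
        ready = tuple(sorted(sorter.get_ready()))
        generations.append(ready)
        sorter.done(*ready)
    return generations


def to_dot(graph: Dict[str, Set[str]]) -> str:
    """Render the graph in DOT, with edges pointing from upstream to downstream."""
    lines = ["digraph pipeline {"]
    for node in sorted(graph):
        lines.append(f'    "{node}";')
        for upstream in sorted(graph[node]):
            lines.append(f'    "{upstream}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)


generations = topological_generations(upstreams)
dot_text = to_dot(upstreams)
print(generations)
# [('src_frame_1', 'src_frame_2'), ('merge_model',)]
print(dot_text)
```

Every model in a batch depends only on models from earlier batches, so a runner could execute each batch in parallel.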
## Feature Roadmap
This could change...
### v0.2.0
- [x] Model links & DAG implemented
- [x] Context logger available
- [x] Diagramming and tracking of the model DAG
### v0.3.0
- [ ] Cleaner graph results
- [ ] Enable merging of multiple framelink pipelines
- [ ] Orchestration passthrough and local execution.
- [x] Caches and auto-persistence
- [x] Dynamic sourcing for models
- [ ] Model overrides for CLI and Python runtimes.
- [x] CLI to run a project
### v0.4.0
- [ ] SQL models & dbt, sqlmesh compatibility
- [ ] Open Tracing integration
Raw data
{
"_id": null,
"home_page": "",
"name": "framelink",
"maintainer": "",
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": "",
"keywords": "data,DAG,orchastration,dataframe",
"author": "",
"author_email": "Toby Devlin <toby@tobydevlin.com>",
"download_url": "https://files.pythonhosted.org/packages/29/1e/4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341/framelink-0.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "",
"version": "0.2.2",
"project_urls": {
"github": "https://github.com/GitToby/framelink"
},
"split_keywords": [
"data",
"dag",
"orchastration",
"dataframe"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "645f0a8fe3aea2797ff1c6aa25e381aa9e152be5c42b9f791b32b2b6cd6530b5",
"md5": "0827f027585a7bf300491074a5369bdf",
"sha256": "23cf8e3ccd0275d580046fcc140e76b4b95b1742de848ee36cc50263ef53e046"
},
"downloads": -1,
"filename": "framelink-0.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0827f027585a7bf300491074a5369bdf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 12165,
"upload_time": "2023-05-06T06:12:02",
"upload_time_iso_8601": "2023-05-06T06:12:02.524722Z",
"url": "https://files.pythonhosted.org/packages/64/5f/0a8fe3aea2797ff1c6aa25e381aa9e152be5c42b9f791b32b2b6cd6530b5/framelink-0.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "291e4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341",
"md5": "59240e9ebd58b0f93e68ce1fcef086e7",
"sha256": "4b34199bc101c7b6691a23341e153d3690de4ae5050960bbaa2c3bd9a4e68f46"
},
"downloads": -1,
"filename": "framelink-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "59240e9ebd58b0f93e68ce1fcef086e7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 157975,
"upload_time": "2023-05-06T06:12:05",
"upload_time_iso_8601": "2023-05-06T06:12:05.243725Z",
"url": "https://files.pythonhosted.org/packages/29/1e/4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341/framelink-0.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-06 06:12:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "GitToby",
"github_project": "framelink",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "framelink"
}