framelink


Nameframelink JSON
Version 0.2.2 PyPI version JSON
download
home_page
Summary
upload_time2023-05-06 06:12:05
maintainer
docs_urlNone
author
requires_python<4.0,>=3.8
license
keywords data dag orchastration dataframe
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            [![Version](https://img.shields.io/pypi/v/framelink)](https://pypi.org/project/framelink/)
[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/gittoby/framelink/lint_test_build.yml)](https://github.com/GitToby/framelink)
[![GitHub Release Date](https://img.shields.io/github/release-date/GitToby/framelink)](https://github.com/GitToby/framelink)
[![codecov](https://codecov.io/gh/GitToby/framelink/branch/master/graph/badge.svg?token=Uh8viFPyOG)](https://codecov.io/gh/GitToby/framelink)
[![PyPi downloads](https://img.shields.io/pypi/dm/framelink)](https://pypi.org/project/framelink/)

Framelink is a simple wrapper thats designed to provide context into pandas, polars and other Dataframe engines. See
roadmap below for future of the project.

**This project is still in prerelease, consider the API unstable. Any usage should be pinned.**

```bash
pip install framelink
```

## Goals

Framelink should provide a way for collaborating teams to write python or SQL models to see their data flow easily and get the a whole load of stuff for free!

- **Simple to write** - writing models should be no harder than a function implementation but provide a dependency tree,
  schemas & model metadata.
- **Simple to run** - writing models should be agnostic of running models, once the models are written execution
  wrappers with diagnostics, tracing & lineage should be easy to derive for the execution platform any team is running without having any special requirements for running locally.
- **Scheduler agnostic** - we are not making a new airflow, dagster etc. Framelink serves to add metadata to a project
  for free.

## Concepts

- A **Pipeline** is a DAG of _models_ that can be executed in a particular way.
- A **Model** is a definition of sourcing data and, potentially, a transform. It's an ETL in its most basic form.
- A **Frame** is a result of a _model_ run.

## Features

- [x] Model links & DAG + diagramming
- [x] Context logging per model
- [x] Diagramming and tracking of the model DAG
- [x] Caches and auto-persistence
- [ ] Dynamic sourcing for models
- [x] Cli to run a project
- [ ] Transpiler for popular DAG execution environments

## Example

```python
from pathlib import Path

import pandas as pd
import polars as pl

from framelink.core import FramelinkPipeline, FramelinkSettings
from framelink.storage.core import PickleStorage, NoStorage

settings = FramelinkSettings(
    default_storage=PickleStorage(Path(__file__).parent / "data")
)

pipeline = FramelinkPipeline(settings=settings)


@pipeline.model()
def src_frame_1(_: FramelinkPipeline) -> pd.DataFrame:
    return pd.DataFrame(data={
        "name": ["amy", "peter"],
        "age": [31, 12],
    })


@pipeline.model(storage=NoStorage())
def src_frame_2(_: FramelinkPipeline) -> pd.DataFrame:
    return pd.DataFrame(data={
        "name": ["amy", "peter", "helen"],
        "fave_food": ["oranges", "chocolate", "water"],
    })


@pipeline.model()
def merge_model(ctx: FramelinkPipeline) -> pl.DataFrame:
    res_1 = ctx.ref(src_frame_1)
    res_2 = ctx.ref(src_frame_2)
    key = "name"
    ctx.log.info(f"Merging both sources on {key}")
    return pl.from_pandas(res_1).join(pl.from_pandas(res_2), on=key)


# build with implicit context
r_1 = pipeline.build(merge_model)
print(r_1)
# shape: (2, 3)
# ┌───────┬─────┬───────────┐
# │ name  ┆ age ┆ fave_food │
# │ ---   ┆ --- ┆ ---       │
# │ str   ┆ i64 ┆ str       │
# ╞═══════╪═════╪═══════════╡
# │ amy   ┆ 31  ┆ oranges   │
# │ peter ┆ 12  ┆ chocolate │
# └───────┴─────┴───────────┘

print(merge_model.upstreams)
# {<src_frame_2 at 0x1477c2c90>, <src_frame_1 at 0x144f0ab50>}

print(src_frame_1.downstreams)
# {<merge_model at 0x1477c2910>}

print(pipeline.model_names)
# ['merge_model', 'src_frame_1', 'src_frame_2']

print(list(pipeline.topological_sorted_nodes()))
# [(<src_frame_1 at 0x144f0ab50>, <src_frame_2 at 0x1477c2c90>), (<merge_model at 0x1477c2910>,)]

# if you have the graphing options engaged.
pipeline.graph_plt()  # will draw you a matplotlib of the DAG
dot = pipeline.graph_dot()  # will provide a DOT language representation of the DAG
```

## Feature Roadmap

This could change...

### v0.2.0

- [x] Model links & DAG implemented
- [x] Context logger available
- [x] Diagramming and tracking of the model DAG

### v0.3.0

- [ ] Cleaner graph results
- [ ] Merging of multiple framelink pipelines enabling
- [ ] Orchestration passthrough and local execution.
- [x] Caches and auto-persistence
- [x] Dynamic sourcing for models
- [ ] model overrides for CLI and python runtimes.
- [x] Cli to run a project

### v0.4.0

- [ ] SQL models & dbt, sqlmesh compatability
- [ ] Open Tracing integration

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "framelink",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": "",
    "keywords": "data,DAG,orchastration,dataframe",
    "author": "",
    "author_email": "Toby Devlin <toby@tobydevlin.com>",
    "download_url": "https://files.pythonhosted.org/packages/29/1e/4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341/framelink-0.2.2.tar.gz",
    "platform": null,
    "description": "[![Version](https://img.shields.io/pypi/v/framelink)](https://pypi.org/project/framelink/)\n[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/gittoby/framelink/lint_test_build.yml)](https://github.com/GitToby/framelink)\n[![GitHub Release Date](https://img.shields.io/github/release-date/GitToby/framelink)](https://github.com/GitToby/framelink)\n[![codecov](https://codecov.io/gh/GitToby/framelink/branch/master/graph/badge.svg?token=Uh8viFPyOG)](https://codecov.io/gh/GitToby/framelink)\n[![PyPi downloads](https://img.shields.io/pypi/dm/framelink)](https://pypi.org/project/framelink/)\n\nFramelink is a simple wrapper thats designed to provide context into pandas, polars and other Dataframe engines. See\nroadmap below for future of the project.\n\n**This project is still in prerelease, consider the API unstable. Any usage should be pinned.**\n\n```bash\npip install framelink\n```\n\n## Goals\n\nFramelink should provide a way for collaborating teams to write python or SQL models to see their data flow easily and get the a whole load of stuff for free!\n\n- **Simple to write** - writing models should be no harder than a function implementation but provide a dependency tree,\n  schemas & model metadata.\n- **Simple to run** - writing models should be agnostic of running models, once the models are written execution\n  wrappers with diagnostics, tracing & lineage should be easy to derive for the execution platform any team is running without having any special requirements for running locally.\n- **Scheduler agnostic** - we are not making a new airflow, dagster etc. Framelink serves to add metadata to a project\n  for free.\n\n## Concepts\n\n- A **Pipeline** is a DAG of _models_ that can be executed in a particular way.\n- A **Model** is a definition of sourcing data and, potentially, a transform. It's an ETL in its most basic form.\n- A **Frame** is a result of a _model_ run.\n\n## Features\n\n- [x] Model links & DAG + diagramming\n- [x] Context logging per model\n- [x] Diagramming and tracking of the model DAG\n- [x] Caches and auto-persistence\n- [ ] Dynamic sourcing for models\n- [x] Cli to run a project\n- [ ] Transpiler for popular DAG execution environments\n\n## Example\n\n```python\nfrom pathlib import Path\n\nimport pandas as pd\nimport polars as pl\n\nfrom framelink.core import FramelinkPipeline, FramelinkSettings\nfrom framelink.storage.core import PickleStorage, NoStorage\n\nsettings = FramelinkSettings(\n    default_storage=PickleStorage(Path(__file__).parent / \"data\")\n)\n\npipeline = FramelinkPipeline(settings=settings)\n\n\n@pipeline.model()\ndef src_frame_1(_: FramelinkPipeline) -> pd.DataFrame:\n    return pd.DataFrame(data={\n        \"name\": [\"amy\", \"peter\"],\n        \"age\": [31, 12],\n    })\n\n\n@pipeline.model(storage=NoStorage())\ndef src_frame_2(_: FramelinkPipeline) -> pd.DataFrame:\n    return pd.DataFrame(data={\n        \"name\": [\"amy\", \"peter\", \"helen\"],\n        \"fave_food\": [\"oranges\", \"chocolate\", \"water\"],\n    })\n\n\n@pipeline.model()\ndef merge_model(ctx: FramelinkPipeline) -> pl.DataFrame:\n    res_1 = ctx.ref(src_frame_1)\n    res_2 = ctx.ref(src_frame_2)\n    key = \"name\"\n    ctx.log.info(f\"Merging both sources on {key}\")\n    return pl.from_pandas(res_1).join(pl.from_pandas(res_2), on=key)\n\n\n# build with implicit context\nr_1 = pipeline.build(merge_model)\nprint(r_1)\n# shape: (2, 3)\n# \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n# \u2502 name  \u2506 age \u2506 fave_food \u2502\n# \u2502 ---   \u2506 --- \u2506 ---       \u2502\n# \u2502 str   \u2506 i64 \u2506 str       \u2502\n# \u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561\n# \u2502 amy   \u2506 31  \u2506 oranges   \u2502\n# \u2502 peter \u2506 12  \u2506 chocolate \u2502\n# \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n\nprint(merge_model.upstreams)\n# {<src_frame_2 at 0x1477c2c90>, <src_frame_1 at 0x144f0ab50>}\n\nprint(src_frame_1.downstreams)\n# {<merge_model at 0x1477c2910>}\n\nprint(pipeline.model_names)\n# ['merge_model', 'src_frame_1', 'src_frame_2']\n\nprint(list(pipeline.topological_sorted_nodes()))\n# [(<src_frame_1 at 0x144f0ab50>, <src_frame_2 at 0x1477c2c90>), (<merge_model at 0x1477c2910>,)]\n\n# if you have the graphing options engaged.\npipeline.graph_plt()  # will draw you a matplotlib of the DAG\ndot = pipeline.graph_dot()  # will provide a DOT language representation of the DAG\n```\n\n## Feature Roadmap\n\nThis could change...\n\n### v0.2.0\n\n- [x] Model links & DAG implemented\n- [x] Context logger available\n- [x] Diagramming and tracking of the model DAG\n\n### v0.3.0\n\n- [ ] Cleaner graph results\n- [ ] Merging of multiple framelink pipelines enabling\n- [ ] Orchestration passthrough and local execution.\n- [x] Caches and auto-persistence\n- [x] Dynamic sourcing for models\n- [ ] model overrides for CLI and python runtimes.\n- [x] Cli to run a project\n\n### v0.4.0\n\n- [ ] SQL models & dbt, sqlmesh compatability\n- [ ] Open Tracing integration\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "",
    "version": "0.2.2",
    "project_urls": {
        "github": "https://github.com/GitToby/framelink"
    },
    "split_keywords": [
        "data",
        "dag",
        "orchastration",
        "dataframe"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "645f0a8fe3aea2797ff1c6aa25e381aa9e152be5c42b9f791b32b2b6cd6530b5",
                "md5": "0827f027585a7bf300491074a5369bdf",
                "sha256": "23cf8e3ccd0275d580046fcc140e76b4b95b1742de848ee36cc50263ef53e046"
            },
            "downloads": -1,
            "filename": "framelink-0.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0827f027585a7bf300491074a5369bdf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8",
            "size": 12165,
            "upload_time": "2023-05-06T06:12:02",
            "upload_time_iso_8601": "2023-05-06T06:12:02.524722Z",
            "url": "https://files.pythonhosted.org/packages/64/5f/0a8fe3aea2797ff1c6aa25e381aa9e152be5c42b9f791b32b2b6cd6530b5/framelink-0.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "291e4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341",
                "md5": "59240e9ebd58b0f93e68ce1fcef086e7",
                "sha256": "4b34199bc101c7b6691a23341e153d3690de4ae5050960bbaa2c3bd9a4e68f46"
            },
            "downloads": -1,
            "filename": "framelink-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "59240e9ebd58b0f93e68ce1fcef086e7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 157975,
            "upload_time": "2023-05-06T06:12:05",
            "upload_time_iso_8601": "2023-05-06T06:12:05.243725Z",
            "url": "https://files.pythonhosted.org/packages/29/1e/4aa978c6ea05b0ff31384da773b9396aea843776c5b5477c6745d83f9341/framelink-0.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-06 06:12:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "GitToby",
    "github_project": "framelink",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "framelink"
}
        
Elapsed time: 0.06301s