simple-dag


- Name: simple-dag
- Version: 1.0.4
- Summary: Create simple Pipelines with Python
- Upload time: 2024-08-12 08:04:43
- License: MIT License Copyright (c) 2023, Tim Rohner Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Keywords: simple_dag
- Requirements: No requirements were recorded.

# 🎯 simple_dag

![pypi](https://img.shields.io/pypi/v/simple_dag.svg)
[![Documentation Status](https://readthedocs.org/projects/simple-pipeline/badge/?version=latest)](https://simple-pipeline.readthedocs.io/en/latest/?version=latest)
[![Updates](https://pyup.io/repos/github/leokster/simple_dag/shield.svg)](https://pyup.io/repos/github/leokster/simple_dag/)

Welcome to `simple_dag`! Here, we provide the easiest way to create a pipeline in an orchestration-agnostic manner. Just decorate your functions with our `@transform` decorator! 🎉

- Free software: MIT license

## 🚀 Getting Started

![DAG](https://raw.githubusercontent.com/leokster/simple_dag/main/assets/dagster.png)

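```
# Get the repository, which contains the example DAG
git clone https://github.com/leokster/simple_dag.git
cd simple_dag
# Create and activate a virtual environment
python3.10 -m venv venv
source venv/bin/activate
# Install the library
pip install simple_dag
# Open the example DAG in Dagit, Dagster's web UI
venv/bin/dagit -f examples/dag.py
```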

## 💡 The Main Ideas

### What is a DAG? 🤔
A DAG, or Directed Acyclic Graph, represents a set of functions (the nodes) and their dependencies (the edges). It allows us to execute many functions, which depend on each other, in a specific order.
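
To make the ordering idea concrete, here is a minimal sketch that uses only Python's standard-library `graphlib` module, not `simple_dag` itself. The step names are hypothetical; each key lists the steps it depends on, and the sorter returns an order in which every step comes after its dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value is the set of
# steps that must finish before it can run.
dependencies = {
    "load_raw": set(),
    "clean": {"load_raw"},
    "filter_es": {"clean"},
    "report": {"clean", "filter_es"},
}

# static_order() yields every step only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

If the dependencies contained a cycle, `TopologicalSorter` would raise `graphlib.CycleError`, which is why the graph has to be acyclic.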

### Aren't there already many DAG libraries?
Absolutely, but most of them are tightly coupled to specific orchestration frameworks and require a very specific way to define a DAG. This makes it challenging to switch between frameworks. Our library, however, is different! 🎈

### What is the goal of this library?
Our library aims to offer a simple and streamlined way to define a DAG in a framework-agnostic manner. This means you can switch between frameworks without having to rewrite your DAG. As of now, we support Dagster and direct execution. 🎯
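
The sketch below illustrates what "framework-agnostic" can mean in practice. It is not how `simple_dag` is implemented: the `Step` class and the `infer_dependencies` and `run_directly` helpers are hypothetical, and it assumes dependencies can be derived from the file paths each step reads and writes. The point is that the step descriptions carry no orchestrator-specific code, so a different backend could consume the same list.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable


@dataclass(frozen=True)
class Step:
    """Hypothetical framework-neutral description of one pipeline step."""
    name: str
    func: Callable[[], None]
    reads: tuple[str, ...] = ()
    writes: tuple[str, ...] = ()


def infer_dependencies(steps):
    """A step depends on every step that writes a path it reads."""
    producer = {path: s.name for s in steps for path in s.writes}
    return {s.name: {producer[p] for p in s.reads if p in producer} for s in steps}


def run_directly(steps):
    """One possible backend: call each step in-process, in dependency order."""
    by_name = {s.name: s for s in steps}
    order = list(TopologicalSorter(infer_dependencies(steps)).static_order())
    for name in order:
        by_name[name].func()
    return order


# Steps are declared in arbitrary order; the runner sorts them.
log = []
steps = [
    Step("report", lambda: log.append("report"), reads=("b.csv",)),
    Step("filter", lambda: log.append("filter"), reads=("a.csv",), writes=("b.csv",)),
    Step("load", lambda: log.append("load"), writes=("a.csv",)),
]
order = run_directly(steps)
print(order)
```

A second backend would translate the same `steps` list into an orchestrator's own job definition instead of calling the functions directly.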

### What is a transform?
In the context of a data pipeline, a transform is a function that takes some data as input and produces some new data as output. It's like the magic wand in your data pipeline. 🪄

### Show me some code! 👩‍💻
Imagine we have a transformation where we read a CSV file, filter the data, and write it to a new CSV file. The `@transform` decorator marks a function as a transformation function. `PandasDFInput` reads the source CSV and hands the data to the function as `df`, while `PandasDFOutput` is passed in as `output` and writes the post-transformation data.

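```
import os
from simple_dag import transform, PandasDFInput, PandasDFOutput

@transform(
    # Input: the CSV is read and passed to the function as a pandas dataframe.
    df=PandasDFInput(
        os.path.join("data/curated/ds_salaries_2023.csv"),
    ),
    # Output: the object itself is passed to the function.
    output=PandasDFOutput(
        os.path.join("data/curated/ds_salaries_2023_ES.csv"),
    ),
)
def create_2023_salaries_ES(df, output: PandasDFOutput):
    # Keep only the rows for companies located in Spain.
    df = df[df["company_location"] == "ES"]
    # Write the filtered dataframe through the output object.
    output.write_data(df, index=False)
```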

### `@transform`
This decorator indicates that a function is a transformation. It accepts `Input` and `Output` arguments. Note that `Output` arguments are passed to your function unchanged, while each `Input` argument is first processed by its `Input` class, and the resulting data is then passed to your function.
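
The difference between the two kinds of arguments can be shown with a toy decorator. This is an illustrative sketch only, not the `simple_dag` source: `ToyInput`, `ToyOutput`, and `toy_transform` are hypothetical names, and an in-memory dictionary stands in for files on disk.

```python
import functools


class ToyInput:
    """Hypothetical input: load() returns the data the function receives."""
    def __init__(self, store, key):
        self.store, self.key = store, key

    def load(self):
        return self.store[self.key]


class ToyOutput:
    """Hypothetical output: passed to the function as-is; it calls write_data()."""
    def __init__(self, store, key):
        self.store, self.key = store, key

    def write_data(self, data):
        self.store[self.key] = data


def toy_transform(**bindings):
    """Toy decorator: resolve inputs to data, pass outputs through untouched."""
    def decorator(func):
        @functools.wraps(func)
        def run():
            kwargs = {
                name: (binding.load() if isinstance(binding, ToyInput) else binding)
                for name, binding in bindings.items()
            }
            return func(**kwargs)
        return run
    return decorator


# In-memory stand-in for the data directory.
store = {"salaries": [("ES", 50), ("DE", 60), ("ES", 55)]}


@toy_transform(
    rows=ToyInput(store, "salaries"),
    output=ToyOutput(store, "salaries_es"),
)
def keep_spain(rows, output):
    # rows is already plain data; output is still the ToyOutput object.
    output.write_data([row for row in rows if row[0] == "ES"])


keep_spain()
print(store["salaries_es"])
```

Because the wrapped function only sees loaded data and an object with `write_data`, the function body stays free of any file-handling or orchestration code.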

### `Input`
Inputs prepare the data for your function. Currently, we support the following inputs:
- `PandasDFInput`: Reads a pandas dataframe from a CSV file. The function receives this data as a pandas dataframe.
- `BinaryInput`: Reads a binary file. The function receives this data as a bytes object.
- `SparkDFInput`: Reads a Spark dataframe from a parquet file (Experimental). The function receives this data as a Spark dataframe.

### `Output`
Outputs write the data after your function has processed it. The `Output` objects have a `write_data` method, which can be used in your function to write the data. Currently, we support the following outputs:
- `PandasDFOutput`: Writes a pandas dataframe to a CSV file.
- `BinaryOutput`: Writes a binary file.
- `SparkDFOutput`: Writes a Spark dataframe to a parquet file (Experimental).

            

        