blueno


- Name: blueno
- Version: 0.1.21
- Summary: A Python ETL library for creating declarative data pipelines.
- Upload time: 2025-07-14 17:38:49
- Author: Jimmy Jensen
- Requires Python: >=3.10
- Keywords: etl, deltalake, polars, delta-rs, fabric, microsoft, azure
            
![blueno logo](./assets/images/blueno-128x128.png)
 
A platform-agnostic Python library for writing declarative data pipelines the ***bueno*** way

# blueno
A collection of Python ETL utilities for building data pipelines, with built-in orchestration.

It focuses mainly on data engineering tasks using [Polars](https://github.com/pola-rs/polars) and [delta-rs](https://github.com/delta-io/delta-rs).

It features **blueprints** as the central orchestration unit, and is inspired by the likes of dbt [models](https://docs.getdbt.com/docs/build/models), SQLMesh [models](https://sqlmesh.readthedocs.io/en/stable/concepts/models/sql_models/) and Dagster's [software defined assets](https://dagster.io/glossary/software-defined-assets).

While it also has features for running on Microsoft Fabric, its utilities are decoupled so that you can choose your own storage and compute.

## Demo of sequential run

![blueno-demo-concurrency-1](./docs/assets/blueno-demo-concurrency-1.gif)

## Demo of parallel run

![blueno-demo-concurrency-2](./docs/assets/blueno-demo-concurrency-2.gif)

## Table of contents

- [Table of contents](#table-of-contents)
- [Installation](#installation)
- [Features](#features)
  - [Local development first](#local-development-first)
  - [Remote code execution](#remote-code-execution)
  - [ETL helper functions](#etl-helper-functions)
  - [Blueprints](#blueprints)
- [Documentation](#documentation)
- [Contributing](#contributing)

## Installation
Regular installation:
```bash
pip install blueno
```

For Microsoft Fabric or Azure:
```bash
pip install "blueno[azure]"
```

## Features

### Local development first
blueno aims to provide a local development environment first. You can develop, run code and store data locally as part of your development cycle. You can also read from and write to Azure Data Lake or Microsoft Fabric lakehouses.

### Remote code execution
While local development is favored, sometimes we need to run our data pipelines in a real setting. The library lets you run code directly in Microsoft Fabric.

### ETL helper functions
- Read from delta tables or parquet files with automatic authentication to Azure Data Lake or OneLake
- Common transformations (add audit columns, reorder columns, deduplicate, etc.)
- Load delta tables with one of the provided load methods (upsert, overwrite, append, etc.)

### Blueprints
Blueprints let you declaratively specify the entities in your data pipelines.

Configuration such as storage location and write behaviour is set in the decorator, while the function itself is only concerned with business logic.

```python
import polars as pl

from blueno import Blueprint, blueprint

# Fill in your own Fabric workspace and lakehouse names
workspace_name = ...
lakehouse_name = ...

lakehouse_base_uri = f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/Tables"

@blueprint(
    table_uri=f"{lakehouse_base_uri}/silver/customer",
    primary_keys=["customer_id"],
    write_mode="overwrite",
)
def silver_customer(self: Blueprint, bronze_customer: pl.DataFrame) -> pl.DataFrame:
    # Deduplicate customers on the primary keys declared in the decorator
    df = bronze_customer.unique(subset=self.primary_keys)

    return df
```

Because `bronze_customer` is a parameter of `silver_customer`, it is treated as a dependency. Provided you have also defined a blueprint named `bronze_customer`, the blueprints are wired together automatically and executed in the correct order.
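
To illustrate the idea of wiring by parameter name, here is a minimal standard-library sketch. It is not blueno's implementation: the `task` decorator, `REGISTRY` and `run_all` are hypothetical names, plain lists of dicts stand in for DataFrames, and the `self` argument is left out.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical registry mapping task name -> function (not part of blueno)
REGISTRY: dict[str, Callable] = {}


def task(fn: Callable) -> Callable:
    """Register a function under its own name."""
    REGISTRY[fn.__name__] = fn
    return fn


@task
def bronze_customer() -> list[dict]:
    return [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}]


@task
def silver_customer(bronze_customer: list[dict]) -> list[dict]:
    # Deduplicate on customer_id, keeping the first occurrence
    seen: dict[int, dict] = {}
    for row in bronze_customer:
        seen.setdefault(row["customer_id"], row)
    return list(seen.values())


def run_all() -> dict[str, object]:
    # Each parameter name is treated as the name of an upstream task
    graph = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in REGISTRY.items()
    }
    results: dict[str, object] = {}
    # static_order() yields every task after all of its dependencies
    for name in TopologicalSorter(graph).static_order():
        kwargs = {dep: results[dep] for dep in graph[name]}
        results[name] = REGISTRY[name](**kwargs)
    return results


results = run_all()
```

Here `silver_customer` only runs after `bronze_customer` has produced its output, regardless of the order in which the two functions were defined.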

See the documentation for a more elaborate example.

## Documentation
For a quick start, detailed documentation, examples, and the API reference, visit our [GitHub Pages documentation](https://mrjsj.github.io/blueno/).

## Contributing
Contributions are welcome! Here are some ways you can contribute:

- Report bugs and feature requests through GitHub issues
- Submit pull requests for bug fixes or new features
- Improve documentation
- Share ideas for new utilities


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "blueno",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "etl, deltalake, polars, delta-rs, fabric, Microsoft, azure",
    "author": "Jimmy Jensen",
    "author_email": null,
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python ETL library for creating declarative data pipelines.",
    "version": "0.1.21",
    "project_urls": {
        "Repository": "https://github.com/mrjsj/blueno"
    },
    "split_keywords": [
        "etl",
        " deltalake",
        " polars",
        " delta-rs",
        " fabric",
        " microsoft",
        " azure"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "51100d4683ac99da086283f527ee6bd027c1af7adaa585b7e2b34d3e83befec7",
                "md5": "72f298875a853c9ca0d5b8602ab99d1a",
                "sha256": "763d670dd886138521613ea8733daa1629eb0f47a69fda2726fd15ea4ce2f16e"
            },
            "downloads": -1,
            "filename": "blueno-0.1.21-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "72f298875a853c9ca0d5b8602ab99d1a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 48809,
            "upload_time": "2025-07-14T17:38:49",
            "upload_time_iso_8601": "2025-07-14T17:38:49.543500Z",
            "url": "https://files.pythonhosted.org/packages/51/10/0d4683ac99da086283f527ee6bd027c1af7adaa585b7e2b34d3e83befec7/blueno-0.1.21-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-14 17:38:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mrjsj",
    "github_project": "blueno",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "blueno"
}
        