# <img src=https://raw.githubusercontent.com/RLado/Canonada/refs/heads/master/logo.svg height=27> Canonada
Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python.
## Why Canonada?
- **Standardized**: Canonada provides a standardized way to build your data projects
- **Modular**: Canonada is modular and allows you to build and visualize data pipelines with ease
- **Memory Efficient**: Canonada is memory efficient and can handle large datasets by streaming data through the pipeline instead of loading it all at once
## Features
- **Centralized control of data sources**: Manage all your data sources in one place, enabling you to keep your team in sync
- **Centralized control of the project configuration**: Manage all your project configurations in one place
- **Easy dataloading**: Load data from various sources like CSV, JSON, Parquet, etc.
- **Use functions as nodes**: Functions are the building blocks of Canonada. You can use any function as a node in your pipeline
- **Create streaming data pipelines**: Create parallel and sequential data pipelines with ease
- **Visualize your data pipeline**: Visualize your data pipelines, nodes and connections
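To illustrate the "functions as nodes" idea, here is a minimal sketch of what node functions could look like. The function names, signatures, and the assumption that a signal is a list of floats are all hypothetical and chosen for illustration; the actual data shape depends on the datahandler configured in your catalog, and the exact calling and return conventions are described in the project documentation.

```python
from statistics import fmean

# Hypothetical node functions: ordinary Python callables with no framework imports.
# Assumption: each signal arrives as a list of floats.


def add_offset(signal: list[float], offset: float) -> list[float]:
    """Shift every sample of the signal by a constant offset."""
    return [sample + offset for sample in signal]


def get_signal_max(signal: list[float]) -> float:
    """Return the largest sample of the signal."""
    return max(signal)


def calculate_mean(signal: list[float]) -> float:
    """Return the arithmetic mean of the signal."""
    return fmean(signal)


if __name__ == "__main__":
    shifted = add_offset([1.0, 2.0, 3.0], 0.5)
    print(shifted, get_signal_max(shifted), calculate_mean(shifted))
```

Because the functions know nothing about the framework, they can be unit tested on their own before being wired into a pipeline.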
## Summary
The goal of Canonada is to help data scientists and engineers organize their data projects with a standardized structure that is easier to maintain than one-off scripts and notebooks.
Canonada lets you define data projects as graphs of nodes and edges. Data streams from your defined sources into memory as it is needed, so you can work with datasets larger than memory. The system parallelizes the execution of your projects, letting you focus on the data processing logic you care about.
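The streaming idea can be sketched in plain Python with generators. This is only an illustration of the general technique, not Canonada's internals; `stream_rows` and `running_max` are hypothetical helper names, and the in-memory CSV stands in for a file on disk.

```python
import csv
import io
from collections.abc import Iterable, Iterator


def stream_rows(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one CSV row at a time instead of materializing the whole file."""
    yield from csv.DictReader(lines)


def running_max(rows: Iterable[dict[str, str]], column: str) -> float:
    """Consume rows one by one, keeping only the current maximum in memory."""
    best = float("-inf")
    for row in rows:
        best = max(best, float(row[column]))
    return best


if __name__ == "__main__":
    # A small in-memory stand-in for a large file on disk.
    data = io.StringIO("t,value\n0,1.5\n1,4.0\n2,2.5\n")
    print(running_max(stream_rows(data), "value"))  # prints 4.0
```

Only one row is held in memory at a time, so the same code works whether the source has three rows or three billion.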
**Let's quickly define a data pipeline as an example:**
We will define a simple pipeline that transforms a few time series signals:
<img src="https://github.com/user-attachments/assets/b773f613-4f86-4de2-95c3-b9dceacb58fd" width="800" />
> Use `canonada view` to get a representation of your data pipelines
```python
# Pipeline and Node come from the canonada package
# (see the Getting Started guide for the exact import path)

# Import example functions to transform the data
from .nodes import example_nodes

# Define the pipeline
streaming_pipe = Pipeline("streaming_pipe", [
        # Read each signal from the catalog and add an offset defined in the parameters
        Node(
            func=example_nodes.add_offset,
            input=["raw_signals", "params:section_1.offset"],  # Load inputs from the catalog
            output=["offset_signals"],
            name="create_offsets",
            description="Adds parametrized offset to the signals"
        ),
        # Save the previous output to disk with a dummy module
        Node(
            func=lambda x: x,  # Just pass the input to the output
            input=["offset_signals"],
            output=["offset_signals_catalog"],
            name="save_offsets",
            description="Saves the offset signals using the datahandler specified in the catalog"
        ),
        # Calculate the maximum value of each signal
        Node(
            func=example_nodes.get_signal_max,
            input=["offset_signals"],
            output=["max_values"],
            name="get_signal_max",
            description="Calculates the maximum value of the signals"
        ),
        # Calculate the mean value of each signal
        Node(
            func=example_nodes.calculate_mean,
            input=["offset_signals"],
            output=["mean_values"],
            name="calculate_mean",
            description="Calculates the mean value of the signals"
        ),
        # Save the stats of the signals in a CSV file
        Node(
            func=example_nodes.list_stats,
            input=["offset_signals", "max_values", "mean_values"],
            output=["stats"],  # It will be saved in the defined file in the catalog
            name="list_stats",
            description="Returns the stats of the signals"
        )
    ],
    description="This pipeline reads signals from the catalog, adds an offset, calculates the maximum and mean values, and saves the stats to disk"
)
```
**Done!** Defining a data pipeline is as simple as that. To execute it, run `canonada run pipelines streaming_pipe` in your terminal or call the `.run()` method of your pipeline object. Canonada will take care of the rest and parallelize the execution without any extra effort.
> Check out the [Getting Started](https://github.com/RLado/Canonada/wiki/GettingStarted) guide for more information.
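The `input` and `output` names in the pipeline definition above are what tie the nodes into a graph. As a rough sketch of that idea, the snippet below derives the dependencies from those names and groups the nodes into stages using the standard library's `graphlib`. This is not Canonada's scheduler; `NODES` and `execution_stages` are hypothetical names, and the node table is copied by hand from the example.

```python
from graphlib import TopologicalSorter

# Node name -> (inputs, outputs), copied from the example pipeline definition.
NODES = {
    "create_offsets": (["raw_signals", "params:section_1.offset"], ["offset_signals"]),
    "save_offsets": (["offset_signals"], ["offset_signals_catalog"]),
    "get_signal_max": (["offset_signals"], ["max_values"]),
    "calculate_mean": (["offset_signals"], ["mean_values"]),
    "list_stats": (["offset_signals", "max_values", "mean_values"], ["stats"]),
}


def execution_stages(nodes: dict[str, tuple[list[str], list[str]]]) -> list[list[str]]:
    """Group nodes into stages; nodes in the same stage do not depend on each other."""
    # Map each produced dataset name to the node that produces it.
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # A node depends on the producers of its inputs; inputs without a
    # producer (catalog entries, parameters) are external and add no edge.
    graph = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for index, stage in enumerate(execution_stages(NODES), start=1):
        print(index, stage)
```

For the example pipeline this yields three stages: `create_offsets` first, then `save_offsets`, `get_signal_max`, and `calculate_mean` (which are independent of one another), and finally `list_stats`.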
## Usage
Available commands:
```
Usage: canonada <command> <args>
Commands:
    new <project_name>                  - Create a new project
    catalog [list/params]               - List all available datasets or get the project parameters
    registry [pipelines/systems]        - List all available pipelines or systems
    run [pipelines/systems] <name(s)>   - Run a pipeline or system
    view [pipelines/systems] <name(s)>  - View a pipeline or system
    version                             - Print the version of Canonada
```
## Installation
Canonada is available on [PyPI](https://pypi.org/project/canonada/) and can be installed using pip:
```bash
pip install canonada
```
> Check out the [Getting Started](https://github.com/RLado/Canonada/wiki/GettingStarted) guide to learn how to create a new project with Canonada.
## Documentation
Check out the project's documentation in the [wiki](https://github.com/RLado/Canonada/wiki).
## Contributing
Contributions are welcome! If you have any suggestions, examples, datahandlers, bug reports, or feature requests, please open an issue or a discussion thread.
Raw data
{
"_id": null,
"home_page": null,
"name": "canonada",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "data science, streaming, pipeline, dataflow, canonada",
"author": null,
"author_email": "Ricard Lado <ricard@lado.one>",
"download_url": "https://files.pythonhosted.org/packages/23/a6/cfec7d01da51cdbff6bf2c8e3ae978a343b53c5193dc32b690168980f041/canonada-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python.",
"version": "0.4.0",
"project_urls": {
"Homepage": "https://github.com/RLado/Canonada",
"Issues": "https://github.com/RLado/Canonada/issues"
},
"split_keywords": [
"data science",
" streaming",
" pipeline",
" dataflow",
" canonada"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "860c1ee3e150f0b9284f2ff8f9381205b5196c9806f8e28f17a02ca99b9fafe6",
"md5": "0fcb86870fade260459148f679879156",
"sha256": "637a8b373f67f35e47495d5c396524d07bb10ff1ab131f2e9c6e6b77adc77f83"
},
"downloads": -1,
"filename": "canonada-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0fcb86870fade260459148f679879156",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 24667,
"upload_time": "2025-09-05T08:17:20",
"upload_time_iso_8601": "2025-09-05T08:17:20.339928Z",
"url": "https://files.pythonhosted.org/packages/86/0c/1ee3e150f0b9284f2ff8f9381205b5196c9806f8e28f17a02ca99b9fafe6/canonada-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "23a6cfec7d01da51cdbff6bf2c8e3ae978a343b53c5193dc32b690168980f041",
"md5": "1264fbafad94c2352aed14c7a2fcf33e",
"sha256": "283fc16831936cb7ce13923152253a0764558a6d68872fbec4bbaf4e2b30216c"
},
"downloads": -1,
"filename": "canonada-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "1264fbafad94c2352aed14c7a2fcf33e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 25156,
"upload_time": "2025-09-05T08:17:21",
"upload_time_iso_8601": "2025-09-05T08:17:21.849498Z",
"url": "https://files.pythonhosted.org/packages/23/a6/cfec7d01da51cdbff6bf2c8e3ae978a343b53c5193dc32b690168980f041/canonada-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-05 08:17:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "RLado",
"github_project": "Canonada",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "build",
"specs": [
[
">=",
"1.2.2"
]
]
},
{
"name": "coverage",
"specs": [
[
">=",
"7.0.0"
]
]
},
{
"name": "setuptools",
"specs": [
[
">=",
"61.0"
]
]
},
{
"name": "twine",
"specs": [
[
">=",
"6.1.0"
]
]
},
{
"name": "mypy",
"specs": [
[
">=",
"1.15.0"
]
]
},
{
"name": "graphviz",
"specs": [
[
"==",
"0.20.3"
]
]
}
],
"lcname": "canonada"
}