Name | sf-hamilton JSON |
Version |
1.86.1
JSON |
| download |
home_page | None |
Summary | Hamilton, the micro-framework for creating dataframes. |
upload_time | 2025-01-04 05:28:54 |
maintainer | None |
docs_url | None |
author | None |
requires_python | <4,>=3.8 |
license | BSD-3-Clause |
keywords |
hamilton
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
<div align="center">
<h1><img src="https://github.com/DAGWorks-Inc/hamilton/assets/2328071/feb6abaa-b6d5-4271-a320-0ae4a18d8aa7" width="50"/> Hamilton β portable & expressive <br> data transformation DAGs</h1>
<a href='https://hamilton.dagworks.io/en/latest/?badge=latest'>
<img src='https://readthedocs.org/projects/hamilton/badge/?version=latest' alt='Documentation Status' />
</a><a href="https://www.python.org/downloads/" target="_blank">
<img src="https://img.shields.io/badge/python-3.8|%203.9|%203.10|%203.11|%203.12-blue.svg" alt="Python supported"/>
</a>
<a href="https://pypi.org/project/sf-hamilton/" target="_blank">
<img src="https://badge.fury.io/py/sf-hamilton.svg" alt="PyPi Version"/>
</a>
<a href="https://pepy.tech/project/sf-hamilton" target="_blank">
<img src="https://pepy.tech/badge/sf-hamilton" alt="Total Downloads"/>
</a>
<a href="https://pepy.tech/project/sf-hamilton" target="_blank">
<img src="https://static.pepy.tech/badge/sf-hamilton/month" alt="Total Monthly Downloads"/>
</a>
<br/>
<a target="_blank" href="https://linkedin.com/showcase/dagster" style="background:none">
<img src="https://img.shields.io/badge/DAGWorks-Follow-purple.svg?logo=linkedin" />
</a>
<a href="https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g" target="_blank">
<img src="https://img.shields.io/badge/Hamilton-Join-purple.svg?logo=slack" alt="Hamilton Slack"/>
</a>
<a href="https://twitter.com/hamilton_os" target="_blank">
<img src="https://img.shields.io/badge/HamiltonOS-Follow-purple.svg?logo=X"/>
</a>
<a href="https://twitter.com/dagworks" target="_blank">
<img src="https://img.shields.io/badge/DAGWorks-Follow-purple.svg?logo=X"/>
</a>
</div>
<br></br>
Hamilton is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is **portable**; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is **expressive**; Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution).
To create a DAG, write regular Python functions that specify their dependencies with their parameters. As shown below, it results in readable code that can always be visualized. Hamilton loads that definition and automatically builds the DAG for you!
<div align="center">
<img src="./docs/_static/abc_highlight.png" alt="Create a project" width="65%"/>
</div>
<div align="center">
Functions <code>B()</code> and <code>C()</code> refer to function <code>A</code> via their parameters
</div>
<br>
Hamilton brings modularity and structure to any Python application moving data: ETL pipelines, ML workflows, LLM applications, RAG systems, BI dashboards, and the [Hamilton UI](https://hamilton.dagworks.io/en/latest/concepts/ui) allows you to automatically visualize, catalog, and monitor execution.
> Hamilton is great for DAGs, but if you need loops or conditional logic to create an LLM agent or a simulation, take a look at our sister library [Burr](https://github.com/dagworks-inc/burr) π€ .
# Installation
Hamilton supports Python 3.8+. We include the optional `visualization` dependency to display our Hamilton DAG. For visualizations, [Graphviz](https://graphviz.org/download/) needs to be installed on your system separately.
```bash
pip install "sf-hamilton[visualization]"
```
To use the Hamilton UI, install the `ui` and `sdk` dependencies.
```bash
pip install "sf-hamilton[ui,sdk]"
```
To try Hamilton in the browser, visit [www.tryhamilton.dev](https://www.tryhamilton.dev/?utm_source=README)
# Why use Hamilton?
Data teams write code to deliver business value, but few have the resources to standardize practices and provide quality assurance. Moving from proof-of-concept to production and cross-function collaboration (e.g., data science, engineering, ops) remain challenging for teams, big or small. Hamilton is designed to help throughout a project's lifecycle:
- **Separation of concerns**. Hamilton separates the DAG "definition" and "execution" which lets data scientists focus on solving problems and engineers manage production pipelines.
- **Effective collaboration**. The [Hamilton UI provides a shared interface](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/) for teams to inspect results and debug failures throughout the development cycle.
- **Low-friction dev to prod**. Use `@config.when()` to modify your DAG between execution environments instead of error-prone `if/else` feature flags. The notebook extension prevents the pain of migrating code from a notebook to a Python module.
- **Portable transformations**. Your DAG is [independent of infrastructure or orchestration](https://blog.dagworks.io/publish/posts/detail/145543927?referrer=%2Fpublish%2Fposts), meaning you can develop and debug locally and reuse code across contexts (local, Airflow, FastAPI, etc.).
- **Maintainable DAG definition**. Hamilton [automatically builds the DAG from a single line of code whether it has 10 or 1000 nodes](https://hamilton.dagworks.io/en/latest/concepts/driver/). It can also assemble multiple Python modules into a pipeline, encouraging modularity.
- **Expressive DAGs**. [Function modifiers](https://hamilton.dagworks.io/en/latest/concepts/function-modifiers/) are a unique feature to keep your code [DRY](https://en.wikipedia.org/wiki/Don't_repeat_yourself) and reduce the complexity of maintaining large DAGs. Other frameworks inevitably lead to code redundancy or bloated functions.
- **Built-in coding style**. The Hamilton DAG is [defined using Python functions](https://hamilton.dagworks.io/en/latest/concepts/node/), encouraging modular, easy-to-read, self-documenting, and unit testable code.
- **Data and schema validation**. Decorate functions with `@check_output` to validate output properties, and raise warnings or exceptions. Add the `SchemaValidator()` adapter to automatically inspect dataframe-like objects (pandas, polars, Ibis, etc.) to track and validate their schema.
- **Built for plugins**. Hamilton is designed to play nice with all tools and provides the right abstractions to create custom integrations with your stack. Our lively community will help you build what you need!
# Hamilton UI
You can track the execution of your Hamilton DAG in the [Hamilton UI](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/). It automatically populates a data catalog with lineage / tracing and provides execution observability to inspect results and debug errors. You can run it as a [local server](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/#local-mode) or a [self-hosted application using Docker](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/#docker-deployed-mode).
<p align="center">
<img src="./docs/_static/hamilton_1.jpeg" alt="Description1" width="30%" style="margin-right: 20px;"/>
<img src="./docs/_static/hamilton_2.jpeg" alt="Description2" width="30%" style="margin-right: 20px;"/>
<img src="./docs/_static/hamilton_3.jpeg" alt="Description3" width="30%"/>
</p>
<p align="center">
<em>DAG catalog, automatic dataset profiling, and execution tracking</em>
</p>
## Get started with the Hamilton UI
1. To use the Hamilton UI, install the dependencies (see `Installation` section) and start the server with
```bash
hamilton ui
```
2. On the first connection, create a `username` and a new project (the `project_id` should be `1`).
<div align="center">
<img src="./docs/_static/new_project.png" alt="Create a project" width="70%"/>
</div>
<br>
3. Track your Hamilton DAG by creating a `HamiltonTracker` object with your `username` and `project_id` and adding it to your `Builder`. Now, your DAG will appear in the UI's catalog and all executions will be tracked!
```python
from hamilton import driver
from hamilton_sdk.adapters import HamiltonTracker
import my_dag
# use your `username` and `project_id`
tracker = HamiltonTracker(
username="my_username",
project_id=1,
dag_name="hello_world",
)
# adding the tracker to the `Builder` will add the DAG to the catalog
dr = (
driver.Builder()
.with_modules(my_dag)
.with_adapters(tracker) # add your tracker here
.build()
)
# executing the `Driver` will track results
dr.execute(["C"])
```
# Documentation & learning resources
* π See the [official documentation](https://hamilton.dagworks.io/) to learn about the core concepts of Hamilton.
* π¨βπ« Consult the [examples on GitHub](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples) to learn about specific features or integrations with other frameworks.
* π° The [DAGWorks blog](https://blog.dagworks.io/) includes guides about how to build a data platform and narrative tutorials.
* πΊ Find video tutorials on the [DAGWorks YouTube channel](https://www.youtube.com/@DAGWorks-Inc)
* π£ Reach out via the [Hamilton Slack community](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g) for help and troubleshooting
# How does Hamilton compare to X?
Hamilton is not an orchestrator ([you might not need one](https://blog.dagworks.io/p/lean-data-automation-a-principal)), nor a feature store ([but you can use it to build one!](https://blog.dagworks.io/p/featurization-integrating-hamilton)). Its purpose is to help you structure and manage data transformations. If you know dbt, Hamilton does for Python what dbt does for SQL.
Another way to frame it is to think about the different layers of a data stack. Hamilton is at the **asset layer**. It helps you organize data transformations code (the **expression layer**), manage changes, and validate & test data.
<div align="center" style="width: 100%">
<table>
<colgroup>
<col style="width: 20%">
<col style="width: 40%">
<col style="width: 40%">
</colgroup>
<thead>
<tr>
<th>Layer</th>
<th>Purpose</th>
<th>Example Tools</th>
</tr>
</thead>
<tbody>
<tr>
<td>Orchestration</td>
<td>Operational system for the creation of assets</td>
<td>Airflow, Metaflow, Prefect, Dagster</td>
</tr>
<tr>
<td>Asset</td>
<td>Organize expressions into meaningful units <br> (e.g., dataset, ML model, table)</td>
<td><b>Hamilton</b>, dbt, dlt, SQLMesh, <a href="https://github.com/dagworks-inc/burr">Burr</a></td>
</tr>
<tr>
<td>Expression</td>
<td>Language to write data transformations</td>
<td>pandas, SQL, polars, Ibis, LangChain</td>
</tr>
<tr>
<td>Execution</td>
<td>Perform data transformations</td>
<td>Spark, Snowflake, DuckDB, RAPIDS</td>
</tr>
<tr>
<td>Data</td>
<td>Physical representation of data, inputs and outputs</td>
<td>S3, Postgres, file system, Snowflake</td>
</tr>
</tbody>
</table>
</div>
See our page on [Why use Hamilton?](https://hamilton.dagworks.io/en/latest/get-started/why-hamilton/) and framework [code comparisons](https://hamilton.dagworks.io/en/latest/code-comparisons/) for more information.
# π License
Hamilton is released under the BSD 3-Clause Clear License. See [LICENSE](https://github.com/DAGWorks-Inc/hamilton/blob/main/LICENSE.md) for details.
# π Community
## π¨βπ» Contributing
We're very supportive of changes by new contributors, big or small! Make sure to discuss potential changes by creating an issue or commenting on an existing one before opening a pull request. Good first contributions include creating an example or an integration with your favorite Python library!
To contribute, checkout our [contributing guidelines](https://github.com/DAGWorks-Inc/hamilton/blob/main/CONTRIBUTING.md), our [developer setup guide](https://github.com/DAGWorks-Inc/hamilton/blob/main/developer_setup.md), and our [Code of Conduct](https://github.com/DAGWorks-Inc/hamilton/blob/main/CODE_OF_CONDUCT.md).
## π Used by
Hamilton was started at Stitch Fix before the original creators founded DAGWorks Inc! The library is battle-tested and has been supporting production use cases since 2019.
>Read more about the [origin story](https://multithreaded.stitchfix.com/blog/2021/10/14/functions-dags-hamilton/).
* [Stitch Fix](https://www.stitchfix.com/) β Time series forecasting
* [UK Government Digital Services](https://github.com/alphagov/govuk-feedback-analysis) β National feedback pipeline (processing & analysis)
* [IBM](https://www.ibm.com/) β Internal search and ML pipelines
* [Opendoor](https://www.opendoor.com/) β Manage PySpark pipelines
* [Lexis Nexis](https://www.lexisnexis.com/en-us/home.page) β Feature processing and lineage
* [Adobe](https://www.adobe.com/) β Prompt engineering research
* [WrenAI](https://github.com/Canner/WrenAI) β async text-to-SQL workflows
* [British Cycling](https://www.britishcycling.org.uk/) β Telemetry analysis
* [Oak Ridge & PNNL](https://pnnl.gov/) β Naturf project
* [ORNL](https://www.ornl.gov/)
* [Federal Reserve Board](https://www.federalreserve.gov/)
* [Joby Aviation](https://www.jobyaviation.com/) β Flight data processing
* [Two](https://www.two.inc/)
* [Transfix](https://transfix.io/) β Online featurization and prediction
* [Railofy](https://www.railofy.com) β Orchestrate pandas code
* [Habitat Energy](https://www.habitat.energy/) β Time-series feature engineering
* [KI-Insurance](https://www.ki-insurance.com/) β Feature engineering
* [Ascena Retail](https://www.ascena.com/) β Feature engineering
* [NaroHQ](https://www.narohq.com/)
* [EquipmentShare](https://www.equipmentshare.com/)
* [Everstream.ai](https://www.everstream.ai/)
* [Flectere](https://flectere.net/)
* [F33.ai](https://f33.ai/)
* [Kora Money](https://www.koramoney.com)
* [Capitec Bank](https://www.capitecbank.co.za/)
* [Best Egg](https://bestegg.com/)
* [RTV Euro AGD](https://www.euro.com.pl/)
* [Wealth.com](https://www.wealth.com/)
* [wren.ai](https://wren.ai/)
## π€ Code Contributors
[![Contributors](https://contrib.rocks/image?repo=dagworks-inc/hamilton)](https://github.com/DAGWorks-Inc/hamilton/graphs/contributors)
## π Special Mentions & π¦ Bug Hunters
Thanks to our awesome community and their active involvement in the Hamilton library.
[Nils Olsson](https://github.com/nilsso), [MichaΕ Siedlaczek](https://github.com/elshize), [Alaa Abedrabbo](https://github.com/AAbedrabbo), [Shreya Datar](https://github.com/datarshreya), [Baldo Faieta](https://github.com/baldofaieta), [Anwar Brini](https://github.com/AnwarBrini), [Gourav Kumar](https://github.com/gms101), [Amos Aikman](https://github.com/amosaikman), [Ankush Kundaliya](https://github.com/akundaliya), [David Weselowski](https://github.com/j7zAhU), [Peter Robinson](https://github.com/Peter4137), [Seth Stokes](https://github.com/sT0v), [Louis Maddox](https://github.com/lmmx), [Stephen Bias](https://github.com/s-ducks), [Anup Joseph](https://github.com/AnupJoseph), [Jan Hurst](https://github.com/janhurst), [Flavia Santos](https://github.com/flaviassantos), [Nicolas Huray](https://github.com/nhuray), [Manabu Niseki](https://github.com/ninoseki), [Kyle Pounder](https://github.com/kpounder), [Alex Bustos](https://github.com/bustosalex1), [Andy Day](https://github.com/adayNU), [Alexander Cai](https://github.com/adzcai), [Nils MΓΌller-Wendt](https://github.com/ChronoJon), [Paul Larsen](https://github.com/munichpavel), [
Kemal Eren](https://github.com/kemaleren), [Jernej Frank](https://github.com/jernejfrank), [Noah Ridge](https://github.com/noahridge)
# π Citations
We'd appreciate citing Hamilton by referencing one of the following:
```bibtex
@inproceedings{DBLP:conf/vldb/KrawczykI22,
title = {Hamilton: a modular open source declarative paradigm for high level
modeling of dataflows},
author = {Stefan Krawczyk and Elijah ben Izzy},
editor = {Satyanarayana R. Valluri and Mohamed Za{\"{\i}}t},
booktitle = {1st International Workshop on Composable Data Management Systems,
CDMS@VLDB 2022, Sydney, Australia, September 9, 2022},
year = {2022},
url = {https://cdmsworkshop.github.io/2022/Proceedings/ShortPapers/Paper6\_StefanKrawczyk.pdf},
timestamp = {Wed, 19 Oct 2022 16:20:48 +0200},
biburl = {https://dblp.org/rec/conf/vldb/KrawczykI22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
```bibtex
@inproceedings{CEURWS:conf/vldb/KrawczykIQ22,
title = {Hamilton: enabling software engineering best practices for data transformations via generalized dataflow graphs},
author = {Stefan Krawczyk and Elijah ben Izzy and Danielle Quinn},
editor = {Cinzia Cappiello and Sandra Geisler and Maria-Esther Vidal},
booktitle = {1st International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022)},
pages = {41--50},
url = {https://ceur-ws.org/Vol-3306/paper5.pdf},
year = {2022}
}
```
# π Libraries built on / for Hamilton
* [Hypster](https://github.com/gilad-rubin/hypster) - hyperparameter management
* [DSP Decision Engine](https://github.com/capitec/dsp-decision-engine) - decision trees
* [NaturF](https://github.com/IMMM-SFA/naturf/) - library for data transformation for weather forecasting
* [WrenAI](https://github.com/Canner/WrenAI) - RAG
* [FlowerPower](https://github.com/legout/flowerpower/) - Scheduler for Hamilton
Raw data
{
"_id": null,
"home_page": null,
"name": "sf-hamilton",
"maintainer": null,
"docs_url": null,
"requires_python": "<4,>=3.8",
"maintainer_email": null,
"keywords": "hamilton",
"author": null,
"author_email": "Stefan Krawczyk <stefan@dagworks.io>, Elijah ben Izzy <elijah@dagworks.io>",
"download_url": "https://files.pythonhosted.org/packages/ce/3b/9f5bfb40decef54b2b338b27e6ff560d5fcceddefc336eb729f09102fb9b/sf_hamilton-1.86.1.tar.gz",
"platform": null,
"description": "<div align=\"center\">\n <h1><img src=\"https://github.com/DAGWorks-Inc/hamilton/assets/2328071/feb6abaa-b6d5-4271-a320-0ae4a18d8aa7\" width=\"50\"/> Hamilton \u2014 portable & expressive <br> data transformation DAGs</h1>\n <a href='https://hamilton.dagworks.io/en/latest/?badge=latest'>\n <img src='https://readthedocs.org/projects/hamilton/badge/?version=latest' alt='Documentation Status' />\n </a><a href=\"https://www.python.org/downloads/\" target=\"_blank\">\n <img src=\"https://img.shields.io/badge/python-3.8|%203.9|%203.10|%203.11|%203.12-blue.svg\" alt=\"Python supported\"/>\n </a>\n <a href=\"https://pypi.org/project/sf-hamilton/\" target=\"_blank\">\n <img src=\"https://badge.fury.io/py/sf-hamilton.svg\" alt=\"PyPi Version\"/>\n </a>\n <a href=\"https://pepy.tech/project/sf-hamilton\" target=\"_blank\">\n <img src=\"https://pepy.tech/badge/sf-hamilton\" alt=\"Total Downloads\"/>\n </a>\n <a href=\"https://pepy.tech/project/sf-hamilton\" target=\"_blank\">\n <img src=\"https://static.pepy.tech/badge/sf-hamilton/month\" alt=\"Total Monthly Downloads\"/>\n </a>\n <br/>\n <a target=\"_blank\" href=\"https://linkedin.com/showcase/dagster\" style=\"background:none\">\n <img src=\"https://img.shields.io/badge/DAGWorks-Follow-purple.svg?logo=linkedin\" />\n </a>\n <a href=\"https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g\" target=\"_blank\">\n <img src=\"https://img.shields.io/badge/Hamilton-Join-purple.svg?logo=slack\" alt=\"Hamilton Slack\"/>\n </a>\n <a href=\"https://twitter.com/hamilton_os\" target=\"_blank\">\n <img src=\"https://img.shields.io/badge/HamiltonOS-Follow-purple.svg?logo=X\"/>\n </a>\n <a href=\"https://twitter.com/dagworks\" target=\"_blank\">\n <img src=\"https://img.shields.io/badge/DAGWorks-Follow-purple.svg?logo=X\"/>\n </a>\n</div>\n<br></br>\n\nHamilton is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is **portable**; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is **expressive**; Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution).\n\nTo create a DAG, write regular Python functions that specify their dependencies with their parameters. As shown below, it results in readable code that can always be visualized. Hamilton loads that definition and automatically builds the DAG for you!\n\n<div align=\"center\">\n <img src=\"./docs/_static/abc_highlight.png\" alt=\"Create a project\" width=\"65%\"/>\n</div>\n<div align=\"center\">\n Functions <code>B()</code> and <code>C()</code> refer to function <code>A</code> via their parameters\n</div>\n<br>\n\nHamilton brings modularity and structure to any Python application moving data: ETL pipelines, ML workflows, LLM applications, RAG systems, BI dashboards, and the [Hamilton UI](https://hamilton.dagworks.io/en/latest/concepts/ui) allows you to automatically visualize, catalog, and monitor execution.\n\n> Hamilton is great for DAGs, but if you need loops or conditional logic to create an LLM agent or a simulation, take a look at our sister library [Burr](https://github.com/dagworks-inc/burr) \ud83e\udd16 .\n\n\n# Installation\n\nHamilton supports Python 3.8+. We include the optional `visualization` dependency to display our Hamilton DAG. For visualizations, [Graphviz](https://graphviz.org/download/) needs to be installed on your system separately.\n\n```bash\npip install \"sf-hamilton[visualization]\"\n```\n\nTo use the Hamilton UI, install the `ui` and `sdk` dependencies.\n\n```bash\npip install \"sf-hamilton[ui,sdk]\"\n```\n\nTo try Hamilton in the browser, visit [www.tryhamilton.dev](https://www.tryhamilton.dev/?utm_source=README)\n\n# Why use Hamilton?\n\nData teams write code to deliver business value, but few have the resources to standardize practices and provide quality assurance. Moving from proof-of-concept to production and cross-function collaboration (e.g., data science, engineering, ops) remain challenging for teams, big or small. Hamilton is designed to help throughout a project's lifecycle:\n\n- **Separation of concerns**. Hamilton separates the DAG \"definition\" and \"execution\" which lets data scientists focus on solving problems and engineers manage production pipelines.\n\n- **Effective collaboration**. The [Hamilton UI provides a shared interface](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/) for teams to inspect results and debug failures throughout the development cycle.\n\n- **Low-friction dev to prod**. Use `@config.when()` to modify your DAG between execution environments instead of error-prone `if/else` feature flags. The notebook extension prevents the pain of migrating code from a notebook to a Python module.\n\n- **Portable transformations**. Your DAG is [independent of infrastructure or orchestration](https://blog.dagworks.io/publish/posts/detail/145543927?referrer=%2Fpublish%2Fposts), meaning you can develop and debug locally and reuse code across contexts (local, Airflow, FastAPI, etc.).\n\n- **Maintainable DAG definition**. Hamilton [automatically builds the DAG from a single line of code whether it has 10 or 1000 nodes](https://hamilton.dagworks.io/en/latest/concepts/driver/). It can also assemble multiple Python modules into a pipeline, encouraging modularity.\n\n- **Expressive DAGs**. [Function modifiers](https://hamilton.dagworks.io/en/latest/concepts/function-modifiers/) are a unique feature to keep your code [DRY](https://en.wikipedia.org/wiki/Don't_repeat_yourself) and reduce the complexity of maintaining large DAGs. Other frameworks inevitably lead to code redundancy or bloated functions.\n\n- **Built-in coding style**. The Hamilton DAG is [defined using Python functions](https://hamilton.dagworks.io/en/latest/concepts/node/), encouraging modular, easy-to-read, self-documenting, and unit testable code.\n\n- **Data and schema validation**. Decorate functions with `@check_output` to validate output properties, and raise warnings or exceptions. Add the `SchemaValidator()` adapter to automatically inspect dataframe-like objects (pandas, polars, Ibis, etc.) to track and validate their schema.\n\n- **Built for plugins**. Hamilton is designed to play nice with all tools and provides the right abstractions to create custom integrations with your stack. Our lively community will help you build what you need!\n\n\n# Hamilton UI\n\nYou can track the execution of your Hamilton DAG in the [Hamilton UI](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/). It automatically populates a data catalog with lineage / tracing and provides execution observability to inspect results and debug errors. You can run it as a [local server](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/#local-mode) or a [self-hosted application using Docker](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/#docker-deployed-mode).\n\n<p align=\"center\">\n <img src=\"./docs/_static/hamilton_1.jpeg\" alt=\"Description1\" width=\"30%\" style=\"margin-right: 20px;\"/>\n <img src=\"./docs/_static/hamilton_2.jpeg\" alt=\"Description2\" width=\"30%\" style=\"margin-right: 20px;\"/>\n <img src=\"./docs/_static/hamilton_3.jpeg\" alt=\"Description3\" width=\"30%\"/>\n</p>\n<p align=\"center\">\n <em>DAG catalog, automatic dataset profiling, and execution tracking</em>\n</p>\n\n\n## Get started with the Hamilton UI\n\n1. To use the Hamilton UI, install the dependencies (see `Installation` section) and start the server with\n\n ```bash\n hamilton ui\n ```\n\n2. On the first connection, create a `username` and a new project (the `project_id` should be `1`).\n\n<div align=\"center\">\n <img src=\"./docs/_static/new_project.png\" alt=\"Create a project\" width=\"70%\"/>\n</div>\n<br>\n\n3. Track your Hamilton DAG by creating a `HamiltonTracker` object with your `username` and `project_id` and adding it to your `Builder`. Now, your DAG will appear in the UI's catalog and all executions will be tracked!\n\n ```python\n from hamilton import driver\n from hamilton_sdk.adapters import HamiltonTracker\n import my_dag\n\n # use your `username` and `project_id`\n tracker = HamiltonTracker(\n username=\"my_username\",\n project_id=1,\n dag_name=\"hello_world\",\n )\n\n # adding the tracker to the `Builder` will add the DAG to the catalog\n dr = (\n driver.Builder()\n .with_modules(my_dag)\n .with_adapters(tracker) # add your tracker here\n .build()\n )\n\n # executing the `Driver` will track results\n dr.execute([\"C\"])\n ```\n\n# Documentation & learning resources\n\n* \ud83d\udcda See the [official documentation](https://hamilton.dagworks.io/) to learn about the core concepts of Hamilton.\n\n* \ud83d\udc68\u200d\ud83c\udfeb Consult the [examples on GitHub](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples) to learn about specific features or integrations with other frameworks.\n\n* \ud83d\udcf0 The [DAGWorks blog](https://blog.dagworks.io/) includes guides about how to build a data platform and narrative tutorials.\n\n* \ud83d\udcfa Find video tutorials on the [DAGWorks YouTube channel](https://www.youtube.com/@DAGWorks-Inc)\n\n* \ud83d\udce3 Reach out via the [Hamilton Slack community](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g) for help and troubleshooting\n\n\n# How does Hamilton compare to X?\n\nHamilton is not an orchestrator ([you might not need one](https://blog.dagworks.io/p/lean-data-automation-a-principal)), nor a feature store ([but you can use it to build one!](https://blog.dagworks.io/p/featurization-integrating-hamilton)). Its purpose is to help you structure and manage data transformations. If you know dbt, Hamilton does for Python what dbt does for SQL.\n\nAnother way to frame it is to think about the different layers of a data stack. Hamilton is at the **asset layer**. It helps you organize data transformations code (the **expression layer**), manage changes, and validate & test data.\n\n<div align=\"center\" style=\"width: 100%\">\n <table>\n <colgroup>\n <col style=\"width: 20%\">\n <col style=\"width: 40%\">\n <col style=\"width: 40%\">\n </colgroup>\n <thead>\n <tr>\n <th>Layer</th>\n <th>Purpose</th>\n <th>Example Tools</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>Orchestration</td>\n <td>Operational system for the creation of assets</td>\n <td>Airflow, Metaflow, Prefect, Dagster</td>\n </tr>\n <tr>\n <td>Asset</td>\n <td>Organize expressions into meaningful units <br> (e.g., dataset, ML model, table)</td>\n <td><b>Hamilton</b>, dbt, dlt, SQLMesh, <a href=\"https://github.com/dagworks-inc/burr\">Burr</a></td>\n </tr>\n <tr>\n <td>Expression</td>\n <td>Language to write data transformations</td>\n <td>pandas, SQL, polars, Ibis, LangChain</td>\n </tr>\n <tr>\n <td>Execution</td>\n <td>Perform data transformations</td>\n <td>Spark, Snowflake, DuckDB, RAPIDS</td>\n </tr>\n <tr>\n <td>Data</td>\n <td>Physical representation of data, inputs and outputs</td>\n <td>S3, Postgres, file system, Snowflake</td>\n </tr>\n </tbody>\n </table>\n</div>\n\nSee our page on [Why use Hamilton?](https://hamilton.dagworks.io/en/latest/get-started/why-hamilton/) and framework [code comparisons](https://hamilton.dagworks.io/en/latest/code-comparisons/) for more information.\n\n# \ud83d\udcd1 License\n\nHamilton is released under the BSD 3-Clause Clear License. See [LICENSE](https://github.com/DAGWorks-Inc/hamilton/blob/main/LICENSE.md) for details.\n\n\n# \ud83c\udf0e Community\n## \ud83d\udc68\u200d\ud83d\udcbb Contributing\nWe're very supportive of changes by new contributors, big or small! Make sure to discuss potential changes by creating an issue or commenting on an existing one before opening a pull request. Good first contributions include creating an example or an integration with your favorite Python library!\n\n To contribute, checkout our [contributing guidelines](https://github.com/DAGWorks-Inc/hamilton/blob/main/CONTRIBUTING.md), our [developer setup guide](https://github.com/DAGWorks-Inc/hamilton/blob/main/developer_setup.md), and our [Code of Conduct](https://github.com/DAGWorks-Inc/hamilton/blob/main/CODE_OF_CONDUCT.md).\n\n\n## \ud83d\ude0e Used by\nHamilton was started at Stitch Fix before the original creators founded DAGWorks Inc! The library is battle-tested and has been supporting production use cases since 2019.\n\n>Read more about the [origin story](https://multithreaded.stitchfix.com/blog/2021/10/14/functions-dags-hamilton/).\n\n\n* [Stitch Fix](https://www.stitchfix.com/) \u2014 Time series forecasting\n* [UK Government Digital Services](https://github.com/alphagov/govuk-feedback-analysis) \u2014 National feedback pipeline (processing & analysis)\n* [IBM](https://www.ibm.com/) \u2014 Internal search and ML pipelines\n* [Opendoor](https://www.opendoor.com/) \u2014 Manage PySpark pipelines\n* [Lexis Nexis](https://www.lexisnexis.com/en-us/home.page) \u2014 Feature processing and lineage\n* [Adobe](https://www.adobe.com/) \u2014 Prompt engineering research\n* [WrenAI](https://github.com/Canner/WrenAI) \u2014 async text-to-SQL workflows\n* [British Cycling](https://www.britishcycling.org.uk/) \u2014 Telemetry analysis\n* [Oak Ridge & PNNL](https://pnnl.gov/) \u2014 Naturf project\n* [ORNL](https://www.ornl.gov/)\n* [Federal Reserve Board](https://www.federalreserve.gov/)\n* [Joby Aviation](https://www.jobyaviation.com/) \u2014 Flight data processing\n* [Two](https://www.two.inc/)\n* [Transfix](https://transfix.io/) \u2014 Online featurization and prediction\n* [Railofy](https://www.railofy.com) \u2014 Orchestrate pandas code\n* [Habitat Energy](https://www.habitat.energy/) \u2014 Time-series feature engineering\n* [KI-Insurance](https://www.ki-insurance.com/) \u2014 Feature engineering\n* [Ascena Retail](https://www.ascena.com/) \u2014 Feature engineering\n* [NaroHQ](https://www.narohq.com/)\n* [EquipmentShare](https://www.equipmentshare.com/)\n* [Everstream.ai](https://www.everstream.ai/)\n* [Flectere](https://flectere.net/)\n* [F33.ai](https://f33.ai/)\n* [Kora Money](https://www.koramoney.com)\n* [Capitec Bank](https://www.capitecbank.co.za/)\n* [Best Egg](https://bestegg.com/)\n* [RTV Euro AGD](https://www.euro.com.pl/)\n* [Wealth.com](https://www.wealth.com/)\n* [wren.ai](https://wren.ai/)\n\n## \ud83e\udd1d Code Contributors\n[![Contributors](https://contrib.rocks/image?repo=dagworks-inc/hamilton)](https://github.com/DAGWorks-Inc/hamilton/graphs/contributors)\n\n\n## \ud83d\ude4c Special Mentions & \ud83e\udd9f Bug Hunters\n\nThanks to our awesome community and their active involvement in the Hamilton library.\n\n[Nils Olsson](https://github.com/nilsso), [Micha\u0142 Siedlaczek](https://github.com/elshize), [Alaa Abedrabbo](https://github.com/AAbedrabbo), [Shreya Datar](https://github.com/datarshreya), [Baldo Faieta](https://github.com/baldofaieta), [Anwar Brini](https://github.com/AnwarBrini), [Gourav Kumar](https://github.com/gms101), [Amos Aikman](https://github.com/amosaikman), [Ankush Kundaliya](https://github.com/akundaliya), [David Weselowski](https://github.com/j7zAhU), [Peter Robinson](https://github.com/Peter4137), [Seth Stokes](https://github.com/sT0v), [Louis Maddox](https://github.com/lmmx), [Stephen Bias](https://github.com/s-ducks), [Anup Joseph](https://github.com/AnupJoseph), [Jan Hurst](https://github.com/janhurst), [Flavia Santos](https://github.com/flaviassantos), [Nicolas Huray](https://github.com/nhuray), [Manabu Niseki](https://github.com/ninoseki), [Kyle Pounder](https://github.com/kpounder), [Alex Bustos](https://github.com/bustosalex1), [Andy Day](https://github.com/adayNU), [Alexander Cai](https://github.com/adzcai), [Nils M\u00fcller-Wendt](https://github.com/ChronoJon), [Paul Larsen](https://github.com/munichpavel), [\nKemal Eren](https://github.com/kemaleren), [Jernej Frank](https://github.com/jernejfrank), [Noah Ridge](https://github.com/noahridge)\n\n# \ud83c\udf93 Citations\nWe'd appreciate citing Hamilton by referencing one of the following:\n\n```bibtex\n@inproceedings{DBLP:conf/vldb/KrawczykI22,\n title = {Hamilton: a modular open source declarative paradigm for high level\n modeling of dataflows},\n author = {Stefan Krawczyk and Elijah ben Izzy},\n editor = {Satyanarayana R. Valluri and Mohamed Za{\\\"{\\i}}t},\n booktitle = {1st International Workshop on Composable Data Management Systems,\n CDMS@VLDB 2022, Sydney, Australia, September 9, 2022},\n year = {2022},\n url = {https://cdmsworkshop.github.io/2022/Proceedings/ShortPapers/Paper6\\_StefanKrawczyk.pdf},\n timestamp = {Wed, 19 Oct 2022 16:20:48 +0200},\n biburl = {https://dblp.org/rec/conf/vldb/KrawczykI22.bib},\n bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n```bibtex\n@inproceedings{CEURWS:conf/vldb/KrawczykIQ22,\n title = {Hamilton: enabling software engineering best practices for data transformations via generalized dataflow graphs},\n author = {Stefan Krawczyk and Elijah ben Izzy and Danielle Quinn},\n editor = {Cinzia Cappiello and Sandra Geisler and Maria-Esther Vidal},\n booktitle = {1st International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022)},\n pages = {41--50},\n url = {https://ceur-ws.org/Vol-3306/paper5.pdf},\n year = {2022}\n}\n```\n\n# \ud83d\udcda Libraries built on / for Hamilton\n* [Hypster](https://github.com/gilad-rubin/hypster) - hyperparameter management\n* [DSP Decision Engine](https://github.com/capitec/dsp-decision-engine) - decision trees\n* [NaturF](https://github.com/IMMM-SFA/naturf/) - library for data transformation for weather forecasting\n* [WrenAI](https://github.com/Canner/WrenAI) - RAG\n* [FlowerPower](https://github.com/legout/flowerpower/) - Scheduler for Hamilton\n",
"bugtrack_url": null,
"license": "BSD-3-Clause",
"summary": "Hamilton, the micro-framework for creating dataframes.",
"version": "1.86.1",
"project_urls": {
"changelog": "https://github.com/DAGWorks-Inc/hamilton/releases",
"documentation": "https://hamilton.dagworks.io/en/latest/",
"homepage": "https://www.tryhamilton.dev/",
"issues": "https://github.com/dagworks-inc/hamilton/issues",
"slack": "https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g",
"source": "https://github.com/dagworks-inc/hamilton"
},
"split_keywords": [
"hamilton"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "51db033c34a0d3f291d343c076a30a4504f98dbad8172ed267434aabaa9fa812",
"md5": "bd401d9d5ab97835f399e09d44a84ad1",
"sha256": "57751476d5bce1f3dbe5e1391cf0d4b90ba6674adb33d913aa18440e2377cd5d"
},
"downloads": -1,
"filename": "sf_hamilton-1.86.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bd401d9d5ab97835f399e09d44a84ad1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.8",
"size": 400468,
"upload_time": "2025-01-04T05:28:50",
"upload_time_iso_8601": "2025-01-04T05:28:50.433666Z",
"url": "https://files.pythonhosted.org/packages/51/db/033c34a0d3f291d343c076a30a4504f98dbad8172ed267434aabaa9fa812/sf_hamilton-1.86.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ce3b9f5bfb40decef54b2b338b27e6ff560d5fcceddefc336eb729f09102fb9b",
"md5": "720a8275c2b218093ec1738354a51f83",
"sha256": "b435ff63b3bb0517161f2708d12dd0a96fdcb8afc38fbbc5caf2bbdaff366c6d"
},
"downloads": -1,
"filename": "sf_hamilton-1.86.1.tar.gz",
"has_sig": false,
"md5_digest": "720a8275c2b218093ec1738354a51f83",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.8",
"size": 509248,
"upload_time": "2025-01-04T05:28:54",
"upload_time_iso_8601": "2025-01-04T05:28:54.622249Z",
"url": "https://files.pythonhosted.org/packages/ce/3b/9f5bfb40decef54b2b338b27e6ff560d5fcceddefc336eb729f09102fb9b/sf_hamilton-1.86.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-04 05:28:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "DAGWorks-Inc",
"github_project": "hamilton",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"circle": true,
"lcname": "sf-hamilton"
}