# Airflow DAG Parse Benchmarking
**Stop creating bad DAGs!**
Use this tool to measure and compare the parse time of your DAGs, identify bottlenecks, and optimize your Airflow environment for better performance.
# Contents
- [How it works](#how)
- [Installation](#installation)
  - [Install your Airflow dependencies](#install-dependencies)
  - [Configure your Airflow Variables](#configure-variables)
- [Usage](#usage)
  - [Additional Options](#options)
- [Roadmap](#roadmap)
- [Contribute](#contribute)
# How It Works <a id="how"></a>
Retrieving parse metrics from an Airflow cluster is straightforward, but measuring the effectiveness of code optimizations can be tedious. Each code change requires redeploying the Python file to your cloud provider, waiting for the DAG to be parsed, and then extracting a new report, which makes every iteration slow.
This tool simplifies the process of measuring and comparing DAG parse times. It uses the same parse method as Airflow (from the Airflow repository) to measure the time taken to parse your DAGs locally, storing results for future comparisons.
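The idea of timing a parse locally can be illustrated with a minimal sketch. This is not the tool's implementation, which relies on Airflow's own parsing code; it only measures how long Python takes to execute a file's top-level code, which is the work Airflow has to repeat each time it parses the file. The function name `time_module_load` is hypothetical.

```python
import importlib.util
import time
from pathlib import Path


def time_module_load(path: str) -> float:
    """Return the seconds taken to execute the top-level code of a Python file."""
    file_path = Path(path)
    # Build a module spec from the file path, without registering it in sys.modules.
    spec = importlib.util.spec_from_file_location(file_path.stem, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {path}")
    module = importlib.util.module_from_spec(spec)

    start = time.perf_counter()
    spec.loader.exec_module(module)  # runs the imports and top-level statements
    return time.perf_counter() - start
```

Calling `time_module_load("your_path/dag_test.py")` before and after a change gives a rough idea of whether the change helped, but the tool's own numbers are the ones to compare, since they come from Airflow's parse method.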
# Installation <a id="installation"></a>
It's recommended to use a [virtualenv](https://docs.python.org/3/library/venv.html) to avoid library conflicts. Once set up, you can install the package by running the following command:
```bash
pip install airflow-parse-bench
```
## Install your Airflow dependencies <a id="install-dependencies"></a>
The command above installs only the essential library dependencies (Airflow and Airflow providers). You’ll need to manually install any additional libraries that your DAGs depend on.
For example, if a DAG uses `boto3` to interact with AWS, ensure that `boto3` is installed in your environment. Otherwise, you'll encounter parse errors.
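Before running the benchmark, it can help to check which libraries are missing from the environment. The sketch below is one possible way to do that with the standard library; the helper name `missing_modules` and the list of required modules are assumptions for illustration, and only top-level module names are supported.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of libraries imported by your DAG files.
required = ["boto3", "json"]
print(missing_modules(required))
```

Any name printed by this script needs to be installed with `pip` before the corresponding DAG can be parsed.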
## Configure your Airflow Variables <a id="configure-variables"></a>
If your DAGs use **Airflow Variables**, you must define them locally as well. Use placeholder values, as the actual values aren't required for parsing purposes.
To set up Airflow Variables locally, you can use the following command:
```bash
airflow variables set MY_VARIABLE 'ANY TEST VALUE'
```
Without this, you'll encounter an error like:
```bash
error: 'Variable MY_VARIABLE does not exist'
```
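When your DAGs read many Variables, the `airflow variables set` command shown above has to be repeated for each one. The following sketch generates those commands from a list of names; the helper `variable_commands` and the variable names are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex


def variable_commands(names, placeholder="ANY TEST VALUE"):
    """Build one `airflow variables set` command per variable name."""
    return [
        # shlex.quote keeps names and values safe to paste into a shell.
        f"airflow variables set {shlex.quote(name)} {shlex.quote(placeholder)}"
        for name in names
    ]


# Hypothetical variable names used by your DAGs.
for command in variable_commands(["MY_VARIABLE", "OTHER_VARIABLE"]):
    print(command)
```

The printed lines can be pasted into a terminal or saved to a shell script that you run once per environment.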
# Usage <a id="usage"></a>
To measure the parse time of a single Python file, just run:
```bash
airflow-parse-bench --path your_path/dag_test.py
```
The output is a results table with the following columns:
- **Filename**: The name of the Python module containing the DAG. This unique name is the key used to store the DAG's results.
- **Current Parse Time**: The time (in seconds) taken to parse the DAG.
- **Previous Parse Time**: The parse time from the previous run.
- **Difference**: The difference between the current and previous parse times.
- **Best Parse Time**: The best (lowest) parse time recorded for the DAG.
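The relationship between these columns can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's code: the function name `summarize` is hypothetical, and the sign convention for the difference (current minus previous, so a negative value means the DAG got faster) is assumed.

```python
from typing import Optional


def summarize(current: float, previous: Optional[float], best: Optional[float]) -> dict:
    """Derive the comparison columns from stored and freshly measured parse times."""
    # Assumption: Difference is current minus previous, so negative means faster.
    difference = None if previous is None else current - previous
    # The best time is the lowest one seen so far, including this run.
    new_best = current if best is None else min(best, current)
    return {
        "current": current,
        "previous": previous,
        "difference": difference,
        "best": new_best,
    }
```

On a first run there is no previous or best value, so the sketch leaves the difference empty and records the current time as the best one.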
You can also measure the parse time for all Python files in a directory by running:
```bash
airflow-parse-bench --path your_path/your_dag_folder
```
This time, the output table will display parse times for all Python files in the folder.

## Additional Options <a id="options"></a>
The library supports some additional arguments to customize the results. To see all available options, run:
```bash
airflow-parse-bench --help
```
It will display the following options:
- **--path**: The path to the Python file or directory containing the DAGs.
- **--order**: The order in which the results are displayed. You can choose either 'asc' (ascending) or 'desc' (descending).
- **--num-iterations**: The number of times to parse each DAG. The parse time will be averaged across iterations.
- **--skip-unchanged**: Skip DAGs that haven't changed since the last run.
- **--reset-db**: Clear all stored data in the local database, starting a fresh execution.
> **Note**: If a Python file has parsing errors or contains no valid DAGs, it will be excluded from the results table, and an error message will be displayed.
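A single measurement can be noisy, which is why `--num-iterations` averages several parses. The sketch below shows the general idea with a generic callable; `average_duration` is a hypothetical helper, and `func` stands in for whatever operation parses the file.

```python
import statistics
import time


def average_duration(func, num_iterations=3):
    """Call `func` repeatedly and return the mean wall-clock duration in seconds."""
    durations = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```

More iterations give a steadier average at the cost of a longer run, so a small number is usually enough when comparing two versions of the same DAG.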
# Roadmap <a id="roadmap"></a>
This project is still in its early stages, and there are many improvements planned for the future. Some of the features we're considering include:
- **Cloud DAG Parsing:** Automatically download and parse DAGs from cloud providers like AWS S3 or Google Cloud Storage.
- **Parallel Parsing:** Speed up processing by parsing multiple DAGs simultaneously.
- **Support .airflowignore:** Ignore files and directories specified in the `.airflowignore` file.
If you’d like to suggest a feature or report a bug, please open a new issue!
# Contributing <a id="contribute"></a>
This project is open to contributions! If you want to collaborate to improve the tool, please follow these steps:
1. Open a new issue to discuss the feature or bug you want to address.
2. Once approved, fork the repository and create a new branch.
3. Implement the changes.
4. Create a pull request with a detailed description of the changes.
Raw data
{
"_id": null,
"home_page": "https://github.com/AlvaroCavalcante/airflow-parse-bench",
"name": "airflow-parse-bench",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "airflow, python, python3, dag, parse, benchmark, apache, data, data-engineering, benchmarking",
"author": "Alvaro Leandro Cavalcante Carneiro",
"author_email": "alvaroleandro250@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f4/b9/5dd5a5bd42c4140481e39f8f467df12296214b85234d9cc8433ac8a676ba/airflow-parse-bench-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Easily measure and compare your Airflow DAGs' parse time.",
"version": "1.0.1",
"project_urls": {
"Download": "https://github.com/AlvaroCavalcante/airflow-parse-bench",
"Homepage": "https://github.com/AlvaroCavalcante/airflow-parse-bench"
},
"split_keywords": [
"airflow",
" python",
" python3",
" dag",
" parse",
" benchmark",
" apache",
" data",
" data-engineering",
" benchmarking"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ef81ba060849bdc42d77dcf6f7c2a586272af8f539a483520c9101135ea13f29",
"md5": "49a5a1b6dc4a8498cc1dd39a60e8c2ec",
"sha256": "1efd8144ad2688b83bfcdd88199c7d2a38df3d9527bd8749ff063a523cda833f"
},
"downloads": -1,
"filename": "airflow_parse_bench-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "49a5a1b6dc4a8498cc1dd39a60e8c2ec",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 12181,
"upload_time": "2025-01-26T03:39:21",
"upload_time_iso_8601": "2025-01-26T03:39:21.989000Z",
"url": "https://files.pythonhosted.org/packages/ef/81/ba060849bdc42d77dcf6f7c2a586272af8f539a483520c9101135ea13f29/airflow_parse_bench-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f4b95dd5a5bd42c4140481e39f8f467df12296214b85234d9cc8433ac8a676ba",
"md5": "f3c3bede9c910bfadad7e02fa6a04edd",
"sha256": "56f06bfc24dcf2f08fbe06d8abe4bb8cf4d797847245705ccf1320f42682b374"
},
"downloads": -1,
"filename": "airflow-parse-bench-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "f3c3bede9c910bfadad7e02fa6a04edd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 11367,
"upload_time": "2025-01-26T03:39:23",
"upload_time_iso_8601": "2025-01-26T03:39:23.488166Z",
"url": "https://files.pythonhosted.org/packages/f4/b9/5dd5a5bd42c4140481e39f8f467df12296214b85234d9cc8433ac8a676ba/airflow-parse-bench-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-26 03:39:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "AlvaroCavalcante",
"github_project": "airflow-parse-bench",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "airflow-parse-bench"
}