azure-data-factory-testing-framework

Name	azure-data-factory-testing-framework
Version	0.0.0a186
home_page
Summary	A unit test framework that allows you to write unit and functional tests for Data Factory pipelines against the git integrated json resource files.
upload_time	2023-12-01 19:06:27
maintainer
docs_url	None
author	Arjen Kroezen
requires_python	>=3.9,<4.0
license	MIT
keywords	fabric datafactory unit-testing functional-testing azure
VCS
bugtrack_url

# Data Factory - Unit Testing Framework

A unit test framework that allows you to write unit and functional tests for Data Factory pipelines against the git integrated json resource files.

Currently supported:
* [Fabric Data Factory](https://learn.microsoft.com/en-us/fabric/data-factory/)
* [Azure Data Factory v2](https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities?tabs=data-factory)

Planned:
* [Azure Synapse Analytics](https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities?context=%2Fazure%2Fsynapse-analytics%2Fcontext%2Fcontext&tabs=data-factory/)


## Disclaimer

This unit test framework is not officially supported. It is currently in an experimental state and has not been tested with every single data factory resource. It should support all activities out of the box, but this has not been thoroughly tested. Please report any issues in the issues section and include an example of the pipeline that is not working as expected.

If there's a lot of interest in this framework, then we will continue to improve it and move it to a production-ready state.

## Features

Goal: validate that the evaluated pipeline configuration, including its expressions, behaves as expected at runtime.

1. Evaluate expressions with their functions and arguments instantly by using the framework's internal expression parser.
2. Test a pipeline or activity against any state to assert the expected outcome. State can be configured with pipeline parameters, global parameters, variables and activity outputs.
3. Simulate a pipeline run and evaluate the execution flow and outcome of each activity.
4. Support all activity types and all their attributes dynamically.

> Pipelines and activities are not executed on any Data Factory environment, but the evaluation of the pipeline configuration is validated locally. This is different from the "validation" functionality present in the UI, which only validates the syntax of the pipeline configuration.

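To make the idea of evaluating an expression against a state more concrete, here is a minimal sketch in plain Python. It is not the framework's parser: `ToyState` and `resolve_references` are hypothetical names introduced for illustration, and only the two reference forms `pipeline().parameters.<name>` and `variables('<name>')` are handled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical stand-in for the state an expression is evaluated against."""
    parameters: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)


def resolve_references(expression: str, state: ToyState) -> str:
    """Replace parameter and variable references with quoted literal values."""
    def parameter(match: re.Match) -> str:
        return "'" + str(state.parameters[match.group(1)]) + "'"

    def variable(match: re.Match) -> str:
        return "'" + str(state.variables[match.group(1)]) + "'"

    expression = re.sub(r"pipeline\(\)\.parameters\.(\w+)", parameter, expression)
    return re.sub(r"variables\('(\w+)'\)", variable, expression)


state = ToyState(parameters={"JobId": "123"}, variables={"JobName": "Job-123"})
resolved = resolve_references(
    "@concat(pipeline().parameters.JobId, '/', variables('JobName'))", state
)
print(resolved)  # @concat('123', '/', 'Job-123')
```

A reference to a parameter or variable that is missing from the state raises a `KeyError` in this sketch, which mirrors why a test has to supply every value the expression depends on.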

## Why

Data Factory does not support unit testing out of the box. The only way to validate your changes is through manual testing or by running e2e tests against a deployed data factory. These tests are great to have, but they lack the following benefits that unit tests, such as those written with this framework, provide:

* Shift left with immediate feedback on changes - evaluate any individual data factory resource (pipelines, activities, triggers, datasets, linkedServices, etc.), including complex expressions.
* Test individual resources (e.g. an activity) against many different input values to cover more scenarios.
* Fewer issues in production - because unit tests are fast to write and run, you will write more tests in less time and therefore reach higher test coverage. This means more confidence in new changes, less risk of breaking existing features (regression tests) and thus far fewer issues in production.

> Data Factory is UI-driven, and writing unit tests may not feel natural in that setting. But without unit tests, how can you be confident that your changes will work as expected and that existing pipelines will not break?

## Getting started

1. Set up an empty Python project with your favorite testing library
2. Install the package using your preferred package manager:
   * Pip: `pip install azure-data-factory-testing-framework`
   * Poetry: `poetry add azure-data-factory-testing-framework`
3. Start writing tests

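Before involving the framework, it can help to look at what a git-integrated pipeline resource file contains. The sketch below uses only the standard library. The pipeline document is a trimmed, made-up example in the Data Factory JSON layout, and `activity_names` is a hypothetical helper, not part of the framework.

```python
import json

# Trimmed, made-up pipeline resource as it might be stored in the git repository.
PIPELINE_JSON = """
{
  "name": "batch_job",
  "properties": {
    "parameters": {"JobId": {"type": "string"}},
    "activities": [
      {
        "name": "Set JobName",
        "type": "SetVariable",
        "dependsOn": [],
        "typeProperties": {
          "variableName": "JobName",
          "value": {"value": "@concat('Job-', pipeline().parameters.JobId)", "type": "Expression"}
        }
      },
      {
        "name": "Get version",
        "type": "WebActivity",
        "dependsOn": [{"activity": "Set JobName", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {"method": "GET"}
      }
    ]
  }
}
"""


def activity_names(pipeline: dict) -> list:
    """Return the activity names in the order they appear in the resource file."""
    return [activity["name"] for activity in pipeline["properties"]["activities"]]


def test_pipeline_contains_expected_activities():
    pipeline = json.loads(PIPELINE_JSON)
    assert activity_names(pipeline) == ["Set JobName", "Get version"]


# A test runner such as pytest would discover the test; it is called directly here.
test_pipeline_contains_expected_activities()
```

A structural check like this only inspects the file. The framework's value lies in evaluating the expressions inside `typeProperties`, as the examples in the next section show.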
## Features - Examples

The samples below are the _only_ code that you need to write! The framework will take care of the rest.

1. Evaluate activities (e.g. a WebActivity that calls Azure Batch API)

    ```python
    # Arrange
    activity: Activity = pipeline.get_activity_by_name("Trigger Azure Batch Job")
    state = PipelineRunState(
        parameters=[
            RunParameter(RunParameterType.Global, "BaseUrl", "https://example.com"),
            RunParameter(RunParameterType.Pipeline, "JobId", "123"),
        ],
        variables=[
            PipelineRunVariable("JobName", "Job-123"),
        ])
    state.add_activity_result("Get version", DependencyCondition.SUCCEEDED, {"Version": "version1"})

    # Act
    activity.evaluate(state)

    # Assert
    assert "https://example.com/jobs" == activity.type_properties["url"].value
    assert "POST" == activity.type_properties["method"].value
    assert "{ \n    \"JobId\": \"123\",\n    \"JobName\": \"Job-123\",\n    \"Version\": \"version1\",\n}" == activity.type_properties["body"].value
    ```
   
2. Evaluate Pipelines and test the flow of activities given a specific input

    ```python
    # Arrange
    pipeline: PipelineResource = test_framework.repository.get_pipeline_by_name("batch_job")

    # Runs the pipeline with the provided parameters
    activities = test_framework.evaluate_pipeline(pipeline, [
        RunParameter(RunParameterType.Pipeline, "JobId", "123"),
        RunParameter(RunParameterType.Pipeline, "ContainerName", "test-container"),
        RunParameter(RunParameterType.Global, "BaseUrl", "https://example.com"),
    ])

    set_variable_activity: Activity = next(activities)
    assert set_variable_activity is not None
    assert "Set JobName" == set_variable_activity.name
    assert "JobName" == set_variable_activity.type_properties["variableName"]
    assert "Job-123" == set_variable_activity.type_properties["value"].value

    get_version_activity = next(activities)
    assert get_version_activity is not None
    assert "Get version" == get_version_activity.name
    assert "https://example.com/version" == get_version_activity.type_properties["url"].value
    assert "GET" == get_version_activity.type_properties["method"]
    get_version_activity.set_result(DependencyCondition.Succeeded, {"Version": "version1"})

    create_batch_activity = next(activities)
    assert create_batch_activity is not None
    assert "Trigger Azure Batch Job" == create_batch_activity.name
    assert "https://example.com/jobs" == create_batch_activity.type_properties["url"].value
    assert "POST" == create_batch_activity.type_properties["method"]
    assert "{ \n    \"JobId\": \"123\",\n    \"JobName\": \"Job-123\",\n    \"Version\": \"version1\",\n}" == create_batch_activity.type_properties["body"].value

    with pytest.raises(StopIteration):
        next(activities)
    ```
   
> See Examples folder for more samples

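The pipeline example above steps through activities with `next(...)` and sets a result on one activity before asking for the next. The following sketch shows one way such an iterator can work, using a plain Python generator. It is an illustration under assumptions, not the framework's implementation: `ToyActivity` and `evaluate` are hypothetical, and an activity with no explicit result is assumed to succeed.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class ToyActivity:
    """Hypothetical activity: a name, its dependencies, and a result set by the test."""
    name: str
    depends_on: dict = field(default_factory=dict)  # activity name -> required status
    status: Optional[str] = None

    def set_result(self, status: str) -> None:
        self.status = status


def evaluate(activities: list) -> Iterator[ToyActivity]:
    """Yield each activity once all of its dependencies have the required status."""
    by_name = {activity.name: activity for activity in activities}
    pending = list(activities)
    while pending:
        ready = [
            activity for activity in pending
            if all(by_name[dep].status == required
                   for dep, required in activity.depends_on.items())
        ]
        if not ready:
            return  # remaining activities can never run with the current results
        current = ready[0]
        pending.remove(current)
        yield current
        if current.status is None:
            current.status = "Succeeded"  # default when the test sets no result


run = evaluate([
    ToyActivity("Get version"),
    ToyActivity("Trigger job", {"Get version": "Succeeded"}),
    ToyActivity("Send alert", {"Get version": "Failed"}),
])

first = next(run)
first.set_result("Failed")  # steer the flow down the failure branch
executed = [first.name] + [activity.name for activity in run]
print(executed)  # ['Get version', 'Send alert']
```

Because the generator pauses at each `yield`, the test can inspect an activity and decide its outcome before the next activity is selected, which is what makes branch-by-branch assertions possible.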
## Registering missing expression functions

Because the framework interprets expressions that contain functions, these functions need to be implemented in Python. The goal is to support more and more functions over time, but if a function is not yet supported, the following code can be used to register it:

```python
FunctionsRepository.register("concat", lambda arguments: "".join(arguments))
FunctionsRepository.register("trim", lambda text, trim_argument: text.strip(trim_argument[0]))
```

At runtime, when evaluating expressions, the framework will try to find a matching function and assert that the expected number of arguments is supplied. If no matching function is found, an exception will be thrown.

> Feel free to open a pull request with your own custom functions, so that they can be added to the framework and enjoyed by everyone.

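The lookup and argument-count check described above can be sketched with a small registry. This is a simplified illustration, not the framework's `FunctionsRepository`: `ToyFunctionsRepository` and its `call` method are hypothetical, and unlike the registration example above, the `concat` entry here takes its parts as separate positional arguments.

```python
import inspect
from typing import Callable


class ToyFunctionsRepository:
    """Hypothetical registry that maps expression function names to Python callables."""
    _functions: dict = {}

    @classmethod
    def register(cls, name: str, function: Callable) -> None:
        cls._functions[name] = function

    @classmethod
    def call(cls, name: str, *arguments):
        if name not in cls._functions:
            raise KeyError(f"Function '{name}' is not registered")
        function = cls._functions[name]
        try:
            # bind() raises TypeError when the argument count does not fit the signature
            inspect.signature(function).bind(*arguments)
        except TypeError as error:
            raise TypeError(f"Wrong number of arguments for '{name}': {error}") from error
        return function(*arguments)


ToyFunctionsRepository.register("concat", lambda *parts: "".join(parts))
ToyFunctionsRepository.register("toUpper", lambda text: text.upper())

print(ToyFunctionsRepository.call("concat", "Job-", "123"))  # Job-123
print(ToyFunctionsRepository.call("toUpper", "abc"))  # ABC
```

Checking the signature before the call keeps the error message tied to the expression function name rather than to an anonymous lambda.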
## Tips

1. After parsing a data factory resource file, you can use the debugger to easily discover which classes are actually initialized so that you can cast them to the correct type.

## Recommended development workflow for Azure Data Factory v2

* Use ADF Git integration
* Use UI to create feature branch, build initial pipeline and save to feature branch
* Pull feature branch locally
* Start writing unit and functional tests, run them locally for immediate feedback and fix bugs
* Push changes to feature branch
* Test the new features manually through the UI in sandbox environment
* Create PR, which will run the tests in the CI pipeline
* Approve PR
* Merge to main and start deploying to dev/test/prd environments
* Run e2e tests after each deployment to validate all happy flows work on that specific environment

## Contributing

This project welcomes contributions and suggestions.  Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "azure-data-factory-testing-framework",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "fabric,datafactory,unit-testing,functional-testing,azure",
    "author": "Arjen Kroezen",
    "author_email": "arjenkroezen@microsoft.com",
    "download_url": "https://files.pythonhosted.org/packages/1b/de/3465898d44b4d994eb03b81d04a2c8b3cf1ef296117c32ae42758613d00d/azure_data_factory_testing_framework-0.0.0a186.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A unit test framework that allows you to write unit and functional tests for Data Factory pipelines against the git integrated json resource files.",
    "version": "0.0.0a186",
    "project_urls": null,
    "split_keywords": [
        "fabric",
        "datafactory",
        "unit-testing",
        "functional-testing",
        "azure"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "10003f5636f7cbe29a22e8bc1cbbc225b319fcf0d43c171c149fcb29421516ac",
                "md5": "8dda84ee074561113a9a15ef1c693958",
                "sha256": "e23a3dc7cbd4d009a502f07612d6e73c2dbad24f26f4664cb5f3a900238ccac5"
            },
            "downloads": -1,
            "filename": "azure_data_factory_testing_framework-0.0.0a186-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8dda84ee074561113a9a15ef1c693958",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<4.0",
            "size": 58053,
            "upload_time": "2023-12-01T19:06:25",
            "upload_time_iso_8601": "2023-12-01T19:06:25.253206Z",
            "url": "https://files.pythonhosted.org/packages/10/00/3f5636f7cbe29a22e8bc1cbbc225b319fcf0d43c171c149fcb29421516ac/azure_data_factory_testing_framework-0.0.0a186-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1bde3465898d44b4d994eb03b81d04a2c8b3cf1ef296117c32ae42758613d00d",
                "md5": "0ff91ff240962f93ff5daf9ea0a26417",
                "sha256": "06064dfd699f05e17e6eac27c0d1cb3019de98020897c8cbc4ea817f2a5443d7"
            },
            "downloads": -1,
            "filename": "azure_data_factory_testing_framework-0.0.0a186.tar.gz",
            "has_sig": false,
            "md5_digest": "0ff91ff240962f93ff5daf9ea0a26417",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 31872,
            "upload_time": "2023-12-01T19:06:27",
            "upload_time_iso_8601": "2023-12-01T19:06:27.252699Z",
            "url": "https://files.pythonhosted.org/packages/1b/de/3465898d44b4d994eb03b81d04a2c8b3cf1ef296117c32ae42758613d00d/azure_data_factory_testing_framework-0.0.0a186.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-01 19:06:27",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "azure-data-factory-testing-framework"
}
