apache-airflow-provider-transfers


- Name: apache-airflow-provider-transfers
- Version: 0.1.0
- Summary: This project contains the Universal Transfer Operator which can transfer all the data that could be read from the source Dataset into the destination Dataset. From a DAG author standpoint, all transfers would be performed through the invocation of only the Universal Transfer Operator.
- Upload time: 2023-03-28 16:53:57
- Requires Python: >=3.7
- Keywords: airflow, provider, astronomer, sql, decorator, task flow, elt, etl, dag

<h1 align="center">
  Universal Transfer Operator
</h1>
  <h3 align="center">
transfers made easy<br><br>
</h3>


[![CI](https://github.com/astronomer/apache-airflow-provider-transfers/actions/workflows/ci-uto.yaml/badge.svg)](https://github.com/astronomer/apache-airflow-provider-transfers)

The **Universal Transfer Operator** simplifies how users transfer data from a source to a destination using [Apache Airflow](https://airflow.apache.org/). It offers a consistent, provider-agnostic interface, so users do not need to explicitly use specific providers or operators.

At the moment, it supports transferring data between [file locations](https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L26-L32) and [databases](https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L72-L74) (in both directions) and cross-database transfers.

This project is maintained by [Astronomer](https://astronomer.io).

## Installation

```
pip install apache-airflow-provider-transfers
```


## Example DAGs

Check out the [example_dags](./example_dags) folder for examples of how the UniversalTransferOperator can be used.


## How Universal Transfer Operator Works

![Approach](./docs/images/approach.png)

With Universal Transfer Operator, users can perform data transfers using the following transfer modes:

1. Non-native
2. Native
3. Third-party


### Non-native transfer

Non-native transfers move the data through the Airflow worker node, applying chunking where possible. This method can be suitable for datasets smaller than 2 GB, depending on the source and target. Its performance depends heavily on the worker's memory, disk, processor, and network configuration.

Internally, the steps involved are:
- Retrieve the data in chunks from the source dataset's storage to the worker node.
- Send the data from the worker node to the destination dataset.
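
The two steps above can be illustrated with a minimal, standard-library-only sketch. It is not the operator's implementation: the helper names `read_in_chunks` and `transfer_csv_to_sqlite`, the in-memory CSV source, and the chunk size are all assumptions chosen for illustration. It only shows the general pattern of pulling rows through the worker one chunk at a time and writing each chunk to the destination.

```python
import csv
import io
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, TextIO


def read_in_chunks(rows: Iterable[Sequence[str]], chunk_size: int) -> Iterator[List[Sequence[str]]]:
    """Yield lists of at most `chunk_size` rows, so only one chunk is in memory at a time."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def transfer_csv_to_sqlite(source: TextIO, conn: sqlite3.Connection, table: str, chunk_size: int = 2) -> int:
    """Hypothetical helper: copy CSV rows into a SQLite table, chunk by chunk."""
    reader = csv.reader(source)
    header = next(reader)  # the first CSV row is assumed to hold the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    total = 0
    for chunk in read_in_chunks(reader, chunk_size):
        # Step 1 happened when the chunk was read; step 2 sends it to the destination.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', chunk)
        total += len(chunk)
    conn.commit()
    return total


# Illustrative data only: a tiny in-memory "file" and an in-memory database.
source_file = io.StringIO("id,name\n1,ada\n2,grace\n3,linus\n")
connection = sqlite3.connect(":memory:")
rows_loaded = transfer_csv_to_sqlite(source_file, connection, "people")
```

Because every row passes through the process running this code, throughput is bounded by that machine's memory, disk, and network, which is the trade-off described above.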

The following is an example of a non-native transfer between Google Cloud Storage and SQLite:

https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/example_dags/example_universal_transfer_operator.py#L37-L41


### Improving bottlenecks by using native transfer

An alternative to the non-native transfer method is the native method. Native transfers rely on mechanisms and tools offered by the data source or data target providers. In the case of moving from object storage to a Snowflake database, for instance, a native transfer consists of using the built-in ``COPY INTO`` command. When loading data from S3 to BigQuery, the Universal Transfer Operator uses the GCP Storage Transfer Service.

The benefit of native transfers is that they will likely perform better for larger datasets (2 GB or more) and do not rely on the Airflow worker node hardware configuration. With this approach, the Airflow worker nodes act as orchestrators and do not perform the transfer themselves. The speed depends exclusively on the service being used and the bandwidth between the source and destination.
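
The size guidance given for the two modes can be summarized as a small heuristic. This is only a sketch: the function name `suggest_transfer_mode`, the returned strings, and the reading of "2 GB" as 2 × 1024³ bytes are assumptions, and the package itself may select the mode differently.

```python
TWO_GB = 2 * 1024**3  # assumption: "2 GB" interpreted as 2 GiB


def suggest_transfer_mode(size_bytes: int, native_available: bool) -> str:
    """Hypothetical heuristic: prefer native for large datasets when the provider offers it."""
    if native_available and size_bytes >= TWO_GB:
        return "native"
    # Small datasets, or pairs without a native mechanism, go through the worker.
    return "non-native"
```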

Steps:
- The worker asks the destination dataset to ingest data from the source dataset.
- The destination dataset requests the data from the source dataset.

> **_NOTE:_**
 The Native method implementation is in progress and will be available in future releases.
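
To make the orchestration idea concrete, here is a hedged sketch of what the worker's side of a native object-storage-to-Snowflake load could look like: it only builds a `COPY INTO` statement, and the database would then pull the files itself. The helper name `build_copy_into`, the table, bucket URL, and storage integration name are all hypothetical, and this is not code from the Universal Transfer Operator.

```python
def build_copy_into(table: str, stage_url: str, storage_integration: str, skip_header: int = 1) -> str:
    """Hypothetical helper: build a Snowflake COPY INTO statement for CSV files.

    The worker would send this statement to Snowflake; no data rows pass
    through the worker. Inputs are assumed to be trusted identifiers.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM '{stage_url}'\n"
        f"STORAGE_INTEGRATION = {storage_integration}\n"
        f"FILE_FORMAT = (TYPE = CSV SKIP_HEADER = {skip_header})"
    )


# Illustrative values only.
statement = build_copy_into("analytics.public.orders", "s3://example-bucket/orders/", "my_s3_integration")
```

In this pattern the worker's memory and disk are irrelevant to throughput, since the statement is all that travels through it.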


### Transfer using a third-party tool

The Universal Transfer Operator can also offer an interface to generic third-party services that transfer data, such as Fivetran.

Here is an example of how to use Fivetran for transfers:

https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/example_dags/example_dag_fivetran.py#L52-L58




## Supported technologies

- Databases supported:

    https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L72-L74

- File stores supported:

    https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L26-L32


## Documentation

The documentation is a work in progress -- we aim to follow the [Diátaxis](https://diataxis.fr/) system.

- **[Reference guide](https://apache-airflow-provider-transfers.readthedocs.io/)**: Commands, modules, classes and methods

- **[Getting Started Tutorial](https://apache-airflow-provider-transfers.readthedocs.io/en/latest/getting-started/GETTING_STARTED.html)**: A hands-on introduction to the Universal Transfer Operator


## Changelog

The **Universal Transfer Operator** follows semantic versioning for releases. Check the [changelog](/docs/CHANGELOG.md) for the latest changes.


## Release management

See [Managing Releases](/docs/development/RELEASE.md) to learn more about our release philosophy and steps.


## Contribution guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the [Contribution Guideline](/docs/development/CONTRIBUTING.md) for a detailed overview of how to contribute.

Contributors and maintainers should abide by the [Contributor Code of Conduct](CODE_OF_CONDUCT.md).


## License

[Apache License 2.0](LICENSE)


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "apache-airflow-provider-transfers",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "airflow,provider,astronomer,sql,decorator,task flow,elt,etl,dag",
    "author": "",
    "author_email": "Astronomer <humans@astronomer.io>",
    "download_url": "https://files.pythonhosted.org/packages/e5/75/a351df3dec0b824113fd0fd201e16b1bacd552ad45a4c220957d86071e60/apache-airflow-provider-transfers-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "This project contains the Universal Transfer Operator which can transfer all the data that could be read from the source Dataset into the destination Dataset. From a DAG author standpoint, all transfers would be performed through the invocation of only the Universal Transfer Operator.",
    "version": "0.1.0",
    "split_keywords": [
        "airflow",
        "provider",
        "astronomer",
        "sql",
        "decorator",
        "task flow",
        "elt",
        "etl",
        "dag"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1e257d869d65840fd3e5edb31b3fa97252db01f00b4c452d841266c8ba372c1b",
                "md5": "c3623e1ae8739e3ac6b175f1484cf08b",
                "sha256": "97eee03c6fbadffe68c24956dee260311c5608affbb48c96c16475263dba3deb"
            },
            "downloads": -1,
            "filename": "apache_airflow_provider_transfers-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c3623e1ae8739e3ac6b175f1484cf08b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 62311,
            "upload_time": "2023-03-28T16:53:56",
            "upload_time_iso_8601": "2023-03-28T16:53:56.326645Z",
            "url": "https://files.pythonhosted.org/packages/1e/25/7d869d65840fd3e5edb31b3fa97252db01f00b4c452d841266c8ba372c1b/apache_airflow_provider_transfers-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e575a351df3dec0b824113fd0fd201e16b1bacd552ad45a4c220957d86071e60",
                "md5": "8a92ac4e3bdbf1d4e4ddf77e303761af",
                "sha256": "7b949719d371ff06996fd450073ee62c17687601a063cbfd955f595c127c662f"
            },
            "downloads": -1,
            "filename": "apache-airflow-provider-transfers-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8a92ac4e3bdbf1d4e4ddf77e303761af",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 40682,
            "upload_time": "2023-03-28T16:53:57",
            "upload_time_iso_8601": "2023-03-28T16:53:57.935998Z",
            "url": "https://files.pythonhosted.org/packages/e5/75/a351df3dec0b824113fd0fd201e16b1bacd552ad45a4c220957d86071e60/apache-airflow-provider-transfers-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-28 16:53:57",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "apache-airflow-provider-transfers"
}
        