astro-sdk-python


| Field           | Value |
|-----------------|-------|
| Name            | astro-sdk-python |
| Version         | 1.8.1 |
| Summary         | Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. |
| Upload time     | 2024-06-21 09:52:32 |
| Requires Python | >=3.8 |
| Keywords        | airflow, provider, astronomer, sql, decorator, task flow, elt, etl, dag |

<h1 align="center">
  astro
</h1>
  <h3 align="center">
  workflows made easy<br><br>
</h3>

[![Python versions](https://img.shields.io/pypi/pyversions/astro-sdk-python.svg)](https://pypi.org/pypi/astro-sdk-python)
[![License](https://img.shields.io/pypi/l/astro-sdk-python.svg)](https://pypi.org/pypi/astro-sdk-python)
[![Development Status](https://img.shields.io/pypi/status/astro-sdk-python.svg)](https://pypi.org/pypi/astro-sdk-python)
[![PyPI downloads](https://img.shields.io/pypi/dm/astro-sdk-python.svg)](https://pypistats.org/packages/astro-sdk-python)
[![Contributors](https://img.shields.io/github/contributors/astronomer/astro-sdk)](https://github.com/astronomer/astro-sdk)
[![Commit activity](https://img.shields.io/github/commit-activity/m/astronomer/astro-sdk)](https://github.com/astronomer/astro-sdk)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/astronomer/astro-sdk/main.svg)](https://results.pre-commit.ci/latest/github/astronomer/astro-sdk/main)
[![CI](https://github.com/astronomer/astro-sdk/actions/workflows/ci-python-sdk.yaml/badge.svg)](https://github.com/astronomer/astro-sdk)
[![codecov](https://codecov.io/gh/astronomer/astro-sdk/branch/main/graph/badge.svg?token=MI4SSE50Q6)](https://codecov.io/gh/astronomer/astro-sdk)

**Astro Python SDK** is a Python SDK for rapid development of extract, transform, and load workflows in [Apache Airflow](https://airflow.apache.org/). It allows you to express your workflows as a set of data dependencies without having to worry about ordering and tasks. The Astro Python SDK is maintained by [Astronomer](https://astronomer.io).
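
To illustrate what "a set of data dependencies" means, the sketch below derives a run order from declared inputs using Python's standard-library `graphlib` (Python 3.9+). The step names are hypothetical, and this only illustrates the idea; it is not how the SDK or Airflow resolves dependencies internally.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps whose output it reads.
consumes = {
    "load_movies": set(),
    "top_animation": {"load_movies"},
    "export_report": {"top_animation"},
}


def execution_order(dependencies):
    """Return one valid run order in which every step follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(consumes))
# ['load_movies', 'top_animation', 'export_report']
```

Only the inputs of each step are declared; the ordering falls out of the graph.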

## Prerequisites

- Apache Airflow >= 2.1.0.

## Install

The Astro Python SDK is available on [PyPI](https://pypi.org/project/astro-sdk-python/). Install it with the standard Python
[installation tools](https://packaging.python.org/en/latest/tutorials/installing-packages/).

To install a cloud-agnostic version of the SDK, run:

```shell
pip install astro-sdk-python
```

You can also install dependencies for using the SDK with popular cloud providers:

```shell
# Quote the requirement so shells such as zsh do not expand the square brackets
pip install "astro-sdk-python[amazon,google,snowflake,postgres]"
```
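
To confirm the installation from Python, a minimal sketch along these lines can compare installed versions against the Airflow prerequisite above. It uses only the standard-library `importlib.metadata`. The helper names are hypothetical, and the version parsing is deliberately simplified: it reads only the leading numeric components and ignores pre-release suffixes.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Turn '2.1.0' or '2.9.2rc1' into a tuple of at least three integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad so '2.1' compares equal to '2.1.0'
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (simplified comparison)."""
    return release_tuple(version) >= release_tuple(minimum)


if __name__ == "__main__":
    airflow = installed_version("apache-airflow")
    sdk = installed_version("astro-sdk-python")
    print("astro-sdk-python:", sdk or "not installed")
    if airflow is None:
        print("apache-airflow: not installed")
    else:
        print("apache-airflow:", airflow, "ok" if meets_minimum(airflow, "2.1.0") else "too old")
```

For strict comparisons that account for pre-releases, a dedicated version parser would be a better fit than this simplified tuple comparison.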


## Quickstart
1. Ensure that your Airflow environment is set up correctly by running the following commands:

    ```shell
    export AIRFLOW_HOME=`pwd`
    export AIRFLOW__CORE__XCOM_BACKEND=astro.custom_backend.astro_custom_backend.AstroCustomXcomBackend
    export AIRFLOW__ASTRO_SDK__STORE_DATA_LOCAL_DEV=true
    airflow db init
    ```
   > **Note:** `AIRFLOW__CORE__ENABLE_XCOM_PICKLING` no longer needs to be enabled for `astro-sdk-python`. This functionality is now deprecated because the SDK's custom XCom backend handles serialization.

    The `AIRFLOW__ASTRO_SDK__STORE_DATA_LOCAL_DEV` setting should only be used for local development. The [XCom backend docs](https://astro-sdk-python.readthedocs.io/en/latest/guides/xcom_backend.html#airflow_xcom_backend) give further details about how to set this up in non-local environments.

    Custom XCom backends are limited to data types that are JSON serializable, and DataFrames are not JSON serializable. Storing DataFrames therefore used to require enabling XCom pickling; as the note above states, the SDK's custom XCom backend now handles serialization instead.

    The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however, it means that non-Python programs may not be able to reconstruct pickled Python objects.

    Read more: [enable_xcom_pickling](https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#enable-xcom-pickling) and [pickle](https://docs.python.org/3/library/pickle.html#comparison-with-json).


2. Create a SQLite database for the example to use:

    ```shell
    # The sqlite_default connection has a different host on macOS than on Linux
    export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
    sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
    ```

3. Copy the following workflow into a file named `calculate_popular_movies.py` and add it to the `dags` directory of your Airflow project:

   https://github.com/astronomer/astro-sdk/blob/d5aa768b2d4bca72ef98f8d533fe3f99624b172f/example_dags/calculate_popular_movies.py#L1-L37

   Alternatively, you can download `calculate_popular_movies.py`:

   ```shell
   curl -O https://raw.githubusercontent.com/astronomer/astro-sdk/main/python-sdk/example_dags/calculate_popular_movies.py
   ```

4. Run the example DAG:

    ```sh
    airflow dags test calculate_popular_movies `date -Iseconds`
    ```

5. Check the result of your DAG by running:

    ```shell
    sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
    ```

    You should see the following output:

    ```shell
    $ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
    Toy Story 3 (2010)|8.3
    Inside Out (2015)|8.2
    How to Train Your Dragon (2010)|8.1
    Zootopia (2016)|8.1
    How to Train Your Dragon 2 (2014)|7.9
    ```
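
The same check can be done from Python with the standard-library `sqlite3` module. This is a minimal sketch: `read_table` is a hypothetical helper, and it assumes the `SQL_TABLE_NAME` environment variable from step 2 still points at the SQLite database file.

```python
import os
import sqlite3


def read_table(db_path, table_name):
    """Return all rows of `table_name` from the SQLite file at `db_path`."""
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not table_name.isidentifier():
        raise ValueError(f"unexpected table name: {table_name!r}")
    connection = sqlite3.connect(db_path)
    try:
        return connection.execute(f'SELECT * FROM "{table_name}"').fetchall()
    finally:
        connection.close()


if __name__ == "__main__":
    db_path = os.environ.get("SQL_TABLE_NAME")
    if db_path:
        # Mirror the sqlite3 CLI output format: columns separated by "|".
        for row in read_table(db_path, "top_animation"):
            print("|".join(str(value) for value in row))
    else:
        print("SQL_TABLE_NAME is not set; run step 2 first.")
```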

## Supported technologies

| Databases        |
|------------------|
| Databricks Delta |
| Google BigQuery  |
| Postgres         |
| Snowflake        |
| SQLite           |
| Amazon Redshift  |
| Microsoft SQL    |
| DuckDB           |

| File types |
|------------|
| CSV        |
| JSON       |
| NDJSON     |
| Parquet    |

| File stores  |
|--------------|
| Amazon S3    |
| Filesystem   |
| Google GCS   |
| Google Drive |
| SFTP         |
| FTP          |
| Azure WASB   |
| Azure WASBS  |

## Available operations

The following are some key functions available in the SDK:

- [`load_file`](https://astro-sdk-python.readthedocs.io/en/stable/astro/sql/operators/load_file.html): Load a given file into a SQL table
- [`transform`](https://astro-sdk-python.readthedocs.io/en/stable/astro/sql/operators/transform.html): Apply a SQL `SELECT` statement to a source table and save the result to a destination table
- [`drop_table`](https://astro-sdk-python.readthedocs.io/en/stable/astro/sql/operators/drop_table.html): Drop a SQL table
- [`run_raw_sql`](https://astro-sdk-python.readthedocs.io/en/stable/astro/sql/operators/raw_sql.html): Run any SQL statement without handling its output
- [`append`](https://astro-sdk-python.readthedocs.io/en/stable/astro/sql/operators/append.html): Insert rows from the source SQL table into the destination SQL table, if there are no conflicts
- [`merge`](https://astro-sdk-python.readthedocs.io/en/stable/astro/sql/operators/merge.html): Insert rows from the source SQL table into the destination SQL table, resolving conflicts with one of these strategies:
  - `ignore`: Do not add rows that already exist
  - `update`: Replace existing rows with new ones
- [`export_file`](https://astro-sdk-python.readthedocs.io/en/stable/astro/sql/operators/export.html): Export SQL table rows into a destination file
- [`dataframe`](https://astro-sdk-python.readthedocs.io/en/stable/astro/sql/operators/dataframe.html): Export a given SQL table into an in-memory pandas DataFrame

For a full list of available operators, see the [SDK reference documentation](https://astro-sdk-python.readthedocs.io/en/stable/operators.html).
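
To make the semantics of `transform` and the two `merge` conflict strategies concrete, here is a minimal sketch in plain `sqlite3`. It does not use the SDK and is not the SDK's implementation: the helper functions, the `if_conflicts` parameter name, the table names, and the rows are all hypothetical, and the conflict handling relies on SQLite's `INSERT OR IGNORE` and `INSERT OR REPLACE`.

```python
import sqlite3


def transform(conn, select_sql, output_table):
    """Save the result of a SELECT statement to a new destination table."""
    conn.execute(f"CREATE TABLE {output_table} AS {select_sql}")


def merge(conn, source_table, target_table, if_conflicts):
    """Copy rows from source to target, resolving primary-key conflicts."""
    verbs = {
        "ignore": "INSERT OR IGNORE",   # keep the existing row
        "update": "INSERT OR REPLACE",  # replace the existing row with the new one
    }
    conn.execute(f"{verbs[if_conflicts]} INTO {target_table} SELECT * FROM {source_table}")


def demo(if_conflicts):
    """Merge illustrative rows, then transform the merged table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ratings (title TEXT PRIMARY KEY, rating REAL);
        INSERT INTO ratings VALUES ('Film A', 7.0), ('Film B', 6.5);
        CREATE TABLE new_ratings (title TEXT PRIMARY KEY, rating REAL);
        INSERT INTO new_ratings VALUES ('Film B', 9.0), ('Film C', 8.0);
        """
    )
    merge(conn, "new_ratings", "ratings", if_conflicts)
    transform(conn, "SELECT title, rating FROM ratings WHERE rating >= 7.0", "top_rated")
    rows = conn.execute("SELECT title, rating FROM top_rated ORDER BY rating DESC").fetchall()
    conn.close()
    return rows


print(demo("ignore"))  # [('Film C', 8.0), ('Film A', 7.0)]
print(demo("update"))  # [('Film B', 9.0), ('Film C', 8.0), ('Film A', 7.0)]
```

With `ignore`, the conflicting row for "Film B" keeps its old rating and is filtered out by the transform; with `update`, it takes the new rating and appears in the result.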

## Documentation

The documentation is a work in progress. We aim to follow the [Diátaxis](https://diataxis.fr/) system:

- **[Getting Started Tutorial](https://docs.astronomer.io/learn/astro-python-sdk)**: A hands-on introduction to the Astro Python SDK
- **How-to guides**: Simple step-by-step user guides to accomplish specific tasks
- **[Reference guide](https://astro-sdk-python.readthedocs.io/)**: Commands, modules, classes and methods
- **Explanation**: Clarification and discussion of key decisions when designing the project

## Changelog

The Astro Python SDK follows semantic versioning for releases. Check the [changelog](docs/CHANGELOG.md) for the latest changes.

## Release management

To learn more about our release philosophy and steps, see [Managing Releases](docs/development/RELEASE.md).

## Contribution guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the [Contribution Guidelines](./docs/development/CONTRIBUTING.md) for a detailed overview of how to contribute.

Contributors and maintainers should abide by the [Contributor Code of Conduct](CODE_OF_CONDUCT.md).

## License

[Apache License 2.0](LICENSE)


            

        