airflow-provider-whylogs


| Field               | Value                                                 |
|---------------------|-------------------------------------------------------|
| ``name``            | airflow-provider-whylogs                              |
| ``version``         | 0.0.3                                                 |
| ``home_page``       | https://github.com/whylabs/airflow-provider-whylogs   |
| ``summary``         | An Apache Airflow provider for whylogs                |
| ``upload_time``     | 2023-11-29 22:12:56                                   |
| ``author``          | WhyLabs.ai                                            |
| ``requires_python`` | >=3.7                                                 |
| ``requirements``    | No requirements were recorded.                        |

# whylogs Airflow Operator

This package is an Apache Airflow provider for [whylogs](https://github.com/whylabs/whylogs), the open source standard for data and ML logging. With whylogs, users can generate summaries of their datasets (called whylogs profiles), which they can use to:

- Track changes in their dataset
- Create data constraints to know whether their data looks the way it should
- Quickly visualize key summary statistics about their datasets

This provider focuses on making whylogs easier to use with Airflow. It is designed to work with Data Profiles that you have already created with whylogs, which give you visibility into how your data changes over time.
 
## Installation

You can install this package on top of an existing Airflow 2.0+ installation ([Requirements](#requirements)) by simply running:

```bash
$ pip install airflow-provider-whylogs
```

To install this provider from source, run the following commands instead:

```bash
$ git clone git@github.com:whylabs/airflow-provider-whylogs.git
$ cd airflow-provider-whylogs
$ python3 -m venv .env && source .env/bin/activate
$ pip3 install -e .
```

## Usage example

To benefit from the existing operators, users first have to profile their data in the **processing** environment of their choice. To create a profile from a pandas DataFrame and store it locally, run the following code:

```python
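import pandas as pd
import whylogs as why

# Load the data to profile into a pandas DataFrame
df = pd.read_csv("some_file.csv")

# Profile the DataFrame and write the resulting profile to local storage
results = why.log(df)
results.writer("local").write()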
```

After that, you can use the operators to either:

- Create a Summary Drift Report, which helps you visually identify whether your data has drifted

```python
from whylogs_provider.operators.whylogs import WhylogsSummaryDriftOperator

summary_drift = WhylogsSummaryDriftOperator(
    task_id="drift_report",
    target_profile_path="data/profile.bin",
    reference_profile_path="data/profile.bin",
    reader="local",
    write_report_path="data/Profile.html",
)
```

- Run a Constraints check, which verifies whether your profiled data meets given criteria

```python
from whylogs_provider.operators.whylogs import WhylogsConstraintsOperator
from whylogs.core.constraints.factories import greater_than_number

constraints = WhylogsConstraintsOperator(
    task_id="constraints_check",
    profile_path="data/profile.bin",
    reader="local",
    constraint=greater_than_number(column_name="my_column", number=0.0),
)
```

>**NOTE**: Although it is possible to create a Dataset Profile with Airflow's `PythonOperator`, Airflow tries to separate orchestration from processing. That is one of the reasons this provider takes no strong position on how to read and profile data, which leaves users free to adapt that step to their existing setup.

A full DAG example can be found in the [example_dags directory](https://github.com/whylabs/airflow-provider-whylogs/tree/mainline/whylogs_provider/example_dags) of the `whylogs_provider` package.

## Requirements

The current requirements for using this Airflow Provider are listed in the table below.

| PIP package          | Version required |
|----------------------|------------------|
| ``apache-airflow``   | ``>=2.0``        |
| ``whylogs[viz, s3]`` | ``>=1.0.10``     |
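
If you want to confirm that an environment satisfies these minimums, the following sketch checks the installed versions using only the standard library. It assumes Python 3.8+ (for `importlib.metadata`), and the helper names `release_tuple`, `meets_minimum` and `check_requirements` are hypothetical, not part of this provider. The version parsing is deliberately simplified and ignores pre-release ordering.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements table.
MINIMUM_VERSIONS = {"apache-airflow": "2.0", "whylogs": "1.0.10"}


def release_tuple(version_string):
    """Turn '1.0.10' into (1, 0, 10), keeping only leading digits of each part."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a {package: problem} dict for every unmet requirement."""
    problems = {}
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[package] = f"found {installed}, need >={minimum}"
    return problems


if __name__ == "__main__":
    for package, problem in check_requirements().items():
        print(f"{package}: {problem}")
```

An empty result from `check_requirements()` means both packages are installed at or above the listed versions. The check does not verify that the `viz` and `s3` extras of whylogs are present.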

## Contributing

Users are always welcome to ask questions and contribute to this repository by submitting issues or reaching out through our [community Slack](http://join.slack.whylabs.ai/). Feel free to get in touch and help make `whylogs` even more awesome to use with Airflow.

Happy coding! 😄

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/whylabs/airflow-provider-whylogs",
    "name": "airflow-provider-whylogs",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "WhyLabs.ai",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/c5/be/71c7f4747baac90517b27ddc7761c367114de726b6ac1e92123c8ff0f2a0/airflow-provider-whylogs-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "An Apache Airflow provider for whylogs",
    "version": "0.0.3",
    "project_urls": {
        "Homepage": "https://github.com/whylabs/airflow-provider-whylogs"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fcf588852264c520b1bba74ccc4d68f0251016f953372d7236034f8e7ddbe959",
                "md5": "2d21cbb46d757849871418bc07d2e3d2",
                "sha256": "792d79a755afb0bc06d4c264f9a7d58afbd526d57e6a7b7f40ca671c824dd04d"
            },
            "downloads": -1,
            "filename": "airflow_provider_whylogs-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2d21cbb46d757849871418bc07d2e3d2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 10637,
            "upload_time": "2023-11-29T22:12:54",
            "upload_time_iso_8601": "2023-11-29T22:12:54.554050Z",
            "url": "https://files.pythonhosted.org/packages/fc/f5/88852264c520b1bba74ccc4d68f0251016f953372d7236034f8e7ddbe959/airflow_provider_whylogs-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c5be71c7f4747baac90517b27ddc7761c367114de726b6ac1e92123c8ff0f2a0",
                "md5": "447d5722ed61a1f24d4e403d7c13d3e3",
                "sha256": "272af3c6f0e6a14cc01f724709f79bf62fff15ac5a95b65aba8cb8a519868d11"
            },
            "downloads": -1,
            "filename": "airflow-provider-whylogs-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "447d5722ed61a1f24d4e403d7c13d3e3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 9463,
            "upload_time": "2023-11-29T22:12:56",
            "upload_time_iso_8601": "2023-11-29T22:12:56.290191Z",
            "url": "https://files.pythonhosted.org/packages/c5/be/71c7f4747baac90517b27ddc7761c367114de726b6ac1e92123c8ff0f2a0/airflow-provider-whylogs-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-29 22:12:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "whylabs",
    "github_project": "airflow-provider-whylogs",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "airflow-provider-whylogs"
}