flytekitplugins-whylogs


- Name: flytekitplugins-whylogs
- Version: 1.14.3
- Home page: https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-whylogs
- Summary: Enable the use of whylogs profiles to be used in flyte tasks to get aggregate statistics about data.
- Upload time: 2024-12-26 22:44:59
- Author: whylabs (support@whylabs.ai)
- Requires Python: >=3.9
- License: apache2

# Flytekit whylogs Plugin

whylogs is an open source library for logging any kind of data. With whylogs,
you can generate summaries of datasets (called whylogs profiles), which you can
use to:

- Create data constraints to know whether your data looks the way it should
- Quickly visualize key summary statistics about a dataset
- Track changes in a dataset over time

```bash
pip install flytekitplugins-whylogs
```

To generate profiles, you can add a task like the following:

```python
import whylogs as why
from whylogs.core import DatasetProfileView

import pandas as pd

from flytekit import task

@task
def profile(df: pd.DataFrame) -> DatasetProfileView:
    result = why.log(df)  # Various overloads for different common data types exist
    profile_view = result.view()
    return profile_view  # Return the view, not the `profile` function itself
```

>**NOTE:** You'll be passing around `DatasetProfileView` from tasks, not `DatasetProfile`.

## Validating Data

A common step in data pipelines is data validation. In `whylogs`, this is done
through the constraints feature. A task can fail the workflow when the data
doesn't conform to the configured constraints, such as min/max values or data
types on features.

```python
from flytekit import task
from whylogs.core import DatasetProfileView
from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import greater_than_number, mean_between_range

@task
def validate_data(profile_view: DatasetProfileView):
    # Attach constraints to the profile produced by an upstream task.
    builder = ConstraintsBuilder(dataset_profile_view=profile_view)
    builder.add_constraint(greater_than_number(column_name="my_column", number=0.14))
    builder.add_constraint(mean_between_range(column_name="my_other_column", lower=2, upper=3))
    constraint = builder.build()
    valid = constraint.validate()

    if not valid:
        # Print the per-constraint report, then fail the task.
        print(constraint.report())
        raise Exception("Invalid data found")
```
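
To make the two constraints above more concrete, here is a plain-Python sketch of the checks they roughly correspond to, with no whylogs dependency. It assumes that `greater_than_number` requires the column's minimum to exceed the given number and that `mean_between_range` requires the mean to fall within the bounds (treated as inclusive here). whylogs evaluates constraints against profile statistics rather than raw values, so this is an illustration, not the library's implementation. The helper names and sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def min_greater_than(values: Sequence[float], number: float) -> bool:
    # Hypothetical stand-in: passes when the smallest value exceeds `number`.
    return min(values) > number


def mean_in_range(values: Sequence[float], lower: float, upper: float) -> bool:
    # Hypothetical stand-in: passes when the mean lies within [lower, upper].
    return lower <= mean(values) <= upper


# Made-up sample columns matching the names used in the task above.
my_column = [0.2, 0.5, 0.9]
my_other_column = [2.0, 2.5, 3.0]

checks = {
    "my_column greater than 0.14": min_greater_than(my_column, 0.14),
    "my_other_column mean between 2 and 3": mean_in_range(my_other_column, 2, 3),
}

# The data counts as valid only if every individual check passes.
all_valid = all(checks.values())
```

A single failing check is enough to make the overall result false, which mirrors how the task above raises as soon as validation does not pass.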

If you want to learn more about whylogs, check out our [example notebooks](https://github.com/whylabs/whylogs/tree/mainline/python/examples).

            
