grizz


| Field           | Value                                          |
|-----------------|------------------------------------------------|
| Name            | grizz                                          |
| Version         | 0.1.1                                          |
| Home page       | https://github.com/durandtibo/grizz            |
| Summary         | A light library to preprocess data with polars |
| Upload time     | 2024-11-08 20:52:42                            |
| Maintainer      | None                                           |
| Docs URL        | None                                           |
| Author          | Thibaut Durand                                 |
| Requires Python | <3.14,>=3.9                                    |
| License         | BSD-3-Clause                                   |
| Keywords        | polars, dataframe, ingestor, transformer       |
| Requirements    | No requirements were recorded.                 |
| Travis-CI       | No Travis.                                     |
| Coveralls       | No coveralls.                                  |

# grizz

<p align="center">
    <a href="https://github.com/durandtibo/grizz/actions">
        <img alt="CI" src="https://github.com/durandtibo/grizz/workflows/CI/badge.svg">
    </a>
    <a href="https://github.com/durandtibo/grizz/actions">
        <img alt="Nightly Tests" src="https://github.com/durandtibo/grizz/workflows/Nightly%20Tests/badge.svg">
    </a>
    <a href="https://github.com/durandtibo/grizz/actions">
        <img alt="Nightly Package Tests" src="https://github.com/durandtibo/grizz/workflows/Nightly%20Package%20Tests/badge.svg">
    </a>
    <br/>
    <a href="https://durandtibo.github.io/grizz/">
        <img alt="Documentation" src="https://github.com/durandtibo/grizz/workflows/Documentation%20(stable)/badge.svg">
    </a>
    <a href="https://durandtibo.github.io/grizz/">
        <img alt="Documentation" src="https://github.com/durandtibo/grizz/workflows/Documentation%20(unstable)/badge.svg">
    </a>
    <br/>
    <a href="https://codecov.io/gh/durandtibo/grizz">
        <img alt="Codecov" src="https://codecov.io/gh/durandtibo/grizz/branch/main/graph/badge.svg">
    </a>
    <a href="https://codeclimate.com/github/durandtibo/grizz/maintainability">
        <img src="https://api.codeclimate.com/v1/badges/7f2bd443a970c115cd94/maintainability" />
    </a>
    <a href="https://codeclimate.com/github/durandtibo/grizz/test_coverage">
        <img src="https://api.codeclimate.com/v1/badges/7f2bd443a970c115cd94/test_coverage" />
    </a>
    <br/>
    <a href="https://github.com/psf/black">
        <img  alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg">
    </a>
    <a href="https://google.github.io/styleguide/pyguide.html#s3.8-comments-and-docstrings">
        <img  alt="Doc style: google" src="https://img.shields.io/badge/%20style-google-3666d6.svg">
    </a>
    <a href="https://github.com/astral-sh/ruff">
        <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff" style="max-width:100%;">
    </a>
    <a href="https://github.com/guilatrova/tryceratops">
        <img  alt="try/except style: tryceratops" src="https://img.shields.io/badge/try%2Fexcept%20style-tryceratops%20%F0%9F%A6%96%E2%9C%A8-black">
    </a>
    <br/>
    <a href="https://pypi.org/project/grizz/">
        <img alt="PYPI version" src="https://img.shields.io/pypi/v/grizz">
    </a>
    <a href="https://pypi.org/project/grizz/">
        <img alt="Python" src="https://img.shields.io/pypi/pyversions/grizz.svg">
    </a>
    <a href="https://opensource.org/licenses/BSD-3-Clause">
        <img alt="BSD-3-Clause" src="https://img.shields.io/pypi/l/grizz">
    </a>
    <br/>
    <a href="https://pepy.tech/project/grizz">
        <img  alt="Downloads" src="https://static.pepy.tech/badge/grizz">
    </a>
    <a href="https://pepy.tech/project/grizz">
        <img  alt="Monthly downloads" src="https://static.pepy.tech/badge/grizz/month">
    </a>
    <br/>
</p>

## Overview

`grizz` is a light library to ingest and transform data
in [polars](https://docs.pola.rs/api/python/stable/reference/index.html) DataFrames.
`grizz` uses an object-oriented strategy, where ingestors and transformers are building blocks that
can be combined.
`grizz` can be extended with custom DataFrame ingestors and transformers.
The following example shows how to cast some columns to a different data type.

```pycon

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

```
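
The building-block idea described above can be sketched in plain Python.
This is only an illustration of the pattern, not the `grizz` implementation: every name below
(`ToyTransformer`, `ToyCast`, `ToyDrop`, `ToySequential`) is hypothetical, and a dictionary of
lists stands in for a DataFrame so that the sketch runs without any dependency.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

# Hypothetical names for illustration only; none of them is part of the grizz API.
# A toy stand-in for a DataFrame: column name -> list of values.
Table = dict[str, list]


class ToyTransformer(ABC):
    """Base building block: takes a table and returns a new table."""

    @abstractmethod
    def transform(self, table: Table) -> Table:
        """Return a transformed copy of ``table``."""


class ToyCast(ToyTransformer):
    """Convert the values of the selected columns with a callable such as ``int``."""

    def __init__(self, columns: list[str], dtype: type) -> None:
        self._columns = columns
        self._dtype = dtype

    def transform(self, table: Table) -> Table:
        return {
            name: [self._dtype(v) for v in values] if name in self._columns else list(values)
            for name, values in table.items()
        }


class ToyDrop(ToyTransformer):
    """Remove the selected columns."""

    def __init__(self, columns: list[str]) -> None:
        self._columns = columns

    def transform(self, table: Table) -> Table:
        return {n: list(v) for n, v in table.items() if n not in self._columns}


class ToySequential(ToyTransformer):
    """Combine several building blocks by applying them one after the other."""

    def __init__(self, transformers: list[ToyTransformer]) -> None:
        self._transformers = transformers

    def transform(self, table: Table) -> Table:
        for transformer in self._transformers:
            table = transformer.transform(table)
        return table


pipeline = ToySequential([ToyCast(columns=["col3"], dtype=int), ToyDrop(columns=["col4"])])
out = pipeline.transform({"col1": [1, 2], "col3": ["1", "2"], "col4": ["a", "b"]})
print(out)  # {'col1': [1, 2], 'col3': [1, 2]}
```

Because every block exposes the same `transform` method, a combined pipeline is itself a
transformer and can be nested inside another one. Please refer to the documentation for the
classes that `grizz` actually provides.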

- [Documentation](https://durandtibo.github.io/grizz/)
- [Installation](#installation)
- [Contributing](#contributing)
- [API stability](#api-stability)
- [License](#license)

## Documentation

- [latest (stable)](https://durandtibo.github.io/grizz/): documentation from the latest stable
  release.
- [main (unstable)](https://durandtibo.github.io/grizz/main/): documentation associated with the
  main branch of the repo. This documentation may contain work-in-progress, outdated, or missing
  parts.

## Installation

We highly recommend installing `grizz` in
a [virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
`grizz` can be installed with pip using the following command:

```shell
pip install grizz
```

To make the package as slim as possible, only the minimal packages required to use `grizz` are
installed.
To include all the dependencies, you can use the following command:

```shell
pip install "grizz[all]"
```

Please check the [get started page](https://durandtibo.github.io/grizz/get_started) to see how to
install only specific dependencies, or to find alternative ways to install the library.
The following table lists the `grizz` versions and their required dependencies.

| `grizz` | `coola`        | `iden`         | `objectory`  | `polars`     | `python`      |
|---------|----------------|----------------|--------------|--------------|---------------|
| `main`  | `>=0.8.5,<1.0` | `>=0.1.0,<1.0` | `>=0.2,<1.0` | `>=1.0,<2.0` | `>=3.9,<3.14` |
| `0.1.1` | `>=0.8.5,<1.0` | `>=0.1.0,<1.0` | `>=0.2,<1.0` | `>=1.0,<2.0` | `>=3.9,<3.14` |
| `0.1.0` | `>=0.8.4,<1.0` | `>=0.1.0,<1.0` | `>=0.2,<1.0` | `>=1.0,<2.0` | `>=3.9,<3.14` |
| `0.0.5` | `>=0.7,<1.0`   | `>=0.0.4,<1.0` | `>=0.1,<1.0` | `>=1.0,<2.0` | `>=3.9,<3.13` |
| `0.0.4` | `>=0.7,<1.0`   | `>=0.0.4,<1.0` | `>=0.1,<1.0` | `>=1.0,<2.0` | `>=3.9,<3.13` |

Optional dependencies

| `grizz` | `clickhouse-connect`<sup>*</sup> | `pyarrow`<sup>*</sup> | `tqdm`<sup>*</sup> |
|---------|----------------------------------|-----------------------|--------------------|
| `main`  | `>=0.7,<1.0`                     | `>=10.0,<19.0`        | `>=4.65,<5.0`      |
| `0.1.1` | `>=0.7,<1.0`                     | `>=10.0,<19.0`        | `>=4.65,<5.0`      |
| `0.1.0` | `>=0.7,<1.0`                     | `>=10.0,<18.0`        | `>=4.65,<5.0`      |
| `0.0.5` | `>=0.7,<1.0`                     | `>=10.0,<18.0`        | `>=4.65,<5.0`      |
| `0.0.4` | `>=0.7,<1.0`                     | `>=10.0,<17.0`        | `>=4.65,<5.0`      |

<sup>*</sup> indicates an optional dependency
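
As a minimal sketch, the `python` column of the table above can be turned into a quick
interpreter check. The bounds are copied from the `0.1.1` row; the helper name
`python_supported` is made up for this example and is not part of `grizz`.

```python
import sys

# Bounds copied from the `python` column for grizz 0.1.1: >=3.9,<3.14.
MIN_PYTHON = (3, 9)
MAX_PYTHON_EXCLUSIVE = (3, 14)


def python_supported(version: tuple[int, int]) -> bool:
    """Return True if a (major, minor) version falls inside the supported range."""
    return MIN_PYTHON <= version < MAX_PYTHON_EXCLUSIVE


# Check the interpreter that runs this script.
current = (sys.version_info[0], sys.version_info[1])
print(f"Python {current[0]}.{current[1]} supported: {python_supported(current)}")
```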

## Contributing

Please check the instructions in [CONTRIBUTING.md](.github/CONTRIBUTING.md).

## Suggestions and Communication

Everyone is welcome to contribute to the community.
If you have any questions or suggestions, you can
open a [GitHub issue](https://github.com/durandtibo/grizz/issues).
We will reply as soon as possible. Thank you very much.

## API stability

:warning: While `grizz` is in the development stage, no API is guaranteed to be stable from one
release to the next.
In fact, it is very likely that the API will change multiple times before a stable 1.0.0 release.
In practice, this means that upgrading `grizz` to a new version may break code that
was written against an older version of `grizz`.
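
One way to guard against an unexpected upgrade is to check the installed version at start-up.
The sketch below uses only the standard library; the helper name `is_pinned` is hypothetical,
and `0.1.1` is simply the release described on this page, so replace it with the version your
code was tested against.

```python
from importlib.metadata import PackageNotFoundError, version


def is_pinned(package: str, expected: str) -> bool:
    """Return True only if ``package`` is installed at exactly ``expected``."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return False


# Assumption: the code was tested against grizz 0.1.1.
if not is_pinned("grizz", "0.1.1"):
    print("grizz is missing or differs from the tested version 0.1.1")
```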

## License

`grizz` is licensed under the BSD 3-Clause "New" or "Revised" license, available in the
[LICENSE](LICENSE) file.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/durandtibo/grizz",
    "name": "grizz",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.14,>=3.9",
    "maintainer_email": null,
    "keywords": "polars, DataFrame, ingestor, transformer",
    "author": "Thibaut Durand",
    "author_email": "durand.tibo+gh@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/0b/2d/a3fc6f6a5765f8517c0f3745384710d2fec8eea511886eb2eed4c3c6b8c3/grizz-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD-3-Clause",
    "summary": "A light library to preprocess data with polars",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/durandtibo/grizz",
        "Repository": "https://github.com/durandtibo/grizz"
    },
    "split_keywords": [
        "polars",
        " dataframe",
        " ingestor",
        " transformer"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7e39e46d9db946767f8db808a5f824fb62041787046521a9aa833c010a05ed3f",
                "md5": "a7ab30bfd0c664e86703da989823b54e",
                "sha256": "b99789f7f02dfc6bec4d1633c66cc2c77b164993538f04122825216a1712293d"
            },
            "downloads": -1,
            "filename": "grizz-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a7ab30bfd0c664e86703da989823b54e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.14,>=3.9",
            "size": 45475,
            "upload_time": "2024-11-08T20:52:40",
            "upload_time_iso_8601": "2024-11-08T20:52:40.929979Z",
            "url": "https://files.pythonhosted.org/packages/7e/39/e46d9db946767f8db808a5f824fb62041787046521a9aa833c010a05ed3f/grizz-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0b2da3fc6f6a5765f8517c0f3745384710d2fec8eea511886eb2eed4c3c6b8c3",
                "md5": "8df28871bebc81af6ad227fb5a3bea62",
                "sha256": "63fc80e1b526d7830a9eb8e152bf8dad315ad10d799bca5443de063f9eea9e69"
            },
            "downloads": -1,
            "filename": "grizz-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "8df28871bebc81af6ad227fb5a3bea62",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.14,>=3.9",
            "size": 28100,
            "upload_time": "2024-11-08T20:52:42",
            "upload_time_iso_8601": "2024-11-08T20:52:42.599460Z",
            "url": "https://files.pythonhosted.org/packages/0b/2d/a3fc6f6a5765f8517c0f3745384710d2fec8eea511886eb2eed4c3c6b8c3/grizz-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-08 20:52:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "durandtibo",
    "github_project": "grizz",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "grizz"
}