tubular


- Name: tubular
- Version: 2.1.0
- Summary: Package to perform pre processing steps for machine learning models
- Upload time: 2025-10-30 14:40:23
- Requires Python: >=3.9
- License: BSD 3-Clause License

Copyright (c) 2021, Liverpool Victoria General Insurance Group. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- Keywords: data science, feature engineering, data transforms, pipeline, sklearn, machine learning, ML, DS
            <p align="center">
  <img src="https://github.com/azukds/tubular/raw/main/logo.png">
</p>

Tubular pre-processing for machine learning!

----

![PyPI](https://img.shields.io/pypi/v/tubular?color=success&style=flat)
![Read the Docs](https://img.shields.io/readthedocs/tubular)
![GitHub](https://img.shields.io/github/license/azukds/tubular)
![GitHub last commit](https://img.shields.io/github/last-commit/azukds/tubular)
![GitHub issues](https://img.shields.io/github/issues/azukds/tubular)
![Build](https://github.com/azukds/tubular/actions/workflows/python-package.yml/badge.svg?branch=main)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/azukds/tubular/HEAD?labpath=examples)

`tubular` implements pre-processing steps for tabular data commonly used in machine learning pipelines.

The transformers are compatible with [scikit-learn](https://scikit-learn.org/) [Pipelines](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html). Each has a `transform` method to apply the pre-processing step to data and a `fit` method to learn the relevant information from the data, if applicable.
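
As a rough illustration of the `fit`/`transform` contract described above, here is a minimal pure-Python sketch. The class `MeanImputerSketch` and its attribute `impute_values_` are hypothetical names invented for this example, and the data is a plain dict of lists rather than a dataframe; this is not `tubular`'s implementation, only the shape of the interface.

```python
class MeanImputerSketch:
    """Hypothetical illustration of the fit/transform contract; not tubular's code."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Learn one fill value per column from the non-missing entries.
        self.impute_values_ = {}
        for col in self.columns:
            present = [v for v in X[col] if v is not None]
            self.impute_values_[col] = sum(present) / len(present)
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, X):
        # Apply the learned values; columns not listed pass through unchanged.
        out = {col: list(values) for col, values in X.items()}
        for col in self.columns:
            fill = self.impute_values_[col]
            out[col] = [fill if v is None else v for v in out[col]]
        return out


train = {"a": [1.0, None, 3.0], "b": [10, 20, 30]}
imputer = MeanImputerSketch(columns=["a"]).fit(train)
result = imputer.transform({"a": [None, 5.0], "b": [1, 2]})
print(result)  # {'a': [2.0, 5.0], 'b': [1, 2]}
```

A transformer that needs nothing from the data would simply have a `fit` that learns nothing and returns `self`.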

The transformers in `tubular` are written in [narwhals](https://narwhals-dev.github.io/narwhals/), so they work with both [pandas](https://pandas.pydata.org/) and [polars](https://pola.rs/) dataframes and use the API of whichever library you pass in under the hood.

There are a variety of transformers to assist with:

- capping
- dates
- imputation
- mapping
- categorical encoding
- numeric operations

Here is a simple example of applying capping to two columns:

```python
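import polars as pl

# Import added for completeness; assumes the transformer lives in tubular.capping.
from tubular.capping import CappingTransformer

# Cap column 'a' to [10, 20] and column 'b' to [1, 3]; column 'c' is left untouched.
transformer = CappingTransformer(
    capping_values={"a": [10, 20], "b": [1, 3]},
)

test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

transformer.transform(test_df)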
# ->
# shape: (4, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 10  ┆ 3   ┆ 1   │
# │ 15  ┆ 2   ┆ 2   │
# │ 18  ┆ 3   ┆ 3   │
# │ 20  ┆ 1   ┆ 4   │
# └─────┴─────┴─────┘
```
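
To make the capping semantics concrete, the sketch below reproduces the same arithmetic in plain Python, without any dataframe library. The helper `cap_columns` is a hypothetical name used only for illustration, and it assumes each entry of `capping_values` is a `[lower, upper]` pair, as in the example above.

```python
def cap_columns(data, capping_values):
    """Clip each listed column to its [lower, upper] bounds; copy the rest as-is."""
    capped = {name: list(values) for name, values in data.items()}
    for name, (lower, upper) in capping_values.items():
        # Values below `lower` are raised to it, values above `upper` are lowered to it.
        capped[name] = [max(lower, min(v, upper)) for v in capped[name]]
    return capped


data = {"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]}
capped = cap_columns(data, {"a": [10, 20], "b": [1, 3]})
print(capped)
# {'a': [10, 15, 18, 20], 'b': [3, 2, 3, 1], 'c': [1, 2, 3, 4]}
```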

## Installation

The easiest way to get `tubular` is to install it directly from [PyPI](https://pypi.org/project/tubular/):

 `pip install tubular`
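
One way to confirm the install worked is to query the installed distribution's version from Python. This is a small sketch using only the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of ``package``, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("tubular")
print(f"tubular {found}" if found else "tubular is not installed")
```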

## Documentation

The documentation for `tubular` can be found on [readthedocs](https://tubular.readthedocs.io/en/latest/).

Instructions for building the docs locally can be found in [docs/README](https://github.com/azukds/tubular/blob/main/docs/README.md).

## Examples

We use [doctest](https://docs.python.org/3/library/doctest.html) to keep the usage examples in the transformers' docstrings valid, so those docstrings are the best place to get started!
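
For readers unfamiliar with doctest, the sketch below shows the general mechanism: examples written in a docstring are executed and their output is compared with the text that follows each prompt. The function `cap_value` and the helper `run_examples` are hypothetical and not part of `tubular`.

```python
import doctest


def cap_value(value, lower, upper):
    """Clip ``value`` to the closed interval [lower, upper].

    >>> cap_value(25, 10, 20)
    20
    >>> cap_value(1, 10, 20)
    10
    >>> cap_value(15, 10, 20)
    15
    """
    return max(lower, min(value, upper))


def run_examples(func):
    """Run the doctest examples in ``func``'s docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


print(run_examples(cap_value))  # (0, 3)
```

If an example's output ever drifts from the code's behaviour, the failed count becomes non-zero, which is what keeps documented examples honest.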

## Issues

For bugs and feature requests, please open an [issue](https://github.com/azukds/tubular/issues).

## Build and test

This project uses [pytest](https://docs.pytest.org/en/stable/) as its test framework. To build the package locally and run the tests, follow the steps below.

First, clone the repo and move to the root directory:

```shell
git clone https://github.com/azukds/tubular.git
cd tubular
```

Next, install `tubular` and the development dependencies:

```shell
pip install . -r requirements-dev.txt
```

Finally, run the test suite with `pytest`:

```shell
pytest
```

## Contribute

`tubular` is under active development, and we're super excited if you're interested in contributing!

See the [CONTRIBUTING](https://github.com/azukds/tubular/blob/main/CONTRIBUTING.rst) file for the full details of our working practices.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "tubular",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "data science, feature engineering, data transforms, pipeline, sklearn, machine learning, ML, DS",
    "author": null,
    "author_email": "Allianz UK Data Science Team <datasciencepackages@allianz.co.uk>",
    "download_url": "https://files.pythonhosted.org/packages/d8/ed/16f81c3cc226670cd00f7e852f413fd32153541d1c5c2f6b0ebe1c547a41/tubular-2.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 3-Clause License\n        \n        Copyright (c) 2021, Liverpool Victoria General Insurance Group.\n        All rights reserved.\n        \n        Redistribution and use in source and binary forms, with or without\n        modification, are permitted provided that the following conditions are met:\n        \n        1. Redistributions of source code must retain the above copyright notice, this\n           list of conditions and the following disclaimer.\n        \n        2. Redistributions in binary form must reproduce the above copyright notice,\n           this list of conditions and the following disclaimer in the documentation\n           and/or other materials provided with the distribution.\n        \n        3. Neither the name of the copyright holder nor the names of its\n           contributors may be used to endorse or promote products derived from\n           this software without specific prior written permission.\n        \n        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\n        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE\n        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,\n        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.",
    "summary": "Package to perform pre processing steps for machine learning models",
    "version": "2.1.0",
    "project_urls": {
        "Changelog": "https://github.com/azukds/tubular/CHANGELOG.md",
        "Documentation": "https://tubular.readthedocs.io/en/latest/index.html",
        "Issues": "https://github.com/azukds/tubular/issues",
        "Repository": "https://github.com/azukds/tubular"
    },
    "split_keywords": [
        "data science",
        " feature engineering",
        " data transforms",
        " pipeline",
        " sklearn",
        " machine learning",
        " ml",
        " ds"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "44a23e96598c62c3651a1e97e19f30c7487a77016f41a0b64d8acb0048473cf1",
                "md5": "e0ba34a3bb01a184f446f0af5a128187",
                "sha256": "8ba95ca1b5cb8d98930fa0cbd5aeefa7007092b49e7f6f23818d70b8812ab11d"
            },
            "downloads": -1,
            "filename": "tubular-2.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e0ba34a3bb01a184f446f0af5a128187",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 82966,
            "upload_time": "2025-10-30T14:40:21",
            "upload_time_iso_8601": "2025-10-30T14:40:21.194376Z",
            "url": "https://files.pythonhosted.org/packages/44/a2/3e96598c62c3651a1e97e19f30c7487a77016f41a0b64d8acb0048473cf1/tubular-2.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d8ed16f81c3cc226670cd00f7e852f413fd32153541d1c5c2f6b0ebe1c547a41",
                "md5": "abf1f4347bc37499cce51231448cc64f",
                "sha256": "7b0ec1632280647d33be5f41ed17ddc71c07d83e148b314d69c4428648d482e2"
            },
            "downloads": -1,
            "filename": "tubular-2.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "abf1f4347bc37499cce51231448cc64f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 241429,
            "upload_time": "2025-10-30T14:40:23",
            "upload_time_iso_8601": "2025-10-30T14:40:23.745180Z",
            "url": "https://files.pythonhosted.org/packages/d8/ed/16f81c3cc226670cd00f7e852f413fd32153541d1c5c2f6b0ebe1c547a41/tubular-2.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-30 14:40:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "azukds",
    "github_project": "tubular",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "tubular"
}
        