sanex


Name: sanex
Version: 0.1.2
Home page: None
Summary: A data cleaning library for Pandas and Polars DataFrames with a simple, chainable API.
Upload time: 2025-09-08 02:37:51
Maintainer: None
Docs URL: None
Author: None
Requires Python: >=3.8
License: MIT License Copyright (c) 2025 JohnTocci Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: data cleaning, pandas, polars, data science, etl, data processing
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
<h1 align="center">🧹 Sanex</h1>

<div align="center">

[![PyPI version](https://img.shields.io/pypi/v/sanex.svg)](https://pypi.org/project/sanex/)
[![Build Status](https://img.shields.io/travis/com/your-username/sanex.svg)](https://travis-ci.com/your-username/sanex)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Supported Python versions](https://img.shields.io/pypi/pyversions/sanex.svg)](https://pypi.org/project/sanex/)

</div>

**Sanex** is a data cleaning library for Python that works with both **pandas** and **polars** DataFrames. Its fluent, chainable API lets you express a whole cleaning pipeline as one readable sequence of method calls.

---

## 🚀 Key Features

- **Fluent, Chainable API**: Clean your data in a single, readable chain of commands.
- **Dual Backend Support**: Accepts both pandas and polars DataFrames.
- **Comprehensive Cleaning Functions**: Covers column name standardization, deduplication, missing data, whitespace and text cleanup, outlier handling, and boolean standardization.
- **Extensible**: Easily add your own cleaning functions to the pipeline.
- **Lightweight and Performant**: Designed to be fast and efficient.

---

## 📦 Installation

Install Sanex with pip:

```bash
pip install sanex
```

---

## ⚡ Quick Start

Here's a quick example of how to use Sanex to clean a DataFrame:

```python
import pandas as pd
import sanex as sx

# Create a sample DataFrame
data = {
    'First Name': [' John ', 'Jane', '  Peter', 'JOHN'],
    'Last Name': ['Smith', 'Doe', 'Jones', 'Smith'],
    'Age': [28, 34, 22, 28],
    'Salary': [70000, 80000, 65000, 70000],
    'is_active': ['True', 'False', 'true', 'TRUE']
}
df = pd.DataFrame(data)

# Clean the data with Sanex
clean_df = (
    sx(df)
    .clean_column_names()
    .remove_whitespace()
    .remove_duplicates()
    .standardize_booleans()
    .to_df()
)

print(clean_df)
```
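One detail of the sample data is worth noticing: the first and last rows look like the same person, but `'John'` and `'JOHN'` still differ after whitespace is stripped. The plain-Python sketch below does not use Sanex. It assumes (the README does not say) that deduplication compares rows exactly and case-sensitively, as pandas' `drop_duplicates` does. All helper names are hypothetical.

```python
# Rows mirror the Quick Start sample data, one tuple per row.
rows = [
    (" John ", "Smith", 28, 70000, "True"),
    ("Jane", "Doe", 34, 80000, "False"),
    ("  Peter", "Jones", 22, 65000, "true"),
    ("JOHN", "Smith", 28, 70000, "TRUE"),
]


def strip_strings(row):
    """Strip leading/trailing whitespace from string cells only."""
    return tuple(v.strip() if isinstance(v, str) else v for v in row)


def lowercase_strings(row):
    """Lowercase string cells so comparison ignores case."""
    return tuple(v.lower() if isinstance(v, str) else v for v in row)


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept


stripped = [strip_strings(r) for r in rows]

# Exact comparison: 'John' != 'JOHN', so nothing is dropped.
deduped = drop_exact_duplicates(stripped)
print(len(deduped))

# Case-insensitive comparison: the first and last rows now collapse.
normalized = drop_exact_duplicates([lowercase_strings(r) for r in stripped])
print(len(normalized))
```

Under that assumption, text normalization has to happen before deduplication if near-duplicates like these should be merged.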

---

## 📖 API Reference

The `sanex` library provides a fluent, chainable API for cleaning DataFrames.

### Initialization

- `sx(df)`: Initializes the cleaner with a pandas or polars DataFrame, allowing you to chain cleaning methods.

### Column Name Cleaning

- `.clean_column_names(case='snake')`: Cleans and standardizes all column names to a specified case.
  - `case` (str): The target case. Options: `'snake'`, `'camel'`, `'pascal'`, `'kebab'`, `'title'`, `'lower'`, `'screaming_snake'`.

- `.snakecase()`: Converts column names to `snake_case`.
- `.camelcase()`: Converts column names to `camelCase`.
- `.pascalcase()`: Converts column names to `PascalCase`.
- `.kebabcase()`: Converts column names to `kebab-case`.
- `.titlecase()`: Converts column names to `Title Case`.
- `.lowercase()`: Converts column names to `lowercase`.
- `.screaming_snakecase()`: Converts column names to `SCREAMING_SNAKE_CASE`.
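To make the target cases concrete, here is a small standalone sketch of the conversions using only the standard library. It is not Sanex's implementation; the function names and the word-splitting rule are assumptions chosen for illustration.

```python
import re
from typing import List


def split_words(name: str) -> List[str]:
    """Split a column name into lowercase words."""
    # Mark lower-to-upper boundaries so "firstName" becomes "first Name".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Any run of non-alphanumeric characters (spaces, "_", "-") separates words.
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]


def to_snake(name: str) -> str:
    return "_".join(split_words(name))


def to_kebab(name: str) -> str:
    return "-".join(split_words(name))


def to_screaming_snake(name: str) -> str:
    return to_snake(name).upper()


def to_pascal(name: str) -> str:
    return "".join(w.capitalize() for w in split_words(name))


def to_camel(name: str) -> str:
    # camelCase is PascalCase with the first letter lowered.
    pascal = to_pascal(name)
    return pascal[:1].lower() + pascal[1:]


for column in ["First Name", "Last Name", "is_active"]:
    print(column, "->", to_snake(column), to_camel(column), to_pascal(column))
```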

### Data Deduplication

- `.remove_duplicates()`: Removes duplicate rows from the DataFrame.

### Missing Data Handling

- `.fill_missing(value=0, subset=None)`: Fills missing values.
  - `value`: The value to fill missing entries with.
  - `subset` (list): A list of columns to fill. Defaults to all columns.

- `.drop_missing(how='any', thresh=None, subset=None, axis='rows')`: Drops rows or columns with missing values.
  - `how` (str): `'any'` or `'all'`.
  - `thresh` (int): The number of non-NA values required to keep a row/column.
  - `subset` (list): Columns to consider.
  - `axis` (str): `'rows'` or `'columns'`.
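The interaction of `how`, `thresh`, and `subset` is easiest to see on a tiny example. The sketch below is plain Python, not Sanex code. It assumes the parameters behave like those of pandas' `DataFrame.dropna` for rows, which the README does not state, and it lets `thresh` take precedence over `how` when both are given (also an assumption). `keep_row` and `drop_missing_rows` are hypothetical helpers.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def keep_row(row: Row, how: str = "any", thresh: Optional[int] = None,
             subset: Optional[List[str]] = None) -> bool:
    """Decide whether a row survives; None marks a missing value."""
    columns = subset if subset is not None else list(row)
    present = sum(row[c] is not None for c in columns)
    if thresh is not None:
        # Keep rows with at least `thresh` non-missing values.
        return present >= thresh
    if how == "any":
        # Drop the row if any considered value is missing.
        return present == len(columns)
    if how == "all":
        # Drop the row only if every considered value is missing.
        return present > 0
    raise ValueError("how must be 'any' or 'all'")


def drop_missing_rows(rows: List[Row], **options) -> List[Row]:
    return [row for row in rows if keep_row(row, **options)]


rows = [
    {"name": "Ann", "age": 31, "city": "Oslo"},
    {"name": "Bob", "age": None, "city": "Rome"},
    {"name": "Cy", "age": None, "city": None},
    {"name": None, "age": None, "city": None},
]

print([r["name"] for r in drop_missing_rows(rows)])
print([r["name"] for r in drop_missing_rows(rows, how="all")])
print([r["name"] for r in drop_missing_rows(rows, thresh=2)])
```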

### Whitespace and Text Manipulation

- `.remove_whitespace()`: Removes leading and trailing whitespace from all string columns.
- `.replace_text(to_replace, value, subset=None)`: Replaces text in string columns.
  - `to_replace` (str): The text to find.
  - `value` (str): The text to replace with.
  - `subset` (list): Columns to apply the replacement to.

### Column Management

- `.drop_single_value_columns()`: Drops columns that have only one unique value.
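As an illustration of the rule, the following standalone sketch finds and drops constant columns in a table stored as a dict of lists. It is not Sanex's implementation, the helper names are hypothetical, and how Sanex counts missing values toward uniqueness is not documented, so the sketch ignores that question.

```python
from typing import Dict, List


def single_value_columns(table: Dict[str, list]) -> List[str]:
    """Return the names of columns whose values are all identical."""
    return [name for name, values in table.items() if len(set(values)) == 1]


def drop_single_value_columns(table: Dict[str, list]) -> Dict[str, list]:
    """Return a copy of the table without its constant columns."""
    constant = set(single_value_columns(table))
    return {name: values for name, values in table.items() if name not in constant}


table = {
    "country": ["US", "US", "US"],
    "age": [28, 34, 22],
    "active": [True, True, True],
}

print(single_value_columns(table))
print(list(drop_single_value_columns(table)))
```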

### Outlier Handling

- `.handle_outliers(method='iqr', factor=1.5, subset=None)`: A general method that can be configured to cap or remove outliers.
- `.cap_outliers(method='iqr', factor=1.5, subset=None)`: Caps outliers at a specified threshold.
- `.remove_outliers(method='iqr', factor=1.5, subset=None)`: Removes rows containing outliers.

All three methods take the same parameters:

- `method` (str): `'iqr'` (Interquartile Range) or `'zscore'`.
- `factor` (float): The multiplier for the chosen method to determine the outlier threshold.
- `subset` (list): Columns to process. Defaults to all numeric columns.
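For readers unfamiliar with the IQR rule, here is a minimal standard-library sketch of how `factor` turns quartiles into lower and upper fences, and how capping differs from removal. It illustrates the textbook technique, not Sanex's internals. The function names are hypothetical, and the quartile interpolation (`method="inclusive"`, the same linear interpolation pandas' `quantile` uses by default) is an assumption about what a DataFrame backend would compute.

```python
import statistics
from typing import List, Tuple


def iqr_bounds(values: List[float], factor: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences: Q1 - factor*IQR and Q3 + factor*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def cap_values(values: List[float], factor: float = 1.5) -> List[float]:
    """Clamp values outside the fences to the nearest fence."""
    lower, upper = iqr_bounds(values, factor)
    return [min(max(v, lower), upper) for v in values]


def remove_values(values: List[float], factor: float = 1.5) -> List[float]:
    """Drop values outside the fences."""
    lower, upper = iqr_bounds(values, factor)
    return [v for v in values if lower <= v <= upper]


ages = [28, 34, 22, 28, 95]

print(iqr_bounds(ages))      # Q1 = 28, Q3 = 34, IQR = 6
print(cap_values(ages))      # 95 is clamped to the upper fence
print(remove_values(ages))   # 95 is dropped
```

A larger `factor` widens the fences and flags fewer values. For the z-score method the fences would instead be the mean plus or minus `factor` standard deviations.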

### Data Standardization

- `.standardize_booleans(true_values=None, false_values=None, subset=None)`: Converts boolean-like values into actual booleans.
  - `true_values` (list): A list of strings to be considered `True`.
  - `false_values` (list): A list of strings to be considered `False`.
  - `subset` (list): Columns to standardize.
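The sketch below shows one way such a conversion can work for a single value. It is a plain-Python illustration, not Sanex code. The default string sets, the case-insensitive matching, and the choice to return `None` for unrecognized values are all assumptions, since the README does not document them.

```python
from typing import Iterable, Optional

# Assumed defaults; Sanex's actual defaults are not documented in the README.
ASSUMED_TRUE = ("true", "yes", "y", "1")
ASSUMED_FALSE = ("false", "no", "n", "0")


def to_bool(value, true_values: Optional[Iterable[str]] = None,
            false_values: Optional[Iterable[str]] = None) -> Optional[bool]:
    """Map a boolean-like value to True/False, or None if unrecognized."""
    if isinstance(value, bool):
        return value
    truthy = {v.lower() for v in (true_values or ASSUMED_TRUE)}
    falsy = {v.lower() for v in (false_values or ASSUMED_FALSE)}
    # Compare case-insensitively and ignore surrounding whitespace.
    key = str(value).strip().lower()
    if key in truthy:
        return True
    if key in falsy:
        return False
    return None


is_active = ["True", "False", "true", "TRUE"]
print([to_bool(v) for v in is_active])

# Custom vocabularies replace the assumed defaults.
print(to_bool("oui", true_values=["oui"], false_values=["non"]))
```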

### Final Output

- `.to_df()`: Returns the cleaned pandas or polars DataFrame.

---

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue for bugs, feature requests, or suggestions.

1.  Fork the repository.
2.  Create a new branch (`git checkout -b feature/YourFeature`).
3.  Commit your changes (`git commit -m 'Add some feature'`).
4.  Push to the branch (`git push origin feature/YourFeature`).
5.  Open a pull request.

---

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "sanex",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "data cleaning, pandas, polars, data science, etl, data processing",
    "author": null,
    "author_email": "John Tocci <john@johntocci.com>",
    "download_url": "https://files.pythonhosted.org/packages/c8/53/17c957ed1667eb235ebf2b0e0cf51f55ca7ca190da8d684151901e6b382f/sanex-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\r\n        \r\n        Copyright (c) 2025 JohnTocci\r\n        \r\n        Permission is hereby granted, free of charge, to any person obtaining a copy\r\n        of this software and associated documentation files (the \"Software\"), to deal\r\n        in the Software without restriction, including without limitation the rights\r\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\n        copies of the Software, and to permit persons to whom the Software is\r\n        furnished to do so, subject to the following conditions:\r\n        \r\n        The above copyright notice and this permission notice shall be included in all\r\n        copies or substantial portions of the Software.\r\n        \r\n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\n        SOFTWARE.\r\n        ",
    "summary": "A data cleaning library for Pandas and Polars DataFrames with a simple, chainable API.",
    "version": "0.1.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/johntocci/sanex/issues",
        "Homepage": "https://github.com/johntocci/sanex",
        "Repository": "https://github.com/johntocci/sanex"
    },
    "split_keywords": [
        "data cleaning",
        " pandas",
        " polars",
        " data science",
        " etl",
        " data processing"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a9dfc0be5e2fb4a44d4ba961fdf90b5be7853df753d5ab356a76661bf706b122",
                "md5": "8de339ae6cdc5a45d26d6da674449744",
                "sha256": "e777fecee3b7ec289cc123e28cfa6d5567cf106978257156792029588f7dd73a"
            },
            "downloads": -1,
            "filename": "sanex-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8de339ae6cdc5a45d26d6da674449744",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 20319,
            "upload_time": "2025-09-08T02:37:50",
            "upload_time_iso_8601": "2025-09-08T02:37:50.585907Z",
            "url": "https://files.pythonhosted.org/packages/a9/df/c0be5e2fb4a44d4ba961fdf90b5be7853df753d5ab356a76661bf706b122/sanex-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c85317c957ed1667eb235ebf2b0e0cf51f55ca7ca190da8d684151901e6b382f",
                "md5": "a410793f31c1c676f6fa5d8a515ab13e",
                "sha256": "de3f9686dfe7586c89dda65d1b652d1378a8b5d0758515b1a91dadc95c4b0c64"
            },
            "downloads": -1,
            "filename": "sanex-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "a410793f31c1c676f6fa5d8a515ab13e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 17290,
            "upload_time": "2025-09-08T02:37:51",
            "upload_time_iso_8601": "2025-09-08T02:37:51.637470Z",
            "url": "https://files.pythonhosted.org/packages/c8/53/17c957ed1667eb235ebf2b0e0cf51f55ca7ca190da8d684151901e6b382f/sanex-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-08 02:37:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "johntocci",
    "github_project": "sanex",
    "github_not_found": true,
    "lcname": "sanex"
}
        