rdatapp


| Field | Value |
| --- | --- |
| Name | rdatapp |
| Version | 1.0 |
| Home page | https://github.com/Beegash/Recoded-Data-Processing-Library |
| Summary | A recoded data preprocessing library for handling various data cleaning and transformation tasks. The library includes classes for text cleaning, missing value imputation, one-hot encoding, and more. |
| Upload time | 2024-05-20 21:08:11 |
| Maintainer | None |
| Docs URL | None |
| Author | Izzettin Furkan Özmen, Ismail Cifci |
| Requires Python | >=3.6 |
| License | None |
| Keywords | data preprocessing data-cleaning one-hot-encoding text-cleaning missing-value-imputation |
| Requirements | No requirements were recorded. |

# RDATAPP: Recoded Data Preprocessing Library

[![PyPI version](https://badge.fury.io/py/rdatapp.svg)](https://badge.fury.io/py/rdatapp)
[![Python versions](https://img.shields.io/pypi/pyversions/rdatapp.svg)](https://pypi.python.org/pypi/rdatapp)
[![License](https://img.shields.io/pypi/l/rdatapp.svg)](https://pypi.python.org/pypi/rdatapp)

## Overview

RDATAPP is a comprehensive data preprocessing library designed to handle various data cleaning and transformation tasks. This library includes classes and methods for text cleaning, missing value imputation, one-hot encoding, outlier detection, feature engineering, and more.

## Features

- **Text Cleaning**: Convert text to lowercase, remove punctuation and stopwords, and lemmatize words.
- **Missing Value Handling**: Impute missing values using mean, median, or a constant value. Alternatively, delete rows with missing values.
- **Encoding**: One-hot encode and label encode categorical columns.
- **Outlier Detection**: Detect and remove outliers using the Interquartile Range (IQR) method.
- **Scaling**: Apply Min-Max scaling and standard scaling to numerical columns.
- **Feature Engineering**: Create new features by applying functions to existing columns.
- **Date-Time Handling**: Convert columns to datetime format and extract date parts like year, month, and day.

## Installation

You can install RDATAPP from PyPI using pip:

```sh
pip install rdatapp
```

## Usage

Below are examples of how to use the different classes and methods provided by RDATAPP.

### Text Cleaning

```python
from rdatapp.text_cleaning import TextCleaner

# Instantiate the cleaner and clean a single string
text_cleaner = TextCleaner()
cleaned_text = text_cleaner.clean_text("This is a Sample TEXT, with Punctuation!")
print(cleaned_text)
```
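
To make the cleaning steps concrete, here is a minimal standard-library sketch of lowercasing, punctuation removal, and stopword filtering. It is not rdatapp's implementation: `simple_clean` and the tiny `STOPWORDS` set are hypothetical names chosen for illustration, and lemmatization is left out because it needs an external NLP library.

```python
import string

# Illustrative stopword list (an assumption); a real cleaner would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "with", "this", "and", "or", "of"}


def simple_clean(text: str) -> str:
    """Lowercase the text, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    # str.translate with a deletion table removes every punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in no_punct.split() if tok not in STOPWORDS]
    return " ".join(tokens)


result = simple_clean("This is a Sample TEXT, with Punctuation!")
print(result)  # sample text punctuation
```

The output of `TextCleaner.clean_text` may differ, since its stopword list and lemmatizer are not shown here.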

### Missing Value Handling

```python
import pandas as pd
from rdatapp.missing_value_handler import MissingValueHandler

# Column 'A' has one missing value
df = pd.DataFrame({'A': [1, 2, None, 4]})
# Impute the missing value in 'A' with the column mean
df = MissingValueHandler.impute_mean(df, 'A')
print(df)
```
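
For reference, mean imputation can be written directly in pandas as follows. This is a sketch of the general technique, not of `MissingValueHandler` internals, which are not shown in this README.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, None, 4]})

# Series.mean() skips missing values by default: (1 + 2 + 4) / 3
col_mean = df["A"].mean()

# Fill the gap with the mean; median() or a constant would work the same way.
df["A"] = df["A"].fillna(col_mean)
print(df)
```

Dropping incomplete rows instead would be `df.dropna(subset=["A"])`.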

### Encoding

```python
import pandas as pd
from rdatapp.categorical_encoder import CategoricalEncoder

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})
# One-hot encode the categorical column 'Category'
df = CategoricalEncoder.one_hot_encode(df, 'Category')
print(df)
```
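
The two encodings named in the feature list can be illustrated with plain pandas. This is a sketch under the assumption that rdatapp produces comparable results; the exact column names and dtypes returned by `CategoricalEncoder` are not documented here.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "A", "C"]})

# One-hot encoding: one indicator column per distinct category.
encoded = pd.get_dummies(df, columns=["Category"])
print(encoded)

# Label encoding: each distinct category is mapped to an integer code.
codes, uniques = pd.factorize(df["Category"])
print(codes.tolist(), list(uniques))
```

One-hot encoding avoids implying an order between categories, while label encoding keeps a single column at the cost of introducing arbitrary integer values.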

### Outlier Detection

```python
import pandas as pd
from rdatapp.outlier_handler import OutlierHandler

# 100 is far from the other values
df = pd.DataFrame({'Values': [1, 2, 3, 4, 100]})
# Detect and remove outliers in 'Values' using the IQR method
df = OutlierHandler.iqr_outlier_detection(df, 'Values')
print(df)
```
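
The IQR rule itself is short enough to write out. The sketch below uses the conventional 1.5 × IQR fences; whether `OutlierHandler` uses the same multiplier is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"Values": [1, 2, 3, 4, 100]})

# First and third quartiles (pandas interpolates linearly by default).
q1 = df["Values"].quantile(0.25)
q3 = df["Values"].quantile(0.75)
iqr = q3 - q1

# Conventional fences: anything outside [q1 - 1.5*IQR, q3 + 1.5*IQR] is an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows inside the fences (bounds are inclusive).
filtered = df[df["Values"].between(lower, upper)]
print(filtered)
```

For this data the quartiles are 2 and 4, so the fences are -1 and 7 and the value 100 is dropped.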

### Scaling

```python
import pandas as pd
from rdatapp.scaler import Scaler

df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
# Apply Min-Max scaling to the numerical column 'Values'
df = Scaler.min_max_scale(df, 'Values')
print(df)
```
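
Both scaling methods reduce to one-line formulas. The following pandas sketch shows them side by side; it is illustrative only, and the choice of population standard deviation (`ddof=0`) for standard scaling is an assumption about what a scaler would typically use, not a statement about rdatapp.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], dtype=float)

# Min-Max scaling: (x - min) / (max - min), mapping the column onto [0, 1].
min_max = (s - s.min()) / (s.max() - s.min())

# Standard scaling: (x - mean) / std, giving zero mean and unit variance.
# ddof=0 selects the population standard deviation (assumption, see above).
standard = (s - s.mean()) / s.std(ddof=0)

print(min_max.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standard.round(3).tolist())
```

Min-Max scaling is sensitive to extreme values because they define the range, so it is usually applied after outlier handling.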

### Feature Engineering

```python
import pandas as pd
from rdatapp.feature_engineer import FeatureEngineer

df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
# Create a new feature by applying a squaring function to 'Values'
df = FeatureEngineer.create_new_feature(df, 'Values', lambda x: x**2)
print(df)
```

### Date-Time Handling

```python
import pandas as pd
from rdatapp.date_time_handler import DateTimeHandler

df = pd.DataFrame({'Date': ['2021-01-01', '2021-02-01', '2021-03-01']})
# Convert the string column to datetime format
df = DateTimeHandler.to_datetime(df, 'Date')
# Extract date parts such as year, month, and day
df = DateTimeHandler.extract_date_parts(df, 'Date')
print(df)
```
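
The same conversion and extraction can be done with the pandas `.dt` accessor. In this sketch the new column names `year`, `month`, and `day` are hypothetical; the names that `extract_date_parts` actually produces are not stated in this README.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2021-01-01", "2021-02-01", "2021-03-01"]})

# Parse the strings into datetime64 values.
df["Date"] = pd.to_datetime(df["Date"])

# Pull individual components out through the .dt accessor.
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["day"] = df["Date"].dt.day
print(df)
```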

## Authors

- **Izzettin Furkan Özmen** - [izzettinfurkan.ozmen@stu.fsm.edu.tr](mailto:izzettinfurkan.ozmen@stu.fsm.edu.tr), [LinkedIn](https://www.linkedin.com/in/izzettinozmen/)
- **Ismail Cifci** - [ismail.cifci@stu.fsm.edu.tr](mailto:ismail.cifci@stu.fsm.edu.tr), [LinkedIn](https://www.linkedin.com/in/ismail-cifci/)

## License

No license has been specified for this project. Feel free to use it.

## Contributing

We welcome contributions! Please contact us at the email addresses listed under Authors.

## Acknowledgments

Special thanks to the instructors who provided guidance and support throughout the development of this project.

## Project Links

- [Source](https://github.com/Beegash/Recoded-Data-Processing-Library)

---

For any issues, please contact the authors or open an issue on GitHub.

            
