# RDATAPP: Recoded Data Preprocessing Library
[![PyPI version](https://badge.fury.io/py/rdatapp.svg)](https://badge.fury.io/py/rdatapp)
[![Python versions](https://img.shields.io/pypi/pyversions/rdatapp.svg)](https://pypi.python.org/pypi/rdatapp)
[![License](https://img.shields.io/pypi/l/rdatapp.svg)](https://pypi.python.org/pypi/rdatapp)
## Overview
RDATAPP is a data preprocessing library for common data cleaning and transformation tasks. It provides classes and methods for text cleaning, missing value imputation, categorical encoding, outlier detection, scaling, feature engineering, and date-time handling.
## Features
- **Text Cleaning**: Convert text to lowercase, remove punctuation and stopwords, and lemmatize words.
- **Missing Value Handling**: Impute missing values using mean, median, or a constant value. Alternatively, delete rows with missing values.
- **Encoding**: One-hot encode and label encode categorical columns.
- **Outlier Detection**: Detect and remove outliers using the Interquartile Range (IQR) method.
- **Scaling**: Apply Min-Max scaling and standard scaling to numerical columns.
- **Feature Engineering**: Create new features by applying functions to existing columns.
- **Date-Time Handling**: Convert columns to datetime format and extract date parts like year, month, and day.
## Installation
You can install RDATAPP from PyPI using pip:
```sh
pip install rdatapp
```
## Usage
Below are examples of how to use the different classes and methods provided by RDATAPP.
### Text Cleaning
```python
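from rdatapp.text_cleaning import TextCleaner

# Clean a sample string. Per the Features list, cleaning covers lowercasing,
# punctuation and stopword removal, and lemmatization.
text_cleaner = TextCleaner()
cleaned_text = text_cleaner.clean_text("This is a Sample TEXT, with Punctuation!")
print(cleaned_text)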
```
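To make the cleaning steps concrete, here is a minimal standard-library sketch of lowercasing, punctuation removal, and stopword removal. It is not RDATAPP's implementation: the `simple_clean` function and the tiny `STOPWORDS` set are hypothetical, and lemmatization is left out because it needs a language resource such as a lemmatizer dictionary.

```python
import string

# Tiny illustrative stopword set (an assumption); real lists are much longer.
STOPWORDS = {"this", "is", "a", "with"}


def simple_clean(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in no_punct.split() if tok not in STOPWORDS]
    return " ".join(tokens)


result = simple_clean("This is a Sample TEXT, with Punctuation!")
print(result)  # sample text punctuation
```

The output of `TextCleaner.clean_text` may differ from this sketch, since its stopword list and lemmatizer are not described here.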
### Missing Value Handling
```python
import pandas as pd
from rdatapp.missing_value_handler import MissingValueHandler

# Column 'A' has one missing value; impute it with the column mean.
df = pd.DataFrame({'A': [1, 2, None, 4]})
df = MissingValueHandler.impute_mean(df, 'A')
print(df)
```
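The arithmetic behind mean imputation can be worked through in plain Python. This is only a sketch of the idea on the same sample values, not the library's code: the mean is taken over the observed entries and then substituted for each missing one.

```python
from statistics import mean

values = [1, 2, None, 4]

# Average only the observed entries: (1 + 2 + 4) / 3.
observed = [v for v in values if v is not None]
fill_value = mean(observed)

# Replace each missing entry with that average.
imputed = [fill_value if v is None else v for v in values]
print(imputed)
```

Median or constant imputation follows the same pattern with a different `fill_value`.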
### Encoding
```python
import pandas as pd
from rdatapp.categorical_encoder import CategoricalEncoder

# One-hot encode the 'Category' column.
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})
df = CategoricalEncoder.one_hot_encode(df, 'Category')
print(df)
```
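As a rough illustration of what one-hot encoding produces, the sketch below builds one indicator column per distinct category using only built-in Python. The column order (sorted category labels) is an assumption of this sketch; the column names and order that RDATAPP returns are not documented here.

```python
categories = ["A", "B", "A", "C"]

# One indicator column per distinct category, in sorted order.
levels = sorted(set(categories))
one_hot = [[1 if value == level else 0 for level in levels] for value in categories]

print(levels)  # ['A', 'B', 'C']
for row in one_hot:
    print(row)
```

Each row contains exactly one `1`, in the position of that row's category.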
### Outlier Detection
```python
import pandas as pd
from rdatapp.outlier_handler import OutlierHandler

# Detect and remove outliers in 'Values' using the IQR method.
df = pd.DataFrame({'Values': [1, 2, 3, 4, 100]})
df = OutlierHandler.iqr_outlier_detection(df, 'Values')
print(df)
```
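The IQR rule can be worked through by hand on the same sample. The sketch below uses the standard library and the conventional fences of 1.5 times the IQR beyond the first and third quartiles. Both the 1.5 multiplier and the quartile interpolation method are assumptions here; RDATAPP's exact choices may differ.

```python
from statistics import quantiles

values = [1, 2, 3, 4, 100]

# Quartiles with linear interpolation over the inclusive range of the data.
q1, _median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Conventional fences; the 1.5 multiplier is an assumption of this sketch.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

kept = [v for v in values if lower <= v <= upper]
outliers = [v for v in values if v < lower or v > upper]
print(kept, outliers)  # [1, 2, 3, 4] [100]
```

Here the quartiles are 2 and 4, so the fences are -1 and 7, and only the value 100 falls outside them.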
### Scaling
```python
import pandas as pd
from rdatapp.scaler import Scaler

# Apply Min-Max scaling to the 'Values' column.
df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
df = Scaler.min_max_scale(df, 'Values')
print(df)
```
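Min-Max scaling maps each value `x` to `(x - min) / (max - min)`, so the smallest value becomes 0 and the largest becomes 1. A minimal plain-Python sketch of that formula follows; the `min_max` helper is hypothetical, and its handling of a constant column is a choice made for this sketch only.

```python
def min_max(values):
    """Rescale values linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: this sketch maps every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


scaled = min_max([1, 2, 3, 4, 5])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```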
### Feature Engineering
```python
import pandas as pd
from rdatapp.feature_engineer import FeatureEngineer

# Create a new feature by applying a function (here, squaring) to 'Values'.
df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
df = FeatureEngineer.create_new_feature(df, 'Values', lambda x: x**2)
print(df)
```
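The general idea of deriving a feature from an existing column can be sketched without any library. In the example below, the `add_feature` helper and the `Values_squared` column name are hypothetical; how RDATAPP names the resulting column is not stated in this README.

```python
rows = [{"Values": v} for v in [1, 2, 3, 4, 5]]


def add_feature(rows, source, target, func):
    """Return new rows with `target` set to func(row[source])."""
    return [{**row, target: func(row[source])} for row in rows]


squared = add_feature(rows, "Values", "Values_squared", lambda x: x ** 2)
for row in squared:
    print(row)
```

The original rows are left unchanged; each returned row carries both the source value and the derived one.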
### Date-Time Handling
```python
import pandas as pd
from rdatapp.date_time_handler import DateTimeHandler

# Convert 'Date' to datetime, then extract parts such as year, month, and day.
df = pd.DataFrame({'Date': ['2021-01-01', '2021-02-01', '2021-03-01']})
df = DateTimeHandler.to_datetime(df, 'Date')
df = DateTimeHandler.extract_date_parts(df, 'Date')
print(df)
```
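For reference, the two steps (parsing, then extracting parts) look like this with the standard library alone. This is a sketch of the concept on the same sample dates, not the library's code, and the `year`, `month`, and `day` keys are illustrative names rather than the column names RDATAPP produces.

```python
from datetime import date

raw = ["2021-01-01", "2021-02-01", "2021-03-01"]

# Step 1: parse ISO-formatted strings into date objects.
parsed = [date.fromisoformat(s) for s in raw]

# Step 2: pull out the individual parts of each date.
parts = [{"year": d.year, "month": d.month, "day": d.day} for d in parsed]
for p in parts:
    print(p)
```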
## Authors
- **Izzettin Furkan Özmen** - [izzettinfurkan.ozmen@stu.fsm.edu.tr](mailto:izzettinfurkan.ozmen@stu.fsm.edu.tr), [LinkedIn](https://www.linkedin.com/in/izzettinozmen/)
- **Ismail Cifci** - [ismail.cifci@stu.fsm.edu.tr](mailto:ismail.cifci@stu.fsm.edu.tr), [LinkedIn](https://www.linkedin.com/in/ismail-cifci/)
## License
No license has been specified for this project. Feel free to use it.
## Contributing
We welcome contributions! Please contact us at the email addresses listed under Authors.
## Acknowledgments
Special thanks to the instructors who provided guidance and support throughout the development of this project.
## Project Links
- [Source](https://github.com/Beegash/Recoded-Data-Processing-Library)
---
For any issues, please contact the authors or open an issue on GitHub.
Raw data
{
"_id": null,
"home_page": "https://github.com/Beegash/Recoded-Data-Processing-Library",
"name": "rdatapp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "data preprocessing data-cleaning one-hot-encoding text-cleaning missing-value-imputation",
"author": "Izzettin Furkan \u00d6zmen, Ismail Cifci",
"author_email": "izzettinfurkan.ozmen@stu.fsm.edu.tr, ismail.cifci@stu.fsm.edu.tr",
"download_url": "https://files.pythonhosted.org/packages/31/c5/a463609a00fc960422e147d5d1b48b1bf6eb8d9c0c01969e5376136402ce/rdatapp-1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A recoded data preprocessing library for handling various data cleaning and transformation tasks. The library includes classes for text cleaning, missing value imputation, one-hot encoding, and more.",
"version": "1.0",
"project_urls": {
"Homepage": "https://github.com/Beegash/Recoded-Data-Processing-Library",
"Source": "https://github.com/Beegash/Recoded-Data-Processing-Library"
},
"split_keywords": [
"data",
"preprocessing",
"data-cleaning",
"one-hot-encoding",
"text-cleaning",
"missing-value-imputation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6695f7ab2b243e69dedf41fe1e085e0757b4ba6f81c3afd45cba561a0f6b32da",
"md5": "d17c77f4b9633cdc45ed2df8572960b8",
"sha256": "0f183b8ac9d842e7342ccad9d71e65a48d17ded981400198a5c0c20cb92784bd"
},
"downloads": -1,
"filename": "rdatapp-1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d17c77f4b9633cdc45ed2df8572960b8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 8555,
"upload_time": "2024-05-20T21:08:10",
"upload_time_iso_8601": "2024-05-20T21:08:10.306823Z",
"url": "https://files.pythonhosted.org/packages/66/95/f7ab2b243e69dedf41fe1e085e0757b4ba6f81c3afd45cba561a0f6b32da/rdatapp-1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "31c5a463609a00fc960422e147d5d1b48b1bf6eb8d9c0c01969e5376136402ce",
"md5": "eb748214b88aba27b6c7a5b38ad0b74c",
"sha256": "34f86ea0ca0ce8330160afbf44890a6b6569cb0efe90b87f09f6449f3f12b850"
},
"downloads": -1,
"filename": "rdatapp-1.0.tar.gz",
"has_sig": false,
"md5_digest": "eb748214b88aba27b6c7a5b38ad0b74c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 7781,
"upload_time": "2024-05-20T21:08:11",
"upload_time_iso_8601": "2024-05-20T21:08:11.469848Z",
"url": "https://files.pythonhosted.org/packages/31/c5/a463609a00fc960422e147d5d1b48b1bf6eb8d9c0c01969e5376136402ce/rdatapp-1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-20 21:08:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Beegash",
"github_project": "Recoded-Data-Processing-Library",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "rdatapp"
}