PyScrub


| Field | Value |
| --- | --- |
| Name | PyScrub |
| Version | 0.0.1 |
| Home page | None |
| Summary | PyScrub is a powerful Python library designed to streamline data preprocessing and pipeline automation. It provides efficient tools for data cleaning, transformation, feature engineering, and visualization, all integrated into a reproducible and scalable pipeline framework. |
| Upload time | 2024-08-21 11:17:01 |
| Maintainer | None |
| Docs URL | None |
| Author | Fasugba Ayomide |
| Requires Python | >=3.8 |
| License | None |
| Keywords | python, data cleaning, data transformation, data pipeline, machine learning, data preprocessing |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# PyScrub
PyScrub is a Python library for data preprocessing, transformation, and visualization. It combines data cleaning, feature engineering, and visualization in a single automated pipeline, which saves time and keeps results consistent across runs. It is aimed at machine learning, data analysis, and research projects.

# Features
- Automated data preprocessing pipeline for handling missing values, removing duplicates, correcting data types, and more.
- Data normalization, standardization, and feature engineering built into the pipeline.
- Powerful visualization tools for quickly understanding your data.
- Customizable and modular design, allowing you to extend the pipeline with your own functions.
- Focus on automation and reproducibility to streamline your data workflow.
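
The feature list mentions both normalization and standardization. As a reminder of how the two usually differ, here is a minimal pure-Python sketch. The helper names `min_max_normalize` and `standardize` are hypothetical and are not part of PyScrub's API; the README does not document which formulas `PyScrub.data_transformation` actually uses.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 60]
normalized_ages = min_max_normalize(ages)
standardized_ages = standardize(ages)
print(normalized_ages)    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardized_ages)  # centered on 0 with unit spread
```

Normalization keeps the result inside a fixed range, while standardization keeps it centered and is less dominated by the exact minimum and maximum.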

# Installation

You can install the PyScrub package using pip:

```bash
pip install PyScrub
```

# Usage
### Pipeline Setup
Set up a data processing pipeline with PyScrub to clean, transform, and visualize your dataset.


```python
from PyScrub.pipeline_integration import DataPipeline, PipelineMonitor
import PyScrub.data_cleaning as dc
import PyScrub.data_transformation as dt
import PyScrub.feature_engineering as fe
import PyScrub.visualization as viz

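# `data` must already hold your dataset before this point; the README does not
# show how it is loaded (a pandas DataFrame is a likely choice, but that is an assumption).

# Create your pipeline and add steps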
pipeline = DataPipeline()
pipeline.add_step(dc.handle_missing_values, method='ffill')
pipeline.add_step(dc.remove_duplicates)
pipeline.add_step(dc.correct_data_types)
pipeline.add_step(dc.strip_whitespace, columns=['Gender'])
pipeline.add_step(dt.normalize)
pipeline.add_step(fe.create_polynomial_features, degree=2)
pipeline.add_step(fe.apply_pca, n_components=2)

# Monitor and execute the pipeline
monitor = PipelineMonitor()
cleaned_data = monitor.monitor(pipeline, data)

# Visualize the results
viz.histogram(cleaned_data)
viz.boxplot(cleaned_data, num_features=['Age', 'MonthlyIncome'], target='Occupation')
```
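
The `add_step(function, **options)` pattern above can be pictured as a list of functions applied in order, each receiving the previous step's output. The sketch below is a hypothetical stand-in written with the standard library only; `SimplePipeline` and its `run` method are illustrative assumptions and do not reproduce PyScrub's `DataPipeline` or `PipelineMonitor`. It also shows how a custom function could serve as a step, assuming steps take the data as their first argument.

```python
class SimplePipeline:
    """Hypothetical stand-in for a step-based pipeline (not PyScrub's DataPipeline)."""

    def __init__(self):
        self.steps = []  # list of (function, keyword arguments) pairs

    def add_step(self, func, **kwargs):
        # Store the function with its options so it can be applied later.
        self.steps.append((func, kwargs))
        return self

    def run(self, data):
        # Feed the output of each step into the next one, in insertion order.
        for func, kwargs in self.steps:
            data = func(data, **kwargs)
        return data


def strip_whitespace(rows, columns):
    """Custom step: trim surrounding whitespace in the given columns."""
    return [
        {k: v.strip() if k in columns and isinstance(v, str) else v
         for k, v in row.items()}
        for row in rows
    ]


def remove_duplicates(rows):
    """Custom step: drop rows that repeat an earlier row, keeping the first."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"Gender": " F ", "Age": 31},
    {"Gender": "F", "Age": 31},
    {"Gender": "M", "Age": 45},
]

pipeline = SimplePipeline()
pipeline.add_step(strip_whitespace, columns=["Gender"])
pipeline.add_step(remove_duplicates)
result = pipeline.run(rows)
print(result)  # [{'Gender': 'F', 'Age': 31}, {'Gender': 'M', 'Age': 45}]
```

Step order matters here: the duplicate row is only detected because whitespace is stripped first.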


### Data Cleaning
Use PyScrub's data cleaning functions to handle missing values, remove duplicates, and ensure your data types are correct.


```python
import PyScrub.data_cleaning as dc

# Handling missing values (`data` is your dataset, loaded beforehand)
cleaned_data = dc.handle_missing_values(data, method='mean')

# Removing duplicates
cleaned_data = dc.remove_duplicates(cleaned_data)

# Correcting data types
cleaned_data = dc.correct_data_types(cleaned_data)
```
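
The examples pass `method='ffill'` and `method='mean'` to `handle_missing_values`. Assuming these names carry their usual meanings (forward fill and mean imputation), the difference can be sketched in plain Python. The functions `forward_fill` and `mean_fill` below are hypothetical illustrations, not PyScrub functions, and the README does not confirm how PyScrub implements either method.

```python
from statistics import mean


def forward_fill(values):
    """Replace each None with the most recent non-missing value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if no value has been seen yet
    return filled


def mean_fill(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to average; leave the gaps in place
    avg = mean(observed)
    return [avg if v is None else v for v in values]


income = [3000, None, 5000, None, None, 4000]
print(forward_fill(income))  # [3000, 3000, 5000, 5000, 5000, 4000]
print(mean_fill(income))     # [3000, 4000, 5000, 4000, 4000, 4000]
```

Forward fill suits ordered data such as time series, where the previous observation is a sensible guess; mean imputation ignores order and pulls every gap toward the column average.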


### Feature Engineering
Enhance your dataset with polynomial features, interactions, and dimensionality reduction using PyScrub's feature engineering tools.


```python
import PyScrub.feature_engineering as fe

# Create polynomial features
poly_features = fe.create_polynomial_features(data, degree=3)

# Apply PCA for dimensionality reduction
pca_features = fe.apply_pca(data, n_components=3)
```
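
To make the `degree` parameter concrete, the following sketch expands a single row into polynomial terms using only the standard library. The function `polynomial_features` is a hypothetical illustration; the README does not state the exact output of `fe.create_polynomial_features` (for example, whether a constant term is included), so this version simply omits the constant.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree):
    """Return all products of the inputs with total degree 1..degree."""
    features = []
    for d in range(1, degree + 1):
        # Each combination picks d input positions, repeats allowed.
        for combo in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in combo))
    return features


# For inputs a=2 and b=3, degree 2 yields a, b, a*a, a*b, b*b.
expanded = polynomial_features([2, 3], degree=2)
print(expanded)  # [2, 3, 4, 6, 9]
```

The number of generated terms grows quickly with both the number of inputs and the degree, which is one reason polynomial expansion is often followed by a dimensionality reduction step such as PCA.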


### Data Visualization
Generate visual insights into your dataset using PyScrub's visualization tools.


```python
import PyScrub.visualization as viz

# Plot missing data
viz.plot_missing(data)

# Create histograms and boxplots
viz.histogram(data)
viz.boxplot(data, num_features=['Age', 'MonthlyIncome'], target='Occupation')
```

# License
This project is licensed under the MIT License.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "PyScrub",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "python, data cleaning, data transformation, data pipeline, machine learning, data preprocessing",
    "author": "Fasugba Ayomide",
    "author_email": "Ayomide Fasugba <fasugbapaul@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e9/9f/084ce2b68614922b60804bd14dbc85269cfb0e087e4744effd5178c18aed/PyScrub-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "PyScrub is a powerful Python library designed to streamline data preprocessing and pipeline automation. It provides efficient tools for data cleaning, transformation, feature engineering, and visualization, all integrated into a reproducible and scalable pipeline framework.",
    "version": "0.0.1",
    "project_urls": {
        "Documentation": "https://github.com/fashjr/PyScrub/docs",
        "Homepage": "https://github.com/fashjr/PyScrub",
        "Issues": "https://github.com/fashjr/PyScrub/issues",
        "Source": "https://github.com/fashjr/PyScrub/source"
    },
    "split_keywords": [
        "python",
        " data cleaning",
        " data transformation",
        " data pipeline",
        " machine learning",
        " data preprocessing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "600b1b64efd824984a3777f9cc1ca1005bfbc330df30bb7d4214000208b2f8ad",
                "md5": "15275f0ca8bd1891464de2bb00acc8be",
                "sha256": "df391dbe736627382474e3a92e62d35519c8a107d74db16948bed037745b8d27"
            },
            "downloads": -1,
            "filename": "PyScrub-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "15275f0ca8bd1891464de2bb00acc8be",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 12574,
            "upload_time": "2024-08-21T11:16:59",
            "upload_time_iso_8601": "2024-08-21T11:16:59.760107Z",
            "url": "https://files.pythonhosted.org/packages/60/0b/1b64efd824984a3777f9cc1ca1005bfbc330df30bb7d4214000208b2f8ad/PyScrub-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e99f084ce2b68614922b60804bd14dbc85269cfb0e087e4744effd5178c18aed",
                "md5": "d29f5583dabeb6f1dd6682010ad4e587",
                "sha256": "48adbfb99d5154a16b5f0c9b7866da42b7989622aae901f8ee0e749e21549512"
            },
            "downloads": -1,
            "filename": "PyScrub-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d29f5583dabeb6f1dd6682010ad4e587",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 12235,
            "upload_time": "2024-08-21T11:17:01",
            "upload_time_iso_8601": "2024-08-21T11:17:01.709577Z",
            "url": "https://files.pythonhosted.org/packages/e9/9f/084ce2b68614922b60804bd14dbc85269cfb0e087e4744effd5178c18aed/PyScrub-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-21 11:17:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fashjr",
    "github_project": "PyScrub",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pyscrub"
}
        