DataRefine


Name: DataRefine
Version: 1.0
Home page: https://github.com/Shahanafarvin/DataRefine
Summary: A no-code solution for data cleaning tasks such as missing value imputation, outlier handling, normalisation, transformation and quality checks, with an intuitive interface for interactive DataFrame manipulation and easy CSV export.
Upload time: 2024-11-02 16:34:00
Maintainer: None
Docs URL: None
Author: Shahana Farvin
Requires Python: >=3.8
License: None
Keywords: data transformation, missing value imputation, outlier handling, normalisation, transformation, machine learning, data preprocessing, pandas, scikit-learn, feature engineering, data science, Python
Requirements: No requirements were recorded.
# DataRefine
<img src="DataRefine/scripts/drlogo.jpeg" alt="DataRefine logo" width="200"/>

![PyPI](https://img.shields.io/pypi/v/DataRefine?color=2e86c1&label=pypi&logo=pypi)
![License](https://img.shields.io/github/license/Shahanafarvin/DataRefine)
![Python Versions](https://img.shields.io/pypi/pyversions/DataRefine)

**DataRefine** is a Python package for interactive data cleaning. Its browser-based interface helps users detect and handle missing values and outliers, normalize and transform columns, and assess data quality, with interactive visualizations that show how each step changes the data.

## Features

- **Interactive Data Upload**: Easy CSV file upload functionality
- **Missing Data Handling**:
  - Multiple imputation strategies (mean, median, mode, predictive)
  - Visual representation of missing value patterns
  - Column-specific imputation options
  
- **Outlier Detection & Treatment**:
  - Multiple detection methods (IQR, Z-score)
  - Configurable thresholds
  - Visual outlier analysis using box plots
  - Multiple handling strategies (capping, removal, imputation)

- **Data Normalization**:
  - Multiple normalization methods (Min-Max, Z-score, Robust scaling)
  - Interactive distribution visualization
  - Column-specific normalization

- **Data Transformation**:
  - Log transformation
  - Square root transformation
  - Box-Cox transformation
  - Before/after distribution comparison

- **Data Quality Assessment**:
  - Summary statistics
  - Visual quality reports
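
The cleaning operations listed above are standard techniques. As an illustration only, the sketch below implements them in dependency-free Python on a single numeric column, represented as a list in which `None` marks a missing value. It is not DataRefine's own code, and all function names are hypothetical. Assumptions: quartiles are linearly interpolated, z-scores use the population standard deviation (as scikit-learn's `StandardScaler` does), the log transform uses `log1p` so zeros are allowed, and Box-Cox takes a fixed λ instead of estimating it the way `scipy.stats.boxcox` does. Predictive imputation is left out.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

# Dependency-free sketches of the techniques listed above; None marks a missing value.


def impute(values: Sequence[Optional[float]], strategy: str = "mean") -> List[float]:
    """Replace None with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    stat = {"mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}[strategy]
    fill = stat(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Fences Q1 - k*IQR and Q3 + k*IQR, using linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def zscore_flags(values: Sequence[float], threshold: float = 3.0) -> List[bool]:
    """Flag values whose absolute z-score exceeds the threshold."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def cap(values: Sequence[float], lower: float, upper: float) -> List[float]:
    """Capping: clip every value into [lower, upper] instead of dropping it."""
    return [min(max(v, lower), upper) for v in values]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def z_score_scale(values: Sequence[float]) -> List[float]:
    """Centre on the mean and divide by the population standard deviation."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def robust_scale(values: Sequence[float]) -> List[float]:
    """Centre on the median and divide by the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med, iqr = statistics.median(values), q3 - q1
    return [(v - med) / iqr if iqr else 0.0 for v in values]


def log_transform(values: Sequence[float]) -> List[float]:
    """log(1 + x), which also accepts zeros (requires x > -1)."""
    return [math.log1p(v) for v in values]


def sqrt_transform(values: Sequence[float]) -> List[float]:
    """Square root (requires x >= 0)."""
    return [math.sqrt(v) for v in values]


def box_cox(values: Sequence[float], lam: float) -> List[float]:
    """Box-Cox with a fixed lambda (requires x > 0): (x**lam - 1) / lam, or log(x) if lam == 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

For example, `cap(values, *iqr_bounds(values))` clips a column to its IQR fences, which corresponds to the capping strategy; a removal strategy would instead drop the values that fall outside those fences.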

## Installation

It's recommended to install `DataRefine` in a virtual environment to manage dependencies effectively and avoid conflicts with other projects.

### 1. Set Up a Virtual Environment

**Using the built-in `venv` module (available since Python 3.3; DataRefine itself requires Python 3.8 or later):**

1. **Create a Virtual Environment:**

    ```bash
    python -m venv env
    ```

    Replace `env` with your preferred name for the virtual environment.

2. **Activate the Virtual Environment:**

    - **On Windows:**
      ```bash
      env\Scripts\activate
      ```

    - **On macOS/Linux:**
      ```bash
      source env/bin/activate
      ```
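
To confirm that the environment is actually active before installing anything, you can compare `sys.prefix` with `sys.base_prefix`: inside an environment created by `venv` the two differ. A minimal sketch; the helper name `in_virtualenv` is only for illustration:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter runs inside a venv-created environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the same `python` you will use for `pip`, so that both refer to the same interpreter.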

### 2. Install DataRefine

Once the virtual environment is activated, you can install `DataRefine` using `pip`:

```bash
pip install datarefine==1.0
```
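
To check that pip installed the package into the active environment, `importlib.metadata` (in the standard library since Python 3.8, the minimum version this release declares) can report the installed version. A minimal sketch; the helper name `installed_version` is hypothetical:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "DataRefine") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```

It prints the installed version string, or `None` if DataRefine is not installed for the interpreter you ran it with.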
## Quick Start

After installation, you can start DataRefine directly by running:

```bash
DataRefine
```
Open your web browser and go to the local URL printed in the terminal.

Upload your CSV file.

Start cleaning your data!
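
If the shell reports that `DataRefine` is not found, the usual cause is that the virtual environment where it was installed is not activated. A minimal diagnostic sketch using `shutil.which`, which searches `PATH` the same way the shell does; the helper name `find_launcher` is hypothetical:

```python
import shutil
from typing import Optional


def find_launcher(command: str = "DataRefine") -> Optional[str]:
    """Return the full path of a console command found on PATH, or None."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_launcher()
    if path is None:
        print("DataRefine not found on PATH; is the virtual environment activated?")
    else:
        print("DataRefine launcher:", path)
```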

## How to Use

- **Data Upload:**
    - Click the "Upload CSV" button.
    - Select your CSV file from your local system.

- **Data Cleaning:**
    - Use the sidebar to navigate between different cleaning operations.
    - Configure parameters using the interactive controls.
    - View real-time visualizations of the changes.
    - Download the cleaned dataset when finished.

For a detailed video walkthrough of the app's features and functionality, check out our YouTube demo.
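
After downloading the cleaned dataset, a quick independent check is to count the blank cells left in each column. A minimal sketch using only the standard `csv` module; the function name is hypothetical, and it counts only empty strings, whereas pandas' `read_csv` also treats markers such as `NA` or `NaN` as missing by default. For a file on disk, pass `open("cleaned.csv", newline="")`, where the file name is a placeholder.

```python
import csv
import io
from typing import Dict, TextIO


def empty_cells_per_column(handle: TextIO) -> Dict[str, int]:
    """Count blank cells in each column of a CSV file that has a header row."""
    reader = csv.DictReader(handle)
    counts = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name in counts:
            value = row.get(name)
            # Short rows yield None for the missing fields; treat those as empty too.
            if value is None or not value.strip():
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = io.StringIO("age,income\n34,52000\n,61000\n29,\n")
    print(empty_cells_per_column(sample))  # {'age': 1, 'income': 1}
```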

## Requirements

- Python >= 3.8
- Streamlit
- Pandas
- NumPy
- Plotly
- scikit-learn

For more detailed information, see the `requirements.txt` file.
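
The PyPI metadata for this release records no requirements, so it can be worth confirming that the libraries above are importable in the active environment. A minimal sketch using `importlib.util.find_spec`, which locates a top-level module without importing it; the helper name is hypothetical, and it checks presence only, not versions:

```python
import importlib.util
from typing import List, Sequence

# Import names of the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ("streamlit", "pandas", "numpy", "plotly", "sklearn")


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```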

## Contributing

We welcome contributions! Please follow these steps:

- Fork the repository
- Create a new branch: `git checkout -b feature/improvement`
- Make your changes
- Stage and commit them: `git add .` followed by `git commit -m 'Add new feature'` (`git commit -am` alone skips newly created files)
- Push the branch: `git push origin feature/improvement`
- Create a Pull Request

## License

This project is licensed under the MIT License. See the [LICENSE.md](LICENSE.md) file for details.

## Acknowledgments

Special thanks to the open-source libraries DataRefine builds on, including Streamlit, pandas, NumPy, Plotly and scikit-learn.

## Version History

- 1.0: Initial release
  - Basic data cleaning functionality
  - Interactive web interface
  - Visualization capabilities








            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Shahanafarvin/DataRefine",
    "name": "DataRefine",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "data transformation, missing value imputation, outlier handling, normalisation, transformation, machine learning, data preprocessing, pandas, scikit-learn, feature engineering, data science, Python",
    "author": "Shahana Farvin",
    "author_email": "shahana50997@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/cb/ef/4975f5b5da5cfd8bcf5a6121ac1633757a5924d0eac0fc27e2d8895e9633/DataRefine-1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A no-code solution for performing data cleaning like misssing value imputation,outlier handling,normalisation,transformation and quality check with an intuitive interface for interactive DataFrame manipulation and easy CSV export.",
    "version": "1.0",
    "project_urls": {
        "Documentation": "https://github.com/Shahanafarvin/DataRefine/blob/main/README.md",
        "Homepage": "https://github.com/Shahanafarvin/DataRefine",
        "Source": "https://github.com/Shahanafarvin/DataRefine/tree/main/datarefine",
        "Tracker": "https://github.com/Shahanafarvin/DataRefine/issues"
    },
    "split_keywords": [
        "data transformation",
        " missing value imputation",
        " outlier handling",
        " normalisation",
        " transformation",
        " machine learning",
        " data preprocessing",
        " pandas",
        " scikit-learn",
        " feature engineering",
        " data science",
        " python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a5688c50b319843449506340b805dbd8d5d9fd365e1512d93e64033018b89083",
                "md5": "31ce9e42b04e9437cb557975fbc1cff7",
                "sha256": "e06a6dd082c0300f475eba6c7857043fb5e029245166c8db3f0325f8d36ecf25"
            },
            "downloads": -1,
            "filename": "DataRefine-1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "31ce9e42b04e9437cb557975fbc1cff7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 67899,
            "upload_time": "2024-11-02T16:33:58",
            "upload_time_iso_8601": "2024-11-02T16:33:58.565125Z",
            "url": "https://files.pythonhosted.org/packages/a5/68/8c50b319843449506340b805dbd8d5d9fd365e1512d93e64033018b89083/DataRefine-1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cbef4975f5b5da5cfd8bcf5a6121ac1633757a5924d0eac0fc27e2d8895e9633",
                "md5": "8b5a9f5d9530fc15b88a021244271ea0",
                "sha256": "6deafb2f6fe1cd524f828f45d16f986809b2e3b71a74badcb1f6dc6a1b58403c"
            },
            "downloads": -1,
            "filename": "DataRefine-1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8b5a9f5d9530fc15b88a021244271ea0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 67703,
            "upload_time": "2024-11-02T16:34:00",
            "upload_time_iso_8601": "2024-11-02T16:34:00.784341Z",
            "url": "https://files.pythonhosted.org/packages/cb/ef/4975f5b5da5cfd8bcf5a6121ac1633757a5924d0eac0fc27e2d8895e9633/DataRefine-1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-02 16:34:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Shahanafarvin",
    "github_project": "DataRefine",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "datarefine"
}
        