# DataRefine
<img src="DataRefine/scripts/drlogo.jpeg" alt="DataRefine logo" width="200"/>
![PyPI](https://img.shields.io/pypi/v/DataRefine?color=2e86c1&label=pypi&logo=pypi)
![License](https://img.shields.io/github/license/Shahanafarvin/DataRefine)
![Python Versions](https://img.shields.io/pypi/pyversions/DataRefine)
**DataRefine** is a Python package for interactive data cleaning. Through a browser-based interface, it helps users detect and handle missing values and outliers, normalize and transform columns, and assess data quality. Interactive visualizations accompany each operation so users can see how their data changes.
## Features
- **Interactive Data Upload**: Easy CSV file upload functionality
- **Missing Data Handling**:
  - Multiple imputation strategies (mean, median, mode, predictive)
  - Visual representation of missing value patterns
  - Column-specific imputation options
- **Outlier Detection & Treatment**:
  - Multiple detection methods (IQR, Z-score)
  - Configurable thresholds
  - Visual outlier analysis using box plots
  - Multiple handling strategies (capping, removal, imputation)
- **Data Normalization**:
  - Multiple normalization methods (Min-Max, Z-score, Robust scaling)
  - Interactive distribution visualization
  - Column-specific normalization
- **Data Transformation**:
  - Log transformation
  - Square root transformation
  - Box-Cox transformation
  - Before/after distribution comparison
- **Data Quality Assessment**:
  - Summary statistics
  - Visual quality reports
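
For readers who want to see what a few of these operations do to a column, here is a minimal sketch in plain pandas. It is not DataRefine's internal code or API; the function names (`impute_median`, `cap_outliers_iqr`, `min_max_scale`), the `income` column, and the `k = 1.5` threshold are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def impute_median(series: pd.Series) -> pd.Series:
    """Fill missing values with the column median."""
    return series.fillna(series.median())


def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale values linearly to [0, 1]; a constant column maps to 0."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


# Hypothetical data: one missing value and one obvious outlier.
df = pd.DataFrame({"income": [30.0, 32.0, np.nan, 35.0, 31.0, 400.0]})

imputed = impute_median(df["income"])   # NaN is replaced by the median
capped = cap_outliers_iqr(imputed)      # 400.0 is pulled down to the upper fence
cleaned = min_max_scale(capped)         # values now lie in [0, 1]
print(cleaned)
```

The order matters in this sketch: imputing first keeps the quartiles well defined, and capping before scaling prevents a single extreme value from compressing every other value toward zero.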
## Installation
It's recommended to install `DataRefine` in a virtual environment to manage dependencies effectively and avoid conflicts with other projects.
### 1. Set Up a Virtual Environment
**For Python 3.3 and above:**
1. **Create a Virtual Environment:**
```bash
python -m venv env
```
Replace `env` with your preferred name for the virtual environment.
2. **Activate the Virtual Environment:**
- **On Windows:**
```bash
env\Scripts\activate
```
- **On macOS/Linux:**
```bash
source env/bin/activate
```
### 2. Install DataRefine
Once the virtual environment is activated, you can install `DataRefine` using `pip`:
```bash
pip install datarefine==1.0
```
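
To confirm that the install succeeded, one option is to query the installed distribution's metadata from Python. This is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of DataRefine.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if DataRefine is installed in the active
# environment, otherwise None.
print(installed_version("DataRefine"))
```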
## Quick Start
After installation, you can start DataRefine directly by running:
```bash
DataRefine
```
1. Open your web browser and navigate to the provided local URL.
2. Upload your CSV file.
3. Start cleaning your data!
## How to Use
- **Data Upload:**
  - Click the "Upload CSV" button.
  - Select your CSV file from your local system.
- **Data Cleaning:**
  - Use the sidebar to navigate between different cleaning operations.
  - Configure parameters using the interactive controls.
  - View real-time visualizations of the changes.
  - Download the cleaned dataset when finished.
  - For a detailed video walkthrough of the app's features and functionality, check out our YouTube demo.
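
After downloading the cleaned dataset, it can be useful to check it independently of the app. The sketch below builds a small per-column quality summary with pandas. It is an illustration only, not DataRefine's own quality report; the function name `quality_summary`, the file name `cleaned_data.csv`, and the sample columns are assumptions.

```python
import numpy as np
import pandas as pd


def quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing values, and distinct values per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),  # distinct non-missing values
    })


# In practice you would load the exported file (hypothetical name):
# cleaned = pd.read_csv("cleaned_data.csv")
# A small in-memory frame stands in for it here.
cleaned = pd.DataFrame({
    "age": [25, 31, np.nan, 40],
    "city": ["Pune", "Kochi", "Kochi", None],
})

report = quality_summary(cleaned)
print(report)
```

If imputation was applied to every column, the `missing` column of the report should be all zeros; any non-zero entry points to a column that was skipped.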
## Requirements
- Python >= 3.8
- Streamlit
- Pandas
- NumPy
- Plotly
- scikit-learn
For more detailed information, see the `requirements.txt` file.
## Contributing
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a new branch (`git checkout -b feature/improvement`)
- Make your changes
- Commit your changes (`git commit -am 'Add new feature'`)
- Push to the branch (`git push origin feature/improvement`)
- Create a Pull Request
## License
This project is licensed under the MIT License. See the [LICENSE.md](LICENSE.md) file for details.
## Acknowledgments
Special thanks to all the libraries and frameworks that have helped in developing this package.
## Version History
- 1.0.0: Initial release
  - Basic data cleaning functionality
  - Interactive web interface
  - Visualization capabilities
Raw data
{
"_id": null,
"home_page": "https://github.com/Shahanafarvin/DataRefine",
"name": "DataRefine",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "data transformation, missing value imputation, outlier handling, normalisation, transformation, machine learning, data preprocessing, pandas, scikit-learn, feature engineering, data science, Python",
"author": "Shahana Farvin",
"author_email": "shahana50997@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/cb/ef/4975f5b5da5cfd8bcf5a6121ac1633757a5924d0eac0fc27e2d8895e9633/DataRefine-1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A no-code solution for performing data cleaning like misssing value imputation,outlier handling,normalisation,transformation and quality check with an intuitive interface for interactive DataFrame manipulation and easy CSV export.",
"version": "1.0",
"project_urls": {
"Documentation": "https://github.com/Shahanafarvin/DataRefine/blob/main/README.md",
"Homepage": "https://github.com/Shahanafarvin/DataRefine",
"Source": "https://github.com/Shahanafarvin/DataRefine/tree/main/datarefine",
"Tracker": "https://github.com/Shahanafarvin/DataRefine/issues"
},
"split_keywords": [
"data transformation",
" missing value imputation",
" outlier handling",
" normalisation",
" transformation",
" machine learning",
" data preprocessing",
" pandas",
" scikit-learn",
" feature engineering",
" data science",
" python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a5688c50b319843449506340b805dbd8d5d9fd365e1512d93e64033018b89083",
"md5": "31ce9e42b04e9437cb557975fbc1cff7",
"sha256": "e06a6dd082c0300f475eba6c7857043fb5e029245166c8db3f0325f8d36ecf25"
},
"downloads": -1,
"filename": "DataRefine-1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "31ce9e42b04e9437cb557975fbc1cff7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 67899,
"upload_time": "2024-11-02T16:33:58",
"upload_time_iso_8601": "2024-11-02T16:33:58.565125Z",
"url": "https://files.pythonhosted.org/packages/a5/68/8c50b319843449506340b805dbd8d5d9fd365e1512d93e64033018b89083/DataRefine-1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "cbef4975f5b5da5cfd8bcf5a6121ac1633757a5924d0eac0fc27e2d8895e9633",
"md5": "8b5a9f5d9530fc15b88a021244271ea0",
"sha256": "6deafb2f6fe1cd524f828f45d16f986809b2e3b71a74badcb1f6dc6a1b58403c"
},
"downloads": -1,
"filename": "DataRefine-1.0.tar.gz",
"has_sig": false,
"md5_digest": "8b5a9f5d9530fc15b88a021244271ea0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 67703,
"upload_time": "2024-11-02T16:34:00",
"upload_time_iso_8601": "2024-11-02T16:34:00.784341Z",
"url": "https://files.pythonhosted.org/packages/cb/ef/4975f5b5da5cfd8bcf5a6121ac1633757a5924d0eac0fc27e2d8895e9633/DataRefine-1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-02 16:34:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Shahanafarvin",
"github_project": "DataRefine",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "datarefine"
}