# mizuio - Python Data Processing Toolkit
mizuio is a Python toolkit for data cleaning, visualization, and analysis. It provides a command-line interface and a Python API, and builds on pandas, NumPy, Matplotlib, seaborn, and scikit-learn.
---
## 🚀 Features
### Data Cleaning (`DataCleaner`)
- Handle missing values: drop, fill, or interpolate
- Remove duplicate rows, matched on selected columns
- Automatic data type conversion
- Outlier detection and removal (IQR, Z-score)
- Text normalization (case, whitespace)
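
This README does not document the `DataCleaner` API, so the sketch below uses plain pandas to illustrate what the cleaning steps listed above typically involve. The column names and values are made up for illustration, and the median fill and the 1.5 × IQR threshold are assumptions, not confirmed mizuio defaults.

```python
import numpy as np
import pandas as pd

# Made-up sample with messy text, a duplicate row, a missing value,
# and one extreme value.
raw = pd.DataFrame({
    "name": [" Alice", "BOB ", "BOB ", "carol", "dave", "erin"],
    "age": [29.0, 31.0, 31.0, np.nan, 30.0, 250.0],
})

# Text normalization: trim whitespace and lower-case the names.
normalized = raw.assign(name=raw["name"].str.strip().str.lower())

# Remove duplicate rows, matched on a chosen subset of columns.
deduped = normalized.drop_duplicates(subset=["name", "age"])

# Fill missing numeric values with the column median (one common choice).
filled = deduped.assign(age=deduped["age"].fillna(deduped["age"].median()))

# IQR rule: keep rows inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, q3 = filled["age"].quantile([0.25, 0.75])
iqr = q3 - q1
cleaned = filled[filled["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

print(cleaned)
```

The order of the steps matters: text is normalized before deduplication so that `"BOB "` and `"bob"` would count as the same value, and missing values are filled before the quartiles are computed.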
### Data Visualization (`DataVisualizer`)
- Histograms and distribution plots
- Box plots for outlier analysis
- Scatter plots for variable relationships
- Correlation heatmaps
- Bar and line charts (categorical/time series)
- Missing value visualization
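
As a rough illustration of the plot types listed above, the sketch below draws a histogram, a box plot, and a correlation heatmap directly with Matplotlib. It does not use `DataVisualizer`, whose methods are not described in this README. The data is randomly generated and the output file name is arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated sample data, for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 200),
    "income": rng.normal(50_000, 8_000, 200),
})

fig, (ax_hist, ax_box, ax_corr) = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram: distribution of a single column.
ax_hist.hist(df["age"], bins=20)
ax_hist.set_title("Histogram of age")

# Box plot: quick visual check for outliers.
ax_box.boxplot(df["income"])
ax_box.set_title("Box plot of income")

# Correlation heatmap: pairwise Pearson correlations.
corr = df.corr()
image = ax_corr.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ticks = list(range(len(corr.columns)))
ax_corr.set_xticks(ticks)
ax_corr.set_xticklabels(corr.columns)
ax_corr.set_yticks(ticks)
ax_corr.set_yticklabels(corr.columns)
ax_corr.set_title("Correlation heatmap")
fig.colorbar(image, ax=ax_corr)

# Save to a temporary directory instead of the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "overview.png")
fig.tight_layout()
fig.savefig(out_path)
plt.close(fig)
```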
### Utility Tools (`DataUtils`)
- Multi-format support: CSV, JSON, Excel, Parquet, Pickle
- Data validation (columns, types, value ranges)
- Data sampling (random, systematic, stratified)
- Data splitting (train/validation/test)
- Categorical encoding (label, one-hot, ordinal)
- Feature scaling (standard, minmax, robust)
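
The sketch below shows, in plain pandas, what one-hot encoding, a train/validation/test split, and standard scaling amount to. It is not the `DataUtils` API, which this README does not document. The 60/20/20 split ratio, the column names, and the data are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up data: one numeric feature and one categorical feature.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0,
                  165.0, 175.0, 155.0, 185.0, 172.0],
    "color": ["red", "blue", "red", "green", "blue",
              "red", "green", "blue", "red", "green"],
})

# One-hot encoding: one indicator column per category.
encoded = pd.get_dummies(df, columns=["color"])

# Shuffle, then split 60/20/20 into train/validation/test.
shuffled = encoded.sample(frac=1.0, random_state=42)
n = len(shuffled)
train_end, val_end = n * 6 // 10, n * 8 // 10
train = shuffled.iloc[:train_end]
val = shuffled.iloc[train_end:val_end]
test = shuffled.iloc[val_end:]

# Standard scaling: subtract the mean and divide by the population
# standard deviation (ddof=0), both estimated on the training set only.
mean = train["height_cm"].mean()
std = train["height_cm"].std(ddof=0)
train_scaled = (train["height_cm"] - mean) / std
val_scaled = (val["height_cm"] - mean) / std
test_scaled = (test["height_cm"] - mean) / std
```

Fitting the scaling statistics on the training portion alone keeps information from the validation and test rows out of the preprocessing step.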
---
## 📦 Installation
### Requirements
- Python 3.7+
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
### Steps
1. **Clone the repository:**
   ```sh
   git clone https://github.com/mertskzc/mizuio.git
   cd mizuio
   ```
2. **Install dependencies:**
   ```sh
   pip install -r requirements.txt
   ```
3. **Install in development mode (optional):**
   ```sh
   pip install -e .
   ```
---
## 🖥️ Usage
### Command Line Interface
mizuio provides a CLI for common data tasks:
```sh
# Clean a dataset
mizuio clean data.csv --output cleaned_data.csv --remove-duplicates --fill-missing --remove-outliers
# Visualize a column
mizuio visualize data.csv --plot histogram --column age --output age_hist.png
# Show data info
mizuio info data.csv
```
#### CLI Commands
- `clean`: Clean data (remove duplicates, fill missing, remove outliers)
- `visualize`: Visualize data (histogram, boxplot, scatter, correlation)
- `info`: Show data summary (shape, memory, columns, missing values, duplicates)
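
To give a sense of the fields that `info` reports, the sketch below computes the same kind of summary with pandas. The `summarize` function is a hypothetical helper written for this example. It is not part of mizuio, and the actual CLI output format may differ.

```python
import pandas as pd


def summarize(df):
    """Return a dict with shape, memory, columns, missing values, duplicates."""
    return {
        "shape": df.shape,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "columns": {name: str(dtype) for name, dtype in df.dtypes.items()},
        "missing_values": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Made-up data: one repeated row and one missing score.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "score": [0.5, 0.7, 0.7, None],
})

summary = summarize(df)
for key, value in summary.items():
    print(f"{key}: {value}")
```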
---
## 🧪 Testing
Run all tests:
```sh
python -m pytest tests/
```
Run a specific test file:
```sh
python -m pytest tests/test_cleaner.py
```
---
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/your-feature`)
3. Commit your changes (`git commit -m 'Add feature'`)
4. Push your branch (`git push origin feature/your-feature`)
5. Open a Pull Request
---
## 📝 License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
---
## 📞 Contact
- **Project Link:** [https://github.com/mertskzc/mizuio](https://github.com/mertskzc/mizuio)
- **E-mail:** mertskzc@gmail.com
---
## 🙏 Acknowledgements
mizuio uses the following open source libraries:
- [pandas](https://pandas.pydata.org/)
- [numpy](https://numpy.org/)
- [matplotlib](https://matplotlib.org/)
- [seaborn](https://seaborn.pydata.org/)
- [scikit-learn](https://scikit-learn.org/)
Raw data
{
"_id": null,
"home_page": "https://github.com/mertskzc/mizu",
"name": "mizuio",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "Mert Sak\u0131zc\u0131 <mertskzc@gmail.com>",
"keywords": "data-science, data-analysis, data-cleaning, data-visualization, pandas, numpy, matplotlib, seaborn, machine-learning, data-processing",
"author": "Mert Sak\u0131zc\u0131",
"author_email": "Mert Sak\u0131zc\u0131 <mertskzc@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/76/c2/1d4f363b5d811a609158b53aae5a713eb3b8fb54fdb6190ef319e64b606c/mizuio-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A comprehensive Python data processing tool for cleaning, visualization, and analysis",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/mertskzc/mizuio/issues",
"Homepage": "https://github.com/mertskzc/mizuio",
"Repository": "https://github.com/mertskzc/mizuio"
},
"split_keywords": [
"data-science",
" data-analysis",
" data-cleaning",
" data-visualization",
" pandas",
" numpy",
" matplotlib",
" seaborn",
" machine-learning",
" data-processing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "e8125d04bc80579c23d3a820d7d029ade79207f9e520c6f502cb250167d1e4f5",
"md5": "6a4f103caae3a4056ece65b9db751fa1",
"sha256": "398e36fd046afa6210cd1adcc65bd9f155a924edd72e1afa98abf73360903e9a"
},
"downloads": -1,
"filename": "mizuio-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6a4f103caae3a4056ece65b9db751fa1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 14348,
"upload_time": "2025-08-30T16:22:02",
"upload_time_iso_8601": "2025-08-30T16:22:02.963748Z",
"url": "https://files.pythonhosted.org/packages/e8/12/5d04bc80579c23d3a820d7d029ade79207f9e520c6f502cb250167d1e4f5/mizuio-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "76c21d4f363b5d811a609158b53aae5a713eb3b8fb54fdb6190ef319e64b606c",
"md5": "16c36841b357330a8ed8dc585c295860",
"sha256": "76d06f79f16b1ce5705a2ede07c63284aa4a021f1d2dfb9d3fbdbfc8ab256e47"
},
"downloads": -1,
"filename": "mizuio-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "16c36841b357330a8ed8dc585c295860",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 17700,
"upload_time": "2025-08-30T16:22:04",
"upload_time_iso_8601": "2025-08-30T16:22:04.065270Z",
"url": "https://files.pythonhosted.org/packages/76/c2/1d4f363b5d811a609158b53aae5a713eb3b8fb54fdb6190ef319e64b606c/mizuio-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-30 16:22:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mertskzc",
"github_project": "mizu",
"github_not_found": true,
"lcname": "mizuio"
}