# Shaheenviz - Unified EDA Solution 🚀
Shaheenviz combines the analytical power of **YData Profiling** with the stunning visuals of **Sweetviz** to deliver a unified, automatic EDA solution. Built with pure Python for maximum compatibility! 🐍
## ✨ Features
- 🎯 **Automatic Backend Selection**: Intelligently chooses between YData Profiling and Sweetviz based on dataset characteristics
- 📊 **Comprehensive Analysis**: Statistical summaries, correlations, missing values, outliers, and more
- 🎨 **Beautiful Visualizations**: Interactive plots, histograms, correlation heatmaps, and comparison charts
- 🔍 **Smart Target Detection**: Automatically identifies target columns for supervised learning
- 📈 **Dataset Comparison**: Compare train/test distributions and detect data drift
- 🛡️ **Data Quality Warnings**: Automatic detection of data quality issues
- 💻 **Multiple Interfaces**: Python API, CLI tool, and Jupyter notebook integration
- 📤 **Flexible Output**: HTML reports, JSON export, PDF generation (optional)
- 🌍 **Cross-Platform**: Windows, macOS, and Linux support
## 📦 Installation
### Basic Installation
```bash
pip install shaheenviz
```
### With Optional Dependencies
```bash
pip install shaheenviz[dev,pdf]
```
## 🎯 Quick Start
### Basic Usage
```python
import pandas as pd
from shaheenviz import generate_report
# Load your data
df = pd.read_csv('your_data.csv')
# Generate report (automatically detects target and chooses optimal backend)
report = generate_report(df, title="My Dataset Analysis")
# Save as HTML
report.save_html('analysis_report.html')
# Or display in Jupyter notebook
report.show_notebook()
```
### Dataset Comparison
```python
import pandas as pd
from shaheenviz import compare_datasets
# Compare training and validation sets
train_df = pd.read_csv('train.csv')
val_df = pd.read_csv('validation.csv')
comparison_report = compare_datasets(train_df, val_df, target='target')
comparison_report.save_html('train_vs_val_comparison.html')
```
### Quick Profile
```python
from shaheenviz import quick_profile
# Generate minimal report for fast overview
quick_report = quick_profile(df, target='target')
quick_report.save_html('quick_analysis.html')
```
### Command Line Interface
```bash
# Basic analysis
shaheenviz --file data.csv
# With specific target and output
shaheenviz --file train.csv --target label --output my_report.html
# Compare datasets
shaheenviz --file train.csv --compare test.csv --target target
# Quick analysis with minimal processing
shaheenviz --file large_dataset.csv --minimal --mode ydata
# Verbose output with system info
shaheenviz --file data.csv --verbose --system-info
```
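When the CLI is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags shown above. The helper name `build_shaheenviz_args` is hypothetical and not part of Shaheenviz; the resulting list would be handed to `subprocess.run` on a machine where the `shaheenviz` command is installed.

```python
import shlex
from typing import List, Optional


def build_shaheenviz_args(
    file: str,
    target: Optional[str] = None,
    output: Optional[str] = None,
    compare: Optional[str] = None,
    mode: Optional[str] = None,
    minimal: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Hypothetical helper: build an argv list from the flags documented above."""
    args = ["shaheenviz", "--file", file]
    if target is not None:
        args += ["--target", target]
    if output is not None:
        args += ["--output", output]
    if compare is not None:
        args += ["--compare", compare]
    if mode is not None:
        args += ["--mode", mode]
    if minimal:
        args.append("--minimal")
    if verbose:
        args.append("--verbose")
    return args


args = build_shaheenviz_args("train.csv", target="label", output="my_report.html")
# Print a shell-safe rendering of the command for logging
print(" ".join(shlex.quote(a) for a in args))

# To actually run it (requires the shaheenviz CLI on PATH):
# import subprocess
# subprocess.run(args, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.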
## 🏗️ Architecture
Shaheenviz uses a modular architecture that automatically selects the best backend for a given dataset.
## 🔧 Advanced Configuration
### Backend Selection Logic
```python
# Manual backend selection
report = generate_report(df, mode='ydata') # Force YData Profiling
report = generate_report(df, mode='sweetviz') # Force Sweetviz
report = generate_report(df, mode='auto') # Automatic (default)
```
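The rules behind `mode='auto'` are not spelled out in this README. As a rough mental model only, the sketch below shows one way such a dispatcher could be written, following the README's own guidance (YData Profiling for large datasets, Sweetviz for comparisons). The function name `choose_backend`, the rule order, and the 50,000-row threshold are assumptions made for illustration, not Shaheenviz's actual logic.

```python
def choose_backend(
    n_rows: int,
    is_comparison: bool = False,
    mode: str = "auto",
    large_rows: int = 50_000,  # assumed threshold, chosen only for this sketch
) -> str:
    """Illustrative dispatcher; not the library's real selection code."""
    if mode not in ("auto", "ydata", "sweetviz"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode != "auto":
        return mode  # an explicit choice always wins
    if is_comparison:
        return "sweetviz"  # assumed: comparisons go to Sweetviz
    if n_rows >= large_rows:
        return "ydata"  # assumed: large datasets go to YData Profiling
    return "sweetviz"


print(choose_backend(1_000_000))                      # large dataset
print(choose_backend(1_000_000, is_comparison=True))  # comparison report
print(choose_backend(500, mode="ydata"))              # forced backend
```

Keeping the explicit `mode` check first mirrors the API shown above, where `mode='ydata'` or `mode='sweetviz'` overrides any automatic choice.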
### Custom Profiling
```python
from shaheenviz import ProfileWrapper
# Custom YData Profiling configuration
profile_wrapper = ProfileWrapper()
config_overrides = {
    "correlations": {
        "spearman": {"calculate": False},  # Disable Spearman for speed
        "cramers": {"calculate": True}     # Enable Cramér's V
    }
}
report = profile_wrapper.generate_profile(
    df,
    target='target',
    config_overrides=config_overrides
)
```
### Utility Functions
```python
from shaheenviz.utils import detect_target, validate_dataframe, get_column_types
# Auto-detect target column
target = detect_target(df)
print(f"Detected target: {target}")
# Validate DataFrame
validation = validate_dataframe(df)
print(f"Dataset valid: {validation['valid']}")
# Get column types
column_types = get_column_types(df)
print(f"Numeric columns: {column_types['numeric']}")
```
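How `detect_target` decides is not documented in this README. To make the idea concrete, here is a minimal sketch of a name-based heuristic. The function `guess_target_column` and its list of candidate names are hypothetical and only illustrate the kind of rule such a utility might apply.

```python
from typing import Iterable, Optional

# Assumed list of conventional target names, in priority order
COMMON_TARGET_NAMES = ("target", "label", "class", "y", "outcome")


def guess_target_column(columns: Iterable[str]) -> Optional[str]:
    """Return the first column whose lowercased name is a common target name."""
    lowered = {}
    for col in columns:
        # Keep the first column seen for each lowercased name
        lowered.setdefault(str(col).lower(), col)
    for name in COMMON_TARGET_NAMES:
        if name in lowered:
            return lowered[name]
    return None  # nothing matched; the caller should specify the target


print(guess_target_column(["age", "income", "Label"]))
print(guess_target_column(["age", "income"]))
```

With a DataFrame, the same function can be called as `guess_target_column(df.columns)`. When the heuristic returns `None`, passing `target=` explicitly is the safer route.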
## 📊 Performance Tips
1. **Use Minimal Mode**: For quick analysis, use `minimal=True`
2. **Choose Backend Wisely**: YData Profiling for large datasets, Sweetviz for detailed comparisons
3. **Optimize Memory**: Use appropriate data types (e.g., the pandas `category` dtype for string columns with few distinct values)
4. **Target Detection**: Manually specify target column when known to save processing time
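
The memory tip above can be checked directly in pandas before a report is generated. This is a small sketch with made-up data; the column name `city` and its values are placeholders, and the actual savings depend on how many distinct values a column holds.

```python
import pandas as pd

# A string column with only three distinct values, repeated many times
df = pd.DataFrame({"city": ["Lahore", "Karachi", "Islamabad"] * 10_000})

before = df["city"].memory_usage(deep=True)
df["city"] = df["city"].astype("category")  # store integer codes plus a small lookup table
after = df["city"].memory_usage(deep=True)

print(f"before: {before:,} bytes, after: {after:,} bytes")
```

The conversion pays off for low-cardinality columns; for columns where nearly every value is unique, `category` can use more memory than plain strings.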
## 🎓 Example Use Cases
### Data Science Workflow
```python
import pandas as pd
from shaheenviz import generate_report, compare_datasets
from sklearn.model_selection import train_test_split
# 1. Initial data exploration
raw_data = pd.read_csv('raw_data.csv')
initial_report = generate_report(raw_data, title="Raw Data Analysis")
initial_report.save_html('01_raw_data_analysis.html')
# 2. After data cleaning
cleaned_data = pd.read_csv('cleaned_data.csv')
cleaned_report = generate_report(cleaned_data, title="Cleaned Data Analysis")
cleaned_report.save_html('02_cleaned_data_analysis.html')
# 3. Train/test split comparison
X = cleaned_data.drop('target', axis=1)
y = cleaned_data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
train_data = pd.concat([X_train, y_train], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)
comparison_report = compare_datasets(
    train_data, test_data,
    target='target',
    title="Train vs Test Comparison"
)
comparison_report.save_html('03_train_test_comparison.html')
```
### Batch Processing
```python
from pathlib import Path
import pandas as pd
from shaheenviz import generate_report
def batch_analyze_datasets(data_dir, output_dir):
    """Analyze all CSV files in a directory."""
    data_path = Path(data_dir)
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)
    for csv_file in data_path.glob('*.csv'):
        print(f"Analyzing {csv_file.name}...")
        try:
            df = pd.read_csv(csv_file)
            report = generate_report(
                df,
                title=f"Analysis of {csv_file.stem}",
                minimal=True  # Use minimal mode for batch processing
            )
            output_file = output_path / f"{csv_file.stem}_report.html"
            report.save_html(str(output_file))
            print(f"Report saved: {output_file}")
        except Exception as e:
            # Keep going so one bad file does not stop the whole batch
            print(f"Error processing {csv_file.name}: {e}")
# Usage
batch_analyze_datasets('data/', 'reports/')
```
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md).
### Areas for Contribution
- 🐛 Bug fixes and improvements
- 📊 New statistical functions
- 🎨 Visualization enhancements
- 📚 Documentation improvements
- 🧪 Test coverage expansion
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🔗 Links
- 📦 [PyPI Package](https://pypi.org/project/shaheenviz/)
---
**Developed with ❤️ by Hamza**
*Shaheenviz - Making EDA fast, beautiful, and effortless!*
Raw data
{
"_id": null,
"home_page": "https://github.com/hamza-0987/shaheenviz",
"name": "shaheenviz",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "Hamza <certification1290@gmail.com>",
"keywords": "eda, data-analysis, data-visualization, ydata-profiling, sweetviz, pandas",
"author": "Hamza",
"author_email": "Hamza <certification1290@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/20/81/9f565c6d181110d29048351b0f66b2cd772785d87919276b429d33a94989/shaheenviz-0.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Shaheenviz combines the analytical power of YData Profiling with the stunning visuals of Sweetviz to deliver a unified, automatic EDA solution.",
"version": "0.1.3",
"project_urls": {
"Bug Tracker": "https://github.com/hamza-0987/shaheenviz/issues",
"Documentation": "https://github.com/hamza-0987/shaheenviz#readme",
"Homepage": "https://github.com/hamza-0987/shaheenviz",
"Repository": "https://github.com/hamza-0987/shaheenviz"
},
"split_keywords": [
"eda",
" data-analysis",
" data-visualization",
" ydata-profiling",
" sweetviz",
" pandas"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "54a0396005f286a94b8a7154daa5e77ccc204132365f184a29fe24eb2153f69a",
"md5": "89aa0f3a326a28fc319e251d03284cdb",
"sha256": "d21cf8e9d2a649250f256aab307e684a6d5161a206046c7c737087f9caf55e7d"
},
"downloads": -1,
"filename": "shaheenviz-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "89aa0f3a326a28fc319e251d03284cdb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 138684,
"upload_time": "2025-08-02T09:42:17",
"upload_time_iso_8601": "2025-08-02T09:42:17.064700Z",
"url": "https://files.pythonhosted.org/packages/54/a0/396005f286a94b8a7154daa5e77ccc204132365f184a29fe24eb2153f69a/shaheenviz-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "20819f565c6d181110d29048351b0f66b2cd772785d87919276b429d33a94989",
"md5": "1ab03b744eeb3296431026e599e6d5ed",
"sha256": "bb39a2b16de2d23c3a4972fdc6af5cecd1119f290b0f4830ba3af1ff61374dca"
},
"downloads": -1,
"filename": "shaheenviz-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "1ab03b744eeb3296431026e599e6d5ed",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 149352,
"upload_time": "2025-08-02T09:42:18",
"upload_time_iso_8601": "2025-08-02T09:42:18.703770Z",
"url": "https://files.pythonhosted.org/packages/20/81/9f565c6d181110d29048351b0f66b2cd772785d87919276b429d33a94989/shaheenviz-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-02 09:42:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hamza-0987",
"github_project": "shaheenviz",
"github_not_found": true,
"lcname": "shaheenviz"
}