shaheenviz


- Name: shaheenviz
- Version: 0.1.3
- Home page: https://github.com/hamza-0987/shaheenviz
- Summary: Shaheenviz combines the analytical power of YData Profiling with the stunning visuals of Sweetviz to deliver a unified, automatic EDA solution.
- Upload time: 2025-08-02 09:42:18
- Maintainer: None
- Docs URL: None
- Author: Hamza
- Requires Python: >=3.7
- License: None
- Keywords: eda, data-analysis, data-visualization, ydata-profiling, sweetviz, pandas
- Requirements: No requirements were recorded.

# Shaheenviz - Unified EDA Solution 🚀

[![PyPI version](https://badge.fury.io/py/shaheenviz.svg)](https://badge.fury.io/py/shaheenviz)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)

Shaheenviz combines the analytical power of **YData Profiling** with the stunning visuals of **Sweetviz** to deliver a unified, automatic EDA solution. Built with pure Python for maximum compatibility! 🐍

## ✨ Features

- 🎯 **Automatic Backend Selection**: Intelligently chooses between YData Profiling and Sweetviz based on dataset characteristics
- 📊 **Comprehensive Analysis**: Statistical summaries, correlations, missing values, outliers, and more
- 🎨 **Beautiful Visualizations**: Interactive plots, histograms, correlation heatmaps, and comparison charts
- 🔍 **Smart Target Detection**: Automatically identifies target columns for supervised learning
- 📈 **Dataset Comparison**: Compare train/test distributions and detect data drift
- 🛡️ **Data Quality Warnings**: Automatic detection of data quality issues
- 💻 **Multiple Interfaces**: Python API, CLI tool, and Jupyter notebook integration
- 📤 **Flexible Output**: HTML reports, JSON export, PDF generation (optional)
- 🌍 **Cross-Platform**: Windows, macOS, and Linux support

## 📦 Installation

### Basic Installation
```bash
pip install shaheenviz
```

### With Optional Dependencies
```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install "shaheenviz[dev,pdf]"
```

## 🎯 Quick Start

### Basic Usage

```python
import pandas as pd
from shaheenviz import generate_report

# Load your data
df = pd.read_csv('your_data.csv')

# Generate report (automatically detects target and chooses optimal backend)
report = generate_report(df, title="My Dataset Analysis")

# Save as HTML
report.save_html('analysis_report.html')

# Or display in Jupyter notebook
report.show_notebook()
```

### Dataset Comparison

```python
import pandas as pd
from shaheenviz import compare_datasets

# Compare training and validation sets
train_df = pd.read_csv('train.csv')
val_df = pd.read_csv('validation.csv')

comparison_report = compare_datasets(train_df, val_df, target='target')
comparison_report.save_html('train_vs_val_comparison.html')
```

### Quick Profile

```python
from shaheenviz import quick_profile

# Generate minimal report for fast overview
quick_report = quick_profile(df, target='target')
quick_report.save_html('quick_analysis.html')
```

### Command Line Interface

```bash
# Basic analysis
shaheenviz --file data.csv

# With specific target and output
shaheenviz --file train.csv --target label --output my_report.html

# Compare datasets
shaheenviz --file train.csv --compare test.csv --target target

# Quick analysis with minimal processing
shaheenviz --file large_dataset.csv --minimal --mode ydata

# Verbose output with system info
shaheenviz --file data.csv --verbose --system-info
```
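
When the CLI has to be driven from a script (a scheduled job, for instance), the same commands can be assembled programmatically. The sketch below is a convenience wrapper written for this README, not something shipped with Shaheenviz: `build_shaheenviz_command` and `run_shaheenviz` are hypothetical helper names, and they rely only on the flags shown above (`--file`, `--target`, `--output`, `--minimal`, `--mode`).

```python
import shlex
import subprocess
from typing import List, Optional


def build_shaheenviz_command(
    file: str,
    target: Optional[str] = None,
    output: Optional[str] = None,
    minimal: bool = False,
    mode: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list from the CLI flags documented above.

    Hypothetical helper for illustration; not part of the shaheenviz package.
    """
    cmd = ["shaheenviz", "--file", file]
    if target is not None:
        cmd += ["--target", target]
    if output is not None:
        cmd += ["--output", output]
    if minimal:
        cmd.append("--minimal")
    if mode is not None:
        cmd += ["--mode", mode]
    return cmd


def run_shaheenviz(**kwargs) -> int:
    """Run the CLI and return its exit code (needs shaheenviz on PATH)."""
    cmd = build_shaheenviz_command(**kwargs)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    return subprocess.run(cmd, check=False).returncode
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with file names that contain spaces.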

## 🏗️ Architecture

Shaheenviz uses a modular architecture that automatically selects the best backend, YData Profiling or Sweetviz, based on dataset characteristics.



## 🔧 Advanced Configuration

### Backend Selection Logic

```python
from shaheenviz import generate_report

# Manual backend selection (df is the DataFrame loaded in Quick Start)
report = generate_report(df, mode='ydata')     # Force YData Profiling
report = generate_report(df, mode='sweetviz')  # Force Sweetviz
report = generate_report(df, mode='auto')      # Automatic (default)
```
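
How `mode='auto'` reaches its decision is not spelled out in this README. Purely as an illustration of what such a rule could look like, the sketch below follows the guidance in the Performance Tips section (YData Profiling for large datasets, Sweetviz for detailed comparisons). The function name `choose_backend`, its parameters, and the 100,000-row threshold are all assumptions made for this example; they are not Shaheenviz's actual selection logic.

```python
# Assumed cut-off for "large"; the real library may use other criteria.
LARGE_DATASET_ROWS = 100_000


def choose_backend(n_rows: int, comparing: bool = False,
                   mode: str = "auto") -> str:
    """Return 'ydata' or 'sweetviz' using a hypothetical heuristic."""
    if mode in ("ydata", "sweetviz"):
        return mode  # an explicit choice always wins
    if mode != "auto":
        raise ValueError(f"Unknown mode: {mode!r}")
    if comparing:
        return "sweetviz"  # comparisons favour Sweetviz, per the tips
    # Otherwise fall back to a simple size rule.
    return "ydata" if n_rows >= LARGE_DATASET_ROWS else "sweetviz"


print(choose_backend(n_rows=500))       # small dataset
print(choose_backend(n_rows=500_000))   # large dataset
```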

### Custom Profiling

```python
from shaheenviz import ProfileWrapper

# Custom YData Profiling configuration
profile_wrapper = ProfileWrapper()
config_overrides = {
    "correlations": {
        "spearman": {"calculate": False},  # Disable Spearman for speed
        "cramers": {"calculate": True}     # Enable Cramér's V
    }
}

report = profile_wrapper.generate_profile(
    df, 
    target='target',
    config_overrides=config_overrides
)
```

### Utility Functions

```python
from shaheenviz.utils import detect_target, validate_dataframe, get_column_types

# Auto-detect target column
target = detect_target(df)
print(f"Detected target: {target}")

# Validate DataFrame
validation = validate_dataframe(df)
print(f"Dataset valid: {validation['valid']}")

# Get column types
column_types = get_column_types(df)
print(f"Numeric columns: {column_types['numeric']}")
```
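
For readers who want a feel for what a column-type grouping involves, here is a small pandas-only sketch. It is not the implementation of `get_column_types`: the function name `sketch_column_types` is hypothetical, and only the `'numeric'` key is confirmed by the example above; the `'datetime'` and `'categorical'` keys are assumptions for illustration.

```python
import pandas as pd
from pandas.api import types as ptypes


def sketch_column_types(df: pd.DataFrame) -> dict:
    """Group column names by broad dtype family (illustrative only)."""
    groups = {"numeric": [], "datetime": [], "categorical": []}
    for col in df.columns:
        series = df[col]
        # Booleans count as numeric in pandas, so check them first.
        if ptypes.is_bool_dtype(series):
            groups["categorical"].append(col)
        elif ptypes.is_numeric_dtype(series):
            groups["numeric"].append(col)
        elif ptypes.is_datetime64_any_dtype(series):
            groups["datetime"].append(col)
        else:
            groups["categorical"].append(col)
    return groups


sample = pd.DataFrame({
    "age": [31, 45],
    "name": ["a", "b"],
    "when": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "flag": [True, False],
})
print(sketch_column_types(sample))
```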



## 📊 Performance Tips

1. **Use Minimal Mode**: For quick analysis, use `minimal=True`
2. **Choose Backend Wisely**: YData Profiling for large datasets, Sweetviz for detailed comparisons
3. **Optimize Memory**: Use appropriate data types (e.g., `category` for string columns with few distinct values)
4. **Target Detection**: Manually specify target column when known to save processing time
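
The memory tip can be applied with plain pandas before a DataFrame is handed to any profiler. The following is a minimal sketch: `shrink_string_columns` and its `max_unique_ratio` cut-off are hypothetical names chosen for this example and are not part of Shaheenviz.

```python
import pandas as pd
from pandas.api import types as ptypes


def shrink_string_columns(df: pd.DataFrame,
                          max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy with low-cardinality string columns stored as 'category'."""
    out = df.copy()
    n_rows = len(out)
    for col in out.columns:
        if not ptypes.is_string_dtype(out[col]):
            continue  # leave numeric, datetime, and categorical columns alone
        # Only convert when values repeat often enough to pay off.
        if n_rows and out[col].nunique() / n_rows <= max_unique_ratio:
            out[col] = out[col].astype("category")
    return out


df = pd.DataFrame({"city": ["Lahore", "Karachi"] * 5000,
                   "value": range(10000)})
before = df.memory_usage(deep=True).sum()
after = shrink_string_columns(df).memory_usage(deep=True).sum()
print(f"{before:,} bytes -> {after:,} bytes")
```

Columns whose values are mostly unique (IDs, free text) gain little from `category`, which is why the sketch skips them.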

## 🎓 Example Use Cases

### Data Science Workflow

```python
import pandas as pd
from shaheenviz import generate_report, compare_datasets
from sklearn.model_selection import train_test_split

# 1. Initial data exploration
raw_data = pd.read_csv('raw_data.csv')
initial_report = generate_report(raw_data, title="Raw Data Analysis")
initial_report.save_html('01_raw_data_analysis.html')

# 2. After data cleaning
cleaned_data = pd.read_csv('cleaned_data.csv')
cleaned_report = generate_report(cleaned_data, title="Cleaned Data Analysis")
cleaned_report.save_html('02_cleaned_data_analysis.html')

# 3. Train/test split comparison
X = cleaned_data.drop('target', axis=1)
y = cleaned_data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

train_data = pd.concat([X_train, y_train], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)

comparison_report = compare_datasets(
    train_data, test_data, 
    target='target',
    title="Train vs Test Comparison"
)
comparison_report.save_html('03_train_test_comparison.html')
```

### Batch Processing

```python
from pathlib import Path

import pandas as pd  # needed for pd.read_csv below
from shaheenviz import generate_report

def batch_analyze_datasets(data_dir, output_dir):
    """Analyze all CSV files in a directory."""
    
    data_path = Path(data_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)  # also create missing parent dirs
    
    for csv_file in data_path.glob('*.csv'):
        print(f"Analyzing {csv_file.name}...")
        
        try:
            df = pd.read_csv(csv_file)
            report = generate_report(
                df, 
                title=f"Analysis of {csv_file.stem}",
                minimal=True  # Use minimal mode for batch processing
            )
            
            output_file = output_path / f"{csv_file.stem}_report.html"
            report.save_html(str(output_file))
            print(f"Report saved: {output_file}")
            
        except Exception as e:
            print(f"Error processing {csv_file.name}: {e}")

# Usage
batch_analyze_datasets('data/', 'reports/')
```

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md).

### Areas for Contribution
- 🐛 Bug fixes and improvements
- 📊 New statistical functions
- 🎨 Visualization enhancements
- 📚 Documentation improvements
- 🧪 Test coverage expansion



## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.



## 🔗 Links

- 📦 [PyPI Package](https://pypi.org/project/shaheenviz/)



---

**Developed with ❤️ by Hamza**

*Shaheenviz - Making EDA fast, beautiful, and effortless!*

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/hamza-0987/shaheenviz",
    "name": "shaheenviz",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "Hamza <certification1290@gmail.com>",
    "keywords": "eda, data-analysis, data-visualization, ydata-profiling, sweetviz, pandas",
    "author": "Hamza",
    "author_email": "Hamza <certification1290@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/20/81/9f565c6d181110d29048351b0f66b2cd772785d87919276b429d33a94989/shaheenviz-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Shaheenviz combines the analytical power of YData Profiling with the stunning visuals of Sweetviz to deliver a unified, automatic EDA solution.",
    "version": "0.1.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/hamza-0987/shaheenviz/issues",
        "Documentation": "https://github.com/hamza-0987/shaheenviz#readme",
        "Homepage": "https://github.com/hamza-0987/shaheenviz",
        "Repository": "https://github.com/hamza-0987/shaheenviz"
    },
    "split_keywords": [
        "eda",
        " data-analysis",
        " data-visualization",
        " ydata-profiling",
        " sweetviz",
        " pandas"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "54a0396005f286a94b8a7154daa5e77ccc204132365f184a29fe24eb2153f69a",
                "md5": "89aa0f3a326a28fc319e251d03284cdb",
                "sha256": "d21cf8e9d2a649250f256aab307e684a6d5161a206046c7c737087f9caf55e7d"
            },
            "downloads": -1,
            "filename": "shaheenviz-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "89aa0f3a326a28fc319e251d03284cdb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 138684,
            "upload_time": "2025-08-02T09:42:17",
            "upload_time_iso_8601": "2025-08-02T09:42:17.064700Z",
            "url": "https://files.pythonhosted.org/packages/54/a0/396005f286a94b8a7154daa5e77ccc204132365f184a29fe24eb2153f69a/shaheenviz-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "20819f565c6d181110d29048351b0f66b2cd772785d87919276b429d33a94989",
                "md5": "1ab03b744eeb3296431026e599e6d5ed",
                "sha256": "bb39a2b16de2d23c3a4972fdc6af5cecd1119f290b0f4830ba3af1ff61374dca"
            },
            "downloads": -1,
            "filename": "shaheenviz-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "1ab03b744eeb3296431026e599e6d5ed",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 149352,
            "upload_time": "2025-08-02T09:42:18",
            "upload_time_iso_8601": "2025-08-02T09:42:18.703770Z",
            "url": "https://files.pythonhosted.org/packages/20/81/9f565c6d181110d29048351b0f66b2cd772785d87919276b429d33a94989/shaheenviz-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-02 09:42:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hamza-0987",
    "github_project": "shaheenviz",
    "github_not_found": true,
    "lcname": "shaheenviz"
}
        