ml-sniff

- Name: ml-sniff
- Version: 1.0.0
- Home page: https://github.com/Sherin-SEF-AI/ml-sniffer
- Summary: Advanced Machine Learning Problem Detection with CLI and GUI interfaces
- Upload time: 2025-07-27 10:59:47
- Author: Sherin Joseph Roy
- Requires Python: >=3.7
- License: MIT
- Keywords: machine-learning, data-analysis, classification, regression, clustering, automation, streamlit, gui
- Requirements: pandas, numpy, matplotlib, seaborn, scikit-learn, scipy, plotly, streamlit, streamlit-option-menu, streamlit-aggrid

# ML Sniff 🕵️‍♂️

**Advanced Machine Learning Problem Detection from CSV files and DataFrames**

*By [Sherin Joseph Roy](https://sherin-sef-ai.github.io/) - Startup Founder & Hardware/IoT Enthusiast*

ML Sniff is a comprehensive Python package that automatically analyzes your data to determine the most likely machine learning problem type, identifies the target column, suggests appropriate models, and provides advanced data analytics.

## 🚀 Features

- 🔍 **Automatic Target Detection**: Uses advanced heuristics to identify the most likely target column
- 🎯 **Problem Type Classification**: Determines if your data is Classification, Regression, or Clustering
- 🤖 **Model Suggestions**: Recommends appropriate algorithms with hyperparameters
- 📊 **Comprehensive Analysis**: Provides detailed statistics and visualizations
- 🏆 **Feature Importance**: Multiple methods (Random Forest, Mutual Information, Correlation)
- 🔍 **Data Quality Assessment**: Missing data, duplicates, outliers, and variance analysis
- 📈 **Advanced Visualizations**: Static plots and interactive Plotly dashboards
- 🖥️ **CLI Support**: Analyze files directly from the command line
- 🖥️ **Web GUI**: Beautiful Streamlit interface with interactive dashboards
- 📤 **Export Capabilities**: Export reports in JSON, CSV, or TXT formats
- 🛠️ **Preprocessing Suggestions**: Automated recommendations for data preparation
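
To give a feel for what problem-type detection involves, here is a minimal sketch of one possible heuristic. It is not ML Sniff's actual logic: the function name `guess_problem_type` and the `max_classes` threshold are assumptions made purely for illustration.

```python
from typing import Optional, Sequence


def guess_problem_type(target: Optional[Sequence], max_classes: int = 20) -> str:
    """Hypothetical heuristic, not ML Sniff's implementation."""
    # No target column at all: treat the data as unsupervised.
    if target is None:
        return "Clustering"
    values = [v for v in target if v is not None]
    unique = set(values)
    # Non-numeric labels, or only a few distinct values, suggest classes.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not all_numeric or len(unique) <= max_classes:
        return "Classification"
    return "Regression"


print(guess_problem_type([0, 1, 2, 1, 0]))                  # Classification
print(guess_problem_type([i * 0.37 for i in range(100)]))   # Regression
print(guess_problem_type(None))                             # Clustering
```

A real detector would likely weigh more signals (column names, dtype, position), so treat the threshold above as a placeholder.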

## 📦 Installation

### From PyPI
```bash
pip install ml-sniff
```

### From Source
```bash
git clone https://github.com/Sherin-SEF-AI/ml-sniffer.git
cd ml-sniffer
pip install .
```
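
After installing, one way to confirm which version is present is to query the installed distribution metadata. This is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is made up for this example.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(dist_name: str = "ml-sniff"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```

The call prints `None` when the distribution is not installed in the active environment.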

## 🚀 Quick Start

### Command Line Interface

Basic analysis:
```bash
ml-sniff your_data.csv
```

Show visualizations:
```bash
ml-sniff your_data.csv --visualize
```

Create interactive dashboard:
```bash
ml-sniff your_data.csv --interactive
```

Export detailed report:
```bash
ml-sniff your_data.csv --export report.json --format json
```

Show preprocessing suggestions:
```bash
ml-sniff your_data.csv --preprocessing
```

Show feature importance:
```bash
ml-sniff your_data.csv --feature-importance
```

Show data quality report:
```bash
ml-sniff your_data.csv --data-quality
```

Specify target column manually:
```bash
ml-sniff your_data.csv --target target_column
```

### Web Interface (GUI)

Launch the Streamlit web interface using any one of the following methods:

```bash
# Method 1: Using the launcher script
python run_gui.py

# Method 2: Direct streamlit command
streamlit run streamlit_app.py

# Method 3: Using the command line entry point
ml-sniff-gui
```

The GUI will open in your browser at `http://localhost:8501` and provides:

- 📁 **File Upload**: Drag and drop CSV files
- 🎯 **Interactive Analysis**: Real-time analysis with visual feedback
- 📊 **Interactive Charts**: Plotly visualizations with zoom, pan, and hover
- 🏆 **Feature Analysis**: Multiple importance methods with interactive charts
- 🔍 **Data Quality**: Comprehensive quality assessment with detailed reports
- 📈 **Visualizations**: Correlation matrices, distributions, and outlier analysis
- 📤 **Export**: Download reports in multiple formats
- ⚙️ **Customization**: Toggle features and analysis options

### Python API

```python
from ml_sniff import Sniffer

# Basic analysis
sniffer = Sniffer("your_data.csv")
sniffer.report()

# Advanced analysis with manual target
sniffer = Sniffer("your_data.csv", target_column="target")
sniffer.report()

# Get feature importance
top_features = sniffer.get_top_features(5, method='random_forest')
print(f"Top features: {top_features}")

# Get preprocessing suggestions
suggestions = sniffer.suggest_preprocessing()
print(suggestions)

# Create visualizations
sniffer.visualize_data()
sniffer.create_interactive_dashboard()

# Export report
sniffer.export_report("analysis.json", format="json")
```

## 🔧 Advanced Features

### Feature Importance Analysis

ML Sniff provides multiple methods for feature importance:

```python
# Random Forest importance
rf_importance = sniffer.get_feature_importance('random_forest')

# Mutual Information
mi_importance = sniffer.get_feature_importance('mutual_info')

# Correlation-based
corr_importance = sniffer.get_feature_importance('correlation')

# Get top features
top_features = sniffer.get_top_features(5, method='random_forest')
```
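
For intuition about the correlation-based method, the sketch below ranks features by the absolute Pearson correlation with the target, using only the standard library. It illustrates the general technique and is not taken from ML Sniff's source; `pearson` and `correlation_importance` are hypothetical helper names.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_importance(
    features: Dict[str, List[float]], target: List[float]
) -> Dict[str, float]:
    """Rank features by absolute correlation with the target (illustrative only)."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


features = {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 1.0, 2.0, 1.0]}
target = [2.0, 4.0, 6.0, 8.0]
print(correlation_importance(features, target))
```

Here `a` is a perfect linear predictor of the target and therefore ranks first, while `b` is only weakly related.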

### Data Quality Assessment

Comprehensive data quality analysis:

```python
# Get data quality summary
quality_issues = sniffer.get_data_quality_summary()

# Access detailed quality metrics
quality_report = sniffer.data_quality_report

# Check for specific issues
missing_columns = quality_issues['high_missing']
outlier_columns = quality_issues['many_outliers']
```
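
Two of the checks named above, missing data and outliers, can be sketched in a few lines. The example below uses the common 1.5 × IQR rule for outliers; whether ML Sniff uses this rule or these thresholds is an assumption, and both function names are hypothetical.

```python
import statistics
from typing import List, Optional


def missing_fraction(column: List[Optional[float]]) -> float:
    """Share of entries that are None."""
    return sum(v is None for v in column) / len(column)


def iqr_outliers(column: List[Optional[float]], k: float = 1.5) -> List[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing entries."""
    values = [v for v in column if v is not None]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


column = [1.0, 2.0, 2.5, 3.0, None, 3.5, 4.0, 100.0]
print(missing_fraction(column))  # 0.125
print(iqr_outliers(column))      # [100.0]
```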

### Preprocessing Suggestions

Automated recommendations for data preparation:

```python
suggestions = sniffer.suggest_preprocessing()

# Missing data handling
missing_suggestions = suggestions['missing_data']

# Outlier handling
outlier_suggestions = suggestions['outliers']

# Feature scaling
scaling_suggestions = suggestions['scaling']

# Categorical encoding
encoding_suggestions = suggestions['encoding']

# Feature selection
selection_suggestions = suggestions['feature_selection']
```
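
The shape of each value in the suggestions dictionary is not documented here, so the following sketch prints any such mapping without assuming whether a section holds a single string or a list. The example data is invented for illustration and may not match what ML Sniff returns.

```python
from typing import Any, Dict


def format_suggestions(suggestions: Dict[str, Any]) -> str:
    """Render a suggestions mapping as indented text; value shapes are not assumed."""
    lines = []
    for section, items in suggestions.items():
        lines.append(f"{section}:")
        # Treat a lone value as a one-item list so both shapes print the same way.
        if isinstance(items, (str, bytes)) or not hasattr(items, "__iter__"):
            items = [items]
        for item in items:
            lines.append(f"  - {item}")
    return "\n".join(lines)


# Hypothetical example data; the real values returned by ML Sniff may differ.
example = {
    "missing_data": ["Impute feature3 with the median"],
    "scaling": "Standardize numeric features",
}
print(format_suggestions(example))
```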

### Interactive Dashboard

Create interactive Plotly dashboards:

```python
# Create interactive dashboard
sniffer.create_interactive_dashboard()
```

## 📊 Example Output

```
================================================================================
ML SNIFF - ADVANCED ML PROBLEM DETECTION
================================================================================

📊 BASIC STATISTICS:
   • Rows: 1,000
   • Columns: 10
   • Missing Data: 2.50%
   • Memory Usage: 0.78 MB
   • Numeric Columns: 6
   • Categorical Columns: 1

📋 DATA TYPES:
   • float64: 6 columns
   • int64: 3 columns
   • object: 1 columns

🔍 DATA QUALITY ASSESSMENT:
   • High Missing: feature3
   • Many Outliers: feature1, feature2

🎯 TARGET COLUMN ANALYSIS:
   • Identified Target: 'target'
   • Problem Type: Classification
   • Suggested Model: RandomForestClassifier

   • Target Statistics:
     - Data Type: int64
     - Unique Values: 3
     - Missing Values: 0
     - Mean: 1.2000
     - Std: 0.8165
     - Min: 0.0000
     - Max: 2.0000
     - Skewness: 0.0000
     - Kurtosis: -1.5000
     - Label Distribution:
       * 0: 400 (40.0%)
       * 1: 350 (35.0%)
       * 2: 250 (25.0%)

🏆 FEATURE IMPORTANCE:
   1. feature1: 0.3800
   2. feature3: 0.2628
   3. feature4: 0.2000
   4. feature2: 0.1572

💡 MODEL RECOMMENDATIONS:
   • Primary Model: RandomForestClassifier
   • Hyperparameters: {'n_estimators': 100, 'max_depth': 10, 'random_state': 42}
   • Alternative Models: LogisticRegression, SVM, XGBClassifier
   • Consider class imbalance if present
   • Use metrics like accuracy, precision, recall, F1-score

================================================================================
```
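
The "Label Distribution" lines in a report like this are simply per-class counts and percentages. As a rough sketch (the helper name `label_distribution` is an assumption, not part of the package), they could be derived like this:

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to (count, percentage of all labels), sorted by label."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: (count, round(100 * count / total, 1))
        for label, count in sorted(counts.items())
    }


labels = [0] * 400 + [1] * 350 + [2] * 250
for label, (count, pct) in label_distribution(labels).items():
    print(f"* {label}: {count} ({pct:.1f}%)")
```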

## 🛠️ CLI Options

```bash
ml-sniff [OPTIONS] FILE

Options:
  --target, -t TEXT           Manually specify target column name
  --visualize, -v            Show data visualizations
  --interactive, -i          Create interactive Plotly dashboard
  --output, -o TEXT          Save report to file instead of printing to console
  --export, -e TEXT          Export detailed analysis report to file
  --format, -f [json|csv|txt] Export format (default: json)
  --summary, -s              Show only summary information
  --preprocessing, -p        Show preprocessing suggestions
  --no-auto-analyze          Skip automatic analysis on initialization
  --feature-importance       Show feature importance analysis
  --data-quality             Show detailed data quality report
```
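
When driving the CLI from a script, it can help to assemble the argument list programmatically. The sketch below builds such a list from a handful of the options listed above; `build_command` is a hypothetical helper, and the actual invocation is left commented out because it requires `ml-sniff` on your `PATH`.

```python
from typing import List, Optional


def build_command(
    csv_path: str,
    target: Optional[str] = None,
    export: Optional[str] = None,
    fmt: str = "json",
    summary: bool = False,
) -> List[str]:
    """Assemble an ml-sniff argument list from the documented options."""
    cmd = ["ml-sniff", csv_path]
    if target:
        cmd += ["--target", target]
    if export:
        cmd += ["--export", export, "--format", fmt]
    if summary:
        cmd.append("--summary")
    return cmd


cmd = build_command("your_data.csv", target="target", export="report.json")
print(cmd)
# To run it for real:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell-quoting problems with file names that contain spaces.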

## 📈 Sample Data

Create sample datasets to test the package:

```python
import pandas as pd
import numpy as np

# Classification dataset
np.random.seed(42)
n_samples = 1000

classification_data = {
    'feature1': np.random.normal(0, 1, n_samples),
    'feature2': np.random.normal(0, 1, n_samples),
    'feature3': np.random.normal(0, 1, n_samples),
    'feature4': np.random.normal(0, 1, n_samples),
    'categorical_feature': np.random.choice(['A', 'B', 'C'], n_samples),
    'target': np.random.choice([0, 1, 2], n_samples, p=[0.4, 0.35, 0.25])
}

df = pd.DataFrame(classification_data)
df.to_csv('classification_sample.csv', index=False)

# Regression dataset
regression_data = {
    'feature1': np.random.normal(0, 1, n_samples),
    'feature2': np.random.normal(0, 1, n_samples),
    'feature3': np.random.normal(0, 1, n_samples),
    'target': np.random.normal(0, 1, n_samples)
}

df_reg = pd.DataFrame(regression_data)
df_reg.to_csv('regression_sample.csv', index=False)
```
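
To exercise the clustering case as well, you may want a dataset with no label column. The following sketch writes one using only the standard library; it assumes that a file without an obvious target is what triggers clustering detection, which this README does not spell out. The function name and the cluster centers are invented for the example.

```python
import csv
import random


def write_clustering_sample(path: str, n_samples: int = 300, seed: int = 42) -> int:
    """Write n_samples 2-D points drawn around three centers; returns rows written."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (-5.0, 5.0)]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["feature1", "feature2"])
        for i in range(n_samples):
            cx, cy = centers[i % len(centers)]
            writer.writerow([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
    return n_samples


if __name__ == "__main__":
    write_clustering_sample("clustering_sample.csv")
```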

## 🔬 API Reference

### Sniffer Class

#### `__init__(data, target_column=None, auto_analyze=True)`
Initialize the Sniffer with data.

**Parameters:**
- `data`: CSV file path (str/Path) or pandas DataFrame
- `target_column`: Optional manual target column specification
- `auto_analyze`: Whether to automatically analyze data on initialization

#### `report()`
Print a comprehensive analysis report to console.

#### `get_summary()`
Get analysis results as a dictionary.

**Returns:**
- Dictionary with keys: `target_column`, `problem_type`, `suggested_model`, `basic_stats`, `label_distribution`, `feature_importance`, `data_quality_report`, `outlier_info`, `clustering_analysis`, `quality_issues`
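
As a small usage sketch, the helper below builds a one-line headline from three of the keys listed above. The `summary` dictionary is a hypothetical stand-in for the value returned by `get_summary()`, and `headline` is not part of the package.

```python
from typing import Any, Dict


def headline(summary: Dict[str, Any]) -> str:
    """One-line description from three documented keys; missing keys show as 'unknown'."""
    target = summary.get("target_column", "unknown")
    problem = summary.get("problem_type", "unknown")
    model = summary.get("suggested_model", "unknown")
    return f"{problem} on '{target}' (suggested model: {model})"


# Hypothetical stand-in for the dictionary returned by sniffer.get_summary().
summary = {
    "target_column": "target",
    "problem_type": "Classification",
    "suggested_model": "RandomForestClassifier",
}
print(headline(summary))
```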

#### `get_feature_importance(method='random_forest')`
Get feature importance scores.

**Parameters:**
- `method`: 'random_forest', 'mutual_info', or 'correlation'

**Returns:**
- Dictionary of feature importance scores

#### `get_top_features(n=5, method='random_forest')`
Get top n most important features.

**Parameters:**
- `n`: Number of top features to return
- `method`: Feature importance method to use

**Returns:**
- List of top feature names

#### `get_data_quality_summary()`
Get a summary of data quality issues.

**Returns:**
- Dictionary with data quality summary

#### `suggest_preprocessing()`
Suggest preprocessing steps based on data analysis.

**Returns:**
- Dictionary with preprocessing suggestions

#### `visualize_data(figsize=(15, 10))`
Generate comprehensive data visualizations.

#### `create_interactive_dashboard()`
Create an interactive Plotly dashboard.

#### `export_report(filename, format='json')`
Export analysis report to file.

**Parameters:**
- `filename`: Output filename
- `format`: 'json', 'csv', or 'txt'
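
A JSON report can be read back with the standard library. The sketch below assumes only that the exported file is a JSON object at the top level; its internal layout is not documented here, so the code just lists the top-level keys. Both helper names are hypothetical.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Load a JSON report from disk; the report's internal layout is not assumed."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def top_level_keys(path: str) -> list:
    """Return the sorted top-level keys, assuming the report is a JSON object."""
    return sorted(load_report(path))
```

For example, `top_level_keys("analysis.json")` would show which sections the export contains before you write code against them.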

## 🧪 Development

### Setup Development Environment

```bash
git clone https://github.com/Sherin-SEF-AI/ml-sniffer.git
cd ml-sniffer
pip install -e ".[dev]"
```

### Run Tests

```bash
pytest tests/
```

### Code Formatting

```bash
black ml_sniff/
flake8 ml_sniff/
```

## 📋 Dependencies

- pandas >= 1.3.0
- numpy >= 1.20.0
- matplotlib >= 3.3.0
- seaborn >= 0.11.0
- scikit-learn >= 1.0.0
- scipy >= 1.7.0
- plotly >= 5.0.0
- streamlit >= 1.28.0
- streamlit-option-menu >= 0.3.0
- streamlit-aggrid >= 0.3.0

## 🚀 Roadmap

- [ ] Support for more file formats (Excel, JSON, etc.)
- [ ] Advanced feature engineering suggestions
- [ ] Model performance estimation
- [ ] Integration with popular ML libraries
- [x] Web interface (available as the Streamlit GUI)
- [ ] Batch processing capabilities
- [ ] Time series analysis
- [ ] Anomaly detection
- [ ] AutoML integration

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

If you encounter any issues or have questions, please:

1. Check the [documentation](https://github.com/Sherin-SEF-AI/ml-sniffer#readme)
2. Search [existing issues](https://github.com/Sherin-SEF-AI/ml-sniffer/issues)
3. Create a [new issue](https://github.com/Sherin-SEF-AI/ml-sniffer/issues/new)

---

**Made with ❤️ for the ML community**

            
