# ML Sniff 🕵️‍♂️
**Advanced Machine Learning Problem Detection from CSV files and DataFrames**
*By [Sherin Joseph Roy](https://sherin-sef-ai.github.io/) - Startup Founder & Hardware/IoT Enthusiast*
ML Sniff is a comprehensive Python package that automatically analyzes your data to determine the most likely machine learning problem type, identifies the target column, suggests appropriate models, and provides advanced data analytics.
## 🚀 Features
- 🔍 **Automatic Target Detection**: Uses advanced heuristics to identify the most likely target column
- 🎯 **Problem Type Classification**: Determines whether your data poses a classification, regression, or clustering problem
- 🤖 **Model Suggestions**: Recommends appropriate algorithms with hyperparameters
- 📊 **Comprehensive Analysis**: Provides detailed statistics and visualizations
- 🏆 **Feature Importance**: Multiple methods (Random Forest, Mutual Information, Correlation)
- 🔍 **Data Quality Assessment**: Missing data, duplicates, outliers, and variance analysis
- 📈 **Advanced Visualizations**: Static plots and interactive Plotly dashboards
- 🖥️ **CLI Support**: Analyze files directly from the command line
- 🖥️ **Web GUI**: Beautiful Streamlit interface with interactive dashboards
- 📤 **Export Capabilities**: Export reports in JSON, CSV, or TXT formats
- 🛠️ **Preprocessing Suggestions**: Automated recommendations for data preparation
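
To make "problem type classification" concrete, here is a minimal sketch of the kind of heuristic such a tool could apply. It is not ML Sniff's actual logic: the function name `guess_problem_type` and the `max_classes` threshold are hypothetical, chosen only for illustration, and the sketch uses the standard library so it runs without the package installed.

```python
from typing import Optional, Sequence


def guess_problem_type(target: Optional[Sequence], max_classes: int = 20) -> str:
    """Guess the ML problem type from a target column (illustrative heuristic only)."""
    # No target column at all: nothing to predict, so treat it as clustering.
    if target is None:
        return "Clustering"

    unique = {v for v in target if v is not None}

    # Non-numeric labels (strings, booleans) can only be classes.
    if any(isinstance(v, (str, bool)) for v in unique):
        return "Classification"

    # Numeric target with few distinct integer-valued levels: likely class labels.
    all_integral = all(float(v).is_integer() for v in unique)
    if all_integral and len(unique) <= max_classes:
        return "Classification"

    # Anything else is treated as a continuous quantity.
    return "Regression"


print(guess_problem_type([0, 1, 2, 1, 0]))     # Classification
print(guess_problem_type([0.13, 2.7, -1.05]))  # Regression
print(guess_problem_type(None))                # Clustering
```

A real detector would also need to decide which column is the target in the first place; this sketch assumes that choice has already been made.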
## 📦 Installation
### From PyPI
```bash
pip install ml-sniff
```
### From Source
```bash
git clone https://github.com/Sherin-SEF-AI/ml-sniffer.git
cd ml-sniffer
pip install .
```
## 🚀 Quick Start
### Command Line Interface
Basic analysis:
```bash
ml-sniff your_data.csv
```
Show visualizations:
```bash
ml-sniff your_data.csv --visualize
```
Create interactive dashboard:
```bash
ml-sniff your_data.csv --interactive
```
Export detailed report:
```bash
ml-sniff your_data.csv --export report.json --format json
```
Show preprocessing suggestions:
```bash
ml-sniff your_data.csv --preprocessing
```
Show feature importance:
```bash
ml-sniff your_data.csv --feature-importance
```
Show data quality report:
```bash
ml-sniff your_data.csv --data-quality
```
Specify target column manually:
```bash
ml-sniff your_data.csv --target target_column
```
### Web Interface (GUI)
Launch the Streamlit web interface in any of the following ways:
```bash
# Method 1: Using the launcher script
python run_gui.py
# Method 2: Direct streamlit command
streamlit run streamlit_app.py
# Method 3: Using the command line entry point
ml-sniff-gui
```
The GUI will open in your browser at `http://localhost:8501` and provides:
- 📁 **File Upload**: Drag and drop CSV files
- 🎯 **Interactive Analysis**: Real-time analysis with visual feedback
- 📊 **Interactive Charts**: Plotly visualizations with zoom, pan, and hover
- 🏆 **Feature Analysis**: Multiple importance methods with interactive charts
- 🔍 **Data Quality**: Comprehensive quality assessment with detailed reports
- 📈 **Visualizations**: Correlation matrices, distributions, and outlier analysis
- 📤 **Export**: Download reports in multiple formats
- ⚙️ **Customization**: Toggle features and analysis options
### Python API
```python
from ml_sniff import Sniffer
# Basic analysis
sniffer = Sniffer("your_data.csv")
sniffer.report()
# Advanced analysis with manual target
sniffer = Sniffer("your_data.csv", target_column="target")
sniffer.report()
# Get feature importance
top_features = sniffer.get_top_features(5, method='random_forest')
print(f"Top features: {top_features}")
# Get preprocessing suggestions
suggestions = sniffer.suggest_preprocessing()
print(suggestions)
# Create visualizations
sniffer.visualize_data()
sniffer.create_interactive_dashboard()
# Export report
sniffer.export_report("analysis.json", format="json")
```
## 🔧 Advanced Features
The snippets in this section assume a `sniffer` instance created as in the Python API example above.
### Feature Importance Analysis
ML Sniff computes feature importance with three methods:
```python
# Random Forest importance
rf_importance = sniffer.get_feature_importance('random_forest')
# Mutual Information
mi_importance = sniffer.get_feature_importance('mutual_info')
# Correlation-based
corr_importance = sniffer.get_feature_importance('correlation')
# Get top features
top_features = sniffer.get_top_features(5, method='random_forest')
```
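
For intuition about the `'correlation'` method, the sketch below ranks features by the absolute Pearson correlation with the target, using only the standard library. The helper names `pearson` and `rank_by_correlation` are hypothetical, and ML Sniff's own scoring may differ (for example, in how scores are normalised).

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(features: Dict[str, Sequence[float]],
                        target: Sequence[float]) -> List[str]:
    """Return feature names ordered by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


features = {
    "feature1": [1.0, 2.0, 3.0, 4.0, 5.0],  # perfectly linear in the target
    "feature2": [2.0, 1.0, 4.0, 3.0, 5.0],  # related, but noisy
    "feature3": [7.0, 7.0, 7.0, 7.0, 7.0],  # constant, no information
}
target = [10.0, 20.0, 30.0, 40.0, 50.0]

print(rank_by_correlation(features, target))
# ['feature1', 'feature2', 'feature3']
```

Correlation only captures linear relationships, which is one reason a tool would offer tree-based and mutual-information scores alongside it.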
### Data Quality Assessment
Comprehensive data quality analysis:
```python
# Get data quality summary
quality_issues = sniffer.get_data_quality_summary()
# Access detailed quality metrics
quality_report = sniffer.data_quality_report
# Check for specific issues
missing_columns = quality_issues['high_missing']
outlier_columns = quality_issues['many_outliers']
```
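
As a rough illustration of how a column might end up under `high_missing` or `many_outliers`, the sketch below applies two common rules: a missing-value threshold and the 1.5 × IQR outlier fences. The thresholds (30% missing, 5% outliers) and the function names are assumptions for this example, not values taken from ML Sniff. It needs Python 3.8+ for `statistics.quantiles`.

```python
import statistics
from typing import Dict, List, Optional, Sequence

Column = Sequence[Optional[float]]


def missing_fraction(column: Column) -> float:
    """Share of entries that are None (standing in for NaN)."""
    return sum(v is None for v in column) / len(column)


def outlier_fraction(column: Column) -> float:
    """Share of non-missing values outside the 1.5 * IQR fences."""
    values = [v for v in column if v is not None]
    if len(values) < 4:
        return 0.0  # too few points for meaningful quartiles
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(v < low or v > high for v in values) / len(values)


def quality_summary(data: Dict[str, Column],
                    missing_threshold: float = 0.30,
                    outlier_threshold: float = 0.05) -> Dict[str, List[str]]:
    """Group column names under the two issue keys shown in the example above."""
    return {
        "high_missing": [name for name, col in data.items()
                         if missing_fraction(col) > missing_threshold],
        "many_outliers": [name for name, col in data.items()
                          if outlier_fraction(col) > outlier_threshold],
    }


data = {
    "feature1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],                       # one extreme value
    "feature2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],                        # clean
    "feature3": [1.0, None, None, 4.0, None, None, 7.0, None, None, 10.0],  # 60% missing
}
print(quality_summary(data))
# {'high_missing': ['feature3'], 'many_outliers': ['feature1']}
```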
### Preprocessing Suggestions
Automated recommendations for data preparation:
```python
suggestions = sniffer.suggest_preprocessing()
# Missing data handling
missing_suggestions = suggestions['missing_data']
# Outlier handling
outlier_suggestions = suggestions['outliers']
# Feature scaling
scaling_suggestions = suggestions['scaling']
# Categorical encoding
encoding_suggestions = suggestions['encoding']
# Feature selection
selection_suggestions = suggestions['feature_selection']
```
### Interactive Dashboard
Create interactive Plotly dashboards:
```python
# Create interactive dashboard
sniffer.create_interactive_dashboard()
```
## 📊 Example Output
```
================================================================================
ML SNIFF - ADVANCED ML PROBLEM DETECTION
================================================================================

📊 BASIC STATISTICS:
   • Rows: 1,000
   • Columns: 10
   • Missing Data: 2.50%
   • Memory Usage: 0.78 MB
   • Numeric Columns: 6
   • Categorical Columns: 1

📋 DATA TYPES:
   • float64: 6 columns
   • int64: 3 columns
   • object: 1 columns

🔍 DATA QUALITY ASSESSMENT:
   • High Missing: feature3
   • Many Outliers: feature1, feature2

🎯 TARGET COLUMN ANALYSIS:
   • Identified Target: 'target'
   • Problem Type: Classification
   • Suggested Model: RandomForestClassifier

   • Target Statistics:
     - Data Type: int64
     - Unique Values: 3
     - Missing Values: 0
     - Mean: 1.2000
     - Std: 0.8165
     - Min: 0.0000
     - Max: 2.0000
     - Skewness: 0.0000
     - Kurtosis: -1.5000
     - Label Distribution:
       * 0: 400 (40.0%)
       * 1: 350 (35.0%)
       * 2: 250 (25.0%)

🏆 FEATURE IMPORTANCE:
   1. feature1: 0.3800
   2. feature3: 0.2628
   3. feature4: 0.2000
   4. feature2: 0.1572

💡 MODEL RECOMMENDATIONS:
   • Primary Model: RandomForestClassifier
   • Hyperparameters: {'n_estimators': 100, 'max_depth': 10, 'random_state': 42}
   • Alternative Models: LogisticRegression, SVM, XGBClassifier
   • Consider class imbalance if present
   • Use metrics like accuracy, precision, recall, F1-score

================================================================================
```
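
The `Label Distribution` lines in the report are per-class counts and percentages. A minimal sketch of that computation with `collections.Counter` follows; the function name `label_distribution` is hypothetical, and the labels are constructed to match the class counts in the example above.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def label_distribution(labels: Iterable[Hashable]) -> Dict[Hashable, Tuple[int, float]]:
    """Return {label: (count, percent)} with labels in sorted order."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in sorted(counts.items())}


labels = [0] * 400 + [1] * 350 + [2] * 250
for label, (count, percent) in label_distribution(labels).items():
    print(f"* {label}: {count} ({percent:.1f}%)")
# * 0: 400 (40.0%)
# * 1: 350 (35.0%)
# * 2: 250 (25.0%)
```

A strongly skewed distribution here is the cue for the "consider class imbalance" recommendation in the report.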
## 🛠️ CLI Options
```bash
ml-sniff [OPTIONS] FILE

Options:
  --target, -t TEXT             Manually specify target column name
  --visualize, -v               Show data visualizations
  --interactive, -i             Create interactive Plotly dashboard
  --output, -o TEXT             Save report to file instead of printing to console
  --export, -e TEXT             Export detailed analysis report to file
  --format, -f [json|csv|txt]   Export format (default: json)
  --summary, -s                 Show only summary information
  --preprocessing, -p           Show preprocessing suggestions
  --no-auto-analyze             Skip automatic analysis on initialization
  --feature-importance          Show feature importance analysis
  --data-quality                Show detailed data quality report
```
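
The option table maps naturally onto `argparse`. The parser below is a sketch that mirrors the flags listed above so you can see how they combine; it is not the package's actual CLI code (which may be built with a different library), and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the same flags as the option table above."""
    parser = argparse.ArgumentParser(prog="ml-sniff")
    parser.add_argument("file", metavar="FILE", help="CSV file to analyze")
    parser.add_argument("--target", "-t", help="Manually specify target column name")
    parser.add_argument("--visualize", "-v", action="store_true",
                        help="Show data visualizations")
    parser.add_argument("--interactive", "-i", action="store_true",
                        help="Create interactive Plotly dashboard")
    parser.add_argument("--output", "-o",
                        help="Save report to file instead of printing to console")
    parser.add_argument("--export", "-e",
                        help="Export detailed analysis report to file")
    parser.add_argument("--format", "-f", choices=["json", "csv", "txt"],
                        default="json", help="Export format (default: json)")
    parser.add_argument("--summary", "-s", action="store_true",
                        help="Show only summary information")
    parser.add_argument("--preprocessing", "-p", action="store_true",
                        help="Show preprocessing suggestions")
    parser.add_argument("--no-auto-analyze", action="store_true",
                        help="Skip automatic analysis on initialization")
    parser.add_argument("--feature-importance", action="store_true",
                        help="Show feature importance analysis")
    parser.add_argument("--data-quality", action="store_true",
                        help="Show detailed data quality report")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["your_data.csv", "--target", "target_column", "-e", "report.json", "-f", "json"]
)
print(args.file, args.target, args.export, args.format)
# your_data.csv target_column report.json json
```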
## 📈 Sample Data
Create sample datasets to test the package:
```python
import pandas as pd
import numpy as np
# Classification dataset
np.random.seed(42)
n_samples = 1000
classification_data = {
    'feature1': np.random.normal(0, 1, n_samples),
    'feature2': np.random.normal(0, 1, n_samples),
    'feature3': np.random.normal(0, 1, n_samples),
    'feature4': np.random.normal(0, 1, n_samples),
    'categorical_feature': np.random.choice(['A', 'B', 'C'], n_samples),
    'target': np.random.choice([0, 1, 2], n_samples, p=[0.4, 0.35, 0.25])
}
df = pd.DataFrame(classification_data)
df.to_csv('classification_sample.csv', index=False)
# Regression dataset
regression_data = {
    'feature1': np.random.normal(0, 1, n_samples),
    'feature2': np.random.normal(0, 1, n_samples),
    'feature3': np.random.normal(0, 1, n_samples),
    'target': np.random.normal(0, 1, n_samples)
}
df_reg = pd.DataFrame(regression_data)
df_reg.to_csv('regression_sample.csv', index=False)
```
## 🔬 API Reference
### Sniffer Class
#### `__init__(data, target_column=None, auto_analyze=True)`
Initialize the Sniffer with data.
**Parameters:**
- `data`: CSV file path (str/Path) or pandas DataFrame
- `target_column`: Optional manual target column specification
- `auto_analyze`: Whether to automatically analyze data on initialization
#### `report()`
Print a comprehensive analysis report to console.
#### `get_summary()`
Get analysis results as a dictionary.
**Returns:**
- Dictionary with keys: `target_column`, `problem_type`, `suggested_model`, `basic_stats`, `label_distribution`, `feature_importance`, `data_quality_report`, `outlier_info`, `clustering_analysis`, `quality_issues`
#### `get_feature_importance(method='random_forest')`
Get feature importance scores.
**Parameters:**
- `method`: 'random_forest', 'mutual_info', or 'correlation'
**Returns:**
- Dictionary of feature importance scores
#### `get_top_features(n=5, method='random_forest')`
Get top n most important features.
**Parameters:**
- `n`: Number of top features to return
- `method`: Feature importance method to use
**Returns:**
- List of top feature names
#### `get_data_quality_summary()`
Get a summary of data quality issues.
**Returns:**
- Dictionary with data quality summary
#### `suggest_preprocessing()`
Suggest preprocessing steps based on data analysis.
**Returns:**
- Dictionary with preprocessing suggestions
#### `visualize_data(figsize=(15, 10))`
Generate comprehensive data visualizations.
#### `create_interactive_dashboard()`
Create an interactive Plotly dashboard.
#### `export_report(filename, format='json')`
Export analysis report to file.
**Parameters:**
- `filename`: Output filename
- `format`: 'json', 'csv', or 'txt'
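
To show what a JSON export of the analysis summary could look like, here is a small sketch using the standard `json` module. The helper `export_json` is hypothetical and only illustrates the idea; the dictionary uses a few of the `get_summary()` keys listed above with values taken from the example output, and the real report may contain more fields or a different layout.

```python
import json
import tempfile
from pathlib import Path


def export_json(summary: dict, filename: str) -> Path:
    """Write a summary dictionary to a JSON file and return its path."""
    path = Path(filename)
    # default=str keeps the export from failing on values json cannot encode natively.
    path.write_text(json.dumps(summary, indent=2, default=str), encoding="utf-8")
    return path


summary = {
    "target_column": "target",
    "problem_type": "Classification",
    "suggested_model": "RandomForestClassifier",
    "label_distribution": {"0": 400, "1": 350, "2": 250},
}

# Write to a temporary directory and read the file back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    report = export_json(summary, str(Path(tmp) / "analysis.json"))
    loaded = json.loads(report.read_text(encoding="utf-8"))

print(loaded["problem_type"])
# Classification
```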
## 🧪 Development
### Setup Development Environment
```bash
git clone https://github.com/Sherin-SEF-AI/ml-sniffer.git
cd ml-sniffer
pip install -e ".[dev]"
```
### Run Tests
```bash
pytest tests/
```
### Code Formatting
```bash
black ml_sniff/
flake8 ml_sniff/
```
## 📋 Dependencies
- pandas >= 1.3.0
- numpy >= 1.20.0
- matplotlib >= 3.3.0
- seaborn >= 0.11.0
- scikit-learn >= 1.0.0
- scipy >= 1.7.0
- plotly >= 5.0.0
- streamlit >= 1.28.0
- streamlit-option-menu >= 0.3.0
- streamlit-aggrid >= 0.3.0
## 🚀 Roadmap
- [ ] Support for more file formats (Excel, JSON, etc.)
- [ ] Advanced feature engineering suggestions
- [ ] Model performance estimation
- [ ] Integration with popular ML libraries
- [x] Web interface (Streamlit GUI)
- [ ] Batch processing capabilities
- [ ] Time series analysis
- [ ] Anomaly detection
- [ ] AutoML integration
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🆘 Support
If you encounter any issues or have questions, please:
1. Check the [documentation](https://github.com/Sherin-SEF-AI/ml-sniffer#readme)
2. Search [existing issues](https://github.com/Sherin-SEF-AI/ml-sniffer/issues)
3. Create a [new issue](https://github.com/Sherin-SEF-AI/ml-sniffer/issues/new)
---
**Made with ❤️ for the ML community**
Raw data
{
"_id": null,
"home_page": "https://github.com/Sherin-SEF-AI/ml-sniffer",
"name": "ml-sniff",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "Sherin Joseph Roy <sherin.joseph2217@gmail.com>",
"keywords": "machine-learning, data-analysis, classification, regression, clustering, automation, streamlit, gui",
"author": "Sherin Joseph Roy",
"author_email": "Sherin Joseph Roy <sherin.joseph2217@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/d3/81/bd06d2bc3480c5c6c585cc36bd5101113ee0ea1058b73809677c8bd64bfd/ml_sniff-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Advanced Machine Learning Problem Detection with CLI and GUI interfaces",
"version": "1.0.0",
"project_urls": {
"Author Website": "https://sherin-sef-ai.github.io/",
"Bug Tracker": "https://github.com/Sherin-SEF-AI/ml-sniffer/issues",
"Documentation": "https://github.com/Sherin-SEF-AI/ml-sniffer#readme",
"Homepage": "https://github.com/Sherin-SEF-AI/ml-sniffer",
"Repository": "https://github.com/Sherin-SEF-AI/ml-sniffer"
},
"split_keywords": [
"machine-learning",
" data-analysis",
" classification",
" regression",
" clustering",
" automation",
" streamlit",
" gui"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f6889aa371a6bfb22e016bcb845bba0eb64c04d25a2aa1f55fbeeef320e78df9",
"md5": "8c5f56cdf46de76b63be53663b9f4a2c",
"sha256": "40dfb8fd4d7b0abe27eab1f58aabb22489f87338321471da9bd4669a589c3026"
},
"downloads": -1,
"filename": "ml_sniff-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8c5f56cdf46de76b63be53663b9f4a2c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 26663,
"upload_time": "2025-07-27T10:59:21",
"upload_time_iso_8601": "2025-07-27T10:59:21.072938Z",
"url": "https://files.pythonhosted.org/packages/f6/88/9aa371a6bfb22e016bcb845bba0eb64c04d25a2aa1f55fbeeef320e78df9/ml_sniff-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d381bd06d2bc3480c5c6c585cc36bd5101113ee0ea1058b73809677c8bd64bfd",
"md5": "02bdc5e6dade85334b9f0722b786efb2",
"sha256": "0e6b2ce7c8d02074cf4b83cbc77819a614c90d8ae0c66e72c7f47178fa6a3ad9"
},
"downloads": -1,
"filename": "ml_sniff-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "02bdc5e6dade85334b9f0722b786efb2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 32544,
"upload_time": "2025-07-27T10:59:47",
"upload_time_iso_8601": "2025-07-27T10:59:47.356243Z",
"url": "https://files.pythonhosted.org/packages/d3/81/bd06d2bc3480c5c6c585cc36bd5101113ee0ea1058b73809677c8bd64bfd/ml_sniff-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-27 10:59:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Sherin-SEF-AI",
"github_project": "ml-sniffer",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.20.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.3.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.11.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.7.0"
]
]
},
{
"name": "plotly",
"specs": [
[
">=",
"5.0.0"
]
]
},
{
"name": "streamlit",
"specs": [
[
">=",
"1.28.0"
]
]
},
{
"name": "streamlit-option-menu",
"specs": [
[
">=",
"0.3.0"
]
]
},
{
"name": "streamlit-aggrid",
"specs": [
[
">=",
"0.3.0"
]
]
}
],
"lcname": "ml-sniff"
}