# mlbench-lite
A comprehensive machine learning benchmarking library for comparing many classification models on your dataset with a single function call. It is built on scikit-learn and pandas, with optional support for XGBoost, LightGBM, and CatBoost models.
## 🚀 Features
- **Comprehensive Model Support**: 20+ ML models from multiple libraries
- **Flexible Model Selection**: Choose specific models, categories, or exclude models
- **Multiple ML Libraries**: scikit-learn, XGBoost, LightGBM, CatBoost
- **Simple API**: One function call to benchmark multiple models
- **Comprehensive Metrics**: Returns Accuracy, Precision, Recall, and F1 scores
- **Custom Dataset**: Includes the `load_clover` dataset for testing
- **Easy Integration**: Works seamlessly with scikit-learn datasets
- **Pandas Output**: Results returned as a clean pandas DataFrame
- **Reproducible**: Consistent results with random state control
- **Model Information**: Get detailed info about available models
## 📦 Installation
```bash
pip install mlbench-lite
```
## 🎯 Quick Start
```python
from mlbench_lite import benchmark, load_clover
# Load the clover dataset
X, y = load_clover(return_X_y=True)
# Benchmark all available models
results = benchmark(X, y)
print(results)
```
**Output:**
```
                 Model           Category  Accuracy  Precision  Recall      F1
0        Random Forest  Tree-based Models    0.9500     0.9565  0.9512  0.9505
1                  SVM         SVM Models    0.9250     0.9337  0.9255  0.9254
2  Logistic Regression      Linear Models    0.9125     0.9131  0.9117  0.9115
3              XGBoost            XGBoost    0.9000     0.9024  0.9000  0.8997
4             LightGBM           LightGBM    0.8875     0.8891  0.8875  0.8873
```
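
Assuming the default `test_size=0.2` is applied to the 400-sample clover dataset, the test set holds 80 samples, so each accuracy above is a whole number of correct predictions out of 80:

```math
\begin{aligned}
n_{\text{test}} &= 0.2 \times 400 = 80, \qquad \text{Accuracy} = \frac{\text{correct predictions}}{n_{\text{test}}} \\
0.9500 &= \tfrac{76}{80}, \quad 0.9250 = \tfrac{74}{80}, \quad 0.9125 = \tfrac{73}{80}, \quad 0.9000 = \tfrac{72}{80}, \quad 0.8875 = \tfrac{71}{80}
\end{aligned}
```

Neighboring rows in this table differ by only one or two test samples, so small gaps on a single split should not be over-interpreted.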
## 📚 API Reference
### `benchmark(X, y, test_size=0.2, random_state=42, models=None, model_categories=None, exclude_models=None)`
Benchmark multiple machine learning models on a dataset.

**Parameters:**
- `X` (array-like): Feature matrix of shape (n_samples, n_features); it is split into training and test sets internally
- `y` (array-like): Target labels of shape (n_samples,)
- `test_size` (float, optional): Proportion of the dataset held out for testing (default: 0.2)
- `random_state` (int, optional): Random seed for reproducibility (default: 42)
- `models` (list of str, optional): Specific models to use. If None, all available models are used.
- `model_categories` (list of str, optional): Categories of models to use. If None, all categories are used.
- `exclude_models` (list of str, optional): Models to exclude from benchmarking.

**Returns:**
- `pandas.DataFrame`: Results with columns:
  - `Model`: Name of the model
  - `Category`: Category of the model
  - `Accuracy`: Accuracy score
  - `Precision`: Precision score (macro-averaged)
  - `Recall`: Recall score (macro-averaged)
  - `F1`: F1 score (macro-averaged)
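
The Precision, Recall, and F1 columns are described as macro-averaged. As a minimal, dependency-free sketch of what that means (this is not mlbench-lite's code; it mirrors scikit-learn's `average="macro"` convention, which the library presumably relies on), each metric is computed per class and then averaged with equal weight per class:

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1) for two label lists.

    Per-class scores are computed first and then averaged with equal weight per class
    (scikit-learn's average="macro"); a 0/0 ratio is treated as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    k = len(labels)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Three classes, six test samples, two mistakes
acc, prec, rec, f1 = macro_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(f"Accuracy={acc:.4f} Precision={prec:.4f} Recall={rec:.4f} F1={f1:.4f}")
# Accuracy=0.6667 Precision=0.7222 Recall=0.6667 F1=0.6556
```

Because every class counts equally, poor performance on a small class lowers the macro scores more than it lowers accuracy.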
### `list_available_models()`
List all available models and their categories.

**Returns:**
- `dict`: Dictionary with model categories as keys and lists of model names as values
### `get_model_info()`
Get detailed information about available models.

**Returns:**
- `pandas.DataFrame`: DataFrame with model information including category, name, and description
### `load_clover(return_X_y=False)`
Load the custom clover dataset.

**Parameters:**
- `return_X_y` (bool, default=False): If True, returns `(data, target)` instead of a Bunch object

**Returns:**
- `Bunch` or `tuple`: A Bunch with `data`, `target`, `feature_names`, `target_names`, and `DESCR`, or a `(data, target)` tuple when `return_X_y=True`
## 💡 Code Examples
### 1. Basic Usage with All Models
```python
from mlbench_lite import benchmark, load_clover
# Load the clover dataset
X, y = load_clover(return_X_y=True)
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")
# Benchmark all available models
results = benchmark(X, y)
print("\nBenchmark Results:")
print(results)
# Get the best model by accuracy (does not rely on the row order)
best_model = results.loc[results['Accuracy'].idxmax()]
print(f"\n🏆 Best Model: {best_model['Model']} (Accuracy: {best_model['Accuracy']:.4f})")
```
### 2. Model Selection - Specific Models
```python
from mlbench_lite import benchmark, load_clover
X, y = load_clover(return_X_y=True)
# Benchmark only specific models
results = benchmark(X, y, models=['Random Forest', 'XGBoost', 'LightGBM', 'Logistic Regression'])
print("Selected Models Results:")
print(results)
```
### 3. Model Selection - By Categories
```python
from mlbench_lite import benchmark, load_clover
X, y = load_clover(return_X_y=True)
# Benchmark only tree-based models
results = benchmark(X, y, model_categories=['Tree-based Models'])
print("Tree-based Models Results:")
print(results)
# Benchmark multiple categories
results = benchmark(X, y, model_categories=['Linear Models', 'SVM Models'])
print("\nLinear and SVM Models Results:")
print(results)
```
### 4. Exclude Specific Models
```python
from mlbench_lite import benchmark, load_clover
X, y = load_clover(return_X_y=True)
# Exclude slow models
results = benchmark(X, y, exclude_models=['Gaussian Process', 'Multi-layer Perceptron'])
print("Results without slow models:")
print(results)
```
### 5. List Available Models
```python
from mlbench_lite import list_available_models, get_model_info
# List all available models by category
models = list_available_models()
print("Available Models by Category:")
for category, model_list in models.items():
    print(f"\n{category}:")
    for model in model_list:
        print(f"  - {model}")
# Get detailed model information
model_info = get_model_info()
print("\nDetailed Model Information:")
print(model_info)
```
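
Model names in `models=` and `exclude_models=` presumably have to match the listed names exactly, so a short form such as `SVM` would not match `SVM (RBF)`; how the library reacts to an unrecognized name is not documented here. A minimal sketch that checks a planned list before benchmarking: `unknown_model_names` is a hypothetical helper, and `AVAILABLE` is a hand-copied excerpt rather than live output (in practice you would pass the result of `list_available_models()`).

```python
# Hypothetical excerpt shaped like the dict returned by list_available_models()
AVAILABLE = {
    "Tree-based Models": ["Decision Tree", "Random Forest", "Extra Trees"],
    "SVM Models": ["SVM (RBF)", "SVM (Linear)"],
}


def unknown_model_names(requested, available):
    """Return the requested names that do not appear in any category (hypothetical helper)."""
    known = {name for names in available.values() for name in names}
    return [name for name in requested if name not in known]


print(unknown_model_names(["Random Forest", "SVM"], AVAILABLE))  # ['SVM']
```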
### 6. Advanced Model Selection
```python
from mlbench_lite import benchmark, load_clover
X, y = load_clover(return_X_y=True)
# Combined selection: restrict to two categories, then exclude one model from them
results = benchmark(
    X, y,
    model_categories=['Tree-based Models', 'SVM Models'],
    exclude_models=['SVM (Linear)']
)
print("Custom Selection Results:")
print(results)
```
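
The README does not state how `models`, `model_categories`, and `exclude_models` combine when several are given. The sketch below shows one plausible reading (intersect whichever filters are given, then remove exclusions) as a hypothetical `select_models` helper over a toy registry. It is not mlbench-lite's implementation, so confirm the real behavior with a quick run.

```python
# Hypothetical category -> model names mapping (a small excerpt, for illustration only)
REGISTRY = {
    "Linear Models": ["Logistic Regression", "Ridge Classifier"],
    "Tree-based Models": ["Random Forest", "Extra Trees"],
    "SVM Models": ["SVM (RBF)", "SVM (Linear)"],
}


def select_models(registry, models=None, model_categories=None, exclude_models=None):
    """Hypothetical selection rule: intersect the filters that are given, then drop exclusions."""
    excluded = set(exclude_models or [])
    selected = []
    for category, names in registry.items():
        if model_categories is not None and category not in model_categories:
            continue  # category not requested
        for name in names:
            if models is not None and name not in models:
                continue  # model not requested
            if name in excluded:
                continue  # explicitly excluded
            selected.append(name)
    return selected


print(select_models(REGISTRY, model_categories=["Tree-based Models", "SVM Models"],
                    exclude_models=["SVM (Linear)"]))
# ['Random Forest', 'Extra Trees', 'SVM (RBF)']
print(select_models(REGISTRY, models=["Random Forest", "SVM (RBF)"],
                    exclude_models=["SVM (Linear)"]))
# ['Random Forest', 'SVM (RBF)']
```

Under this reading, an exclusion only has an effect on models that would otherwise have been selected.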
### 7. Using with Scikit-learn Datasets
```python
from mlbench_lite import benchmark
from sklearn.datasets import load_wine, load_breast_cancer
# Test with Wine dataset
print("=== Wine Dataset ===")
X, y = load_wine(return_X_y=True)
results = benchmark(X, y)
print(results)
# Test with Breast Cancer dataset
print("\n=== Breast Cancer Dataset ===")
X, y = load_breast_cancer(return_X_y=True)
results = benchmark(X, y)
print(results)
```
### 8. Custom Test Size
```python
from mlbench_lite import benchmark, load_clover
X, y = load_clover(return_X_y=True)
# Use 30% of data for testing
results = benchmark(X, y, test_size=0.3)
print("Results with 30% test size:")
print(results)
# Use 10% of data for testing
results = benchmark(X, y, test_size=0.1)
print("\nResults with 10% test size:")
print(results)
```
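
Changing `test_size` trades training data for a larger, less noisy test set. Assuming mlbench-lite splits with scikit-learn's `train_test_split` (not confirmed here), a float `test_size` is converted to a sample count by rounding up, which this stdlib sketch reproduces; 178 is the number of samples in scikit-learn's Wine dataset.

```python
import math


def split_sizes(n_samples, test_size):
    """Train/test sizes using scikit-learn's rounding for a float test_size (test size rounded up)."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


for n_samples, test_size in [(400, 0.3), (400, 0.1), (178, 0.2)]:
    n_train, n_test = split_sizes(n_samples, test_size)
    # One misclassified test sample moves accuracy by 1 / n_test
    print(f"n={n_samples}, test_size={test_size}: train={n_train}, test={n_test}, "
          f"one error = {1 / n_test:.4f} accuracy")
```

With only 40 test samples at `test_size=0.1`, each misclassification shifts accuracy by 2.5 percentage points, so model rankings on small test sets can change from one split to the next.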
### 9. Reproducible Results
```python
from mlbench_lite import benchmark, load_clover
X, y = load_clover(return_X_y=True)
# Set random seed for reproducible results
results1 = benchmark(X, y, random_state=123)
results2 = benchmark(X, y, random_state=123)
print("Results with random_state=123:")
print(results1)
print(f"\nResults are identical: {results1.equals(results2)}")
# A different random state usually (but not necessarily) changes the results
results3 = benchmark(X, y, random_state=456)
print(f"\nDifferent random state changed the results: {not results1.equals(results3)}")
```
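
Reproducibility comes from seeding every source of randomness: the train/test split and any model that uses randomness internally. This stdlib sketch illustrates the principle for the split only: a `random.Random` seeded with the same value yields the same permutation, and therefore the same test set. It does not reproduce scikit-learn's shuffling (which is based on NumPy), so the actual indices will differ.

```python
import math
import random


def seeded_split(n_samples, test_size, seed):
    """Shuffle sample indices with a seeded RNG and cut off a test set (illustrative only)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    n_test = math.ceil(test_size * n_samples)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_a, test_a = seeded_split(400, 0.2, seed=123)
train_b, test_b = seeded_split(400, 0.2, seed=123)
print(test_a == test_b)  # True: identical seeds give identical splits
```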
### 10. Working with Synthetic Data
```python
from mlbench_lite import benchmark
from sklearn.datasets import make_classification
# Create synthetic dataset
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_classes=4,
    random_state=42
)
print(f"Synthetic dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")
results = benchmark(X, y)
print("\nBenchmark Results:")
print(results)
```
### 11. Analyzing Results
```python
from mlbench_lite import benchmark, load_clover
X, y = load_clover(return_X_y=True)
results = benchmark(X, y)
# Display results with better formatting
print("Detailed Results:")
print("=" * 60)
for _, row in results.iterrows():
    print(f"{row['Model']:31} | Acc: {row['Accuracy']:.4f} | "
          f"Prec: {row['Precision']:.4f} | Rec: {row['Recall']:.4f} | "
          f"F1: {row['F1']:.4f}")
# Find models with accuracy > 0.9
high_accuracy = results[results['Accuracy'] > 0.9]
print(f"\nModels with accuracy > 0.9: {len(high_accuracy)}")
# Calculate average metrics
avg_metrics = results[['Accuracy', 'Precision', 'Recall', 'F1']].mean()
print("\nAverage metrics across all models:")
for metric, value in avg_metrics.items():
    print(f"  {metric}: {value:.4f}")
```
### 12. Comparing Different Datasets
```python
from mlbench_lite import benchmark, load_clover
from sklearn.datasets import load_wine, load_breast_cancer
datasets = [
("Clover", load_clover(return_X_y=True)),
("Wine", load_wine(return_X_y=True)),
("Breast Cancer", load_breast_cancer(return_X_y=True))
]
print("Dataset Comparison:")
print("=" * 80)
for name, (X, y) in datasets:
    print(f"\n{name} Dataset:")
    print(f"  Shape: {X.shape}, Classes: {len(set(y))}")
    results = benchmark(X, y)
    # Rank by accuracy explicitly rather than relying on row order
    top2 = results.nlargest(2, 'Accuracy')
    best = top2.iloc[0]
    print(f"  Best Model: {best['Model']} (Accuracy: {best['Accuracy']:.4f})")
    # Show top 2 models
    print("  Top 2 Models:")
    for _, row in top2.iterrows():
        print(f"    {row['Model']}: {row['Accuracy']:.4f}")
```
## 🔬 Models Included
The library includes **20+ machine learning models** from multiple categories:
### **Linear Models**
- **Logistic Regression**: Linear model for classification using logistic function
- **Ridge Classifier**: Linear classifier with L2 regularization
- **SGD Classifier**: Linear classifier using Stochastic Gradient Descent
- **Perceptron**: Simple linear classifier
- **Passive Aggressive**: Online learning algorithm for classification
### **Tree-based Models**
- **Decision Tree**: Non-parametric supervised learning method
- **Random Forest**: Ensemble of decision trees with bagging
- **Extra Trees**: Extremely randomized trees ensemble
- **Gradient Boosting**: Boosting ensemble method using gradient descent
- **AdaBoost**: Adaptive boosting ensemble method
- **Bagging Classifier**: Bootstrap aggregating ensemble method
### **SVM Models**
- **SVM (RBF)**: Support Vector Machine with RBF kernel
- **SVM (Linear)**: Support Vector Machine with linear kernel
### **Neighbors**
- **K-Nearest Neighbors**: Instance-based learning algorithm
### **Naive Bayes**
- **Gaussian Naive Bayes**: Naive Bayes classifier for Gaussian features
- **Multinomial Naive Bayes**: Naive Bayes classifier for multinomial features
- **Bernoulli Naive Bayes**: Naive Bayes classifier for binary features
### **Discriminant Analysis**
- **Linear Discriminant Analysis**: Linear dimensionality reduction and classification
- **Quadratic Discriminant Analysis**: Quadratic classifier with Gaussian assumptions
### **Neural Networks**
- **Multi-layer Perceptron**: Feedforward artificial neural network
### **Gaussian Process**
- **Gaussian Process**: Probabilistic classifier using Gaussian processes
### **Advanced Gradient Boosting**
- **XGBoost**: Extreme gradient boosting framework (if installed)
- **LightGBM**: Light gradient boosting machine (if installed)
- **CatBoost**: Categorical boosting framework (if installed)

All models use their default parameters, with random seeds set where applicable for reproducibility.
## 📊 Clover Dataset Details
The `load_clover` function provides a custom synthetic dataset:
- **Samples**: 400
- **Features**: 4
- **Classes**: 4

**Features:**
- `leaf_length`: Length of the leaf in cm
- `leaf_width`: Width of the leaf in cm
- `petiole_length`: Length of the petiole in cm
- `leaflet_count`: Number of leaflets per leaf

**Classes:**
- `white_clover`: *Trifolium repens*
- `red_clover`: *Trifolium pratense*
- `crimson_clover`: *Trifolium incarnatum*
- `alsike_clover`: *Trifolium hybridum*
## 🛠️ Requirements
### **Core Dependencies**
- Python >= 3.8
- scikit-learn >= 1.0.0
- pandas >= 1.3.0
- numpy >= 1.20.0
### **Optional Dependencies (for additional models)**
- xgboost >= 1.5.0 (for XGBoost models)
- lightgbm >= 3.2.0 (for LightGBM models)
- catboost >= 1.0.0 (for CatBoost models)
- scikit-optimize >= 0.9.0 (for advanced optimization)

**Note**: The library works with just the core dependencies. The optional libraries may already be installed alongside the package; models from any library that is not available are skipped gracefully.
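
To see in advance which optional boosting libraries are importable (and therefore which of their models can appear in the results), a small stdlib check with `importlib.util.find_spec` is enough. This is a sketch for inspecting your own environment, not the mechanism mlbench-lite uses internally, which is not documented here.

```python
import importlib.util


def optional_backends(names=("xgboost", "lightgbm", "catboost")):
    """Map each module name to whether it can be imported in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, available in optional_backends().items():
    print(f"{name}: {'installed' if available else 'missing (its models will be skipped)'}")
```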
## 🧪 Testing
From a clone of the repository, run the test suite to verify everything works:
```bash
# Run all tests
python -m pytest tests/ -v
# Run with coverage (requires the pytest-cov plugin)
python -m pytest tests/ --cov=mlbench_lite
# Quick functionality test
python -c "from mlbench_lite import benchmark, load_clover; X, y = load_clover(return_X_y=True); results = benchmark(X, y); print(results)"
```
## 🚀 Development
### Setup Development Environment
```bash
git clone https://github.com/Arefin994/mlbench-lite.git
cd mlbench-lite
pip install -e ".[dev]"
```
### Code Quality
```bash
# Format code
black mlbench_lite tests
# Lint code
flake8 mlbench_lite tests
# Type checking
mypy mlbench_lite
```
### Building for Distribution
```bash
# Build package
python -m build
# Upload to PyPI
twine upload dist/*
```
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📈 Changelog
### 2.0.0 (2024-01-XX)
- **MAJOR UPDATE**: Added 20+ machine learning models
- **NEW**: Flexible model selection (specific models, categories, exclusions)
- **NEW**: Support for XGBoost, LightGBM, and CatBoost
- **NEW**: Model information and listing functions
- **NEW**: Comprehensive model categories (Linear, Tree-based, SVM, etc.)
- **IMPROVED**: Enhanced API with more parameters
- **IMPROVED**: Better error handling and graceful degradation
- **IMPROVED**: Updated documentation with extensive examples
### 0.1.0 (2024-01-XX)
- Initial release
- Basic benchmarking functionality
- Support for Logistic Regression, Random Forest, and SVM
- Comprehensive metrics (Accuracy, Precision, Recall, F1)
- Custom clover dataset
- Full test coverage
- PyPI ready
## 🆘 Support
If you encounter any issues or have questions:
1. Check the [Issues](https://github.com/Arefin994/mlbench-lite/issues) page
2. Create a new issue with detailed information
3. Include code examples and error messages
## 🙏 Acknowledgments
- Built with [scikit-learn](https://scikit-learn.org/)
- Uses [pandas](https://pandas.pydata.org/) for data handling
- Inspired by the need for simple ML benchmarking tools