# Machine Learning Classification Python Package
![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)
![Scikit-Learn](https://img.shields.io/badge/scikit_learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)
## Overview
This Python package provides a comprehensive solution for performing classification tasks using various popular machine learning algorithms. It allows you to read a dataset, preprocess it, train multiple classifiers, perform hyperparameter tuning, and visualize model performance. Additionally, it provides options for scaling data, saving trained models, and customizing the output display.
## Features
1. **Classification Algorithms**:
- Logistic Regression
- K-Nearest Neighbors
- Decision Tree
- Random Forest
- Gradient Boosting
- Support Vector Classifier
- Gaussian Naive Bayes
- Bernoulli Naive Bayes
2. **Advanced Functionality**:
- **Data Scaling**: Options for Min-Max Scaling or Standard Normalization.
- **Hyperparameter Tuning**: Option to perform Grid Search for finding the best model parameters.
- **Model Persistence**: Save and load trained models using `joblib` (a loading sketch follows this list).
- **Visualization**: Option to plot confusion matrices and print detailed classification reports.
- **Cross-Validation**: Evaluate models using cross-validation scores.
- **Configurable Outputs**: Options to control the display of confusion matrices and classification reports.
3. **Results**:
- Returns a DataFrame with model names, accuracy, F1-score, and optionally the best hyperparameters.
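For the model-persistence option above, here is a minimal sketch of reloading a saved model with `joblib`. The file name and the feature values are hypothetical placeholders; check which files the package actually writes when `save_models=True`.

```python
import joblib

# Hypothetical file name -- inspect the files the package writes when
# save_models=True (typically one file per trained classifier).
model = joblib.load("RandomForestClassifier.pkl")

# Reuse the reloaded model on new samples (feature values are made up here
# to match the 8 input columns of the diabetes example in the Usage section).
new_samples = [[2, 120, 70, 30, 80, 32.0, 0.45, 29]]
print(model.predict(new_samples))
```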
## Parameters
The package takes the following parameters as input:
- `dataset`: Path to a CSV or Excel dataset file, or a pandas DataFrame (a DataFrame example follows this list).
- `output_column`: Name of the output column containing the target variable.
- `train_test_ratio`: Proportion used to split the dataset into train and test sets (must be between 0 and 1).
- `scaling_method` (optional): Method to scale the data ('minmax' or 'normalize').
- `perform_grid_search` (optional): Whether to perform grid search for hyperparameter tuning (default is `False`).
- `save_models` (optional): Whether to save trained models to disk (default is `False`).
- `show_confusion_matrix` (optional): Whether to display confusion matrix plots (default is `False`).
- `show_classification_report` (optional): Whether to print classification reports (default is `False`).
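Because `dataset` also accepts a pandas DataFrame, you can preprocess the data yourself before handing it over. A minimal sketch, under the assumption that the optional arguments can be passed by keyword using the names listed above:

```python
import pandas as pd
from classifier_agent import classifier_agent

# Load and clean the data yourself, then pass the DataFrame directly
# instead of a file path.
df = pd.read_csv("diabetes.csv")
df = df.dropna()

# Keyword name follows the parameter list above (an assumption, not
# verified against the function signature).
results = classifier_agent(df, "Outcome", 0.25, scaling_method="minmax")
print(results)
```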
## Installation
Make sure you have Python `3.10` or later installed on your system. You can install the package using pip:
```sh
pip install classifier_agent
```
## Usage
Here's an example of how to use the package:
```python
from classifier_agent import classifier_agent

dataset_path = "diabetes.csv"
output_column = "Outcome"
train_test_ratio = 0.25
scaling_method = "minmax"            # 'minmax' or 'normalize'
perform_grid_search = True           # Perform grid search for hyperparameter tuning
save_models = True                   # Save trained models to disk
show_confusion_matrix = True         # Plot confusion matrices
show_classification_report = True    # Print classification reports

results = classifier_agent(
    dataset_path,
    output_column,
    train_test_ratio,
    scaling_method,
    perform_grid_search,
    save_models,
    show_confusion_matrix,
    show_classification_report,
)
print(results)
```
## Example Output
The output is a DataFrame that looks like this:
| Classifier | Accuracy | F1-Score | Best Parameters |
|-------------------------|----------|----------|-----------------|
| KNeighborsClassifier | 0.78 | 0.76 | {'n_neighbors': 5, 'weights': 'uniform'} |
| LogisticRegression | 0.80 | 0.79 | {'C': 0.1, 'solver': 'liblinear'} |
| DecisionTreeClassifier | 0.72 | 0.70 | {'criterion': 'entropy', 'max_depth': 20} |
| RandomForestClassifier | 0.85 | 0.84 | {'n_estimators': 200, 'max_depth': 20} |
| GradientBoostingClassifier | 0.83 | 0.82 | {'n_estimators': 200, 'learning_rate': 0.1} |
| SVC | 0.81 | 0.80 | {'C': 1, 'kernel': 'rbf'} |
| GaussianNB | 0.75 | 0.73 | {} |
| BernoulliNB | 0.73 | 0.72 | {} |
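Since the return value is a plain pandas DataFrame, the usual pandas API applies. A small follow-up sketch, continuing from the Usage example and assuming the column names shown in the table above:

```python
# Rank the models by accuracy (column names as in the example table)
# and keep the best performers at the top.
best_first = results.sort_values(by="Accuracy", ascending=False)
print(best_first.head(3))

# Persist the comparison for later reference.
best_first.to_csv("model_comparison.csv", index=False)
```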
## Notes
- The package is actively developed and may receive updates.
- The project is developed with Python `3.10` and requires Python `3.10` or later.
- If you encounter any issues or have questions, feel free to contact me on [LinkedIn](https://www.linkedin.com/in/adnan-karol-aa1666179/).
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Publishing to PyPI
To publish this package to PyPI, follow these steps:
1. **Ensure Your Package is Ready:**
- Make sure your `setup.py` and `README.md` are correctly configured.
- Verify that your package is properly structured and tested.
2. **Create Distribution Archives:**
Run the following command to create distribution archives of your package (note that `setup.py`-based builds are deprecated in current setuptools; `python -m build`, from the PyPA `build` package, is the recommended replacement):
```sh
python setup.py sdist bdist_wheel
```
3. **Install Twine:**
If you haven't already, install Twine, a utility for publishing packages to PyPI:
```sh
pip install twine
```
4. **Upload to PyPI:**
Use Twine to upload your package to PyPI:
```sh
twine upload dist/*
```
You will be prompted for credentials. PyPI uploads require an API token, so enter `__token__` as the username and the token value (including the `pypi-` prefix) as the password.
5. **Verify Upload:**
After uploading, check your package on [PyPI](https://pypi.org/) to ensure it appears correctly.
For more detailed instructions, refer to the [PyPI documentation](https://packaging.python.org/tutorials/packaging-projects/).