# classifierAgent: Machine Learning Classification Python Package
![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)
![Scikit-Learn](https://img.shields.io/badge/scikit_learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)
# Overview
This Python package trains and compares several widely used machine learning classifiers on a single dataset. It reads the dataset, preprocesses it, trains multiple classifiers, optionally tunes their hyperparameters, and visualizes model performance. It can also scale the data, save trained models, and control which outputs are displayed.
# Features
1. **Classification Algorithms**:
- Logistic Regression
- K-Nearest Neighbors
- Decision Tree
- Random Forest
- Gradient Boosting
- Support Vector Classifier
- Gaussian Naive Bayes
- Bernoulli Naive Bayes
2. **Advanced Functionality**:
- **Data Scaling**: Options for Min-Max Scaling or Standard Normalization.
- **Hyperparameter Tuning**: Option to perform Grid Search for finding the best model parameters.
- **Model Persistence**: Save and load trained models using `joblib`.
- **Visualization**: Option to plot confusion matrices and print detailed classification reports.
- **Cross-Validation**: Evaluate models using cross-validation scores.
- **Configurable Outputs**: Options to control the display of confusion matrices and classification reports.
3. **Results**:
- Returns a DataFrame with model names, accuracy, F1-score, and optionally the best hyperparameters.
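The workflow described in this feature list can be sketched directly with scikit-learn. The snippet below is a minimal illustration under stated assumptions, not the package's actual implementation: the synthetic dataset, the choice of three models, and the variable names are all hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic binary-classification data stands in for a real dataset (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# A small subset of the listed algorithms, chosen for illustration.
models = [LogisticRegression(max_iter=1000), KNeighborsClassifier(), GaussianNB()]

rows = []
for model in models:
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rows.append({
        "Classifier": type(model).__name__,
        "Accuracy": accuracy_score(y_test, predictions),
        "F1-Score": f1_score(y_test, predictions),  # binary F1, labels are 0/1
    })

# One row per model, mirroring the shape of the results DataFrame.
comparison = pd.DataFrame(rows)
print(comparison)
```

Fitting the scaler on the training split alone keeps test-set statistics from leaking into training; whether the package does the same is not stated in this README.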
## Parameters
The package takes the following parameters as input:
- `dataset`: Path to the CSV or Excel dataset file or a pandas DataFrame.
- `output_column`: Name of the output column containing the target variable.
- `train_test_ratio`: Ratio in which the dataset is divided into train and test splits (must be between 0 and 1).
- `scaling_method` (optional): Method to scale the data ('minmax' or 'normalize').
- `perform_grid_search` (optional): Whether to perform grid search for hyperparameter tuning (default is `False`).
- `save_models` (optional): Whether to save trained models to disk (default is `False`).
- `show_confusion_matrix` (optional): Whether to display confusion matrix plots (default is `False`).
- `show_classification_report` (optional): Whether to print classification reports (default is `False`).
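To make the two `scaling_method` options concrete, here is a small standard-library sketch of what each one computes on a single numeric column. It assumes that `'normalize'` means standardization to zero mean and unit variance (the "Standard Normalization" named in the features); the helper `scale_column` is hypothetical and not part of the package.

```python
from statistics import fmean, pstdev


def scale_column(values, method):
    """Scale one numeric column; `method` mirrors the documented option names."""
    if method == "minmax":
        low, high = min(values), max(values)
        span = high - low
        # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
        return [(v - low) / span if span else 0.0 for v in values]
    if method == "normalize":
        # Assumption: 'normalize' is standardization with the population std.
        mean, std = fmean(values), pstdev(values)
        return [(v - mean) / std if std else 0.0 for v in values]
    raise ValueError(f"unknown scaling method: {method!r}")


ages = [20, 30, 40]
print(scale_column(ages, "minmax"))     # [0.0, 0.5, 1.0]
print(scale_column(ages, "normalize"))  # values centered on 0
```

Min-max scaling bounds every feature to the range 0 to 1, while standardization centers each feature on 0, which tends to matter for distance-based models such as K-Nearest Neighbors and SVC.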
# Installation
Make sure you have Python installed on your system. You can install the package using pip:
```sh
pip install classifierAgent
```
# Usage
Here's an example of how to use the package:
```python
from classifierAgent import classifierAgent
dataset_path = "sampleFile.csv"
output_column = "Outcome"
train_test_ratio = 0.25
scaling_method = 'minmax' # Choose 'minmax' or 'normalize'
perform_grid_search = True # Whether to perform grid search
save_models = True # Whether to save models
show_confusion_matrix = True # Whether to plot the confusion matrix
show_classification_report = True # Whether to print the classification report
results = classifierAgent(
    dataset_path,
    output_column,
    train_test_ratio,
    scaling_method,
    perform_grid_search,
    save_models,
    show_confusion_matrix,
    show_classification_report,
)
print(results)
```
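With `save_models` enabled, trained models are persisted with `joblib`. The README does not say where the files are written or how they are named, so the following is only a sketch of a save-and-reload round trip; the file name and the temporary directory are assumptions.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Hypothetical file name; the package's own naming scheme is not documented here.
model_path = os.path.join(tempfile.mkdtemp(), "DecisionTreeClassifier.joblib")
joblib.dump(model, model_path)

# Reload the model and confirm it predicts exactly like the original.
restored = joblib.load(model_path)
same_predictions = (restored.predict(X) == model.predict(X)).all()
print(same_predictions)
```

A reloaded model is only reliable when the scikit-learn version matches the one used to save it, so it is worth recording that version alongside the file.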
# Example Output
The output is a DataFrame that looks like this:
| Classifier | Accuracy | F1-Score | Best Parameters |
|-------------------------|----------|----------|-----------------|
| KNeighborsClassifier | 0.78 | 0.76 | {'n_neighbors': 5, 'weights': 'uniform'} |
| LogisticRegression | 0.80 | 0.79 | {'C': 0.1, 'solver': 'liblinear'} |
| DecisionTreeClassifier | 0.72 | 0.70 | {'criterion': 'entropy', 'max_depth': 20} |
| RandomForestClassifier | 0.85 | 0.84 | {'n_estimators': 200, 'max_depth': 20} |
| GradientBoostingClassifier | 0.83 | 0.82 | {'n_estimators': 200, 'learning_rate': 0.1} |
| SVC | 0.81 | 0.80 | {'C': 1, 'kernel': 'rbf'} |
| GaussianNB | 0.75 | 0.73 | {} |
| BernoulliNB | 0.73 | 0.72 | {} |
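The `Best Parameters` column holds the combination selected by grid search when `perform_grid_search` is enabled. As a rough sketch of how such a value can be produced with scikit-learn's `GridSearchCV`, assuming a hypothetical search grid (the grids the package actually uses are not listed in this README):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical grid; the package's real search space is not documented here.
param_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_             # dict, as in the table's last column
test_accuracy = search.score(X_test, y_test)  # accuracy of the refitted best model
print(best_params, round(test_accuracy, 2))
```

The grid is searched on the training split only, so the held-out accuracy stays an honest estimate for the selected parameters.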
# Publishing to PyPI
To publish this package to PyPI, follow these steps:
1. **Ensure Your Package is Ready:**
- Make sure your `setup.py` and `README.md` are correctly configured.
- Verify that your package is properly structured and tested.
2. **Create Distribution Archives:**
Run the following command to create distribution archives of your package:
```sh
python setup.py sdist bdist_wheel
```
3. **Install Twine:**
If you haven't already, install Twine, a utility for publishing packages to PyPI:
```sh
pip install twine
```
4. **Upload to PyPI:**
Use Twine to upload your package to PyPI:
```sh
twine upload dist/*
```
You will be prompted for your credentials. PyPI requires an API token for uploads: enter `__token__` as the username and the token itself as the password.
5. **Verify Upload:**
After uploading, check your package on [PyPI](https://pypi.org/) to ensure it appears correctly.
For more detailed instructions, refer to the [Python Packaging User Guide](https://packaging.python.org/tutorials/packaging-projects/).
# Automated Publishing with GitHub Actions
To automate publishing to PyPI, a GitHub Actions workflow is set up. It pushes updates to PyPI whenever changes are made to the `main` branch.
### Setting Up GitHub Actions
1. **Add Your PyPI Token to GitHub Secrets:**
- Go to your repository settings.
- Navigate to "Secrets and variables" > "Actions".
- Add a new secret with the name `PYPI_TOKEN` and paste your PyPI token as the value.
2. **Create a GitHub Actions Workflow File:**
- Add a `.github/workflows/publish.yml` file to your repository with the following content:
This setup will automatically build and publish your package to PyPI whenever you push changes to the `main` branch. Make sure to test your workflow to ensure that everything works as expected.
# Notes
- The package is actively developed and may receive updates.
- The project is developed with Python version `3.10`.
- If you encounter any issues or have questions, feel free to contact me on [LinkedIn](https://www.linkedin.com/in/adnan-karol-aa1666179/).
# License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/adnankarol/classifierAgent",
"name": "classifierAgent",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "machine learning, classification, random forest, xgboost, svm, logistic regression, naive bayes, knn, decision tree",
"author": "Adnan Karol",
"author_email": "adnanmushtaq5@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/8b/97/ccdc2d0404106288267d2df1153b91206dfdc459f4c66d5c831c50e6aef3/classifierAgent-1.1.1.tar.gz",
"platform": null,
"description": "# classifierAgent: Machine Learning Classification Python Package\n\n![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)\n![Scikit-Learn](https://img.shields.io/badge/scikit_learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)\n\n# Overview\n\nThis Python package provides a comprehensive solution for performing classification tasks using various popular machine learning algorithms. It allows you to read a dataset, preprocess it, train multiple classifiers, perform hyperparameter tuning, and visualize model performance. Additionally, it provides options for scaling data, saving trained models, and customizing the output display.\n\n# Features\n\n1. **Classification Algorithms**:\n - Logistic Regression\n - K-Nearest Neighbors\n - Decision Tree\n - Random Forest\n - Gradient Boosting\n - Support Vector Classifier\n - Gaussian Naive Bayes\n - Bernoulli Naive Bayes\n\n2. **Advanced Functionality**:\n - **Data Scaling**: Options for Min-Max Scaling or Standard Normalization.\n - **Hyperparameter Tuning**: Option to perform Grid Search for finding the best model parameters.\n - **Model Persistence**: Save and load trained models using `joblib`.\n - **Visualization**: Option to plot confusion matrices and detailed classification reports.\n - **Cross-Validation**: Evaluate models using cross-validation scores.\n - **Configurable Outputs**: Options to control the display of confusion matrices and classification reports.\n\n3. 
**Results**:\n - Returns a DataFrame with model names, accuracy, F1-score, and optionally the best hyperparameters.\n\n## Parameters\n\nThe package takes the following parameters as input:\n- `dataset`: Path to the CSV or Excel dataset file or a pandas DataFrame.\n- `output_column`: Name of the output column containing the target variable.\n- `train_test_ratio`: Ratio in which the dataset is divided into train and test splits (must be between 0 and 1).\n- `scaling_method` (optional): Method to scale the data ('minmax' or 'normalize').\n- `perform_grid_search` (optional): Whether to perform grid search for hyperparameter tuning (default is `False`).\n- `save_models` (optional): Whether to save trained models to disk (default is `False`).\n- `show_confusion_matrix` (optional): Whether to display confusion matrix plots (default is `False`).\n- `show_classification_report` (optional): Whether to print classification reports (default is `False`).\n\n# Installation\n\nMake sure you have Python installed on your system. 
You can install the package using pip:\n\n```sh\npip install classifierAgent\n```\n\n# Usage\n\nHere's an example of how to use the package:\n\n```python\nfrom classifierAgent import classifierAgent\n\ndataset_path = \"sampleFile.csv\"\noutput_column = \"Outcome\"\ntrain_test_ratio = 0.25\nscaling_method = 'minmax' # Choose 'minmax' or 'normalize'\nperform_grid_search = True # Whether to perform grid search\nsave_models = True # Whether to save models\nshow_confusion_matrix = True # Whether to plot the confusion matrix\nshow_classification_report = True # Whether to print the classification report\n\nresults = classifierAgent(dataset_path, output_column, train_test_ratio, scaling_method, perform_grid_search, save_models, show_confusion_matrix, show_classification_report)\nprint(results)\n```\n\n# Example Output\n\nThe output is a DataFrame that looks like this:\n\n| Classifier | Accuracy | F1-Score | Best Parameters |\n|-------------------------|----------|----------|-----------------|\n| KNeighborsClassifier | 0.78 | 0.76 | {'n_neighbors': 5, 'weights': 'uniform'} |\n| LogisticRegression | 0.80 | 0.79 | {'C': 0.1, 'solver': 'liblinear'} |\n| DecisionTreeClassifier | 0.72 | 0.70 | {'criterion': 'entropy', 'max_depth': 20} |\n| RandomForestClassifier | 0.85 | 0.84 | {'n_estimators': 200, 'max_depth': 20} |\n| GradientBoostingClassifier | 0.83 | 0.82 | {'n_estimators': 200, 'learning_rate': 0.1} |\n| SVC | 0.81 | 0.80 | {'C': 1, 'kernel': 'rbf'} |\n| GaussianNB | 0.75 | 0.73 | {} |\n| BernoulliNB | 0.73 | 0.72 | {} |\n\n# Publishing to PyPI\n\nTo publish this package to PyPI, follow these steps:\n\n1. **Ensure Your Package is Ready:**\n - Make sure your `setup.py` and `README.md` are correctly configured.\n - Verify that your package is properly structured and tested.\n\n2. **Create Distribution Archives:**\n Run the following command to create distribution archives of your package:\n ```sh\n python setup.py sdist bdist_wheel\n ```\n\n3. 
**Install Twine:**\n If you haven't already, install Twine, a utility for publishing packages to PyPI:\n ```sh\n pip install twine\n ```\n\n4. **Upload to PyPI:**\n Use Twine to upload your package to PyPI:\n ```sh\n twine upload dist/*\n ```\n You will be prompted to enter your PyPI username and password.\n\n5. **Verify Upload:**\n After uploading, check your package on [PyPI](https://pypi.org/) to ensure it appears correctly.\n\nFor more detailed instructions, refer to the [PyPI documentation](https://packaging.python.org/tutorials/packaging-projects/).\n\n# Automated Publishing with GitHub Actions\n\nTo automate the publishing of your package to PyPI, GitHub Actions is setup. This allows to push updates to PyPI whenever changes are made to the `main` branch.\n\n### Setting Up GitHub Actions\n\n1. **Add Your PyPI Token to GitHub Secrets:**\n - Go to your repository settings.\n - Navigate to \"Secrets and variables\" > \"Actions\".\n - Add a new secret with the name `PYPI_TOKEN` and paste your PyPI token as the value.\n\n2. **Create a GitHub Actions Workflow File:**\n - Add a `.github/workflows/publish.yml` file to your repository with the following content:\n\nThis setup will automatically build and publish your package to PyPI whenever you push changes to the `main` branch. Make sure to test your workflow to ensure that everything works as expected.\n\n# Notes\n\n- The package is actively developed and may receive updates.\n- The project is developed with Python version `3.10`.\n- If you encounter any issues or have questions, feel free to contact me on [LinkedIn](https://www.linkedin.com/in/adnan-karol-aa1666179/).\n\n# License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python package for performing classification on datasets.",
"version": "1.1.1",
"project_urls": {
"Homepage": "https://github.com/adnankarol/classifierAgent"
},
"split_keywords": [
"machine learning",
" classification",
" random forest",
" xgboost",
" svm",
" logistic regression",
" naive bayes",
" knn",
" decision tree"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ab9a7e61ba8e2f889cd7f78d972f4d824afbcd8c980ee3d6071aad3a87295863",
"md5": "1f32f70fdb8f6b6d8d6c28f31ae23ae6",
"sha256": "ed346a09674eed4548f7553d00d4bda2b05350fc5ededea3eed79311d5b1b134"
},
"downloads": -1,
"filename": "classifierAgent-1.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1f32f70fdb8f6b6d8d6c28f31ae23ae6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 7502,
"upload_time": "2024-08-01T18:33:10",
"upload_time_iso_8601": "2024-08-01T18:33:10.846285Z",
"url": "https://files.pythonhosted.org/packages/ab/9a/7e61ba8e2f889cd7f78d972f4d824afbcd8c980ee3d6071aad3a87295863/classifierAgent-1.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8b97ccdc2d0404106288267d2df1153b91206dfdc459f4c66d5c831c50e6aef3",
"md5": "962043706c722b3d9b46ff7d04d655a5",
"sha256": "6ac8a2fa9211b8fb3b1fc3e492616199483da0c96b8ed72cdd5f7f726a904940"
},
"downloads": -1,
"filename": "classifierAgent-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "962043706c722b3d9b46ff7d04d655a5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 7288,
"upload_time": "2024-08-01T18:33:11",
"upload_time_iso_8601": "2024-08-01T18:33:11.734612Z",
"url": "https://files.pythonhosted.org/packages/8b/97/ccdc2d0404106288267d2df1153b91206dfdc459f4c66d5c831c50e6aef3/classifierAgent-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-01 18:33:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "adnankarol",
"github_project": "classifierAgent",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "pandas",
"specs": [
[
">=",
"1.1.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.18.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"0.24.0"
]
]
},
{
"name": "joblib",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.3.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.11.0"
]
]
},
{
"name": "termcolor",
"specs": [
[
">=",
"1.1.0"
]
]
}
],
"lcname": "classifieragent"
}