# MLWizard
MLWizard is a Python machine learning library designed to simplify data preparation, feature engineering, model building, and evaluation. It provides tools for both classification and regression tasks, as well as for data exploration and manipulation. MLWizard is distributed by the [TechLeo](https://www.linkedin.com/in/techleo/) community, which aims to make the complex easy.
## Author
**TechLeo**
- **Email:** techleo.ng@outlook.com
- **GitHub:** [TechLeo GitHub](https://github.com/TechLeo)
- **LinkedIn:** [TechLeo LinkedIn](https://www.linkedin.com/in/techleo/)
## Contact
For inquiries, suggestions, or feedback, please feel free to reach out to the author:
- **Email:** techleo.ng@outlook.com
- **GitHub Issues:** [MLWizard Issues](https://github.com/TechLeo-Libraries/mlwizard/issues)
- **LinkedIn Messages:** [TechLeo LinkedIn](https://www.linkedin.com/in/techleo/)
Your feedback is valuable and contributes to the continuous improvement of MLWizard. The author welcomes collaboration and looks forward to hearing from the users of MLWizard.
## Features
Features in the current release:
### Data Loading and Handling
- `get_dataset`: Load a dataset.
- `get_training_test_data`: Split the dataset into training and test sets.
- `load_large_dataset`: Load a large dataset efficiently.
- `reduce_data_memory_useage`: Reduce memory usage of the dataset.
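
How `reduce_data_memory_useage` works internally is not documented here. As a rough illustration of the general idea only, the sketch below downcasts numeric columns with plain pandas. The helper name `downcast_numeric` is hypothetical and is not part of MLWizard; note that downcasting floats trades precision for memory.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy whose numeric columns use smaller dtypes."""
    out = df.copy()
    for col in out.columns:
        if is_integer_dtype(out[col]):
            # Pick the smallest integer dtype that can hold every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif is_float_dtype(out[col]):
            # Downcast to float32 where pandas considers it safe.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64"),
    "price": np.linspace(0.0, 1.0, 1000),
})
small = downcast_numeric(df)

before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"{before} bytes -> {after} bytes")
```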
### Data Cleaning and Manipulation
- `drop_columns`: Drop specified columns from the dataset.
- `fix_missing_values`: Handle missing values in the dataset.
- `fix_unbalanced_dataset`: Address class imbalance in a classification dataset.
- `filter_data`: Filter data based on specified conditions.
- `remove_duplicates`: Remove duplicate rows from the dataset.
- `rename_columns`: Rename columns in the dataset.
- `replace_values`: Replace specified values in the dataset.
- `reset_index`: Reset the index of the dataset.
- `set_index`: Set a specific column as the index.
- `sort_index`: Sort the index of the dataset.
- `sort_values`: Sort the values of the dataset.
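
The cleaning methods above are listed without their parameters, so the following is only a sketch of the equivalent steps in plain pandas, not MLWizard's own implementation. The median and mode imputation strategies are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 25.0, 35.0],
    "city": ["Lagos", "Abuja", None, "Lagos", "Lagos"],
})

# Remove exact duplicate rows first, then rebuild a clean 0..n-1 index.
cleaned = df.drop_duplicates().reset_index(drop=True)

# Fill numeric gaps with the column median and categorical gaps with the mode.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
cleaned["city"] = cleaned["city"].fillna(cleaned["city"].mode().iloc[0])

# Sort by a column, largest values first.
cleaned = cleaned.sort_values("age", ascending=False)
print(cleaned)
```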
### Data Formatting and Transformation
- `categorical_to_datetime`: Convert categorical columns to datetime format.
- `categorical_to_numerical`: Convert categorical columns to numerical format.
- `numerical_to_categorical`: Convert numerical columns to categorical format.
- `column_binning`: Bin values in a column into specified bins.
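
To make these conversions concrete, here is a minimal sketch using pandas directly. It does not call MLWizard, whose method signatures are not shown in this README; the column names, bin edges, and labels are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [5, 17, 30, 64, 80],
    "signup": ["2024-01-08", "2023-12-25", "2024-02-29", "2023-07-01", "2024-01-01"],
    "plan": ["free", "pro", "free", "team", "pro"],
})

# Text column -> datetime column.
df["signup"] = pd.to_datetime(df["signup"], format="%Y-%m-%d")

# Numerical -> categorical: bin ages into labelled, right-closed intervals
# (0, 17], (17, 64], (64, 120].
df["age_group"] = pd.cut(
    df["age"], bins=[0, 17, 64, 120], labels=["minor", "adult", "senior"]
)

# Categorical -> numerical: one-hot encode the plan column.
encoded = pd.get_dummies(df, columns=["plan"], dtype=int)
print(encoded)
```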
### Exploratory Data Analysis
- `eda`: Perform exploratory data analysis on the dataset.
- `eda_visual`: Visualize exploratory data analysis results.
- `pandas_profiling`: Generate a Pandas Profiling report for the dataset.
- `sweetviz_profile_report`: Generate a Sweetviz Profile Report for the dataset.
- `count_column_categories`: Count the categories in a categorical column.
- `unique_elements_in_columns`: Get the unique elements that exist in each column in the dataset.
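
For orientation, the kind of summaries these methods describe can be produced in plain pandas as sketched below. This is an assumption about the general approach, not a description of what `eda`, `count_column_categories`, or `unique_elements_in_columns` actually return.

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird", "cat"],
    "weight_kg": [4.2, 12.5, 3.9, 0.1, 4.2],
})

# Per-column summary statistics for numeric and non-numeric columns alike.
summary = df.describe(include="all")

# Number of rows in each category of a categorical column.
category_counts = df["species"].value_counts()

# Unique values per column, in order of first appearance.
unique_values = {col: df[col].unique().tolist() for col in df.columns}

print(summary)
print(category_counts)
print(unique_values)
```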
### Feature Engineering
- `extract_date_features`: Extract date-related features from a datetime column.
- `polyreg_x`: Generate the polynomial terms of the independent variables for a specified degree.
- `select_features`: Select relevant features for modeling.
- `select_dependent_and_independent`: Select dependent and independent variables.
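
The two less obvious items above, date feature extraction and polynomial terms, can be sketched with pandas and scikit-learn as follows. This illustrates the underlying techniques only; whether MLWizard uses `PolynomialFeatures` internally is an assumption, and the column names are invented.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Date-related features from a datetime column.
dates = pd.DataFrame({"order_date": pd.to_datetime(["2024-01-08", "2024-02-29"])})
dates["year"] = dates["order_date"].dt.year
dates["month"] = dates["order_date"].dt.month
dates["day_of_week"] = dates["order_date"].dt.dayofweek  # Monday = 0

# Polynomial terms of a single independent variable, degree 2.
X = np.array([[2.0], [3.0]])
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)  # columns: 1, x, x^2

print(dates)
print(X_poly)
```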
### Data Preprocessing
- `scale_independent_variables`: Scale independent variables in the dataset.
- `remove_outlier`: Remove outliers from the dataset.
- `split_data`: Split the dataset into training and test sets.
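
As a sketch of what these preprocessing steps usually involve, the example below uses scikit-learn and pandas directly rather than MLWizard. The 1.5 × IQR outlier rule and the 80/20 split are assumptions made for the example; the README does not say which rule or ratio the library applies.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(50, 5, 200), "y": rng.normal(10, 2, 200)})
df.loc[0, "x"] = 500.0  # plant an obvious outlier

# Keep rows whose x lies within 1.5 * IQR of the quartiles.
q1, q3 = df["x"].quantile([0.25, 0.75])
iqr = q3 - q1
inliers = df[df["x"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    inliers[["x"]], inliers["y"], test_size=0.2, random_state=0
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training set alone avoids leaking test-set statistics into the model.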
### Model Building and Evaluation
- `get_bestK_KNNregressor`: Find the best K value for KNN regression.
- `train_model_regressor`: Train a regression model.
- `regressor_predict`: Make predictions using a regression model.
- `regressor_evaluation`: Evaluate the performance of a regression model.
- `regressor_model_testing`: Test a regression model.
- `polyreg_graph`: Visualize a polynomial regression graph.
- `simple_linregres_graph`: Visualize a simple linear regression graph.
- `build_multiple_regressors`: Build multiple regression models.
- `build_multiple_regressors_from_features`: Build regression models using selected features.
- `build_single_regressor_from_features`: Build a single regression model using selected features.
- `get_bestK_KNNclassifier`: Find the best K value for KNN classification.
- `train_model_classifier`: Train a classification model.
- `classifier_predict`: Make predictions using a classification model.
- `classifier_evaluation`: Evaluate the performance of a classification model.
- `classifier_model_testing`: Test a classification model.
- `classifier_graph`: Visualize a classification graph.
- `build_multiple_classifiers`: Build multiple classification models.
- `build_multiple_classifiers_from_features`: Build classification models using selected features.
- `build_single_classifier_from_features`: Build a single classification model using selected features.
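
The idea behind building multiple models and searching for the best K can be sketched with scikit-learn alone, as below. This is not MLWizard's implementation; the dataset (scikit-learn's bundled iris data), the 5-fold cross-validation, and the K range of 1 to 15 are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compare several classifiers by mean 5-fold cross-validated accuracy.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000, random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

# Pick the K for KNN with the highest mean cross-validated accuracy.
k_scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 16)
}
best_k = max(k_scores, key=k_scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print(f"best K: {best_k} ({k_scores[best_k]:.3f})")
```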
### Data Aggregation and Summarization
- `group_data`: Group and summarize data based on specified conditions.
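
Grouping and summarizing typically maps onto a pandas `groupby`, sketched here under the assumption that `group_data` does something similar. The column names and aggregations are invented for the example.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 6, 8],
    "revenue": [100.0, 40.0, 90.0, 60.0],
})

# One row per region, with a named aggregate for each summary column.
summary = sales.groupby("region").agg(
    total_units=("units", "sum"),
    mean_revenue=("revenue", "mean"),
)
print(summary)
```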
### Data Type Handling
- `select_datatype`: Select columns of a specific datatype in the dataset.
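
In plain pandas, selecting columns by datatype looks like the sketch below. Whether `select_datatype` wraps `DataFrame.select_dtypes` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b"],
    "age": [30, 41],
    "height": [1.7, 1.8],
})

numeric = df.select_dtypes(include="number")      # integer and float columns
non_numeric = df.select_dtypes(exclude="number")  # everything else
print(list(numeric.columns), list(non_numeric.columns))
```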
## Installation
You can install MLWizard using pip:
```bash
pip install mlwizard
```
## Usage
Example usage:
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

from mlwizard import SupervisedLearning

# Load your dataset as a pandas DataFrame.
dataset = pd.read_csv("Your_file_path")
data = SupervisedLearning(dataset)

# Exploratory Data Analysis
eda = data.eda()
eda_visual = data.eda_visual()

# Build and Evaluate Classifier
# Candidate models. How this list is passed to build_multiple_classifiers
# is not shown here; consult the method's docstring for its parameters.
classifiers = ["LogisticRegression(random_state = 0)", "RandomForestClassifier(random_state = 0)", "DecisionTreeClassifier(random_state = 0)", "SVC()"]
build_model = data.build_multiple_classifiers()
```
## Acknowledgments
MLWizard relies on several open-source libraries to provide its functionality. We would like to express our gratitude to the developers and contributors of the following libraries:
- [NumPy](https://numpy.org/)
- [Pandas](https://pandas.pydata.org/)
- [Matplotlib](https://matplotlib.org/)
- [Seaborn](https://seaborn.pydata.org/)
- [yData Profiling](https://github.com/ydataai/ydata-profiling)
- [Sweetviz](https://github.com/fbdesignpro/sweetviz)
- [Imbalanced-Learn (imblearn)](https://imbalanced-learn.org/)
- [Scikit-learn](https://scikit-learn.org/)
- [Warnings](https://docs.python.org/3/library/warnings.html)
- [Datatable](https://datatable.readthedocs.io/en/latest/)
The MLWizard library builds upon the functionality provided by these excellent tools. We sincerely thank the maintainers and contributors of these libraries for their valuable contributions to the open-source community.
## License
MLWizard is distributed under the MIT License. Feel free to use, modify, and distribute it according to the terms of the license.
## Changelog
### v1.0.1 (January 2024):
- First release
## Contributors
We'd like to express our gratitude to the following contributors who have influenced and supported MLWizard:
- [Onyiriuba Leonard](https://www.linkedin.com/in/chukwubuikem-leonard-onyiriuba/): for overseeing the entire project development lifecycle.
  - Role: Project Lead and Maintainer.
  - Email: workwithtechleo@gmail.com
- [The TechLeo Community](https://www.linkedin.com/in/techleo/): for allowing this project to be used as a way to explain, learn, test, understand, and simplify the machine learning process.
  - Role: Testers.
  - Email: techleo.ng@gmail.com
Raw data
{
"_id": null,
"home_page": "",
"name": "mlwizard",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.0",
"maintainer_email": "",
"keywords": "machine learning,data science,data preprocessing,supervised learning,data exploration,ML framework,data cleaning,regression,classification,machine learning toolkit",
"author": "TechLeo (Onyiriuba Leonard Chukwubuikem)",
"author_email": "<techleo.ng@outlook.com>",
"download_url": "https://files.pythonhosted.org/packages/6a/4c/800157480f3f36a3bc28b99c3bff0e4b373568b8a388f2d6ed89e7a2027d/mlwizard-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Let's make building machine learning models the complex way, easy.",
"version": "1.0.1",
"project_urls": null,
"split_keywords": [
"machine learning",
"data science",
"data preprocessing",
"supervised learning",
"data exploration",
"ml framework",
"data cleaning",
"regression",
"classification",
"machine learning toolkit"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ed80f2d2ea1212bd4183aea150bf68e37e099e28c2cc3b9ec6671d8533b359bc",
"md5": "2572f9bbe9c3a2d7b052f8da9c7d46e6",
"sha256": "b688e8e33f3d20f03cda11744f7e6de3a727fb11e51c932c636afa26590f3b20"
},
"downloads": -1,
"filename": "mlwizard-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2572f9bbe9c3a2d7b052f8da9c7d46e6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.0",
"size": 49155,
"upload_time": "2024-01-08T03:22:35",
"upload_time_iso_8601": "2024-01-08T03:22:35.475752Z",
"url": "https://files.pythonhosted.org/packages/ed/80/f2d2ea1212bd4183aea150bf68e37e099e28c2cc3b9ec6671d8533b359bc/mlwizard-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6a4c800157480f3f36a3bc28b99c3bff0e4b373568b8a388f2d6ed89e7a2027d",
"md5": "0fe8a61be123b1aefa16e88d7cfb31e7",
"sha256": "af15b185f33097bbbf62fbe3bb3791f4ad17f00c8f4c4e6b24a022365309924d"
},
"downloads": -1,
"filename": "mlwizard-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "0fe8a61be123b1aefa16e88d7cfb31e7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.0",
"size": 48481,
"upload_time": "2024-01-08T03:22:36",
"upload_time_iso_8601": "2024-01-08T03:22:36.611783Z",
"url": "https://files.pythonhosted.org/packages/6a/4c/800157480f3f36a3bc28b99c3bff0e4b373568b8a388f2d6ed89e7a2027d/mlwizard-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-08 03:22:36",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "mlwizard"
}