# VaganBoostKTF: VAE-GAN Boost with TensorFlow

[Python](https://www.python.org/) | [MIT License](https://opensource.org/licenses/MIT)
VaganBoostKTF is a hybrid machine learning package that integrates generative modeling (CVAE and CGAN) with an advanced LightGBM classifier pipeline. It provides robust data preprocessing, custom sampling strategies for imbalanced data, automated hyperparameter tuning (including dimensionality reduction via PCA, LDA, or TruncatedSVD), and built-in visualization of model evaluation metrics (confusion matrices, ROC curves, and precision-recall curves).
## Features
- 🧬 Hybrid architecture combining generative and discriminative models
- ⚖️ Effective handling of class imbalance through synthetic data generation
- 🔄 Iterative training process with automatic model refinement
- 📊 Comprehensive evaluation metrics and visualizations
- 💾 Model persistence and reproducibility features
- 🖥️ Command-line interface for easy operation
## Installation
Install the required dependencies:
```bash
pip install dill
pip install dask[dataframe]
pip install umap-learn
```
Additional dependencies (if not already installed) include:
- scikit-learn
- imbalanced-learn
- lightgbm
- tensorflow
- seaborn
- matplotlib
- joblib
Then install the package itself from PyPI:

```bash
pip install vaganboostktf
```
For development installation:
```bash
git clone https://github.com/yourusername/vaganboostktf.git
cd vaganboostktf
pip install -e .
```
## Modules
- **data_preprocessor.py:** Provides consistent data preprocessing (scaling, handling missing values, and encoding).
- **trainer.py:** Orchestrates the hybrid training workflow combining generative models (CVAE, CGAN) and the LightGBM classifier.
- **lgbm_tuner.py:** Implements hyperparameter tuning for the advanced LightGBM pipeline.
- **lgbm_classifier.py:** Contains the full LightGBM classifier pipeline that integrates preprocessing, feature selection, dimensionality reduction, SMOTE balancing (with custom sampling strategies), and hyperparameter tuning.
- **utils.py:** Provides utility functions for visualization (confusion matrix, ROC curves, precision-recall curves) and helper classes like `DecompositionSwitcher` (a minimal sketch of that pattern follows this list).
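The `DecompositionSwitcher` class is not documented in this README; the sketch below shows the common scikit-learn pattern such a helper typically follows (the class body and the demo pipeline are illustrative assumptions, not the package's actual code). It lets a single pipeline slot swap between decomposition methods such as PCA and TruncatedSVD during hyperparameter search:

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class DecompositionSwitcher(BaseEstimator, TransformerMixin):
    """Pipeline slot whose inner decomposition estimator can be swapped
    by grid search (sketch of the assumed pattern)."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Default to PCA when no estimator is supplied.
        self.estimator_ = self.estimator if self.estimator is not None else PCA()
        self.estimator_.fit(X, y)
        return self

    def transform(self, X):
        return self.estimator_.transform(X)


X, y = make_classification(n_samples=200, n_features=25, random_state=0)
pipe = Pipeline([
    ("decomp", DecompositionSwitcher()),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Grid search swaps the whole decomposition estimator via one parameter.
param_grid = {"decomp__estimator": [PCA(n_components=5), TruncatedSVD(n_components=5)]}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_)
```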
## Usage Example
Below is a sample script demonstrating how to use VaganBoostKTF:
```python
import pandas as pd
import numpy as np
from vaganboostktf.data_preprocessor import DataPreprocessor
from vaganboostktf.trainer import HybridModelTrainer
from vaganboostktf.lgbm_tuner import LightGBMTuner
from vaganboostktf.utils import plot_confusion_matrix, plot_roc_curves, plot_pr_curves
# ===========================
# 1. Load and Prepare Data
# ===========================
df = pd.read_csv("input.csv")
# Identify features and target
feature_columns = [col for col in df.columns if col != "label"]
target_column = "label"
# Initialize data preprocessor
preprocessor = DataPreprocessor()
# Preprocess data (handling missing values, scaling, encoding)
X_train_scaled, X_test_scaled, y_train, y_test = preprocessor.prepare_data(
    df, feature_columns, target_column
)
# ===========================
# 2. Train Hybrid Model (CVAE, CGAN + LGBM)
# ===========================
trainer = HybridModelTrainer(config={
    'num_classes': 4,
    'cvae_params': {
        'input_dim': 25,
        'latent_dim': 10,
        'num_classes': 4,
        'learning_rate': 0.01
    },
    'cgan_params': {
        'input_dim': 25,
        'latent_dim': 10,
        'num_classes': 4,
        'generator_lr': 0.0002,
        'discriminator_lr': 0.0002
    },
    'input_path': 'input.csv',
    'model_dir': 'trained_models',
    'cvae_epochs': 100,
    'cgan_epochs': 100,
    'lgbm_iterations': 100,
    'samples_per_class': 50
})
# Run hybrid training (Generative + LGBM)
trainer.training_loop(X_train_scaled, y_train, X_test_scaled, y_test, iterations=5)
print("\nHybrid training completed! Models saved in 'trained_models/'")
# ===========================
# 3. Load and Evaluate LightGBM Model
# ===========================
lgbm_tuner = LightGBMTuner(input_path="input.csv", output_path="trained_models")
# Tune and train the LightGBM model (the tuning logic lives in `lgbm_classifier`)
lgbm_tuner.tune()
# Predict on test data
y_pred = lgbm_tuner.predict(X_test_scaled)
y_proba = lgbm_tuner.predict_proba(X_test_scaled)
# ===========================
# 4. Visualize Results
# ===========================
class_names = [str(i) for i in np.unique(y_test)]
# Plot Confusion Matrix
conf_matrix_fig = plot_confusion_matrix(y_test, y_pred, class_names, normalize=True)
conf_matrix_fig.savefig("trained_models/confusion_matrix.png")
# Plot ROC Curves
roc_curve_fig = plot_roc_curves(y_test, y_proba, class_names)
roc_curve_fig.savefig("trained_models/roc_curve.png")
# Plot Precision-Recall Curves
pr_curve_fig = plot_pr_curves(y_test, y_proba, class_names)
pr_curve_fig.savefig("trained_models/pr_curve.png")
print("\nEvaluation completed! Check 'trained_models/' for plots.")
```
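The feature list mentions model persistence, but this README does not show the package's own save/load helpers. As a package-agnostic sketch, any fitted scikit-learn-compatible estimator (including a tuned LightGBM pipeline) can be round-tripped with joblib, which is already a listed dependency; the stand-in model and file name below are illustrative only:

```python
import os

import joblib
import numpy as np
from lightgbm import LGBMClassifier

# Stand-in for the tuned pipeline: a plain LGBMClassifier on toy data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 25))
y_demo = rng.integers(0, 4, size=100)
model = LGBMClassifier(n_estimators=50).fit(X_demo, y_demo)

# Save and reload; predictions survive the round trip unchanged.
os.makedirs("trained_models", exist_ok=True)
joblib.dump(model, "trained_models/lgbm_demo.joblib")
restored = joblib.load("trained_models/lgbm_demo.joblib")
assert np.array_equal(model.predict(X_demo), restored.predict(X_demo))
```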
## Architecture
```mermaid
graph TD
A["Raw Data (CSV)"] --> B["DataPreprocessor"]
B --> C["Preprocessed Data"]
C --> D["CVAE"]
C --> E["CGAN"]
D --> F["Synthetic Data (CVAE)"]
E --> G["Synthetic Data (CGAN)"]
F --> H["Combined Real & Synthetic Data"]
G --> H
H --> I["LightGBM Classifier Pipeline"]
I --> J["Evaluation (Confusion Matrix, ROC, PR Curves)"]
J --> K["Best Models Saved"]
```
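The "Combined Real & Synthetic Data" step in the diagram amounts to stacking generator output onto the real training set before the classifier sees it. Here is a minimal NumPy sketch; the random arrays stand in for `DataPreprocessor` output and trained CVAE/CGAN samples, and the shapes mirror the `input_dim`/`samples_per_class` settings from the usage example:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for preprocessed real data and generator output.
X_real = rng.normal(size=(400, 25))
y_real = rng.integers(0, 4, size=400)

samples_per_class, num_classes, input_dim = 50, 4, 25
X_synth = rng.normal(size=(samples_per_class * num_classes, input_dim))
y_synth = np.repeat(np.arange(num_classes), samples_per_class)

# Combine and shuffle before handing off to the LightGBM pipeline.
X_combined = np.vstack([X_real, X_synth])
y_combined = np.concatenate([y_real, y_synth])
perm = rng.permutation(len(y_combined))
X_combined, y_combined = X_combined[perm], y_combined[perm]
print(X_combined.shape)  # (600, 25)
```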
## Key Components
- **Conditional VAE**: Generates class-conditioned synthetic samples
- **Conditional GAN**: Produces additional class-specific synthetic data
- **LightGBM Tuner**: Optimized gradient boosting with automated hyperparameter search
- **Hybrid Trainer**: Orchestrates iterative training process
## Additional Information
- **Hybrid Workflow:** The training loop in `trainer.py` first trains generative models (CVAE and CGAN) to create synthetic data, which is then combined with real data to train a robust LightGBM classifier.
- **Custom Sampling Strategies:** `lgbm_classifier.py` integrates a function that generates sampling strategies for SMOTE to address severe class imbalance (a sketch of the idea follows this list).
- **Visualization:** Evaluation plots are generated and saved in the output directory to help assess model performance.
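A custom SMOTE sampling strategy is simply a dict mapping each class label to its target sample count. The helper below is a sketch of how such a function might look; its name and the target rule (oversample every minority class up to the majority count) are assumptions, not the package's actual implementation:

```python
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE


def make_sampling_strategy(y):
    """Oversample every minority class up to the majority count (assumed rule)."""
    counts = Counter(y)
    majority = max(counts.values())
    return {cls: majority for cls, n in counts.items() if n < majority}


# Imbalanced toy data: 200 / 70 / 30 samples across three classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 25))
y = np.concatenate([np.zeros(200, dtype=int), np.ones(70, dtype=int), np.full(30, 2)])

smote = SMOTE(sampling_strategy=make_sampling_strategy(y), random_state=0)
X_res, y_res = smote.fit_resample(X, y)
print(Counter(y_res))  # every class balanced to 200
```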
## Configuration
Default parameters can be modified through:
- Command-line arguments
- JSON configuration files (see the sketch after this list)
- Python API parameters
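For instance, a JSON configuration file could mirror the dictionary passed to `HybridModelTrainer` in the usage example. The loading code below is an illustrative assumption, since this README does not document the schema beyond that dict:

```python
import json

# Hypothetical config text mirroring part of the usage-example dict;
# in practice this would live in a file such as config.json.
config_text = """
{
  "num_classes": 4,
  "cvae_params": {"input_dim": 25, "latent_dim": 10, "num_classes": 4, "learning_rate": 0.01},
  "input_path": "input.csv",
  "model_dir": "trained_models"
}
"""
config = json.loads(config_text)
# trainer = HybridModelTrainer(config=config)  # as in the usage example
print(config["cvae_params"]["latent_dim"])  # 10
```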
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---

- GitHub: https://github.com/AliBavarchee/
- LinkedIn: https://www.linkedin.com/in/ali-bavarchee-qip/