[![Static Badge](https://img.shields.io/badge/%E5%88%87%E6%8D%A2-%E4%B8%AD%E6%96%87%E7%89%88%E8%AF%B4%E6%98%8E%E6%96%87%E6%A1%A3-1082C3?style=flat)](https://github.com/xuxu-wei/SUAVE/blob/main/%E4%BD%BF%E7%94%A8%E8%AF%B4%E6%98%8E-%E4%B8%AD%E6%96%87%E7%89%88.md) ![PyPI](https://img.shields.io/pypi/v/suave-ml)
# SUAVE: Supervised and Unified Analysis of Variational Embeddings
**SUAVE** is a Python package built on a **Hybrid Variational Autoencoder (VAE)**. It unifies unsupervised latent representation learning with supervised prediction tasks:
* **Supervised Learning** : Uses a VAE to map high-dimensional input features to a low-dimensional latent space with independent dimensions. This not only retains feature interpretability but also addresses multicollinearity, improving the model's robustness and generalization when handling highly correlated features.
* **Representation Learning** : Guides the latent space with label information, enabling dimensionality reduction that yields discriminative, interpretable embeddings useful for downstream classification or regression tasks. SUAVE also integrates multi-task learning, so information from several downstream prediction tasks can be incorporated into latent-space learning by adjusting task weights.
---
## Installation
**Please note:** By default, SUAVE attempts to detect your system environment and install an appropriate PyTorch build during installation. However, this feature has not been thoroughly tested.
```bash
pip install suave-ml
```
It is therefore recommended to install a PyTorch build suited to your system before installing this package; see the [official PyTorch guide](https://pytorch.org/get-started/locally/) for instructions. For example, on Windows you can use the following pip command to install the PyTorch build for CUDA 12.1:
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
---
## Quick Start
### 1. Prepare Your Data (Randomly Generated Data Is Used Here as an Example)
```python
from suave.utils import make_multitask_classification
X_train, X_test, Y_train, Y_test = make_multitask_classification(random_state=123)
```
---
### 2. Define and Train the Model
```python
from suave import SuaveClassifier
# Instantiate the model
model = SuaveClassifier(input_dim=X_train.shape[1],  # Input feature dimension
                        task_classes=[len(Y_train[col].unique()) for col in Y_train.columns],  # Number of classes for each task
                        latent_dim=20,  # Latent dimension
                        )
# Fit the model on training data
model.fit(X_train, Y_train, epochs=1000, animate_monitor=True, verbose=1)
```
![png](readme_files/readme_3_0.png)
```
Training: 70%|███████ | 704/1000 [06:26<02:42, 1.82epoch/s, VAE(t)=189.910, VAE(v)=166.365, AUC(t)=[0.98, 0.961, 0.983], AUC(v)=[0.83, 0.797, 0.922]]
Epoch 705: Task task_3 early stopping triggered.
Early stopping triggered due to no improvement in both VAE and task losses.
```
---
### 3. Make Predictions
```python
# Make predictions on test data
y_probas = model.predict_proba(X_test)
y_hats = model.predict(X_test)
auc_scores = model.score(X_test, Y_test)
print("AUC Scores:", auc_scores)
```
```
AUC Scores: [0.8314483 0.8053462 0.90158279]
```
---
### 4. Transform Features to Latent Space
```python
import numpy as np
import pandas as pd

latent_features = model.transform(np.array(X_test))
X_latent = pd.DataFrame(latent_features, index=X_test.index,
                        columns=[f'latent_feature {i+1}' for i in range(latent_features.shape[1])])  # one column per latent dimension (latent_dim)
```
---
### 5. Reconstruct Inputs from Latent Space
```python
reconstructed = model.inverse_transform(latent_features)
X_reconstructed = pd.DataFrame(reconstructed, index=X_test.index, columns=X_test.columns)
```
---
## Key Features
### 1. Supervised & Unsupervised Fusion
* **Unsupervised (VAE)** : Learns a latent space representation by reconstructing input features and regularizing the latent variables using a Kullback-Leibler (KL) divergence term.
* **Supervised (MTL)** : Incorporates label information to shape the latent space, ensuring that the learned features are informative for one or multiple prediction tasks. A conceptual sketch of the combined objective follows this list.
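The snippet below is a minimal, illustrative sketch of how such a hybrid objective can combine the VAE terms with per-task supervised losses. It is not SUAVE's implementation: the mean-squared-error reconstruction term and the `beta` and `task_weights` knobs are assumptions made for clarity.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(x, x_recon, mu, logvar, task_logits, task_labels,
                beta=1.0, task_weights=None):
    """Illustrative joint objective: reconstruction + KL + weighted per-task cross-entropy."""
    recon = F.mse_loss(x_recon, x, reduction="sum")                # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL regularizer on the latent space
    if task_weights is None:
        task_weights = [1.0] * len(task_logits)
    supervised = sum(w * F.cross_entropy(logits, y)                # one cross-entropy loss per task
                     for w, logits, y in zip(task_weights, task_logits, task_labels))
    return recon + beta * kl + supervised
```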
### 2. Multi-Task Learning Integration
* **Shared Representations** : A single latent space underpins multiple related classification (or other) tasks, leveraging common data structures for efficient, joint learning.
* **Task-Specific Heads** : Independent prediction heads are built atop the shared latent space. This encourages knowledge transfer among tasks and can improve predictive performance on each one (see the sketch after this list).
* **Representation Learning to Mitigate Multicollinearity** : By mapping high-dimensional input features to a low-dimensional latent space, SUAVE reduces linear correlations among the learned features, alleviating multicollinearity issues.
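As a rough architectural sketch of task-specific heads over a shared latent representation (class and attribute names here are hypothetical, not SUAVE's internals):

```python
import torch.nn as nn

class TaskHeads(nn.Module):
    """Toy illustration of independent task heads reading one shared latent vector."""
    def __init__(self, latent_dim, task_classes):
        super().__init__()
        # one small classifier per task, all fed by the same latent representation
        self.heads = nn.ModuleList(nn.Linear(latent_dim, n) for n in task_classes)

    def forward(self, z):
        return [head(z) for head in self.heads]  # one logits tensor per task
```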
### 3. Flexible and Customizable Architecture
* **Configurable Networks** : Easily adjust encoder and decoder depths, widths, and layer scaling strategies (e.g., constant, linear, geometric), as illustrated after this list.
* **Regularization Built-In** : Batch normalization and dropout help stabilize training and mitigate overfitting.
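For intuition, a "constant" strategy keeps hidden widths fixed, while "linear" and "geometric" shrink them layer by layer. The helper below is a hypothetical illustration of the idea; it is not SUAVE's API and its parameter names are assumptions.

```python
def hidden_widths(input_dim, depth, strategy="geometric"):
    """Illustrative hidden-layer widths for an encoder of the given depth."""
    if strategy == "constant":
        return [input_dim] * depth
    if strategy == "linear":
        step = max(input_dim // (depth + 1), 1)
        return [max(input_dim - step * (i + 1), 1) for i in range(depth)]
    if strategy == "geometric":
        return [max(input_dim // 2 ** (i + 1), 1) for i in range(depth)]
    raise ValueError(f"unknown strategy: {strategy}")

print(hidden_widths(64, 3, "geometric"))  # [32, 16, 8]
```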
### 4. Scikit-Learn Compatibility
* **Seamless Integration** : The `SuaveClassifier` class is compatible with scikit-learn’s pipeline and model selection APIs. Perform hyperparameter tuning with `GridSearchCV` and integrate SUAVE models into complex ML workflows with minimal friction.
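As an example of that integration, a grid search over the latent dimension could look like the sketch below. It assumes constructor arguments such as `latent_dim` are exposed through `get_params()` and reuses the Quick Start data; verify against the package documentation before relying on it.

```python
from sklearn.model_selection import GridSearchCV
from suave import SuaveClassifier

def mean_auc(estimator, X, Y):
    # `score` returns one AUC per task; average to a single scalar for model selection.
    return float(estimator.score(X, Y).mean())

base = SuaveClassifier(input_dim=X_train.shape[1],
                       task_classes=[len(Y_train[col].unique()) for col in Y_train.columns],
                       latent_dim=20)
search = GridSearchCV(base, param_grid={"latent_dim": [10, 20, 30]},
                      scoring=mean_auc, cv=3)
search.fit(X_train, Y_train)
print(search.best_params_, search.best_score_)
```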
### 5. Comprehensive Training Utilities
* **Joint Objective Optimization** : Simultaneously optimizes the VAE reconstruction/KL losses and supervised cross-entropy losses.
* **Early Stopping & LR Scheduling** : Monitors validation metrics for early stopping and dynamically adjusts learning rates to ensure stable convergence.
## Example Use Cases
- **Supervised Dimensionality Reduction**: Obtain a low-dimensional feature representation that preserves predictive signals for classification tasks.
- **Multi-Task Classification**: Tackle multiple related outcomes (e.g., multiple mortality endpoints) within a unified model and benefit from shared latent factors.
- **Generative Modeling & Data Insight**: Interpolate, generate synthetic samples, and visualize latent structures that capture underlying data patterns and decision boundaries.
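Since `transform` and `inverse_transform` are available (see the Quick Start), a simple latent-space interpolation between two test samples can be sketched as follows; this is an illustration, not a built-in SUAVE utility.

```python
import numpy as np

# Encode two test samples, then walk linearly between them in latent space.
z = model.transform(np.array(X_test.iloc[:2]))
steps = np.linspace(0.0, 1.0, num=5)[:, None]   # 5 interpolation points as a column vector
z_path = (1 - steps) * z[0] + steps * z[1]      # shape: (5, latent_dim)
x_path = model.inverse_transform(z_path)        # decoded synthetic samples along the path
```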
---
## License
This project is licensed under the **BSD 3-Clause License**. See the `LICENSE` file for details.