[![Static Badge](https://img.shields.io/badge/%E5%88%87%E6%8D%A2-%E4%B8%AD%E6%96%87%E7%89%88%E8%AF%B4%E6%98%8E%E6%96%87%E6%A1%A3-1082C3?style=flat)](使用说明-中文版.md) ![PyPI](https://img.shields.io/pypi/v/suave-ml)
# SUAVE: Supervised and Unified Analysis of Variational Embeddings
**SUAVE** is a Python package built upon a **Hybrid Variational Autoencoder (VAE)** integrated with Multi-Task Learning. It unifies unsupervised latent representation learning with supervised prediction tasks. By guiding the latent space with label information, SUAVE not only achieves dimensionality reduction but also yields discriminative and interpretable embeddings that directly benefit downstream classification or regression tasks.
---
## Key Features
### 1. Supervised & Unsupervised Fusion
- **Unsupervised (VAE)**: Learns a latent space representation by reconstructing input features and regularizing the latent variables using a Kullback-Leibler (KL) divergence term.
- **Supervised (MTL)**: Incorporates label information to shape the latent space, ensuring that the learned features are informative for one or multiple prediction tasks.
### 2. Multi-Task Learning Integration
- **Shared Representations**: A single latent space underpins multiple related classification (or other) tasks, leveraging common data structure for efficient, joint learning.
- **Task-Specific Heads**: Independent prediction heads are built atop the shared latent space. This encourages knowledge transfer among tasks and can improve predictive performance on each one.
### 3. Flexible and Customizable Architecture
- **Configurable Networks**: Easily adjust encoder and decoder depths, widths, and layer-width scaling strategies (e.g., constant, linear, geometric; see the sketch after this list).
- **Regularization Built-In**: Batch normalization and dropout help stabilize training and mitigate overfitting.
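
To make the scaling strategies concrete, here is a small, purely hypothetical helper (`layer_widths` is *not* part of the SUAVE API) showing the hidden-layer widths each strategy would produce:

```python
def layer_widths(input_dim, depth, strategy="geometric", ratio=0.5):
    """Hypothetical illustration of the three layer-width scaling strategies."""
    if strategy == "constant":
        # every hidden layer keeps the input width
        return [input_dim] * depth
    if strategy == "linear":
        # widths shrink by a fixed step toward zero
        step = input_dim // (depth + 1)
        return [input_dim - step * (i + 1) for i in range(depth)]
    if strategy == "geometric":
        # each layer is `ratio` times the width of the previous one
        return [max(1, int(input_dim * ratio ** (i + 1))) for i in range(depth)]
    raise ValueError(f"unknown strategy: {strategy!r}")

print(layer_widths(64, 3, "constant"))   # [64, 64, 64]
print(layer_widths(64, 3, "linear"))     # [48, 32, 16]
print(layer_widths(64, 3, "geometric"))  # [32, 16, 8]
```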
### 4. Scikit-Learn Compatibility
- **Seamless Integration**: The `SuaveSklearn` class is compatible with scikit-learn’s pipeline and model-selection APIs. Perform hyperparameter tuning with `GridSearchCV` and integrate SUAVE models into complex ML workflows with minimal friction (a minimal tuning sketch follows this list).
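
A minimal tuning sketch is shown below. It assumes the constructor arguments (e.g., `latent_dim`) are exposed as scikit-learn parameters via `get_params`/`set_params`, and that `score` returns one AUC per task as in the Quick Start below — both are assumptions to verify against your installed version:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from suave import SuaveClassifier

# Tiny random stand-in data; see the Quick Start for a fuller setup
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 20)),
                 columns=[f"feature_{i+1}" for i in range(20)])
Y = pd.DataFrame({"task_1": rng.integers(0, 3, 200),
                  "task_2": rng.integers(0, 2, 200)})

def mean_auc(estimator, X, y):
    # collapse the per-task AUC array into the single scalar GridSearchCV expects
    return float(np.mean(estimator.score(X, y)))

model = SuaveClassifier(input_dim=X.shape[1], task_classes=[3, 2], latent_dim=10)
search = GridSearchCV(model, param_grid={"latent_dim": [5, 10, 20]},
                      scoring=mean_auc, cv=3)
search.fit(X, Y)
print(search.best_params_)
```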
### 5. Comprehensive Training Utilities
- **Joint Objective Optimization**: Simultaneously optimizes the VAE reconstruction/KL losses and the supervised cross-entropy losses (the conceptual form of this objective is sketched after this list).
- **Early Stopping & LR Scheduling**: Monitors validation metrics for early stopping and dynamically adjusts learning rates to ensure stable convergence.
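
The exact loss weighting is an implementation detail of the package, but conceptually the joint objective combines the terms above. In this sketch, $\beta$ and $\lambda_t$ stand in for whatever KL and per-task weights SUAVE applies internally:

$$
\mathcal{L} \;=\; \mathcal{L}_{\text{recon}} \;+\; \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right) \;+\; \sum_{t=1}^{T} \lambda_t \, \mathcal{L}_{\text{CE}}^{(t)}
$$

where $\mathcal{L}_{\text{recon}}$ is the reconstruction loss, the KL term regularizes the approximate posterior toward the prior, and $\mathcal{L}_{\text{CE}}^{(t)}$ is the cross-entropy loss of task $t$.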
---
## Example Use Cases
- **Supervised Dimensionality Reduction**: Obtain a low-dimensional feature representation that preserves predictive signals for classification tasks.
- **Multi-Task Classification**: Tackle multiple related outcomes (e.g., multiple mortality endpoints) within a unified model and benefit from shared latent factors.
- **Generative Modeling & Data Insight**: Interpolate, generate synthetic samples, and visualize latent structures that capture underlying data patterns and decision boundaries (a minimal interpolation sketch follows this list).
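
As a taste of the generative side, here is a minimal latent-interpolation sketch. It uses only the `transform`/`inverse_transform` calls demonstrated in the Quick Start below, and assumes `model` and `X_test` have been prepared as in that section:

```python
import numpy as np

# Encode two test samples into the latent space
z = model.transform(np.array(X_test.iloc[:2]))

# Walk in five evenly spaced steps from the first code to the second
alphas = np.linspace(0.0, 1.0, 5)[:, None]
z_path = (1 - alphas) * z[0] + alphas * z[1]

# Decode the path back to feature space as synthetic samples
synthetic = model.inverse_transform(z_path)
print(synthetic.shape)  # (5, n_features)
```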
---
## Installation
**Please note:** this package requires PyTorch. Install the appropriate version for your system by following the [official PyTorch guide](https://pytorch.org/get-started/locally/). By default, SUAVE detects the system environment during installation and attempts to install a suitable PyTorch version automatically, but this feature has not been thoroughly tested.
- Install from PyPI:
```bash
pip install suave-ml
```
---
## Quick Start
### 1. Prepare Your Data
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
# Generate example data: three related classification tasks
n_tasks = 3
n_samples = 1000
n_features = 20

X1, y1 = make_classification(n_samples=n_samples, n_features=n_features, n_informative=10, n_classes=3, random_state=123)
X2, y2 = make_classification(n_samples=n_samples, n_features=n_features, n_informative=8, n_classes=4, random_state=456)
X3, y3 = make_classification(n_samples=n_samples, n_features=n_features, n_informative=12, n_classes=2, random_state=789)

# X3 is deliberately excluded from X, so task_3 has no informative features
# and its AUC is expected to hover around 0.5 (chance level)
X = pd.DataFrame(np.hstack([X1, X2]), columns=[f"feature_{i+1}" for i in range(n_features * 2)])
Y = pd.DataFrame({"task_1": y1, "task_2": y2, "task_3": y3})

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
```
---
### 2. Define and Train the Model
```python
from suave import SuaveClassifier
# Instantiate the model
model = SuaveClassifier(
    input_dim=X_train.shape[1],                            # input feature dimension
    task_classes=[Y[col].nunique() for col in Y.columns],  # number of classes per task (here: 3, 4, 2)
    latent_dim=10,                                         # latent dimension
)
# Fit the model on training data
model.fit(X_train, Y_train, epochs=1000, animate_monitor=True, verbose=1)
```
![png](readme_files/readme_3_0.png)
```
Training: 100%|█████████▉| 998/1000 [03:40<00:00, 4.52epoch/s, VAE(t)=94.677, VAE(v)=85.093, AUC(t)=[0.606, 0.639, 0.5], AUC(v)=[0.572, 0.642, 0.543]]
Early stopping triggered due to no improvement in both VAE and task losses.
```
---
### 3. Make Predictions
```python
# Make predictions on test data
y_probas = model.predict_proba(X_test)
y_hats = model.predict(X_test)

# Evaluate per-task AUCs on the held-out set
auc_scores = model.score(X_test, Y_test)
print("AUC Scores:", auc_scores)
```

```
AUC Scores: [0.6807871 0.70718777 0.50661058]
```
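
To reproduce these numbers independently, a small cross-check with scikit-learn looks like the sketch below. It assumes `predict_proba` returns one `(n_samples, n_classes)` array per task; adjust the indexing if your version returns a different structure:

```python
from sklearn.metrics import roc_auc_score

for i, col in enumerate(Y_test.columns):
    proba = y_probas[i]
    # binary tasks score on the positive-class column; multi-class uses one-vs-rest
    y_score = proba[:, 1] if proba.shape[1] == 2 else proba
    print(col, round(roc_auc_score(Y_test[col], y_score, multi_class="ovr"), 3))
```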
---
### 4. Transform Features to Latent Space
```python
latent_features = model.transform(np.array(X_test))
X_latent = pd.DataFrame(latent_features, index=X_test.index,
                        columns=[f"latent_feature {i+1}" for i in range(10)])  # column count must match `latent_dim`
```
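
A quick way to eyeball the embedding is to plot two latent dimensions colored by one task's labels (matplotlib is already a SUAVE dependency):

```python
import matplotlib.pyplot as plt

# Scatter the first two latent dimensions, colored by task_1 labels
plt.scatter(X_latent["latent_feature 1"], X_latent["latent_feature 2"],
            c=Y_test["task_1"], cmap="viridis", s=12)
plt.xlabel("latent_feature 1")
plt.ylabel("latent_feature 2")
plt.colorbar(label="task_1 class")
plt.show()
```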
---
### 5. Reconstruct Inputs from Latent Space
```python
reconstructed = model.inverse_transform(latent_features)
X_reconstructed = pd.DataFrame(reconstructed, index=X_test.index, columns=X_test.columns)
```
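
Reconstructions are lossy (everything passes through the 10-dimensional bottleneck), so a per-feature error summary is a useful sanity check:

```python
import numpy as np

# Root-mean-squared reconstruction error for each feature
rmse = np.sqrt(((X_reconstructed - X_test) ** 2).mean())
print(rmse.sort_values().head())  # best-reconstructed features first
```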
---
## License
This project is licensed under the **BSD 3-Clause License**. See the `LICENSE` file for details.