suave-ml


* **Name**: suave-ml
* **Version**: 0.1.2a1
* **Home page**: https://github.com/xuxu-wei/SUAVE
* **Summary**: A deep learning model (hybrid VAE) implementation for label information-guided dimensionality reduction and multi-task learning.
* **Upload time**: 2025-01-09 00:09:00
* **Author**: Xuxu Wei
* **Requires Python**: >=3.8
* **License**: BSD-3-Clause
* **Keywords**: vae, supervised-dim-reduction, multi-task-learning, pytorch, sklearn, deep learning
* **Requirements**: torch, scikit-learn, numpy, pandas, tqdm, matplotlib

[![Static Badge](https://img.shields.io/badge/%E5%88%87%E6%8D%A2-%E4%B8%AD%E6%96%87%E7%89%88%E8%AF%B4%E6%98%8E%E6%96%87%E6%A1%A3-1082C3?style=flat)](https://github.com/xuxu-wei/SUAVE/blob/main/%E4%BD%BF%E7%94%A8%E8%AF%B4%E6%98%8E-%E4%B8%AD%E6%96%87%E7%89%88.md)  ![PyPI](https://img.shields.io/pypi/v/suave-ml)

# SUAVE: Supervised and Unified Analysis of Variational Embeddings

**SUAVE** is a Python package built upon a **Hybrid Variational Autoencoder (VAE)**. It unifies unsupervised latent representation learning with supervised prediction tasks:

* **Supervised Learning**: Uses a VAE to map high-dimensional input features to a low-dimensional latent space whose dimensions are encouraged to be independent. This approach retains feature interpretability while mitigating multicollinearity, which can improve the model's robustness and generalization when features are highly correlated.
* **Representation Learning**: Guides the latent space with label information, so that dimensionality reduction yields discriminative, interpretable embeddings for downstream classification or regression tasks. SUAVE also integrates multi-task learning: information from several downstream prediction tasks is incorporated into latent space learning, and each task's influence is controlled by its task weight.

---

## Installation

**Please note:** By default, SUAVE tries to detect the system environment and install a matching version of PyTorch during installation. This feature has not been thoroughly tested.

```bash
pip install suave-ml
```

We recommend installing the PyTorch build that matches your system environment before installing this package. See the [official PyTorch guide](https://pytorch.org/get-started/locally/) for installation instructions. For example, on Windows, the following pip command installs the PyTorch build for CUDA 12.1:

```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
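
After installing, it can be useful to confirm which PyTorch build ended up in the environment. The snippet below is a minimal sketch, not part of SUAVE: `describe_torch_install` is a hypothetical helper name, and it relies only on the standard `importlib` module plus the public `torch.__version__` and `torch.cuda.is_available()` attributes.

```python
import importlib.util


def describe_torch_install():
    """Report whether PyTorch is importable and, if so, its version and CUDA status."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda": False}
    import torch  # imported lazily so the check works even when torch is absent
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
    }


print(describe_torch_install())
```

If `cuda` is reported as `False` on a machine with a GPU, the installed wheel is probably a CPU-only build and PyTorch may need to be reinstalled from the matching index URL.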

---

## Quick Start

### 1. Prepare Your Data (Here, Randomly Generated Data is Used as an Example)

```python
from suave.utils import make_multitask_classification
X_train, X_test, Y_train, Y_test = make_multitask_classification(random_state=123)
```

---

### 2. Define and Train the Model

```python
from suave import SuaveClassifier

# Instantiate the model
model = SuaveClassifier(input_dim=X_train.shape[1],                                             # Input feature dimension
                        task_classes=[len(Y_train[col].unique()) for col in Y_train.columns],   # Number of classes for each task
                        latent_dim=20                                                           # Latent dimension
                        )

# Fit the model on training data
model.fit(X_train, Y_train, epochs=1000, animate_monitor=True, verbose=1)
```

![png](readme_files/readme_3_0.png)

```
Training:  70%|███████   | 704/1000 [06:26<02:42,  1.82epoch/s, VAE(t)=189.910, VAE(v)=166.365, AUC(t)=[0.98, 0.961, 0.983], AUC(v)=[0.83, 0.797, 0.922]]  
Epoch 705: Task task_3 early stopping triggered.
Early stopping triggered due to no improvement in both VAE and task losses.
```

---

### 3. Make Predictions

```python
# Make predictions on test data
y_probas = model.predict_proba(X_test)
y_hats = model.predict(X_test)

auc_scores = model.score(X_test, Y_test)
print("AUC Scores:", auc_scores)
```

```
AUC Scores: [0.8314483  0.8053462  0.90158279]
```
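
For readers who want to sanity-check a per-task AUC by hand, the sketch below computes the area under the ROC curve for one binary task from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. This is an illustration in plain Python, not SUAVE's own scoring code; `roc_auc` is a hypothetical helper, and how `model.score` computes its values internally is not shown in this README.

```python
def roc_auc(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Compare every positive score with every negative score.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities for a single binary task.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise comparison is quadratic in the number of samples, so it is meant for small checks only.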

---

### 4. Transform Features to Latent Space

```python
import numpy as np
import pandas as pd
latent_features = model.transform(np.array(X_test))
X_latent = pd.DataFrame(latent_features, index=X_test.index, columns=[f'latent_feature {i+1}' for i in range(latent_features.shape[1])])  # one column per latent dimension (`latent_dim`)
```

---

### 5. Reconstruct Inputs from Latent Space

```python
reconstructed = model.inverse_transform(latent_features)
X_reconstructed = pd.DataFrame(reconstructed, index=X_test.index, columns=X_test.columns)
```

---

## Key Features

### 1. Supervised & Unsupervised Fusion

* **Unsupervised (VAE)**: Learns a latent space representation by reconstructing input features and regularizing the latent variables using a Kullback-Leibler (KL) divergence term.
* **Supervised (MTL)**: Incorporates label information to shape the latent space, ensuring that the learned features are informative for one or multiple prediction tasks.
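
To make the two terms concrete, here is a minimal sketch of how such a hybrid objective is commonly assembled. It assumes a diagonal Gaussian posterior and a standard normal prior, for which the KL term has a closed form. The function names, the `beta` factor, and the example numbers are illustrative assumptions, not SUAVE's actual implementation.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def hybrid_loss(recon_loss, kl, task_losses, task_weights, beta=1.0):
    """Reconstruction + beta * KL + weighted sum of per-task losses (illustrative)."""
    supervised = sum(w * loss for w, loss in zip(task_weights, task_losses))
    return recon_loss + beta * kl + supervised


# Toy values for one sample with a 2-dimensional latent space and two tasks.
kl = kl_to_standard_normal(mu=[0.5, -0.2], logvar=[0.0, -0.1])
total = hybrid_loss(recon_loss=1.2, kl=kl, task_losses=[0.7, 0.4], task_weights=[1.0, 0.5])
print(kl, total)
```

Raising a task's weight pulls the latent space toward features that help that task, at the cost of reconstruction quality and the other tasks.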

### 2. Multi-Task Learning Integration

* **Shared Representations**: A single latent space underpins multiple related classification (or other) tasks, leveraging common data structures for efficient, joint learning.
* **Task-Specific Heads**: Independent prediction heads are built atop the shared latent space. This encourages knowledge transfer among tasks and can improve predictive performance on each one.
* **Representation Learning to Mitigate Multicollinearity**: By mapping high-dimensional input features to a low-dimensional latent space, SUAVE effectively reduces linear correlations between features, alleviating multicollinearity issues.
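
One way to check the multicollinearity claim on your own data is to compare the largest absolute pairwise correlation among the input features with the same statistic among the latent features. The sketch below shows that measurement in plain Python; `pearson` and `max_abs_offdiag_corr` are hypothetical helpers written for this example, and the toy columns stand in for real feature columns.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def max_abs_offdiag_corr(columns):
    """Largest |correlation| over all distinct pairs of columns."""
    return max(
        abs(pearson(columns[i], columns[j]))
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    )


# Toy input: the second column is an exact multiple of the first.
raw_columns = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
print(max_abs_offdiag_corr(raw_columns))
```

Applying the same function to the columns of the latent representation gives a direct before-and-after comparison; a clearly lower value there would support the claim for that dataset.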

### 3. Flexible and Customizable Architecture

* **Configurable Networks**: Easily adjust encoder and decoder depths, widths, and layer scaling strategies (e.g., constant, linear, geometric).
* **Regularization Built-In**: Batch normalization and dropout help stabilize training and mitigate overfitting.
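
The three scaling strategies can be pictured as different ways of choosing hidden-layer widths between the input dimension and the latent dimension. The function below is one plausible reading of those names, written as a sketch; `layer_widths` and its parameters are hypothetical and do not mirror SUAVE's constructor arguments.

```python
def layer_widths(input_dim, latent_dim, depth, strategy="geometric"):
    """Return hidden-layer widths for an encoder narrowing from input_dim toward latent_dim."""
    if strategy == "constant":
        # Every hidden layer keeps the input width.
        return [input_dim] * depth
    if strategy == "linear":
        # Widths shrink by a fixed amount per layer.
        step = (input_dim - latent_dim) / (depth + 1)
        return [round(input_dim - step * (i + 1)) for i in range(depth)]
    if strategy == "geometric":
        # Widths shrink by a fixed ratio per layer.
        ratio = (latent_dim / input_dim) ** (1.0 / (depth + 1))
        return [round(input_dim * ratio ** (i + 1)) for i in range(depth)]
    raise ValueError(f"unknown strategy: {strategy!r}")


for name in ("constant", "linear", "geometric"):
    print(name, layer_widths(input_dim=160, latent_dim=10, depth=3, strategy=name))
```

Geometric scaling narrows quickly in the first layers, while linear scaling spreads the reduction evenly; a decoder would typically use the same widths in reverse order.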

### 4. Scikit-Learn Compatibility

* **Seamless Integration**: The `SuaveClassifier` class is compatible with scikit-learn’s pipeline and model selection APIs. Perform hyperparameter tuning with `GridSearchCV` and integrate SUAVE models into complex ML workflows with minimal friction.

### 5. Comprehensive Training Utilities

* **Joint Objective Optimization**: Simultaneously optimizes the VAE reconstruction/KL losses and supervised cross-entropy losses.
* **Early Stopping & LR Scheduling**: Monitors validation metrics for early stopping and dynamically adjusts learning rates to ensure stable convergence.
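
Patience-based early stopping, the general technique behind messages like "early stopping triggered" in the training log, can be sketched in a few lines. The class below is a generic illustration that watches a single validation loss; `EarlyStopper`, `patience`, and `min_delta` are assumed names, and SUAVE's own logic (which tracks the VAE loss and each task separately, according to the log) may differ.

```python
class EarlyStopper:
    """Stop when the monitored validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=2)
val_losses = [1.0, 0.8, 0.81, 0.79, 0.80, 0.82, 0.78]
# Index of the first epoch at which the stopper asks to halt (None if it never does).
stopped_at = next((epoch for epoch, loss in enumerate(val_losses) if stopper.step(loss)), None)
print(stopped_at, stopper.best)
```

A larger `patience` tolerates noisy validation curves at the cost of extra epochs; learning-rate scheduling is often driven by the same "no improvement" counter.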

## Example Use Cases

- **Supervised Dimensionality Reduction**: Obtain a low-dimensional feature representation that preserves predictive signals for classification tasks.
- **Multi-Task Classification**: Tackle multiple related outcomes (e.g., multiple mortality endpoints) within a unified model and benefit from shared latent factors.
- **Generative Modeling & Data Insight**: Interpolate, generate synthetic samples, and visualize latent structures that capture underlying data patterns and decision boundaries.
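
As an illustration of the interpolation use case, the sketch below builds a straight-line path between two latent vectors. `interpolate_latent` is a hypothetical helper, not part of the package; with a fitted model, the resulting points could then be passed to `model.inverse_transform` from the Quick Start to decode them back into feature space.

```python
def interpolate_latent(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the line from z_start to z_end (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 to 1.0
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


path = interpolate_latent([0.0, 1.0], [2.0, -1.0], steps=5)
for point in path:
    print(point)

# With the Quick Start objects available, each row could be decoded, for example:
# reconstructed_path = model.inverse_transform(np.array(path))
```

In practice the two endpoints would be rows of the latent features produced by `model.transform`, with one entry per latent dimension.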

---

## License

This project is licensed under the **BSD 3-Clause License**. See the `LICENSE` file for details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/xuxu-wei/SUAVE",
    "name": "suave-ml",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "VAE supervised-dim-reduction multi-task-learning pytorch sklearn deep learning",
    "author": "Xuxu Wei",
    "author_email": "wxxtcm@163.com",
    "download_url": "https://files.pythonhosted.org/packages/b9/cc/44dce00c800a53192cdd0fc7891b773e8c8d761b5bc16406ed7d177e4de5/suave_ml-0.1.2a1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD-3-Clause",
    "summary": "A deep learning model (hybrid VAE) implementation for label information-guided dimensionality reduction and multi-task learning.",
    "version": "0.1.2a1",
    "project_urls": {
        "Homepage": "https://github.com/xuxu-wei/SUAVE"
    },
    "split_keywords": [
        "vae",
        "supervised-dim-reduction",
        "multi-task-learning",
        "pytorch",
        "sklearn",
        "deep",
        "learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "16e1deef395079786e5029c41d96e3b0c97db2a7e51544fa7db4e6482bf72d2c",
                "md5": "cf3e4d281c9108fbb27188412316a5cf",
                "sha256": "c542b3f6d8eff256c2abbf5fdfc4818bccaa4afef6c83abfc674776ae1bcc392"
            },
            "downloads": -1,
            "filename": "suave_ml-0.1.2a1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cf3e4d281c9108fbb27188412316a5cf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 29499,
            "upload_time": "2025-01-09T00:08:56",
            "upload_time_iso_8601": "2025-01-09T00:08:56.723133Z",
            "url": "https://files.pythonhosted.org/packages/16/e1/deef395079786e5029c41d96e3b0c97db2a7e51544fa7db4e6482bf72d2c/suave_ml-0.1.2a1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b9cc44dce00c800a53192cdd0fc7891b773e8c8d761b5bc16406ed7d177e4de5",
                "md5": "8937788b1bc7e08a3ad539ef14cef7c5",
                "sha256": "e733ff3279b39c06c2903f5e476a13f6682faa98ed7ce0b514f858d0e8d0ea6e"
            },
            "downloads": -1,
            "filename": "suave_ml-0.1.2a1.tar.gz",
            "has_sig": false,
            "md5_digest": "8937788b1bc7e08a3ad539ef14cef7c5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 31775,
            "upload_time": "2025-01-09T00:09:00",
            "upload_time_iso_8601": "2025-01-09T00:09:00.959418Z",
            "url": "https://files.pythonhosted.org/packages/b9/cc/44dce00c800a53192cdd0fc7891b773e8c8d761b5bc16406ed7d177e4de5/suave_ml-0.1.2a1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-09 00:09:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "xuxu-wei",
    "github_project": "SUAVE",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.0.2"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.9.3"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.5"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.2.0"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        }
    ],
    "lcname": "suave-ml"
}
        