| Field | Value |
|-------|-------|
| Name | langvae |
| Version | 0.5.2 |
| Summary | LangVAE: Large Language VAEs made simple |
| Author | Danilo S. Carvalho |
| Maintainer | None |
| Home page | None |
| Project URLs | [Homepage](https://github.com/neuro-symbolic-ai/LangVAE), [Issues](https://github.com/neuro-symbolic-ai/LangVAE/issues) |
| Docs URL | None |
| Requires Python | >=3.9 |
| License | None |
| Keywords | vae, llm, generative, nlp |
| Upload time | 2024-07-23 10:17:57 |
# LangVAE: Large Language VAEs made simple
LangVAE is a Python library for training and running language models using Variational Autoencoders (VAEs). It provides an easy-to-use interface to train VAEs on text data, allowing users to customize the model architecture, loss function, and training parameters.
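At its core, a language VAE is trained on the evidence lower bound (ELBO): a token-level reconstruction term plus a KL term that pulls the encoder's approximate posterior toward the prior. The sketch below shows the standard objective for orientation only; the tensor names are hypothetical, and LangVAE's actual loss is computed inside its pythae-based model:
```python
import torch
import torch.nn.functional as F

def vae_loss(recon_logits, target_ids, mu, log_var, beta=1.0):
    """Standard VAE objective: reconstruction + beta-weighted KL (illustrative)."""
    # Token-level reconstruction: cross-entropy of decoded logits vs. target ids.
    recon = F.cross_entropy(
        recon_logits.view(-1, recon_logits.size(-1)),
        target_ids.view(-1),
    )
    # KL divergence between q(z|x) = N(mu, diag(sigma^2)) and the prior N(0, I).
    kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```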
## Installation
To install LangVAE, simply run:
```bash
pip install langvae
```
This will install all necessary dependencies and set up the package for use in your Python projects.
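To sanity-check the installation, a quick import of the main entry point (the same one used in the example below) should succeed:
```python
# Quick check that the package is importable after installation.
from langvae import LangVAE
print("LangVAE imported successfully")
```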
## Usage
Here's a basic example of how to train a VAE on text data using LangVAE:
```python
from pythae.models.vae import VAEConfig
from langvae import LangVAE
from langvae.encoders import SentenceEncoder
from langvae.decoders import SentenceDecoder
from langvae.data_conversion.tokenization import TokenizedDataSet
from langvae.pipelines import LanguageTrainingPipeline
from langvae.trainers import CyclicalScheduleKLThresholdTrainerConfig
from saf_datasets import EntailmentBankDataSet

DEVICE = "cuda"
LATENT_SIZE = 32
MAX_SENT_LEN = 32

# Load pre-trained sentence encoder and decoder models.
decoder = SentenceDecoder("gpt2", LATENT_SIZE, MAX_SENT_LEN, device=DEVICE)
encoder = SentenceEncoder("bert-base-cased", LATENT_SIZE, decoder.tokenizer, device=DEVICE)

# Select explanatory sentences from the EntailmentBank dataset.
dataset = [
    sent for sent in EntailmentBankDataSet()
    if (sent.annotations["type"] == "answer" or
        sent.annotations["type"].startswith("context"))
]

# Set up training and evaluation datasets with automatic tokenization.
eval_size = int(0.1 * len(dataset))
train_dataset = TokenizedDataSet(dataset[:-eval_size], decoder.tokenizer, decoder.max_len)
eval_dataset = TokenizedDataSet(dataset[-eval_size:], decoder.tokenizer, decoder.max_len)

# Define the VAE model configuration.
model_config = VAEConfig(
    input_dim=(train_dataset[0]["data"].shape[-2], train_dataset[0]["data"].shape[-1]),
    latent_dim=LATENT_SIZE
)

# Initialize the LangVAE model.
model = LangVAE(model_config, encoder, decoder)

# Train the VAE on explanatory sentences.
training_config = CyclicalScheduleKLThresholdTrainerConfig(
    output_dir="expl_vae",
    num_epochs=5,
    learning_rate=1e-4,
    per_device_train_batch_size=50,
    per_device_eval_batch_size=50,
    steps_saving=1,
    optimizer_cls="AdamW",
    scheduler_cls="ReduceLROnPlateau",
    scheduler_params={"patience": 5, "factor": 0.5},
    max_beta=1.0,
    n_cycles=40,
    target_kl=2.0
)

pipeline = LanguageTrainingPipeline(
    training_config=training_config,
    model=model
)

pipeline(
    train_data=train_dataset,
    eval_data=eval_dataset
)
```
This example loads pre-trained encoder and decoder models, defines a VAE model configuration, initializes the LangVAE model, and trains it on explanatory sentences from EntailmentBank using a training pipeline with cyclical KL annealing.
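The trainer configuration combines two common remedies for posterior collapse in text VAEs: cyclical annealing of the KL weight (`max_beta`, `n_cycles`) and a KL threshold (`target_kl`). The helpers below are a minimal sketch of these general techniques under hypothetical names; they are not LangVAE's actual trainer internals:
```python
import torch

def cyclical_beta(step, total_steps, n_cycles=40, max_beta=1.0, ratio=0.5):
    """Cyclical KL-weight schedule (Fu et al., 2019): within each cycle,
    beta ramps linearly from 0 to max_beta, then holds at max_beta."""
    period = total_steps / n_cycles
    position = (step % period) / period  # progress within the current cycle
    return max_beta * min(position / ratio, 1.0)

def thresholded_kl(kl_per_dim, target_kl=2.0):
    """Free-bits-style threshold: per-dimension KL below target_kl is clamped,
    so the optimizer gains nothing by collapsing those latent dimensions."""
    return torch.clamp(kl_per_dim, min=target_kl).sum()
```
Cycling beta back to zero repeatedly gives the decoder several chances to learn to use the latent code before the full KL penalty is applied, which is why this schedule tends to work better for text than a single monotonic ramp.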
## License
LangVAE is licensed under the GPLv3 License. See the LICENSE file for details.