transformer-lm-gan

Name	transformer-lm-gan
Version	0.0.3
home_page	None
Summary	Explorations into Transformer Language Model with Adversarial Loss
upload_time	2025-02-22 17:32:27
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	MIT License Copyright (c) 2025 Phil Wang Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	adversarial training artificial intelligence autoregressive transformer deep learning

            
## Language model with adversarial loss

Explorations into adversarial losses on top of the autoregressive loss for language modeling.

I tried this in the past, when GANs were still dominant, but at the time either I was too inexperienced or the research was not there yet. Either way, I could not get it working. I will give it another shot in the next few weeks, mainly to see whether an adversarial system could benefit [world modeling](https://github.com/lucidrains/improving-transformers-world-model-for-rl).

## Usage

```python
import torch

from transformer_lm_gan import (
    LanguageModelGenerator,
    Discriminator,
    GAN,
)

gan = GAN(
    strategy = 'gumbel_one_hot', # or 'rotate' for the rotation trick; may try a combination of the two if both fail in experiments
    generator = dict(
        num_tokens = 256,
        dim = 512,
        depth = 6,
        dim_head = 64,
        heads = 8,
        max_seq_len = 1024
    ),
    discriminator = dict(
        num_tokens = 256,
        dim = 512,
        depth = 2,
        dim_head = 64,
        heads = 9,
        max_seq_len = 1024
    )
).cuda()

# random token ids with shape (batch, seq_len)
seq = torch.randint(0, 256, (2, 1024)).cuda()

# discriminator loss
discr_loss = gan.discriminate_forward(seq)
discr_loss.backward()

# generator loss
gen_loss = gan.generate_forward(seq)
gen_loss.backward()
```
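
For intuition on the `gumbel_one_hot` strategy: the discriminator consumes discrete tokens, but sampling a token is not differentiable. One common workaround is the straight-through Gumbel-softmax, sketched below with PyTorch's `torch.nn.functional.gumbel_softmax`. This is a minimal sketch and not necessarily how this repository implements it; the helper name `gumbel_one_hot`, the tensor shapes, and the stand-in embedding table are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def gumbel_one_hot(logits, temperature = 1.):
    # hypothetical helper, not part of the transformer_lm_gan API
    # hard = True gives one-hot samples in the forward pass,
    # while gradients flow through the soft relaxation (straight-through)
    return F.gumbel_softmax(logits, tau = temperature, hard = True, dim = -1)

torch.manual_seed(0)

# stand-in for generator logits: (batch, seq_len, num_tokens)
logits = torch.randn(2, 8, 256, requires_grad = True)

one_hot = gumbel_one_hot(logits)

# stand-in for a discriminator token embedding
# one_hot @ weight selects the same rows as an embedding lookup,
# but remains differentiable with respect to the logits
token_emb = torch.nn.Embedding(256, 512)
embedded = one_hot @ token_emb.weight  # (2, 8, 512)

# stand-in scalar in place of a real discriminator score
embedded.sum().backward()

print(one_hot.shape, logits.grad.shape)
```

The point of the matrix multiply is that the generator receives a gradient from whatever the discriminator computes on `embedded`, which a plain `argmax` followed by an embedding lookup would not allow.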

## Citations

```bibtex
@inproceedings{Huang2025TheGI,
    title   = {The GAN is dead; long live the GAN! A Modern GAN Baseline},
    author  = {Yiwen Huang and Aaron Gokaslan and Volodymyr Kuleshov and James Tompkin},
    year    = {2025},
    url     = {https://api.semanticscholar.org/CorpusID:275405495}
}
```

```bibtex
@article{Fifty2024Restructuring,
    title   = {Restructuring Vector Quantization with the Rotation Trick},
    author  = {Christopher Fifty and Ronald G. Junkins and Dennis Duan and Aniketh Iyengar and Jerry W. Liu and Ehsan Amid and Sebastian Thrun and Christopher Ré},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2410.06424},
    url     = {https://api.semanticscholar.org/CorpusID:273229218}
}
```

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "transformer-lm-gan",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "adversarial training, artificial intelligence, autoregressive transformer, deep learning",
    "author": null,
    "author_email": "Phil Wang <lucidrains@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/d6/59/892a7a4e136895ec33698478bb029d5e26e284bc1f929d0b4a366c83357e/transformer_lm_gan-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2025 Phil Wang\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all\n        copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n        SOFTWARE.",
    "summary": "Explorations into Transformer Language Model with Adversarial Loss",
    "version": "0.0.3",
    "project_urls": {
        "Homepage": "https://pypi.org/project/transformer-lm-gan/",
        "Repository": "https://github.com/lucidrains/transformer-lm-gan"
    },
    "split_keywords": [
        "adversarial training",
        " artificial intelligence",
        " autoregressive transformer",
        " deep learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4a00d32bd0122c64dc34b134d0205b2eb84a11b3d79da7f69fbccd98e6ca60f1",
                "md5": "e536d2f6dbfd6910f651804383787e11",
                "sha256": "0bba4544f3b6965e9ba17ed000388ace30970abaf86c603d706c161c606fb961"
            },
            "downloads": -1,
            "filename": "transformer_lm_gan-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e536d2f6dbfd6910f651804383787e11",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 7097,
            "upload_time": "2025-02-22T17:32:24",
            "upload_time_iso_8601": "2025-02-22T17:32:24.094907Z",
            "url": "https://files.pythonhosted.org/packages/4a/00/d32bd0122c64dc34b134d0205b2eb84a11b3d79da7f69fbccd98e6ca60f1/transformer_lm_gan-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d659892a7a4e136895ec33698478bb029d5e26e284bc1f929d0b4a366c83357e",
                "md5": "00a2034e8227f08b4add93fc00b66dd2",
                "sha256": "66b4f327925c964882c0dc39b6e5fb9bf62c3b2e0a77f7e47efb126e1f090992"
            },
            "downloads": -1,
            "filename": "transformer_lm_gan-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "00a2034e8227f08b4add93fc00b66dd2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 36569174,
            "upload_time": "2025-02-22T17:32:27",
            "upload_time_iso_8601": "2025-02-22T17:32:27.328202Z",
            "url": "https://files.pythonhosted.org/packages/d6/59/892a7a4e136895ec33698478bb029d5e26e284bc1f929d0b4a366c83357e/transformer_lm_gan-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-22 17:32:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lucidrains",
    "github_project": "transformer-lm-gan",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "transformer-lm-gan"
}
