transformer-lm-gan


| Field | Value |
| --- | --- |
| Name | transformer-lm-gan |
| Version | 0.0.3 |
| Home page | None |
| Summary | Explorations into Transformer Language Model with Adversarial Loss |
| Upload time | 2025-02-22 17:32:27 |
| Maintainer | None |
| Docs URL | None |
| Author | None |
| Requires Python | >=3.9 |
| License | MIT License, Copyright (c) 2025 Phil Wang |
| Keywords | adversarial training, artificial intelligence, autoregressive transformer, deep learning |
| Requirements | No requirements were recorded. |
            
## Language model with adversarial loss

Explorations into adversarial losses on top of autoregressive loss for language modeling.

I tried this in the past, when GANs were still dominant, but at the time either I was too inexperienced or the research was not there. Either way, I could not get it working. I will give it another shot in the next few weeks, mainly to see if an adversarial system could benefit [world modeling](https://github.com/lucidrains/improving-transformers-world-model-for-rl)

## Usage

```python
import torch

from transformer_lm_gan import (
    LanguageModelGenerator,
    Discriminator,
    GAN,
)

gan = GAN(
    strategy = 'gumbel_one_hot', # or 'rotate' for rotation trick, may try a combination of the two if both fail in experiments
    generator = dict(
        num_tokens = 256,
        dim = 512,
        depth = 6,
        dim_head = 64,
        heads = 8,
        max_seq_len = 1024
    ),
    discriminator = dict(
        num_tokens = 256,
        dim = 512,
        depth = 2,
        dim_head = 64,
        heads = 9,
        max_seq_len = 1024
    )
).cuda()

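# a batch of 2 random token sequences standing in for real data
seq = torch.randint(0, 256, (2, 1024)).cuda()

# discriminator step
discr_loss = gan.discriminate_forward(seq)
discr_loss.backward()

# generator step
gen_loss = gan.generate_forward(seq)
gen_loss.backward()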
```
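The `'gumbel_one_hot'` strategy is not described further in this README. As a rough illustration of what the name suggests, here is a minimal pure-Python sketch of Gumbel-max sampling, which draws a one-hot token from `softmax(logits)` by adding Gumbel noise and taking the argmax. The function name `gumbel_one_hot` and its arguments are hypothetical and are not taken from this package; the package's own implementation may differ, and this sketch covers only the forward sampling, not the gradient path back to the generator.

```python
import math
import random

def gumbel_one_hot(logits, rng=random):
    """Hypothetical helper: sample a one-hot vector from softmax(logits)
    using the Gumbel-max trick (argmax of logits plus Gumbel noise)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(u)) with u ~ Uniform(0, 1);
        # clamp u away from 0 so the logarithm is always defined
        u = max(rng.random(), 1e-12)
        noisy.append(logit - math.log(-math.log(u)))
    winner = max(range(len(logits)), key=noisy.__getitem__)
    return [1.0 if i == winner else 0.0 for i in range(len(logits))]

# Empirical check: sample frequencies should approach softmax(logits),
# which is [0.25, 0.75] for logits [0, log 3]
logits = [0.0, math.log(3.0)]
rng = random.Random(0)
counts = [0, 0]
for _ in range(10_000):
    counts[gumbel_one_hot(logits, rng).index(1.0)] += 1
print([c / 10_000 for c in counts])
```

In PyTorch, `torch.nn.functional.gumbel_softmax(logits, tau=1.0, hard=True)` returns one-hot samples of this kind while passing gradients through the soft relaxation, which is what lets a discriminator's loss reach the generator through discrete tokens. Whether this package uses that function is an assumption, not something the README states.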

## Citations

```bibtex
@inproceedings{Huang2025TheGI,
    title   = {The GAN is dead; long live the GAN! A Modern GAN Baseline},
    author  = {Yiwen Huang and Aaron Gokaslan and Volodymyr Kuleshov and James Tompkin},
    year    = {2025},
    url     = {https://api.semanticscholar.org/CorpusID:275405495}
}
```

```bibtex
@article{Fifty2024Restructuring,
    title   = {Restructuring Vector Quantization with the Rotation Trick},
    author  = {Christopher Fifty and Ronald G. Junkins and Dennis Duan and Aniketh Iyengar and Jerry W. Liu and Ehsan Amid and Sebastian Thrun and Christopher Ré},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2410.06424},
    url     = {https://api.semanticscholar.org/CorpusID:273229218}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "transformer-lm-gan",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "adversarial training, artificial intelligence, autoregressive transformer, deep learning",
    "author": null,
    "author_email": "Phil Wang <lucidrains@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/d6/59/892a7a4e136895ec33698478bb029d5e26e284bc1f929d0b4a366c83357e/transformer_lm_gan-0.0.3.tar.gz",
    "platform": null,
    "description": "\n## Language model with adversarial loss\n\nExplorations into adversarial losses on top of autoregressive loss for language modeling\n\nI have tried this in the past, when GANs were still dominant. But at the time I was either too inexperienced or the research not there. Either way could not get it working. Will give it another shot in the next few weeks, mainly to see if an adversarial system could benefit [world modeling](https://github.com/lucidrains/improving-transformers-world-model-for-rl)\n\n## Usage\n\n```python\nimport torch\n\nfrom transformer_lm_gan import (\n    LanguageModelGenerator,\n    Discriminator,\n    GAN,\n)\n\ngan = GAN(\n    strategy = 'gumbel_one_hot', # or 'rotate' for rotation trick, may try combination of two if both fails in experiments\n    generator = dict(\n        num_tokens = 256,\n        dim = 512,\n        depth = 6,\n        dim_head = 64,\n        heads = 8,\n        max_seq_len = 1024\n    ),\n    discriminator = dict(\n        num_tokens = 256,\n        dim = 512,\n        depth = 2,\n        dim_head = 64,\n        heads = 9,\n        max_seq_len = 1024\n    )\n).cuda()\n\nseq = torch.randint(0, 256, (2, 1024)).cuda()\n\ndiscr_loss = gan.discriminate_forward(seq)\ndiscr_loss.backward()\n\ngen_loss = gan.generate_forward(seq)\ngen_loss.backward()\n```\n\n## Citations\n\n```bibtex\n@inproceedings{Huang2025TheGI,\n    title   = {The GAN is dead; long live the GAN! A Modern GAN Baseline},\n    author  = {Yiwen Huang and Aaron Gokaslan and Volodymyr Kuleshov and James Tompkin},\n    year    = {2025},\n    url     = {https://api.semanticscholar.org/CorpusID:275405495}\n}\n```\n\n```bibtex\n@article{Fifty2024Restructuring,\n    title   = {Restructuring Vector Quantization with the Rotation Trick},\n    author  = {Christopher Fifty, Ronald G. Junkins, Dennis Duan, Aniketh Iyengar, Jerry W. 
Liu, Ehsan Amid, Sebastian Thrun, Christopher R\u00e9},\n    journal = {ArXiv},\n    year    = {2024},\n    volume  = {abs/2410.06424},\n    url     = {https://api.semanticscholar.org/CorpusID:273229218}\n}\n```\n",
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2025 Phil Wang\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all\n        copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n        SOFTWARE.",
    "summary": "Explorations into Transformer Language Model with Adversarial Loss",
    "version": "0.0.3",
    "project_urls": {
        "Homepage": "https://pypi.org/project/transformer-lm-gan/",
        "Repository": "https://github.com/lucidrains/transformer-lm-gan"
    },
    "split_keywords": [
        "adversarial training",
        " artificial intelligence",
        " autoregressive transformer",
        " deep learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4a00d32bd0122c64dc34b134d0205b2eb84a11b3d79da7f69fbccd98e6ca60f1",
                "md5": "e536d2f6dbfd6910f651804383787e11",
                "sha256": "0bba4544f3b6965e9ba17ed000388ace30970abaf86c603d706c161c606fb961"
            },
            "downloads": -1,
            "filename": "transformer_lm_gan-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e536d2f6dbfd6910f651804383787e11",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 7097,
            "upload_time": "2025-02-22T17:32:24",
            "upload_time_iso_8601": "2025-02-22T17:32:24.094907Z",
            "url": "https://files.pythonhosted.org/packages/4a/00/d32bd0122c64dc34b134d0205b2eb84a11b3d79da7f69fbccd98e6ca60f1/transformer_lm_gan-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d659892a7a4e136895ec33698478bb029d5e26e284bc1f929d0b4a366c83357e",
                "md5": "00a2034e8227f08b4add93fc00b66dd2",
                "sha256": "66b4f327925c964882c0dc39b6e5fb9bf62c3b2e0a77f7e47efb126e1f090992"
            },
            "downloads": -1,
            "filename": "transformer_lm_gan-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "00a2034e8227f08b4add93fc00b66dd2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 36569174,
            "upload_time": "2025-02-22T17:32:27",
            "upload_time_iso_8601": "2025-02-22T17:32:27.328202Z",
            "url": "https://files.pythonhosted.org/packages/d6/59/892a7a4e136895ec33698478bb029d5e26e284bc1f929d0b4a366c83357e/transformer_lm_gan-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-22 17:32:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lucidrains",
    "github_project": "transformer-lm-gan",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "transformer-lm-gan"
}
        