mambatransformer


Name: mambatransformer
Version: 0.0.4
Home page: https://github.com/kyegomez/MambaTransformer
Summary: MambaTransformer - Pytorch
Upload time: 2024-01-13 18:16:33
Docs URL: None
Author: Kye Gomez
Requires Python: >=3.6,<4.0
License: MIT
Keywords: artificial intelligence, deep learning, optimizers, prompt engineering
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Mamba Transformer

![Mamba Transformer](/mm_transformer.png)

Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling.

This is a novel architecture that I designed to combine the strengths of SSMs and attention while offsetting the weaknesses of each. The goals are faster processing, longer context lengths, lower perplexity over long sequences, and stronger reasoning, all while keeping the model small and compact.

The architecture is essentially: `x -> norm -> mamba -> norm -> transformer -> norm -> ffn -> norm -> out`.

I added many normalization layers because I believe training stability would otherwise be severely degraded by integrating two unrelated architectures.
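
To make the layer ordering above concrete, here is a minimal sketch of a single block in plain PyTorch. It is not the package's implementation: `HybridBlockSketch` is a hypothetical name, an `nn.GRU` stands in for the Mamba/SSM layer so the example runs without extra dependencies, and residual connections are omitted because the pipeline above does not mention them.

```python
import torch
from torch import nn


class HybridBlockSketch(nn.Module):
    """Illustrative ordering only: norm -> mixer -> norm -> attention -> norm -> ffn -> norm."""

    def __init__(self, dim: int = 64, heads: int = 4, ff_mult: int = 4):
        super().__init__()
        self.norm_in = nn.LayerNorm(dim)
        # ASSUMPTION: a GRU stands in for the Mamba/SSM layer so the sketch runs
        # with plain PyTorch; it is not a selective state-space model.
        self.ssm_stand_in = nn.GRU(dim, dim, batch_first=True)
        self.norm_attn = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_ffn = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * ff_mult),
            nn.GELU(),
            nn.Linear(dim * ff_mult, dim),
        )
        self.norm_out = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm_in(x)
        x, _ = self.ssm_stand_in(x)  # sequence mixing (stand-in for Mamba)
        x = self.norm_attn(x)
        x, _ = self.attn(x, x, x, need_weights=False)  # self-attention
        x = self.norm_ffn(x)
        x = self.ffn(x)
        return self.norm_out(x)


block = HybridBlockSketch()
tokens = torch.randn(2, 10, 64)  # (batch, sequence length, dim)
out = block(tokens)
print(out.shape)
```

The output keeps the input's `(batch, sequence length, dim)` shape, so blocks like this can be stacked.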


## Install
`pip3 install mambatransformer`
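
Note that the distribution is named `mambatransformer`, while the module imported in the usage example is `mamba_transformer`. A small sketch for checking which version is installed, using only the standard library (`importlib.metadata` requires Python 3.8 or later; `installed_version` is a hypothetical helper name):

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Look up by distribution name (as used with pip), not by import name.
print(installed_version("mambatransformer"))
```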


## Usage
```python
import torch
from mamba_transformer import MambaTransformer

# Generate a random tensor of shape (1, 10) with values between 0 and 99
x = torch.randint(0, 100, (1, 10))

# Create an instance of the MambaTransformer model
model = MambaTransformer(
    num_tokens=100,  # Vocabulary size (token ids are in [0, num_tokens))
    dim=512,  # Dimension of the model
    heads=8,  # Number of attention heads
    depth=4,  # Number of transformer layers
    dim_head=64,  # Dimension of each attention head
    d_state=512,  # Dimension of the state
    dropout=0.1,  # Dropout rate
    ff_mult=4,  # Multiplier for the feed-forward layer dimension
    return_embeddings=False,  # Whether to return the embeddings
    transformer_depth=2,  # Number of transformer blocks
    mamba_depth=10,  # Number of Mamba blocks
    use_linear_attn=True,  # Whether to use linear attention
)

# Pass the input tensor through the model and print the output shape
out = model(x)

print(out.shape)


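# Switch to evaluation mode (disables dropout) before inference;
# call model.train() instead when training
model.eval()

# Generation: `text` is not defined in this snippet and its expected type is
# not documented here, so supply your own input before uncommenting
# model.generate(text)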
```
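
The snippet above only runs a forward pass. Below is a minimal sketch of one next-token training step. It assumes the model maps `(batch, seq)` token ids to `(batch, seq, vocab)` logits, which this README does not confirm for `MambaTransformer`. A tiny embedding-plus-linear model is used as a stand-in so the example runs with plain PyTorch, and the optimizer and learning rate are illustrative choices.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
vocab_size, dim = 100, 32

# ASSUMPTION: stand-in model with the (batch, seq) -> (batch, seq, vocab) contract;
# swap in the real model only if it follows the same contract.
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (4, 11))   # (batch, sequence length + 1)
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets are inputs shifted by one

model.train()                                    # enable training-mode behavior
logits = model(inputs)                           # (batch, seq, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

optimizer.zero_grad()
loss.backward()
optimizer.step()

model.eval()                                     # back to evaluation mode for inference
print(f"loss: {loss.item():.4f}")
```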

## License
MIT




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/kyegomez/MambaTransformer",
    "name": "mambatransformer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6,<4.0",
    "maintainer_email": "",
    "keywords": "artificial intelligence,deep learning,optimizers,Prompt Engineering",
    "author": "Kye Gomez",
    "author_email": "kye@apac.ai",
    "download_url": "https://files.pythonhosted.org/packages/c8/99/92c39da8c4038b2ebb78628ae821c3f1f63a654bf6854453fa2e924a207e/mambatransformer-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "MambaTransformer - Pytorch",
    "version": "0.0.4",
    "project_urls": {
        "Documentation": "https://github.com/kyegomez/MambaTransformer",
        "Homepage": "https://github.com/kyegomez/MambaTransformer",
        "Repository": "https://github.com/kyegomez/MambaTransformer"
    },
    "split_keywords": [
        "artificial intelligence",
        "deep learning",
        "optimizers",
        "prompt engineering"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fc24fd3afd92f98e8c2f084ef00e70a4bdc2c3bb5467d3578d4f76e76396096a",
                "md5": "8544846ff9a4958529f9ca98de2623c5",
                "sha256": "83067b44708ab0f56fd6a02e03acde2df87f929061a7e022160e2c8466f3c673"
            },
            "downloads": -1,
            "filename": "mambatransformer-0.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8544846ff9a4958529f9ca98de2623c5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6,<4.0",
            "size": 6088,
            "upload_time": "2024-01-13T18:16:32",
            "upload_time_iso_8601": "2024-01-13T18:16:32.745135Z",
            "url": "https://files.pythonhosted.org/packages/fc/24/fd3afd92f98e8c2f084ef00e70a4bdc2c3bb5467d3578d4f76e76396096a/mambatransformer-0.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c89992c39da8c4038b2ebb78628ae821c3f1f63a654bf6854453fa2e924a207e",
                "md5": "9b29ce77860fe92ccd09e86dda0187ae",
                "sha256": "0e184403a7b76210f5b1352244a7e907d2a4adf950ce63e72bba4660a5300a30"
            },
            "downloads": -1,
            "filename": "mambatransformer-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "9b29ce77860fe92ccd09e86dda0187ae",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6,<4.0",
            "size": 5840,
            "upload_time": "2024-01-13T18:16:33",
            "upload_time_iso_8601": "2024-01-13T18:16:33.859770Z",
            "url": "https://files.pythonhosted.org/packages/c8/99/92c39da8c4038b2ebb78628ae821c3f1f63a654bf6854453fa2e924a207e/mambatransformer-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-13 18:16:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kyegomez",
    "github_project": "MambaTransformer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "mambatransformer"
}
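
The `digests` entries above can be used to check a downloaded file. Here is a minimal sketch using only the standard library. The local file path is hypothetical and `sha256_of` and `matches_recorded_digest` are illustrative helper names; the expected value is the wheel's `sha256` from the metadata above.

```python
import hashlib
from pathlib import Path

# SHA-256 of mambatransformer-0.0.4-py3-none-any.whl as recorded in the metadata above.
EXPECTED_SHA256 = "83067b44708ab0f56fd6a02e03acde2df87f929061a7e022160e2c8466f3c673"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA-256 with the recorded hex digest."""
    return sha256_of(path) == expected.lower()


# Hypothetical local path; point this at wherever the wheel was downloaded.
wheel = Path("mambatransformer-0.0.4-py3-none-any.whl")
if wheel.exists():
    print("digest OK" if matches_recorded_digest(wheel) else "digest MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```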
        