# cnngpt

- **Name**: cnngpt
- **Version**: 0.0.1
- **Home page**: https://github.com/kyegomez/CNNGPT
- **Summary**: CNNGPT - CNNGPT
- **Author**: Kye Gomez
- **License**: MIT
- **Requires Python**: <4.0,>=3.10
- **Keywords**: artificial intelligence, deep learning, optimizers, prompt engineering
- **Requirements**: torch, zetascale, swarms
- **Upload time**: 2024-09-18 05:59:02
            [![Multi-Modality](agorabanner.png)](https://discord.com/servers/agora-999382051935506503)

# CNN-Based Language Model

[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)



## Detailed Explanation of Each Step

### Initialization Parameters

- **`vocab_size`**: The size of the vocabulary (number of unique tokens).
- **`embedding_dim`**: The dimension of the embeddings.
- **`num_layers`**: The number of convolutional layers.
- **`kernel_size`**: The size of the convolutional kernels.
- **`hidden_dim`**: The dimension of the hidden representations (should match `embedding_dim` for residual connections).
- **`max_seq_len`**: The maximum sequence length the model can handle.
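This description does not include the package's source, so here is a minimal sketch of how these parameters might map onto a PyTorch module. The class name `CNNLanguageModel` and the attribute names are assumptions for illustration, not the package's actual API:

```python
import torch
import torch.nn as nn


class CNNLanguageModel(nn.Module):
    """Hypothetical skeleton wiring the hyperparameters above together."""

    def __init__(self, vocab_size, embedding_dim, num_layers,
                 kernel_size, hidden_dim, max_seq_len):
        super().__init__()
        # residual connections require matching dimensions (see Important Notes)
        assert embedding_dim == hidden_dim
        self.token_emb = nn.Embedding(vocab_size, embedding_dim)
        # one learnable positional vector per position, up to max_seq_len
        self.pos_emb = nn.Parameter(torch.zeros(1, max_seq_len, embedding_dim))
        # each conv emits 2*hidden_dim channels so a GLU can halve them back;
        # dilation doubles per layer to grow the receptive field exponentially
        self.layers = nn.ModuleList(
            nn.Conv1d(hidden_dim, 2 * hidden_dim, kernel_size, dilation=2 ** i)
            for i in range(num_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(hidden_dim)
                                   for _ in range(num_layers))
        self.out_proj = nn.Linear(hidden_dim, vocab_size)
```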

### Embedding and Positional Encoding

- **Embeddings**: Converts token IDs to dense vectors.
- **Positional Encoding**: Adds a learnable positional embedding to each token embedding (see the sketch below).
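In isolation, the lookup-and-add step might look like this (sizes and names are illustrative):

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim, max_seq_len = 1000, 64, 128
token_emb = nn.Embedding(vocab_size, embedding_dim)
pos_emb = nn.Parameter(torch.zeros(1, max_seq_len, embedding_dim))

x = torch.randint(0, vocab_size, (2, 50))      # [batch_size, seq_len] token IDs
h = token_emb(x) + pos_emb[:, : x.size(1), :]  # [batch_size, seq_len, embedding_dim]
```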

### Convolutional Blocks

- **Causal Convolution**: Uses padding on the left to ensure that the convolution at time `t` does not depend on future time steps.
- **Dilation**: Expands the receptive field exponentially, allowing the model to capture long-term dependencies.
- **GLU Activation**: Introduces a gating mechanism that can control the flow of information.
  - The output of the convolution is split into two halves along the channel dimension.
  - One half is passed through a sigmoid function to act as a gate for the other half.
- **Layer Normalization**: Normalizes the outputs to improve training stability.
- **Residual Connections**: Adds the block's input to its output, making deeper networks easier to train (see the one-block sketch below).
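Putting these five ingredients together, one convolutional block could be sketched as follows. This is an illustrative module built from left-padded `nn.Conv1d` plus `F.glu`; the package's actual block may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConvBlock(nn.Module):
    """One block as described above: causal dilated conv -> GLU -> LayerNorm -> residual."""

    def __init__(self, dim, kernel_size, dilation):
        super().__init__()
        # pad this many steps on the left so position t never sees t+1, t+2, ...
        self.left_pad = (kernel_size - 1) * dilation
        # 2*dim output channels: the GLU splits them into a value half and a gate half
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size, dilation=dilation)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                   # x: [batch_size, dim, seq_len]
        residual = x
        h = F.pad(x, (self.left_pad, 0))    # left-only padding preserves causality
        h = self.conv(h)                    # [batch_size, 2*dim, seq_len]
        h = F.glu(h, dim=1)                 # sigmoid(gate) * value -> [batch_size, dim, seq_len]
        # LayerNorm acts on the feature dimension, so transpose around it
        h = self.norm(h.transpose(1, 2)).transpose(1, 2)
        return h + residual                 # residual connection
```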

### Output Layer

- **Projection**: Maps the final hidden states to the vocabulary space to produce logits for each token.
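In PyTorch this is simply a per-position linear layer (a sketch with illustrative sizes):

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size = 64, 1000
out_proj = nn.Linear(hidden_dim, vocab_size)

h = torch.randn(2, 50, hidden_dim)  # [batch_size, seq_len, hidden_dim]
logits = out_proj(h)                # [batch_size, seq_len, vocab_size]
```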

## Handling Tensor Sizes

Throughout the network, we carefully manage tensor shapes to maintain consistency:

- After embedding and positional encoding: `[batch_size, seq_len, embedding_dim]`
- Before convolution: Transposed to `[batch_size, embedding_dim, seq_len]`
- After convolution and GLU: `[batch_size, hidden_dim, seq_len]`
- After layer normalization and residual connection: Same shape as input to convolution for residual addition.
- Before output layer: Transposed back to `[batch_size, seq_len, hidden_dim]` (equal to `embedding_dim`)
- Output logits: `[batch_size, seq_len, vocab_size]`
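Continuing the hypothetical `CNNLanguageModel` skeleton from above, a `forward` method that realizes these transitions might read as follows (again an assumption, not the package's actual code):

```python
import torch.nn.functional as F

def forward(self, x):                        # x: [batch_size, seq_len] token IDs
    h = self.token_emb(x) + self.pos_emb[:, : x.size(1), :]
    h = h.transpose(1, 2)                    # [batch_size, embedding_dim, seq_len]
    for conv, norm in zip(self.layers, self.norms):
        residual = h
        pad = (conv.kernel_size[0] - 1) * conv.dilation[0]
        h = conv(F.pad(h, (pad, 0)))         # causal: pad on the left only
        h = F.glu(h, dim=1)                  # [batch_size, hidden_dim, seq_len]
        h = norm(h.transpose(1, 2)).transpose(1, 2)
        h = h + residual                     # shape unchanged, ready for the next block
    h = h.transpose(1, 2)                    # [batch_size, seq_len, hidden_dim]
    return self.out_proj(h)                  # [batch_size, seq_len, vocab_size]
```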

## Important Notes

- **Causality**: By appropriately padding and slicing the convolution outputs, we ensure that the model does not use future information when predicting the current time step (see the sanity check below).
- **Residual Connections**: The `embedding_dim` and `hidden_dim` must be equal to correctly add the residual connection.
- **Layer Normalization**: Applied over the feature dimension; we transpose the tensor to `[batch_size, seq_len, hidden_dim]` before applying `LayerNorm`.
- **GLU Activation Function**: The gating mechanism enhances the model's capacity to model complex patterns.
- **Flexibility**: The model can handle sequences shorter than `max_seq_len`; positional encodings are sliced accordingly.
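A quick way to sanity-check the causality claim, assuming the hypothetical `CNNLanguageModel` sketch above with its `forward` method attached: perturb a future token and verify that the logits at all earlier positions are unchanged.

```python
import torch

model = CNNLanguageModel(vocab_size=100, embedding_dim=32, num_layers=2,
                         kernel_size=3, hidden_dim=32, max_seq_len=64)
model.eval()

x = torch.randint(0, 100, (1, 20))
x_edited = x.clone()
x_edited[0, -1] = (x_edited[0, -1] + 1) % 100  # change only the last token

with torch.no_grad():
    a, b = model(x), model(x_edited)

# every position before the edited token must produce identical logits
assert torch.allclose(a[:, :-1], b[:, :-1]), "future information leaked!"
```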

## Conclusion

We have successfully translated the detailed algorithm into a PyTorch implementation, carefully following each step and ensuring that the code aligns with the design principles outlined earlier. This CNN-based language model leverages causal and dilated convolutions, gated activations, residual connections, and layer normalization to effectively model textual data for generation tasks.

By understanding each component and its role in the model, we can appreciate how this architecture captures both local and global dependencies in language, offering a powerful alternative to traditional models in natural language processing.
            
