[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# Qwen-VL
My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities". The official model code has not been released yet, so this is a from-scratch reimplementation.
For more details, please refer to the [full paper](https://doi.org/10.48550/arXiv.2308.12966).
# Install
`pip3 install qwen`
---
# Usage
```python
import torch
from qwen import Qwen

# Create an instance of the Qwen model
model = Qwen()

# Random token IDs (vocabulary size 20000) and a random image tensor
text = torch.randint(0, 20000, (1, 1024))
img = torch.randn(1, 3, 256, 256)

# Forward pass: logits over the vocabulary for each text position
out = model(img, text)  # (1, 1024, 20000)
```
# Todo
- [ ] Position-aware vision-language adapter that compresses image features: a single-layer cross-attention module, initialized randomly, uses a group of trainable embeddings as query vectors and the image features from the visual encoder as keys for the cross-attention ops. 2D absolute positional encodings are integrated into the cross-attention's query-key pairs. The output is a visual feature sequence compressed to a fixed length of 256, which is fed into the decoder LLM.
- [ ] Bounding boxes: for any given accurate bounding box, a normalization process maps its coordinates into the range [0, 1000], and the result is transformed into the string format "(X_topleft, Y_topleft)(X_bottomright, Y_bottomright)". The string is tokenized as ordinary text and does not require a positional vocabulary. To distinguish detection strings from regular text strings, two special tokens, <box> and </box>, are added at the beginning and end of the bounding box string, and another set of special tokens (<ref> and </ref>) is introduced.
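The adapter described above could be sketched roughly as follows. This is my own minimal interpretation, not the released implementation: the class name, dimensions, and the choice to add positional encodings to the keys only are assumptions for illustration.

```python
import torch
from torch import nn


class PositionAwareAdapter(nn.Module):
    """Sketch of a position-aware vision-language adapter: a single
    cross-attention layer with a fixed set of trainable query embeddings
    that compresses a variable-length visual feature sequence to a
    fixed length (256 in the paper)."""

    def __init__(self, dim=1024, num_queries=256, heads=8, grid=16):
        super().__init__()
        # Trainable query embeddings, randomly initialized
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        # 2D absolute positional encodings for the image feature grid,
        # flattened to (grid * grid, dim)
        self.pos_embed = nn.Parameter(torch.randn(grid * grid, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_feats):
        # img_feats: (batch, seq_len, dim) from the visual encoder
        b, n, _ = img_feats.shape
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        # Integrate positional encodings into the keys
        k = img_feats + self.pos_embed[:n]
        out, _ = self.attn(q, k, img_feats)
        return out  # (batch, num_queries, dim) -> fed to the decoder LLM
```

The fixed-length output means the LLM's context cost for the image is constant regardless of input resolution.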
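The bounding-box normalization could look something like the helper below. The function name and exact string punctuation are my assumptions; the [0, 1000] range and the <box>...</box> wrapping follow the description above.

```python
def box_to_string(box, width, height):
    """Normalize a bounding box (pixel coordinates) into [0, 1000] and
    render it as a detection string wrapped in <box>...</box> tokens.
    box is (x_topleft, y_topleft, x_bottomright, y_bottomright)."""
    x1, y1, x2, y2 = box
    nx1 = round(x1 / width * 1000)
    ny1 = round(y1 / height * 1000)
    nx2 = round(x2 / width * 1000)
    ny2 = round(y2 / height * 1000)
    # The string is tokenized as plain text; no extra positional vocabulary
    return f"<box>({nx1},{ny1})({nx2},{ny2})</box>"
```

For example, a box covering the top-left quarter of a 1024x1024 image becomes `<box>(0,0)(500,500)</box>`.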
# Citations
Please use the following to cite this work:
```bibtex
@article{bai2023qwen,
title={Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities},
author={Bai, Jinze and Bai, Shuai and Yang, Shusheng and Wang, Shijie and Tan, Sinan and Wang, Peng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
journal={arXiv preprint arXiv:2308.12966},
year={2023},
url={https://doi.org/10.48550/arXiv.2308.12966}
}
```