ProtFlash

Name	ProtFlash
Version	0.1.1
home_page	http://github.com/isyslab-hust/ProtFlash
Summary	protein language model
upload_time	2023-07-03 13:35:55
maintainer
docs_url	None
author	Lei Wang
requires_python
license	MIT
keywords	protein language model
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

## ProtFlash: A lightweight protein language model
[![PyPI - Version](https://img.shields.io/pypi/v/ProtFlash.svg?style=flat)](https://pypi.org/project/ProtFlash/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/ProtFlash.svg)](https://pypi.org/project/ProtFlash/) [![GitHub - LICENSE](https://img.shields.io/github/license/isyslab-hust/ProtFlash.svg?style=flat)](./LICENSE) ![PyPI - Downloads](https://img.shields.io/pypi/dm/ProtFlash) [![Wheel](https://img.shields.io/pypi/wheel/ProtFlash)](https://pypi.org/project/ProtFlash/) ![build](https://img.shields.io/github/actions/workflow/status/isyslab-hust/ProtFlash/publish_to_pypi.yml)

### Install 
As a prerequisite, you must have PyTorch installed to use this package.

Install either the latest development version directly from GitHub or the stable release from PyPI:
```
# latest version
pip install git+https://github.com/isyslab-hust/ProtFlash

# stable version
pip install ProtFlash
```
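The page metadata records no package requirements, so pip is not guaranteed to install PyTorch for you. The minimal pre-flight check below uses only the standard library and confirms that PyTorch can be found before you install ProtFlash. The helper name `missing_prerequisites` is hypothetical:

```python
import importlib.util


def missing_prerequisites(modules=("torch",)):
    """Return the names of top-level modules that cannot be found on the Python path.

    find_spec only locates a module; it does not import it, so the check is cheap.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("PyTorch found; ProtFlash can be installed.")
```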

### Model details
| **Model** | **Parameters** | **Hidden size** | **Pretraining dataset** | **# of proteins** | **Download** |
|:---:|:---:|:---:|:---:|:---:|:---:|
| ProtFlash-base | 174M | 768 | [UniRef100](https://www.uniprot.org/downloads) | 51M | [protflash_large.pt](https://zenodo.org/record/7655858/files/protflash_large.pt) |
| ProtFlash-small | 79M | 512 | [UniRef50](https://www.uniprot.org/downloads) | 51M | [flash_protein.pt](https://zenodo.org/record/7655858/files/flash_protein.pt) |
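
The parameter counts in the table translate directly into memory requirements. The back-of-the-envelope sketch below assumes 4 bytes per parameter for 32-bit floats and 2 bytes for half precision. It covers the weights alone; activations add more, depending on batch size and sequence length, and the actual checkpoint files may differ slightly because they also store metadata:

```python
# Parameter counts taken from the table above.
PARAMS = {"ProtFlash-base": 174_000_000, "ProtFlash-small": 79_000_000}


def weight_megabytes(num_params: int, bytes_per_param: int = 4) -> float:
    """Return the size of num_params parameters in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


if __name__ == "__main__":
    for name, n in PARAMS.items():
        fp32 = weight_megabytes(n)
        fp16 = weight_megabytes(n, bytes_per_param=2)
        print(f"{name}: ~{fp32:.0f} MB in fp32, ~{fp16:.0f} MB in fp16")
```

With these assumptions, ProtFlash-base needs roughly 696 MB for fp32 weights and ProtFlash-small roughly 316 MB.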

### Usage

#### Protein sequence embedding
```
import torch
from ProtFlash.pretrain import load_prot_flash_base
from ProtFlash.utils import batchConverter

# (name, amino-acid sequence) pairs
data = [
    ("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
    ("protein2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
]
# Convert to sequence ids, a padded token batch, and the per-sequence lengths
ids, batch_token, lengths = batchConverter(data)

model = load_prot_flash_base()
model.eval()  # disable training-only behaviour such as dropout

# Inference only: gradients are not needed
with torch.no_grad():
    token_embedding = model(batch_token, lengths)

# Generate per-sequence representations via averaging
sequence_representations = []
for i, (_, seq) in enumerate(data):
    sequence_representations.append(token_embedding[i, 0: len(seq) + 1].mean(0))
```
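
The averaging step depends on the batch being padded to a common length, so only each sequence's own positions should contribute to its representation. The plain-Python sketch below illustrates that idea; it is not ProtFlash's implementation. The vocabulary, the padding id `0`, and the helper names `pad_batch` and `masked_mean` are hypothetical. If the real tokenizer adds special tokens, the window of valid positions shifts accordingly:

```python
from typing import List, Sequence, Tuple

# Hypothetical vocabulary for illustration only; index 0 is reserved for padding.
# ProtFlash's real token ids live in ProtFlash.utils and may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"
TOKEN_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD_ID = 0


def pad_batch(data: Sequence[Tuple[str, str]]) -> Tuple[List[str], List[List[int]], List[int]]:
    """Encode (name, sequence) pairs and right-pad them to the longest sequence."""
    ids = [name for name, _ in data]
    encoded = [[TOKEN_ID[aa] for aa in seq] for _, seq in data]
    lengths = [len(tokens) for tokens in encoded]
    width = max(lengths)
    batch = [tokens + [PAD_ID] * (width - len(tokens)) for tokens in encoded]
    return ids, batch, lengths


def masked_mean(embeddings: List[List[List[float]]], lengths: List[int]) -> List[List[float]]:
    """Average each sequence's vectors over its first `length` positions only,
    so padding positions do not dilute the per-sequence representation."""
    pooled = []
    for rows, n in zip(embeddings, lengths):
        if n <= 0:
            raise ValueError("each sequence needs at least one valid position")
        valid = rows[:n]
        dim = len(valid[0])
        pooled.append([sum(vec[d] for vec in valid) / n for d in range(dim)])
    return pooled


if __name__ == "__main__":
    ids, batch, lengths = pad_batch([("p1", "MKT"), ("p2", "KA")])
    print(ids, batch, lengths)
```

In the usage example, `"KA"` is padded with one `PAD_ID` so that both rows have length 3. Because `masked_mean` receives `lengths`, any vectors at padded positions are ignored.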

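#### Loading weight files
```
import torch
from ProtFlash.model import FLASHTransformer

# Path to a downloaded checkpoint, e.g. one of the files listed under "Model details"
your_parameters_file = "path/to/protflash_large.pt"

# map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only machine
model_data = torch.load(your_parameters_file, map_location="cpu")
hyper_parameter = model_data["hyper_parameters"]

model = FLASHTransformer(
    hyper_parameter['dim'],
    hyper_parameter['num_tokens'],
    hyper_parameter['num_layers'],
    group_size=hyper_parameter['num_tokens'],
    query_key_dim=hyper_parameter['qk_dim'],
    max_rel_dist=hyper_parameter['max_rel_dist'],
    expansion_factor=hyper_parameter['expansion_factor'],
)

model.load_state_dict(model_data['state_dict'])
model.eval()  # switch to inference mode
```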
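
The loading snippet indexes the checkpoint dictionary directly, so a malformed or mismatched file surfaces as a bare `KeyError` on the first missing key. The small pre-check below reports every missing key at once. It assumes the checkpoint is a plain dict containing exactly the keys the snippet reads; the helper name `checkpoint_problems` is hypothetical:

```python
from typing import Any, Dict, List

# Keys read by the loading snippet from model_data["hyper_parameters"].
REQUIRED_HPARAMS = ("dim", "num_tokens", "num_layers", "qk_dim",
                    "max_rel_dist", "expansion_factor")


def checkpoint_problems(model_data: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with a loaded checkpoint dict (empty if none)."""
    problems = []
    for top_key in ("hyper_parameters", "state_dict"):
        if top_key not in model_data:
            problems.append(f"missing top-level key '{top_key}'")
    hparams = model_data.get("hyper_parameters", {})
    problems.extend(f"missing hyper-parameter '{k}'"
                    for k in REQUIRED_HPARAMS if k not in hparams)
    return problems
```

In practice, you would call it on the result of `torch.load(...)` and raise an error if the returned list is non-empty, before constructing `FLASHTransformer`.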

### License
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

### Citation
If you use this code or one of our pretrained models for your publication, please cite our paper:
```
Lei Wang, Hui Zhang, Wei Xu, Zhidong Xue, and Yan Wang. ProtFlash: Deciphering the protein landscape with a novel and lightweight language model, Under revision (2023)
```

Raw data

{
    "_id": null,
    "home_page": "http://github.com/isyslab-hust/ProtFlash",
    "name": "ProtFlash",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "protein language model",
    "author": "Lei Wang",
    "author_email": "wanglei94@hust.edu.cn",
    "download_url": "https://files.pythonhosted.org/packages/8e/f1/d7cbe20c911dea4e93b6711af96775458e1b18fe94fbf5f7632392927c5d/ProtFlash-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "protein language model",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "http://github.com/isyslab-hust/ProtFlash"
    },
    "split_keywords": [
        "protein",
        "language",
        "model"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c5aad637a378b05fcfec846fb265fa87ceb9aa4edfed7f62cbe8261fe7d1e71e",
                "md5": "2d43095e9233fbe0111b4c25208bea41",
                "sha256": "6dbcf564e0962c6310ef572f2b5902698e570a73c12d457491fb73d95b6bfe49"
            },
            "downloads": -1,
            "filename": "ProtFlash-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2d43095e9233fbe0111b4c25208bea41",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 7397,
            "upload_time": "2023-07-03T13:35:54",
            "upload_time_iso_8601": "2023-07-03T13:35:54.282590Z",
            "url": "https://files.pythonhosted.org/packages/c5/aa/d637a378b05fcfec846fb265fa87ceb9aa4edfed7f62cbe8261fe7d1e71e/ProtFlash-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8ef1d7cbe20c911dea4e93b6711af96775458e1b18fe94fbf5f7632392927c5d",
                "md5": "2cb9eed190b9a3dfaca9cea9d2fd7356",
                "sha256": "de8eb7ab3ceae2858ad6e6ee6746b8be94b235c0f59fa81674e3fd1d68317d22"
            },
            "downloads": -1,
            "filename": "ProtFlash-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "2cb9eed190b9a3dfaca9cea9d2fd7356",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 6550,
            "upload_time": "2023-07-03T13:35:55",
            "upload_time_iso_8601": "2023-07-03T13:35:55.300963Z",
            "url": "https://files.pythonhosted.org/packages/8e/f1/d7cbe20c911dea4e93b6711af96775458e1b18fe94fbf5f7632392927c5d/ProtFlash-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-03 13:35:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "isyslab-hust",
    "github_project": "ProtFlash",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "protflash"
}
