ncRNABert


|    **Field**     |                  **Value**                   |
|:----------------:|:--------------------------------------------:|
|       Name       |                  ncRNABert                   |
|     Version      |                    0.1.3                     |
|    Home page     | https://github.com/wangleiofficial/ncRNABert |
|     Summary      |             ncRNA language model             |
|   Upload time    |             2023-12-24 02:13:51              |
|      Author      |                   Lei Wang                   |
|     License      |                     MIT                      |
|     Keywords     |             ncrna language model             |
|   Requirements   |        No requirements were recorded.        |

## ncRNABert: Deciphering the landscape of non-coding RNA using language model

[![PyPI - Version](https://img.shields.io/pypi/v/ncRNABert.svg?style=flat)](https://pypi.org/project/ncRNABert/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/ncRNABert.svg)](https://pypi.org/project/ncRNABert/) [![GitHub - LICENSE](https://img.shields.io/github/license/wangleiofficial/ncRNABert.svg?style=flat)](./LICENSE) ![PyPI - Downloads](https://img.shields.io/pypi/dm/ncRNABert) [![Wheel](https://img.shields.io/pypi/wheel/ncRNABert)](https://pypi.org/project/ncRNABert/) ![build](https://img.shields.io/github/actions/workflow/status/wangleiofficial/ncRNABert/publish_to_pypi.yml)

### Model details
|   **Model**    | **# of parameters** | **Hidden size** |            **Pretraining dataset**             | **# of ncRNAs** | **Model download** |
|:--------------:|:-------------------:|:----------------------:|:----------------------------------------------:|:-----------------:|:------------------------:|
|    ncRNABert   |        303M         |           1024           | [RNAcentral](http://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/sequences/rnacentral_active.fasta.gz) |       26M        |      [Download](https://zenodo.org/record/8263889/files/ncRNABert.pt)       |
|    ncRNABert   |        303M         |           1024           | [RNAcentral](http://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/sequences/rnacentral_active.fasta.gz) + nt |       -       |      [Download](https://zenodo.org/records/10421246/files/ncRNABert_nt_rnacentral_3kmer.pt)       |
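
As a rough guide to the size of these checkpoints, the parameter count in the table can be turned into a memory estimate. This is a back-of-the-envelope sketch: it assumes the weights are held as 32-bit or 16-bit floats and ignores activations and any optimizer state. `weights_gb` is a hypothetical helper, not part of ncRNABert.

```python
def weights_gb(num_parameters, bytes_per_parameter):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


NUM_PARAMETERS = 303_000_000  # "303M" from the table above

# Assumed storage formats; the README does not state the checkpoint dtype.
print(f"float32: {weights_gb(NUM_PARAMETERS, 4):.2f} GB")
print(f"float16: {weights_gb(NUM_PARAMETERS, 2):.2f} GB")
```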

### Install
As a prerequisite, you must have PyTorch installed to use this repository.

Install either the latest development version from GitHub or the stable release from PyPI:

```
# latest version
pip install git+https://github.com/wangleiofficial/ncRNABert

# stable version
pip install ncRNABert

```
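
Because PyTorch is a prerequisite and the package metadata records no requirements, it can help to check what is installed before importing the model. The following is a minimal sketch using only the standard library. `installed_version` is a hypothetical helper, and the distribution names `torch` and `ncRNABert` are assumed to match the names used on PyPI.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the prerequisite and of the package itself.
for name in ("torch", "ncRNABert"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```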

### Usage

#### ncRNA sequence embedding

```
from ncRNABert.pretrain import load_ncRNABert, load_ncRNABert_ex
from ncRNABert.utils import BatchConverter
import torch

data = [
    ("ncRNA1", "ACGGAGGATGCGAGCGTTATCCGGATTTACTGGGCG"),
    ("ncRNA2", "AGGTTTTTAATCTAATTAAGATAGTTGA"),
]

# Convert the (id, sequence) pairs into model inputs
ids, batch_token, lengths = BatchConverter(data)

# Load both model variants
model = load_ncRNABert()
model_ex = load_ncRNABert_ex()

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    results = model(batch_token, lengths, repr_layers=[24])
    results_ex = model_ex(batch_token, lengths, repr_layers=[24])

# Per-token representations from layer 24
token_representations = results["representations"][24]
token_representations_ex = results_ex["representations"][24]

# Generate per-sequence representations via averaging
sequence_representations = []
sequence_representations_ex = []
batch_lens = [len(item[1]) for item in data]  # nucleotide count of each sequence
for i, tokens_len in enumerate(batch_lens):
    # Average over token positions 1 .. tokens_len - 2 (the slice end is exclusive)
    sequence_representations.append(token_representations[i, 1 : tokens_len - 1].mean(0))
    sequence_representations_ex.append(token_representations_ex[i, 1 : tokens_len - 1].mean(0))
```
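
The slice `1 : tokens_len - 1` in the loop above is easier to follow with token counts in hand. The README does not document the tokenizer, so this is only a sketch under two assumptions: sequences are split into overlapping 3-mers (suggested by the `3kmer` suffix in the second checkpoint's filename), and one special token is added at each end. `kmer_tokens` and the token names `<cls>` and `<eos>` are hypothetical and not taken from ncRNABert.

```python
def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers (hypothetical helper, not the ncRNABert tokenizer)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


seq = "AGGTTTTTAATCTAATTAAGATAGTTGA"

# Assumed layout: one special token before and one after the k-mers.
tokens = ["<cls>"] + kmer_tokens(seq) + ["<eos>"]

tokens_len = len(seq)               # nucleotide count, as in batch_lens
kept = tokens[1 : tokens_len - 1]   # same slice as in the README loop

# Under these assumptions the slice keeps exactly the k-mer tokens.
assert kept == kmer_tokens(seq)
```

If a checkpoint instead used one token per nucleotide with a single leading special token, the same slice would leave out the last two nucleotides, so it is worth comparing against the `lengths` returned by `BatchConverter`.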

### License
This source code is licensed under the license found in the LICENSE file in the root directory of this source tree. This README names Apache-2.0, whereas the package metadata published on PyPI lists MIT.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/wangleiofficial/ncRNABert",
    "name": "ncRNABert",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "ncRNA language model",
    "author": "Lei Wang",
    "author_email": "wanglei@isyslab.org",
    "download_url": "https://files.pythonhosted.org/packages/b8/d8/ab03d223cdc63ca6a70ea931cbf93c5af922f339a980f70b0407e1d509bd/ncRNABert-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "ncRNA language model",
    "version": "0.1.3",
    "project_urls": {
        "Homepage": "https://github.com/wangleiofficial/ncRNABert"
    },
    "split_keywords": [
        "ncrna",
        "language",
        "model"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9ddf3894d272a507d58226990fdceb729014dd46cfaae4ca5584424e2ddaba99",
                "md5": "3a74bce477540fbef570049738ce404c",
                "sha256": "53a80128bbfe4d648f54dd675d99c8a3f80c530eed3679095f821fafc679d37c"
            },
            "downloads": -1,
            "filename": "ncRNABert-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3a74bce477540fbef570049738ce404c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 11171,
            "upload_time": "2023-12-24T02:13:50",
            "upload_time_iso_8601": "2023-12-24T02:13:50.222271Z",
            "url": "https://files.pythonhosted.org/packages/9d/df/3894d272a507d58226990fdceb729014dd46cfaae4ca5584424e2ddaba99/ncRNABert-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b8d8ab03d223cdc63ca6a70ea931cbf93c5af922f339a980f70b0407e1d509bd",
                "md5": "930fd0f23d7b8deb18ae98e1990eefaa",
                "sha256": "4e8ddd4fbecd10729714be106ad298c0f1ce5bffa4dc150a7ead5ee4cf815d9f"
            },
            "downloads": -1,
            "filename": "ncRNABert-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "930fd0f23d7b8deb18ae98e1990eefaa",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 10275,
            "upload_time": "2023-12-24T02:13:51",
            "upload_time_iso_8601": "2023-12-24T02:13:51.937735Z",
            "url": "https://files.pythonhosted.org/packages/b8/d8/ab03d223cdc63ca6a70ea931cbf93c5af922f339a980f70b0407e1d509bd/ncRNABert-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-24 02:13:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wangleiofficial",
    "github_project": "ncRNABert",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "ncrnabert"
}
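
The `digests` entries in the raw data above can be used to check a downloaded file. The following is a minimal sketch using only the standard library. `sha256_of` and `matches_recorded_digest` are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA-256 recorded in the raw data above for ncRNABert-0.1.3.tar.gz
EXPECTED_SDIST_SHA256 = "4e8ddd4fbecd10729714be106ad298c0f1ce5bffa4dc150a7ead5ee4cf815d9f"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, expected=EXPECTED_SDIST_SHA256):
    """Return True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of(path) == expected
```

For example, `matches_recorded_digest("ncRNABert-0.1.3.tar.gz")` would be expected to return `True` for an intact copy of the source distribution.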
        