| Field | Value |
| --- | --- |
| Name | rouskinhf |
| Version | 0.3.5 |
| home_page | |
| Summary | A library to manipulate data for our DMS prediction models. |
| upload_time | 2023-11-22 07:46:35 |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | >=3.10 |
| license | MIT License Copyright (c) Microsoft Corporation. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
[CI](https://github.com/rouskinlab/rouskinhf/actions/workflows/CI.yml)
[Release](https://github.com/rouskinlab/rouskinhf/actions/workflows/release.yml)


# Download your RNA data from HuggingFace with rouskinhf!
A repo to manipulate the data for our RNA structure prediction model. It allows you to:
- pull datasets from Rouskinlab's HuggingFace
- create datasets from local files and push them to HuggingFace, from the following formats:
  - `.fasta`
  - `.ct`
  - `.json` (DREEM output format)
  - `.json` (Rouskinlab's HuggingFace format)
## Important notes
- Sequences containing bases other than `A`, `C`, `G`, `T`, `U`, `N`, `a`, `c`, `g`, `t`, `u`, `n` are not supported; such sequences are filtered out of the data (see the illustrative check below).
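For illustration only, the rule above can be checked with a small standalone helper. This is a hypothetical sketch, not the filter `rouskinhf` uses internally; the helper name and sample data are made up:

```python
# Hypothetical sketch of the filtering rule above -- not rouskinhf's internal code.
ALLOWED_BASES = set("ACGTUNacgtun")

def has_only_allowed_bases(sequence: str) -> bool:
    """Return True if the sequence contains only A/C/G/T/U/N (either case)."""
    return set(sequence) <= ALLOWED_BASES

# Example: the second sequence contains an unsupported base 'X' and would be dropped.
sequences = {"ref1": "ACGUUGCA", "ref2": "ACGXUGCA"}
kept = {ref: seq for ref, seq in sequences.items() if has_only_allowed_bases(seq)}
print(kept)  # {'ref1': 'ACGUUGCA'}
```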
## Dependencies
- [RNAstructure](https://rna.urmc.rochester.edu/RNAstructure.html) (also available on [Rouskinlab GitHub](https://github.com/rouskinlab/RNAstructure)).
## Push a new release to PyPI
1. Edit the version to `vx.y.z` in `pyproject.toml`, then run in a terminal: `git add . && git commit -m 'vx.y.z' && git push`.
2. Create and push a git tag `vx.y.z` by running in a terminal: `git tag 'vx.y.z' && git push --tags`.
3. Create a release for the tag `vx.y.z` on GitHub Releases.
4. Make sure that the GitHub Action `Publish distributions 📦 to PyPI` passed on GitHub Actions.
## Installation
### Get a HuggingFace token
Go to [HuggingFace](https://huggingface.co/) and create an account. Then go to your profile and copy your token ([huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)).
### Create an environment file
Open a terminal and type:
```bash
nano env
```
Copy-paste the following content and change the values to your own:
```bash
export HUGGINGFACE_TOKEN="your token here" # you must change this to your HuggingFace token
export DATA_FOLDER="data/datafolders" # where the datafolder are stored by default, change it if you want to store it somewhere else
export DATA_FOLDER_TESTING="data/input_files_for_testing" # Don't touch this
export RNASTRUCTURE_PATH="/Users/ymdt/src/RNAstructure/exe" # Change this to the path of your RNAstructure executable
export RNASTRUCTURE_TEMP_FOLDER="temp" # You can change this to the path of your RNAstructure temp folder
```
Then save the file and exit nano.
### Source the environment
```bash
source env
```
### Install the package with pip
```bash
pip install rouskinhf
```
## Tutorials
### Authenticate your machine with HuggingFace
See the [tutorial](https://github.com/rouskinlab/rouskinhf/blob/main/tutorials/huggingface.ipynb).
### Download a datafolder from HuggingFace
See the [tutorial](https://github.com/rouskinlab/rouskinhf/blob/main/tutorials/use_for_models.ipynb).
### Create a datafolder from local files and push it to HuggingFace
See the [tutorial](https://github.com/rouskinlab/rouskinhf/blob/main/tutorials/create_push_pull.ipynb).
## About
### Sourcing the environment and keeping your environment variables secret
The variables defined in the `env` file are required by `rouskinhf`. Make sure that before you use `rouskinhf`, you run in a terminal:
```bash
source env
```
or, in a Jupyter notebook:
```python
!pip install python-dotenv
%load_ext dotenv
%dotenv env
```
or, in a Python script or Jupyter notebook:
```python
from rouskinhf import setup_env
setup_env(
    HUGGINGFACE_TOKEN="your token here",
    DATA_FOLDER="data/datafolders",
    ...
)
```
The point of using environment variables is to keep your HuggingFace token private. Make sure to add your `env` file to your `.gitignore`, so your HuggingFace token doesn't get pushed to any public repository.
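If you are unsure whether the variables were actually picked up, a quick sanity check from Python (illustrative only, not part of `rouskinhf`) is:

```python
import os

# Check that the variables from the `env` file above are visible to this process.
# The variable names come from the env file; the check itself is just an illustration.
required = ["HUGGINGFACE_TOKEN", "DATA_FOLDER", "RNASTRUCTURE_PATH"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {missing}. Did you `source env`?")
```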
### Import data with ``import_dataset``
This repo provides a function ``import_dataset``, which lets you pull a dataset from HuggingFace and store it locally. If the data is already stored locally, it is loaded from the local folder. The available types of data are the DMS signal and the structure, the latter represented as tuples of paired bases. The function has the following signature:
```python
def import_dataset(name: str, data: str, force_download: bool = False) -> np.ndarray:
    """Finds the dataset with the given name for the given type of data.

    Parameters
    ----------
    name : str
        Name of the dataset to find.
    data : str
        Name of the type of data to find the dataset for (structure or DMS).
    force_download : bool
        Whether to force download the dataset from HuggingFace Hub. Defaults to False.

    Returns
    -------
    ndarray
        The dataset with the given name for the given type of data.

    Example
    -------
    >>> import_dataset(name='for_testing', data='structure').keys()
    dict_keys(['references', 'sequences', 'structure'])
    >>> import_dataset(name='for_testing', data='DMS').keys()
    dict_keys(['references', 'sequences', 'DMS'])
    >>> import_dataset(name='for_testing', data='structure', force_download=True).keys()
    dict_keys(['references', 'sequences', 'structure'])
    >>> import_dataset(name='for_testing', data='DMS', force_download=True).keys()
    dict_keys(['references', 'sequences', 'DMS'])
    """
```
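As a minimal usage sketch, assuming `import_dataset` is importable from the package top level (like `setup_env` above) and that the return value is dict-like, as the docstring examples suggest; the iteration and key names are taken from those examples:

```python
from rouskinhf import import_dataset

# Pull (or load from the local datafolder) the structure data of the test dataset.
data = import_dataset(name="for_testing", data="structure")

# Key names taken from the docstring examples above.
references = data["references"]
sequences = data["sequences"]
structures = data["structure"]

# Illustrative: look at the first entry.
print(references[0], sequences[0], structures[0])
```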
### FYI, the datafolder object
The datafolder object is a wrapper around your local folder and the HuggingFace API that keeps a consistent data structure across your datasets. It provides methods to create datasets from various input formats, store the data and metadata in a systematic way, and push/pull from HuggingFace.
On HuggingFace, the datafolder stores the data with the following structure:
```bash
HUGGINGFACE DATAFOLDER
- [datafolder name]
    - source
        - whichever file(s) you used to create the dataset (fasta, set of CTs, etc.)
    - data.json   # the data in a human-readable format
    - info.json   # the metadata of the dataset; indicates how the DMS signal and the structures were obtained (directly from the source or from a prediction)
    - README.md   # the metadata of the dataset in a human-readable format
```
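Once a datafolder is synced locally (same structure, see below), its human-readable files can be inspected with the standard library. This is a sketch: the dataset name `my_dataset` is hypothetical and the `data/datafolders` root is an assumption based on the default `DATA_FOLDER` above; the file names come from the structure shown:

```python
import json
from pathlib import Path

# Assumed local layout: DATA_FOLDER (default "data/datafolders") / dataset name.
datafolder = Path("data/datafolders") / "my_dataset"  # "my_dataset" is hypothetical

with open(datafolder / "data.json") as f:
    data = json.load(f)        # the data in a human-readable format
with open(datafolder / "info.json") as f:
    info = json.load(f)        # how the DMS signal and structures were obtained

print(list(data)[:3])          # a few top-level keys of data.json
print(info)
```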
Locally, we have the same structure, with the addition of `.npy` files that contain the data in a machine-readable format. Each `.npy` file holds a numpy array, and the file name matches the corresponding key in `data.json`. The source files are not downloaded by default. Hence, the local structure is:
```bash
LOCAL DATAFOLDER
- [datafolder name]
    ...
    - README.md   # the metadata of the dataset in a human-readable format
    - references.npy
    - sequences.npy
    - base_pairs.npy
    - dms.npy
```

## Raw data

```json
{
    "_id": null,
    "home_page": "",
    "name": "rouskinhf",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "Yves Martin <yves@martin.yt>, Alberic de Lajarte <albericlajarte@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/ee/4d/8314e2f0679bc2f3c6cc047a6ae6e8a0b7d5d3c9067763fe6e8b9394b39b/rouskinhf-0.3.5.tar.gz",
    "platform": null,
"description": "[](https://github.com/rouskinlab/rouskinhf/actions/workflows/CI.yml)\n[](https://github.com/rouskinlab/rouskinhf/actions/workflows/release.yml)\n\n\n\n# Download your RNA data from HuggingFace with rouskinhf!\n\nA repo to manipulate the data for our RNA structure prediction model. This repo allows you to:\n- pull datasets from the Rouskinlab's HuggingFace\n- create datasets from local files and push them to HuggingFace, from the formats:\n - `.fasta`\n - `.ct`\n - `.json` (DREEM output format)\n - `.json` (Rouskinlab's huggingface format)\n\n## Important notes\n\n- Sequences with bases different than `A`, `C`, `G`, `T`, `U`, `N`, `a`, `c`, `g`, `t`, `u`, `n` are not supported. The data will be filtered out.\n\n## Dependencies\n- [RNAstructure](https://rna.urmc.rochester.edu/RNAstructure.html) (also available on [Rouskinlab GitHub](https://github.com/rouskinlab/RNAstructure)).\n\n## Push a new release to Pypi\n\n1. Edit version to `vx.y.z` in `pyproject.toml`. Then run in a terminal `git add . && git commit -m 'vx.y.z' && git push`.\n2. Create and push a git tag `vx.y.z` by running in a terminal `git tag 'vx.y.z' && git push --tag`.\n3. Create a release for the tag `vx.y.z` on Github Release.\n4. Make sure that the Github Action `Publish distributions \ud83d\udce6 to PyPI` passed on Github Actions.\n\n## Installation\n\n### Get a HuggingFace token\n\nGo to [HuggingFace](https://huggingface.co/) and create an account. Then go to your profile and copy your token ([huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)).\n\n### Create an environment file\n\nOpen a terminal and type:\n\n```bash\nnano env\n```\n\nCopy paste the following content, and change the values to your own:\n\n```bash\nexport HUGGINGFACE_TOKEN=\"your token here\" # you must change this to your HuggingFace token\nexport DATA_FOLDER=\"data/datafolders\" # where the datafolder are stored by default, change it if you want to store it somewhere else\nexport DATA_FOLDER_TESTING=\"data/input_files_for_testing\" # Don't touch this\nexport RNASTRUCTURE_PATH=\"/Users/ymdt/src/RNAstructure/exe\" # Change this to the path of your RNAstructure executable\nexport RNASTRUCTURE_TEMP_FOLDER=\"temp\" # You can change this to the path of your RNAstructure temp folder\n```\n\nThen save the file and exit nano.\n\n### Source the environment\n\n```bash\nsource env\n```\n\n### Install the package with pip\n\n```bash\npip install rouskinhf\n```\n\n\n## Tutorials\n\n### Authentify your machine to HuggingFace\n\nSee the [tutorial](https://github.com/rouskinlab/rouskinhf/blob/main/tutorials/huggingface.ipynb).\n\n### Download a datafolder from HuggingFace\n\nSee the [tutorial](https://github.com/rouskinlab/rouskinhf/blob/main/tutorials/use_for_models.ipynb).\n\n### Create a datafolder from local files and push it to HuggingFace\n\nSee the [tutorial](https://github.com/rouskinlab/rouskinhf/blob/main/tutorials/create_push_pull.ipynb).\n\n## About\n\n### Sourcing the environment and keeping your environment variable secret\n\nThe variables defined in the `env` file are required by `rouskinhf`. 
Make that before you use `rouskinhf`, you run in a terminal:\n\n```bash\nsource env\n```\n or, in a Jupyter notebook:\n\n```python\n!pip install python-dotenv\n%load_ext dotenv\n%dotenv env\n```\n\nor, in a python script or Jupyter notebook:\n\n```python\nfrom rouskinhf import setup_env\nsetup_env(\n HUGGINGFACE_TOKEN=\"your token here\",\n DATA_FOLDER=\"data/datafolders\",\n ...\n)\n```\n\n The point of using environment variables is to ensure the privacy of your huggingface token. Make sure to add your `env` file to your `.gitignore`, so your HuggingFace token doesn't get pushed to any public repository.\n\n### Import data with ``import_dataset``\n\nThis repo provides a function ``import_dataset``, which allows your to pull a dataset from HuggingFace and store it locally. If the data is already stored locally, it will be loaded from the local folder. The type of data available is the DMS signal and the structure, under the shape of paired bases tuples. The function has the following signature:\n\n```python\ndef import_dataset(name:str, data:str, force_download:bool=False)->np.ndarray:\n\n \"\"\"Finds the dataset with the given name for the given type of data.\n\n Parameters\n ----------\n\n name : str\n Name of the dataset to find.\n data : str\n Name of the type of data to find the dataset for (structure or DMS).\n force_download : bool\n Whether to force download the dataset from HuggingFace Hub. Defaults to False.\n\n Returns\n -------\n\n ndarray\n The dataset with the given name for the given type of data.\n\n Example\n -------\n\n >>> import_dataset(name='for_testing', data='structure').keys()\n dict_keys(['references', 'sequences', 'structure'])\n >>> import_dataset(name='for_testing', data='DMS').keys()\n dict_keys(['references', 'sequences', 'DMS'])\n >>> import_dataset(name='for_testing', data='structure', force_download=True).keys()\n dict_keys(['references', 'sequences', 'structure'])\n >>> import_dataset(name='for_testing', data='DMS', force_download=True).keys()\n dict_keys(['references', 'sequences', 'DMS'])\n```\n\n### FYI, the datafolder object\n\nThe datafolder object is a wrapper around your local folder and HuggingFace API, to keep a consistent datastructure across your datasets. It contains multiple methods to create datasets from various input formats, store the data and metadata in a systematic way, and push / pull from HuggingFace.\n\nOn HuggingFace, the datafolder stores the data under the following structure:\n\n```bash\nHUGGINGFACE DATAFOLDER\n- [datafolder name]\n - source\n - whichever file(s) you used to create the dataset (fasta, set of CTs, etc.).\n - data.json # the data under a human readable format.\n - info.json # the metadata of the dataset. This file indicates how we got the DMS signal and the structures (directly from the source or from a prediction).\n - README.md # the metadata of the dataset in a human readable format.\n```\n\nLocally, we have the same structure with the addition of .npy files which contain the data in a machine readable format. Each .npy file contains a numpy array of the data, and the name of the file is the name of the corresponding key in the data.json file. The source file won\u2019t be downloaded by default. Hence, the local structure is:\n\n```bash\nLOCAL DATAFOLDER\n- [datafolder name]\n ...\n - README.md # the metadata of the dataset in a human readable format\n - references.npy\n - sequences.npy\n - base_pairs.npy\n - dms.npy\n```\n",
"bugtrack_url": null,
"license": "MIT License Copyright (c) Microsoft Corporation. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE ",
"summary": "A library to manipulate data for our DMS prediction models.",
"version": "0.3.5",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "226111fbc42ad69d95c6a1ae33605372b4ef8f7e62a403c3ddddd655311aafb2",
"md5": "f4eec87a6bc2ec3811ce6533ae39fadf",
"sha256": "fa3a5975ede9981307bfe845a39b521a52c609dccab5af76d706d36ea82b2177"
},
"downloads": -1,
"filename": "rouskinhf-0.3.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f4eec87a6bc2ec3811ce6533ae39fadf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 25300,
"upload_time": "2023-11-22T07:46:34",
"upload_time_iso_8601": "2023-11-22T07:46:34.137764Z",
"url": "https://files.pythonhosted.org/packages/22/61/11fbc42ad69d95c6a1ae33605372b4ef8f7e62a403c3ddddd655311aafb2/rouskinhf-0.3.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ee4d8314e2f0679bc2f3c6cc047a6ae6e8a0b7d5d3c9067763fe6e8b9394b39b",
"md5": "d620c16842a40d56dbf24bd500a743ed",
"sha256": "5f18314100cb0ee4eecc6a3c112de6defc0766a41c06140c90e3cb25ac274bdd"
},
"downloads": -1,
"filename": "rouskinhf-0.3.5.tar.gz",
"has_sig": false,
"md5_digest": "d620c16842a40d56dbf24bd500a743ed",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 24636,
"upload_time": "2023-11-22T07:46:35",
"upload_time_iso_8601": "2023-11-22T07:46:35.711328Z",
"url": "https://files.pythonhosted.org/packages/ee/4d/8314e2f0679bc2f3c6cc047a6ae6e8a0b7d5d3c9067763fe6e8b9394b39b/rouskinhf-0.3.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-22 07:46:35",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "rouskinhf"
}