| Field | Value |
|---|---|
| Name | efold |
| Version | 0.1.2 |
| Summary | A library to build our DMS signal and RNAstructure prediction models. |
| Upload time | 2024-04-08 17:47:55 |
| Requires Python | >=3.10 |
| Home page, maintainer, docs URL, author, license | None |
| Requirements | No requirements were recorded. |
# eFold
This repo contains the PyTorch code for our paper *“Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction”*.
[[bioRxiv](https://www.biorxiv.org/content/10.1101/2024.01.24.577093v1.full)] [[Data](https://huggingface.co/rouskinlab)]
## Install
```bash
pip install efold
```
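The package metadata requires Python 3.10 or newer (`requires_python >=3.10`). A minimal sketch for checking, before or after `pip install efold`, that the interpreter is recent enough and which efold version (if any) is installed. It only uses the standard library; `meets_python_requirement` and `installed_version` are hypothetical helper names, not part of efold.

```python
from __future__ import annotations

import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum interpreter version, taken from efold's requires_python (">=3.10").
MIN_PYTHON = (3, 10)


def meets_python_requirement(minimum: tuple[int, int] = MIN_PYTHON) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def installed_version(package: str = "efold") -> str | None:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python >= 3.10:", meets_python_requirement())
    print("efold version:", installed_version())
```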
## Inference mode
### Using the command line
From a sequence:
```bash
efold AAACAUGAGGAUUACCCAUGU -o seq.txt
cat seq.txt
# AAACAUGAGGAUUACCCAUGU
# ..(((((.((....)))))))
```
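In the example above, the `.txt` output holds the sequence on one line and its dot-bracket structure on the next. Assuming that two-line layout (the layout for multi-sequence input is not shown here), a minimal sketch that reads such a file back and checks that the structure is well formed. `parse_prediction` and `read_txt_prediction` are hypothetical helpers, and only plain `(`, `)` and `.` characters are accepted.

```python
from __future__ import annotations

from pathlib import Path


def parse_prediction(text: str) -> tuple[str, str]:
    """Split two-line text output into (sequence, dot-bracket structure).

    Assumes the layout from the README example: the sequence on the first
    non-empty line and its structure on the second. Pseudoknot brackets
    such as '[' are rejected and would need extra cases.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a sequence line followed by a structure line")
    sequence, structure = lines[0], lines[1]
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")

    depth = 0  # number of currently open '(' brackets
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')' in structure")
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} in structure")
    if depth != 0:
        raise ValueError("unmatched '(' in structure")
    return sequence, structure


def read_txt_prediction(path: str | Path) -> tuple[str, str]:
    """Read and validate a single-sequence .txt file written with `-o`."""
    return parse_prediction(Path(path).read_text())
```

For the example above, `read_txt_prediction("seq.txt")` would return `("AAACAUGAGGAUUACCCAUGU", "..(((((.((....)))))))")`.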
Or from a FASTA file:
```bash
efold --fasta example.fasta
```
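`--fasta` reads sequences from a FASTA file. A minimal sketch for writing such a file from Python, assuming the standard layout of a `>` header line followed by the sequence. `to_fasta` and `write_fasta` are hypothetical helpers; check `efold -h` for how files with several records are handled.

```python
from __future__ import annotations

from pathlib import Path


def to_fasta(records: dict[str, str]) -> str:
    """Format {name: sequence} pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence on a
    single line (no wrapping), upper-cased with surrounding whitespace removed.
    """
    lines: list[str] = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.append(seq.strip().upper())
    return "\n".join(lines) + "\n"


def write_fasta(records: dict[str, str], path: str | Path) -> Path:
    """Write `records` to `path` as FASTA and return the path."""
    path = Path(path)
    path.write_text(to_fasta(records))
    return path
```

For example, `write_fasta({"example_rna": "AAACAUGAGGAUUACCCAUGU"}, "example.fasta")` writes a file you can pass to `efold --fasta example.fasta`; `example_rna` is an arbitrary record name.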
To choose the structure format:
```bash
efold AAACAUGAGGAUUACCCAUGU -bp # base pairs
efold AAACAUGAGGAUUACCCAUGU -db # dotbracket (default)
```
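The two formats carry the same information: each matched `(` / `)` pair in dot-bracket notation is one base pair. A minimal sketch of the conversion using a stack, assuming 1-based positions; the exact layout and indexing of efold's own `-bp` output may differ, and only `()` brackets are handled. `dotbracket_to_pairs` is a hypothetical helper.

```python
def dotbracket_to_pairs(structure: str) -> list[tuple[int, int]]:
    """Convert a dot-bracket string to a sorted list of 1-based (i, j) base pairs."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)  # remember where this pair opens
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # close the most recent '('
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the example structure `..(((((.((....)))))))`, this returns `[(3, 21), (4, 20), (5, 19), (6, 18), (7, 17), (9, 16), (10, 15)]`.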
The output file can be `.json`, `.csv`, or `.txt`:
```bash
efold AAACAUGAGGAUUACCCAUGU -o output.csv
```
To list all options:
```bash
efold -h
```
### Using Python
```python
>>> from efold import inference
>>> inference('AAACAUGAGGAUUACCCAUGU', fmt='dotbracket')
..(((((.((....)))))))
```
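The `fmt` argument selects the structure notation. When comparing predictions given as base-pair lists with dot-bracket strings, it can help to rebuild the dot-bracket form. A minimal sketch, assuming 1-based `(i, j)` pairs and no pseudoknots (crossing pairs cannot be written with a single bracket type); `pairs_to_dotbracket` is a hypothetical helper, not part of efold's API.

```python
def pairs_to_dotbracket(pairs: list[tuple[int, int]], length: int) -> str:
    """Build a dot-bracket string of `length` from 1-based (i, j) base pairs."""
    chars = ["."] * length
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if not 1 <= i < j <= length:
            raise ValueError(f"pair {(i, j)} out of range for length {length}")
        if chars[i - 1] != "." or chars[j - 1] != ".":
            raise ValueError(f"a position in pair {(i, j)} is already paired")
        chars[i - 1], chars[j - 1] = "(", ")"

    # Reject crossing pairs (pseudoknots): no pair (k, l) with i < k < j < l.
    ordered = sorted((min(p), max(p)) for p in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                raise ValueError(f"pairs {(i, j)} and {(k, l)} cross (pseudoknot)")
    return "".join(chars)
```

The crossing check is quadratic in the number of pairs, which keeps the sketch short and is fine for single RNAs of typical length.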
## File structure
```
efold/
    api/                  # for inference calls
    core/                 # backend
    models/               # where we define eFold and other models
    resources/
        efold_weights.py  # our best model weights
scripts/
    efold_training.py     # our training script
    [...]
LICENSE
requirements.txt
pyproject.toml
```
## Data
### List of the datasets we used
A breakdown of the data we used is summarized in the [efold_data repository](https://github.com/rouskinlab/efold_data). All the data are hosted on [Hugging Face](https://huggingface.co/rouskinlab).
### Get the data
You can download our datasets using [rouskinHF](https://github.com/rouskinlab/rouskinhf):
```bash
pip install rouskinhf
```
Then, in Python:
```python
>>> import rouskinhf
>>> data = rouskinhf.get_dataset('ribo500-blast') # look at the dataset names on huggingface
```
## Reproducing our results
Run the training script:
```bash
git clone https://github.com/rouskinlab/eFold
python eFold/scripts/efold_training.py
```
## Citation
**Plain text:**
Albéric A. de Lajarte, Yves J. Martin des Taillades, Colin Kalicki, Federico Fuchs Wightman, Justin Aruda, Dragui Salazar, Matthew F. Allan, Casper L’Esperance-Kerckhoff, Alex Kashi, Fabrice Jossinet, Silvi Rouskin. “Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction”. bioRxiv 2024.01.24.577093; doi: https://doi.org/10.1101/2024.01.24.577093. 2024
**BibTeX:**
```
@article {Lajarte_Martin_2024,
title = {Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction},
author = {Alb{\'e}ric A. de Lajarte and Yves J. Martin des Taillades and Colin Kalicki and Federico Fuchs Wightman and Justin Aruda and Dragui Salazar and Matthew F. Allan and Casper L{\textquoteright}Esperance-Kerckhoff and Alex Kashi and Fabrice Jossinet and Silvi Rouskin},
year = {2024},
doi = {10.1101/2024.01.24.577093},
URL = {https://www.biorxiv.org/content/early/2024/01/25/2024.01.24.577093},
journal = {bioRxiv}
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "efold",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Yves Martin <yves@martin.yt>, Alberic de Lajarte <albericlajarte@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/d0/2c/08f7cfc5dd8d88dce40ca5f97122f15625958903cb81af9f7dc8a1d99c5c/efold-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A library to build our DMS signal and RNAstructure prediction models.",
"version": "0.1.2",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f47faca700d72a3b9a61a9978510e918faabfc32f4e4bb4e631313a601899524",
"md5": "cb00d3d11177b3ca9fd98c59b894d3b7",
"sha256": "f632e6fb3b1b5e7e9016372264961b3c26c6c7e8b9472d431b15b349729fd71b"
},
"downloads": -1,
"filename": "efold-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cb00d3d11177b3ca9fd98c59b894d3b7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 10266497,
"upload_time": "2024-04-08T17:47:52",
"upload_time_iso_8601": "2024-04-08T17:47:52.653920Z",
"url": "https://files.pythonhosted.org/packages/f4/7f/aca700d72a3b9a61a9978510e918faabfc32f4e4bb4e631313a601899524/efold-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d02c08f7cfc5dd8d88dce40ca5f97122f15625958903cb81af9f7dc8a1d99c5c",
"md5": "92bcedb4932b70a2169813dc254e19d9",
"sha256": "5ef8ddd1d95d0ffffa2e83a0245048303eeb6793f35a9794590707468ead47ff"
},
"downloads": -1,
"filename": "efold-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "92bcedb4932b70a2169813dc254e19d9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 10258199,
"upload_time": "2024-04-08T17:47:55",
"upload_time_iso_8601": "2024-04-08T17:47:55.733470Z",
"url": "https://files.pythonhosted.org/packages/d0/2c/08f7cfc5dd8d88dce40ca5f97122f15625958903cb81af9f7dc8a1d99c5c/efold-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-08 17:47:55",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "efold"
}