- Name: efold
- Version: 0.1.2
- Summary: A library to build our DMS signal and RNAstructure prediction models.
- Upload time: 2024-04-08 17:47:55
- Requires Python: >=3.10

# eFold

This repo contains the PyTorch code for our paper “*Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction*”.

[[bioRxiv](https://www.biorxiv.org/content/10.1101/2024.01.24.577093v1.full)] [[Data](https://huggingface.co/rouskinlab)]

## Install

```bash
pip install efold
```



## Inference mode

### Using the command line

From a sequence:

```bash
efold AAACAUGAGGAUUACCCAUGU -o seq.txt
cat seq.txt

AAACAUGAGGAUUACCCAUGU
..(((((.((....)))))))
```

Or from a FASTA file:

```bash
efold --fasta example.fasta
```
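
If your sequences already live in Python, a FASTA file for the `--fasta` option can be written with the standard library alone. This is a minimal sketch: the `write_fasta` helper, the record name, and the output file name are illustrative and not part of eFold.

```python
from pathlib import Path


def write_fasta(records: dict[str, str], path: str) -> None:
    """Write {name: sequence} pairs as FASTA, one sequence per line."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")  # header line
        lines.append(seq)         # sequence line
    Path(path).write_text("\n".join(lines) + "\n")


# Hypothetical record name; the sequence is the one used in the example above.
write_fasta({"hairpin_1": "AAACAUGAGGAUUACCCAUGU"}, "example.fasta")
```

The resulting `example.fasta` can then be passed to `efold --fasta example.fasta`.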

Using different formats:
```bash
efold AAACAUGAGGAUUACCCAUGU -bp # base pairs
efold AAACAUGAGGAUUACCCAUGU -db # dotbracket (default)
```
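
The two formats carry the same information: each matched `(` and `)` in a dot-bracket string is one base pair. The sketch below converts one into the other with a stack. It is not eFold's own code: `dotbracket_to_pairs` is a hypothetical helper, the 0-based indexing is an assumption that may differ from what `-bp` prints, and pseudoknot brackets are not handled.

```python
def dotbracket_to_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted 0-based (i, j) index pairs for a dot-bracket string."""
    stack: list[int] = []               # positions of still-open '('
    pairs: list[tuple[int, int]] = []
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # close the most recent '('
        elif char != ".":
            raise ValueError(f"unsupported character {char!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(dotbracket_to_pairs("..(((((.((....)))))))"))
```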

The output file can be .json, .csv, or .txt:
```bash
efold AAACAUGAGGAUUACCCAUGU -o output.csv
```

Run help:
```bash
efold -h
```

### Using python

```python
>>> from efold import inference
>>> inference('AAACAUGAGGAUUACCCAUGU', fmt='dotbracket')
..(((((.((....)))))))
```
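
Sequences copied from other tools often arrive in lowercase or as DNA. A small normalization step before calling `inference` can catch such input early. This sketch assumes the model expects the uppercase `ACGU` alphabet used in the examples; `normalize_rna` is a hypothetical helper, not part of the eFold API.

```python
def normalize_rna(sequence: str) -> str:
    """Strip whitespace, uppercase, map T to U, and reject other symbols."""
    cleaned = "".join(sequence.split()).upper().replace("T", "U")
    invalid = sorted(set(cleaned) - set("ACGU"))
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {invalid}")
    return cleaned


print(normalize_rna("aaacatgaggattacccatgt"))  # AAACAUGAGGAUUACCCAUGU
```

The normalized string can then be passed to `inference` as shown above.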

## File structure

```bash
efold/
    api/    # for inference calls
    core/   # backend 
    models/ # where we define eFold and other models
    resources/
        efold_weights.py # our best model weights
scripts/
    efold_training.py # our training script
    [...]
LICENSE
requirements.txt
pyproject.toml
```

## Data

### List of the datasets we used

A breakdown of the data we used is available [here](https://github.com/rouskinlab/efold_data). All the data is stored on [Hugging Face](https://huggingface.co/rouskinlab).

### Get the data

You can download our datasets using [rouskinHF](https://github.com/rouskinlab/rouskinhf):

```bash
pip install rouskinhf
```

And in your code, write:

```python
>>> import rouskinhf
>>> data = rouskinhf.get_dataset('ribo500-blast') # look at the dataset names on huggingface
```



## Reproducing our results

Run the training script:

```bash
git clone https://github.com/rouskinlab/eFold
python eFold/scripts/efold_training.py
```

## Citation

**Plain text:**

Albéric A. de Lajarte, Yves J. Martin des Taillades, Colin Kalicki, Federico Fuchs Wightman, Justin Aruda, Dragui Salazar, Matthew F. Allan, Casper L’Esperance-Kerckhoff, Alex Kashi, Fabrice Jossinet, Silvi Rouskin. “Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction”. bioRxiv 2024.01.24.577093; doi: https://doi.org/10.1101/2024.01.24.577093. 2024

**BibTeX:**

```
@article {Lajarte_Martin_2024,
	title = {Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction},
	author = {Alb{\'e}ric A. de Lajarte and Yves J. Martin des Taillades and Colin Kalicki and Federico Fuchs Wightman and Justin Aruda and Dragui Salazar and Matthew F. Allan and Casper L{\textquoteright}Esperance-Kerckhoff and Alex Kashi and Fabrice Jossinet and Silvi Rouskin},
	year = {2024},
	doi = {10.1101/2024.01.24.577093},
	URL = {https://www.biorxiv.org/content/early/2024/01/25/2024.01.24.577093},
	journal = {bioRxiv}
}

```

            

        