# seqfold

- Name: seqfold
- Version: 0.7.17
- Home page: https://github.com/Lattice-Automation/seqfold
- Summary: Predict the minimum free energy structure of nucleic acids
- Upload time: 2023-05-30 14:26:25
- Author: JJTimmons
- Requires Python: >=3.5
- License: MIT

[![DOI](https://zenodo.org/badge/224018980.svg)](https://zenodo.org/badge/latestdoi/224018980)

Predict the minimum free energy structure of nucleic acids.

`seqfold` is an implementation of the `Zuker, 1981` dynamic programming algorithm, the basis for [UNAFold](http://unafold.rna.albany.edu/?q=DINAMelt/software)/[mfold](https://www.ibridgenetwork.org/#!/profiles/1045554571442/innovations/1/), with energy functions from `SantaLucia, 2004` (DNA) and `Turner, 2009` (RNA).

## Installation

### pypy3 (strongly recommended)

```bash
pypy3 -m ensurepip
pypy3 -m pip install seqfold
```

Folding a 200 bp sequence on my laptop takes 2.5 seconds with [pypy3](https://doc.pypy.org/en/latest/index.html) versus 15 seconds with CPython, roughly a 6x speedup.

### Default pip

```bash
pip install seqfold
```

## Usage

### Python

```python
from seqfold import fold, dg, dg_cache, dot_bracket

# just returns minimum free energy
dg("GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC", temp=37.0)  # -13.4

# `fold` returns a list of `seqfold.Struct` from the minimum free energy structure
structs = fold("GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC")
print(sum(s.e for s in structs))  # -13.4, same as dg()
for struct in structs:
    print(struct)  # prints the i, j, ddg, and description of each structure

# `dg_cache` returns a 2D array where entry (i, j) holds the MFE of the subsequence from i to j, inclusive
cache = dg_cache("GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC")

# `dot_bracket` returns a dot_bracket representation of the folding
print(dot_bracket(structs))  # ((((((((.((((......))))..((((.......)))).))))))))
```

### CLI

```txt
usage: seqfold [-h] [-t FLOAT] [-d] [-r] [--version] SEQ

Predict the minimum free energy (kcal/mol) of a nucleic acid sequence

positional arguments:
  SEQ                   nucleic acid sequence to fold

optional arguments:
  -h, --help            show this help message and exit
  -t FLOAT, --celcius FLOAT
                        temperature in Celsius
  -d, --dot-bracket     write a dot-bracket of the MFE folding to stdout
  -r, --sub-structures  write each substructure of the MFE folding to stdout
  --version             show program's version number and exit
```

#### Examples

```bash
$ seqfold GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC --celcius 32
-15.3
```

```bash
$ seqfold GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC --celcius 32 --dot-bracket --sub-structures
GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC
((((((((.((((......))))..((((.......)))).))))))))
   i    j    ddg  description
   0   48   -1.9  STACK:GG/CC
   1   47   -1.9  STACK:GG/CC
   2   46   -1.4  STACK:GA/CT
   3   45   -1.4  STACK:AG/TC
   4   44   -1.9  STACK:GG/CC
   5   43   -1.6  STACK:GT/CA
   6   42   -1.4  STACK:TC/AG
   7   41   -0.5  BIFURCATION:4n/3h
   9   22   -1.1  STACK:TT/AA
  10   21   -0.7  STACK:TA/AT
  11   20   -1.6  STACK:AC/TG
  12   19    3.0  HAIRPIN:CA/GG
  25   39   -1.9  STACK:CC/GG
  26   38   -2.3  STACK:CG/GC
  27   37   -1.9  STACK:GG/CC
  28   36    3.2  HAIRPIN:GT/CT
-15.3
```
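The dot-bracket line and the `i`/`j` columns in the output above encode the same base pairs. A minimal sketch of that correspondence, using a stack to match brackets, is shown here. The function name `dot_bracket_pairs` is hypothetical and not part of `seqfold`; it only handles `(`, `)` and `.`.

```python
from typing import List, Tuple


def dot_bracket_pairs(db: str) -> List[Tuple[int, int]]:
    """Return sorted 0-based (i, j) pairs for every matched '(' and ')'."""
    open_positions = []  # indices of '(' still waiting for a partner
    pairs = []
    for j, ch in enumerate(db):
        if ch == "(":
            open_positions.append(j)
        elif ch == ")":
            if not open_positions:
                raise ValueError("unmatched ')' at index {}".format(j))
            # the most recent unmatched '(' pairs with this ')'
            pairs.append((open_positions.pop(), j))
        elif ch != ".":
            raise ValueError("unexpected character {!r} at index {}".format(ch, j))
    if open_positions:
        raise ValueError("unmatched '(' at index {}".format(open_positions[-1]))
    return sorted(pairs)


pairs = dot_bracket_pairs("((((((((.((((......))))..((((.......)))).))))))))")
print(len(pairs))  # 16, one per row of the sub-structure table
print(pairs[:3])   # [(0, 48), (1, 47), (2, 46)]
```

Each returned pair matches the `i` and `j` of one row in the sub-structure listing, including the rows labelled `HAIRPIN` and `BIFURCATION`, which close a loop with a base pair.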

### Notes

- The type of nucleic acid, DNA or RNA, is inferred from the input sequence.
- `seqfold` treats the input sequence case-insensitively.
- The default temperature is 37 degrees Celsius for both the Python and CLI interfaces.
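
The first two notes can be illustrated with a short sketch. The rule below (uracil means RNA, otherwise DNA, after upper-casing) is an assumption about how such inference could work, not a description of `seqfold`'s internals, and `infer_nucleic_acid` is a hypothetical name.

```python
def infer_nucleic_acid(seq: str) -> str:
    """Guess 'DNA' or 'RNA' from the bases present (illustrative rule only)."""
    bases = set(seq.upper())  # upper-casing makes the check case-insensitive
    if not bases:
        raise ValueError("empty sequence")
    unknown = bases - set("ACGTU")
    if unknown:
        raise ValueError("unexpected bases: {}".format(sorted(unknown)))
    if "T" in bases and "U" in bases:
        raise ValueError("sequence mixes T and U")
    # Assumption: any U marks the sequence as RNA; everything else is DNA.
    return "RNA" if "U" in bases else "DNA"


print(infer_nucleic_acid("GGGAGGTCGTTACATCTGGG"))  # DNA
print(infer_nucleic_acid("gggaggucguuacaucuggg"))  # RNA
```

Under this rule a sequence with neither T nor U, such as `GGCC`, is treated as DNA; the actual package may resolve that case differently.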

## Motivation

Secondary structure prediction is used for making [PCR primers](https://academic.oup.com/nar/article/40/15/e115/1223759), designing [oligos for MAGE](https://pubs.acs.org/doi/abs/10.1021/acssynbio.5b00219), and tuning [RBS expression rates](https://www.sciencedirect.com/science/article/pii/B9780123851208000024).

While [UNAFold](http://unafold.rna.albany.edu/?q=DINAMelt/software) and [mfold](https://www.ibridgenetwork.org/#!/profiles/1045554571442/innovations/1/) are the most widely used applications for nucleic acid secondary structure prediction, their format and license are restrictive. `seqfold` is meant to be an open-source, minimalist alternative for predicting minimum free energy secondary structure.

|              | seqfold               | mfold                                                                                  | UNAFold                                                                                          |
| ------------ | --------------------- | -------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
| License      | MIT                   | [Academic Non-commercial](http://unafold.rna.albany.edu/download/Academic_License.txt) | [\$200-36,000](https://www.ibridgenetwork.org/#!/profiles/1045554571442/innovations/1/products/) |
| OS           | Linux, MacOS, Windows | Linux, MacOS                                                                           | Linux, MacOS, Windows                                                                            |
| Format       | Python, Python CLI    | CLI binary                                                                             | CLI binary                                                                                       |
| Dependencies | none                  | (mfold_util)                                                                           | Perl, (gnuplot, glut/OpenGL)                                                                     |
| Graphical    | no                    | yes (output)                                                                           | yes (output)                                                                                     |
| Heterodimers | no                    | yes                                                                                    | yes                                                                                              |
| Constraints  | no                    | yes                                                                                    | yes                                                                                              |

## Citations

Papers, and how they helped in developing `seqfold`, are listed below.

### Nussinov, 1980

> Nussinov, Ruth, and Ann B. Jacobson. "Fast algorithm for predicting the secondary structure of single-stranded RNA." Proceedings of the National Academy of Sciences 77.11 (1980): 6309-6313.

Framework for the dynamic programming approach. It has a conceptually helpful "Maximal Matching" example that demonstrates the approach on a simple sequence with only matched or unmatched bp.
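
To make the "Maximal Matching" idea concrete, here is a minimal sketch of a Nussinov-style dynamic program that counts the largest number of nested base pairs. It is not code from `seqfold`: the function name `max_matching`, the `min_loop` parameter, and the restriction to Watson-Crick RNA pairs are assumptions made for illustration.

```python
def max_matching(seq: str, min_loop: int = 0) -> int:
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    min_loop is the minimum number of unpaired bases required between
    the two bases of a pair (an illustrative hairpin constraint).
    """
    can_pair = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs within seq[i..j], inclusive
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            # case 2: base j pairs with some k, splitting the problem in two
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in can_pair:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


print(max_matching("GGGAAACCC", min_loop=3))  # 3
```

The three nested loops give the `O(n³)` runtime; energy-based folding keeps this table-filling shape but replaces the pair count with free energies.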

### Zuker, 1981

> Zuker, Michael, and Patrick Stiegler. "Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information." Nucleic acids research 9.1 (1981): 133-148.

The most cited paper in this space. It extends `Nussinov, 1980` with a nearest-neighbor approach to energies and separate treatment of stacks, bulges, internal loops, and hairpins. Its data structure and traceback method are both more intuitive than those of `Nussinov, 1980`.

### Jaeger, 1989

> Jaeger, John A., Douglas H. Turner, and Michael Zuker. "Improved predictions of secondary structures for RNA." Proceedings of the National Academy of Sciences 86.20 (1989): 7706-7710.

Zuker and colleagues expand on the 1981 paper to incorporate penalties for multibranched loops and dangling ends.

### SantaLucia, 2004

> SantaLucia Jr, John, and Donald Hicks. "The thermodynamics of DNA structural motifs." Annu. Rev. Biophys. Biomol. Struct. 33 (2004): 415-440.

The source of almost every DNA energy function in `seqfold` (with the exception of multibranch loops). It provides nearest-neighbor entropies and enthalpies for stacks, mismatching stacks, terminal stacks, and dangling stacks, as well as for bulges, internal loops, and hairpins.
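
Enthalpy and entropy pairs like these are typically combined into a free energy at a given temperature with ΔG = ΔH − TΔS, where T is in Kelvin. The sketch below shows that arithmetic. The function name `delta_g` is hypothetical, and the two stack parameters are values recalled from the commonly cited SantaLucia nearest-neighbor table; treat them as illustrative and check them against the paper before relying on them.

```python
def delta_g(dh: float, ds: float, celsius: float = 37.0) -> float:
    """Free energy in kcal/mol from dH (kcal/mol) and dS (cal/(mol*K))."""
    kelvin = celsius + 273.15
    return dh - kelvin * (ds / 1000.0)  # dS converted from cal to kcal


# Assumed (dH, dS) values for two DNA stacks; verify against the paper.
GG_CC = (-8.0, -19.9)
CG_GC = (-10.6, -27.2)

print(round(delta_g(*GG_CC, celsius=32.0), 1))  # -1.9
print(round(delta_g(*CG_GC, celsius=32.0), 1))  # -2.3
```

With these assumed parameters, the results at 32 degrees Celsius agree with the `STACK:GG/CC` and `STACK:CG/GC` rows of the CLI example output.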

### Turner, 2009

> Turner, Douglas H., and David H. Mathews. "NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure." Nucleic acids research 38.suppl_1 (2009): D280-D282.

Source of the RNA nearest-neighbor enthalpy and entropy change parameters, stored in `/data`.

### Ward, 2017

> Ward, M., Datta, A., Wise, M., & Mathews, D. H. (2017). Advanced multi-loop algorithms for RNA secondary structure prediction reveal that the simplest model is best. Nucleic acids research, 45(14), 8541-8550.

An investigation of energy functions for multibranch loops. It validates the simple linear approach employed by `Jaeger, 1989`, which keeps runtime within `O(n³)`.
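
A linear multibranch model of this kind is usually written as a constant plus per-unpaired-base and per-helix terms. The sketch below shows that form only; the function name and the coefficients `a`, `b`, and `c` are hypothetical placeholders, not the values used by `seqfold` or by either paper.

```python
def multibranch_penalty(unpaired: int, helices: int,
                        a: float, b: float, c: float) -> float:
    """Linear multibranch loop penalty: a + b * unpaired + c * helices."""
    return a + b * unpaired + c * helices


# Placeholder coefficients chosen only to show the arithmetic.
print(multibranch_penalty(unpaired=4, helices=3, a=2.0, b=0.25, c=0.5))  # 4.5
```

Because each unpaired base and each helix contributes a fixed amount, the penalty can be accumulated piece by piece as the dynamic program joins sub-structures, which is why this model does not raise the cubic runtime.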

            
