# DrTransformer -- heuristic cotranscriptional folding.

 - Package name: drtransformer
 - Version: 1.0 (uploaded to PyPI 2023-02-08 14:53:30)
 - Requires Python: >=3.8
 - Summary: Heuristic cotranscriptional folding using the nearest neighbor energy model.
 - Keywords: cotranscriptional folding, rna, secondary structure

DrTransformer (short for "DNA-to-RNA transformer") is a program for heuristic
and deterministic cotranscriptional folding simulations of RNA molecules. The
code of this project is available under the MIT license; however, this software
depends on the [ViennaRNA] package, which is distributed under the [ViennaRNA
license].

## Installation
If you already have the Python bindings of the [ViennaRNA] package installed,
then the latest stable release of DrTransformer can be installed from PyPI:
```sh
  ~$ pip install drtransformer
```

DrTransformer can also be installed with bioconda to resolve the [ViennaRNA]
dependency automatically. First, make sure [bioconda] is set up properly with:
```sh
  ~$ conda config --add channels defaults
  ~$ conda config --add channels bioconda
  ~$ conda config --add channels conda-forge
  ~$ conda config --set channel_priority strict
```
Second, install or update DrTransformer:
```sh
  ~$ conda install drtransformer
```

### Testing/Contributing
To install the latest development version of DrTransformer, clone the
repository and run:
```sh
  ~$ pip install ".[dev]"
```
Use the following command to run all unit tests:
```sh
  ~$ python -m pytest
```
Please include unit tests when you submit a pull request that adds a new feature.

## Usage
Until further documentation is available, please consult the `--help` output of
the command-line executables:
```sh
  ~$ DrTransformer --help
  ~$ DrPlotter --help
```

### An example cotranscriptional folding simulation
We show simulations of three sequences designed by [Xayaphoummine et
al. (2006)].  Briefly, two sequences are composed of the same palindromic
subsequences (A, B, C, D) in forward and reverse order (`ABCD` and `DCBA`); the
third sequence (`DCMA`) has a point mutation which changes B to M. The
experiment demonstrates how the order of helix formation determines which
structures are formed at the end of transcription, an effect that cannot be
observed with a thermodynamic equilibrium prediction, because the free energies
of, for example, the helices A:B and B:A are almost the same due to their
palindromic subsequences.  The three input files [`ABCD.fa`], [`DCBA.fa`] and
[`DCMA.fa`] each contain a FASTA header and the respective sequence from the
original publication.  These files can be found in the subfolder [`examples/`].

```sh
  ~$ cat ABCD.fa | DrTransformer --name ABCD --o-prune 0.01 --logfile 
```
This command line call of DrTransformer produces two files:
 - `ABCD.log` contains a human-readable summary of the cotranscriptional folding process. 
 - `ABCD.drf` contains the details of the cotranscriptional folding simulation in the
 [DrForna] file format. 

#### Structure-based data analysis
DrPlotter supports different types of visual analysis for the `.drf` file
format. The following command line call reads the previously generated file
`ABCD.drf` and produces a plot called `ABCD.png`.
```sh
  ~$ cat ABCD.drf | DrPlotter --name ABCD --format png
```
![ABCD](examples/ABCD.png)

The legend of `ABCD.png` must be interpreted in combination with the `ABCD.log`
file. **Note that the structure IDs from your newly generated files might not
match the ones shown here.** For example, to see which structures are present
at a transcript length of 73 nucleotides, read the log file entries for this
length:
```
73    1 .(..(((((((((((((((....)))))))))))))))..).(((((((((.......)))))))))...... -42.60 +[0.0213 -> 0.9876] ID = 24
73    2 ....(((((((((((((((....))))))))))))))).(..(((((((((....)).)))))))..)..... -39.90 -[0.9787 -> 0.0124] ID = 25
```
The logfile lists two structures in order of their free energy. For each
structure, the square brackets show its occupancy at the start and at the end
of the simulation step, the `+`/`-` prefix indicates whether the occupancy
increased or decreased, and the ID lets you follow the structure through the
transcription process. The IDs are used as labels in the plot `ABCD.png`.
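
For scripted analysis, log entries like the two above can be split into fields.
The following is a minimal Python sketch, assuming every entry follows the
layout of the excerpt (transcript length, rank, dot-bracket structure, free
energy, `+`/`-` flag, start and end occupancy, ID). The pattern is inferred
from these two lines only, not from a format specification, and the names
`LogEntry` and `parse_log_line` are made up for this example.

```python
import re
from typing import NamedTuple


class LogEntry(NamedTuple):
    length: int       # transcript length in nucleotides
    rank: int         # position in the energy-sorted list
    structure: str    # dot-bracket string
    energy: float     # free energy as printed in the log
    change: str       # "+" or "-" (empty if no flag is present)
    occ_start: float  # occupancy at the start of the step
    occ_end: float    # occupancy at the end of the step
    struct_id: int    # ID used as label in the plot


# Assumption: the layout below is inferred from the README excerpt only.
LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([.()]+)\s+(\S+)\s+"
    r"([+-]?)\[\s*(\S+)\s*->\s*(\S+?)\s*\]\s+ID\s*=\s*(\d+)\s*$"
)


def parse_log_line(line: str) -> LogEntry:
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    length, rank, structure, energy, change, start, end, sid = match.groups()
    return LogEntry(int(length), int(rank), structure, float(energy),
                    change, float(start), float(end), int(sid))


# The two entries from the excerpt above.
EXAMPLE_LINES = [
    "73    1 .(..(((((((((((((((....)))))))))))))))..).(((((((((.......)))))))))...... -42.60 +[0.0213 -> 0.9876] ID = 24",
    "73    2 ....(((((((((((((((....))))))))))))))).(..(((((((((....)).)))))))..)..... -39.90 -[0.9787 -> 0.0124] ID = 25",
]

for line in EXAMPLE_LINES:
    entry = parse_log_line(line)
    print(entry.struct_id, entry.energy, entry.occ_start, entry.occ_end)
```

Lines that do not match the assumed layout (for example the logfile header)
raise a `ValueError`, so a caller can skip them explicitly.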

#### Motif-based data analysis
Instead of following specific structures, it is often more helpful to visualize
when specific helical motifs are formed in the ensemble. Generally, we refer to
a helix formed from sequences A and B as A:B, etc. All potential helices 
plotted here are provided in dot-bracket notation in the files [`ABCD.motifs`], [`DCBA.motifs`] and [`DCMA.motifs`].
```sh
  ~$ cat ABCD.drf | DrPlotter --name ABCD-motifs --molecule ABCD --format png --motiffile ABCD.motifs --motifs A:B C:D A:D B:C
  ~$ cat DCBA.drf | DrPlotter --name DCBA-motifs --molecule DCBA --format png --motiffile DCBA.motifs --motifs B:A D:C D:A C:B
  ~$ cat DCMA.drf | DrPlotter --name DCMA-motifs --molecule DCMA --format png --motiffile DCMA.motifs --motifs M:A D:C D:A C:M
```
<img src="examples/ABCD-motifs.png" alt="ABCD"/><br>
ABCD forms only the helices A:B and C:D, but not A:D and B:C. Also, helix C:D is
not formed "immediately", because there is a competing structure which
is cotranscriptionally favored (see ID 25 from the previous analysis).

<img src="examples/DCBA-motifs.png" alt="DCBA"/><br>
DCBA forms structures with all motifs. The helices C:B and D:A dominate with
more than 90% of the population, whereas the helices D:C and B:A remain below
10%. Eventually, D:C and B:A will be dominant, but not on the time scale
simulated here. (Can you repeat the analysis to see how much time is needed
until D:C and B:A dominate the ensemble?)
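
"Eventually" refers to thermodynamic equilibrium, where occupancies depend only
on free energies and no longer on the order of helix formation. As a rough
sketch of that limit, the Python snippet below computes Boltzmann occupancies
for a set of structures. It assumes free energies in kcal/mol, a temperature of
37 °C, and that the listed structures are the only ones in the ensemble; the
function name `boltzmann_occupancies` is hypothetical and not part of
DrTransformer. Because this README gives no energies for DCBA, the example uses
the two ABCD structures (IDs 24 and 25) from the log excerpt above.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def boltzmann_occupancies(energies, temperature_celsius=37.0):
    """Equilibrium occupancy of each structure, given free energies in kcal/mol.

    Assumption: the given structures are the complete ensemble.
    """
    rt = GAS_CONSTANT * (temperature_celsius + 273.15)
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    partition_sum = sum(weights)
    return [w / partition_sum for w in weights]


# Free energies of IDs 24 and 25 as listed in the ABCD log excerpt.
for occupancy in boltzmann_occupancies([-42.60, -39.90]):
    print(f"{occupancy:.4f}")
# prints 0.9876 and 0.0124
```

Under these assumptions the result matches the end occupancies reported for
IDs 24 and 25 in the excerpt, which is consistent with those two structures
having equilibrated at that transcript length.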

<img src="examples/DCMA-motifs.png" alt="DCMA"><br>
As shown in the publication, a single point mutation (from DCBA to DCMA) is
sufficient to drastically shift the occupancy of helices: M:A and D:C
are more occupied at the end of transcription than D:A and C:M.
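
The motif occupancies discussed above can be pictured as the summed occupancy
of all structures that contain a given helix. The Python sketch below
illustrates that idea on a made-up ensemble. It is not DrPlotter's
implementation: the helper names are hypothetical, and it assumes that a motif
is a dot-bracket string aligned to the 5' end of the transcript in which
`(`/`)` are required base pairs, `x` means *must be unpaired* (see the tips
below) and `.` means unconstrained.

```python
def pair_table(structure):
    """Map every paired position (0-based) to its partner position."""
    stack, pairs = [], {}
    for i, char in enumerate(structure):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' in dot-bracket string")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' in dot-bracket string")
    return pairs


def has_motif(structure, motif):
    """True if the structure forms all motif pairs and respects every 'x'."""
    if len(motif) > len(structure):
        return False  # transcript is not yet long enough for this motif
    pairs = pair_table(structure)
    motif_pairs = pair_table(motif)  # 'x' and '.' are ignored here
    for i, char in enumerate(motif):
        if char == "x" and i in pairs:
            return False  # position must be unpaired but is paired
        if char in "()" and pairs.get(i) != motif_pairs[i]:
            return False  # required base pair is missing
    return True


def motif_occupancy(ensemble, motif):
    """Sum the occupancy of all (structure, occupancy) pairs with the motif."""
    return sum(occ for structure, occ in ensemble if has_motif(structure, motif))


# Hypothetical ensemble of three structures at one time point.
ENSEMBLE = [
    ("((((....))))....", 0.7),
    ("....((((....))))", 0.2),
    ("................", 0.1),
]

print(motif_occupancy(ENSEMBLE, "((((....))))...."))  # only the first structure
print(motif_occupancy(ENSEMBLE, "xxxx............"))  # second and third structure
```

Evaluating such a sum at every time point of a simulation would give one curve
per motif, which is the kind of plot shown in this section.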

### Tips and tricks
 - The header of the logfile contains all relevant DrTransformer parameters that generated the file. 
 - You can use the parameter `--plot-minh` to group similar structures (separated by energy barriers < plot-minh) together. 
    In contrast to the `--t-fast` parameter, this will not affect the accuracy of the model.
 - Use `--pause-sites` to see the effects of pausing at specific nucleotides on cotranscriptional folding.
 - Motifs for DrPlotter can also contain 'x' in the dot-bracket notation for *must be unpaired*.

## Version
v0.12 -- preparing for official release
  * changed --t-lin, --t-log defaults and fixed --t-lin=1, --t-log=1
  * fixed potential issues with --t-end = --t-ext
  * adapted README example to publication 

v0.11 -- using lonely base-pairs
  * removed the --noLP default (added parameter setting)
  * added profiling option for runtime optimization
  * using --cg-auto default parameter
  * using k0=1e5, t-ext=0.04 default parameter
  * added new visualization types and fixed motif file input
  * added epsilon to t-fast sanity check

v0.10 -- moved to beta status (first official release)
  * changes in parameter defaults 
  * bugfix in linalg
  * new DrPlotter simulation layout and motif plotting
  * repaired code to enable plotting including pause sites

v0.9 -- standalone package (no official release)
  * extraction from the [ribolands] package to a standalone Python package.
  * using scipy and numpy for matrix exponentials (instead of [treekin])
  * implemented lookahead to skip pruning of potentially relevant future structures

## Cite
Stefan Badelt, Ronny Lorenz, Ivo L Hofacker: **DrTransformer: heuristic
cotranscriptional RNA folding using the nearest neighbor energy model**, 
Bioinformatics, Volume 39, Issue 1, January 2023, 
[https://doi.org/10.1093/bioinformatics/btad034]

[//]: References
[ViennaRNA]: <http://www.tbi.univie.ac.at/RNA>
[ViennaRNA license]: <https://github.com/ViennaRNA/ViennaRNA/blob/master/license.txt>
[bioconda]: <https://bioconda.github.io>
[DrForna]: <https://github.com/ViennaRNA/drforna>
[Xayaphoummine et al. (2006)]: <https://doi.org/10.1093/nar/gkl1036>
[https://doi.org/10.1093/bioinformatics/btad034]: <https://doi.org/10.1093/bioinformatics/btad034>
[`examples/`]: <examples>
[`ABCD.fa`]: <examples/ABCD.fa>
[`DCBA.fa`]: <examples/DCBA.fa>
[`DCMA.fa`]: <examples/DCMA.fa>
[`ABCD.motifs`]: <examples/ABCD.motifs>
[`DCBA.motifs`]: <examples/DCBA.motifs>
[`DCMA.motifs`]: <examples/DCMA.motifs>

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "drtransformer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "cotranscriptional folding,RNA,secondary structure",
    "author": null,
    "author_email": "Stefan Badelt <bad-ants-fleet@posteo.eu>",
    "download_url": "https://files.pythonhosted.org/packages/3a/05/0564c9a2b8537742064143544cc2f343de47864b861e50869e470b90ffe7/drtransformer-1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Heuristic cotranscriptional folding using the nearest neighbor energy model.",
    "version": "1.0",
    "split_keywords": [
        "cotranscriptional folding",
        "rna",
        "secondary structure"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "710ca14c887edf8d78d1b5685730ed5bf05a5affde48f2e03ce50973a05d0cd4",
                "md5": "7d3b09b0f130777c55ea177812bf0832",
                "sha256": "899b9a897c66eaf309ec74945744b742c3e8afab0d421a2b5e090d7255931bc1"
            },
            "downloads": -1,
            "filename": "drtransformer-1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7d3b09b0f130777c55ea177812bf0832",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 41444,
            "upload_time": "2023-02-08T14:52:59",
            "upload_time_iso_8601": "2023-02-08T14:52:59.727942Z",
            "url": "https://files.pythonhosted.org/packages/71/0c/a14c887edf8d78d1b5685730ed5bf05a5affde48f2e03ce50973a05d0cd4/drtransformer-1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3a050564c9a2b8537742064143544cc2f343de47864b861e50869e470b90ffe7",
                "md5": "9c684fe192c2e17b697119914424f297",
                "sha256": "75b0363255866ece1aa80577d8c086ee94d85ee2bbec29c3ef53cc332d7a4878"
            },
            "downloads": -1,
            "filename": "drtransformer-1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "9c684fe192c2e17b697119914424f297",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 189590,
            "upload_time": "2023-02-08T14:53:30",
            "upload_time_iso_8601": "2023-02-08T14:53:30.855378Z",
            "url": "https://files.pythonhosted.org/packages/3a/05/0564c9a2b8537742064143544cc2f343de47864b861e50869e470b90ffe7/drtransformer-1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-08 14:53:30",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "drtransformer"
}
        