# DrTransformer -- heuristic cotranscriptional folding.
DrTransformer (short for "DNA-to-RNA transformer") is a program for heuristic
and deterministic cotranscriptional folding simulations of RNA molecules. The
code of this project is available under the MIT license; however, this software
depends on the [ViennaRNA] package, which is available under the [ViennaRNA
license].
## Installation
If you already have the Python bindings of the [ViennaRNA] package installed,
then the latest stable release of DrTransformer can be installed from PyPI:
```sh
~$ pip install drtransformer
```
DrTransformer can also be installed with bioconda to resolve the [ViennaRNA]
dependency automatically. First, make sure [bioconda] is set up properly with:
```sh
~$ conda config --add channels defaults
~$ conda config --add channels bioconda
~$ conda config --add channels conda-forge
~$ conda config --set channel_priority strict
```
Second, install or update DrTransformer:
```sh
~$ conda install drtransformer
```
### Testing/Contributing
To install the latest development version of DrTransformer, clone the
repository and run:
```sh
~$ pip install .[dev]
```
Use the following command to run all unit tests:
```sh
~$ python -m pytest
```
Please provide unit tests if you submit a pull request with a new feature.
## Usage
Until further documentation is available, please use the `--help` option of each
command line executable:
```sh
~$ DrTransformer --help
~$ DrPlotter --help
```
### An example cotranscriptional folding simulation
We show simulations of three sequences designed by [Xayaphoummine et
al. (2006)]. Briefly, two sequences are composed of the same palindromic
subsequences (A, B, C, D) in forward and reverse order (`ABCD` and `DCBA`); the
third sequence (`DCMA`) has a point mutation which changes B to M. The
experiment demonstrates how the order of helix formation determines which
structures are present at the end of transcription. This effect cannot be
observed with a thermodynamic equilibrium prediction because, for example, the
helices A:B and B:A have almost the same free energy due to their
palindromic subsequences. The three input files [`ABCD.fa`], [`DCBA.fa`] and
[`DCMA.fa`] each contain a FASTA header and the respective sequence from the
original publication. These files can be found in the subfolder [`examples/`].
```sh
~$ cat ABCD.fa | DrTransformer --name ABCD --o-prune 0.01 --logfile
```
This command line call of DrTransformer produces two files:
- `ABCD.log` contains a human-readable summary of the cotranscriptional folding process.
- `ABCD.drf` contains the details of the cotranscriptional folding simulation in the
[DrForna] file format.
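The motif-based analysis further down also reads `DCBA.drf` and `DCMA.drf`. A minimal sketch for producing all three output sets in one go is shown here. It assumes that the three FASTA files from [`examples/`] are in the current directory and that the parameters shown above for `ABCD` are also wanted for the other two sequences, which is an assumption rather than something this README states.

```sh
# Assumption: ABCD.fa, DCBA.fa and DCMA.fa are in the current directory.
# Only the options from the ABCD call above are used.
for name in ABCD DCBA DCMA; do
  if [ -r "$name.fa" ] && command -v DrTransformer >/dev/null 2>&1; then
    cat "$name.fa" | DrTransformer --name "$name" --o-prune 0.01 --logfile
  else
    # Skip instead of failing if the input file or the executable is missing.
    echo "skipping $name: DrTransformer or $name.fa not available" >&2
  fi
done
```

Each successful iteration should leave a `.log` and a `.drf` file named after the sequence, as described for `ABCD` above.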
#### Structure-based data analysis
DrPlotter supports different types of visual analysis for the `.drf` file
format. The following command line call reads the previously generated file
`ABCD.drf` and produces a plot called `ABCD.png`.
```sh
~$ cat ABCD.drf | DrPlotter --name ABCD --format png
```
![ABCD](examples/ABCD.png)
The legend of `ABCD.png` must be interpreted in combination with the `ABCD.log`
file. **Note that the structure IDs from your newly generated files might not
match the ones shown here.** For example, to see which structures are present
when the transcript is 73 nucleotides long, read the log file entries for this
transcript length:
```
73 1 .(..(((((((((((((((....)))))))))))))))..).(((((((((.......)))))))))...... -42.60 +[0.0213 -> 0.9876] ID = 24
73 2 ....(((((((((((((((....))))))))))))))).(..(((((((((....)).)))))))..)..... -39.90 -[0.9787 -> 0.0124] ID = 25
```
The logfile lists two structures in order of their free energy. For each
structure, the square brackets show the occupancy at the start and at the end
of the simulation for this transcript length (+/- indicate a change in
occupancy), and the ID lets you follow a specific structure through the
transcription process. The IDs are used as labels in the plot `ABCD.png`.
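To pull the ID, free energy and occupancies for one transcript length out of a log file, a small `awk` filter can help. The following is a minimal sketch and not part of DrTransformer. It assumes that log lines are whitespace-separated exactly as in the excerpt above, and it writes the two example lines to a hypothetical file `sample.log` so that it runs without a simulation. With real data, point `awk` at `ABCD.log` instead.

```sh
# Two example lines copied from the log excerpt above.
cat > sample.log <<'EOF'
73 1 .(..(((((((((((((((....)))))))))))))))..).(((((((((.......)))))))))...... -42.60 +[0.0213 -> 0.9876] ID = 24
73 2 ....(((((((((((((((....))))))))))))))).(..(((((((((....)).)))))))..)..... -39.90 -[0.9787 -> 0.0124] ID = 25
EOF

# Assumed field layout after whitespace splitting:
# $1 length, $2 rank, $3 structure, $4 free energy,
# $5 sign and start occupancy, $7 end occupancy, last field ID.
summary=$(awk -v len=73 '
  $1 == len && $(NF-2) == "ID" {
    first = $5; gsub(/[^0-9.]/, "", first)   # drop the sign and the opening bracket
    last = $7;  gsub(/[^0-9.]/, "", last)    # drop the closing bracket
    printf "ID %s: energy %s, occupancy %s -> %s\n", $NF, $4, first, last
  }' sample.log)
printf '%s\n' "$summary"
```

Change the value of `len` to inspect a different transcript length.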
#### Motif-based data analysis
Instead of following specific structures, it is often more helpful to visualize
when specific helical motifs are formed in the ensemble. Generally, we refer to
a helix formed from sequences A and B as A:B, etc. All potential helices
plotted here are provided in dot-bracket notation in the files [`ABCD.motifs`], [`DCBA.motifs`] and [`DCMA.motifs`].
```sh
~$ cat ABCD.drf | DrPlotter --name ABCD-motifs --molecule ABCD --format png --motiffile ABCD.motifs --motifs A:B C:D A:D B:C
~$ cat DCBA.drf | DrPlotter --name DCBA-motifs --molecule DCBA --format png --motiffile DCBA.motifs --motifs B:A D:C D:A C:B
~$ cat DCMA.drf | DrPlotter --name DCMA-motifs --molecule DCMA --format png --motiffile DCMA.motifs --motifs M:A D:C D:A C:M
```
<img src="examples/ABCD-motifs.png" alt="ABCD"/><br>
ABCD forms only the helices A:B and C:D, but not A:D and B:C. Also, helix C:D is
not formed "immediately" because there is a competing structure which
is cotranscriptionally favored (see ID 25 from the previous analysis).
<img src="examples/DCBA-motifs.png" alt="DCBA"/><br>
DCBA forms structures with all motifs. The helices C:B and D:A dominate with
more than 90% of the population, while the helices D:C and B:A are
below 10%. Eventually, D:C and B:A will be
dominant, but not on the time scale simulated here. (Can you repeat the analysis
to see how much time is needed until D:C and B:A dominate the ensemble?)
<img src="examples/DCMA-motifs.png" alt="DCMA"><br>
As shown in the publication, a single point mutation (from DCBA to DCMA) is
sufficient to drastically shift occupancy of helices: M:A and D:C
are more occupied at the end of transcription than D:A and C:M.
### Tips and tricks
- The header of the logfile contains all relevant DrTransformer parameters that generated the file.
- You can use the parameter `--plot-minh` to group together similar structures (those separated by energy barriers smaller than `plot-minh`).
In contrast to the `--t-fast` parameter, this will not affect the accuracy of the model.
- Use `--pause-sites` to see the effects of pausing at specific nucleotides on cotranscriptional folding.
- Motifs for DrPlotter can also contain 'x' in the dot-bracket notation for *must be unpaired*.
## Version
v0.12 -- preparing for official release
* changed --t-lin, --t-log defaults and fixed --t-lin=1, --t-log=1
* fixed potential issues with --t-end = --t-ext
* adapted README example to publication
v0.11 -- using lonely base-pairs
* removed the --noLP default (added parameter setting)
* added profiling option for runtime optimization
* using --cg-auto default parameter
* using k0=1e5, t-ext=0.04 default parameter
* added new visualization types and fixed motif file input
* added epsilon to t-fast sanity check
v0.10 -- moved to beta status (first official release)
* changes in parameter defaults
* bugfix in linalg
* new DrPlotter simulation layout and motif plotting
* repaired code to enable plotting including pause sites
v0.9 -- standalone package (no official release)
* extraction from the [ribolands] package to a standalone Python package.
* using scipy and numpy for matrix exponentials (instead of [treekin])
* implemented lookahead to skip pruning of potentially relevant future structures
## Cite
Stefan Badelt, Ronny Lorenz, Ivo L Hofacker: **DrTransformer: heuristic
cotranscriptional RNA folding using the nearest neighbor energy model**,
Bioinformatics, Volume 39, Issue 1, January 2023,
[https://doi.org/10.1093/bioinformatics/btad034]
[//]: References
[ViennaRNA]: <http://www.tbi.univie.ac.at/RNA>
[ViennaRNA license]: <https://github.com/ViennaRNA/ViennaRNA/blob/master/license.txt>
[bioconda]: <https://bioconda.github.io>
[DrForna]: <https://github.com/ViennaRNA/drforna>
[Xayaphoummine et al. (2006)]: <https://doi.org/10.1093/nar/gkl1036>
[https://doi.org/10.1093/bioinformatics/btad034]: <https://doi.org/10.1093/bioinformatics/btad034>
[`examples/`]: <examples>
[`ABCD.fa`]: <examples/ABCD.fa>
[`DCBA.fa`]: <examples/DCBA.fa>
[`DCMA.fa`]: <examples/DCMA.fa>
[`ABCD.motifs`]: <examples/ABCD.motifs>
[`DCBA.motifs`]: <examples/DCBA.motifs>
[`DCMA.motifs`]: <examples/DCMA.motifs>
Raw data
{
"_id": null,
"home_page": null,
"name": "drtransformer",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "cotranscriptional folding,RNA,secondary structure",
"author": null,
"author_email": "Stefan Badelt <bad-ants-fleet@posteo.eu>",
"download_url": "https://files.pythonhosted.org/packages/3a/05/0564c9a2b8537742064143544cc2f343de47864b861e50869e470b90ffe7/drtransformer-1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Heuristic cotranscriptional folding using the nearest neighbor energy model.",
"version": "1.0",
"split_keywords": [
"cotranscriptional folding",
"rna",
"secondary structure"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "710ca14c887edf8d78d1b5685730ed5bf05a5affde48f2e03ce50973a05d0cd4",
"md5": "7d3b09b0f130777c55ea177812bf0832",
"sha256": "899b9a897c66eaf309ec74945744b742c3e8afab0d421a2b5e090d7255931bc1"
},
"downloads": -1,
"filename": "drtransformer-1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7d3b09b0f130777c55ea177812bf0832",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 41444,
"upload_time": "2023-02-08T14:52:59",
"upload_time_iso_8601": "2023-02-08T14:52:59.727942Z",
"url": "https://files.pythonhosted.org/packages/71/0c/a14c887edf8d78d1b5685730ed5bf05a5affde48f2e03ce50973a05d0cd4/drtransformer-1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "3a050564c9a2b8537742064143544cc2f343de47864b861e50869e470b90ffe7",
"md5": "9c684fe192c2e17b697119914424f297",
"sha256": "75b0363255866ece1aa80577d8c086ee94d85ee2bbec29c3ef53cc332d7a4878"
},
"downloads": -1,
"filename": "drtransformer-1.0.tar.gz",
"has_sig": false,
"md5_digest": "9c684fe192c2e17b697119914424f297",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 189590,
"upload_time": "2023-02-08T14:53:30",
"upload_time_iso_8601": "2023-02-08T14:53:30.855378Z",
"url": "https://files.pythonhosted.org/packages/3a/05/0564c9a2b8537742064143544cc2f343de47864b861e50869e470b90ffe7/drtransformer-1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-02-08 14:53:30",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "drtransformer"
}