<!-- PROJECT LOGO -->
<br />
<div align="center">
<a href="https://bio2byte.be/b2btools" target="_blank" rel="noreferrer noopener">
<img src="https://pbs.twimg.com/profile_images/1247824923546079232/B9b_Yg7n_400x400.jpg" width="80px"/>
</a>
# Constava
</div>
## Table of contents
* [Table of contents](#table-of-contents)
* [Description](#description)
* [Installation](#installation)
  * [Prerequisites](#prerequisites)
  * [Installation through PyPI](#installation-through-pypi)
  * [Installation from the source](#installation-from-the-source)
* [Usage](#usage)
  * [Execution from the command line](#execution-from-the-command-line)
    * [Extracting backbone dihedrals from a trajectory](#extracting-backbone-dihedrals-from-a-trajectory)
    * [Analyzing a conformational ensemble](#analyzing-a-conformational-ensemble)
    * [Generating custom conformational state models](#generating-custom-conformational-state-models)
  * [Execution as a python library](#execution-as-a-python-library)
    * [Extracting backbone dihedrals as a DataFrame](#extracting-backbone-dihedrals-as-a-dataframe)
    * [Setting parameters and analyzing a conformational ensemble](#setting-parameters-and-analyzing-a-conformational-ensemble)
    * [Generating and loading conformational state models](#generating-and-loading-conformational-state-models)
    * [Constava-class parameters vs. command line arguments](#constava-class-parameters-vs-command-line-arguments)
* [License](#license)
* [Authors](#authors)
* [Acknowledgments](#acknowledgments)
* [Contact](#contact)
[<Go to top>](#constava)
## Description
Constava analyzes conformational ensembles by calculating conformational state
propensities and conformational state variability. The conformational state
propensities indicate the likelihood of a residue residing in a given
conformational state, while the conformational state variability is a measure
of the residue's ability to transition between conformational states.

Each conformational state is a statistical model based on the backbone
dihedrals (phi, psi). The default models were derived from an analysis of NMR
ensembles and chemical shifts. To analyze a conformational ensemble, the phi
and psi angles of each residue in every conformation of the ensemble need to
be provided.

As input data, Constava needs the backbone dihedral angles extracted from the
conformational ensemble. These dihedrals can be obtained using GROMACS'
`gmx chi` module (pass `--input-format=xvg` to `constava analyze`) or using
the `constava dihedrals` submodule, which supports a wide range of MD and
structure formats.
[<Go to top>](#constava)
## Installation
### Prerequisites
- Python 3.8 or higher
- pip
[<Go to top>](#constava)
### Installation through PyPI
We recommend this installation for most users.
1. Create a virtual environment (optional but recommended):
```sh
python3 -m venv constava
source constava/bin/activate
```
2. Install the Python module:
```sh
pip install constava
```
3. Run the tests to verify the installation (optional but recommended):
```sh
constava test
```
To uninstall the package, run `pip uninstall constava`.
[<Go to top>](#constava)
### Installation from the source
Alternatively, you may download and install the software from the source code
directly.
1. Clone the repository:
```sh
git clone https://bitbucket.org/bio2byte/constava/
cd constava
```
2. Create a virtual environment (optional but recommended):
```sh
python3 -m venv constava
source constava/bin/activate
```
3. Build and install the package:
```sh
# In the package's root directory do:
# Build package from source
make build
# Install locally
make install
# Test installation
make test
```
To uninstall the package, run `make uninstall` in the terminal from the
package's root directory.
[<Go to top>](#constava)
## Usage
The software provides two modes of interaction. Shell users may use the software
from the command line, while users skilled in Python can import it as a module.
We provide a couple of usage examples in a [Colab notebook](https://colab.research.google.com/github/Bio2Byte/public_notebooks/blob/main/constava_examples.ipynb).
[<Go to top>](#constava)
### Execution from the command line
The software is subdivided into **three submodules**:

The `constava dihedrals` submodule provides a simple way to extract backbone
dihedral angles from MD simulations or PDB ensembles. For more information
run: `constava dihedrals -h`. Alternatively, the backbone dihedrals may be
extracted with GROMACS' `gmx chi` module.

The `constava analyze` submodule analyzes the provided backbone dihedral angles
and infers the propensities for each residue to reside in a given
conformational state. For more information run: `constava analyze -h`.

The `constava fit-model` submodule can be used to train a custom probabilistic
model of conformational states. For more information run:
`constava fit-model -h`. The default models were derived from an analysis of NMR
ensembles and chemical shifts; they cover six conformational states:
* Core Helix - Exclusively alpha-helical, low backbone dynamics
* Surrounding Helix - Mostly alpha-helical, high backbone dynamics
* Core Sheet - Exclusively beta-sheet, low backbone dynamics
* Surrounding Sheet - Mostly extended conformation, high backbone dynamics
* Turn - Mostly turn, high backbone dynamics
* Other - Mostly coil, high backbone dynamics
[<Go to top>](#constava)
#### Extracting backbone dihedrals from a trajectory
To extract dihedral angles from a trajectory the `constava dihedrals` submodule
is used.
```
usage: constava dihedrals [-h] [-s <file.pdb>] [-f <file.xtc> [<file.xtc> ...]] [-o OUTPUT] [--selection SELECTION] [--precision PRECISION] [--degrees] [-O]
The `constava dihedrals` submodule is used to extract the backbone dihedrals
needed for the analysis from confromational ensembles. By default the results
are written out in radians as this is the preferred format for
`constava analyze`.
Note: For the first and last residue in a protein only one backbone dihedral
can be extracted. Thus, those residues are omitted by default.
optional arguments:
-h, --help Show this help message and exit
Input & output options:
-s <file.pdb>, --structure <file.pdb>
Structure file with atomic information: [pdb, gro, tpr]
-f <file.xtc> [<file.xtc> ...], --trajectory <file.xtc> [<file.xtc> ...]
Trajectory file with coordinates: [pdb, gro, trr, xtc, crd, nc]
-o OUTPUT, --output OUTPUT
CSV file to write dihedral information to. (default: dihedrals.csv)
Input & output options:
--selection SELECTION
Selection for the dihedral calculation. (default: 'protein')
--precision PRECISION
Defines the number of decimals written for the dihedrals. (default: 5)
--degrees If set results are written in degrees instead of radians.
-O, --overwrite If set any previously generated output will be overwritten.
```
An example:
```sh
# Obtain backbone dihedrals (overwriting any existing files)
constava dihedrals -O -s "2mkx.gro" -f "2mkx.xtc" -o "2mkx_dihedrals.csv"
```
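
For orientation: phi of residue *i* is the dihedral over the atoms C(i-1), N(i), CA(i), C(i), and psi the one over N(i), CA(i), C(i), N(i+1), which is why the terminal residues yield only one angle each. The following is a minimal standard-library sketch of how a signed dihedral can be computed from four points. It is not Constava's own implementation; the function name `dihedral` and the toy coordinates are illustrative assumptions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians, in the range (-pi, pi]."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)          # normal of the plane (p0, p1, p2)
    n2 = _cross(b1, b2)          # normal of the plane (p1, p2, p3)
    b1_len = math.sqrt(_dot(b1, b1))
    b1_hat = tuple(x / b1_len for x in b1)
    x = _dot(n1, n2)
    y = _dot(b1_hat, _cross(n1, n2))
    return math.atan2(y, x)

# Toy coordinates (hypothetical, not taken from a real structure)
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(math.degrees(angle))
```

With `--degrees` unset, `constava dihedrals` writes angles in radians, which matches the unit returned by `math.atan2` in this sketch.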
[<Go to top>](#constava)
#### Analyzing a conformational ensemble
To analyze the backbone dihedral angles extracted from a conformational ensemble,
the `constava analyze` submodule is used.
```
usage: constava analyze [-h] [-i <file.csv> [<file.csv> ...]] [--input-format {auto,xvg,csv}] [-o <file.csv>] [--output-format {auto,csv,json,tsv}] [-m <file.pkl>] [--window <int> [<int> ...]]
[--window-series <int> [<int> ...]] [--bootstrap <int> [<int> ...]] [--bootstrap-series <int> [<int> ...]] [--bootstrap-samples <int>] [--degrees] [--precision <int>] [--seed <int>] [-v]
The `constava analyze` submodule analyzes the provided backbone dihedral angles
and infers the propensities for each residue to reside in a given
conformational state.
Each conformational state is a statistical model of based on the backbone
dihedrals (phi, psi). The default models were derived from an analysis of NMR
ensembles and chemical shifts. To analyze a conformational ensemble, the phi-
and psi-angles for each conformational state in the ensemble need to be
provided.
As input data the backbone dihedral angles extracted from the conformational
ensemble need to be provided. Those can be generated using the
`constava dihedrals` submodule (`--input-format csv`) or GROMACS'
`gmx chi` module (`--input-format xvg`).
optional arguments:
-h, --help Show this help message and exit
Input & output options:
-i <file.csv> [<file.csv> ...], --input <file.csv> [<file.csv> ...]
Input file(s) that contain the dihedral angles.
--input-format {auto,xvg,csv}
Format of the input file: {'auto', 'csv', 'xvg'}
-o <file.csv>, --output <file.csv>
The file to write the results to.
--output-format {auto,csv,json,tsv}
Format of output file: {'csv', 'json', 'tsv'}. (default: 'auto')
Conformational state model options:
-m <file.pkl>, --load-model <file.pkl>
Load a conformational state model from the given pickled
file. If not provided, the default model will be used.
Subsampling options:
--window <int> [<int> ...]
Do inference using a moving reading-frame. Each reading
frame consists of <int> consecutive samples. Multiple
values can be provided.
--window-series <int> [<int> ...]
Do inference using a moving reading-frame. Each reading
frame consists of <int> consecutive samples. Return the
results for every window rather than the average. This can
result in very large output files. Multiple values can be
provided.
--bootstrap <int> [<int> ...]
Do inference using <Int> samples obtained through
bootstrapping. Multiple values can be provided.
--bootstrap-series <int> [<int> ...]
Do inference using <Int> samples obtained through
bootstrapping. Return the results for every subsample
rather than the average. This can result in very
large output files. Multiple values can be provided.
--bootstrap-samples <int>
When bootstrapping, sample <Int> times from the input data.
(default: 500)
Miscellaneous options:
--degrees Set this flag, if dihedrals in the input files are in
degrees.
--precision <int> Sets the number of decimals in the output files.
--seed <int> Set random seed for bootstrap sampling
-v, --verbose Set verbosity level of screen output. Flag can be given
multiple times (up to 2) to gradually increase output to
debugging mode.
```
An example:
```sh
# Run constava with debug-level output
constava analyze \
-i "2mkx_dihedrals.csv" \
-o "2mkx_constava.json" --output-format json \
--window 3 5 25 \
-vv
```
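
The `--window` and `--bootstrap` options differ in how frames are grouped before inference. The sketch below only illustrates the two subsampling ideas on frame indices; it is not Constava's implementation, and the helper names `sliding_windows` and `bootstrap_samples` are hypothetical. Drawing with replacement is the usual definition of bootstrapping and is assumed here.

```python
import random

def sliding_windows(n_frames, window_size):
    """Index lists for every run of `window_size` consecutive frames."""
    return [list(range(start, start + window_size))
            for start in range(n_frames - window_size + 1)]

def bootstrap_samples(n_frames, sample_size, n_samples, seed=None):
    """`n_samples` index lists, each drawn at random with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_frames) for _ in range(sample_size)]
            for _ in range(n_samples)]

# A window of 3 over 6 frames yields 4 overlapping subsamples
windows = sliding_windows(n_frames=6, window_size=3)
print(windows[0], windows[-1])    # [0, 1, 2] [3, 4, 5]

# 500 bootstrap subsamples of 3 frames each (cf. --bootstrap-samples)
boots = bootstrap_samples(n_frames=6, sample_size=3, n_samples=500, seed=0)
print(len(boots), len(boots[0]))  # 500 3
```

Windows preserve the temporal order of an MD trajectory, while bootstrap subsamples ignore it, so the two options probe different aspects of the ensemble.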
[<Go to top>](#constava)
#### Generating custom conformational state models
To train a custom probabilistic model of conformational states, the `constava fit-model`
submodule is used.
```
usage: constava fit-model [-h] [-i <file.json>] -o <file.pkl> [--model-type {kde,grid}] [--kde-bandwidth <float>] [--grid-points <int>] [--degrees] [-v]
The `constava fit-model` submodule is used to generate the probabilistic
conformational state models used in the analysis. By default, when running
`constava analyze` these models are generated on-the-fly. In selected cases
generating a model beforehand and loading it can be useful, though.
We provide two model types. kde-Models are the default. They are fast to fit
but may be slow in the inference in large conformational ensembles (e.g.,
long-timescale MD simulations). The idea of grid-Models is, to replace
the continuous probability density function of the kde-Model by a fixed set
of grid-points. The PDF for any sample is then estimated by linear
interpolation between the nearest grid points. This is slightly less
accurate than the kde-Model but speeds up inference significantly.
optional arguments:
-h, --help Show this help message and exit
Input and output options:
-i <file.json>, --input <file.json>
The data to which the new conformational state models will
be fitted. It should be provided as a JSON file. The
top-most key should indicate the names of the
conformational states. On the level below, lists of phi-/
psi pairs for each stat should be provided. If not provided
the default data from the publication will be used.
-o <file.pkl>, --output <file.pkl>
Write the generated model to a pickled file, that can be
loaded gain using `constava analyze --load-model`
Conformational state model options:
--model-type {kde,grid}
The probabilistic conformational state model used. The
default is `kde`. The alternative `grid` runs significantly
faster while slightly sacrificing accuracy: {'kde', 'grid'}
(default: 'kde')
--kde-bandwidth <float>
This flag controls the bandwidth of the Gaussian kernel
density estimator. (default: 0.13)
--grid-points <int> This flag controls how many grid points are used to
describe the probability density function. Only applies if
`--model-type` is set to `grid`. (default: 10000)
Miscellaneous options:
--degrees Set this flag, if dihedrals in `model-data` are in degrees
instead of radians.
-v, --verbose Set verbosity level of screen output. Flag can be given
multiple times (up to 2) to gradually increase output to
debugging mode.
```
An example:
```sh
# Generates a faster 'grid-interpolation model' using the default dataset
constava fit-model -v \
-o default_grid.pkl \
--model-type grid \
--kde-bandwidth 0.13 \
--grid-points 6400
```
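
The `--input` option described above expects a JSON file whose top-level keys name the conformational states, each holding a list of phi/psi pairs. A minimal sketch of preparing such data with the standard library follows. The state names and angle values are made-up placeholders, and the exact nesting (a list of two-element lists per state) is an assumption based on the help text.

```python
import json
import math

def build_training_data(states_in_degrees):
    """Convert {state: [(phi, psi), ...]} from degrees to radians."""
    return {
        state: [[math.radians(phi), math.radians(psi)] for phi, psi in pairs]
        for state, pairs in states_in_degrees.items()
    }

# Hypothetical two-state example; a real training set needs many more samples
example = {
    "helix_like": [(-60.0, -45.0), (-65.0, -40.0)],
    "sheet_like": [(-120.0, 130.0), (-135.0, 135.0)],
}

training_data = build_training_data(example)
json_text = json.dumps(training_data, indent=2)
print(json_text)
```

Saving `json_text` to a file such as `custom_states.json` would then allow `constava fit-model -i custom_states.json -o custom_model.pkl`. If the angles are kept in degrees instead, the `--degrees` flag of `constava fit-model` applies.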
[<Go to top>](#constava)
### Execution as a python library
The module provides the `Constava` class as a general interface to the
software's features. The only notable exception is the extraction of dihedrals,
which is done through a separate function.
[<Go to top>](#constava)
#### Extracting backbone dihedrals as a DataFrame
```python
import pandas as pd
from constava.utils.dihedrals import calculate_dihedrals
# Calculate dihedrals as a DataFrame
dihedrals = calculate_dihedrals(structure="./2mkx.pdb", trajectory="2mkx.xtc")
# Write dihedrals out as a csv
dihedrals.to_csv("2mkx_dihedrals.csv", index=False, float_format="%.4f")
```
[<Go to top>](#constava)
#### Setting parameters and analyzing a conformational ensemble
This example code will generate an output for a protein:
```python
# Import dependencies
import glob
from constava import Constava
# Define input and output files
PDBID = "2mkx"
input_files = glob.glob(f"./{PDBID}/ramaPhiPsi*.xvg")
output_file = f"./{PDBID}_constava.csv"
# Initialize Constava Python interface with parameters
c = Constava(
    input_files = input_files,
    output_file = output_file,
    bootstrap = [3,5,10,25],
    input_degrees = True,
    verbose = 2)
# Alter parameters after initialization
c.set_param("window", [1,3,5])
# Run the calculation and write results
c.run()
```
For this protein, with 48 residues and 100 frames per residue, the analysis runs in about 1 minute.
The original MD ensembles from the manuscript can be found in
[https://doi.org/10.5281/zenodo.8160755](https://doi.org/10.5281/zenodo.8160755).
[<Go to top>](#constava)
#### Generating and loading conformational state models
Conformational state models are normally fitted at runtime, which is the
safest option to retain compatibility. For `kde` models, refitting usually
takes less than a second, which is negligible. However, `grid` interpolation
models take longer to generate. Thus, it makes sense to store them when
running multiple predictions with the same model.

**Note:** Conformational state model-pickles are intended for quickly rerunning
analyses. They are **not for storing or sharing your conformational state models**.
When you need to store or share a custom conformational state model, provide
the training data and model-fitting parameters.
```python
import glob
from constava import Constava
# Fit the grid-interpolation model
c = Constava(verbose = 1)
csmodel = c.fit_csmodel(model_type = "grid",
                        kde_bandwidth = .13,
                        grid_points = 10_201)
# Write the fitted model out as a pickle
csmodel.dump_pickle("grid_model.pkl")
# Use the new model to analyze a conformational ensemble
PDBID = "2mkx"
input_files = glob.glob(f"./{PDBID}_dihedrals.csv")
output_file = f"./{PDBID}_constava.csv"
c = Constava(
    input_files = input_files,
    output_file = output_file,
    model_load = "grid_model.pkl",
    input_degrees = True,
    window = [1, 5, 10, 25],
    verbose = 1)
c.run()
```
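
To make the difference between the two model types concrete, the following standard-library sketch evaluates a Gaussian kernel density estimate once on a fixed grid and then answers queries by linear interpolation. It is a one-dimensional illustration of the idea only: Constava's models work on (phi, psi) pairs, and the helper names `gaussian_kde` and `grid_model`, the sample values, and the neglect of angular periodicity are simplifying assumptions.

```python
import math
from bisect import bisect_right

def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a 1D Gaussian KDE at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return pdf

def grid_model(pdf, lo, hi, n_points):
    """Precompute `pdf` on a regular grid; return a linear interpolator."""
    step = (hi - lo) / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    ys = [pdf(x) for x in xs]       # the expensive part, done only once
    def interp(x):
        x = min(max(x, lo), hi)     # clamp queries to the grid range
        i = min(bisect_right(xs, x) - 1, n_points - 2)
        t = (x - xs[i]) / step
        return ys[i] + t * (ys[i + 1] - ys[i])
    return interp

samples = [-1.1, -1.0, -0.9, 1.0]   # toy angles in radians
kde = gaussian_kde(samples, bandwidth=0.13)
grid = grid_model(kde, -math.pi, math.pi, 1001)
print(kde(-1.0), grid(-1.0))        # close, but not identical
```

Each KDE query loops over all training samples, whereas a grid query costs one lookup and one interpolation. This is the trade-off behind the slower fitting and faster inference of `grid` models described above.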
[<Go to top>](#constava)
#### Constava-class parameters vs. command line arguments
In the following table, all available parameters of the Python interface (`Constava`
class) and their corresponding command line arguments are listed. The defaults for
parameters in Python and command line are the same.
| Python parameter | Command line argument | Description |
|---------------------------------------|----------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `input_files : List[str] or str` | `constava analyze --input <file> [<file> ...]` | Input file(s) that contain the dihedral angles. |
| `input_format : str` | `constava analyze --input-format <enum>` | Format of the input file: `{'auto', 'csv', 'xvg'}` |
| `output_file : str` | `constava analyze --output <file>` | The file to write the output to. |
| `output_format : str` | `constava analyze --output-format <enum>` | Format of output file: `{'auto', 'csv', 'json', 'tsv'}` |
| | | |
| `model_type : str` | `constava fit-model --model-type <enum>` | The probabilistic conformational state model used. Default is `kde`. The alternative `grid` runs significantly faster while slightly sacrificing accuracy: `{'kde', 'grid'}` |
| `model_load : str` | `constava analyze --load-model <file>` | Load a conformational state model from the given pickled file. |
| `model_data : str` | `constava fit-model --input <file>` | Fit conformational state models to data provided in the given file. |
| `model_dump : str` | `constava fit-model --output <file>` | Write the generated model to a pickled file, that can be loaded again using `model_load`. |
| | | |
| `window : List[int] or int`           | `constava analyze --window <int> [<int> ...]`            | Do inference using a moving reading-frame of `<int>` consecutive samples. Multiple values can be given as a list. |
| `window_series : List[int] or int`    | `constava analyze --window-series <int> [<int> ...]`     | Do inference using a moving reading-frame of `<int>` consecutive samples. Return the results for every window rather than the average. Multiple values can be given as a list. |
| `bootstrap : List[int] or int`        | `constava analyze --bootstrap <int> [<int> ...]`         | Do inference using `<int>` samples obtained through bootstrapping. Multiple values can be given as a list. |
| `bootstrap_series : List[int] or int` | `constava analyze --bootstrap-series <int> [<int> ...]`  | Do inference using `<int>` samples obtained through bootstrapping. Return the results for every bootstrap rather than the average. Multiple values can be given as a list. |
| `bootstrap_samples : int`             | `constava analyze --bootstrap-samples <int>`             | When bootstrapping, sample `<int>` times from the input data. |
| | | |
| `input_degrees : bool` | `constava analyze --degrees` | Set `True` if input files are in degrees. |
| `model_data_degrees : bool`           | `constava fit-model --degrees`                           | Set `True` if the data given under `model_data` is in degrees. |
| `precision : int` | `constava analyze --precision <int> ` | Sets the number of decimals in the output files. By default, 4 decimals. |
| `kde_bandwidth : float` | `constava fit-model --kde-bandwidth <float>` | This controls the bandwidth of the Gaussian kernel density estimator. |
| `grid_points : int`                   | `constava fit-model --grid-points <int>`                 | When `model_type` equals 'grid', this controls how many grid points are used to describe the probability density function. |
| `seed : int` | `constava analyze --seed <int>` | Set the random seed especially for bootstrapping. |
| `verbose : int` | `constava <...> -v [-v] ` | Set verbosity level of screen output. |
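
The correspondence in the table can also be expressed programmatically. The sketch below turns a set of Python-style parameters into the equivalent `constava analyze` argument list. The flag names are taken from the table above, while the helper `analyze_command` and the dictionary `ANALYZE_FLAGS` are hypothetical and not part of the package.

```python
import shlex

# Python parameter -> `constava analyze` flag, as listed in the table above
ANALYZE_FLAGS = {
    "input_files": "--input",
    "input_format": "--input-format",
    "output_file": "--output",
    "output_format": "--output-format",
    "model_load": "--load-model",
    "window": "--window",
    "window_series": "--window-series",
    "bootstrap": "--bootstrap",
    "bootstrap_series": "--bootstrap-series",
    "bootstrap_samples": "--bootstrap-samples",
    "input_degrees": "--degrees",
    "precision": "--precision",
    "seed": "--seed",
}

def analyze_command(**params):
    """Build an argument list for `constava analyze` from Python parameters."""
    argv = ["constava", "analyze"]
    for name, value in params.items():
        flag = ANALYZE_FLAGS[name]
        if isinstance(value, bool):
            if value:                      # boolean parameters become bare flags
                argv.append(flag)
        elif isinstance(value, (list, tuple)):
            argv.append(flag)              # list parameters take several values
            argv.extend(str(v) for v in value)
        else:
            argv.extend([flag, str(value)])
    return argv

cmd = analyze_command(
    input_files=["2mkx_dihedrals.csv"],
    output_file="2mkx_constava.csv",
    window=[3, 5],
    input_degrees=True,
)
print(shlex.join(cmd))
# constava analyze --input 2mkx_dihedrals.csv --output 2mkx_constava.csv --window 3 5 --degrees
```

The resulting list can be passed to `subprocess.run` when the command line interface is preferred over the `Constava` class.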
[<Go to top>](#constava)
## License
Distributed under the GNU General Public License v3 (GPLv3).
[<Go to top>](#constava)
## Authors
- Jose Gavalda-Garcia<sup>♠</sup>
[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0001-6431-3442) -
[jose.gavalda.garcia@vub.be](mailto:jose.gavalda.garcia@vub.be)
- David Bickel<sup>♠</sup>
[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0003-0332-8338) -
[david.bickel@vub.be](mailto:david.bickel@vub.be)
- Joel Roca-Martinez
[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0002-4313-3845) -
[joel.roca.martinez@vub.be](mailto:joel.roca.martinez@vub.be)
- Daniele Raimondi -
[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0003-1157-1899) -
[daniele.raimondi@kuleuven.be](mailto:daniele.raimondi@kuleuven.be)
- Gabriele Orlando -
[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0002-5935-5258) -
[gabriele.orlando@kuleuven.be](mailto:gabriele.orlando@kuleuven.be)
- Wim Vranken -
[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0001-7470-4324) -
[Personal page](https://researchportal.vub.be/en/persons/wim-vranken) -
[wim.vranken@vub.be](mailto:wim.vranken@vub.be)
<sup>♠</sup> Authors contributed equally to this work.
[<Go to top>](#constava)
## Acknowledgments
We thank Adrian Diaz [![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0003-0165-1318) for his invaluable help with the distribution of this software.
[<Go to top>](#constava)
## Contact
Wim Vranken - [wim.vranken@vub.be](mailto:wim.vranken@vub.be)
Bio2Byte website: [https://bio2byte.be/](https://bio2byte.be/)
[<Go to top>](#constava)
Raw data
{
"_id": null,
"home_page": "https://bitbucket.org/bio2byte/constava/",
"name": "constava",
"maintainer": "Jose Gavalda-Garcia, David Bickel, Adrian Diaz, Wim Vranken",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "jose.gavalda.garcia@vub.be, david.bickel@vub.be, adrian.diaz@vub.be, wim.vranken@vub.be",
"keywords": null,
"author": "Wim Vranken",
"author_email": "wim.vranken@vub.be",
"download_url": "https://files.pythonhosted.org/packages/e8/e3/794f7e6eb6171023571c8238613d416fad71f629dfa6fd753b6c6077258d/constava-1.1.0.tar.gz",
"platform": null,
"description": "<!-- PROJECT LOGO -->\n<br /> \n<div align=\"center\">\n <a href=\"bio2byte.be/b2btools\" target=\"_blank\" ref=\"noreferrer noopener\">\n <img src=\"https://pbs.twimg.com/profile_images/1247824923546079232/B9b_Yg7n_400x400.jpg\" width=\"80px\"/>\n </a>\n\n # Constava\n</div>\n\n## Table of content\n\n* [Table of content](#table-of-content)\n* [Description](#description)\n* [Installation](#installation)\n * [Prerequisites](#prerequisites)\n * [Installation through PyPI](#installation-through-pypi)\n * [Installation from the source](#installation-from-the-source)\n* [Usage](#usage)\n * [Execution from the command line](#execution-from-the-command-line)\n * [Extracting backbone dihedrals from a trajectory](#extracting-backbone-dihedrals-from-a-trajectory)\n * [Analyzing a conformational ensemble](#analyzing-a-conformational-ensemble)\n * [Generating custom conformational state models](#generating-custom-conformational-state-models)\n * [Execution as a python library](#execution-as-a-python-library)\n * [Extracting backbone dihedrals as a DataFrame](#extracting-backbone-dihedrals-as-a-dataframe)\n * [Setting parameters and analyzing a conformational ensemble](#setting-parameters-and-analyzing-a-conformational-ensemble)\n * [Generating and loading conformational state models](#generating-and-loading-conformational-state-models)\n * [Constava-class parameters vs. command line arguments](#constava-class-parameters-vs-command-line-arguments)\n* [License](#license)\n* [Authors](#authors)\n* [Acknowledgements](#acknowledgments)\n* [Contact](#contact)\n\n[<Go to top>](#constava)\n\n# Description\n\nConstava analyzes conformational ensembles calculating conformational state \npropensities and conformational state variability. 
The conformational state \npropensities indicate the likelihood of a residue residing in a given \nconformational state, while the conformational state variability is a measure \nof the residues ability to transiton between conformational states.\n\nEach conformational state is a statistical model of based on the backbone \ndihedrals (phi, psi). The default models were derived from an analysis of NMR\nensembles and chemical shifts. To analyze a conformational ensemble, the phi- \nand psi-angles for each conformational state in the ensemble need to be \nprovided. \n\nAs input data Constava needs the backbone dihedral angles extracted from the \nconformational ensemble. These dihedrals can be obtained using GROMACS' \n`gmx chi` module (set `--input-format=xvg`) or using the `constava dihedrals` \nsubmodule, which supports a wide range of MD and structure formats.\n\n[<Go to top>](#constava)\n\n## Installation\n\n### Prerequisites\n- Python 3.8 or higher\n- pip\n\n[<Go to top>](#constava)\n\n### Installation through PyPI\n\nWe recommend this installation for most users.\n\n1. Create a virtual environment (optional but recommended):\n ```\n python3 -m venv constava\n source constava/bin/activate\n ```\n\n2. Install the python module:\n ```\n pip install constava\n ```\n\n3. Run tests to ensure the successful installation (optional but recommended):\n ```\n constava test\n ```\n\nIf the package requires to be uninstalled, run `pip uninstall constava`. \n\n[<Go to top>](#constava)\n\n### Installation from the source\n\nAlternatively, you may download and install the software from the source code\ndirectly.\n\n1. Clone the repository:\n ```sh\n git clone https://bitbucket.org/bio2byte/constava/\n cd constava\n ```\n\n2. Create a virtual environment (optional but recommended):\n ```sh\n python3 -m venv constava\n source constava/bin/activate\n ```\n\n3. 
Build and install the package:\n\n ```sh\n # In the packages root directory do:\n # Build package from source\n make build\n # Install locally\n make install\n # Test installation\n make test\n ```\n\nIf the package requires to be uninstalled, run `make uninstall` in the terminal \nfrom the package's root directory. \n\n[<Go to top>](#constava)\n\n## Usage\n\nThe software provides two modes of interaction. Shell user may use the software\nfrom the command line, while users skilled in Python can import it as a module.\nWe provide a couple of usage examples in a [Colab notebook](https://colab.research.google.com/github/Bio2Byte/public_notebooks/blob/main/constava_examples.ipynb).\n\n[<Go to top>](#constava)\n\n### Execution from the command line\n\nThe software is subdivided in **three submodules**:\n\nThe `constava dihedrals` submodule provides a simple way to extract backbone \ndihedral angles from MD simulations or PDB ensembles. For more information\nrun: `constava dihedrals -h`. Alternatively, the backbone dihedrals may be\nextracted with GROMACS' `gmx chi` module.\n\nThe `constava analyze` submodule analyzes the provided backbone dihedral angles\nand infers the propensities for each residue to reside in a given \nconformational state. For more information run: `constava analyze -h`.\n\nThe `constava fit-model` can be used to train a custom probabilistic model of\nconfromational states. 
The default models were derived from an analysis of NMR\nensembles and chemical shifts; they cover six conformational states:\n\n* Core Helix - Exclusively alpha-helical, low backbone dynamics\n* Surrounding Helix - Mostly alpha-helical, high backbone dynamics\n* Core Sheet - Exclusively beta-sheet, low backbone dynamics\n* Surrounding Sheet - Mostly extended conformation, high backbone dynamics\n* Turn - Mostly turn, high backbone dynamics\n* Other - Mostly coil, high backbone dynamics\n\n[<Go to top>](#constava)\n\n#### Extracting backbone dihedrals from a trajectory\n\nTo extract dihedral angles from a trajectory the `constava dihedrals` submodule \nis used.\n\n```\nusage: constava dihedrals [-h] [-s <file.pdb>] [-f <file.xtc> [<file.xtc> ...]] [-o OUTPUT] [--selection SELECTION] [--precision PRECISION] [--degrees] [-O]\n\nThe `constava dihedrals` submodule is used to extract the backbone dihedrals\nneeded for the analysis from confromational ensembles. By default the results\nare written out in radians as this is the preferred format for\n`constava analyze`.\n\nNote: For the first and last residue in a protein only one backbone dihedral\ncan be extracted. Thus, those residues are omitted by default.\n\noptional arguments:\n -h, --help Show this help message and exit\n\nInput & output options:\n -s <file.pdb>, --structure <file.pdb>\n Structure file with atomic information: [pdb, gro, tpr]\n -f <file.xtc> [<file.xtc> ...], --trajectory <file.xtc> [<file.xtc> ...]\n Trajectory file with coordinates: [pdb, gro, trr, xtc, crd, nc]\n -o OUTPUT, --output OUTPUT\n CSV file to write dihedral information to. (default: dihedrals.csv)\n\nInput & output options:\n --selection SELECTION\n Selection for the dihedral calculation. (default: 'protein')\n --precision PRECISION\n Defines the number of decimals written for the dihedrals. 
(default: 5)\n --degrees If set results are written in degrees instead of radians.\n -O, --overwrite If set any previously generated output will be overwritten.\n```\n\nAn example:\n\n```sh\n# Obtain backbone dihedrals (overwriting any existing files)\nconstava dihedrals -O -s \"2mkx.gro\" -f \"2mkx.xtc\" -o \"2mkx_dihedrals.csv\"\n```\n\n[<Go to top>](#constava)\n\n#### Analyzing a conformational ensemble\n\nTo analyze the backbone dihedral angles extracted from a confromational ensemble,\nthe `constava analyze` submodule is used.\n\n```\nusage: constava analyze [-h] [-i <file.csv> [<file.csv> ...]] [--input-format {auto,xvg,csv}] [-o <file.csv>] [--output-format {auto,csv,json,tsv}] [-m <file.pkl>] [--window <int> [<int> ...]]\n [--window-series <int> [<int> ...]] [--bootstrap <int> [<int> ...]] [--bootstrap-series <int> [<int> ...]] [--bootstrap-samples <int>] [--degrees] [--precision <int>] [--seed <int>] [-v]\n\nThe `constava analyze` submodule analyzes the provided backbone dihedral angles\nand infers the propensities for each residue to reside in a given \nconformational state. \n\nEach conformational state is a statistical model of based on the backbone \ndihedrals (phi, psi). The default models were derived from an analysis of NMR\nensembles and chemical shifts. To analyze a conformational ensemble, the phi- \nand psi-angles for each conformational state in the ensemble need to be \nprovided. \n\nAs input data the backbone dihedral angles extracted from the conformational \nensemble need to be provided. 
Those can be generated using the \n`constava dihedrals` submodule (`--input-format csv`) or GROMACS'\n`gmx chi` module (`--input-format xvg`).\n\noptional arguments:\n -h, --help Show this help message and exit\n\nInput & output options:\n -i <file.csv> [<file.csv> ...], --input <file.csv> [<file.csv> ...]\n Input file(s) that contain the dihedral angles.\n --input-format {auto,xvg,csv}\n Format of the input file: {'auto', 'csv', 'xvg'}\n -o <file.csv>, --output <file.csv>\n The file to write the results to.\n --output-format {auto,csv,json,tsv}\n Format of output file: {'csv', 'json', 'tsv'}. (default: 'auto')\n\nConformational state model options:\n -m <file.pkl>, --load-model <file.pkl>\n Load a conformational state model from the given pickled \n file. If not provided, the default model will be used.\n\nSubsampling options:\n --window <int> [<int> ...]\n Do inference using a moving reading-frame. Each reading \n frame consists of <int> consecutive samples. Multiple \n values can be provided.\n --window-series <int> [<int> ...]\n Do inference using a moving reading-frame. Each reading \n frame consists of <int> consecutive samples. Return the \n results for every window rather than the average. This can\n result in very large output files. Multiple values can be \n provided.\n --bootstrap <int> [<int> ...]\n Do inference using <Int> samples obtained through \n bootstrapping. Multiple values can be provided.\n --bootstrap-series <int> [<int> ...]\n Do inference using <Int> samples obtained through \n bootstrapping. Return the results for every subsample\n rather than the average. This can result in very \n large output files. 
Multiple values can be provided.\n --bootstrap-samples <int>\n When bootstrapping, sample <Int> times from the input data.\n (default: 500)\n\nMiscellaneous options:\n --degrees Set this flag, if dihedrals in the input files are in \n degrees.\n --precision <int> Sets the number of decimals in the output files.\n --seed <int> Set random seed for bootstrap sampling\n -v, --verbose Set verbosity level of screen output. Flag can be given \n multiple times (up to 2) to gradually increase output to \n debugging mode.\n```\n\nAn example:\n\n```sh\n# Run constava with debug-level output\nconstava analyze \\\n -i \"2mkx_dihedrals.csv\" \\\n -o \"2mkx_constava.json\" --output-format json \\\n --window 3 5 25 \\\n -vv\n```\n\n[<Go to top>](#constava)\n\n#### Generating custom conformational state models\n\nTo train a custom probabilistic model of conformational states, use the \n`constava fit-model` submodule. \n\n```\nusage: constava fit-model [-h] [-i <file.json>] -o <file.pkl> [--model-type {kde,grid}] [--kde-bandwidth <float>] [--grid-points <int>] [--degrees] [-v]\n\nThe `constava fit-model` submodule is used to generate the probabilistic\nconformational state models used in the analysis. By default, when running\n`constava analyze` these models are generated on-the-fly. In selected cases \ngenerating a model beforehand and loading it can be useful, though.\n\nWe provide two model types. kde-Models are the default. They are fast to fit\nbut may be slow in the inference in large conformational ensembles (e.g., \nlong-timescale MD simulations). The idea of grid-Models is, to replace\nthe continuous probability density function of the kde-Model by a fixed set\nof grid-points. The PDF for any sample is then estimated by linear \ninterpolation between the nearest grid points. 
This is slightly less\naccurate than the kde-Model but speeds up inference significantly.\n\noptional arguments:\n -h, --help Show this help message and exit\n\nInput and output options:\n -i <file.json>, --input <file.json>\n The data to which the new conformational state models will\n be fitted. It should be provided as a JSON file. The \n top-most key should indicate the names of the \n conformational states. On the level below, lists of phi-/\n psi pairs for each stat should be provided. If not provided \n the default data from the publication will be used.\n -o <file.pkl>, --output <file.pkl>\n Write the generated model to a pickled file, that can be\n loaded gain using `constava analyze --load-model`\n\nConformational state model options:\n --model-type {kde,grid}\n The probabilistic conformational state model used. The \n default is `kde`. The alternative `grid` runs significantly\n faster while slightly sacrificing accuracy: {'kde', 'grid'}\n (default: 'kde')\n --kde-bandwidth <float>\n This flag controls the bandwidth of the Gaussian kernel \n density estimator. (default: 0.13)\n --grid-points <int> This flag controls how many grid points are used to \n describe the probability density function. Only applies if\n `--model-type` is set to `grid`. (default: 10000)\n\nMiscellaneous options:\n --degrees Set this flag, if dihedrals in `model-data` are in degrees \n instead of radians.\n -v, --verbose Set verbosity level of screen output. Flag can be given \n multiple times (up to 2) to gradually increase output to \n debugging mode.\n```\n\nAn example:\n\n```sh\n# Generates a faster 'grid-interpolation model' using the default dataset\nconstava fit-model -v \\\n -o default_grid.pkl \\\n --model-type grid \\\n --kde-bandwidth 0.13 \\\n --grid-points 6400\n```\n\n[<Go to top>](#constava)\n\n### Execution as a python library\n\nThe module provides the `Constava` class as a general interface to the software's \nfeatures. 
The only notable exception is the extraction of dihedrals,\nwhich is done through a separate function.\n\n[<Go to top>](#constava)\n\n#### Extracting backbone dihedrals as a DataFrame\n\n```python\nimport pandas as pd\nfrom constava.utils.dihedrals import calculate_dihedrals\n\n# Calculate dihedrals as a DataFrame\ndihedrals = calculate_dihedrals(structure=\"./2mkx.pdb\", trajectory=\"2mkx.xtc\")\n\n# Write dihedrals out as a csv\ndihedrals.to_csv(\"2mkx_dihedrals.csv\", index=False, float_format=\"%.4f\")\n```\n\n[<Go to top>](#constava)\n\n#### Setting parameters and analyzing a conformational ensemble\n\nThe following example analyzes the backbone dihedrals of a single protein and writes the results to a CSV file:\n\n```python\n# Import the Constava Python interface\nimport glob\nfrom constava import Constava\n\n# Define input and output files\nPDBID = \"2mkx\"\ninput_files = glob.glob(f\"./{PDBID}/ramaPhiPsi*.xvg\")\noutput_file = f\"./{PDBID}_constava.csv\"\n\n# Initialize Constava Python interface with parameters\nc = Constava(\n    input_files = input_files,\n    output_file = output_file,\n    bootstrap = [3,5,10,25],\n    input_degrees = True,\n    verbose = 2)\n\n# Alter parameters after initialization\nc.set_param(\"window\", [1,3,5])\n\n# Run the calculation and write results\nc.run()\n```\nThis protein, with 48 residues and 100 frames per residue, runs in about 1 minute.\n\nThe original MD ensembles from the manuscript can be found at \n[https://doi.org/10.5281/zenodo.8160755](https://doi.org/10.5281/zenodo.8160755).\n\n[<Go to top>](#constava)\n\n#### Generating and loading conformational state models\n\nConformational state models are usually fitted at runtime. This is the\nsafest option to retain compatibility. For `kde` models, refitting typically\ntakes less than a second, which is negligible. However, `grid` interpolation\nmodels take longer to generate. 
Thus, it makes sense to store them when \nrunning multiple predictions on the same model.\n\n**Note:** Conformational state model-pickles are intended for quickly rerunning \nsimulations. They are **not for storing or sharing your conformational state models**. \nWhen you need to store or share a custom conformational state model, provide \nthe training data and the model-fitting parameters.\n\n```python\nimport glob\nfrom constava import Constava\n\n# Fit the grid-interpolation model\nc = Constava(verbose = 1)\ncsmodel = c.fit_csmodel(model_type = \"grid\",\n                        kde_bandwidth = .13,\n                        grid_points = 10_201)\n\n# Write the fitted model out as a pickle\ncsmodel.dump_pickle(\"grid_model.pkl\")\n\n# Use the new model to analyze a conformational ensemble\nPDBID = \"2mkx\"\ninput_files = glob.glob(f\"./{PDBID}_dihedrals.csv\")\noutput_file = f\"./{PDBID}_constava.csv\"\nc = Constava(\n    input_files = input_files,\n    output_file = output_file,\n    model_load = \"grid_model.pkl\",\n    input_degrees = True,\n    window = [1, 5, 10, 25],\n    verbose = 1)\nc.run()\n```\n\n[<Go to top>](#constava)\n\n#### Constava-class parameters vs. command line arguments\n\nThe following table lists all available parameters of the Python interface (`Constava` \nclass) and their corresponding command line arguments. The defaults for \nparameters in Python and on the command line are the same.\n\n| Python parameter | Command line argument | Description |\n|---------------------------------------|----------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `input_files : List[str] or str` | `constava analyze --input <file> [<file> ...]` | Input file(s) that contain the dihedral angles. 
|\n| `input_format : str` | `constava analyze --input-format <enum>` | Format of the input file: `{'auto', 'csv', 'xvg'}` |\n| `output_file : str` | `constava analyze --output <file>` | The file to write the output to. |\n| `output_format : str` | `constava analyze --output-format <enum>` | Format of output file: `{'auto', 'csv', 'json', 'tsv'}` |\n| | | |\n| `model_type : str` | `constava fit-model --model-type <enum>` | The probabilistic conformational state model used. Default is `kde`. The alternative `grid` runs significantly faster while slightly sacrificing accuracy: `{'kde', 'grid'}` |\n| `model_load : str` | `constava analyze --load-model <file>` | Load a conformational state model from the given pickled file. |\n| `model_data : str` | `constava fit-model --input <file>` | Fit conformational state models to data provided in the given file. |\n| `model_dump : str` | `constava fit-model --output <file>` | Write the generated model to a pickled file, that can be loaded again using `model_load`. |\n| | | |\n| `window : List[int] or int` | `constava analyze --window <Int> [<Int> ...]` | Do inference using a moving reading-frame of <int> consecutive samples. Multiple values can be given as a list. |\n| `window_series : List[int] or int` | `constava analyze --window-series <Int> [<Int> ...]` | Do inference using a moving reading-frame of <int> consecutive samples. Return the results for every window rather than the average. Multiple values can be given as a list. |\n| `bootstrap : List[int] or int` | `constava analyze --bootstrap <Int> [<Int> ...]` | Do inference using <Int> samples obtained through bootstrapping. Multiple values can be given as a list. |\n| `bootstrap_series : List[int] or int` | `constava analyze --bootstrap-series <Int> [<Int> ...]` | Do inference using <Int> samples obtained through bootstrapping. Return the results for every bootstrap rather than the average. Multiple values can be given as a list. 
|\n| `bootstrap_samples : int` | `constava analyze --bootstrap-samples <Int> ` | When bootstrapping, sample `<Int>` times from the input data. |\n| | | |\n| `input_degrees : bool` | `constava analyze --degrees` | Set `True` if input files are in degrees. |\n| `model_data_degrees : bool` | `constava fit-model --degrees` | Set `True` if the data given under `model_data` is in degrees. |\n| `precision : int` | `constava analyze --precision <int> ` | Sets the number of decimals in the output files. By default, 4 decimals. |\n| `kde_bandwidth : float` | `constava fit-model --kde-bandwidth <float>` | This controls the bandwidth of the Gaussian kernel density estimator. |\n| `grid_points : int` | `constava fit-model --grid-points <int>` | When `model_type` equals 'grid', this controls how many grid points are used to describe the probability density function. |\n| `seed : int` | `constava analyze --seed <int>` | Set the random seed for bootstrap sampling. |\n| `verbose : int` | `constava <...> -v [-v] ` | Set verbosity level of screen output. 
|\n\n[<Go to top>](#constava)\n\n## License\n\nDistributed under the GNU General Public License v3 (GPLv3) License.\n\n[<Go to top>](#constava)\n\n## Authors\n\n- Jose Gavalda-Garcia<sup>♠</sup> \n[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0001-6431-3442) - \n[jose.gavalda.garcia@vub.be](mailto:jose.gavalda.garcia@vub.be)\n\n- David Bickel<sup>♠</sup> \n[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0003-0332-8338) - \n[david.bickel@vub.be](mailto:david.bickel@vub.be)\n\n- Joel Roca-Martinez \n[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](\nhttps://orcid.org/0000-0002-4313-3845) - \n[joel.roca.martinez@vub.be](mailto:joel.roca.martinez@vub.be)\n\n- Daniele Raimondi -\n[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0003-1157-1899) - \n[daniele.raimondi@kuleuven.be](mailto:daniele.raimondi@kuleuven.be)\n\n- Gabriele Orlando -\n[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0002-5935-5258) - \n[gabriele.orlando@kuleuven.be](mailto:gabriele.orlando@kuleuven.be)\n\n- Wim Vranken - \n[![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0001-7470-4324) - \n[Personal page](https://researchportal.vub.be/en/persons/wim-vranken) - \n[wim.vranken@vub.be](mailto:wim.vranken@vub.be)\n\n<sup>♠</sup> Authors contributed equally to this work.\n\n[<Go to top>](#constava)\n\n## Acknowledgments\n\nWe thank Adrian Diaz [![ORCID](https://orcid.org/sites/default/files/images/orcid_16x16.png)](https://orcid.org/0000-0003-0165-1318) for the invaluable help in the distribution of this software. \n\n[<Go to top>](#constava)\n\n## Contact\n\nWim Vranken - [wim.vranken@vub.be](mailto:wim.vranken@vub.be)\n\nBio2Byte website: [https://bio2byte.be/](https://bio2byte.be/)\n\n[<Go to top>](#constava)\n",
"bugtrack_url": null,
"license": "OSI Approved :: GNU General Public License v3 (GPLv3)",
"summary": "This software is used to calculate conformational states probability & conformational state variability from a protein structure ensemble.",
"version": "1.1.0",
"project_urls": {
"Homepage": "https://bitbucket.org/bio2byte/constava/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b744a45a08b1d38b1d526983219de21c06fff69c6c06f11de983f607785512c1",
"md5": "d95c78bba3a52a3a1d2a950c3e2044da",
"sha256": "4c8fd0fb68ef59dea83756ee81df86a1986cb13a4f3a801f7f8b43edd46e5104"
},
"downloads": -1,
"filename": "constava-1.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d95c78bba3a52a3a1d2a950c3e2044da",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 6064686,
"upload_time": "2024-07-09T12:15:09",
"upload_time_iso_8601": "2024-07-09T12:15:09.143582Z",
"url": "https://files.pythonhosted.org/packages/b7/44/a45a08b1d38b1d526983219de21c06fff69c6c06f11de983f607785512c1/constava-1.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e8e3794f7e6eb6171023571c8238613d416fad71f629dfa6fd753b6c6077258d",
"md5": "65118b78f6ed274d0fe3efab68e4e5ad",
"sha256": "6ac0a19544aa7bb800cb2a5b7b85499d5473fe132ded49cefddca032065cb2a9"
},
"downloads": -1,
"filename": "constava-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "65118b78f6ed274d0fe3efab68e4e5ad",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 6007365,
"upload_time": "2024-07-09T12:15:15",
"upload_time_iso_8601": "2024-07-09T12:15:15.207834Z",
"url": "https://files.pythonhosted.org/packages/e8/e3/794f7e6eb6171023571c8238613d416fad71f629dfa6fd753b6c6077258d/constava-1.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-09 12:15:15",
"github": false,
"gitlab": false,
"bitbucket": true,
"codeberg": false,
"bitbucket_user": "bio2byte",
"bitbucket_project": "constava",
"lcname": "constava"
}