<h1 align="center">
<a href="https://bio2byte.be/b2btools" target="_blank" rel="noreferrer noopener">
<img src="https://pbs.twimg.com/profile_images/1247824923546079232/B9b_Yg7n_400x400.jpg" width="224px"/>
</a>
<br/>
Bio2Byte Tools
</h1>
<p align="center">This package provides structural predictions for protein sequences, developed by the Bio2Byte group.</p>
<p align="center">
<a href="https://anaconda.org/Bio2Byte/b2bTools"> <img src="https://anaconda.org/Bio2Byte/b2bTools/badges/version.svg" /></a>
<a href="https://anaconda.org/Bio2Byte/b2bTools"> <img src="https://anaconda.org/Bio2Byte/b2bTools/badges/latest_release_relative_date.svg" /></a>
<a href="https://anaconda.org/Bio2Byte/b2bTools"> <img src="https://anaconda.org/Bio2Byte/b2bTools/badges/platforms.svg" /></a>
<a href="https://anaconda.org/Bio2Byte/b2bTools"> <img src="https://anaconda.org/Bio2Byte/b2bTools/badges/downloads.svg" /></a>
</p>
## 🧪 About this Python package
This package, called `b2bTools`, offers biophysical feature predictors for protein sequences, as well as file parsers and other utilities to help you with your protein data analysis.
If your input consists of one or more unaligned sequences, use the Single Sequence mode. If your input is a Multiple Sequence Alignment (MSA), use the MSA mode. For NMR data, the ShiftCrypt predictor is available.
About the available predictors:
| Predictor | Usage | Bio.Tools | Online predictor |
| --------- | --------- | ---- | ---- |
| DynaMine | Fast predictor of protein backbone dynamics using only sequence information as input. The version here also predicts side-chain dynamics and secondary structure propensities using the same principle. | [Link](https://bio.tools/Dynamine) | [Start predicting online ▶️](https://bio2byte.be/b2btools/dynamine/)|
| DisoMine | Predicts protein disorder with recurrent neural networks not directly from the amino acid sequence, but instead from more generic predictions of key biophysical properties, here protein dynamics, secondary structure and early folding. | [Link](https://bio.tools/Disomine) | [Start predicting online ▶️](https://bio2byte.be/b2btools/disomine/)|
| EfoldMine | Predicts, from the primary amino acid sequence of a protein, which amino acids are likely to be involved in early folding events. | [Link](https://bio.tools/b2btools) | [Start predicting online ▶️](https://bio2byte.be/b2btools/efoldmine) |
| AgMata | Single-sequence based predictor of protein regions that are likely to cause beta-aggregation. | [Link](https://bio.tools/agmata) | [Start predicting online ▶️](https://bio2byte.be/b2btools/agmata/) |
| PSPer | PSP (Phase Separating Protein) predicts whether a protein is likely to phase-separate with a particular mechanism involving RNA interactions (FUS-like proteins). It highlights the regions in your protein that are involved mechanistically, and provides an overall score. | [Link](https://bio.tools/PSPer) | [Start predicting online ▶️](https://bio2byte.be/b2btools/psp/) |
| ShiftCrypt | Auto-encoding NMR chemical shifts from their native vector space to a residue-level biophysical index. | [Link](https://bio.tools/ShiftCrypt) | [Start predicting online ▶️](https://bio2byte.be/b2btools/shiftcrypt/) |
**Related links:**
- These tools are described on the Bio2Byte website inside the [Tools section](https://bio2byte.be/tool/).
- [Galaxy](https://usegalaxy.org) is an open source, web-based platform for data intensive biomedical research. There is an available version of the Single Sequence predictors on Galaxy Europe. Start predicting online using Galaxy from [this link](https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/b2btools_single_sequence/b2btools_single_sequence/3.0.5+galaxy0).
## Usage and examples
To install the latest version of this package:
```console
$ pip install b2bTools
```
**⚠️ Important notes:** [Hmmer](http://hmmer.org) and [T-Coffee](https://tcoffee.crg.eu) are required to run several features. Please install them following their official guidelines.
### Single Sequence predictions
Use this example as an entry point when you have a FASTA file containing one or more sequences. There is a live demo available on [Google Colab](https://colab.research.google.com/github/Bio2Byte/public_notebooks/blob/main/Bio2ByteTools_v3_singleseq_pypi.ipynb).
#### Predicting biophysical features in JSON format
```python
import json
from b2bTools import SingleSeq, constants
input_fasta = "/path/to/example.fasta"
single_seq = SingleSeq(input_fasta)
# Predict features using 'dynamine', 'disomine', 'efoldmine', 'agmata', 'psper'
single_seq.predict(tools=constants.PREDICTOR_NAMES)
predictions = single_seq.get_all_predictions()
# Write one JSON file per sequence
for sequence_id, prediction_values in predictions.items():
    with open(f"{input_fasta}_{sequence_id}.json", "w") as fp:
        json.dump(prediction_values, fp, indent=4)
```
#### Predicting biophysical features in tabular formats (CSV, TSV)
```python
import json
from b2bTools import SingleSeq, constants
input_fasta = "/path/to/example.fasta"
single_seq = SingleSeq(input_fasta)
# Predict features using 'dynamine', 'disomine', 'efoldmine', 'agmata', 'psper'
single_seq.predict(tools=constants.PREDICTOR_NAMES)
single_seq.get_all_predictions_tabular(f"{input_fasta}_residues.csv", sep=",")
single_seq.get_all_predictions_tabular(f"{input_fasta}_residues.tsv", sep="\t")
```
The output contains these columns:
| Predictor | Value | Data type |
| --------- | ----- | --------- |
| None | `sequence_id` | `String` |
| None | `residue` | `Char` |
| None | `residue_index` | `Integer` |
| PSPer | `RRM` | `Float` |
| AgMata | `agmata` | `Float` |
| PSPer | `arg` | `Float` |
| DynaMine | `backbone` | `Float` |
| DynaMine | `coil` | `Float` |
| PSPer | `complexity` | `Float` |
| DisoMine | `disoMine` | `Float` |
| PSPer | `disorder` | `Float` |
| EFoldMine | `earlyFolding` | `Float` |
| DynaMine | `helix` | `Float` |
| DynaMine | `ppII` | `Float` |
| DynaMine | `sheet` | `Float` |
| DynaMine | `sidechain` | `Float` |
| PSPer | `tyr` | `Float` |
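Once the residue-level table is written, it can be post-processed with the standard library alone. The following is a minimal sketch, assuming the file has a header row with the column names listed above; the sample rows, their values, and the helper `mean_by_sequence` are made up for illustration and are not part of `b2bTools`.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical excerpt of a residue-level CSV; the numbers are made up.
sample_csv = """sequence_id,residue,residue_index,backbone,sidechain
SEQ001,M,0,0.71,0.52
SEQ001,A,1,0.80,0.55
SEQ002,M,0,0.65,0.48
"""

def mean_by_sequence(handle, column):
    """Return {sequence_id: mean of `column`} for a residue-level table."""
    values = defaultdict(list)
    for row in csv.DictReader(handle):
        values[row["sequence_id"]].append(float(row[column]))
    return {seq_id: mean(vals) for seq_id, vals in values.items()}

backbone_means = mean_by_sequence(io.StringIO(sample_csv), "backbone")
print(backbone_means)
```

To use it on a real output file, pass `open("/path/to/example.fasta_residues.csv", newline="")` instead of the in-memory sample.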
#### Plotting biophysical features
In case you need to plot the prediction values:
```python
import json
from b2bTools import SingleSeq, constants
from matplotlib import pyplot as plt
input_fasta = "/path/to/example.fasta"
single_seq = SingleSeq(input_fasta)
sequence_id = "SEQ001"
# Predict features using 'dynamine' only (tools is passed as a list)
single_seq.predict(tools=[constants.TOOL_DYNAMINE])
predictions = single_seq.get_all_predictions()
backbone_pred = predictions[sequence_id]['backbone']
sidechain_pred = predictions[sequence_id]['sidechain']
plt.plot(range(len(backbone_pred)), backbone_pred, label = "Backbone")
plt.plot(range(len(backbone_pred)), sidechain_pred, label = "Sidechain")
plt.legend()
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
```
#### Extracting metadata
For extracting metadata from the prediction values in tabular format when analyzing Single Sequence input:
```python
from b2bTools import SingleSeq, constants
input_fasta = "/path/to/example.fasta"
single_seq = SingleSeq(input_fasta)
# Predict features using 'dynamine', 'disomine', 'efoldmine', 'agmata', 'psper'
single_seq.predict(tools=constants.PREDICTOR_NAMES)
predictions = single_seq.get_all_predictions()
single_seq.get_metadata(f"{input_fasta}.csv")
```
The output will include these columns:
- sequence_id
- length
- Execution times:
  - dynamine_execution_time
  - disomine_execution_time
  - efoldmine_execution_time
  - agmata_execution_time
  - psper_execution_time
- DynaMine:
  - Backbone:
    - backbone_mean
    - backbone_median
    - backbone_variance
    - backbone_stdev
    - backbone_pvariance
    - backbone_pstdev
    - backbone_min
    - backbone_max
  - Side chain:
    - sidechain_mean
    - sidechain_median
    - sidechain_variance
    - sidechain_stdev
    - sidechain_pvariance
    - sidechain_pstdev
    - sidechain_min
    - sidechain_max
  - ppII:
    - ppII_mean
    - ppII_median
    - ppII_variance
    - ppII_stdev
    - ppII_pvariance
    - ppII_pstdev
    - ppII_min
    - ppII_max
  - Coil:
    - coil_mean
    - coil_median
    - coil_variance
    - coil_stdev
    - coil_pvariance
    - coil_pstdev
    - coil_min
    - coil_max
  - Sheet:
    - sheet_mean
    - sheet_median
    - sheet_variance
    - sheet_stdev
    - sheet_pvariance
    - sheet_pstdev
    - sheet_min
    - sheet_max
  - Helix:
    - helix_mean
    - helix_median
    - helix_variance
    - helix_stdev
    - helix_pvariance
    - helix_pstdev
    - helix_min
    - helix_max
- EFoldMine:
  - earlyFolding_mean
  - earlyFolding_median
  - earlyFolding_variance
  - earlyFolding_stdev
  - earlyFolding_pvariance
  - earlyFolding_pstdev
  - earlyFolding_min
  - earlyFolding_max
- AgMata:
  - agmata_mean
  - agmata_median
  - agmata_variance
  - agmata_stdev
  - agmata_pvariance
  - agmata_pstdev
  - agmata_min
  - agmata_max
- DisoMine:
  - disoMine_mean
  - disoMine_median
  - disoMine_variance
  - disoMine_stdev
  - disoMine_pvariance
  - disoMine_pstdev
  - disoMine_min
  - disoMine_max
- PSPer:
  - Protein Score:
    - protein_score
  - Complexity:
    - complexity_mean
    - complexity_median
    - complexity_variance
    - complexity_stdev
    - complexity_pvariance
    - complexity_pstdev
    - complexity_min
    - complexity_max
  - ARG:
    - arg_mean
    - arg_median
    - arg_variance
    - arg_stdev
    - arg_pvariance
    - arg_pstdev
    - arg_min
    - arg_max
  - TYR:
    - tyr_mean
    - tyr_median
    - tyr_variance
    - tyr_stdev
    - tyr_pvariance
    - tyr_pstdev
    - tyr_min
    - tyr_max
  - RRM:
    - RRM_mean
    - RRM_median
    - RRM_variance
    - RRM_stdev
    - RRM_pvariance
    - RRM_pstdev
    - RRM_min
    - RRM_max
  - Disorder:
    - disorder_mean
    - disorder_median
    - disorder_variance
    - disorder_stdev
    - disorder_pvariance
    - disorder_pstdev
    - disorder_min
    - disorder_max
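The column suffixes above (`mean`, `median`, `variance`, `stdev`, `pvariance`, `pstdev`) coincide with function names in Python's `statistics` module. The sketch below shows how such a summary could be reproduced for one prediction, assuming the suffixes carry the same meaning as in that module; this README does not state how `b2bTools` computes them, and the helper `summarize` and its input values are illustrative only.

```python
import statistics

def summarize(values, prefix):
    """Build summary columns such as `<prefix>_mean` from a list of floats."""
    return {
        f"{prefix}_mean": statistics.mean(values),
        f"{prefix}_median": statistics.median(values),
        f"{prefix}_variance": statistics.variance(values),    # sample variance (n - 1)
        f"{prefix}_stdev": statistics.stdev(values),          # sample standard deviation
        f"{prefix}_pvariance": statistics.pvariance(values),  # population variance (n)
        f"{prefix}_pstdev": statistics.pstdev(values),        # population standard deviation
        f"{prefix}_min": min(values),
        f"{prefix}_max": max(values),
    }

# Toy input, not real prediction values.
summary = summarize([1.0, 2.0, 3.0, 4.0], "backbone")
print(summary["backbone_variance"], summary["backbone_pvariance"])
```

The difference between the `variance`/`stdev` and `pvariance`/`pstdev` pairs is the divisor: the former treat the residues as a sample, the latter as the whole population.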
### Multiple Sequence Alignment predictions
Use the following example as an entry point when your input is an MSA file. There is a live demo available on Google Colab: [link](https://colab.research.google.com/github/Bio2Byte/public_notebooks/blob/main/Bio2ByteTools_v3_multipleseq_pypi.ipynb)
#### Predicting biophysical features in JSON format
```python
import json
from b2bTools import MultipleSeq, constants
input_msa = "/path/to/example.afa"
multiple_seq = MultipleSeq()
multiple_seq.from_aligned_file(input_msa, tools=constants.PREDICTOR_NAMES)
predictions = multiple_seq.get_all_predictions_msa()
# Write one JSON file per aligned sequence
for sequence_id, prediction_values in predictions.items():
    with open(f"{input_msa}_{sequence_id}.json", "w") as fp:
        json.dump(prediction_values, fp, indent=4)
# Write the per-column distributions of the whole alignment
distributions_dict = multiple_seq.get_all_predictions_msa_distrib()
distributions = distributions_dict['results']
with open(f"{input_msa}_distributions.json", "w") as fp:
    json.dump(distributions, fp, indent=4)
```
#### Predicting biophysical features in tabular formats (CSV, TSV)
```python
from b2bTools import MultipleSeq, constants
input_msa = "/path/to/example.afa"
multiple_seq = MultipleSeq()
multiple_seq.from_aligned_file(input_msa, tools=constants.PREDICTOR_NAMES)
multiple_seq.get_all_predictions_tabular(f"{input_msa}_residues.csv", sep=",")
multiple_seq.get_all_predictions_tabular(f"{input_msa}_residues.tsv", sep="\t")
```
The output contains these columns:
| Predictor | Value | Data type |
| ----------| ----- | --------- |
| None | sequence_id | String |
| None | residue | Char |
| None | residue_index | Integer |
| PSPer | RRM | Float |
| AgMata | agmata | Float |
| PSPer | arg | Float |
| DynaMine | backbone | Float |
| DynaMine | coil | Float |
| PSPer | complexity | Float |
| DisoMine | disoMine | Float |
| PSPer | disorder | Float |
| EFoldMine | earlyFolding | Float |
| DynaMine | helix | Float |
| DynaMine | ppII | Float |
| DynaMine | sheet | Float |
| DynaMine | sidechain | Float |
| PSPer | tyr | Float |
#### Extracting metadata
For extracting metadata from the prediction values in tabular format when analyzing MSA input:
```python
from b2bTools import MultipleSeq, constants
input_msa = "/path/to/example.afa"
multiple_seq = MultipleSeq()
multiple_seq.from_aligned_file(input_msa, tools=constants.PREDICTOR_NAMES)
multiple_seq.get_metadata(f"{input_msa}.csv")
```
## 💻 Installation
From the official documentation:
> [`pip`](https://pypi.org/) is the package installer for Python. You can use pip to install packages from the Python Package Index and other indexes.
To install the b2bTools package in your local environment:
```console
$ pip install b2bTools
```
**💡 Notes:** If you are using [Jupyter Notebook](https://jupyter.org) or [Google Colab](https://colab.research.google.com), install the package directly from `pip` inside a _code block_ cell:
```python
!pip install b2bTools
```
## 📦 Package content
### General Tools
Besides the prediction tools, this package includes general bioinformatics tools for manipulating files.
#### Single Sequence files (FASTA format)
The class `FastaIO` provides the following static methods:
- `read_fasta_from_file`
- `read_fasta_from_string`
- `write_fasta`
Usage:
```python
from b2bTools.general.parsers.fasta import FastaIO
```
#### Multiple Sequence Alignment files
The class `AlignmentsIO` provides the following static methods:
- `read_alignments`
- `read_alignments_fasta`
- `read_alignments_A3M`
- `read_alignments_blast`
- `read_alignments_balibase`
- `read_alignments_clustal`
- `read_alignments_psi`
- `read_alignments_phylip`
- `read_alignments_stockholm`
- `write_fasta_from_alignment`
- `write_fasta_from_seq_alignment_dict`
- `json_preds_to_csv_singleseq`
- `json_preds_to_csv_msa`
Usage:
```python
from b2bTools.general.parsers.alignments import AlignmentsIO
```
#### NEF files
The class `NefIO` provides the following static methods:
- `read_nef_file`
- `read_nef_file_sequence_shifts`
Usage:
```python
# Module path assumed by analogy with the other parsers in this section
from b2bTools.general.parsers.nef import NefIO
```
#### NMR-STAR files
The class `NMRStarIO` provides the following static methods:
- `read_nmr_star_project`
- `read_nmr_star_sequence_shifts`
Usage:
```python
from b2bTools.general.parsers.nmr_star import NMRStarIO
```
### Biophysical features predictors
Because some predictors are built on top of others, running one predictor may also return the outputs of the predictors it depends on:
| Predictor | Depends on |
| --------- | --------------------- |
| Dynamine | - |
| EfoldMine | Dynamine |
| Disomine | EfoldMine, Dynamine |
| AgMata | EfoldMine, Dynamine |
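The dependency table can be read as a small graph. The sketch below, which is illustrative and not part of `b2bTools`, expands a list of requested tools into the full set whose outputs you should expect; the dictionary `DEPENDS_ON` and the helper `expand_tools` are hypothetical names, and PSPer is left out because the table does not list it.

```python
# Direct dependencies copied from the table above (lower-case tool names).
DEPENDS_ON = {
    "dynamine": [],
    "efoldmine": ["dynamine"],
    "disomine": ["efoldmine", "dynamine"],
    "agmata": ["efoldmine", "dynamine"],
}

def expand_tools(requested):
    """Return the requested tools plus everything they depend on, sorted."""
    resolved = set()
    pending = list(requested)
    while pending:
        tool = pending.pop()
        if tool in resolved:
            continue
        resolved.add(tool)
        # Tools missing from the table are treated as having no dependencies.
        pending.extend(DEPENDS_ON.get(tool, []))
    return sorted(resolved)

print(expand_tools(["disomine"]))  # ['disomine', 'dynamine', 'efoldmine']
```

So a request for DisoMine alone would be expected to yield DynaMine and EFoldMine keys in the output as well.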
These are the available values for the `tools` list parameter:
| Predictor | constant value | literal value |
| --------- | ----------------------------| ------------- |
| Dynamine | `constants.TOOL_DYNAMINE` | `"dynamine"` |
| EfoldMine | `constants.TOOL_EFOLDMINE` | `"efoldmine"` |
| Disomine | `constants.TOOL_DISOMINE` | `"disomine"` |
| AgMata | `constants.TOOL_AGMATA` | `"agmata"` |
| PSPer | `constants.TOOL_PSP` | `"psper"` |
The next table lists the output keys produced by each predictor:
| Predictor | Output key | Output values (type) |
| --------- | ---------------- | -------------------- |
| Dynamine | `"backbone"` | `[Float]` |
| Dynamine | `"sidechain"` | `[Float]` |
| Dynamine | `"helix"` | `[Float]` |
| Dynamine | `"ppII"` | `[Float]` |
| Dynamine | `"coil"` | `[Float]` |
| Dynamine | `"sheet"` | `[Float]` |
| EfoldMine | `"earlyFolding"` | `[Float]` |
| Disomine | `"disoMine"` | `[Float]` |
| AgMata | `"agmata"` | `[Float]` |
| PSPer | `"viterbi"` | `[Float]` |
| PSPer | `"complexity"` | `[Float]` |
| PSPer | `"tyr"` | `[Float]` |
| PSPer | `"arg"` | `[Float]` |
| PSPer | `"RRM"` | `[Float]` |
| PSPer | `"disorder"` | `[Float]` |
For MSA input files, the distribution dictionary and/or JSON will include:
```python
multiple_seq.get_all_predictions_msa_distrib()['results']
```
| Predictor | Output key | Output values (type) |
| --------- | ---------------- | -------------------- |
| Dynamine | `"backbone"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| Dynamine | `"sidechain"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| Dynamine | `"helix"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| Dynamine | `"ppII"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| Dynamine | `"coil"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| Dynamine | `"sheet"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| EfoldMine | `"earlyFolding"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| Disomine | `"disoMine"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| AgMata | `"agmata"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| PSPer | `"viterbi"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| PSPer | `"complexity"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| PSPer | `"tyr"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| PSPer | `"arg"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| PSPer | `"RRM"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
| PSPer | `"disorder"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |
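To make the distribution keys concrete, the sketch below computes `median`, `firstQuartile` and `thirdQuartile` for the values found at a single alignment position. It is an illustration under assumptions: the quartile method used by `b2bTools` is not stated here, and `topOutlier` and `bottomOutlier` are omitted because this README does not define them. The helper `column_distribution` and the input values are made up.

```python
import statistics

def column_distribution(values):
    """Median and quartiles for one alignment column (gaps already removed)."""
    first_q, median, third_q = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": median, "firstQuartile": first_q, "thirdQuartile": third_q}

# Made-up backbone values of five aligned sequences at one MSA position.
column = [0.70, 0.75, 0.80, 0.85, 0.90]
dist = column_distribution(column)
print(dist)
```

Repeating this for every column of the alignment gives one list per key, which matches the shape described in the table.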
The method `get_all_predictions` will return a dictionary with the following structure:
```python
{
    "SEQUENCE_ID_000": {
        "seq": "the input sequence 0",
        "result001": [0.001, 0.002, ..., 0.00],
        "result002": [0.001, 0.002, ..., 0.00],
        "...": [...],
        "resultN": [0.001, 0.002, ..., 0.00]
    },
    "SEQUENCE_ID_001": {
        "seq": "the input sequence 1",
        "result001": [0.001, 0.002, ..., 0.00],
        "result002": [0.001, 0.002, ..., 0.00],
        "...": [...],
        "resultN": [0.001, 0.002, ..., 0.00]
    },
    "...": { ... },
    "SEQUENCE_ID_N": {
        "seq": "the input sequence N",
        "result001": [0.001, 0.002, ..., 0.00],
        "result002": [0.001, 0.002, ..., 0.00],
        "...": [...],
        "resultN": [0.001, 0.002, ..., 0.00]
    },
}
```
You are ready to use the sequence and predictions to work with them. Here is an example of plotting the data.
```python
backbone_pred = predictions['SEQ001']['backbone']
sidechain_pred = predictions['SEQ001']['sidechain']
plt.plot(range(len(backbone_pred)), backbone_pred, label = "Backbone")
plt.plot(range(len(sidechain_pred)), sidechain_pred, label = "Sidechain")
plt.legend()
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
```
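The same dictionary can also be processed without plotting. The sketch below selects the residues whose value exceeds a cut-off; the dictionary is a hand-written stand-in with made-up values, `residues_above` is a hypothetical helper, and the cut-off of 0.8 is chosen only for illustration.

```python
# Hand-written stand-in for the dictionary returned by get_all_predictions().
predictions = {
    "SEQ001": {
        "seq": "MAKST",
        "backbone": [0.62, 0.71, 0.83, 0.85, 0.79],
    },
}

def residues_above(predictions, sequence_id, key, threshold):
    """Return (index, residue, value) for residues whose value exceeds threshold."""
    entry = predictions[sequence_id]
    return [
        (index, residue, value)
        for index, (residue, value) in enumerate(zip(entry["seq"], entry[key]))
        if value > threshold
    ]

rigid = residues_above(predictions, "SEQ001", "backbone", 0.8)
print(rigid)  # [(2, 'K', 0.83), (3, 'S', 0.85)]
```

The indices here are zero-based positions in the `seq` string.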
#### Running as Python module (no Python code involved)
You can use this package directly from your console session, with no Python code involved. Further details are available on [the official Python documentation site](https://docs.python.org/3/tutorial/modules.html#executing-modules-as-scripts).
```console
usage: b2bTools [-h] [-v] -i INPUT_FILE -o OUTPUT_JSON_FILE
                [-t OUTPUT_TABULAR_FILE] [-m METADATA_FILE]
                [-dj DISTRIBUTION_JSON_FILE] [-dt DISTRIBUTION_TABULAR_FILE]
                [-s {comma,tab}] [--short_ids] [--mode {single_seq,msa}]
                [--dynamine] [--disomine] [--efoldmine] [--agmata] [--psper]

Bio2Byte Tool - Command Line Interface

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        File to process
  -o OUTPUT_JSON_FILE, --output_json_file OUTPUT_JSON_FILE
                        Path to JSON output file
  -t OUTPUT_TABULAR_FILE, --output_tabular_file OUTPUT_TABULAR_FILE
                        Path to tabular output file
  -m METADATA_FILE, --metadata_file METADATA_FILE
                        Path to tabular metadata file
  -dj DISTRIBUTION_JSON_FILE, --distribution_json_file DISTRIBUTION_JSON_FILE
                        Path to distribution output JSON file
  -dt DISTRIBUTION_TABULAR_FILE, --distribution_tabular_file DISTRIBUTION_TABULAR_FILE
                        Path to distribution output JSON file
  -s {comma,tab}, --sep {comma,tab}
                        Tabular separator
  --short_ids           Trim sequence ids (up to 20 chars per seq)
  --mode {single_seq,msa}
                        Execution mode: Single Sequence or MSA Analysis
  --dynamine            Run DynaMine predictor
  --disomine            Run DisoMine predictor
  --efoldmine           Run EFoldMine predictor
  --agmata              Run AgMata predictor
  --psper               Run PSPer predictor
##### To display the help section
```console
b2bTools --help
```
##### To visualize the version
```console
b2bTools --version
```
##### Example for Single Sequence
Please run this command to get all the predictions from your input FASTA file:
```console
b2bTools \
--input_file /path/to/input/example_toy.fasta \
--output_json_file /path/to/output/example_toy.json \
--output_tabular_file /path/to/output/example_toy.csv \
--metadata_file /path/to/output/example_toy.meta.csv
```
Expected output:
```console
2023-07-04 16:04:23,630 [b2bTools v3.0.6 INFO] Arguments parsed with success
2023-07-04 16:04:23,630 [b2bTools v3.0.6 INFO] Reading sequences from: /path/to/input/example_toy.fasta
2023-07-04 16:04:23,630 [b2bTools v3.0.6 INFO] Tools to execute: ['dynamine']
2023-07-04 16:04:23,630 [b2bTools v3.0.6 INFO] Predicting sequence(s)
...
2023-07-04 16:04:23,986 [b2bTools v3.0.6 INFO] Saving results in JSON format in: /path/to/output/example_toy.json
2023-07-04 16:04:24,006 [b2bTools v3.0.6 INFO] Saving results in tabular format in: /path/to/output/example_toy.csv
2023-07-04 16:04:24,040 [b2bTools v3.0.6 INFO] Saving metadata in tabular format in: /path/to/output/example_toy.meta.csv
2023-07-04 16:04:24,279 [b2bTools v3.0.6 INFO] Execution finished with success
```
Otherwise, if you need to extract only one sequence from the input file:
```console
b2bTools \
--sequence_id Q647G9 \
--input_file /path/to/input/example_toy.fasta \
--output_json_file /path/to/output/example_toy.json \
--output_tabular_file /path/to/output/example_toy.csv \
--metadata_file /path/to/output/example_toy.meta.csv
```
```console
2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Arguments parsed with success
2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Reading sequences from: /path/to/input/example_toy.fasta
2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Sequence to filter: Q647G9
2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Tools to execute: ['dynamine']
2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Predicting sequence(s)
...
2023-07-04 16:25:35,842 [b2bTools v3.0.6 INFO] Saving results for Q647G9 in JSON format in: /path/to/output/example_toy.json
2023-07-04 16:25:35,845 [b2bTools v3.0.6 INFO] Saving results for Q647G9 in tabular format in: /path/to/output/example_toy.csv
2023-07-04 16:25:35,860 [b2bTools v3.0.6 INFO] Saving metadata for Q647G9 in tabular format in: /path/to/output/example_toy.meta.csv
2023-07-04 16:25:35,893 [b2bTools v3.0.6 INFO] Execution finished with success
```
##### Example for MSA
Please run this command to get all the predictions from your input MSA file:
```console
b2bTools \
--mode msa \
--input_file /path/to/input/small_alignment.clustal \
--output_json_file /path/to/output/small_alignment.clustal.json \
--output_tabular_file /path/to/output/small_alignment.clustal.csv \
--metadata_file /path/to/output/small_alignment.clustal.meta.csv \
--distribution_json_file /path/to/output/small_alignment.clustal.distrib.json \
--distribution_tabular_file /path/to/output/small_alignment.clustal.distrib.csv
```
Expected output:
```console
2023-07-04 16:06:40,524 [b2bTools v3.0.6 INFO] Arguments parsed with success
2023-07-04 16:06:40,524 [b2bTools v3.0.6 INFO] Reading sequences from: /path/to/input/small_alignment.clustal
2023-07-04 16:06:40,524 [b2bTools v3.0.6 INFO] Tools to execute: ['dynamine']
2023-07-04 16:06:40,524 [b2bTools v3.0.6 INFO] Predicting sequence(s)
...
2023-07-04 16:06:40,749 [b2bTools v3.0.6 INFO] Saving results in JSON format in: /path/to/output/small_alignment.clustal.json
2023-07-04 16:06:40,751 [b2bTools v3.0.6 INFO] Saving distributions in JSON format in: /path/to/output/small_alignment.clustal.distrib.json
2023-07-04 16:06:40,760 [b2bTools v3.0.6 INFO] Saving distributions in tabular format in: /path/to/output/small_alignment.clustal.distrib.csv
2023-07-04 16:06:40,784 [b2bTools v3.0.6 INFO] Saving results in tabular format in: /path/to/output/small_alignment.clustal.csv
2023-07-04 16:06:40,788 [b2bTools v3.0.6 INFO] Saving metadata in tabular format in: /path/to/output/small_alignment.clustal.meta.csv
2023-07-04 16:06:40,807 [b2bTools v3.0.6 INFO] Execution finished with success
```
Otherwise, if you need to extract only one sequence from the input file:
```console
b2bTools \
--mode msa \
--sequence_id SEQ_1 \
--input_file /path/to/input/small_alignment.clustal \
--output_json_file /path/to/output/small_alignment.clustal.json \
--output_tabular_file /path/to/output/small_alignment.clustal.csv \
--metadata_file /path/to/output/small_alignment.clustal.meta.csv \
--distribution_json_file /path/to/output/small_alignment.clustal.distrib.json \
--distribution_tabular_file /path/to/output/small_alignment.clustal.distrib.csv
```
```console
2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Arguments parsed with success
2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Reading sequences from: /path/to/input/small_alignment.clustal
2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Sequence to filter: SEQ_1
2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Tools to execute: ['dynamine']
2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Predicting sequence(s)
...
2023-07-04 16:28:34,602 [b2bTools v3.0.6 INFO] Saving results for SEQ_1 in JSON format in: /path/to/output/small_alignment.clustal.json
2023-07-04 16:28:34,603 [b2bTools v3.0.6 INFO] Saving distributions in JSON format in: /path/to/output/small_alignment.clustal.distrib.json
2023-07-04 16:28:34,612 [b2bTools v3.0.6 INFO] Saving distributions in tabular format in: /path/to/output/small_alignment.clustal.distrib.csv
2023-07-04 16:28:34,632 [b2bTools v3.0.6 INFO] Saving results for SEQ_1 in tabular format in: /path/to/output/small_alignment.clustal.csv
2023-07-04 16:28:34,635 [b2bTools v3.0.6 INFO] Saving metadata for SEQ_1 in tabular format in: /path/to/output/small_alignment.clustal.meta.csv
2023-07-04 16:28:34,651 [b2bTools v3.0.6 INFO] Execution finished with success
```
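When many files need to be processed, the commands above can be assembled from Python instead of being typed by hand. This is a minimal sketch that only builds the argument list from flags shown in the usage text; `build_command` is a hypothetical helper, and actually running the command requires `b2bTools` to be installed and on your `PATH`.

```python
import shlex

def build_command(input_file, output_json, mode="single_seq", tools=(), tabular=None):
    """Assemble the argument list for the b2bTools command line."""
    command = ["b2bTools", "--mode", mode,
               "--input_file", input_file,
               "--output_json_file", output_json]
    if tabular is not None:
        command += ["--output_tabular_file", tabular]
    # Each predictor is enabled by its own flag, e.g. --dynamine.
    command += [f"--{tool}" for tool in tools]
    return command

command = build_command(
    "/path/to/input/example_toy.fasta",
    "/path/to/output/example_toy.json",
    tools=("dynamine", "efoldmine"),
)
print(shlex.join(command))
# To execute it, import subprocess and call:
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.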
#### From an aligned file
```python
import matplotlib.pyplot as plt
from b2bTools import MultipleSeq
multiple_seq = MultipleSeq()
multiple_seq.from_aligned_file("/path/to/example.fasta")
predictions = multiple_seq.get_all_predictions_msa("SEQ001")
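backbone_pred = predictions['backbone']
sidechain_pred = predictions['sidechain']
# Plot both predictions so that the legend has entries to show
plt.plot(range(len(backbone_pred)), backbone_pred, label="Backbone")
plt.plot(range(len(sidechain_pred)), sidechain_pred, label="Sidechain")
plt.legend()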
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
```
#### From two MSA files
```python
import matplotlib.pyplot as plt
from b2bTools import MultipleSeq
multiple_seq = MultipleSeq()
multiple_seq.from_two_msa("/path/to/example_a.fasta", "/path/to/example_b.fasta")
predictions = multiple_seq.get_all_predictions_msa("SEQ001")
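backbone_pred = predictions['backbone']
sidechain_pred = predictions['sidechain']
# Plot both predictions so that the legend has entries to show
plt.plot(range(len(backbone_pred)), backbone_pred, label="Backbone")
plt.plot(range(len(sidechain_pred)), sidechain_pred, label="Sidechain")
plt.legend()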
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
```
#### From a JSON with variations file
In this case, we support a JSON format to introduce variants in a sequence. For instance:
```json
{
"metadata": { "name": "target_fasta_file" },
"WT": "MAKSTILALLALVLVAHASAMRRERGRQGDSSSCERQVDRVNLKPCEQHIMQRIMGEQEQYDSYDIRSTRSSDQQQRCCDELNEMENTQRCMCEALQQIMENQCDRLQDRQMVQQFKRELMNLPQQCNFRAPQRCDLDVSGGRC",
"Variants": {
"Var1": ["A3S", "A11G"],
"Var2": ["A2G", "K3_S4insPH", "T5del"]
}
}
```
Here, `WT` is the wild-type sequence and the `Variants` key holds a dictionary of named variants. Each variant is described by an array of changes:
- `<Target residue><Position><New residue>`: for example, replacing the A at position 3 with an S is written `"A3S"`.

The `metadata` key contains the name of the input FASTA file, which should be stored in the same directory as the JSON file.
The code snippet is:
```python
import matplotlib.pyplot as plt
from b2bTools import MultipleSeq
multiple_seq = MultipleSeq()
multiple_seq.from_json("/path/to/example.json")
predictions = multiple_seq.get_all_predictions_msa("SEQ001")
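backbone_pred = predictions['backbone']
sidechain_pred = predictions['sidechain']
# Plot both predictions so that the legend has entries to show
plt.plot(range(len(backbone_pred)), backbone_pred, label="Backbone")
plt.plot(range(len(sidechain_pred)), sidechain_pred, label="Sidechain")
plt.legend()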
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
```
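To illustrate the substitution notation, the sketch below applies simple substitutions to a sequence. It is not how `b2bTools` implements variants, which this README does not show; it assumes 1-based positions (consistent with `A2G` and `A11G` in the example above) and deliberately rejects insertions and deletions such as `K3_S4insPH` or `T5del`. The helper `apply_substitutions` is hypothetical.

```python
import re

# <Target residue><Position><New residue>, e.g. "K3S"
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def apply_substitutions(wild_type, variants):
    """Apply substitutions such as "K3S" (1-based positions) to a sequence."""
    residues = list(wild_type)
    for variant in variants:
        match = SUBSTITUTION.match(variant)
        if match is None:
            raise ValueError(f"Not a simple substitution: {variant}")
        target, position, new = match.group(1), int(match.group(2)), match.group(3)
        # The target residue must be present at the stated position.
        if position < 1 or position > len(residues) or residues[position - 1] != target:
            raise ValueError(f"{variant} does not match the wild-type sequence")
        residues[position - 1] = new
    return "".join(residues)

print(apply_substitutions("MAKSTILALL", ["A2G", "K3S"]))  # MGSSTILALL
```

Checking the target residue before replacing it catches position mistakes early, for example a variant written against a different numbering.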
#### From a sequence performing a BLAST before running the predictions
To mutate a residue at one specific position, set `mut_option` to `"y"` and pass the parameters `mut_position` and `mut_residue`.
```python
import matplotlib.pyplot as plt
from b2bTools import MultipleSeq
multiple_seq = MultipleSeq()
multiple_seq.from_blast("path/to/example.fasta", mut_option="y", mut_position=1, mut_residue="A")
predictions = multiple_seq.get_all_predictions_msa("SEQ001")
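backbone_pred = predictions['backbone']
sidechain_pred = predictions['sidechain']
# Plot both predictions so that the legend has entries to show
plt.plot(range(len(backbone_pred)), backbone_pred, label="Backbone")
plt.plot(range(len(sidechain_pred)), sidechain_pred, label="Sidechain")
plt.legend()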
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
```
#### From a UniRef ID performing a BLAST before running the predictions
```python
import matplotlib.pyplot as plt
from b2bTools import MultipleSeq
multiple_seq = MultipleSeq()
multiple_seq.from_uniref("A2R2V4")
predictions = multiple_seq.get_all_predictions_msa("SEQ001")
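backbone_pred = predictions['backbone']
sidechain_pred = predictions['sidechain']
# Plot both predictions so that the legend has entries to show
plt.plot(range(len(backbone_pred)), backbone_pred, label="Backbone")
plt.plot(range(len(sidechain_pred)), sidechain_pred, label="Sidechain")
plt.legend()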
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
```
**⚠️ Note**: the query using the UniRef ID is limited to 25 results to keep the run time short.
### ShiftCrypt predictor (NMR data)
```python
import json
from b2bTools.nmr.shiftCrypt.Predictor import ShiftCrypt
from b2bTools.nmr.shiftCrypt.shiftcrypt_pkg import shiftcrypt_parser as parser
shiftcrypt_instance = ShiftCrypt()
path_to_input = '/path/to/example.nef'
allProteinShifts = parser.parse_official(path_to_input)
result_list = shiftcrypt_instance.predictShifts(
    allProteinShifts,
    modelClass='1'
)
# Save the list of result dictionaries as JSON
with open(f"{path_to_input}.json", "w") as fp:
    json.dump(result_list, fp, indent=4)
```
Regarding the `modelClass` parameter of method `predictShifts`:
- `modelClass='1'`: the method with the full set of chemical shifts (CS). This may return many `-10` values (missing values) because CS data are scarce for some residues.
- `modelClass='2'`: the method with just the most common CS values.
- `modelClass='3'`: the method with only N and H CS. Used for dimers.

The next table shows the output keys available from `shiftcrypt_instance.predictShifts`. Note that the return value is a list of dictionaries with these keys:
| Predictor | Output key | Output values (type) |
| --------- | ---------------- | -------------------- |
| ShiftCrypt | `"ID_file"` | `String` |
| ShiftCrypt | `"sequence"` | `[Char]` |
| ShiftCrypt | `"seqCodes"` | `[Integer]` |
| ShiftCrypt | `"shiftCrypt"` | `[Float]` |
| ShiftCrypt | `"chainCode"` | `String` |
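Since missing values are reported as `-10`, it is usually worth filtering them out before any downstream analysis. The sketch below does this on a hand-written stand-in for the returned list; the values are made up, `valid_values` is a hypothetical helper, and whether the real output stores `-10` as an integer or a float is an assumption that does not affect the comparison.

```python
MISSING = -10  # value this README describes as marking missing data

# Hand-written stand-in for the list returned by predictShifts().
result_list = [
    {
        "ID_file": "example",
        "chainCode": "A",
        "sequence": ["M", "A", "K", "S"],
        "seqCodes": [1, 2, 3, 4],
        "shiftCrypt": [0.42, -10, 0.57, -10],
    },
]

def valid_values(entry):
    """Return (seqCode, residue, value) tuples, skipping missing values."""
    return [
        (code, residue, value)
        for code, residue, value in zip(entry["seqCodes"], entry["sequence"], entry["shiftCrypt"])
        if value != MISSING
    ]

print(valid_values(result_list[0]))  # [(1, 'M', 0.42), (3, 'K', 0.57)]
```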
## Documentation: classes & methods
If you are interested in further details, please read the full documentation on [the Bio2Byte website](https://bio2byte.be/b2btools/package-documentation).
To generate the documentation locally, follow the steps described in this section.
1. Download the source code of the Bio2Byte Tools in your local environment:
```console
$ git clone git@bitbucket.org:bio2byte/b2btools.git && cd b2btools
```
2. Run the following command:
```console
$ make generate-docs
```
3. Then open the folder `./wrapped_documentation`.

**💡 Notes:** At any moment, you can read the docs of a method through its `__doc__` attribute (e.g. `print(SingleSeq.predict.__doc__)`).
## How to cite
If you use this package or data in this package, please cite:
| Predictor | Authors | Cite | Digital Object Identifier (DOI) |
| --------- | --------- | --------- | --------- |
| Dynamine | Elisa Cilia, Rita Pancsa, Peter Tompa, Tom Lenaerts, and Wim Vranken | _Elisa Cilia, Rita Pancsa, Peter Tompa, Tom Lenaerts, and Wim Vranken._ From protein sequence to dynamics and disorder with DynaMine **Nature Communications 4:2741 (2013)** | https://www.nature.com/articles/ncomms3741 |
| Disomine | Gabriele Orlando, Daniele Raimondi, Francesco Codice, Francesco Tabaro, Wim Vranken | _Gabriele Orlando, Daniele Raimondi, Francesco Codice, Francesco Tabaro, Wim Vranken._ Prediction of disordered regions in proteins with recurrent Neural Networks and protein dynamics. **bioRxiv 2020.05.25.115253 (2020)** | https://www.biorxiv.org/content/10.1101/2020.05.25.115253v1 |
| EfoldMine | Raimondi, D., Orlando, G., Pancsa, R. et al | _Raimondi, D., Orlando, G., Pancsa, R. et al._ Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. **Sci Rep 7, 8826 (2017)** | https://doi.org/10.1038/s41598-017-08366-3 |
| AgMata | Gabriele Orlando, Alexandra Silva, Sandra Macedo-Ribeiro, Daniele Raimondi, Wim Vranken | _Gabriele Orlando, Alexandra Silva, Sandra Macedo-Ribeiro, Daniele Raimondi, Wim Vranken._ Accurate prediction of protein beta-aggregation with generalized statistical potentials. **Bioinformatics, Volume 36, Issue 7, 1 April 2020, Pages 2076–2081 (2020)** | https://academic.oup.com/bioinformatics/article/36/7/2076/5670527 |
| PSPer | Gabriele Orlando, Daniele Raimondi, Francesco Tabaro, Francesco Codicè, Yves Moreau, Wim F Vranken | _Gabriele Orlando and others_, Computational identification of prion-like RNA-binding proteins that form liquid phase-separated condensates, **Bioinformatics, Volume 35, Issue 22, November 2019, Pages 4617–4623** | https://doi.org/10.1093/bioinformatics/btz274 |
| ShiftCrypt | Gabriele Orlando, Daniele Raimondi, Luciano Porto Kagami, Wim F Vranken | _Gabriele Orlando and others_. ShiftCrypt: a web server to understand and biophysically align proteins through their NMR chemical shift values, **Nucleic Acids Research, Volume 48, Issue W1, 02 July 2020, Pages W36–W40** | https://doi.org/10.1093/nar/gkaa391 |
<!--
## π License
Bio2Byte Tools is free and open-source software licensed under the Apache 2.0 License.
-->
## Terms of use
1. The Bio2Byte group aims to promote open science by providing freely available online services, databases and software relating to the life sciences, with a focus on proteins. Where we present scientific data generated by others, we impose no restrictions on the use of the contributed data beyond those set by the data owner.
1. The Bio2Byte group expects attribution (e.g. in publications, services or products) for any of its online services, databases or software in accordance with good scientific practice. The expected attribution will be indicated in 'How to cite' sections (or equivalent).
1. The Bio2Byte group is not liable to you or third parties claiming through you, for any loss or damage.
1. Any questions or comments concerning these Terms of Use can be addressed to [Wim Vranken](mailto:wim.vranken@vub.be).
<hr/>
<p align="center">© Wim Vranken, Bio2Byte group, VUB, Belgium</p>
<p align="center"><a href="https://bio2byte.be/" target="_blank" ref="noreferrer noopener">https://bio2byte.be/</a></p>
Raw data
{
"_id": null,
"home_page": "https://bio2byte.be",
"name": "b2bTools",
"maintainer": "Jose Gavalda-Garcia, Adrian Diaz, Wim Vranken",
"docs_url": null,
"requires_python": ">=3.7, <3.10",
"maintainer_email": "jose.gavalda.garcia@vub.be, adrian.diaz@vub.be, wim.vranken@vub.be",
"keywords": "b2bTools,biology,bioinformatics,bio-informatics,fasta,proteins,protein-folding",
"author": "Wim Vranken",
"author_email": "Wim.Vranken@vub.be",
"download_url": "https://files.pythonhosted.org/packages/2b/5b/5e6cf3d36e31ce9093e32827b2f1ee31838d8507fd8a239577056ea99797/b2bTools-3.0.6.tar.gz",
"platform": null,
"description": "<h1 align=\"center\">\n <a href=\"bio2byte.be/b2btools\" target=\"_blank\" ref=\"noreferrer noopener\">\n <img src=\"https://pbs.twimg.com/profile_images/1247824923546079232/B9b_Yg7n_400x400.jpg\" width=\"224px\"/>\n </a>\n <br/>\n Bio2Byte Tools\n</h1>\n<p align=\"center\">This package provides you structural predictions for protein sequences made by Bio2Byte group.</p>\n<p align=\"center\">\n <a href=\"https://anaconda.org/Bio2Byte/b2bTools\"> <img src=\"https://anaconda.org/Bio2Byte/b2bTools/badges/version.svg\" /></a> \n <a href=\"https://anaconda.org/Bio2Byte/b2bTools\"> <img src=\"https://anaconda.org/Bio2Byte/b2bTools/badges/latest_release_relative_date.svg\" /></a> \n <a href=\"https://anaconda.org/Bio2Byte/b2bTools\"> <img src=\"https://anaconda.org/Bio2Byte/b2bTools/badges/platforms.svg\" /></a> \n <a href=\"https://anaconda.org/Bio2Byte/b2bTools\"> <img src=\"https://anaconda.org/Bio2Byte/b2bTools/badges/downloads.svg\" /></a> \n</p>\n\n## \ud83e\uddea About this Python package\n\nThis package, called `b2bTools`, offers biophysical feature predictors for protein sequences as well as different file parses and other utilities to help you with your protein data analysis.\nIf your input data consists on one or more sequences not aligned, we provide you with the Single Sequence mode. On the other hand, if your input is a Multiple Sequence Alignment (MSA), we provide the MSA mode. For NMR data, we have the predictor ShiftCrypt.\n\nAbout the available predictors:\n\n| Predictor | Usage | Bio.Tools | Online predictor |\n| --------- | --------- | ---- | ---- |\n| DynaMine | Fast predictor of protein backbone dynamics using only sequence information as input. The version here also predicts side-chain dynamics and secondary structure predictors using the same principle. 
| [Link](https://bio.tools/Dynamine) | [Start predicting online \u25b6\ufe0f](https://bio2byte.be/b2btools/dynamine/)|\n| DisoMine | Predicts protein disorder with recurrent neural networks not directly from the amino acid sequence, but instead from more generic predictions of key biophysical properties, here protein dynamics, secondary structure and early folding. | [Link](https://bio.tools/Disomine) | [Start predicting online \u25b6\ufe0f](https://bio2byte.be/b2btools/disomine/)|\n| EfoldMine | Predicts from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. | [Link](https://bio.tools/b2btools) | [Start predicting online \u25b6\ufe0f](https://bio2byte.be/b2btools/efoldmine) |\n| AgMata | Single-sequence based predictor of protein regions that are likely to cause beta-aggregation. | [Link](https://bio.tools/agmata) | [Start predicting online \u25b6\ufe0f](https://bio2byte.be/b2btools/agmata/) |\n| PSPer | PSP (Phase Separating Protein) predicts whether a protein is likely to phase-separate with a particular mechanism involving RNA interacts (FUS-like proteins). It will highlight the regions in your protein that are involved mechanistically, and provide an overall score. | [Link](https://bio.tools/PSPer) | [Start predicting online \u25b6\ufe0f](https://bio2byte.be/b2btools/psp/) |\n| ShiftCrypt | Auto-encoding NMR chemical shifts from their native vector space to a residue-level biophysical index | [Link](https://bio.tools/ShiftCrypt) | [Start predicting online \u25b6\ufe0f](https://bio2byte.be/b2btools/shiftcrypt/) |\n\n**\ud83d\udd17 Related link:**\n\n- These tools are described on the Bio2Byte website inside the [Tools section](https://bio2byte.be/tool/).\n- [Galaxy](https://usegalaxy.org) is an open source, web-based platform for data intensive biomedical research. There is an available version of the Single Sequence predictors on Galaxy Europe. 
Start predicting online using Galaxy from [this link](https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/b2btools_single_sequence/b2btools_single_sequence/3.0.5+galaxy0).\n\n## Usage and examples\n\nTo install the latest version of this package:\n\n```console\n$ pip install b2bTools\n```\n\n**\u26a0\ufe0f Important notes:** [Hmmer](http://hmmer.org) and [T-Coffee](https://tcoffee.crg.eu) are required to run several features. Please install them following their official guidelines.\n\n### Single Sequence predictions\n\nUse this example as an entry point when you have a FASTA file containing one or more sequences. There is a live demo available on [Google Colab](https://colab.research.google.com/github/Bio2Byte/public_notebooks/blob/main/Bio2ByteTools_v3_singleseq_pypi.ipynb).\n\n#### Predicting biophysical features in JSON format\n\n```python\nimport json\nfrom b2bTools import SingleSeq, constants\n\ninput_fasta = \"/path/to/example.fasta\"\n\nsingle_seq = SingleSeq(input_fasta)\n\n# Predict features using 'dynamine', 'disomine', 'efoldmine', 'agmata', 'psper'\nsingle_seq.predict(tools=constants.PREDICTOR_NAMES)\n\npredictions = single_seq.get_all_predictions()\n\nfor sequence_id, prediction_values in predictions.items():\n with open(f\"{input_fasta}_{sequence_id}.json\", \"w\") as fp:\n json.dump(prediction_values, fp, indent=4)\n```\n\n#### Predicting biophysical features in tabular formats (CSV, TSV)\n\n```python\nimport json\nfrom b2bTools import SingleSeq, constants\n\ninput_fasta = \"/path/to/example.fasta\"\n\nsingle_seq = SingleSeq(input_fasta)\n\n# Predict features using 'dynamine', 'disomine', 'efoldmine', 'agmata', 'psper'\nsingle_seq.predict(tools=constants.PREDICTOR_NAMES)\n\nsingle_seq.get_all_predictions_tabular(f\"{input_fasta}_residues.csv\", sep=\",\")\nsingle_seq.get_all_predictions_tabular(f\"{input_fasta}_residues.tsv\", sep=\"\\t\")\n```\n\nThe output contains these columns:\n\n| Predictor | Value | Data type |\n| --------- | 
----- | --------- |\n| None | `sequence_id` | `String` |\n| None | `residue` | `Char` |\n| None | `residue_index` | `Integer` |\n| PSPer | `RRM` | `Float` |\n| AgMata | `agmata` | `Float` |\n| PSPer | `arg` | `Float` |\n| DynaMine | `backbone` | `Float` |\n| DynaMine | `coil` | `Float` |\n| PSPer | `complexity` | `Float` |\n| DisoMine | `disoMine` | `Float` |\n| PSPer | `disorder` | `Float` |\n| EFoldMine | `earlyFolding` | `Float` |\n| DynaMine | `helix` | `Float` |\n| DynaMine | `ppII` | `Float` |\n| DynaMine | `sheet` | `Float` |\n| DynaMine | `sidechain` | `Float` |\n| PSPer | `tyr` | `Float` |\n\n#### Plotting biophysical features\nIn case you need to plot the prediction values:\n\n```python\nimport json\n\nfrom b2bTools import SingleSeq, constants\nfrom matplotlib import pyplot as plt\n\ninput_fasta = \"/path/to/example.fasta\"\n\nsingle_seq = SingleSeq(input_fasta)\nsequence_id = \"SEQ001\"\n\n# Predict features using 'dynamine'\nsingle_seq.predict(tools=constants.TOOL_DYNAMINE)\n\npredictions = single_seq.get_all_predictions()\n\nbackbone_pred = predictions[sequence_id]['backbone']\nsidechain_pred = predictions[sequence_id]['sidechain']\n\nplt.plot(range(len(backbone_pred)), backbone_pred, label = \"Backbone\")\nplt.plot(range(len(backbone_pred)), sidechain_pred, label = \"Sidechain\")\n\nplt.legend()\nplt.xlabel('aa_position')\nplt.ylabel('pred_values')\nplt.show()\n```\n\n#### Extracting metadata\n\nFor extracting metadata from the prediction values in tabular format when analyzing Single Sequence input:\n\n```python\nfrom b2bTools import SingleSeq, constants\n\ninput_fasta = \"/path/to/example.fasta\"\n\nsingle_seq = SingleSeq(input_fasta)\n\n# Predict features using 'dynamine', 'disomine', 'efoldmine', 'agmata', 'psper'\nsingle_seq.predict(tools=constants.PREDICTOR_NAMES)\n\npredictions = single_seq.get_all_predictions()\nsingle_seq.get_metadata(f\"{input_fasta}.csv\")\n```\n\nThe output will include these columns:\n\n- sequence_id\n- length\n- 
Execution times:\n - dynamine_execution_time\n - disomine_execution_time\n - efoldmine_execution_time\n - agmata_execution_time\n - psper_execution_time\n- DynaMine:\n - Backbone:\n - backbone_mean\n - backbone_median\n - backbone_variance\n - backbone_stdev\n - backbone_pvariance\n - backbone_pstdev\n - backbone_min\n - backbone_max\n - Side chain:\n - sidechain_mean\n - sidechain_median\n - sidechain_variance\n - sidechain_stdev\n - sidechain_pvariance\n - sidechain_pstdev\n - sidechain_min\n - sidechain_max\n - ppII:\n - ppII_mean\n - ppII_median\n - ppII_variance\n - ppII_stdev\n - ppII_pvariance\n - ppII_pstdev\n - ppII_min\n - ppII_max\n - Coil:\n - coil_mean\n - coil_median\n - coil_variance\n - coil_stdev\n - coil_pvariance\n - coil_pstdev\n - coil_min\n - coil_max\n - Sheet:\n - sheet_mean\n - sheet_median\n - sheet_variance\n - sheet_stdev\n - sheet_pvariance\n - sheet_pstdev\n - sheet_min\n - sheet_max\n - Helix:\n - helix_mean\n - helix_median\n - helix_variance\n - helix_stdev\n - helix_pvariance\n - helix_pstdev\n - helix_min\n - helix_max\n- EFoldMine:\n - earlyFolding_mean\n - earlyFolding_median\n - earlyFolding_variance\n - earlyFolding_stdev\n - earlyFolding_pvariance\n - earlyFolding_pstdev\n - earlyFolding_min\n - earlyFolding_max\n- AgMata:\n - agmata_mean\n - agmata_median\n - agmata_variance\n - agmata_stdev\n - agmata_pvariance\n - agmata_pstdev\n - agmata_min\n - agmata_max\n- DisoMine:\n - disoMine_mean\n - disoMine_median\n - disoMine_variance\n - disoMine_stdev\n - disoMine_pvariance\n - disoMine_pstdev\n - disoMine_min\n - disoMine_max\n- PSPer:\n - Protein Score:\n - protein_score\n - Complexity:\n - complexity_mean\n - complexity_median\n - complexity_variance\n - complexity_stdev\n - complexity_pvariance\n - complexity_pstdev\n - complexity_min\n - complexity_max\n - ARG:\n - arg_mean\n - arg_median\n - arg_variance\n - arg_stdev\n - arg_pvariance\n - arg_pstdev\n - arg_min\n - arg_max\n - TYR:\n - tyr_mean\n - tyr_median\n - 
tyr_variance\n - tyr_stdev\n - tyr_pvariance\n - tyr_pstdev\n - tyr_min\n - tyr_max\n - RRM:\n - RRM_mean\n - RRM_median\n - RRM_variance\n - RRM_stdev\n - RRM_pvariance\n - RRM_pstdev\n - RRM_min\n - RRM_max\n - Disorder:\n - disorder_mean\n - disorder_median\n - disorder_variance\n - disorder_stdev\n - disorder_pvariance\n - disorder_pstdev\n - disorder_min\n - disorder_max\n\n### Multiple Sequences Alignment predictions\n\nUse the following example as an entry point when you have a MSA file input. There is a live demo available on Google Colab: [link](https://colab.research.google.com/github/Bio2Byte/public_notebooks/blob/main/Bio2ByteTools_v3_multipleseq_pypi.ipynb)\n\n#### Predicting biophysical features in JSON format\n\n```python\nimport json\nfrom b2bTools import MultipleSeq, constants\n\ninput_msa = \"/path/to/example.afa\"\n\nmultiple_seq = MultipleSeq()\nmultiple_seq.from_aligned_file(input_msa, tools=constants.PREDICTOR_NAMES)\n\npredictions = multiple_seq.get_all_predictions_msa()\n\nfor sequence_id, prediction_values in predictions.items():\n with open(f\"{input_msa}_{sequence_id}.json\", \"w\") as fp:\n json.dump(prediction_values, fp, indent=4)\n\ndistributions_dict = multiple_seq.get_all_predictions_msa_distrib()\ndistributions = distributions_dict['results']\n\nwith open(f\"{input_msa}_distributions.json\", \"w\") as fp:\n json.dump(distributions, fp, indent=4)\n\n```\n\n#### Predicting biophysical features in tabular formats (CSV, TSV)\n\n```python\nfrom b2bTools import MultipleSeq, constants\n\ninput_msa = \"/path/to/example.afa\"\n\nmultiple_seq = MultipleSeq()\nmultiple_seq.from_aligned_file(input_msa, tools=constants.PREDICTOR_NAMES)\n\nmultiple_seq.get_all_predictions_tabular(f\"{input_msa}_residues.csv\", sep=\",\")\nmultiple_seq.get_all_predictions_tabular(f\"{input_msa}_residues.tsv\", sep=\"\\t\")\n```\n\nThe output contains these columns:\n\n| Predictor | Value | Data type |\n| ----------| ----- | --------- |\n| None | sequence_id | 
String |\n| None | residue | Char |\n| None | residue_index | Integer |\n| PSPer | RRM | Float |\n| AgMata | agmata | Float |\n| PSPer | arg | Float |\n| DynaMine | backbone | Float |\n| DynaMine | coil | Float |\n| PSPer | complexity | Float |\n| DisoMine | disoMine | Float |\n| PSPer | disorder | Float |\n| EFoldMine | earlyFolding | Float |\n| DynaMine | helix | Float |\n| DynaMine | ppII | Float |\n| DynaMine | sheet | Float |\n| DynaMine | sidechain | Float |\n| PSPer | tyr | Float |\n\n#### Extracting metadata\nFor extracting metadata from the prediction values in tabular format when analyzing MSA input:\n\n```python\nfrom b2bTools import MultipleSeq, constants\n\ninput_msa = \"/path/to/example.afa\"\n\nmultiple_seq = MultipleSeq()\nmultiple_seq.from_aligned_file(input_msa, tools=constants.PREDICTOR_NAMES)\n\nmultiple_seq.get_metadata(f\"{input_msa}.csv\")\n```\n\n\n\n## \ud83d\udcbb Installation\n\nFrom the official documentation:\n\n> [`pip`](https://pypi.org/) is the package installer for Python. 
You can use pip to install packages from the Python Package Index and other indexes.\n\n\nTo install the b2bTools package in your local environment:\n\n```console\n$ pip install b2bTools\n```\n\n**\ud83d\udca1 Notes:** If you are using [Jupyter Notebook](https://jupyter.org) or [Google Colab](https://colab.research.google.com), install the package directly from `pip` inside a _code block_ cell:\n\n```python\n!pip install b2bTools\n```\n\n## \ud83d\udce6 Package content\n\n### \ud83d\udd0d General Tools\n\nBesides the prediction tools, this package includes general bioinformatics tools useful to manipulate files.\n\n#### \ud83d\udcc4 Single Sequences files (FASTA format)\n\nThe class `FastaIO` provides the following static methods:\n\n- `read_fasta_from_file`\n- `read_fasta_from_string`\n- `write_fasta`\n\nUsage:\n\n```python\nfrom b2bTools.general.parsers.fasta import FastaIO\n```\n\n#### \ud83d\udcc4 Multiple Sequences Alignments files\n\nThe class `AlignmentsIO` provides the following static methods:\n\n- `read_alignments`\n- `read_alignments_fasta`\n- `read_alignments_A3M`\n- `read_alignments_blast`\n- `read_alignments_balibase`\n- `read_alignments_clustal`\n- `read_alignments_psi`\n- `read_alignments_phylip`\n- `read_alignments_stockholm`\n- `write_fasta_from_alignment`\n- `write_fasta_from_seq_alignment_dict`\n- `json_preds_to_csv_singleseq`\n- `json_preds_to_csv_msa`\n\nUsage:\n\n```python\nfrom b2bTools.general.parsers.alignments import AlignmentsIO\n```\n\n#### \ud83d\udcc4 NEF files\n\nThe class `NefIO` provides the following static methods:\n\n- `read_nef_file`\n- `read_nef_file_sequence_shifts`\n\nUsage:\n\n```python\nfrom b2bTools.general.parsers.alignments import AlignmentsIO\n```\n\n#### \ud83d\udcc4 NMR-STAR files\n\nThe class `NMRStarIO` provides the following static methods:\n\n- `read_nmr_star_project`\n- `read_nmr_star_sequence_shifts`\n\nUsage:\n\n```python\nfrom b2bTools.general.parsers.nmr_star import NMRStarIO\n```\n\n### \ud83d\udd0d 
Biophysical features predictors\n\nGiven a predictor might be built on top of other, it is usual to get more output predictions than the expected:\n\n| Predictor | Depends on |\n| --------- | --------------------- |\n| Dynamine | - |\n| EfoldMine | Dynamine |\n| Disomine | EfoldMine, Dynamine |\n| AgMata | EfoldMine, Dynamine |\n\nThese are all the available options to use inside the tools array parameter:\n\n| Predictor | constant value | literal value |\n| --------- | ----------------------------| ------------- |\n| Dynamine | `constants.TOOL_DYNAMINE` | `\"dynamine\"` |\n| EfoldMine | `constants.TOOL_EFOLDMINE` | `\"efoldmine\"` |\n| Disomine | `constants.TOOL_DISOMINE` | `\"disomine\"` |\n| AgMata | `constants.TOOL_AGMATA` | `\"agmata\"` |\n| PSPer | `constants.TOOL_PSP` | `\"psper\"` |\n\nThe next table shows all the available predictor values by predictor:\n\n| Predictor | Output key | Output values (type) |\n| --------- | ---------------- | -------------------- |\n| Dynamine | `\"backbone\"` | `[Float]` |\n| Dynamine | `\"sidechain\"` | `[Float]` |\n| Dynamine | `\"helix\"` | `[Float]` |\n| Dynamine | `\"ppII\"` | `[Float]` |\n| Dynamine | `\"coil\"` | `[Float]` |\n| Dynamine | `\"sheet\"` | `[Float]` |\n| EfoldMine | `\"earlyFolding\"` | `[Float]` |\n| Disomine | `\"disoMine\"` | `[Float]` |\n| AgMata | `\"agmata\"` | `[Float]` |\n| PSPer | `\"viterbi\"` | `[Float]` |\n| PSPer | `\"complexity\"` | `[Float]` |\n| PSPer | `\"tyr\"` | `[Float]` |\n| PSPer | `\"arg\"` | `[Float]` |\n| PSPer | `\"RRM\"` | `[Float]` |\n| PSPer | `\"disorder\"` | `[Float]` |\n\nFor MSA input files, the distribution dictionary and/or JSON will include:\n\n```python\nmultiple_seq.get_all_predictions_msa_distrib()['results']\n```\n\n| Predictor | Output key | Output values (type) |\n| --------- | ---------------- | -------------------- |\n| Dynamine | `\"backbone\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| Dynamine | `\"sidechain\"` | 
`['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| Dynamine | `\"helix\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| Dynamine | `\"ppII\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| Dynamine | `\"coil\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| Dynamine | `\"sheet\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| EfoldMine | `\"earlyFolding\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| Disomine | `\"disoMine\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| AgMata | `\"agmata\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| PSPer | `\"viterbi\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| PSPer | `\"complexity\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| PSPer | `\"tyr\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| PSPer | `\"arg\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| PSPer | `\"RRM\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n| PSPer | `\"disorder\"` | `['median', 'thirdQuartile', 'firstQuartile', 'topOutlier', 'bottomOutlier']` |\n\nThe method `get_all_predictions` will return a dictionary with the following structure:\n\n```python\n{\n \"SEQUENCE_ID_000\": {\n \"seq\": \"the input sequence 0\",\n \"result001\": [0.001, 0.002, ..., 0.00],\n \"result002\": [0.001, 0.002, ..., 0.00],\n \"...\": [...],\n \"resultN\": [0.001, 0.002, ..., 0.00]\n },\n \"SEQUENCE_ID_001\": {\n \"seq\": \"the input sequence 1\",\n \"result001\": [0.001, 0.002, ..., 0.00],\n \"result002\": [0.001, 0.002, ..., 0.00],\n \"...\": [...],\n \"resultN\": [0.001, 0.002, 
..., 0.00]\n },\n \"...\": { ... },\n \"SEQUENCE_ID_N\": {\n \"seq\": \"the input sequence N\",\n \"result001\": [0.001, 0.002, ..., 0.00],\n \"result002\": [0.001, 0.002, ..., 0.00],\n \"...\": [...],\n \"resultN\": [0.001, 0.002, ..., 0.00]\n },\n}\n```\n\nYou are ready to use the sequence and predictions to work with them. Here is an example of plotting the data.\n\n```python\nbackbone_pred = predictions['SEQ001']['backbone']\nsidechain_pred = predictions['SEQ001']['sidechain']\n\nplt.plot(range(len(backbone_pred)), backbone_pred, label = \"Backbone\")\nplt.plot(range(len(sidechain_pred)), sidechain_pred, label = \"Sidechain\")\n\nplt.legend()\nplt.xlabel('aa_position')\nplt.ylabel('pred_values')\nplt.show()\n```\n\n#### Running as Python module (no Python code involved)\n\nYou are able to use this package directly from your console session with no Python code involved. Further details available on [the official Python documentation site](https://docs.python.org/3/tutorial/modules.html#executing-modules-as-scripts)\n\n```console\nusage: b2bTools [-h] [-v] -i INPUT_FILE -o OUTPUT_JSON_FILE\n [-t OUTPUT_TABULAR_FILE] [-m METADATA_FILE]\n [-dj DISTRIBUTION_JSON_FILE] [-dt DISTRIBUTION_TABULAR_FILE]\n [-s {comma,tab}] [--short_ids] [--mode {single_seq,msa}]\n [--dynamine] [--disomine] [--efoldmine] [--agmata] [--psper]\n\nBio2Byte Tool - Command Line Interface\n\noptional arguments:\n -h, --help show this help message and exit\n -v, --version show program's version number and exit\n -i INPUT_FILE, --input_file INPUT_FILE\n File to process\n -o OUTPUT_JSON_FILE, --output_json_file OUTPUT_JSON_FILE\n Path to JSON output file\n -t OUTPUT_TABULAR_FILE, --output_tabular_file OUTPUT_TABULAR_FILE\n Path to tabular output file\n -m METADATA_FILE, --metadata_file METADATA_FILE\n Path to tabular metadata file\n -dj DISTRIBUTION_JSON_FILE, --distribution_json_file DISTRIBUTION_JSON_FILE\n Path to distribution output JSON file\n -dt DISTRIBUTION_TABULAR_FILE, 
--distribution_tabular_file DISTRIBUTION_TABULAR_FILE\n Path to distribution output JSON file\n -s {comma,tab}, --sep {comma,tab}\n Tabular separator\n --short_ids Trim sequence ids (up to 20 chars per seq)\n --mode {single_seq,msa}\n Execution mode: Single Sequence or MSA Analysis\n --dynamine Run DynaMine predictor\n --disomine Run DisoMine predictor\n --efoldmine Run EFoldMine predictor\n --agmata Run AgMata predictor\n --psper Run PSPer predictor\n```\n\n##### To display the help section\n\n```console\nb2bTools --help\n```\n\n##### To visualize the version\n\n```console\nb2bTools --version\n```\n\n##### Example for Single Sequence\n\nPlease run this command to get all the predictions from your input FASTA file:\n\n```console\nb2bTools \\\n --input_file /path/to/input/example_toy.fasta \\\n --output_json_file /path/to/output/example_toy.json \\\n --output_tabular_file /path/to/output/example_toy.csv \\\n --metadata_file /path/to/output/example_toy.meta.csv\n```\n\nExpected output:\n\n```console\n2023-07-04 16:04:23,630 [b2bTools v3.0.6 INFO] Arguments parsed with success\n2023-07-04 16:04:23,630 [b2bTools v3.0.6 INFO] Reading sequences from: /path/to/input/example_toy.fasta\n2023-07-04 16:04:23,630 [b2bTools v3.0.6 INFO] Tools to execute: ['dynamine']\n2023-07-04 16:04:23,630 [b2bTools v3.0.6 INFO] Predicting sequence(s)\n...\n2023-07-04 16:04:23,986 [b2bTools v3.0.6 INFO] Saving results in JSON format in: /path/to/output/example_toy.json\n2023-07-04 16:04:24,006 [b2bTools v3.0.6 INFO] Saving results in tabular format in: /path/to/output/example_toy.csv\n2023-07-04 16:04:24,040 [b2bTools v3.0.6 INFO] Saving metadata in tabular format in: /path/to/output/example_toy.meta.csv\n2023-07-04 16:04:24,279 [b2bTools v3.0.6 INFO] Execution finished with success\n```\n\nOtherwise, if you need to extract only one sequence from the input file:\n\n```console\nb2bTools \\\n --sequence_id Q647G9 \\\n --input_file /path/to/input/example_toy.fasta \\\n --output_json_file 
/path/to/output/example_toy.json \\\n --output_tabular_file /path/to/output/example_toy.csv \\\n --metadata_file /path/to/output/example_toy.meta.csv\n```\n\n```console\n2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Arguments parsed with success\n2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Reading sequences from: /path/to/input/example_toy.fasta\n2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Sequence to filter: Q647G9\n2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Tools to execute: ['dynamine']\n2023-07-04 16:25:35,486 [b2bTools v3.0.6 INFO] Predicting sequence(s)\n...\n2023-07-04 16:25:35,842 [b2bTools v3.0.6 INFO] Saving results for Q647G9 in JSON format in: /path/to/output/example_toy.json\n2023-07-04 16:25:35,845 [b2bTools v3.0.6 INFO] Saving results for Q647G9 in tabular format in: /path/to/output/example_toy.csv\n2023-07-04 16:25:35,860 [b2bTools v3.0.6 INFO] Saving metadata for Q647G9 in tabular format in: /path/to/output/example_toy.meta.csv\n2023-07-04 16:25:35,893 [b2bTools v3.0.6 INFO] Execution finished with success\n```\n\n##### Example for MSA\n\nPlease run this command to get all the predictions from your input MSA file:\n\n```console\nb2bTools \\\n --mode msa \\\n --input_file /path/to/input/small_alignment.clustal \\\n --output_json_file /path/to/output/small_alignment.clustal.json \\\n --output_tabular_file /path/to/output/small_alignment.clustal.csv \\\n --metadata_file /path/to/output/small_alignment.clustal.meta.csv \\\n --distribution_json_file /path/to/output/small_alignment.clustal.distrib.json \\\n --distribution_tabular_file /path/to/output/small_alignment.clustal.distrib.csv\n```\n\nExpected output:\n\n```console\n2023-07-04 16:06:40,524 [b2bTools v3.0.6 INFO] Arguments parsed with success\n2023-07-04 16:06:40,524 [b2bTools v3.0.6 INFO] Reading sequences from: /path/to/input/small_alignment.clustal\n2023-07-04 16:06:40,524 [b2bTools v3.0.6 INFO] Tools to execute: ['dynamine']\n2023-07-04 16:06:40,524 [b2bTools v3.0.6 INFO] 
Predicting sequence(s)\n...\n2023-07-04 16:06:40,749 [b2bTools v3.0.6 INFO] Saving results in JSON format in: /path/to/output/small_alignment.clustal.json\n2023-07-04 16:06:40,751 [b2bTools v3.0.6 INFO] Saving distributions in JSON format in: /path/to/output/small_alignment.clustal.distrib.json\n2023-07-04 16:06:40,760 [b2bTools v3.0.6 INFO] Saving distributions in tabular format in: /path/to/output/small_alignment.clustal.distrib.csv\n2023-07-04 16:06:40,784 [b2bTools v3.0.6 INFO] Saving results in tabular format in: /path/to/output/small_alignment.clustal.csv\n2023-07-04 16:06:40,788 [b2bTools v3.0.6 INFO] Saving metadata in tabular format in: /path/to/output/small_alignment.clustal.meta.csv\n2023-07-04 16:06:40,807 [b2bTools v3.0.6 INFO] Execution finished with success\n```\n\nOtherwise, if you need to extract only one sequence from the input file:\n\n```console\nb2bTools \\\n --mode msa \\\n --sequence_id SEQ_1\n --input_file /path/to/input/small_alignment.clustal \\\n --output_json_file /path/to/output/small_alignment.clustal.json \\\n --output_tabular_file /path/to/output/small_alignment.clustal.csv \\\n --metadata_file /path/to/output/small_alignment.clustal.meta.csv \\\n --distribution_json_file /path/to/output/small_alignment.clustal.distrib.json \\\n --distribution_tabular_file /path/to/output/small_alignment.clustal.distrib.csv\n```\n\n```console\n2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Arguments parsed with success\n2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Reading sequences from: /path/to/input/small_alignment.clustal\n2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Sequence to filter: SEQ_1\n2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Tools to execute: ['dynamine']\n2023-07-04 16:28:34,388 [b2bTools v3.0.6 INFO] Predicting sequence(s)\n...\n2023-07-04 16:28:34,602 [b2bTools v3.0.6 INFO] Saving results for SEQ_1 in JSON format in: /path/to/output/small_alignment.clustal.json\n2023-07-04 16:28:34,603 [b2bTools v3.0.6 INFO] Saving 
distributions in JSON format in: /path/to/output/small_alignment.clustal.distrib.json\n2023-07-04 16:28:34,612 [b2bTools v3.0.6 INFO] Saving distributions in tabular format in: /path/to/output/small_alignment.clustal.distrib.csv\n2023-07-04 16:28:34,632 [b2bTools v3.0.6 INFO] Saving results for SEQ_1 in tabular format in: /path/to/output/small_alignment.clustal.csv\n2023-07-04 16:28:34,635 [b2bTools v3.0.6 INFO] Saving metadata for SEQ_1 in tabular format in: /path/to/output/small_alignment.clustal.meta.csv\n2023-07-04 16:28:34,651 [b2bTools v3.0.6 INFO] Execution finished with success\n```\n\n#### From an aligned file\n\n```python\nimport matplotlib.pyplot as plt\nfrom b2bTools import MultipleSeq\n\nmultiple_seq = MultipleSeq()\nmultiple_seq.from_aligned_file(\"/path/to/example.fasta\")\n\npredictions = multiple_seq.get_all_predictions_msa(\"SEQ001\")\nbackbone_pred = predictions['backbone']\nsidechain_pred = predictions['sidechain']\n\nplt.legend()\nplt.xlabel('aa_position')\nplt.ylabel('pred_values')\nplt.show()\n```\n\n#### From two MSA files\n\n```python\nimport matplotlib.pyplot as plt\nfrom b2bTools import MultipleSeq\n\nmultiple_seq = MultipleSeq()\nmultiple_seq.from_two_msa(\"/path/to/example_a.fasta\", \"/path/to/example_b.fasta\")\n\npredictions = multiple_seq.get_all_predictions_msa(\"SEQ001\")\nbackbone_pred = predictions['backbone']\nsidechain_pred = predictions['sidechain']\n\nplt.legend()\nplt.xlabel('aa_position')\nplt.ylabel('pred_values')\nplt.show()\n```\n\n#### From a JSON with variations file\n\nIn this case, we support a JSON format to introduce variants in a sequence. 
For instance:\n\n```json\n{\n \"metadata\": { \"name\": \"target_fasta_file\" },\n \"WT\": \"MAKSTILALLALVLVAHASAMRRERGRQGDSSSCERQVDRVNLKPCEQHIMQRIMGEQEQYDSYDIRSTRSSDQQQRCCDELNEMENTQRCMCEALQQIMENQCDRLQDRQMVQQFKRELMNLPQQCNFRAPQRCDLDVSGGRC\",\n \"Variants\": {\n \"Var1\": [\"A3S\", \"A11G\"],\n \"Var2\": [\"A2G\", \"K3_S4insPH\", \"T5del\"]\n }\n}\n```\n\nWhere WT is the wild-type sequence, and the Variants key includes a dictionary of different variations. Each of them are handled by an array of replacements:\n\n- <Target Residue><New Residue> (For example: Replace the A at the position 3 with a S would be `\"A3S\"`)\n\nRegarding the input fasta file, the `metadata` key contains the name of the input, remember it should stored in the same directory than the json file.\n\nThe code snippet is:\n\n```python\nimport matplotlib.pyplot as plt\nfrom b2bTools import MultipleSeq\n\nmultiple_seq = MultipleSeq()\nmultiple_seq.from_json(\"/path/to/example.json\")\n\npredictions = multiple_seq.get_all_predictions_msa(\"SEQ001\")\nbackbone_pred = predictions['backbone']\nsidechain_pred = predictions['sidechain']\n\nplt.legend()\nplt.xlabel('aa_position')\nplt.ylabel('pred_values')\nplt.show()\n```\n\n#### From a sequence performing a BLAST before running the predictions\n\nIn case you want to perform a mutation of a residue at one specific position, you have the parameters `mut_position`, `mut_residue` and the value of `mut_option` must be `\"y\"`.\n\n```python\nimport matplotlib.pyplot as plt\nfrom b2bTools import MultipleSeq\n\nmultiple_seq = MultipleSeq()\nmultiple_seq.from_blast(\"path/to/example.fasta\", mut_option=\"y\", mut_position=1, mut_residue=\"A\")\n\npredictions = multiple_seq.get_all_predictions_msa(\"SEQ001\")\nbackbone_pred = predictions['backbone']\nsidechain_pred = predictions['sidechain']\n\nplt.legend()\nplt.xlabel('aa_position')\nplt.ylabel('pred_values')\nplt.show()\n```\n\n#### From an UniRef ID performing a BLAST before running the 
predictions\n\n```python\nimport matplotlib.pyplot as plt\nfrom b2bTools import MultipleSeq\n\nmultiple_seq = MultipleSeq()\nmultiple_seq.from_uniref(\"A2R2V4\")\n\npredictions = multiple_seq.get_all_predictions_msa(\"SEQ001\")\nbackbone_pred = predictions['backbone']\nsidechain_pred = predictions['sidechain']\n\nplt.legend()\nplt.xlabel('aa_position')\nplt.ylabel('pred_values')\nplt.show()\n```\n\n**\u26a0\ufe0f Note**: the query using the UniRef ID is limited to 25 results to keep the run time short.\n\n### \ud83d\udd0d ShiftCrypt predictor (NMR data)\n\n```python\nimport json\nfrom b2bTools.nmr.shiftCrypt.Predictor import ShiftCrypt\nfrom b2bTools.nmr.shiftCrypt.shiftcrypt_pkg import shiftcrypt_parser as parser\n\nshiftcrypt_instance = ShiftCrypt()\npath_to_input = '/path/to/example.nef'\n\nallProteinShifts = parser.parse_official(path_to_input)\nresult_list = shiftcrypt_instance.predictShifts(\n    allProteinShifts,\n    modelClass='1'\n)\n\nwith open(f\"{path_to_input}.json\", \"w\") as fp:\n    json.dump(result_list, fp, indent=4)\n```\n\nRegarding the `modelClass` parameter of the `predictShifts` method:\n\n- `modelClass='1'`: the method with the full set of chemical shifts (CS). This may return many -10 values (missing values) because CS data are scarce for some residues\n- `modelClass='2'`: the method with just the most common CS values\n- `modelClass='3'`: the method with only N and H CS. Used for dimers\n\n\nThe following table lists all the output values available from `shiftcrypt_instance.predictShifts`. 
Please note that the return value is a list of dictionaries with these keys:\n\n| Predictor | Output key | Output values (type) |\n| --------- | ---------------- | -------------------- |\n| ShiftCrypt | `\"ID_file\"` | `String` |\n| ShiftCrypt | `\"sequence\"` | `[Char]` |\n| ShiftCrypt | `\"seqCodes\"` | `[Integer]` |\n| ShiftCrypt | `\"shiftCrypt\"` | `[Float]` |\n| ShiftCrypt | `\"chainCode\"` | `String` |\n\n## \ud83d\udcda Documentation: classes & methods\n\nIf you are interested in further details, please read the full documentation on [the Bio2Byte website](https://bio2byte.be/b2btools/package-documentation).\n\nTo generate the documentation locally, follow the steps in this section.\n\n1. Download the Bio2Byte Tools source code to your local environment:\n\n```console\n$ git clone git@bitbucket.org:bio2byte/b2btools.git && cd b2btools\n```\n\n2. Run the following command:\n\n```console\n$ make generate-docs\n```\n\n3. Then open the folder `./wrapped_documentation`.\n\n**\ud83d\udca1 Note:** At any time, you can read the documentation of a method through its `__doc__` attribute (e.g. `print(SingleSeq.predict.__doc__)`).\n\n## \ud83d\udcd6 How to cite\n\nIf you use this package or data in this package, please cite:\n\n| Predictor | Authors | Cite | Digital Object Identifier (DOI) |\n| --------- | --------- | --------- | --------- |\n| DynaMine | Elisa Cilia, Rita Pancsa, Peter Tompa, Tom Lenaerts, and Wim Vranken | _Elisa Cilia, Rita Pancsa, Peter Tompa, Tom Lenaerts, and Wim Vranken._ From protein sequence to dynamics and disorder with DynaMine **Nature Communications 4:2741 (2013)** | https://www.nature.com/articles/ncomms3741 |\n| DisoMine | Gabriele Orlando, Daniele Raimondi, Francesco Codice, Francesco Tabaro, Wim Vranken | _Gabriele Orlando, Daniele Raimondi, Francesco Codice, Francesco Tabaro, Wim Vranken._ Prediction of disordered regions in proteins with recurrent Neural Networks and protein dynamics. 
**bioRxiv 2020.05.25.115253 (2020)** | https://www.biorxiv.org/content/10.1101/2020.05.25.115253v1 |\n| EfoldMine | Raimondi, D., Orlando, G., Pancsa, R. et al | _Raimondi, D., Orlando, G., Pancsa, R. et al._ Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. **Sci Rep 7, 8826 (2017)** | https://doi.org/10.1038/s41598-017-08366-3 |\n| AgMata | Gabriele Orlando, Alexandra Silva, Sandra Macedo-Ribeiro, Daniele Raimondi, Wim Vranken | _Gabriele Orlando, Alexandra Silva, Sandra Macedo-Ribeiro, Daniele Raimondi, Wim Vranken._ Accurate prediction of protein beta-aggregation with generalized statistical potentials **Bioinformatics , Volume 36, Issue 7, 1 April 2020, Pages 2076\u20132081 (2020)** | https://academic.oup.com/bioinformatics/article/36/7/2076/5670527 |\n| PSPer | Gabriele Orlando, Daniele Raimondi, Francesco Tabaro, Francesco Codic\u00e8, Yves Moreau, Wim F Vranken | _Gabriele Orlando and others_, Computational identification of prion-like RNA-binding proteins that form liquid phase-separated condensates, **Bioinformatics, Volume 35, Issue 22, November 2019, Pages 4617\u20134623** | https://doi.org/10.1093/bioinformatics/btz274 |\n| ShiftCrypt | Gabriele Orlando, Daniele Raimondi, Luciano Porto Kagami, Wim F Vranken | _Gabriele Orlando and others_. ShiftCrypt: a web server to understand and biophysically align proteins through their NMR chemical shift values, **Nucleic Acids Research, Volume 48, Issue W1, 02 July 2020, Pages W36\u2013W40** | https://doi.org/10.1093/nar/gkaa391 |\n\n<!--\n## \ud83d\udcdd License\nBio2Byte Tools is free and open-source software licensed under the Apache 2.0 License.\n-->\n\n## \ud83d\udcdd Terms of use\n\n1. The Bio2Byte group aims to promote open science by providing freely available online services, database and software relating to the life sciences, with focus on proteins. 
Where we present scientific data generated by others we impose no additional restriction on the use of the contributed data than those provided by the data owner.\n1. The Bio2Byte group expects attribution (e.g. in publications, services or products) for any of its online services, databases or software in accordance with good scientific practice. The expected attribution will be indicated in 'How to cite' sections (or equivalent).\n1. The Bio2Byte group is not liable to you or third parties claiming through you, for any loss or damage.\n1. Any questions or comments concerning these Terms of Use can be addressed to [Wim Vranken](mailto:wim.vranken@vub.be).\n\n<hr/>\n<p align=\"center\">\u00a9 Wim Vranken, Bio2Byte group, VUB, Belgium</p>\n<p align=\"center\"><a href=\"https://bio2byte.be/\" target=\"_blank\" ref=\"noreferrer noopener\">https://bio2byte.be/</a></p>\n",
"bugtrack_url": null,
"license": "OSI Approved :: GNU General Public License v3 (GPLv3)",
"summary": "bio2Byte software suite to predict protein biophysical properties from their amino-acid sequences",
"version": "3.0.6",
"project_urls": {
"Documentation": "https://bio2byte.be/b2btools/package-documentation",
"HTML interface": "https://bio2byte.be/b2btools",
"Homepage": "https://bio2byte.be"
},
"split_keywords": [
"b2btools",
"biology",
"bioinformatics",
"bio-informatics",
"fasta",
"proteins",
"protein-folding"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fb3955d55c1a5149dd92fcfd6ff524081266d7b2b08e12e69233e488cb083fcf",
"md5": "24599c39f35074e8a89d1bf5684b3858",
"sha256": "e2c9a0f72add5d7b8800f5f92ac78139fb8cc26c8f71ae8b81806603a992cd56"
},
"downloads": -1,
"filename": "b2bTools-3.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "24599c39f35074e8a89d1bf5684b3858",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7, <3.10",
"size": 20117894,
"upload_time": "2023-07-04T15:43:29",
"upload_time_iso_8601": "2023-07-04T15:43:29.681333Z",
"url": "https://files.pythonhosted.org/packages/fb/39/55d55c1a5149dd92fcfd6ff524081266d7b2b08e12e69233e488cb083fcf/b2bTools-3.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2b5b5e6cf3d36e31ce9093e32827b2f1ee31838d8507fd8a239577056ea99797",
"md5": "8b7436a3d57b87192c521c0202b4b00e",
"sha256": "3617afcfe76cbff3a6bfad4060f962568565fb691b978f87d8db78cde97da58b"
},
"downloads": -1,
"filename": "b2bTools-3.0.6.tar.gz",
"has_sig": false,
"md5_digest": "8b7436a3d57b87192c521c0202b4b00e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7, <3.10",
"size": 19834567,
"upload_time": "2023-07-04T15:43:36",
"upload_time_iso_8601": "2023-07-04T15:43:36.933726Z",
"url": "https://files.pythonhosted.org/packages/2b/5b/5e6cf3d36e31ce9093e32827b2f1ee31838d8507fd8a239577056ea99797/b2bTools-3.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-04 15:43:36",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "b2btools"
}