| Field | Value |
|-----------------|------------------------------------------------|
| Name | vairo |
| Version | 1.0.0 |
| Home page | http://chango.ibmb.csic.es |
| Summary | VAIRO guides predictions towards particular dynamic states selecting the prior information input or analysing the results of the search. |
| Upload time | 2025-08-07 08:32:06 |
| Author | Isabel Uson |
| Requires Python | >=3.6 |
| Keywords | crystallography, macromolecular |

# VAIRO
Guiding structural model predictions with experimental information

-------------------
## Prerequisites
* AlphaFold2
* HH-suite
* CCP4 suite
* ALEPH
* MAXIT
-------------------
## Installation
To install VAIRO and its graphical interface, VAIROGUI, run the installer script `tools/install_vairo.sh`. It handles the conda setup and installs all VAIRO dependencies in a dedicated environment.

Execute the installer script:
```
bash tools/install_vairo.sh
```
The script will:
1. Check for an existing conda installation and install it if missing.
2. Create and activate a conda environment for VAIRO.
3. Install all Python and system dependencies required by VAIRO.
4. Verify that required system components, such as the CUDA drivers and MAXIT, are already installed.
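
Before running the installer, you may want to see which of these components are already on your `PATH`. The Python sketch below is a hypothetical pre-flight check, not part of VAIRO: the executable names `conda`, `nvidia-smi` (installed with the NVIDIA driver) and `maxit` are assumptions about how each component is exposed on your system.

```python
import shutil
from typing import Dict, Iterable

# Hypothetical executable names; the installer's own checks may look for other files.
DEFAULT_TOOLS = ("conda", "nvidia-smi", "maxit")


def check_tools(tools: Iterable[str] = DEFAULT_TOOLS) -> Dict[str, bool]:
    """Map each executable name to whether it can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool:<12} {'found' if found else 'missing'}")
```
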
-------------------
## Usage
To run the command-line program:
```
vairo [-h] [-check] <config.yaml>
```
| Flag | Description |
|-----------|------------------------------------------------|
| `-h` | Show help and exit |
| `-check` | Validate that the YAML configuration file parses correctly |

To launch the graphical interface:
```
vairogui
```
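
If you drive VAIRO from a Python script, one option is to validate the configuration with `-check` before launching the full run. This minimal sketch assumes that `vairo` is on `PATH`, that `-check` only validates the file without starting a prediction, and that it reports problems through a non-zero exit status; `vairo_command` and `run_vairo` are hypothetical helpers, not part of VAIRO.

```python
import subprocess
from typing import List


def vairo_command(config_path: str, check_only: bool = False) -> List[str]:
    """Build the argument list for `vairo [-check] <config.yaml>`."""
    cmd = ["vairo"]
    if check_only:
        cmd.append("-check")
    cmd.append(config_path)
    return cmd


def run_vairo(config_path: str) -> None:
    """Validate the configuration, then start the run (both raise on a non-zero exit)."""
    # Assumption: -check only validates the file and exits non-zero on errors.
    subprocess.run(vairo_command(config_path, check_only=True), check=True)
    subprocess.run(vairo_command(config_path), check=True)
```
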
-------------------
## Configuration File (YAML)
-------------------
The configuration file must be in valid YAML. Below are all supported sections and parameters.
### 1. Mandatory keys
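| Key | Type | Description |
|----------------|--------|------------------------------------------------|
| `mode` | string | Choose one of: `naive`, `guided`. |
| `output_dir` | string | Directory where results will be saved. |
| `af2_dbs_path` | string | Path to the AlphaFold2 databases (must be pre-downloaded). |
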
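
As a rough illustration of these rules, the sketch below checks an already-loaded configuration (for example, the dict returned by PyYAML's `yaml.safe_load`) for the three mandatory keys and a valid `mode`. `config_problems` is a hypothetical helper; in practice `vairo -check` performs the real validation.

```python
from typing import Any, Dict, List

MANDATORY_KEYS = ("mode", "output_dir", "af2_dbs_path")
VALID_MODES = ("naive", "guided")


def config_problems(config: Dict[str, Any]) -> List[str]:
    """List problems with the mandatory keys; an empty list means they look fine."""
    problems = ["missing key: " + key for key in MANDATORY_KEYS if key not in config]
    mode = config.get("mode")
    if mode is not None and mode not in VALID_MODES:
        problems.append(
            "invalid mode: {!r} (expected one of {})".format(mode, ", ".join(VALID_MODES))
        )
    return problems
```
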
### 2. Common optional keys
| Key | Type | Default | Description |
|------------------------------|-----------------|---------|------------------------------------------------|
| `run_dir` | string | `"run"` | Directory where the AlphaFold2 jobs run. |
| `glycines` | integer | `50` | Number of glycine residues inserted between concatenated sequences. |
| `small_bfd` | boolean | `false` | Use the reduced BFD database. |
| `run_af2` | boolean | `true` | Run AlphaFold2; if `false`, stop after generating the `features.pkl` file. |
| `stop_after_msa` | boolean | `false` | Run AlphaFold2 up to MSA generation, then exit. |
| `reference` | string | `""` | PDB ID or path to a PDB file used as the global reference. |
| `experimental_pdbs` | list of strings | `[]` | PDB IDs or paths to PDB files used to compare the results. |
| `mosaic` | integer | `null` | Split the sequence into X partitions. |
| `mosaic_partition` | range | `null` | Residue-based partitioning. |
| `mosaic_seq_partition` | range | `null` | Partitioning by sequence numbering. |
| `cluster_templates` | boolean | `false` (`true` if `mode: naive`) | Cluster the templates from the preprocessed `features.pkl`. |
| `cluster_templates_msa` | integer | `-1` | Number of sequences to add to the MSA (`-1` = all). |
| `cluster_templates_msa_mask` | sequence range | `null` | Remove specific residues from the MSA sequences. |
| `cluster_templates_sequence` | string (path) | `null` | Replace the template sequences with the FASTA file at this path. |
| `show_pymol` | string | `null` | PyMOL selection string (comma-separated regions) to zoom into. |

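
The defaults above can be summarised as a dictionary merged underneath the user's settings. This is a sketch of the documented behaviour, not VAIRO's code; in particular, it assumes that an explicit `cluster_templates` value in the file takes precedence over the `naive`-mode default.

```python
from typing import Any, Dict

# Defaults copied from the table of common optional keys.
OPTIONAL_DEFAULTS = {
    "run_dir": "run",
    "glycines": 50,
    "small_bfd": False,
    "run_af2": True,
    "stop_after_msa": False,
    "reference": "",
    "experimental_pdbs": [],
    "mosaic": None,
    "mosaic_partition": None,
    "mosaic_seq_partition": None,
    "cluster_templates": False,
    "cluster_templates_msa": -1,
    "cluster_templates_msa_mask": None,
    "cluster_templates_sequence": None,
    "show_pymol": None,
}  # type: Dict[str, Any]


def with_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with every unset optional key filled in."""
    # Copy list defaults so callers never share the module-level list.
    merged = {key: (list(value) if isinstance(value, list) else value)
              for key, value in OPTIONAL_DEFAULTS.items()}
    merged.update(config)
    # Documented rule: cluster_templates defaults to true in naive mode.
    # Assumption: an explicit value in the file always wins.
    if "cluster_templates" not in config and merged.get("mode") == "naive":
        merged["cluster_templates"] = True
    return merged
```
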
### 3. Query sequence
Define one or more sequences that make up the query. All sequences are concatenated, separated by glycine linkers whose length is set by `glycines`.
```
sequences:
  - fasta_path (string, mandatory) Path to the FASTA file.
    num_of_copies (integer, default: 1) Number of copies of the sequence.
    positions (list of integers, default: [] = any position) Insertion positions in the query.
    name (string, default: file name from fasta_path) Sequence name.
    predict_region (range, default: null) Predict only this subsequence instead of the full length.
    mutations (map) Map three-letter amino acid codes to residue indices. Example:
      - 'ALA': 10, 20
```
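
To picture how the query is assembled, the following sketch concatenates every copy of every sequence with a poly-glycine linker of length `glycines`, computes the 1-based start of each copy in the full query, and applies a mutations map. It assumes copies are placed in the order listed (ignoring `positions`), that residue indices are 1-based within the sequence, and that the mutations map has already been parsed into lists of integers; all helper names are hypothetical.

```python
from typing import Dict, List, Tuple

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def expand_copies(sequences: List[Tuple[str, int]]) -> List[str]:
    """Repeat each (sequence, num_of_copies) entry in the order listed."""
    return [seq for seq, copies in sequences for _ in range(copies)]


def build_query(sequences: List[Tuple[str, int]], glycines: int = 50) -> str:
    """Join all copies with a linker of `glycines` glycine residues."""
    return ("G" * glycines).join(expand_copies(sequences))


def chain_starts(sequences: List[Tuple[str, int]], glycines: int = 50) -> List[int]:
    """1-based position in the full query where each copy begins."""
    starts, pos = [], 1
    for seq in expand_copies(sequences):
        starts.append(pos)
        pos += len(seq) + glycines
    return starts


def apply_mutations(sequence: str, mutations: Dict[str, List[int]]) -> str:
    """Apply a {'ALA': [10, 20]}-style map using 1-based residue indices."""
    residues = list(sequence)
    for code, positions in mutations.items():
        for pos in positions:
            residues[pos - 1] = THREE_TO_ONE[code.upper()]
    return "".join(residues)
```

With the default of 50 glycines, the second chain of a two-chain query would start at residue `len(first) + 51` under these assumptions.
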
### 4. Add templates
Customize PDB templates for insertion into features.pkl.
```
templates:
  - pdb (string, mandatory) Path to a PDB file or an existing PDB ID.
    add_to_msa (boolean, default: false) Add the template's sequence to the MSA.
    add_to_templates (boolean, default: true) Include the template in features.pkl.
    generate_multimer (boolean, default: true) Generate a multimeric assembly from the PDB.
    strict (boolean, default: true) Discard templates whose E-values do not pass the threshold.
    aligned (boolean, default: false) Skip the alignment step because the template is already aligned.
    legacy (boolean, default: false) Use a pre-aligned, single-chain template for the full query.
    reference (string, default: null) Reference used to insert the template into the query sequence.
    modifications (list) Chain-level edits applied before or after alignment. Each modification can include:
      - chain (string, mandatory) Chain ID, or All.
        position (integer, default: null) Insertion position in the query (single chain only).
        maintain_residues (list of integers, default: null) Keep only these residues and delete the rest.
        delete_residues (list of integers, default: null) Delete these residues and keep the rest.
        when (string, default: after_alignment) Either before_alignment or after_alignment.
        mutations (list) Residue mutations. Each entry contains:
          - numbering_residues (list of integers, mandatory) Residue positions to mutate.
            mutate_with (string, mandatory) Target amino acid, given as a three-letter code or as a FASTA file path.
```
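
The difference between `maintain_residues` and `delete_residues` can be sketched on a single chain represented as a residue-number-to-amino-acid mapping. This is an illustrative assumption, not VAIRO's implementation: residue numbers are kept as-is (no renumbering), and if both lists are given, the `maintain_residues` filter is applied first; `edit_chain` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional


def edit_chain(residues: Dict[int, str],
               maintain: Optional[Iterable[int]] = None,
               delete: Optional[Iterable[int]] = None) -> Dict[int, str]:
    """Return a filtered copy of one chain, keyed by original residue number."""
    edited = dict(residues)
    if maintain is not None:
        keep = set(maintain)  # maintain_residues: keep these, drop the rest
        edited = {num: aa for num, aa in edited.items() if num in keep}
    if delete is not None:
        drop = set(delete)  # delete_residues: drop these, keep the rest
        edited = {num: aa for num, aa in edited.items() if num not in drop}
    return edited
```
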
### 5. Add features
Merge or slice existing features.pkl files from other AlphaFold2 runs into your run.
```
features:
  - path (string, mandatory) Path to an existing features.pkl file.
    keep_msa (integer, default: -1) Number of MSA sequences to keep: -1 = all; otherwise the top X by coverage.
    keep_templates (integer, default: -1) Number of templates to keep: -1 = all; otherwise the top X by coverage.
    msa_mask (range, default: null) Remove this residue range from the MSA.
    sequence (string, default: null) FASTA file whose sequence replaces all template sequences.
    numbering_query (list of integers, default: null) Insertion positions in the query sequence.
    numbering_features (list of ranges, default: null) Feature blocks to map onto the positions given by numbering_query.
    positions (range, default: null) Insert the features.pkl into the query sequence. Here the position is a sequence index; in numbering_query and numbering_features, positions are residue numbers in the full query sequence.
    mutations (map) Map three-letter amino acid codes to residue indices. Example:
      - 'ALA': 10, 20
```
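
Range-valued options such as `msa_mask` take strings like `276-477, 652-857` (see the example configuration). The sketch below expands such a string into 1-based inclusive residue numbers, masks those columns with gaps, and keeps the top X MSA rows by coverage as `keep_msa` describes. It assumes the MSA is a list of aligned strings with the query in row 0 (as in AlphaFold2 MSAs), that masking means replacing residues with `-`, and that coverage means the number of non-gap positions; the helper names are hypothetical.

```python
from typing import List, Set


def parse_ranges(spec: str) -> Set[int]:
    """Expand '276-477, 652-857' (or single numbers) into 1-based residue numbers."""
    residues = set()  # type: Set[int]
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            residues.update(range(start, end + 1))
        else:
            residues.add(int(part))
    return residues


def mask_msa(msa: List[str], spec: str) -> List[str]:
    """Replace masked columns with gaps in every row except the query (row 0)."""
    masked = parse_ranges(spec)
    rows = [msa[0]]
    for row in msa[1:]:
        rows.append("".join("-" if i + 1 in masked else aa for i, aa in enumerate(row)))
    return rows


def keep_top_msa(msa: List[str], keep: int) -> List[str]:
    """Keep the query plus the `keep` rows with the most non-gap residues (-1 = all)."""
    if keep < 0:
        return list(msa)
    # sorted() is stable, so rows with equal coverage keep their original order.
    ranked = sorted(msa[1:], key=lambda row: sum(aa != "-" for aa in row), reverse=True)
    return [msa[0]] + ranked[:keep]
```
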
### 6. Append library
Append existing FASTA/PDB files from a library into your run.
```
append_library:
  - path (string, mandatory) Path to a directory, PDB, or FASTA file.
    add_to_msa (boolean, default: true) Append the sequences to the MSA.
    add_to_templates (boolean, default: false) Append the PDBs to the templates.
    numbering_query (list of integers, default: null) Insertion positions in the query.
    numbering_library (list of ranges, default: null) Residue ranges from the library entry to insert.
```
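
One way to read `numbering_library` and `numbering_query` together is: take a residue range from the library entry and place it at a position in the query, leaving gaps elsewhere. The sketch assumes 1-based inclusive numbering, a single range and a single insertion position; `place_library_segment` is a hypothetical helper, not VAIRO's code.

```python
from typing import Tuple


def place_library_segment(query_length: int, library_seq: str,
                          numbering_query: int,
                          numbering_library: Tuple[int, int]) -> str:
    """Return a query-length row with library residues start..end placed at numbering_query.

    Numbering is 1-based and inclusive; every other position is a gap ('-').
    """
    start, end = numbering_library
    segment = library_seq[start - 1:end]
    offset = numbering_query - 1
    if offset < 0 or offset + len(segment) > query_length:
        raise ValueError("library segment does not fit inside the query")
    row = ["-"] * query_length
    row[offset:offset + len(segment)] = list(segment)
    return "".join(row)
```
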
### 7. Configuration file example
```yaml
mode: guided
output_dir: /path/to/output
af2_dbs_path: /path/to/af2_dbs
run_af2: true
experimental_pdbs: /path/to/references/experimental.pdb

sequences:
- fasta_path: /path/to/data/seq1.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq2.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq3.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq4.fasta
  num_of_copies: 1

features:
- path: /path/to/features1.pkl
  keep_msa: 30
  keep_templates: 0
  numbering_query: 1

- path: /path/to/features2.pkl
  keep_msa: 30
  keep_templates: 0
  msa_mask: 276-477, 652-857
  numbering_query: 1

- path: /path/to/features3.pkl
  keep_msa: 30
  keep_templates: 0
  msa_mask: 8-250
  numbering_query: 4

templates:
- pdb: /path/to/templates/template.pdb
  add_to_msa: true
  add_to_templates: true
  generate_multimer: false
  aligned: true
  modifications:
  - chain: A
    position: 1
    mutations:
    - numbering_residues: 276-477
      mutate_with: /path/to/data/seq1.fasta
```
-------------------
## Output information
All results are written to `output_dir`, the directory set in the configuration file. Inside `output_dir` you will find the following files and folders:
- output.html: Results in HTML format, including all plots, run statistics, and prediction analyses.
- output.log: Log file with detailed information about the execution.
- plots/: All plots generated by the output analysis.
- frobenius/: Plots generated by ALEPH.
- interfaces/: Results of the interface analysis performed by PISA.
- clustering/: (Only if clustering is enabled) Results of the clustering jobs.
- input/: All input files used in the run.
- run/: Runtime information and outputs (see below).
- templates/: Templates extracted from features.pkl, split by chains.
- rankeds/: Ranked models generated by AlphaFold2, split by chains.

Inside the run/ directory you will find:
- results/: Results of the AlphaFold2 run (see below).
- Templates folder: One subfolder per template, containing the databases generated to align that template.
- Sequences folder: One subfolder per sequence, containing the alignments of the templates against that sequence.

Inside the run/results/ directory you will find:
- tmp/: Intermediate files generated by external programs (e.g., ALEPH).
- ccanalysis/ and ccanalysis_ranked/: PDB files used for the cc_analysis run.
- msas/: Files generated by AlphaFold2, including the extracted sequences and the template alignments.
- templates_nonsplit/: Templates extracted from features.pkl, not split by chains.
- rankeds_split/: Ranked models generated by AlphaFold2, split by chains.
- rankeds/: Ranked models generated by AlphaFold2, not split by chains.

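
To check a finished run programmatically, the sketch below reports whether `output.html` and `output.log` exist and counts the ranked models in `run/results/rankeds/` and `run/results/rankeds_split/`. The directory names come from the lists above; the `*.pdb` file pattern and the helper names are assumptions.

```python
from pathlib import Path
from typing import Dict, List

# Paths relative to output_dir, taken from the layout described above.
RANKED_DIRS = {
    "rankeds": Path("run") / "results" / "rankeds",
    "rankeds_split": Path("run") / "results" / "rankeds_split",
}


def find_ranked_models(output_dir: str) -> Dict[str, List[Path]]:
    """Collect ranked model files (assumed to be *.pdb) from each ranked folder."""
    base = Path(output_dir)
    found = {}
    for label, relative in RANKED_DIRS.items():
        folder = base / relative
        found[label] = sorted(folder.glob("*.pdb")) if folder.is_dir() else []
    return found


def summarize(output_dir: str) -> None:
    """Print which top-level reports exist and how many ranked models were found."""
    base = Path(output_dir)
    for name in ("output.html", "output.log"):
        status = "present" if (base / name).is_file() else "missing"
        print(f"{name}: {status}")
    for label, models in find_ranked_models(output_dir).items():
        print(f"{label}: {len(models)} model file(s)")
```
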
## Raw data
```json
{
  "_id": null,
  "home_page": "http://chango.ibmb.csic.es",
  "name": "vairo",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.6",
  "maintainer_email": null,
  "keywords": "crystallography macromolecular",
  "author": "Isabel Uson",
  "author_email": "bugs-borges@ibmb.csic.es",
  "download_url": "https://files.pythonhosted.org/packages/7d/66/428161e621dfc45101f75960db497244ba7295fbea52545eb933b29fe92a/vairo-1.0.0.tar.gz",
  "platform": null,
  "bugtrack_url": null,
  "license": null,
  "summary": "VAIRO guides predictions towards particular dynamic states selecting the prior information input or analysing the results of the search.",
  "version": "1.0.0",
  "project_urls": {
    "Homepage": "http://chango.ibmb.csic.es"
  },
  "split_keywords": [
    "crystallography",
    "macromolecular"
  ],
  "urls": [
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "cc3862634cedd2bddc0025bb1f69121c0c2ff96dd57cb548564864895b5deed7",
        "md5": "8fcdd73da3346df9625b84f4d6d262f7",
        "sha256": "5af8e968e12b322d826b72f5c587863ea697f0b101306700f1803ca90ea2683e"
      },
      "downloads": -1,
      "filename": "vairo-1.0.0-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "8fcdd73da3346df9625b84f4d6d262f7",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": ">=3.6",
      "size": 13466813,
      "upload_time": "2025-08-07T08:32:01",
      "upload_time_iso_8601": "2025-08-07T08:32:01.039924Z",
      "url": "https://files.pythonhosted.org/packages/cc/38/62634cedd2bddc0025bb1f69121c0c2ff96dd57cb548564864895b5deed7/vairo-1.0.0-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "7d66428161e621dfc45101f75960db497244ba7295fbea52545eb933b29fe92a",
        "md5": "61df7a9dde7a7280ba1833741fb5fd7d",
        "sha256": "a1ff4595bcfb80a88997f07bc66b076d317b62b76240b6aa37184e30770f0f71"
      },
      "downloads": -1,
      "filename": "vairo-1.0.0.tar.gz",
      "has_sig": false,
      "md5_digest": "61df7a9dde7a7280ba1833741fb5fd7d",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.6",
      "size": 13338204,
      "upload_time": "2025-08-07T08:32:06",
      "upload_time_iso_8601": "2025-08-07T08:32:06.029370Z",
      "url": "https://files.pythonhosted.org/packages/7d/66/428161e621dfc45101f75960db497244ba7295fbea52545eb933b29fe92a/vairo-1.0.0.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2025-08-07 08:32:06",
  "github": false,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "lcname": "vairo"
}
```