| Field           | Value                                          |
|-----------------|------------------------------------------------|
| Name            | vairo                                          |
| Version         | 1.0.0                                          |
| Home page       | http://chango.ibmb.csic.es                     |
| Summary         | VAIRO guides predictions towards particular dynamic states selecting the prior information input or analysing the results of the search. |
| Upload time     | 2025-08-07 08:32:06                            |
| Author          | Isabel Uson                                    |
| Requires Python | >=3.6                                          |
| Keywords        | crystallography, macromolecular                |

# VAIRO
Guiding structural model predictions with experimental information

-------------------

## Prerequisites
* AlphaFold2
* HH-suite
* CCP4 suite
* ALEPH
* MAXIT

-------------------

## Installation

To install VAIRO and its graphical interface, VAIROGUI, run the installer script at `tools/install_vairo.sh`. The script sets up conda and installs all VAIRO dependencies in a dedicated environment.

Execute the installer script:
```
bash tools/install_vairo.sh
```

The script will:
1. Check for an existing conda installation and install it if missing.
2. Create and activate a conda environment for VAIRO.
3. Install all Python and system dependencies required by VAIRO.
4. Verify that external components such as the CUDA drivers and MAXIT are already present.
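
Before running the installer, it can help to see which external tools are already on your `PATH`. The sketch below is not part of VAIRO; the executable names (`conda`, `hhsearch`, `maxit`, `nvidia-smi`) are assumptions about a typical setup and may differ on your system.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
ASSUMED_EXECUTABLES = ["conda", "hhsearch", "maxit", "nvidia-smi"]


def missing_tools(names, which=shutil.which):
    """Return the names that cannot be located on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_EXECUTABLES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so the lookup can be replaced in tests.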

-------------------

## Usage

To run the command-line program:

```
vairo [-h] [-check] <config.yaml>
```

| Flag      | Description                                    |
|-----------|------------------------------------------------|
| `-h`      | Show help and exit                             |
| `-check`  | Check that the configuration file (`.yml`) parses correctly |
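
If you drive VAIRO from a script, the command line above can be assembled programmatically. This is a minimal sketch: `build_command` and `run_vairo` are hypothetical helper names, and the meaning of the exit code returned by `vairo` is an assumption, since it is not documented here.

```python
import subprocess


def build_command(config_path, check_only=False):
    """Build the argument list for the vairo command-line program."""
    cmd = ["vairo"]
    if check_only:
        cmd.append("-check")  # only validate the configuration file
    cmd.append(str(config_path))
    return cmd


def run_vairo(config_path, check_only=False):
    """Run vairo and return its exit code (interpretation is an assumption)."""
    completed = subprocess.run(build_command(config_path, check_only), check=False)
    return completed.returncode
```

Running with `check_only=True` first is a cheap way to catch configuration mistakes before starting a long prediction.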


To launch the graphical interface:
```
vairogui
```

-------------------

## Configuration File (YAML)
-------------------
The configuration file must be in valid YAML. Below are all supported sections and parameters.

### 1. Mandatory keys

    mode (string) Choose one of: naive, guided.
    output_dir (string) Directory where results will be saved.
    af2_dbs_path (string) Path to the AlphaFold2 databases (must be pre-downloaded).
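
As an illustration of the rules above, the following sketch checks an already-parsed configuration (a plain dictionary) for the three mandatory keys and the allowed `mode` values. It is not VAIRO's own validation; the function name `check_mandatory` is hypothetical.

```python
MANDATORY_KEYS = ("mode", "output_dir", "af2_dbs_path")
ALLOWED_MODES = ("naive", "guided")


def check_mandatory(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {key}" for key in MANDATORY_KEYS if key not in config]
    mode = config.get("mode")
    if mode is not None and mode not in ALLOWED_MODES:
        problems.append(f"invalid mode: {mode!r}")
    return problems
```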

### 2. Common optional keys

    run_dir (string, default: "run") Directory where AlphaFold2 jobs will run.
    glycines (integer, default: 50) Number of glycine residues to insert between concatenated sequences.
    small_bfd (boolean, default: false) Use reduced BFD library.
    run_af2 (boolean, default: true) Run AlphaFold2 (otherwise stop after generating features.pkl file).
    stop_after_msa (boolean, default: false) Run AlphaFold2 up to MSA generation, then exit.
    reference (string, default: "") PDB ID or path to PDB file to be used as global reference.
    experimental_pdbs (list of strings, default: []) List of PDB IDs or paths to PDB files for result comparison.
    mosaic (integer, default: null) Split the sequence into X partitions.
    mosaic_partition (range, default: null) Residue-based partitioning.
    mosaic_seq_partition (range, default: null) Sequence numbering partitioning.
    cluster_templates (boolean, default: false - becomes true if mode: naive) Cluster templates from preprocessed features.pkl.
    cluster_templates_msa (integer, default: -1) Number of sequences to add to the MSA (-1 = all).
    cluster_templates_msa_mask (sequence range, default: null) Remove specific residues from MSA sequences.
    cluster_templates_sequence (string path, default: null) Replace template sequences using the FASTA file at the given path.
    show_pymol (string, default: null) PyMOL selection string (comma-separated regions) to zoom into.
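
The defaults listed above can be pictured as a dictionary merged under the user's settings. The sketch below is illustrative only. It assumes that an explicit `cluster_templates` value in the configuration takes precedence over the mode-dependent default, which the list above does not state.

```python
import copy

# Defaults copied from the list above.
DEFAULTS = {
    "run_dir": "run",
    "glycines": 50,
    "small_bfd": False,
    "run_af2": True,
    "stop_after_msa": False,
    "reference": "",
    "experimental_pdbs": [],
    "mosaic": None,
    "mosaic_partition": None,
    "mosaic_seq_partition": None,
    "cluster_templates_msa": -1,
    "cluster_templates_msa_mask": None,
    "cluster_templates_sequence": None,
    "show_pymol": None,
}


def with_defaults(config):
    """Return a new dict with documented defaults filled in."""
    merged = copy.deepcopy(DEFAULTS)
    merged.update(config)
    if "cluster_templates" not in config:
        # Documented rule: false by default, true when mode is naive.
        merged["cluster_templates"] = config.get("mode") == "naive"
    return merged
```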


### 3. Query sequence
Define one or more sequences to generate the query sequence. All sequences will be concatenated using glycine linkers.
```
sequences:
    - fasta_path (string, mandatory) Path to the FASTA file.
      num_of_copies (integer, default: 1) Number of copies of the sequence.
      positions (list of integers, default: [], any position) Insertion position in the query.
      name (string, default: file name from fasta_path) Sequence name.
      predict_region (range, default: null) Predict only this subsequence instead of the full length.
      mutations (map) Map three-letter amino acid codes to residue indices. Example:
        - 'ALA': 10, 20
```
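
To make the concatenation concrete, here is a small sketch of joining sequence copies with glycine linkers. It is not VAIRO's implementation: it ignores `positions`, and the 1-based offsets that count linker residues are an assumption about how the query is numbered.

```python
def build_query(sequences, glycines=50):
    """Join sequence copies with poly-glycine linkers.

    sequences: list of (sequence, num_of_copies) tuples, in query order.
    """
    expanded = []
    for seq, copies in sequences:
        expanded.extend([seq] * copies)
    return ("G" * glycines).join(expanded)


def copy_offsets(sequences, glycines=50):
    """Return the assumed 1-based start position of every copy in the query."""
    offsets, position = [], 1
    for seq, copies in sequences:
        for _ in range(copies):
            offsets.append(position)
            position += len(seq) + glycines
    return offsets
```

For example, two copies of `ACD` followed by `EF` with a three-glycine linker would give `ACDGGGACDGGGEF`, with copies starting at positions 1, 7 and 13 under this numbering.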

### 4. Add templates
Customize PDB templates for insertion into features.pkl.
```
templates:
    - pdb (string, mandatory) Path to a PDB file or existing PDB ID.
      add_to_msa (boolean, default: false) Add the template’s sequence to the MSA.
      add_to_templates (boolean, default: true) Include the template in features.pkl.
      generate_multimer (boolean, default: true) Generate a multimeric assembly from the PDB.
      strict (boolean, default: true) Discard templates whose E-value does not meet the threshold.
      aligned (boolean, default: false) Skip alignment if already aligned.
      legacy (boolean, default: false) Use pre-aligned, single-chain template for the full query.
      reference (string, default: null) Reference to be used in order to insert it into the query sequence.
      modifications (List) Chain-level edits before/after alignment. Each modification can include:
         - chain (string, mandatory) chain ID or All.
           position (integer, default: null) Insertion position in query (if single chain).
           maintain_residues (list of integers, default: null) Selected residues will be kept, and the rest will be deleted.
           delete_residues (list of integers, default: null) Selected residues will be deleted, the rest will be kept.
           when (string, default: after_alignment) before_alignment or after_alignment.
           mutations (List) Modifications in the residues:
              - numbering_residues (list of integers, mandatory) Residue positions where the mutations will be applied.
                mutate_with (string, mandatory) The amino acid to mutate to, specified as a three‑letter code or as a FASTA file path.
```
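
The difference between `maintain_residues` and `delete_residues` can be sketched as two filters over a chain's residue numbers. This is an illustration of the semantics described above, not VAIRO code; applying both filters in sequence when both are given is an assumption.

```python
def filter_residues(residue_numbers, maintain_residues=None, delete_residues=None):
    """Return the residue numbers that survive the two optional filters."""
    kept = list(residue_numbers)
    if maintain_residues is not None:
        keep = set(maintain_residues)
        kept = [r for r in kept if r in keep]  # everything else is deleted
    if delete_residues is not None:
        drop = set(delete_residues)
        kept = [r for r in kept if r not in drop]  # everything else is kept
    return kept
```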

### 5. Add features
Merge or slice existing features.pkl files from other AlphaFold2 runs into your run.
```
features:
    - path (string, mandatory) Path to an existing features.pkl file.
      keep_msa (integer, default: -1) -1 = all sequences; otherwise top X by coverage.
      keep_templates (integer, default: -1) -1 = all templates; otherwise top X by coverage.
      msa_mask (range, default: null) Remove this residue range from the MSA.
      sequence (string, default: null) FASTA file to replace all template sequences.
      numbering_query (list of integers, default: null) Insertion positions in the query sequence.
      numbering_features (list of ranges, default: null) Map feature blocks into the positions given by numbering_query.
      positions (range, default: null) Insert the features.pkl at this position in the query sequence. Here the position is a sequence index, whereas numbering_query and numbering_features use residue positions in the entire query sequence.
      mutations (map) Map three-letter amino acid codes to residue indices. Example:
        - 'ALA': 10, 20
```
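
Range values such as `msa_mask: 276-477, 652-857` are written as comma-separated `start-end` pairs. The sketch below shows one way to read that notation; it is not VAIRO's parser, and treating both ends as inclusive is an assumption.

```python
def parse_ranges(text):
    """Parse '276-477, 652-857' into [(276, 477), (652, 857)]."""
    ranges = []
    for part in str(text).split(","):
        part = part.strip()
        if not part:
            continue
        start_text, _, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start  # single residue
        if end < start:
            raise ValueError(f"range end before start: {part!r}")
        ranges.append((start, end))
    return ranges


def in_mask(position, ranges):
    """True if position falls inside any (inclusive) range."""
    return any(start <= position <= end for start, end in ranges)
```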

### 6. Append library
Append existing FASTA/PDB files from a library into your run.
```
append_library:
    - path (string, mandatory) Path to a directory, PDB, or FASTA file.
      add_to_msa (boolean, default: true) Append sequences to the MSA.
      add_to_templates (boolean, default: false) Append PDBs to the templates.
      numbering_query (list of integers, default: null) Insertion positions in the query.
      numbering_library (list of ranges, default: null) Residue range from the library entry to insert.
```
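
Because `path` may point to a directory or to a single file, it can be useful to preview what a library contains before adding it. The sketch below sorts entries by file extension; the extensions (`.fasta`, `.fa`, `.pdb`) are assumptions and may not match what VAIRO actually accepts.

```python
from pathlib import Path

FASTA_SUFFIXES = {".fasta", ".fa"}  # assumed extensions
PDB_SUFFIXES = {".pdb"}             # assumed extension


def collect_library(path):
    """Return (fasta_files, pdb_files) found at path (file or directory)."""
    root = Path(path)
    files = sorted(root.iterdir()) if root.is_dir() else [root]
    fastas = [f for f in files if f.suffix.lower() in FASTA_SUFFIXES]
    pdbs = [f for f in files if f.suffix.lower() in PDB_SUFFIXES]
    return fastas, pdbs
```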

### 7. Configuration file example
```
mode: guided
output_dir: /path/to/output
af2_dbs_path: /path/to/af2_dbs
run_af2: true
experimental_pdbs: /path/to/references/experimental.pdb

sequences:
- fasta_path: /path/to/data/seq1.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq2.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq3.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq4.fasta
  num_of_copies: 1

features:
- path: /path/to/features1.pkl
  keep_msa: 30
  keep_templates: 0
  numbering_query: 1

- path: /path/to/features2.pkl
  keep_msa: 30
  keep_templates: 0
  msa_mask: 276-477, 652-857
  numbering_query: 1

- path: /path/to/features3.pkl
  keep_msa: 30
  keep_templates: 0
  msa_mask: 8-250
  numbering_query: 4

templates:
- pdb: /path/to/templates/template.pdb
  add_to_msa: true
  add_to_templates: true
  generate_multimer: false
  aligned: true
  modifications:
  - chain: A
    position: 1
    mutations:
    - numbering_residues: 276-477
      mutate_with: /path/to/data/seq1.fasta

```

-------------------

## Output information

All results are written to the directory set by `output_dir` in the configuration file. Inside it, you will find the following folders and files:
- output.html: Contains the results in HTML format, including all plots, run statistics, and prediction analyses.
- output.log: The log file with detailed information from the execution.
- plots/: All plots generated by the output analysis.
- frobenius/: Plots generated by ALEPH.
- interfaces/: Results of the interface analysis performed by PISA.
- clustering/: (If clustering is enabled) Contains the results related to clustering jobs.
- input/: All input files used in the run.
- run/: Stores runtime information and outputs (see below for details).
- templates/: Templates extracted from the features.pkl, split by chains.
- rankeds/: Ranked models generated by AlphaFold2, split by chains.

Inside the run/ directory, you will find:
- results/: Results of the AlphaFold2 run (see below for details).
- Templates folder: Subfolders named after each template, containing the databases generated to align each template.
- Sequences folder: Subfolders named after each sequence, containing alignments of the templates with the corresponding sequence.

Inside the run/results/ directory, you will find:
- tmp/: Contains intermediate files generated by external programs (e.g., ALEPH).
- ccanalysis/ and ccanalysis_ranked/: PDB files used for the cc_analysis run.
- msas/: Information generated by AlphaFold2. It contains the extracted sequences and the template alignments.
- templates_nonsplit/: Templates extracted from features.pkl, not split by chains.
- rankeds_split/: Ranked models generated by AlphaFold2, split by chains.
- rankeds/: Ranked models generated by AlphaFold2, not split by chains.
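
After a run, a quick check of the top-level layout described above can confirm that the expected outputs were produced. This is a minimal sketch with a hypothetical helper name; `clustering/` is left out because it only exists when clustering is enabled.

```python
from pathlib import Path

# Top-level entries listed in this section (clustering/ is optional).
TOP_LEVEL = [
    "output.html", "output.log", "plots", "frobenius", "interfaces",
    "input", "run", "templates", "rankeds",
]


def missing_outputs(output_dir, expected=TOP_LEVEL):
    """Return the expected entries that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).exists()]
```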

            

Raw data

            {
    "_id": null,
    "home_page": "http://chango.ibmb.csic.es",
    "name": "vairo",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "crystallography macromolecular",
    "author": "Isabel Uson",
    "author_email": "bugs-borges@ibmb.csic.es",
    "download_url": "https://files.pythonhosted.org/packages/7d/66/428161e621dfc45101f75960db497244ba7295fbea52545eb933b29fe92a/vairo-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "VAIRO guides predictions towards particular dynamic states selecting the prior information input or analysing the results of the search.",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "http://chango.ibmb.csic.es"
    },
    "split_keywords": [
        "crystallography",
        "macromolecular"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cc3862634cedd2bddc0025bb1f69121c0c2ff96dd57cb548564864895b5deed7",
                "md5": "8fcdd73da3346df9625b84f4d6d262f7",
                "sha256": "5af8e968e12b322d826b72f5c587863ea697f0b101306700f1803ca90ea2683e"
            },
            "downloads": -1,
            "filename": "vairo-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8fcdd73da3346df9625b84f4d6d262f7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 13466813,
            "upload_time": "2025-08-07T08:32:01",
            "upload_time_iso_8601": "2025-08-07T08:32:01.039924Z",
            "url": "https://files.pythonhosted.org/packages/cc/38/62634cedd2bddc0025bb1f69121c0c2ff96dd57cb548564864895b5deed7/vairo-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7d66428161e621dfc45101f75960db497244ba7295fbea52545eb933b29fe92a",
                "md5": "61df7a9dde7a7280ba1833741fb5fd7d",
                "sha256": "a1ff4595bcfb80a88997f07bc66b076d317b62b76240b6aa37184e30770f0f71"
            },
            "downloads": -1,
            "filename": "vairo-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "61df7a9dde7a7280ba1833741fb5fd7d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 13338204,
            "upload_time": "2025-08-07T08:32:06",
            "upload_time_iso_8601": "2025-08-07T08:32:06.029370Z",
            "url": "https://files.pythonhosted.org/packages/7d/66/428161e621dfc45101f75960db497244ba7295fbea52545eb933b29fe92a/vairo-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-07 08:32:06",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "vairo"
}
        