Oncodrive3D


- Name: Oncodrive3D
- Version: 1.0.4
- Summary: Oncodrive3D is a method designed to analyse patterns of somatic mutations across tumors to identify three-dimensional (3D) clusters of missense mutations and detect genes that are under positive selection.
- Upload time: 2025-01-17 11:23:55
- Requires Python: >=3.10
- License: GNU Affero General Public License v3 or later (AGPLv3+)
- Keywords: bbglab, bioinformatics, driversprediction, positiveselection

# Oncodrive3D

**Oncodrive3D** is a fast and accurate computational method designed to analyze patterns of somatic mutations across tumors, with the goal of identifying **three-dimensional (3D) clusters** of missense mutations and detecting genes under **positive selection**.

The method leverages **AlphaFold 2-predicted protein structures** and Predicted Aligned Error (PAE) to define residue contacts within the protein's 3D space. When available, it integrates **mutational profiles** to build an accurate background model of neutral mutagenesis. By applying a novel **rank-based statistical approach**, Oncodrive3D scores potential 3D clusters and computes empirical p-values.

[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![docker](https://img.shields.io/docker/v/bbglab/oncodrive3d?logo=docker)](https://hub.docker.com/r/bbglab/oncodrive3d)
[![PyPI - Version](https://img.shields.io/pypi/v/oncodrive3d?logo=pypi)](https://pypi.org/project/Oncodrive3D/)

---

## License

Oncodrive3D is available to the general public subject to certain conditions described in its [LICENSE](LICENSE).

## Requirements

Before you begin, ensure **Python 3.10 or later** is installed on your system.  
You may also need to install additional development tools. Depending on your environment, choose one of the following methods:

- If you have sudo privileges:

   ```bash
   sudo apt install build-essential
   ```

- For HPC cluster environments, it is recommended to use [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or [Mamba](https://mamba.readthedocs.io/en/latest/)):

   ```bash
   conda create -n o3d python=3.10.0
   conda activate o3d
   conda install -c conda-forge gxx gcc libxcrypt clang zlib
   ```


## Installation

- Install via PyPI:

   ```bash
   pip install oncodrive3d
   ```

- Alternatively, you can obtain the latest code from the repository and install it for development with pip:

   ```bash
   git clone https://github.com/bbglab/oncodrive3d.git
   cd oncodrive3d
   pip install -e .
   oncodrive3d --help
   ```

- Or you can use a modern build tool like [uv](https://github.com/astral-sh/uv):

   ```bash
   git clone https://github.com/bbglab/oncodrive3d.git
   cd oncodrive3d
   uv run oncodrive3d --help
   ```

## Building Datasets

This step builds the datasets that Oncodrive3D needs to run the 3D clustering analysis. It is required once after installation, and again whenever you need to generate datasets for a different organism or apply a specific threshold to define amino acid contacts.

> [!WARNING]
> This step is highly time- and resource-intensive, requiring a significant amount of free disk space. It will download a large amount of data, including AlphaFold-predicted structures and reference genomes (if not already cached). Ensure sufficient resources are available before proceeding, as insufficient capacity may result in extended runtimes or processing failures.

> [!NOTE]
> The first time you run the dataset-building step with a given reference genome, Oncodrive3D downloads it from our servers. By default, the downloaded datasets are stored in `~/.bgdata`. To store them in another folder, define the `BGDATA_LOCAL` environment variable with an `export` command.

```
Usage: oncodrive3d build-datasets [OPTIONS]

Examples:
  Basic build:
    oncodrive3d build-datasets -o <build_folder>
  
  Build with MANE Select transcripts:
    oncodrive3d build-datasets -o <build_folder> --mane

Options:
  -o, --output_dir PATH           Path to the directory where the output files will be saved. 
                                  Default: ./datasets/
  -s, --organism PATH             Specifies the organism (`human` or `mouse`). 
                                  Default: human
  -m, --mane                      Use structures predicted from MANE Select transcripts 
                                  (applicable to Homo sapiens only).
  -d, --distance_threshold INT    Distance threshold (Å) for defining residue contacts. 
                                  Default: 10
  -c, --cores INT                 Number of CPU cores for computation. 
                                  Default: All available CPU cores
  -v, --verbose                   Enables verbose output.
  -h, --help                      Show this message and exit.  
```

For more information on the output of this step, please refer to the [Building Datasets Output Documentation](https://github.com/bbglab/oncodrive3d/tree/master/docs/build_output.md).
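
When the build is launched from a Python script, the cache location and the build options can be set in one place. The sketch below is only an illustration: the helper name `build_datasets_invocation` and the example paths are hypothetical and not part of Oncodrive3D. It uses only the flags listed in the usage text above and the `BGDATA_LOCAL` variable mentioned in the note.

```python
import os


def build_datasets_invocation(output_dir, bgdata_dir=None, mane=False, cores=None):
    """Return (argv, env) for an `oncodrive3d build-datasets` call.

    Hypothetical helper: it only assembles the command line and the
    environment; it does not run anything.
    """
    env = dict(os.environ)
    if bgdata_dir is not None:
        # Redirect the downloaded reference data away from ~/.bgdata.
        env["BGDATA_LOCAL"] = str(bgdata_dir)

    argv = ["oncodrive3d", "build-datasets", "-o", str(output_dir)]
    if mane:
        argv.append("--mane")
    if cores is not None:
        argv += ["-c", str(cores)]
    return argv, env


# Example paths are placeholders.
argv, env = build_datasets_invocation(
    "datasets", bgdata_dir="/scratch/bgdata", mane=True, cores=8
)
print(" ".join(argv))
```

To actually start the build, both values can be passed to `subprocess.run(argv, env=env, check=True)`.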


## Running 3D Clustering Analysis

For in-depth information on how to obtain the required input data and a comprehensive description of the output, please refer to the [Input and Output Documentation](https://github.com/bbglab/oncodrive3d/tree/master/docs/run_input_output.md) of the 3D clustering analysis.

### Input

- **Mutations file** (`required`): It can be either:
   - **<input_maf>**: A Mutation Annotation Format (MAF) file annotated with consequences (e.g., by using [Ensembl Variant Effect Predictor (VEP)](https://www.ensembl.org/info/docs/tools/vep/index.html)).
   - **<input_vep>**: The unfiltered output of VEP including annotations for all possible transcripts.

- **<mut_profile>** (`optional`): Dictionary including the normalized frequencies of mutations (*values*) in every possible trinucleotide context (*keys*), such as 'ACA>A', 'ACC>A', and so on.

---

> [!NOTE]
> Examples of the input files are available in the [Test Input Folder](https://github.com/bbglab/oncodrive3d/tree/master/test/input).  
> Please refer to these examples to understand the expected format and structure of the input files.

---

> [!NOTE]
> Oncodrive3D uses the mutational profile of the cohort to build an accurate background model. However, it’s not strictly required. If the mutational profile is not provided, the tool will use a simple uniform distribution as the background model for simulating mutations and scoring potential 3D clusters.

---
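
As an illustration of the mutational profile format, the sketch below enumerates the 192 keys and checks a profile dictionary before it is saved as JSON. It rests on two assumptions that are inferred from the examples above ('ACA>A', 'ACC>A') rather than confirmed by the documentation: the middle base of the trinucleotide is the one that mutates, and "normalized" means the frequencies sum to 1. The function names are hypothetical.

```python
import json
from itertools import product

BASES = "ACGT"


def trinucleotide_keys():
    """All 192 'XYZ>N' keys: 64 trinucleotides x 3 alternative middle bases."""
    keys = []
    for five, ref, three in product(BASES, repeat=3):
        for alt in BASES:
            if alt != ref:
                keys.append(f"{five}{ref}{three}>{alt}")
    return keys


def check_profile(profile, tol=1e-6):
    """Return a list of problems found in a mutational profile dictionary."""
    problems = []
    expected = set(trinucleotide_keys())
    missing = expected - set(profile)
    extra = set(profile) - expected
    if missing:
        problems.append(f"{len(missing)} missing keys")
    if extra:
        problems.append(f"{len(extra)} unexpected keys")
    if any(value < 0 for value in profile.values()):
        problems.append("negative frequency")
    total = sum(profile.values())
    if abs(total - 1.0) > tol:
        problems.append(f"frequencies sum to {total:.6f}, not 1")
    return problems


# A uniform profile is used here only as a convenient valid test object.
uniform = {key: 1 / 192 for key in trinucleotide_keys()}
print(len(uniform), check_profile(uniform))

# Serialized form, as it could be written to a <cohort>.sig.json file.
profile_json = json.dumps(uniform)
```

An empty list from `check_profile` means the dictionary has exactly the expected keys and sums to 1 within the tolerance.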

### Main Output

- **Gene-level output**: CSV file (`<cohort>.3d_clustering_genes.csv`) containing the results of the analysis at the gene level. Each row represents a gene, sorted from the most significant to the least significant based on the 3D clustering analysis. The table also includes genes that were not analyzed, with the reason for exclusion provided in the `status` column.
  
- **Residue-level output**: CSV file (`<cohort>.3d_clustering_pos.csv`) containing the results of the analysis at the level of mutated residues. Each row corresponds to a mutated position within a gene and includes detailed information for each potential mutational cluster.
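
A quick way to see how many genes were analyzed and why others were excluded is to tally the `status` column of the gene-level file. The sketch below uses a made-up in-memory table: only the `status` column name comes from the description above, while the `gene` column and all values are hypothetical placeholders.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for <cohort>.3d_clustering_genes.csv. Only the
# `status` column is documented above; everything else is hypothetical.
sample = io.StringIO(
    "gene,status\n"
    "GENE_A,processed\n"
    "GENE_B,processed\n"
    "GENE_C,excluded_reason_x\n"
)


def summarize_status(handle):
    """Count how many genes fall under each value of the `status` column."""
    reader = csv.DictReader(handle)
    return Counter(row["status"] for row in reader)


counts = summarize_status(sample)
for status, n in counts.most_common():
    print(f"{status}\t{n}")
```

For a real run, the same function can be applied to a file opened with `open(path, newline="")`.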


### Usage

```
Usage: oncodrive3d run [OPTIONS]

Examples:
  Basic run:
    oncodrive3d run -i <input_maf> -p <mut_profile> -d <build_folder> -C <cohort_name>
  
  Example of run using VEP output as input and MANE Select transcripts:
    oncodrive3d run -i <input_vep> -p <mut_profile> -d <build_folder> -C <cohort_name> \
                    --o3d_transcripts --use_input_symbols --mane

Options:
  -i, --input_path PATH            Path to the input file (MAF or VEP output) containing the 
                                   annotated mutations for the cohort. [required]
  -p, --mut_profile_path PATH      Path to the JSON file specifying the cohort's mutational 
                                   profile (192 key-value pairs).
  -o, --output_dir PATH            Path to the output directory for results. 
                                   Default: ./output/
  -d, --data_dir PATH              Path to the directory containing the datasets built in the 
                                   building datasets step. 
                                   Default: ./datasets/
  -c, --cores INT                  Number of CPU cores to use. 
                                   Default: All available CPU cores
  -s, --seed INT                   Random seed for reproducibility.
  -v, --verbose                    Enables verbose output.
  -t, --cancer_type STR            Cancer type to include as metadata in the output file.
  -C, --cohort STR                 Cohort name for metadata and output file naming. 
  -P, --cmap_prob_thr FLOAT        Threshold for defining residue contacts based on distance 
                                   on predicted structure and predicted aligned error (PAE). 
                                   Default: 0.5
  --mane                           Prioritizes MANE Select transcripts when multiple 
                                   structures map to the same gene symbol.
  --o3d_transcripts                Filters mutations including only transcripts in Oncodrive3D 
                                   built datasets (requires VEP output as input file).
  --use_input_symbols              Update HUGO symbols in Oncodrive3D built datasets using the 
                                   input file's entries (requires VEP output as input file).
  -h, --help                       Show this message and exit.  
```


---

> [!NOTE]
> To maximize the number of matching transcripts between the input mutations and the AlphaFold predicted structures used by Oncodrive3D, it is recommended to use the unfiltered output of VEP (including all possible transcripts) as input, along with the flags `--o3d_transcripts` and `--use_input_symbols` in the `oncodrive3d run` command.

---
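
If runs are scripted, the recommendation above can be encoded so that the two VEP-only flags are added only when the input really is VEP output. This is a minimal sketch: `run_invocation` is a hypothetical helper, the file names are placeholders, and only flags from the usage text are used.

```python
import shlex


def run_invocation(input_path, data_dir, cohort, mut_profile=None,
                   vep_input=False, mane=False, seed=None):
    """Assemble the argument list for an `oncodrive3d run` call."""
    argv = ["oncodrive3d", "run", "-i", str(input_path),
            "-d", str(data_dir), "-C", cohort]
    if mut_profile is not None:
        argv += ["-p", str(mut_profile)]
    if seed is not None:
        argv += ["-s", str(seed)]
    if vep_input:
        # Both flags require VEP output as input, so they are tied to it.
        argv += ["--o3d_transcripts", "--use_input_symbols"]
    if mane:
        argv.append("--mane")
    return argv


argv = run_invocation("cohort.vep.tsv.gz", "datasets", "MY_COHORT",
                      mut_profile="cohort.sig.json", vep_input=True, mane=True)
print(shlex.join(argv))
```

The returned list can be handed to `subprocess.run(argv, check=True)` without going through a shell.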

### Running With Singularity

```
singularity pull oncodrive3d.sif docker://bbglab/oncodrive3d:latest
singularity exec oncodrive3d.sif oncodrive3d run -i <input_maf> -p <mut_profile> \
                                                 -d <build_folder> -C <cohort_name>
```


### Testing

To verify that Oncodrive3D is installed and configured correctly, you can perform a test run using the provided test input files: 

```
oncodrive3d run -d <build_folder> \
                -i ./test/input/maf/TCGA_WXS_ACC.in.maf \
                -p ./test/input/mut_profile/TCGA_WXS_ACC.sig.json \
                -o ./test/output/ -C TCGA_WXS_ACC
```

Check the output in the `test/output/` directory to ensure the analysis completes successfully.
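
The check can also be automated by looking for the two result files described in the Main Output section. The sketch below assumes the files are written somewhere under the output directory and searches it recursively, since the exact layout is not stated here. `missing_outputs` is a hypothetical helper name.

```python
from pathlib import Path


def missing_outputs(output_dir, cohort):
    """Return the expected result files that cannot be found under output_dir."""
    expected = [
        f"{cohort}.3d_clustering_genes.csv",
        f"{cohort}.3d_clustering_pos.csv",
    ]
    root = Path(output_dir)
    if not root.is_dir():
        return expected
    # Recursive search: the exact layout inside the output directory
    # is an assumption of this sketch.
    found = {path.name for path in root.rglob("*.csv")}
    return [name for name in expected if name not in found]


missing = missing_outputs("./test/output", "TCGA_WXS_ACC")
print("missing files:", missing)
```

An empty list means both files were found. The check says nothing about their content.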


## Parallel Processing on Multiple Cohorts

Oncodrive3D can be run in parallel on multiple cohorts using [Nextflow](https://www.nextflow.io/). This approach enables efficient, reproducible and scalable analysis across datasets.

### Requirements

1. Install [Nextflow](https://www.nextflow.io/docs/latest/getstarted.html) (version `23.04.3` was used for testing).
2. Install and set up either or both:
   - [Singularity](https://sylabs.io/guides/latest/user-guide/installation.html)  
      Pull the Oncodrive3D Singularity image from Docker Hub:

      ```
      singularity pull oncodrive3d.sif docker://bbglab/oncodrive3d:latest
      ```

   - [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)  
      Ensure Oncodrive3D is installed in your Conda environment and update the `params` section of the `nextflow.config` file to point to your Conda installation:

         ```groovy
         params {
            ...
            conda_env = '/path/to/conda/environment/with/oncodrive3d' 
            ...
         }
         ```

      Replace `/path/to/conda/environment/with/oncodrive3d` with the path to your Conda environment. Alternatively, you can provide it as a command-line argument.


### Test Run

Run a test to ensure that everything is set up correctly and functioning as expected:

```
cd oncodrive3d_pipeline
nextflow run main.nf -profile test,container --data_dir <build_folder>
```

Replace `<build_folder>` with the path to the Oncodrive3D datasets built in the [building datasets](#building-datasets) step.
If you prefer to use Conda, replace `container` in the `-profile` argument with `conda`.

### Usage

---

> [!WARNING]
> When using the Nextflow script, ensure that your input files are organized in the following directory structure:
> 
> ```plaintext
> input/
>   ├── maf/
>   │   └── <cohort>.in.maf
>   ├── vep/
>   │   └── <cohort>.vep.tsv.gz
>   └── mut_profile/
>       └── <cohort>.sig.json
> ```
> 
> - `maf/`: Contains mutation files with the `.in.maf` extension.
> - `vep/`: Contains VEP annotation files with the `.vep.tsv.gz` extension, which include annotated mutations with all possible transcripts.
> - `mut_profile/`: Contains mutational profile files with the `.sig.json` extension.

---

```
Usage: nextflow run main.nf [OPTIONS]

Example of run using VEP output as input and MANE Select transcripts:
  nextflow run main.nf -profile container --data_dir <build_folder> --indir <input> \
                       --vep_input true --mane true
  
Options:
  --indir PATH                    Path to the input directory including the subdirectories 
                                  `maf` or `vep` and `mut_profile`. 
  --outdir PATH                   Path to the output directory. 
                                  Default: run_<timestamp>/
  --cohort_pattern STR            Pattern expression to filter specific files within the 
                                  input directory (e.g., 'TCGA*' selects only TCGA cohorts). 
                                  Default: *
  --data_dir PATH                 Path to the Oncodrive3D datasets directory, which includes 
                                  the files compiled during the building datasets step.
                                  Default: ${baseDir}/datasets/
  --container PATH                Path to the Singularity image with Oncodrive3D installation. 
                                  Default: ${baseDir}/../oncodrive3d.sif
  --max_running INT               Maximum number of cohorts to process in parallel.
                                  Default: 5
  --cores INT                     Number of CPU cores used to process each cohort. 
                                  Default: 10
  --memory STR                    Amount of memory allocated for processing each cohort. 
                                  Default: 70GB
  --seed INT                      Seed value for reproducibility.
                                  Default: 128
```
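
Before launching the pipeline, it can help to list which cohorts in the input directory would match a given pattern and have their files in the expected places. The sketch below follows the directory structure and file extensions described above. Two things are assumptions: it uses shell-style matching (`fnmatchcase`), which may differ from how the pipeline interprets `--cohort_pattern`, and it reports only cohorts that also have a `.sig.json` profile. `find_cohorts` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def find_cohorts(indir, pattern="*", vep_input=False):
    """List cohorts with both a mutation file and a mutational profile."""
    indir = Path(indir)
    subdir, suffix = ("vep", ".vep.tsv.gz") if vep_input else ("maf", ".in.maf")
    mutations_dir = indir / subdir
    if not mutations_dir.is_dir():
        return []

    cohorts = []
    for path in sorted(mutations_dir.iterdir()):
        if not path.name.endswith(suffix):
            continue
        cohort = path.name[: -len(suffix)]
        profile = indir / "mut_profile" / f"{cohort}.sig.json"
        if fnmatchcase(cohort, pattern) and profile.is_file():
            cohorts.append(cohort)
    return cohorts


print(find_cohorts("input", pattern="TCGA*"))
```

Passing `vep_input=True` makes the helper look in `vep/` for `.vep.tsv.gz` files instead of `maf/`.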
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "Oncodrive3D",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "bbglab, bioinformatics, driversprediction, positiveselection",
    "author": null,
    "author_email": "\"BBGLab (Barcelona Biomedical Genomics Lab)\" <bbglab@irbbarcelona.org>, Stefano Pellegrini <stefano.pellegrini@irbbarcelona.org>",
    "download_url": "https://files.pythonhosted.org/packages/ef/da/63ac6b2fff1650246fa721d62d1292a224f0ed34feacc3f67dabf17fda9d/oncodrive3d-1.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GNU Affero General Public License v3 or later (AGPLv3+)",
    "summary": "Oncodrive3D is a method designed to analyse patterns of somatic mutations across tumors to identify three-dimensional (3D) clusters of missense mutations and detect genes that are under positive selection.",
    "version": "1.0.4",
    "project_urls": {
        "Homepage": "https://github.com/bbglab/clustering_3d",
        "Issues": "https://github.com/bbglab/clustering_3d/issues",
        "Repository": "https://github.com/bbglab/clustering_3d"
    },
    "split_keywords": [
        "bbglab",
        " bioinformatics",
        " driversprediction",
        " positiveselection"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dff0306e8643c8262a9650d39810519576dfac0c223a51cf78b46a943ff7b4d3",
                "md5": "0cf358cb087150ece37a1e9c7ff38824",
                "sha256": "258a7290daf01f09abc6c7c1407c44d2ad370a98eb491c3c0f7cb379fec9bfcf"
            },
            "downloads": -1,
            "filename": "oncodrive3d-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0cf358cb087150ece37a1e9c7ff38824",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 104474,
            "upload_time": "2025-01-17T11:23:51",
            "upload_time_iso_8601": "2025-01-17T11:23:51.194748Z",
            "url": "https://files.pythonhosted.org/packages/df/f0/306e8643c8262a9650d39810519576dfac0c223a51cf78b46a943ff7b4d3/oncodrive3d-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "efda63ac6b2fff1650246fa721d62d1292a224f0ed34feacc3f67dabf17fda9d",
                "md5": "7478b900372bb859fd8f61f148dc0a76",
                "sha256": "7dfa74f179cdb9093f954f454c2e5abb6214dea7d6879bf996fe37d1b6066af4"
            },
            "downloads": -1,
            "filename": "oncodrive3d-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "7478b900372bb859fd8f61f148dc0a76",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 3956264,
            "upload_time": "2025-01-17T11:23:55",
            "upload_time_iso_8601": "2025-01-17T11:23:55.807799Z",
            "url": "https://files.pythonhosted.org/packages/ef/da/63ac6b2fff1650246fa721d62d1292a224f0ed34feacc3f67dabf17fda9d/oncodrive3d-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-17 11:23:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bbglab",
    "github_project": "clustering_3d",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "oncodrive3d"
}
        