| Field | Value |
|---|---|
| Name | grz-cli |
| Version | 0.1.3 |
| Summary | Tool for validation, encryption and upload of MV submissions to GDCs. |
| Upload time | 2024-12-18 09:06:04 |
| Requires Python | >=3.12 |
| License | MIT |
| Keywords | grz, gdc, s3 |

# GRZ CLI
A command-line tool for validating, encrypting, uploading and downloading submissions to/from a GDC/GRZ (Genomrechenzentrum, i.e. genome data center).
## Table of Contents
- [Introduction](#introduction)
- [Features](#features)
- [Installation](#installation)
  - [Requirements](#requirements)
  - [End-user setup](#end-user-setup)
    - [Installation via `conda` (recommended)](#installation-via-conda-recommended)
    - [Installation via `pip` (not recommended)](#installation-via-pip-not-recommended)
  - [Development setup](#development-setup)
- [Usage](#usage)
  - [Configuration](#configuration)
  - [Exemplary submission procedure](#exemplary-submission-procedure)
  - [Troubleshooting](#troubleshooting)
- [Command-Line Interface](#command-line-interface)
- [Testing](#testing)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgements](#acknowledgements)
## Introduction
This tool provides a way to validate files, encrypt/decrypt files using the [crypt4gh](https://crypt4gh.readthedocs.io/en/latest/) library, and upload/download the encrypted files to/from an S3 bucket of a GDC/GRZ. It also logs the progress and outcomes of these operations in a metadata file.
It is recommended to have the following folder structure for a single submission:
```
EXAMPLE_SUBMISSION
├── files
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.read1.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.read2.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.vcf
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.read1.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.read2.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.vcf
│   └── target_regions.bed
└── metadata
    └── metadata.json
```
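A minimal shell sketch for preparing the layout above: it creates the `files/` and `metadata/` directories and checks that `metadata/metadata.json` is present before you run `grz-cli validate`. The helper names `make_submission_skeleton` and `check_submission_layout` are hypothetical and not part of `grz-cli`; `grz-cli validate` remains the authoritative check.

```bash
# Hypothetical helpers (not part of grz-cli) for the recommended submission layout.

# Create an empty submission skeleton with files/ and metadata/ sub-directories.
make_submission_skeleton() {
    local submission_dir="$1"
    mkdir -p "$submission_dir/files" "$submission_dir/metadata"
}

# Return 0 if files/ and metadata/metadata.json exist; otherwise report what is missing and return 1.
check_submission_layout() {
    local submission_dir="$1"
    local ok=0
    if [ ! -d "$submission_dir/files" ]; then
        echo "missing directory: $submission_dir/files" >&2
        ok=1
    fi
    if [ ! -f "$submission_dir/metadata/metadata.json" ]; then
        echo "missing file: $submission_dir/metadata/metadata.json" >&2
        ok=1
    fi
    return "$ok"
}

# Example:
#   make_submission_skeleton EXAMPLE_SUBMISSION
#   cp /path/to/data/*.fastq.gz EXAMPLE_SUBMISSION/files/
#   cp /path/to/metadata.json EXAMPLE_SUBMISSION/metadata/
#   check_submission_layout EXAMPLE_SUBMISSION
```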
The current version of the tool requires the `working_dir` to have at least as much free disk space as the total size of the data being submitted.
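The disk space requirement can be checked before encrypting. The following sketch compares the size of the submission's `files/` directory with the free space reported for the working directory. It assumes GNU/Linux `du` and `df`, and the helper name `has_enough_space` is hypothetical.

```bash
# Hypothetical helper: check that <working_dir> (default: current directory) has at least
# as much free space as the data in <submission_dir>/files.
has_enough_space() {
    local submission_dir="$1"
    local working_dir="${2:-.}"
    local needed_kb available_kb
    [ -d "$submission_dir/files" ] || { echo "no files/ directory in $submission_dir" >&2; return 2; }
    # du -sk prints "<size in KiB><TAB><path>"; keep only the size.
    needed_kb=$(du -sk "$submission_dir/files" | cut -f1)
    # df -Pk prints POSIX-format output; column 4 of the data line is the available space in KiB.
    available_kb=$(df -Pk "$working_dir" | awk 'NR == 2 { print $4 }')
    echo "needed: ${needed_kb} KiB, available: ${available_kb} KiB"
    [ "$available_kb" -ge "$needed_kb" ]
}

# Example: has_enough_space EXAMPLE_SUBMISSION /path/to/working_dir
```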
## Features
- **Validation**: Validate file checksums, basic file metadata and BfArM requirements.
- **Encryption**: Encrypt files using `crypt4gh`.
- **Decryption**: Decrypt files using `crypt4gh`.
- **Upload**: Upload encrypted files directly to a GRZ (via built-in `boto3`).
- **Download**: Download encrypted files from a GRZ (via built-in `boto3`).
- **Logging**: Log progress and results of operations.
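To cross-check the checksums recorded in your metadata before running validation, you can list the checksum of every file in `files/`. This sketch assumes SHA-256 is the checksum type your metadata uses (please confirm against the metadata schema provided by your GRZ) and relies on GNU coreutils (`sha256sum`, `sort -z`) and GNU `xargs -r`; the helper `list_file_checksums` is hypothetical.

```bash
# Hypothetical helper: print "<sha256>  ./<path>" for every regular file below <submission_dir>/files,
# in a stable (sorted) order. Assumes SHA-256 is the checksum type recorded in the metadata.
list_file_checksums() {
    local submission_dir="$1"
    [ -d "$submission_dir/files" ] || return 1
    # NUL-separated names survive spaces in file names; xargs -r skips sha256sum if no files exist.
    ( cd "$submission_dir/files" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

# Example: list_file_checksums EXAMPLE_SUBMISSION
```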
## Installation
### Requirements
Besides the disk space required for the submission data, this tool also requires a Linux environment, e.g.:
- Linux server
- Virtual machine running Linux
- Docker container
- Windows Subsystem for Linux (WSL)
- ...
### End-user setup
The recommended method to install this tool is using the conda package manager.
#### Installation via `conda` (recommended)
If `conda` is not yet available on your system, we recommend installing the [Miniforge conda distribution](https://github.com/conda-forge/miniforge) by running the following commands:
```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```
There are also alternative ways to install conda:
- [Micromamba, a single executable that does not require a base environment](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html)
- [Official installation instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)
Next, install the `grz-cli` tool:
```bash
# create conda environment and activate it
conda create -n grz-tools -c conda-forge -c bioconda "grz-cli"
conda activate grz-tools
```
##### Update instructions:
Use the following command to update the tool:
```bash
conda update -n grz-tools "grz-cli"
```
#### Installation via `pip` (not recommended)
While installation via `pip` is possible, it is not recommended because users must ensure
that a suitable Python version (>=3.12) is already installed and that they are using a Python virtual environment.
```bash
pip install grz-cli
```
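A minimal sketch of a `pip` installation that follows the advice above: it checks that the interpreter meets the package's `requires_python >=3.12` constraint and installs `grz-cli` into a fresh virtual environment created with Python's built-in `venv` module. The helper names and the default directory `.venv-grz-cli` are hypothetical.

```bash
# Hypothetical helper: return 0 if the given interpreter is Python 3.12 or newer.
python_is_supported() {
    local python="${1:-python3}"
    "$python" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)'
}

# Hypothetical helper: create a virtual environment and install grz-cli into it with pip.
install_grz_cli_in_venv() {
    local venv_dir="${1:-.venv-grz-cli}"
    local python="${2:-python3}"
    if ! python_is_supported "$python"; then
        echo "grz-cli requires Python >= 3.12; $python is too old or not usable" >&2
        return 1
    fi
    "$python" -m venv "$venv_dir" || return 1
    # Use the environment's own interpreter so pip installs into the venv, not system-wide.
    "$venv_dir/bin/python" -m pip install grz-cli
}

# Example: install_grz_cli_in_venv .venv-grz-cli python3.12
```

Afterwards, activate the environment with `source .venv-grz-cli/bin/activate` before running `grz-cli`.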
##### Update instructions:
Use the following command to update the tool:
```bash
pip install --upgrade grz-cli
```
### Development setup
For development purposes, you can clone the repository and install the package in editable mode:
```bash
git clone https://codebase.helmholtz.cloud/grz-mv-genomseq/grz-cli
# create conda environment and activate it
conda env create -f grz-cli/environment-dev.yaml -n grz-tools-dev
conda activate grz-tools-dev
# install the grz-cli tool
pip install -e grz-cli/
```
## Usage
### Configuration
**Your associated GRZ will provide the configuration file; please place it at `~/.config/grz-cli/config.yaml`.**
The tool requires a configuration file in YAML format to specify the S3 bucket and other options.
For an exemplary configuration, see [resources/config.yaml](resources/config.yaml).
The S3 access key and secret key can be specified either in the config file or via the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
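The following sketch places a GRZ-provided configuration file at the default location and restricts its permissions, since the file may contain S3 credentials. The helper name `install_grz_config` is hypothetical; the environment-variable alternative uses the two variable names given above with placeholder values.

```bash
# Hypothetical helper: copy the GRZ-provided config to ~/.config/grz-cli/config.yaml.
install_grz_config() {
    local provided_config="$1"
    local target_dir="$HOME/.config/grz-cli"
    [ -f "$provided_config" ] || { echo "config not found: $provided_config" >&2; return 1; }
    mkdir -p "$target_dir"
    # The config may contain S3 credentials, so make it readable and writable only by the current user.
    install -m 600 "$provided_config" "$target_dir/config.yaml"
}

# Example: install_grz_config ~/Downloads/config.yaml
#
# Alternatively, keep the S3 keys out of the file and export them for the current shell session
# (placeholders; use the keys provided by your GRZ):
#   export AWS_ACCESS_KEY_ID="<your-access-key-id>"
#   export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
```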
### Exemplary submission procedure
After preparing your submission as outlined above, you can use the following commands to validate, encrypt and upload the submission:
```sh
# Validate the submission
grz-cli validate --submission-dir EXAMPLE_SUBMISSION
# Encrypt the submission
grz-cli encrypt --submission-dir EXAMPLE_SUBMISSION
# Upload the submission
grz-cli upload --submission-dir EXAMPLE_SUBMISSION
```
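The three steps above can be chained so that encryption and upload only run if the previous step succeeded. This is a minimal sketch; the function name `submit_with_grz_cli` is hypothetical, and it uses only the subcommands and the `--submission-dir` option shown above.

```bash
# Hypothetical helper: run validate, encrypt and upload in order, stopping at the first failure.
submit_with_grz_cli() {
    local submission_dir="$1"
    local step
    for step in validate encrypt upload; do
        echo "==> grz-cli $step --submission-dir $submission_dir"
        if ! grz-cli "$step" --submission-dir "$submission_dir"; then
            echo "grz-cli $step failed; skipping the remaining steps" >&2
            return 1
        fi
    done
}

# Example: submit_with_grz_cli EXAMPLE_SUBMISSION
```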
### Troubleshooting
**In case of issues, please re-run your commands with `grz-cli --log-level DEBUG --log-file <your-log-file.log> [...]` and submit the log file to the GRZ data steward!**
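To make the debug re-run repeatable, a small wrapper can add the logging options and pick a timestamped log file name. The wrapper name `grz_debug` and the log file naming scheme are hypothetical; the `--log-level` and `--log-file` options are the ones given above and are placed before the subcommand.

```bash
# Hypothetical wrapper: re-run any grz-cli subcommand with DEBUG logging to a timestamped file.
grz_debug() {
    local log_file status
    log_file="grz-cli-debug-$(date +%Y%m%d-%H%M%S).log"
    grz-cli --log-level DEBUG --log-file "$log_file" "$@"
    status=$?
    echo "log written to $log_file; please send it to the GRZ data steward" >&2
    # Preserve grz-cli's exit status for the caller.
    return "$status"
}

# Example: grz_debug validate --submission-dir EXAMPLE_SUBMISSION
```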
## Command-Line Interface
`grz-cli` provides a command-line interface with the following subcommands:
### validate
It is recommended to run this command before continuing with encryption and upload.
Progress files are stored relative to the submission directory.
- `--submission-dir`: Path to the submission directory containing both 'metadata/' and 'files/' directories [**Required**]
Example usage:
```bash
grz-cli validate --submission-dir foo
```
This command is intended for use both at a hospital (Leistungserbringer, i.e. healthcare provider) and at a GDC/GRZ.
### encrypt
If no working directory is provided, the current directory is used. Log files are stored in a sub-folder of the working directory.
The encrypted files are stored in a sub-folder of the working directory named `encrypted_files`.
- `-s, --submission-dir`: Path to the submission directory containing both 'metadata/' and 'files/' directories [**Required**]
- `-c, --config-file`: Path to config file [_optional_]
```bash
grz-cli encrypt --submission-dir foo
```
This command is intended for use at a hospital (Leistungserbringer, i.e. healthcare provider). Please contact your GDC/GRZ to obtain a valid config file.
### decrypt
Decrypt a submission using the GRZ private key.
- `-s, --submission-dir`: Path to the submission directory containing both 'metadata/' and 'encrypted_files/' directories [**Required**]
- `-c, --config-file`: Path to config file [_optional_]
```bash
grz-cli decrypt --submission-dir foo
```
This command is intended for use at a GDC/GRZ.
### upload
Upload the encrypted submission to the S3 bucket of a GRZ.
- `-s, --submission-dir`: Path to the submission directory containing both 'metadata/' and 'encrypted_files/' directories [**Required**]
- `-c, --config-file`: Path to config file [_optional_]
Example usage:
```bash
grz-cli upload --submission-dir foo
```
This command is intended for use at a hospital (Leistungserbringer, i.e. healthcare provider). Please contact your GDC/GRZ to obtain a valid config file.
### download
Download a submission from a GRZ.
- `-s, --submission-id`: S3 submission prefix [**Required**]
- `-o, --output-dir`: Path to the target submission output directory [**Required**]
- `-c, --config-file`: Path to config file [_optional_]
Example usage:
```bash
grz-cli download --submission-id foo --output-dir bar
```
This command is intended for use at a GDC/GRZ.
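At a GDC/GRZ, `download` and `decrypt` can be run one after the other. The sketch below chains them, assuming (this is not confirmed by this README) that the download output directory has the `metadata/` and `encrypted_files/` layout that `decrypt` expects; it checks for both directories before decrypting. The helper name `fetch_and_decrypt` is hypothetical.

```bash
# Hypothetical helper: download a submission, then decrypt it in the same directory.
# Assumption: the download output contains the metadata/ and encrypted_files/ layout decrypt expects.
fetch_and_decrypt() {
    local submission_id="$1"
    local output_dir="$2"
    local required
    grz-cli download --submission-id "$submission_id" --output-dir "$output_dir" || return 1
    for required in metadata encrypted_files; do
        if [ ! -d "$output_dir/$required" ]; then
            echo "expected $output_dir/$required after download; not decrypting" >&2
            return 1
        fi
    done
    grz-cli decrypt --submission-dir "$output_dir"
}

# Example: fetch_and_decrypt foo bar
```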
## Testing
To run the tests, navigate to the root directory of the repository and invoke `pytest`.
Alternatively, install `uv` and `tox` and run `uv run tox`.
## Contributing
<!-- Add details about how others can contribute to the project -->
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgements
Parts of the `crypt4gh` code are used in modified form.
Raw data
{
"_id": null,
"home_page": null,
"name": "grz-cli",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "GRZ, GDC, S3",
"author": null,
"author_email": "Koray Kirli <koraykirli@gmail.com>, Mathias Lesche <mathias.lesche@tu-dresden.de>, \"Florian R. H\u00f6lzlwimmer\" <git.ich@frhoelzlwimmer.de>, Till Hartmann <till.hartmann@bih-charite.de>, Thomas Sell <thomas.sell@bih-charite.de>",
"download_url": "https://files.pythonhosted.org/packages/88/ec/0fb16664cb97ac70948fea224ffb29e5b5157af3f020deacc7cd746e9969/grz_cli-0.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Tool for validation, encryption and upload of MV submissions to GDCs.",
"version": "0.1.3",
"project_urls": {
"Homepage": "https://codebase.helmholtz.cloud/grz-mv-genomseq/grz-cli"
},
"split_keywords": [
"grz",
" gdc",
" s3"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "036d649e7d71a21b9bb73fcde5b661f953d633526cbc7a402366c4b1f5691263",
"md5": "a1ed072921f4f9c3853092b5e581fa57",
"sha256": "2dd8889d05c871121b29d714b746db9ac6466c3c3c9592d4e4112d600439d07e"
},
"downloads": -1,
"filename": "grz_cli-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a1ed072921f4f9c3853092b5e581fa57",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 39747,
"upload_time": "2024-12-18T09:06:02",
"upload_time_iso_8601": "2024-12-18T09:06:02.580132Z",
"url": "https://files.pythonhosted.org/packages/03/6d/649e7d71a21b9bb73fcde5b661f953d633526cbc7a402366c4b1f5691263/grz_cli-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "88ec0fb16664cb97ac70948fea224ffb29e5b5157af3f020deacc7cd746e9969",
"md5": "b84eeb23cdf6518db6e7aad40b44cc82",
"sha256": "d175d2aea32ae60ffc470a1355e8994bc5fd482598380816ccffc5d14223bd53"
},
"downloads": -1,
"filename": "grz_cli-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "b84eeb23cdf6518db6e7aad40b44cc82",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 45030,
"upload_time": "2024-12-18T09:06:04",
"upload_time_iso_8601": "2024-12-18T09:06:04.267145Z",
"url": "https://files.pythonhosted.org/packages/88/ec/0fb16664cb97ac70948fea224ffb29e5b5157af3f020deacc7cd746e9969/grz_cli-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-18 09:06:04",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "grz-cli"
}