lionheart

- Name: lionheart
- Version: 1.1.5
- Home page: https://github.com/BesenbacherLab/lionheart
- Summary: LIONHEART is a method for detecting cancer from whole genome sequenced plasma cell-free DNA. Check the README for additional installation steps.
- Upload time: 2025-02-04 16:42:22
- Author: Ludvig
- Requires Python: <4.0,>=3.9
- Keywords: cancer detection, cancer, cell-free DNA, cfDNA, fragmentomics, nucleosomics, bioinformatics

            # LIONHEART Cancer Detector <a href='https://github.com/besenbacherlab/lionheart'><img src='https://raw.githubusercontent.com/besenbacherlab/lionheart/main/lionheart_242x280_250dpi.png' align="right" height="160" /></a>

LIONHEART is a method for detecting cancer from whole genome sequenced plasma cell-free DNA.

This software lets you run feature extraction and predict the cancer status of your samples. Further, you can train a model on your own data. 

Developed for hg38. See the `remap` directory for the applied remapping pipeline.

Preprint: https://www.medrxiv.org/content/10.1101/2024.11.26.24317971v1

The code was developed and implemented by [@ludvigolsen](https://github.com/LudvigOlsen).

If you experience an issue, please [report it](https://github.com/BesenbacherLab/lionheart/issues).


## Installation

This section describes the installation of `lionheart` and the custom version of `mosdepth` (expected time: <10m). The code has only been tested on Linux but should also work on macOS and Windows.

Install the main package:

```
# Create and activate conda environment
$ conda config --set channel_priority true
$ conda env create -f https://raw.githubusercontent.com/BesenbacherLab/lionheart/refs/heads/main/environment.yml
$ conda activate lionheart

# Install package from PyPI
$ pip install lionheart

# OR install from GitHub
$ pip install git+https://github.com/BesenbacherLab/lionheart.git

```
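
As a quick sanity check (optional, not an official step), you can confirm that the environment and the CLI resolve:

```
# Verify the installation
$ lionheart -h
$ pip show lionheart
```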

### Custom mosdepth 

We use a modified version of `mosdepth` available at https://github.com/LudvigOlsen/mosdepth/

Installing it requires `nim`, so that we can use `nimble install`. Note that we use `nim 1.6.14`.

```
# Download nim installer and run
$ curl https://nim-lang.org/choosenim/init.sh -sSf | sh

# Add to PATH
# Change the path to fit with your system
# Tip: Consider adding it to the terminal configuration file (e.g., ~/.bashrc)
$ export PATH=/home/<username>/.nimble/bin:$PATH

# Install and use nim 1.6.14
# NOTE: This step should be done even when nim is already installed
$ choosenim 1.6.14
```
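
Before proceeding, it may be worth confirming the active toolchain (optional sanity check):

```
# Should print nim 1.6.14 and the matching nimble version
$ nim --version
$ nimble --version
```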

Now that nim is installed, we can install the custom `mosdepth`. To avoid overriding an existing `mosdepth` installation, we install it into a separate directory:

```
# Make a directory for installing the nim packages into
$ mkdir mosdepth_installation

# Install modified mosdepth
$ NIMBLE_DIR=mosdepth_installation nimble install -y https://github.com/LudvigOlsen/mosdepth

# Get path to mosdepth binary to use in the software
$ find mosdepth_installation/pkgs/ -name "mosdepth*"
>> mosdepth_installation/pkgs/mosdepth-0.x.x/mosdepth

```
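
For later convenience, you can capture the binary path in a variable to pass as `--mosdepth_path`. A minimal sketch, assuming the `find` command above returns a single binary:

```
# Store the path to the modified mosdepth binary
$ MOSDEPTH_PATH=$(find mosdepth_installation/pkgs/ -name "mosdepth" -type f | head -n 1)
$ echo "$MOSDEPTH_PATH"
```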

## Get Resources

Download and unzip the required resources.
```
$ wget https://zenodo.org/records/14215762/files/inference_resources_v002.tar.gz
$ tar -xvzf inference_resources_v002.tar.gz 
```
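
If you prefer to inspect the archive before extracting it, standard `tar` flags work (optional):

```
# List the archive contents without extracting
$ tar -tzf inference_resources_v002.tar.gz | head
```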

## Main commands

This section describes the commands in `lionheart` and lists their *main* output files:

| Command                          | Description                                                         | Main Output                                                                         |
| :------------------------------- | :------------------------------------------------------------------ | :---------------------------------------------------------------------------------- |
| `lionheart extract_features`     | Extract features from a BAM file.                                   | `feature_dataset.npy` and correction profiles                                       |
| `lionheart predict_sample`       | Predict cancer status of a sample.                                  | `prediction.csv`                                                                    |
| `lionheart collect`              | Collect predictions and/or features across samples.                 | `predictions.csv`, `feature_dataset.npy`, and correction profiles *for all samples* |
| `lionheart customize_thresholds` | Extract ROC curve and more for using custom probability thresholds. | `ROC_curves.json` and `probability_densities.csv`                                   |
| `lionheart cross_validate`       | Cross-validate the model on new data and/or the included features.  | `evaluation_summary.csv`,  `splits_summary.csv`                                     |
| `lionheart train_model`          | Train a model on your own data and/or the included features.        | `model.joblib` and training data results                                            |
| `lionheart validate`             | Validate a model on a validation dataset.                           | `evaluation_scores.csv` and `predictions.csv`                                       |
| `lionheart evaluate_univariates` | Evaluate the cancer detection potential of each feature separately. | `univariate_evaluations.csv`                                                        |
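
Each command documents its full set of arguments via the standard `-h` flag, e.g.:

```
$ lionheart extract_features -h
```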


## Examples

### Run via command-line interface

This example shows how to run `lionheart` from the command line.

Note: If you don't have a BAM file at hand, you can download an example BAM file from: https://zenodo.org/records/13909979 
It is a downsampled version of a public BAM file from Snyder et al. (2016; 10.1016/j.cell.2015.11.050) that has been remapped to hg38. On our system, the feature extraction for this sample takes ~1h15m using 12 cores (`n_jobs`).

```
# Start by skimming the help page
$ lionheart -h

# Extract features from a given BAM file
# `mosdepth_path` is the path to the customized `mosdepth` installation
# E.g., "/home/<username>/mosdepth/mosdepth"
# `ld_library_path` is the path to the `lib` folder in the conda environment
# E.g., "/home/<username>/anaconda3/envs/lionheart/lib/"
$ lionheart extract_features --bam_file {bam_file} --resources_dir {resources_dir} --out_dir {out_dir} --mosdepth_path {mosdepth_path} --ld_library_path {ld_library_path} --n_jobs {cores}

# `sample_dir` is the `out_dir` of `extract_features`
$ lionheart predict_sample --sample_dir {sample_dir} --resources_dir {resources_dir} --out_dir {out_dir} --thresholds max_j spec_0.95 spec_0.99 sens_0.95 sens_0.99 0.5 --identifier {sample_id}
```

After running these commands for a set of samples, you can use `lionheart collect` to collect features and predictions across the samples. You can then use `lionheart train_model` to train a model on your own data (and optionally the included features).
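
As an illustration of that per-sample flow, here is a minimal shell sketch (not an official recipe): it assumes one BAM per sample in a hypothetical `bams/` directory and reuses the `{placeholder}` paths from above. See `lionheart collect -h` for the collection options.

```
# Sketch: extract features and predict for each sample.
# `bams/` and `samples/` are hypothetical directories.
$ for bam in bams/*.bam; do
      sample_id=$(basename "$bam" .bam)
      lionheart extract_features --bam_file "$bam" \
          --resources_dir {resources_dir} --out_dir "samples/$sample_id" \
          --mosdepth_path {mosdepth_path} --ld_library_path {ld_library_path} \
          --n_jobs {cores}
      lionheart predict_sample --sample_dir "samples/$sample_id" \
          --resources_dir {resources_dir} --out_dir "samples/$sample_id" \
          --identifier "$sample_id"
  done
```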


### Via `gwf` workflow

We provide a simple workflow for submitting jobs to Slurm via the `gwf` package. Make a copy of the `workflow` directory, open `workflow.py`, change the paths, and list the samples to run `lionheart` on.

The first time you run a workflow, you must first set the `gwf` backend to `slurm` or one of the other [backends](https://gwf.app/reference/backends/):

```
# Start by downloading the repository
$ wget -O lionheart-main.zip https://github.com/BesenbacherLab/lionheart/archive/refs/heads/main.zip
$ unzip lionheart-main.zip

# Copy workflow directory to a location
$ cp -r lionheart-main/workflow <location>/workflow

# Navigate to your copy of the workflow directory
$ cd <location>/workflow

# Activate conda environment
$ conda activate lionheart

# Set `gwf` backend to slurm (or another preferred backend)
$ gwf config set backend slurm
```

Open the `workflow.py` file and change the various paths. When you're ready to submit the jobs, run:

```
$ gwf run
```

`gwf` lets you see the status of the submitted jobs:

```
$ gwf status
$ gwf status -f summary
```
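
If a target fails, `gwf` can also show its logs (the target names are defined in `workflow.py`; the `--stderr` flag is assumed to be available in your `gwf` version):

```
# Inspect the logs of a specific target
$ gwf logs {target_name}
$ gwf logs --stderr {target_name}
```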

### Reproduction of results

This section shows how to reproduce the main results (cross-validation and external validation) from the paper. It uses the included features so the reproduction can be run without access to the raw sequencing data.

Note that different compilations of scikit-learn on different operating systems may lead to slightly different results. On Linux, the results should match the reported results.

#### Cross-validation analysis

We start by performing the nested leave-one-dataset-out cross-validation analysis from Figure 3A (not including the benchmarks).

Note that the default settings are the ones used in the paper.

```
# Perform the cross-validation
# {cv_out_dir} should specify where you want the output files
$ lionheart cross_validate --out_dir {cv_out_dir} --resources_dir {resources_dir} --use_included_features --num_jobs 10
```

The output directory should now include multiple files. The main results are in `evaluation_summary.csv` and `splits_summary.csv`. Note that the results are given for multiple probability thresholds. The threshold reported in the paper is the "Max. J Threshold". You can extract the relevant lines of the summaries with:

```
$ awk 'NR==1 || /Average/ && /J Threshold/' {cv_out_dir}/evaluation_summary.csv
$ awk 'NR==1 || /Average/ && /J Threshold/' {cv_out_dir}/splits_summary.csv
```

#### External validation analysis

To reproduce the external validation, we first train a model on all the included training datasets and then validate it on the included validation dataset:

```
# Train a model on the included datasets
# {new_model_dir} should specify where you want the model files
$ lionheart train_model --out_dir {new_model_dir} --resources_dir {resources_dir} --use_included_features

# Validate the model on the included validation dataset
# {val_out_dir} should specify where you want the output files
$ lionheart validate --out_dir {val_out_dir} --resources_dir {resources_dir} --model_dir {new_model_dir} --use_included_validation --thresholds 'max_j'
```

The model training creates the `model.joblib` file along with predictions and evaluations from the *training data* (e.g., `predictions.csv`, `evaluation_scores.csv`, and `ROC_curves.json`).

The validation creates `evaluation_scores.csv` and `predictions.csv` from applying the model on the validation dataset. You will find the reported AUC score in `evaluation_scores.csv`:

```
$ cat {val_out_dir}/evaluation_scores.csv
```
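
For easier reading in the terminal, you can align the CSV columns with a standard tool (optional):

```
$ column -s, -t < {val_out_dir}/evaluation_scores.csv
```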

#### Univariate analyses

Finally, we reproduce the univariate modeling evaluations in Figures 2D and 2E:

```
# Evaluate the classification potential of each cell type separately
# {univariates_dir} should specify where you want the evaluation files
$ lionheart evaluate_univariates --out_dir {univariates_dir} --resources_dir {resources_dir} --use_included_features --num_jobs 10
```

This creates the `univariate_evaluations.csv` file with evaluation metrics per cell type. It contains coefficients and p-values (Bonferroni-corrected) from univariate logistic regression models, along with evaluation metrics from per-cell-type leave-one-dataset-out cross-validation.
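
Before filtering or sorting this file, it can help to inspect the header to see which metric columns are available (column names may differ between versions):

```
$ head -n 1 {univariates_dir}/univariate_evaluations.csv
```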
            
