# CroCoDeEL : **CRO**ss-sample **CO**ntamination **DE**tection and **E**stimation of its **L**evel 🐊
[Bioconda](https://anaconda.org/bioconda/crocodeel)
[PyPI](https://pypi.org/project/crocodeel/)
## Introduction
CroCoDeEL is a tool that detects cross-sample contamination (aka well-to-well leakage) in shotgun metagenomic data.\
It not only identifies contaminated samples accurately but also pinpoints contamination sources and estimates contamination rates.\
CroCoDeEL relies only on species abundance tables and requires neither negative controls nor the positions of samples during processing (i.e. plate maps).
<p align="center">
<img src="docs/logos/logo.webp" width="350" height="350" alt="logo">
</p>
## Installation
CroCoDeEL is available on bioconda:
```
conda create --name crocodeel_env -c conda-forge -c bioconda crocodeel
conda activate crocodeel_env
```
Alternatively, you can use pip:
```
pip install crocodeel
```
Docker and Singularity containers are also available on [BioContainers](https://biocontainers.pro/tools/crocodeel).
## Installation test
You can test that CroCoDeEL is correctly installed with the following command:
```
crocodeel test_install
```
## Quick start
### Input
CroCoDeEL takes as input a species abundance table in TSV format.\
The first column should contain species names; each of the other columns contains the abundances of species in one sample.\
An example is available [here](crocodeel/test_data/mgs_profiles_test.tsv).

| species_name | sample1 | sample2 | sample3 | ... |
|:----------------|:-------:|:-------:|:-------:|:--------:|
| species 1 | 0 | 0.05 | 0.07 | ... |
| species 2 | 0.1 | 0.01 | 0 | ... |
| ... | ... | ... | ... | ... |

CroCoDeEL works with relative abundances:
the table is automatically normalized so that the abundances in each column sum to 1.
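
To make the normalization step concrete, the minimal sketch below rescales each sample column of a TSV abundance table so that it sums to 1, using only the Python standard library. It is an illustration under stated assumptions (tab-separated input, species names in the first column, non-negative numeric abundances elsewhere), not CroCoDeEL's own code; the helper name `normalize_columns` is hypothetical.

```python
import csv
from typing import Iterable


def normalize_columns(lines: Iterable[str]) -> dict[str, dict[str, float]]:
    """Return {sample: {species: relative_abundance}}, each sample summing to 1.

    Hypothetical helper: assumes a tab-separated table whose first column
    holds species names and whose other columns hold per-sample abundances.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    raw: dict[str, dict[str, float]] = {s: {} for s in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        species, values = row[0], row[1:]
        for sample, value in zip(samples, values):
            raw[sample][species] = float(value)

    normalized: dict[str, dict[str, float]] = {}
    for sample, profile in raw.items():
        total = sum(profile.values())
        if total == 0:
            raise ValueError(f"sample {sample!r} has zero total abundance")
        # Divide every abundance by the column total so the column sums to 1.
        normalized[sample] = {sp: ab / total for sp, ab in profile.items()}
    return normalized


# Usage with the first two rows of the example table above.
table = [
    "species_name\tsample1\tsample2\n",
    "species 1\t0\t0.05\n",
    "species 2\t0.1\t0.01\n",
]
profiles = normalize_columns(table)
print(round(profiles["sample2"]["species 1"], 4))  # 0.8333
```

With a real file, pass an open file handle instead of the list, e.g. `open("species_abundance.tsv", newline="")`.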

**Important**: CroCoDeEL requires accurate estimation of the abundance of subdominant species.\
We strongly recommend using [the Meteor software suite](https://github.com/metagenopolis/meteor) to generate the species abundance table.\
Alternatively, MetaPhlAn4 can be used, although it will fail to detect low-level contamination.\
We advise against using other taxonomic profilers: according to our benchmarks, they do not meet this requirement.
### Search for contamination
Run the following command to identify cross-sample contamination:
```
crocodeel search_conta -s species_abundance.tsv -c contamination_events.tsv
```
CroCoDeEL will output all detected contamination events in the file _contamination_events.tsv_.\
This TSV file includes the following details for each contamination event:
- The contamination source
- The contaminated sample (target)
- The estimated contamination rate
- The score (probability) computed by the Random Forest model
- The species specifically introduced into the target by contamination

An example output file is available [here](crocodeel/test_data/results/contamination_events.tsv).
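
If the same sample appears as the target of several events, grouping the events by target makes it easier to review all of its candidate sources together. The sketch below does this with the standard library. The exact column headers of _contamination_events.tsv_ are not documented here, so they are passed in as parameters; the headers `source` and `target` and the sample names in the usage lines are placeholders, and `events_by_target` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from typing import Iterable


def events_by_target(lines: Iterable[str], source_col: str,
                     target_col: str) -> dict[str, list[str]]:
    """Map each contaminated sample (target) to the list of its reported sources.

    source_col / target_col must match the headers of your events TSV;
    this sketch does not assume them.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for event in csv.DictReader(lines, delimiter="\t"):
        grouped[event[target_col]].append(event[source_col])
    return dict(grouped)


# Placeholder headers and sample names, for illustration only.
example = [
    "source\ttarget\n",
    "sampleA\tsampleB\n",
    "sampleC\tsampleB\n",
    "sampleA\tsampleD\n",
]
print(events_by_target(example, source_col="source", target_col="target"))
# {'sampleB': ['sampleA', 'sampleC'], 'sampleD': ['sampleA']}
```

On a real results file, open it with `open("contamination_events.tsv", newline="")` and use the header names it actually contains.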

If you are using MetaPhlAn4, we strongly recommend filtering out low-abundance species to improve CroCoDeEL's sensitivity.\
Use the _--filter-low-ab_ option as shown below:
```
crocodeel search_conta -s species_abundance.tsv --filter-low-ab 20 -c contamination_events.tsv
```
### Visualization of the results
Contamination events can be inspected visually by generating a PDF report consisting of scatterplots:
```
crocodeel plot_conta -s species_abundance.tsv -c contamination_events.tsv -r contamination_events.pdf
```
Each scatterplot compares, on a log scale, the species abundance profiles of a contaminated sample (x-axis) and its contamination source (y-axis).\
The contamination line (in red) highlights species specifically introduced by contamination.\
An example is available [here](crocodeel/test_data/results/contamination_events.pdf).
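
Why species introduced by contamination line up can be seen with a simple mixing model (an assumption made here for illustration): if a target receives a fraction r of its material from a source, a species absent from the target's own community ends up with abundance a_T ≈ r · a_S. Taking log10 of both sides gives log10(a_S) = log10(a_T) − log10(r), i.e. a line of slope 1 lying −log10(r) units above the diagonal. The sketch below computes that offset and a naive rate estimate (median abundance ratio); both function names are hypothetical, and this is not how CroCoDeEL itself estimates contamination rates.

```python
import math
from statistics import median


def contamination_line_offset(rate: float) -> float:
    """Offset (log10 units) of log10(a_S) = log10(a_T) - log10(rate) above y = x.

    Assumes the simple mixing model a_T = rate * a_S for contamination-specific species.
    """
    if not 0 < rate < 1:
        raise ValueError("rate must be strictly between 0 and 1")
    return -math.log10(rate)


def naive_rate_estimate(target: dict[str, float], source: dict[str, float],
                        species: list[str]) -> float:
    """Median target/source abundance ratio over species assumed to be
    contamination-specific (hypothetical helper, illustration only)."""
    ratios = [target[s] / source[s] for s in species
              if source.get(s, 0) > 0 and target.get(s, 0) > 0]
    if not ratios:
        raise ValueError("no species with positive abundance in both samples")
    return median(ratios)


# Under this model, a 1% contamination puts the line about 2 log10 units above the diagonal.
print(contamination_line_offset(0.01))
```

The median is used only because it is robust to a few species that also occur naturally in the target; any real estimate should come from CroCoDeEL's output.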
### Easy workflow
Alternatively, you can search for cross-sample contamination and create the PDF report in one command.
```
crocodeel easy_wf -s species_abundance.tsv -c contamination_events.tsv -r contamination_events.pdf
```
### Results interpretation
CroCoDeEL will probably report false contamination events for samples with similar species abundance profiles (e.g. longitudinal data, animals raised together).\
For unrelated samples, CroCoDeEL may occasionally generate false positives that can be filtered out by a human expert.\
We therefore strongly recommend inspecting the scatterplot of each contamination event to discard potential false positives.\
Please check the [wiki](https://github.com/metagenopolis/CroCoDeEL/wiki) for more information.
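
Because related samples (e.g. longitudinal samples from the same subject) are a known source of false positives, one practical triage step is to flag events whose source and target belong to the same subject before reviewing the scatterplots. The sketch below assumes you maintain your own sample-to-subject mapping; the function name, the mapping and the column headers are hypothetical, and flagged events should still be inspected by eye rather than discarded automatically.

```python
import csv
from typing import Iterable


def flag_related_events(lines: Iterable[str], subject_of: dict[str, str],
                        source_col: str, target_col: str) -> list[tuple[str, str, bool]]:
    """Return (source, target, related) for each event in an events TSV.

    related is True when both samples map to the same subject in subject_of,
    a user-supplied (hypothetical) sample -> subject mapping.
    """
    flagged = []
    for event in csv.DictReader(lines, delimiter="\t"):
        src, tgt = event[source_col], event[target_col]
        src_subject = subject_of.get(src)
        # Unknown samples are never considered related.
        related = src_subject is not None and src_subject == subject_of.get(tgt)
        flagged.append((src, tgt, related))
    return flagged


# Placeholder mapping and events, for illustration only.
subjects = {"s1_t0": "subj1", "s1_t6": "subj1", "s2_t0": "subj2"}
events = ["source\ttarget\n", "s1_t0\ts1_t6\n", "s2_t0\ts1_t6\n"]
for src, tgt, related in flag_related_events(events, subjects, "source", "target"):
    print(src, tgt, "related" if related else "unrelated")
```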
## Citation
If you find CroCoDeEL useful, please cite:\
Goulet, L. et al. "CroCoDeEL: Accurate control-free detection of cross-sample contamination in metagenomic data." *bioRxiv* (2025). [https://doi.org/10.1101/2025.01.15.633153](https://doi.org/10.1101/2025.01.15.633153).