# Binette
**Binette** is a fast and accurate binning refinement tool designed to construct high-quality metagenome-assembled genomes (MAGs) from the output of multiple binning tools.
### How It Works
From the input bin sets, Binette constructs new **hybrid bins**. A **bin** is a set of contigs. When at least two bins overlap (i.e., share at least one contig), Binette applies fundamental set operations to generate new bins:
- **Intersection Bin**: Contains contigs that are shared by overlapping bins.
- **Difference Bin**: Includes contigs that are unique to one bin and not found in others.
- **Union Bin**: Encompasses all contigs from the overlapping bins.
Binette then evaluates bin quality using **CheckM2**, ensuring the selection of the best possible bins.
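The set operations above can be illustrated with plain Python sets. This is a minimal sketch, not Binette's actual implementation: the `hybrid_bins` function and the labels it returns are hypothetical names chosen for this example, and the contig names are borrowed from the usage examples in this README.

```python
# Two overlapping bins from different binning tools, each modeled as a set of contig names.
bin_a = {"contig_1", "contig_8"}
bin_b = {"contig_1", "contig_8", "contig_10"}


def hybrid_bins(first, second):
    """Return candidate bins derived from two overlapping bins (illustrative only)."""
    if not first & second:
        # Bins that share no contig do not overlap, so no hybrid bin is produced.
        return {}
    candidates = {
        "intersection": first & second,       # contigs shared by both bins
        "difference_first": first - second,   # contigs found only in the first bin
        "difference_second": second - first,  # contigs found only in the second bin
        "union": first | second,              # all contigs from both bins
    }
    # An empty set is not a meaningful bin, so it is dropped.
    return {label: contigs for label, contigs in candidates.items() if contigs}


result = hybrid_bins(bin_a, bin_b)
for label, contigs in result.items():
    print(label, sorted(contigs))
```

Each candidate produced this way would then be scored, so that the best bins can be kept.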
### Why Binette?
Binette is inspired by the **metaWRAP bin-refinement tool** but effectively addresses its limitations. Key improvements include:
- **Enhanced Speed**
Binette significantly accelerates the refinement process by running **CheckM2's** initial steps (e.g., Prodigal and Diamond) only once for all contigs. These intermediate results are then reused for bin quality assessment, eliminating redundant computations.
- **No Limit on Input Bin Sets**
Unlike metaWRAP, Binette supports **any number of input bin sets**, allowing seamless processing of multiple binning outputs.
<!--
- **Improved Bin Selection**
Binette selects the best bins in a more accurate and elegant manner.
- **User-Friendly**
Designed for ease of use with a streamlined workflow.
-->
> [!NOTE]
> For a detailed guide and tutorial, see the [Binette documentation](https://binette.readthedocs.io/).
## Installation
### With Bioconda
Binette can be easily installed with conda:
```bash
conda create -c bioconda -c defaults -c conda-forge -n binette binette
conda activate binette
```
Binette should now be able to run:
```
binette -h
```
### From a conda environment
Clone this repository:
```
git clone https://github.com/genotoul-bioinfo/Binette
cd Binette
```
Then create a Conda environment using the `binette.yaml` file:
```
conda env create -n binette -f binette.yaml
conda activate binette
```
Finally, install Binette with pip:
```
pip install .
```
Binette should now be able to run:
```
binette -h
```
### Downloading the CheckM2 database
Before using Binette, it is necessary to download the CheckM2 database:
```bash
checkm2 database --download --path <checkm2/database/>
```
Make sure to replace `<checkm2/database/>` with the desired path where you want to store the CheckM2 database.
## Usage
### Input Formats
Binette supports two input formats for bin sets:
1. **Contig2bin Tables:** You can provide bin sets using contig2bin tables, which establish the relationship between each contig and its corresponding bin. In this format, you need to specify the `--contig2bin_tables` argument.
For example, consider the following two `contig2bin_tables`:
- `bin_set1.tsv`:
```tsv
contig_1 binA
contig_8 binA
contig_15 binB
contig_9 binC
```
- `bin_set2.tsv`:
```tsv
contig_1 bin.0
contig_8 bin.0
contig_15 bin.1
contig_9 bin.2
contig_10 bin.0
```
The `binette` command to process this input would be:
```bash
binette --contig2bin_tables bin_set1.tsv bin_set2.tsv --contigs assembly.fasta
```
2. **Bin Directories:** Alternatively, you can use bin directories, where each bin is represented by a separate FASTA file. For this format, you need to provide the `--bin_dirs` argument. Here's an example of two bin directories:
```
bin_set1/
├── binA.fa: contains sequences of contig_1, contig_8
├── binB.fa: contains sequences of contig_15
└── binC.fa: contains sequences of contig_9
```
```
bin_set2/
├── binA.fa: contains sequences of contig_1, contig_8, contig_10
├── binB.fa: contains sequences of contig_15
└── binC.fa: contains sequences of contig_9
```
The `binette` command to process this input would be:
```bash
binette --bin_dirs bin_set1 bin_set2 --contigs assembly.fasta
```
In both formats, the `--contigs` argument should specify a FASTA file containing all the contigs found in the bins. Typically, this file would be the assembly FASTA file used to generate the bins. In these examples, the `assembly.fasta` file should contain at least the five contigs mentioned in the `contig2bin_tables` files or in the bin FASTA files: `contig_1`, `contig_8`, `contig_15`, `contig_9`, and `contig_10`.
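Before running Binette, it can be useful to check that every contig named in a contig2bin table is present in the contigs FASTA file. The sketch below is one way to do that, assuming tab-separated tables with the contig name in the first column and FASTA headers whose first whitespace-delimited word is the contig name. The helper names `read_contig2bin` and `read_fasta_ids` are hypothetical and not part of Binette, and in-memory text stands in for the real files.

```python
import io


def read_contig2bin(handle):
    """Parse a headerless, tab-separated contig2bin table into a dict."""
    contig_to_bin = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        contig, bin_name = line.split("\t")
        contig_to_bin[contig] = bin_name
    return contig_to_bin


def read_fasta_ids(handle):
    """Collect sequence identifiers (first word of each header line) from a FASTA file."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


# In-memory stand-ins for bin_set2.tsv and assembly.fasta; contig_10 is deliberately
# left out of the assembly to show what the check reports.
table = io.StringIO(
    "contig_1\tbin.0\ncontig_8\tbin.0\ncontig_15\tbin.1\ncontig_9\tbin.2\ncontig_10\tbin.0\n"
)
assembly = io.StringIO(">contig_1\nACGT\n>contig_8\nACGT\n>contig_15\nACGT\n>contig_9\nACGT\n")

contig_to_bin = read_contig2bin(table)
fasta_ids = read_fasta_ids(assembly)
missing = sorted(set(contig_to_bin) - fasta_ids)
print("Contigs missing from the assembly:", missing)
```

With real data, `io.StringIO(...)` would be replaced by `open("bin_set2.tsv")` and `open("assembly.fasta")`.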
### Outputs
Binette results are stored in the `results` directory. You can specify a different directory using the `--outdir` option.
In this directory you will find:
- **`final_bins_quality_reports.tsv`**: This is a TSV (tab-separated values) file containing quality information about the final selected bins.
- **`final_bins/`**: This directory stores all the selected bins in FASTA format. Writing these files can be skipped with `--no-write-fasta-bins`.
- **`final_contig_to_bin.tsv`**: A headerless TSV file mapping each contig to its assigned bin. It describes the final Binette bins in a much lighter format than the FASTA output.
- **`input_bins_quality_reports/`**: A directory storing quality reports for the input bin sets, with files following the same structure as `final_bins_quality_reports.tsv`.
- **`temporary_files/`**: This directory contains intermediate files. If you choose to use the `--resume` option, Binette will utilize files in this directory to prevent the recomputation of time-consuming steps.
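Because `final_contig_to_bin.tsv` is a plain headerless table, it is easy to load for downstream analysis. The following sketch groups contigs by bin. It assumes the contig name is in the first column and the bin name in the second, which should be checked against an actual output file. The function name and the bin names in the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_contigs_by_bin(handle):
    """Group the contigs of a headerless contig-to-bin TSV by bin name."""
    bins = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        contig, bin_name = row[0], row[1]
        bins[bin_name].append(contig)
    return dict(bins)


# In-memory stand-in for results/final_contig_to_bin.tsv (made-up content).
example = io.StringIO("contig_1\tbin_1\ncontig_8\tbin_1\ncontig_15\tbin_2\n")
bins = group_contigs_by_bin(example)
for bin_name, contigs in bins.items():
    print(bin_name, len(contigs), contigs)
```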
The `final_bins_quality_reports.tsv` file contains the following columns:
| Column Name | Description |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| **name** | The unique name of the bin. |
| **origin** | Indicates the source of the bin: either an original bin set (e.g., `B`) or `binette` for intermediate bins. |
| **is\_original** | Boolean flag indicating if the bin is an original bin (`True`) or an intermediate bin (`False`). |
| **original\_name** | The name of the original bin from which this bin was derived. |
| **completeness** | The completeness of the bin, determined by CheckM2. |
| **contamination** | The contamination of the bin, determined by CheckM2. |
| **checkm2\_model** | The CheckM2 model used for quality prediction: `Gradient Boost (General Model)` or `Neural Network (Specific Model)`.|
| **score** | Computed score: `completeness - contamination * weight`. The contamination weight can be customized using the `--contamination_weight` option. |
| **size** | Total size of the bin in nucleotides. |
| **N50** | The N50 of the bin, representing the length for which 50% of the total nucleotides are in contigs of that length or longer. |
| **coding\_density** | The percentage of the bin that codes for proteins (total gene length / total bin length × 100). Only computed when genes are freshly identified; empty when using the `--proteins` or `--resume` options. |
| **contig\_count** | Number of contigs contained within the bin. |
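The **score** and **N50** columns can be reproduced by hand from the definitions in the table. The sketch below is illustrative only: the function names are hypothetical, the weight of `2.0` is simply an example value for `--contamination_weight`, and the contig lengths are made up.

```python
def bin_score(completeness, contamination, weight):
    """Score as defined in the table: completeness - contamination * weight."""
    return completeness - contamination * weight


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least 50% of all nucleotides."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the total size is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no contigs at all


score = bin_score(completeness=90.0, contamination=5.0, weight=2.0)
bin_n50 = n50([100, 200, 300, 400])
print(score, bin_n50)  # prints: 80.0 300
```

A higher contamination weight penalizes contaminated bins more strongly, so the same bin can rank differently depending on the value passed to `--contamination_weight`.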
## Help, feature requests and bug reporting
To report bugs, request new features, or seek help and support, please open an [issue](https://github.com/genotoul-bioinfo/Binette/issues).
## Licence
This tool is released as open source software under the terms of the [GNU General Public Licence](LICENSE).
## Citation
Binette is a scientific software tool with a [published paper](https://joss.theoj.org/papers/10.21105/joss.06782) in the [Journal of Open Source Software](https://joss.theoj.org/). If you use Binette in academic research, please cite:
> **Binette: a fast and accurate bin refinement tool to construct high-quality Metagenome Assembled Genomes.**
> Mainguy et al., (2024).
> *Journal of Open Source Software, 9(102), 6782.*
> doi: [10.21105/joss.06782](https://doi.org/10.21105/joss.06782)
Binette extensively uses **CheckM2**. If your work relies on Binette, consider citing:
> **CheckM2: a rapid, scalable, and accurate tool for assessing microbial genome quality using machine learning.**
> Chklovski, Alex, et al. (2023).
> *Nature Methods, 20(8), 1203-1212.*
> doi: [10.1038/s41592-023-01940-w](https://doi.org/10.1038/s41592-023-01940-w)