Binette


| Field           | Value                                                                     |
| --------------- | ------------------------------------------------------------------------- |
| Name            | Binette                                                                   |
| Version         | 1.2.1                                                                     |
| Summary         | Binette: accurate binning refinement tool to construct high-quality MAGs. |
| Upload time     | 2025-11-07 10:30:05                                                       |
| Maintainer      | Jean Mainguy                                                              |
| Author          | Jean Mainguy                                                              |
| Requires Python | >=3.12                                                                    |
| Keywords        | bioinformatics, prokaryote, binning, refinement, metagenomics             |

[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/binette/README.html)  [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/downloads.svg)](https://anaconda.org/bioconda/binette)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/license.svg?cache-control=no-cache)](https://anaconda.org/bioconda/binette) 
[![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/version.svg?cache-control=no-cache)](https://anaconda.org/bioconda/binette)
[![PyPI version](https://badge.fury.io/py/Binette.svg?cache-control=no-cache)](https://badge.fury.io/py/Binette)
[![status](https://joss.theoj.org/papers/ad304709d59f1a51a31614393b09ba2b/status.svg)](https://joss.theoj.org/papers/ad304709d59f1a51a31614393b09ba2b)

[![Test Coverage](https://genotoul-bioinfo.github.io/Binette/coverage-badge.svg)](https://genotoul-bioinfo.github.io/Binette/) 
[![CI Status](https://github.com/genotoul-bioinfo/Binette/actions/workflows/binette_ci.yml/badge.svg)](https://github.com/genotoul-bioinfo/Binette/actions/workflows)
[![Documentation Status](https://readthedocs.org/projects/binette/badge/?version=latest)](https://binette.readthedocs.io/en/latest/?badge=latest)


# Binette

**Binette** is a fast and accurate binning refinement tool designed to construct high-quality MAGs from the output of multiple binning tools.  

### How It Works  

From the input bin sets, Binette constructs new **hybrid bins**. A **bin** is a set of contigs. When at least two bins overlap (i.e., share at least one contig), Binette applies fundamental set operations to generate new bins:  

- **Intersection Bin**: Contains contigs that are shared by overlapping bins.  
- **Difference Bin**: Includes contigs that are unique to one bin and not found in others.  
- **Union Bin**: Encompasses all contigs from the overlapping bins.  

Binette then evaluates bin quality using **CheckM2**, ensuring the selection of the best possible bins.  
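
The set operations above can be sketched in a few lines of Python. This is only an illustration of the idea, not Binette's actual implementation: the function name `hybrid_bins` and the result keys are hypothetical, and the contig names are borrowed from the usage examples in this README.

```python
# Two overlapping bins produced by different binning tools
# (contig names are illustrative only).
bin_a = {"contig_1", "contig_8"}
bin_b = {"contig_1", "contig_8", "contig_10"}


def hybrid_bins(first: set[str], second: set[str]) -> dict[str, set[str]]:
    """Return candidate bins built from two overlapping bins (hypothetical helper)."""
    if not first & second:
        # Bins that share no contig do not overlap, so nothing is combined.
        return {}
    candidates = {
        "intersection": first & second,       # contigs shared by both bins
        "difference_first": first - second,   # contigs unique to the first bin
        "difference_second": second - first,  # contigs unique to the second bin
        "union": first | second,              # all contigs from both bins
    }
    # Empty candidates are not useful bins, so they are dropped.
    return {name: contigs for name, contigs in candidates.items() if contigs}


candidates = hybrid_bins(bin_a, bin_b)
for name, contigs in candidates.items():
    print(name, sorted(contigs))
```

Here `bin_a` is fully contained in `bin_b`, so the difference for the first bin is empty and only three candidates remain. Each candidate would then need a quality assessment before one is selected.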

### Why Binette?  

Binette is inspired by the **metaWRAP bin-refinement tool** but effectively addresses its limitations. Key improvements include:  

- **Enhanced Speed**  
  Binette significantly accelerates the refinement process by running **CheckM2's** initial steps (e.g., Prodigal and Diamond) only once for all contigs. These intermediate results are then reused for bin quality assessment, eliminating redundant computations.  

- **No Limit on Input Bin Sets**  
  Unlike metaWRAP, Binette supports **any number of input bin sets**, allowing seamless processing of multiple binning outputs.  

<!--  
- **Improved Bin Selection**  
  Binette selects the best bins in a more accurate and elegant manner.  

- **User-Friendly**  
  Designed for ease of use with a streamlined workflow.  
-->


> [!NOTE]
> For a detailed guide and tutorial, see the [Binette documentation](https://binette.readthedocs.io/). 


## Installation

### With Bioconda

Binette can be easily installed with conda:

```bash
conda create -c bioconda -c defaults -c conda-forge -n binette binette
conda activate binette
```

Binette should now be able to run:

```
binette -h
```


### From source, in a conda environment

Clone this repository: 
```
git clone https://github.com/genotoul-bioinfo/Binette
cd Binette
```

Then create a Conda environment using the `binette.yaml` file:
```
conda env create -n binette -f binette.yaml
conda activate binette 
```

Finally, install Binette with pip:

```
pip install .
```

Binette should now be able to run:

```
binette -h
```


### Downloading the CheckM2 database

Before using Binette, it is necessary to download the CheckM2 database:

```bash
checkm2 database --download --path <checkm2/database/>
```

Make sure to replace `<checkm2/database/>` with the desired path where you want to store the CheckM2 database.


## Usage 

### Input Formats

Binette supports two input formats for bin sets: 

1. **Contig2bin Tables:** You can provide each bin set as a contig2bin table, a TSV file that maps each contig to its bin. Pass the tables with the `--contig2bin_tables` argument.

For example, consider the following two `contig2bin_tables`:

- `bin_set1.tsv`:

    ```tsv
    contig_1   binA
    contig_8   binA
    contig_15  binB
    contig_9   binC
    ```
    
- `bin_set2.tsv`:

    ```tsv
    contig_1   bin.0
    contig_8   bin.0
    contig_15  bin.1
    contig_9   bin.2
    contig_10  bin.0
    ```
    
    The `binette` command to process this input would be:
    
    ```bash
    binette --contig2bin_tables bin_set1.tsv bin_set2.tsv --contigs assembly.fasta
    ```

2. **Bin Directories:** Alternatively, you can use bin directories, where each bin is represented by a separate FASTA file. For this format, you need to provide the `--bin_dirs` argument. Here's an example of two bin directories:

    ```
    bin_set1/
    ├── binA.fa: contains sequences of contig_1, contig_8
    ├── binB.fa: contains sequences of contig_15
    └── binC.fa: contains sequences of contig_9
    ```
    
    ```
    bin_set2/
    ├── binA.fa: contains sequences of contig_1, contig_8, contig_10
    ├── binB.fa: contains sequences of contig_15
    └── binC.fa: contains sequences of contig_9
    ```
    
    The `binette` command to process this input would be:
    
    ```bash
    binette --bin_dirs bin_set1 bin_set2 --contigs assembly.fasta
    ```

In both formats, the `--contigs` argument should specify a FASTA file containing all the contigs found in the bins. Typically, this is the assembly FASTA file used to generate the bins. In these examples, the `assembly.fasta` file should contain at least the five contigs mentioned in the contig2bin tables or in the bin FASTA files: `contig_1`, `contig_8`, `contig_15`, `contig_9`, and `contig_10`.
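
Before launching a run, it can be useful to check that every binned contig is present in the assembly. The sketch below is not part of Binette: `read_contig2bin` and `fasta_ids` are hypothetical helpers, and it assumes headerless two-column tables whose contig and bin names contain no whitespace. In-memory text stands in for `bin_set2.tsv` and for an `assembly.fasta` that deliberately lacks `contig_10`.

```python
import io
from collections import defaultdict


def read_contig2bin(handle) -> dict[str, set[str]]:
    """Group contigs by bin from a headerless two-column table (assumed layout)."""
    bins = defaultdict(set)
    for line in handle:
        if not line.strip():
            continue  # ignore blank lines
        contig, bin_name = line.split()[:2]
        bins[bin_name].add(contig)
    return dict(bins)


def fasta_ids(handle) -> set[str]:
    """Collect sequence identifiers: the first word of each '>' header line."""
    return {
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and line[1:].strip()
    }


# Stand-ins for bin_set2.tsv and an incomplete assembly.fasta.
table = io.StringIO(
    "contig_1\tbin.0\ncontig_8\tbin.0\ncontig_15\tbin.1\n"
    "contig_9\tbin.2\ncontig_10\tbin.0\n"
)
assembly = io.StringIO(
    ">contig_1\nACGT\n>contig_8\nACGT\n>contig_15\nACGT\n>contig_9\nACGT\n"
)

bins = read_contig2bin(table)
binned_contigs = set().union(*bins.values())
missing = binned_contigs - fasta_ids(assembly)
print("contigs missing from the assembly:", sorted(missing))
```

With real files, the two `io.StringIO` objects would be replaced by `open("bin_set2.tsv")` and `open("assembly.fasta")`.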

### Outputs

Binette results are stored in the `results` directory. You can specify a different directory using the `--outdir` option.

In this directory you will find:
- **`final_bins_quality_reports.tsv`**: This is a TSV (tab-separated values) file containing quality information about the final selected bins.
- **`final_bins/`**: This directory stores all the selected bins in FASTA format. Writing these files can be skipped with `--no-write-fasta-bins`.
- **`final_contig_to_bin.tsv`**: A headerless TSV file mapping each contig to its assigned bin. It describes the final Binette bins in a much lighter form than the FASTA output.
- **`input_bins_quality_reports/`**: A directory storing quality reports for the input bin sets, with files following the same structure as `final_bins_quality_reports.tsv`.
- **`temporary_files/`**: This directory contains intermediate files. With the `--resume` option, Binette reuses the files in this directory to avoid recomputing time-consuming steps.

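As a minimal sketch of downstream use, the quality report can be filtered with Python's standard `csv` module. The snippet assumes the report has a header row carrying the `name`, `completeness` and `contamination` columns. The sample rows are invented for illustration, and the thresholds and the `select_bins` helper are hypothetical choices, not Binette defaults.

```python
import csv
import io

# Invented sample standing in for results/final_bins_quality_reports.tsv.
report = io.StringIO(
    "name\tcompleteness\tcontamination\n"
    "1\t95.2\t1.1\n"
    "2\t72.4\t0.5\n"
    "3\t98.0\t7.9\n"
)


def select_bins(handle, min_completeness=90.0, max_contamination=5.0):
    """Return names of bins passing both thresholds (hypothetical helper)."""
    rows = csv.DictReader(handle, delimiter="\t")
    return [
        row["name"]
        for row in rows
        if float(row["completeness"]) >= min_completeness
        and float(row["contamination"]) <= max_contamination
    ]


selected = select_bins(report)
print(selected)
```

With a real run, `report` would be replaced by `open("results/final_bins_quality_reports.tsv", newline="")`.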

The `final_bins_quality_reports.tsv` file contains the following columns:

| Column Name        | Description                                                                                                                                    |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| **name**           | The unique name of the bin.                                                                                                                    |
| **origin**         | Indicates the source of the bin: either an original bin set (e.g., `B`) or `binette` for intermediate bins.                                    |
| **is\_original**   | Boolean flag indicating if the bin is an original bin (`True`) or an intermediate bin (`False`).                                               |
| **original\_name** | The name of the original bin from which this bin was derived.                                                                                  |
| **completeness**   | The completeness of the bin, determined by CheckM2.                                                                                            |
| **contamination**  | The contamination of the bin, determined by CheckM2.                                                                                           |
| **checkm2\_model** | The CheckM2 model used for quality prediction: `Gradient Boost (General Model)` or `Neural Network (Specific Model)`.|
| **score**          | Computed score: `completeness - contamination * weight`. The contamination weight can be customized using the `--contamination_weight` option. |
| **size**           | Total size of the bin in nucleotides.                                                                                                          |
| **N50**            | The N50 of the bin, representing the length for which 50% of the total nucleotides are in contigs of that length or longer.                    |
| **coding\_density** | The percentage of the bin that codes for proteins (total gene length / total bin length × 100). Only computed when genes are freshly identified. Empty when using the `--proteins` or `--resume` options. |
| **contig\_count**  | Number of contigs contained within the bin.                                                                                                    |
   

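The `score`, `N50` and `coding_density` columns follow directly from the formulas in the table. The sketch below restates them in Python as an aid to reading the report. The function names are hypothetical, the contamination weight of `2.0` is only an example value, and this is not Binette's own code.

```python
def bin_score(completeness: float, contamination: float, weight: float) -> float:
    """Score as described in the table: completeness - contamination * weight."""
    return completeness - contamination * weight


def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of the bin."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty bin


def coding_density(gene_lengths: list[int], bin_size: int) -> float:
    """Percentage of the bin covered by genes: total gene length / bin size * 100."""
    return 100 * sum(gene_lengths) / bin_size


# Example values, chosen only to make the arithmetic easy to follow.
print(bin_score(completeness=95.0, contamination=2.0, weight=2.0))
print(n50([100, 200, 300, 400]))
print(coding_density([300, 500], bin_size=1000))
```

For the four contigs above, the total size is 1000 nucleotides. The two longest contigs (400 and 300) together hold 700 of them, which is more than half, so the N50 is 300.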

## Help, feature requests and bug reporting

To report bugs, request new features, or seek help and support, please open an [issue](https://github.com/genotoul-bioinfo/Binette/issues). 


## Licence

This tool is released as open source software under the terms of the [GNU General Public Licence](LICENSE).

## Citation  

Binette is a scientific software tool with a [published paper](https://joss.theoj.org/papers/10.21105/joss.06782) in the [Journal of Open Source Software](https://joss.theoj.org/). If you use Binette in academic research, please cite:  

> **Binette: a fast and accurate bin refinement tool to construct high-quality Metagenome Assembled Genomes.**  
> Mainguy et al., (2024).    
> *Journal of Open Source Software, 9(102), 6782.*    
> doi: [10.21105/joss.06782](https://doi.org/10.21105/joss.06782)   


Binette extensively uses **CheckM2**. If your work relies on Binette, consider citing:  

> **CheckM2: a rapid, scalable, and accurate tool for assessing microbial genome quality using machine learning.**   
> Chklovski, Alex, et al. (2023).  
> *Nature Methods, 20(8), 1203-1212.*    
> doi: [10.1038/s41592-023-01940-w](https://doi.org/10.1038/s41592-023-01940-w)



            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "Binette",
    "maintainer": "Jean Mainguy",
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "Bioinformatics, Prokaryote, Binning, Refinement, Metagenomics",
    "author": "Jean Mainguy",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/f2/03/1261165ae5abc5bcb8779b0059d6dac4de4868043d71a9dd47bc15f919bd/binette-1.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Binette: accurate binning refinement tool to constructs high quality MAGs.",
    "version": "1.2.1",
    "project_urls": {
        "Documentation": "https://binette.readthedocs.io",
        "Repository": "https://github.com/genotoul-bioinfo/Binette"
    },
    "split_keywords": [
        "bioinformatics",
        " prokaryote",
        " binning",
        " refinement",
        " metagenomics"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a950661c6dbba50a218eed9b30dbda8d194885bd10b31d8628a066138fc4c723",
                "md5": "b1dc05d098125a7b727266f1c0643985",
                "sha256": "a0446fc4fe34fcbb4dabeed1ba744e766381d0eef5a2fe95b44a54285775dbfd"
            },
            "downloads": -1,
            "filename": "binette-1.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b1dc05d098125a7b727266f1c0643985",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 46282,
            "upload_time": "2025-11-07T10:30:04",
            "upload_time_iso_8601": "2025-11-07T10:30:04.154828Z",
            "url": "https://files.pythonhosted.org/packages/a9/50/661c6dbba50a218eed9b30dbda8d194885bd10b31d8628a066138fc4c723/binette-1.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f2031261165ae5abc5bcb8779b0059d6dac4de4868043d71a9dd47bc15f919bd",
                "md5": "86a16e9565a6618c56e1bf1c91a016fc",
                "sha256": "26acda9889a95497581503eee357c976477a0dcdc2174679fdd94bf68c8917b4"
            },
            "downloads": -1,
            "filename": "binette-1.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "86a16e9565a6618c56e1bf1c91a016fc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 123146,
            "upload_time": "2025-11-07T10:30:05",
            "upload_time_iso_8601": "2025-11-07T10:30:05.344124Z",
            "url": "https://files.pythonhosted.org/packages/f2/03/1261165ae5abc5bcb8779b0059d6dac4de4868043d71a9dd47bc15f919bd/binette-1.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-07 10:30:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "genotoul-bioinfo",
    "github_project": "Binette",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "binette"
}
        