remag

Name	remag
Version	0.2.3
home_page	None
Summary	Recovery of high-quality eukaryotic genomes from complex metagenomes
upload_time	2025-08-20 09:45:19
maintainer	None
docs_url	None
author	None
requires_python	>=3.8
license	MIT
keywords	metagenomics binning neural networks contrastive learning bioinformatics
requirements	numpy pandas pysam rich-click torch loguru scikit-learn tqdm xgboost joblib psutil leidenalg igraph

# REMAG

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.16443991.svg)](https://doi.org/10.5281/zenodo.16443991)

**RE**covery of eukaryotic genomes using contrastive learning. A specialized metagenomic binning tool designed for recovering high-quality eukaryotic genomes from mixed prokaryotic-eukaryotic samples.

## Quick Start

### Option 1: Using Conda (Recommended - handles all dependencies)
```bash
# Create environment and install everything
conda create -n remag -c bioconda -c conda-forge remag miniprot
conda activate remag

# Run REMAG
remag -f contigs.fasta -c alignments.bam -o output_directory
```

### Option 2: Using Docker (No local installation needed)
```bash
docker run --rm -v $(pwd):/data danielzmbp/remag:latest \
  -f /data/contigs.fasta -c /data/alignments.bam -o /data/output
```

### Option 3: Using pip
```bash
# Create environment first
conda create -n remag python=3.9
conda activate remag

# Install dependencies and REMAG
conda install -c bioconda miniprot
pip install remag

# Run REMAG
remag -f contigs.fasta -c alignments.bam -o output_directory
```

## Installation

### Recommended: Conda Installation

This is the easiest method as conda handles all dependencies automatically:

```bash
# Create a new environment with all dependencies
conda create -n remag -c bioconda -c conda-forge remag miniprot
conda activate remag

# Verify installation
remag --help
```

### Alternative: PyPI Installation

If you prefer pip, you'll need to install the external dependency (miniprot) separately:

```bash
# Step 1: Create and activate environment
conda create -n remag python=3.9
conda activate remag

# Step 2: Install external dependency
conda install -c bioconda miniprot

# Step 3: Install REMAG from PyPI
pip install remag
```

### Advanced Conda Setup

To add optional plotting support on top of the core installation:

```bash
# Basic installation
conda create -n remag -c bioconda -c conda-forge remag miniprot
conda activate remag

# Add optional plotting capabilities
conda install -c conda-forge matplotlib umap-learn
```

**Note**: Bioconda installs only the core dependencies. Optional features (plotting, GPU acceleration) must be installed separately using conda or pip extras.

### Using Docker

```bash
# Pull and run the latest version
docker run --rm -v $(pwd):/data danielzmbp/remag:latest \
  -f /data/contigs.fasta -c /data/alignments.bam -o /data/output

# Or use a specific version
docker run --rm -v $(pwd):/data danielzmbp/remag:0.2.2 \
  -f /data/contigs.fasta -c /data/alignments.bam -o /data/output

# For interactive use
docker run -it --rm -v $(pwd):/data danielzmbp/remag:latest /bin/bash
```

### Using Singularity

```bash
# Pull and run the latest version directly
singularity run docker://danielzmbp/remag:latest \
  -f contigs.fasta -c alignments.bam -o output_directory

# Build Singularity image from Docker Hub
singularity build remag_v0.2.2.sif docker://danielzmbp/remag:v0.2.2

# Or build latest version
singularity build remag_latest.sif docker://danielzmbp/remag:latest

# Run with Singularity
singularity run --bind $(pwd):/data remag_v0.2.2.sif \
  -f /data/contigs.fasta -c /data/alignments.bam -o /data/output

# Or use exec for direct command execution
singularity exec --bind $(pwd):/data remag_v0.2.2.sif \
  remag -f /data/contigs.fasta -c /data/alignments.bam -o /data/output

# For interactive shell
singularity shell --bind $(pwd):/data remag_v0.2.2.sif

# Build a local Singularity image file (optional)
singularity build remag.sif docker://danielzmbp/remag:latest
singularity run remag.sif -f contigs.fasta -c alignments.bam -o output_directory
```

### From source

```bash
# Create and activate conda environment
conda create -n remag python=3.9
conda activate remag

# Clone and install
git clone https://github.com/danielzmbp/remag.git
cd remag
pip install .
```

### Development installation

For contributors and developers:

```bash
# Install with development dependencies
pip install -e ".[dev]"
```

### Optional Features Installation

For visualization capabilities:

```bash
# Install with plotting dependencies
pip install "remag[plotting]"
```


## Usage

### Command line interface

After installation, you can use REMAG via the command line:

```bash
remag -f contigs.fasta -c alignments.bam -o output_directory
```

### Python module mode

```bash
python -m remag -f contigs.fasta -c alignments.bam -o output_directory
```
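
To launch REMAG from a Python script, the same command line can be assembled as an argument list and handed to `subprocess`. The sketch below is only an illustration: `build_remag_command` and `run_remag` are hypothetical helper names, not part of the REMAG package, and only the documented `-f`, `-c`, `-o`, and `-t` flags are used.

```python
import subprocess
from typing import List, Optional


def build_remag_command(fasta: str, coverage: str, output_dir: str,
                        threads: Optional[int] = None) -> List[str]:
    """Assemble the argument list for a REMAG run (hypothetical helper)."""
    cmd = ["remag", "-f", fasta, "-c", coverage, "-o", output_dir]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


def run_remag(fasta: str, coverage: str, output_dir: str,
              threads: Optional[int] = None) -> subprocess.CompletedProcess:
    """Run REMAG and raise CalledProcessError on a non-zero exit status."""
    cmd = build_remag_command(fasta, coverage, output_dir, threads)
    # Passing a list (no shell=True) hands a pattern such as "samples/*.bam"
    # to REMAG unexpanded, which mirrors quoting the pattern in a shell.
    return subprocess.run(cmd, check=True)
```

For example, `run_remag("contigs.fasta", "samples/*.bam", "output_directory", threads=8)` would start a run, provided `remag` is on the `PATH`.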

## How REMAG Works

REMAG runs a six-stage pipeline designed for eukaryotic genome recovery:

1. **Bacterial Pre-filtering**: By default, REMAG automatically filters out bacterial contigs using the integrated 4CAC classifier (can be disabled with `--skip-bacterial-filter`)
2. **Feature Extraction**: Combines k-mer composition (4-mers) with coverage profiles across multiple samples. Large contigs are split into overlapping fragments for augmentation during training
3. **Contrastive Learning**: Trains a Siamese neural network using the Barlow Twins self-supervised loss function. This creates embeddings where fragments from the same contig are close together
4. **Clustering**: Applies graph-based Leiden clustering to the learned contig embeddings to form bins
5. **Quality Assessment**: Uses miniprot to align bins against a database of eukaryotic core genes to detect contamination
6. **Iterative Refinement**: Automatically splits contaminated bins based on core gene duplications to improve bin quality
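
The feature extraction step can be pictured with a small standard-library sketch. It is not REMAG's implementation: it counts all 256 plain 4-mers without merging reverse complements, and the fragment length passed to `sample_fragments` is an arbitrary value chosen for illustration. Both function names are hypothetical.

```python
import random
from itertools import product

# All 256 possible 4-mers over A, C, G, T, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetranucleotide_frequencies(seq):
    """Return the normalized 4-mer frequency vector of a sequence."""
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[i:i + 4])
        if idx is not None:  # windows containing N or other symbols are skipped
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(KMERS)


def sample_fragments(seq, fragment_len, n_fragments, rng):
    """Draw random, possibly overlapping fragments from one contig."""
    if len(seq) <= fragment_len:
        return [seq]
    starts = [rng.randrange(len(seq) - fragment_len + 1) for _ in range(n_fragments)]
    return [seq[s:s + fragment_len] for s in starts]
```

Fragments drawn from the same contig would serve as positive pairs during training, each represented by its 4-mer vector together with its coverage profile.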

## Key Features

- **Automatic Bacterial Filtering**: The 4CAC classifier automatically identifies and removes bacterial sequences before binning
- **Multi-Sample Support**: Can process coverage information from multiple samples (BAM/CRAM files) simultaneously
- **Barlow Twins Loss**: Uses a self-supervised contrastive learning approach that doesn't require negative pairs
- **Fragment Augmentation**: Large contigs are split into multiple overlapping fragments during training to improve representation learning

## Options

```
  -f, --fasta PATH                Input FASTA file with contigs to bin. Can be gzipped.  [required]
  -c, --coverage PATH             Coverage files for calculation. Supports BAM, CRAM (indexed), and TSV formats. Auto-detects format by extension. Each file represents one sample. Supports space-separated paths and glob patterns (e.g., "*.bam", "*.cram", "*.tsv"). Use quotes around glob patterns.
  -o, --output PATH               Output directory for results.  [required]
  --epochs INTEGER RANGE          Training epochs for neural network.  [default: 400; 20<=x<=2000]
  --batch-size INTEGER RANGE      Batch size for training.  [default: 2048; 16<=x<=8192]
  --embedding-dim INTEGER RANGE   Embedding dimension for contrastive learning.  [default: 256; 64<=x<=512]
  --base-learning-rate FLOAT RANGE
                                  Base learning rate for contrastive learning training (scaled by batch size).  [default: 0.008; 0.00001<=x<=0.1]
  --min-cluster-size INTEGER RANGE
                                  Minimum number of contigs required to form a cluster/bin.  [default: 2; 2<=x<=100]
  --leiden-resolution FLOAT       Resolution parameter for Leiden clustering (higher = more clusters).  [default: 1.0; 0.1<=x<=5.0]
  --leiden-k-neighbors INTEGER    Number of nearest neighbors for k-NN graph construction in Leiden clustering.  [default: 15; 5<=x<=100]
  --leiden-similarity-threshold FLOAT
                                  Minimum cosine similarity threshold for k-NN graph edges in Leiden clustering.  [default: 0.1; 0.0<=x<=1.0]
  --min-contig-length INTEGER RANGE
                                  Minimum contig length in base pairs for binning consideration.  [default: 1000; 500<=x<=10000]
  --max-positive-pairs INTEGER RANGE
                                  Maximum number of positive pairs for contrastive learning training.  [default: 5000000; 100000<=x<=10000000]
  -t, --threads INTEGER RANGE     Number of CPU cores to use for parallel processing.  [default: 8; 1<=x<=64]
  --min-bin-size INTEGER RANGE    Minimum total bin size in base pairs for output.  [default: 100000; 50000<=x<=10000000]
  -v, --verbose                   Enable verbose logging.
  --skip-bacterial-filter         Skip bacterial contig filtering (4CAC classifier + contrastive learning).
  --skip-refinement               Skip bin refinement.
  --max-refinement-rounds INTEGER RANGE
                                  Maximum refinement rounds.  [default: 2; 1<=x<=10]
  --num-augmentations INTEGER RANGE
                                  Number of random fragments per contig.  [default: 8; 1<=x<=32]
  --keep-intermediate             Keep intermediate files (training fragments, etc.).
  -h, --help                      Show this message and exit.
```
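
The `--leiden-k-neighbors` and `--leiden-similarity-threshold` options both describe a k-nearest-neighbor graph over the contig embeddings. The following sketch shows one way such a graph could be built with cosine similarity; it is a brute-force illustration with hypothetical function names, not REMAG's code, and it stops before the Leiden step itself.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_edges(embeddings, k=15, min_similarity=0.1):
    """Return undirected weighted edges {(i, j): similarity} with i < j.

    Each node proposes edges to its k most similar neighbors; proposals
    below min_similarity are dropped.
    """
    edges = {}
    for i, u in enumerate(embeddings):
        sims = [(cosine_similarity(u, v), j)
                for j, v in enumerate(embeddings) if j != i]
        sims.sort(reverse=True)
        for sim, j in sims[:k]:
            if sim >= min_similarity:
                edges[(min(i, j), max(i, j))] = sim
    return edges
```

The resulting edge list could then be loaded into an igraph `Graph` and partitioned with leidenalg, for example with a resolution-aware partition type so that `--leiden-resolution` has an effect. Raising `k` or lowering the threshold yields a denser graph and typically fewer, larger bins.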

## Output

REMAG produces several output files:

### Core output files (created on every run unless noted):
- `bins/`: Directory containing FASTA files for each bin
- `bins.csv`: Final contig-to-bin assignments
- `remag.log`: Detailed log file
- `*_non_bacterial_filtered.fasta`: Filtered FASTA file with bacterial contigs removed (when bacterial filtering is enabled)
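
A quick way to inspect a finished run is to tally the contigs and total length of each bin in `bins/`. The sketch below assumes the bin files are uncompressed FASTA and makes no assumption about their extension; `summarize_bins` is a hypothetical helper, not part of REMAG.

```python
from pathlib import Path


def fasta_stats(path):
    """Return (number of contigs, total bases) for one uncompressed FASTA file."""
    n_contigs, n_bases = 0, 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                n_contigs += 1
            else:
                n_bases += len(line.strip())
    return n_contigs, n_bases


def summarize_bins(bins_dir):
    """Map each file name in the bins directory to its (contigs, bases) tuple."""
    return {
        path.name: fasta_stats(path)
        for path in sorted(Path(bins_dir).iterdir())
        if path.is_file()
    }
```

Calling `summarize_bins("output_directory/bins")` returns a dictionary that can be sorted by total length to find the largest bins.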

### Additional files (with `--keep-intermediate` option):
- `embeddings.csv`: Contig embeddings from the neural network
- `siamese_model.pt`: Trained Siamese neural network model
- `params.json`: Complete run parameters for reproducibility
- `features.csv`: Extracted k-mer and coverage features
- `fragments.pkl`: Fragment information used during training
- `classification_results.csv`: 4CAC bacterial classification results
- `refinement_summary.json`: Summary of the bin refinement process
- `gene_mappings_cache.json`: Cached gene-to-contig mappings for faster refinement
- `core_gene_duplication_results.json`: Core gene duplication analysis from refinement
- `temp_miniprot/`: Temporary directory for miniprot alignments (removed unless `--keep-intermediate` is set)
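
The refinement files above revolve around core gene duplications. As a rough illustration of the idea, the sketch below flags core genes that align to more than one contig within a bin. The input mapping, the function name, and the returned fields are all hypothetical and do not reflect the layout of REMAG's JSON files.

```python
def duplication_report(gene_hits):
    """Summarize core gene duplications within one bin.

    gene_hits maps a core gene name to the contigs of the bin on which it
    aligned (hypothetical structure, for illustration only).
    """
    # Keep only genes that were found, with each contig counted once.
    found = {gene: sorted(set(contigs))
             for gene, contigs in gene_hits.items() if contigs}
    # A gene present on several contigs hints at a mixed (contaminated) bin.
    duplicated = {gene: contigs for gene, contigs in found.items()
                  if len(contigs) > 1}
    rate = len(duplicated) / len(found) if found else 0.0
    return {
        "genes_found": len(found),
        "duplicated": duplicated,
        "duplication_rate": rate,
    }
```

A bin with a high duplication rate is a candidate for splitting, since genes expected to be single-copy that appear on several contigs suggest more than one genome in the bin. This simple count would also flag a gene fragmented across two contigs, so it is only a first approximation.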

### Visualization (optional, requires plotting dependencies):
To generate UMAP visualization plots:

```bash
# Install plotting dependencies if not already installed
pip install "remag[plotting]"

# Generate UMAP visualization from embeddings
python scripts/plot_features.py --features output_directory/embeddings.csv --clusters output_directory/bins.csv --output output_directory
```

This creates:
- `umap_coordinates.csv`: UMAP projections for visualization
- `umap_plot.pdf`: UMAP visualization plot with cluster assignments


## Requirements

### Core dependencies (always installed):
- Python 3.8+
- PyTorch (≥1.11.0)
- scikit-learn (≥1.0.0)
- XGBoost (≥1.6.0) - for 4CAC classifier
- leidenalg (≥0.9.0) - for graph-based clustering
- igraph (≥0.10.0) - for graph construction in Leiden clustering
- pandas (≥1.3.0)
- numpy (≥1.21.0)
- pysam (≥0.18.0)
- loguru (≥0.6.0)
- tqdm (≥4.62.0)
- rich-click (≥1.5.0)
- joblib (≥1.1.0)
- psutil (≥5.8.0)

### Optional dependencies:
- **For visualization**: matplotlib (≥3.5.0), umap-learn (≥0.5.0)
  - Install with: `pip install "remag[plotting]"`

The package includes a pre-trained 4CAC classifier model for bacterial contig filtering. The 4CAC classifier code and models are adapted from the [Shamir-Lab/4CAC repository](https://github.com/Shamir-Lab/4CAC).

## Acknowledgments

The integrated 4CAC classifier (`xgbclass` module) is adapted from the work by Shamir Lab:

- **Repository**: [Shamir-Lab/4CAC](https://github.com/Shamir-Lab/4CAC)
- **Paper**: Pu L, Shamir R. 4CAC: 4-class classifier of metagenome contigs using machine learning and assembly graphs. Nucleic Acids Res. 2024;52(19):e94.
   

## License

MIT License - see LICENSE file for details.

## Citation

If you use REMAG in your research, please cite:

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.16443991.svg)](https://doi.org/10.5281/zenodo.16443991)

```bibtex
@software{gomez_perez_2025_remag,
  author       = {Gómez-Pérez, Daniel},
  title        = {REMAG: Recovering high-quality Eukaryotic genomes from complex metagenomes},
  year         = 2025,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.16443991},
  url          = {https://doi.org/10.5281/zenodo.16443991}
}
```

Note: The DOI 10.5281/zenodo.16443991 represents all versions and will always resolve to the latest release. A manuscript describing REMAG is in preparation.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "remag",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Daniel G\u00f3mez-P\u00e9rez <daniel.gomez-perez@earlham.ac.uk>",
    "keywords": "metagenomics, binning, neural networks, contrastive learning, bioinformatics",
    "author": null,
    "author_email": "Daniel G\u00f3mez-P\u00e9rez <daniel.gomez-perez@earlham.ac.uk>",
    "download_url": "https://files.pythonhosted.org/packages/9f/69/13066d0a0f1337d54bad1c0a7ebc6bb719c1f7d92568862b7f2dfbcd22df/remag-0.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Recovery of high-quality eukaryotic genomes from complex metagenomes",
    "version": "0.2.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/danielzmbp/remag/issues",
        "Documentation": "https://github.com/danielzmbp/remag",
        "Homepage": "https://github.com/danielzmbp/remag",
        "Repository": "https://github.com/danielzmbp/remag"
    },
    "split_keywords": [
        "metagenomics",
        " binning",
        " neural networks",
        " contrastive learning",
        " bioinformatics"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a2dae806622c2265b51367babb255e5dc4ea19331ef8856821b1ab970bdde816",
                "md5": "da24aa7603efa1c8b92f5726673c18e3",
                "sha256": "e89672d9d79dd1a8a139f955e70180ed58ebb2df405132a68a9cb1112c71106c"
            },
            "downloads": -1,
            "filename": "remag-0.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "da24aa7603efa1c8b92f5726673c18e3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 75414775,
            "upload_time": "2025-08-20T09:45:15",
            "upload_time_iso_8601": "2025-08-20T09:45:15.662701Z",
            "url": "https://files.pythonhosted.org/packages/a2/da/e806622c2265b51367babb255e5dc4ea19331ef8856821b1ab970bdde816/remag-0.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9f6913066d0a0f1337d54bad1c0a7ebc6bb719c1f7d92568862b7f2dfbcd22df",
                "md5": "426a73b4b4318102f194ddf4e7d0d214",
                "sha256": "36f0912301883afca8bf082c182130e832912e2f624eb79b60e74dc48e02cb6c"
            },
            "downloads": -1,
            "filename": "remag-0.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "426a73b4b4318102f194ddf4e7d0d214",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 74963100,
            "upload_time": "2025-08-20T09:45:19",
            "upload_time_iso_8601": "2025-08-20T09:45:19.983877Z",
            "url": "https://files.pythonhosted.org/packages/9f/69/13066d0a0f1337d54bad1c0a7ebc6bb719c1f7d92568862b7f2dfbcd22df/remag-0.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-20 09:45:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "danielzmbp",
    "github_project": "remag",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "pysam",
            "specs": [
                [
                    ">=",
                    "0.18.0"
                ]
            ]
        },
        {
            "name": "rich-click",
            "specs": [
                [
                    ">=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "1.11.0"
                ]
            ]
        },
        {
            "name": "loguru",
            "specs": [
                [
                    ">=",
                    "0.6.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.62.0"
                ]
            ]
        },
        {
            "name": "xgboost",
            "specs": [
                [
                    ">=",
                    "1.6.0"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    ">=",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "psutil",
            "specs": [
                [
                    ">=",
                    "5.8.0"
                ]
            ]
        },
        {
            "name": "leidenalg",
            "specs": [
                [
                    ">=",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "igraph",
            "specs": [
                [
                    ">=",
                    "0.10.0"
                ]
            ]
        }
    ],
    "lcname": "remag"
}
