cloneArmy

- Name: cloneArmy
- Version: 0.2.0
- Summary: Analyze haplotypes from Illumina paired-end amplicon sequencing
- Upload time: 2024-11-24 04:02:50
- Requires Python: >=3.8

# CloneArmy

CloneArmy is a Python package for analyzing haplotypes from Illumina paired-end amplicon sequencing data. It provides a single workflow that processes FASTQ files, aligns reads, identifies sequence variants, and compares samples.

## Features

- Fast paired-end read processing using BWA-MEM
- Quality-based filtering of bases and alignments
- Haplotype identification and frequency analysis
- Statistical comparison between samples with FDR correction
- Interactive visualization of mutation frequencies
- Rich command-line interface with progress tracking
- Comprehensive HTML reports
- Multi-threading support
- Support for full-length sequence analysis

## Installation

```bash
pip install cloneArmy
```

### Requirements

- Python ≥ 3.8
- BWA (must be installed and available in PATH)
- Samtools (must be installed and available in PATH)
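
Because BWA and Samtools are external programs rather than Python dependencies, `pip` will not install them. A minimal sketch for checking that both are reachable before starting a run is shown below. The helper name `find_missing_tools` is hypothetical and not part of cloneArmy, and the executable names `bwa` and `samtools` are assumed to be the usual ones.

```python
import shutil


def find_missing_tools(tools=("bwa", "samtools")):
    """Return the names of the tools that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("bwa and samtools were found on PATH")
```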

## Usage

### Command Line Interface

#### Basic Analysis

```bash
# Basic usage
cloneArmy run /path/to/fastq/directory reference.fasta

# With all options
cloneArmy run /path/to/fastq/directory reference.fasta \
    --threads 8 \
    --output results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --no-report  # Skip HTML report generation
```
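
The `run` command takes a directory of FASTQ files, but this README does not say how forward and reverse reads are matched up. The sketch below shows one plausible pairing scheme, assuming the `<sample>_R1` / `<sample>_R2` naming used in the Python API example further down, with an optional Illumina-style `_001` suffix. The function `pair_fastq_files` and the pattern are illustrative assumptions, not cloneArmy's actual logic.

```python
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz,
# optionally with "_001" before the extension.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+)_R(?P<read>[12])(?:_001)?\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastq_files(filenames):
    """Group FASTQ file names into {sample: (r1, r2)}, skipping unpaired files."""
    found = {}
    for name in sorted(filenames):
        match = READ_PATTERN.match(Path(name).name)
        if match:
            found.setdefault(match["sample"], {})[match["read"]] = name
    # Keep only samples for which both mates were seen.
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in found.items()
        if "1" in reads and "2" in reads
    }
```

To apply it to a real directory, pass the file names, for example `pair_fastq_files(p.name for p in Path("/path/to/fastq/directory").iterdir())`.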

#### Comparative Analysis

```bash
# Compare two samples
cloneArmy compare \
    /path/to/sample1/fastq \
    /path/to/sample2/fastq \
    reference.fasta \
    --threads 8 \
    --output comparison_results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --full-length-only  # Only consider full-length sequences
```
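
The `--min-base-quality` and `--min-mapping-quality` thresholds in the commands above are Phred-scaled: a score Q corresponds to an error probability of 10^(-Q/10). A base quality of 20 therefore means roughly a 1 in 100 chance that the base call is wrong, and a mapping quality of 30 means roughly a 1 in 1000 chance that the read is misplaced. The sketch below illustrates the conversion and one possible base-level filter. Masking low-quality bases with `N` is an assumption made for illustration; this README does not state whether cloneArmy masks or discards them, and the function names are hypothetical.

```python
def phred_to_error_probability(quality):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-quality / 10)


def decode_base_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(char) - offset for char in quality_string]


def mask_low_quality_bases(sequence, quality_string, min_base_quality=20):
    """Replace bases below the threshold with 'N' (one possible policy)."""
    scores = decode_base_qualities(quality_string)
    return "".join(
        base if score >= min_base_quality else "N"
        for base, score in zip(sequence, scores)
    )
```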

### Python API

```python
from pathlib import Path
from clone_army.processor import AmpliconProcessor
from clone_army.comparison import run_comparative_analysis

# Initialize processor
processor = AmpliconProcessor(
    reference_path="reference.fasta",
    min_base_quality=20,
    min_mapping_quality=30
)

# Process samples
results1 = processor.process_sample(
    fastq_r1="sample1_R1.fastq.gz",
    fastq_r2="sample1_R2.fastq.gz",
    output_dir="results/sample1",
    threads=4
)

results2 = processor.process_sample(
    fastq_r1="sample2_R1.fastq.gz",
    fastq_r2="sample2_R2.fastq.gz",
    output_dir="results/sample2",
    threads=4
)

# Perform comparative analysis
comparison_results = run_comparative_analysis(
    results1={"sample1": results1},
    results2={"sample2": results2},
    reference_seq="ATCG...",  # Reference sequence string
    output_path="comparison_results.csv",
    full_length_only=False
)

# Results are returned as pandas DataFrames
print(results1)  # Sample 1 haplotypes
print(comparison_results)  # Statistical comparison
```
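
The `reference_seq` argument above expects the reference as a plain string rather than a file path. A minimal sketch for loading it from the FASTA file is shown below, assuming the amplicon of interest is the first record in the file. The helper `read_first_fasta_sequence` is hypothetical and not part of the cloneArmy API.

```python
def read_first_fasta_sequence(path):
    """Return the first sequence in a FASTA file as an uppercase string."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop when the second record starts
                seen_header = True
            elif line and seen_header:
                parts.append(line)  # sequence lines may be wrapped
    return "".join(parts).upper()
```

The result could then be passed along as `reference_seq=read_first_fasta_sequence("reference.fasta")`.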

## Output Files

### Single Sample Analysis

- Sorted BAM file with alignments
- CSV file containing haplotype information:
  - Sequence
  - Read count
  - Frequency
  - Number of mutations
  - Full-length status
- Interactive HTML report (optional)
- Console output with summary statistics
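
To make the haplotype columns listed above concrete, the sketch below builds a comparable table from a list of read sequences. It is a simplified illustration, not cloneArmy's implementation: the field names are hypothetical, "full-length" is assumed to mean the same length as the reference, and mutations are counted as simple mismatches without any alignment.

```python
from collections import Counter


def summarize_haplotypes(read_sequences, reference):
    """Count identical sequences and describe each one relative to the reference."""
    counts = Counter(read_sequences)
    total = sum(counts.values())
    rows = []
    for sequence, count in counts.most_common():
        # Assumption: a haplotype is full-length if it spans the whole reference.
        full_length = len(sequence) == len(reference)
        # Mismatches are only meaningful here when the lengths agree.
        mutations = (
            sum(1 for a, b in zip(sequence, reference) if a != b)
            if full_length
            else None
        )
        rows.append(
            {
                "sequence": sequence,
                "count": count,
                "frequency": count / total,
                "mutations": mutations,
                "full_length": full_length,
            }
        )
    return rows
```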

### Comparative Analysis

- CSV file with statistical comparisons:
  - Mutation positions and types
  - Frequencies in each sample
  - Statistical significance (p-values)
  - FDR-corrected p-values
- Interactive HTML plot showing mutation frequency differences
- Console output with significant mutations
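
The FDR-corrected p-values account for the fact that many mutation positions are tested at once. This README does not name the correction method, so the sketch below uses the Benjamini-Hochberg procedure as an assumed, commonly used example. Each p-value is scaled by the number of tests divided by its rank, and the adjusted values are then forced to be monotone.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that an adjusted value never
    # exceeds the adjusted value of any larger raw p-value.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted
```

A mutation would typically be reported as significant when its adjusted p-value falls below a chosen threshold such as 0.05; the threshold cloneArmy uses is not stated here.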

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "cloneArmy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Jason D Limberis <Jason.Limberis@ucsf.edu>",
    "download_url": "https://files.pythonhosted.org/packages/aa/5a/7e77eeecc894f2abd3aba243139ee31e056bb56f550b73a069ebdec0f2dc/clonearmy-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Analyze haplotypes from Illumina paired-end amplicon sequencing",
    "version": "0.2.0",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aa5a7e77eeecc894f2abd3aba243139ee31e056bb56f550b73a069ebdec0f2dc",
                "md5": "923a9b6acdcaa97057300020e333e3ae",
                "sha256": "220f37d88e07b90fb7bcd9277138a28fadcac89c06b86145ce9b6e6e40a26e27"
            },
            "downloads": -1,
            "filename": "clonearmy-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "923a9b6acdcaa97057300020e333e3ae",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 15158,
            "upload_time": "2024-11-24T04:02:50",
            "upload_time_iso_8601": "2024-11-24T04:02:50.136715Z",
            "url": "https://files.pythonhosted.org/packages/aa/5a/7e77eeecc894f2abd3aba243139ee31e056bb56f550b73a069ebdec0f2dc/clonearmy-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-24 04:02:50",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "clonearmy"
}
        