URAdime


- Name: URAdime
- Version: 0.2.4
- Summary: Universal Read Analysis of DIMErs
- Upload time: 2025-01-01 15:25:21
- Requires Python: >=3.7
- License: GPL-3.0
- Keywords: pcr, primers, dna, sequencing, analysis

[![Python Tests](https://github.com/SemiQuant/URAdime/actions/workflows/python-app.yml/badge.svg)](https://github.com/SemiQuant/URAdime/actions/workflows/python-app.yml)

# URAdime

URAdime (Universal Read Analysis of DIMErs) is a Python package for analyzing primer sequences in sequencing data to identify dimers and chimeras.

## Installation

```bash
pip install uradime
```

## Usage

URAdime can be used both as a command-line tool and as a Python package.

### Command Line Interface

```bash
# Basic usage
uradime -b input.bam -p primers.tsv -o results/my_analysis

# Full options. The descriptions come first because a comment placed after a
# trailing backslash breaks shell line continuation.
#   -b                      Input BAM file
#   -p                      Primer file (tab-separated)
#   -o                      Output prefix
#   -t                      Number of threads
#   -m                      Maximum reads to process (0 for all)
#   -c                      Chunk size for parallel processing
#   -u, --unaligned-only    Process only unaligned reads
#   --max-distance          Maximum Levenshtein distance for matching
#   --window-size           Allowed padding on the 5' ends of the reads; sometimes needs to be large
#                           because of universal tails, but very large values can cause unexpected results
#   --ignore-amplicon-size  Useful for short-read sequencing (e.g. Illumina), where the paired
#                           read length is not the size of the actual amplicon
#   --check-termini         Turn off check for partial matches at read termini
#   --terminus-length       Length of terminus to check for partial matches
#   --overlap-threshold     Minimum fraction of overlap required to consider primers as
#                           overlapping (0.0-1.0); added for hissPCR support
#   --downsample            Percentage of reads to randomly sample from the BAM file (0.1-100.0)
#   --filtered-bam          Output BAM file containing only correctly matched and sized reads
#   -v                      Verbose output
uradime \
    -b input.bam \
    -p primers.tsv \
    -o results/my_analysis \
    -t 8 \
    -m 1000 \
    -c 100 \
    -u \
    --max-distance 2 \
    --window-size 20 \
    --ignore-amplicon-size \
    --check-termini \
    --terminus-length 14 \
    --overlap-threshold 0.8 \
    --downsample 5.0 \
    --filtered-bam filtered.bam \
    -v
```
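
When the CLI is driven from a script, assembling the argument list in Python sidesteps shell quoting and line-continuation issues. The following is a minimal sketch that uses only flags listed above; the helper names `build_uradime_command` and `run_uradime` and their default values are illustrative and are not part of URAdime.

```python
import subprocess
from typing import List, Optional


def build_uradime_command(
    bam: str,
    primers: str,
    out_prefix: str,
    threads: int = 4,
    max_distance: int = 2,
    window_size: int = 20,
    downsample: Optional[float] = None,
    ignore_amplicon_size: bool = False,
) -> List[str]:
    """Assemble a uradime argument list (hypothetical helper, not part of URAdime)."""
    cmd = [
        "uradime",
        "-b", bam,
        "-p", primers,
        "-o", out_prefix,
        "-t", str(threads),
        "--max-distance", str(max_distance),
        "--window-size", str(window_size),
    ]
    # Optional flags are appended only when requested.
    if downsample is not None:
        cmd += ["--downsample", str(downsample)]
    if ignore_amplicon_size:
        cmd.append("--ignore-amplicon-size")
    return cmd


def run_uradime(cmd: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Passing the list straight to `subprocess.run` (without `shell=True`) means file names containing spaces need no extra quoting.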



### Python Package

```python
from uradime import bam_to_fasta_parallel, create_analysis_summary, load_primers, parallel_analysis_pipeline

# Basic usage
result_df = bam_to_fasta_parallel(
    bam_path="your_file.bam",
    primer_file="primers.tsv",
    num_threads=4
)

# Advanced usage with all parameters
result_df = bam_to_fasta_parallel(
    bam_path="your_file.bam",
    primer_file="primers.tsv",
    window_size=20,              # Allowed padding on 5' ends
    unaligned_only=False,        # Process only unaligned reads
    max_reads=200,               # Maximum reads to process (0 for all)
    num_threads=4,               # Number of threads
    chunk_size=50,               # Reads per chunk for parallel processing
    downsample_percentage=100.0, # Percentage of reads to analyze
    max_distance=2,              # Maximum Levenshtein distance for matching
    overlap_threshold=0.8        # Minimum primer overlap fraction
)

# Load primers for analysis
primers_df, _ = load_primers("primers.tsv")

# Create analysis summary
summary_df, matched_pairs, mismatched_pairs = create_analysis_summary(
    result_df,
    primers_df,
    ignore_amplicon_size=False,  # Ignore amplicon size checks
    debug=False,                 # Print debug information
    size_tolerance=0.10          # Size tolerance as fraction of expected size
)

# Complete analysis pipeline
results = parallel_analysis_pipeline(
    bam_path="your_file.bam",
    primer_file="primers.tsv",
    window_size=20,
    num_threads=4,
    max_reads=200,
    chunk_size=50,
    ignore_amplicon_size=False,
    max_distance=2,
    downsample_percentage=100.0,
    unaligned_only=False,
    debug=False,
    size_tolerance=0.10,
    overlap_threshold=0.8
)

# Access pipeline results
result_df = results['results']           # Complete analysis results
summary_df = results['summary']          # Analysis summary
matched_pairs = results['matched_pairs'] # Reads with matching primer pairs
mismatched_pairs = results['mismatched_pairs'] # Reads with mismatched primers
```
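
To make the `window_size` and `max_distance` parameters more concrete, the sketch below shows one way a primer could be searched for near the 5' end of a read using Levenshtein (edit) distance. It is a pure-Python illustration under stated assumptions, not URAdime's actual matching code, which may differ in detail; the function names `levenshtein` and `find_primer_near_start` are hypothetical.

```python
from typing import Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def find_primer_near_start(
    read: str,
    primer: str,
    window_size: int = 20,
    max_distance: int = 2,
) -> Optional[Tuple[int, int]]:
    """Return (offset, distance) of the best primer match that starts within
    `window_size` bases of the read's 5' end, or None if no candidate is
    within `max_distance` edits."""
    best = None
    last_start = min(window_size, max(len(read) - len(primer), 0))
    for start in range(last_start + 1):
        candidate = read[start:start + len(primer)]
        distance = levenshtein(candidate, primer)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (start, distance)
    return best
```

In this sketch, a larger `window_size` tolerates longer adapters or universal tails before the primer, at the cost of more candidate positions and a higher chance of spurious matches.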

## Input Files

### Primer File Format (TSV)
The primer file should be tab-separated with the following columns:
- Name: Primer pair name
- Forward: Forward primer sequence
- Reverse: Reverse primer sequence
- Size: Expected amplicon size

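Example (columns are aligned with spaces here for readability; the actual file must separate them with tab characters):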
```
Name    Forward             Reverse             Size
Pair1   ATCGATCGATCG       TAGCTAGCTAGC       100
Pair2   GCTAGCTAGCTA       CGATTCGATCGA       150
```
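
Before running an analysis it can help to check that a primer file really is tab-separated and has the four columns described above. The following is a minimal sketch using only the standard library; `validate_primer_tsv` is a hypothetical helper, not a URAdime function, and the accepted alphabet (A, C, G, T, N) is an assumption that may need widening for degenerate primers.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ("Name", "Forward", "Reverse", "Size")
VALID_BASES = set("ACGTN")  # assumption: plain DNA plus N; extend for IUPAC codes


def validate_primer_tsv(handle) -> List[Dict[str, str]]:
    """Parse a tab-separated primer table and raise ValueError on obvious problems."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column in ("Forward", "Reverse"):
            sequence = (row[column] or "").strip().upper()
            if not sequence or set(sequence) - VALID_BASES:
                raise ValueError(f"line {line_no}: invalid {column} primer {row[column]!r}")
        if not (row["Size"] or "").strip().isdigit():
            raise ValueError(f"line {line_no}: Size must be a whole number")
        rows.append(row)
    return rows


# Usage with an in-memory file; a real file would be opened with open(path, newline="").
sample = "Name\tForward\tReverse\tSize\nPair1\tATCGATCGATCG\tTAGCTAGCTAGC\t100\n"
primers = validate_primer_tsv(io.StringIO(sample))
```

A file whose columns are separated by spaces rather than tabs fails the column check, which is a common cause of primers not being loaded as expected.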

## Output Files

The tool generates several CSV files with the analysis results:
- `*_summary.csv`: Overall analysis summary
- `*_matched_pairs.csv`: Reads with matching primer pairs
- `*_mismatched_pairs.csv`: Reads with mismatched primer pairs
- `*_wrong_size_pairs.csv`: Reads with correct primer pairs but wrong size
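
The result files can be gathered for downstream work with a few lines of Python. This sketch assumes that the `*` in the file names above is the output prefix passed with `-o`; `load_uradime_outputs` is a hypothetical helper, and no particular column names are assumed.

```python
import csv
from pathlib import Path
from typing import Dict, List

OUTPUT_SUFFIXES = ("summary", "matched_pairs", "mismatched_pairs", "wrong_size_pairs")


def load_uradime_outputs(prefix: str) -> Dict[str, List[dict]]:
    """Read whichever `<prefix>_<suffix>.csv` files exist into lists of row dicts."""
    tables = {}
    for suffix in OUTPUT_SUFFIXES:
        path = Path(f"{prefix}_{suffix}.csv")
        # Files that were not written are skipped rather than treated as errors.
        if path.exists():
            with path.open(newline="") as handle:
                tables[suffix] = list(csv.DictReader(handle))
    return tables
```

For example, `load_uradime_outputs("results/my_analysis")` would return a dictionary keyed by `summary`, `matched_pairs`, and so on, containing only the files that are present.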


## Requirements

- Python ≥3.7
- pysam
- pandas
- biopython
- python-Levenshtein
- tqdm
- numpy
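
If an import error appears after installation, a quick check of which dependencies can be located may narrow it down. The sketch below uses only the standard library; `check_dependencies` is a hypothetical helper. Note that two packages are imported under a different name than they are installed under: biopython as `Bio` and python-Levenshtein as `Levenshtein`.

```python
import importlib.util
from typing import Dict

# Distribution name -> importable top-level module name.
DEPENDENCIES = {
    "pysam": "pysam",
    "pandas": "pandas",
    "biopython": "Bio",
    "python-Levenshtein": "Levenshtein",
    "tqdm": "tqdm",
    "numpy": "numpy",
}


def check_dependencies(modules: Dict[str, str] = DEPENDENCIES) -> Dict[str, bool]:
    """Map each distribution name to True if its module can be located."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }
```

Calling `check_dependencies()` returns a dictionary such as `{"pysam": True, ...}`; any entry that is `False` points at a package to reinstall.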

## License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "URAdime",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "PCR, primers, DNA, sequencing, analysis",
    "author": null,
    "author_email": "Jason D Limberis <Jason.Limberis@ucsf.edu>",
    "download_url": "https://files.pythonhosted.org/packages/8d/f7/9ec8ac770988fe743c53ca1042024ed2865ab848dee8df998810bd9ed2f8/uradime-0.2.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0",
    "summary": "Universal Read Analysis of DIMErs",
    "version": "0.2.4",
    "project_urls": {
        "Bug Tracker": "https://github.com/SemiQuant/URAdime/issues",
        "Homepage": "https://github.com/SemiQuant/URAdime"
    },
    "split_keywords": [
        "pcr",
        " primers",
        " dna",
        " sequencing",
        " analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8df79ec8ac770988fe743c53ca1042024ed2865ab848dee8df998810bd9ed2f8",
                "md5": "722fe34b45218107c63ac625cd60a582",
                "sha256": "4e8fc7e0c1036caea0335962e4502369f19280ec71fe2d591c881af441a24ec3"
            },
            "downloads": -1,
            "filename": "uradime-0.2.4.tar.gz",
            "has_sig": false,
            "md5_digest": "722fe34b45218107c63ac625cd60a582",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 19256,
            "upload_time": "2025-01-01T15:25:21",
            "upload_time_iso_8601": "2025-01-01T15:25:21.632097Z",
            "url": "https://files.pythonhosted.org/packages/8d/f7/9ec8ac770988fe743c53ca1042024ed2865ab848dee8df998810bd9ed2f8/uradime-0.2.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-01 15:25:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SemiQuant",
    "github_project": "URAdime",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "uradime"
}
        