| Field | Value |
|---|---|
| Name | URAdime |
| Version | 0.2.4 |
| home_page | None |
| Summary | Universal Read Analysis of DIMErs |
| upload_time | 2025-01-01 15:25:21 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.7 |
| license | GPL-3.0 |
| keywords | pcr, primers, dna, sequencing, analysis |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

[![Python Tests](https://github.com/SemiQuant/URAdime/actions/workflows/python-app.yml/badge.svg)](https://github.com/SemiQuant/URAdime/actions/workflows/python-app.yml)
# URAdime
URAdime (Universal Read Analysis of DIMErs) is a Python package for analyzing primer sequences in sequencing data to identify dimers and chimeras.
## Installation
```bash
pip install uradime
```
## Usage
URAdime can be used both as a command-line tool and as a Python package.
### Command Line Interface
```bash
# Basic usage
uradime -b input.bam -p primers.tsv -o results/my_analysis
# Full options. They are collected in a bash array so that each one can carry
# a comment; a comment placed after a trailing backslash breaks the command.
args=(
  -b input.bam                  # Input BAM file
  -p primers.tsv                # Primer file (tab-separated)
  -o results/my_analysis        # Output prefix
  -t 8                          # Number of threads
  -m 1000                       # Maximum reads to process (0 for all)
  -c 100                        # Chunk size for parallel processing
  -u                            # Process only unaligned reads
  --max-distance 2              # Maximum Levenshtein distance for matching
  --unaligned-only              # Only check the unaligned reads (same description as -u)
  --window-size 20              # Allowed padding on the 5' ends of the reads. Sometimes needs to be
                                # very large because of universal tails etc.; setting it too large
                                # can cause unexpected results
  --ignore-amplicon-size        # Useful for short-read sequencing such as Illumina, where the
                                # paired read length is not the size of the actual amplicon
  --check-termini               # Turn off check for partial matches at read termini
  --terminus-length 14          # Length of terminus to check for partial matches
  --overlap-threshold 0.8       # Minimum fraction of overlap required to consider primers as
                                # overlapping (0.0-1.0); added for hissPCR support
  --downsample 5.0              # Percentage of reads to randomly sample from the BAM file (0.1-100.0)
  --filtered-bam filtered.bam   # Output BAM file containing only correctly matched and sized reads
  -v                            # Verbose output
)
uradime "${args[@]}"
```
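
To make the `--max-distance` and `--window-size` options more concrete, here is a minimal sketch of windowed primer matching with Levenshtein distance. It is an illustration only, not URAdime's actual implementation: the helper names `levenshtein` and `find_primer` are hypothetical, and the real package may search the window differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def find_primer(read: str, primer: str, window_size: int = 20, max_distance: int = 2):
    """Return (offset, distance) for the best primer match that starts within
    the first `window_size` bases of the read, or None if nothing is within
    `max_distance` edits. Hypothetical helper, for illustration only."""
    best = None
    last_start = min(window_size, len(read) - len(primer))
    for offset in range(0, last_start + 1):
        candidate = read[offset:offset + len(primer)]
        distance = levenshtein(candidate, primer)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (offset, distance)
    return best


# A read with 4 bases of padding before the primer
hit = find_primer("GGGG" + "ATCGATCGATCG" + "TTTT", "ATCGATCGATCG")
```

In this sketch a larger window lets the primer sit further from the read start (for example behind a universal tail), at the cost of more candidate positions and a higher chance of spurious matches.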
### Python Package
```python
from uradime import (
    bam_to_fasta_parallel,
    create_analysis_summary,
    load_primers,
    parallel_analysis_pipeline,
)

# Basic usage
result_df = bam_to_fasta_parallel(
    bam_path="your_file.bam",
    primer_file="primers.tsv",
    num_threads=4
)

# Advanced usage with all parameters
result_df = bam_to_fasta_parallel(
    bam_path="your_file.bam",
    primer_file="primers.tsv",
    window_size=20,               # Allowed padding on 5' ends
    unaligned_only=False,         # Process only unaligned reads
    max_reads=200,                # Maximum reads to process (0 for all)
    num_threads=4,                # Number of threads
    chunk_size=50,                # Reads per chunk for parallel processing
    downsample_percentage=100.0,  # Percentage of reads to analyze
    max_distance=2,               # Maximum Levenshtein distance for matching
    overlap_threshold=0.8         # Minimum primer overlap fraction
)

# Load primers for analysis
primers_df, _ = load_primers("primers.tsv")

# Create analysis summary
summary_df, matched_pairs, mismatched_pairs = create_analysis_summary(
    result_df,
    primers_df,
    ignore_amplicon_size=False,   # Ignore amplicon size checks
    debug=False,                  # Print debug information
    size_tolerance=0.10           # Size tolerance as fraction of expected size
)

# Complete analysis pipeline
results = parallel_analysis_pipeline(
    bam_path="your_file.bam",
    primer_file="primers.tsv",
    window_size=20,
    num_threads=4,
    max_reads=200,
    chunk_size=50,
    ignore_amplicon_size=False,
    max_distance=2,
    downsample_percentage=100.0,
    unaligned_only=False,
    debug=False,
    size_tolerance=0.10,
    overlap_threshold=0.8
)

# Access pipeline results
result_df = results['results']                  # Complete analysis results
summary_df = results['summary']                 # Analysis summary
matched_pairs = results['matched_pairs']        # Reads with matching primer pairs
mismatched_pairs = results['mismatched_pairs']  # Reads with mismatched primers
```
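
The `size_tolerance` parameter is described above as a fraction of the expected amplicon size. The sketch below shows one plausible reading of that description: a read passes when its length is within `expected * size_tolerance` of the expected size. The function `size_within_tolerance` is a hypothetical helper, and the exact comparison URAdime performs is an assumption here.

```python
def size_within_tolerance(observed: int, expected: int, size_tolerance: float = 0.10) -> bool:
    """Return True when the observed length deviates from the expected amplicon
    size by no more than `size_tolerance` (a fraction of the expected size).
    Assumed interpretation, not taken from the URAdime source."""
    allowed_deviation = expected * size_tolerance
    return abs(observed - expected) <= allowed_deviation


# With the default 10% tolerance, a 100 bp amplicon accepts roughly 90-110 bp reads
ok = size_within_tolerance(observed=105, expected=100)
too_long = size_within_tolerance(observed=120, expected=100)
```

Under this reading, reads that carry a correct primer pair but fail the check would be the ones reported as wrong-size pairs, unless `ignore_amplicon_size` is set.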
## Input Files
### Primer File Format (TSV)
The primer file should be tab-separated with the following columns:
- Name: Primer pair name
- Forward: Forward primer sequence
- Reverse: Reverse primer sequence
- Size: Expected amplicon size
Example:
```
Name Forward Reverse Size
Pair1 ATCGATCGATCG TAGCTAGCTAGC 100
Pair2 GCTAGCTAGCTA CGATTCGATCGA 150
```
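
Because the file must be tab-separated, it can help to check a primer table before running an analysis. The following is a minimal sketch using only the standard library. The function `read_primer_table` is hypothetical and is not part of URAdime; accepting IUPAC ambiguity codes in primer sequences is an assumption, so tighten the alphabet if your primers are plain A/C/G/T.

```python
import csv
import io

REQUIRED_COLUMNS = ["Name", "Forward", "Reverse", "Size"]
IUPAC_BASES = set("ACGTRYSWKMBDHVN")  # assumption: ambiguity codes are acceptable


def read_primer_table(handle):
    """Parse a tab-separated primer table and validate each row.
    Raises ValueError on missing columns, bad sequences, or non-integer sizes."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns (is the file tab-separated?): {missing}")
    rows = []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column in ("Forward", "Reverse"):
            sequence = (row[column] or "").strip().upper()
            if not sequence or set(sequence) - IUPAC_BASES:
                raise ValueError(f"line {line_number}: invalid {column} sequence")
            row[column] = sequence
        row["Size"] = int(row["Size"])  # ValueError if not an integer
        rows.append(row)
    return rows


# The example table from above, written with real tab characters
example = (
    "Name\tForward\tReverse\tSize\n"
    "Pair1\tATCGATCGATCG\tTAGCTAGCTAGC\t100\n"
    "Pair2\tGCTAGCTAGCTA\tCGATTCGATCGA\t150\n"
)
primers = read_primer_table(io.StringIO(example))
```

For a file on disk, pass an open handle instead, for example `read_primer_table(open("primers.tsv", newline=""))`.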
## Output Files
The tool generates several CSV files with the analysis results:
- `*_summary.csv`: Overall analysis summary
- `*_matched_pairs.csv`: Reads with matching primer pairs
- `*_mismatched_pairs.csv`: Reads with mismatched primer pairs
- `*_wrong_size_pairs.csv`: Reads with correct primer pairs but wrong size
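
As a small convenience, the file names above can be derived from the output prefix. This sketch assumes that the `*` in each name stands for the prefix passed with `-o` (for example `results/my_analysis`); the helper `expected_outputs` is hypothetical and not part of the package.

```python
from pathlib import Path

# Suffixes taken from the list of output files above
OUTPUT_SUFFIXES = {
    "summary": "_summary.csv",
    "matched_pairs": "_matched_pairs.csv",
    "mismatched_pairs": "_mismatched_pairs.csv",
    "wrong_size_pairs": "_wrong_size_pairs.csv",
}


def expected_outputs(prefix):
    """Map each result type to the path it is assumed to be written to."""
    prefix = Path(prefix)
    return {
        label: prefix.parent / f"{prefix.name}{suffix}"
        for label, suffix in OUTPUT_SUFFIXES.items()
    }


paths = expected_outputs("results/my_analysis")
# Report which of the expected files are present after a run
present = {label: path.exists() for label, path in paths.items()}
```

Each existing path can then be loaded with `pandas.read_csv(path)` for further inspection.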
## Requirements
- Python ≥3.7
- pysam
- pandas
- biopython
- python-Levenshtein
- tqdm
- numpy
## License
This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
Raw data
{
"_id": null,
"home_page": null,
"name": "URAdime",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "PCR, primers, DNA, sequencing, analysis",
"author": null,
"author_email": "Jason D Limberis <Jason.Limberis@ucsf.edu>",
"download_url": "https://files.pythonhosted.org/packages/8d/f7/9ec8ac770988fe743c53ca1042024ed2865ab848dee8df998810bd9ed2f8/uradime-0.2.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL-3.0",
"summary": "Universal Read Analysis of DIMErs",
"version": "0.2.4",
"project_urls": {
"Bug Tracker": "https://github.com/SemiQuant/URAdime/issues",
"Homepage": "https://github.com/SemiQuant/URAdime"
},
"split_keywords": [
"pcr",
" primers",
" dna",
" sequencing",
" analysis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8df79ec8ac770988fe743c53ca1042024ed2865ab848dee8df998810bd9ed2f8",
"md5": "722fe34b45218107c63ac625cd60a582",
"sha256": "4e8fc7e0c1036caea0335962e4502369f19280ec71fe2d591c881af441a24ec3"
},
"downloads": -1,
"filename": "uradime-0.2.4.tar.gz",
"has_sig": false,
"md5_digest": "722fe34b45218107c63ac625cd60a582",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 19256,
"upload_time": "2025-01-01T15:25:21",
"upload_time_iso_8601": "2025-01-01T15:25:21.632097Z",
"url": "https://files.pythonhosted.org/packages/8d/f7/9ec8ac770988fe743c53ca1042024ed2865ab848dee8df998810bd9ed2f8/uradime-0.2.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-01 15:25:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SemiQuant",
"github_project": "URAdime",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "uradime"
}