# extracTR
## Introduction
extracTR is a tool for identifying and analyzing tandem repeats in genomic sequences. It works with raw sequencing data (FASTQ) or assembled genomes (FASTA) and uses k-mer-based approaches to detect repetitive patterns efficiently.
## Features
- Efficient tandem repeat detection from raw sequencing data
- Support for single-end and paired-end FASTQ files
- Support for genome assemblies in FASTA format
- Customizable parameters for fine-tuning repeat detection
- Output in easy-to-analyze CSV format
- Multi-threaded processing for improved performance
## Requirements
- Python 3.7 or later
- Jellyfish 2.3.0 or later
- Conda (for easy environment management)
## Installation
We recommend installing extracTR in a separate Conda environment to manage dependencies effectively.
1. Create a new Conda environment:
```bash
conda create -n extractr_env python=3.9
```
2. Activate the environment:
```bash
conda activate extractr_env
```
3. Install Jellyfish:
```bash
conda install -c bioconda jellyfish
```
4. Install extracTR using pip:
```bash
pip install extracTR
```
To deactivate the environment when you're done:
```bash
conda deactivate
```
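After installation, it can help to confirm that both command-line tools are visible in the active environment. The following is a minimal sketch: `check_tool` is a hypothetical helper name, and the check assumes the executables are called `jellyfish` and `extracTR`, as in the commands used in this README.

```bash
# Report whether a tool is on PATH in the currently active environment.
# check_tool is an illustrative helper, not part of extracTR.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# '|| true' keeps the script going so that both tools are always reported.
check_tool jellyfish || true
check_tool extracTR || true
```

If Jellyfish is found, `jellyfish --version` prints its version for comparison with the 2.3.0 minimum, and `pip show extracTR` reports the installed package version.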
## Usage
Before running extracTR, ensure that you have removed adapters from your sequencing reads and activated the Conda environment:
```bash
conda activate extractr_env
```
Basic usage:
For paired-end FASTQ files:
```bash
extracTR -1 reads_1.fastq -2 reads_2.fastq -o output_prefix -c 30
```
For a single-end FASTQ file:
```bash
extracTR -1 reads.fastq -o output_prefix -c 30
```
For a genome assembly in FASTA format:
```bash
extracTR -f genome.fasta -o output_prefix -c 30
```
Advanced usage with custom parameters:
```bash
extracTR -1 reads_1.fastq -2 reads_2.fastq -o output_prefix -t 64 -c 30 -k 25
```
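When several paired-end samples need processing, the paired-end command above can be generated in a loop. The sketch below assumes a hypothetical layout in which each sample has `<sample>_1.fastq` and `<sample>_2.fastq` in one directory. `print_batch_commands` is an illustrative helper that only prints the commands (a dry run) and uses just the flags documented in this README.

```bash
# Print one extracTR command per paired-end sample found in a directory.
# Usage: print_batch_commands <reads_dir> <coverage>
print_batch_commands() {
    reads_dir="$1"
    coverage="$2"
    for fq1 in "$reads_dir"/*_1.fastq; do
        [ -e "$fq1" ] || continue                 # the glob matched nothing
        sample="$(basename "$fq1" _1.fastq)"      # strip the _1.fastq suffix
        fq2="$reads_dir/${sample}_2.fastq"
        if [ ! -e "$fq2" ]; then
            echo "skipping $sample: mate file $fq2 not found" >&2
            continue
        fi
        # Dry run: remove 'echo' to execute the command instead of printing it.
        echo extracTR -1 "$fq1" -2 "$fq2" -o "$sample" -c "$coverage"
    done
}

# Hypothetical directory name; 30 matches the coverage used in the examples.
print_batch_commands reads 30
```

Printing the commands first makes it easy to review file pairing and output prefixes before committing compute time.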
Options:
- `-1, --fastq1`: Input file with forward DNA sequences in FASTQ format
- `-2, --fastq2`: Input file with reverse DNA sequences in FASTQ format (optional; provide it only for paired-end data)
- `-f, --fasta`: Input genome assembly in FASTA format
- `-o, --output`: Prefix for output files
- `-t, --threads`: Number of threads to use (default: 32)
- `-c, --coverage`: Coverage to use for indexing (required)
- `-k, --k`: K-mer size to use for indexing (default: 23)
- `--lu`: Coverage cutoff for k-mers (default: 100 * coverage)
Note: You must provide either FASTQ file(s) or a FASTA file as input.
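The default for `--lu` depends on the coverage value, so a short worked example may help. The sketch below only does the arithmetic stated in the option list (100 * coverage). `default_lu` is a hypothetical helper, and the custom cutoff is an arbitrary illustrative value that assumes `--lu` accepts an integer.

```bash
# Default k-mer coverage cutoff as documented: 100 * coverage.
# default_lu is an illustrative helper, not part of extracTR.
default_lu() {
    echo $((100 * $1))
}

coverage=30
lu="$(default_lu "$coverage")"
echo "-c $coverage implies --lu $lu by default"

# An arbitrary alternative cutoff, chosen only for illustration.
custom_lu=$((50 * coverage))

# Dry run: remove 'echo' to execute the command instead of printing it.
echo extracTR -1 reads_1.fastq -2 reads_2.fastq -o output_prefix -c "$coverage" --lu "$custom_lu"
```

With `-c 30`, the documented default works out to 3000, and the illustrative override above passes 1500.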
## Output
extracTR generates the following output files:
- `{output_prefix}.csv`: Main output file containing detected tandem repeats
- `{output_prefix}.sdat`: Intermediate file with k-mer frequency data
- Additional files for detailed analysis and debugging
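A quick look at the main output can confirm that a run produced results. The sketch below makes no assumptions about column names or the presence of a header row, since neither is documented here. `summarize_csv` is a hypothetical helper that reports the line count and shows the first few lines.

```bash
# Show the line count and the first lines of a result file.
# Usage: summarize_csv <file>
summarize_csv() {
    csv="$1"
    if [ ! -s "$csv" ]; then
        echo "missing or empty file: $csv" >&2
        return 1
    fi
    # awk prints a bare number, unlike 'wc -l' on some platforms.
    echo "lines: $(awk 'END { print NR }' "$csv")"
    head -n 5 "$csv"
}

# output_prefix matches the -o value used in the usage examples.
summarize_csv output_prefix.csv || true
```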
## Package metadata
- Summary: Extract and analyze satellite DNA from raw sequences.
- Version: 0.2.18 (source distribution `extracTR-0.2.18.tar.gz`, 15664 bytes, uploaded 2024-12-04)
- Author: Aleksey Komissarov (ad3002@gmail.com)
- License: BSD
- Homepage: https://github.com/aglabx/extracTR
- Declared Python requirement: `>=3.6`
- Python dependencies: `tqdm`, `intervaltree`, `aindex2` (no version constraints)
- Download: https://files.pythonhosted.org/packages/b7/2a/1d31d09a0e7a54d072d1c25fc721c945c5c50ee73b61fe09f7b1817c4726/extracTR-0.2.18.tar.gz
- SHA-256: `4e925842f5b01fef8835ce5fbb9897cc267943cc747481ac552d62fc5da15765`