- Name: extracTR
- Version: 0.2.18
- Home page: https://github.com/aglabx/extracTR
- Summary: Extract and analyze satellite DNA from raw sequences.
- Upload time: 2024-12-04 12:53:39
- Author: Aleksey Komissarov
- Requires Python: >=3.6
- License: BSD
- Requirements: tqdm, intervaltree, aindex2

# extracTR

## Introduction

extracTR is a tool for identifying and analyzing tandem repeats in genomic sequences. It works with raw sequencing data (FASTQ) or assembled genomes (FASTA), using k-mer based approaches to detect repetitive patterns efficiently.
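
To make the k-mer idea concrete, here is a minimal illustrative sketch in plain Python. It is not extracTR's implementation (the tool relies on Jellyfish for counting); the function name `count_kmers` and the toy sequence are made up for this example. It only shows why k-mers drawn from a tandem array stand out: each one occurs roughly once per repeat copy, while k-mers from unique flanking sequence occur once.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy sequence (hypothetical): unique flank + 20 copies of a 5 bp unit + unique flank.
unit = "ACGTT"
sequence = "GGATCCAGTC" + unit * 20 + "TTGACCATGA"

counts = count_kmers(sequence, k=5)

# The most frequent k-mers are the rotations of the repeat unit.
for kmer, n in counts.most_common(5):
    print(kmer, n)
```

Real data adds sequencing errors and reverse-complement strands, which is one reason a dedicated k-mer counter is used instead of an in-memory dictionary.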

## Features

- Efficient tandem repeat detection from raw sequencing data
- Support for single-end and paired-end FASTQ files
- Support for genome assemblies in FASTA format
- Customizable parameters for fine-tuning repeat detection
- Output in easy-to-analyze CSV format
- Multi-threaded processing for improved performance

## Requirements

- Python 3.7 or later
- Jellyfish 2.3.0 or later
- Conda (for easy environment management)

## Installation

We recommend installing extracTR in a separate Conda environment to manage dependencies effectively.

1. Create a new Conda environment:

```bash
conda create -n extractr_env python=3.9
```

2. Activate the environment:

```bash
conda activate extractr_env
```

3. Install Jellyfish:

```bash
conda install -c bioconda jellyfish
```

4. Install extracTR using pip:

```bash
pip install extracTR
```

To deactivate the environment when you're done:

```bash
conda deactivate
```
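
With the environment active, the Jellyfish requirement (2.3.0 or later) can be sanity-checked from Python. The sketch below is an assumption-laden helper, not part of extracTR: the names `parse_version` and `jellyfish_is_recent` are hypothetical, and it assumes that `jellyfish --version` prints a line containing a dotted version number.

```python
import re
import shutil
import subprocess


def parse_version(text: str) -> tuple:
    """Extract the first dotted version number, e.g. 'jellyfish 2.3.0' -> (2, 3, 0)."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def jellyfish_is_recent(minimum=(2, 3, 0)) -> bool:
    """Return True if a `jellyfish` executable on PATH reports at least `minimum`."""
    path = shutil.which("jellyfish")
    if path is None:
        return False  # executable not found in the active environment
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    try:
        # Assumption: the version is printed on stdout (stderr is used as a fallback).
        return parse_version(result.stdout or result.stderr) >= minimum
    except ValueError:
        return False


if __name__ == "__main__":
    print("Jellyfish >= 2.3.0 available:", jellyfish_is_recent())
```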

## Usage

Before running extracTR, ensure that you have removed adapters from your sequencing reads and activated the Conda environment:

```bash
conda activate extractr_env
```

Basic usage:

For paired-end FASTQ files:
```bash
extracTR -1 reads_1.fastq -2 reads_2.fastq -o output_prefix -c 30
```

For a single-end FASTQ file:
```bash
extracTR -1 reads.fastq -o output_prefix -c 30
```

For a genome assembly in FASTA format:
```bash
extracTR -f genome.fasta -o output_prefix -c 30
```

Advanced usage with custom parameters:

```bash
extracTR -1 reads_1.fastq -2 reads_2.fastq -o output_prefix -t 64 -c 30 -k 25
```

Options:
- `-1, --fastq1`: Input file with forward DNA sequences in FASTQ format
- `-2, --fastq2`: Input file with reverse DNA sequences in FASTQ format (optional; needed only for paired-end data)
- `-f, --fasta`: Input genome assembly in FASTA format
- `-o, --output`: Prefix for output files
- `-t, --threads`: Number of threads to use (default: 32)
- `-c, --coverage`: Coverage to use for indexing (required)
- `-k, --k`: K-mer size to use for indexing (default: 23)
- `--lu`: Coverage cutoff for k-mers (default: 100 * coverage)

Note: You must provide either FASTQ file(s) or a FASTA file as input.
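
When extracTR is called from a script, the options above can be assembled programmatically. The following is a sketch, not an official API: `build_command` and `default_lu` are hypothetical helper names, only flags from the list above are used, and treating FASTQ and FASTA input as mutually exclusive is an assumption based on the note above.

```python
import shlex


def default_lu(coverage):
    """Documented default for --lu: 100 * coverage."""
    return 100 * coverage


def build_command(output, coverage, fastq1=None, fastq2=None, fasta=None,
                  threads=32, k=23, lu=None):
    """Build an extracTR argument list from the documented options."""
    # Assumption: exactly one input type (FASTQ or FASTA) is given.
    if (fastq1 is None) == (fasta is None):
        raise ValueError("provide either FASTQ file(s) or a FASTA file")
    if fastq2 is not None and fastq1 is None:
        raise ValueError("-2 requires -1")

    cmd = ["extracTR"]
    if fasta is not None:
        cmd += ["-f", fasta]
    else:
        cmd += ["-1", fastq1]
        if fastq2 is not None:
            cmd += ["-2", fastq2]
    cmd += ["-o", output, "-t", str(threads), "-c", str(coverage), "-k", str(k)]
    if lu is not None:  # otherwise the tool's own default applies
        cmd += ["--lu", str(lu)]
    return cmd


cmd = build_command("output_prefix", coverage=30,
                    fastq1="reads_1.fastq", fastq2="reads_2.fastq",
                    threads=64, k=25)
print(" ".join(shlex.quote(arg) for arg in cmd))
print("default --lu for -c 30:", default_lu(30))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the run. With `-c 30`, the default k-mer coverage cutoff works out to 100 * 30 = 3000.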

## Output

extracTR generates the following output files:
- `{output_prefix}.csv`: Main output file containing detected tandem repeats
- `{output_prefix}.sdat`: Intermediate file with k-mer frequency data
- Additional files for detailed analysis and debugging
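
The column layout of the main CSV file is not described here, so the sketch below makes no assumption about column names. It only previews the first rows and counts them, assuming a comma delimiter; `preview_rows` is a hypothetical helper, and `output_prefix.csv` simply follows the `{output_prefix}.csv` pattern above.

```python
import csv
from pathlib import Path


def preview_rows(lines, limit=5, delimiter=","):
    """Parse delimited text lines; return the first `limit` rows and the total row count."""
    rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
    return rows[:limit], len(rows)


if __name__ == "__main__":
    path = Path("output_prefix.csv")  # replace with your own output prefix
    if path.exists():
        with path.open(newline="") as handle:
            head, total = preview_rows(handle)
        print(f"{total} rows in {path}")
        for row in head:
            print(row)
    else:
        print(f"{path} not found; run extracTR first")
```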

            
