ENATool

- Name: ENATool
- Version: 2.0.0
- Summary: Comprehensive tool for downloading and managing ENA sequencing data
- Upload time: 2025-10-25 16:06:29
- Requires Python: >=3.7
- License: MIT
- Keywords: bioinformatics, sequencing, ena, fastq, download, metadata, genomics

# ENATool 🧬

[![PyPI version](https://badge.fury.io/py/ENATool.svg)](https://badge.fury.io/py/ENATool)
[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A comprehensive Python package for downloading and managing sequencing data from the European Nucleotide Archive (ENA), from the terminal or through a Python interface.

## ✨ Features

- 📊 **Extract Metadata** - Get comprehensive sample information from ENA projects
- 📥 **Download FASTQ Files** - Automated download with progress tracking
- 🔄 **Auto Fallback** - Automatically tries NCBI if ENA metadata is unavailable
- 📈 **Progress Bars** - Real-time progress for downloads and metadata retrieval
- 📋 **Interactive Reports** - Generate searchable HTML tables with DataTables.js
- 💾 **Export to CSV** - Save metadata in standard formats
- 🔍 **Smart Verification** - Check FASTQ file integrity and skip existing files
- 💻 **Command line and Python interface**

## 🚀 Quick Start

### Installation

```bash
# Install from PyPI
pip install ENATool
```

### Basic Usage in Terminal

```bash
# Custom output directory
enatool download PRJNA335681 --path data/my_project
```

### Basic Usage in Python

```python
import ENATool

# Fetch metadata AND download files in one command
info, downloads = ENATool.fetch('PRJNA335681', path='data/my_project', download=True)
```
## 📊 Example Output Files

ENATool creates organized output:

```
my_project/
├── PRJNA335681.csv              # Sample metadata
├── PRJNA335681.html             # Interactive table
├── downoad_info_table.csv       # Download tracking
└── raw_reads/                  # Downloaded FASTQ files
    ├── SRR123456/
    │   ├── SRR123456_1.fastq.gz
    │   └── SRR123456_2.fastq.gz
    └── SRR123457/
        └── SRR123457.fastq.gz
```
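
The layout above is easy to inspect programmatically. The following is a minimal sketch, not part of ENATool: the helper name `list_fastq_by_run` is hypothetical, and it only assumes the `raw_reads/<run>/<file>.fastq.gz` structure shown in the tree.

```python
from pathlib import Path


def list_fastq_by_run(project_dir):
    """Map each run directory under raw_reads/ to its sorted FASTQ file names.

    Hypothetical helper; it relies only on the directory layout shown above.
    """
    raw_reads = Path(project_dir) / "raw_reads"
    runs = {}
    if not raw_reads.is_dir():
        return runs  # nothing has been downloaded yet
    for run_dir in sorted(p for p in raw_reads.iterdir() if p.is_dir()):
        runs[run_dir.name] = sorted(f.name for f in run_dir.glob("*.fastq.gz"))
    return runs


# Example call (path taken from the Quick Start):
# for run, files in list_fastq_by_run("data/my_project").items():
#     print(run, files)
```

A run with two files (`_1` and `_2`) corresponds to paired-end data, and a run with a single file to single-end data.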

## 🔧 Requirements

- Python >= 3.7
- pandas >= 1.3.0
- numpy >= 1.20.0
- requests >= 2.25.0
- xmltodict >= 0.12.0
- tqdm >= 4.60.0
- lxml >= 4.6.0

## 📖 Documentation

- [Use ENATool in Terminal](#use-enatool-in-terminal)
   - [Fetching metadata](#fetching-metadata)
   - [Download reads and fetch metadata](#download-reads-and-fetch-metadata)
   - [Show project summary](#show-project-summary-stdout)
   - [Redownload corrupted files or download selected files only](#redownload-corrupted-files-or-download-only-selected-files)
   - [Leave files with incorrect md5 checksum](#leave-files-with-incorrect-md5-checksum)
   - [Process multiple projects](#process-multiple-projects)
   - [Hide banner](#hide-banner)
   - [Disable progress bar](#disable-progress-bar)
- [Use ENATool in Python](#use-enatool-in-python)
   - [Fetch Metadata](#fetch-metadata)
   - [Download FASTQ Files](#download-fastq-files)
   - [Download only a subset of samples](#download-only-a-subset-of-samples)
   - [Leave files with incorrect md5 checksum](#leave-files-with-incorrect-md5-checksum-1)
   - [Disable progress bar](#disable-progress-bar-1)
   - [Work with multiple datasets](#work-with-multiple-datasets)
   - [Python API Reference](#python-api-reference)
- [Citation](#-citation)

## Use ENATool in Terminal

### Fetching Metadata

Download metadata for all samples in an ENA project using `enatool fetch`.

**Syntax:**
```bash
enatool fetch PROJECT_ID [--path DIR]
```

**Arguments:**
- `PROJECT_ID` (required): ENA project accession (e.g., PRJNA335681)
- `--path DIR` or `-p DIR`: Output directory (default: PROJECT_ID)

**What it does:**
- Downloads sample metadata from ENA
- Tries NCBI BioSample as fallback if ENA fails
- Creates CSV file with all metadata
- Generates interactive HTML report
- Shows progress bars

**Output files:**
- `PROJECT_ID.csv` - Metadata in CSV format
- `PROJECT_ID.html` - Interactive HTML table

**Examples:**

```bash
# Basic usage - saves to PRJNA335681/
enatool fetch PRJNA335681

# Custom output directory
enatool fetch PRJNA335681 --path data/my_project
```

### Download Reads and Fetch Metadata
Fetch metadata for all samples in an ENA project and download their FASTQ files using `enatool download`.

**Syntax:**
```bash
enatool download PROJECT_ID [--path DIR]
```

**Arguments:**
- `PROJECT_ID` (required): ENA project accession
- `--path DIR` or `-p DIR`: Output directory (default: PROJECT_ID)

**What it does:**
- Downloads metadata (same as `fetch`)
- Downloads all FASTQ files for all samples
- Uses the `enaDataGet` tool
- Skips files that already exist
- Tracks download status

**Output files:**
- `PROJECT_ID.csv` - Metadata
- `PROJECT_ID.html` - Interactive table
- `downoad_info_table.csv` - Download tracking
- `raw_reads/` - Directory with FASTQ files
  - `SRR123456/` - One directory per run
    - `SRR123456_1.fastq.gz` - Forward reads
    - `SRR123456_2.fastq.gz` - Reverse reads (if paired-end)

**Examples:**

```bash
# Download everything
enatool download PRJNA335681

# Custom output directory
enatool download PRJNA335681 --path data/project1
```

### Show Project Summary [stdout]

Display summary information about a downloaded project using `enatool info`.

**Syntax:**
```bash
enatool info PROJECT_ID --path DIR
```

**Arguments:**
- `PROJECT_ID` (required): ENA project accession
- `--path DIR` or `-p DIR` (required): Directory containing metadata

**What it does:**
- Reads metadata from CSV file
- Shows summary statistics
- Displays organism breakdown
- Shows sequencing platforms
- Shows download status (if available)

**Examples:**

```bash
# Show info for custom directory
enatool info PRJNA335681 --path data/my_project
```

**Output:**
```
📊 Project Information: PRJNAXXXXXX
============================================================
Total samples: 50

Organisms (2):
  • Homo sapiens: 45
  • Mus musculus: 5

Sequencing Platforms:
  • ILLUMINA: 50

Library Strategies:
  • RNA-Seq: 30
  • WGS: 15
  • ChIP-Seq: 5

Library Layout:
  • PAIRED: 45
  • SINGLE: 5

Download Status:
  • OK: 48
  • Error: 2
```
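
A similar summary can be approximated directly from the metadata CSV. This is a minimal sketch using only the standard library. It assumes the CSV has the same `scientific_name` and `instrument_platform` columns that the Python examples in this README read from the metadata table. The function name `summarize_metadata` is hypothetical.

```python
import csv
from collections import Counter


def summarize_metadata(csv_path, columns=("scientific_name", "instrument_platform")):
    """Count values per column in a metadata CSV.

    Hypothetical helper; the column names are an assumption about the CSV.
    Columns that are absent, and empty cells, are ignored.
    """
    counts = {column: Counter() for column in columns}
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            total += 1
            for column in columns:
                value = row.get(column)
                if value:
                    counts[column][value] += 1
    return total, counts


# Example call:
# total, counts = summarize_metadata("data/my_project/PRJNA335681.csv")
# print(f"Total samples: {total}")
# for name, n in counts["scientific_name"].most_common():
#     print(f"  {name}: {n}")
```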


### Redownload Corrupted Files or Download Only Selected Files

Download FASTQ files using previously fetched metadata, or a filtered subset of the metadata table, with `enatool download-files`. This command also forces a re-download of files that previously ended with an error.

**Syntax:**
```bash
enatool download-files PROJECT_ID --path DIR
```

**Arguments:**
- `PROJECT_ID` (required): ENA project accession
- `--path DIR` or `-p DIR` (required): Directory containing metadata

**What it does:**
- Loads sample names from existing CSV file (`PROJECT_ID.csv`)
- Downloads FASTQ files
- Useful when you already have the metadata and only need the files, or when you have filtered the metadata table

**Use cases:**
- You fetched metadata earlier with `enatool fetch`
- You filtered the CSV file manually
- You want to re-download after failures

**Examples:**

```bash
# First get metadata (fast)
enatool fetch PRJNA335681 --path my_project

# Later, download files 
enatool download-files PRJNA335681 --path my_project

# Or after filtering CSV file
enatool download-files PRJNA335681 --path my_project
```
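
The "filter the CSV, then run `download-files`" workflow can be scripted. The sketch below is one possible way to do the filtering step with the standard library. The function name `filter_metadata_csv` and the backup suffix are hypothetical, and the `scientific_name` column in the usage comment is an assumption based on the Python examples in this README. The original table is copied to a backup before it is overwritten.

```python
import csv
import shutil
from pathlib import Path


def filter_metadata_csv(csv_path, column, wanted, backup_suffix=".full.bak"):
    """Keep only rows where `column` equals `wanted` and return how many remain.

    Hypothetical helper, not part of ENATool. The unfiltered file is copied
    next to the original (with `backup_suffix` appended) before rewriting.
    """
    csv_path = Path(csv_path)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        if column not in fieldnames:
            raise KeyError(f"column {column!r} not found in {csv_path.name}")
        kept = [row for row in reader if row[column] == wanted]
    # Preserve the full table so the complete project can be restored later.
    shutil.copy2(csv_path, csv_path.with_name(csv_path.name + backup_suffix))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


# Example call, followed by `enatool download-files PRJNA335681 --path my_project`:
# filter_metadata_csv("my_project/PRJNA335681.csv", "scientific_name", "Homo sapiens")
```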

### Leave files with incorrect md5 checksum

By default, ENATool removes any file that is corrupted or whose MD5 checksum does not match. Use the `--keep-failed` parameter to prevent the removal.

**Syntax:**
```bash
# with download command
enatool download PROJECT_ID --path DIR --keep-failed

# with download-files command
enatool download-files PROJECT_ID --path DIR --keep-failed
```
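
If you keep failed files, you may want to re-check them yourself. The sketch below shows the general MD5 verification technique with `hashlib`. It is not ENATool's internal implementation, and the function names are hypothetical. The expected checksum would come from the metadata table, which includes FASTQ checksums.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected_md5):
    """Return True when the file's MD5 equals the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks matters here because FASTQ archives are often several gigabytes in size.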

### Process multiple projects

For processing multiple projects:

```bash
# Simple loop
for project in PRJNA335681 PRJNA123456 PRJNA789012; do
    echo "Processing $project..."
    enatool fetch "$project" --path "data/$project"
done

# Or with download
for project in PRJNA335681 PRJNA123456; do
    echo "Downloading $project..."
    enatool download "$project" --path "data/$project"
done
```

### Hide banner
Use the global `enatool` option `--no-banner`. It goes right after `enatool` and before the action command.

**Example:**
```bash
enatool --no-banner fetch PRJNA335681
```

### Disable progress bar
Use the global `enatool` option `--no-progress-bar`. It goes right after `enatool` and before the action command.

**Example:**
```bash
enatool --no-progress-bar fetch PRJNA335681
```
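
When calling the CLI from a script, the ordering rule (global options first, then the action, then action options) is easy to get wrong. The sketch below assembles the argument list in that order. The helper `build_enatool_command` is hypothetical. It uses only flags documented in this README, and it assumes that the two global options can be combined, which is not stated above.

```python
def build_enatool_command(action, project_id, path=None,
                          no_banner=False, no_progress_bar=False, keep_failed=False):
    """Assemble an argv list: global options before the action, action options after.

    Hypothetical helper; combining both global options is an assumption.
    """
    command = ["enatool"]
    if no_banner:
        command.append("--no-banner")
    if no_progress_bar:
        command.append("--no-progress-bar")
    command += [action, project_id]
    if path is not None:
        command += ["--path", str(path)]
    if keep_failed:
        command.append("--keep-failed")
    return command


# Example call:
# import subprocess
# subprocess.run(build_enatool_command("fetch", "PRJNA335681", no_banner=True), check=True)
```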

## Use ENATool in Python
### Fetch Metadata

Use the `fetch()` function to download metadata:

```python
import ENATool

# Basic usage - just get metadata
info_table = ENATool.fetch('PRJNA335681')

# Specify custom directory
info_table = ENATool.fetch('PRJNA335681', path='data/my_project')

# Get metadata AND download files
info_table, downloads = ENATool.fetch('PRJNA335681', download=True)

# Show some basic stats
print(f"Total samples: {len(info_table)}")
print(f"Organisms: {info_table['scientific_name'].unique()}")
print(f"Platforms: {info_table['instrument_platform'].value_counts()}")
```

**What you get:**
- Sample accessions and metadata
- Run accessions and sequencing details
- FASTQ file URLs and checksums
- Organism and experimental information
- Interactive HTML report

### Download FASTQ Files

```python
import ENATool

# Get metadata AND download files
info_table, downloads = ENATool.fetch('PRJNA335681', download=True)

# Check results
print(downloads['download_status'].value_counts())
```

**Download status values:**
- `OK` - Successfully downloaded
- `Exists` - File already exists (skipped)
- `Error` - Download failed

### Download only a subset of samples

```python
import ENATool

# Get metadata
info = ENATool.fetch('PRJNA335681')

# Filter samples
human_samples = info[info['scientific_name'] == 'Homo sapiens']

# ! Important ! 
# Re-initialize for filtered table
human_samples.ena.reinit(info)

# Download only filtered samples
downloads = human_samples.ena.download()

# Save to CSV
human_samples.to_csv('human_samples.csv', index=False)
```

### Leave files with incorrect md5 checksum
Prevent ENATool from automatically removing corrupted files.

```python
import ENATool

# Can be passed to fetch()
info_table, downloads = ENATool.fetch('PRJNA335681', download=True, keep_failed=True)

# Can also be passed to download()
info = ENATool.fetch('PRJNA335681')
downloads = info.ena.download(keep_failed=True)
```

### Disable progress bar
```python
import ENATool

# Can be passed to fetch()
info_table, downloads = ENATool.fetch('PRJNA335681', download=True, NO_PROGRESS_BAR=True)

# Can also be passed to download()
info = ENATool.fetch('PRJNA335681')
downloads = info.ena.download(NO_PROGRESS_BAR=True)
```

### Work with multiple datasets

```python
import ENATool

projects = ['PRJNA335681', 'PRJEB2961', 'PRJEB28350']

for project_id in projects:
    try:
        info = ENATool.fetch(project_id, path=f'data/{project_id}')
        print(f"✓ {project_id}: {len(info)} samples")
    except Exception as e:
        print(f"✗ {project_id}: {e}")
```
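
For larger batches it can help to collect failures instead of only printing them. The sketch below wraps the same loop in a function. The name `fetch_all` is hypothetical, and the fetch function is passed in as an argument so the wrapper can be tried without network access.

```python
def fetch_all(project_ids, fetch, base_dir="data", **fetch_kwargs):
    """Call `fetch` once per project and return (results, failures).

    Hypothetical helper. `fetch` is expected to accept the project ID and a
    `path` keyword, as `ENATool.fetch` does in the examples above.
    """
    results, failures = {}, {}
    for project_id in project_ids:
        try:
            results[project_id] = fetch(
                project_id, path=f"{base_dir}/{project_id}", **fetch_kwargs
            )
        except Exception as error:  # broad on purpose: one bad project should not stop the batch
            failures[project_id] = str(error)
    return results, failures


# Example call:
# import ENATool
# results, failures = fetch_all(['PRJNA335681', 'PRJEB2961'], ENATool.fetch)
# for project_id, message in failures.items():
#     print(f"✗ {project_id}: {message}")
```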

### Python API Reference

#### `ENATool.fetch(project_id, path=None, download=False)`

Main entry point for fetching ENA data.

**Parameters:**
- `project_id` (str): ENA project accession (e.g., 'PRJNA335681')
- `path` (str, optional): Directory for outputs (defaults to project_id)
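- `download` (bool, optional): Auto-download FASTQ files (default: False)
- `keep_failed` (bool, optional): Keep files that are corrupted or fail the MD5 check instead of removing them (see the `keep_failed=True` example above)
- `NO_PROGRESS_BAR` (bool, optional): Set to `True` to disable progress bars (see the `NO_PROGRESS_BAR=True` example above)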

**Returns:**
- DataFrame (if download=False)
- Tuple of (info_table, download_table) (if download=True)
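
Because the return type depends on `download`, calling code has to branch on it. The sketch below is one way to always get a two-element result. The wrapper `fetch_both` is hypothetical and takes the fetch function as an argument.

```python
def fetch_both(fetch, project_id, download=False, **kwargs):
    """Always return (info_table, download_table).

    Hypothetical wrapper: download_table is None when download=False,
    mirroring the two return shapes described above.
    """
    result = fetch(project_id, download=download, **kwargs)
    if download:
        info_table, download_table = result
        return info_table, download_table
    return result, None


# Example call:
# import ENATool
# info_table, downloads = fetch_both(ENATool.fetch, 'PRJNA335681')
# if downloads is None:
#     print("metadata only")
```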

#### `DataFrame.ena.download()`

Download FASTQ files for samples in DataFrame.

**Returns:**
- DataFrame with download status


## 📝 Citation

If you use ENATool in your research, please cite:

```
Tikhonova, P. (2021). ENATool: European Nucleotide Archive Data Manager
(v2.0.0). Zenodo. https://doi.org/10.5281/zenodo.17443004
```


## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **PyPI:** https://pypi.org/project/ENATool/
- **GitHub:** https://github.com/PollyTikhonova/ENATool
- **Documentation:** https://github.com/PollyTikhonova/ENATool#readme
- **Bug Reports:** https://github.com/PollyTikhonova/ENATool/issues

            
