| Field | Value |
| --------------- | --------------------------------------------- |
| Name | metagene |
| Version | 0.0.14 |
| home_page | None |
| Summary | Metagene Profiling Analysis and Visualization |
| upload_time | 2025-07-30 13:39:11 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.12 |
| license | None |
| keywords | dna, rna, biology, metagene |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls | No coveralls. |
# Metagene
[PyPI](https://pypi.python.org/pypi/metagene)
[Downloads](https://pepy.tech/project/metagene)
**Metagene Profiling Analysis and Visualization**
This tool analyzes metagene profiles, that is, the distribution of genomic features relative to gene regions (5'UTR, CDS, 3'UTR), and creates publication-ready metagene profile plots.
## Installation
Install metagene using pip:
```bash
pip install metagene
```
Minimum required Python version: 3.12
## Quick Start
### Command Line Interface
Basic metagene analysis using a built-in reference:
```bash
# Using built-in human genome reference (GRCh38)
metagene -i sites.tsv.gz -r GRCh38 --with-header -m 1,2,3 -w 5 \
-o output.tsv -s scores.tsv -p plot.png
```
Using a custom GTF file:
```bash
# Using custom GTF annotation
metagene -i sites.bed -g custom.gtf.gz -m 1,2,3 -w 5 \
-o output.tsv -s scores.tsv -p plot.png
```
### Python API
```python
from metagene import (
    load_sites, load_reference, map_to_transcripts,
    normalize_positions, plot_profile
)

# Load your genomic sites
sites_df = load_sites("sites.tsv.gz", with_header=True, meta_col_index=[0, 1, 2])

# Load reference genome annotation
reference_df = load_reference("GRCh38")  # or load_gtf("custom.gtf.gz")

# Perform metagene analysis
annotated_df = map_to_transcripts(sites_df, reference_df)
gene_bins, gene_stats, gene_splits = normalize_positions(
    annotated_df, split_strategy="median", bin_number=100
)

# Generate plot
plot_profile(gene_bins, gene_splits, "metagene_plot.png")

print(f"Analyzed {gene_bins['count'].sum()} sites")
print(f"Gene splits - 5'UTR: {gene_splits[0]:.3f}, CDS: {gene_splits[1]:.3f}, 3'UTR: {gene_splits[2]:.3f}")
print(f"Gene statistics - 5'UTR: {gene_stats['5UTR']}, CDS: {gene_stats['CDS']}, 3'UTR: {gene_stats['3UTR']}")
```
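The `bin_number=100` argument in the example above suggests that normalized positions on the 0 to 1 axis are grouped into a fixed number of bins. The following is only an illustrative sketch of equal-width binning in plain Python, not metagene's actual implementation; the helper name `bin_positions` is hypothetical.

```python
from collections import Counter


def bin_positions(positions, bin_number=100):
    """Count normalized positions (0.0 <= p <= 1.0) per equal-width bin.

    Hypothetical helper for illustration; not part of the metagene API.
    """
    counts = Counter()
    for p in positions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"position {p} is outside [0, 1]")
        # The right edge (p == 1.0) is folded into the last bin.
        idx = min(int(p * bin_number), bin_number - 1)
        counts[idx] += 1
    # Return a dense list so that empty bins show up as zero.
    return [counts.get(i, 0) for i in range(bin_number)]


profile = bin_positions([0.0, 0.05, 0.12, 0.5, 0.999, 1.0], bin_number=10)
print(profile)  # [2, 1, 0, 0, 0, 1, 0, 0, 0, 2]
```

A dense list (rather than a sparse mapping) keeps empty bins visible, which matters when the profile is plotted.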
## Input Formats
### TSV Format
```
ref pos strand score pvalue
chr1 1000000 + 0.85 0.001
chr1 2000000 - 0.72 0.005
```
### BED Format
```
chr1 999999 1000000 score1 0.85 +
chr1 1999999 2000000 score2 0.72 -
```
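The two examples above appear to describe the same sites: BED intervals are 0-based and half-open, so `999999 1000000` covers the single base that the TSV example writes as `pos` 1000000. The sketch below converts between the two conventions, assuming the TSV `pos` column is 1-based (the matching examples suggest this, but the README does not state it). Both helper names are hypothetical.

```python
def bed_to_site(chrom, start, end, strand):
    """Convert a single-base BED interval (0-based, half-open) to a 1-based site."""
    if end - start != 1:
        raise ValueError("expected a single-base interval")
    # For a one-base interval, the 1-based position equals the BED end.
    return chrom, start + 1, strand


def site_to_bed(chrom, pos, strand):
    """Convert a 1-based site to a single-base BED interval."""
    return chrom, pos - 1, pos, strand


print(bed_to_site("chr1", 999999, 1000000, "+"))  # ('chr1', 1000000, '+')
print(site_to_bed("chr1", 2000000, "-"))          # ('chr1', 1999999, 2000000, '-')
```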
### Column Specification
- Use `-m/--meta-columns` to specify coordinate columns (1-based indexing)
- Use `-w/--weight-columns` to specify score/weight columns
- Use `-H/--with-header` if your file has a header line
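Note that the CLI counts columns from 1, while the Quick Start passes `meta_col_index=[0, 1, 2]` to the Python API for what looks like the same columns as `-m 1,2,3`. Assuming the Python API is 0-based, as that pairing suggests, a CLI-style specification can be translated as sketched below. The helper `cli_columns_to_index` is hypothetical and not part of metagene.

```python
def cli_columns_to_index(spec):
    """Turn a 1-based, comma-separated column spec into 0-based indices.

    Hypothetical helper; assumes the Python API expects 0-based indices.
    """
    indices = []
    for field in spec.split(","):
        col = int(field.strip())
        if col < 1:
            raise ValueError(f"column numbers start at 1, got {col}")
        indices.append(col - 1)
    return indices


print(cli_columns_to_index("1,2,3"))  # [0, 1, 2]
```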
## Built-in References
Metagene includes pre-processed gene annotations for major model organisms:
| Species | Assembly | Reference |
| ------------------- | ----------- | ------------------------------------------ |
| **Human** | GRCh38/hg38 | `GRCh38`, `hg38` |
| | GRCh37/hg19 | `GRCh37`, `hg19` |
| **Mouse** | GRCm39/mm39 | `GRCm39`, `mm39` |
| | GRCm38/mm10 | `GRCm38`, `mm10` |
| | mm9/NCBIM37 | `mm9`, `NCBIM37` |
| **Arabidopsis** | TAIR10 | `TAIR10` |
| **Rice** | IRGSP-1.0 | `IRGSP-1.0` |
| **Model Organisms** | Various | `dm6`, `ce11`, `WBcel235`, `sacCer3`, etc. |
### Managing References
List all available references:
```bash
metagene --list
```
This will show all 23+ available references organized by species:
```
Human:
GRCh37 - Human genome GRCh37 (Ensembl release 75)
GRCh38 - Human genome GRCh38 (Ensembl release 110)
hg19 - Human genome hg19 (UCSC 2021)
hg38 - Human genome hg38 (UCSC 2022)
Mouse:
GRCm38 - Mouse genome GRCm38 (Ensembl release 102)
GRCm39 - Mouse genome GRCm39 (Ensembl release 110)
mm10 - Mouse genome mm10 (UCSC 2021)
mm39 - Mouse genome mm39 (UCSC 2024)
mm9 - Mouse genome mm9 (UCSC 2020)
... and more
```
Download a specific reference:
```bash
metagene --download GRCh38
```
Download all references (requires ~10GB disk space):
```bash
metagene --download all
```
## CLI Options
```
Usage: metagene [OPTIONS]

  Run metagene analysis on genomic sites.

Options:
  --version                  Show the version and exit.
  -i, --input PATH           Input file path (BED, GTF, TSV or CSV, etc.)
  -o, --output PATH          Output file path (TSV, CSV)
  -s, --output-score PATH    Output file for binned score statistics
  -p, --output-figure PATH   Output file for metagene plot
  -r, --reference TEXT       Built-in reference genome to use (e.g.,
                             GRCh38, GRCm39)
  -g, --gtf PATH             GTF/GFF file path for custom reference
  --region                   Region to analyze (default: all)
  -b, --bins INTEGER         Number of bins for analysis (default: 100)
  -H, --with-header          Input file has header line
  -S, --separator TEXT       Separator for input file (default: tab)
  -m, --meta-columns TEXT    Input column indices (1-based) for genomic
                             coordinates. The columns should contain
                             Chromosome,Start,End,Strand or
                             Chromosome,Site,Strand
  -w, --weight-columns TEXT  Input column indices (1-based) for
                             weight/score values
  -n, --weight-names TEXT    Names for weight columns
  --score-transform
                             Transform to apply to scores (default: none)
  --normalize                Normalize scores by transcript length
  --list                     List all available built-in references and
                             exit
  --download TEXT            Download a specific reference (e.g., GRCh38)
                             or 'all' for all references
  -h, --help                 Show this message and exit.
```
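When many samples are processed from a script, the options above can be assembled programmatically. The following is a minimal sketch using only the standard library and only flags listed in the help text. The helper name `build_metagene_command` is hypothetical, the file names are placeholders, and passing several weight columns as a comma-separated list is an assumption based on how `-m` is used.

```python
import shlex


def build_metagene_command(input_path, reference, meta_columns,
                           weight_columns=(), bins=100, with_header=False,
                           output=None, figure=None):
    """Build an argument list for the metagene CLI (hypothetical helper)."""
    cmd = [
        "metagene",
        "-i", input_path,
        "-r", reference,
        "-m", ",".join(str(c) for c in meta_columns),  # 1-based columns
        "-b", str(bins),
    ]
    if weight_columns:
        # Assumption: multiple weight columns are comma-separated like -m.
        cmd += ["-w", ",".join(str(c) for c in weight_columns)]
    if with_header:
        cmd.append("-H")
    if output:
        cmd += ["-o", output]
    if figure:
        cmd += ["-p", figure]
    return cmd


cmd = build_metagene_command(
    "sites.tsv.gz", "GRCh38", [1, 2, 3],
    weight_columns=[5], with_header=True,
    output="output.tsv", figure="plot.png",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting issues with unusual file names.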
## API Reference (Core Functions)
- `load_sites(file, with_header=False, meta_col_index=[0,1,2])` - Load genomic sites
- `load_reference(name)` - Load built-in reference genome
- `load_gtf(file)` - Load custom GTF annotation
- `map_to_transcripts(sites, reference)` - Annotate sites with gene information
- `normalize_positions(annotated_sites, split_strategy="median", bin_number=100)` - Normalize to relative positions
- `plot_profile(data, gene_splits, output_file)` - Generate metagene plot
## Demo

The plot shows the distribution of genomic sites across normalized gene regions:
- **5'UTR** (0.0 - first split): 5' untranslated region
- **CDS** (first split - second split): Coding sequence
- **3'UTR** (second split - 1.0): 3' untranslated region
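To make the axis layout concrete, the sketch below maps a relative position inside one region onto the shared 0 to 1 axis. It assumes that the three split values are the fractions of the axis given to 5'UTR, CDS and 3'UTR and that they sum to 1, which is how the Quick Start prints `gene_splits`, though the README does not spell this out. The function `scale_to_metagene` and the example split values are hypothetical.

```python
def scale_to_metagene(region, fraction, splits):
    """Map a within-region fraction (0..1) onto the shared 0..1 metagene axis.

    Hypothetical helper; `splits` is assumed to hold the axis fractions of
    5'UTR, CDS and 3'UTR, in that order, summing to 1.
    """
    utr5, cds, utr3 = splits
    # (start on the shared axis, width on the shared axis) for each region
    layout = {
        "5UTR": (0.0, utr5),
        "CDS": (utr5, cds),
        "3UTR": (utr5 + cds, utr3),
    }
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    start, width = layout[region]
    return start + fraction * width


splits = (0.2, 0.5, 0.3)  # made-up example values
for region, fraction in [("5UTR", 0.5), ("CDS", 0.0), ("3UTR", 1.0)]:
    print(region, round(scale_to_metagene(region, fraction, splits), 3))
```

With these example values, the first split falls at 0.2 and the second at 0.7, so a site halfway through the 5'UTR lands at 0.1 and the end of the 3'UTR lands at 1.0.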
## TODO:
- [ ] How to process 100k sites on the human genome in less than 10 s?
- [ ] The core function should be moved into [variant](https://github.com/y9c/variant)
Raw data
{
"_id": null,
"home_page": null,
"name": "metagene",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "DNA, RNA, biology, metagene",
"author": null,
"author_email": "Ye Chang <yech1990@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/58/72/7f6fc2e59afc2d40976c04efce4b5056a5c04ead714374a253a81e6fa49a/metagene-0.0.14.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Metagene Profiling Analysis and Visualization",
"version": "0.0.14",
"project_urls": {
"Repository": "https://github.com/y9c/metagene"
},
"split_keywords": [
"dna",
" rna",
" biology",
" metagene"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "85749f17000e6eeb36f42d8317b49d44e4549e02644c8c338b25ce7166f73efa",
"md5": "eea2c302fa9ae51e64727b1426c00589",
"sha256": "2af71dae63d0e653968011c938647bd1f40bc2cbe7fb1cfaa8be37680408b74b"
},
"downloads": -1,
"filename": "metagene-0.0.14-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eea2c302fa9ae51e64727b1426c00589",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 26963,
"upload_time": "2025-07-30T13:39:10",
"upload_time_iso_8601": "2025-07-30T13:39:10.731883Z",
"url": "https://files.pythonhosted.org/packages/85/74/9f17000e6eeb36f42d8317b49d44e4549e02644c8c338b25ce7166f73efa/metagene-0.0.14-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "58727f6fc2e59afc2d40976c04efce4b5056a5c04ead714374a253a81e6fa49a",
"md5": "b5c4a6c3957bd0fd49e3fa68f74c7d79",
"sha256": "00c5df33aa66d702c7af6ca0654279fd297a4c237f912c80cf6eccded3d3580b"
},
"downloads": -1,
"filename": "metagene-0.0.14.tar.gz",
"has_sig": false,
"md5_digest": "b5c4a6c3957bd0fd49e3fa68f74c7d79",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 1401132,
"upload_time": "2025-07-30T13:39:11",
"upload_time_iso_8601": "2025-07-30T13:39:11.925546Z",
"url": "https://files.pythonhosted.org/packages/58/72/7f6fc2e59afc2d40976c04efce4b5056a5c04ead714374a253a81e6fa49a/metagene-0.0.14.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-30 13:39:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "y9c",
"github_project": "metagene",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "metagene"
}