# METHYLMAP
## EXAMPLE
![GNAS methylmap](assets/1000Genomes_GNAS.png)
Methylmap is a tool for visualizing modified nucleotide frequencies in large cohorts. It also lets you quickly look up the nucleotide methylation frequencies of individuals in the 1000Genomes ONT project.
You can visualize your own data with the methylmap command line tool (available through bioconda and PyPI) or with the methylmap web application <https://methylmap.bioinf.be>.
The methylation frequencies of the 1000Genomes ONT project can be consulted through the methylmap web application.
Installation of the methylmap command line tool:
```bash
conda install -c bioconda methylmap
```
Or:
```bash
pip install methylmap
```
If this application is useful for your research, please cite:
[our preprint](https://www.biorxiv.org/content/10.1101/2022.11.28.518239v1) and [the underlying 1000 Genomes Project ONT dataset](https://www.medrxiv.org/content/10.1101/2024.03.05.24303792v1).
### METHYLMAP WEB APPLICATION
#### INPUT POSSIBILITIES
The methylmap web application can visualize your own modification frequency data, uploaded as a tab-separated .tsv file. The file should contain the following columns: "chrom", "position", "sample_1", "sample_2", ... "sample_n". Example:
```text
chrom position sample_1 sample_2 sample_3 sample_4
chr1 100000.0 0.000 0.167 0.000 0.077
chr1 100000.5 0.000 0.000 0.100 0.000
chr1 100001.0 0.000 0.000 0.000 0.222
chr1 100002.0 0.000 0.000 0.000 0.000
chr1 100003.0 0.000 0.000 0.000 0.000
```
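Before uploading, a table can be checked locally. The following is a minimal sketch, not part of methylmap: the function name `check_methfreqtable` is hypothetical, and it assumes that frequencies lie between 0 and 1 (as in the example above) and that empty cells stand for missing values.

```python
import csv
import io
import math


def check_methfreqtable(handle):
    """Return the sample column names; raise ValueError if the table is malformed."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None or "chrom" not in header or "position" not in header:
        raise ValueError('header must contain "chrom" and "position"')
    samples = [name for name in header if name not in ("chrom", "position")]
    if not samples:
        raise ValueError("at least one sample column is required")
    position_index = header.index("position")
    sample_indexes = [header.index(name) for name in samples]
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_number}: expected {len(header)} fields")
        float(row[position_index])  # raises ValueError if not numeric
        for index in sample_indexes:
            if row[index] == "":
                continue  # assumption: an empty cell is a missing value
            frequency = float(row[index])
            # Assumption: frequencies are fractions between 0 and 1.
            if not (math.isnan(frequency) or 0.0 <= frequency <= 1.0):
                raise ValueError(f"line {line_number}: frequency out of range")
    return samples


example = "chrom\tposition\tsample_1\tsample_2\nchr1\t100000.0\t0.000\t0.167\n"
print(check_methfreqtable(io.StringIO(example)))
```

For a file on disk, pass an open handle, for example `open("methfreqtable.tsv", newline="")`.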
#### GENERATING A METHYLATION FREQUENCY TABLE WITH THE MULTIPARSETABLE.PY SCRIPT
The required input table can be generated with the multiparsetable.py script, which supports the following inputs:
- BAM/CRAM files with MM and ML tags.
- Files from nanopolish (as processed by calculate_methylation_frequency.py). The methylation calls can additionally be phased using the available scripts in the "scripts" folder.

The multiparsetable.py script can be found in the "scripts" folder. Example:
```bash
python multiparsetable.py --files cramfileA.cram cramfileB.cram --fasta reference.fa --output methfreqtable.tsv --window chr20:58839718-58911192
python multiparsetable.py --files nanopolishfileA.tsv nanopolishfileB.tsv --output methfreqtable.tsv --window chr20:58839718-58911192
```
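The `--window` value follows the format `chr:start-end`. As an illustration of that format only, here is a small sketch that splits such a string into its parts. The function `parse_window` and the requirement that start is smaller than end are assumptions; this is not methylmap's own parser.

```python
import re

# chromosome name, a colon, then start-end as integers
WINDOW_PATTERN = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_window(window):
    """Split 'chr:start-end' into (chrom, start, end)."""
    match = WINDOW_PATTERN.match(window)
    if match is None:
        raise ValueError(f"window must look like chr:start-end, got {window!r}")
    start, end = int(match["start"]), int(match["end"])
    if start >= end:
        # Assumption: an empty or reversed region is treated as an error.
        raise ValueError("window start must be smaller than window end")
    return match["chrom"], start, end


print(parse_window("chr20:58839718-58911192"))
```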
#### ANNOTATION FILES
Currently available annotation files on the methylmap website are:
- <https://www.gencodegenes.org/human/release_46.html>: Release 46 (GRCh38.p14) - comprehensive gene annotation
- <https://www.gencodegenes.org/mouse/>: Release M36 (GRCm39) - comprehensive gene annotation

If you would like to use another annotation file, please submit your request through the GitHub Issues page.
### METHYLMAP COMMAND LINE TOOL
#### INPUT POSSIBILITIES
- BAM/CRAM files with MM and ML tags. Use the --files input option and --fasta for the reference genome.
- Files from nanopolish (as processed by calculate_methylation_frequency.py). The methylation calls can additionally be phased using the available scripts in the "scripts" folder. Use the --files input option.
- A tab-separated table with nucleotide modification frequencies over all positions (methfreqtable). Required header names are "chrom" (column with chromosome information) and "position" (column with position information). Use the --table input option. Example:
```text
chrom position sample_1 sample_2 sample_3 sample_4
chr1 100000.0 0.000 0.167 0.000 0.077
chr1 100000.5 0.000 0.000 0.100 0.000
chr1 100001.0 0.000 0.000 0.000 0.222
chr1 100002.0 0.000 0.000 0.000 0.000
chr1 100003.0 0.000 0.000 0.000 0.000
```
- A tab-separated overview table listing all nanopolish or BAM/CRAM files with their sample name and experimental group (required header names are "path", "name" and "group"). Use the --table input option. When using BAM/CRAM files, please provide the reference genome with the --fasta argument. Example:
```text
path name group
/home/path_to_file/bamfile_sample_1.bam samplename_1 case
/home/path_to_file/bamfile_sample_2.bam samplename_2 control
/home/path_to_file/bamfile_sample_3.bam samplename_3 control
/home/path_to_file/bamfile_sample_4.bam samplename_4 case
```
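An overview table like the one above can be written with a few lines of Python. This is a sketch using only the standard library; the helper name `write_overview_table` is hypothetical, and the paths are the placeholders from the example.

```python
import csv
import io


def write_overview_table(handle, samples):
    """Write (path, name, group) tuples as a tab-separated overview table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["path", "name", "group"])  # header names required by methylmap
    for path, name, group in samples:
        writer.writerow([path, name, group])


samples = [
    ("/home/path_to_file/bamfile_sample_1.bam", "samplename_1", "case"),
    ("/home/path_to_file/bamfile_sample_2.bam", "samplename_2", "control"),
]
buffer = io.StringIO()
write_overview_table(buffer, samples)
print(buffer.getvalue(), end="")
```

To write to disk instead, open the target with `open("overviewtable.tsv", "w", newline="")` and pass that handle.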
#### EXAMPLE COMMAND LINE TOOL USAGE
Example command line tool usage:
```bash
methylmap --files cramfileA.cram cramfileB.cram --fasta reference.fa --gff annotation.gff3.gz --window chr20:58839718-58911192
methylmap --files nanopolishfileA.tsv nanopolishfileB.tsv --gff annotation.gff3.gz --window chr20:58839718-58911192
methylmap --table methfreqtable.tsv --gff annotation.gff3.gz --window chr20:58839718-58911192
# The --fasta argument is required when the files in the overview table are BAM/CRAM files.
methylmap --table overviewtable.tsv --fasta reference.fa --gff annotation.gff3.gz --window chr20:58839718-58911192
methylmap --files cramfileA.cram cramfileB.cram --fasta reference.fa --gff annotation.gff3.gz --window chr20:58839718-58911192 --names sampleA sampleB --groups case control
```
#### IMPORTANT INFORMATION
Important: A GFF3 annotation file is required. Pass it with the --gff argument.
- File should be bgzipped
- File should be sorted (use: zcat annotation.gff3.gz | sort -k1,1V -k4,4n | bgzip > annotation_sorted.gff3.gz)
- File should be indexed (use: tabix -p gff annotation_sorted.gff3.gz)
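Two of these requirements can be checked quickly before running methylmap. The sketch below is not part of methylmap and the function name `check_gff` is hypothetical. It looks for the gzip magic bytes, which bgzip output shares with plain gzip (so it cannot tell the two apart), and for a `.tbi` file next to the GFF3 file, which is the index name tabix writes by default. It does not verify sorting.

```python
import os


def check_gff(path):
    """Report whether a GFF3 file looks compressed and has a tabix index beside it."""
    with open(path, "rb") as handle:
        # bgzip files are valid gzip files and start with the same two magic bytes.
        compressed = handle.read(2) == b"\x1f\x8b"
    # Assumption: the index was created with tabix defaults (path + ".tbi").
    indexed = os.path.exists(path + ".tbi")
    return {"compressed": compressed, "indexed": indexed}
```

Usage: `check_gff("annotation_sorted.gff3.gz")` returns a dictionary such as `{"compressed": True, "indexed": True}` when both checks pass.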
Important: When using BAM/CRAM files as input, a reference genome is required. Pass it with the --fasta argument.
- The FASTA file should not be zipped/bgzipped.
- The FASTA index file should be present in the same directory as the FASTA file.
Important: When performing hierarchical clustering, missing values are imputed using the pandas interpolate method. Genomic positions that still have missing values after imputation are removed from the visualization.
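To illustrate why some positions can still be missing after imputation, here is a pure-Python sketch of linear interpolation with forward-only filling, which is how pandas `interpolate` behaves with its default arguments. Whether methylmap uses those defaults, and along which axis it interpolates, is an assumption; the function `interpolate_forward` is hypothetical and only mimics the behavior for a single list of values.

```python
import math


def interpolate_forward(values):
    """Fill NaN gaps linearly between known values; carry the last known value forward."""
    filled = list(values)
    last_known = None  # index of the most recent non-missing value
    for i, value in enumerate(filled):
        if math.isnan(value):
            continue
        if last_known is not None and i - last_known > 1:
            step = (value - filled[last_known]) / (i - last_known)
            for j in range(last_known + 1, i):
                filled[j] = filled[last_known] + step * (j - last_known)
        last_known = i
    if last_known is not None:
        # Trailing gaps take the last known value.
        for j in range(last_known + 1, len(filled)):
            filled[j] = filled[last_known]
    return filled


nan = float("nan")
print(interpolate_forward([nan, 0.0, nan, 1.0, nan]))
```

In this sketch a leading missing value has no earlier neighbor to interpolate from, so it stays missing. Positions like that are the ones that would be dropped from the visualization.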
```text
usage: methylmap [-h] [-f FILES [FILES ...] | -t TABLE] [-w WINDOW] [-n [NAMES ...]] --gff GFF [--output OUTPUT] [--groups [GROUPS ...]] [-s] [--fasta FASTA]
[--mod {m,h}] [--hapl] [--dendro] [--threads THREADS] [--quiet] [--debug] [--host HOST] [--port PORT] [-v]
Plotting tool for population scale nucleotide modifications.
options:
-h, --help show this help message and exit
-f FILES [FILES ...], --files FILES [FILES ...]
list with BAM/CRAM files or nanopolish (processed with calculate_methylation_frequency.py) files
-t TABLE, --table TABLE
methfreqtable or overviewtable input
-w WINDOW, --window WINDOW
region to visualise, format: chr:start-end (example: chr20:58839718-58911192)
-n [NAMES ...], --names [NAMES ...]
list with sample names
--gff GFF add annotation track based on GFF3 file
--output OUTPUT TSV file to write the frequencies to.
--groups [GROUPS ...]
list of experimental group for each sample
-s, --simplify simplify annotation track to show genes rather than transcripts
--fasta FASTA fasta reference file, required when input is BAM/CRAM files or overviewtable with BAM/CRAM files
--mod {m,h} modified base of interest when BAM/CRAM files as input. Options are: m, h, default = m
--hapl display modification frequencies in input BAM/CRAM file for each haplotype (alternating haplotypes in methylmap)
--dendro perform hierarchical clustering on the samples/haplotypes and visualize with dendrogram on sorted heatmap as output
--threads THREADS number of threads to use when processing BAM/CRAM files
--quiet suppress modkit output
--debug Run the app in debug mode
--host HOST Host IP used to serve the application
--port PORT Port used to serve the application
-v, --version print version and exit
```
### MORE INFORMATION
The app.py script is the main script for the methylmap web application. The methylmap.py script is the main script for the methylmap command line tool.
More information: <https://www.biorxiv.org/content/10.1101/2022.11.28.518239v1>
Raw data
{
"_id": null,
"home_page": "https://github.com/EliseCoopman/methylmap",
"name": "methylmap",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": null,
"keywords": "methylation plot",
"author": "Elise Coopman",
"author_email": "elisecoopman@yahoo.com",
"download_url": "https://files.pythonhosted.org/packages/02/3f/28645ee8dcf98595d9e543be0a7eda18e2d4bb4f880810438692544ba121/methylmap-0.5.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Plotting tool for population scale nucleotide modifications.",
"version": "0.5.7",
"project_urls": {
"Homepage": "https://github.com/EliseCoopman/methylmap"
},
"split_keywords": [
"methylation",
"plot"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "023f28645ee8dcf98595d9e543be0a7eda18e2d4bb4f880810438692544ba121",
"md5": "11024b81647e8d117583b4b8c6c68b6a",
"sha256": "e7c9395fce7c4653c3a8028b6a43e4e484f02a4f0e960d9fa7a01fd755598ba9"
},
"downloads": -1,
"filename": "methylmap-0.5.7.tar.gz",
"has_sig": false,
"md5_digest": "11024b81647e8d117583b4b8c6c68b6a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 25736,
"upload_time": "2025-01-05T13:31:19",
"upload_time_iso_8601": "2025-01-05T13:31:19.144785Z",
"url": "https://files.pythonhosted.org/packages/02/3f/28645ee8dcf98595d9e543be0a7eda18e2d4bb4f880810438692544ba121/methylmap-0.5.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-05 13:31:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "EliseCoopman",
"github_project": "methylmap",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
"==",
"1.23.5"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.2.0"
]
]
},
{
"name": "plotly",
"specs": [
[
">=",
"5.4.0"
]
]
},
{
"name": "pyranges",
"specs": [
[
">=",
"0.0.77"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.10.1"
]
]
},
{
"name": "dash",
"specs": [
[
"==",
"2.13.0"
]
]
},
{
"name": "dash-bootstrap-components",
"specs": [
[
"==",
"1.6.0"
]
]
}
],
"lcname": "methylmap"
}