methylpy
========

:Version: 1.4.7
:Author: Yupeng He
:Summary: Bisulfite sequencing data processing and differential methylation analysis
:Upload time: 2023-05-20 00:52:42
:License: LICENSE.txt
:Keywords: bioinformatics pipeline, DNA methylation, bisulfite sequencing data,
    NOMe-seq data, differential methylation, calling DMRs, epigenetics,
    functional genomics

Welcome to the home page of methylpy, a Python-based analysis pipeline
for

-  (single-cell) (whole-genome) bisulfite sequencing data
-  (single-cell) NOMe-seq data
-  differential methylation analysis

methylpy is available at
`github <https://github.com/yupenghe/methylpy>`__ and
`PyPI <https://pypi.python.org/pypi/methylpy/>`__.

Note
====

-  Version 1.3 introduced major changes to the mapping-related options.
   A new aligner, minimap2, is supported starting with this version. To
   accommodate this new feature, the ``--bowtie2`` option was replaced by
   ``--aligner``, which specifies the aligner to use. The parameters of
   the ``build-reference`` function were modified as well.
-  methylpy only considers cytosines that are in uppercase in the genome
   FASTA file (i.e., cytosines that are not soft-masked).
-  methylpy was initiated by and built on the work of `Matthew D.
   Schultz <https://github.com/schultzmattd>`__.
-  A beta version of the
   `tutorial <https://github.com/yupenghe/methylpy/blob/methylpy/tutorial/tutorial.md>`__
   has been released!
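
To illustrate the uppercase rule above, here is a minimal Python sketch that lists the cytosines a FASTA sequence would contribute. It is not methylpy's code, and the helper ``unmasked_cytosines`` is hypothetical. An uppercase ``C`` is a cytosine on the ``+`` strand, an uppercase ``G`` is a cytosine on the ``-`` strand, and lowercase (soft-masked) bases are skipped. Positions are 1-based, as in the allc format.

.. code-block:: python

    def unmasked_cytosines(seq):
        """Return (1-based position, strand) for every uppercase cytosine.

        Hypothetical helper: an uppercase C is a cytosine on the + strand,
        an uppercase G is a cytosine on the - strand, and lowercase
        (soft-masked) bases are ignored.
        """
        sites = []
        for pos, base in enumerate(seq, start=1):
            if base == "C":
                sites.append((pos, "+"))
            elif base == "G":
                sites.append((pos, "-"))
        return sites


    print(unmasked_cytosines("ACgtCG"))  # [(2, '+'), (5, '+'), (6, '-')]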

What can methylpy do?
=====================

Processing bisulfite sequencing data and NOMe-seq data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-  fast and flexible pipeline for both single-end and paired-end data
-  all the way from raw reads (fastq) to methylation state and/or open
   chromatin readouts
-  also supports obtaining readouts from alignments (BAM files)
-  includes options for read trimming, quality filtering and PCR
   duplicate removal
-  accepts compressed input and generates compressed output
-  supports post-bisulfite adaptor tagging (PBAT) data

Calling differentially methylated regions (DMRs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-  DMR calling at single cytosine level
-  supports comparisons across 2 or more samples/groups
-  conservative and accurate
-  includes a feature for dealing with low-coverage data by combining
   data from adjacent cytosines

What do you want to do?
=======================

-  `Use methylpy without
   installation <#use-methylpy-without-installation>`__
-  `Install methylpy <#install-methylpy>`__
-  `Test methylpy <#test-methylpy>`__
-  `Process data <#process-data>`__
-  `Call DMRs <#call-dmrs>`__
-  `Additional functions for data
   processing <#additional-functions-for-data-processing>`__
-  `Cite methylpy <#cite-methylpy>`__

Run ``methylpy -h`` to get a list of functions.

Use methylpy without installation
=================================

methylpy can be used within a Docker container with all dependencies
resolved. The Docker image for methylpy can be built from the
``Dockerfile`` in the ``methylpy/`` directory using the commands below.
The image takes about 3 GB of disk space.

::

    git clone https://github.com/yupenghe/methylpy.git
    cd methylpy/
    docker build -t methylpy:latest ./

Then, you can start a docker container by running

::

    docker run -it methylpy:latest

methylpy can be run with full functionality within the container. You
can mount your working directory to the container by adding ``-v``
option to the docker command and store methylpy output there.

::

    docker run -it -v /YOUR/WORKING/PATH/:/output methylpy:latest

See `here <https://docs.docker.com/storage/volumes/>`__ for details.

Install methylpy
================

Step 1 - Download methylpy
^^^^^^^^^^^^^^^^^^^^^^^^^^

The easiest way to install methylpy is through PyPI by running
``pip install methylpy``. The command ``pip install --upgrade methylpy``
updates methylpy to the latest version.

methylpy can also be installed through
`anaconda <https://www.anaconda.com/download/>`__ or
`miniconda <https://docs.conda.io/en/latest/miniconda.html>`__.

::

    conda create -y --name methylpy_env
    conda activate methylpy_env
    conda install -y -c bioconda -c conda-forge methylpy

Alternatively, methylpy can be installed through github: enter the
directory where you would like to install methylpy and run

::

    git clone https://github.com/yupenghe/methylpy.git
    cd methylpy/
    python setup.py install

If you would like to install methylpy in a path of your choice, run
``python setup.py install --prefix=/USER/PATH/``. Then, run ``methylpy``;
if no error is reported, the setup is likely successful. See `Test
methylpy <#test-methylpy>`__ for a more rigorous test. Lastly, processing
large datasets requires a large amount of free space for temporary files,
and the default directory for temporary files is usually not sufficient.
You may want to set the ``TMPDIR`` environment variable to the
(absolute) path of a directory on a hard drive with sufficient space
(e.g. ``/YOUR/TMP/DIR/``). This can be done by adding
``export TMPDIR=/YOUR/TMP/DIR/`` to ``~/.bashrc`` and running
``source ~/.bashrc``.
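
To check how much space is available for temporary files, you can use the following Python 3 sketch. It relies only on the standard library, and the helper name is hypothetical. It reports the free space of ``$TMPDIR`` and falls back to Python's default temporary directory when the variable is not set. It does not inspect methylpy itself.

.. code-block:: python

    import os
    import shutil
    import tempfile


    def tmpdir_free_space_gb(path=None):
        """Return (directory, free space in GB) for the temporary directory.

        Without an explicit path, use $TMPDIR if it is set, otherwise the
        default directory chosen by Python's tempfile module.
        """
        directory = path or os.environ.get("TMPDIR") or tempfile.gettempdir()
        free_bytes = shutil.disk_usage(directory).free
        return directory, free_bytes / 1e9


    directory, free_gb = tmpdir_free_space_gb()
    print(f"{directory}: {free_gb:.1f} GB free")

If ``TMPDIR`` points to a directory that does not exist, ``shutil.disk_usage`` raises ``FileNotFoundError``. This makes the script a useful sanity check after editing ``~/.bashrc``.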

Step 2 - Install dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python is required for running methylpy. Both Python 2 (>=2.7.9) and
Python 3 (>=3.6.2) work. methylpy also depends on two Python
modules, `numpy <http://www.numpy.org/>`__ and
`scipy <https://www.scipy.org/>`__. The easiest way to get these
dependencies is to install
`anaconda <https://www.anaconda.com/download/>`__.

In addition, some features of methylpy depend on several publicly
available tools (not all of them are required if you only use a subset
of methylpy functions):

-  `cutadapt <http://cutadapt.readthedocs.io/en/stable/installation.html>`__
   (>=1.9) for raw read trimming
-  `bowtie <http://bowtie-bio.sourceforge.net/index.shtml>`__ and/or
   `bowtie2 <http://bowtie-bio.sourceforge.net/bowtie2/index.shtml>`__ for
   alignment (minimap2 is also supported starting with version 1.3)
-  `samtools <https://github.com/samtools/samtools>`__ (>=1.3) for
   alignment result manipulation. Samtools can also be installed using
   conda: ``conda install -c bioconda samtools``
-  `Picard <https://broadinstitute.github.io/picard/index.html>`__
   (>=2.10.8) for PCR duplicate removal
-  java for running Picard (its path needs to be included in the
   ``PATH`` environment variable)
-  `wigToBigWig <http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/wigToBigWig>`__
   for converting methylpy output to bigwig format

Lastly, if the paths to cutadapt, bowtie/bowtie2, samtools and
wigToBigWig are included in the ``PATH`` variable, methylpy can run these
tools directly. Otherwise, the paths have to be passed to methylpy as
arguments. The path to Picard must be passed to methylpy as a parameter
to run PCR duplicate removal.
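
To see which of these tools can be found on ``PATH``, you can use the Python sketch below, which is built on ``shutil.which``. The tool list and helper name are assumptions, so adjust them to the functions you use. Picard is distributed as a ``.jar`` file rather than as an executable on ``PATH``, so it is not checked here.

.. code-block:: python

    import shutil

    # Executables mentioned in this README (assumed list; edit as needed).
    TOOLS = ["cutadapt", "bowtie", "bowtie2", "minimap2", "samtools", "java", "wigToBigWig"]


    def find_tools(tools=TOOLS):
        """Map each tool name to its full path on PATH, or None if it is missing."""
        return {tool: shutil.which(tool) for tool in tools}


    for tool, path in find_tools().items():
        print(f"{tool:12} {path or 'not found on PATH'}")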

Optional step - Compile rms.cpp
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

DMR finding requires the executable
``methylpy/methylpy/run_rms_tests.out``, which was compiled from the C++
code in ``methylpy/methylpy/rms.cpp``. In most cases, the precompiled
file can be used directly. To check this, simply execute
``methylpy/methylpy/run_rms_tests.out``. If the help page is shown,
recompiling is not required. If an error is reported, the executable
needs to be regenerated by compiling ``rms.cpp``, which requires
`GSL <https://www.gnu.org/software/gsl/>`__ to be installed correctly. On
most Linux systems, the commands below will do the job:

::

    cd methylpy/methylpy/
    g++ -O3 -o run_rms_tests.out rms.cpp -lgsl -lgslcblas

On Ubuntu (>=16.04), please try the commands below first.

::

    cd methylpy/methylpy/
    g++ -o run_rms_tests.out rms.cpp `gsl-config --cflags --libs`

Lastly, the compiled file ``run_rms_tests.out`` needs to be copied to
the directory where methylpy is installed. You can get this directory by
running the commands below in a Python console (run ``python`` to open
one):

::

    import os
    import methylpy

    # Directory that contains the installed methylpy package
    print(os.path.dirname(methylpy.__file__) + os.sep)
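
You can also script the copy step. The sketch below is a hypothetical helper and is not part of methylpy. It copies the compiled binary into a given directory and makes sure the copy is executable.

.. code-block:: python

    import os
    import shutil
    import stat


    def install_rms_binary(binary_path, package_dir):
        """Copy the compiled RMS test binary into package_dir and make it executable."""
        target = os.path.join(package_dir, os.path.basename(binary_path))
        shutil.copy2(binary_path, target)
        # Add execute permission for user, group and others.
        mode = os.stat(target).st_mode
        os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        return target


    # Example (requires methylpy to be installed):
    #   import methylpy
    #   install_rms_binary("run_rms_tests.out", os.path.dirname(methylpy.__file__))

You may need write permission for the installation directory, for example when methylpy is installed system-wide.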

Test methylpy
=============

To test whether methylpy and the dependencies are installed and set up
correctly, run

::

    wget http://neomorph.salk.edu/yupeng/share/methylpy_test.tar.gz
    tar -xf methylpy_test.tar.gz
    cd methylpy_test/
    python run_test.py

The test should take around 3 minutes, and progress will be printed on
screen. After the test starts, two files, ``test_output_msg.txt`` and
``test_error_msg.txt``, will be generated. The former contains more
details about each test and the latter stores error messages (if any)
as well as additional information.

If the test fails, please check ``test_error_msg.txt`` for the error
message. If you decide to submit an issue about the test failure on the
methylpy GitHub page, please include the error message from this file.

Process data
============

Please see the
`tutorial <https://github.com/yupenghe/methylpy/blob/methylpy/tutorial/tutorial.md>`__
for more details.

Step 1 - Build converted genome reference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Build the bowtie/bowtie2 index for the converted genome. Run
``methylpy build-reference -h`` to get more information. An example of
building the mm10 mouse reference index:

::

    methylpy build-reference \
        --input-files mm10_bt2/mm10.fa \
        --output-prefix mm10_bt2/mm10 \
        --aligner bowtie2
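
Conceptually, a converted reference is an in-silico bisulfite-converted copy of the genome. One version has every C replaced by T, and the other has every G replaced by A (the C-to-T conversion of the opposite strand). These presumably correspond to the ``mm10_f`` and ``mm10_r`` prefixes used in the pipeline commands of Step 2. The minimal Python sketch below shows only this string conversion. Indexing is done by the aligner, and methylpy's own implementation may differ in details.

.. code-block:: python

    def convert_forward(seq):
        """C->T conversion: the top strand after complete bisulfite conversion."""
        return seq.replace("C", "T").replace("c", "t")


    def convert_reverse(seq):
        """G->A conversion: the bottom strand's C->T change, seen on the top strand."""
        return seq.replace("G", "A").replace("g", "a")


    print(convert_forward("ACGTCG"))  # ATGTTG
    print(convert_reverse("ACGTCG"))  # ACATCA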

Step 2 - Process bisulfite sequencing and NOMe-seq data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``single-end-pipeline`` function processes single-end data. Run
``methylpy single-end-pipeline -h`` to get help information. The code
below is an example of using methylpy to process single-end bisulfite
sequencing data. For processing NOMe-seq data, please set the number of
upstream bases to 1 (``num_upstr_bases=1``) to include the base
upstream of each cytosine in its sequence context, which can be used to
tease out GC sites.

::

    methylpy single-end-pipeline \
        --read-files raw/mESC_R1.fastq.gz \
        --sample mESC \
        --forward-ref mm10_bt2/mm10_f \
        --reverse-ref mm10_bt2/mm10_r \
        --ref-fasta mm10_bt2/mm10.fa \
        --num-procs 8 \
        --remove-clonal True \
        --path-to-picard="picard/"
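
With one upstream base, the context string starts with the base immediately 5' of the cytosine, followed by the cytosine and its downstream bases, so GpC and CpG sites can be told apart. This sketch assumes a common NOMe-seq convention that is not taken from methylpy: GCH sites are the accessibility readout, HCG sites reflect endogenous methylation, and ambiguous GCG sites are excluded (H is any base other than G). The hypothetical Python helper below implements that convention.

.. code-block:: python

    def classify_nome_context(context):
        """Classify a context string that starts with one upstream base.

        The context is assumed to be: upstream base, the cytosine itself,
        then downstream bases (for example "GCAG").
        """
        upstream, cytosine, downstream = context[0], context[1], context[2]
        if cytosine != "C":
            raise ValueError("second character must be the cytosine")
        if upstream == "G" and downstream == "G":
            return "ambiguous"      # GCG: GpC and CpG at the same cytosine
        if upstream == "G":
            return "accessibility"  # GCH: GpC methylation from the NOMe-seq enzyme
        if downstream == "G":
            return "methylation"    # HCG: endogenous CpG methylation
        return "other"              # HCH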

An example command for processing paired-end data is shown below. Run
``methylpy paired-end-pipeline -h`` to get more information.

::

    methylpy paired-end-pipeline \
        --read1-files raw/mESC_R1.fastq.gz \
        --read2-files raw/mESC_R2.fastq.gz \
        --sample mESC \
        --forward-ref mm10_bt2/mm10_f \
        --reverse-ref mm10_bt2/mm10_r \
        --ref-fasta mm10_bt2/mm10.fa \
        --num-procs 8 \
        --remove-clonal True \
        --path-to-picard="picard/"

If you would like methylpy to perform a binomial test to tease out
sites that show methylation above the noise level (which is mainly due
to sodium bisulfite non-conversion), please check the options
``--binom-test`` and ``--unmethylated-control``.
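
The idea behind such a test can be sketched as a one-sided binomial test. Suppose the non-conversion rate ``p`` has been estimated, for example from an unmethylated control. A site with ``mc`` methylated reads out of ``cov`` is then called methylated only if that many methylated reads would be unlikely from non-conversion alone. The Python 3.8+ sketch below computes this p-value with the standard library. The 0.5% rate is an illustrative assumption, and methylpy's actual procedure (for example, its multiple-testing correction) may differ.

.. code-block:: python

    from math import comb


    def binom_upper_tail(mc, cov, p):
        """P(X >= mc) for X ~ Binomial(cov, p).

        This is the chance of seeing at least mc methylated reads out of cov
        if every methylated call were caused by non-conversion at rate p.
        """
        return sum(comb(cov, k) * p**k * (1 - p) ** (cov - k) for k in range(mc, cov + 1))


    # Example: 18 methylated reads out of 21, assumed non-conversion rate 0.5%.
    p_value = binom_upper_tail(18, 21, 0.005)
    print(p_value < 0.01)  # True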

Output format
^^^^^^^^^^^^^

Output files are (compressed) tab-separated text files in allc
format. "allc" stands for all cytosines (C). Each row in an allc file
corresponds to one cytosine in the genome. An allc file contains 7
mandatory columns and no header. Two additional columns may be added
with the ``--add-snp-info`` option when using the
``single-end-pipeline``, ``paired-end-pipeline`` or
``call-methylation-state`` functions.

+-------+--------------------+----------+----------------------------------------+
| index | column name        | example  | note                                   |
+=======+====================+==========+========================================+
| 1     | chromosome         | 12       | with no "chr" prefix                   |
+-------+--------------------+----------+----------------------------------------+
| 2     | position           | 18283342 | 1-based                                |
+-------+--------------------+----------+----------------------------------------+
| 3     | strand             | ``+``    | either ``+`` or ``-``                  |
+-------+--------------------+----------+----------------------------------------+
| 4     | sequence context   | CGT      | can be more than 3 bases               |
+-------+--------------------+----------+----------------------------------------+
| 5     | mc                 | 18       | count of reads supporting methylation  |
+-------+--------------------+----------+----------------------------------------+
| 6     | cov                | 21       | read coverage                          |
+-------+--------------------+----------+----------------------------------------+
| 7     | methylated         | 1        | indicator of significant methylation   |
|       |                    |          | (1 if no test is performed)            |
+-------+--------------------+----------+----------------------------------------+
| 8     | (optional)         | 3,2,3    | number of matched base calls at        |
|       | ``num_matches``    |          | context nucleotides                    |
+-------+--------------------+----------+----------------------------------------+
| 9     | (optional)         | 0,1,0    | number of mismatches at context        |
|       | ``num_mismatches`` |          | nucleotides                            |
+-------+--------------------+----------+----------------------------------------+
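
Because an allc file is plain tab-separated text, it can be read without methylpy. The following minimal Python sketch parses the 7 mandatory columns and ignores any extra columns. Its helper names are hypothetical. It decides whether a file is gzip-compressed from a ``.gz`` suffix, which is an assumption about file naming.

.. code-block:: python

    import gzip
    from collections import namedtuple

    AllcRecord = namedtuple(
        "AllcRecord", ["chrom", "pos", "strand", "context", "mc", "cov", "methylated"]
    )


    def parse_allc_line(line):
        """Parse the 7 mandatory columns of one allc line (extra columns are ignored)."""
        fields = line.rstrip("\n").split("\t")
        return AllcRecord(
            chrom=fields[0],
            pos=int(fields[1]),
            strand=fields[2],
            context=fields[3],
            mc=int(fields[4]),
            cov=int(fields[5]),
            methylated=int(fields[6]),
        )


    def read_allc(path):
        """Yield records from a plain or gzip-compressed allc file."""
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            for line in handle:
                yield parse_allc_line(line)


    rec = parse_allc_line("12\t18283342\t+\tCGT\t18\t21\t1\n")
    print(rec.mc / rec.cov)  # methylation level of this site, 18/21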

Call DMRs
=========

The ``DMRfind`` function takes a list of compressed or uncompressed allc
files (output files of the methylpy pipeline) as input and looks for
DMRs. Help information for this function is available by running
``methylpy DMRfind -h``.

Below is an example of calling DMRs for CG methylation between two
samples, ``AD_HT`` and ``AD_IT``, on chromosomes 1 through 5 using 8
processors.

::

    methylpy DMRfind \
        --allc-files allc/allc_AD_HT.tsv.gz allc/allc_AD_IT.tsv.gz \
        --samples AD_HT AD_IT \
        --mc-type "CGN" \
        --chroms 1 2 3 4 5 \
        --num-procs 8 \
        --output-prefix DMR_HT_IT

Please see
`tutorial <https://github.com/yupenghe/methylpy/blob/methylpy/tutorial/tutorial.md>`__
for details.
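
A DMR is a region made of nearby differentially methylated cytosines. A simple distance-based merge can illustrate how significant sites might be grouped into candidate regions. This is a hypothetical sketch of that general technique and not methylpy's DMR-collapsing code. The ``max_dist`` value is an assumption, and ``methylpy DMRfind -h`` lists the parameters methylpy actually uses.

.. code-block:: python

    def merge_sites(sites, max_dist):
        """Group significant sites into candidate regions.

        sites: iterable of (chrom, pos) for significant cytosines.
        Consecutive sites on the same chromosome that are at most max_dist
        apart are merged. Returns (chrom, start, end, n_sites) tuples.
        """
        regions = []
        for chrom, pos in sorted(sites):
            last = regions[-1] if regions else None
            if last and last[0] == chrom and pos - last[2] <= max_dist:
                # Extend the current region to include this site.
                regions[-1] = (chrom, last[1], pos, last[3] + 1)
            else:
                regions.append((chrom, pos, pos, 1))
        return regions

A real DMR caller would typically also drop regions supported by too few sites.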

Additional functions for data processing
========================================

Extract cytosine methylation state from BAM file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``call-methylation-state`` function gets the cytosine methylation
state (allc file) from an alignment file (BAM file). It is part of the
data processing pipeline and is especially useful for getting allc files
from the alignment files of other methylation data pipelines, such as
Bismark. Run ``methylpy call-methylation-state -h`` to get help
information. Below is an example of running this function. Please make
sure to remove ``--paired-end True`` or use ``--paired-end False`` for
BAM files from single-end data.

::

    methylpy call-methylation-state \
        --input-file mESC_processed_reads_no_clonal.bam \
        --paired-end True \
        --sample mESC \
        --ref-fasta mm10_bt2/mm10.fa \
        --num-procs 8
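
At its core, extracting methylation state from bisulfite alignments compares each read with the reference at cytosine positions. An unconverted C in the read indicates methylation, and a T indicates an unmethylated cytosine. The hypothetical sketch below applies this rule to a single top-strand read with a gap-free alignment. Real BAM processing, as done by methylpy, must also handle the reverse strand, CIGAR operations, base qualities and sequence context, none of which is shown.

.. code-block:: python

    def call_read_methylation(ref, read):
        """Compare an aligned top-strand read with the reference.

        ref and read are equal-length, gap-free, uppercase strings aligned
        base to base. At each reference C, a read C counts as methylated
        (protected from conversion) and a read T as unmethylated
        (converted); other read bases are ignored.
        Returns {0-based offset: (mc, cov)}.
        """
        calls = {}
        for offset, (ref_base, read_base) in enumerate(zip(ref, read)):
            if ref_base != "C":
                continue
            if read_base == "C":
                calls[offset] = (1, 1)
            elif read_base == "T":
                calls[offset] = (0, 1)
        return calls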

Get methylation level for genomic regions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Calculating the methylation level of genomic regions gives an estimate
of the methylation abundance at these loci. This can be achieved using
the ``add-methylation-level`` function. See
``methylpy add-methylation-level -h`` for more details about the input
format and available options.

::

    methylpy add-methylation-level \
        --input-tsv-file DMR_AD_IT.tsv \
        --output-file DMR_AD_IT_with_level.tsv \
        --allc-files allc/allc_AD_HT_1.tsv.gz allc/allc_AD_HT_2.tsv.gz \
            allc/allc_AD_IT_1.tsv.gz allc/allc_AD_IT_2.tsv.gz \
        --samples AD_HT_1 AD_HT_2 AD_IT_1 AD_IT_2 \
        --mc-type CGN \
        --num-procs 4
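
One common definition of a region's methylation level is the weighted level: the total methylated read count divided by the total coverage over all cytosines in the region, rather than the mean of per-site levels. The hypothetical helper below is a sketch of that definition. Check ``methylpy add-methylation-level -h`` for how methylpy computes and reports its values.

.. code-block:: python

    def region_methylation_level(sites, start, end):
        """Weighted methylation level of [start, end] (1-based, inclusive).

        sites: iterable of (pos, mc, cov). Returns sum(mc) / sum(cov), or
        None when no read covers the region.
        """
        total_mc = total_cov = 0
        for pos, mc, cov in sites:
            if start <= pos <= end:
                total_mc += mc
                total_cov += cov
        return total_mc / total_cov if total_cov else None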

Merge allc files
^^^^^^^^^^^^^^^^

The ``merge-allc`` function can merge multiple allc files into a single
allc file. It is useful when separate allc files are generated for
replicates of a tissue or cell type, and one wants to get a single allc
file for that tissue/cell type. See ``methylpy merge-allc -h`` for more
information.

::

    methylpy merge-allc \
        --allc-files allc/allc_AD_HT_1.tsv.gz allc/allc_AD_HT_2.tsv.gz \
        --output-file allc/allc_AD_HT.tsv.gz \
        --num-procs 1 \
        --compress-output True
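
Conceptually, merging replicates adds up the read counts observed at the same cytosine. The sketch below works in memory and uses a hypothetical helper. It assumes each record is a tuple of the first six allc columns. The ``methylated`` column is left out because how it should be combined depends on whether a test was performed. methylpy's ``merge-allc`` works on files and may treat that column differently.

.. code-block:: python

    from collections import defaultdict


    def merge_allc_records(*record_lists):
        """Sum mc and cov of identical cytosines across replicates.

        Each record is (chrom, pos, strand, context, mc, cov). Returns the
        merged records sorted by chromosome and position.
        """
        totals = defaultdict(lambda: [0, 0])
        for records in record_lists:
            for chrom, pos, strand, context, mc, cov in records:
                key = (chrom, pos, strand, context)
                totals[key][0] += mc
                totals[key][1] += cov
        return [key + tuple(counts) for key, counts in sorted(totals.items())]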

Filter allc files
^^^^^^^^^^^^^^^^^

The ``filter-allc`` function filters sites by cytosine context,
coverage, etc. See ``methylpy filter-allc -h`` for more information.

::

    methylpy filter-allc \
        --allc-file allc/allc_AD_HT_1.tsv.gz \
        --output-file allc/allCG_AD_HT_1.tsv.gz \
        --mc-type CGN \
        --min-cov 2 \
        --compress-output True
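
The ``--mc-type`` value ``CGN`` reads like a pattern of IUPAC nucleotide codes. This sketch assumes that interpretation: ``N`` matches any base, ``H`` matches A, C or T, and ``W`` matches A or T. The hypothetical Python code below filters ``(context, mc, cov)`` records by context and minimum coverage, mirroring ``--mc-type CGN --min-cov 2``. methylpy's own matching rules may differ.

.. code-block:: python

    # IUPAC nucleotide codes assumed to be usable in context patterns.
    IUPAC = {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "H": "ACT",  # not G
        "W": "AT",
        "N": "ACGT",
    }


    def context_matches(context, pattern):
        """True if each base of context is allowed by the IUPAC code in pattern."""
        if len(context) < len(pattern):
            return False
        return all(base in IUPAC[code] for base, code in zip(context, pattern))


    def filter_sites(records, pattern="CGN", min_cov=2):
        """Keep (context, mc, cov) records that match pattern and have cov >= min_cov."""
        return [r for r in records if context_matches(r[0], pattern) and r[2] >= min_cov]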

Index allc files
^^^^^^^^^^^^^^^^

The ``index-allc`` function creates an index file for each allc file.
The index file can be used to speed up reading of allc files, similar
to the ``.fai`` file for a FASTA file. See ``methylpy index-allc -h``
for more information.

::

    methylpy index-allc \
        --allc-files allc/allc_AD_HT_1.tsv.gz allc/allc_AD_HT_2.tsv.gz \
        --num-procs 2 \
        --no-reindex False
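
Like a FASTA ``.fai`` file, such an index records where each chromosome starts, so a reader can seek directly to it instead of scanning the whole file. The hypothetical sketch below illustrates this idea for an uncompressed allc file whose lines are grouped by chromosome. It does not reproduce methylpy's index file format, and compressed files would need a different approach.

.. code-block:: python

    def build_chrom_index(path):
        """Map each chromosome to the byte offset of its first line."""
        index = {}
        offset = 0
        with open(path, "rb") as handle:
            for line in handle:
                chrom = line.split(b"\t", 1)[0].decode()
                if chrom not in index:
                    index[chrom] = offset
                offset += len(line)  # line includes its newline byte
        return index


    def read_chrom(path, index, chrom):
        """Return the lines of one chromosome by seeking straight to its offset."""
        lines = []
        with open(path, "rb") as handle:
            handle.seek(index[chrom])
            for line in handle:
                if line.split(b"\t", 1)[0].decode() != chrom:
                    break  # reached the next chromosome
                lines.append(line.decode())
        return lines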

Convert allc file to bigwig format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``allc-to-bigwig`` function generates a bigwig file from an allc
file. Methylation levels are calculated in equally sized, non-overlapping
genomic bins and the output is stored in a bigwig file. See
``methylpy allc-to-bigwig -h`` for more information.

::

    methylpy allc-to-bigwig \
        --allc-file results/allc_mESC.tsv.gz \
        --output-file results/allc_mESC.bw \
        --ref-fasta mm10_bt2/mm10.fa \
        --mc-type CGN \
        --bin-size 100  
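
The Python function below is a rough sketch of the binning step. It assumes each bin's level is the weighted methylation level (sum of ``mc`` over sum of ``cov``) in fixed-size, non-overlapping bins, which may not match methylpy's code. Bin starts are 0-based, the coordinate convention used by bigwig/bedGraph intervals, while allc positions are 1-based.

.. code-block:: python

    from collections import defaultdict


    def bin_methylation(sites, bin_size=100):
        """Weighted methylation level per bin.

        sites: iterable of (chrom, pos, mc, cov) with 1-based positions.
        Returns {(chrom, bin_start): level} with 0-based bin starts.
        """
        sums = defaultdict(lambda: [0, 0])
        for chrom, pos, mc, cov in sites:
            bin_start = (pos - 1) // bin_size * bin_size
            sums[(chrom, bin_start)][0] += mc
            sums[(chrom, bin_start)][1] += cov
        return {key: mc / cov for key, (mc, cov) in sums.items() if cov > 0}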

Quality filter for bisulfite sequencing reads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes, we want to filter out reads that cannot be mapped confidently
or are likely from under-converted DNA fragments. This can be done using
the ``bam-quality-filter`` function. See
``methylpy bam-quality-filter -h`` for parameter information.

For example, the command below filters out reads with a MAPQ score below
30 (poor alignment) and reads with an mCH level greater than 0.7
(under-conversion) if the reads contain enough (at least 3) CH sites.

::

    methylpy bam-quality-filter \
        --input-file mESC_processed_reads_no_clonal.bam \
        --output-file mESC_processed_reads_no_clonal.filtered.bam \
        --ref-fasta mm10_bt2/mm10.fa \
        --min-mapq 30 \
        --min-num-ch 3 \
        --max-mch-level 0.7 \
        --buffer-line-number 100
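
The filtering rule in the example above can be written as a small decision function. This sketch covers only that stated rule. The function is hypothetical, although its parameter names mirror the command-line options. methylpy computes the number of CH sites and methylated CH sites from the alignment and reference itself.

.. code-block:: python

    def keep_read(mapq, num_ch, num_mch, min_mapq=30, min_num_ch=3, max_mch_level=0.7):
        """Decide whether a read passes the MAPQ and mCH filters.

        mapq: mapping quality; num_ch: CH sites covered by the read;
        num_mch: how many of those CH sites are methylated (unconverted).
        """
        if mapq < min_mapq:
            return False  # poor alignment
        if num_ch >= min_num_ch and num_mch / num_ch > max_mch_level:
            return False  # likely an under-converted fragment
        return True

Reads with fewer than ``min_num_ch`` CH sites are kept because there is too little evidence to judge their conversion.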

Reidentify DMRs from existing result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

methylpy is able to re-identify DMRs based on the results of a previous
``DMRfind`` run. This function is especially useful for picking out DMRs
across a subset of categories and/or with different filters. See
``methylpy reidentify-DMR -h`` for details about the options.

::

    methylpy reidentify-DMR \
        --input-rms-file results/DMR_P0_FBvsHT_rms_results.tsv.gz \
        --output-file results/DMR_P0_FBvsHT_rms_results_recollapsed.tsv \
        --collapse-samples P0_FB_1 P0_FB_2 P0_HT_1 P0_HT_2 \
        --sample-category P0_FB P0_FB P0_HT P0_HT \
        --min-cluster 2

Cite methylpy
=============

If you use methylpy, please cite:

    Matthew D. Schultz, Yupeng He, John W. Whitaker, Manoj Hariharan,
    Eran A. Mukamel, Danny Leung, Nisha Rajagopal, Joseph R. Nery,
    Mark A. Urich, Huaming Chen, Shin Lin, Yiing Lin, Bing Ren,
    Terrence J. Sejnowski, Wei Wang, Joseph R. Ecker. Human Body Epigenome
    Maps Reveal Noncanonical DNA Methylation Variation. Nature.
    523(7559):212-216, 2015 Jul.

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "methylpy",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "Bioinformatics pipeline,DNA methylation,Bisulfite sequencing data,Nome-seq data,Differential methylation,Calling DMRs,Epigenetics,Functional genomics",
    "author": "Yupeng He",
    "author_email": "yupeng.he.bioinfo@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/22/1f/fbcee6e88e48eec5878c042aac0ab9fa10f39e8a8eb1467606e68260a86d/methylpy-1.4.7.tar.gz",
    "platform": null,
    "description": "methylpy\n========\n\nWelcome to the home page of methylpy, a pyhton-based analysis pipeline\nfor\n\n-  (single-cell) (whole-genome) bisulfite sequencing data\n-  (single-cell) NOMe-seq data\n-  differential methylation analysis\n\nmethylpy is available at\n`github <https://github.com/yupenghe/methylpy>`__ and\n`PyPI <https://pypi.python.org/pypi/methylpy/>`__.\n\nNote\n====\n\n-  Version 1.3 has major changes on options related to mapping. A new\n   aligner, minimap2, is supported starting in this version. To\n   accommodate this new features, ``--bowtie2`` option is replaced with\n   ``--aligner``, which specifies the aligner to use. The parameters of\n   ``--build-reference`` function are modified as well.\n-  methylpy only considers cytosines that are in uppercase in the genome\n   fasta file (i.e. not masked)\n-  methylpy was initiated by and built on the work of `Mattew D.\n   Schultz <https://github.com/schultzmattd>`__\n-  beta version of\n   `tutorial <https://github.com/yupenghe/methylpy/blob/methylpy/tutorial/tutorial.md>`__\n   is released!\n\nWhat can methylpy do?\n=====================\n\nProcessing bisulfite sequencing data and NOMe-seq data\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n-  fast and flexible pipeline for both single-end and paired-end data\n-  all the way from raw reads (fastq) to methylation state and/or open\n   chromatin readouts\n-  also support getting readouts from alignment (BAM file)\n-  including options for read trimming, quality filter and PCR duplicate\n   removal\n-  accept compressed input and generate compressed output\n-  support post-bisulfite adaptor tagging (PBAT) data\n\nCalling differentially methylated regions (DMRs)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n-  DMR calling at single cytosine level\n-  support comparison across 2 or more samples/groups\n-  conservative and accurate\n-  useful feature for dealing with low-coverage data by combining data\n   of adjacent 
cytosines\n\nWhat you want to do\n===================\n\n-  `Use methylpy without\n   installation <#use-methylpy-without-installation>`__\n-  `Install methylpy <#install-methylpy>`__\n-  `Test methylpy <#test-methylpy>`__\n-  `Process data <#process-data>`__\n-  `Call DMRs <#call-dmrs>`__\n-  `Additional functions for data\n   processing <#additional-functions-for-data-processing>`__\n-  `Cite methylpy <#cite-methylpy>`__\n\nrun ``methylpy -h`` to get a list of functions.\n\nUse methylpy without installation\n=================================\n\nMethylpy can be used within docker container with all dependencies\nresolved. The docker image for methylpy can be built from the\n``Dockerfile`` under ``methylpy/`` directory using the below command. It\nwill take ~3g space.\n\n::\n\n    git clone https://github.com/yupenghe/methylpy.git\n    cd methylpy/\n    docker build -t methylpy:latest ./\n\nThen, you can start a docker container by running\n\n::\n\n    docker run -it methylpy:latest\n\nmethylpy can be run with full functionality within the container. You\ncan mount your working directory to the container by adding ``-v``\noption to the docker command and store methylpy output there.\n\n::\n\n    docker run -it -v /YOUR/WORKING/PATH/:/output methylpy:latest\n\nSee `here <https://docs.docker.com/storage/volumes/>`__ for details.\n\nInstall methylpy\n================\n\nStep 1 - Download methylpy\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe easiest way of installing methylpy will be through PyPI by running\n``pip install methylpy``. 
The command ``pip install --upgrade methylpy``\nupdates methylpy to latest version.\n\nMethylpy can also be installed through\n`anaconda <https://www.anaconda.com/download/>`__ or [miniconda]\n(https://docs.conda.io/en/latest/miniconda.html).\n\n::\n\n    conda env create --name methylpy_env\n    conda activate methylpy_env\n    conda install -y -c bioconda -c conda-forge methylpy              \n\nAlternatively, methylpy can be installed through github: enter the\ndirectory where you would like to install methylpy and run\n\n::\n\n    git clone https://github.com/yupenghe/methylpy.git\n    cd methylpy/\n    python setup.py install\n\nIf you would like to install methylpy in path of your choice, run\n``python setup.py install --prefix=/USER/PATH/``. Then, try ``methylpy``\nand if no error pops out, the setup is likely successful. See `Test\nmethylpy <#test-methylpy>`__ for more rigorious test. Last, processing\nlarge dataset will require large spare space for temporary files.\nUsually, the default directory for temporary files will not meet the\nneed. You may want to set the ``TMPDIR`` environmental variable to the\n(absolute) path of a directory on hard drive with sufficient space (e.g.\n``/YOUR/TMP/DIR/``). This can be done by adding the below command to\n``~/.bashrc file``: ``export TMPDIR=/YOUR/TMP/DIR/`` and run\n``source ~/.bashrc``.\n\nStep 2 - Install dependencies\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\npython is required for running methylpy. Both python2 (>=2.7.9) and\npython3 (>=3.6.2) will work. methylpy also depends on two python\nmodules, `numpy <http://www.numpy.org/>`__ and\n`scipy <https://www.scipy.org/>`__. The easiest way to get these\ndependencies is to install\n`anaconda <https://www.anaconda.com/download/>`__.\n\nIn addition, some features of methylpy depend on several publicly\navailable tools (not all of them are required if you only use a subset\nof methylpy functions). 
\\*\n`cutadapt <http://cutadapt.readthedocs.io/en/stable/installation.html>`__\n(>=1.9) for raw read trimming \\*\n`bowtie <http://bowtie-bio.sourceforge.net/index.shtml>`__ and/or\n`bowtie2 <http://bowtie-bio.sourceforge.net/bowtie2/index.shtml>`__ for\nalignment \\* `samtools <https://github.com/samtools/samtools>`__ (>=1.3)\nfor alignment result manipulation. Samtools can also be installed using\nconda ``conda install -c bioconda samtools`` \\*\n`Picard <https://broadinstitute.github.io/picard/index.html>`__\n(>=2.10.8) for PCR duplicate removal \\* java for running Picard (its\npath needs to be included in ``PATH`` environment variable) . \\*\n`wigToBigWig <http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/wigToBigWig>`__\nfor converting methylpy output to bigwig format\n\nLastly, if paths to cutadapt, bowtie/bowtie2, samtools and wigToBigWig\nare included in ``PATH`` variable, methylpy can run these tools\ndirectly. Otherwise, the paths have to be passed to methylpy as\naugments. Path to Picard needs to be passed to methylpy as a parameter\nto run PCR duplicate removal.\n\nOptional step - Compile rms.cpp\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDMR finding requires an executable\n``methylpy/methylpy/run_rms_tests.out``, which was compiled from C++\ncode ``methylpy/methylpy/rms.cpp``. In most cases, the precompiled file\ncan be used directly. To test this, simply run execute\n``methylpy/methylpy/run_rms_tests.out``. If help page shows, recompiling\nis not required. If error turns up, the executable needs to be\nregenerated by compiling ``rms.cpp`` and this step requires\n`GSL <https://www.gnu.org/software/gsl/>`__ installed correctly. 
In most\nlinux operating system, the below commands will do the job\n\n::\n\n    cd methylpy/methylpy/\n    g++ -O3 -l gsl -l gslcblas -o run_rms_tests.out rms.cpp\n\nIn Ubuntu (>=16.04), please try the below commands first.\n\n::\n\n    cd methylpy/methylpy/\n    g++ -o run_rms_tests.out rms.cpp `gsl-config --cflags --libs`\n\nLastly, the compiled file ``run_rms_tests.out`` needs to be copied to\nthe directory where methylpy is installed. You can get the directory by\nrunning the blow commands in python console (``python`` to open a python\nconsole):\n\n::\n\n    import methylpy\n    print(methylpy.__file__[:methylpy.__file__.rfind(\"/\")]+\"/\")\n\nTest methylpy\n=============\n\nTo test whether methylpy and the dependencies are installed and set up\ncorrectly, run\n\n::\n\n    wget http://neomorph.salk.edu/yupeng/share/methylpy_test.tar.gz\n    tar -xf methylpy_test.tar.gz\n    cd methylpy_test/\n    python run_test.py\n\nThe test should take around 3 minutes, and progress will be printed on\nscreen. After the test is started, two files ``test_output_msg.txt`` and\n``test_error_msg.txt`` will be generated. The former contains more\ndetails about each test and the later stores error message (if any) as\nwell as additional information.\n\nIf test fails, please check ``test_error_msg.txt`` for the error\nmessage. If you decide to submit an issue regarding test failure to\nmethylpy github page, please include the error message in this file.\n\nProcess data\n============\n\nPlease see\n`tutorial <https://github.com/yupenghe/methylpy/blob/methylpy/tutorial/tutorial.md>`__.\nfor more details.\n\nStep 1 - Build converted genome reference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBuild bowtie/bowtie2 index for converted genome. Run\n``methylpy build-reference -h`` to get more information. 
An example of\nbuilding mm10 mouse reference index:\n\n::\n\n    methylpy build-reference \\\n        --input-files mm10_bt2/mm10.fa \\\n        --output-prefix mm10_bt2/mm10 \\\n        --bowtie2 True\n\nStep 2 - Process bisulfite sequencing and NOMe-seq data\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFunction ``single-end-pipeline`` is For processing single-end data. Run\n``methylpy single-end-pipeline -h`` to get help information. Below code\nis an example of using methylpy to process single-end bisulfite\nsequencing data. For processing NOMe-seq data, please use\n``num_upstr_bases=1`` to include one base upstream cytosine as part of\ncytosine sequence context, which can be used to tease out GC sites.\n\n::\n\n    methylpy single-end-pipeline \\\n        --read-files raw/mESC_R1.fastq.gz \\\n        --sample mESC \\\n        --forward-ref mm10_bt2/mm10_f \\\n        --reverse-ref mm10_bt2/mm10_r \\\n        --ref-fasta mm10_bt2/mm10.fa \\\n        --num-procs 8 \\\n        --remove-clonal True \\\n        --path-to-picard=\"picard/\"\n\nAn command example for processing paired-end data. Run\n``methylpy paired-end-pipeline -h`` to get more information.\n\n::\n\n    methylpy paired-end-pipeline \\\n        --read1-files raw/mESC_R1.fastq.gz \\\n        --read2-files raw/mESC_R2.fastq.gz \\\n        --sample mESC \\\n        --forward-ref mm10_bt2/mm10_f \\\n        --reverse-ref mm10_bt2/mm10_r \\\n        --ref-fasta mm10_bt2/mm10.fa \\\n        --num-procs 8 \\\n        --remove-clonal True \\\n        --path-to-picard=\"picard/\"\n\nIf you would like methylpy to perform binomial test for teasing out\nsites that show methylation above noise level (which is mainly due to\nsodium bisulfite non-conversion), please check options ``--binom-test``\nand ``--unmethylated-control``.\n\nOutput format\n^^^^^^^^^^^^^\n\nOutput file(s) are (compressed) tab-separated text file(s) in allc\nformat. \"allc\" stands for all cytosine (C). 
Each row in an allc file\ncorresponds to one cytosine in the genome. An allc file contains 7\nmandatory columns and no header. Two additional columns may be added\nwith the ``--add-snp-info`` option when using the ``single-end-pipeline``,\n``paired-end-pipeline`` or ``call-methylation-state`` functions.\n\n+-------+---------------------------+----------+--------------------------------------------------+\n| index | column name               | example  | note                                             |\n+=======+===========================+==========+==================================================+\n| 1     | chromosome                | 12       | with no ``chr`` prefix                           |\n+-------+---------------------------+----------+--------------------------------------------------+\n| 2     | position                  | 18283342 | 1-based                                          |\n+-------+---------------------------+----------+--------------------------------------------------+\n| 3     | strand                    | +        | either + or -                                    |\n+-------+---------------------------+----------+--------------------------------------------------+\n| 4     | sequence context          | CGT      | can be more than 3 bases                         |\n+-------+---------------------------+----------+--------------------------------------------------+\n| 5     | mc                        | 18       | count of reads supporting methylation            |\n+-------+---------------------------+----------+--------------------------------------------------+\n| 6     | cov                       | 21       | read coverage                                    |\n+-------+---------------------------+----------+--------------------------------------------------+\n| 7     | methylated                | 1        | indicator of significant methylation             |\n|       |                           |          | (1 if no test is performed)                      |\n+-------+---------------------------+----------+--------------------------------------------------+\n| 8     | num_matches (optional)    | 3,2,3    | number of match basecalls at context nucleotides |\n+-------+---------------------------+----------+--------------------------------------------------+\n| 9     | num_mismatches (optional) | 0,1,0    | number of mismatches at context nucleotides      |\n+-------+---------------------------+----------+--------------------------------------------------+\n\nCall DMRs\n=========\n\nThe ``DMRfind`` function takes a list of compressed/uncompressed allc files\n(output files from the methylpy pipeline) as input and looks for DMRs. 
Help\ninformation for this function is available by running\n``methylpy DMRfind -h``.\n\nBelow is an example of calling DMRs for CG methylation\nbetween two samples, ``AD_HT`` and ``AD_IT``, on chromosomes 1 through 5\nusing 8 processors.\n\n::\n\n    methylpy DMRfind \\\n        --allc-files allc/allc_AD_HT.tsv.gz allc/allc_AD_IT.tsv.gz \\\n        --samples AD_HT AD_IT \\\n        --mc-type \"CGN\" \\\n        --chroms 1 2 3 4 5 \\\n        --num-procs 8 \\\n        --output-prefix DMR_HT_IT\n\nPlease see the\n`tutorial <https://github.com/yupenghe/methylpy/blob/methylpy/tutorial/tutorial.md>`__\nfor details.\n\nAdditional functions for data processing\n========================================\n\nExtract cytosine methylation state from BAM file\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``call-methylation-state`` function allows users to get the cytosine\nmethylation state (allc file) from an alignment file (BAM file). It is part\nof the data processing pipeline and is especially useful for getting\nallc files from alignment files produced by other methylation data pipelines\nlike Bismark. Run ``methylpy call-methylation-state -h`` to get help\ninformation. Below is an example of running this function. Please make\nsure to remove ``--paired-end True`` or use ``--paired-end False`` for\nBAM files from single-end data.\n\n::\n\n    methylpy call-methylation-state \\\n        --input-file mESC_processed_reads_no_clonal.bam \\\n        --paired-end True \\\n        --sample mESC \\\n        --ref-fasta mm10_bt2/mm10.fa \\\n        --num-procs 8\n\nGet methylation level for genomic regions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCalculating the methylation level of specific genomic regions can give an\nestimate of the methylation abundance at these loci. This can be\nachieved using the ``add-methylation-level`` function. 
See\n``methylpy add-methylation-level -h`` for more details about the input\nformat and available options.\n\n::\n\n    methylpy add-methylation-level \\\n        --input-tsv-file DMR_AD_IT.tsv \\\n        --output-file DMR_AD_IT_with_level.tsv \\\n        --allc-files allc/allc_AD_HT_1.tsv.gz allc/allc_AD_HT_2.tsv.gz \\\n            allc/allc_AD_IT_1.tsv.gz allc/allc_AD_IT_2.tsv.gz \\\n        --samples AD_HT_1 AD_HT_2 AD_IT_1 AD_IT_2 \\\n        --mc-type CGN \\\n        --num-procs 4\n\nMerge allc files\n^^^^^^^^^^^^^^^^\n\nThe ``merge-allc`` function can merge multiple allc files into a single\nallc file. It is useful when separate allc files are generated for\nreplicates of a tissue or cell type, and one wants to get a single allc\nfile for that tissue/cell type. See ``methylpy merge-allc -h`` for more\ninformation.\n\n::\n\n    methylpy merge-allc \\\n        --allc-files allc/allc_AD_HT_1.tsv.gz allc/allc_AD_HT_2.tsv.gz \\\n        --output-file allc/allc_AD_HT.tsv.gz \\\n        --num-procs 1 \\\n        --compress-output True\n\nFilter allc files\n^^^^^^^^^^^^^^^^^\n\nThe ``filter-allc`` function is for filtering sites by cytosine context,\ncoverage, etc. See ``methylpy filter-allc -h`` for more information.\n\n::\n\n    methylpy filter-allc \\\n        --allc-file allc/allc_AD_HT_1.tsv.gz \\\n        --output-file allc/allCG_AD_HT_1.tsv.gz \\\n        --mc-type CGN \\\n        --min-cov 2 \\\n        --compress-output True\n\nIndex allc files\n^^^^^^^^^^^^^^^^\n\nThe ``index-allc`` function creates an index file for each allc\nfile. The index file can be used to speed up reading the allc file,\nsimilar to a .fai file for a .fasta file. 
See ``methylpy index-allc -h``\nfor more information.\n\n::\n\n    methylpy index-allc \\\n        --allc-files allc/allc_AD_HT_1.tsv.gz allc/allc_AD_HT_2.tsv.gz \\\n        --num-procs 2 \\\n        --no-reindex False\n\nConvert allc file to bigwig format\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``allc-to-bigwig`` function generates a bigwig file from an allc file.\nMethylation levels are calculated in non-overlapping genomic bins of equal\nsize, and the output is stored in a bigwig file. See\n``methylpy allc-to-bigwig -h`` for more information.\n\n::\n\n    methylpy allc-to-bigwig \\\n        --allc-file results/allc_mESC.tsv.gz \\\n        --output-file results/allc_mESC.bw \\\n        --ref-fasta mm10_bt2/mm10.fa \\\n        --mc-type CGN \\\n        --bin-size 100\n\nQuality filter for bisulfite sequencing reads\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSometimes, we want to filter out reads that cannot be mapped confidently\nor are likely from under-converted DNA fragments. This can be done using\nthe ``bam-quality-filter`` function. See\n``methylpy bam-quality-filter -h`` for parameter information.\n\nFor example, the command below filters out reads with a MAPQ score\nbelow 30 (poor alignment), as well as reads with an mCH level greater than 0.7\n(under-conversion) if they contain enough (at least 3) CH sites.\n\n::\n\n    methylpy bam-quality-filter \\\n        --input-file mESC_processed_reads_no_clonal.bam \\\n        --output-file mESC_processed_reads_no_clonal.filtered.bam \\\n        --ref-fasta mm10_bt2/mm10.fa \\\n        --min-mapq 30 \\\n        --min-num-ch 3 \\\n        --max-mch-level 0.7 \\\n        --buffer-line-number 100\n\nReidentify DMRs from existing result\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``reidentify-DMR`` function re-identifies DMRs based on the result of a previous\nDMRfind run. This function is especially useful for picking out DMRs\nacross a subset of categories and/or with different filters. 
See\n``methylpy reidentify-DMR -h`` for details about the options.\n\n::\n\n    methylpy reidentify-DMR \\\n        --input-rms-file results/DMR_P0_FBvsHT_rms_results.tsv.gz \\\n        --output-file results/DMR_P0_FBvsHT_rms_results_recollapsed.tsv \\\n        --collapse-samples P0_FB_1 P0_FB_2 P0_HT_1 P0_HT_2 \\\n        --sample-category P0_FB P0_FB P0_HT P0_HT \\\n        --min-cluster 2\n\nCite methylpy\n=============\n\nIf you use methylpy, please cite:\n\n    Matthew D. Schultz, Yupeng He, John W. Whitaker, Manoj Hariharan,\n    Eran A. Mukamel, Danny Leung, Nisha Rajagopal, Joseph R. Nery, Mark A.\n    Urich, Huaming Chen, Shin Lin, Yiing Lin, Bing Ren, Terrence J.\n    Sejnowski, Wei Wang, Joseph R. Ecker. Human Body Epigenome Maps Reveal\n    Noncanonical DNA Methylation Variation. Nature. 523(7559):212-216,\n    2015 Jul.",
    "bugtrack_url": null,
    "license": "LICENSE.txt",
    "summary": "Bisulfite sequencing data processing and differential methylation analysis",
    "version": "1.4.7",
    "project_urls": null,
    "split_keywords": [
        "bioinformatics pipeline",
        "dna methylation",
        "bisulfite sequencing data",
        "nome-seq data",
        "differential methylation",
        "calling dmrs",
        "epigenetics",
        "functional genomics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "221ffbcee6e88e48eec5878c042aac0ab9fa10f39e8a8eb1467606e68260a86d",
                "md5": "4111b7db3e8a30d838a856faf7446ad4",
                "sha256": "bbd9ab01cd7b6bba9b96b94aed2e1deee5af405d743eef4f031eb8b846ae8156"
            },
            "downloads": -1,
            "filename": "methylpy-1.4.7.tar.gz",
            "has_sig": false,
            "md5_digest": "4111b7db3e8a30d838a856faf7446ad4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 111836,
            "upload_time": "2023-05-20T00:52:42",
            "upload_time_iso_8601": "2023-05-20T00:52:42.968472Z",
            "url": "https://files.pythonhosted.org/packages/22/1f/fbcee6e88e48eec5878c042aac0ab9fa10f39e8a8eb1467606e68260a86d/methylpy-1.4.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-20 00:52:42",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "methylpy"
}
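The allc format described in the project description above is plain tab-separated text, so it can also be inspected outside methylpy. Below is a minimal Python sketch; it is not part of methylpy. The helper names ``parse_allc_lines``, ``read_allc``, ``context_matches`` and ``region_level`` are hypothetical. The sketch makes three assumptions:

- Context patterns such as ``CGN`` or ``CHH`` are read as IUPAC codes.
- The context has no upstream bases, so the cytosine is the first base.
- The region level is the weighted ratio sum(mc) / sum(cov). This may differ from what ``add-methylation-level`` reports.

```python
"""Hypothetical helpers for reading methylpy allc files (not part of methylpy)."""
import gzip
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# IUPAC nucleotide codes; assumed to be how patterns like "CGN" or "CHH" read.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "H": "ACT", "D": "AGT", "B": "CGT", "V": "ACG",
    "W": "AT", "S": "CG", "K": "GT", "M": "AC", "R": "AG", "Y": "CT",
}


@dataclass
class AllcRecord:
    chrom: str       # column 1, no "chr" prefix
    pos: int         # column 2, 1-based
    strand: str      # column 3, "+" or "-"
    context: str     # column 4, e.g. "CGT"
    mc: int          # column 5, reads supporting methylation
    cov: int         # column 6, read coverage
    methylated: int  # column 7, 1 if significant (or if no test was run)


def parse_allc_lines(lines: Iterable[str]) -> Iterator[AllcRecord]:
    """Parse the 7 mandatory columns; optional columns 8-9 are ignored."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\n").split("\t")
        yield AllcRecord(f[0], int(f[1]), f[2], f[3],
                         int(f[4]), int(f[5]), int(f[6]))


def read_allc(path: str) -> Iterator[AllcRecord]:
    """Read a plain or gzip-compressed allc file (guessed from the .gz suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_allc_lines(handle)


def context_matches(context: str, pattern: str) -> bool:
    """True if the leading bases of `context` match an IUPAC pattern.

    Assumes no upstream bases, i.e. the cytosine is the first base.
    """
    if len(context) < len(pattern):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(context.upper(), pattern.upper()))


def region_level(records: Iterable[AllcRecord], chrom: str, start: int,
                 end: int, mc_type: str = "CGN",
                 min_cov: int = 1) -> Optional[float]:
    """Weighted level sum(mc) / sum(cov) over matching sites in [start, end].

    Returns None when the total coverage of the selected sites is zero.
    """
    total_mc = total_cov = 0
    for r in records:
        if (r.chrom == chrom and start <= r.pos <= end
                and r.cov >= min_cov and context_matches(r.context, mc_type)):
            total_mc += r.mc
            total_cov += r.cov
    return total_mc / total_cov if total_cov else None


if __name__ == "__main__":
    rows = [
        "12\t100\t+\tCGT\t18\t21\t1\n",
        "12\t101\t-\tCGA\t2\t10\t1\n",
        "12\t150\t+\tCAG\t1\t30\t0\n",
    ]
    level = region_level(parse_allc_lines(rows), "12", 1, 200, mc_type="CGN")
    print(round(level, 3))  # (18 + 2) / (21 + 10) -> 0.645
```

In the usage example, the ``CAG`` site is skipped for ``CGN`` because its second base is not ``G``. The ``min_cov`` argument loosely mirrors the idea behind ``filter-allc --min-cov``. It is not a reimplementation of that function.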
        