YMP - a Flexible Omics Pipeline
===============================

|Install with Bioconda| |Github Unit Tests| |Read the Docs| |Codacy grade| |Codecov|

.. |Install with Bioconda| image:: https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat
   :target: http://bioconda.github.io/recipes/ymp/README.html
.. |Github Unit Tests| image:: https://github.com/epruesse/ymp/workflows/Unit%20Tests/badge.svg
   :target: https://github.com/epruesse/ymp/actions?query=workflow%3A%22Unit+Tests%22
.. |CircleCI| image:: https://img.shields.io/circleci/project/github/epruesse/ymp.svg?label=CircleCI
   :target: https://circleci.com/gh/epruesse/ymp
.. |Read the Docs| image:: https://img.shields.io/readthedocs/ymp/latest.svg
   :target: https://ymp.readthedocs.io/en/latest
.. |Codacy grade| image:: https://img.shields.io/codacy/grade/07ec32ae80194ec8b9184e1f6b5e6649.svg
   :target: https://app.codacy.com/app/elmar/ymp
.. |Codecov| image:: https://img.shields.io/codecov/c/github/epruesse/ymp.svg
   :target: https://codecov.io/gh/epruesse/ymp

.. begin intro

YMP is a tool that makes it easy to process large amounts of NGS read
data. It comes "batteries included" with everything needed to
preprocess your reads (QC, trimming, contaminant removal), assemble
metagenomes, annotate assemblies, or assemble and quantify RNA-Seq
transcripts, offering a choice of tools for each of those processing
stages. When your needs exceed what the stock YMP processing stages
provide, you can easily add your own, using YMP to drive novel tools,
tools specific to your area of research, or tools you wrote yourself.

.. end intro

:Note:
   Intrigued, but think YMP doesn't exactly fit your needs?

   Missing processing stages for your favorite tool? Found a bug?

   Open an issue, create a PR, or better yet, join the team!

The `YMP documentation <http://ymp.readthedocs.io/>`__ is available at
readthedocs.

.. begin features

Features:
---------

batteries included
   YMP comes with a large number of *Stages* implementing common read
   processing steps. These stages cover the most common topics,
   including quality control, filtering and sorting of reads, assembly
   of metagenomes and transcripts, read mapping, community profiling,
   visualisation and pathway analysis.

   For a complete list, check the `documentation
   <http://ymp.readthedocs.io/en/latest/stages.html>`__ or the `source
   <https://github.com/epruesse/ymp/tree/development/src/ymp/rules>`__.

get started quickly
   Simply point YMP at a folder containing read files, a mapping
   file, a list of URLs, or even an SRA RunTable, and YMP will configure
   itself. Use tab completion to build the series of stages
   to be applied to your data. YMP will then do your
   bidding: downloading raw read files and reference databases as
   needed, installing the required software environments, and scheduling
   the execution of tools either locally or on your cluster.

explore alternative workflows
   Not sure which assembler works best for your data, or what the
   effect of more stringent quality trimming would be? YMP is made for
   this! Because the output of each stage is kept in a folder named after
   the stack of stages applied so far, YMP can manage many workflow
   variants in parallel while minimizing duplicate computation
   and storage.

go beyond the beaten path
   Built on top of Bioconda_ and Snakemake_, YMP is easily extended with
   your own Snakefiles, allowing you to integrate any type of
   processing you desire into YMP, including your own custom-made
   tools. Within the YMP framework, you can also make use of the
   extensions to the Snakemake language provided by YMP (default
   values, inheritance, recursive wildcard expansion, etc.), making
   rule writing less error-prone and repetitive.

.. _Snakemake: https://snakemake.readthedocs.io
.. _Bioconda: https://bioconda.github.io

.. end features
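
The stage-stacking idea from "explore alternative workflows" can be illustrated with a small Bash sketch. It assumes that a stack of stages is written as dot-separated stage names behind a project name; the names ``toy``, ``trim_bbmap``, ``assemble_megahit`` and ``assemble_metaspades`` are illustrative only, and the ``shared_stages`` helper is hypothetical, not part of YMP. Two workflow variants that differ only in their assembler share the longest common prefix of their stacks, and it is this shared part that needs to be computed only once::

   #!/usr/bin/env bash
   # Illustration only: assumes stage stacks are dot-separated names such as
   # "toy.trim_bbmap.assemble_megahit" (all names here are hypothetical).

   # Print the longest common prefix (in whole stages) of two stage stacks.
   shared_stages() {
     local -a a b common=()
     local i
     IFS=. read -ra a <<< "$1"   # split first stack on dots
     IFS=. read -ra b <<< "$2"   # split second stack on dots
     for ((i = 0; i < ${#a[@]} && i < ${#b[@]}; i++)); do
       [[ ${a[i]} == "${b[i]}" ]] || break   # stop at the first differing stage
       common+=("${a[i]}")
     done
     # Join the shared stages back together with dots.
     local IFS=.
     echo "${common[*]}"
   }

   shared_stages toy.trim_bbmap.assemble_megahit toy.trim_bbmap.assemble_metaspades
   # prints: toy.trim_bbmap

In this example, everything up to and including ``trim_bbmap`` is common to both variants; only the assembly stage, and anything stacked on top of it, would be computed separately.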

.. begin background

Background
----------

Bioinformatic data processing workflows can easily become very complex,
even convoluted. On the way from raw read data to publishable
results, a sizeable collection of tools needs to be applied,
intermediate outputs verified, reference databases selected, and
summary data produced. A host of data files must be managed, processed
individually, or aggregated by host or spatial transect along the way.
And, of course, to arrive at a workflow that is just right for a
particular study, many alternative workflow variants need to be
evaluated. Which tools perform best? Which parameters are right? Does
re-ordering steps make a difference? Should the data be assembled
individually, grouped, or should a grand co-assembly be computed?
Which reference database is most appropriate?

Answering these questions is a time-consuming process, justifying the
plethora of published ready-made pipelines, each providing a polished
workflow for a typical study type or use case. The price for the
convenience of such a polished pipeline is a lack of flexibility:
they are not meant to be adapted or extended to match the needs of a
particular study. Workflow management systems, on the other hand, offer
great flexibility by focusing on the orchestration of user-defined
workflows, but typically require significant initial effort, as they
come without predefined workflows.

YMP strives to walk the middle ground between these. It brings
everything needed for classic metagenome and RNA-Seq workflows, yet,
being built on the workflow management system Snakemake_, it can be
easily extended by simply adding Snakemake rules files. Designed around
the needs of processing primarily multi-omic NGS read data, it provides
a framework for handling read file metadata, provisioning reference
databases, and organizing rules into semantic stages.

.. _Snakemake: https://snakemake.readthedocs.io

.. end background

.. begin developer info

Working with the GitHub Development Version
-------------------------------------------

Installing from GitHub
~~~~~~~~~~~~~~~~~~~~~~

1. Clone the repository::

      git clone --recurse-submodules https://github.com/epruesse/ymp.git

   Or, if you have GitHub SSH keys set up::

      git clone --recurse-submodules git@github.com:epruesse/ymp.git

2. Enter the cloned folder, then create and activate the conda environment::

      cd ymp
      conda env create -n ymp --file environment.yaml
      conda activate ymp

3. Install YMP into the conda environment::

      pip install -e .

4. Verify that YMP works::

      conda activate ymp
      ymp --help

Updating Development Version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Usually, all you need to do is a pull::

   git pull
   git submodule update --recursive --remote

If environments were updated, you may want to regenerate the local
installations and clean out environments no longer in use to save disk
space::

   conda activate ymp
   ymp env update
   ymp env clean
   # alternatively, you can just delete existing envs and let YMP
   # reinstall them as needed:
   # rm -rf ~/.ymp/conda*
   conda clean -a

If you see errors before jobs are executed, the core requirements may
have changed. To update the YMP conda environment, enter the folder
where you installed YMP and run the following::

   conda activate ymp
   conda env update --file environment.yaml

If something changed in ``setup.py``, a re-install may be necessary::

   conda activate ymp
   pip install -U -e .

.. end developer info
Raw data
{
"_id": null,
"home_page": "https://github.com/epruesse/ymp",
"name": "ymp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "bioinformatics pipeline workflow automation rnaseq genomics metagenomics conda bioconda snakemake",
"author": "Elmar Pruesse",
"author_email": "elmar@pruesse.net",
"download_url": "https://files.pythonhosted.org/packages/ec/89/63981046e0482f5ae272f60b66780f227bdeb3aa2aa7c627b691ddc9bf9c/ymp-0.3.1.tar.gz",
"platform": "linux",
"bugtrack_url": null,
"license": "GPL-3",
"summary": "Flexible multi-omic pipeline system",
"version": "0.3.1",
"project_urls": {
"Documentation": "https://ymp.readthedocs.io",
"Homepage": "https://github.com/epruesse/ymp",
"Source": "https://github.com/epruesse/ymp"
},
"split_keywords": [
"bioinformatics",
"pipeline",
"workflow",
"automation",
"rnaseq",
"genomics",
"metagenomics",
"conda",
"bioconda",
"snakemake"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "950b7fb0146c32365337709d1d8df942a7f2049f46ccebaf65f08a4bd801c396",
"md5": "20a91c12e22d2d9f52671e6ca6e8a2d1",
"sha256": "74a30da95ec47346462c526517caf3fe329054222d1cc8c72d61ff090d85e743"
},
"downloads": -1,
"filename": "ymp-0.3.1-py36-none-any.whl",
"has_sig": false,
"md5_digest": "20a91c12e22d2d9f52671e6ca6e8a2d1",
"packagetype": "bdist_wheel",
"python_version": "py36",
"requires_python": ">=3.10",
"size": 1631367,
"upload_time": "2024-07-19T20:47:06",
"upload_time_iso_8601": "2024-07-19T20:47:06.177004Z",
"url": "https://files.pythonhosted.org/packages/95/0b/7fb0146c32365337709d1d8df942a7f2049f46ccebaf65f08a4bd801c396/ymp-0.3.1-py36-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ec8963981046e0482f5ae272f60b66780f227bdeb3aa2aa7c627b691ddc9bf9c",
"md5": "bd339d604dbf3f594dab4524638a8297",
"sha256": "dbd88b552584f1268c8b7b68df18a94a08eca127561bf60162b8e894711814b4"
},
"downloads": -1,
"filename": "ymp-0.3.1.tar.gz",
"has_sig": false,
"md5_digest": "bd339d604dbf3f594dab4524638a8297",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 1526096,
"upload_time": "2024-07-19T20:47:08",
"upload_time_iso_8601": "2024-07-19T20:47:08.483350Z",
"url": "https://files.pythonhosted.org/packages/ec/89/63981046e0482f5ae272f60b66780f227bdeb3aa2aa7c627b691ddc9bf9c/ymp-0.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-19 20:47:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "epruesse",
"github_project": "ymp",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"circle": true,
"lcname": "ymp"
}