itermae


| Field | Value |
| --- | --- |
| Name | itermae |
| Version | 0.4.1 |
| Home page | http://gitlab.com/darachm/itermae |
| Summary | Commandline tool for parsing NGS reads by multiple fuzzy regex operations |
| Upload time | 2020-12-03 20:46:24 |
| Author | Darach Miller |
| Requires Python | >=3.6 |
| License | BSD 2-clause |
| Keywords | fastq regex fuzzy amplicon parser barcode extractor extracter |

# itermae

This is a tool that parses FASTQ-format reads using patterns.
Specifically, it uses fuzzy regular expressions: patterns that tolerate some
degeneracy and that use the sequence itself, not just position, to parse reads.
It then rebuilds SAM, FASTQ, or FASTA file streams for piping into other tools
or into other files.

It is pretty much just a wrapper that applies fuzzy regexes from the
[`regex`](https://pypi.org/project/regex/)
module to sequences in
[`Biopython`](https://pypi.org/project/biopython/)
format. That's pretty much it, but it's designed
to be a flexible command line interface to that, for easy parallelization.
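
To make "using the sequence, not just position" concrete, here is a minimal sketch of the idea. It is not `itermae`'s code and it does not use the `regex` module: as a stand-in, it scans for an anchor sequence while tolerating a limited number of substitutions (a plain Hamming-distance scan, standard library only), then extracts the bases that follow. The function names, the anchor, and the read are all made up for illustration.

```python
def fuzzy_find(pattern, sequence, max_substitutions=1):
    """Return (start, mismatches) for the first window of `sequence` that
    matches `pattern` with at most `max_substitutions` mismatches, else None."""
    width = len(pattern)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(a != b for a, b in zip(pattern, window))
        if mismatches <= max_substitutions:
            return start, mismatches
    return None


def extract_after_anchor(sequence, anchor, length, max_substitutions=1):
    """Find `anchor` fuzzily and return the `length` bases that follow it."""
    hit = fuzzy_find(anchor, sequence, max_substitutions)
    if hit is None:
        return None
    start, _ = hit
    begin = start + len(anchor)
    payload = sequence[begin:begin + length]
    # Reject reads that end before the full payload is available.
    return payload if len(payload) == length else None


# Hypothetical read: the anchor GGTACC appears with one substitution (GGTTCC)
# at an offset a purely positional parser would have to know in advance.
read = "TTAGGTTCCACGTACGTAAGC"
barcode = extract_after_anchor(read, "GGTACC", 8)
```

The real tool gets insertions and deletions as well as substitutions from the `regex` module's fuzzy matching; this sketch only handles substitutions.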

# Availability, installation, 'installation'

Options:

1. Use pip to install `itermae`, so 

    python3 -m pip install itermae

1. You can clone this repo, and install it locally. Dependencies are in
    `requirements.txt`, so 
    `python3 -m pip install -r requirements.txt` will install those.
    But if you're not using pip anyways, then you... do you.

1. You can use [Singularity](https://syslab.org) to pull and run a 
    [Singularity image of itermae.py](https://singularity-hub.org/collections/4537), 
    where everything is already installed.
    This is the recommended usage. This image is built with a few other tools,
    like gawk, perl, and parallel, to make command line munging easier.

# Usage

`itermae` is envisioned to be used in a pipeline where you just got your
FASTQ reads back, and you want to parse them. You can use `zcat` to feed
small chunks into the tool and develop operations that match, filter, and extract
the right groups to assemble the output you want. Then you wrap it up behind
`parallel` and feed the whole FASTQ file in via `zcat` on standard input.
This parallelizes with a small memory footprint (tune the chunk size), then
you write it out to disk (or stream it into another tool).
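
The chunking idea can be sketched in a few lines of Python. This is only an illustration of record-aligned chunks, not how `itermae` or `parallel` do it: it assumes plain four-line FASTQ records (no wrapped sequence lines), and `fastq_chunks` and the in-memory reads are hypothetical.

```python
import gzip
import io
from itertools import islice


def fastq_chunks(handle, records_per_chunk):
    """Yield lists of FASTQ records (4 lines each) from a text handle."""
    while True:
        lines = list(islice(handle, 4 * records_per_chunk))
        if not lines:
            return
        # Regroup the flat list of lines into one 4-line list per record.
        yield [lines[i:i + 4] for i in range(0, len(lines), 4)]


# Stand-in for a real gzipped FASTQ file: three made-up records in memory.
raw = "".join(f"@read{n}\nACGT\n+\nIIII\n" for n in range(3))
compressed = io.BytesIO(gzip.compress(raw.encode()))

with gzip.open(compressed, "rt") as handle:
    chunks = list(fastq_chunks(handle, records_per_chunk=2))
```

Only one chunk is held in memory at a time when the generator is consumed lazily, which is why the chunk size is the knob to tune.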

Do one thing well, right?

See the jupyter notebook in `demo/`, and the HTML produced from that in that
same folder. That should have some examples and ideas for how to use it.

I believe I'm the only one using this tool, so let me know if you ever try it.
I'd love to hear about it, and would be very eager to help you use it and
try to adapt it to work to your purposes. 

Oh, and this is for BASH shells on Linux/Unix boxes! I have no idea how
OSX/Windows stuff works. Are you unfamiliar with this? If you're at a
university, ask your librarian. If you're not, look it up online or use the
lessons at Software Carpentry. Or tweet at me about it...

# Caution!

The output group formation and filtering just use `eval`. This gives
flexibility, but it is nowhere near anything like secure. So this is for
use at the command line on your own computer, not web-facing or anything
of the sort. Be responsible.
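
To see why, here is a small sketch of what evaluating a user-supplied filter with `eval` looks like. It is not taken from `itermae`'s source; the group names and both expressions are hypothetical.

```python
# Hypothetical matched groups and a filter of the kind typed at the shell.
groups = {"barcode": "ACGTACGT", "umi": "TTGA"}
filter_expression = "len(barcode) == 8 and umi.startswith('TT')"
passes = eval(filter_expression, {}, dict(groups))

# The same call will run whatever expression it is given, which is the risk.
side_effects = []
hostile = "side_effects.append('ran arbitrary code') or True"
eval(hostile, {}, {"side_effects": side_effects})
```

The first expression is the flexibility; the second is the same mechanism doing something the filter author never intended, so only evaluate expressions you wrote yourself.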



            

Raw data

            {
    "_id": null,
    "home_page": "http://gitlab.com/darachm/itermae",
    "name": "itermae",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "fastq regex fuzzy amplicon parser barcode extractor extracter",
    "author": "Darach Miller",
    "author_email": "darachm@stanford.edu",
    "download_url": "https://files.pythonhosted.org/packages/08/73/dca491ff8f538fb6da159e6d3e6665e6e0b7677d652cc216b0f9c21f94a8/itermae-0.4.1.tar.gz",
    "platform": "",
    "description": "# itermae\n\nThis is tool that parses FASTQ format reads using patterns. \nSpecifically, it uses fuzzy regular expressions, so patterns that allow some\ndegeneracy and using the sequence, not just position, to parse reads.\nThen it rebuilds SAM, FASTQ, or FASTA file streams for piping into other tools\nor into other files.\n\nIt is pretty much just a wrapper to apply fuzzy regex from the \n[`regex`](https://pypi.org/project/regex/)\nto sequences in \n[`Biopython`](https://pypi.org/project/biopython/) \nformat. That's pretty much it, but it's designed\nto be a flexible command line interface to that, for easy parallelization.\n\n# Availability, installation, 'installation'\n\nOptions:\n\n1. Use pip to install `itermae`, so \n\n    python3 -m pip install itermae\n\n1. You can clone this repo, and install it locally. Dependencies are in\n    `requirements.txt`, so \n    `python3 -m pip install -r requirements.txt` will install those.\n    But if you're not using pip anyways, then you... do you.\n\n1. You can use [Singularity](https://syslab.org) to pull and run a \n    [Singularity image of itermae.py](https://singularity-hub.org/collections/4537), \n    where everything is already installed.\n    This is the recommended usage. This image is built with a few other tools,\n    like gawk, perl, and parallel, to make command line munging easier.\n\n# Usage\n\n`itermae` is envisioned to be used in a pipe-line where you just got your\nFASTQ reads back, and you want to parse them. You can use `zcat` to feed\nsmall chunks into the tool, develop operations that match, filter, and extract\nthe right groups to assemble the output you want. 
Then you wrap it it up behind\n`parallel` and feed the whole FASTQ file via `zcat` in on standard in.\nThis parallelizes with a small memory footprint (tune the chunk size), then\nyou write it out to disk (or stream into another tool?).\n\nDo one thing well, right?\n\nSee the jupyter notebook in `demo/`, and the HTML produced from that in that\nsame folder. That should have some examples and ideas for how to use it.\n\nI believe I'm the only one using this tool, so let me know if you ever try it.\nI'd love to hear about it, and would be very eager to help you use it and\ntry to adapt it to work to your purposes. \n\nOh, and this is for BASH shells on Linux/Unix boxes ! I have no idea how\nOSX/windows stuff works. Are you unfamiliar with this? If you're at a \nuniversity, ask your librarian. If you're not, look it up online or use the\nlessons at Software Carpentries. Or tweet at me about it...\n\n# Caution!\n\nThe output group formation and filtering is just using `eval`. This gives\nflexibility, but is nowhere near remotely thinking that it would be anywhere\nnear anything like secure. So this is for use at the command line on your\ncomputer, not web-facing or anything of the sort. Be responsible.\n\n\n",
    "bugtrack_url": null,
    "license": "BSD 2-clause",
    "summary": "Commandline tool for parsing NGS reads by multiple fuzzy regex operations",
    "version": "0.4.1",
    "split_keywords": [
        "fastq",
        "regex",
        "fuzzy",
        "amplicon",
        "parser",
        "barcode",
        "extractor",
        "extracter"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "dc48d18abef0644d793735e2f6db3a6f",
                "sha256": "28ff203639771cd5e651207f8ce8149de511f0c88173f52bcdd665ace87a0edb"
            },
            "downloads": -1,
            "filename": "itermae-0.4.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dc48d18abef0644d793735e2f6db3a6f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 13420,
            "upload_time": "2020-12-03T20:46:23",
            "upload_time_iso_8601": "2020-12-03T20:46:23.295118Z",
            "url": "https://files.pythonhosted.org/packages/50/ca/8f8fcbb9b8fe35cef4e6f4d6a9692190d8b51843d5e6ae5f5c0d51b34b9e/itermae-0.4.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "c689ee73d2231bdb578d07505ed2807e",
                "sha256": "05cc46352ca82621facbbd193206f224de52f516b907c5d90cfc84f5872dae72"
            },
            "downloads": -1,
            "filename": "itermae-0.4.1.tar.gz",
            "has_sig": false,
            "md5_digest": "c689ee73d2231bdb578d07505ed2807e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 11279,
            "upload_time": "2020-12-03T20:46:24",
            "upload_time_iso_8601": "2020-12-03T20:46:24.629741Z",
            "url": "https://files.pythonhosted.org/packages/08/73/dca491ff8f538fb6da159e6d3e6665e6e0b7677d652cc216b0f9c21f94a8/itermae-0.4.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2020-12-03 20:46:24",
    "github": false,
    "gitlab": true,
    "bitbucket": false,
    "gitlab_user": null,
    "gitlab_project": "darachm",
    "lcname": "itermae"
}
        