seqpro


- Name: seqpro
- Version: 0.1.12
- Upload time: 2024-04-26 19:39:05
- Requires Python: >=3.9
![PyPI - Downloads](https://img.shields.io/pypi/dm/seqpro)
![GitHub stars](https://img.shields.io/github/stars/ML4GLand/SeqPro)

# SeqPro (Sequence processing toolkit)
```python
import seqpro as sp
```

SeqPro is a Python package for processing DNA/RNA sequences, with limited support for protein sequences. SeqPro is fully functional on its own but is heavily utilized by other packages including [SeqData](https://github.com/ML4GLand/SeqData), [MotifData](https://github.com/ML4GLand/MotifData), [SeqExplainer](https://github.com/ML4GLand/SeqExplainer), and [EUGENe](https://github.com/ML4GLand/EUGENe).

All functions in SeqPro take as input a string, a list of strings, a NumPy array of strings, a NumPy array of single-character bytes (`S1`), or a NumPy array of one-hot encoded sequences. There is also emerging integration with Xarray through the `seqpro.xr` submodule, intended to work well with [SeqData](https://github.com/ML4GLand/SeqData).
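
To make the input representations above concrete, here is a minimal sketch in plain NumPy (it does not call SeqPro) showing how a list of equal-length strings relates to an array of strings and to an array of single-character bytes. The variable names are illustrative only.

```python
import numpy as np

seqs = ["ACGT", "TTGA"]  # equal-length sequences, assumed for this sketch

# NumPy array of strings: one element per sequence
str_arr = np.array(seqs)

# NumPy array of single-character bytes (S1): one element per nucleotide.
# Encode as fixed-width bytes, then view each byte as its own element.
byte_arr = np.array(seqs, dtype="S").view("S1").reshape(len(seqs), -1)

# Round trip back to Python strings
width = byte_arr.shape[-1]
round_trip = [s.decode("ascii") for s in byte_arr.view(f"S{width}").ravel()]
```

The `S1` layout puts sequence length on its own axis, which is what arguments such as `length_axis` in the API examples refer to.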

Computational bottlenecks and code that cannot be vectorized with NumPy alone (e.g. padding sequences, one-hot encoding, and converting from one-hot encoding to nucleotides) are accelerated with Numba.
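
As an illustration of the kind of loop that is awkward to vectorize, the sketch below right-pads variable-length sequences with an explicit loop compiled by Numba's `njit`. This is not SeqPro's implementation: `pad_right`, the flat-buffer-plus-offsets layout, and the fallback decorator are assumptions made for this example.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def pad_right(flat, offsets, length, pad_value):
    """Copy ragged sequences into an (n, length) array, padding or truncating on the right."""
    n = len(offsets) - 1
    out = np.full((n, length), pad_value, dtype=np.uint8)
    for i in range(n):
        start = offsets[i]
        stop = offsets[i + 1]
        m = min(stop - start, length)
        for j in range(m):
            out[i, j] = flat[start + j]
    return out


seqs = ["ACG", "T", "GGCAT"]
# Store all sequences in one flat buffer of ASCII codes, with offsets marking boundaries
flat = np.frombuffer("".join(seqs).encode("ascii"), dtype=np.uint8).copy()
offsets = np.cumsum([0] + [len(s) for s in seqs])

padded = pad_right(flat, offsets, 4, ord("N"))
decoded = [row.tobytes().decode("ascii") for row in padded]
```

Sequences shorter than the target length are filled with the pad character, and longer ones are truncated.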

# Installation

```bash
pip install seqpro
```

# API

```python
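import numpy as np

N = 2
L = 3

# Placeholder coverage track for the binning example at the end of this block.
# The original snippet leaves `coverage` undefined, so this (N, 256) array is an assumption.
coverage = np.ones((N, 256), dtype=np.float32)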

# Generating random sequences
seqs = sp.random_seqs(shape=(N, L), alphabet=sp.DNA, seed=1234)

# Padding
sp.pad_seqs(seqs, pad="right", pad_value="N", length=5, length_axis=-1)

# One-hot encoding and decoding
ohe = sp.ohe(seqs, alphabet=sp.DNA)
sp.decode_ohe(ohe, ohe_axis=-1, alphabet=sp.DNA, unknown_char="N")

# Tokenization
token_map = {"A": 7, "C": 8, "G": 9, "T": 10, "N": 11}
tokens = sp.tokenize(seqs, token_map=token_map, unknown_token=11)
sp.decode_tokens(tokens, token_map=token_map)

# Reverse complement
sp.reverse_complement(seqs, alphabet=sp.DNA)

# k-let preserving shuffling
sp.k_shuffle(seqs, k=2, length_axis=1, seed=1234)

# Calculating GC or nucleotide content
sp.gc_content(seqs, alphabet=sp.DNA)
sp.nucleotide_content(seqs, alphabet=sp.DNA)

# Randomly jittering sequences
sp.jitter(seqs, max_jitter=128, length_axis=1, seed=1234)

# Collapse coverage to a given bin width
sp.bin_coverage(coverage, bin_width=128, length_axis=1, normalize=False)
```
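
For readers who want to see what several of these operations compute, the following is a plain NumPy sketch of one-hot encoding, reverse complementing, GC content, and coverage binning. It is a conceptual illustration only and does not reflect how SeqPro implements them. All function names are hypothetical, the alphabet order `ACGT` is assumed, and the binning here sums within each bin, which may differ from what `sp.bin_coverage` does.

```python
import numpy as np

ALPHABET = np.frombuffer(b"ACGT", dtype=np.uint8)  # assumed alphabet order

# Lookup table mapping each ASCII code to its complement (A<->T, C<->G);
# any other byte maps to itself.
COMPLEMENT = np.arange(256, dtype=np.uint8)
COMPLEMENT[ALPHABET] = ALPHABET[::-1]


def to_codes(seqs):
    """List of equal-length strings -> (N, L) uint8 array of ASCII codes."""
    return np.array(seqs, dtype="S").view(np.uint8).reshape(len(seqs), -1)


def one_hot(codes):
    """(N, L) codes -> (N, L, 4) uint8; characters outside the alphabet become all zeros."""
    return (codes[..., None] == ALPHABET).astype(np.uint8)


def reverse_complement(codes):
    """Complement every base, then reverse along the length axis."""
    return COMPLEMENT[codes][:, ::-1]


def gc_fraction(codes):
    """Fraction of G or C characters in each sequence."""
    return np.isin(codes, [ord("G"), ord("C")]).mean(axis=-1)


def bin_sum(coverage, bin_width):
    """Sum an (N, L) coverage track within consecutive bins; L must be a multiple of bin_width."""
    n, length = coverage.shape
    return coverage.reshape(n, length // bin_width, bin_width).sum(axis=-1)


codes = to_codes(["ACGT", "GGCA"])
ohe = one_hot(codes)
rc = [row.tobytes().decode("ascii") for row in reverse_complement(codes)]
gc = gc_fraction(codes)
binned = bin_sum(np.arange(8, dtype=float).reshape(1, 8), bin_width=4)
```

With these two inputs, `ACGT` is its own reverse complement and `GGCA` becomes `TGCC`.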

# More to come!

All contributions, including bug reports, documentation improvements, and enhancement suggestions, are welcome. Everyone in the community is expected to abide by our [code of conduct](https://github.com/ML4GLand/EUGENe/blob/main/CODE_OF_CONDUCT.md).


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "seqpro",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "David Laub <dlaub@ucsd.edu>, Adam Klie <aklie@ucsd.edu>",
    "download_url": "https://files.pythonhosted.org/packages/4e/8e/139bfe67a67a967a22347c78ea3d5ed4868cd3f6b317ce3c14d29ad8d495/seqpro-0.1.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": null,
    "version": "0.1.12",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a55b0817720d82dfc95755ea39da8e79c7a28e0b25a6b0e1cdb5d7107e86d512",
                "md5": "f40c2a6a6cb799770629c453af6d6a86",
                "sha256": "17f83ada4420dbca680231a030371797c1c7b412ac0a4334e2df09b65c5a6e38"
            },
            "downloads": -1,
            "filename": "seqpro-0.1.12-cp38-abi3-manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "f40c2a6a6cb799770629c453af6d6a86",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.9",
            "size": 345074,
            "upload_time": "2024-04-26T19:39:02",
            "upload_time_iso_8601": "2024-04-26T19:39:02.998130Z",
            "url": "https://files.pythonhosted.org/packages/a5/5b/0817720d82dfc95755ea39da8e79c7a28e0b25a6b0e1cdb5d7107e86d512/seqpro-0.1.12-cp38-abi3-manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4e8e139bfe67a67a967a22347c78ea3d5ed4868cd3f6b317ce3c14d29ad8d495",
                "md5": "bbf31a286e4d83603e8a63ec6ae3d6df",
                "sha256": "cbe672fcd4320eaf3b0ae01d905acd385994e4c3c61139ce578ab9fe06349fdd"
            },
            "downloads": -1,
            "filename": "seqpro-0.1.12.tar.gz",
            "has_sig": false,
            "md5_digest": "bbf31a286e4d83603e8a63ec6ae3d6df",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 30039,
            "upload_time": "2024-04-26T19:39:05",
            "upload_time_iso_8601": "2024-04-26T19:39:05.166240Z",
            "url": "https://files.pythonhosted.org/packages/4e/8e/139bfe67a67a967a22347c78ea3d5ed4868cd3f6b317ce3c14d29ad8d495/seqpro-0.1.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-26 19:39:05",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "seqpro"
}
        