| Field | Value |
| --- | --- |
| Name | seqpro |
| Version | 0.1.16 |
| home_page | None |
| Summary | None |
| upload_time | 2025-01-09 19:54:07 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |


# SeqPro (Sequence processing toolkit)
```python
import seqpro as sp
```
SeqPro is a Python package for processing DNA/RNA sequences, with limited support for protein sequences. SeqPro is fully functional on its own but is heavily utilized by other packages including [SeqData](https://github.com/ML4GLand/SeqData), [MotifData](https://github.com/ML4GLand/MotifData), [SeqExplainer](https://github.com/ML4GLand/SeqExplainer), and [EUGENe](https://github.com/ML4GLand/EUGENe).
All functions in SeqPro take as input a string, a list of strings, a NumPy array of strings, a NumPy array of single-character bytes (`S1`), or a NumPy array of one-hot encoded sequences. There is also emerging integration with XArray through the `seqpro.xr` submodule, intended to work smoothly with [SeqData](https://github.com/ML4GLand/SeqData).
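To make the accepted input forms concrete, here is a minimal NumPy-only sketch that builds the same sequences as a string, a list of strings, a NumPy array of strings, and an `S1` byte array. It does not call SeqPro; the variable names are illustrative, and the `S1` conversion shown assumes all sequences have the same length.

```python
import numpy as np

# The same sequences in the string-like input forms listed above.
as_string = "ACG"                    # a single string
as_list = ["ACG", "TTA"]             # a list of strings
as_str_array = np.array(as_list)     # NumPy array of strings (dtype <U3)

# NumPy array of single-character bytes (S1), shape (n_seqs, length).
# Assumes every sequence has the same length.
as_bytes = as_str_array.astype("S").view("S1").reshape(len(as_list), -1)

print(as_bytes.shape, as_bytes.dtype)
```

The `S1` layout gives one byte per nucleotide, which is what makes per-position, vectorized operations straightforward.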
Computational bottlenecks, or code that is impossible to vectorize with NumPy alone, are accelerated with Numba, e.g. padding sequences, one-hot encoding, and converting from one-hot encoding to nucleotides.
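For intuition about what one-hot encoding and decoding do, the following is a NumPy-only sketch using broadcasting. It is not SeqPro's Numba implementation; the function names, the `ACGT` channel order, and the choice to map all-zero positions back to `N` are assumptions made for illustration.

```python
import numpy as np

ALPHABET = b"ACGT"  # assumed channel order for this sketch
CODES = np.frombuffer(ALPHABET, dtype=np.uint8)

def one_hot(seqs):
    """Encode an S1 array of shape (..., length) as uint8 of shape (..., length, 4)."""
    # Characters outside the alphabet (e.g. N) become all-zero rows.
    return (seqs.view(np.uint8)[..., None] == CODES).astype(np.uint8)

def decode_one_hot(ohe, unknown=b"N"):
    """Invert one_hot; all-zero positions are decoded as `unknown`."""
    chars = CODES[ohe.argmax(axis=-1)]
    chars = np.where(ohe.any(axis=-1), chars, ord(unknown))
    return chars.astype(np.uint8).view("S1")

seqs = np.array([[b"A", b"C", b"N"], [b"G", b"T", b"A"]])
ohe = one_hot(seqs)
decoded = decode_one_hot(ohe)
```

Here `ohe` has shape `(2, 3, 4)`, and `decoded` matches `seqs` because the only out-of-alphabet character is the one used as `unknown`.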
# Installation
```bash
pip install seqpro
```
## API
```python
import numpy as np
import seqpro as sp

N = 2
L = 3

# Generating random sequences
seqs = sp.random_seqs(shape=(N, L), alphabet=sp.DNA, seed=1234)

# Padding
sp.pad_seqs(seqs, pad="right", pad_value="N", length=5, length_axis=-1)

# One-hot encoding and decoding
ohe = sp.ohe(seqs, alphabet=sp.DNA)
sp.decode_ohe(ohe, ohe_axis=-1, alphabet=sp.DNA, unknown_char="N")

# Tokenization
token_map = {"A": 7, "C": 8, "G": 9, "T": 10, "N": 11}
tokens = sp.tokenize(seqs, token_map=token_map, unknown_token=11)
sp.decode_tokens(tokens, token_map=token_map)

# Reverse complement
sp.reverse_complement(seqs, alphabet=sp.DNA)

# k-let preserving shuffling
sp.k_shuffle(seqs, k=2, length_axis=1, seed=1234)

# Calculating GC or nucleotide content
sp.gc_content(seqs, alphabet=sp.DNA)
sp.nucleotide_content(seqs, alphabet=sp.DNA)

# Randomly jittering sequences
sp.jitter(seqs, max_jitter=128, length_axis=1, seed=1234)

# Collapse coverage to a given bin width
# NOTE: `coverage` is not defined in the original snippet; this placeholder
# array of shape (N, 1024) is an assumption added so the call has an input.
coverage = np.zeros((N, 1024))
sp.bin_coverage(coverage, bin_width=128, length_axis=1, normalize=False)
```
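To illustrate what a few of these operations compute, here is a NumPy-only sketch of reverse complementing, GC content, and coverage binning. These helpers are hypothetical stand-ins, not SeqPro's implementations, and several behaviors are assumptions: unknown characters such as `N` are left unchanged and still count toward sequence length, bins are summed, `normalize=True` divides by the bin width, and the length must be divisible by `bin_width`.

```python
import numpy as np

def reverse_complement(seqs):
    """Reverse-complement a DNA S1 array along its last axis."""
    table = np.arange(256, dtype=np.uint8)  # identity: unknown chars map to themselves
    table[np.frombuffer(b"ACGT", dtype=np.uint8)] = np.frombuffer(b"TGCA", dtype=np.uint8)
    return table[seqs.view(np.uint8)[..., ::-1]].view("S1")

def gc_content(seqs):
    """Fraction of G or C per sequence (unknown chars count toward length)."""
    codes = seqs.view(np.uint8)
    return ((codes == ord("G")) | (codes == ord("C"))).mean(axis=-1)

def bin_coverage(coverage, bin_width, normalize=False):
    """Sum coverage over non-overlapping bins along the last axis."""
    length = coverage.shape[-1]
    if length % bin_width != 0:
        raise ValueError("length must be divisible by bin_width")
    binned = coverage.reshape(*coverage.shape[:-1], length // bin_width, bin_width).sum(axis=-1)
    return binned / bin_width if normalize else binned

seqs = np.array([b"AACG", b"TTNG"]).view("S1").reshape(2, -1)
rc = reverse_complement(seqs)      # rows spell CGTT and CNAA
gc = gc_content(seqs)              # 0.5 and 0.25
coverage = np.arange(8).reshape(1, 8)
binned = bin_coverage(coverage, bin_width=4)  # [[6, 22]]
```

The lookup-table approach keeps the reverse complement fully vectorized, and the reshape-then-sum trick avoids an explicit loop over bins.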
# More to come!
All contributions, including bug reports, documentation improvements, and enhancement suggestions, are welcome. Everyone within the community is expected to abide by our [code of conduct](https://github.com/ML4GLand/EUGENe/blob/main/CODE_OF_CONDUCT.md).
Raw data
{
"_id": null,
"home_page": null,
"name": "seqpro",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "David Laub <dlaub@ucsd.edu>, Adam Klie <aklie@ucsd.edu>",
"download_url": "https://files.pythonhosted.org/packages/24/d7/fef6bac781df0097e302a0da5f09d2c95a8c188dbbd64ecd2e89cbdbaf84/seqpro-0.1.16.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": null,
"version": "0.1.16",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "8161a348d5e68cbf54da2422798105134d67189c158987a7edca1641b4375cf5",
"md5": "d2d4787c7cbf8e2a81e226515b650e75",
"sha256": "212c2bfaa2ff70cb5e21192cab0fb75cd582a14990a861c8993a19dba9924bac"
},
"downloads": -1,
"filename": "seqpro-0.1.16-cp39-abi3-manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "d2d4787c7cbf8e2a81e226515b650e75",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 405266,
"upload_time": "2025-01-09T19:54:06",
"upload_time_iso_8601": "2025-01-09T19:54:06.046326Z",
"url": "https://files.pythonhosted.org/packages/81/61/a348d5e68cbf54da2422798105134d67189c158987a7edca1641b4375cf5/seqpro-0.1.16-cp39-abi3-manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "24d7fef6bac781df0097e302a0da5f09d2c95a8c188dbbd64ecd2e89cbdbaf84",
"md5": "eb9124e74c62dff495ef42dc37bef1c1",
"sha256": "f647ab6e1962af3a41623baca4fa621d2b11173c0de807d4775fabc4fe01e92d"
},
"downloads": -1,
"filename": "seqpro-0.1.16.tar.gz",
"has_sig": false,
"md5_digest": "eb9124e74c62dff495ef42dc37bef1c1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 31347,
"upload_time": "2025-01-09T19:54:07",
"upload_time_iso_8601": "2025-01-09T19:54:07.412264Z",
"url": "https://files.pythonhosted.org/packages/24/d7/fef6bac781df0097e302a0da5f09d2c95a8c188dbbd64ecd2e89cbdbaf84/seqpro-0.1.16.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-09 19:54:07",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "seqpro"
}