seqpro


Name: seqpro
Version: 0.1.11
Upload time: 2024-02-02 23:15:42
Requires Python: >=3.8
![PyPI - Downloads](https://img.shields.io/pypi/dm/seqpro)
![GitHub stars](https://img.shields.io/github/stars/ML4GLand/SeqPro)

# SeqPro (Sequence processing toolkit)
```python
import seqpro as sp
```

SeqPro is a Python package for processing DNA/RNA sequences, with limited support for protein sequences. SeqPro is fully functional on its own but is heavily utilized by other packages including [SeqData](https://github.com/ML4GLand/SeqData), [MotifData](https://github.com/ML4GLand/MotifData), [SeqExplainer](https://github.com/ML4GLand/SeqExplainer), and [EUGENe](https://github.com/ML4GLand/EUGENe).

All functions in SeqPro take as input a string, a list of strings, a NumPy array of strings, a NumPy array of single-character bytes (S1), or a NumPy array of one-hot encoded strings. There is also emerging integration with Xarray through the `seqpro.xr` submodule.
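To make the input forms above concrete, here is a minimal NumPy-only sketch of how the same sequences look as a list of strings, an array of strings, and an array of single-character bytes (S1). It does not call SeqPro; the variable names are illustrative, and it assumes all sequences have the same length.

```python
import numpy as np

seqs = ["ACGT", "TTGA"]          # list of equal-length strings
str_array = np.array(seqs)       # NumPy array of strings, shape (2,)

# Reinterpret each 4-byte string as four single-character bytes:
# the result has shape (2, 4) and dtype S1.
byte_array = np.array(seqs, dtype="S4").view("S1").reshape(len(seqs), -1)

# Round trip: join the bytes of each row back into a Python string.
round_trip = [row.tobytes().decode("ascii") for row in byte_array]
```

The S1 layout gives every nucleotide its own array element, which is what makes per-position operations (slicing, reversing, comparing against an alphabet) easy to vectorize.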

Computational bottlenecks and code that cannot be vectorized with NumPy alone are accelerated with Numba, e.g. padding sequences, one-hot encoding, and converting from one-hot encoding to nucleotides.

# Installation

```bash
pip install seqpro
```

## Sequence cleaners (`cleaners`)

### Remove sequences with ambiguous bases

```python

# Padding
sp.pad_seqs(seqs, pad="right", pad_value="N", max_length=None)

# One-hot encoding
sp.ohe(seqs, alphabet=sp.alphabets.DNA)

# Decode one-hot encoding
sp.decode_ohe(ohe, ohe_axis=1, alphabet=sp.alphabets.DNA, unknown_char="N")

# Reverse complement
sp.reverse_complement(seqs, alphabet=sp.alphabets.DNA)

# k-let preserving shuffling
sp.k_shuffle(seqs, k=2, length_axis=1, seed=1234)

# Calculating GC content
sp.gc_content(seqs, normalize=True)

# Generating random sequences (N and L are placeholders for the number of sequences and their length)
sp.random_seqs(shape=(N, L), alphabet=sp.alphabets.DNA, seed=1234)

# Randomly jittering sequences
sp.jitter(seqs, max_jitter=128, length_axis=1, seed=1234)
```
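For readers who want to see what two of these operations compute, the following is a rough NumPy-only sketch of right-padding and reverse complementing on S1 byte arrays. It is not SeqPro's implementation: `pad_right_sketch`, `reverse_complement_sketch`, and `COMPLEMENT` are hypothetical names, and the sketch assumes ASCII DNA input with a single-character pad value.

```python
import numpy as np

def pad_right_sketch(seqs, pad_value="N"):
    """Right-pad strings to the longest length; return an (n, length) S1 array."""
    length = max(len(s) for s in seqs)
    padded = [s.ljust(length, pad_value) for s in seqs]
    return np.array(padded, dtype=f"S{length}").view("S1").reshape(len(seqs), length)

# Byte-indexed lookup table: A<->T and C<->G; every other byte maps to itself.
COMPLEMENT = np.arange(256, dtype=np.uint8)
for base, comp in zip(b"ACGT", b"TGCA"):
    COMPLEMENT[base] = comp

def reverse_complement_sketch(byte_seqs):
    """Complement each base, then reverse along the length (last) axis."""
    codes = byte_seqs.view(np.uint8)
    flipped = COMPLEMENT[codes][..., ::-1]
    return np.ascontiguousarray(flipped).view("S1")

padded = pad_right_sketch(["ACGT", "AAC"])
rc = reverse_complement_sketch(padded)
```

Because the pad character `N` maps to itself, padding added on the right ends up on the left after reverse complementing, which is worth keeping in mind when the order of the two steps matters.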

## Manipulating coverage
```python
# Collapse coverage to a given bin width
sp.bin_coverage(coverage, bin_width=128, length_axis=1, normalize=False)

# Jitter coverage and sequences together so they stay aligned
sp.jitter((seqs, coverage), max_jitter=128, length_axis=1, seed=1234)
```

## One-hot encoding

```python
sp.ohe(seqs)
```
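As an illustration of the idea behind one-hot encoding and decoding, here is a small NumPy-only sketch. It is not SeqPro's implementation: the function names are hypothetical, the channel order A, C, G, T is an assumption, and characters outside the alphabet are assumed to encode as an all-zero row that decodes to an unknown character.

```python
import numpy as np

ALPHABET = np.frombuffer(b"ACGT", dtype=np.uint8)  # assumed channel order

def ohe_sketch(seq):
    """Encode a string as a (length, 4) uint8 array; unknown characters become all zeros."""
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    return (codes[:, None] == ALPHABET).astype(np.uint8)

def decode_ohe_sketch(ohe, unknown_char="N"):
    """Map each one-hot row back to its character; rows without exactly one 1 become unknown_char."""
    idx = ohe.argmax(axis=-1)
    known = ohe.sum(axis=-1) == 1
    chars = np.where(known, ALPHABET[idx], ord(unknown_char)).astype(np.uint8)
    return chars.tobytes().decode("ascii")

encoded = ohe_sketch("ACNT")
decoded = decode_ohe_sketch(encoded)
```

The comparison `codes[:, None] == ALPHABET` broadcasts every position against every alphabet character at once, so no Python loop over positions is needed.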

## Sequence analysis (`analyzers`)

### Calculate sequence properties (e.g. GC content)

```python
sp.gc_content(seqs)
sp.nucleotide_content(seqs)
```
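The quantities these calls report can be sketched in plain NumPy as follows. This is only an illustration, not SeqPro's code: the function names are hypothetical, `normalize=True` is assumed to mean dividing counts by the sequence length, and the input is assumed to be a non-empty ASCII string.

```python
import numpy as np

def gc_content_sketch(seq, normalize=True):
    """Count G and C bases; return a fraction of the length if normalize is True."""
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    gc = int(((codes == ord("G")) | (codes == ord("C"))).sum())
    return gc / len(codes) if normalize else gc

def nucleotide_content_sketch(seq, alphabet="ACGT", normalize=True):
    """Count each alphabet character, in alphabet order."""
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    counts = np.array([int((codes == ord(base)).sum()) for base in alphabet])
    return counts / len(codes) if normalize else counts

gc_fraction = gc_content_sketch("ACGT")
gc_count = gc_content_sketch("GGCA", normalize=False)
composition = nucleotide_content_sketch("AACG")
```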

# More to come!

All contributions, including bug reports, documentation improvements, and enhancement suggestions, are welcome. Everyone within the community is expected to abide by our [code of conduct](https://github.com/ML4GLand/EUGENe/blob/main/CODE_OF_CONDUCT.md).

### Preparing sequences for sequence-to-function models


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "seqpro",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "David Laub <dlaub@ucsd.edu>, Adam Klie <aklie@ucsd.edu>",
    "download_url": "https://files.pythonhosted.org/packages/9f/41/09574cfdf2f42edf143db27ea892d006cc069877d2e13f30980939c481b9/seqpro-0.1.11.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": null,
    "version": "0.1.11",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0fc570ceaff0a3ab9b35ac864bffd6cfc9b961b4946bfe9aae7d7517f6f48c5c",
                "md5": "b9f5d0b6599d8c03a9eb6adae173e545",
                "sha256": "c57646737f3280243e4a5d7f906846ba8acb39eb8d05211a146e6fbedf472514"
            },
            "downloads": -1,
            "filename": "seqpro-0.1.11-cp38-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.whl",
            "has_sig": false,
            "md5_digest": "b9f5d0b6599d8c03a9eb6adae173e545",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 350011,
            "upload_time": "2024-02-02T23:15:40",
            "upload_time_iso_8601": "2024-02-02T23:15:40.163954Z",
            "url": "https://files.pythonhosted.org/packages/0f/c5/70ceaff0a3ab9b35ac864bffd6cfc9b961b4946bfe9aae7d7517f6f48c5c/seqpro-0.1.11-cp38-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9f4109574cfdf2f42edf143db27ea892d006cc069877d2e13f30980939c481b9",
                "md5": "d334d63c35fecbc6363bd6ca77bae93e",
                "sha256": "a0fdc532ff3a803818c1734cbe2d37f79e2e2601dbd35e52cfbe414564902aa2"
            },
            "downloads": -1,
            "filename": "seqpro-0.1.11.tar.gz",
            "has_sig": false,
            "md5_digest": "d334d63c35fecbc6363bd6ca77bae93e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 26914,
            "upload_time": "2024-02-02T23:15:42",
            "upload_time_iso_8601": "2024-02-02T23:15:42.295194Z",
            "url": "https://files.pythonhosted.org/packages/9f/41/09574cfdf2f42edf143db27ea892d006cc069877d2e13f30980939c481b9/seqpro-0.1.11.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-02 23:15:42",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "seqpro"
}
        