miniFasta

- Name: miniFasta
- Version: 3.0.3
- Home page: https://github.com/not-a-feature/miniFASTA
- Summary: A simple FASTA read and write toolbox for small to medium size projects without dependencies.
- Upload time: 2023-07-28 20:28:53
- Author: Jules Kreuer / not_a_feature
- Requires Python: >=3.7
- License: gpl-3.0
- Keywords: fasta, reader, bio, bioinformatics
- Requirements: No requirements were recorded.

![miniFASTA](https://github.com/not-a-feature/miniFASTA/raw/main/miniFASTA.png)

A simple FASTA read and write toolbox for small to medium size projects.


[![DOI](https://zenodo.org/badge/440126588.svg)](https://zenodo.org/badge/latestdoi/440126588)
![Test Badge](https://github.com/not-a-feature/miniFASTA/actions/workflows/tests.yml/badge.svg)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)<br>
![Download Badge](https://img.shields.io/pypi/dm/miniFASTA.svg)
![Python Version Badge](https://img.shields.io/pypi/pyversions/miniFASTA)
[![install with conda](https://img.shields.io/badge/install%20with-conda-brightgreen.svg?style=flat)](https://anaconda.org/conda-forge/minifasta)



FASTA files are text-based files for storing nucleotide or amino acid sequences.
Reading such files is not particularly difficult, yet most off-the-shelf packages come with heavy or unusual dependencies.

miniFASTA offers an alternative: it provides many useful functions without relying on third-party packages.

## Installation
Using pip / pip3:
```bash
pip install miniFasta
```
Or from source with pip:
```bash
git clone git@github.com:not-a-feature/miniFASTA.git
cd miniFASTA
pip install .
```
Or with conda:
```bash
conda install -c conda-forge minifasta
```

## How to use
miniFASTA offers easy-to-use functions for FASTA handling.
The five main parts are:
- read()
- write()
- fasta_object()
    - toAmino()
    - toRevComp()
    - valid()
    - len() / str() / eq() / iter()
- translate_seq()
- reverse_comp()

## Reading FASTA files
`read()` is a FASTA reader that handles both compressed and uncompressed files.
The following compression formats are supported: zip, tar, tar.gz, gz. If an archive contains multiple files, all of them are read.
This function returns an iterator of fasta_objects. If only the sequences should be returned, set the keyword argument `seq=True`.
By default, the entries are converted to upper case. Set `read("path.fasta", upper=False)` to disable this conversion.

```python
import miniFasta as mf

# Read fasta_objects
fos = mf.read("dolphin.fasta")  # Iterator of fasta_objects.
fos = list(fos)  # Converts the iterator to a list of fasta_objects.

# Read only the sequences
fasta_strings = mf.read("dolphin.fasta", seq=True)  # Iterator of strings.
fasta_strings = [fo.body for fo in mf.read("dolphin.fasta")]  # Alternative

# Options and compressed files
fos = mf.read("mouse.fasta", upper=False)  # The entries won't be converted to upper case.
fos = mf.read("reads.tar.gz")  # Compressed files are handled as well.
```
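
To make the iterator behaviour described above more concrete, here is a minimal, dependency-free sketch of a lazy FASTA reader. It is not miniFASTA's implementation: the function name `iter_fasta`, the tuple return type and the detection of gzip files by their `.gz` suffix are assumptions made purely for illustration.

```python
import gzip
from typing import Iterator, Tuple


def iter_fasta(path: str, upper: bool = True) -> Iterator[Tuple[str, str]]:
    """Lazily yield (head, body) tuples from a plain or gzip-compressed FASTA file."""
    # Assumption for this sketch: the compression is inferred from the suffix.
    opener = gzip.open if path.endswith(".gz") else open
    head, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # Skip blank lines.
            if line.startswith(">"):
                # A new header closes the previous entry, if there is one.
                if head is not None:
                    yield head, "".join(chunks)
                head, chunks = line, []
            elif head is not None:
                chunks.append(line.upper() if upper else line)
    # Emit the last entry once the file is exhausted.
    if head is not None:
        yield head, "".join(chunks)
```

Because the function is a generator, entries are produced one at a time and nothing is parsed until the caller starts iterating, which is why wrapping the result in `list(...)` is needed to materialize it.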

## Writing FASTA files
`write()` is a basic FASTA writer.
It takes a single fasta_object or a list of fasta_objects and writes them to the given path.

By default, an existing file is overwritten. Set `write(fo, "path.fasta", mode="a")` to append to the file instead.

```python
fos = mf.read("dolphin.fasta") # Iterator of fasta entries
fos = list(fos) # Materialize
mf.write(fos, "new.fasta")
```
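
The difference between overwriting and appending can be sketched with the standard library alone. The helper below is hypothetical (`write_fasta` and its tuple-based input are not part of miniFASTA) and only illustrates how a `mode` argument maps onto Python's file modes; it writes each body on a single line without wrapping.

```python
from typing import Iterable, Tuple


def write_fasta(entries: Iterable[Tuple[str, str]], path: str, mode: str = "w") -> None:
    """Write (head, body) tuples to path; mode "w" overwrites, mode "a" appends."""
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' or 'a'")
    with open(path, mode) as handle:
        for head, body in entries:
            # One header line followed by one sequence line per entry.
            handle.write(f"{head}\n{body}\n")
```

Calling it twice on the same path with `mode="a"` the second time leaves both entries in the file, whereas the default mode replaces the earlier content.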

## fasta_object()
The core component of miniFASTA is the `fasta_object()`. This object represents a FASTA entry and consists of a head and a body.

```python
import miniFasta as mf

fo = mf.fasta_object(">Atlantic dolphin", "CGGCCTTCTATCTTCTTC", stype="DNA")

fo.getHead()  # or fo.head
# >Atlantic dolphin

fo.getSeq()  # or fo.body
# CGGCCTTCTATCTTCTTC

# The following functions are defined on a fasta_object():

str(fo)  # will return:
# >Atlantic dolphin
# CGGCCTTCTATCTTCTTC

# Body length
len(fo)  # will return 18, the length of the body

# Equality
fo == fo  # True

fo_b = mf.fasta_object(">Same Body", "CGGCCTTCTATCTTCTTC")
fo == fo_b  # True

fo_c = mf.fasta_object(">Different Body", "ZZZZAGCTAG")
fo == fo_c  # False

for s in fo:
    print(s)  # Iterates through the sequence of fo.
```
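
For readers unfamiliar with how `len()`, `str()`, `==` and iteration can be supported by a custom object, the following sketch shows one way to do it with Python's special methods. The class `Entry` is a hypothetical stand-in, not miniFASTA's `fasta_object`; comparing only the bodies mirrors the equality examples above and is otherwise an assumption.

```python
class Entry:
    """Hypothetical stand-in for a FASTA entry (not miniFASTA's class)."""

    def __init__(self, head: str, body: str) -> None:
        self.head = head
        self.body = body

    def __str__(self) -> str:
        # Header line followed by the sequence line.
        return f"{self.head}\n{self.body}"

    def __len__(self) -> int:
        # The length of an entry is the length of its body.
        return len(self.body)

    def __eq__(self, other: object) -> bool:
        # Assumption: two entries are equal when their bodies match.
        if not isinstance(other, Entry):
            return NotImplemented
        return self.body == other.body

    def __iter__(self):
        # Iterate over the characters of the body.
        return iter(self.body)


a = Entry(">Atlantic dolphin", "CGGCCTTCTATCTTCTTC")
b = Entry(">Same Body", "CGGCCTTCTATCTTCTTC")
print(len(a), a == b)  # prints: 18 True
```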

**fasta_object(...).valid()**

Checks whether the body contains only valid characters and returns `True` if so.
The _stype_ of the fasta_object must be set for illegal characters in its body to be detected.

stype is one of:
- ANY : [default] Allows all characters.
- NA  : Allows all nucleic acid codes (DNA & RNA).
- DNA : Allows all IUPAC DNA codes.
- RNA : Allows all IUPAC RNA codes.
- PROT: Allows all IUPAC amino acid codes.

Optional: allowedChars can be set to override the default settings.

```python
# The default object allows all characters.
# True
mf.fasta_object(">valid", "Ä'_**?.asdLLA").valid()

# valid() can only check for illegal characters if stype is specified.
# True
mf.fasta_object(">valid", "ACGTUAGTGU", stype="NA").valid()

# False, as O is not a nucleic acid code
mf.fasta_object(">invalid", "ACWYUOTGU", stype="NA").valid()

# True
mf.fasta_object(">valid", "AGGATTA", stype="ANY").valid(allowedChars="AGTC")

# True, as stype is ignored if allowedChars is set.
mf.fasta_object(">valid", "WYU", stype="DNA").valid(allowedChars="WYU")
```
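
The validation rules above amount to a subset check against an alphabet. The sketch below illustrates that idea under stated assumptions: the function `is_valid` and the `ALPHABETS` table are hypothetical, the alphabets contain only the upper-case IUPAC nucleotide codes (no gap characters, no protein alphabet), and miniFASTA's actual character sets may differ.

```python
# IUPAC nucleotide codes shared by DNA and RNA, including ambiguity codes.
_SHARED = "ACGRYSWKMBDHVN"

# Assumption: a simplified set of alphabets for illustration only.
ALPHABETS = {
    "DNA": set(_SHARED + "T"),
    "RNA": set(_SHARED + "U"),
    "NA": set(_SHARED + "TU"),
}


def is_valid(body: str, stype: str = "ANY", allowed_chars: str = "") -> bool:
    """Return True if every character of body is in the applicable alphabet."""
    if allowed_chars:
        # An explicit alphabet takes precedence over stype.
        return set(body) <= set(allowed_chars)
    if stype == "ANY":
        return True
    return set(body) <= ALPHABETS[stype]


print(is_valid("ACGTUAGTGU", stype="NA"))  # prints: True
print(is_valid("ACWYUOTGU", stype="NA"))   # prints: False
```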

**fasta_object(...).toAmino(translation_dict)**

Translates the body to an amino acid sequence. See `translate_seq()` for more details.
```python
fo.toAmino()
fo.body  # Will return RPSIFF
d = {"CCG": "Z", "CTT": "A"}  # ...
fo.toAmino(d)
fo.body  # Will return ZA...
```
**fasta_object(...).toRevComp(complement_dict)**

Converts the body to its reverse complement. See `reverse_comp()` for more details.
```python
fo.toRevComp()
fo.body  # Will return GAAGAAGATAGAAGGCCG
```

## Sequence translation
`translate_seq()` translates a sequence starting at position 0.
Unless translation_dict is provided, the standard bacterial code is used. If a codon is not found, it is replaced by a `~`. Trailing bases that do not fit into a codon are ignored.

```python
mf.translate_seq("CGGCCTTCTATCTTCTTC") # Will return RPSIFF

d = {"CGG": "Z", "CTT": "A"}
mf.translate_seq("CGGCTT", d) # Will return ZA.
```
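
The three rules above (read in frame from position 0, replace unknown codons with `~`, ignore trailing bases) can be sketched in a few lines. The function `translate` and the table `PARTIAL_TABLE` are hypothetical; the table deliberately holds only the codons needed for the example sequence and is not the full code used by miniFASTA.

```python
from typing import Dict

# Assumption: a partial codon table, just large enough for the example.
PARTIAL_TABLE: Dict[str, str] = {
    "CGG": "R",
    "CCT": "P",
    "TCT": "S",
    "ATC": "I",
    "TTC": "F",
}


def translate(seq: str, table: Dict[str, str], unknown: str = "~") -> str:
    """Translate seq codon by codon, starting at position 0."""
    # Drop trailing bases that do not form a complete codon.
    usable = len(seq) - len(seq) % 3
    return "".join(table.get(seq[i:i + 3], unknown) for i in range(0, usable, 3))


print(translate("CGGCCTTCTATCTTCTTC", PARTIAL_TABLE))  # prints: RPSIFF
print(translate("CGGAAAT", PARTIAL_TABLE))             # prints: R~
```

In the second call, `AAA` is missing from the partial table and becomes `~`, while the single trailing `T` is dropped.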

## Reverse Complement
`reverse_comp()` converts a sequence to its reverse complement.
Unless complement_dict is provided, the standard complement is used. If no complement is found for a nucleotide, it remains unchanged.
```python
mf.reverse_comp("CGGCCTTCTATCTTCTTC") # Will return GAAGAAGATAGAAGGCCG

d = {"C": "Z", "T": "Y"}
mf.reverse_comp("TC", d) # Will return ZY
```
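
As a rough illustration of the behaviour described above, the sketch below reverses the sequence and maps each base through a dictionary, leaving unknown characters untouched. The name `reverse_complement` and the four-letter default table are assumptions for this example, not miniFASTA's internals.

```python
from typing import Dict, Optional

# Assumption: a minimal default table covering only the four DNA bases.
DEFAULT_COMPLEMENT: Dict[str, str] = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str, complement: Optional[Dict[str, str]] = None) -> str:
    """Reverse seq and complement each base; unknown bases stay unchanged."""
    table = DEFAULT_COMPLEMENT if complement is None else complement
    return "".join(table.get(base, base) for base in reversed(seq))


print(reverse_complement("CGGCCTTCTATCTTCTTC"))          # prints: GAAGAAGATAGAAGGCCG
print(reverse_complement("TC", {"C": "Z", "T": "Y"}))    # prints: ZY
```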

## License
```
Copyright (C) 2022 by Jules Kreuer - @not_a_feature
This piece of software is published under the GNU General Public License v3.0
TLDR:

| Permissions      | Conditions                   | Limitations |
| ---------------- | ---------------------------- | ----------- |
| ✓ Commercial use | Disclose source              | ✕ Liability |
| ✓ Distribution   | License and copyright notice | ✕ Warranty  |
| ✓ Modification   | Same license                 |             |
| ✓ Patent use     | State changes                |             |
| ✓ Private use    |                              |             |
```
Go to [LICENSE.md](https://github.com/not-a-feature/miniFASTA/blob/main/LICENSE) to see the full version.

## Dependencies
In addition to packages included in Python 3, this piece of software uses 3rd-party software packages for development purposes that are not required in the published version.
Go to [DEPENDENCIES.md](https://github.com/not-a-feature/miniFASTA/blob/main/DEPENDENCIES.md) to see all dependencies and licenses.

            

        