fastapy


Name: fastapy
Version: 1.0.3
Summary: A lightweight Python module to read and write FASTA sequence records
Upload time: 2023-03-19 15:05:28
Requires Python: >=3.8
Keywords: fasta sequence record parser
# fastapy
A lightweight Python package to read and write sequence records in [FASTA format](https://en.wikipedia.org/wiki/FASTA_format).

The design was inspired by the utility of Biopython's SeqIO module, which supports many sequence formats; this package focuses solely on FASTA records. It is faster than Biopython, can read compressed FASTA files (gzip, bzip2, zip), and depends only on the Python standard library.
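The README does not describe how compressed input is recognized. One common approach is to sniff the file's leading "magic" bytes and choose a decompressor from the standard library. The helper `open_fasta_text` below is hypothetical: it is a sketch of that technique, not necessarily how `fastapy` does it, and zip archives are left out for brevity.

```python
import bz2
import gzip

# Leading bytes that identify each compression format.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def open_fasta_text(path):
    """Hypothetical helper: open a plain, gzip, or bzip2 file in text mode.

    The format is chosen from the first bytes of the file, not its extension.
    """
    with open(path, "rb") as fh:
        head = fh.read(3)
    if head.startswith(GZIP_MAGIC):
        return gzip.open(path, "rt")
    if head.startswith(BZIP2_MAGIC):
        return bz2.open(path, "rt")
    return open(path, "r")
```

Sniffing content rather than trusting the file extension means a gzip file named `*.fasta` is still decompressed correctly.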

## Requirements
Python >= 3.8

## Installation

You can install `fastapy` from [PyPI](https://pypi.org/project/fastapy/):

```bash
pip install fastapy
```

or directly from GitHub:

```bash
pip install "git+https://github.com/aziele/fastapy.git"
```

Because `fastapy` has no dependencies, you can also use it without installation: clone or download this repository and import the module from its root directory.

```bash
git clone https://github.com/aziele/fastapy.git
cd fastapy
python
>>> import fastapy
>>> fastapy.__doc__
'A lightweight Python module to read and write FASTA sequence records'
```

## Quick Start
Typical usage is to read a FASTA file and loop over its sequence records.

```python
import fastapy

for record in fastapy.parse('test/test.fasta'):
    print(record.id, len(record), record.seq[:10], record.desc)
```

Output (columns aligned for readability):

```
NP_002433.1  362   METDAPQPGL   RNA-binding protein Musashi homolog 1 [Homo sapiens]
ENO94161.1    79   MKLLISGLGP   RRM domain-containing RNA-binding protein
sequence     292   MKLSKIALMM
```

## Usage
This module provides the `Record` class, which represents a FASTA sequence record, and functions to read records from a file (`parse()`, `read()`) and to index them by identifier (`to_dict()`).

### Record object
A `Record` object holds one FASTA sequence record: its identifier (`id`), its description (`desc`), and the sequence itself (`seq`).

```python
import fastapy

record = fastapy.Record(
    id='NP_950171.2', 
    seq='MEEEAETEEQQRFSYQQRLKAAVHYTVGCLCEEVALDKEMQFSKQTIAAISELTFRQCENFAKDLEMFASICRKRQE',
    desc='APITD1-CORT protein isoform 2 [Homo sapiens]'
)

print(record.id)            # NP_950171.2
print(record.desc)          # APITD1-CORT protein isoform 2 [Homo sapiens]
print(record.seq)           # MEEEAE..
print(record.description)   # >NP_950171.2 APITD1-CORT protein isoform 2 [Homo sapiens]
print(len(record))          # 77
print('EEEA' in record)     # True
```

By default, the sequence is wrapped at 70 characters per line. To use a different line length, pass it to `format()` as the `wrap` argument; use zero (or `None`) to disable wrapping.

```python
print(record)
# >NP_950171.2 APITD1-CORT protein isoform 2 [Homo sapiens]
# MEEEAETEEQQRFSYQQRLKAAVHYTVGCLCEEVALDKEMQFSKQTIAAISELTFRQCENFAKDLEMFAS
# ICRKRQE

print(record.format(wrap=30))
# >NP_950171.2 APITD1-CORT protein isoform 2 [Homo sapiens]
# MEEEAETEEQQRFSYQQRLKAAVHYTVGCL
# CEEVALDKEMQFSKQTIAAISELTFRQCEN
# FAKDLEMFASICRKRQE

print(record.format(wrap=None))
# >NP_950171.2 APITD1-CORT protein isoform 2 [Homo sapiens]
# MEEEAETEEQQRFSYQQRLKAAVHYTVGCLCEEVALDKEMQFSKQTIAAISELTFRQCENFAKDLEMFASICRKRQE
```
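To make the `wrap` semantics concrete, here is a standalone sketch of FASTA formatting that reproduces the documented behavior (70 characters per line by default; zero or `None` disables wrapping). The function `format_fasta` is hypothetical and only mirrors the output shown for `Record.format()`; it is not the package's code.

```python
def format_fasta(seq_id, seq, desc="", wrap=70):
    """Hypothetical helper: build a FASTA string from its parts.

    wrap: maximum residues per sequence line; 0 or None keeps one line.
    """
    header = f">{seq_id} {desc}" if desc else f">{seq_id}"
    if not wrap:
        lines = [seq]
    else:
        # Slice the sequence into consecutive chunks of `wrap` characters.
        lines = [seq[i:i + wrap] for i in range(0, len(seq), wrap)]
    return "\n".join([header, *lines])
```

Writing a file is then a matter of calling `fh.write(format_fasta(...) + "\n")` once per record.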

### parse
The `parse()` function is a generator that reads FASTA records from a file (plain FASTA or compressed with gzip or bzip2) and yields them one by one as `Record` objects. Because only one record is held in memory at a time, memory use stays low even for large files.

```python
import fastapy

for record in fastapy.parse('test/test.fasta.gz'):
    print(record.id)
```
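The low memory footprint comes from the generator pattern: a record is yielded as soon as the next header line is seen, so only the current record is held in memory. Below is a minimal sketch of that pattern over any iterable of lines. `iter_fasta` is a hypothetical name, it yields plain `(id, desc, seq)` tuples rather than `Record` objects, and splitting the header on its first whitespace is an assumption of the sketch.

```python
import io


def iter_fasta(lines):
    """Hypothetical generator: yield (id, desc, seq) from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record: emit it.
                yield (*_split_header(header), "".join(chunks))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield (*_split_header(header), "".join(chunks))


def _split_header(header):
    # Split "ID optional description" on the first run of whitespace.
    parts = header.split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else ""
    return seq_id, desc


demo = io.StringIO(">a first record\nMKL\nISG\n>b\nACGT\n")
for seq_id, desc, seq in iter_fasta(demo):
    print(seq_id, len(seq), desc)
```

Sequence lines are collected in a list and joined once per record, which avoids repeated string concatenation for long, multi-line sequences.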

Some tasks need repeated access to the records. For these, use the built-in Python `list()` function to turn the iterator into a list (note that this loads all records into memory):

```python
import fastapy

records = list(fastapy.parse('test/test.fasta.gz'))
print(records[0].id)   # First record
print(records[-1].id)  # Last record
```
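The reason `list()` is needed for repeated access is that a generator can be consumed only once: a second loop over the same generator object yields nothing. The toy generator `numbers` below is purely illustrative and stands in for the iterator returned by `parse()`.

```python
def numbers():
    # Toy generator standing in for a lazy record iterator.
    yield from (1, 2, 3)


gen = numbers()
first_pass = list(gen)
second_pass = list(gen)   # the generator is already exhausted

cached = list(numbers())  # materialize once, then reuse freely
print(first_pass, second_pass, cached[0], cached[-1])
```

The trade-off is memory: the list keeps every item alive at once, whereas the generator keeps only the current one.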

Another common task is to index your records by sequence identifier. Use `to_dict()` to turn a `Record` iterator (or list) into a dictionary keyed by record id.

```python
import fastapy

records = fastapy.to_dict(fastapy.parse('test/test.fasta.gz'))
print(records['NP_002433.1'])   # Use any record id
```
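Conceptually, indexing by identifier maps each record's `id` to the record itself. The sketch below uses a stand-in `SeqRecord` named tuple (not `fastapy.Record`) and a hypothetical `index_by_id` helper that rejects duplicate identifiers. Whether `fastapy.to_dict()` checks for duplicates is not stated here, so treat that check as an assumption of the sketch.

```python
from typing import Dict, Iterable, NamedTuple


class SeqRecord(NamedTuple):
    # Minimal stand-in for a FASTA record (illustration only).
    id: str
    seq: str
    desc: str = ""


def index_by_id(records: Iterable[SeqRecord]) -> Dict[str, SeqRecord]:
    """Hypothetical helper: map record ids to records, refusing overwrites."""
    index: Dict[str, SeqRecord] = {}
    for rec in records:
        if rec.id in index:
            raise ValueError(f"duplicate record id: {rec.id!r}")
        index[rec.id] = rec
    return index
```

A plain `{r.id: r for r in records}` is shorter, but when ids repeat it silently keeps only the last record.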

### read
The `read()` function reads only the first FASTA record from a file. It does not read any subsequent records in the file.

```python
import fastapy

seq_record = fastapy.read('test/test.fasta')
print(seq_record.id)           # NP_002433.1
```
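Reading only the first record means the reader can stop at the second header line instead of scanning the whole file. The following self-contained sketch shows that early exit; `read_first_record` is a hypothetical name, and it returns a raw `(header, seq)` pair rather than a `Record`.

```python
def read_first_record(lines):
    """Hypothetical helper: return (header, seq) of the first record, or None.

    Iteration stops at the next '>' line, so later records are never read.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                break  # second header reached: stop reading
            header = line[1:]
        elif line and header is not None:
            chunks.append(line)
    if header is None:
        return None
    return header, "".join(chunks)
```

Because the function consumes lines lazily, passing it an open file handle reads only as far as the start of the second record.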

## Test
You can run tests to ensure that the module works as expected.

```bash
./test/test.py
```

## License

[GNU General Public License, version 3](https://www.gnu.org/licenses/gpl-3.0.html)
            

Raw data

Authors: Andrzej Zielezinski <a.zielezinski@gmail.com>, Maciej Michalczyk <mccv99@gmail.com>
Files:
fastapy-1.0.3-py3-none-any.whl (bdist_wheel, py3, 17555 bytes, uploaded 2023-03-19T15:05:25Z), sha256 6f1041f393f32e98749017ed2177e52f77dc61d98727d691c26b360045c869ab
fastapy-1.0.3.tar.gz (sdist, 22999 bytes, uploaded 2023-03-19T15:05:28Z), sha256 0ffeb63f1f1b7b5bb39d44b69528dd0237753498254271dfb540f9ab271fd539
Download URL: https://files.pythonhosted.org/packages/36/92/17a324e4dd9347107b5a5cebd5d7aef467ad59de91cdf0b4231be0b9c1a5/fastapy-1.0.3.tar.gz