profasta


- Name: profasta
- Version: 0.0.4
- Summary: A Python library for working with protein containing FASTA files.
- Upload time: 2024-02-16 12:18:34
- Requires Python: >=3.9
- License: MIT License Copyright (c) 2024 David M. Hollenstein Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Keywords: fasta, bioinformatics, mass spectrometry, proteomics
# ProFASTA
[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)

## Introduction
ProFASTA is a Python library for working with FASTA files containing protein records. Unlike other packages, ProFASTA prioritizes simplicity while aiming to provide a set of useful features required in the field of proteomics-based mass spectrometry.

The library is still in early development and the interface might change over time. At the current stage, ProFASTA provides functionality for parsing and writing FASTA files, as well as for accessing protein records imported from FASTA files.
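To make the parsing and writing steps concrete, here is a minimal standard-library sketch of reading and writing FASTA records. It is not ProFASTA's implementation or API; the function names `read_fasta` and `write_fasta` and the `width` parameter are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file.

    Hypothetical helper for illustration, not part of ProFASTA.
    """
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(
    records: Iterable[tuple[str, str]], handle: TextIO, width: int = 60
) -> None:
    """Write (header, sequence) pairs, wrapping sequences at `width` characters."""
    for header, sequence in records:
        handle.write(f">{header}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
```

Both functions work on open text handles, so they can be used with files opened via `open(...)` as well as with in-memory `io.StringIO` objects.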

ProFASTA is developed as part of the computational toolbox for the [Mass Spectrometry Facility](https://www.maxperutzlabs.ac.at/research/facilities/mass-spectrometry-facility) at the Max Perutz Labs (University of Vienna).

## Similar projects
If ProFASTA doesn't meet your requirements, consider exploring these alternative Python packages with a focus on protein-containing FASTA files:

- [fastapy](https://pypi.org/project/fastapy/) is a lightweight package with no dependencies that offers FASTA reading functionality.
- [protfasta](https://pypi.org/project/protfasta/) is another library with no dependencies that provides reading functionality along with basic validation (e.g., duplicate headers, conversion of non-canonical amino acids). The library also allows writing FASTA files with the ability to specify the sequence line length.
- [pyteomics](https://pyteomics.readthedocs.io/en/latest/index.html) is a feature-rich package that provides tools to handle various sorts of proteomics data. It provides functions for FASTA reading, automatic parsing of headers (in various formats defined at uniprot.org), writing, and generation of decoy entries. Note that pyteomics is a large package with many dependencies.

## Usage example
The following code snippet shows how to import a FASTA file containing UniProt protein entries, retrieve a protein record by its UniProt accession number, and print its gene name:

```python
>>> import profasta
>>> 
>>> fasta_path = "./example_data/uniprot_hsapiens_10entries.fasta"
>>> db = profasta.db.ProteinDatabase()
>>> db.add_fasta(fasta_path, header_parser="uniprot")
>>> protein_record = db["O75385"]
>>> print(protein_record.header_fields["gene_name"])
ULK1
```

## Requirements
Python >= 3.9

## Installation
The following command installs the latest version of ProFASTA and its dependencies from PyPI, the Python Package Index:

```
pip install profasta
```

To uninstall the ProFASTA library, use:

```
pip uninstall profasta
```

## Planned features
**Main requirements**
- [x] parse FASTA file
- [x] parse FASTA header
    - [x] built-in parser that never fails
    - [x] built-in parser for UniProt format
    - [x] allow user defined parser
- [x] write FASTA file
    - [x] allow custom FASTA header generation
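
The header-parsing items above can be illustrated with a small sketch: a strict parser for UniProt-style headers plus a fallback that accepts any header. This is an assumption-laden illustration, not ProFASTA's code. Apart from `gene_name`, which appears in the usage example, all field names and function names here (`parse_uniprot_header`, `parse_default_header`, `parse_header`) are hypothetical.

```python
import re

# UniProt-style header without the leading ">":
#   db|UniqueIdentifier|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...
UNIPROT_HEADER = re.compile(
    r"^(?P<db>sp|tr)\|(?P<identifier>[^|\s]+)\|(?P<entry_name>\S+)"
    r"(?:\s+(?P<rest>.*))?$"
)
UNIPROT_TAG = re.compile(r"\b(OS|OX|GN|PE|SV)=")
# Hypothetical field names, except "gene_name" which the usage example uses.
TAG_TO_FIELD = {
    "OS": "organism_name",
    "OX": "organism_identifier",
    "GN": "gene_name",
    "PE": "protein_existence",
    "SV": "sequence_version",
}


def parse_uniprot_header(header: str) -> dict[str, str]:
    """Parse a UniProt-style header; raise ValueError if it does not match."""
    match = UNIPROT_HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"Not a UniProt-style header: {header!r}")
    fields = {
        "db": match["db"],
        "identifier": match["identifier"],
        "entry_name": match["entry_name"],
    }
    # Splitting on the tags yields: [protein name, tag, value, tag, value, ...]
    parts = UNIPROT_TAG.split(match["rest"] or "")
    fields["protein_name"] = parts[0].strip()
    for tag, value in zip(parts[1::2], parts[2::2]):
        fields[TAG_TO_FIELD[tag]] = value.strip()
    return fields


def parse_default_header(header: str) -> dict[str, str]:
    """Fallback that accepts any header: the first word is the identifier."""
    identifier = header.split(maxsplit=1)[0] if header.strip() else ""
    return {"identifier": identifier, "header": header}


def parse_header(header: str) -> dict[str, str]:
    """Try the strict UniProt parser first, then fall back to the default."""
    try:
        return parse_uniprot_header(header)
    except ValueError:
        return parse_default_header(header)
```

A user-defined parser would fit the same shape: any callable that takes a header string and returns a dictionary of fields.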
    
**Additional features**
- [x] read multiple FASTA files and write a combined file
- [x] add protein records to an existing FASTA file
- [x] generate decoy protein records by reversing the sequence
    - [x] add decoy protein records to an existing FASTA file
- [ ] validate FASTA file / FASTA records
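
As a rough illustration of decoy generation by sequence reversal, the sketch below reverses each sequence and marks the decoy header with a prefix. The `rev_` prefix, the `(header, sequence)` tuple representation, and the function names are assumptions made for this example and do not reflect how ProFASTA names or stores decoys.

```python
def make_decoy(header: str, sequence: str, tag: str = "rev_") -> tuple[str, str]:
    """Return a decoy record: prefixed header and reversed sequence."""
    return f"{tag}{header}", sequence[::-1]


def with_decoys(
    records: list[tuple[str, str]], tag: str = "rev_"
) -> list[tuple[str, str]]:
    """Return the target records followed by one decoy per target."""
    targets = list(records)
    decoys = [make_decoy(header, sequence, tag) for header, sequence in targets]
    return targets + decoys
```

Keeping targets first and appending decoys afterwards mirrors the "add decoy protein records to an existing FASTA file" item: the original records stay untouched and in order.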


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "profasta",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "",
    "keywords": "fasta,bioinformatics,mass spectrometry,proteomics",
    "author": "",
    "author_email": "\"David M. Hollenstein\" <hollenstein.david@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/42/95/17154532891c387571dbee9776dcda1eafab249cb8cb91e82191deee5382/profasta-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "summary": "A Python library for working with protein containing FASTA files.",
    "version": "0.0.4",
    "project_urls": {
        "repository": "https://github.com/hollenstein/profasta"
    },
    "split_keywords": [
        "fasta",
        "bioinformatics",
        "mass spectrometry",
        "proteomics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7438a6f5270eceaa3a98fcd07d836ab08e0bd90b337daa68ca092b859f5219eb",
                "md5": "7a0d8eb7c48a26d56b3b435dded87c93",
                "sha256": "4e619668c33077e654420b330f85dc252fa17d58d5c688de10b690471a8cfcfa"
            },
            "downloads": -1,
            "filename": "profasta-0.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7a0d8eb7c48a26d56b3b435dded87c93",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 10954,
            "upload_time": "2024-02-16T12:18:31",
            "upload_time_iso_8601": "2024-02-16T12:18:31.511802Z",
            "url": "https://files.pythonhosted.org/packages/74/38/a6f5270eceaa3a98fcd07d836ab08e0bd90b337daa68ca092b859f5219eb/profasta-0.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "429517154532891c387571dbee9776dcda1eafab249cb8cb91e82191deee5382",
                "md5": "315fbc9203fdb2824bd21ad35d2f2e90",
                "sha256": "0f9478fecdec59d1e182c1254fe19eed546c88acd92404a58e4a8b479c9c7f2b"
            },
            "downloads": -1,
            "filename": "profasta-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "315fbc9203fdb2824bd21ad35d2f2e90",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 11435,
            "upload_time": "2024-02-16T12:18:34",
            "upload_time_iso_8601": "2024-02-16T12:18:34.292627Z",
            "url": "https://files.pythonhosted.org/packages/42/95/17154532891c387571dbee9776dcda1eafab249cb8cb91e82191deee5382/profasta-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-16 12:18:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hollenstein",
    "github_project": "profasta",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "profasta"
}
        