uniprotparser


- Name: uniprotparser
- Version: 1.2.0
- Home page: https://github.com/noatgnu/UniprotWebParser
- Summary: Getting Uniprot Data from Uniprot Accession ID through Uniprot REST API
- Upload time: 2024-01-19 12:19:47
- Author: Toan K. Phung
- Requires Python: >=3.9,<4.0
- License: MIT
- Keywords: uniprot, protein sequence, database, parser
- Requirements: aiohttp, aiosignal, async-timeout, attrs, certifi, charset-normalizer, click, colorama, frozenlist, idna, multidict, requests, urllib3, yarl
- CI: no Travis-CI, no Coveralls test coverage
UniProt Database Web Parser Project
--
[![Downloads](https://static.pepy.tech/personalized-badge/uniprotparser?period=total&units=international_system&left_color=black&right_color=orange&left_text=Downloads)](https://pepy.tech/project/uniprotparser)


TL;DR: This package parses UniProt accession IDs and retrieves the related data from the UniProt web database.

To install:

```bash
python -m pip install uniprotparser
```
or 

```bash
python3 -m pip install uniprotparser
```
Version 1.2.0 exposes the `to` and `from` mapping parameters of the UniProt API, so you can specify which databases to map from and to. The available field names can be listed as follows:


```python
from uniprotparser import get_from_fields, get_to_fields

# Get all available fields to map from
from_fields = get_from_fields()
print(from_fields)

# Get all available fields to map to
to_fields = get_to_fields()
print(to_fields)
```

These values can be passed to the `parse` method of the `UniprotParser` class through the `from_key` and `to_key` arguments, as follows:

```python
from uniprotparser.betaparser import UniprotParser

parser = UniprotParser()
for p in parser.parse(ids=["P06493"], to_key="UniProtKB", from_key="UniProtKB_AC-ID"):
    print(p)
```
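For context, the `from_key`/`to_key` pair appears to correspond to the `from` and `to` fields of UniProt's public ID mapping service. The sketch below is an assumption about the shape of that request, not this package's internal code: it builds (but does not send) a form-encoded POST to `https://rest.uniprot.org/idmapping/run` using only the standard library. The helper `build_idmapping_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public UniProt ID mapping endpoint (the package's own internals may differ)
IDMAPPING_RUN_URL = "https://rest.uniprot.org/idmapping/run"


def build_idmapping_request(ids, from_key="UniProtKB_AC-ID", to_key="UniProtKB"):
    """Hypothetical helper: build, but do not send, an ID mapping job request."""
    if not ids:
        raise ValueError("at least one accession ID is required")
    # Form fields "from" and "to" name the databases; "ids" is comma-separated
    body = urlencode({"from": from_key, "to": to_key, "ids": ",".join(ids)})
    return Request(IDMAPPING_RUN_URL, data=body.encode("ascii"), method="POST")


req = build_idmapping_request(["P06493", "Q99490"])
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Sending it with `urllib.request.urlopen(req)` should return a JSON body containing a job ID whose status and results are then polled from further endpoints; check the UniProt REST documentation before relying on this flow.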


Version 1.1.0 added a simple command-line interface (CLI) to the package:

```text
Usage: uniprotparser [OPTIONS]

Options:
  -i, --input FILENAME   Input file containing a list of accession ids
  -o, --output FILENAME  Output file
  --help                 Show this message and exit.
```
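The help text does not state the input file format. Assuming one accession ID per line (an assumption; check the repository before relying on it), an input file can be prepared and the command assembled from Python as below. `write_accession_file` and `build_cli_args` are hypothetical helpers; only the `-i`/`-o` options come from the help text above.

```python
import tempfile
from pathlib import Path


def write_accession_file(path, accessions):
    """Hypothetical helper: write one accession ID per line (assumed input format)."""
    cleaned = [a.strip() for a in accessions if a.strip()]  # drop blanks and padding
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


def build_cli_args(input_path, output_path):
    """Assemble the documented -i/-o options for the uniprotparser command."""
    return ["uniprotparser", "-i", str(input_path), "-o", str(output_path)]


with tempfile.TemporaryDirectory() as tmp:
    ids_file = Path(tmp) / "ids.txt"
    count = write_accession_file(ids_file, ["P06493", " Q99490 ", ""])
    print(count, build_cli_args(ids_file, Path(tmp) / "result.txt"))
```

Passing the argument list to `subprocess.run(args, check=True)` would run the command, which performs network requests against UniProt.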

Version 1.0.5 added asyncio support, via `aiohttp`, to `betaparser`. Usage is as follows:

```python
from uniprotparser.betaparser import UniprotParser
from io import StringIO
import asyncio
import pandas as pd

async def main():
    example_acc_list = ["Q99490", "Q8NEJ0", "Q13322", "P05019", "P35568", "Q15323"]
    parser = UniprotParser()
    frames = []
    # Results are yielded for up to 500 accession IDs at a time
    async for r in parser.parse_async(ids=example_acc_list):
        frames.append(pd.read_csv(StringIO(r), sep="\t"))

    # Consolidate all batches into one DataFrame (empty if nothing came back)
    if frames:
        return pd.concat(frames, ignore_index=True)
    return pd.DataFrame()

df = asyncio.run(main())
print(df.head())
```
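The examples state that results are yielded for 500 accession IDs at a time. As an illustration only, not the package's implementation, the sketch below splits the IDs into batches of 500 and fetches them concurrently with `asyncio.gather`, bounded by an `asyncio.Semaphore`. `fetch_batch` is a hypothetical stand-in for the real HTTP call, so the example runs offline.

```python
import asyncio

BATCH_SIZE = 500  # batch size stated in the README examples


def chunk(ids, size=BATCH_SIZE):
    """Split a list of IDs into consecutive batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


async def fetch_batch(batch):
    """Hypothetical stand-in for an aiohttp request; returns fake TSV text."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    rows = "\n".join(f"{acc}\t{acc}" for acc in batch)
    return "From\tEntry\n" + rows + "\n"


async def fetch_all(ids, max_concurrency=3):
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(batch):
        async with semaphore:  # limit simultaneous requests to the server
            return await fetch_batch(batch)

    # gather returns results in the same order as the batches were submitted
    return await asyncio.gather(*(guarded(b) for b in chunk(ids)))


ids = [f"ID{i}" for i in range(1200)]
results = asyncio.run(fetch_all(ids))
print(len(results))  # 3 batches: 500 + 500 + 200
```

The semaphore keeps the number of in-flight requests small, which is kinder to a shared public service than firing every batch at once.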

Version 1.0.2 added support for the new UniProt REST API in the package's `betaparser` module.

To use this module, follow the example below:

```python
from uniprotparser.betaparser import UniprotParser
from io import StringIO

import pandas as pd
example_acc_list = ["Q99490", "Q8NEJ0", "Q13322", "P05019", "P35568", "Q15323"]
parser = UniprotParser()
frames = []
# Results are yielded for up to 500 accession IDs at a time
for r in parser.parse(ids=example_acc_list):
    frames.append(pd.read_csv(StringIO(r), sep="\t"))

# Consolidate all batches into one DataFrame (empty if nothing came back)
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(df.head())
```
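Each yielded result is a chunk of tab-separated text with its own header row, which is why the example builds one DataFrame per chunk and concatenates them. If pandas is not available, the same consolidation can be sketched with the standard `csv` module. `merge_tsv_chunks` is a hypothetical helper, it assumes every chunk shares the same header, and the sample data is made up.

```python
import csv
from io import StringIO


def merge_tsv_chunks(chunks):
    """Combine TSV chunks that each start with the same header into one list of dicts."""
    header = None
    rows = []
    for text in chunks:
        # QUOTE_NONE treats quote characters in UniProt fields as literal text
        reader = csv.DictReader(StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("chunks have different columns")
        rows.extend(reader)
    return header, rows


chunks = ["From\tEntry\nP06493\tP06493\n", "From\tEntry\nQ99490\tQ99490\n"]
header, rows = merge_tsv_chunks(chunks)
print(header, len(rows))  # ['From', 'Entry'] 2
```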

---
To parse a UniProt accession with the legacy API:

```python
from uniprotparser.parser import UniprotSequence

protein_id = "seq|P06493|swiss"

acc_id = UniprotSequence(protein_id, parse_acc=True)

# Access the accession ID
print(acc_id.accession)

# Access the isoform ID
print(acc_id.isoform)
```
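`UniprotSequence` extracts the accession and isoform from a FASTA-style identifier. For illustration, a comparable extraction can be written with the accession-format regular expression that UniProt publishes. This is not the package's code; the helper `parse_accession` and its `(accession, isoform)` return shape are hypothetical.

```python
import re

# Accession pattern published by UniProt, plus an optional "-N" isoform suffix
ACCESSION_RE = re.compile(
    r"([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    r"(?:-(\d+))?"
)


def parse_accession(identifier):
    """Return (accession, isoform) from the first "|"-separated field that matches."""
    for field in identifier.split("|"):
        match = ACCESSION_RE.fullmatch(field.strip())
        if match:
            return match.group(1), match.group(2)  # isoform is None if absent
    return None


print(parse_accession("seq|P06493|swiss"))  # ('P06493', None)
print(parse_accession("sp|P06493-2|CDK1_HUMAN"))  # ('P06493', '2')
```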

To retrieve additional data from the UniProt online database:

```python
from uniprotparser.parser import UniprotParser
from io import StringIO
# pandas is not a dependency of this package; install it separately to handle tabulated data
import pandas as pd

protein_accession = "P06493"

parser = UniprotParser([protein_accession])

#To get tabulated data
result = []
for i in parser.parse("tab"):
    tab_data = pd.read_csv(i, sep="\t")
    last_column_name = tab_data.columns[-1]
    tab_data.rename(columns={last_column_name: "query"}, inplace=True)
    result.append(tab_data)
fin = pd.concat(result, ignore_index=True)

#To get fasta sequence
with open("fasta_output.fasta", "wt") as fasta_output:
    for i in parser.parse():
        fasta_output.write(i)
```
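The FASTA text written to `fasta_output.fasta` can be read back into records without extra dependencies. A minimal standard-library sketch follows; the `read_fasta` helper is hypothetical and not part of this package, and the sample records are made up.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA text, joining wrapped sequence lines."""
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)  # emit the final record


sample = ">seq1 example\nMEDYTK\nIEKIGE\n>seq2\nAC\n"
for name, seq in read_fasta(StringIO(sample)):
    print(name, len(seq))
```

With the real file, pass the open handle instead: `with open("fasta_output.fasta") as fh: records = list(read_fasta(fh))`.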


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/noatgnu/UniprotWebParser",
    "name": "uniprotparser",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "uniprot,protein sequence,database,parser",
    "author": "Toan K. Phung",
    "author_email": "toan.phungkhoiquoctoan@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/c5/df/bdb51d1e31d64359280d0229c42ac62449c749c65d4b9b247ea955cbde37/uniprotparser-1.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Getting Uniprot Data from Uniprot Accession ID through Uniprot REST API",
    "version": "1.2.0",
    "project_urls": {
        "Homepage": "https://github.com/noatgnu/UniprotWebParser",
        "Repository": "https://github.com/noatgnu/UniprotWebParser"
    },
    "split_keywords": [
        "uniprot",
        "protein sequence",
        "database",
        "parser"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0d50991c817543f32d77b11b22417fb3456e059659a63313d01eff77ea7ee134",
                "md5": "332937a4e36f0bb487270f834b04da62",
                "sha256": "2a190cb8dad9191fb68004121775aa11cd778b6e90a903a8fc61f311eb8db6d9"
            },
            "downloads": -1,
            "filename": "uniprotparser-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "332937a4e36f0bb487270f834b04da62",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<4.0",
            "size": 9776,
            "upload_time": "2024-01-19T12:19:45",
            "upload_time_iso_8601": "2024-01-19T12:19:45.991247Z",
            "url": "https://files.pythonhosted.org/packages/0d/50/991c817543f32d77b11b22417fb3456e059659a63313d01eff77ea7ee134/uniprotparser-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c5dfbdb51d1e31d64359280d0229c42ac62449c749c65d4b9b247ea955cbde37",
                "md5": "e3ea8d193aa194a9fd27ef016929c940",
                "sha256": "d901206df802cd103a5560c358730fa39879b7af2dbc9af72d2166625e33f876"
            },
            "downloads": -1,
            "filename": "uniprotparser-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e3ea8d193aa194a9fd27ef016929c940",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 8968,
            "upload_time": "2024-01-19T12:19:47",
            "upload_time_iso_8601": "2024-01-19T12:19:47.177961Z",
            "url": "https://files.pythonhosted.org/packages/c5/df/bdb51d1e31d64359280d0229c42ac62449c749c65d4b9b247ea955cbde37/uniprotparser-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-19 12:19:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "noatgnu",
    "github_project": "UniprotWebParser",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "aiohttp",
            "specs": [
                [
                    "==",
                    "3.8.4"
                ]
            ]
        },
        {
            "name": "aiosignal",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "async-timeout",
            "specs": [
                [
                    "==",
                    "4.0.2"
                ]
            ]
        },
        {
            "name": "attrs",
            "specs": [
                [
                    "==",
                    "22.2.0"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2022.12.7"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.1.0"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.3"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    "==",
                    "0.4.6"
                ]
            ]
        },
        {
            "name": "frozenlist",
            "specs": [
                [
                    "==",
                    "1.3.3"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.4"
                ]
            ]
        },
        {
            "name": "multidict",
            "specs": [
                [
                    "==",
                    "6.0.4"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.28.2"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "1.26.15"
                ]
            ]
        },
        {
            "name": "yarl",
            "specs": [
                [
                    "==",
                    "1.8.2"
                ]
            ]
        }
    ],
    "lcname": "uniprotparser"
}
        