# parseID: load and parse identifiers or accession numbers
## Introduction
parseID is a bioinformatics data structure library optimized for loading identifiers or
accession numbers into memory and mapping those identifiers and accession numbers to each other.
Identifiers and accession numbers are defined and referenced by various biological databases.
Their count can reach millions or even billions, and data operations on them, such as querying and parsing, are very common.
parseID employs the data structures "trie" and "ditrie". A trie can load a tremendous number of identifiers into memory at once,
and a ditrie can hold a large number of identifier-to-identifier mappings. Through the trie and ditrie,
operations on huge datasets, including insert, get, search, delete, and scan, can be executed quickly.
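To illustrate the idea behind the library (this is a minimal sketch, not parseID's actual implementation or API), a trie stores each identifier character by character, so shared prefixes are stored only once:

```python
class TrieNode:
    """One node per character; children are keyed by the next character."""
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    """Minimal prefix tree: shared prefixes are stored once, which is
    what makes holding millions of accession numbers feasible."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end

trie = Trie()
for acc in ("P12345", "P12346", "Q9Y6K9"):
    trie.insert(acc)
print(trie.search("P12345"))  # True
print(trie.search("P12347"))  # False
```

Since "P12345" and "P12346" share the prefix "P1234", they occupy mostly the same nodes.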
## testing
```
pytest -s tests
```
## quick start
The example below shows how a huge number of accession numbers are loaded into a trie.
The mapping file can be downloaded from https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz and decompressed locally.
The code retrieves 176,513,729 (as of 03/25/2024) UniProt accession numbers from the file and feeds them into a trie.
As shown below, the accession numbers are stored in the object uniprotkb_acc_trie.
```
from parseid import ProcessID
infile = 'gene_refseq_uniprotkb_collab'
uniprotkb_acc_trie = ProcessID(infile).uniprotkb_protein_accession()
```
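For comparison, the same extraction can be sketched in plain Python. This assumes the decompressed file is tab-separated with a header line and that the UniProt accession sits in the second column; the actual column layout should be checked against the downloaded file:

```python
def read_uniprot_accessions(path):
    """Yield UniProt accession numbers from a tab-separated mapping file.
    Assumes a one-line header and the accession in the second column."""
    with open(path) as fh:
        next(fh)  # skip the header line
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and fields[1]:
                yield fields[1]

# e.g. accessions = set(read_uniprot_accessions('gene_refseq_uniprotkb_collab'))
```

A plain set like this works for small files; the point of the trie is to keep memory manageable at the scale of hundreds of millions of entries.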
Next, retrieve pairs of NCBI protein accession numbers and UniProt accession numbers
from the same file and feed them into a ditrie. As shown in the example below,
the mapping between the two kinds of accession numbers is stored in the object ncbi_uniprotkb_ditrie,
which is ready for querying or parsing.
```
from parseid import ProcessID
infile = 'gene_refseq_uniprotkb_collab'
ncbi_uniprotkb_ditrie = ProcessID(infile).map_ncbi_uniprotkb()
```
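Conceptually, a ditrie behaves like a bidirectional one-to-many mapping between two identifier namespaces, queryable in either direction. A toy dictionary-based sketch (again, an illustration rather than parseID's internal implementation; the sample accession pair is only an example):

```python
from collections import defaultdict

class DiMap:
    """Toy bidirectional one-to-many mapping between two ID namespaces."""
    def __init__(self):
        self.forward = defaultdict(set)   # NCBI accession -> UniProt accessions
        self.reverse = defaultdict(set)   # UniProt accession -> NCBI accessions

    def insert(self, ncbi_acc, uniprot_acc):
        self.forward[ncbi_acc].add(uniprot_acc)
        self.reverse[uniprot_acc].add(ncbi_acc)

    def get_uniprot(self, ncbi_acc):
        return self.forward.get(ncbi_acc, set())

    def get_ncbi(self, uniprot_acc):
        return self.reverse.get(uniprot_acc, set())

m = DiMap()
m.insert("NP_000005.3", "P01023")
print(m.get_uniprot("NP_000005.3"))  # {'P01023'}
print(m.get_ncbi("P01023"))          # {'NP_000005.3'}
```

Replacing the two plain dictionaries with tries is what lets a ditrie hold hundreds of millions of pairs in memory.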