# sampleids
Uniform sample ID parser
Available on PyPI at https://pypi.org/project/sampleids/
Available on GitHub at https://github.com/tmcqueen-materials/sampleids
## Quick Start
1. Install: `pip3 install sampleids`
2. In code, do:
```
from sampleids import parse as sid_parse, CONFIDENCE as sid_CONFIDENCE
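# The identifier follows the pattern AAA_BBB_YYYYMMDD_C_III_S_(QQQQQQQQQQ)-EE; here the date is 20241001.
res = sid_parse("AAA_BBB_20241001_C_III_S_(QQQQQQQQQQ)-EE", ["AAA",...], ["BBB",...], ["III",...])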
print(res) # will print the tuple SampleID(lab_id='AAA', tool_id='BBB', date='20241001', sample_id='C', provenance_id=['III'], split_id='S', parents=[SampleID(lab_id='', tool_id='', date='', sample_id='', provenance_id=[], split_id='', parents=[], extra='', raw='QQQQQQQQQQ', confidence=<CONFIDENCE.NONE: 0>, why='P_PARENT1_PI_V1_6L01_PI_V1_NOPARSE')], extra='EE', raw='AAA_BBB_20241001_C_III_S_(QQQQQQQQQQ)-EE', confidence=<CONFIDENCE.HIGH: 3>, why='P_PARENT1_PI_V1_6L01')
# You can check if confidence is greater than a minimum value, e.g.:
if res.confidence > sid_CONFIDENCE.LOW:
    print("Confidence is not low!")
# The "why" string gives a log of the code paths taken by the parser.
# If you find a case that fails to parse, and you think it should,
# or a case it parses incorrectly, be sure to include the why string!
print(res.why) # prints 'P_PARENT1_PI_V1_6L01'
```
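
Because `parents` holds nested `SampleID` tuples, a caller may want to walk the whole tree and flag entries the parser was unsure about. The sketch below is a minimal illustration and is not part of sampleids. To stay runnable without the package, it defines stand-ins for `SampleID` and `CONFIDENCE` that mirror the field names and the two enum values (`NONE: 0`, `HIGH: 3`) visible in the printed output above. The value `LOW = 1` and the helper name `low_confidence_raws` are assumptions.

```python
from collections import namedtuple
from enum import IntEnum


class CONFIDENCE(IntEnum):
    """Stand-in for sampleids.CONFIDENCE (LOW = 1 is an assumed value)."""
    NONE = 0
    LOW = 1
    HIGH = 3


# Stand-in with the same field names as the SampleID tuple printed above.
SampleID = namedtuple(
    "SampleID",
    "lab_id tool_id date sample_id provenance_id split_id parents extra raw confidence why",
)


def low_confidence_raws(sid, minimum=CONFIDENCE.LOW):
    """Return raw strings of sid and all ancestors whose confidence is <= minimum."""
    found = [sid.raw] if sid.confidence <= minimum else []
    for parent in sid.parents:
        found.extend(low_confidence_raws(parent, minimum))
    return found


# Rebuild the Quick Start result by hand instead of calling the real parser.
parent = SampleID("", "", "", "", [], "", [], "", "QQQQQQQQQQ",
                  CONFIDENCE.NONE, "P_PARENT1_PI_V1_6L01_PI_V1_NOPARSE")
res = SampleID("AAA", "BBB", "20241001", "C", ["III"], "S", [parent], "EE",
               "AAA_BBB_20241001_C_III_S_(QQQQQQQQQQ)-EE",
               CONFIDENCE.HIGH, "P_PARENT1_PI_V1_6L01")

print(low_confidence_raws(res))  # ['QQQQQQQQQQ']
```

With the real package, the same helper should apply to the value returned by `parse`, provided its `confidence` values support ordering comparisons as the Quick Start suggests.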
## Specification
This module parses sample identifiers following the schema described at https://occamy.chemistry.jhu.edu/references/samples/index.php . It is a lenient parser, designed to tolerate variations observed in real-world identifiers, e.g. a swapped month and day, or swapped identifier fragments.
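
To make the idea of a lenient date field concrete, the sketch below shows one way a swapped month and day could be detected. It is an illustration under stated assumptions, not the algorithm sampleids uses: the function name `normalize_date` and its return convention are hypothetical.

```python
from datetime import datetime


def normalize_date(fragment):
    """Interpret an 8-digit fragment as YYYYMMDD, falling back to YYYYDDMM.

    Returns (normalized YYYYMMDD string, swapped flag), or (None, False)
    if neither reading is a valid calendar date.
    """
    if len(fragment) != 8 or not fragment.isdigit():
        return None, False
    # Try the canonical order first, then the swapped order.
    for fmt, swapped in (("%Y%m%d", False), ("%Y%d%m", True)):
        try:
            parsed = datetime.strptime(fragment, fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y%m%d"), swapped
    return None, False


print(normalize_date("20241001"))  # ('20241001', False)
print(normalize_date("20241310"))  # ('20241013', True)
print(normalize_date("20241332"))  # (None, False)
```

A date such as `20240102` is valid in both orders, so this sketch always prefers the canonical YYYYMMDD reading; a real parser would likely lower its confidence only when the fallback was needed.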
## Version Compatibility
sampleids is compatible with Python 3.4 and later.
## Package Metadata
- Summary: Uniform Sample Identifiers Parser
- Version: 0.0.2 (uploaded 2024-10-08)
- Author: Tyrel M. McQueen (tmcqueen-pypi@demoivre.com)
- License: GNU GPLv2
- Requires Python: >=3.4
- Distributions: `sampleids-0.0.2-py3-none-any.whl` (wheel), `sampleids-0.0.2.tar.gz` (sdist)