# cazy-parser
_A way to extract specific information from the Carbohydrate-Active enZYmes (CAZy) database._
[![Downloads](https://pepy.tech/badge/cazy-parser)](https://pepy.tech/project/cazy-parser)
[![status](http://joss.theoj.org/papers/f709afe5d720fc6eee82fca277942a46/status.svg)](http://joss.theoj.org/papers/f709afe5d720fc6eee82fca277942a46)
[![unittests](https://github.com/rvhonorato/cazy-parser/actions/workflows/unittests.yml/badge.svg?branch=main)](https://github.com/rvhonorato/cazy-parser/actions/workflows/unittests.yml)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/33f087332ec24da689268a13d2f4ca23)](https://www.codacy.com/gh/rvhonorato/cazy-parser/dashboard?utm_source=github.com&utm_medium=referral&utm_content=rvhonorato/cazy-parser&utm_campaign=Badge_Grade)
[![Codacy Badge](https://app.codacy.com/project/badge/Coverage/33f087332ec24da689268a13d2f4ca23)](https://www.codacy.com/gh/rvhonorato/cazy-parser/dashboard?utm_source=github.com&utm_medium=referral&utm_content=rvhonorato/cazy-parser&utm_campaign=Badge_Coverage)
Make sure to visit and cite the CAZy website!
- <http://www.cazy.org/>
- Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM,
Henrissat B (2014) The Carbohydrate-active enzymes database
(CAZy) in 2013. **Nucleic Acids Res** 42:D490–D495. [PMID: [24270786](http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&cmd=search&term=24270786)].
License: [GNU GPLv3](https://www.gnu.org/licenses/gpl-3.0.html)
RV Honorato. CAZy-parser a way to extract information from
the Carbohydrate-Active enZYmes Database.
_The Journal of Open Source Software_, 1(8), Dec 2016.
[10.21105/joss.00053](https://github.com/openjournals/joss-papers/blob/master/joss.00053/10.21105.joss.00053.pdf)
## Introduction
_cazy-parser_ is a tool that extracts information from
[CAZy](http://www.cazy.org/) in a more usable and readable format.
First, a script reads the HTML structure and creates a mirror of the
database as a tab-delimited file. Second, information is extracted from
that mirror according to user-supplied parameters and presented to the user
as a set of accession codes.
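
The second step described above (filtering a tab-delimited mirror down to a set of accession codes) can be sketched as follows. This is only an illustration, not the package's actual code: the column names (`accession`, `class`, `family`, `subfamily`), the function `select_accessions`, and the accession values are all hypothetical placeholders.

```python
import csv
import io

# Hypothetical tab-delimited mirror. The column names and accession
# values are placeholders for illustration, not real CAZy data.
MIRROR_TSV = (
    "accession\tclass\tfamily\tsubfamily\n"
    "AAA00001.1\tGH\t43\t1\n"
    "AAA00002.1\tGH\t43\t2\n"
    "AAA00003.1\tGH\t5\t\n"
    "AAA00004.1\tGH\t43\t1\n"
)


def select_accessions(tsv_text, enzyme_class, family=None, subfamily=None):
    """Return accession codes whose rows match the requested filters."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        if row["class"] != enzyme_class:
            continue
        # Filters left as None are not applied.
        if family is not None and row["family"] != str(family):
            continue
        if subfamily is not None and row["subfamily"] != str(subfamily):
            continue
        selected.append(row["accession"])
    return selected


print(select_accessions(MIRROR_TSV, "GH", family=43, subfamily=1))
```

Leaving `family` or `subfamily` unset simply skips that filter, which mirrors how the command-line options are optional.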
## Install / Upgrade
```text
pip install --upgrade cazy-parser
```
## Usage (internet connection required)
```text
$ cazy-parser -h
usage: cazy-parser [-h] [-f FAMILY] [-s SUBFAMILY] [-c CHARACTERIZED] [-v] {GH,GT,PL,CA,AA}

positional arguments:
  {GH,GT,PL,CA,AA}

optional arguments:
  -h, --help            show this help message and exit
  -f FAMILY, --family FAMILY
  -s SUBFAMILY, --subfamily SUBFAMILY
  -c CHARACTERIZED, --characterized CHARACTERIZED
  -v, --version         show version
```
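
For scripted use, the command line can be assembled and launched from Python. The sketch below is an assumption-laden convenience wrapper, not part of cazy-parser: `build_command` and `run_cazy_parser` are hypothetical helper names, and only the `-f` and `-s` options shown in the help text are used.

```python
import subprocess


def build_command(enzyme_class, family=None, subfamily=None):
    """Assemble the cazy-parser argument list from the documented options."""
    cmd = ["cazy-parser", enzyme_class]
    if family is not None:
        cmd += ["-f", str(family)]
    if subfamily is not None:
        cmd += ["-s", str(subfamily)]
    return cmd


def run_cazy_parser(enzyme_class, family=None, subfamily=None):
    """Run cazy-parser; needs the package installed and internet access."""
    cmd = build_command(enzyme_class, family, subfamily)
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, check=True)


print(build_command("GH", family=43, subfamily=1))
```

Only `build_command` is exercised here; calling `run_cazy_parser` starts a real download.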
### Example
Extract all FASTA sequences from subfamily 1 of Glycoside Hydrolase family 43:
```text
$ cazy-parser GH -f 43 -s 1
[2022-05-26 16:39:21,511 91 INFO] ------------------------------------------
[2022-05-26 16:39:21,511 92 INFO]
[2022-05-26 16:39:21,511 93 INFO] ┌─┐┌─┐┌─┐┬ ┬ ┌─┐┌─┐┬─┐┌─┐┌─┐┬─┐
[2022-05-26 16:39:21,511 94 INFO] │ ├─┤┌─┘└┬┘───├─┘├─┤├┬┘└─┐├┤ ├┬┘
[2022-05-26 16:39:21,511 95 INFO] └─┘┴ ┴└─┘ ┴ ┴ ┴ ┴┴└─└─┘└─┘┴└─ v2.0.1
[2022-05-26 16:39:21,511 96 INFO]
[2022-05-26 16:39:21,511 97 INFO] ------------------------------------------
[2022-05-26 16:39:21,511 183 INFO] Fetching links for Glycoside-Hydrolases, url: http://www.cazy.org/Glycoside-Hydrolases.html
[2022-05-26 16:39:22,454 189 INFO] Only using links of family 43 subfamily 1
[2022-05-26 16:39:23,029 26 INFO] Dowloading 1415 fasta sequences...
[2022-05-26 16:40:32,187 51 INFO] Dumping fasta sequences to file GH43_1_26052022.fasta
```
This generates the file `GH43_1_DDMMYYYY.fasta`, named with the date of the run
and containing the FASTA sequences.
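
To pick up the output in a script, the file name can be reconstructed from the query and the date, and the sequences read with a few lines of plain Python. This is a minimal sketch: `expected_fasta_name` and `read_fasta` are hypothetical helpers, and the name pattern for a query without a subfamily is an assumption extrapolated from the single example above.

```python
from datetime import date


def expected_fasta_name(enzyme_class, family, subfamily=None, day=None):
    """Build a name like GH43_1_26052022.fasta (class, family, subfamily, date)."""
    day = day or date.today()
    parts = [f"{enzyme_class}{family}"]
    # Assumption: the subfamily part is simply omitted when not requested.
    if subfamily is not None:
        parts.append(str(subfamily))
    parts.append(day.strftime("%d%m%Y"))
    return "_".join(parts) + ".fasta"


def read_fasta(lines):
    """Parse FASTA lines into a dict mapping header text to sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated under the current header.
            records[header] += line
    return records


print(expected_fasta_name("GH", 43, 1, date(2022, 5, 26)))
```

With a real output file, `read_fasta(open(name))` would return one entry per downloaded sequence.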
## To-do and how to contribute
Please refer to [CONTRIBUTING](CONTRIBUTING.md) 🤓
Raw data
{
"_id": null,
"home_page": "",
"name": "cazy-parser",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9,<4.0",
"maintainer_email": "",
"keywords": "cazy,database,datamining",
"author": "Rodrigo V. Honorato",
"author_email": "rvhonorato@protonmail.com",
"download_url": "https://files.pythonhosted.org/packages/66/ca/7c4a75991dcc268b7be0256d05e9a7ca43137b8b0195907e6faf0446c3c5/cazy_parser-2.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPLv3",
"summary": "A way to extract specific information from CAZy",
"version": "2.0.3",
"project_urls": null,
"split_keywords": [
"cazy",
"database",
"datamining"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d21de3d8748d82c4f995b1599d5a574b169ea7b174c1c2a382bc194f4628db06",
"md5": "4956403eb79d333861e1a25663787204",
"sha256": "beff5ec5845e2f1dc45d43b584a003920a68f3cb1c880bd74fd576edb177b9fa"
},
"downloads": -1,
"filename": "cazy_parser-2.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4956403eb79d333861e1a25663787204",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9,<4.0",
"size": 21389,
"upload_time": "2023-10-12T10:13:19",
"upload_time_iso_8601": "2023-10-12T10:13:19.582232Z",
"url": "https://files.pythonhosted.org/packages/d2/1d/e3d8748d82c4f995b1599d5a574b169ea7b174c1c2a382bc194f4628db06/cazy_parser-2.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "66ca7c4a75991dcc268b7be0256d05e9a7ca43137b8b0195907e6faf0446c3c5",
"md5": "c391b89f9918c12afde6a9c9ec5fc4ac",
"sha256": "f74fb33a9106a3d402870a3ca757d1cbf94e0ee8b6321695a97b8e7a28f632a9"
},
"downloads": -1,
"filename": "cazy_parser-2.0.3.tar.gz",
"has_sig": false,
"md5_digest": "c391b89f9918c12afde6a9c9ec5fc4ac",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9,<4.0",
"size": 20785,
"upload_time": "2023-10-12T10:13:21",
"upload_time_iso_8601": "2023-10-12T10:13:21.187849Z",
"url": "https://files.pythonhosted.org/packages/66/ca/7c4a75991dcc268b7be0256d05e9a7ca43137b8b0195907e6faf0446c3c5/cazy_parser-2.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-12 10:13:21",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "cazy-parser"
}