# Utilsovs - 0.9
Utilities derived from the [O-GlcNAc Database](https://www.oglcnac.mcw.edu/) source code.
Please report any bugs or incompatibilities.
If you use *utilsovs* in your academic work, please cite:
Malard F, Wulff-Fuentes E, Berendt R, Didier G and Olivier-Van Stichelen S. **Automatization and self-maintenance of the O-GlcNAcome catalogue:
A Smart Scientific Database**. *Database*, Volume 2021, (2021).
## Install
```shell
pip3 install utilsovs-pkg
```
Test the install by running `pytest` from the package root directory.
## Content
The package utilsovs contains:
- API wrappers - Proteins from UniProtKB ID ([UniProtKB](https://www.uniprot.org/), [GlyGen](https://www.glygen.org/), [The *O*-GlcNAc Database](https://www.oglcnac.mcw.edu/))
- API wrappers - Literature from PMID ([MedLine/PubMed](https://pubmed.ncbi.nlm.nih.gov/), [Semantic Scholar](https://www.semanticscholar.org/), [ProteomeXchange](http://www.proteomexchange.org/))
- Protein digestion tool: full and partial digestion and MW calculation (monoisotopic, average mass)
- Calculation of log2(odds) from alignment file and generation of sequence logo
- Match residuePosition on sequence fetched from UniProtKB to validate datasets
- Convert PDF to Text using wrappers and repair/clean
- Miscellaneous functions
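
As an illustration of the log2(odds) item in the list above, here is a minimal standalone sketch that does not call utilsovs. It assumes the usual definition, log2 of the observed residue frequency in an alignment column divided by a background frequency. The function name, the small pseudocount that avoids log2(0), and the toy background values are assumptions made for this example; the package's own calculation may differ.

```python
import math
from collections import Counter


def log2_odds(aligned_sequences, background, pseudocount=1e-6):
    """Return one {residue: log2(observed / background)} dict per alignment column."""
    if not aligned_sequences:
        raise ValueError("at least one aligned sequence is required")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")

    table = []
    for column in zip(*aligned_sequences):
        # Ignore gaps and any symbol that has no background frequency.
        counts = Counter(res for res in column if res in background)
        total = sum(counts.values())
        scores = {}
        for residue, bg_freq in background.items():
            observed = counts[residue] / total if total else 0.0
            # The pseudocount (an assumption of this sketch) keeps log2 finite.
            scores[residue] = math.log2((observed + pseudocount) / bg_freq)
        table.append(scores)
    return table


# Toy background frequencies, for illustration only.
toy_background = {"S": 0.5, "T": 0.25, "A": 0.25}
toy_table = log2_odds(["SA", "ST", "SA", "SA"], toy_background)
print(round(toy_table[0]["S"], 3))
```

Positive values mark residues that are enriched at a position relative to the background, negative values mark depleted ones.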
### API wrappers - Proteins from UniProtKB ID
```python
from utilsovs import *
# Fetch UniProtKB Proteins REST API (@data.url)
data = fetch_one_UniProtKB('P08047',filepath='out.json',pprint=False)
# Fetch The O-GlcNAc Database Proteins REST API (@data.url)
data = fetch_one_oglcnacDB('P08047',filepath='out.json',pprint=False)
# Fetch RESTful Glygen webservice-based APIs (@data.url)
data = fetch_one_GlyGen('P08047',filepath='out.json',pprint=False)
# data is a class instance. To print the data of interest:
print(data.data)
```
### API wrappers - Literature from PubMed IDentifier (PMID)
```python
from utilsovs import *
# Fetch MedLine/PubMed API using Entrez.efetch (@data.url)
data = fetch_one_PubMed('33479245',db="pubmed",filepath='out.json',pprint=False)
# Fetch Semantic Scholar API (@data.url)
data = fetch_one_SemanticScholar('33479245',filepath='out.json',pprint=False)
# Fetch proteomeXchange using GET search request (@data.url)
data = fetch_one_proteomeXchange('29351928',filepath='out.json',pprint=False)
# data is a class instance. To print the data of interest:
print(data.data)
```
### Compute - Digest protein, match residuePosition on sequence or calculate log2(odds) from alignment file and draw consensus sequence logo
```python
from utilsovs import *
# Full digestion of a UniProtKB ID protein sequence: [ ['PEPTIDE',(start,end),mw_monoisotopic,mw_average], ... ]
data = compute_one_fullDigest('P13693','Trypsin',filepath='out.json')
# Partial digestion of a UniProtKB ID protein sequence: [ ['PEPTIDE',(start,end),mw_monoisotopic,mw_average], ... ]
# All possible combinations of adjacent fragments are generated
data = compute_one_partialDigest('P13693','Trypsin',filepath='out.json')
# Match residuePosition with UniProtKB ID protein sequence
data = compute_match_aaSeq('P13693','D6',filepath='out.json')
# Compute log2odds from alignment file - Input for draw_one_seqLogo()
data = compute_aln_log2odds('align.aln',organism='HUMAN',filepath='out.json')
# Draw sequence logo from compute_aln_log2odds output file
# See https://logomaker.readthedocs.io/en/latest/implementation.html
# Edit logomaker config in src/ultilsovs_draw.py
draw_one_seqLogo('compute_aln_log2odds.json',filepath='out.png',showplot=False,center_values=False)
# data is a class instance. To print the data of interest:
print(data.data)
```
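
To make the difference between full and partial digestion concrete, here is a minimal standalone sketch that does not use utilsovs. It assumes the common trypsin rule (cleave after K or R unless the next residue is P), 1-based inclusive positions, and that partial digestion also reports the single fragments. The helper names are hypothetical, and the package's own rules and output may differ.

```python
import re


def cleavage_sites(sequence):
    """Return 0-based cut indices for a trypsin-like rule (assumed here)."""
    # Cut after K or R, but not when the next residue is P.
    # A match at the very end of the sequence is not a cut.
    return [m.end() for m in re.finditer(r"[KR](?!P)", sequence)
            if m.end() < len(sequence)]


def full_digest(sequence):
    """Split at every cleavage site: [[peptide, (start, end)], ...]."""
    bounds = [0] + cleavage_sites(sequence) + [len(sequence)]
    return [[sequence[s:e], (s + 1, e)] for s, e in zip(bounds, bounds[1:])]


def partial_digest(sequence):
    """Every run of adjacent full-digest fragments, single fragments included."""
    fragments = full_digest(sequence)
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, len(fragments)):
            start = fragments[i][1][0]
            end = fragments[j][1][1]
            peptides.append([sequence[start - 1:end], (start, end)])
    return peptides


for peptide, span in partial_digest("AKRPGRM"):
    print(peptide, span)
```

With n fragments from the full digestion, this enumeration yields n(n+1)/2 peptides, so the partial output grows quickly for long proteins.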
### Text Processing
```python
from utilsovs import *
# PDF to Text conversion using GNU pdftotext (Linux/Mac) or Tika (Windows) and text repair + cleaning.
data = pdf_one_pdf2text('test.pdf',filepath='out.dat',clean=True)
# data is a class instance. To print the data of interest:
print(data.data)
```
### Miscellaneous standalone functions
Functions below return Python objects or variables.
```python
from utilsovs import *
# Show list of proteases for digest utils
show_proteases()
# Return protein sequence from UniProtKB ID
get_one_sequence('P13693',filepath='out.dat')
# Compute MW of a peptide and return [string,mw_monoisotopic,mw_average]
compute_one_MW('EWENMR',filepath='out.json')
# Compute the amino acid frequency table for a given organism from uniprot_sprot.fasta.gz
get_one_freqAAdict(organism='HUMAN',filepath='out.json')
# Clear all data in the utilsovs cache
clearCache()
```
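
`compute_one_MW` reports both a monoisotopic and an average mass. The sketch below shows how such values are commonly obtained for a neutral, unmodified peptide: sum the residue masses and add one water. The mass table holds standard reference values in daltons; the function name is hypothetical and this is not the package's implementation, which may handle modifications or rounding differently.

```python
# Residue masses in daltons as (monoisotopic, average), standard reference values.
RESIDUE_MASS = {
    "G": (57.02146, 57.0519), "A": (71.03711, 71.0788),
    "S": (87.03203, 87.0782), "P": (97.05276, 97.1167),
    "V": (99.06841, 99.1326), "T": (101.04768, 101.1051),
    "C": (103.00919, 103.1388), "L": (113.08406, 113.1594),
    "I": (113.08406, 113.1594), "N": (114.04293, 114.1038),
    "D": (115.02694, 115.0886), "Q": (128.05858, 128.1307),
    "K": (128.09496, 128.1741), "E": (129.04259, 129.1155),
    "M": (131.04049, 131.1926), "H": (137.05891, 137.1411),
    "F": (147.06841, 147.1766), "R": (156.10111, 156.1875),
    "Y": (163.06333, 163.1760), "W": (186.07931, 186.2132),
}
# One water is added for the free N- and C-termini.
WATER_MASS = (18.01056, 18.01528)


def peptide_mass(peptide):
    """Return [peptide, monoisotopic mass, average mass] for a neutral peptide."""
    mono, average = WATER_MASS
    for residue in peptide.upper():
        if residue not in RESIDUE_MASS:
            raise ValueError(f"unsupported residue: {residue!r}")
        residue_mono, residue_average = RESIDUE_MASS[residue]
        mono += residue_mono
        average += residue_average
    return [peptide, round(mono, 4), round(average, 4)]


print(peptide_mass("EWENMR"))
```

The monoisotopic value uses the most abundant isotope of each element, while the average value uses natural isotopic abundances, which is why the two differ by a fraction of a dalton for short peptides.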
Raw data
{
"_id": null,
"home_page": "https://github.com/synthaze/utilsovs",
"name": "utilsovs-pkg",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "Florian Malard, PhD",
"author_email": "florian.malard@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d2/50/98bb8cd1789e55238dc186e68d42799eb5c7612046b679af679b8a8ed54a/utilsovs-pkg-0.9.5.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "",
"summary": "Utils derived from the O-GlcNAc Database source code",
"version": "0.9.5",
"project_urls": {
"Bug Tracker": "https://github.com/synthaze/utilsovs/issues",
"Homepage": "https://github.com/synthaze/utilsovs"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bfa92dc69ac55770d232cb9e60cf9ae4f6381c10d6ba87f04dd734d9193d9a10",
"md5": "2fd808316cc7db5509ff48e177bf1e67",
"sha256": "dc0531c216283ec60616c77e4804b7039d94a045d8bf55e9edaba4a6c8936a44"
},
"downloads": -1,
"filename": "utilsovs_pkg-0.9.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2fd808316cc7db5509ff48e177bf1e67",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 25837,
"upload_time": "2022-02-02T14:53:52",
"upload_time_iso_8601": "2022-02-02T14:53:52.563653Z",
"url": "https://files.pythonhosted.org/packages/bf/a9/2dc69ac55770d232cb9e60cf9ae4f6381c10d6ba87f04dd734d9193d9a10/utilsovs_pkg-0.9.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d25098bb8cd1789e55238dc186e68d42799eb5c7612046b679af679b8a8ed54a",
"md5": "98c397b7c5384edc1dbb3901d52a0bca",
"sha256": "dfa36a7a90495eaf1d4eb07c40743b161d28b59f9ecb691129b7ef8a5fbe4128"
},
"downloads": -1,
"filename": "utilsovs-pkg-0.9.5.tar.gz",
"has_sig": false,
"md5_digest": "98c397b7c5384edc1dbb3901d52a0bca",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 23999,
"upload_time": "2022-02-02T14:53:53",
"upload_time_iso_8601": "2022-02-02T14:53:53.921293Z",
"url": "https://files.pythonhosted.org/packages/d2/50/98bb8cd1789e55238dc186e68d42799eb5c7612046b679af679b8a8ed54a/utilsovs-pkg-0.9.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-02-02 14:53:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "synthaze",
"github_project": "utilsovs",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "utilsovs-pkg"
}