| Name | wbtools |
| Version | 3.0.11 |
| download | |
| home_page | https://github.com/WormBase/wbtools |
| Summary | Interface to WormBase (www.wormbase.org) curation data, including literature management and NLP functions |
| upload_time | 2024-08-27 20:12:43 |
| maintainer | None |
| docs_url | None |
| author | Valerio Arnaboldi |
| requires_python | >=3.6 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# WBtools
> Interface to the WormBase curation database and text mining functions
Access WormBase paper corpus information by loading PDF files (converted to text) and curation information from the WormBase
database. The package also exposes text mining functions that operate on papers' full text.
## Installation
```
pip install wbtools
```
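To confirm that the package is available in the active environment, you can query its installed version with the standard library. This is only a quick sanity check; `importlib.metadata` requires Python 3.8+, and `wbtools` is assumed to be the distribution name published on PyPI:
```python
from importlib.metadata import version  # standard library, Python 3.8+

# Print the installed wbtools version, e.g. "3.0.11"
print(version("wbtools"))
```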
## Usage example
### Get sentences from a WormBase paper
```python
from wbtools.literature.corpus import CorpusManager
paper_id = "00050564"
cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         paper_ids=[paper_id], file_server_host="file_server_base_url", file_server_user="username",
                         file_server_passwd="password")
sentences = cm.get_paper(paper_id).get_text_docs(split_sentences=True)
```
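The sentence-split text can then be filtered with plain Python. A minimal sketch continuing from the example above, assuming `get_text_docs(split_sentences=True)` returns a list of sentence strings (the keyword below is purely illustrative):
```python
# Continuing from the example above: `sentences` is assumed to be a list of strings.
keyword = "daf-16"  # hypothetical gene name, used only for illustration

# Keep the sentences that mention the keyword, case-insensitively.
matching_sentences = [s for s in sentences if keyword.lower() in s.lower()]
for sentence in matching_sentences:
    print(sentence)
```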
### Get the latest papers (up to 50) added to WormBase or modified in the last 30 days
```python
from wbtools.literature.corpus import CorpusManager
import datetime
one_month_ago = (datetime.datetime.now() - datetime.timedelta(days=30)).strftime("%m/%d/%Y")
cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         from_date=one_month_ago, max_num_papers=50,
                         file_server_host="file_server_base_url", file_server_user="username",
                         file_server_passwd="password")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]
```
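The loaded corpus can also be iterated directly. Below is a sketch combining the calls already shown, under the assumption that `get_all_papers()` yields the same kind of paper objects returned by `get_paper()` (i.e. objects exposing `paper_id` and `get_text_docs()`):
```python
# Continuing from the example above: collect sentence-split fulltext for each
# recently added or modified paper, keyed by its WormBase paper ID.
sentences_by_paper = {
    paper.paper_id: paper.get_text_docs(split_sentences=True)
    for paper in cm.get_all_papers()
}
```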
### Get the latest 50 papers added to WormBase or modified that have a final PDF version and have been flagged by the WB paper classification pipeline, excluding reviews and papers with only temporary files (proofs)
```python
from wbtools.literature.corpus import CorpusManager
cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         max_num_papers=50, must_be_autclass_flagged=True, exclude_pap_types=['Review'],
                         exclude_temp_pdf=True, file_server_host="file_server_base_url",
                         file_server_user="username", file_server_passwd="password")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]
```
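The selected IDs can then be handed to a downstream curation step; for example, a small sketch that writes them to a plain-text file, one ID per line (the file name is arbitrary):
```python
# Continuing from the example above: persist the flagged paper IDs so that a
# separate pipeline stage can pick them up later.
with open("flagged_paper_ids.txt", "w") as out_file:
    out_file.write("\n".join(paper_ids) + "\n")
```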
Raw data

```json
{
"_id": null,
"home_page": "https://github.com/WormBase/wbtools",
"name": "wbtools",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": null,
"author": "Valerio Arnaboldi",
"author_email": "valearna@caltech.edu",
"download_url": "https://files.pythonhosted.org/packages/04/b9/a20cdad1a955d442acf123dcc9ebade387f7cb939a3979e22c78001e4c99/wbtools-3.0.11.tar.gz",
"platform": null,
"description": "# WBtools\n> Interface to WormBase curation database and Text Mining functions\n\nAccess WormBase paper corpus information by loading pdf files (converted to txt) and curation info from the WormBase \ndatabase. The package also exposes text mining functions on papers' fulltext.\n\n## Installation\n\n```pip install wbtools```\n\n## Usage example\n\n### Get sentences from a WormBase paper\n\n```python\nfrom wbtools.literature.corpus import CorpusManager\n\npaper_id = \"00050564\"\ncm = CorpusManager()\ncm.load_from_wb_database(db_name=\"wb_dbname\", db_user=\"wb_dbuser\", db_password=\"wb_dbpasswd\", db_host=\"wb_dbhost\",\n paper_ids=[paper_id], file_server_host=\"file_server_base_url\", file_server_user=\"username\", \n file_server_passwd=\"password\")\nsentences = cm.get_paper(paper_id).get_text_docs(split_sentences=True)\n```\n\n### Get the latest papers (up to 50) added to WormBase or modified in the last 30 days \n\n```python\nfrom wbtools.literature.corpus import CorpusManager\nimport datetime\n\none_month_ago = (datetime.datetime.now() - datetime.timedelta(days=30)).strftime(\"%M/%D/%Y\")\n\ncm = CorpusManager()\ncm.load_from_wb_database(db_name=\"wb_dbname\", db_user=\"wb_dbuser\", db_password=\"wb_dbpasswd\", db_host=\"wb_dbhost\",\n from_date=one_month_ago, max_num_papers=50, \n file_server_host=\"file_server_base_url\", file_server_user=\"username\", \n file_server_passwd=\"password\")\npaper_ids = [paper.paper_id for paper in cm.get_all_papers()]\n```\n\n### Get the latest 50 papers added to WormBase or modified that have a final pdf version and have been flagged by WB paper classification pipeline, excluding reviews and papers with temp files only (proofs)\n\n```python\nfrom wbtools.literature.corpus import CorpusManager\nimport datetime\n\ncm = CorpusManager()\ncm.load_from_wb_database(db_name=\"wb_dbname\", db_user=\"wb_dbuser\", db_password=\"wb_dbpasswd\", db_host=\"wb_dbhost\",\n max_num_papers=50, must_be_autclass_flagged=True, exclude_pap_types=['Review'], \n exclude_temp_pdf=True, file_server_host=\"file_server_base_url\", \n file_server_user=\"username\", file_server_passwd=\"password\")\npaper_ids = [paper.paper_id for paper in cm.get_all_papers()]\n```\n",
"bugtrack_url": null,
"license": null,
"summary": "Interface to WormBase (www.wormbase.org) curation data, including literature management and NLP functions",
"version": "3.0.11",
"project_urls": {
"Homepage": "https://github.com/WormBase/wbtools"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d9e2c90f2c5ed788311b91d5134565221d752fccf6ce08024940833eb7cf073d",
"md5": "143158435c71b52b80b0c93a7ba4f1b9",
"sha256": "5f066e79dbeaeab651fe6dd6433de438d2d06c49343da2693611daa51de87758"
},
"downloads": -1,
"filename": "wbtools-3.0.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "143158435c71b52b80b0c93a7ba4f1b9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 55563,
"upload_time": "2024-08-27T20:12:41",
"upload_time_iso_8601": "2024-08-27T20:12:41.385479Z",
"url": "https://files.pythonhosted.org/packages/d9/e2/c90f2c5ed788311b91d5134565221d752fccf6ce08024940833eb7cf073d/wbtools-3.0.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "04b9a20cdad1a955d442acf123dcc9ebade387f7cb939a3979e22c78001e4c99",
"md5": "8d3cd27179eb1f7c32e9df085d2cb151",
"sha256": "4f09a2c4d0000e5bf63819f42beb271520d529fa462a0e687a7d5b1bf7fbd280"
},
"downloads": -1,
"filename": "wbtools-3.0.11.tar.gz",
"has_sig": false,
"md5_digest": "8d3cd27179eb1f7c32e9df085d2cb151",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 41276,
"upload_time": "2024-08-27T20:12:43",
"upload_time_iso_8601": "2024-08-27T20:12:43.166674Z",
"url": "https://files.pythonhosted.org/packages/04/b9/a20cdad1a955d442acf123dcc9ebade387f7cb939a3979e22c78001e4c99/wbtools-3.0.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-27 20:12:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "WormBase",
"github_project": "wbtools",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "wbtools"
}
```