wbtools


Name: wbtools
Version: 3.0.11
Home page: https://github.com/WormBase/wbtools
Summary: Interface to WormBase (www.wormbase.org) curation data, including literature management and NLP functions
Upload time: 2024-08-27 20:12:43
Maintainer: None
Docs URL: None
Author: Valerio Arnaboldi
Requires Python: >=3.6
License: None
Requirements: none recorded

# WBtools
> Interface to the WormBase curation database and text-mining functions

Access the WormBase paper corpus by loading papers' PDF files (converted to plain text) together with curation information from the WormBase
database. The package also provides text-mining functions that operate on the papers' full text.

## Installation

```shell
pip install wbtools
```
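
To check which release is installed (this page describes 3.0.11), the standard library can report a distribution's version. This is a minimal sketch: it uses `importlib.metadata`, which requires Python 3.8+ even though wbtools itself declares `>=3.6`, and the helper name `installed_version` is hypothetical, not part of wbtools.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "wbtools") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "wbtools is not installed")
```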

## Usage examples

### Get sentences from a WormBase paper

```python
from wbtools.literature.corpus import CorpusManager

paper_id = "00050564"
cm = CorpusManager()
# All connection values below are placeholders; replace them with real credentials
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         paper_ids=[paper_id], file_server_host="file_server_base_url", file_server_user="username",
                         file_server_passwd="password")
# Get the paper's text split into sentences
sentences = cm.get_paper(paper_id).get_text_docs(split_sentences=True)
```
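
Once the sentences are loaded, they can be filtered with plain Python. The sketch below assumes `get_text_docs(split_sentences=True)` returns a list of strings, and it uses a simplified regular expression for C. elegans gene names (three or four lowercase letters, a hyphen, and a number, as in `unc-22` or `daf-16`). The pattern is an approximation rather than the full WormBase nomenclature: it misses forms such as allele-style suffixes and can produce false positives like `day-1`. The function name `sentences_by_gene` is hypothetical.

```python
import re
from typing import Dict, List

# Simplified pattern for C. elegans gene names such as "unc-22" or "daf-16"
GENE_NAME_RE = re.compile(r"\b[a-z]{3,4}-\d+\b")


def sentences_by_gene(sentences: List[str]) -> Dict[str, List[str]]:
    """Map each gene-like token to the sentences that mention it, in input order."""
    index: Dict[str, List[str]] = {}
    for sentence in sentences:
        # set() so a sentence is listed only once per gene, even if the name repeats
        for gene in set(GENE_NAME_RE.findall(sentence)):
            index.setdefault(gene, []).append(sentence)
    return index


if __name__ == "__main__":
    demo = [
        "Loss of daf-16 suppresses the phenotype.",
        "unc-22 and daf-16 were scored together.",
        "No genes are named here.",
    ]
    for gene, hits in sorted(sentences_by_gene(demo).items()):
        print(gene, len(hits))
```

With real data, `sentences_by_gene(sentences)` would be called on the list returned in the example above.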

### Get the latest papers (up to 50) added to or modified in WormBase in the last 30 days

```python
from wbtools.literature.corpus import CorpusManager
import datetime

# Format the cutoff as MM/DD/YYYY (%m = month, %d = day; %M would be minutes)
one_month_ago = (datetime.datetime.now() - datetime.timedelta(days=30)).strftime("%m/%d/%Y")

cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         from_date=one_month_ago, max_num_papers=50, 
                         file_server_host="file_server_base_url", file_server_user="username", 
                         file_server_passwd="password")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]
```
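
The `from_date` cutoff in the example above is built with `strftime`. In Python's format codes, `%M` means minutes and `%D` is a platform-dependent shorthand for `%m/%d/%y`, so a month/day/year string needs `%m/%d/%Y`. Wrapping the computation in a small helper makes it testable with a fixed "today". This sketch assumes `from_date` is accepted as an `MM/DD/YYYY` string, as the example suggests; the name `days_ago_str` is hypothetical.

```python
import datetime
from typing import Optional


def days_ago_str(days: int, today: Optional[datetime.date] = None) -> str:
    """Return the date `days` days before `today`, formatted as MM/DD/YYYY."""
    if today is None:
        today = datetime.date.today()
    cutoff = today - datetime.timedelta(days=days)
    # %m = zero-padded month, %d = zero-padded day, %Y = four-digit year
    return cutoff.strftime("%m/%d/%Y")


if __name__ == "__main__":
    print(days_ago_str(30, today=datetime.date(2024, 8, 27)))  # 07/28/2024
```

In the example above, this would be passed as `from_date=days_ago_str(30)`.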

### Get the latest 50 papers added to or modified in WormBase that have a final PDF version and have been flagged by the WB paper classification pipeline, excluding reviews and papers with only temporary files (proofs)

```python
from wbtools.literature.corpus import CorpusManager

cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         max_num_papers=50, must_be_autclass_flagged=True, exclude_pap_types=['Review'], 
                         exclude_temp_pdf=True, file_server_host="file_server_base_url", 
                         file_server_user="username", file_server_passwd="password")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]
```
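
The examples pass and collect bare numeric paper IDs such as `"00050564"`, while WormBase paper objects are commonly written with a `WBPaper` prefix (e.g. `WBPaper00050564`). The helpers below convert between the two forms. This is a sketch under assumptions: the 8-digit zero padding is inferred from the example ID, and `to_bare_paper_id` and `to_wbpaper_id` are hypothetical names, not part of wbtools.

```python
import re

# Optional "WBPaper" prefix followed by up to 8 digits
_PAPER_ID_RE = re.compile(r"^(?:WBPaper)?(\d{1,8})$")


def to_bare_paper_id(paper_id: str) -> str:
    """Normalize 'WBPaper00050564', '00050564' or '50564' to '00050564'."""
    match = _PAPER_ID_RE.match(paper_id.strip())
    if match is None:
        raise ValueError(f"not a WormBase paper ID: {paper_id!r}")
    # Left-pad with zeros to the assumed 8-digit width
    return match.group(1).zfill(8)


def to_wbpaper_id(paper_id: str) -> str:
    """Return the prefixed form, e.g. 'WBPaper00050564'."""
    return "WBPaper" + to_bare_paper_id(paper_id)


if __name__ == "__main__":
    ids = ["00050564", "WBPaper00050564", "50564"]
    # All three spellings collapse to a single prefixed ID
    print(sorted({to_wbpaper_id(i) for i in ids}))
```

Normalizing IDs this way makes it easier to deduplicate `paper_ids` collected from several queries.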

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/WormBase/wbtools",
    "name": "wbtools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Valerio Arnaboldi",
    "author_email": "valearna@caltech.edu",
    "download_url": "https://files.pythonhosted.org/packages/04/b9/a20cdad1a955d442acf123dcc9ebade387f7cb939a3979e22c78001e4c99/wbtools-3.0.11.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Interface to WormBase (www.wormbase.org) curation data, including literature management and NLP functions",
    "version": "3.0.11",
    "project_urls": {
        "Homepage": "https://github.com/WormBase/wbtools"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d9e2c90f2c5ed788311b91d5134565221d752fccf6ce08024940833eb7cf073d",
                "md5": "143158435c71b52b80b0c93a7ba4f1b9",
                "sha256": "5f066e79dbeaeab651fe6dd6433de438d2d06c49343da2693611daa51de87758"
            },
            "downloads": -1,
            "filename": "wbtools-3.0.11-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "143158435c71b52b80b0c93a7ba4f1b9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 55563,
            "upload_time": "2024-08-27T20:12:41",
            "upload_time_iso_8601": "2024-08-27T20:12:41.385479Z",
            "url": "https://files.pythonhosted.org/packages/d9/e2/c90f2c5ed788311b91d5134565221d752fccf6ce08024940833eb7cf073d/wbtools-3.0.11-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "04b9a20cdad1a955d442acf123dcc9ebade387f7cb939a3979e22c78001e4c99",
                "md5": "8d3cd27179eb1f7c32e9df085d2cb151",
                "sha256": "4f09a2c4d0000e5bf63819f42beb271520d529fa462a0e687a7d5b1bf7fbd280"
            },
            "downloads": -1,
            "filename": "wbtools-3.0.11.tar.gz",
            "has_sig": false,
            "md5_digest": "8d3cd27179eb1f7c32e9df085d2cb151",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 41276,
            "upload_time": "2024-08-27T20:12:43",
            "upload_time_iso_8601": "2024-08-27T20:12:43.166674Z",
            "url": "https://files.pythonhosted.org/packages/04/b9/a20cdad1a955d442acf123dcc9ebade387f7cb939a3979e22c78001e4c99/wbtools-3.0.11.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-27 20:12:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "WormBase",
    "github_project": "wbtools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "wbtools"
}
        