# bibcodex
Library to access, analyze, and display bibliographic information.
[![PyPI version](https://badge.fury.io/py/bibcodex.svg)](https://badge.fury.io/py/bibcodex)
## Installation
    pip install bibcodex
## Examples
Import `pandas` and `bibcodex` together, then load a dataframe:
```python
import bibcodex
import pandas as pd
# You should always cast your search variables (pmid, doi) to str.
df = pd.read_csv("data/sample_data.csv", dtype={'pmid':str})
```
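The reason for the cast can be seen with plain `pandas`, independent of `bibcodex`. The sketch below uses a made-up two-row CSV (the identifiers are placeholders, not real records): when a `pmid` cell is missing, type inference turns the column into floats, so an identifier would be rendered as `12345678.0` instead of `12345678`.

```python
import io

import pandas as pd

# Hypothetical CSV content; the identifiers are placeholders for illustration.
csv_text = "pmid,doi\n12345678,10.1000/example.1\n,10.1000/example.2\n"

# Default parsing: the missing value forces the pmid column to float64.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["pmid"].dtype, inferred["pmid"].tolist())

# Explicit string dtype: identifiers are kept exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"pmid": str, "doi": str})
print(as_text["pmid"].tolist())
```

Casting both `pmid` and `doi` up front keeps the index values comparable as text later on.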
Valid download methods are `icite`, `doi2pmid`, `semanticScholar`, and `pubmed`:
```python
# Set the index to the search query
df = df.set_index("doi")
# Download the information, and combine it with the original dataframe:
info = df.bibcodex.download('semanticScholar')
print(df.combine_first(info[["title"]]))
"""
doi                      title
10.1001/jama.2017.18444  Progressive Massive Fibrosis in Coal Miners Fr...
10.1001/jama.2018.0126   Birth Defects Potentially Related to Zika Viru...
10.1001/jama.2018.0708   Association Between Estimated Cumulative Vacci...
10.1001/jama.2018.10488  Electronic Cigarette Sales in the United State...
"""
```
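To see what `combine_first` does in the step above without making a network request, here is a minimal offline sketch. The `info` frame is a hand-written stand-in for a download result, and all DOIs, titles, and years in it are placeholders. Values already present in `df` win; only missing cells and new columns are filled from `info`.

```python
import pandas as pd

# Hypothetical local data indexed by DOI (all values are placeholders).
index = pd.Index(["10.1000/a", "10.1000/b"], name="doi")
df = pd.DataFrame(
    {"pmid": ["111", "222"], "title": [None, "Locally known title"]},
    index=index,
)

# Hand-written stand-in for a downloaded frame that shares the same index.
info = pd.DataFrame(
    {"title": ["Downloaded title A", "Downloaded title B"], "year": [2017, 2018]},
    index=index,
)

# Existing values in df take priority; gaps and new columns come from info.
combined = df.combine_first(info[["title", "year"]])
print(combined)
```

Because the alignment happens on the index, both frames need to be indexed by the same identifier (here the DOI).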
All search queries are cached locally in `./cache`. To clear the cache, use:
```python
df.bibcodex.clear()
```
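For readers unfamiliar with the idea, a local query cache can be sketched roughly as follows. This is not how `bibcodex` implements its cache; the `DiskCache` class, its file layout, and `fake_fetch` are all hypothetical and only illustrate the "look up on disk first, fetch otherwise, clear on request" behaviour.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


class DiskCache:
    """Toy JSON-file cache keyed by (method, query). Hypothetical illustration."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, method, query):
        # Hash the key so any query string maps to a safe file name.
        key = hashlib.sha256(f"{method}:{query}".encode()).hexdigest()
        return self.root / f"{key}.json"

    def get_or_fetch(self, method, query, fetch):
        path = self._path(method, query)
        if path.exists():
            return json.loads(path.read_text())
        result = fetch(query)
        path.write_text(json.dumps(result))
        return result

    def clear(self):
        # Remove every cached entry and recreate the empty directory.
        shutil.rmtree(self.root, ignore_errors=True)
        self.root.mkdir(parents=True, exist_ok=True)


calls = []


def fake_fetch(query):
    # Stand-in for a remote API call; records how often it is invoked.
    calls.append(query)
    return {"query": query, "title": "placeholder"}


cache = DiskCache(Path(tempfile.mkdtemp()) / "cache")
first = cache.get_or_fetch("icite", "12345678", fake_fetch)
second = cache.get_or_fetch("icite", "12345678", fake_fetch)  # read from disk
cache.clear()
third = cache.get_or_fetch("icite", "12345678", fake_fetch)  # fetched again
print(len(calls))
```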
| API | Returned fields |
| ------------- | ------------- |
| [`pubmed`](https://www.ncbi.nlm.nih.gov/home/develop/api/) | title, issue, pages, abstract, journal, authors, pubdate, mesh_terms, publication_types, chemical_list, keywords, doi, references, delete, languages, vernacular_title, affiliations, pmc, other_id, medline_ta, nlm_unique_id, issn_linking, country |
| [`semanticScholar`](https://www.semanticscholar.org/product/api#Fetch-Paper) | abstract, arxivId, authors, citationVelocity, citations, corpusId, fieldsOfStudy, influentialCitationCount, isOpenAccess, isPublisherLicensed, is_open_access, is_publisher_licensed, numCitedBy, numCiting, paperId, references, s2FieldsOfStudy, title, topics, url, venue, year |
| [`icite`](https://icite.od.nih.gov/api) | year, title, authors, journal, is_research_article, relative_citation_ratio, nih_percentile, human, animal, molecular_cellular, apt, is_clinical, citation_count, citations_per_year, expected_citations_per_year, field_citation_rate, provisional, x_coord, y_coord, cited_by_clin, cited_by, references, doi |
| [`doi2pmid`](https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0) | live, status, errmsg, pmcid, pmid, versions |
## Roadmap
- [x] API access: PubMed (Parsed MEDLINE data)
- [x] API access: Semantic Scholar (PMID)
- [x] API access: iCite
- [x] API access: Semantic Scholar (DOI)
- [x] API access: DOI to PMID NLM www.ncbi.nlm.nih.gov/pmc/tools/idconv/
- [ ] API access: PubMed (XML)
- [ ] API access: arXiv
- [ ] API access: CoLIL
- [x] API access, validation of input
- [x] API access, multi item requests
- [x] API access, chunking
- [ ] API access, include status_code in download results
- [ ] API access, better error handling
- [x] API caching, clearing
- [x] Codex, validate PMID
- [x] Codex, validate DOI
- [x] Codex, build dataframe from items
- [x] Testing harness
- [ ] Full testing coverage
- [x] Code linting
- [x] PyPI library
- [x] README with examples
- [ ] Status bar for long downloads
- [ ] Embedding functions (SPECTER)
- [ ] Clustering
- [ ] Visualization (streamlit)
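Several roadmap items (input validation, PMID and DOI checks, chunking) describe generic techniques, so a small sketch may help contributors picture them. The helper names and regular expressions below are assumptions for illustration only; they are not taken from the `bibcodex` source, and the DOI pattern is deliberately loose.

```python
import re
from itertools import islice

# Assumed patterns: a PMID is a positive integer written as text,
# and a DOI starts with "10.", a registrant code, a slash, and a suffix.
PMID_PATTERN = re.compile(r"^[1-9]\d*$")
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_valid_pmid(value):
    """Return True if value is a string that looks like a PMID."""
    return isinstance(value, str) and bool(PMID_PATTERN.match(value))


def is_valid_doi(value):
    """Return True if value is a string that looks like a DOI."""
    return isinstance(value, str) and bool(DOI_PATTERN.match(value))


def chunked(items, size):
    """Yield successive lists of at most `size` items, e.g. one per API request."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


dois = ["10.1001/jama.2017.18444", "10.1001/jama.2018.0126", "not-a-doi"]
valid = [d for d in dois if is_valid_doi(d)]
for batch in chunked(valid, 1):
    print(batch)
```

Validating before chunking means a malformed identifier is rejected locally instead of failing a whole multi-item request.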
## Development
Built with ❤️ by [@metasemantic](https://twitter.com/metasemantic). The package is formatted with [black](https://github.com/psf/black) and checked with [flake8](https://github.com/PyCQA/flake8). Pull requests are accepted, but please provide tests with full coverage for new code.
## Raw data
{
"_id": null,
"home_page": "https://github.com/thoppe/bibcodex",
"name": "bibcodex",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "bibliographic,publications,pubmed,NLP",
"author": "Travis Hoppe",
"author_email": "travis.hoppe+{package_name}@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/22/bd/23e4ad603c5adbda3050c10dd0735db0d0fdb1b9f6f0fdfb054e8a99416c/bibcodex-1.1.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "CC-SA",
"summary": "Access, analyze, and display bibliographic information",
"version": "1.1.7",
"project_urls": {
"Homepage": "https://github.com/thoppe/bibcodex"
},
"split_keywords": [
"bibliographic",
"publications",
"pubmed",
"nlp"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "22bd23e4ad603c5adbda3050c10dd0735db0d0fdb1b9f6f0fdfb054e8a99416c",
"md5": "d7788f4fead59375bec0d607fa64269a",
"sha256": "e95bc28a5c203b8ea5125918554500c42872a7d5a0dcfaf50ed5f937aa482ec7"
},
"downloads": -1,
"filename": "bibcodex-1.1.7.tar.gz",
"has_sig": false,
"md5_digest": "d7788f4fead59375bec0d607fa64269a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 29618,
"upload_time": "2023-10-11T22:52:39",
"upload_time_iso_8601": "2023-10-11T22:52:39.190510Z",
"url": "https://files.pythonhosted.org/packages/22/bd/23e4ad603c5adbda3050c10dd0735db0d0fdb1b9f6f0fdfb054e8a99416c/bibcodex-1.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-11 22:52:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thoppe",
"github_project": "bibcodex",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "bibcodex"
}