# unpywall - Interfacing the Unpaywall API with Python
[![Build Status](https://circleci.com/gh/unpywall/unpywall.svg?style=shield)](https://app.circleci.com/pipelines/github/unpywall/unpywall)
[![codecov.io](https://codecov.io/gh/unpywall/unpywall/branch/master/graph/badge.svg)](https://codecov.io/gh/unpywall/unpywall?branch=master)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/unpywall)](https://pypi.org/project/unpywall/)
[![License](https://img.shields.io/github/license/unpywall/unpywall)](https://github.com/unpywall/unpywall/blob/master/LICENSE.txt)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4085414.svg)](https://doi.org/10.5281/zenodo.4085414)
[![PyPI - Version](https://img.shields.io/pypi/v/unpywall)](https://pypi.org/project/unpywall/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/unpywall)](https://pypi.org/project/unpywall/)
[![Documentation Status](https://readthedocs.org/projects/unpywall/badge/?version=latest)](https://unpywall.readthedocs.io/en/latest/?badge=latest)
## Introduction
unpywall is a Python client that uses the [Unpaywall REST API](https://unpaywall.org/products/api) for scholarly analysis with [pandas](https://pandas.pydata.org/). This package is influenced by [roadoi](https://github.com/ropensci/roadoi), an R client that interacts with the Unpaywall API.
You can find more about the Unpaywall service here: https://unpaywall.org/.
The documentation about the Unpaywall REST API is located here: https://unpaywall.org/products/api.
## Install
Install from [PyPI](https://pypi.org/project/unpywall/) using pip:
```bash
pip install unpywall
```
## Use
### Authentication
Authentication is required to use the Unpaywall service. unpywall offers two ways to authorize the client: you can either import `UnpywallCredentials`, which sets an environment variable for you, or set the environment variable yourself. Both methods require an email address.
```python
from unpywall.utils import UnpywallCredentials
UnpywallCredentials('nick.haupka@gmail.com')
```
Note that the environment variable used for authentication must be named `UNPAYWALL_EMAIL`.
```bash
export UNPAYWALL_EMAIL=nick.haupka@gmail.com
```
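If you would rather set the variable from inside a Python session than from the shell, the standard library can do it. This is a minimal sketch: it assumes unpywall reads `UNPAYWALL_EMAIL` from the process environment when a request is made, and `you@example.com` is a placeholder address.

```python
import os

# Placeholder address (an assumption); replace it with your own email.
os.environ['UNPAYWALL_EMAIL'] = 'you@example.com'

# Confirm the variable is visible to the current process.
print(os.environ.get('UNPAYWALL_EMAIL'))
```

A variable set this way lasts only for the current process and its children, unlike an `export` in your shell profile.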
### Query Unpaywall by DOI
To look up articles by DOI, use the `doi` method. The result is a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
```python
from unpywall import Unpywall
Unpywall.doi(dois=['10.1038/nature12373', '10.1093/nar/gkr1047'])
# data_standard ... best_oa_location.version
#0 2 ... publishedVersion
#1 2 ... publishedVersion
#[2 rows x 32 columns]
```
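DOI lists collected from spreadsheets or reference managers often mix bare DOIs with resolver URLs and inconsistent casing. The sketch below cleans such a list before it is passed as `dois`. The helper `normalize_dois` is hypothetical and not part of unpywall; it relies only on the fact that DOIs are case-insensitive.

```python
def normalize_dois(raw_dois):
    """Strip resolver prefixes and whitespace, lowercase, and drop duplicates."""
    prefixes = ('https://doi.org/', 'http://doi.org/',
                'https://dx.doi.org/', 'http://dx.doi.org/', 'doi:')
    seen = set()
    cleaned = []
    for raw in raw_dois:
        doi = raw.strip()
        lowered = doi.lower()
        for prefix in prefixes:
            if lowered.startswith(prefix):
                doi = doi[len(prefix):]
                break
        doi = doi.lower()
        # Keep the first occurrence of each DOI, preserving input order.
        if doi and doi not in seen:
            seen.add(doi)
            cleaned.append(doi)
    return cleaned


dois = normalize_dois([
    'https://doi.org/10.1038/nature12373',
    '10.1038/NATURE12373 ',
    'doi:10.1093/nar/gkr1047',
])
print(dois)
# ['10.1038/nature12373', '10.1093/nar/gkr1047']
```

The cleaned list can then be passed to `Unpywall.doi(dois=dois)`.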
You can track the progress of your API call by setting the parameter `progress` to `True`. This is especially useful for estimating the time required.
```python
Unpywall.doi(dois=['10.1038/nature12373', '10.1093/nar/gkr1047'],
             progress=True)
#|========================= | 50%
```
This method also accepts an `errors` parameter with two options for handling errors: `raise` and `ignore`.
```python
Unpywall.doi(dois=['10.1038/nature12373', '10.1093/nar/gkr1047'],
             errors='ignore')
```
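To make the difference between the two options concrete, here is a generic sketch of the raise-versus-ignore pattern. It is not unpywall's implementation, and its exact behavior may differ; `lookup_all` and `fake_lookup` are hypothetical names used only for illustration.

```python
def lookup_all(dois, lookup, errors='raise'):
    """Apply `lookup` to each DOI; either propagate or skip failures."""
    if errors not in ('raise', 'ignore'):
        raise ValueError("errors must be 'raise' or 'ignore'")
    results = []
    for doi in dois:
        try:
            results.append(lookup(doi))
        except LookupError:
            if errors == 'raise':
                raise
            # errors == 'ignore': skip this DOI and keep going.
    return results


def fake_lookup(doi):
    """Stand-in for a network call; fails for one made-up DOI."""
    if doi == '10.0000/missing':
        raise LookupError(doi)
    return {'doi': doi}


found = lookup_all(['10.1038/nature12373', '10.0000/missing'],
                   fake_lookup, errors='ignore')
print(found)
# [{'doi': '10.1038/nature12373'}]
```

With `errors='raise'` the same call would stop at the first failing DOI, which is usually preferable while debugging; `ignore` suits long batch runs where a few bad DOIs should not abort the whole job.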
### Query Unpaywall by text search
To search for articles by a given term, use the `query` method. The result is a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
```python
Unpywall.query(query='sea lion',
               is_oa=True)
# data_standard ... first_oa_location.version
#0 2 ... publishedVersion
#1 2 ... publishedVersion
#2 2 ... publishedVersion
```
### Conveniently obtain full text
If you are using Unpaywall to obtain full-text copies of papers for literature mining, you may benefit from the following functions:
You can use the `download_pdf_handle` method to return a PDF handle for the given DOI.
```python
Unpywall.download_pdf_handle(doi='10.1038/nature12373')
#<http.client.HTTPResponse object at 0x7fd08ef677c0>
```
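The returned handle is a binary file-like object, so it can be streamed to disk with the standard library. The following is a sketch rather than a documented unpywall workflow: `save_handle` is a hypothetical helper, and an in-memory `io.BytesIO` object stands in for the real response so that the example runs without network access.

```python
import io
import shutil
import tempfile
from pathlib import Path


def save_handle(handle, path):
    """Copy a binary file-like object to `path` in chunks."""
    with open(path, 'wb') as out:
        shutil.copyfileobj(handle, out)
    return Path(path)


# Stand-in for the response object; a real handle would come from
# Unpywall.download_pdf_handle(doi=...).
fake_handle = io.BytesIO(b'%PDF-1.4 example bytes')
target = Path(tempfile.mkdtemp()) / 'article.pdf'
save_handle(fake_handle, target)
print(target.exists())
```

Copying in chunks avoids holding a large PDF in memory all at once.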
To return a URL to a PDF for the given DOI, use `get_pdf_link`.
```python
Unpywall.get_pdf_link(doi='10.1038/nature12373')
#'https://dash.harvard.edu/bitstream/1/12285462/1/Nanometer-Scale%20Thermometry.pdf'
```
To return a URL to the best available OA copy, regardless of the format, use `get_doc_link`.
```python
Unpywall.get_doc_link(doi='10.1016/j.envint.2020.105730')
#'https://doi.org/10.1016/j.envint.2020.105730'
```
To return a list of all URLs to OA copies, use `get_all_links`.
```python
Unpywall.get_all_links(doi='10.1038/nature12373')
#['https://dash.harvard.edu/bitstream/1/12285462/1/Nanometer-Scale%20Thermometry.pdf']
```
You can also directly access all data provided by Unpaywall in JSON format using `get_json`.
```python
Unpywall.get_json(doi='10.1038/nature12373')
#{'best_oa_location': {'endpoint_id': '8c9d8ba370a84253deb', 'evidence': 'oa repository (via OAI-PMH doi match)', 'host_type': ...
```
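Because the result is a plain Python dictionary, individual fields can be pulled out with ordinary dictionary access. The sketch below collects PDF URLs from the `oa_locations` list. The helper `pdf_urls` is hypothetical, and `sample` is a hand-written record with made-up values that only mimics the shape of an Unpaywall response; real records contain many more fields.

```python
def pdf_urls(record):
    """Collect the distinct PDF URLs listed under `oa_locations`."""
    urls = []
    for location in record.get('oa_locations') or []:
        url = location.get('url_for_pdf')
        # Some locations have no direct PDF link; skip those.
        if url and url not in urls:
            urls.append(url)
    return urls


# Hand-written sample with made-up values, shaped like an Unpaywall record.
sample = {
    'doi': '10.1234/example',
    'is_oa': True,
    'oa_locations': [
        {'host_type': 'repository', 'url_for_pdf': 'https://example.org/a.pdf'},
        {'host_type': 'publisher', 'url_for_pdf': None},
    ],
}
print(pdf_urls(sample))
# ['https://example.org/a.pdf']
```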
## Command-Line Interface
unpywall comes with a command-line interface that can be used to quickly look up a PDF or to download free full-text articles to your device.
### Obtain a PDF URL
Retrieve the URL of a PDF for a given DOI with the following command.
```bash
unpywall link 10.1038/nature12373
```
### View a PDF
If you want to view a PDF in your browser or on your system, use `view`.
```bash
unpywall view 10.1038/nature12373 -m browser
```
### PDF Download
Use `download` if you want to store a PDF on your machine.
```bash
unpywall download 10.1038/nature12373 -f article.pdf -p ./documents
```
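When downloading many articles, it can help to derive the file name from the DOI, since DOIs contain characters such as `/` that are awkward in file names. The sketch below builds the argument list for the `download` command shown above. `download_command` is a hypothetical helper, and the underscore-based naming scheme is an assumption, not an unpywall convention.

```python
import re
import shlex


def download_command(doi, directory='./documents'):
    """Build the argument list for `unpywall download` with a safe filename."""
    # Replace every run of characters other than letters, digits, dots,
    # and hyphens with a single underscore.
    filename = re.sub(r'[^A-Za-z0-9.-]+', '_', doi) + '.pdf'
    return ['unpywall', 'download', doi, '-f', filename, '-p', directory]


cmd = download_command('10.1038/nature12373')
print(shlex.join(cmd))
# unpywall download 10.1038/nature12373 -f 10.1038_nature12373.pdf -p ./documents
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once unpywall is installed and `UNPAYWALL_EMAIL` is set.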
### Help
Use the `-h` flag to display a description of the available commands.
```bash
unpywall -h
```
## Documentation
Full documentation is available at https://unpywall.readthedocs.io/.
## Develop
To install unpywall, along with dev tools, run:
```bash
pip install -e '.[dev]'
```
## Raw data
{
"_id": null,
"home_page": "https://github.com/unpywall/unpywall",
"name": "unpywall",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "Unpaywall,Open Access,full text",
"author": "Nick Haupka, bganglia",
"author_email": "nick.haupka@gmail.com, bganglia892@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/1b/d4/c7734a4b188db5eba57c50e283ebbc05673ff0ab85b6d9485356f18643de/unpywall-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Interfacing the Unpaywall Database with Python",
"version": "0.2.3",
"project_urls": {
"Documentation": "https://unpywall.readthedocs.io/en/latest/",
"Homepage": "https://github.com/unpywall/unpywall",
"Source": "https://github.com/unpywall/unpywall",
"Tracker": "https://github.com/unpywall/unpywall/issues"
},
"split_keywords": [
"unpaywall",
"open access",
"full text"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1bd4c7734a4b188db5eba57c50e283ebbc05673ff0ab85b6d9485356f18643de",
"md5": "7f4b5463743d24b147e11518ccc08842",
"sha256": "4b1977d4e90ae5638a851bf1a8c1c04aa092ad3f8c0137aae0d0c07038f86e68"
},
"downloads": -1,
"filename": "unpywall-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "7f4b5463743d24b147e11518ccc08842",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 15417,
"upload_time": "2024-02-19T10:21:11",
"upload_time_iso_8601": "2024-02-19T10:21:11.677277Z",
"url": "https://files.pythonhosted.org/packages/1b/d4/c7734a4b188db5eba57c50e283ebbc05673ff0ab85b6d9485356f18643de/unpywall-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-19 10:21:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "unpywall",
"github_project": "unpywall",
"travis_ci": false,
"coveralls": true,
"github_actions": false,
"circle": true,
"requirements": [],
"lcname": "unpywall"
}