arxiv


| Field | Value |
| --- | --- |
| Name | arxiv |
| Version | 2.1.3 |
| home_page | https://github.com/lukasschwab/arxiv.py |
| Summary | Python wrapper for the arXiv API: https://arxiv.org/help/api/ |
| upload_time | 2024-06-25 02:56:20 |
| maintainer | None |
| docs_url | None |
| author | Lukas Schwab |
| requires_python | >=3.7 |
| license | MIT |
| keywords | arxiv, api, wrapper, academic, journals, papers |
| requirements | feedparser, requests, pytest, ruff, pdoc, pip-audit |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# arxiv.py
[![PyPI](https://img.shields.io/pypi/v/arxiv)](https://pypi.org/project/arxiv/) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/arxiv) [![GitHub Workflow Status (branch)](https://img.shields.io/github/actions/workflow/status/lukasschwab/arxiv.py/python-package.yml?branch=master)](https://github.com/lukasschwab/arxiv.py/actions?query=branch%3Amaster) [![Full package documentation](https://img.shields.io/badge/docs-hosted-brightgreen)](https://lukasschwab.me/arxiv.py/index.html)

Python wrapper for [the arXiv API](https://arxiv.org/help/api/index).

[arXiv](https://arxiv.org/) is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

## Usage

### Installation

```bash
$ pip install arxiv
```

In your Python script, include the line:

```python
import arxiv
```

### Examples

#### Fetching results

```python
import arxiv

# Construct the default API client.
client = arxiv.Client()

# Search for the 10 most recent articles matching the keyword "quantum".
search = arxiv.Search(
  query = "quantum",
  max_results = 10,
  sort_by = arxiv.SortCriterion.SubmittedDate
)

results = client.results(search)

# `results` is a generator; you can iterate over its elements one by one...
for r in client.results(search):
  print(r.title)
# ...or exhaust it into a list. Careful: this is slow for large result sets.
all_results = list(results)
print([r.title for r in all_results])

# For advanced query syntax documentation, see the arXiv API User Manual:
# https://arxiv.org/help/api/user-manual#query_details
search = arxiv.Search(query = "au:del_maestro AND ti:checkerboard")
first_result = next(client.results(search))
print(first_result)

# Search for the paper with ID "1605.08386v1"
search_by_id = arxiv.Search(id_list=["1605.08386v1"])
# Reuse client to fetch the paper, then print its title.
first_result = next(client.results(search_by_id))
print(first_result.title)
```
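The advanced query in the example above combines field prefixes (`au:` for author, `ti:` for title) with `AND`. As a minimal sketch, such strings can be composed programmatically. `build_query` is a hypothetical helper written for illustration and is not part of the `arxiv` package. It does no escaping or validation of the terms.

```python
def build_query(**fields: str) -> str:
    """Join field-prefixed terms with AND, e.g. au=... and ti=...

    Hypothetical helper for illustration; not part of the arxiv package.
    Keyword names are used verbatim as arXiv field prefixes.
    """
    if not fields:
        raise ValueError("at least one field is required")
    return " AND ".join(f"{prefix}:{term}" for prefix, term in fields.items())


query = build_query(au="del_maestro", ti="checkerboard")
print(query)  # au:del_maestro AND ti:checkerboard
```

The resulting string can then be passed as `arxiv.Search(query=query)`.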

#### Downloading papers

To download a PDF of the paper with ID "1605.08386v1", run a `Search` and then use `Result.download_pdf()`:

```python
import arxiv

paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))
# Download the PDF to the PWD with a default filename.
paper.download_pdf()
# Download the PDF to the PWD with a custom filename.
paper.download_pdf(filename="downloaded-paper.pdf")
# Download the PDF to a specified directory with a custom filename.
paper.download_pdf(dirpath="./mydir", filename="downloaded-paper.pdf")
```

The same interface is available for downloading .tar.gz files of the paper source:

```python
import arxiv

paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))
# Download the archive to the PWD with a default filename.
paper.download_source()
# Download the archive to the PWD with a custom filename.
paper.download_source(filename="downloaded-paper.tar.gz")
# Download the archive to a specified directory with a custom filename.
paper.download_source(dirpath="./mydir", filename="downloaded-paper.tar.gz")
```
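Both download helpers accept a `dirpath`. The sketch below assumes that the target directory has to exist before the download call. That is an assumption worth checking against the package documentation. `ensure_download_dir` is a hypothetical helper, not part of the `arxiv` package.

```python
import tempfile
from pathlib import Path


def ensure_download_dir(dirpath: str) -> str:
    """Create dirpath (and any missing parents) and return it as a string.

    Hypothetical helper for illustration; not part of the arxiv package.
    """
    path = Path(dirpath)
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Demonstrated in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_download_dir(str(Path(tmp) / "mydir"))
    print(Path(target).is_dir())  # True
    # With a fetched `paper`, the call from the examples above would then be:
    # paper.download_pdf(dirpath=target, filename="downloaded-paper.pdf")
```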

#### Fetching results with a custom client

```python
import arxiv

big_slow_client = arxiv.Client(
  page_size = 1000,
  delay_seconds = 10.0,
  num_retries = 5
)

# Prints 1000 titles before needing to make another request.
for result in big_slow_client.results(arxiv.Search(query="quantum")):
  print(result.title)
```
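To get a feel for the trade-off between `page_size` and `delay_seconds`, here is a rough back-of-the-envelope sketch. It assumes one request per page and one delay between consecutive requests, and it ignores retries and network time. `estimate_fetch` is a hypothetical helper, not part of the `arxiv` package.

```python
import math
from typing import Tuple


def estimate_fetch(total_results: int, page_size: int, delay_seconds: float) -> Tuple[int, float]:
    """Return (request_count, minimum_wait_seconds) for a paged fetch.

    Assumes one request per page and one delay between consecutive requests.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    requests_needed = math.ceil(total_results / page_size)
    min_wait = max(requests_needed - 1, 0) * delay_seconds
    return requests_needed, min_wait


# 2500 results with the big, slow client configured above.
print(estimate_fetch(2500, 1000, 10.0))  # (3, 20.0)
# Same fetch with 100-result pages and an illustrative 3-second delay.
print(estimate_fetch(2500, 100, 3.0))    # (25, 72.0)
```

Under these assumptions, larger pages mean fewer requests and less time spent waiting between them, at the cost of heavier individual responses.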

#### Logging

To inspect this package's network behavior and API logic, configure a `DEBUG`-level logger.

```pycon
>>> import logging, arxiv
>>> logging.basicConfig(level=logging.DEBUG)
>>> client = arxiv.Client()
>>> paper = next(client.results(arxiv.Search(id_list=["1605.08386v1"])))
INFO:arxiv.arxiv:Requesting 100 results at offset 0
INFO:arxiv.arxiv:Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): export.arxiv.org:443
DEBUG:urllib3.connectionpool:https://export.arxiv.org:443 "GET /api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100&user-agent=arxiv.py%2F1.4.8 HTTP/1.1" 200 979
```
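Setting the root logger to `DEBUG` also surfaces messages from other libraries, such as the `urllib3` lines in the output above. A minimal sketch using only the standard `logging` module can limit the extra verbosity to this package. The logger name `"arxiv"` is inferred from the `arxiv.arxiv` prefix in the log output.

```python
import logging

# Keep the root logger at WARNING so third-party libraries stay quiet...
logging.basicConfig(level=logging.WARNING)
# ...and raise verbosity only for loggers under the "arxiv" namespace.
logging.getLogger("arxiv").setLevel(logging.DEBUG)

# A child logger such as "arxiv.arxiv" inherits DEBUG from its parent.
print(logging.getLogger("arxiv.arxiv").getEffectiveLevel() == logging.DEBUG)  # True
```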

## Types

### Client

A `Client` specifies a reusable strategy for fetching results from arXiv's API. For most use cases the default client should suffice.

Client configurations specify pagination and retry logic. *Reusing* a client allows successive API calls to use the same connection pool and ensures they abide by the rate limit you set.

### Search

A `Search` specifies a search of arXiv's database. Use `Client.results` to get a generator yielding `Result`s.
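
Because `Client.results` returns a generator, results are consumed lazily and only once. The sketch below illustrates both properties with a stand-in generator, so it runs without network access. `fake_results` is hypothetical and takes the place of `client.results(search)`.

```python
from itertools import islice
from typing import Iterator


def fake_results() -> Iterator[str]:
    """Hypothetical stand-in for client.results(search): yields titles lazily."""
    n = 0
    while True:
        n += 1
        yield f"Paper {n}"


# Take only the first three items; the rest of the stream is never produced.
first_three = list(islice(fake_results(), 3))
print(first_three)  # ['Paper 1', 'Paper 2', 'Paper 3']

# A generator can be consumed only once: a second pass yields nothing.
titles = (t for t in ["A", "B"])
first_pass = list(titles)
second_pass = list(titles)
print(first_pass, second_pass)  # ['A', 'B'] []
```

With the real client, the same pattern would read `islice(client.results(search), 3)`. To iterate a second time, call `client.results(search)` again or keep the results in a list.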

### Result

The `Result` objects yielded by `Client.results` include metadata about each paper and helper methods for downloading their content.

The meaning of the underlying raw data is documented in the [arXiv API User Manual: Details of Atom Results Returned](https://arxiv.org/help/api/user-manual#_details_of_atom_results_returned).

The download helpers are `Result.download_pdf` and `Result.download_source`.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/lukasschwab/arxiv.py",
    "name": "arxiv",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "arxiv api wrapper academic journals papers",
    "author": "Lukas Schwab",
    "author_email": "lukas.schwab@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/fe/59/fe41f54bdfed776c2e9bcd6289e4c71349eb938241d89b4c97d0f33e8013/arxiv-2.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python wrapper for the arXiv API: https://arxiv.org/help/api/",
    "version": "2.1.3",
    "project_urls": {
        "Homepage": "https://github.com/lukasschwab/arxiv.py"
    },
    "split_keywords": [
        "arxiv",
        "api",
        "wrapper",
        "academic",
        "journals",
        "papers"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b77b7bf42178d227b26d3daf94cdd22a72a4ed5bf235548c4f5aea49c51c6458",
                "md5": "ad6b0b74665574a80f0cb515c5d6b63a",
                "sha256": "6f43673ab770a9e848d7d4fc1894824df55edeac3c3572ea280c9ba2e3c0f39f"
            },
            "downloads": -1,
            "filename": "arxiv-2.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ad6b0b74665574a80f0cb515c5d6b63a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 11478,
            "upload_time": "2024-06-25T02:56:17",
            "upload_time_iso_8601": "2024-06-25T02:56:17.032076Z",
            "url": "https://files.pythonhosted.org/packages/b7/7b/7bf42178d227b26d3daf94cdd22a72a4ed5bf235548c4f5aea49c51c6458/arxiv-2.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fe59fe41f54bdfed776c2e9bcd6289e4c71349eb938241d89b4c97d0f33e8013",
                "md5": "23fa881227f7768da899ad62be834d1d",
                "sha256": "32365221994d2cf05657c1fadf63a26efc8ccdec18590281ee03515bfef8bc4e"
            },
            "downloads": -1,
            "filename": "arxiv-2.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "23fa881227f7768da899ad62be834d1d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 16747,
            "upload_time": "2024-06-25T02:56:20",
            "upload_time_iso_8601": "2024-06-25T02:56:20.062841Z",
            "url": "https://files.pythonhosted.org/packages/fe/59/fe41f54bdfed776c2e9bcd6289e4c71349eb938241d89b4c97d0f33e8013/arxiv-2.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-25 02:56:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lukasschwab",
    "github_project": "arxiv.py",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "feedparser",
            "specs": [
                [
                    "~=",
                    "6.0.10"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "~=",
                    "2.32.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "6.2.2"
                ]
            ]
        },
        {
            "name": "ruff",
            "specs": [
                [
                    ">=",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": "pdoc",
            "specs": [
                [
                    "==",
                    "13.1.0"
                ]
            ]
        },
        {
            "name": "pip-audit",
            "specs": [
                [
                    ">=",
                    "1.1.2"
                ]
            ]
        }
    ],
    "lcname": "arxiv"
}
        