# paginate-json
[![PyPI](https://img.shields.io/pypi/v/paginate-json.svg)](https://pypi.python.org/pypi/paginate-json)
[![Changelog](https://img.shields.io/github/v/release/simonw/paginate-json?include_prereleases&label=changelog)](https://github.com/simonw/paginate-json/releases)
[![Tests](https://github.com/simonw/paginate-json/workflows/Test/badge.svg)](https://github.com/simonw/paginate-json/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/paginate-json/blob/main/LICENSE)
CLI tool for retrieving JSON from paginated APIs.

This tool works against APIs that use the HTTP Link header for pagination. The GitHub API is [one example of this](https://developer.github.com/v3/guides/traversing-with-pagination/).

Recipes using this tool:
- [Combined release notes from GitHub with jq and paginate-json](https://til.simonwillison.net/jq/combined-github-release-notes)
- [Export a Mastodon timeline to SQLite](https://til.simonwillison.net/mastodon/export-timeline-to-sqlite)
## Installation
```bash
pip install paginate-json
```
Or use [pipx](https://pypa.github.io/pipx/):
```bash
pipx install paginate-json
```
## Usage
Run this tool against a URL that returns a JSON list of items and uses the `Link` HTTP header to indicate the URL of the next page of results.

It outputs a single JSON list containing all of the records from every page.
```bash
paginate-json \
https://api.github.com/users/simonw/events
```
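As a rough illustration of the pagination scheme described above, here is a minimal Python sketch of following `rel="next"` links. It is not the tool's actual implementation: `next_url`, `paginate`, the `fetch` callable and the `api.example.com` URLs are all hypothetical names introduced for this example, and the fake pages stand in for real HTTP responses.

```python
import re


def next_url(link_header):
    """Return the URL marked rel="next" in a Link header value, or None."""
    # Each entry looks like: <https://...>; rel="next"
    for match in re.finditer(r"<([^>]+)>([^<]*)", link_header or ""):
        url, params = match.groups()
        rel = re.search(r'rel\s*=\s*"?([^";,]+)', params)
        if rel and "next" in rel.group(1).split():
            return url
    return None


def paginate(url, fetch):
    """Yield every item from every page, following rel="next" links.

    `fetch` is a hypothetical callable: fetch(url) -> (items, link_header).
    """
    while url:
        items, link_header = fetch(url)
        yield from items
        url = next_url(link_header)


# Stand-in for real HTTP requests (assumed data, for illustration only)
FAKE_PAGES = {
    "https://api.example.com/items": (
        [1, 2],
        '<https://api.example.com/items?page=2>; rel="next"',
    ),
    "https://api.example.com/items?page=2": ([3], ""),
}

print(list(paginate("https://api.example.com/items", FAKE_PAGES.__getitem__)))
```

The loop stops as soon as a response carries no `rel="next"` entry, which is how the last page is detected in this sketch.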
You can use the `--header` option to send additional request headers. For example, if you have a GitHub OAuth token you can pass it like this:
```bash
paginate-json \
https://api.github.com/users/simonw/events \
--header Authorization "bearer e94d9e404d86..."
```
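For readers scripting something similar, the sketch below shows one way name/value header pairs like the ones above could be attached to a request using Python's standard library. `build_request` is a hypothetical helper, not part of paginate-json, and `YOUR_TOKEN` is a placeholder. No network request is made.

```python
import urllib.request


def build_request(url, header_pairs):
    """Build a Request carrying each (name, value) pair as an HTTP header."""
    request = urllib.request.Request(url)
    for name, value in header_pairs:
        request.add_header(name, value)
    return request


request = build_request(
    "https://api.github.com/users/simonw/events",
    [("Authorization", "bearer YOUR_TOKEN"), ("Accept", "application/json")],
)
# Inspect the header that would be sent; nothing is fetched here
print(request.get_header("Authorization"))
```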
Some APIs return a root-level object in which the items you want to gather are stored under a key, like this example from the [Datasette JSON API](https://docs.datasette.io/en/latest/json_api.html):
```json
{
  "ok": true,
  "rows": [
    {
      "id": 1,
      "name": "San Francisco"
    },
    {
      "id": 2,
      "name": "Los Angeles"
    },
    {
      "id": 3,
      "name": "Detroit"
    },
    {
      "id": 4,
      "name": "Memnonia"
    }
  ]
}
```
In this case, use `--key rows` to specify which key to extract the items from:
```bash
paginate-json \
https://latest.datasette.io/fixtures/facet_cities.json \
--key rows
```
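Conceptually, extracting a key means pulling the list out of each page's root object before combining pages. The following Python sketch shows that idea under stated assumptions: `extract_items` and `combine` are hypothetical names, and the two sample pages are made up to mirror the response shape above.

```python
def extract_items(page, key=None):
    """Return the list of items in a page, optionally from a top-level key."""
    if key is None:
        return page  # the page is already a JSON list
    return page[key]  # the page is an object; the items live under `key`


def combine(pages, key=None):
    """Concatenate the items from several pages into one list."""
    items = []
    for page in pages:
        items.extend(extract_items(page, key))
    return items


# Two assumed pages shaped like the Datasette response above
pages = [
    {"ok": True, "rows": [{"id": 1, "name": "San Francisco"}]},
    {"ok": True, "rows": [{"id": 2, "name": "Los Angeles"}]},
]
print(combine(pages, key="rows"))
```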
By default, the output is streamed as a pretty-printed JSON array.

To switch to newline-delimited JSON, with a separate object on each line, add `--nl`:
```bash
paginate-json \
https://latest.datasette.io/fixtures/facet_cities.json \
--key rows \
--nl
```
The output from that command looks like this:
```
{"id": 1, "name": "San Francisco"}
{"id": 2, "name": "Los Angeles"}
{"id": 3, "name": "Detroit"}
{"id": 4, "name": "Memnonia"}
```
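Newline-delimited output like this can be consumed one line at a time. As a small sketch (the `parse_ndjson` helper is a hypothetical name, and the sample text is copied from the output above), each non-empty line is parsed as its own JSON document:

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list of Python objects."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


sample = """{"id": 1, "name": "San Francisco"}
{"id": 2, "name": "Los Angeles"}
{"id": 3, "name": "Detroit"}
{"id": 4, "name": "Memnonia"}
"""

rows = parse_ndjson(sample)
print(len(rows), rows[2]["name"])
```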
## Using this with sqlite-utils
This tool works well in conjunction with [sqlite-utils](https://github.com/simonw/sqlite-utils). For example, here's how to load all of the GitHub issues for a project into a local SQLite database.
```bash
paginate-json \
"https://api.github.com/repos/simonw/datasette/issues?state=all&filter=all" \
--nl | \
sqlite-utils upsert /tmp/issues.db issues - --nl --pk=id
```
You can then use [other features of sqlite-utils](https://sqlite-utils.readthedocs.io/en/latest/cli.html) to enhance the resulting database. For example, to enable full-text search on the issue title and body columns:
```bash
sqlite-utils enable-fts /tmp/issues.db issues title body
```
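To illustrate why the pipeline above passes `--pk=id`, here is a rough Python sketch using the standard `sqlite3` module rather than sqlite-utils itself. It uses `INSERT OR REPLACE` as an approximation of an upsert, so re-loading the same issue replaces its row instead of duplicating it. The `upsert_issues` helper, the three-column table layout and the sample records are assumptions made for this example.

```python
import json
import sqlite3


def upsert_issues(conn, ndjson_text):
    """Load newline-delimited JSON issues, keyed on their id column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issues "
        "(id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    rows = [
        (issue.get("id"), issue.get("title"), issue.get("body"))
        for issue in map(json.loads, filter(str.strip, ndjson_text.splitlines()))
    ]
    # INSERT OR REPLACE overwrites any existing row with the same primary key
    conn.executemany(
        "INSERT OR REPLACE INTO issues (id, title, body) VALUES (?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
upsert_issues(conn, '{"id": 1, "title": "First", "body": "a"}\n')
upsert_issues(conn, '{"id": 1, "title": "First (edited)", "body": "a"}\n')
print(conn.execute("SELECT id, title FROM issues").fetchall())
```

Running the load twice leaves a single row for issue `1`, carrying the most recent title.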
## Using jq to transform each page
If you install the optional [jq](https://pypi.org/project/jq/) or [pyjq](https://pypi.org/project/pyjq/) dependency, you can also pass `--jq PROGRAM` to transform the results of each page using a [jq program](https://stedolan.github.io/jq/). The program you supply should transform each page of fetched results into an array of objects.
For example, to extract the `id` and `title` from each issue:
```bash
paginate-json \
"https://api.github.com/repos/simonw/datasette/issues" \
--nl \
--jq 'map({id, title})'
```
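For readers less familiar with jq, the program `map({id, title})` keeps only the `id` and `title` fields of every object in the page. A plain-Python equivalent of that per-page transformation might look like the sketch below; `pick_id_and_title` and the sample page are hypothetical and used only for illustration.

```python
def pick_id_and_title(page):
    """Mimic jq's map({id, title}): keep two fields from each object."""
    # jq yields null for a missing field; .get() yields None in the same case
    return [{"id": issue.get("id"), "title": issue.get("title")} for issue in page]


page = [
    {"id": 101, "title": "Fix pagination", "state": "open"},
    {"id": 102, "title": "Update docs", "state": "closed"},
]
print(pick_id_and_title(page))
```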
## paginate-json --help
<!-- [[[cog
import cog
from paginate_json import cli
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(cli.cli, ["--help"])
help = result.output.replace("Usage: cli", "Usage: paginate-json")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: paginate-json [OPTIONS] URL

  Fetch paginated JSON from a URL

  Example usage:

      paginate-json https://api.github.com/repos/simonw/datasette/issues

Options:
  --version                Show the version and exit.
  --nl                     Output newline-delimited JSON
  --key TEXT               Top-level key to extract from each page
  --jq TEXT                jq transformation to run on each page
  --accept TEXT            Accept header to send
  --sleep INTEGER          Seconds to delay between requests
  --silent                 Don't show progress on stderr - default
  -v, --verbose            Show progress on stderr
  --show-headers           Dump response headers out to stderr
  --ignore-http-errors     Keep going on non-200 HTTP status codes
  --header <TEXT TEXT>...  Send custom request headers
  --help                   Show this message and exit.
```
<!-- [[[end]]] -->
Raw data
{
"_id": null,
"home_page": "https://github.com/simonw/paginate-json",
"name": "paginate-json",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Simon Willison",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/03/af/c888fe62794fd285b1ce3d29131d14358dc8da568afdfd8a09bd4c976b18/paginate-json-1.0.tar.gz",
"platform": null,
"description": "# paginate-json\n\n[![PyPI](https://img.shields.io/pypi/v/paginate-json.svg)](https://pypi.python.org/pypi/paginate-json)\n[![Changelog](https://img.shields.io/github/v/release/simonw/paginate-json?include_prereleases&label=changelog)](https://github.com/simonw/paginate-json/releases)\n[![Tests](https://github.com/simonw/paginate-json/workflows/Test/badge.svg)](https://github.com/simonw/paginate-json/actions?query=workflow%3ATest)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/paginate-json/blob/main/LICENSE)\n\nCLI tool for retrieving JSON from paginated APIs.\n\nThis tool works against APIs that use the HTTP Link header for pagination. The GitHub API is [one example of this](https://developer.github.com/v3/guides/traversing-with-pagination/).\n\nRecipes using this tool:\n\n- [Combined release notes from GitHub with jq and paginate-json](https://til.simonwillison.net/jq/combined-github-release-notes)\n- [Export a Mastodon timeline to SQLite](https://til.simonwillison.net/mastodon/export-timeline-to-sqlite)\n\n## Installation\n\n```bash\npip install paginate-json\n```\nOr use [pipx](https://pypa.github.io/pipx/):\n```bash\npipx install paginate-json\n```\n\n## Usage\n\nRun this tool against a URL that returns a JSON list of items and uses the `link:` HTTP header to indicate the URL of the next page of results.\n\nIt will output a single JSON list containing all of the records, across multiple pages.\n```bash\npaginate-json \\\n https://api.github.com/users/simonw/events\n```\nYou can use the `--header` option to send additional request headers. 
For example, if you have a GitHub OAuth token you can pass it like this:\n```bash\npaginate-json \\\n https://api.github.com/users/simonw/events \\\n --header Authorization \"bearer e94d9e404d86...\"\n```\nSome APIs may return a root level object where the items you wish to gather are stored in a key, like this example from the [Datasette JSON API](https://docs.datasette.io/en/latest/json_api.html):\n```json\n{\n \"ok\": true,\n \"rows\": [\n {\n \"id\": 1,\n \"name\": \"San Francisco\"\n },\n {\n \"id\": 2,\n \"name\": \"Los Angeles\"\n },\n {\n \"id\": 3,\n \"name\": \"Detroit\"\n },\n {\n \"id\": 4,\n \"name\": \"Memnonia\"\n }\n ]\n}\n```\nIn this case, use `--key rows` to specify which key to extract the items from:\n```bash\npaginate-json \\\n https://latest.datasette.io/fixtures/facet_cities.json \\\n --key rows\n```\nThe output JSON will be streamed as a pretty-printed JSON array by default.\n\nTo switch to newline-delimited JSON, with a separate object on each line, add `--nl`:\n```bash\npaginate-json \\\n https://latest.datasette.io/fixtures/facet_cities.json \\\n --key rows \\\n --nl\n```\nThe output from that command looks like this:\n```\n{\"id\": 1, \"name\": \"San Francisco\"}\n{\"id\": 2, \"name\": \"Los Angeles\"}\n{\"id\": 3, \"name\": \"Detroit\"}\n{\"id\": 4, \"name\": \"Memnonia\"}\n```\n\n\n\n## Using this with sqlite-utils\n\nThis tool works well in conjunction with [sqlite-utils](https://github.com/simonw/sqlite-utils). For example, here's how to load all of the GitHub issues for a project into a local SQLite database.\n```bash\npaginate-json \\\n \"https://api.github.com/repos/simonw/datasette/issues?state=all&filter=all\" \\\n --nl | \\\n sqlite-utils upsert /tmp/issues.db issues - --nl --pk=id\n```\nYou can then use [other features of sqlite-utils](https://sqlite-utils.readthedocs.io/en/latest/cli.html) to enhance the resulting database. 
For example, to enable full-text search on the issue title and body columns:\n```bash\nsqlite-utils enable-fts /tmp/issues.db issues title body\n```\n## Using jq to transform each page\n\nIf you install the optional [jq](https://pypi.org/project/jq/) or [pyjq](https://pypi.org/project/pyjq/) dependency you can also pass `--jq PROGRAM` to transform the results of each page using a [jq program](https://stedolan.github.io/jq/). The `jq` option you supply should transform each page of fetched results into an array of objects.\n\nFor example, to extract the `id` and `title` from each issue:\n```bash\npaginate-json \\\n \"https://api.github.com/repos/simonw/datasette/issues\" \\\n --nl \\\n --jq 'map({id, title})'\n```\n\n## paginate-json --help\n\n<!-- [[[cog\nimport cog\nfrom paginate_json import cli\nfrom click.testing import CliRunner\nrunner = CliRunner()\nresult = runner.invoke(cli.cli, [\"--help\"])\nhelp = result.output.replace(\"Usage: cli\", \"Usage: paginate-json\")\ncog.out(\n \"```\\n{}\\n```\".format(help)\n)\n]]] -->\n```\nUsage: paginate-json [OPTIONS] URL\n\n Fetch paginated JSON from a URL\n\n Example usage:\n\n paginate-json https://api.github.com/repos/simonw/datasette/issues\n\nOptions:\n --version Show the version and exit.\n --nl Output newline-delimited JSON\n --key TEXT Top-level key to extract from each page\n --jq TEXT jq transformation to run on each page\n --accept TEXT Accept header to send\n --sleep INTEGER Seconds to delay between requests\n --silent Don't show progress on stderr - default\n -v, --verbose Show progress on stderr\n --show-headers Dump response headers out to stderr\n --ignore-http-errors Keep going on non-200 HTTP status codes\n --header <TEXT TEXT>... Send custom request headers\n --help Show this message and exit.\n\n```\n<!-- [[[end]]] -->\n\n",
"bugtrack_url": null,
"license": "Apache License, Version 2.0",
"summary": "CLI tool for fetching paginated JSON from a URL",
"version": "1.0",
"project_urls": {
"CI": "https://github.com/simonw/paginate-json/actions",
"Changelog": "https://github.com/simonw/paginate-json/releases",
"Homepage": "https://github.com/simonw/paginate-json",
"Issues": "https://github.com/simonw/paginate-json/issues",
"Source code": "https://github.com/simonw/paginate-json"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e623497a4d248f2409ce32c623f828da0638746453ab5044dc8bbc1bc1a27d29",
"md5": "c776529075d2e473ad99bffed2560048",
"sha256": "7ebcc109bf56865d24fe4fc3b015551466abe4396aa9713faddc92b7290bacbc"
},
"downloads": -1,
"filename": "paginate_json-1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c776529075d2e473ad99bffed2560048",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9845,
"upload_time": "2023-08-30T02:55:19",
"upload_time_iso_8601": "2023-08-30T02:55:19.752343Z",
"url": "https://files.pythonhosted.org/packages/e6/23/497a4d248f2409ce32c623f828da0638746453ab5044dc8bbc1bc1a27d29/paginate_json-1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "03afc888fe62794fd285b1ce3d29131d14358dc8da568afdfd8a09bd4c976b18",
"md5": "7925e492324ac547ce032a8766e27621",
"sha256": "689d3599b38a325c6f4ae76c773b8c79a6f5f324305ca6a88ed1bf72cebc04f3"
},
"downloads": -1,
"filename": "paginate-json-1.0.tar.gz",
"has_sig": false,
"md5_digest": "7925e492324ac547ce032a8766e27621",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 9936,
"upload_time": "2023-08-30T02:55:21",
"upload_time_iso_8601": "2023-08-30T02:55:21.419656Z",
"url": "https://files.pythonhosted.org/packages/03/af/c888fe62794fd285b1ce3d29131d14358dc8da568afdfd8a09bd4c976b18/paginate-json-1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-30 02:55:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simonw",
"github_project": "paginate-json",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "paginate-json"
}