paginate-json

Name	paginate-json
Version	1.0
home_page	https://github.com/simonw/paginate-json
Summary	CLI tool for fetching paginated JSON from a URL
upload_time	2023-08-30 02:55:21
maintainer
docs_url	None
author	Simon Willison
requires_python
license	Apache License, Version 2.0
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# paginate-json

[![PyPI](https://img.shields.io/pypi/v/paginate-json.svg)](https://pypi.python.org/pypi/paginate-json)
[![Changelog](https://img.shields.io/github/v/release/simonw/paginate-json?include_prereleases&label=changelog)](https://github.com/simonw/paginate-json/releases)
[![Tests](https://github.com/simonw/paginate-json/workflows/Test/badge.svg)](https://github.com/simonw/paginate-json/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/paginate-json/blob/main/LICENSE)

CLI tool for retrieving JSON from paginated APIs.

This tool works against APIs that use the HTTP Link header for pagination. The GitHub API is [one example of this](https://developer.github.com/v3/guides/traversing-with-pagination/).
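
As a rough illustration of how Link-header pagination can be followed, the sketch below pulls the `rel="next"` URL out of a header value. It is a simplified parser written for this example, not the code paginate-json itself uses; the function name and the `api.example.com` URLs are hypothetical, and it does not handle commas inside quoted parameters.

```python
import re
from typing import Optional

# Matches one Link header entry such as: <https://example.com/?page=2>; rel="next"
# Simplification: parameters are assumed not to contain commas.
LINK_ENTRY = re.compile(r'<([^>]*)>\s*;([^,]*)')


def next_url_from_link_header(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next", or None when there is no next page."""
    if not link_header:
        return None
    for url, params in LINK_ENTRY.findall(link_header):
        if re.search(r'rel\s*=\s*"?next\b', params):
            return url
    return None


# Hypothetical header value in the style GitHub returns
header = (
    '<https://api.example.com/items?page=2>; rel="next", '
    '<https://api.example.com/items?page=5>; rel="last"'
)
print(next_url_from_link_header(header))  # https://api.example.com/items?page=2
```

When the header is missing or has no `rel="next"` entry, the function returns `None`, which is the natural signal to stop fetching.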

Recipes using this tool:

- [Combined release notes from GitHub with jq and paginate-json](https://til.simonwillison.net/jq/combined-github-release-notes)
- [Export a Mastodon timeline to SQLite](https://til.simonwillison.net/mastodon/export-timeline-to-sqlite)

## Installation

```bash
pip install paginate-json
```
Or use [pipx](https://pypa.github.io/pipx/):
```bash
pipx install paginate-json
```

## Usage

Run this tool against a URL that returns a JSON list of items and uses the `Link` HTTP header to indicate the URL of the next page of results.

It will output a single JSON list containing all of the records, across multiple pages.
```bash
paginate-json \
  https://api.github.com/users/simonw/events
```
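
Conceptually, combining pages into one list amounts to a loop that keeps requesting the next URL until none is advertised. The sketch below shows that loop in Python under stated assumptions: `fetch_page` is a hypothetical callable returning `(items, next_url)`, and the pages are faked with a dictionary so the example runs without network access. It is not the tool's actual implementation.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical fetcher signature: takes a URL, returns (items, next_url_or_None).
FetchPage = Callable[[str], Tuple[list, Optional[str]]]


def paginate(url: str, fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every item from every page, following next links until none remain."""
    next_url: Optional[str] = url
    while next_url:
        items, next_url = fetch_page(next_url)
        yield from items


# Stand-in for real HTTP responses (made-up URLs and records)
FAKE_PAGES = {
    "https://api.example.com/events": (
        [{"id": 1}, {"id": 2}],
        "https://api.example.com/events?page=2",
    ),
    "https://api.example.com/events?page=2": ([{"id": 3}], None),
}

all_items = list(paginate("https://api.example.com/events", FAKE_PAGES.__getitem__))
print(all_items)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Because `paginate` is a generator, items can be written out as each page arrives rather than after the last page has been fetched.
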
You can use the `--header` option to send additional request headers. For example, if you have a GitHub OAuth token you can pass it like this:
```bash
paginate-json \
  https://api.github.com/users/simonw/events \
  --header Authorization "bearer e94d9e404d86..."
```
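
For readers scripting the same request in Python, a name/value pair like the one passed to `--header` maps onto a request header roughly as follows. This sketch uses only the standard library and sends nothing over the network; `build_request` is a hypothetical helper and `YOUR_TOKEN` is a placeholder, not a real credential.

```python
import urllib.request
from typing import List, Tuple


def build_request(url: str, headers: List[Tuple[str, str]]) -> urllib.request.Request:
    """Create a request object carrying each (name, value) header pair."""
    request = urllib.request.Request(url)
    for name, value in headers:
        request.add_header(name, value)
    return request


request = build_request(
    "https://api.github.com/users/simonw/events",
    [("Authorization", "bearer YOUR_TOKEN")],  # placeholder token
)
print(request.get_header("Authorization"))  # bearer YOUR_TOKEN
```
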
Some APIs may return a root level object where the items you wish to gather are stored in a key, like this example from the [Datasette JSON API](https://docs.datasette.io/en/latest/json_api.html):
```json
{
  "ok": true,
  "rows": [
    {
      "id": 1,
      "name": "San Francisco"
    },
    {
      "id": 2,
      "name": "Los Angeles"
    },
    {
      "id": 3,
      "name": "Detroit"
    },
    {
      "id": 4,
      "name": "Memnonia"
    }
  ]
}
```
In this case, use `--key rows` to specify which key to extract the items from:
```bash
paginate-json \
  https://latest.datasette.io/fixtures/facet_cities.json \
  --key rows
```
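
In Python terms, extracting a top-level key amounts to indexing each decoded page before collecting its items. The following is a minimal sketch of that idea, assuming a hypothetical `extract_items` helper; the sample page reuses the rows from the example above.

```python
from typing import Any, List, Optional


def extract_items(page: Any, key: Optional[str] = None) -> List[Any]:
    """Return the list of items in a decoded page, optionally nested under key."""
    if key is not None:
        page = page[key]
    if not isinstance(page, list):
        raise ValueError("expected a JSON list of items")
    return page


page = {
    "ok": True,
    "rows": [
        {"id": 1, "name": "San Francisco"},
        {"id": 2, "name": "Los Angeles"},
    ],
}
print(extract_items(page, key="rows"))
# [{'id': 1, 'name': 'San Francisco'}, {'id': 2, 'name': 'Los Angeles'}]
```
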
The output JSON will be streamed as a pretty-printed JSON array by default.

To switch to newline-delimited JSON, with a separate object on each line, add `--nl`:
```bash
paginate-json \
  https://latest.datasette.io/fixtures/facet_cities.json \
  --key rows \
  --nl
```
The output from that command looks like this:
```
{"id": 1, "name": "San Francisco"}
{"id": 2, "name": "Los Angeles"}
{"id": 3, "name": "Detroit"}
{"id": 4, "name": "Memnonia"}
```
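
Newline-delimited JSON is simply one compact JSON document per line. As a sketch of how such output could be produced in Python (the `to_newline_delimited` helper is hypothetical and not part of paginate-json):

```python
import json
from typing import Any, Iterable


def to_newline_delimited(items: Iterable[Any]) -> str:
    """Serialize each item as JSON on its own line."""
    return "\n".join(json.dumps(item) for item in items)


rows = [
    {"id": 1, "name": "San Francisco"},
    {"id": 2, "name": "Los Angeles"},
]
print(to_newline_delimited(rows))
# {"id": 1, "name": "San Francisco"}
# {"id": 2, "name": "Los Angeles"}
```

This format suits pipelines because a consumer can process each line as soon as it arrives, without waiting for a closing `]`.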



## Using this with sqlite-utils

This tool works well in conjunction with [sqlite-utils](https://github.com/simonw/sqlite-utils). For example, here's how to load all of the GitHub issues for a project into a local SQLite database:
```bash
paginate-json \
  "https://api.github.com/repos/simonw/datasette/issues?state=all&filter=all" \
  --nl | \
  sqlite-utils upsert /tmp/issues.db issues - --nl --pk=id
```
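
To make the upsert step more concrete, here is a rough standard-library approximation of what happens to each newline-delimited record when `--pk=id` is used: a row with a new `id` is inserted, and a row with an existing `id` is updated. It is only a sketch under assumptions: the table schema is fixed by hand (sqlite-utils creates tables and columns for you), the sample issues are invented, and the `ON CONFLICT` clause requires SQLite 3.24 or later.

```python
import json
import sqlite3

# Invented sample input: the third line repeats id 1 with a new title
ndjson = (
    '{"id": 1, "title": "First issue"}\n'
    '{"id": 2, "title": "Second issue"}\n'
    '{"id": 1, "title": "First issue (edited)"}\n'
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")

for line in ndjson.splitlines():
    if not line.strip():
        continue  # skip blank lines
    row = json.loads(line)
    # Insert the row, or update the title when the primary key already exists
    conn.execute(
        "INSERT INTO issues (id, title) VALUES (:id, :title) "
        "ON CONFLICT(id) DO UPDATE SET title = excluded.title",
        row,
    )
conn.commit()

rows = conn.execute("SELECT id, title FROM issues ORDER BY id").fetchall()
print(rows)  # [(1, 'First issue (edited)'), (2, 'Second issue')]
```
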
You can then use [other features of sqlite-utils](https://sqlite-utils.readthedocs.io/en/latest/cli.html) to enhance the resulting database. For example, to enable full-text search on the issue title and body columns:
```bash
sqlite-utils enable-fts /tmp/issues.db issues title body
```
## Using jq to transform each page

If you install the optional [jq](https://pypi.org/project/jq/) or [pyjq](https://pypi.org/project/pyjq/) dependency, you can also pass `--jq PROGRAM` to transform the results of each page using a [jq program](https://stedolan.github.io/jq/). The program you supply to `--jq` should transform each page of fetched results into an array of objects.

For example, to extract the `id` and `title` from each issue:
```bash
paginate-json \
  "https://api.github.com/repos/simonw/datasette/issues" \
  --nl \
  --jq 'map({id, title})'
```
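
For readers less familiar with jq, the program `map({id, title})` is roughly equivalent to the Python below: for every object in the page, keep only the `id` and `title` keys. This is an illustrative sketch with invented sample data and a hypothetical function name; like jq, it yields `None` (jq's `null`) for a key that is missing.

```python
from typing import Any, Dict, List


def pick_id_and_title(page: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Python counterpart of the jq program: map({id, title})."""
    return [{"id": issue.get("id"), "title": issue.get("title")} for issue in page]


page = [
    {"id": 101, "title": "Example issue", "state": "open", "comments": 3},
    {"id": 102, "title": "Another issue", "state": "closed", "comments": 0},
]
print(pick_id_and_title(page))
# [{'id': 101, 'title': 'Example issue'}, {'id': 102, 'title': 'Another issue'}]
```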

## paginate-json --help

<!-- [[[cog
import cog
from paginate_json import cli
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(cli.cli, ["--help"])
help = result.output.replace("Usage: cli", "Usage: paginate-json")
cog.out(
    "```\n{}\n```".format(help)
)
]]] -->
```
Usage: paginate-json [OPTIONS] URL

  Fetch paginated JSON from a URL

  Example usage:

      paginate-json https://api.github.com/repos/simonw/datasette/issues

Options:
  --version                Show the version and exit.
  --nl                     Output newline-delimited JSON
  --key TEXT               Top-level key to extract from each page
  --jq TEXT                jq transformation to run on each page
  --accept TEXT            Accept header to send
  --sleep INTEGER          Seconds to delay between requests
  --silent                 Don't show progress on stderr - default
  -v, --verbose            Show progress on stderr
  --show-headers           Dump response headers out to stderr
  --ignore-http-errors     Keep going on non-200 HTTP status codes
  --header <TEXT TEXT>...  Send custom request headers
  --help                   Show this message and exit.

```
<!-- [[[end]]] -->

Raw data

{
    "_id": null,
    "home_page": "https://github.com/simonw/paginate-json",
    "name": "paginate-json",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Simon Willison",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/03/af/c888fe62794fd285b1ce3d29131d14358dc8da568afdfd8a09bd4c976b18/paginate-json-1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License, Version 2.0",
    "summary": "CLI tool for fetching paginated JSON from a URL",
    "version": "1.0",
    "project_urls": {
        "CI": "https://github.com/simonw/paginate-json/actions",
        "Changelog": "https://github.com/simonw/paginate-json/releases",
        "Homepage": "https://github.com/simonw/paginate-json",
        "Issues": "https://github.com/simonw/paginate-json/issues",
        "Source code": "https://github.com/simonw/paginate-json"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e623497a4d248f2409ce32c623f828da0638746453ab5044dc8bbc1bc1a27d29",
                "md5": "c776529075d2e473ad99bffed2560048",
                "sha256": "7ebcc109bf56865d24fe4fc3b015551466abe4396aa9713faddc92b7290bacbc"
            },
            "downloads": -1,
            "filename": "paginate_json-1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c776529075d2e473ad99bffed2560048",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9845,
            "upload_time": "2023-08-30T02:55:19",
            "upload_time_iso_8601": "2023-08-30T02:55:19.752343Z",
            "url": "https://files.pythonhosted.org/packages/e6/23/497a4d248f2409ce32c623f828da0638746453ab5044dc8bbc1bc1a27d29/paginate_json-1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "03afc888fe62794fd285b1ce3d29131d14358dc8da568afdfd8a09bd4c976b18",
                "md5": "7925e492324ac547ce032a8766e27621",
                "sha256": "689d3599b38a325c6f4ae76c773b8c79a6f5f324305ca6a88ed1bf72cebc04f3"
            },
            "downloads": -1,
            "filename": "paginate-json-1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7925e492324ac547ce032a8766e27621",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 9936,
            "upload_time": "2023-08-30T02:55:21",
            "upload_time_iso_8601": "2023-08-30T02:55:21.419656Z",
            "url": "https://files.pythonhosted.org/packages/03/af/c888fe62794fd285b1ce3d29131d14358dc8da568afdfd8a09bd4c976b18/paginate-json-1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-30 02:55:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "simonw",
    "github_project": "paginate-json",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "paginate-json"
}
