---
title: apibackuper -- a command-line tool to archive/backup API calls
---
apibackuper is a command-line tool to archive/backup API calls. Its
goal is to download all data behind a REST API and archive it to local
storage. The tool is designed to make backing up API data as simple as possible.
# History
This tool was developed to optimize backup/archival procedures for Russian
government information from the E-Budget portal budget.gov.ru and some other
government IT systems. Usage examples can be found in the
"examples" directory.
# Main features
- Supports any iterative GET/POST API
- Estimates the time required to back up an API
- Stores data inside a ZIP container
- Supports export of backup data as a JSON lines file
- Documentation
- Test coverage
# Installation
## Linux
Most Linux distributions provide a package that can be installed using
the system package manager, for example:
``` bash
# Debian, Ubuntu, etc.
$ apt install apibackuper
```
``` bash
# Fedora
$ dnf install apibackuper
```
``` bash
# CentOS, RHEL, ...
$ yum install apibackuper
```
``` bash
# Arch Linux
$ pacman -S apibackuper
```
## Windows, etc.
A universal installation method (that works on Windows, Mac OS X, Linux,
…, and always provides the latest version) is to use pip:
``` bash
# Make sure we have an up-to-date version of pip and setuptools:
$ pip install --upgrade pip setuptools
$ pip install --upgrade apibackuper
```
(If `pip` installation fails for some reason, you can try
`easy_install apibackuper` as a fallback.)
## Python version
Python version 3.6 or greater is required.
# Quickstart
This example backs up the list of Russian certificate authorities. The
list is published at e-trust.gosuslugi.ru and is available via an
undocumented API.
``` bash
$ apibackuper create etrust
$ cd etrust
```
Edit apibackuper.cfg as follows:
``` ini
[settings]
initialized = True
name = etrust
[project]
description = E-Trust UC list
url = https://e-trust.gosuslugi.ru/app/scc/portal/api/v1/portal/ca/list
http_mode = POST
work_modes = full,incremental,update
iterate_by = page
[params]
page_size_param = recordsOnPage
page_size_limit = 100
page_number_param = page
[data]
total_number_key = total
data_key = data
item_key = РеестровыйНомер
change_key = СтатусАккредитации.ДействуетС
[storage]
storage_type = zip
```
Add a file params.json with the parameters sent with POST requests:
``` json
{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100,"searchString":null,"cities":null,"software":null,"cryptToolClasses":null,"statuses":null}
```
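To make the relationship between apibackuper.cfg and params.json concrete, here is a minimal Python sketch of how one POST body per page could be derived from them. It is an illustration only, not apibackuper's own code: the helper `page_payloads` is hypothetical, and it assumes that page numbering starts at 1 and that the total of 502 records is already known.

``` python
import copy
import json
import math

# Contents of params.json from the example above.
BASE_PARAMS = json.loads(
    '{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100,'
    '"searchString":null,"cities":null,"software":null,'
    '"cryptToolClasses":null,"statuses":null}'
)


def page_payloads(base, total, page_number_param="page",
                  page_size_param="recordsOnPage", page_size_limit=100):
    """Yield one POST body per page (hypothetical helper, not the tool's API)."""
    pages = math.ceil(total / page_size_limit)
    for number in range(1, pages + 1):  # assumption: pages are numbered from 1
        body = copy.deepcopy(base)
        body[page_number_param] = number
        body[page_size_param] = page_size_limit
        yield body


payloads = list(page_payloads(BASE_PARAMS, total=502))
print(len(payloads), payloads[-1]["page"])
```

Only the page number changes between requests; every other field of params.json is sent unchanged.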
Execute the "estimate" command to see how long the data collection will
take and how much space is needed:
``` bash
$ apibackuper estimate full
```
Output:
``` text
Total records: 502
Records per request: 100
Total requests: 6
Average record size 32277.96 bytes
Estimated size (json lines) 16.20 MB
Avg request time, seconds 66.9260
Estimated all requests time, seconds 402.8947
```
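The figures in this output can be reproduced with simple arithmetic. The Python sketch below is inferred from the sample numbers alone, not taken from apibackuper's source: it assumes the size is reported in decimal megabytes and that the total time is the average request time multiplied by (records / page size + 1). The function name `estimate` is hypothetical.

``` python
import math


def estimate(total_records, page_size, avg_record_size, avg_request_time):
    """Return (requests, size in MB, total seconds); formulas are assumptions."""
    requests_needed = math.ceil(total_records / page_size)
    size_mb = total_records * avg_record_size / 1_000_000  # decimal megabytes
    total_time = (total_records / page_size + 1) * avg_request_time
    return requests_needed, size_mb, total_time


requests_needed, size_mb, total_time = estimate(502, 100, 32277.96, 66.9260)
print(f"Total requests: {requests_needed}")
print(f"Estimated size (json lines) {size_mb:.2f} MB")
print(f"Estimated all requests time, seconds {total_time:.4f}")
```

The time differs from the tool's output in the last decimal places because the average request time shown above is already rounded.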
Execute the "run" command to collect the data. The result is stored in
"storage.zip":
``` bash
$ apibackuper run full
```
Export the data from storage and save it as a JSON lines file called
"etrust.jsonl":
``` bash
$ apibackuper export jsonl etrust.jsonl
```
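A JSON lines file holds one JSON object per line, so the export can be processed record by record. The following Python sketch shows one way to read such a file; `read_jsonl` is a hypothetical helper, and the sample file with made-up records stands in for a real export such as etrust.jsonl.

``` python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Stand-in for an exported file; these records are invented for the demo.
sample = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
with open(path, "w", encoding="utf-8") as handle:
    for record in sample:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")

records = list(read_jsonl(path))
print(len(records))
```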
# Config options
Example config file:
``` ini
[settings]
initialized = True
name = <name>
splitter = .
[project]
description = <description>
url = <url>
http_mode = <GET or POST>
work_modes = <combination of full,incremental,update>
iterate_by = <page or skip>
[params]
page_size_param = <page size param>
page_size_limit = <page size limit>
page_number_param = <page number>
count_skip_param = <key to iterate in skip mode>
[data]
total_number_key = <total number key>
data_key = <data key>
item_key = <item key>
change_key = <change key>
[follow]
follow_mode = <type of follow mode>
follow_pattern = <url prefix to follow links>
follow_data_key = <follow data item key>
follow_param = <follow param>
follow_item_key = <follow item key>
[files]
fetch_mode = <file fetch mode>
root_url = <file root url>
keys = <keys with file data>
storage_mode = <file storage mode>
[storage]
storage_type = zip
compression = True
```
## settings
- name - short name of the project
- splitter - separator used in field paths. Needed in the rare cases when '.'
is part of a field name, for example with OData requests and the
'@odata.count' field
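The role of the splitter is easiest to see with a small lookup function. This is a Python sketch of the idea, not the tool's implementation: `get_by_path` is a hypothetical helper, the record values are invented, and '/' is just one possible alternative splitter.

``` python
def get_by_path(record, path, splitter="."):
    """Walk nested dicts using a path whose parts are joined by `splitter`."""
    value = record
    for part in path.split(splitter):
        value = value[part]
    return value


# Nested field, as in the change_key of the quickstart config (value invented).
nested = {"СтатусАккредитации": {"ДействуетС": "2020-01-01"}}
print(get_by_path(nested, "СтатусАккредитации.ДействуетС"))

# An OData-style response: the key itself contains a dot, so the default
# splitter would look for "@odata" and then "count" and fail.
odata = {"@odata.count": 42, "value": []}
print(get_by_path(odata, "@odata.count", splitter="/"))
```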
## project
- description - text that explains what this project is for
- url - API endpoint URL
- http_mode - one of the HTTP modes: GET or POST
- work_modes - types of operation: full - archive everything,
incremental - add new records only, update - collect changed data
only
- iterate_by - how records are iterated: 'page' (the default) goes
page by page, 'skip' uses a skip value
## params
- page_size_param - parameter with the page size
- page_size_limit - maximum number of records the API returns per request
- page_number_param - parameter with the page number
- count_skip_param - parameter for the 'skip' type of iteration
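The difference between the two iteration types can be sketched in Python as follows. This is an assumption-laden illustration rather than the tool's code: `request_params` is a hypothetical helper, the parameter names "limit", "page" and "skip" are invented, and page numbers are assumed to start at 1.

``` python
def request_params(index, iterate_by, page_size_limit, page_size_param,
                   page_number_param=None, count_skip_param=None):
    """Build the paging parameters for the request with zero-based `index`."""
    params = {page_size_param: page_size_limit}
    if iterate_by == "page":
        # Page mode: send the page number (assumed to start at 1).
        params[page_number_param] = index + 1
    elif iterate_by == "skip":
        # Skip mode: send how many records to skip before this batch.
        params[count_skip_param] = index * page_size_limit
    else:
        raise ValueError(f"unknown iterate_by value: {iterate_by!r}")
    return params


print(request_params(2, "page", 100, "limit", page_number_param="page"))
print(request_params(2, "skip", 100, "limit", count_skip_param="skip"))
```

For the third request, page mode asks for page 3, while skip mode asks the API to skip the first 200 records.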
## data
- total_number_key - key in the data with the total number of records
- data_key - key in the data with the list of records
- item_key - key in the data with the unique identifier of the record. Can
be a group of keys separated by commas
- change_key - key in the data that indicates that the record has changed. Can
be a group of keys separated by commas
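One way to picture how item_key and change_key could support the incremental and update work modes is the Python sketch below. It is a guess at the general idea, not a description of apibackuper's internals: `key_of` and `classify` are hypothetical helpers, and the records with "id" and "updated" fields are invented.

``` python
def get_by_path(record, path, splitter="."):
    """Walk nested dicts using a path whose parts are joined by `splitter`."""
    value = record
    for part in path.split(splitter):
        value = value[part]
    return value


def key_of(record, keys, splitter="."):
    """Build a tuple from one key or several comma-separated keys."""
    return tuple(get_by_path(record, key.strip(), splitter)
                 for key in keys.split(","))


def classify(stored, fetched, item_key, change_key):
    """Split fetched records into new ones and ones whose change key differs."""
    known = {key_of(r, item_key): key_of(r, change_key) for r in stored}
    new, changed = [], []
    for record in fetched:
        ident = key_of(record, item_key)
        if ident not in known:
            new.append(record)
        elif known[ident] != key_of(record, change_key):
            changed.append(record)
    return new, changed


stored = [{"id": 1, "updated": "2024-01-01"}, {"id": 2, "updated": "2024-01-01"}]
fetched = [
    {"id": 1, "updated": "2024-01-01"},
    {"id": 2, "updated": "2024-02-01"},
    {"id": 3, "updated": "2024-02-01"},
]
new, changed = classify(stored, fetched, item_key="id", change_key="updated")
print([r["id"] for r in new], [r["id"] for r in changed])
```

Under these assumptions, an incremental run would keep only the new records and an update run only the changed ones.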
## follow
- follow_mode - mode used to follow objects. Can be 'url' or 'item'.
If the mode is 'url' then follow_pattern is not used
- follow_pattern - URL pattern / URL prefix for followed objects. Only
for mode 'item'
- follow_data_key - if the object/objects are inside an array, the key of this
array
- follow_param - parameter used in 'item' mode
- follow_item_key - item key
## files
- fetch_mode - file fetch mode. Can be 'prefix' or 'id'. Prefix
- root_url - root URL / prefix for files
- keys - list of keys with URLs/file IDs to search for files to save
- storage_mode - how files are stored in storage/files.zip. The
default is 'filepath': files are stored the same way as they are presented
in the URL
## storage
- storage_type - type of local storage. 'zip' (a local ZIP file) is
the default
- compression - if True then a compressed ZIP file is used: less disk
space, but more CPU time to process the data
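The trade-off behind the compression option can be demonstrated with Python's standard zipfile module. This is only a sketch of the general technique: the helper `write_pages`, the member names such as page_1.json, and the sample data are all invented and do not reflect the actual layout of storage.zip.

``` python
import io
import json
import zipfile


def write_pages(pages, compression=True):
    """Write each page as a JSON member of an in-memory ZIP archive."""
    method = zipfile.ZIP_DEFLATED if compression else zipfile.ZIP_STORED
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        for number, page in enumerate(pages, start=1):
            # Member names are hypothetical, chosen only for this demo.
            archive.writestr(f"page_{number}.json", json.dumps(page))
    return buffer.getvalue()


# Invented, highly repetitive data so the effect of compression is visible.
pages = [{"data": [{"id": i, "text": "x" * 200} for i in range(100)]}
         for _ in range(3)]
compressed = write_pages(pages, compression=True)
stored = write_pages(pages, compression=False)
print(len(compressed) < len(stored))
```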
# Usage
Synopsis:
``` bash
$ apibackuper [flags] [command] inputfile
```
See also `apibackuper --help`.
## Examples
Create the "budgettofk" project:
``` bash
$ apibackuper create budgettofk
```
Estimate the execution time for the 'budgettofk' project. Run the command in
the project directory or pass the project directory via the -p parameter:
``` bash
$ apibackuper estimate full -p budgettofk
```
Output:
``` text
Total records: 12282
Records per request: 500
Total requests: 25
Average record size 1293.60 bytes
Estimated size (json lines) 15.89 MB
Avg request time, seconds 1.8015
Estimated all requests time, seconds 46.0536
```
Run the project. Run the command in the project directory or pass the
project directory via the -p parameter:
``` bash
$ apibackuper run full
```
Export data from the project. Run the command in the project directory or
pass the project directory via the -p parameter:
``` bash
$ apibackuper export jsonl hhemployers.jsonl -p hhemployers
```
Follow each object of the downloaded data and send a request for each
one:
``` bash
$ apibackuper follow continue
```
Download all files associated with the API objects:
``` bash
$ apibackuper getfiles
```
# Advanced
TBD
Raw data
{
"_id": null,
"home_page": "https://github.com/datacoon/apibackuper/",
"name": "apibackuper",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "api json jsonl csv bson cli dataset",
"author": "Ivan Begtin",
"author_email": "ivan@begtin.tech",
"download_url": "https://files.pythonhosted.org/packages/eb/0a/896d20dee8719b93a11d0212a2ae6180fb078cbdb9186ae7e980a9247dec/apibackuper-1.0.11.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "apibackuper: a command-line tool and python library for API backuping",
"version": "1.0.11",
"project_urls": {
"Download": "https://github.com/datacoon/apibackuper/",
"Homepage": "https://github.com/datacoon/apibackuper/"
},
"split_keywords": [
"api",
"json",
"jsonl",
"csv",
"bson",
"cli",
"dataset"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "eb0a896d20dee8719b93a11d0212a2ae6180fb078cbdb9186ae7e980a9247dec",
"md5": "564d768acfd65a8f3f1d5f86150224e6",
"sha256": "e31215e4e52e2c1828bcbf0ba916ffde8dc77c5b4a719ae69f7eb0cf79495d58"
},
"downloads": -1,
"filename": "apibackuper-1.0.11.tar.gz",
"has_sig": false,
"md5_digest": "564d768acfd65a8f3f1d5f86150224e6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 21870,
"upload_time": "2024-07-08T09:31:49",
"upload_time_iso_8601": "2024-07-08T09:31:49.662826Z",
"url": "https://files.pythonhosted.org/packages/eb/0a/896d20dee8719b93a11d0212a2ae6180fb078cbdb9186ae7e980a9247dec/apibackuper-1.0.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-08 09:31:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "datacoon",
"github_project": "apibackuper",
"travis_ci": false,
"coveralls": true,
"github_actions": false,
"requirements": [
{
"name": "aria2p",
"specs": [
[
">=",
"0.11.3"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
">=",
"4.12.2"
]
]
},
{
"name": "click",
"specs": [
[
">=",
"8.1.6"
]
]
},
{
"name": "lxml",
"specs": [
[
">=",
"4.9.3"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.4.2"
]
]
},
{
"name": "Requests",
"specs": [
[
">=",
"2.31.0"
]
]
},
{
"name": "setuptools",
"specs": [
[
">=",
"65.5.0"
]
]
},
{
"name": "urllib3",
"specs": [
[
">=",
"2.0.6"
]
]
},
{
"name": "xmltodict",
"specs": [
[
">=",
"0.13.0"
]
]
}
],
"tox": true,
"lcname": "apibackuper"
}