apibackuper


-   Name: apibackuper
-   Version: 1.0.11
-   Home page: https://github.com/datacoon/apibackuper/
-   Summary: apibackuper: a command-line tool and Python library for backing up APIs
-   Upload time: 2024-07-08 09:31:49
-   Author: Ivan Begtin
-   License: MIT
-   Keywords: api, json, jsonl, csv, bson, cli, dataset
-   Requirements: aria2p, beautifulsoup4, click, lxml, pytest, Requests, setuptools, urllib3, xmltodict
---

title: apibackuper -- a command-line tool to archive/backup API calls

---



apibackuper is a command-line tool to archive/backup API calls. Its
goal is to download all data behind a REST API and archive it to local
storage. The tool is designed to make backing up API data as simple as possible.






# History



This tool was developed to optimize backup/archival procedures for Russian
government information from the E-Budget portal budget.gov.ru and some other
government IT systems. Examples of tool usage can be found in the
"examples" directory.



# Main features



-   Supports any iterative GET/POST API
-   Estimates the time required to back up an API
-   Stores data inside a ZIP container
-   Exports backup data as a JSON lines file

-   Documentation

-   Test coverage



# Installation



## Linux



Most Linux distributions provide a package that can be installed using

the system package manager, for example:



``` bash

# Debian, Ubuntu, etc.

$ apt install apibackuper

```



``` bash

# Fedora

$ dnf install apibackuper

```



``` bash

# CentOS, RHEL, ...

$ yum install apibackuper

```



``` bash

# Arch Linux

$ pacman -S apibackuper

```



## Windows, etc.



A universal installation method (that works on Windows, Mac OS X, Linux,

…, and always provides the latest version) is to use pip:



``` bash

# Make sure we have an up-to-date version of pip and setuptools:

$ pip install --upgrade pip setuptools



$ pip install --upgrade apibackuper

```



(If the `pip` command is not available, try
`python -m pip install --upgrade apibackuper` instead. The older
`easy_install` fallback is no longer shipped with recent setuptools releases.)



## Python version



Python version 3.6 or greater is required.



# Quickstart



This example backs up the list of Russian certificate authorities. The
list is published at e-trust.gosuslugi.ru and is available via an
undocumented API.



``` bash

$ apibackuper create etrust

$ cd etrust

```



Edit `apibackuper.cfg` as follows:



``` ini

[settings]

initialized = True

name = etrust



[project]

description = E-Trust UC list

url = https://e-trust.gosuslugi.ru/app/scc/portal/api/v1/portal/ca/list

http_mode = POST

work_modes = full,incremental,update

iterate_by = page



[params]

page_size_param = recordsOnPage

page_size_limit = 100

page_number_param = page



[data]

total_number_key = total

data_key = data

item_key = РеестровыйНомер

change_key = СтатусАккредитации.ДействуетС



[storage]

storage_type = zip

```
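
The configuration above has the shape of a standard INI file, so it can be inspected with Python's `configparser` module. The following is only a sketch: whether apibackuper itself reads the file this way is an assumption, and the `False` fallback for `compression` is a value chosen for this example, not a documented default.

``` python
import configparser

# Part of the quickstart config, inlined so the sketch is self-contained.
CONFIG_TEXT = """
[settings]
initialized = True
name = etrust

[project]
url = https://e-trust.gosuslugi.ru/app/scc/portal/api/v1/portal/ca/list
http_mode = POST
work_modes = full,incremental,update
iterate_by = page

[params]
page_size_param = recordsOnPage
page_size_limit = 100
page_number_param = page

[storage]
storage_type = zip
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

name = parser.get("settings", "name")
initialized = parser.getboolean("settings", "initialized")
work_modes = parser.get("project", "work_modes").split(",")
page_size = parser.getint("params", "page_size_limit")
# "compression" is absent from this config; the fallback is an assumption.
compression = parser.getboolean("storage", "compression", fallback=False)

print(name, initialized, work_modes, page_size, compression)
```

Reading the file this way is a quick check that every section and key is spelled as expected before running a long backup.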



Add a file named `params.json` with the parameters sent with each POST request:



``` json

{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100,"searchString":null,"cities":null,"software":null,"cryptToolClasses":null,"statuses":null}

```
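
To illustrate how these base parameters relate to `page_number_param` and `page_size_param`, the sketch below builds the request body for several pages. `build_page_payload` is a hypothetical helper written for this example and is not part of apibackuper; it assumes the tool overrides only the page number and page size and leaves the other parameters untouched.

``` python
import copy
import json

# The params.json content from above, inlined for a self-contained sketch.
BASE_PARAMS = json.loads(
    '{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100,'
    '"searchString":null,"cities":null,"software":null,'
    '"cryptToolClasses":null,"statuses":null}'
)


def build_page_payload(base, page, page_number_param="page",
                       page_size_param="recordsOnPage", page_size=100):
    """Return a copy of the base parameters adjusted for one page.

    Hypothetical helper: the parameter names come from the [params]
    section of the quickstart config.
    """
    payload = copy.deepcopy(base)
    payload[page_number_param] = page
    payload[page_size_param] = page_size
    return payload


# Request bodies for the first three pages.
payloads = [build_page_payload(BASE_PARAMS, page) for page in range(1, 4)]
print(json.dumps(payloads[1], ensure_ascii=False))
```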



Execute the "estimate" command to see how long the data collection will
take and how much space is needed:



``` bash

$ apibackuper estimate full

```



Output:



``` text

Total records: 502

Records per request: 100

Total requests: 6

Average record size 32277.96 bytes

Estimated size (json lines) 16.20 MB

Avg request time, seconds 66.9260

Estimated all requests time, seconds 402.8947

```
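
The request count and size figures in this output are consistent with simple arithmetic, sketched below. This is an illustration, not the tool's actual code: the `estimate` function is hypothetical, and the naive time product (about 401.6 seconds) differs slightly from the reported 402.8947 seconds, so the tool evidently computes its time estimate in a somewhat different way.

``` python
import math


def estimate(total_records, page_size, avg_record_bytes, avg_request_seconds):
    """Rough backup estimate (hypothetical helper, not apibackuper's code)."""
    # One request per page; the last page may be partially filled.
    requests_needed = math.ceil(total_records / page_size)
    # Size in decimal megabytes (1 MB = 1,000,000 bytes).
    size_mb = total_records * avg_record_bytes / 1_000_000
    # Naive total time: every request takes the average time.
    total_seconds = requests_needed * avg_request_seconds
    return requests_needed, size_mb, total_seconds


requests_needed, size_mb, total_seconds = estimate(502, 100, 32277.96, 66.9260)
print(requests_needed, round(size_mb, 2), round(total_seconds, 2))
```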



Execute the "run" command to collect the data. The result is stored in
"storage.zip":



``` bash

$ apibackuper run full

```



Export the data from storage and save it as a JSON lines file called
"etrust.jsonl":



``` bash

$ apibackuper export jsonl etrust.jsonl

```
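
A JSON lines file holds one JSON document per line, so the export can be read back with the standard library alone. The sketch below uses a small temporary file in place of `etrust.jsonl`; the `read_jsonl` helper and the sample records are made up for illustration.

``` python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one decoded record per non-empty line of a JSON lines file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Small temporary file standing in for an exported etrust.jsonl.
sample = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    with open(path, "w", encoding="utf-8") as handle:
        for record in sample:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    records = list(read_jsonl(path))

print(len(records), records[0]["name"])
```

Reading line by line keeps memory use low, which matters for large exports.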



# Config options



Example config file:



``` ini

[settings]

initialized = True

name = <name>

splitter = .



[project]

description = <description>

url = <url>

http_mode = <GET or POST>

work_modes = <combination of full,incremental,update>

iterate_by = <page or skip>



[params]

page_size_param = <page size param>

page_size_limit = <page size limit>

page_number_param = <page number>

count_skip_param = <key to iterate in skip mode>





[data]

total_number_key = <total number key>

data_key = <data key>

item_key = <item key>

change_key = <change key>



[follow]

follow_mode = <type of follow mode>

follow_pattern = <url prefix to follow links>

follow_data_key = <follow data item key>

follow_param = <follow param>

follow_item_key = <follow item key>



[files]

fetch_mode = <file fetch mode>

root_url = <file root url>

keys = <keys with file data>

storage_mode = <file storage mode>





[storage]

storage_type = zip

compression = True

```



## settings



-   name - short name of the project
-   splitter - separator used to address nested fields. Change it in the
    rare cases when '.' is part of a field name, for example the
    '@odata.count' field in OData responses
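
The role of the splitter can be sketched with a small nested-key lookup. `get_by_path` is a hypothetical helper, the sample records are invented, and the choice of '/' as an alternative splitter is an assumption made for this example.

``` python
def get_by_path(record, path, splitter="."):
    """Walk nested dictionaries following the keys in path (hypothetical helper)."""
    value = record
    for key in path.split(splitter):
        value = value[key]
    return value


nested = {"status": {"valid_from": "2020-01-01"}}
odata = {"@odata.count": 42, "value": []}

# The default splitter addresses a field inside a nested object.
valid_from = get_by_path(nested, "status.valid_from")

# With the default splitter "@odata.count" would be split into "@odata"
# and "count", so another splitter (here "/") keeps the key intact.
count = get_by_path(odata, "@odata.count", splitter="/")

print(valid_from, count)
```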



## project



-   description - text explaining what the project is for
-   url - API endpoint URL
-   http_mode - HTTP method to use: GET or POST
-   work_modes - supported types of operation: full - archive everything,
    incremental - add new records only, update - collect changed data
    only
-   iterate_by - how records are iterated: 'page' (the default) goes
    page by page, 'skip' iterates by a skip value



## params



-   page_size_param - name of the parameter that carries the page size
-   page_size_limit - maximum number of records the API returns per request
-   page_number_param - name of the parameter that carries the page number
-   count_skip_param - name of the parameter used for 'skip' iteration
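
The difference between 'page' and 'skip' iteration can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: `iteration_params` is a hypothetical helper, pages are assumed to start at 1 (as in the quickstart `params.json`), the skip value is assumed to start at 0 and grow by the page size, and the parameter name `skip` is a placeholder.

``` python
import math


def iteration_params(total, page_size, iterate_by="page",
                     page_number_param="page", count_skip_param="skip"):
    """Yield the paging parameter for each request (hypothetical helper)."""
    n_requests = math.ceil(total / page_size)
    for index in range(n_requests):
        if iterate_by == "page":
            # Assumption: page numbering starts at 1.
            yield {page_number_param: index + 1}
        elif iterate_by == "skip":
            # Assumption: skip is the number of records already passed.
            yield {count_skip_param: index * page_size}
        else:
            raise ValueError(f"unknown iterate_by value: {iterate_by!r}")


by_page = list(iteration_params(502, 100, iterate_by="page"))
by_skip = list(iteration_params(502, 100, iterate_by="skip"))
print(by_page[-1], by_skip[-1])
```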



## data



-   total_number_key - key in the response that holds the total number of records
-   data_key - key in the response that holds the list of records
-   item_key - key in a record that holds its unique identifier. Can be
    a group of keys separated by commas
-   change_key - key in a record that indicates the record has changed. Can
    be a group of keys separated by commas
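
One way to picture how `item_key` and `change_key` could support the incremental and update modes is sketched below. The functions `key_of` and `classify` and the sample records are hypothetical, only top-level keys are handled, and the comparison logic is an assumption about the general idea, not a description of apibackuper's internals.

``` python
def key_of(record, keys):
    """Build a tuple from a comma-separated list of top-level keys."""
    return tuple(record[key.strip()] for key in keys.split(","))


def classify(old_records, new_records, item_key, change_key):
    """Split new_records into added and changed ones (hypothetical helper)."""
    known = {key_of(r, item_key): key_of(r, change_key) for r in old_records}
    added, changed = [], []
    for record in new_records:
        ident = key_of(record, item_key)
        if ident not in known:
            added.append(record)      # candidate for "incremental"
        elif known[ident] != key_of(record, change_key):
            changed.append(record)    # candidate for "update"
    return added, changed


old = [{"id": 1, "modified": "2024-01-01"}, {"id": 2, "modified": "2024-01-01"}]
new = [
    {"id": 1, "modified": "2024-01-01"},
    {"id": 2, "modified": "2024-02-15"},
    {"id": 3, "modified": "2024-02-20"},
]
added, changed = classify(old, new, item_key="id", change_key="modified")
print([r["id"] for r in added], [r["id"] for r in changed])
```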



## follow



-   follow_mode - how objects are followed. Can be 'url' or 'item'.
    If the mode is 'url' then follow_pattern is not used
-   follow_pattern - URL pattern / URL prefix for followed objects. Used
    only in 'item' mode
-   follow_data_key - if the object or objects are inside an array, the
    key of that array
-   follow_param - parameter used in 'item' mode
-   follow_item_key - item key



## files



-   fetch_mode - file fetch mode. Can be 'prefix' or 'id'
-   root_url - root URL / prefix for files
-   keys - list of keys holding URLs or file IDs to search for files to save
-   storage_mode - how files are stored in storage/files.zip. The
    default is 'filepath': files are stored under the same path as in
    their URL
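
As a rough illustration of the 'filepath' storage mode and of a root URL prefix, the sketch below derives an archive member name from a file URL. Both helpers are hypothetical, the `example.org` URLs are placeholders, and the way apibackuper actually combines `root_url` with the values found under `keys` is an assumption here.

``` python
from urllib.parse import urljoin, urlparse


def resolve_file_url(root_url, value):
    """Join a root URL prefix with a relative file reference (assumption)."""
    return urljoin(root_url, value)


def archive_name(file_url):
    """Derive a ZIP member name that mirrors the path part of the URL."""
    return urlparse(file_url).path.lstrip("/")


# Placeholder values, not taken from any real project.
file_url = resolve_file_url("https://example.org/files/", "docs/report.pdf")
member = archive_name(file_url)
print(file_url, member)
```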



## storage



-   storage_type - type of local storage. 'zip', a local ZIP file, is
    the default
-   compression - if True then a compressed ZIP file is used: less space,
    but more CPU time spent processing data
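
The space versus CPU trade-off behind the `compression` option can be seen with Python's `zipfile` module. This sketch assumes the option maps to deflate compression versus plain storage, which is a guess about the tool's behaviour; `write_archive`, the member name `data.jsonl` and the sample records are invented for the example.

``` python
import io
import json
import zipfile


def write_archive(records, compression):
    """Write records into an in-memory ZIP and return its bytes."""
    method = zipfile.ZIP_DEFLATED if compression else zipfile.ZIP_STORED
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        payload = "\n".join(json.dumps(record) for record in records)
        archive.writestr("data.jsonl", payload)
    return buffer.getvalue()


records = [{"id": i, "status": "active"} for i in range(1000)]
compressed = write_archive(records, compression=True)
stored = write_archive(records, compression=False)
print(len(compressed) < len(stored))
```

Repetitive JSON records usually compress well, so the compressed archive is much smaller at the cost of extra processing time.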



# Usage



Synopsis:



``` bash

$ apibackuper [flags] [command] inputfile

```



See also `apibackuper --help`.



## Examples



Create project "budgettofk":



``` bash

$ apibackuper create budgettofk

```



Estimate the execution time for the 'budgettofk' project. The command must
be run from the project directory, or the project directory must be given
with the -p parameter:



``` bash

$ apibackuper estimate full -p budgettofk

```



Output:



``` text

Total records: 12282

Records per request: 500

Total requests: 25

Average record size 1293.60 bytes

Estimated size (json lines) 15.89 MB

Avg request time, seconds 1.8015

Estimated all requests time, seconds 46.0536

```



Run the project. The command must be run from the project directory, or the
project directory must be given with the -p parameter:



``` bash

$ apibackuper run full

```



Export data from the project. The command must be run from the project
directory, or the project directory must be given with the -p parameter:



``` bash

$ apibackuper export jsonl hhemployers.jsonl -p hhemployers

```



Follow each object in the downloaded data and make a request for each
of them:

``` bash
$ apibackuper follow continue
```

Download all files associated with the API objects:

``` bash
$ apibackuper getfiles
```



# Advanced



TBD


            
