pg-seldump


Namepg-seldump JSON
Version 0.3 PyPI version JSON
download
home_pagehttps://github.com/dvarrazzo/pg_seldump
SummaryPostgreSQL selective data dump
upload_time2023-02-06 13:33:19
maintainer
docs_urlNone
authorDaniele Varrazzo
requires_python>= 3.7
licenseBSD
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI
coveralls test coverage No coveralls.
            .. image:: https://travis-ci.org/dvarrazzo/pg_seldump.svg?branch=master
    :target: https://travis-ci.org/dvarrazzo/pg_seldump
    :alt: Build status

This tool allows to extract data from a PostgreSQL_ database with greater
flexibility that tools like pg_dump_ allow.

``pg_seldump`` reads one of more dump definitions from YAML files and selects
what tables or other database objects to save. It is possible to extract only
certain columns of the tables, only certain records, or to replace certain
values with a different expression, for instance to anonymize data.

The output of the program is a text file which can be used by psql_ to
restore data into a database with a complete schema but with no data (or at
least no conflicting data), e.g. using::

    $ pg_seldump --dsn="dbname=sourcedb" datadump.yaml > dump.sql
    ...
    $ psql -1X --set ON_ERROR_STOP=1 -f dump.sql "dbname=targetdb"

.. _PostgreSQL: https://www.postgresql.org/
.. _pg_dump: https://www.postgresql.org/docs/current/app-pgdump.html
.. _psql: https://www.postgresql.org/docs/current/app-psql.html


Program usage
-------------

Usage::

    pg_seldump [-h] [--version] [--dsn DSN] [--outfile OUTFILE] [--test]
               [-q | -v]
               config [config ...]

    Create a selective dump of a PostgreSQL database.

    positional arguments:
      config                yaml file describing the data to dump

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      --dsn DSN             database connection string [default: '']
      --outfile OUTFILE, -o OUTFILE
                            the file where to save the dump [default: stdout]
      --test                test the configuration to verify it works as expected
      -q, --quiet           talk less
      -v, --verbose         talk more

The ``config`` files must be YAML_ files containing a ``db_objects`` list of
entries. Each entry may have:

.. _YAML: https://yaml.org/

Selectors (all the specified ones must match):

- ``name``: name of the db object to dump
- ``names``: list of names or regex of db objects to dump
- ``schema``: schema name of the db object to dump
- ``schemas``: list of schema names or regexp to match schema names of the
  db object to dump
- ``kind``: kind of object to match. Can be:

  - ``table``
  - ``sequence``
  - ``paritioned table``
  - ``materialized view``

- ``kinds``: list of kind of objects to match (like for ``kind``)
- ``adjust_score``: adjustment for the match score to break rules ties

.. note::
    Sequences are selected automatically if they are used in default values by
    dumped tables.

Data modifiers:

- ``action``: what to do with the matched object:

  - ``dump``: dump the object in the output (default)
  - ``skip``: don't dump the object
  - ``error``: raise an error in case of match (useful to create strict
    descriptions where all the db objects must be mentioned explicitly)

- ``no_columns``: list of columns names to omit
- ``filter``: WHERE condition to include only a subset of the records in the dump
- ``replace``: mapping from column names to SQL expressions to replace values
  into the dump with somethings else

The objects in the database are matched to the rules in the config files.
Every match will have a score according to how specific was the selector
matched the object.

- ``name`` or ``names`` list: 1000
- ``names`` regexp: 500
- ``schema`` or ``schemas`` list: 100
- ``schemas`` regexp: 50
- ``kind`` or ``kinds``: 10

The rule with the highest score will apply. If two rules have exactly the same
score the program will report an error: you can use ``adjust_score`` to break
the tie.


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/dvarrazzo/pg_seldump",
    "name": "pg-seldump",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">= 3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Daniele Varrazzo",
    "author_email": "daniele.varrazzo@gmail.com",
    "download_url": "",
    "platform": null,
    "description": ".. image:: https://travis-ci.org/dvarrazzo/pg_seldump.svg?branch=master\n    :target: https://travis-ci.org/dvarrazzo/pg_seldump\n    :alt: Build status\n\nThis tool allows to extract data from a PostgreSQL_ database with greater\nflexibility that tools like pg_dump_ allow.\n\n``pg_seldump`` reads one of more dump definitions from YAML files and selects\nwhat tables or other database objects to save. It is possible to extract only\ncertain columns of the tables, only certain records, or to replace certain\nvalues with a different expression, for instance to anonymize data.\n\nThe output of the program is a text file which can be used by psql_ to\nrestore data into a database with a complete schema but with no data (or at\nleast no conflicting data), e.g. using::\n\n    $ pg_seldump --dsn=\"dbname=sourcedb\" datadump.yaml > dump.sql\n    ...\n    $ psql -1X --set ON_ERROR_STOP=1 -f dump.sql \"dbname=targetdb\"\n\n.. _PostgreSQL: https://www.postgresql.org/\n.. _pg_dump: https://www.postgresql.org/docs/current/app-pgdump.html\n.. _psql: https://www.postgresql.org/docs/current/app-psql.html\n\n\nProgram usage\n-------------\n\nUsage::\n\n    pg_seldump [-h] [--version] [--dsn DSN] [--outfile OUTFILE] [--test]\n               [-q | -v]\n               config [config ...]\n\n    Create a selective dump of a PostgreSQL database.\n\n    positional arguments:\n      config                yaml file describing the data to dump\n\n    optional arguments:\n      -h, --help            show this help message and exit\n      --version             show program's version number and exit\n      --dsn DSN             database connection string [default: '']\n      --outfile OUTFILE, -o OUTFILE\n                            the file where to save the dump [default: stdout]\n      --test                test the configuration to verify it works as expected\n      -q, --quiet           talk less\n      -v, --verbose         talk more\n\nThe ``config`` files must be YAML_ files containing a ``db_objects`` list of\nentries. Each entry may have:\n\n.. _YAML: https://yaml.org/\n\nSelectors (all the specified ones must match):\n\n- ``name``: name of the db object to dump\n- ``names``: list of names or regex of db objects to dump\n- ``schema``: schema name of the db object to dump\n- ``schemas``: list of schema names or regexp to match schema names of the\n  db object to dump\n- ``kind``: kind of object to match. Can be:\n\n  - ``table``\n  - ``sequence``\n  - ``paritioned table``\n  - ``materialized view``\n\n- ``kinds``: list of kind of objects to match (like for ``kind``)\n- ``adjust_score``: adjustment for the match score to break rules ties\n\n.. note::\n    Sequences are selected automatically if they are used in default values by\n    dumped tables.\n\nData modifiers:\n\n- ``action``: what to do with the matched object:\n\n  - ``dump``: dump the object in the output (default)\n  - ``skip``: don't dump the object\n  - ``error``: raise an error in case of match (useful to create strict\n    descriptions where all the db objects must be mentioned explicitly)\n\n- ``no_columns``: list of columns names to omit\n- ``filter``: WHERE condition to include only a subset of the records in the dump\n- ``replace``: mapping from column names to SQL expressions to replace values\n  into the dump with somethings else\n\nThe objects in the database are matched to the rules in the config files.\nEvery match will have a score according to how specific was the selector\nmatched the object.\n\n- ``name`` or ``names`` list: 1000\n- ``names`` regexp: 500\n- ``schema`` or ``schemas`` list: 100\n- ``schemas`` regexp: 50\n- ``kind`` or ``kinds``: 10\n\nThe rule with the highest score will apply. If two rules have exactly the same\nscore the program will report an error: you can use ``adjust_score`` to break\nthe tie.\n\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "PostgreSQL selective data dump",
    "version": "0.3",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5c9d8df4ee7b1852b94dd8df88298f5081c4c67c2f56db9268f8bc171f94db33",
                "md5": "39b87382a54dacb7ca4821cde36bddda",
                "sha256": "ef6520f2646c0d9e64585ad76307fe3a2c648263fc19402e19cac5d61952b24b"
            },
            "downloads": -1,
            "filename": "pg_seldump-0.3-py3-none-any.whl",
            "has_sig": true,
            "md5_digest": "39b87382a54dacb7ca4821cde36bddda",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">= 3.7",
            "size": 28057,
            "upload_time": "2023-02-06T13:33:19",
            "upload_time_iso_8601": "2023-02-06T13:33:19.063221Z",
            "url": "https://files.pythonhosted.org/packages/5c/9d/8df4ee7b1852b94dd8df88298f5081c4c67c2f56db9268f8bc171f94db33/pg_seldump-0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-06 13:33:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "dvarrazzo",
    "github_project": "pg_seldump",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "tox": true,
    "lcname": "pg-seldump"
}
        
Elapsed time: 0.03821s