python-postgres-cdc


Name: python-postgres-cdc
Version: 0.0.0rc2
Summary: Change Data Capture (CDC) library for Postgres
upload_time: 2023-09-24 22:22:34
docs_url: None
requires_python: >=3.7
license: MIT License Copyright (c) [2020] [Daniel Geals] Copyright (c) [2023] [Roman Kutlak] Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords: postgres, cdc, change, data, capture, logical, replication, outbox
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.
# pypgcdc

Change Data Capture (CDC) tool for Postgres

This project is a Python implementation of a Postgres CDC client.
It is intended to be used as a library for building CDC applications.
It is not intended to be used as a standalone application but there is a running example useful as a starting point.

The main problem with CDC and Postgres in practice is that Postgres usually runs with a hot standby used for failover
in case of a primary node failure. The switch is usually hidden behind a DNS record, so ordinary clients just reconnect and run fine.
A replication slot, however, is not copied over to the new primary node, so the CDC client will either
fail to start (no slot) or create a new slot and potentially miss some data (anything written before the slot was created).

You can, of course, create a new slot, do the initial sync where you copy over all the available data,
and then start replicating as usual. The drawback is that the initial sync can take a long time if you have big tables.

This library doesn't really solve the problem, but it provides a way to react to the failover event. The idea is
that you add triggers to the tables you want to replicate (_published tables_) and store the inserts/updates/deletes
in separate tables (_log tables_).
The initial sync when you start your CDC app can just `select *` from the published tables and then tail the changes
on these tables. You will need some persistence to record the commits you have already processed, so that when a failover
event occurs you can use the last processed commit to select any missed changes from the log tables and then carry on
tailing the published tables. This way you can avoid a second initial sync and catch up with any pending changes faster.
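
The trigger and log-table idea can be sketched as follows. This is only an illustration, not the library's own schema or API: SQLite stands in for Postgres so the snippet runs anywhere, the names `users`, `users_log`, and `changes_since` are hypothetical, and a simple auto-incrementing `change_id` plays the role of the "last processed commit".

```python
import sqlite3

# SQLite stands in for Postgres here so the sketch runs anywhere.
# All table, column, and trigger names are hypothetical.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);   -- published table
CREATE TABLE users_log (                                  -- log table
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    name TEXT
);
CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO users_log (op, row_id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO users_log (op, row_id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO users_log (op, row_id, name) VALUES ('D', OLD.id, OLD.name);
END;
"""


def changes_since(conn, last_change_id):
    """Return log rows recorded after the last change the consumer processed."""
    return conn.execute(
        "SELECT change_id, op, row_id, name FROM users_log "
        "WHERE change_id > ? ORDER BY change_id",
        (last_change_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The consumer processes the first change and persists its position.
conn.execute("INSERT INTO users (id, name) VALUES (1, 'ada')")
last_processed = changes_since(conn, 0)[-1][0]

# Changes made while the consumer is disconnected (e.g. during a failover).
conn.execute("UPDATE users SET name = 'ada l.' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

# After reconnecting, catch up from the log table instead of re-syncing.
pending = changes_since(conn, last_processed)
```

In a real Postgres setup the triggers would be written in PL/pgSQL and the stored position would more likely be a commit identifier or LSN, but the catch-up query has the same shape: select everything in the log table past the last processed position.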


## Env Vars

* `PYPGCDC_DSN`, default "postgres://postgres:postgrespw@localhost:5432/test" -- Postgres connection string
* `PYPGCDC_SLOT`, default "test_slot" -- Postgres replication slot name
* `PYPGCDC_PUBLICATION`, default "test_publication" -- Postgres publication name
* `PYPGCDC_LSN`, default 0 -- Postgres LSN to start from
* `PYPGCDC_VERBOSE`, default "False" -- A flag used to control print output of the example datastore.
  Use one of ("1", "true", "yes") to enable more verbose output.
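
As a rough illustration of how an application might read these variables, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical and not part of the library; the defaults are the ones listed above. Treating the verbose flag case-insensitively and keeping the LSN as a raw string are assumptions.

```python
import os
from dataclasses import dataclass

# Values that enable verbose output, as listed above.
TRUTHY = ("1", "true", "yes")


@dataclass(frozen=True)
class Settings:  # hypothetical container, not part of the library
    dsn: str
    slot: str
    publication: str
    lsn: str
    verbose: bool


def load_settings(env=None):
    """Read the PYPGCDC_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        dsn=env.get("PYPGCDC_DSN", "postgres://postgres:postgrespw@localhost:5432/test"),
        slot=env.get("PYPGCDC_SLOT", "test_slot"),
        publication=env.get("PYPGCDC_PUBLICATION", "test_publication"),
        # Kept as a string; how the library parses the LSN is not shown here.
        lsn=env.get("PYPGCDC_LSN", "0"),
        # Assumption: the flag is compared case-insensitively.
        verbose=env.get("PYPGCDC_VERBOSE", "False").strip().lower() in TRUTHY,
    )


# A plain dict is passed instead of os.environ to keep the example deterministic.
settings = load_settings({"PYPGCDC_SLOT": "orders_slot", "PYPGCDC_VERBOSE": "yes"})
```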


## Example

The library comes with an example which can be used to see how it works. The example requires a running
Postgres database with some tables and an existing publication. It creates a replication slot
if one doesn't exist, starts tailing the changes, and prints them to stdout.

Once you finish with the example, remember to drop the replication slot. Leaving an unused replication slot
is dangerous as the WAL files used for replication might not be removed, and you can run out of disk space
(not an issue on your local computer but quite a problem on your production servers...).
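
One way to clean up is sketched below. The helper `drop_slot_if_inactive` is hypothetical and not part of the library; it relies only on the standard Postgres `pg_replication_slots` view and the `pg_drop_replication_slot` function, and it assumes a DB-API cursor that uses `%s` placeholders (as psycopg2 does).

```python
def drop_slot_if_inactive(cursor, slot_name):
    """Drop a replication slot unless a consumer is still attached to it.

    `cursor` is any DB-API cursor connected to Postgres (for example one
    from psycopg2). Returns True if the slot was dropped.
    """
    cursor.execute(
        "SELECT active FROM pg_replication_slots WHERE slot_name = %s",
        (slot_name,),
    )
    row = cursor.fetchone()
    if row is None or row[0]:
        # The slot does not exist, or a consumer is still connected to it.
        return False
    cursor.execute("SELECT pg_drop_replication_slot(%s)", (slot_name,))
    return True
```

With psycopg2 this could be called as `drop_slot_if_inactive(conn.cursor(), "test_slot")` once the example has been stopped; Postgres refuses to drop a slot that is still in use, which is why the helper checks `active` first.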

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "python-postgres-cdc",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "postgres,CDC,change,data,capture,logical,replication,outbox",
    "author": "",
    "author_email": "Roman Kutlak <roman@kutlak.net>",
    "download_url": "https://files.pythonhosted.org/packages/ac/90/0c79e44c7a2707964ea74e81330d471238844728628b018534bf882fed42/python-postgres-cdc-0.0.0rc2.tar.gz",
    "platform": null,
    "description": "# pypgcdc\n\nChange Data Capture (CDC) tool for Postgres\n\nThis project is a Python implementation of a Postgres CDC client.\nIt is intended to be used as a library for building CDC applications.\nIt is not intended to be used as a standalone application but there is a running example useful as a starting point.\n\nThe main problem with CDC and Postgres in practice is that Postgres runs with a hot stand-by used for failover\nin case of a primary node failure. The change is usually behind a DNS record so clients just re-connect and run fine.\nThe problem is that the replication slot is not copied over to the new primary node and the CDC client will either\nfail to start (no slot) or create a slot but potentially miss some data (slot created after some data was written).\n\nYou can, of course, create a new slot, do the initial sync where you copy over all the available data, \nand then start replicating as usual. The problem is that the initial sync can take a long time if you have big tables.\n\nThis library doesn't really solve the problem, but it provides a way to react to the failover event. The idea is\nthat you add triggers to the tables you want to replicate (_published tables_) and store the inserts/updates/deletes\nin separate tables (_log tables_).\nThe initial sync when you start your CDC app can just `select *` from the published tables and then tail the changes\non these tables. You will need some persistence to store the commits you have already processed so when a failover\nevent occurs, you can use the last processed commit to select any changes from the log tables and then carry on\ntailing the published tables. 
This way you can avoid the initial sync and catch up with any pending changes faster.\n\n\n## Env Vars\n\n* PYPGCDC_DSN, default \"postgres://postgres:postgrespw@localhost:5432/test\" -- Postgres connection string\n* PYPGCDC_SLOT, default \"test_slot\" -- Postgres replication slot name\n* PYPGCDC_PUBLICATION, default \"test_publication\" -- Postgres publication name\n* PYPGCDC_LSN, default 0 -- Postgres LSN to start from\n* PYPGCDC_VERBOSE, default \"False\" -- A flag used to control print output of the example datastore. \n  Use one of (\"1\", \"true\", \"yes\") to enable more verbose output.\n\n\n## Example\n\nThe library comes with an example which can be used to see how it works. The example requires a running \nPostgres database with some tables and an existing publication. The example will create a replication slot\nif it doesn't exist and start tailing the changes. The example will print the changes to stdout.\n\nOnce you finish with the example, remember to drop the replication slot. Leaving an unused replication slot\nis dangerous as the WAL files used for replication might not be removed, and you can run out of disk space\n(not an issue on your local computer but quite a problem on your production servers...).\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) [2020] [Daniel Geals] Copyright (c) [2023] [Roman Kutlak]  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Change Data Capture (CDC) library for Postgres",
    "version": "0.0.0rc2",
    "project_urls": {
        "Homepage": "https://github.com/roman-kutlak/pypgcdc"
    },
    "split_keywords": [
        "postgres",
        "cdc",
        "change",
        "data",
        "capture",
        "logical",
        "replication",
        "outbox"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "54e5d688d5da2f80d35939fe8ba251d7d217c37669cca9b0e5f69d7a25cfc10d",
                "md5": "9f70a831513c6a832183b4547aa39b2a",
                "sha256": "6bba92112459e7f53abc2ee303ac7c6d6cdb3bf88bf1d4d23d8c177425764be0"
            },
            "downloads": -1,
            "filename": "python_postgres_cdc-0.0.0rc2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9f70a831513c6a832183b4547aa39b2a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 20587,
            "upload_time": "2023-09-24T22:22:32",
            "upload_time_iso_8601": "2023-09-24T22:22:32.233003Z",
            "url": "https://files.pythonhosted.org/packages/54/e5/d688d5da2f80d35939fe8ba251d7d217c37669cca9b0e5f69d7a25cfc10d/python_postgres_cdc-0.0.0rc2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ac900c79e44c7a2707964ea74e81330d471238844728628b018534bf882fed42",
                "md5": "48f9c3b62784f70799306c0fcf95c155",
                "sha256": "64de0862c4a253c6acf5fecfb4089aecc833c5399ce71cf5ac3d4c582482d129"
            },
            "downloads": -1,
            "filename": "python-postgres-cdc-0.0.0rc2.tar.gz",
            "has_sig": false,
            "md5_digest": "48f9c3b62784f70799306c0fcf95c155",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 18886,
            "upload_time": "2023-09-24T22:22:34",
            "upload_time_iso_8601": "2023-09-24T22:22:34.175292Z",
            "url": "https://files.pythonhosted.org/packages/ac/90/0c79e44c7a2707964ea74e81330d471238844728628b018534bf882fed42/python-postgres-cdc-0.0.0rc2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-24 22:22:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "roman-kutlak",
    "github_project": "pypgcdc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "python-postgres-cdc"
}
        