# sqlite-chronicle

[![PyPI](https://img.shields.io/pypi/v/sqlite-chronicle.svg)](https://pypi.org/project/sqlite-chronicle/)
[![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-chronicle?include_prereleases&label=changelog)](https://github.com/simonw/sqlite-chronicle/releases)
[![Tests](https://github.com/simonw/sqlite-chronicle/workflows/Test/badge.svg)](https://github.com/simonw/sqlite-chronicle/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sqlite-chronicle/blob/main/LICENSE)

Use triggers to track when rows in a SQLite table were updated or deleted

## Installation

```bash
pip install sqlite-chronicle
```

## enable_chronicle(conn, table_name)

This module provides a function: `sqlite_chronicle.enable_chronicle(conn, table_name)`, which does the following:

1. Checks if a `_chronicle_{table_name}` table exists already. If so, it does nothing. Otherwise...
2. Creates that table, with the same primary key columns as the original table plus integer columns `added_ms`, `updated_ms`, `version` and `deleted`
3. Creates a row in the chronicle table for every row in the original table, setting `added_ms` and `updated_ms` to the current timestamp in milliseconds and a `version` that starts at 1 and increments for each subsequent row
4. Sets up three triggers on the table:
  - An after insert trigger, which creates a new row in the chronicle table, sets `added_ms` and `updated_ms` to the current time, and increments the `version`
  - An after update trigger, which updates the `updated_ms` timestamp, updates any primary keys if they have changed (likely extremely rare), and increments the `version`
  - An after delete trigger, which updates `updated_ms`, increments the `version`, and sets the `deleted` column to `1`

The function will raise a `sqlite_chronicle.ChronicleError` exception if the table does not have a single or compound primary key.

Note that `version` is a globally incrementing number for the table: each time it is set, it becomes the current `max(version)` + 1 across the entire table.
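To illustrate the general mechanism (a hand-rolled sketch of the idea, not the SQL that `enable_chronicle()` itself generates), an after-insert trigger maintaining a chronicle table for a hypothetical `people` table might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);

    CREATE TABLE _chronicle_people (
        id INTEGER PRIMARY KEY,
        added_ms INTEGER,
        updated_ms INTEGER,
        version INTEGER,
        deleted INTEGER DEFAULT 0
    );

    -- Illustrative after-insert trigger: record the new row with a Unix
    -- timestamp (second precision here, for simplicity) and the next
    -- global version number for the table
    CREATE TRIGGER people_chronicle_insert AFTER INSERT ON people
    BEGIN
        INSERT INTO _chronicle_people (id, added_ms, updated_ms, version, deleted)
        VALUES (
            new.id,
            CAST(strftime('%s', 'now') AS INTEGER) * 1000,
            CAST(strftime('%s', 'now') AS INTEGER) * 1000,
            COALESCE((SELECT MAX(version) FROM _chronicle_people), 0) + 1,
            0
        );
    END;
    """
)
```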

The end result is a chronicle table that looks something like this:

|  id | added_ms      | updated_ms    | version | deleted |
|-----|---------------|---------------|---------|---------|
|  47 | 1694408890954 | 1694408890954 |       2 |       0 |
|  48 | 1694408874863 | 1694408874863 |       3 |       1 |
|   1 | 1694408825192 | 1694408825192 |       4 |       0 |
|   2 | 1694408825192 | 1694408825192 |       5 |       0 |
|   3 | 1694408825192 | 1694408825192 |       6 |       0 |
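
A minimal usage sketch, assuming an in-memory database and a hypothetical `people` table:

```python
import sqlite3
import sqlite_chronicle

conn = sqlite3.connect(":memory:")
conn.execute("create table people (id integer primary key, name text)")
conn.execute("insert into people (name) values ('Simon')")

# Creates _chronicle_people and its triggers (does nothing if it already exists)
sqlite_chronicle.enable_chronicle(conn, "people")

# Rows inserted from now on are recorded by the triggers
conn.execute("insert into people (name) values ('Cleo')")

for row in conn.execute("select * from _chronicle_people"):
    print(row)
```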

## updates_since(conn, table_name, since=None, batch_size=1000)

The `sqlite_chronicle.updates_since()` function returns a generator that yields `Change` objects.

These objects represent changes that have occurred to rows in the table since the `since` version number, or since the beginning of time if `since` is not provided.

- `conn` is a SQLite connection object
- `table_name` is a string containing the name of the table to get changes for
- `since` is an optional integer version number - if not provided, all changes will be returned
- `batch_size` is an internal detail controlling how many rows are fetched from the database at a time. You should not need to change this, as the function handles its own pagination.

Each `Change` returned from the generator looks something like this:

```python
Change(
    pks=(5,),
    added_ms=1701836971223,
    updated_ms=1701836971223,
    version=5,
    row={'id': 5, 'name': 'Simon'},
    deleted=0
)
```
A `Change` is a dataclass with the following properties:

- `pks` is a tuple of the primary key values for the row - this will be a tuple with a single item for normal primary keys, or multiple items for compound primary keys
- `added_ms` is the timestamp in milliseconds when the row was added
- `updated_ms` is the timestamp in milliseconds when the row was last updated
- `version` is the version number for the row - you can use this as a `since` value to get changes since that point
- `row` is a dictionary containing the current values for the row - these will be `None` if the row has been deleted (except for the primary keys)
- `deleted` is `0` if the row has not been deleted, or `1` if it has been deleted

Any time you call this you should track the last `version` number that you see, so you can pass it as the `since` value in future calls to get changes that occurred since that point.

Note that if a row had multiple updates between calls to this function you will still only see one `Change` object for that row - its `updated_ms` and `version` will reflect the most recent update.
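
A sketch of that tracking pattern, assuming a table that already has a chronicle table enabled:

```python
import sqlite_chronicle

def poll_changes(conn, table_name, last_version=None):
    """Report changes since last_version and return the latest version seen."""
    for change in sqlite_chronicle.updates_since(conn, table_name, since=last_version):
        if change.deleted:
            print("deleted:", change.pks)
        else:
            print("inserted or updated:", change.row)
        # Remember the last version seen, to use as `since` next time
        last_version = change.version
    return last_version
```

Each call returns the latest `version` it saw, which can be stored and passed back in as `since` on the next call.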

## Potential applications

Chronicle tables can be used to efficiently answer the question "what rows have been inserted, updated or deleted since I last checked?" - the `version` column is indexed, so answering that question is fast.

This has numerous potential applications, including:

- Synchronization and replication: other databases can "subscribe" to tables, keeping track of when they last refreshed their copy and requesting just rows that changed since the last time - and deleting rows that have been marked as deleted.
- Indexing: if you need to update an Elasticsearch index, a vector database embeddings index or similar, you can run against just the records that changed since your last run - see also [The denormalized query engine design pattern](https://2017.djangocon.us/talks/the-denormalized-query-engine-design-pattern/)
- Enrichments: [datasette-enrichments](https://github.com/datasette/datasette-enrichments) needs to persist something that says "every address column should be geocoded" - then have an enrichment that runs every X seconds and looks for newly inserted or updated rows and enriches just those.
- Showing people what has changed since their last visit - "52 rows have been updated and 16 deleted since yesterday" kind of thing (see the sketch after this list).
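
For example, a rough sketch of that last idea, assuming a chronicle-enabled table and a `since` value saved from the visitor's previous visit:

```python
import sqlite_chronicle

def summarize_changes(conn, table_name, since):
    # Count rows updated vs deleted since a previously stored version
    updated = deleted = 0
    for change in sqlite_chronicle.updates_since(conn, table_name, since=since):
        if change.deleted:
            deleted += 1
        else:
            updated += 1
    return f"{updated} rows have been updated and {deleted} deleted"
```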

            
