Name | sqlite-chronicle JSON |
Version |
0.2.1
JSON |
| download |
home_page | |
Summary | Use triggers to maintain a chronicle table of updated/deleted timestamps in SQLite |
upload_time | 2023-12-06 05:21:20 |
maintainer | |
docs_url | None |
author | Simon Willison |
requires_python | |
license | Apache-2.0 |
keywords |
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# sqlite-chronicle
[![PyPI](https://img.shields.io/pypi/v/sqlite-chronicle.svg)](https://pypi.org/project/sqlite-chronicle/)
[![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-chronicle?include_prereleases&label=changelog)](https://github.com/simonw/sqlite-chronicle/releases)
[![Tests](https://github.com/simonw/sqlite-chronicle/workflows/Test/badge.svg)](https://github.com/simonw/sqlite-chronicle/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sqlite-chronicle/blob/main/LICENSE)
Use triggers to track when rows in a SQLite table were updated or deleted
## Installation
```bash
pip install sqlite-chronicle
```
## enable_chronicle(conn, table_name)
This module provides a function: `sqlite_chronicle.enable_chronicle(conn, table_name)`, which does the following:
1. Checks if a `_chronicle_{table_name}` table exists already. If so, it does nothing. Otherwise...
2. Creates that table, with the same primary key columns as the original table plus integer columns `added_ms`, `updated_ms`, `version` and `deleted`
3. Creates a new row in the chronicle table corresponding to every row in the original table, setting `added_ms` and `updated_ms` to the current timestamp in milliseconds, and `version` column that starts at 1 and increments for each subsequent row
4. Sets up three triggers on the table:
- An after insert trigger, which creates a new row in the chronicle table, sets `added_ms` and `updated_ms` to the current time and increments the `version`
- An after update trigger, which updates the `updated_ms` timestamp and also updates any primary keys if they have changed (likely extremely rare) plus increments the `version`
- An after delete trigger, which updates the `updated_ms`, increments the `version` and places a `1` in the `deleted` column
The function will raise a `sqlite_chronicle.ChronicleError` exception if the table does not have a single or compound primary key.
Note that the `version` for a table is a globally incrementing number, so every time it is set it will be set to the current `max(version)` + 1 for that entire table.
The end result is a chronicle table that looks something like this:
| id | added_ms | updated_ms | version | deleted |
|-----|---------------|---------|--------|---------|
| 47 | 1694408890954 | 1694408890954 | 2 | 0 |
| 48 | 1694408874863 | 1694408874863 | 3 | 1 |
| 1 | 1694408825192 | 1694408825192 | 4 | 0 |
| 2 | 1694408825192 | 1694408825192 | 5 | 0 |
| 3 | 1694408825192 | 1694408825192 | 6 | 0 |
## updates_since(conn, table_name, since=None, batch_size=1000)
The `sqlite_chronicle.updates_since()` function returns a generator over a list of `Change` objects.
These objects represent changes that have occurred to rows in the table since the `since` version number, or since the beginning of time if `since` is not provided.
- `conn` is a SQLite connection object
- `table_name` is a string containing the name of the table to get changes for
- `since` is an optional integer version number - if not provided, all changes will be returned
- `batch_size` is an internal detail, controlling the number of rows that are returned from the database at a time. You should not need to change this as the function implements its own internal pagination.
Each `Change` returned from the generator looks something like this:
```python
Change(
pks=(5,),
added_ms=1701836971223,
updated_ms=1701836971223,
version=5,
row={'id': 5, 'name': 'Simon'},
deleted=0
)
```
A `Change` is a dataclass with the following properties:
- `pks` is a tuple of the primary key values for the row - this will be a tuple with a single item for normal primary keys, or multiple items for compound primary keys
- `added_ms` is the timestamp in milliseconds when the row was added
- `updated_ms` is the timestamp in milliseconds when the row was last updated
- `version` is the version number for the row - you can use this as a `since` value to get changes since that point
- `row` is a dictionary containing the current values for the row - these will be `None` if the row has been deleted (except for the primary keys)
- `deleted` is `0` if the row has not been deleted, or `1` if it has been deleted
Any time you call this you should track the last `version` number that you see, so you can pass it as the `since` value in future calls to get changes that occurred since that point.
Note that if a row had multiple updates in between calls to this function you will still only see one `Change` object for that row - the `updated_ms` and `version` will reflect the most recent update.
## Potential applications
Chronicle tables can be used to efficiently answer the question "what rows have been inserted, updated or deleted since I last checked" - by looking at the `version` column which has an index to make it fast to answer that question.
This has numerous potential applications, including:
- Synchronization and replication: other databases can "subscribe" to tables, keeping track of when they last refreshed their copy and requesting just rows that changed since the last time - and deleting rows that have been marked as deleted.
- Indexing: if you need to update an Elasticsearch index or a vector database embeddings index or similar you can run against just the records that changed since your last run - see also [The denormalized query engine design pattern](https://2017.djangocon.us/talks/the-denormalized-query-engine-design-pattern/)
- Enrichments: [datasette-enrichments](https://github.com/datasette/datasette-enrichments) needs to to persist something that says "every address column should be geocoded" - then have an enrichment that runs every X seconds and looks for newly inserted or updated rows and enriches just those.
- Showing people what has changed since their last visit - "52 rows have been updated and 16 deleted since yesterday" kind of thing.
Raw data
{
"_id": null,
"home_page": "",
"name": "sqlite-chronicle",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Simon Willison",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/22/d4/488f162900b8e74cb35233de13f09ea721e648fb730084b4f014426c7c47/sqlite-chronicle-0.2.1.tar.gz",
"platform": null,
"description": "# sqlite-chronicle\n\n[![PyPI](https://img.shields.io/pypi/v/sqlite-chronicle.svg)](https://pypi.org/project/sqlite-chronicle/)\n[![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-chronicle?include_prereleases&label=changelog)](https://github.com/simonw/sqlite-chronicle/releases)\n[![Tests](https://github.com/simonw/sqlite-chronicle/workflows/Test/badge.svg)](https://github.com/simonw/sqlite-chronicle/actions?query=workflow%3ATest)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sqlite-chronicle/blob/main/LICENSE)\n\nUse triggers to track when rows in a SQLite table were updated or deleted\n\n## Installation\n\n```bash\npip install sqlite-chronicle\n```\n\n## enable_chronicle(conn, table_name)\n\nThis module provides a function: `sqlite_chronicle.enable_chronicle(conn, table_name)`, which does the following:\n\n1. Checks if a `_chronicle_{table_name}` table exists already. If so, it does nothing. Otherwise...\n2. Creates that table, with the same primary key columns as the original table plus integer columns `added_ms`, `updated_ms`, `version` and `deleted`\n3. Creates a new row in the chronicle table corresponding to every row in the original table, setting `added_ms` and `updated_ms` to the current timestamp in milliseconds, and `version` column that starts at 1 and increments for each subsequent row\n4. Sets up three triggers on the table:\n - An after insert trigger, which creates a new row in the chronicle table, sets `added_ms` and `updated_ms` to the current time and increments the `version`\n - An after update trigger, which updates the `updated_ms` timestamp and also updates any primary keys if they have changed (likely extremely rare) plus increments the `version`\n - An after delete trigger, which updates the `updated_ms`, increments the `version` and places a `1` in the `deleted` column\n\nThe function will raise a `sqlite_chronicle.ChronicleError` exception if the table does not have a single or compound primary key.\n\nNote that the `version` for a table is a globally incrementing number, so every time it is set it will be set to the current `max(version)` + 1 for that entire table.\n\nThe end result is a chronicle table that looks something like this:\n\n| id | added_ms | updated_ms | version | deleted |\n|-----|---------------|---------|--------|---------|\n| 47 | 1694408890954 | 1694408890954 | 2 | 0 |\n| 48 | 1694408874863 | 1694408874863 | 3 | 1 |\n| 1 | 1694408825192 | 1694408825192 | 4 | 0 |\n| 2 | 1694408825192 | 1694408825192 | 5 | 0 |\n| 3 | 1694408825192 | 1694408825192 | 6 | 0 |\n\n## updates_since(conn, table_name, since=None, batch_size=1000)\n\nThe `sqlite_chronicle.updates_since()` function returns a generator over a list of `Change` objects.\n\nThese objects represent changes that have occurred to rows in the table since the `since` version number, or since the beginning of time if `since` is not provided.\n\n- `conn` is a SQLite connection object\n- `table_name` is a string containing the name of the table to get changes for\n- `since` is an optional integer version number - if not provided, all changes will be returned\n- `batch_size` is an internal detail, controlling the number of rows that are returned from the database at a time. You should not need to change this as the function implements its own internal pagination.\n\nEach `Change` returned from the generator looks something like this:\n\n```python\nChange(\n pks=(5,),\n added_ms=1701836971223,\n updated_ms=1701836971223,\n version=5,\n row={'id': 5, 'name': 'Simon'},\n deleted=0\n)\n```\nA `Change` is a dataclass with the following properties:\n\n- `pks` is a tuple of the primary key values for the row - this will be a tuple with a single item for normal primary keys, or multiple items for compound primary keys\n- `added_ms` is the timestamp in milliseconds when the row was added\n- `updated_ms` is the timestamp in milliseconds when the row was last updated\n- `version` is the version number for the row - you can use this as a `since` value to get changes since that point\n- `row` is a dictionary containing the current values for the row - these will be `None` if the row has been deleted (except for the primary keys)\n- `deleted` is `0` if the row has not been deleted, or `1` if it has been deleted\n\nAny time you call this you should track the last `version` number that you see, so you can pass it as the `since` value in future calls to get changes that occurred since that point.\n\nNote that if a row had multiple updates in between calls to this function you will still only see one `Change` object for that row - the `updated_ms` and `version` will reflect the most recent update.\n\n## Potential applications\n\nChronicle tables can be used to efficiently answer the question \"what rows have been inserted, updated or deleted since I last checked\" - by looking at the `version` column which has an index to make it fast to answer that question.\n\nThis has numerous potential applications, including:\n\n- Synchronization and replication: other databases can \"subscribe\" to tables, keeping track of when they last refreshed their copy and requesting just rows that changed since the last time - and deleting rows that have been marked as deleted.\n- Indexing: if you need to update an Elasticsearch index or a vector database embeddings index or similar you can run against just the records that changed since your last run - see also [The denormalized query engine design pattern](https://2017.djangocon.us/talks/the-denormalized-query-engine-design-pattern/)\n- Enrichments: [datasette-enrichments](https://github.com/datasette/datasette-enrichments) needs to to persist something that says \"every address column should be geocoded\" - then have an enrichment that runs every X seconds and looks for newly inserted or updated rows and enriches just those.\n- Showing people what has changed since their last visit - \"52 rows have been updated and 16 deleted since yesterday\" kind of thing.\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Use triggers to maintain a chronicle table of updated/deleted timestamps in SQLite",
"version": "0.2.1",
"project_urls": {
"CI": "https://github.com/simonw/sqlite-chronicle/actions",
"Changelog": "https://github.com/simonw/sqlite-chronicle/releases",
"Homepage": "https://github.com/simonw/sqlite-chronicle",
"Issues": "https://github.com/simonw/sqlite-chronicle/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0b90e58d5ea301c22cec50193444e2b35dfc9b930311a7dcbcaf55885b8af764",
"md5": "e32f6f65368c5425defd43555b2ec846",
"sha256": "94253e1c92ac2b9c73fdd6ff0afd3bf7295c90318dd64fa256695db1f348bd62"
},
"downloads": -1,
"filename": "sqlite_chronicle-0.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e32f6f65368c5425defd43555b2ec846",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 5516,
"upload_time": "2023-12-06T05:21:18",
"upload_time_iso_8601": "2023-12-06T05:21:18.001618Z",
"url": "https://files.pythonhosted.org/packages/0b/90/e58d5ea301c22cec50193444e2b35dfc9b930311a7dcbcaf55885b8af764/sqlite_chronicle-0.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "22d4488f162900b8e74cb35233de13f09ea721e648fb730084b4f014426c7c47",
"md5": "87ace732e205ce6eeba5ba5c388edbee",
"sha256": "2661610ea9c7541932af4039a9b65fa100ea864ce701e8748073afac3afda8a0"
},
"downloads": -1,
"filename": "sqlite-chronicle-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "87ace732e205ce6eeba5ba5c388edbee",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7156,
"upload_time": "2023-12-06T05:21:20",
"upload_time_iso_8601": "2023-12-06T05:21:20.266357Z",
"url": "https://files.pythonhosted.org/packages/22/d4/488f162900b8e74cb35233de13f09ea721e648fb730084b4f014426c7c47/sqlite-chronicle-0.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-06 05:21:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simonw",
"github_project": "sqlite-chronicle",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "sqlite-chronicle"
}