# pg-database-utils
A suite of utilities for PostgreSQL database queries and operations built on sqlalchemy.
This library includes support for:
1. `TSVECTOR`, `JSON` and `JSONB` indexes (for PostgreSQL versions 9.5+)
2. Generated columns (for PostgreSQL versions 12+)
3. Optional Django database configuration for Django projects
It also includes:
1. Helpers to make the most common DDL queries more readable
2. Performant functions for querying JSON and TSVECTOR columns
3. Support for `SELECT INTO` queries from existing tables and/or `VALUES` clauses
4. Support for `UPDATE` queries that require application logic
## Installation
Install with:
```bash
pip install pg-database-utils
```
## Configuration
This project is designed to make configuration easy.
If you already have database connections defined in Django, then you can reuse them;
otherwise, you can configure your own without having Django as a dependency.
### To configure with Django
If you want to use the "default" database, **no configuration is required**.
If you want to specify a particular Django database to read settings from:
1. Create a JSON configuration file with the database name:
```python
{
    "django-db-key": "other",                # To override DATABASES["default"]
    "connect-args": {"sslmode": "require"}   # To override DATABASES["<db_name>"]["OPTIONS"]
}
```
2. Set the `DATABASE_CONFIG_JSON` environment variable to point to the location of the file
**Note**: "django-db-key" takes precedence over all other database connection settings in the JSON file.
If you specify a Django database, those database connection settings will be used.
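For reference, the value of "django-db-key" corresponds to a key in Django's `DATABASES` setting. A minimal sketch of a matching settings.py (the database names and credentials here are purely illustrative):
```python
# settings.py -- the "other" entry referenced by "django-db-key" above
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "primary_db",
    },
    "other": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "reporting_db",
        "HOST": "127.0.0.1",
        "PORT": "5432",
        "USER": "postgres",
        "OPTIONS": {"sslmode": "prefer"},
    },
}
```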
### To configure without Django
1. Create a JSON configuration file with at least the required settings (i.e. `database-name`):
```python
{
    "database-name": "required",     # Name of the database to query
    "database-engine": "optional",   # Defaults to postgres
    "database-host": "optional",     # Defaults to 127.0.0.1
    "database-port": "optional",     # Defaults to 5432
    "database-user": "optional",     # Defaults to postgres
    "database-password": "optional"  # Not required for trusted users like postgres
}
```
2. Set the `DATABASE_CONFIG_JSON` environment variable to point to the location of the file
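With or without Django, the library locates the file via the `DATABASE_CONFIG_JSON` environment variable. A minimal sketch of wiring that up from Python, assuming the variable is set before `pg_database` first reads its settings (the file name here is hypothetical):
```python
import json
import os

# Hypothetical file name; any path readable by the process works
config_path = os.path.abspath("pg_database.json")
with open(config_path, "w") as f:
    json.dump({"database-name": "my_db"}, f)

# Must be set before pg_database loads its configuration
os.environ["DATABASE_CONFIG_JSON"] = config_path
```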
### Additional configuration with or without Django
Additional configuration options include:
```python
{
    "connect-args": {"sslmode": "require"},   # Defaults to postgres settings (sslmode defaults to "prefer")
    "date-format": "optional",                # Defaults to "%Y-%m-%d"
    "timestamp-format": "optional",           # Defaults to "%Y-%m-%d %H:%M:%S"
    "pooling-args": {                         # To override sqlalchemy pooling config
        "max_overflow": 0,                    # Defaults to 10 connections beyond pool size
        "pool_recycle": 60,                   # Defaults to -1 (no timeout), in seconds
        "pool_size": 20,                      # Defaults to 5 connections
        "pool_timeout": 30                    # Defaults to 30 seconds
    }
}
```
**Note**: "date-format" and "timestamp-format" must be compatible with the formatting configured in PostgreSQL.
## Usage
This library is designed to make common database operations easy and readable,
so most of the utility functions accept either strings or `sqlalchemy` objects as parameters.
### Schema utilities
* Creating and relating tables
```python
from pg_database import schema

my_table = schema.create_table(
    "my_table",
    dropfirst=True,
    index_cols={"id": "unique"},
    id="int", name="int", addr="text", geom="bytea", deleted="bool"
)
schema.create_index(my_table, "name", index_op="unique")

schema.create_table("other_table", id="int", my_table_id="int", val="text")
schema.create_foreign_key("other_table", "my_table_id", "my_table.id")
```
* Altering tables
```python
from pg_database import schema

schema.alter_column_type("my_table", "name", "text")
schema.create_index("my_table", "name", index_op="to_tsvector")

schema.create_column("my_table", "json_col", "jsonb", checkfirst=True)
schema.create_index("my_table", "json_col", index_op="json_full")

# These steps require the postgis extension
schema.alter_column_type("my_table", "geom", "geometry", using="geom::geometry(Polygon,4326)")
schema.create_index("my_table", "geom", index_op="spatial")
```
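Note that the spatial steps assume the postgis extension is already installed in the target database. If it is not, a role with sufficient privileges can enable it first; a sketch using this library's engine (the `CREATE EXTENSION` statement is standard PostgreSQL, not an API of this library):
```python
from sqlalchemy import text
from pg_database import schema

# Requires a role allowed to create extensions (often a superuser)
with schema.get_engine().connect() as conn:
    conn.execute(text("CREATE EXTENSION IF NOT EXISTS postgis"))
```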
* Dropping database objects
```python
from pg_database import schema

all_tables = schema.get_metadata().tables
other_table = all_tables["other_table"]

schema.drop_foreign_key(other_table, "other_table_my_table_id_fkey")
schema.drop_index("my_table", index_name="my_table_json_col_json_full_idx")
schema.drop_table("my_table")
schema.drop_table(other_table)
```
### SQL utilities
* Inserting rows
```python
from datetime import datetime
from pg_database import sql

create_date = datetime.now()

sql.select_into(
    "new_table",
    [
        (1, "one", {}, create_date),
        (2, "two", {}, create_date),
        (3, "three", {}, create_date)
    ],
    "id,val,json,created",
    "int,text,jsonb,date"
)
```
* Updating rows
```python
from datetime import timedelta
from pg_database import sql

# Each row arrives as (id, val, created, json); return the updated values
def update_row(row):
    row = list(row)
    pk, val, created, jval = row[0], row[1], row[2], row[3]
    row[1] = f"{pk} {val} first batch"
    row[2] = created + timedelta(days=1)
    row[3] = {"id": pk, "val": val, "batch": "first"}
    return row

sql.update_rows("new_table", "id", "val,created,json", update_row, batch_size=3)
```
* Querying rows
```python
from pg_database import sql, schema

# Reduce database queries by sending a sqlalchemy table
all_tables = schema.get_metadata().tables
new_table = all_tables["new_table"]

schema.create_index(new_table, "json", index_op="json_path")
schema.create_index(new_table, "val", index_op="to_tsvector")

sql.query_json_keys(new_table, "json", {"batch": "first"})
sql.query_tsvector_columns("new_table", "val", "batch first")
```
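A minimal sketch of consuming the results, under the assumption that each query helper returns an iterable of matching rows (check the function signatures for the exact return type):
```python
from pg_database import sql, schema

new_table = schema.get_metadata().tables["new_table"]

# Assumption: the helper returns an iterable of matching rows
for row in sql.query_json_keys(new_table, "json", {"batch": "first"}):
    print(row)
```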
* Values clause for `INSERT`s or `SELECT INTO`, with custom connection arguments at execution time
```python
from datetime import datetime
from sqlalchemy import column
from sqlalchemy.sql import Insert, Select
from pg_database import sql, schema

# Prepare data, column names, column types and table name

create_date = datetime.now()

values_data = [
    (1, "one", {}, True, create_date),
    (2, "two", {}, False, create_date),
    (3, "three", {}, 0, create_date)
]
values_names = ["id", "val", "json", "boolean", "created"]
values_types = ["int", "text", "jsonb", "bool", "date"]
values_table = "values_table"

# SELECT INTO to create a new table from raw values, using sslmode="require"

select_vals = sql.Values(values_names, values_types, *values_data)
select_into = sql.SelectInto([column(c) for c in values_names], values_table)
with schema.get_engine(connect_args={"sslmode": "require"}).connect() as conn:
    conn.execute(select_into.select_from(select_vals).execution_options(autocommit=True))

# INSERT INTO to add new records from raw values, using custom pooling args

existing_table = schema.get_metadata().tables[values_table]

insert_vals = sql.Values(values_names, values_types, *values_data)
insert_from = Select([column(c) for c in values_names]).select_from(insert_vals)
insert_into = Insert(existing_table).from_select(names=values_names, select=insert_from)
with schema.get_engine(pooling_args={"pool_size": 20, "max_overflow": 0}).connect() as conn:
    conn.execute(insert_into.execution_options(autocommit=True))
```