pg-database-utils

Name: pg-database-utils
Version: 1.0.0
Home page: https://github.com/consbio/pg-database-utils/
Summary: A suite of utilities for PostgreSQL database queries and operations built on sqlalchemy
Upload time: 2024-04-03 18:33:07
Author: dharvey-consbio
Requires Python: <4.0,>=3.9
License: BSD
Keywords: postgres, postgresql, utils, utilities, pg_database, pg_database_utils, sqlalchemy
Requirements: none recorded
# pg-database-utils

A suite of utilities for PostgreSQL database queries and operations built on sqlalchemy.

This library includes support for:
1. `TSVECTOR`, `JSON` and `JSONB` indexes (for PostgreSQL versions 9.5+)
2. Generated columns (for PostgreSQL versions 12+)
3. Optional Django database configuration for Django projects

It also includes:
1. Helpers to make most common DDL queries more readable
2. Performant functions for querying JSON and TSVECTOR columns
3. Support for `SELECT INTO` queries from existing tables and/or `VALUES` clauses
4. Support for `UPDATE` queries that require application logic


## Installation
Install with:
```bash
pip install pg-database-utils
```
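
To confirm the install, the package should import cleanly (note the import name is `pg_database`, with an underscore):

```bash
python -c "import pg_database"
```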

## Configuration

This project is designed to make configuration easy.
If you already have database connections defined in Django, then you can reuse them;
otherwise, you can configure your own without having Django as a dependency.

### To configure with Django

If you want to use the "default" database, **no configuration is required**.

If you want to specify a particular Django database to read settings from:
1. Create a JSON configuration file with the database name:
```python
{
    "django-db-key": "other",               # To override DATABASES["default"]
    "connect-args": {"sslmode": "require"}  # To override DATABASES["<db_name>"]["OPTIONS"]
}
```
2. Set the `DATABASE_CONFIG_JSON` environment variable to point to the location of the file

**Note**: "django-db-key" takes precedence over all other database connection settings in the JSON file:
if you specify a Django database, its connection settings are used instead of any others in the file.
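
For example, given a file like the following, the non-Django keys are ignored because "django-db-key" wins (an illustrative sketch; `"other"` must be a key in Django's `DATABASES`):
```python
{
    "django-db-key": "other",        # Connection settings come from DATABASES["other"]...
    "database-name": "ignored_db",   # ...so this setting is ignored
    "database-host": "ignored.host"  # (also ignored)
}
```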

### To configure without Django

1. Create a JSON configuration file with at least the required settings (i.e. `database-name`):
```python
{
    "database-name": "required",     # Name of the database to query
    "database-engine": "optional",   # Defaults to postgres
    "database-host": "optional",     # Defaults to 127.0.0.1
    "database-port": "optional",     # Defaults to 5432
    "database-user": "optional",     # Defaults to postgres
    "database-password": "optional"  # For trusted users like postgres
}
```
2. Set the `DATABASE_CONFIG_JSON` environment variable to point to the location of the file
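
On a Unix-like shell, step 2 might look like this (the path is illustrative):
```bash
export DATABASE_CONFIG_JSON=/path/to/database-config.json
```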

### Additional configuration with or without Django

Additional configuration options include:
```python
{
    "connect-args": {"sslmode": "require"},  # Defaults to postgres settings, "prefer" by default
    "date-format": "optional",               # Defaults to "%Y-%m-%d"
    "timestamp-format": "optional",          # Defaults to "%Y-%m-%d %H:%M:%S"
    "pooling-args": {                        # To override sqlalchemy pooling config
        "max_overflow": 0,                   # Defaults to 10 connections beyond pool size
        "pool_recycle": 60,                  # Defaults to no timeout (-1) in seconds
        "pool_size": 20,                     # Defaults to 5 connections
        "pool_timeout": 30                   # Defaults to 30 seconds
     }
}
```

**Note**: "date-format" and "timestamp-format" must be compatible with the formatting configured in PostgreSQL.
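
Putting it together, a complete configuration file might look like the following (plain JSON, since the annotated examples above are not valid JSON as written; all values are illustrative):
```json
{
    "database-name": "my_database",
    "database-host": "db.example.com",
    "database-user": "postgres",
    "connect-args": {"sslmode": "require"},
    "date-format": "%Y-%m-%d",
    "timestamp-format": "%Y-%m-%d %H:%M:%S",
    "pooling-args": {"pool_size": 20, "max_overflow": 0}
}
```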


## Usage

This library aims to make common database operations easy and readable,
so most of the utility functions accept either strings or `sqlalchemy` objects as parameters.

### Schema utilities

* Creating and relating tables
```python
from pg_database import schema

my_table = schema.create_table(
    "my_table",
    dropfirst=True,
    index_cols={"id": "unique"},
    id="int", name="int", addr="text", geom="bytea", deleted="bool"
)
schema.create_index(my_table, "name", index_op="unique")

schema.create_table("other_table", id="int", my_table_id="int", val="text")
schema.create_foreign_key("other_table", "my_table_id", "my_table.id")
```
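
As a quick sanity check, both tables should now be visible in reflected metadata (a sketch using `schema.get_metadata()`, which also appears in the examples below):
```python
from pg_database import schema

# Reflect the current database and confirm the DDL above took effect
tables = schema.get_metadata().tables
assert "my_table" in tables and "other_table" in tables
```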
* Altering tables
```python
from pg_database import schema

schema.alter_column_type("my_table", "name", "text")
schema.create_index("my_table", "name", index_op="to_tsvector")

schema.create_column("my_table", "json_col", "jsonb", checkfirst=True)
schema.create_index("my_table", "json_col", index_op="json_full")

# These steps require the postgis extension
schema.alter_column_type("my_table", "geom", "geometry", using="geom::geometry(Polygon,4326)")
schema.create_index("my_table", "geom", index_op="spatial")
```
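
If the `postgis` extension is not yet enabled, it can be created once per database with standard PostgreSQL DDL (requires sufficient privileges):
```sql
CREATE EXTENSION IF NOT EXISTS postgis;
```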
* Dropping database objects
```python
from pg_database import schema

all_tables = schema.get_metadata().tables
other_table = all_tables["other_table"]

schema.drop_foreign_key(other_table, "other_table_my_table_id_fkey")
schema.drop_index("my_table", index_name="my_table_json_col_json_full_idx")
schema.drop_table("my_table")
schema.drop_table(other_table)
```

### SQL utilities

* Inserting rows
```python
from datetime import datetime
from pg_database import sql

create_date = datetime.now()

sql.select_into(
    "new_table",
    [
        (1, "one", {}, create_date),
        (2, "two", {}, create_date),
        (3, "three", {}, create_date)
    ],
    "id,val,json,created",
    "int,text,jsonb,date"
)
```
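
To read the new table back, a plain `sqlalchemy` select works (a sketch: `schema.get_engine()` is shown with keyword arguments later in this README, and is assumed here to also work without them):
```python
from pg_database import schema

# Reflect the table created by select_into and print its rows
new_table = schema.get_metadata().tables["new_table"]
with schema.get_engine().connect() as conn:
    for row in conn.execute(new_table.select()):
        print(row)
```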
* Updating rows
```python
from datetime import timedelta
from pg_database import sql

def update_row(row):
    # Row layout is (pk, *columns): the "id" key first, then val, created, json
    row = list(row)
    pk, val, created, jval = row[0], row[1], row[2], row[3]
    row[1] = f"{pk} {val} first batch"
    row[2] = created + timedelta(days=1)
    row[3] = {"id": pk, "val": val, "batch": "first"}
    return row

sql.update_rows("new_table", "id", "val,created,json", update_row, batch_size=3)
```
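
The callable receives one row per record, with the "id" key first followed by the listed columns, and returns the replacement values. As another sketch (assuming that same layout convention), an update touching only `val`:
```python
from pg_database import sql

def uppercase_val(row):
    # Row is (pk, val) here, since only "val" is listed in the call below
    row = list(row)
    row[1] = row[1].upper()
    return row

sql.update_rows("new_table", "id", "val", uppercase_val, batch_size=100)
```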
* Querying rows
```python
from pg_database import sql, schema

# Reduce database queries by sending a sqlalchemy table
all_tables = schema.get_metadata().tables
new_table = all_tables["new_table"]

schema.create_index(new_table, "json", index_op="json_path")
schema.create_index(new_table, "val", index_op="to_tsvector")

sql.query_json_keys(new_table, "json", {"batch": "first"})
sql.query_tsvector_columns("new_table", "val", "batch first")
```
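
Assuming the query helpers return row sequences (an assumption; check the return type in your version), results can be iterated directly:
```python
# Print each row whose "json" column contains {"batch": "first"}
for row in sql.query_json_keys(new_table, "json", {"batch": "first"}):
    print(row)
```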
* Values clause for `INSERT`s or `SELECT INTO`, with custom connection arguments at execution time
```python
from datetime import datetime
from sqlalchemy import column
from sqlalchemy.sql import Insert, Select
from pg_database import sql, schema

# Prepare data, column names, column types and table name

create_date = datetime.now()

values_data = [
    (1, "one", {}, True, create_date),
    (2, "two", {}, False, create_date),
    (3, "three", {}, 0, create_date)
]
values_names = ["id", "val", "json", "boolean", "created"]
values_types = ["int", "text", "jsonb", "bool", "date"]
values_table = "values_table"

# SELECT INTO: create a new table from raw values, connecting with sslmode="require"

select_vals = sql.Values(values_names, values_types, *values_data)
select_into = sql.SelectInto([column(c) for c in values_names], values_table)
with schema.get_engine(connect_args={"sslmode": "require"}).connect() as conn:
    conn.execute(select_into.select_from(select_vals).execution_options(autocommit=True))

# INSERT INTO to add new records from raw values using custom pooling args

existing_table = schema.get_metadata().tables[values_table]

insert_vals = sql.Values(values_names, values_types, *values_data)
insert_from = Select([column(c) for c in values_names]).select_from(insert_vals)
insert_into = Insert(existing_table).from_select(names=values_names, select=insert_from)
with schema.get_engine(pooling_args={"pool_size": 20, "max_overflow": 0}).connect() as conn:
    conn.execute(insert_into.execution_options(autocommit=True))
```
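
As a final check, the table should now hold six rows, three from each statement (a sketch using the same 1.x-style `Select` as above):
```python
from sqlalchemy import func
from sqlalchemy.sql import Select

# Count the rows added by the two statements above
with schema.get_engine().connect() as conn:
    total = conn.execute(Select([func.count()]).select_from(existing_table)).scalar()
    print(total)  # expected: 6
```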
