# dbcooper-py
[![CI](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml/badge.svg)](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/dbcooper-py/HEAD)
The dbcooper package turns a database connection into a collection of functions,
handling logic for keeping track of connections and letting you take advantage of
autocompletion when exploring a database.
It's especially helpful to use when authoring database-specific Python packages,
for instance in an internal company package or one wrapping a public data source.
For the R version see [dgrtwo/dbcooper](https://github.com/dgrtwo/dbcooper).
## Installation
```
pip install dbcooper
```
## Example
### Initializing the functions
The dbcooper package asks you to create the connection first.
As an example, we'll use the Lahman baseball database package (`lahman`).
```python
from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite
# connect to sqlite
engine = create_engine("sqlite://")
# load the lahman data into the "lahman" schema
lahman_sqlite(engine)
```
Next we'll set up dbcooper
```python
from dbcooper import DbCooper
dbc = DbCooper(engine)
```
The `DbCooper` object contains two important things:
* Accessors to fetch specific tables.
* Functions for interacting with the underlying database.
### Using table accessors
In the example below, we'll use the `"Lahman"."Salaries"` table as an example.
By default, dbcooper makes this accessible as `.lahman_salaries`.
**Plain** `.lahman_salaries` prints out table and column info, including types and descriptions.
```python
# show table and column descriptions
dbc.lahman_salaries
```
<h3> salaries </h3>
<p> (No table description.) </p>
<table>
<thead>
<tr><th>name </th><th>type </th><th>description </th></tr>
</thead>
<tbody>
<tr><td>index </td><td>BIGINT</td><td> </td></tr>
<tr><td>yearID </td><td>BIGINT</td><td> </td></tr>
<tr><td>teamID </td><td>TEXT </td><td> </td></tr>
<tr><td>lgID </td><td>TEXT </td><td> </td></tr>
<tr><td>playerID</td><td>TEXT </td><td> </td></tr>
<tr><td>salary </td><td>BIGINT</td><td> </td></tr>
</tbody>
</table>
Note that sqlite doesn't support table and columnn descriptions, so these sections
are empty.
**Calling** `.lahman_salaries()` fetches a lazy version of the data.
```python
dbc.lahman_salaries()
```
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
index yearID teamID lgID playerID salary
0 0 1985 ATL NL barkele01 870000
1 1 1985 ATL NL bedrost01 550000
2 2 1985 ATL NL benedbr01 545000
3 3 1985 ATL NL campri01 633333
4 4 1985 ATL NL ceronri01 625000
# .. may have more rows
Note that this data is a siuba `LazyTbl` object, which you can use to analyze the data.
```python
from siuba import _, count
dbc.lahman_salaries() >> count(over_100k = _.salary > 100_000)
```
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
over_100k n
0 True 25374
1 False 1054
# .. may have more rows
### Using database functions
* `.list()`: Get a list of tables
* `.tbl()`: Access a table that can be worked with using `siuba`.
* `.query()`: Perform a SQL query and work with the result.
* `._engine`: Get the underlying sqlalchemy engine.
For instance, we could start by finding the names of the tables in the Lahman database.
```python
dbc.list()
```
['lahman.allstar_full',
'lahman.appearances',
'lahman.awards_managers',
'lahman.awards_players',
'lahman.awards_share_managers',
'lahman.awards_share_players',
'lahman.batting',
'lahman.batting_post',
'lahman.college_playing',
'lahman.fielding',
'lahman.fielding_of',
'lahman.fielding_ofsplit',
'lahman.fielding_post',
'lahman.hall_of_fame',
'lahman.home_games',
'lahman.managers',
'lahman.managers_half',
'lahman.parks',
'lahman.people',
'lahman.pitching',
'lahman.pitching_post',
'lahman.salaries',
'lahman.schools',
'lahman.series_post',
'lahman.teams',
'lahman.teams_franchises',
'lahman.teams_half']
We can access one of these tables with `dbc.tbl()`, then put it through any kind
of siuba operation.
```python
dbc.tbl("Salaries")
```
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
index yearID teamID lgID playerID salary
0 0 1985 ATL NL barkele01 870000
1 1 1985 ATL NL bedrost01 550000
2 2 1985 ATL NL benedbr01 545000
3 3 1985 ATL NL campri01 633333
4 4 1985 ATL NL ceronri01 625000
# .. may have more rows
```python
from siuba import _, count
dbc.tbl("Salaries") >> count(_.yearID, sort=True)
```
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
yearID n
0 1999 1006
1 1998 998
2 1995 986
3 1996 931
4 1997 925
# .. may have more rows
If you'd rather start from a SQL query, use the `.query()` method.
```python
dbc.query("""
SELECT
playerID,
sum(AB) as AB
FROM Batting
GROUP BY playerID
""")
```
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
playerID AB
0 aardsda01 4
1 aaronha01 12364
2 aaronto01 944
3 aasedo01 5
4 abadan01 21
# .. may have more rows
For anything else you might want to do, the sqlalchemy Engine object is available.
For example, the code below shows how you can set its `.echo` attribute, which
tells sqlalchemy to provide useful logs.
```python
dbc._engine.echo = True
table_names = dbc.list()
```
2022-03-20 22:49:37,553 INFO sqlalchemy.engine.Engine PRAGMA database_list
2022-03-20 22:49:37,554 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-20 22:49:37,555 INFO sqlalchemy.engine.Engine SELECT name FROM "main".sqlite_master WHERE type='table' ORDER BY name
2022-03-20 22:49:37,555 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-20 22:49:37,556 INFO sqlalchemy.engine.Engine SELECT name FROM "lahman".sqlite_master WHERE type='table' ORDER BY name
2022-03-20 22:49:37,557 INFO sqlalchemy.engine.Engine [raw sql] ()
Note that the log messages above show that the `.list()` method executed two queries:
One to list tables in the "main" schema (which is empty), and one to list tables
in the "lahman" schema.
## Advanced Configuration
> ⚠️: These behaviors are well tested, but dbcooper's internals and API may change.
dbcooper can be configured in three ways, each corresponding to a class interface:
* **TableFinder**: Which tables will be used by `dbcooper`.
* **AccessorBuilder**: How table names are turned into accessors.
* **DbcDocumentedTable**: The class that defines what an accessor will return.
```python
from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite
from dbcooper import DbCooper, AccessorBuilder
engine = create_engine("sqlite://")
lahman_sqlite(engine)
```
### Excluding a schema
```python
from dbcooper import TableFinder
finder = TableFinder(exclude_schemas=["lahman"])
dbc_no_lahman = DbCooper(engine, table_finder=finder)
dbc_no_lahman.list()
```
[]
### Formatting table names
```python
from dbcooper import AccessorBuilder
# omits schema, and keeps only table name
# e.g. `salaries`, rather than `lahman_salaries`
builder = AccessorBuilder(format_from_part="table")
tbl_flat = DbCooper(engine, accessor_builder=builder)
tbl_flat.salaries()
```
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
index yearID teamID lgID playerID salary
0 0 1985 ATL NL barkele01 870000
1 1 1985 ATL NL bedrost01 550000
2 2 1985 ATL NL benedbr01 545000
3 3 1985 ATL NL campri01 633333
4 4 1985 ATL NL ceronri01 625000
# .. may have more rows
### Grouping tables by schema
```python
from dbcooper import AccessorHierarchyBuilder
tbl_nested = DbCooper(engine, accessor_builder=AccessorHierarchyBuilder())
# note the form: <schema>.<table>
tbl_nested.lahman.salaries()
```
# Source: lazy query
# DB Conn: Engine(sqlite://)
# Preview:
index yearID teamID lgID playerID salary
0 0 1985 ATL NL barkele01 870000
1 1 1985 ATL NL bedrost01 550000
2 2 1985 ATL NL benedbr01 545000
3 3 1985 ATL NL campri01 633333
4 4 1985 ATL NL ceronri01 625000
# .. may have more rows
### Don't show table documentation
```python
from dbcooper import DbcSimpleTable
dbc_no_doc = DbCooper(engine, table_factory=DbcSimpleTable)
dbc_no_doc.lahman_salaries
```
DbcSimpleTable(..., 'salaries', 'lahman')
Note that sqlalchemy dialects like `snowflake-sqlalchemy` cannot look up things
like table and column descriptions as well as other dialects, so `DbcSimpleTable`
may be needed to connect to snowflake (see [this issue](https://github.com/snowflakedb/snowflake-sqlalchemy/issues/276)).
## Developing
```shell
# install with development dependencies
pip install -e .[dev]
# or install from requirements file
pip install -r requirements/dev.txt
```
### Test
```shell
# run all tests, see pytest section of pyproject.toml
pytest
# run specific backends
pytest -m 'not snowflake and not bigquery'
# stop on first failure, drop into debugger
pytest -x --pdb
```
### Release
```shell
# set version number
git tag v0.0.1
# (optional) push to github
git push origin --tags
# check version
python -m setuptools_scm
```
Raw data
{
"_id": null,
"home_page": "https://github.com/machow/template-python-pkg",
"name": "dbcooper",
"maintainer": "",
"docs_url": null,
"requires_python": ">3.8",
"maintainer_email": "",
"keywords": "template,packaging",
"author": "Michael Chow",
"author_email": "mc_al_github@fastmail.com",
"download_url": "https://files.pythonhosted.org/packages/dd/30/a2020c9858e622012859d8c8a49f36fce922fe58b8b4e6f940fbe3d9874e/dbcooper-0.0.5.tar.gz",
"platform": null,
"description": "# dbcooper-py\n\n[![CI](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml/badge.svg)](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml)\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/dbcooper-py/HEAD)\n\nThe dbcooper package turns a database connection into a collection of functions,\nhandling logic for keeping track of connections and letting you take advantage of\nautocompletion when exploring a database.\n\nIt's especially helpful to use when authoring database-specific Python packages,\nfor instance in an internal company package or one wrapping a public data source.\n\nFor the R version see [dgrtwo/dbcooper](https://github.com/dgrtwo/dbcooper).\n\n## Installation\n\n```\npip install dbcooper\n```\n\n## Example\n\n### Initializing the functions\n\nThe dbcooper package asks you to create the connection first.\nAs an example, we'll use the Lahman baseball database package (`lahman`).\n\n\n```python\nfrom sqlalchemy import create_engine\nfrom dbcooper.data import lahman_sqlite\n\n# connect to sqlite\nengine = create_engine(\"sqlite://\")\n\n# load the lahman data into the \"lahman\" schema\nlahman_sqlite(engine)\n```\n\nNext we'll set up dbcooper\n\n\n```python\nfrom dbcooper import DbCooper\n\ndbc = DbCooper(engine)\n```\n\nThe `DbCooper` object contains two important things:\n\n* Accessors to fetch specific tables.\n* Functions for interacting with the underlying database.\n\n### Using table accessors\n\nIn the example below, we'll use the `\"Lahman\".\"Salaries\"` table as an example.\nBy default, dbcooper makes this accessible as `.lahman_salaries`.\n\n**Plain** `.lahman_salaries` prints out table and column info, including types and descriptions.\n\n\n```python\n# show table and column descriptions\ndbc.lahman_salaries\n```\n\n\n\n\n<h3> salaries </h3>\n<p> (No table description.) </p>\n<table>\n<thead>\n<tr><th>name </th><th>type </th><th>description </th></tr>\n</thead>\n<tbody>\n<tr><td>index </td><td>BIGINT</td><td> </td></tr>\n<tr><td>yearID </td><td>BIGINT</td><td> </td></tr>\n<tr><td>teamID </td><td>TEXT </td><td> </td></tr>\n<tr><td>lgID </td><td>TEXT </td><td> </td></tr>\n<tr><td>playerID</td><td>TEXT </td><td> </td></tr>\n<tr><td>salary </td><td>BIGINT</td><td> </td></tr>\n</tbody>\n</table>\n\n\n\nNote that sqlite doesn't support table and columnn descriptions, so these sections\nare empty.\n\n**Calling** `.lahman_salaries()` fetches a lazy version of the data.\n\n\n\n```python\ndbc.lahman_salaries()\n```\n\n\n\n\n # Source: lazy query\n # DB Conn: Engine(sqlite://)\n # Preview:\n index yearID teamID lgID playerID salary\n 0 0 1985 ATL NL barkele01 870000\n 1 1 1985 ATL NL bedrost01 550000\n 2 2 1985 ATL NL benedbr01 545000\n 3 3 1985 ATL NL campri01 633333\n 4 4 1985 ATL NL ceronri01 625000\n # .. may have more rows\n\n\n\nNote that this data is a siuba `LazyTbl` object, which you can use to analyze the data.\n\n\n```python\nfrom siuba import _, count\n\ndbc.lahman_salaries() >> count(over_100k = _.salary > 100_000)\n```\n\n\n\n\n # Source: lazy query\n # DB Conn: Engine(sqlite://)\n # Preview:\n over_100k n\n 0 True 25374\n 1 False 1054\n # .. may have more rows\n\n\n\n### Using database functions\n\n* `.list()`: Get a list of tables\n* `.tbl()`: Access a table that can be worked with using `siuba`.\n* `.query()`: Perform a SQL query and work with the result.\n* `._engine`: Get the underlying sqlalchemy engine.\n\nFor instance, we could start by finding the names of the tables in the Lahman database.\n\n\n```python\ndbc.list()\n```\n\n\n\n\n ['lahman.allstar_full',\n 'lahman.appearances',\n 'lahman.awards_managers',\n 'lahman.awards_players',\n 'lahman.awards_share_managers',\n 'lahman.awards_share_players',\n 'lahman.batting',\n 'lahman.batting_post',\n 'lahman.college_playing',\n 'lahman.fielding',\n 'lahman.fielding_of',\n 'lahman.fielding_ofsplit',\n 'lahman.fielding_post',\n 'lahman.hall_of_fame',\n 'lahman.home_games',\n 'lahman.managers',\n 'lahman.managers_half',\n 'lahman.parks',\n 'lahman.people',\n 'lahman.pitching',\n 'lahman.pitching_post',\n 'lahman.salaries',\n 'lahman.schools',\n 'lahman.series_post',\n 'lahman.teams',\n 'lahman.teams_franchises',\n 'lahman.teams_half']\n\n\n\nWe can access one of these tables with `dbc.tbl()`, then put it through any kind\nof siuba operation.\n\n\n```python\ndbc.tbl(\"Salaries\")\n```\n\n\n\n\n # Source: lazy query\n # DB Conn: Engine(sqlite://)\n # Preview:\n index yearID teamID lgID playerID salary\n 0 0 1985 ATL NL barkele01 870000\n 1 1 1985 ATL NL bedrost01 550000\n 2 2 1985 ATL NL benedbr01 545000\n 3 3 1985 ATL NL campri01 633333\n 4 4 1985 ATL NL ceronri01 625000\n # .. may have more rows\n\n\n\n\n```python\nfrom siuba import _, count\ndbc.tbl(\"Salaries\") >> count(_.yearID, sort=True)\n```\n\n\n\n\n # Source: lazy query\n # DB Conn: Engine(sqlite://)\n # Preview:\n yearID n\n 0 1999 1006\n 1 1998 998\n 2 1995 986\n 3 1996 931\n 4 1997 925\n # .. may have more rows\n\n\n\nIf you'd rather start from a SQL query, use the `.query()` method.\n\n\n```python\ndbc.query(\"\"\"\n SELECT\n playerID,\n sum(AB) as AB\n FROM Batting\n GROUP BY playerID\n\"\"\")\n```\n\n\n\n\n # Source: lazy query\n # DB Conn: Engine(sqlite://)\n # Preview:\n playerID AB\n 0 aardsda01 4\n 1 aaronha01 12364\n 2 aaronto01 944\n 3 aasedo01 5\n 4 abadan01 21\n # .. may have more rows\n\n\n\nFor anything else you might want to do, the sqlalchemy Engine object is available.\nFor example, the code below shows how you can set its `.echo` attribute, which\ntells sqlalchemy to provide useful logs.\n\n\n```python\ndbc._engine.echo = True\ntable_names = dbc.list()\n```\n\n 2022-03-20 22:49:37,553 INFO sqlalchemy.engine.Engine PRAGMA database_list\n 2022-03-20 22:49:37,554 INFO sqlalchemy.engine.Engine [raw sql] ()\n 2022-03-20 22:49:37,555 INFO sqlalchemy.engine.Engine SELECT name FROM \"main\".sqlite_master WHERE type='table' ORDER BY name\n 2022-03-20 22:49:37,555 INFO sqlalchemy.engine.Engine [raw sql] ()\n 2022-03-20 22:49:37,556 INFO sqlalchemy.engine.Engine SELECT name FROM \"lahman\".sqlite_master WHERE type='table' ORDER BY name\n 2022-03-20 22:49:37,557 INFO sqlalchemy.engine.Engine [raw sql] ()\n\n\nNote that the log messages above show that the `.list()` method executed two queries:\nOne to list tables in the \"main\" schema (which is empty), and one to list tables\nin the \"lahman\" schema.\n\n## Advanced Configuration\n\n> \u26a0\ufe0f: These behaviors are well tested, but dbcooper's internals and API may change.\n\ndbcooper can be configured in three ways, each corresponding to a class interface:\n\n* **TableFinder**: Which tables will be used by `dbcooper`.\n* **AccessorBuilder**: How table names are turned into accessors.\n* **DbcDocumentedTable**: The class that defines what an accessor will return.\n\n\n```python\nfrom sqlalchemy import create_engine\nfrom dbcooper.data import lahman_sqlite\nfrom dbcooper import DbCooper, AccessorBuilder\n\nengine = create_engine(\"sqlite://\")\nlahman_sqlite(engine)\n```\n\n### Excluding a schema\n\n\n```python\nfrom dbcooper import TableFinder\n\nfinder = TableFinder(exclude_schemas=[\"lahman\"])\ndbc_no_lahman = DbCooper(engine, table_finder=finder)\ndbc_no_lahman.list()\n```\n\n\n\n\n []\n\n\n\n### Formatting table names\n\n\n```python\nfrom dbcooper import AccessorBuilder\n\n# omits schema, and keeps only table name\n# e.g. `salaries`, rather than `lahman_salaries`\nbuilder = AccessorBuilder(format_from_part=\"table\")\n\ntbl_flat = DbCooper(engine, accessor_builder=builder)\ntbl_flat.salaries()\n```\n\n\n\n\n # Source: lazy query\n # DB Conn: Engine(sqlite://)\n # Preview:\n index yearID teamID lgID playerID salary\n 0 0 1985 ATL NL barkele01 870000\n 1 1 1985 ATL NL bedrost01 550000\n 2 2 1985 ATL NL benedbr01 545000\n 3 3 1985 ATL NL campri01 633333\n 4 4 1985 ATL NL ceronri01 625000\n # .. may have more rows\n\n\n\n### Grouping tables by schema\n\n\n```python\nfrom dbcooper import AccessorHierarchyBuilder\n\ntbl_nested = DbCooper(engine, accessor_builder=AccessorHierarchyBuilder())\n\n# note the form: <schema>.<table>\ntbl_nested.lahman.salaries()\n```\n\n\n\n\n # Source: lazy query\n # DB Conn: Engine(sqlite://)\n # Preview:\n index yearID teamID lgID playerID salary\n 0 0 1985 ATL NL barkele01 870000\n 1 1 1985 ATL NL bedrost01 550000\n 2 2 1985 ATL NL benedbr01 545000\n 3 3 1985 ATL NL campri01 633333\n 4 4 1985 ATL NL ceronri01 625000\n # .. may have more rows\n\n\n\n### Don't show table documentation\n\n\n```python\nfrom dbcooper import DbcSimpleTable\n\ndbc_no_doc = DbCooper(engine, table_factory=DbcSimpleTable)\ndbc_no_doc.lahman_salaries\n```\n\n\n\n\n DbcSimpleTable(..., 'salaries', 'lahman')\n\n\n\nNote that sqlalchemy dialects like `snowflake-sqlalchemy` cannot look up things\nlike table and column descriptions as well as other dialects, so `DbcSimpleTable`\nmay be needed to connect to snowflake (see [this issue](https://github.com/snowflakedb/snowflake-sqlalchemy/issues/276)).\n\n## Developing\n\n```shell\n# install with development dependencies\npip install -e .[dev]\n\n# or install from requirements file\npip install -r requirements/dev.txt\n```\n\n### Test\n\n```shell\n# run all tests, see pytest section of pyproject.toml\npytest\n\n# run specific backends\npytest -m 'not snowflake and not bigquery'\n\n# stop on first failure, drop into debugger\npytest -x --pdb\n```\n\n### Release\n\n```shell\n# set version number\ngit tag v0.0.1\n\n# (optional) push to github\ngit push origin --tags\n\n# check version\npython -m setuptools_scm\n```\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Short description of the package.",
"version": "0.0.5",
"split_keywords": [
"template",
"packaging"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bbb30472f06cd2009cfe15dca94c1f02d199464ef61a696f37a5edc16c98141e",
"md5": "9ba453ea52107d95953fe31dac3e0bda",
"sha256": "93aa999f5003b16e8929f1170ecbf90701602aeaf082df69713b131f2e4e845d"
},
"downloads": -1,
"filename": "dbcooper-0.0.5-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "9ba453ea52107d95953fe31dac3e0bda",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">3.8",
"size": 16615,
"upload_time": "2023-03-23T13:39:15",
"upload_time_iso_8601": "2023-03-23T13:39:15.791611Z",
"url": "https://files.pythonhosted.org/packages/bb/b3/0472f06cd2009cfe15dca94c1f02d199464ef61a696f37a5edc16c98141e/dbcooper-0.0.5-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dd30a2020c9858e622012859d8c8a49f36fce922fe58b8b4e6f940fbe3d9874e",
"md5": "678c641ceca66c3681f017bf046d9a61",
"sha256": "4c5684716dc3955a43f562738861a82d391554c1e3e41bf07769455973dc036c"
},
"downloads": -1,
"filename": "dbcooper-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "678c641ceca66c3681f017bf046d9a61",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">3.8",
"size": 18880,
"upload_time": "2023-03-23T13:39:17",
"upload_time_iso_8601": "2023-03-23T13:39:17.684792Z",
"url": "https://files.pythonhosted.org/packages/dd/30/a2020c9858e622012859d8c8a49f36fce922fe58b8b4e6f940fbe3d9874e/dbcooper-0.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-23 13:39:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "machow",
"github_project": "template-python-pkg",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "dbcooper"
}