# duckdb_engine
[](https://pypi.org/project/duckdb-engine/) [](https://badge.fury.io/py/duckdb-engine) [](https://pypi.org/project/duckdb-engine/) [](https://codecov.io/gh/Mause/duckdb_engine)
Basic SQLAlchemy driver for [DuckDB](https://duckdb.org/)
<!--ts-->
* [duckdb_engine](#duckdb_engine)
* [Installation](#installation)
* [Usage](#usage)
* [Usage in IPython/Jupyter](#usage-in-ipythonjupyter)
* [Configuration](#configuration)
* [How to register a pandas DataFrame](#how-to-register-a-pandas-dataframe)
* [Things to keep in mind](#things-to-keep-in-mind)
* [Auto-incrementing ID columns](#auto-incrementing-id-columns)
* [Pandas read_sql() chunksize](#pandas-read_sql-chunksize)
* [Unsigned integer support](#unsigned-integer-support)
* [Alembic Integration](#alembic-integration)
* [Preloading extensions (experimental)](#preloading-extensions-experimental)
* [The name](#the-name)
<!-- Created by https://github.com/ekalinin/github-markdown-toc -->
<!-- Added by: me, at: Wed 20 Sep 2023 12:44:27 AWST -->
<!--te-->
## Installation
```sh
$ pip install duckdb-engine
```
DuckDB Engine also has a conda feedstock available, the instructions for the use of which are available in it's [repository](https://github.com/conda-forge/duckdb-engine-feedstock).
## Usage
Once you've installed this package, you should be able to just use it, as SQLAlchemy does a python path search
```python
from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session
Base = declarative_base()
class FakeModel(Base): # type: ignore
__tablename__ = "fake"
id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
name = Column(String)
eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)
session.add(FakeModel(name="Frank"))
session.commit()
frank = session.query(FakeModel).one()
assert frank.name == "Frank"
```
## Usage in IPython/Jupyter
With IPython-SQL and DuckDB-Engine you can query DuckDB natively in your notebook! Check out [DuckDB's documentation](https://duckdb.org/docs/guides/python/jupyter) or
Alex Monahan's great demo of this on [his blog](https://alex-monahan.github.io/2021/08/22/Python_and_SQL_Better_Together.html#an-example-workflow-with-duckdb).
## Configuration
You can configure DuckDB by passing `connect_args` to the create_engine function
```python
create_engine(
'duckdb:///:memory:',
connect_args={
'read_only': False,
'config': {
'memory_limit': '500mb'
}
}
)
```
The supported configuration parameters are listed in the [DuckDB docs](https://duckdb.org/docs/sql/configuration)
## How to register a pandas DataFrame
```python
conn = create_engine("duckdb:///:memory:").connect()
# with SQLAlchemy 1.3
conn.execute("register", ("dataframe_name", pd.DataFrame(...)))
# with SQLAlchemy 1.4+
conn.execute(text("register(:name, :df)"), {"name": "test_df", "df": df})
conn.execute("select * from dataframe_name")
```
## Things to keep in mind
Duckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the `duckdb_engine` dialect is derived from the `postgresql` dialect, `SQLAlchemy` may try to use PostgreSQL-only features. Below are some caveats to look out for.
### Auto-incrementing ID columns
When defining an Integer column as a primary key, `SQLAlchemy` uses the `SERIAL` datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the `SQLAlchemy.Sequence()` object to auto-increment the key. For more information on sequences, you can find the [`SQLAlchemy Sequence` documentation here](https://docs.sqlalchemy.org/en/14/core/defaults.html#associating-a-sequence-as-the-server-side-default).
The following example demonstrates how to create an auto-incrementing ID column for a simple table:
```python
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> metadata = sqlalchemy.MetaData(engine)
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
... 'users',
... metadata,
... sqlalchemy.Column(
... 'id',
... sqlalchemy.Integer,
... user_id_seq,
... server_default=user_id_seq.next_value(),
... primary_key=True,
... ),
... )
>>> metadata.create_all(bind=engine)
```
### Pandas `read_sql()` chunksize
**NOTE**: this is no longer an issue in versions `>=0.5.0` of `duckdb`
The `pandas.read_sql()` method can read tables from `duckdb_engine` into DataFrames, but the `sqlalchemy.engine.result.ResultProxy` trips up when `fetchmany()` is called. Therefore, for now `chunksize=None` (default) is necessary when reading duckdb tables into DataFrames. For example:
```python
>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine) ### Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25) ### Throws an exception
```
### Unsigned integer support
Unsigned integers are supported by DuckDB, and are available in [`duckdb_engine.datatypes`](duckdb_engine/datatypes.py).
## Alembic Integration
SQLAlchemy's companion library `alembic` can optionally be used to manage database migrations.
This support can be enabling by adding an Alembic implementation class for the `duckdb` dialect.
```python
from alembic.ddl.impl import DefaultImpl
class AlembicDuckDBImpl(DefaultImpl):
"""Alembic implementation for DuckDB."""
__dialect__ = "duckdb"
```
After loading this class with your program, Alembic will no longer raise an error when generating or applying migrations.
## Preloading extensions (experimental)
> DuckDB 0.9.0+ includes builtin support for autoinstalling and autoloading of extensions, see [the extension documentation](http://duckdb.org/docs/archive/0.9.0/extensions/overview#autoloadable-extensions) for more information.
Until the DuckDB python client allows you to natively preload extensions, I've added experimental support via a `connect_args` parameter
```python
from sqlalchemy import create_engine
create_engine(
'duckdb:///:memory:',
connect_args={
'preload_extensions': ['https'],
'config': {
's3_region': 'ap-southeast-1'
}
}
)
```
## The name
Yes, I'm aware this package should be named `duckdb-driver` or something, I wasn't thinking when I named it and it's too hard to change the name now
Raw data
{
"_id": null,
"home_page": "https://github.com/Mause/duckdb_engine",
"name": "duckdb_engine",
"maintainer": null,
"docs_url": null,
"requires_python": "<4,>=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Elliana",
"author_email": "me@mause.me",
"download_url": "https://files.pythonhosted.org/packages/d1/ad/919358e78030059f8ddab03499eb0949145295000f61333642d50daaa331/duckdb_engine-0.13.6.tar.gz",
"platform": null,
"description": "# duckdb_engine\n\n[](https://pypi.org/project/duckdb-engine/) [](https://badge.fury.io/py/duckdb-engine) [](https://pypi.org/project/duckdb-engine/) [](https://codecov.io/gh/Mause/duckdb_engine)\n\nBasic SQLAlchemy driver for [DuckDB](https://duckdb.org/)\n\n<!--ts-->\n* [duckdb_engine](#duckdb_engine)\n * [Installation](#installation)\n * [Usage](#usage)\n * [Usage in IPython/Jupyter](#usage-in-ipythonjupyter)\n * [Configuration](#configuration)\n * [How to register a pandas DataFrame](#how-to-register-a-pandas-dataframe)\n * [Things to keep in mind](#things-to-keep-in-mind)\n * [Auto-incrementing ID columns](#auto-incrementing-id-columns)\n * [Pandas read_sql() chunksize](#pandas-read_sql-chunksize)\n * [Unsigned integer support](#unsigned-integer-support)\n * [Alembic Integration](#alembic-integration)\n * [Preloading extensions (experimental)](#preloading-extensions-experimental)\n * [The name](#the-name)\n\n<!-- Created by https://github.com/ekalinin/github-markdown-toc -->\n<!-- Added by: me, at: Wed 20 Sep 2023 12:44:27 AWST -->\n\n<!--te-->\n\n## Installation\n```sh\n$ pip install duckdb-engine\n```\n\nDuckDB Engine also has a conda feedstock available, the instructions for the use of which are available in it's [repository](https://github.com/conda-forge/duckdb-engine-feedstock).\n\n## Usage\n\nOnce you've installed this package, you should be able to just use it, as SQLAlchemy does a python path search\n\n```python\nfrom sqlalchemy import Column, Integer, Sequence, String, create_engine\nfrom sqlalchemy.ext.declarative import declarative_base\nfrom sqlalchemy.orm.session import Session\n\nBase = declarative_base()\n\n\nclass FakeModel(Base): # type: ignore\n __tablename__ = \"fake\"\n\n id = Column(Integer, Sequence(\"fakemodel_id_sequence\"), primary_key=True)\n name = Column(String)\n\n\neng = create_engine(\"duckdb:///:memory:\")\nBase.metadata.create_all(eng)\nsession = Session(bind=eng)\n\nsession.add(FakeModel(name=\"Frank\"))\nsession.commit()\n\nfrank = session.query(FakeModel).one()\n\nassert frank.name == \"Frank\"\n```\n\n## Usage in IPython/Jupyter\n\nWith IPython-SQL and DuckDB-Engine you can query DuckDB natively in your notebook! Check out [DuckDB's documentation](https://duckdb.org/docs/guides/python/jupyter) or\nAlex Monahan's great demo of this on [his blog](https://alex-monahan.github.io/2021/08/22/Python_and_SQL_Better_Together.html#an-example-workflow-with-duckdb).\n\n## Configuration\n\nYou can configure DuckDB by passing `connect_args` to the create_engine function\n```python\ncreate_engine(\n 'duckdb:///:memory:',\n connect_args={\n 'read_only': False,\n 'config': {\n 'memory_limit': '500mb'\n }\n }\n)\n```\n\nThe supported configuration parameters are listed in the [DuckDB docs](https://duckdb.org/docs/sql/configuration)\n\n## How to register a pandas DataFrame\n\n```python\nconn = create_engine(\"duckdb:///:memory:\").connect()\n\n# with SQLAlchemy 1.3\nconn.execute(\"register\", (\"dataframe_name\", pd.DataFrame(...)))\n\n# with SQLAlchemy 1.4+\nconn.execute(text(\"register(:name, :df)\"), {\"name\": \"test_df\", \"df\": df})\n\nconn.execute(\"select * from dataframe_name\")\n```\n\n## Things to keep in mind\nDuckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the `duckdb_engine` dialect is derived from the `postgresql` dialect, `SQLAlchemy` may try to use PostgreSQL-only features. Below are some caveats to look out for.\n\n### Auto-incrementing ID columns\nWhen defining an Integer column as a primary key, `SQLAlchemy` uses the `SERIAL` datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the `SQLAlchemy.Sequence()` object to auto-increment the key. For more information on sequences, you can find the [`SQLAlchemy Sequence` documentation here](https://docs.sqlalchemy.org/en/14/core/defaults.html#associating-a-sequence-as-the-server-side-default).\n\nThe following example demonstrates how to create an auto-incrementing ID column for a simple table:\n\n```python\n>>> import sqlalchemy\n>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')\n>>> metadata = sqlalchemy.MetaData(engine)\n>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')\n>>> users_table = sqlalchemy.Table(\n... 'users',\n... metadata,\n... sqlalchemy.Column(\n... 'id',\n... sqlalchemy.Integer,\n... user_id_seq,\n... server_default=user_id_seq.next_value(),\n... primary_key=True,\n... ),\n... )\n>>> metadata.create_all(bind=engine)\n```\n\n### Pandas `read_sql()` chunksize\n\n**NOTE**: this is no longer an issue in versions `>=0.5.0` of `duckdb`\n\nThe `pandas.read_sql()` method can read tables from `duckdb_engine` into DataFrames, but the `sqlalchemy.engine.result.ResultProxy` trips up when `fetchmany()` is called. Therefore, for now `chunksize=None` (default) is necessary when reading duckdb tables into DataFrames. For example:\n\n```python\n>>> import pandas as pd\n>>> import sqlalchemy\n>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')\n>>> df = pd.read_sql('users', engine) ### Works as expected\n>>> df = pd.read_sql('users', engine, chunksize=25) ### Throws an exception\n```\n\n### Unsigned integer support\n\nUnsigned integers are supported by DuckDB, and are available in [`duckdb_engine.datatypes`](duckdb_engine/datatypes.py).\n\n## Alembic Integration\n\nSQLAlchemy's companion library `alembic` can optionally be used to manage database migrations.\n\nThis support can be enabling by adding an Alembic implementation class for the `duckdb` dialect.\n\n```python\nfrom alembic.ddl.impl import DefaultImpl\n\nclass AlembicDuckDBImpl(DefaultImpl):\n \"\"\"Alembic implementation for DuckDB.\"\"\"\n\n __dialect__ = \"duckdb\"\n```\n\nAfter loading this class with your program, Alembic will no longer raise an error when generating or applying migrations.\n\n## Preloading extensions (experimental)\n\n> DuckDB 0.9.0+ includes builtin support for autoinstalling and autoloading of extensions, see [the extension documentation](http://duckdb.org/docs/archive/0.9.0/extensions/overview#autoloadable-extensions) for more information.\n\nUntil the DuckDB python client allows you to natively preload extensions, I've added experimental support via a `connect_args` parameter\n\n```python\nfrom sqlalchemy import create_engine\n\ncreate_engine(\n 'duckdb:///:memory:',\n connect_args={\n 'preload_extensions': ['https'],\n 'config': {\n 's3_region': 'ap-southeast-1'\n }\n }\n)\n```\n\n## The name\n\nYes, I'm aware this package should be named `duckdb-driver` or something, I wasn't thinking when I named it and it's too hard to change the name now\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "SQLAlchemy driver for duckdb",
"version": "0.13.6",
"project_urls": {
"Bug Tracker": "https://github.com/Mause/duckdb_engine/issues",
"Changelog": "https://github.com/Mause/duckdb_engine/releases",
"Homepage": "https://github.com/Mause/duckdb_engine",
"Repository": "https://github.com/Mause/duckdb_engine"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ef5d81a0d67483d0767e4fbf7444b079b3f21574a184b0888782ced1c2172777",
"md5": "8aea36b6df311363c912a850c8a7e259",
"sha256": "cedd44252cce5f42de88752026925154a566c407987116a242d250642904ba84"
},
"downloads": -1,
"filename": "duckdb_engine-0.13.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8aea36b6df311363c912a850c8a7e259",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.8",
"size": 48961,
"upload_time": "2024-11-24T00:34:30",
"upload_time_iso_8601": "2024-11-24T00:34:30.258287Z",
"url": "https://files.pythonhosted.org/packages/ef/5d/81a0d67483d0767e4fbf7444b079b3f21574a184b0888782ced1c2172777/duckdb_engine-0.13.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d1ad919358e78030059f8ddab03499eb0949145295000f61333642d50daaa331",
"md5": "e386f419acffc0450577dcd5b3299bf0",
"sha256": "221ec7759e157fd8d4fcb0bd64f603c5a4b1889186f30d805a91b10a73f8c59a"
},
"downloads": -1,
"filename": "duckdb_engine-0.13.6.tar.gz",
"has_sig": false,
"md5_digest": "e386f419acffc0450577dcd5b3299bf0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.8",
"size": 47745,
"upload_time": "2024-11-24T00:34:32",
"upload_time_iso_8601": "2024-11-24T00:34:32.563189Z",
"url": "https://files.pythonhosted.org/packages/d1/ad/919358e78030059f8ddab03499eb0949145295000f61333642d50daaa331/duckdb_engine-0.13.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-24 00:34:32",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Mause",
"github_project": "duckdb_engine",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "duckdb_engine"
}