duckdb_engine

Name	duckdb_engine JSON
Version	0.13.6 JSON
	download
home_page	https://github.com/Mause/duckdb_engine
Summary	SQLAlchemy driver for duckdb
upload_time	2024-11-24 00:34:32
maintainer	None
docs_url	None
author	Elliana
requires_python	<4,>=3.8
license	MIT
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # duckdb_engine

[![Supported Python Versions](https://img.shields.io/pypi/pyversions/duckdb-engine)](https://pypi.org/project/duckdb-engine/) [![PyPI version](https://badge.fury.io/py/duckdb-engine.svg)](https://badge.fury.io/py/duckdb-engine) [![PyPI Downloads](https://img.shields.io/pypi/dm/duckdb-engine.svg)](https://pypi.org/project/duckdb-engine/) [![codecov](https://codecov.io/gh/Mause/duckdb_engine/graph/badge.svg)](https://codecov.io/gh/Mause/duckdb_engine)

Basic SQLAlchemy driver for [DuckDB](https://duckdb.org/)

<!--ts-->
* [duckdb_engine](#duckdb_engine)
   * [Installation](#installation)
   * [Usage](#usage)
   * [Usage in IPython/Jupyter](#usage-in-ipythonjupyter)
   * [Configuration](#configuration)
   * [How to register a pandas DataFrame](#how-to-register-a-pandas-dataframe)
   * [Things to keep in mind](#things-to-keep-in-mind)
      * [Auto-incrementing ID columns](#auto-incrementing-id-columns)
      * [Pandas read_sql() chunksize](#pandas-read_sql-chunksize)
      * [Unsigned integer support](#unsigned-integer-support)
   * [Alembic Integration](#alembic-integration)
   * [Preloading extensions (experimental)](#preloading-extensions-experimental)
   * [The name](#the-name)

<!-- Created by https://github.com/ekalinin/github-markdown-toc -->
<!-- Added by: me, at: Wed 20 Sep 2023 12:44:27 AWST -->

<!--te-->

## Installation
```sh
$ pip install duckdb-engine
```

DuckDB Engine also has a conda feedstock available, the instructions for the use of which are available in it's [repository](https://github.com/conda-forge/duckdb-engine-feedstock).

## Usage

Once you've installed this package, you should be able to just use it, as SQLAlchemy does a python path search

```python
from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session

Base = declarative_base()


class FakeModel(Base):  # type: ignore
    __tablename__ = "fake"

    id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
    name = Column(String)


eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)

session.add(FakeModel(name="Frank"))
session.commit()

frank = session.query(FakeModel).one()

assert frank.name == "Frank"
```

## Usage in IPython/Jupyter

With IPython-SQL and DuckDB-Engine you can query DuckDB natively in your notebook! Check out [DuckDB's documentation](https://duckdb.org/docs/guides/python/jupyter) or
Alex Monahan's great demo of this on [his blog](https://alex-monahan.github.io/2021/08/22/Python_and_SQL_Better_Together.html#an-example-workflow-with-duckdb).

## Configuration

You can configure DuckDB by passing `connect_args` to the create_engine function
```python
create_engine(
    'duckdb:///:memory:',
    connect_args={
        'read_only': False,
        'config': {
            'memory_limit': '500mb'
        }
    }
)
```

The supported configuration parameters are listed in the [DuckDB docs](https://duckdb.org/docs/sql/configuration)

## How to register a pandas DataFrame

```python
conn = create_engine("duckdb:///:memory:").connect()

# with SQLAlchemy 1.3
conn.execute("register", ("dataframe_name", pd.DataFrame(...)))

# with SQLAlchemy 1.4+
conn.execute(text("register(:name, :df)"), {"name": "test_df", "df": df})

conn.execute("select * from dataframe_name")
```

## Things to keep in mind
Duckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the `duckdb_engine` dialect is derived from the `postgresql` dialect, `SQLAlchemy` may try to use PostgreSQL-only features. Below are some caveats to look out for.

### Auto-incrementing ID columns
When defining an Integer column as a primary key, `SQLAlchemy` uses the `SERIAL` datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the `SQLAlchemy.Sequence()` object to auto-increment the key. For more information on sequences, you can find the [`SQLAlchemy Sequence` documentation here](https://docs.sqlalchemy.org/en/14/core/defaults.html#associating-a-sequence-as-the-server-side-default).

The following example demonstrates how to create an auto-incrementing ID column for a simple table:

```python
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> metadata = sqlalchemy.MetaData(engine)
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
...     'users',
...     metadata,
...     sqlalchemy.Column(
...         'id',
...         sqlalchemy.Integer,
...         user_id_seq,
...         server_default=user_id_seq.next_value(),
...         primary_key=True,
...     ),
... )
>>> metadata.create_all(bind=engine)
```

### Pandas `read_sql()` chunksize

**NOTE**: this is no longer an issue in versions `>=0.5.0` of `duckdb`

The `pandas.read_sql()` method can read tables from `duckdb_engine` into DataFrames, but the `sqlalchemy.engine.result.ResultProxy` trips up when `fetchmany()` is called. Therefore, for now `chunksize=None` (default) is necessary when reading duckdb tables into DataFrames. For example:

```python
>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine)                ### Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25)  ### Throws an exception
```

### Unsigned integer support

Unsigned integers are supported by DuckDB, and are available in [`duckdb_engine.datatypes`](duckdb_engine/datatypes.py).

## Alembic Integration

SQLAlchemy's companion library `alembic` can optionally be used to manage database migrations.

This support can be enabling by adding an Alembic implementation class for the `duckdb` dialect.

```python
from alembic.ddl.impl import DefaultImpl

class AlembicDuckDBImpl(DefaultImpl):
    """Alembic implementation for DuckDB."""

    __dialect__ = "duckdb"
```

After loading this class with your program, Alembic will no longer raise an error when generating or applying migrations.

## Preloading extensions (experimental)

> DuckDB 0.9.0+ includes builtin support for autoinstalling and autoloading of extensions, see [the extension documentation](http://duckdb.org/docs/archive/0.9.0/extensions/overview#autoloadable-extensions) for more information.

Until the DuckDB python client allows you to natively preload extensions, I've added experimental support via a `connect_args` parameter

```python
from sqlalchemy import create_engine

create_engine(
    'duckdb:///:memory:',
    connect_args={
        'preload_extensions': ['https'],
        'config': {
            's3_region': 'ap-southeast-1'
        }
    }
)
```

## The name

Yes, I'm aware this package should be named `duckdb-driver` or something, I wasn't thinking when I named it and it's too hard to change the name now

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Mause/duckdb_engine",
    "name": "duckdb_engine",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "Elliana",
    "author_email": "me@mause.me",
    "download_url": "https://files.pythonhosted.org/packages/d1/ad/919358e78030059f8ddab03499eb0949145295000f61333642d50daaa331/duckdb_engine-0.13.6.tar.gz",
    "platform": null,
    "description": "# duckdb_engine\n\n[![Supported Python Versions](https://img.shields.io/pypi/pyversions/duckdb-engine)](https://pypi.org/project/duckdb-engine/) [![PyPI version](https://badge.fury.io/py/duckdb-engine.svg)](https://badge.fury.io/py/duckdb-engine) [![PyPI Downloads](https://img.shields.io/pypi/dm/duckdb-engine.svg)](https://pypi.org/project/duckdb-engine/) [![codecov](https://codecov.io/gh/Mause/duckdb_engine/graph/badge.svg)](https://codecov.io/gh/Mause/duckdb_engine)\n\nBasic SQLAlchemy driver for [DuckDB](https://duckdb.org/)\n\n<!--ts-->\n* [duckdb_engine](#duckdb_engine)\n   * [Installation](#installation)\n   * [Usage](#usage)\n   * [Usage in IPython/Jupyter](#usage-in-ipythonjupyter)\n   * [Configuration](#configuration)\n   * [How to register a pandas DataFrame](#how-to-register-a-pandas-dataframe)\n   * [Things to keep in mind](#things-to-keep-in-mind)\n      * [Auto-incrementing ID columns](#auto-incrementing-id-columns)\n      * [Pandas read_sql() chunksize](#pandas-read_sql-chunksize)\n      * [Unsigned integer support](#unsigned-integer-support)\n   * [Alembic Integration](#alembic-integration)\n   * [Preloading extensions (experimental)](#preloading-extensions-experimental)\n   * [The name](#the-name)\n\n<!-- Created by https://github.com/ekalinin/github-markdown-toc -->\n<!-- Added by: me, at: Wed 20 Sep 2023 12:44:27 AWST -->\n\n<!--te-->\n\n## Installation\n```sh\n$ pip install duckdb-engine\n```\n\nDuckDB Engine also has a conda feedstock available, the instructions for the use of which are available in it's [repository](https://github.com/conda-forge/duckdb-engine-feedstock).\n\n## Usage\n\nOnce you've installed this package, you should be able to just use it, as SQLAlchemy does a python path search\n\n```python\nfrom sqlalchemy import Column, Integer, Sequence, String, create_engine\nfrom sqlalchemy.ext.declarative import declarative_base\nfrom sqlalchemy.orm.session import Session\n\nBase = declarative_base()\n\n\nclass FakeModel(Base):  # type: ignore\n    __tablename__ = \"fake\"\n\n    id = Column(Integer, Sequence(\"fakemodel_id_sequence\"), primary_key=True)\n    name = Column(String)\n\n\neng = create_engine(\"duckdb:///:memory:\")\nBase.metadata.create_all(eng)\nsession = Session(bind=eng)\n\nsession.add(FakeModel(name=\"Frank\"))\nsession.commit()\n\nfrank = session.query(FakeModel).one()\n\nassert frank.name == \"Frank\"\n```\n\n## Usage in IPython/Jupyter\n\nWith IPython-SQL and DuckDB-Engine you can query DuckDB natively in your notebook! Check out [DuckDB's documentation](https://duckdb.org/docs/guides/python/jupyter) or\nAlex Monahan's great demo of this on [his blog](https://alex-monahan.github.io/2021/08/22/Python_and_SQL_Better_Together.html#an-example-workflow-with-duckdb).\n\n## Configuration\n\nYou can configure DuckDB by passing `connect_args` to the create_engine function\n```python\ncreate_engine(\n    'duckdb:///:memory:',\n    connect_args={\n        'read_only': False,\n        'config': {\n            'memory_limit': '500mb'\n        }\n    }\n)\n```\n\nThe supported configuration parameters are listed in the [DuckDB docs](https://duckdb.org/docs/sql/configuration)\n\n## How to register a pandas DataFrame\n\n```python\nconn = create_engine(\"duckdb:///:memory:\").connect()\n\n# with SQLAlchemy 1.3\nconn.execute(\"register\", (\"dataframe_name\", pd.DataFrame(...)))\n\n# with SQLAlchemy 1.4+\nconn.execute(text(\"register(:name, :df)\"), {\"name\": \"test_df\", \"df\": df})\n\nconn.execute(\"select * from dataframe_name\")\n```\n\n## Things to keep in mind\nDuckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the `duckdb_engine` dialect is derived from the `postgresql` dialect, `SQLAlchemy` may try to use PostgreSQL-only features. Below are some caveats to look out for.\n\n### Auto-incrementing ID columns\nWhen defining an Integer column as a primary key, `SQLAlchemy` uses the `SERIAL` datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the `SQLAlchemy.Sequence()` object to auto-increment the key. For more information on sequences, you can find the [`SQLAlchemy Sequence` documentation here](https://docs.sqlalchemy.org/en/14/core/defaults.html#associating-a-sequence-as-the-server-side-default).\n\nThe following example demonstrates how to create an auto-incrementing ID column for a simple table:\n\n```python\n>>> import sqlalchemy\n>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')\n>>> metadata = sqlalchemy.MetaData(engine)\n>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')\n>>> users_table = sqlalchemy.Table(\n...     'users',\n...     metadata,\n...     sqlalchemy.Column(\n...         'id',\n...         sqlalchemy.Integer,\n...         user_id_seq,\n...         server_default=user_id_seq.next_value(),\n...         primary_key=True,\n...     ),\n... )\n>>> metadata.create_all(bind=engine)\n```\n\n### Pandas `read_sql()` chunksize\n\n**NOTE**: this is no longer an issue in versions `>=0.5.0` of `duckdb`\n\nThe `pandas.read_sql()` method can read tables from `duckdb_engine` into DataFrames, but the `sqlalchemy.engine.result.ResultProxy` trips up when `fetchmany()` is called. Therefore, for now `chunksize=None` (default) is necessary when reading duckdb tables into DataFrames. For example:\n\n```python\n>>> import pandas as pd\n>>> import sqlalchemy\n>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')\n>>> df = pd.read_sql('users', engine)                ### Works as expected\n>>> df = pd.read_sql('users', engine, chunksize=25)  ### Throws an exception\n```\n\n### Unsigned integer support\n\nUnsigned integers are supported by DuckDB, and are available in [`duckdb_engine.datatypes`](duckdb_engine/datatypes.py).\n\n## Alembic Integration\n\nSQLAlchemy's companion library `alembic` can optionally be used to manage database migrations.\n\nThis support can be enabling by adding an Alembic implementation class for the `duckdb` dialect.\n\n```python\nfrom alembic.ddl.impl import DefaultImpl\n\nclass AlembicDuckDBImpl(DefaultImpl):\n    \"\"\"Alembic implementation for DuckDB.\"\"\"\n\n    __dialect__ = \"duckdb\"\n```\n\nAfter loading this class with your program, Alembic will no longer raise an error when generating or applying migrations.\n\n## Preloading extensions (experimental)\n\n> DuckDB 0.9.0+ includes builtin support for autoinstalling and autoloading of extensions, see [the extension documentation](http://duckdb.org/docs/archive/0.9.0/extensions/overview#autoloadable-extensions) for more information.\n\nUntil the DuckDB python client allows you to natively preload extensions, I've added experimental support via a `connect_args` parameter\n\n```python\nfrom sqlalchemy import create_engine\n\ncreate_engine(\n    'duckdb:///:memory:',\n    connect_args={\n        'preload_extensions': ['https'],\n        'config': {\n            's3_region': 'ap-southeast-1'\n        }\n    }\n)\n```\n\n## The name\n\nYes, I'm aware this package should be named `duckdb-driver` or something, I wasn't thinking when I named it and it's too hard to change the name now\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "SQLAlchemy driver for duckdb",
    "version": "0.13.6",
    "project_urls": {
        "Bug Tracker": "https://github.com/Mause/duckdb_engine/issues",
        "Changelog": "https://github.com/Mause/duckdb_engine/releases",
        "Homepage": "https://github.com/Mause/duckdb_engine",
        "Repository": "https://github.com/Mause/duckdb_engine"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ef5d81a0d67483d0767e4fbf7444b079b3f21574a184b0888782ced1c2172777",
                "md5": "8aea36b6df311363c912a850c8a7e259",
                "sha256": "cedd44252cce5f42de88752026925154a566c407987116a242d250642904ba84"
            },
            "downloads": -1,
            "filename": "duckdb_engine-0.13.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8aea36b6df311363c912a850c8a7e259",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.8",
            "size": 48961,
            "upload_time": "2024-11-24T00:34:30",
            "upload_time_iso_8601": "2024-11-24T00:34:30.258287Z",
            "url": "https://files.pythonhosted.org/packages/ef/5d/81a0d67483d0767e4fbf7444b079b3f21574a184b0888782ced1c2172777/duckdb_engine-0.13.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d1ad919358e78030059f8ddab03499eb0949145295000f61333642d50daaa331",
                "md5": "e386f419acffc0450577dcd5b3299bf0",
                "sha256": "221ec7759e157fd8d4fcb0bd64f603c5a4b1889186f30d805a91b10a73f8c59a"
            },
            "downloads": -1,
            "filename": "duckdb_engine-0.13.6.tar.gz",
            "has_sig": false,
            "md5_digest": "e386f419acffc0450577dcd5b3299bf0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.8",
            "size": 47745,
            "upload_time": "2024-11-24T00:34:32",
            "upload_time_iso_8601": "2024-11-24T00:34:32.563189Z",
            "url": "https://files.pythonhosted.org/packages/d1/ad/919358e78030059f8ddab03499eb0949145295000f61333642d50daaa331/duckdb_engine-0.13.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-24 00:34:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Mause",
    "github_project": "duckdb_engine",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "duckdb_engine"
}

Elliana