# magic_duckdb
DuckDB Cell (%%dql) and Line (%dql) Magics for Jupyter and VSCode
## Motivation
magic_duckdb was created to:
- Provide simple cell/line magics with minimal code and zero dependencies
- Match performance of DuckDB python API
- Be a simple starting point to add other useful features with minimal complexity
- Bundle useful features, like using OpenAI to improve SQL, sql formatting (beautifying) and explain analyze graphics
### Why not the %sql magics (jupysql or ipython-sql)?
The goal of this project is to expose the native features of duckdb, with minimal dependencies, such as exporting to arrow tables or using DuckDB relation objects.
## Simplicity
The goal of this project was to provide minimal line & cell magics for DuckDB in Jupyter notebooks with minimal dependencies and as simply as possible.
The core code is concentrated in two places:
- magic.py: Barebones cell and line magic that parses arguments, and executes statements
- duckdb_mode.py: execute() calls the appropriate method based on the user selected output type.
Features that require additional dependencies, such as Jinja2 for the --jinja2 feature, are imported dynamically.
## Quick Start
```
%pip install magic_duckdb --upgrade --quiet
%load_ext magic_duckdb
%dql select * from range(100)
```
## Usage
```
Connection:
-cn <connection_string>: Create a new connection to a DuckDB. If passed without a query, this changes the global connection, otherwise used just for the current query.
%dql -cn myfile.db
-co <connection_object>: Use an existing DuckDB Connection. If passed without a query, this changes the global connection, otherwise used just for the current query.
con = duckdb.connect("somefile.db")
%dql -co con
-d: Use the duckdb.default_connection. If passed without a query, this changes the global connection, otherwise used just for the current query.
-g | --getcon: Get the current connection
con = %dql --getcon
--close: Close current connection
Modes:
-t <type> [default: df]: Selects the type for this request. If passed without a query, this changes the global default, otherwise used just for the current query.
-e <explain_mode>: Display the explain plan, or explain analyze, or AST
Options:
-l | --listtypes: Returns a list of available output types, for -t
-j | --jinja2: Process the SQL as a jinja template, using the user_ns environment.
`%dql -j select * from {{var1}}`
-p | --params: Pass the specified parameter(s) as a SQL parameters
`%dql -p obj1 select
-o <var>: Stores the resulting output in a variable named <var>
-tp <name> <value>: Pass a kwarg to the type function. Intended to be used to pass parameters to .show(). Must be passed on each call, not saved.
`%dql -t show -tp max_rows 10 <query>`
Extras:
--tables: Returns tables used by the query
-f: Format the string using npx sql-formatter
Other:
-ai / -aichat: Route request to OpenAI
%dql -ai Fix selct star frm mytable
```
See [notebooks](https://github.com/iqmo-org/magic_duckdb/tree/main/notebooks) for usage examples.
## Usage Details
- `%dql -t [df | arrow | pl | relation | show | df_markdown] <query>`: Equivalent to - `connection.sql(query).<type>()`
- `%dql -e [explain | explain_analyze_tree | explain_analyze_json | explain_analyze_draw | analyze | ast_json | ast_tree | ast_draw] <query>`:
- `explain` is equivalent to `connection.sql(query).explain()`
- `explain_analyze_*` options enable profiling (`PRAGMA enable_profiling`), and use `connection.sql(query).explain(type='analyze')`
- `explain_analyze_draw` requires [graphviz](https://graphviz.org/) to be installed on your system, and "dot" or "dot.exe" must be in your PATH or added via `magic.explain_analyze_graphviz.dot_path = 'path_to_dot'`. The graphviz python module must also be installed: `pip install graphviz`
- The AST options use [json_serialize_sql](https://github.com/duckdb/duckdb/discussions/6922) to describe the SQL. `ast_json` displays the raw json result, `ast_tree` displays an indented tree, and `ast_draw` uses graphviz to draw a graphical version of the tree
- `%dql --format <query>` uses [sql-formatter](https://github.com/sql-formatter-org/sql-formatter). This is a javascript library, so it needs to be installed separately, although it's executed via npx so should be fine as long as you have npx / node in your path.
- `%dql --tables <query>` returns the list of tables used by the query, equivalent to: `duckdb.get_table_names("SELECT * FROM xyz")`
- `%dql [-ai | -aichat] fix <query>` passes the current schema to OpenAI and askes OpenAI to "fix" the query. An OpenAI developer key is required.
```
# to set openai key
from magic_duckdb.extras import sql_ai
sql_ai.OPENAI_KEY = openai_key
```
## Autocompletion: Work in Progress
There are two different Autocompletion implementations, one for MatcherAPIv2 and the other (pre-ipython 8.6.0) for MatcherAPIv1. The MatcherAPIv2 version is tried first, and if it fails, the MatcherAPIv1 version is loaded. MatcherAPIv1 will not match the entire results of a cell: it's limited to a line by line match.
- Phrase completion: `%dql create <tab>` will show common phrases, such as `CREATE OR REPLACE`
- Pragma completion: `%dql PRAGMA <tab>` will list available pragma's.
- Table completion: `%dql select * from <tab>` will list available tables and Pandas DataFrames. This is triggered by a list of keywords (ie: from) that are expected to be followed. See magic_duckdb.autocomplete.common for the keywords.
- Column completion: `%dql select tablename.<tab> from tablename` will list available tables and Pandas DataFrames for tablename.
Notes:
- VScode will autocomplete differently than Jupyter: For instance, VScode autocompletes on spaces, Jupyter on tabs.
- Column completion only matches table or dataframe names. It doesn't look up aliases or analyze subqueries.
- This is meant as a proof of concept. The next step is to leverage duckdb's existing work: [4921](https://github.com/duckdb/duckdb/pull/4921) and test cases here: [shell-test.py](https://github.com/Mytherin/duckdb/blob/5f75cb90b478434f0a1811af0695ea3a186a67a8/tools/shell/shell-test.py)
- https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/hinterland/README.html: Enable code autocompletion menu for every keypress in a code cell, instead of only calling it with tab.
Autocompletion can be disabled with:
```
from magic_duckdb import magic
magic.ENABLE_AUTOCOMPLETE=False
%load_ext magic_duckdb
```
# Capturing output
Line Magics can be captured with assignment or -o:
```
# These are equivalent:
%dql -o varname <query>
varname = %dql <query>
```
Cell magics can only be captured with -o (var = %%dql doesn't work)
```
%%dql -o varname
<query>
```
To silence a cell, you can stack %%capture:
```
%%capture
%%iql -o bqldf
<query>
```
# Performance Comparison
The jupysql/sql-alchemy/duckdb-engine %sql magic was surprisingly slow when compared to magic_duckdb or duckdb. I didn't spend a lot of time evaluating this, so please do your own evaluation: my priority was keeping magic_duckdb simple by using duckdb directly.
### Versions
> Python: 3.9.16 (main, Mar 8 2023, 10:39:24) [MSC v.1916 64 bit (AMD64)]
> DuckDB: library_version source_id
> 0 0.8.1-dev51 e84cc1acb8
> Pandas : 2.0.1
> jupysql 0.7.5
### Test Setup
See [benchmarking.ipynb](https://github.com/iqmo-org/magic_duckdb/blob/main/notebooks/benchmarking.ipynb) for the test code.
### Results
| | name | 1 | 1000 | 1000000 | description |
| --- | ------------------------- | -----: | -----: | ------: | --------------------------------------------------- |
| 0 | test_magicddb_pandas | 3.47 | 4.11 | 291.0 | %dql -t df query |
| 1 | test_duckdb_execute_df | 3.80 | 3.39 | 281.0 | con.execute(query).df() |
| 2 | test_duckdb_execute_arrow | 3.91 | 4.04 | 128.0 | con.execute(query).arrow() |
| 3 | test_magicddb_arrow | 4.10 | 5.38 | 127.0 | %dql -t arrow query |
| 4 | test_duckdb_sql_df | 6.93 | 7.94 | 318.0 | con.sql(query).df() |
| 5 | test_jupysql | 321.00 | 256.00 | 547.0 | %config SqlMagic.autopandas = True <br/> %sql query |
Copyright © 2023 Iqmo Corporation
Raw data
{
"_id": null,
"home_page": "https://github.com/iqmo-org/magic_duckdb",
"name": "magic-duckdb",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "Jupyter, Cell Magic, DuckDB, Line Magic, Magic",
"author": "iqmo",
"author_email": "info@iqmo.com",
"download_url": null,
"platform": null,
"description": "# magic_duckdb\n\nDuckDB Cell (%%dql) and Line (%dql) Magics for Jupyter and VSCode\n\n## Motivation\n\nmagic_duckdb was created to:\n\n- Provide simple cell/line magics with minimal code and zero dependencies\n- Match performance of DuckDB python API\n- Be a simple starting point to add other useful features with minimal complexity\n- Bundle useful features, like using OpenAI to improve SQL, sql formatting (beautifying) and explain analyze graphics\n\n### Why not the %sql magics (jupysql or ipython-sql)?\n\nThe goal of this project is to expose the native features of duckdb, with minimal dependencies, such as exporting to arrow tables or using DuckDB relation objects. \n\n## Simplicity\n\nThe goal of this project was to provide minimal line & cell magics for DuckDB in Jupyter notebooks with minimal dependencies and as simply as possible.\n\nThe core code is concentrated in two places:\n\n- magic.py: Barebones cell and line magic that parses arguments, and executes statements\n- duckdb_mode.py: execute() calls the appropriate method based on the user selected output type.\n\nFeatures that require additional dependencies, such as Jinja2 for the --jinja2 feature, are imported dynamically.\n\n## Quick Start\n\n```\n%pip install magic_duckdb --upgrade --quiet\n%load_ext magic_duckdb\n%dql select * from range(100)\n```\n\n## Usage\n\n```\nConnection:\n-cn <connection_string>: Create a new connection to a DuckDB. If passed without a query, this changes the global connection, otherwise used just for the current query.\n %dql -cn myfile.db\n-co <connection_object>: Use an existing DuckDB Connection. If passed without a query, this changes the global connection, otherwise used just for the current query.\n con = duckdb.connect(\"somefile.db\")\n %dql -co con\n-d: Use the duckdb.default_connection. If passed without a query, this changes the global connection, otherwise used just for the current query.\n-g | --getcon: Get the current connection\n con = %dql --getcon\n--close: Close current connection\n\nModes:\n-t <type> [default: df]: Selects the type for this request. If passed without a query, this changes the global default, otherwise used just for the current query.\n-e <explain_mode>: Display the explain plan, or explain analyze, or AST\n\nOptions:\n-l | --listtypes: Returns a list of available output types, for -t\n\n-j | --jinja2: Process the SQL as a jinja template, using the user_ns environment.\n `%dql -j select * from {{var1}}`\n-p | --params: Pass the specified parameter(s) as a SQL parameters\n `%dql -p obj1 select\n-o <var>: Stores the resulting output in a variable named <var>\n-tp <name> <value>: Pass a kwarg to the type function. Intended to be used to pass parameters to .show(). Must be passed on each call, not saved. \n `%dql -t show -tp max_rows 10 <query>`\n\nExtras:\n--tables: Returns tables used by the query\n-f: Format the string using npx sql-formatter\nOther:\n-ai / -aichat: Route request to OpenAI\n %dql -ai Fix selct star frm mytable\n\n```\n\nSee [notebooks](https://github.com/iqmo-org/magic_duckdb/tree/main/notebooks) for usage examples.\n\n## Usage Details\n\n- `%dql -t [df | arrow | pl | relation | show | df_markdown] <query>`: Equivalent to - `connection.sql(query).<type>()`\n- `%dql -e [explain | explain_analyze_tree | explain_analyze_json | explain_analyze_draw | analyze | ast_json | ast_tree | ast_draw] <query>`:\n - `explain` is equivalent to `connection.sql(query).explain()`\n - `explain_analyze_*` options enable profiling (`PRAGMA enable_profiling`), and use `connection.sql(query).explain(type='analyze')`\n - `explain_analyze_draw` requires [graphviz](https://graphviz.org/) to be installed on your system, and \"dot\" or \"dot.exe\" must be in your PATH or added via `magic.explain_analyze_graphviz.dot_path = 'path_to_dot'`. The graphviz python module must also be installed: `pip install graphviz`\n - The AST options use [json_serialize_sql](https://github.com/duckdb/duckdb/discussions/6922) to describe the SQL. `ast_json` displays the raw json result, `ast_tree` displays an indented tree, and `ast_draw` uses graphviz to draw a graphical version of the tree\n- `%dql --format <query>` uses [sql-formatter](https://github.com/sql-formatter-org/sql-formatter). This is a javascript library, so it needs to be installed separately, although it's executed via npx so should be fine as long as you have npx / node in your path.\n- `%dql --tables <query>` returns the list of tables used by the query, equivalent to: `duckdb.get_table_names(\"SELECT * FROM xyz\")`\n- `%dql [-ai | -aichat] fix <query>` passes the current schema to OpenAI and askes OpenAI to \"fix\" the query. An OpenAI developer key is required.\n\n```\n # to set openai key\n from magic_duckdb.extras import sql_ai\n sql_ai.OPENAI_KEY = openai_key\n```\n\n## Autocompletion: Work in Progress\n\nThere are two different Autocompletion implementations, one for MatcherAPIv2 and the other (pre-ipython 8.6.0) for MatcherAPIv1. The MatcherAPIv2 version is tried first, and if it fails, the MatcherAPIv1 version is loaded. MatcherAPIv1 will not match the entire results of a cell: it's limited to a line by line match.\n\n- Phrase completion: `%dql create <tab>` will show common phrases, such as `CREATE OR REPLACE`\n- Pragma completion: `%dql PRAGMA <tab>` will list available pragma's.\n- Table completion: `%dql select * from <tab>` will list available tables and Pandas DataFrames. This is triggered by a list of keywords (ie: from) that are expected to be followed. See magic_duckdb.autocomplete.common for the keywords.\n- Column completion: `%dql select tablename.<tab> from tablename` will list available tables and Pandas DataFrames for tablename.\n\nNotes:\n\n- VScode will autocomplete differently than Jupyter: For instance, VScode autocompletes on spaces, Jupyter on tabs.\n- Column completion only matches table or dataframe names. It doesn't look up aliases or analyze subqueries.\n- This is meant as a proof of concept. The next step is to leverage duckdb's existing work: [4921](https://github.com/duckdb/duckdb/pull/4921) and test cases here: [shell-test.py](https://github.com/Mytherin/duckdb/blob/5f75cb90b478434f0a1811af0695ea3a186a67a8/tools/shell/shell-test.py)\n- https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/hinterland/README.html: Enable code autocompletion menu for every keypress in a code cell, instead of only calling it with tab.\n\nAutocompletion can be disabled with:\n\n```\nfrom magic_duckdb import magic\nmagic.ENABLE_AUTOCOMPLETE=False\n%load_ext magic_duckdb\n```\n\n# Capturing output\n\nLine Magics can be captured with assignment or -o:\n\n```\n# These are equivalent:\n%dql -o varname <query>\nvarname = %dql <query>\n```\n\nCell magics can only be captured with -o (var = %%dql doesn't work)\n\n```\n%%dql -o varname\n<query>\n```\n\nTo silence a cell, you can stack %%capture:\n\n```\n%%capture\n%%iql -o bqldf\n<query>\n```\n\n# Performance Comparison\n\nThe jupysql/sql-alchemy/duckdb-engine %sql magic was surprisingly slow when compared to magic_duckdb or duckdb. I didn't spend a lot of time evaluating this, so please do your own evaluation: my priority was keeping magic_duckdb simple by using duckdb directly.\n\n### Versions\n\n> Python: 3.9.16 (main, Mar 8 2023, 10:39:24) [MSC v.1916 64 bit (AMD64)]\n> DuckDB: library_version source_id\n> 0 0.8.1-dev51 e84cc1acb8\n> Pandas : 2.0.1\n> jupysql 0.7.5\n\n### Test Setup\n\nSee [benchmarking.ipynb](https://github.com/iqmo-org/magic_duckdb/blob/main/notebooks/benchmarking.ipynb) for the test code.\n\n### Results\n\n| | name | 1 | 1000 | 1000000 | description |\n| --- | ------------------------- | -----: | -----: | ------: | --------------------------------------------------- |\n| 0 | test_magicddb_pandas | 3.47 | 4.11 | 291.0 | %dql -t df query |\n| 1 | test_duckdb_execute_df | 3.80 | 3.39 | 281.0 | con.execute(query).df() |\n| 2 | test_duckdb_execute_arrow | 3.91 | 4.04 | 128.0 | con.execute(query).arrow() |\n| 3 | test_magicddb_arrow | 4.10 | 5.38 | 127.0 | %dql -t arrow query |\n| 4 | test_duckdb_sql_df | 6.93 | 7.94 | 318.0 | con.sql(query).df() |\n| 5 | test_jupysql | 321.00 | 256.00 | 547.0 | %config SqlMagic.autopandas = True <br/> %sql query |\n\nCopyright © 2023 Iqmo Corporation\n",
"bugtrack_url": null,
"license": null,
"summary": "Jupyter Cell and Line Magics for DuckDB",
"version": "0.1.27",
"project_urls": {
"Homepage": "https://github.com/iqmo-org/magic_duckdb"
},
"split_keywords": [
"jupyter",
" cell magic",
" duckdb",
" line magic",
" magic"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "936280d7f6d235b8aece35f9ceea0133340b5844984e8849a8224dc022bcdecb",
"md5": "2b3fcd27d5c31506abf7ffc30ea8f822",
"sha256": "a9f74bb068051862d0f4524679786cf30932b306150e1f3819152448b02f3f97"
},
"downloads": -1,
"filename": "magic_duckdb-0.1.27-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2b3fcd27d5c31506abf7ffc30ea8f822",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 21990,
"upload_time": "2024-03-30T15:46:18",
"upload_time_iso_8601": "2024-03-30T15:46:18.754222Z",
"url": "https://files.pythonhosted.org/packages/93/62/80d7f6d235b8aece35f9ceea0133340b5844984e8849a8224dc022bcdecb/magic_duckdb-0.1.27-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-30 15:46:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "iqmo-org",
"github_project": "magic_duckdb",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "magic-duckdb"
}