| Field | Value |
|-|-|
| Name | databricks-sqlalchemy |
| Version | 2.0.3 |
| Summary | Databricks SQLAlchemy plugin for Python |
| Author | Databricks |
| License | Apache-2.0 |
| Requires Python | <4.0.0,>=3.8.0 |
| Upload time | 2024-11-14 09:03:30 |
## Databricks dialect for SQLAlchemy 2.0
The Databricks dialect for SQLAlchemy serves as a bridge between [SQLAlchemy](https://www.sqlalchemy.org/) and the Databricks SQL Python driver. A working example demonstrating usage can be found in `sqlalchemy_example.py`.
## Installation
To install the dialect and its dependencies:
```shell
pip install databricks-sqlalchemy
```
If you also plan to use `alembic`, install it as well:
```shell
pip install alembic
```
## Connection String
Every SQLAlchemy application that connects to a database needs to use an [Engine](https://docs.sqlalchemy.org/en/20/tutorial/engine.html#tutorial-engine), which you can create by passing a connection string to `create_engine`. The connection string must include these components:
1. Host
2. HTTP Path for a compute resource
3. API access token
4. Initial catalog for the connection
5. Initial schema for the connection
**Note: Our dialect is built and tested on workspaces with Unity Catalog enabled. Support for the `hive_metastore` catalog is untested.**
For example:
```python
import os
from sqlalchemy import create_engine
host = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
catalog = os.getenv("DATABRICKS_CATALOG")
schema = os.getenv("DATABRICKS_SCHEMA")
engine = create_engine(
f"databricks://token:{access_token}@{host}?http_path={http_path}&catalog={catalog}&schema={schema}"
)
```
## Types
The [SQLAlchemy type hierarchy](https://docs.sqlalchemy.org/en/20/core/type_basics.html) contains backend-agnostic type implementations (represented in CamelCase) and backend-specific types (represented in UPPERCASE). The majority of SQLAlchemy's [CamelCase](https://docs.sqlalchemy.org/en/20/core/type_basics.html#the-camelcase-datatypes) types are supported. This means that a SQLAlchemy application using these types should "just work" with Databricks.
| SQLAlchemy Type | Databricks SQL Type |
|-|-|
| [`BigInteger`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.BigInteger) | [`BIGINT`](https://docs.databricks.com/en/sql/language-manual/data-types/bigint-type.html) |
| [`LargeBinary`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.LargeBinary) | (not supported) |
| [`Boolean`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Boolean) | [`BOOLEAN`](https://docs.databricks.com/en/sql/language-manual/data-types/boolean-type.html) |
| [`Date`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Date) | [`DATE`](https://docs.databricks.com/en/sql/language-manual/data-types/date-type.html) |
| [`DateTime`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.DateTime) | [`TIMESTAMP_NTZ`](https://docs.databricks.com/en/sql/language-manual/data-types/timestamp-ntz-type.html) |
| [`Double`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Double) | [`DOUBLE`](https://docs.databricks.com/en/sql/language-manual/data-types/double-type.html) |
| [`Enum`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Enum) | (not supported) |
| [`Float`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Float) | [`FLOAT`](https://docs.databricks.com/en/sql/language-manual/data-types/float-type.html) |
| [`Integer`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Integer) | [`INT`](https://docs.databricks.com/en/sql/language-manual/data-types/int-type.html) |
| [`Numeric`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Numeric) | [`DECIMAL`](https://docs.databricks.com/en/sql/language-manual/data-types/decimal-type.html) |
| [`PickleType`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.PickleType) | (not supported) |
| [`SmallInteger`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.SmallInteger) | [`SMALLINT`](https://docs.databricks.com/en/sql/language-manual/data-types/smallint-type.html) |
| [`String`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.String) | [`STRING`](https://docs.databricks.com/en/sql/language-manual/data-types/string-type.html) |
| [`Text`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Text) | [`STRING`](https://docs.databricks.com/en/sql/language-manual/data-types/string-type.html) |
| [`Time`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Time) | [`STRING`](https://docs.databricks.com/en/sql/language-manual/data-types/string-type.html) |
| [`Unicode`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Unicode) | [`STRING`](https://docs.databricks.com/en/sql/language-manual/data-types/string-type.html) |
| [`UnicodeText`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.UnicodeText) | [`STRING`](https://docs.databricks.com/en/sql/language-manual/data-types/string-type.html) |
| [`Uuid`](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Uuid) | [`STRING`](https://docs.databricks.com/en/sql/language-manual/data-types/string-type.html) |
In addition, the dialect exposes three UPPERCASE SQLAlchemy types which are specific to Databricks:
- [`databricks.sqlalchemy.TINYINT`](https://docs.databricks.com/en/sql/language-manual/data-types/tinyint-type.html)
- [`databricks.sqlalchemy.TIMESTAMP`](https://docs.databricks.com/en/sql/language-manual/data-types/timestamp-type.html)
- [`databricks.sqlalchemy.TIMESTAMP_NTZ`](https://docs.databricks.com/en/sql/language-manual/data-types/timestamp-ntz-type.html)
### `LargeBinary()` and `PickleType()`
Databricks Runtime doesn't currently support binding binary values in SQL queries, which is a prerequisite for this functionality in SQLAlchemy.
### `Enum()` and `CHECK` constraints
Support for `CHECK` constraints is not implemented in this dialect. Support is planned for a future release.
SQLAlchemy's `Enum()` type depends on `CHECK` constraints and is therefore not yet supported.
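Until `CHECK` constraint support lands, one possible application-level workaround (an illustrative sketch, not an official dialect feature; the `Status` enum and `Account` model below are hypothetical) is to store the value as a `String` and validate it against a Python `enum.Enum` with an ORM validator:

```python
import enum

from sqlalchemy import BigInteger, Column, String
from sqlalchemy.orm import declarative_base, validates

Base = declarative_base()

class Status(enum.Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"

class Account(Base):
    __tablename__ = "accounts"
    id = Column(BigInteger, primary_key=True)
    # Stored as STRING on Databricks; validated in Python instead of by CHECK
    status = Column(String)

    @validates("status")
    def _check_status(self, key, value):
        # Raises ValueError if value is not a member of Status
        return Status(value).value

acct = Account(id=1, status="active")
```

This enforces the allowed values in the application layer only; nothing stops out-of-band writes to the table from inserting other strings.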
### `DateTime()`, `TIMESTAMP_NTZ()`, and `TIMESTAMP()`
Databricks Runtime provides two datetime-like types: `TIMESTAMP` which is always timezone-aware and `TIMESTAMP_NTZ` which is timezone agnostic. Both types can be imported from `databricks.sqlalchemy` and used in your models.
The SQLAlchemy documentation indicates that `DateTime()` is not timezone-aware by default. So our dialect maps this type to `TIMESTAMP_NTZ()`. In practice, you should never need to use `TIMESTAMP_NTZ()` directly. Just use `DateTime()`.
If you need your field to be timezone-aware, you can import `TIMESTAMP()` and use it instead.
_Note that SQLAlchemy documentation suggests that you can declare a `DateTime()` with `timezone=True` on supported backends. However, if you do this with the Databricks dialect, the `timezone` argument will be ignored._
```python
from sqlalchemy import BigInteger, Column, DateTime
from sqlalchemy.orm import declarative_base
from databricks.sqlalchemy import TIMESTAMP

Base = declarative_base()

class SomeModel(Base):
    __tablename__ = "some_model"
    id = Column(BigInteger, primary_key=True)
    some_date_without_timezone = Column(DateTime)
    some_date_with_timezone = Column(TIMESTAMP)
```
### `String()`, `Text()`, `Unicode()`, and `UnicodeText()`
Databricks Runtime doesn't support length limitations for `STRING` fields. Therefore `String()`, `String(1)`, and `String(255)` all produce identical DDL. Since `Text()`, `Unicode()`, and `UnicodeText()` all use the same underlying type in Databricks SQL, they generate equivalent DDL as well.
### `Time()`
Databricks Runtime doesn't have a native time-like data type. To implement this type in SQLAlchemy, our dialect stores SQLAlchemy `Time()` values in a `STRING` field. Unlike `DateTime` above, this type can optionally support timezone awareness (since the dialect is in complete control of the strings that we write to the Delta table).
```python
from sqlalchemy import BigInteger, Column, Time
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class SomeModel(Base):
    __tablename__ = "some_model"
    id = Column(BigInteger, primary_key=True)
    time_tz = Column(Time(timezone=True))
    time_ntz = Column(Time)
```
# Usage Notes
## `Identity()` and `autoincrement`
Identity and generated value support is currently limited in this dialect.
When defining models, SQLAlchemy types can accept an [`autoincrement`](https://docs.sqlalchemy.org/en/20/core/metadata.html#sqlalchemy.schema.Column.params.autoincrement) argument. In our dialect, this argument is currently ignored. To create an auto-incrementing field in your model you can pass in an explicit [`Identity()`](https://docs.sqlalchemy.org/en/20/core/defaults.html#identity-ddl) instead.
Furthermore, in Databricks Runtime, only `BIGINT` fields can be configured to auto-increment. So in SQLAlchemy, you must use the `BigInteger()` type.
```python
from sqlalchemy import BigInteger, Column, Identity, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class SomeModel(Base):
    __tablename__ = "some_model"
    id = Column(BigInteger, Identity(), primary_key=True)
    value = Column(String)
```
When calling `Base.metadata.create_all()`, the executed DDL will include `GENERATED ALWAYS AS IDENTITY` for the `id` column. This is useful when using SQLAlchemy to generate tables. However, as of this writing, `Identity()` constructs are not captured when SQLAlchemy reflects a table's metadata (support for this is planned).
## Parameters
`databricks-sql-connector` supports two approaches to parameterizing SQL queries: native and inline. Our SQLAlchemy 2.0 dialect always uses the native approach and is therefore limited to DBR 14.2 and above. If you are writing parameterized queries to be executed by SQLAlchemy, you must use the "named" paramstyle (`:param`). Read more about parameterization in `docs/parameters.md`.
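At the SQLAlchemy level, the "named" paramstyle looks like the sketch below. It runs against an in-memory SQLite engine purely to keep the example self-contained; the `:param` syntax is the same when you pass the `databricks://` URL shown earlier.

```python
from sqlalchemy import create_engine, text

# Illustration only: substitute the databricks:// connection string from the
# Connection String section to run this against Databricks.
engine = create_engine("sqlite:///:memory:")

with engine.connect() as conn:
    # Bind values with the named paramstyle (:x, :y), never by string
    # interpolation into the SQL text.
    total = conn.execute(
        text("SELECT :x + :y AS total"),
        {"x": 2, "y": 3},
    ).scalar()

print(total)  # 5
```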
## Usage with pandas
Use [`pandas.DataFrame.to_sql`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html) and [`pandas.read_sql`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html#pandas.read_sql) to write to and read from Databricks SQL. Both methods accept a SQLAlchemy connection to interact with Databricks.
### Read from Databricks SQL into pandas
```python
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("databricks://token:dapi***@***.cloud.databricks.com?http_path=***&catalog=main&schema=test")

with engine.connect() as conn:
    # This will read the contents of `main.test.some_table`
    df = pd.read_sql("some_table", conn)
```
### Write to Databricks SQL from pandas
```python
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("databricks://token:dapi***@***.cloud.databricks.com?http_path=***&catalog=main&schema=test")

squares = [(i, i * i) for i in range(100)]
df = pd.DataFrame(data=squares, columns=["x", "x_squared"])

with engine.connect() as conn:
    # This will write the contents of `df` to `main.test.squares`
    df.to_sql("squares", conn)
```
## [`PrimaryKey()`](https://docs.sqlalchemy.org/en/20/core/constraints.html#sqlalchemy.schema.PrimaryKeyConstraint) and [`ForeignKey()`](https://docs.sqlalchemy.org/en/20/core/constraints.html#defining-foreign-keys)
Unity Catalog workspaces in Databricks support PRIMARY KEY and FOREIGN KEY constraints. _Note that Databricks Runtime does not enforce the integrity of FOREIGN KEY constraints_. You can establish a primary key by setting `primary_key=True` when defining a column.
When building `ForeignKey` or `ForeignKeyConstraint` objects, you must specify a `name` for the constraint.
If your model definition requires a self-referential FOREIGN KEY constraint, you must include `use_alter=True` when defining the relationship.
```python
from sqlalchemy import Table, Column, ForeignKey, BigInteger, String, MetaData

metadata_obj = MetaData()

users = Table(
    "users",
    metadata_obj,
    Column("id", BigInteger, primary_key=True),
    Column("name", String(), nullable=False),
    Column("email", String()),
    Column("manager_id", ForeignKey("users.id", name="fk_users_manager_id_x_users_id", use_alter=True)),
)
```
```