[![PyPI - Version](https://img.shields.io/pypi/v/dbxio)](https://pypi.org/project/dbxio/)
[![GitHub Build](https://github.com/Toloka/dbxio/workflows/Tests/badge.svg)](https://github.com/Toloka/dbxio/actions)
[![License](https://img.shields.io/:license-Apache%202-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0.txt)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dbxio.svg)](https://pypi.org/project/dbxio/)
[![PyPI - Downloads](https://img.shields.io/pepy/dt/dbxio)](https://pypi.org/project/dbxio/)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)
# dbxio: High-level Databricks client
## Overview
**_dbxio_** is a high-level client for Databricks that simplifies working with tables and volumes.
It provides a simple interface for reading and writing data, creating and deleting objects, and running SQL queries and
fetching their results.
## Why _dbxio_?
1. **_dbxio_** combines the power of Databricks SQL with Python for local data manipulation.
2. **_dbxio_** provides a simple and intuitive interface for working with Databricks Tables and Volumes.
   You can read and write data with just a few lines of code.
3. For large amounts of data, **_dbxio_** first stages the data in an intermediate object storage of your choice and
   then bulk-loads it into the table
   (see [COPY INTO](https://docs.databricks.com/en/sql/language-manual/delta-copy-into.html) for more details).
   So, you can upload any amount of data, and _dbxio_ will take care of synchronizing the staged data with the table
   in Databricks.
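
To make the staging-then-load idea in item 3 concrete, here is a minimal sketch of the kind of `COPY INTO` statement that loads already-staged files into a table. It is an illustration only, not **_dbxio_**'s internal implementation: the helper `build_copy_into`, the staging path, and the choice of Parquet as the file format are all assumptions made for this example.

```python
def build_copy_into(table: str, source_uri: str, file_format: str = "PARQUET") -> str:
    """Return a COPY INTO statement that loads staged files into `table`.

    Hypothetical helper for illustration; it does no escaping, so pass
    trusted values only.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_uri}'\n"
        f"FILEFORMAT = {file_format}"
    )


# Assumed staging location in Azure storage (abfss URI scheme); the container,
# account, and path names are placeholders.
staged = "abfss://container_name@blob_storage_name.dfs.core.windows.net/staging/batch-0001/"
statement = build_copy_into("catalog.schema.table", staged)
print(statement)
```

With **_dbxio_** you do not write this statement yourself; the sketch only shows what "bulk upload later" refers to.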
### Alternatives
Currently, we are not aware of any alternatives that offer the same functionality as **_dbxio_**.
If you come across any, we would be interested to learn about them.
Please let us know by opening an issue in our GitHub repository.

---
## Installation
**_dbxio_** requires Python 3.9 or later. You can install **_dbxio_** using pip:
```bash
pip install dbxio
```
## _dbxio_ by Example
```python
import dbxio

client = dbxio.DbxIOClient.from_cluster_settings(
    cluster_type=dbxio.ClusterType.SQL_WAREHOUSE,
    http_path='<YOUR_HTTP_PATH>',
    server_hostname='<YOUR_SERVER_HOSTNAME>',
    settings=dbxio.Settings(cloud_provider=dbxio.CloudProvider.AZURE),
)

# read table
table = list(dbxio.read_table('catalog.schema.table', client=client))

# write table
data = [
    {'col1': 1, 'col2': 'a', 'col3': [1, 2, 3]},
    {'col1': 2, 'col2': 'b', 'col3': [4, 5, 6]},
]
schema = dbxio.TableSchema.from_obj(
    {
        'col1': dbxio.types.IntType(),
        'col2': dbxio.types.StringType(),
        'col3': dbxio.types.ArrayType(dbxio.types.IntType()),
    }
)
dbxio.bulk_write_table(
    dbxio.Table('domain.schema.table', schema=schema),
    data,
    client=client,
    abs_name='blob_storage_name',
    abs_container_name='container_name',
    append=True,
)
```
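
The example above materializes the whole table with `list(...)`, which may not fit in memory for large tables. Since `read_table` is consumed as an iterable there, one option is to process records in fixed-size batches instead. The sketch below is a generic helper, not part of **_dbxio_**: the name `batched` is hypothetical, and `fake_rows` is a stand-in for the iterable returned by `dbxio.read_table(...)`, so that the snippet runs without a Databricks connection.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def batched(records: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    # islice pulls up to `size` items; an empty list means the input is exhausted.
    while batch := list(islice(iterator, size)):
        yield batch


# Stand-in for dbxio.read_table('catalog.schema.table', client=client);
# the dict-shaped records are an assumption for this illustration.
fake_rows = ({"col1": i} for i in range(5))
batch_sizes = [len(batch) for batch in batched(fake_rows, 2)]
print(batch_sizes)
```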
---
## Cloud Support
**_dbxio_** supports the following cloud providers:
- [x] Azure
- [ ] AWS (planned)
- [ ] GCP (planned)
## Project Information
- [Docs](docs/README.md)
- [PyPI](https://pypi.org/project/dbxio/)
- [Contributing](CONTRIBUTING.md)