- Name: dbxio
- Version: 0.5.2
- Home page: https://github.com/Toloka/dbxio
- Summary: High-level Databricks client
- Upload time: 2024-11-12 14:39:57
- Author: Nikita Yurasov
- Requires Python: `<4.0,>=3.9`
- License: Apache-2.0
- Keywords: python, databricks, dbx

[![PyPI - Version](https://img.shields.io/pypi/v/dbxio)](https://pypi.org/project/dbxio/)
[![GitHub Build](https://github.com/Toloka/dbxio/workflows/Tests/badge.svg)](https://github.com/Toloka/dbxio/actions)

[![License](https://img.shields.io/:license-Apache%202-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0.txt)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dbxio.svg)](https://pypi.org/project/dbxio/)
[![PyPI - Downloads](https://img.shields.io/pepy/dt/dbxio)](https://pypi.org/project/dbxio/)

[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)

# dbxio: High-level Databricks client

## Overview

**_dbxio_** is a high-level client for Databricks that simplifies working with tables and volumes.
It provides a simple interface for reading and writing data, creating and deleting objects, running SQL queries, and
fetching their results.

## Why _dbxio_?

1. **_dbxio_** combines the power of Databricks SQL with Python for local data manipulation.
2. **_dbxio_** provides a simple and intuitive interface for working with Databricks Tables and Volumes.
   Reading or writing data takes just a few lines of code.
3. For large amounts of data, **_dbxio_** first stages the data in an intermediate object storage of your choice and
   then bulk-loads it into the table
   (see [COPY INTO](https://docs.databricks.com/en/sql/language-manual/delta-copy-into.html) for more details).
   This lets you upload data of any size while **_dbxio_** takes care of synchronizing it with the table in
   Databricks.
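
To make the staging idea in point 3 more concrete, here is a minimal sketch of what a `COPY INTO` statement for already-staged files could look like. It does not use **_dbxio_** and is not a description of its internals: the helper `build_copy_into`, the `abfss://` staging path, and the choice of Parquet are assumptions made purely for illustration.

```python
# Hypothetical helper (not part of dbxio): builds a Databricks SQL
# COPY INTO statement for files that were already staged in object storage.
SUPPORTED_FORMATS = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"}


def build_copy_into(table_name: str, source_path: str, file_format: str = "PARQUET") -> str:
    """Return a COPY INTO statement loading `source_path` into `table_name`."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file format: {file_format}")
    if "'" in source_path:
        # Keep the sketch simple: refuse paths that would need escaping.
        raise ValueError("source_path must not contain single quotes")
    return f"COPY INTO {table_name}\nFROM '{source_path}'\nFILEFORMAT = {fmt}"


# Placeholder names only; replace them with your own table and storage location.
statement = build_copy_into(
    "catalog.schema.table",
    "abfss://container_name@blob_storage_name.dfs.core.windows.net/staging/",
)
print(statement)
```

The resulting string could then be run like any other SQL statement against a SQL warehouse.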

### Alternatives

Currently, we are not aware of any alternatives that offer the same functionality as **_dbxio_**.
If you come across any, we would be interested to learn about them.
Please let us know by opening an issue in our GitHub repository.

---

## Installation

**_dbxio_** requires Python 3.9 or later. You can install **_dbxio_** using pip:

```bash
pip install dbxio
```
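
If you want to confirm the environment before or after installing, a small check like the one below may help. It is only a sketch: the helper names `python_is_supported` and `installed_version` are made up for this example, and the upper bound reflects the `<4.0,>=3.9` constraint declared in the package metadata.

```python
import sys
from importlib import metadata
from typing import Optional, Sequence


def python_is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Check the interpreter against the declared range >=3.9,<4.0."""
    return (3, 9) <= tuple(version_info[:2]) < (4, 0)


def installed_version(package: str = "dbxio") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("dbxio version:", installed_version())
```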

## _dbxio_ by Example

```python
import dbxio

client = dbxio.DbxIOClient.from_cluster_settings(
    cluster_type=dbxio.ClusterType.SQL_WAREHOUSE,
    http_path='<YOUR_HTTP_PATH>',
    server_hostname='<YOUR_SERVER_HOSTNAME>',
    settings=dbxio.Settings(cloud_provider=dbxio.CloudProvider.AZURE),
)

# Read a table and materialize all of its rows into a list
table = list(dbxio.read_table('catalog.schema.table', client=client))

# Bulk-write rows to a table, using the given blob storage container as intermediate storage
data = [
    {'col1': 1, 'col2': 'a', 'col3': [1, 2, 3]},
    {'col1': 2, 'col2': 'b', 'col3': [4, 5, 6]},
]
schema = dbxio.TableSchema.from_obj(
    {
        'col1': dbxio.types.IntType(),
        'col2': dbxio.types.StringType(),
        'col3': dbxio.types.ArrayType(dbxio.types.IntType()),
    }
)
dbxio.bulk_write_table(
    dbxio.Table('domain.schema.table', schema=schema),
    data,
    client=client,
    abs_name='blob_storage_name',
    abs_container_name='container_name',
    append=True,
)
```
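
The example above loads the whole table into memory with `list(...)`. For larger tables it may be preferable to process rows in fixed-size batches. The sketch below assumes that `dbxio.read_table` returns an iterable of row dictionaries, as the `list(...)` call suggests; `iter_batches` is a hypothetical helper, not part of **_dbxio_**, and a plain generator stands in for the table so the snippet runs without a Databricks connection.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def iter_batches(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `batch_size` records without loading everything at once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for rows coming from a table; shaped like the `data` rows above.
rows = ({"col1": i, "col2": chr(ord("a") + i)} for i in range(5))
batch_sizes = [len(batch) for batch in iter_batches(rows, batch_size=2)]
print(batch_sizes)  # prints [2, 2, 1]
```

With a real client, the generator could be replaced by the iterable returned from `dbxio.read_table(...)`, provided the assumption about its return type holds.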

---

## Cloud Support

**_dbxio_** supports the following cloud providers:

- [x] Azure
- [ ] AWS (planned)
- [ ] GCP (planned)

## Project Information

- [Docs](docs/README.md)
- [PyPI](https://pypi.org/project/dbxio/)
- [Contributing](CONTRIBUTING.md)


            
