# PyPI Project 'databricks-mosaic-gdal' Overview
> Current version is 3.4.3 (to match GDAL).
This is a filetree-based (rather than apt-based) drop-in packaging of GDAL with Java bindings for Ubuntu 20.04 (Focal Fossa), the OS used by [Databricks Runtime](https://docs.databricks.com/release-notes/runtime/releases.html) (DBR) 11.3+. It ships two tarballs:
1. `gdal-3.4.3-filetree.tar.xz` is ~50MB - it is extracted with `tar -xf gdal-3.4.3-filetree.tar.xz -C /`
2. `gdal-3.4.3-symlinks.tar.xz` is ~19MB - it is extracted with `tar -xhf gdal-3.4.3-symlinks.tar.xz -C /`
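The extraction step can also be sketched in Python. This is only an illustration of unpacking an `.xz` tarball with the standard-library `tarfile` module, not the package's actual install logic; the function name `unpack_tarball` and its `dest` parameter are hypothetical, and the sketch does not reproduce the `-h` flag used for the symlinks tarball.

```python
import tarfile
from pathlib import Path


def unpack_tarball(tarball: str, dest: str) -> list[str]:
    """Extract an xz-compressed tarball into `dest` and return its member names.

    Hypothetical helper for illustration; the package itself documents plain
    `tar` commands that unpack to `/`.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    # "r:xz" opens an LZMA (xz) compressed archive for reading.
    with tarfile.open(tarball, mode="r:xz") as archive:
        names = archive.getnames()
        archive.extractall(path=dest_path)
    return names
```

Passing a scratch directory as `dest` (rather than `/`) is a deliberate choice here, so the tarball contents can be inspected without touching the root filesystem.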
An [init script](https://docs.databricks.com/clusters/init-scripts.html) based approach is provided at [mosaic-gdal-3.4.3-filetree-init.sh](https://github.com/databrickslabs/mosaic/blob/main/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh); it installs the package from PyPI and manages unpacking of the tarballs across a DBR cluster.
Starting in version `3.4.3.post5`, unpacking the tarballs without an init script is being tested: install the package as a [cluster library](https://docs.databricks.com/libraries/cluster-libraries.html#cluster-installed-library) rather than as a notebook-scoped library. __Use the init script if you run into issues.__
__Requirements:__
* The GDAL tarballs are only installed on Databricks Runtime >= 11.3; on anything else the install skips them.
* This has the added benefit of preventing an accidental install on your local machine, which would be bad since the tarballs unpack to root (`/`).
* The package is deployed to PyPI without a wheel (WHL) to avoid unnecessary duplication of the resource files and to trigger the `setup.py` build on install.
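A runtime guard of the kind described above could look like the following minimal sketch. It assumes the cluster exposes its version through the `DATABRICKS_RUNTIME_VERSION` environment variable in a `major.minor` form; the helper names are hypothetical and this is not the package's actual `setup.py` code.

```python
import os
from typing import Optional, Tuple

# Minimum supported Databricks Runtime, per the requirements above.
MIN_DBR = (11, 3)


def parse_dbr_version(raw: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse a 'major.minor' string into a tuple, or return None if it does not fit."""
    if not raw:
        return None
    parts = raw.split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def is_supported_runtime(raw: Optional[str] = None) -> bool:
    """Return True only when the runtime version is known and >= MIN_DBR."""
    if raw is None:
        # Assumption: this variable is unset on a local (non-Databricks) machine.
        raw = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    version = parse_dbr_version(raw)
    return version is not None and version >= MIN_DBR
```

Treating an unknown or unparsable version as unsupported keeps the default safe: nothing is unpacked to `/` unless the runtime is positively identified.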
__Notes:__
* This is a very specific packaging of GDAL + dependencies that removes any libraries already provided by DBR, so it __will not be useful outside Databricks.__
* It also includes the GDAL shared objects (`.so`) for the Java bindings, the GDAL 3.4.3 Python bindings, and a tweak for OSGEO, all of which are currently supplied by the [UbuntuGIS PPA](https://launchpad.net/~ubuntugis/+archive/ubuntu/ubuntugis-unstable) based init script [install-gdal-databricks.sh](https://github.com/databrickslabs/mosaic/blob/main/src/main/resources/scripts/install-gdal-databricks.sh) provided by Mosaic. __This install replaces the existing Mosaic approach, so choose one or the other.__
* The GDAL JAR for 3.4 is not included; it is provided by Mosaic itself and added to your Databricks cluster as part of the [enable_gdal](https://databrickslabs.github.io/mosaic/usage/install-gdal.html#enable-gdal-for-a-notebook) call made when configuring Mosaic for GDAL. Alternatively, the JAR can be added as a [cluster-installed library](https://docs.databricks.com/libraries/cluster-libraries.html#cluster-installed-library), e.g. through Maven coordinates `org.gdal:gdal:3.4.0` from [mvnrepository](https://mvnrepository.com/artifact/org.gdal/gdal/3.4.0).
* Don't use versions below `3.4.3.post5` - __This is still being developed / refined.__
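The version floor in the last note can be checked programmatically. The sketch below handles only the simple `X.Y.Z` and `X.Y.Z.postN` forms this package uses; it is not a full PEP 440 parser, and the names `parse_version` and `is_recommended` are hypothetical.

```python
import re
from typing import Tuple

# Lowest version the notes above recommend: 3.4.3.post5.
MIN_VERSION = ((3, 4, 3), 5)


def parse_version(text: str) -> Tuple[Tuple[int, ...], int]:
    """Split 'X.Y.Z[.postN]' into (release tuple, post number).

    A version without a post segment gets post number -1, so it sorts
    before every post release of the same base version.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.post(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    post = int(match.group(2)) if match.group(2) is not None else -1
    return release, post


def is_recommended(text: str) -> bool:
    """Return True when the version is at least 3.4.3.post5."""
    return parse_version(text) >= MIN_VERSION
```

On a cluster, the installed version string could be obtained with `importlib.metadata.version("databricks-mosaic-gdal")` and passed to `is_recommended`.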
__Install:__
_Install 'databricks-mosaic-gdal' from PyPI as a [cluster library](https://docs.databricks.com/libraries/cluster-libraries.html#cluster-installed-library), not as a notebook-scoped library._
_Install 'databricks-mosaic' from PyPI in a notebook or as a cluster library (handles JARs as well):_
```
# install in notebook e.g.
%pip install databricks-mosaic
```
_Then you can initialize:_
```
import mosaic as mos
mos.enable_mosaic(spark, dbutils)
mos.enable_gdal(spark)
```
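After initializing, one quick sanity check is to confirm that the GDAL Python bindings are importable and report the expected release. This is a minimal sketch, assuming the bindings are exposed as the usual `osgeo` package; the helper name `gdal_release_name` is hypothetical.

```python
from typing import Optional


def gdal_release_name() -> Optional[str]:
    """Return the GDAL release name (e.g. a '3.4.x' string), or None if unavailable."""
    try:
        from osgeo import gdal
    except ImportError:
        # The bindings (or the shared objects they load) are not installed here.
        return None
    return gdal.VersionInfo("RELEASE_NAME")


release = gdal_release_name()
if release is None:
    print("GDAL Python bindings not found")
else:
    print(f"GDAL release: {release}")
```

If this reports that the bindings are missing on a cluster where the package was installed, falling back to the init script is the path the notes above suggest.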
_Check Mosaic [GDAL Installation Guide](https://databrickslabs.github.io/mosaic/usage/install-gdal.html#) for updated instructions on/around APR 2023._
Raw data
{
"_id": null,
"home_page": "https://github.com/databrickslabs/mosaic/tree/main/modules/python/gdal_package",
"name": "databricks-mosaic-gdal",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7.0",
"maintainer_email": "",
"keywords": "",
"author": "Databricks",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/7c/c4/80b1a791050d4a9254ad471f829e332599f6d80335bfa2aa09605b164463/databricks-mosaic-gdal-3.4.3.post5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Databricks License",
"summary": "GDAL install with Java Bindings for Databricks Runtime 11+",
"version": "3.4.3.post5",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7cc480b1a791050d4a9254ad471f829e332599f6d80335bfa2aa09605b164463",
"md5": "e9f2023deab61ba459f0882c4574cd2e",
"sha256": "a39edf2e02400ff470a133a8737a354ee93dca031dca3af78a547e37bac3b080"
},
"downloads": -1,
"filename": "databricks-mosaic-gdal-3.4.3.post5.tar.gz",
"has_sig": false,
"md5_digest": "e9f2023deab61ba459f0882c4574cd2e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7.0",
"size": 71421132,
"upload_time": "2023-04-04T12:37:25",
"upload_time_iso_8601": "2023-04-04T12:37:25.148934Z",
"url": "https://files.pythonhosted.org/packages/7c/c4/80b1a791050d4a9254ad471f829e332599f6d80335bfa2aa09605b164463/databricks-mosaic-gdal-3.4.3.post5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-04 12:37:25",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "databricks-mosaic-gdal"
}