# geofeather
[![Build Status](https://travis-ci.org/brendan-ward/geofeather.svg?branch=master)](https://travis-ci.org/brendan-ward/geofeather)
[![Coverage Status](https://coveralls.io/repos/github/brendan-ward/geofeather/badge.svg?branch=master)](https://coveralls.io/github/brendan-ward/geofeather?branch=master)
A faster file-based format for geometries with `geopandas`.
This project capitalizes on the very fast [`feather`](https://github.com/wesm/feather) file format to store geometry (points, lines, polygons) data for interoperability with `geopandas`.
[Introductory post](https://medium.com/@brendan_ward/introducing-geofeather-a-python-library-for-faster-geospatial-i-o-with-geopandas-341120d45ee5).
## Why does this exist?
This project exists because reading and writing standard spatial formats (e.g., shapefile) in `geopandas` is slow. I was working with millions of geometries in multiple processing steps, and needed a fast way to read and write intermediate files.
In our benchmarks, we see about 5-6x faster file writes than writing from geopandas to shapefile via `.to_file()` on a `GeoDataFrame`.
We see about 2x faster reads compared to the geopandas `read_file()` function.
## How does it work?
The `feather` format works brilliantly for standard `pandas` data frames. In order to leverage the `feather` format, we simply convert the geometry data from `shapely` objects into Well Known Binary ([WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry)) format, and then store that column as raw bytes.
We store the coordinate reference system in JSON format in a sidecar file with a `.crs` extension.
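To make the idea concrete, here is a minimal standard-library sketch of what "geometry as raw bytes plus a CRS sidecar" can look like. It is not the code `geofeather` uses: the helper names `point_to_wkb`, `point_from_wkb`, and `write_crs_sidecar` are hypothetical, only 2D points are handled, and the sidecar naming (appending `.crs` to the data path) and the shape of the CRS JSON are assumptions for illustration.

```python
import json
import struct
from pathlib import Path


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1 byte for byte order (1 = little-endian),
    # a 4-byte unsigned geometry type (1 = Point), then x and y as doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def point_from_wkb(data):
    """Decode a little-endian WKB 2D point back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian 2D points")
    return x, y


def write_crs_sidecar(data_path, crs):
    """Write CRS information as JSON next to the data file.

    Assumption: the sidecar is named by appending '.crs' to the data path.
    """
    sidecar = Path(str(data_path) + ".crs")
    sidecar.write_text(json.dumps(crs))
    return sidecar
```

In practice `shapely` performs the WKB conversion for every geometry type, so a sketch like this is only useful for seeing why the resulting column can be stored as plain bytes.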
## Installation
Available on PyPI at: https://pypi.org/project/geofeather/
`pip install geofeather`
## Usage
### Write
Given an existing `GeoDataFrame` `my_gdf`, pass this into `to_geofeather`:
```
from geofeather import to_geofeather
to_geofeather(my_gdf, 'test.feather')
```
### Read
```
from geofeather import from_geofeather
my_gdf = from_geofeather('test.feather')
```
### TEMPORARY
[`pygeos`](https://github.com/pygeos/pygeos) provides much faster geospatial operations over arrays of geometries.
`geopandas` is in the process of migrating to using `pygeos` geometries as its internal data storage instead of `shapely` objects.
Until `pygeos` is fully integrated, `geofeather` provides shims to support interoperability with pandas DataFrames containing `pygeos` geometries. If you are already using `pygeos` on data you read with `geofeather`, these shims read and write data 3-7x faster than `geofeather` does when working with GeoDataFrames.
Internally, the feather file is identical to the one created above.
`pygeos` is required in order to use this functionality.
WARNING: this will be deprecated as soon as `pygeos` is integrated into `geopandas`.
```
from geofeather.pygeos import to_geofeather, from_geofeather
# given a DataFrame df containing pygeos geometries in 'geometry' column
# and a crs object
to_geofeather(df, 'test.feather', crs=crs)
df = from_geofeather('test.feather')
```
Note: no CRS information is returned when reading from geofeather into a DataFrame, so that the function signature stays the same as `from_geofeather` above.
## Indexes
Right now, indexes are not supported in `feather` files. In order to get around this, simply reset your index before calling `to_geofeather`.
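As a rough illustration of that workaround, the sketch below uses a plain `pandas` DataFrame with made-up column names and a hypothetical `site_id` index. The same `reset_index()` / `set_index()` calls apply to a `GeoDataFrame`. The write and read steps themselves are omitted here.

```python
import pandas as pd

# Hypothetical frame indexed by a site identifier.
df = pd.DataFrame(
    {"name": ["a", "b"], "value": [1, 2]},
    index=pd.Index([101, 102], name="site_id"),
)

# reset_index() moves the index into a regular column so it is stored
# alongside the other columns when the frame is written.
flat = df.reset_index()

# After reading the file back, set_index() restores the original index.
restored = flat.set_index("site_id")
```

If the index carries no information (for example, a default integer range), `reset_index(drop=True)` discards it rather than keeping it as a column.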
## Changes
### 0.3.0
- allow serializing to / from pandas DataFrames containing `pygeos` geometries (see notes above).
- use new CRS object in geopandas data frames (#4)
- dropped `to_shp`; use geopandas `to_file()` instead.
### 0.2.0
- allow reading a subset of columns from a feather file
- store geometry in 'geometry' column instead of 'wkb' column (simplification to avoid renaming columns)
### 0.1.0
- Initial release
## Credits
Everything that makes this fast is due to the hard work of contributors to `pyarrow`, `geopandas`, and `shapely`.
Raw data
{
"_id": null,
"home_page": "https://github.com/brendan-ward/geofeather",
"name": "geofeather",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Brendan C. Ward",
"author_email": "bcward@astutespruce.com",
"download_url": "https://files.pythonhosted.org/packages/b6/24/c9c9b285d79e18c098bc8c8140eb3b4b000753aecfc26a6daa82b1bb6dab/geofeather-0.3.0.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "MIT",
"summary": "Fast file-based format for geometries with Geopandas",
"version": "0.3.0",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "6d4c5f61bdf4068842c224c2a63f3a20",
"sha256": "132a79ef3b31d53fe13287eba31b057d33ced233bf5ed01af25856ec6c60e5ff"
},
"downloads": -1,
"filename": "geofeather-0.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6d4c5f61bdf4068842c224c2a63f3a20",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6389,
"upload_time": "2020-03-23T15:48:09",
"upload_time_iso_8601": "2020-03-23T15:48:09.846185Z",
"url": "https://files.pythonhosted.org/packages/6c/1e/8a0a3b25b2fff01ab834bdc73794e83de1353f73f8ec481a60bcb5b71b00/geofeather-0.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "b42e2a04a440b4f77e3c927733ce0e20",
"sha256": "5889ebc31c02dd38215884badb3fa20029628088dbe3d5936894d0488ec01fa4"
},
"downloads": -1,
"filename": "geofeather-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "b42e2a04a440b4f77e3c927733ce0e20",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 4429,
"upload_time": "2020-03-23T15:48:10",
"upload_time_iso_8601": "2020-03-23T15:48:10.805416Z",
"url": "https://files.pythonhosted.org/packages/b6/24/c9c9b285d79e18c098bc8c8140eb3b4b000753aecfc26a6daa82b1bb6dab/geofeather-0.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2020-03-23 15:48:10",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "brendan-ward",
"github_project": "geofeather",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"lcname": "geofeather"
}