geofeather


- Name: geofeather
- Version: 0.3.0
- Home page: https://github.com/brendan-ward/geofeather
- Summary: Fast file-based format for geometries with Geopandas
- Upload time: 2020-03-23 15:48:10
- Author: Brendan C. Ward
- License: MIT
# geofeather

[![Build Status](https://travis-ci.org/brendan-ward/geofeather.svg?branch=master)](https://travis-ci.org/brendan-ward/geofeather)
[![Coverage Status](https://coveralls.io/repos/github/brendan-ward/geofeather/badge.svg?branch=master)](https://coveralls.io/github/brendan-ward/geofeather?branch=master)

A faster file-based format for geometries with `geopandas`.

This project capitalizes on the very fast [`feather`](https://github.com/wesm/feather) file format to store geometry data (points, lines, polygons) for interoperability with `geopandas`.

[Introductory post](https://medium.com/@brendan_ward/introducing-geofeather-a-python-library-for-faster-geospatial-i-o-with-geopandas-341120d45ee5).

## Why does this exist?

This project exists because reading and writing standard spatial formats (e.g., shapefile) in `geopandas` is slow. I was working with millions of geometries in multiple processing steps, and needed a fast way to read and write intermediate files.

In our benchmarks, we see about 5-6x faster file writes than writing from geopandas to shapefile via `.to_file()` on a `GeoDataFrame`.

Reads are about 2x faster than with the geopandas `read_file()` function.

## How does it work?

The `feather` format works brilliantly for standard `pandas` data frames. To leverage it, we simply convert the geometry data from `shapely` objects into Well-Known Binary ([WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry)) format and store that column as raw bytes.
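To make "store that column as raw bytes" concrete, here is a minimal, stdlib-only sketch of WKB for the simplest case, a 2D point: a 1-byte byte-order flag, a 4-byte geometry type code, then each coordinate as an 8-byte float. geofeather relies on `shapely` for the real conversion; `point_to_wkb` and `wkb_to_point` are hypothetical helpers for illustration only.

```python
import struct
from typing import Tuple

# WKB geometry type code for a 2D Point (OGC Simple Features)
WKB_POINT = 1


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # "<" = little endian, no padding: uint8 byte-order flag (1 = little endian),
    # uint32 geometry type, then two float64 coordinates -> 21 bytes total
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data: bytes) -> Tuple[float, float]:
    """Decode a little-endian 2D WKB point back to (x, y) (hypothetical helper)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != WKB_POINT:
        raise ValueError("expected a little-endian 2D WKB Point")
    return (x, y)


wkb = point_to_wkb(1.5, -2.0)
print(len(wkb), wkb_to_point(wkb))
```

A WKB geometry column is then just one such `bytes` value per row, which `feather` can store as an ordinary binary column.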

We store the coordinate reference system (CRS) as JSON in a sidecar `.crs` file.
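The sidecar approach can be sketched with the standard library alone. This is an illustration under stated assumptions, not geofeather's code: the sidecar name (`<feather path>.crs`, e.g. `test.feather.crs`) and the JSON layout of the CRS are assumed, and `write_crs_sidecar` / `read_crs_sidecar` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Optional


def _sidecar_path(feather_path: str) -> Path:
    # Assumed naming convention: append ".crs" to the feather file path
    return Path(str(feather_path) + ".crs")


def write_crs_sidecar(feather_path: str, crs: dict) -> Path:
    """Write CRS metadata as JSON next to the feather file (hypothetical helper)."""
    sidecar = _sidecar_path(feather_path)
    sidecar.write_text(json.dumps(crs))
    return sidecar


def read_crs_sidecar(feather_path: str) -> Optional[dict]:
    """Return the CRS stored beside the feather file, or None if there is no sidecar."""
    sidecar = _sidecar_path(feather_path)
    if not sidecar.exists():
        return None
    return json.loads(sidecar.read_text())
```

Keeping the CRS outside the feather file means the feather file itself stays a plain data frame that any `feather` reader can open.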

## Installation

Available on PyPI at: https://pypi.org/project/geofeather/

`pip install geofeather`

## Usage

### Write

Given an existing `GeoDataFrame` `my_gdf`, pass it to `to_geofeather`:

```python
from geofeather import to_geofeather

to_geofeather(my_gdf, 'test.feather')
```

### Read

```python
from geofeather import from_geofeather

my_gdf = from_geofeather('test.feather')
```

### TEMPORARY: `pygeos` support

[`pygeos`](https://github.com/pygeos/pygeos) provides much faster geospatial operations over arrays of geometries.

`geopandas` is in the process of migrating to using `pygeos` geometries as its internal data storage instead of `shapely` objects.

Until `pygeos` is fully integrated, `geofeather` provides shims for reading and writing pandas DataFrames that contain `pygeos` geometries. If you already use `pygeos` on data read from `geofeather`, these shims read and write data 3-7x faster than using `geofeather` with GeoDataFrames.

Internally, the feather file is identical to the one created above.

`pygeos` is required in order to use this functionality.

WARNING: this will be deprecated as soon as `pygeos` is integrated into `geopandas`.

```python
from geofeather.pygeos import to_geofeather, from_geofeather

# given a DataFrame df containing pygeos geometries in 'geometry' column
# and a crs object

to_geofeather(df, 'test.feather', crs=crs)

# read back from the same file that was written above
df = from_geofeather('test.feather')
```

Note: no CRS information is returned when reading from geofeather into a DataFrame, in order to keep the function signature the same as the `from_geofeather` function above.

## Indexes

Indexes are not currently supported in `feather` files. To work around this, reset your index (e.g., with `df.reset_index()`) before calling `to_geofeather`.
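A minimal pandas-only sketch of that workaround: move a meaningful index into a regular column before writing, then restore it with `set_index` after reading. The `fid` name is illustrative, and the geofeather calls are left as comments so the snippet runs with pandas alone.

```python
import pandas as pd

# A frame with a meaningful, non-default index (e.g. feature IDs)
df = pd.DataFrame({"name": ["a", "b", "c"]}, index=pd.Index([10, 20, 30], name="fid"))

# Move the index into a regular "fid" column; the frame now has a default RangeIndex
flat = df.reset_index()
# to_geofeather(flat, "test.feather")     # write step (not run here)
# flat = from_geofeather("test.feather")  # read step (not run here)

# After reading, rebuild the original index from the "fid" column
restored = flat.set_index("fid")
```

Because the index values travel as an ordinary column, nothing is lost in the round trip; only the final `set_index` call is needed to get the original frame back.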

## Changes

### 0.3.0

-   allow serializing to / from pandas DataFrames containing `pygeos` geometries (see notes above).
-   use new CRS object in geopandas data frames (#4)
-   dropped `to_shp`; use geopandas `to_file()` instead.

### 0.2.0

-   allow reading a subset of columns from a feather file
-   store geometry in 'geometry' column instead of 'wkb' column (simplification to avoid renaming columns)

### 0.1.0

-   Initial release

## Credits

Everything that makes this fast is due to the hard work of contributors to `pyarrow`, `geopandas`, and `shapely`.



            
