geobatchpy


- Name: geobatchpy
- Version: 0.2.3
- Home page: https://github.com/huels-originals/geobatchpy
- Summary: A CLI and Python Client for the Geoapify API.
- Upload time: 2023-06-03 08:35:08
- Author: Paul Kinsvater
- Requires Python: >=3.7.1,<4.0
- License: MIT
- Keywords: geoapify, geocoding, openstreetmap, geojson

# A CLI and Python Client for the Geoapify API

We have been using the Geoapify API to **geocode millions of location records** for data validation and analytics. We built
this package to make this process comfortable using Python and the command line.

Why Geoapify and this package may also be a good fit for you:

- You need to batch process large numbers of location records (geocode, reverse geocode, places & details).
- The license must support commercial use without restrictions.
- It needs to be cheap (or even free if you don't need more than 6k addresses per day).

Sign up at [geoapify.com](https://geoapify.com/) and start with their free plan of 3k credits per day, which translates
to up to 6k geocoded addresses.
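As a rough planning aid, the quota arithmetic above can be sketched in a few lines. The ratio of 0.5 credits per geocoded address is inferred from the figures quoted here (3k credits for up to 6k addresses), not taken from Geoapify's pricing pages, and the function name is hypothetical.

```python
import math

FREE_CREDITS_PER_DAY = 3_000  # free plan quota quoted above
CREDITS_PER_ADDRESS = 0.5     # assumption: implied by "3k credits, up to 6k addresses"


def days_on_free_plan(n_addresses: int) -> int:
    """Estimate how many days of free quota a geocoding job would consume."""
    credits_needed = n_addresses * CREDITS_PER_ADDRESS
    return math.ceil(credits_needed / FREE_CREDITS_PER_DAY)


print(days_on_free_plan(6_000))    # 1
print(days_on_free_plan(100_000))  # 17
```

Check the current Geoapify pricing before relying on these numbers, since credit costs may differ by API and may change over time.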

## Install our package with `pip`

This package is available on the public PyPI:

```shell
pip install geobatchpy
```

## Examples

See our documentation at [geobatchpy.readthedocs.io](https://geobatchpy.readthedocs.io/en/latest/) for a growing number of
comprehensive example use cases. Below we illustrate both the Python API and the CLI with a tiny batch geocoding
example.

### A simple batch geocoding example using the Python API

Below we geocode multiple addresses in a single batch. There are two ways to provide the location data as input.
We can pass a list of strings, one string per address, which are treated as free text searches. Or we can pass
structured input as a list of dictionaries, again one per address. See the
[Geoapify API documentation](https://apidocs.geoapify.com/) for a complete list of address attributes accepted by the
geocoding services. Use the optional `parameters` dictionary for settings that apply to all addresses. For example,
below we request results in French.

```python
from geobatchpy import Client

client = Client(api_key='<your-api-key>')

addresses = ['Hülser Markt 1, 47839 Krefeld',
             'DB Schenker, Essen, Germany',
             'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen']

# see the geoapify.com API docs for more optional parameters
res = client.batch.geocode(locations=addresses, parameters={'lang': 'fr'}, simplify_output=True)
```

Alternatively, you can provide a list of dictionaries, with every address in structured form. If you still need
the free text search for some addresses, use the `'text'` attribute. Here is the same example, with the first
two addresses translated to structured form:

```python
addresses = [{'city': 'Krefeld', 'street': 'Hülser Markt', 'housenumber': 1, 'postcode': '47839'},
             {'name': 'DB Schenker', 'city': 'Essen', 'country': 'Germany'},
             {'text': 'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen'}]
```
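If your records come from a table with partly empty columns, a small helper can build this mixed input. The following is only a sketch: `to_location` and the `'raw'` column are hypothetical names, and the key list is limited to the attributes used in the example above.

```python
from typing import Any, Dict, List

# Keys taken from the example above; the Geoapify docs list the full set.
STRUCTURED_KEYS = ('name', 'housenumber', 'street', 'postcode', 'city', 'country')


def to_location(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep non-empty structured fields; fall back to free text otherwise."""
    structured = {k: record[k] for k in STRUCTURED_KEYS if record.get(k) not in (None, '')}
    if structured:
        return structured
    # 'raw' is a hypothetical column holding the unparsed address string
    return {'text': record['raw']}


records = [
    {'raw': 'Hülser Markt 1, 47839 Krefeld', 'street': 'Hülser Markt',
     'housenumber': 1, 'postcode': '47839', 'city': 'Krefeld'},
    {'raw': 'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen'},
]
addresses: List[Dict[str, Any]] = [to_location(r) for r in records]
print(addresses)
```

The resulting list has the same shape as the `addresses` list above and can be passed as the `locations` argument.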

```python
# Showing the first of three result sets: res[0]
{
    "query": {
        "text": "Hülser Markt 1, 47839 Krefeld",
        "parsed": {
            "housenumber": "1",
            "street": "hülser markt",
            "postcode": "47839",
            "city": "krefeld",
            "expected_type": "building",
        },
    },
    "datasource": {
        "sourcename": "openstreetmap",
        "attribution": "© OpenStreetMap contributors",
        "license": "Open Database License",
        "url": "https://www.openstreetmap.org/copyright",
    },
    "name": "Metzgerei Etteldorf",
    "housenumber": "1",
    "street": "Hülser Markt",
    "suburb": "Hüls",
    "city": "Krefeld",
    "state": "Rhénanie-du-Nord-Westphalie",
    "postcode": "47839",
    "country": "Allemagne",
    "country_code": "de",
    "lon": 6.510696417033254,
    "lat": 51.373026800000005,
    "formatted": "Metzgerei Etteldorf, Hülser Markt 1, 47839 Krefeld, Allemagne",
    "address_line1": "Metzgerei Etteldorf",
    "address_line2": "Hülser Markt 1, 47839 Krefeld, Allemagne",
    "category": "commercial.food_and_drink.butcher",
    "result_type": "amenity",
    "rank": {
        "importance": 0.31100000000000005,
        "popularity": 5.585340759145855,
        "confidence": 1,
        "confidence_city_level": 1,
        "confidence_street_level": 1,
        "match_type": "inner_part",
    },
    "place_id": "516b5e6500f40a1a40590a449957bfaf4940f00102f9010ecff70d00000000c002019203134d65747a676572656920457474656c646f7266",
}
```
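A common next step is to keep only the coordinates of confident matches. The sketch below assumes each result is a dictionary shaped like the one shown above; `extract_point` is a hypothetical helper, and the 0.9 confidence threshold is an arbitrary illustration rather than a recommended value.

```python
from typing import Any, Dict, List, Optional, Tuple


def extract_point(result: Dict[str, Any], min_confidence: float = 0.9) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) if the match confidence is high enough, else None."""
    confidence = result.get('rank', {}).get('confidence', 0)
    if confidence < min_confidence or 'lat' not in result or 'lon' not in result:
        return None
    return result['lat'], result['lon']


# Trimmed copy of the result shown above, standing in for `res`.
sample_results: List[Dict[str, Any]] = [
    {'lat': 51.373026800000005, 'lon': 6.510696417033254, 'rank': {'confidence': 1}},
    {'rank': {'confidence': 0.2}},  # hypothetical low-confidence result
]
points = [extract_point(r) for r in sample_results]
print(points)
```

Results that come back as `None` can then be reviewed manually or resubmitted with more structured input.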

### The same batch geocoding example using the CLI

We built the `geobatch` command line interface to make batch processing large numbers of records more comfortable.

Steps:
1. Prepare a JSON file as input.
2. Use `geobatch submit` to submit one or more jobs to the Geoapify servers.
3. Use `geobatch receive` to monitor progress and retrieve the results.

```python
# Step 1 - written in Python:
from geobatchpy.batch import parse_geocoding_inputs
from geobatchpy.utils import write_data_to_json_file

addresses = ['Hülser Markt 1, 47839 Krefeld',
             'DB Schenker, Essen, Germany',
             'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen']

data = {
    'api': '/v1/geocode/search',  # see the Geoapify API docs for other APIs that work with batch processing
    'inputs': parse_geocoding_inputs(locations=addresses),
    'batch_len': 2,  # optional - will put first two addresses in batch 1, last address in batch 2
    'id': 'my-batch-geocoding-job'  # optional - a reference which will be reused in the output file
}

write_data_to_json_file(data=data, file_path='<path-data-in>')
```
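To make the effect of `batch_len` concrete, the split described in the comment above can be reproduced in plain Python. This is an illustration of the chunking idea only, not the package's actual implementation, and `split_into_batches` is a hypothetical name.

```python
from typing import List, Sequence


def split_into_batches(items: Sequence[str], batch_len: int) -> List[List[str]]:
    """Split items into consecutive chunks of at most batch_len elements."""
    if batch_len < 1:
        raise ValueError('batch_len must be a positive integer')
    return [list(items[i:i + batch_len]) for i in range(0, len(items), batch_len)]


addresses = ['Hülser Markt 1, 47839 Krefeld',
             'DB Schenker, Essen, Germany',
             'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen']

batches = split_into_batches(addresses, batch_len=2)
print([len(batch) for batch in batches])  # [2, 1]
```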

The following command submits one or more jobs and stores the job URLs on disk. Those URLs are required to monitor
the jobs and retrieve the results.

```shell
geobatch submit <path-data-in> <path-post-data-out> --api-key <your-key>
```

You can omit the `--api-key` option if you set the `GEOAPIFY_KEY` environment variable. Next we start monitoring
progress:

```shell
geobatch receive <path-post-data-out> <path-results-data-out> --api-key <your-key>
```

We can abort the monitoring at any time and restart it later, provided the jobs are still in the cache of the
Geoapify servers (24 hours).
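In your own Python scripts you may want the same convention as the CLI: use an explicitly passed key if there is one, otherwise fall back to `GEOAPIFY_KEY`. The helper below is a minimal sketch of that precedence; `resolve_api_key` is a hypothetical name, and whether `Client` reads the environment variable on its own is not covered here.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None, env: Mapping[str, str] = os.environ) -> str:
    """Prefer an explicitly passed key, then fall back to GEOAPIFY_KEY."""
    key = explicit or env.get('GEOAPIFY_KEY')
    if not key:
        raise RuntimeError('No API key: pass one explicitly or set GEOAPIFY_KEY.')
    return key


# Usage, with a stand-in environment so the snippet runs without a real key:
api_key = resolve_api_key(env={'GEOAPIFY_KEY': '<your-api-key>'})
# client = Client(api_key=api_key)  # Client as imported in the Python example above
```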

## References and further reading

- [geoapify.com API documentation](https://apidocs.geoapify.com/)
- [Towards Data Science - Deduplicate and clean-up millions of location records](https://towardsdatascience.com/deduplicate-and-clean-up-millions-of-location-records-abcffb308ebf)

            
