# A CLI and Python Client for the Geoapify API
We have been using the Geoapify API to **geocode millions of location records** for data validation and analytics. We built
this package to make this process comfortable using Python and the command line.
Why Geoapify and this package may also be a good fit for you:
- You need to batch process large numbers of location records (geocode, reverse geocode, places & details).
- The license must support commercial use without restrictions.
- It needs to be cheap (or even free if you need no more than 6k addresses per day).
Sign up at [geoapify.com](https://geoapify.com/) and start with their free plan of 3k credits per day, which translates
to up to 6k geocoded addresses.
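For a rough capacity estimate, the free-plan figures above imply about half a credit per geocoded address (3k credits for up to 6k addresses). The sketch below turns that into a days-needed estimate. The ratio is inferred from those figures only, not taken from Geoapify's pricing documentation, and `days_on_free_plan` is a hypothetical helper that is not part of `geobatchpy`.

```python
import math

# Assumption: ratio implied by "3k credits per day ~ up to 6k geocoded addresses".
CREDITS_PER_ADDRESS = 0.5
FREE_CREDITS_PER_DAY = 3000


def days_on_free_plan(n_addresses: int,
                      credits_per_day: int = FREE_CREDITS_PER_DAY,
                      credits_per_address: float = CREDITS_PER_ADDRESS) -> int:
    """Return the number of days needed to geocode n_addresses on a daily credit budget."""
    if n_addresses <= 0:
        return 0
    addresses_per_day = math.floor(credits_per_day / credits_per_address)
    return math.ceil(n_addresses / addresses_per_day)


print(days_on_free_plan(6_000))    # fits into a single day
print(days_on_free_plan(100_000))  # needs several days
```

If your actual credit consumption differs, pass your own `credits_per_address` value.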
## Install our package with `pip`
This package is available on the public PyPI:
```shell
pip install geobatchpy
```
## Examples
See our documentation at [geobatchpy.readthedocs.io](https://geobatchpy.readthedocs.io/en/latest/) for a growing number of
comprehensive example use cases. Below we illustrate both the Python API and the CLI with a tiny batch geocoding
example.
### A simple batch geocoding example using the Python API
Below we geocode multiple addresses in a single batch. There are two ways to provide the location data as input:
either as a list of strings, one per address, which are treated as free-text searches, or as structured input in the
form of a list of dictionaries, again one per address. See the
[Geoapify API documentation](https://apidocs.geoapify.com/) for a complete list of address attributes accepted by the
geocoding services. Use the optional `parameters` dictionary to set attributes that all your addresses have in common.
For example, below we request results in French.
```python
from geobatchpy import Client
client = Client(api_key='<your-api-key>')
addresses = ['Hülser Markt 1, 47839 Krefeld',
             'DB Schenker, Essen, Germany',
             'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen']
# see the geoapify.com API docs for more optional parameters
res = client.batch.geocode(locations=addresses, parameters={'lang': 'fr'}, simplify_output=True)
```
Alternatively, you can provide a list of dictionaries, with every address in structured form. If you still need
the free-text search for some addresses, use the `'text'` attribute. Here is the same example, with the first
two addresses translated to structured form:
```python
addresses = [{'city': 'Krefeld', 'street': 'Hülser Markt', 'housenumber': 1, 'postcode': '47839'},
             {'name': 'DB Schenker', 'city': 'Essen', 'country': 'Germany'},
             {'text': 'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen'}]
```
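When addresses come from a table or CSV export, some fields are often empty. The sketch below builds the structured dictionaries from such records, drops empty values, and falls back to the `'text'` attribute when no structured field is present. `to_location` and `STRUCTURED_KEYS` are hypothetical names, not part of `geobatchpy`, and the key list only covers the attributes used in the example above.

```python
from typing import Dict, List, Optional

# Attribute names taken from the structured example above (not an exhaustive list).
STRUCTURED_KEYS = ('name', 'housenumber', 'street', 'postcode', 'city', 'country')


def to_location(record: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Build one location dictionary from a raw record.

    Structured attributes win if any are present; otherwise the record's
    'text' value is used as a free-text search.
    """
    # Normalize values to stripped strings and ignore missing ones.
    cleaned = {key: str(value).strip() for key, value in record.items() if value is not None}
    structured = {key: cleaned[key] for key in STRUCTURED_KEYS if cleaned.get(key)}
    if structured:
        return structured
    if cleaned.get('text'):
        return {'text': cleaned['text']}
    raise ValueError('record has neither structured attributes nor free text')


records = [
    {'street': 'Hülser Markt', 'housenumber': '1', 'postcode': '47839', 'city': 'Krefeld', 'country': ''},
    {'text': 'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen'},
]
addresses: List[Dict[str, str]] = [to_location(record) for record in records]
print(addresses)
```

Note that this helper converts all values to strings, so a house number given as an integer ends up as `'1'`.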
```python
# Showing the first of three result sets: res[0]
{
    "query": {
        "text": "Hülser Markt 1, 47839 Krefeld",
        "parsed": {
            "housenumber": "1",
            "street": "hülser markt",
            "postcode": "47839",
            "city": "krefeld",
            "expected_type": "building",
        },
    },
    "datasource": {
        "sourcename": "openstreetmap",
        "attribution": "© OpenStreetMap contributors",
        "license": "Open Database License",
        "url": "https://www.openstreetmap.org/copyright",
    },
    "name": "Metzgerei Etteldorf",
    "housenumber": "1",
    "street": "Hülser Markt",
    "suburb": "Hüls",
    "city": "Krefeld",
    "state": "Rhénanie-du-Nord-Westphalie",
    "postcode": "47839",
    "country": "Allemagne",
    "country_code": "de",
    "lon": 6.510696417033254,
    "lat": 51.373026800000005,
    "formatted": "Metzgerei Etteldorf, Hülser Markt 1, 47839 Krefeld, Allemagne",
    "address_line1": "Metzgerei Etteldorf",
    "address_line2": "Hülser Markt 1, 47839 Krefeld, Allemagne",
    "category": "commercial.food_and_drink.butcher",
    "result_type": "amenity",
    "rank": {
        "importance": 0.31100000000000005,
        "popularity": 5.585340759145855,
        "confidence": 1,
        "confidence_city_level": 1,
        "confidence_street_level": 1,
        "match_type": "inner_part",
    },
    "place_id": "516b5e6500f40a1a40590a449957bfaf4940f00102f9010ecff70d00000000c002019203134d65747a676572656920457474656c646f7266",
}
```
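Once results come back in the simplified form shown above, a common next step is to flatten them into rows for a table or CSV file. The sketch below assumes every result is a dictionary with the keys seen in `res[0]` and treats missing keys as `None`. `to_row` is a hypothetical helper, and the sample dictionary is a trimmed copy of the output above.

```python
from typing import Any, Dict, List, Optional


def to_row(result: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Pick a few fields from one simplified geocoding result."""
    rank = result.get('rank') or {}
    query = result.get('query') or {}
    return {
        'query_text': query.get('text'),
        'formatted': result.get('formatted'),
        'lat': result.get('lat'),
        'lon': result.get('lon'),
        'confidence': rank.get('confidence'),
    }


# Trimmed copy of res[0] from above.
sample = {
    'query': {'text': 'Hülser Markt 1, 47839 Krefeld'},
    'formatted': 'Metzgerei Etteldorf, Hülser Markt 1, 47839 Krefeld, Allemagne',
    'lon': 6.510696417033254,
    'lat': 51.373026800000005,
    'rank': {'confidence': 1, 'match_type': 'inner_part'},
}

rows: List[Dict[str, Optional[Any]]] = [to_row(result) for result in [sample]]
print(rows[0]['lat'], rows[0]['lon'], rows[0]['confidence'])
```

With real output you would replace `[sample]` by `res`. Filtering on `confidence` afterwards is one way to flag records that need manual review.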
### The same batch geocoding example using the CLI
We built the `geobatch` command line interface to make batch processing large numbers of records more comfortable.
Steps:
1. Prepare a JSON file as input.
2. Use `geobatch submit` to submit one or more jobs to the Geoapify servers.
3. Use `geobatch receive` to monitor progress and retrieve results.
```python
# Step 1 - written in Python:
from geobatchpy.batch import parse_geocoding_inputs
from geobatchpy.utils import write_data_to_json_file
addresses = ['Hülser Markt 1, 47839 Krefeld',
             'DB Schenker, Essen, Germany',
             'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen']
data = {
    'api': '/v1/geocode/search',  # see the Geoapify API docs for other APIs that work with batch processing
    'inputs': parse_geocoding_inputs(locations=addresses),
    'batch_len': 2,  # optional - will put first two addresses in batch 1, last address in batch 2
    'id': 'my-batch-geocoding-job'  # optional - a reference which will be reused in the output file
}
write_data_to_json_file(data=data, file_path='<path-data-in>')
```
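To make the effect of `batch_len` concrete, the sketch below splits the three addresses into chunks of at most two, matching the comment in the input data above. It only illustrates the splitting. How `geobatchpy` partitions jobs internally is not shown in this README, and `split_into_batches` is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar('T')


def split_into_batches(items: Sequence[T], batch_len: int) -> List[List[T]]:
    """Split items into consecutive chunks of at most batch_len elements."""
    if batch_len < 1:
        raise ValueError('batch_len must be a positive integer')
    return [list(items[start:start + batch_len])
            for start in range(0, len(items), batch_len)]


addresses = ['Hülser Markt 1, 47839 Krefeld',
             'DB Schenker, Essen, Germany',
             'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen']

batches = split_into_batches(addresses, batch_len=2)
for number, batch in enumerate(batches, start=1):
    print(f'batch {number}: {len(batch)} address(es)')
```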
The following command submits one or more jobs and stores job URLs to disk. Those URLs are required to monitor
and retrieve results.
```shell
geobatch submit <path-data-in> <path-post-data-out> --api-key <your-key>
```
You can omit the `--api-key` option if you set the `GEOAPIFY_KEY` environment variable. Next we start monitoring
progress:
```shell
geobatch receive <path-post-data-out> <path-results-data-out> --api-key <your-key>
```
We can abort the monitoring at any time and restart it later, provided the jobs are still cached on the
Geoapify servers (24 hours).
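The `GEOAPIFY_KEY` environment variable can also be read from Python so that the key never appears in source code. This is a minimal sketch: `resolve_api_key` is a hypothetical helper, and the precedence (explicit argument first, then `GEOAPIFY_KEY`) is an assumption modelled on the CLI behaviour described above, not something this README documents for the `Client` class.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else the GEOAPIFY_KEY environment variable."""
    key = explicit or env.get('GEOAPIFY_KEY')
    if not key:
        raise RuntimeError('no API key: pass one explicitly or set GEOAPIFY_KEY')
    return key


# Usage with the client from the first example (requires geobatchpy to be installed):
# from geobatchpy import Client
# client = Client(api_key=resolve_api_key())

# Demonstration with a fake environment so the snippet runs without a real key.
print(resolve_api_key(env={'GEOAPIFY_KEY': 'demo-key'}))
```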
## References and further reading
- [geoapify.com API documentation](https://apidocs.geoapify.com/)
- [Towards Data Science - Deduplicate and clean-up millions of location records](https://towardsdatascience.com/deduplicate-and-clean-up-millions-of-location-records-abcffb308ebf)
Raw data
{
"_id": null,
"home_page": "https://github.com/huels-originals/geobatchpy",
"name": "geobatchpy",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7.1,<4.0",
"maintainer_email": "",
"keywords": "geoapify,geocoding,openstreetmap,geojson",
"author": "Paul Kinsvater",
"author_email": "paul.kinsvater@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ed/ee/0d857650cb0a7a74776aef66fefb94d845d82f4d0b4bcdfb1290bf18b0f2/geobatchpy-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A CLI and Python Client for the Geoapify API.",
"version": "0.2.3",
"project_urls": {
"Homepage": "https://github.com/huels-originals/geobatchpy",
"Repository": "https://github.com/huels-originals/geobatchpy"
},
"split_keywords": [
"geoapify",
"geocoding",
"openstreetmap",
"geojson"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "12881b95d2cf0fd806c67dd6a7cbee844fe4d32396993801759f3aad73115f17",
"md5": "eddb9ccc2edee7553733a4d46ebe4529",
"sha256": "b2526262569b1605bae70bca37424afad3d757c140d187ba9d2581e6cddef877"
},
"downloads": -1,
"filename": "geobatchpy-0.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eddb9ccc2edee7553733a4d46ebe4529",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7.1,<4.0",
"size": 17048,
"upload_time": "2023-06-03T08:35:06",
"upload_time_iso_8601": "2023-06-03T08:35:06.251781Z",
"url": "https://files.pythonhosted.org/packages/12/88/1b95d2cf0fd806c67dd6a7cbee844fe4d32396993801759f3aad73115f17/geobatchpy-0.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "edee0d857650cb0a7a74776aef66fefb94d845d82f4d0b4bcdfb1290bf18b0f2",
"md5": "f5c9dafb9daf2cb89bd1b96a5dbc5fa8",
"sha256": "8284cd4a96009817cf9179282548588ff736df5c1ae536ec5eda59e2a9ee98c1"
},
"downloads": -1,
"filename": "geobatchpy-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "f5c9dafb9daf2cb89bd1b96a5dbc5fa8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7.1,<4.0",
"size": 16210,
"upload_time": "2023-06-03T08:35:08",
"upload_time_iso_8601": "2023-06-03T08:35:08.399495Z",
"url": "https://files.pythonhosted.org/packages/ed/ee/0d857650cb0a7a74776aef66fefb94d845d82f4d0b4bcdfb1290bf18b0f2/geobatchpy-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-03 08:35:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "huels-originals",
"github_project": "geobatchpy",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "geobatchpy"
}