| Name | whereabouts |
| Version | 0.4.1 |
| download | |
| home_page | None |
| Summary | Fast, accurate open source geocoding in Python |
| upload_time | 2025-08-17 10:46:40 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.12 |
| license | MIT License Copyright (c) 2023 Alex Lee Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
| keywords | geocoding, geospatial, record linkage |
| VCS | |
| bugtrack_url | |
| requirements | certifi, charset-normalizer, cramjam, duckdb, et-xmlfile, fastparquet, fsspec, idna, iniconfig, jinja2, joblib, lxml, markupsafe, numpy, openpyxl, packaging, pandas, pluggy, pyarrow, pygments, pytest, python-dateutil, pytz, pyyaml, requests, ruff, scikit-learn, scipy, six, threadpoolctl, tqdm, tzdata, urllib3 |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Whereabouts
A lightweight, fast geocoder for Python using DuckDB. Try it out online at [Hugging Face](https://huggingface.co/spaces/saunteringcat/whereabouts-geocoding).
## Description
Whereabouts is an open-source geocoding library for Python that lets you geocode and standardize address data entirely within your own environment.
Features:
- Two-line installation
- No additional database setup required; DuckDB runs all queries
- No need to send data to an external geocoding API
- Fast (geocodes thousands of addresses per second, depending on your setup)
- Robust to typographical errors
## Performance
Whereabouts performs well compared with other geocoders. The charts below show the accuracy when calculated at apartment / unit, house, street and suburb level, comparing Whereabouts with Google, Mapbox and Nominatim on sets of residential and retail addresses.
<p align="center">
<img src="geocoder_comparison_residential_050924.png" alt="Geocoding accuracy on a set of residential addresses" width="45%"/>
<img src="geocoder_comparison_retail_050924.png" alt="Geocoding accuracy on a set of business addresses" width="45%"/>
</p>
The code used to produce these results is in the [whereabouts_testing repo](https://github.com/ajl2718/whereabouts_testing).
## Requirements
- Python 3.12+
## Installation: via uv / pip / conda
Whereabouts can be installed either from this repo or with uv, pip or conda. For example, with uv:
```
uv add whereabouts
```
## Installation from this repo
First, clone the repo:
```
git clone https://github.com/ajl2718/whereabouts.git
```
Then create a virtual environment with uv:
```
uv venv
```
This creates the virtual environment. To install the dependencies listed in the `pyproject.toml` file, run `uv sync`.
## Download a geocoder database or create your own
You will need a geocoding database to match addresses against. You can either download a pre-built database or create your own from a dataset of high-quality reference addresses for a given country, state or other geographic region.
### Option 1: Download a pre-built geocoder database
Pre-built geocoding databases are available from [Hugging Face](https://www.huggingface.co). The list of available databases can be found [here](https://huggingface.co/saunteringcat/whereabouts-db/tree/main).
As an example, to install the small size geocoder database for California:
```
python -m whereabouts download us_ca_sm
```
or for the small size geocoder database for all of Australia:
```
python -m whereabouts download au_all_sm
```
### Option 2: Create a geocoder database
Rather than using a pre-built database, you can create your own geocoder database if you have your own address file. This file should be a single CSV or Parquet file with the following columns:
| Column name | Description | Data type |
| ----------- | ----------- | --------- |
| ADDRESS_DETAIL_PID | Unique identifier for address | int |
| ADDRESS_LABEL | The full address | str |
| ADDRESS_SITE_NAME | Name of the site. This is usually null | str |
| LOCALITY_NAME | Name of the suburb or locality | str |
| POSTCODE | Postcode of address | int |
| STATE | The state, region or territory for the address | str |
| LATITUDE | Latitude of geocoded address | float |
| LONGITUDE | Longitude of geocoded address | float |
These fields should be specified in a `setup.yml` file. Once the `setup.yml` is created and a reference dataset is available, the geocoding database can be created:
```
python -m whereabouts setup_geocoder setup.yml
```
An example `setup.yml` file is provided with this repo. Note that the state names listed in it are specific to Australia and should be changed to match the country whose data you are working with.
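Before building a database, it can help to confirm that a reference file actually has the columns listed in the table above. The following is a minimal sketch using only the standard library; the helper `missing_columns` and the sample data are hypothetical and not part of whereabouts, and the check assumes exact, case-sensitive column names.

```python
import csv
import io

# Column names taken from the table of required columns.
REQUIRED_COLUMNS = [
    "ADDRESS_DETAIL_PID", "ADDRESS_LABEL", "ADDRESS_SITE_NAME", "LOCALITY_NAME",
    "POSTCODE", "STATE", "LATITUDE", "LONGITUDE",
]

def missing_columns(csv_file):
    """Return the required columns absent from the header row of an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]

# Hypothetical two-column file, used only to demonstrate the check.
sample = io.StringIO("ADDRESS_DETAIL_PID,ADDRESS_LABEL\n1,1 EXAMPLE ST\n")
missing = missing_columns(sample)
print(missing)
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("addresses.csv", newline="")`.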
## Geocoding examples
Geocode a list of addresses:
```
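from whereabouts.Matcher import Matcher

# Placeholder addresses for illustration; replace with your own data
addresslist = ['12 smith st fitzroy vic 3065', '7 jones street carlton vic']
matcher = Matcher(db_name='au_all_sm')
results = matcher.geocode(addresslist, how='standard')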
```
For more accurate geocoding, you can use trigram phrases rather than token phrases. Note that you will need one of the large databases to use trigram geocoding.
```
matcher.geocode(addresslist, how='trigram')
```
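For some intuition about why trigram matching tends to cope better with typos, the sketch below compares the overlap of whole tokens with the overlap of character trigrams for a misspelled address. It is a generic illustration only: the helper names are hypothetical, and it does not reproduce how whereabouts builds its token or trigram phrases internally.

```python
def char_trigrams(text):
    """Return the set of overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

reference = "12 smith street"
typo = "12 smth street"  # one letter dropped from "smith"

# Whole-token comparison: the misspelled word no longer matches at all.
token_sim = jaccard(set(reference.split()), set(typo.split()))
# Trigram comparison: most 3-character pieces still match.
trigram_sim = jaccard(char_trigrams(reference), char_trigrams(typo))

print(f"token similarity:   {token_sim:.2f}")
print(f"trigram similarity: {trigram_sim:.2f}")
```

A single dropped letter removes an entire token from the overlap but only a few trigrams, so the trigram score stays higher. The trade-off is that there are many more trigrams than tokens to index, which fits with trigram geocoding needing one of the large databases.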
## How it works
The algorithm uses simple record linkage techniques and is compact enough to implement in around 10 lines of SQL. It is based on the following papers:
- https://arxiv.org/abs/1708.01402
- https://arxiv.org/abs/1712.09691
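To show what record linkage in a few lines of SQL can look like, here is a toy sketch that links a query address to the reference address sharing the most tokens. It is not the query whereabouts runs and is not taken from the papers above: the table layout, the data and the `best_match` helper are all hypothetical, and it uses the standard library's `sqlite3` rather than DuckDB so that it runs without extra dependencies.

```python
import sqlite3

# Toy reference addresses (hypothetical data, for illustration only).
reference = [
    (1, "12 SMITH STREET FITZROY"),
    (2, "12 SMITH ROAD CARLTON"),
    (3, "7 JONES STREET FITZROY"),
]

# Inverted index: one row per (address, token) pair.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tokens (address_id INTEGER, token TEXT)")
con.executemany(
    "INSERT INTO tokens VALUES (?, ?)",
    [(pid, tok) for pid, label in reference for tok in set(label.split())],
)

def best_match(query):
    """Return (address_id, shared_token_count) for the best-overlapping reference address."""
    query_tokens = sorted(set(query.upper().split()))
    placeholders = ",".join("?" * len(query_tokens))
    sql = f"""
        SELECT address_id, COUNT(*) AS shared
        FROM tokens
        WHERE token IN ({placeholders})
        GROUP BY address_id
        ORDER BY shared DESC, address_id
        LIMIT 1
    """
    return con.execute(sql, query_tokens).fetchone()

match = best_match("12 smith st fitzroy")
print(match)
```

The abbreviated `st` does not match `STREET`, yet the first address still wins on the remaining shared tokens. A real geocoder needs a more careful scoring scheme than a raw count.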
## Documentation
Work in progress: https://whereabouts.readthedocs.io/en/latest/
## License Disclaimer for Third-Party Data
Note that while the code from this package is licensed under the MIT license, the pre-built databases use data from data providers that may have restrictions for particular use cases:
- The Australian databases are built from the [Geocoded National Address File](https://data.gov.au/data/dataset/geocoded-national-address-file-g-naf), with conditions of use set out in the [End User License Agreement](https://data.gov.au/dataset/ds-dga-e1a365fc-52f5-4798-8f0c-ed1d33d43b6d/distribution/dist-dga-0102be65-3781-42d9-9458-fdaf7170efed/details?q=previous%20gnaf).
- The US databases are still work-in-progress but are based on data from [OpenAddresses](https://openaddresses.io/) and so any work with whereabouts based on US address data should adhere to the [OpenAddresses license](https://github.com/openaddresses/openaddresses/blob/master/LICENSE).
Users of this software must comply with the terms and conditions of the respective data licenses, which may impose additional restrictions or requirements. By using this software, you agree to comply with the relevant licenses for any third-party data.
## Citing
To cite this repo, please use the following:
```bibtex
@software{whereabouts_2024,
author = {Alex Lee},
doi = {10.5281/zenodo.13627073},
month = {10},
title = {{Whereabouts}},
url = {https://github.com/ajl2718/whereabouts},
version = {0.3.14},
year = {2024}
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "whereabouts",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "geocoding, geospatial, record linkage",
"author": null,
"author_email": "Alex Lee <ajlee3141@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/b5/e1/1908df6c807d4104583436003d225ef8d4499add23c1408b331bd1f28e57/whereabouts-0.4.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2023 Alex Lee Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
"summary": "Fast, accurate open source geocoding in Python",
"version": "0.4.1",
"project_urls": {
"Documentation": "https://whereabouts.readthedocs.io/en/latest",
"Issues": "https://github.com/ajl2718/whereabouts/issues",
"Source": "https://github.com/ajl2718/whereabouts"
},
"split_keywords": [
"geocoding",
" geospatial",
" record linkage"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "bd308044f5d8a3f083b84b7e9ef5dd8ce80dec4fc3c810e8e4e172c81f41b7e2",
"md5": "c4492d6275e6782bc619a19623324907",
"sha256": "5201f137e3e701cdba7beedd32b9dab872f0f9c6e16df5bad13a0701def7e2d9"
},
"downloads": -1,
"filename": "whereabouts-0.4.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c4492d6275e6782bc619a19623324907",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 38496,
"upload_time": "2025-08-17T10:46:38",
"upload_time_iso_8601": "2025-08-17T10:46:38.465099Z",
"url": "https://files.pythonhosted.org/packages/bd/30/8044f5d8a3f083b84b7e9ef5dd8ce80dec4fc3c810e8e4e172c81f41b7e2/whereabouts-0.4.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b5e11908df6c807d4104583436003d225ef8d4499add23c1408b331bd1f28e57",
"md5": "aa649470808ad99382202f8fa851f86f",
"sha256": "39bd6e6f92ec3520011e04d5a6c49577b73896f5afeef1cb1571673e1ec08bbd"
},
"downloads": -1,
"filename": "whereabouts-0.4.1.tar.gz",
"has_sig": false,
"md5_digest": "aa649470808ad99382202f8fa851f86f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 513190,
"upload_time": "2025-08-17T10:46:40",
"upload_time_iso_8601": "2025-08-17T10:46:40.440692Z",
"url": "https://files.pythonhosted.org/packages/b5/e1/1908df6c807d4104583436003d225ef8d4499add23c1408b331bd1f28e57/whereabouts-0.4.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-17 10:46:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ajl2718",
"github_project": "whereabouts",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "certifi",
"specs": [
[
"==",
"2025.8.3"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.4.3"
]
]
},
{
"name": "cramjam",
"specs": [
[
"==",
"2.11.0"
]
]
},
{
"name": "duckdb",
"specs": [
[
"==",
"1.3.2"
]
]
},
{
"name": "et-xmlfile",
"specs": [
[
"==",
"2.0.0"
]
]
},
{
"name": "fastparquet",
"specs": [
[
"==",
"2024.11.0"
]
]
},
{
"name": "fsspec",
"specs": [
[
"==",
"2025.7.0"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.10"
]
]
},
{
"name": "iniconfig",
"specs": [
[
"==",
"2.1.0"
]
]
},
{
"name": "jinja2",
"specs": [
[
"==",
"3.1.6"
]
]
},
{
"name": "joblib",
"specs": [
[
"==",
"1.5.1"
]
]
},
{
"name": "lxml",
"specs": [
[
"==",
"6.0.0"
]
]
},
{
"name": "markupsafe",
"specs": [
[
"==",
"3.0.2"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"2.2.3"
]
]
},
{
"name": "openpyxl",
"specs": [
[
"==",
"3.1.5"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"25.0"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.3.1"
]
]
},
{
"name": "pluggy",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "pyarrow",
"specs": [
[
"==",
"21.0.0"
]
]
},
{
"name": "pygments",
"specs": [
[
"==",
"2.19.2"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"8.4.1"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2025.2"
]
]
},
{
"name": "pyyaml",
"specs": [
[
"==",
"6.0.2"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.4"
]
]
},
{
"name": "ruff",
"specs": [
[
"==",
"0.12.9"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.7.1"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.16.1"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.17.0"
]
]
},
{
"name": "threadpoolctl",
"specs": [
[
"==",
"3.6.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.67.1"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2025.2"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.5.0"
]
]
}
],
"lcname": "whereabouts"
}