usgeocoder


- Name: usgeocoder
- Version: 0.2
- Home page: https://github.com/ClayGendron/usgeocoder
- Summary: USGeoCoder is an easy and free-to-use package for geocoding US addresses with the US Census Geocoder API
- Upload time: 2023-09-20 03:43:23
- Author: Clay Gendron
- Keywords: geocoding, geocoder, geospatial, us, census, api, address, coordinates, batch processing
- Requirements: No requirements were recorded.

<h1 align="center">
<img src="https://raw.githubusercontent.com/claygendron/usgeocoder/main/USGeocoder.png" width="1200">
</h1>

# Overview

Thank you for your interest in the USGeoCoder package!
USGeoCoder is an easy and free-to-use package for geocoding US addresses with the US Census Geocoder API.
This package was created to solve two problems I encountered while trying to geocode data in my data pipelines:

1. Geocode thousands of addresses in a reasonable amount of time without caps on total requests.
2. Do it for free.

The [US Census Geocoder API](https://geocoding.geo.census.gov/geocoder/) was the best solution I found to meet these requirements.
There are limitations, of course, the main one being that this API only works for US addresses. By sending requests in parallel, however, this package can geocode roughly 2,000 to 4,000 addresses per minute without hitting a rate limit or a total request cap.

This package is designed to help anyone, from an individual data scientist or developer working on small projects to a business managing large data pipelines.
If this package helps you, I would love to hear from you! And I would love it even more if you gave feedback or contributed to the package 😊

**Note:** This package is in a Beta state, so please be aware that there may be bugs or issues. Thank you for your patience.

# Table of Contents

1. [Installation](#installation)
2. [Usage](#usage)
   - [API Request Functions](#api-request-functions)
   - [Batch Geocoder Function](#batch-geocoder-function)
   - [Geocoder Class](#geocoder-class)
3. [Contribute](#contribute)
4. [License](#license)

# Installation

Make sure you have Python 3 installed, along with the pandas library.

```bash
pip install usgeocoder
```

# Usage

This package consists of three main sets of functions and classes.

- API Request Functions (Forward and Reverse)
- Batch Geocoder Function (Parallelize API Request Functions)
- Geocoder Class (Data Manager for Batch Geocoder)

The components will be detailed below in order.

## API Request Functions

```python
from usgeocoder import geocode_address, geocode_coordinates
```

A single request to geocode an address or a pair of coordinates takes one function call.

Addresses should look like this: `123 Main St, City, State Zip`.

Coordinates should look like this: `(Longitude, Latitude)`.

```python
# Forward
address = '123 Main St, City, State Zip'
response = geocode_address(address)

# Reverse
coordinates = (-70.207895, 43.623068)
response = geocode_coordinates(coordinates)
```

**Note:** Coordinate pairs are ordered as (Longitude, Latitude), that is, (x, y).
If results are not as expected, try switching the order of the coordinates.
Google Maps, for instance, shows points as (Latitude, Longitude), that is, (y, x).
The (Longitude, Latitude) order was chosen because it is consistent with the mathematical convention of plotting points on a Cartesian plane, and it is how many GIS systems order coordinate points.
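
If your source data uses the (Latitude, Longitude) order, a small conversion step before geocoding can help. The following is a minimal sketch; `to_lon_lat` is a hypothetical helper written for this example and is not part of the package.

```python
from typing import Tuple


def to_lon_lat(lat_lon: Tuple[float, float]) -> Tuple[float, float]:
    """Swap a (latitude, longitude) pair into (longitude, latitude) order.

    Hypothetical helper for illustration; not part of usgeocoder.
    """
    lat, lon = lat_lon
    # Range checks catch values that cannot be valid in the stated order.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return (lon, lat)


# A point copied in (Latitude, Longitude) order
lat_lon_point = (43.623068, -70.207895)
coordinates = to_lon_lat(lat_lon_point)
```

The range checks only reject impossible values. A pair whose two numbers both fall between -90 and 90 cannot be detected as swapped this way.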

## Batch Geocoder Function

```python
from usgeocoder import batch_geocoder
```

The `batch_geocoder` function parallelizes the requests made by the `geocode_address` and `geocode_coordinates` functions.

```python
# Forward
addresses = ['123 Main St, City, State Zip', '456 Main St, City, State Zip']
located, failed = batch_geocoder(addresses, direction='forward', n_threads=100)

# Reverse
coordinates = [(-70.207895, 43.623068), (-71.469826, 43.014701)]
located, failed = batch_geocoder(coordinates, direction='reverse', n_threads=100)
```

**Note:** The `batch_geocoder` function is tuned for a maximum `n_threads` of 100.
Increasing `n_threads` beyond 100 will increase the likelihood of hitting a rate limit error.
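
To give a feel for what parallelizing requests with a thread pool looks like, here is a rough sketch using only the standard library. It is not the package's implementation: `run_in_parallel` and `fake_lookup` are hypothetical names, the stub stands in for a real API call, and the shapes of `located` and `failed` are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def run_in_parallel(
    items: Iterable[str],
    lookup: Callable[[str], Optional[dict]],
    n_threads: int = 100,
) -> Tuple[Dict[str, dict], List[str]]:
    """Run `lookup` over `items` on a thread pool (illustrative sketch only)."""
    items = list(items)
    located: Dict[str, dict] = {}
    failed: List[str] = []
    if not items:
        return located, failed
    # Never start more threads than there are items to process.
    workers = max(1, min(n_threads, len(items)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the inputs.
        for item, result in zip(items, pool.map(lookup, items)):
            if result is None:
                failed.append(item)
            else:
                located[item] = result
    return located, failed


def fake_lookup(address: str) -> Optional[dict]:
    """Stub standing in for a network request; returns None to signal a miss."""
    return None if "Nowhere" in address else {"address": address}


addresses = [
    "123 Main St, City, State Zip",
    "1 Nowhere Ln, City, State Zip",
    "456 Main St, City, State Zip",
]
located, failed = run_in_parallel(addresses, fake_lookup, n_threads=4)
```

Threads suit this kind of workload because each request spends most of its time waiting on the network rather than using the CPU.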

## Geocoder Class

```python
from usgeocoder import Geocoder
```

The `Geocoder` class aims to organize the geocoding process in a data pipeline.
When the `Geocoder` class is initialized, it will create a directory called `geocoder` in the current working directory.
This new directory will store each address or set of coordinates seen by the `Geocoder` class.
If this directory already exists, the `Geocoder` class will instead load in the data from the directory.
Persisting this data avoids duplicate requests to the API for the same address or set of coordinates, whether the earlier request succeeded or not.
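
The idea of remembering every input on disk, including failures, can be sketched as follows. This is an illustration under assumptions, not the class's actual storage format: `SeenCache`, the `seen.json` file name, and `fake_fetch` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class SeenCache:
    """Disk-backed record of every key already looked up (illustrative only)."""

    def __init__(self, directory):
        self.path = Path(directory) / "seen.json"  # hypothetical file name
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Load earlier results if the directory already holds them.
        self.seen = json.loads(self.path.read_text()) if self.path.exists() else {}

    def lookup(self, key, fetch):
        # Failures are stored as None, so they are not requested again either.
        if key not in self.seen:
            self.seen[key] = fetch(key)
            self.path.write_text(json.dumps(self.seen))
        return self.seen[key]


calls = []


def fake_fetch(address):
    """Stub standing in for an API request; records each call it receives."""
    calls.append(address)
    return None if address.startswith("0 ") else [-70.207895, 43.623068]


with tempfile.TemporaryDirectory() as tmp:
    first = SeenCache(Path(tmp) / "geocoder")
    first.lookup("123 Main St, City, State Zip", fake_fetch)
    first.lookup("0 Missing Rd, City, State Zip", fake_fetch)

    # A new instance pointed at the same directory reuses the stored results.
    second = SeenCache(Path(tmp) / "geocoder")
    result = second.lookup("123 Main St, City, State Zip", fake_fetch)
    missing = second.lookup("0 Missing Rd, City, State Zip", fake_fetch)
```

In this sketch the stub is called only twice in total, because the second instance finds both the successful and the failed lookup on disk.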

### Using the Process Method

The recommended way to use the `Geocoder` class is to initialize it and then use the `process()` method to manage what actions to take in the geocoding process.
The `process()` method will take a dataframe that has a column with complete addresses or sets of coordinates.
This column should be called `Address` or `Coordinates` and be formatted the same as required by the API request functions.
By default, the `process()` method will perform the following:

- Add the data from the pipeline to the `Geocoder` class.
- Forward geocode the addresses.
- Reverse geocode the coordinates from the forward geocoding step.
- Merge the geocoded data back to a copy of the original dataframe.
- Return the merged dataframe.

Here is an example of using the `process()` method.

```python
geo = Geocoder()
geocoded_df = geo.process(data=df)

# or

geo = Geocoder(df)
geocoded_df = geo.process()
```
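
The default steps listed above can be pictured roughly as the plain-Python sketch below, which uses lists of dictionaries in place of a dataframe and stubs in place of API calls. Everything here is an assumption for illustration: `process_sketch`, the stubs, and the `Geography` key are hypothetical and do not describe the package's internals or its output columns.

```python
def process_sketch(rows, forward_lookup, reverse_lookup):
    """Illustrative outline of: add data, forward, reverse, merge, return."""
    # Add the data: collect each distinct address once, keeping first-seen order.
    addresses = list(dict.fromkeys(row["Address"] for row in rows))

    # Forward geocode: address -> coordinates (None when nothing was found).
    coords = {address: forward_lookup(address) for address in addresses}

    # Reverse geocode the coordinates produced by the forward step.
    geographies = {
        c: reverse_lookup(c)
        for c in dict.fromkeys(coords.values())
        if c is not None
    }

    # Merge onto copies of the rows so the original data is left untouched.
    merged = []
    for row in rows:
        new_row = dict(row)
        c = coords[row["Address"]]
        new_row["Coordinates"] = c
        new_row["Geography"] = geographies.get(c)  # hypothetical column name
        merged.append(new_row)
    return merged


def fake_forward(address):
    """Stub forward geocoder with a single known address."""
    known = {"123 Main St, City, State Zip": (-70.207895, 43.623068)}
    return known.get(address)


def fake_reverse(coordinates):
    """Stub reverse geocoder returning a placeholder record."""
    return {"source": "stub", "point": coordinates}


rows = [
    {"id": 1, "Address": "123 Main St, City, State Zip"},
    {"id": 2, "Address": "9 Unknown Rd, City, State Zip"},
    {"id": 3, "Address": "123 Main St, City, State Zip"},
]
merged = process_sketch(rows, fake_forward, fake_reverse)
```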

If you want to customize the geocoding process, you can flip certain steps to `True` or `False` in the `process()` method.
Here is an example of the defaults.

```python
geo.process(
   data=df,
   forward=True,
   reverse=True,
   merge=True,
   verbose=False
)
```

**Note:** The `Geocoder` class was designed assuming that most users will be geocoding addresses.
Therefore, the default behavior is to forward geocode addresses and then reverse geocode the coordinates from the forward geocoding step.
If you are strictly reverse geocoding coordinates, you can set `forward=False` in the `process()` method to skip the forward geocoding step.

### Using Separate Methods

If you want to use the `Geocoder` class to manage the geocoding process but would like to use separate methods for each step, you can do so.
Here is an example of the separate methods utilized in the `process()` method.

```python
geo = Geocoder(df)
geo.forward()
geo.reverse()
geo.merge_data()
geocoded_df = geo.data
```

**Note:** When data is added to the `Geocoder` class, the `Address` or `Coordinates` values are stored as a de-duplicated list in its `addresses` or `coordinates` attribute, respectively.
When the `forward()` or `reverse()` methods are called, they look to these attributes for the data to geocode.
If you add a dataframe with both `Address` and `Coordinates` columns, the `Geocoder` class will only populate the `coordinates` attribute, as there is no need to forward geocode the addresses.
In that case, calling the `forward()` method will raise an error.
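
The rule in this note can be sketched in a few lines of plain Python. This is only an illustration of the described behavior, using lists of dictionaries instead of a dataframe; `unique_in_order` and `collect_inputs` are hypothetical names, not package functions.

```python
def unique_in_order(values):
    """Drop duplicates while keeping the first-seen order."""
    return list(dict.fromkeys(values))


def collect_inputs(rows):
    """Return (addresses, coordinates) following the rule described above."""
    has_coordinates = any("Coordinates" in row for row in rows)
    if has_coordinates:
        # Coordinates are already known, so no addresses are queued for forward geocoding.
        coordinates = unique_in_order(
            row["Coordinates"] for row in rows if "Coordinates" in row
        )
        return [], coordinates
    addresses = unique_in_order(row["Address"] for row in rows if "Address" in row)
    return addresses, []


address_only = [
    {"Address": "123 Main St, City, State Zip"},
    {"Address": "456 Main St, City, State Zip"},
    {"Address": "123 Main St, City, State Zip"},
]
both_columns = [
    {"Address": "123 Main St, City, State Zip", "Coordinates": (-70.207895, 43.623068)},
    {"Address": "123 Main St, City, State Zip", "Coordinates": (-70.207895, 43.623068)},
]

addresses, no_coordinates = collect_inputs(address_only)
no_addresses, coordinates = collect_inputs(both_columns)
```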

### Using Helper Functions

If you have a dataframe with separate columns named `Street Address`, `City`, `State`, and `Zip`, you can use a helper function to create a new `Address` column, or create the column yourself.
The example below first renames the `existing_cols` to the required column names.

```python
# Create a new column with the complete address using the helper function
import pandas as pd
from usgeocoder import concatenate_address

# Placeholder dataframe standing in for your existing data
existing_cols = ['address 1', 'address 2', 'city', 'state', 'zip code', 'important feature']
df = pd.DataFrame(columns=existing_cols)

# Rename the existing columns to the required column names
df = df.rename(columns={
    'address 1': 'Street Address',
    'city': 'City',
    'state': 'State',
    'zip code': 'Zip'
})

df['Address'] = concatenate_address(df)
```

The same steps work for reverse geocoding: a new `Coordinates` column is created from separate `Longitude` and `Latitude` columns.

```python
# Create a new column with the complete coordinates using the helper function
import pandas as pd
from usgeocoder import concatenate_coordinates

# Placeholder dataframe standing in for your existing data
existing_cols = ['x', 'y', 'important feature']
df = pd.DataFrame(columns=existing_cols)

# Rename the existing columns to the required column names
df = df.rename(columns={
    'x': 'Longitude',
    'y': 'Latitude'
})

df['Coordinates'] = concatenate_coordinates(df)
```
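
If you prefer to create the columns yourself, the target formats can be assembled by hand. The sketch below shows one way to do it for single values; `join_address` and `join_coordinates` are hypothetical names, and padding numeric ZIP codes to five digits is an assumption made here to preserve leading zeros, not documented package behavior.

```python
def join_address(street, city, state, zip_code):
    """Build a '123 Main St, City, State Zip' style string from its parts."""
    # ZIP codes read as integers lose leading zeros, so pad back to five digits.
    zip_text = str(zip_code).strip().zfill(5)
    return f"{street.strip()}, {city.strip()}, {state.strip()} {zip_text}"


def join_coordinates(longitude, latitude):
    """Build a (Longitude, Latitude) pair, in that order, from separate values."""
    return (float(longitude), float(latitude))


address = join_address("123 Main St ", "Portland", "ME", 4101)
coordinates = join_coordinates("-70.207895", 43.623068)
```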

# Contribute

If you would like to make this package better, please consider contributing 😊

# License

[MIT](https://choosealicense.com/licenses/mit/)

Raw data

{
    "_id": null,
    "home_page": "https://github.com/ClayGendron/usgeocoder",
    "name": "usgeocoder",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "geocoding,geocoder,geospatial,US,census,api,address,coordinates,batch processing",
    "author": "Clay Gendron",
    "author_email": "chg@claygendron.io",
    "download_url": "https://files.pythonhosted.org/packages/32/87/50436053a02e11fc227066bf0b40ceceba59bae7e2dc996a140807f819fe/usgeocoder-0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "USGeoCoder is an easy and free-to-use package for geocoding US addresses with the US Census Geocoder API",
    "version": "0.2",
    "project_urls": {
        "Documentation": "https://github.com/ClayGendron/usgeocoder/",
        "Homepage": "https://github.com/ClayGendron/usgeocoder",
        "Source Code": "https://github.com/ClayGendron/usgeocoder/"
    },
    "split_keywords": [
        "geocoding",
        "geocoder",
        "geospatial",
        "us",
        "census",
        "api",
        "address",
        "coordinates",
        "batch processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "197a9f566833f2e1e416763a025cd219677c766ffb703b4a1a786019559c4eee",
                "md5": "7912757c653be87ee523075f15e5a994",
                "sha256": "5160275b7945d8397d3f43eb3cd0ca29b5d166dcc3b95fe19ccec776ab66c5a8"
            },
            "downloads": -1,
            "filename": "usgeocoder-0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7912757c653be87ee523075f15e5a994",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 14644,
            "upload_time": "2023-09-20T03:43:21",
            "upload_time_iso_8601": "2023-09-20T03:43:21.713587Z",
            "url": "https://files.pythonhosted.org/packages/19/7a/9f566833f2e1e416763a025cd219677c766ffb703b4a1a786019559c4eee/usgeocoder-0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "328750436053a02e11fc227066bf0b40ceceba59bae7e2dc996a140807f819fe",
                "md5": "8921f2a8cdfb10acd19e44a4ee2853ec",
                "sha256": "4e694f2a2bf82f0b0b28d053e4d669ee16938ada4cb892dcd41cc64a5b222eaf"
            },
            "downloads": -1,
            "filename": "usgeocoder-0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8921f2a8cdfb10acd19e44a4ee2853ec",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 16382,
            "upload_time": "2023-09-20T03:43:23",
            "upload_time_iso_8601": "2023-09-20T03:43:23.289803Z",
            "url": "https://files.pythonhosted.org/packages/32/87/50436053a02e11fc227066bf0b40ceceba59bae7e2dc996a140807f819fe/usgeocoder-0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-20 03:43:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ClayGendron",
    "github_project": "usgeocoder",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "usgeocoder"
}
        