agrc-sweeper


- Name: agrc-sweeper
- Version: 2.0.0
- Home page: https://github.com/agrc/sweeper
- Summary: CLI tool for making good data
- Upload time: 2024-05-07 17:18:55
- Author: UGRC
- Requires Python: >=3
- License: MIT

# agrc-sweeper [![PyPI version](https://badge.fury.io/py/agrc-sweeper.svg)](https://badge.fury.io/py/agrc-sweeper)[![Push Events](https://github.com/agrc/sweeper/actions/workflows/push.yml/badge.svg)](https://github.com/agrc/sweeper/actions/workflows/push.yml)

The data cleaning service.

![sweeper_sm](https://user-images.githubusercontent.com/325813/90411835-91c4c080-e069-11ea-9d03-f3e60421b835.png)

## Available Sweepers

### Addresses

Checks that addresses have minimum required parts and optionally normalizes them.

### Duplicates

Checks for duplicate features.

### Empties

Checks for empty geometries.

### Metadata

Checks to make sure that the metadata meets [the Basic SGID Metadata Requirements](https://gis.utah.gov/about/policy/metadata/#basic-sgid-metadata).

#### Tags

Checks to make sure that existing tags are cased appropriately. This means that they are title-cased, except for known abbreviations (e.g. UGRC, BLM) and small words such as articles and prepositions (e.g. a, the, of).

This check also verifies that the data set contains a tag that matches the database name (e.g. `SGID`) and the schema (e.g. `Cadastre`).

`--try-fix` adds missing required tags and title-cases any existing tags.
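
The casing rule and the `--try-fix` behavior described above can be sketched roughly as follows. This is an illustration only, not sweeper's actual implementation: the function names `title_case_tag` and `fix_tags` and the two exception sets are hypothetical, and the real lists of abbreviations and small words are likely longer.

```python
# Hypothetical exception lists; the real tool's lists are not shown in this README.
ABBREVIATIONS = {"UGRC", "BLM", "SGID"}
LOWERCASE_WORDS = {"a", "the", "of"}


def title_case_tag(tag: str) -> str:
    """Title-case a tag, keeping abbreviations upper-case and small words lower-case."""
    words = []
    for word in tag.split():
        if word.upper() in ABBREVIATIONS:
            words.append(word.upper())
        elif word.lower() in LOWERCASE_WORDS:
            words.append(word.lower())
        else:
            words.append(word.capitalize())
    return " ".join(words)


def fix_tags(tags: list[str], required: list[str]) -> list[str]:
    """Title-case existing tags, then append any required tags that are missing."""
    fixed = [title_case_tag(tag) for tag in tags]
    present = {tag.lower() for tag in fixed}
    for tag in required:
        if tag.lower() not in present:
            fixed.append(tag)
    return fixed


# The required tags here stand in for the database name and schema.
print(fix_tags(["bureau of land management", "blm", "sgid"], ["SGID", "Cadastre"]))
# ['Bureau of Land Management', 'BLM', 'SGID', 'Cadastre']
```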

#### Summary

Checks to make sure that the summary is less than 2048 characters (a limitation of AGOL) and that it is shorter than the description.
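
As a rough sketch of these two rules, assuming the summary and description are available as plain strings (the function name `check_summary` and the issue messages are hypothetical, not part of sweeper's API):

```python
AGOL_SUMMARY_LIMIT = 2048  # character limit cited above


def check_summary(summary: str, description: str) -> list[str]:
    """Return a list of human-readable issues; an empty list means the summary passes."""
    issues = []
    # The summary must be strictly shorter than the AGOL limit.
    if len(summary) >= AGOL_SUMMARY_LIMIT:
        issues.append(f"summary is {len(summary)} characters; must be less than {AGOL_SUMMARY_LIMIT}")
    # The summary must be strictly shorter than the description.
    if len(summary) >= len(description):
        issues.append("summary must be shorter than the description")
    return issues


print(check_summary("Parcel boundaries.", "Parcel boundaries for all counties, updated monthly."))
# []
```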

#### Description

Checks to make sure that the description contains a link to a data page on gis.utah.gov.
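
One way such a check could look is a simple pattern search. The README does not say what qualifies as a data page, so the pattern below is an assumption: it accepts any `http` or `https` link to `gis.utah.gov` that has a path. The name `has_data_page_link` is hypothetical.

```python
import re

# Assumed pattern: any http(s) link on gis.utah.gov followed by a path.
DATA_PAGE_LINK = re.compile(r"https?://gis\.utah\.gov/\S+", re.IGNORECASE)


def has_data_page_link(description: str) -> bool:
    """Return True if the description text contains a gis.utah.gov link."""
    return DATA_PAGE_LINK.search(description) is not None


print(has_data_page_link("See https://gis.utah.gov/data/cadastre/parcels/ for details."))
# True
```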

#### Use Limitations

Checks to make sure that the text in this section matches the [official text for UGRC](src/sweeper/sweepers/UseLimitations.html).

`--try-fix` updates the text to match the official text.

## Parsing Addresses

This project contains a module, `sweeper.address_parser`, that can be used as a standalone address parser. This allows developers to take advantage of sweeper's advanced address parsing and normalization without having to run the entire sweeper process.

### Usage Example

```python
from sweeper.address_parser import Address

address = Address('123 South Main Street')
print(address)

'''
--> Parsed Address:
{'address_number': '123',
 'normalized': '123 S MAIN ST',
 'prefix_direction': 'S',
 'street_name': 'MAIN',
 'street_type': 'ST'}
'''
```

### Available Address class properties

All properties default to None if there is no parsed value.

`address_number`

`address_number_suffix`

`prefix_direction`

`street_name`

`street_direction`

`street_type`

`unit_type`

`unit_id`
If no `unit_type` is found, this property is prefixed with `#` (e.g. `# 3`). If `unit_type` is found, `#` is stripped from this property.

`city`

`zip_code`

`po_box`
The PO Box if a po-box-type address was entered (e.g. `po_box` would be `1` for `p.o. box 1`).

`normalized`
A normalized string representing the entire address that was passed into the constructor. PO Boxes are normalized to the format `PO BOX <number>`.
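
The `unit_id` and PO Box rules above can be illustrated with a small standalone sketch. It does not use or reproduce the `Address` class; the helpers `format_unit_id` and `normalize_po_box` and the regular expression are hypothetical and only mirror the documented behavior.

```python
import re

# Assumed pattern for PO Box input such as "p.o. box 1" or "PO Box 12".
PO_BOX = re.compile(r"^\s*p\.?\s*o\.?\s*box\s*(\d+)\s*$", re.IGNORECASE)


def normalize_po_box(text):
    """Return 'PO BOX <number>' for PO Box style input, otherwise None."""
    match = PO_BOX.match(text)
    if match is None:
        return None
    return f"PO BOX {match.group(1)}"


def format_unit_id(unit_id, unit_type=None):
    """Prefix the unit id with '#' only when no unit type was parsed."""
    bare = unit_id.lstrip("#").strip()
    if unit_type is None:
        return f"# {bare}"
    return bare


print(normalize_po_box("p.o. box 1"))  # PO BOX 1
print(format_unit_id("#3"))            # # 3
print(format_unit_id("#3", "APT"))     # 3
```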

## Installation (requires ArcGIS Pro 2.7+)

<!-- Current conda install arcpy -c esri seems to be wonky; just clone to be safe -->

1. clone arcgis conda environment
   - `conda create --name sweeper --clone arcgispro-py3`
1. activate environment
   - `activate sweeper`
1. install sweeper
   - `pip install agrc-sweeper`
1. Optionally duplicate `config.sample.json` as `config.json` in the folder where you will run sweeper.

> [!CAUTION]
> A `config.json` file is required for the following functions:
>
> - `--scheduled` argument (required for sending emails)
> - `--change-detect` argument
> - using user-specific connection files via the `CONNECTIONS_FOLDER` config value

## Exclusions

Tables can be skipped by adding values to the `EXCLUSIONS.<sweeper_key>` config array. These values are matched against table names using [fnmatch](https://docs.python.org/3/library/fnmatch.html#fnmatch.fnmatch). Note that these do not apply when using the `--table-name` argument.
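
To show how `fnmatch` patterns behave against table names, here is a minimal sketch. The config contents, the `duplicates` key, the table names, and the `is_excluded` helper are all hypothetical; only the `EXCLUSIONS.<sweeper_key>` layout and the use of `fnmatch.fnmatch` come from the description above.

```python
from fnmatch import fnmatch

# Hypothetical config content; the sweeper key and patterns are illustrative.
config = {
    "EXCLUSIONS": {
        "duplicates": ["SGID.Cadastre.Parcels_*", "*.Temp*"],
    }
}


def is_excluded(table_name, sweeper_key, config):
    """Return True if the table name matches any exclusion pattern for the sweeper."""
    patterns = config.get("EXCLUSIONS", {}).get(sweeper_key, [])
    return any(fnmatch(table_name, pattern) for pattern in patterns)


print(is_excluded("SGID.Cadastre.Parcels_Utah", "duplicates", config))   # True
print(is_excluded("SGID.Location.AddressPoints", "duplicates", config))  # False
```

Note that `fnmatch.fnmatch` normalizes case according to the operating system, so matching is case-insensitive on Windows and case-sensitive on most other platforms.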

## Development

1. clone arcgis conda environment
   - `conda create --name sweeper --clone arcgispro-py3`
1. activate environment
   - `activate sweeper`
1. install required dependencies to work on sweeper
   - `pip install -e ".[tests]"`
1. `test_metadata.py` uses a SQL database that needs to be restored via `src/sweeper/tests/data/Sweeper.bak` to your local SQL Server.
1. run sweeper: `sweeper`
1. test: `pytest`
1. lint: `ruff check .`
1. format: `ruff format .`



            
