# ugrc-sweeper [![PyPI version](https://badge.fury.io/py/ugrc-sweeper.svg)](https://badge.fury.io/py/ugrc-sweeper)[![Push Events](https://github.com/agrc/sweeper/actions/workflows/push.yml/badge.svg)](https://github.com/agrc/sweeper/actions/workflows/push.yml)
The data cleaning service.
![sweeper_sm](https://user-images.githubusercontent.com/325813/90411835-91c4c080-e069-11ea-9d03-f3e60421b835.png)
## Available Sweepers
### Addresses
Checks that addresses have minimum required parts and optionally normalizes them.
### Duplicates
Checks for duplicate features.
### Empties
Checks for empty geometries.
### Metadata
Checks to make sure that the metadata meets [the Basic SGID Metadata Requirements](https://gis.utah.gov/about/policy/metadata/#basic-sgid-metadata).
#### Tags
Checks to make sure that existing tags are cased appropriately. This means that they are title-cased, except for known abbreviations (e.g. UGRC, BLM) and articles (e.g. a, the, of).
This check also verifies that the data set contains a tag that matches the database name (e.g. `SGID`) and the schema (e.g. `Cadastre`).
`--try-fix` adds missing required tags and title-cases any existing tags.
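The tag rules above can be illustrated with a minimal sketch. This is not sweeper's actual implementation: the `ABBREVIATIONS` and `ARTICLES` sets and both function names are hypothetical, and it assumes articles stay lowercase wherever they appear in a tag.

```python
# Hypothetical exception lists; sweeper's real lists may differ.
ABBREVIATIONS = {"UGRC", "BLM", "SGID"}
ARTICLES = {"a", "the", "of"}


def title_case_tag(tag):
    """Title-case a tag, keeping abbreviations upper-case and articles lower-case."""
    words = []
    for word in tag.split():
        if word.upper() in ABBREVIATIONS:
            words.append(word.upper())
        elif word.lower() in ARTICLES:
            words.append(word.lower())
        else:
            words.append(word.capitalize())
    return " ".join(words)


def missing_required_tags(tags, database, schema):
    """Return the required tags (database and schema names) not yet present."""
    existing = {tag.lower() for tag in tags}
    return [name for name in (database, schema) if name.lower() not in existing]


print(title_case_tag("bureau of land management"))
print(missing_required_tags(["Parcels", "SGID"], "SGID", "Cadastre"))
```

The comparison in `missing_required_tags` ignores case, which is also an assumption made for this sketch.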
#### Summary
Checks to make sure that the summary is less than 2048 characters (a limitation of AGOL) and that it is shorter than the description.
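As a rough illustration of the two summary rules, the sketch below checks plain strings. The function name, constant name, and messages are hypothetical; sweeper itself reads these values from the data set's metadata.

```python
# Limit stated above; the constant name is illustrative.
AGOL_SUMMARY_LIMIT = 2048


def summary_issues(summary, description):
    """Return a list of problems found with the summary (empty if none)."""
    issues = []
    # The summary must be strictly less than 2048 characters long.
    if len(summary) >= AGOL_SUMMARY_LIMIT:
        issues.append("Summary must be less than 2048 characters.")
    # The summary must be strictly shorter than the description.
    if len(summary) >= len(description):
        issues.append("Summary must be shorter than the description.")
    return issues


print(summary_issues("Statewide parcels.", "A longer description of the parcels data."))
```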
#### Description
Checks to make sure that the description contains a link to a data page on gis.utah.gov.
#### Use Limitations
Checks to make sure that the text in this section matches the [official text for UGRC](src/sweeper/sweepers/UseLimitations.html).
`--try-fix` updates the text to match the official text.
## Parsing Addresses
This project contains a module that can be used as a standalone address parser, `sweeper.address_parser`. This allows developers to take advantage of sweeper's advanced address parsing and normalization without having to run the entire sweeper process.
### Usage Example
```python
from sweeper.address_parser import Address
address = Address('123 South Main Street')
print(address)
'''
--> Parsed Address:
{'address_number': '123',
'normalized': '123 S MAIN ST',
'prefix_direction': 'S',
'street_name': 'MAIN',
'street_type': 'ST'}
'''
```
### Available Address class properties
All properties default to None if there is no parsed value.
`address_number`
`address_number_suffix`
`prefix_direction`
`street_name`
`street_direction`
`street_type`
`unit_type`
`unit_id`
If no `unit_type` is found, this property is prefixed with `#` (e.g. `# 3`). If `unit_type` is found, `#` is stripped from this property.
`city`
`zip_code`
`po_box`
The PO Box if a po-box-type address was entered (e.g. `po_box` would be `1` for `p.o. box 1`).
`normalized`
A normalized string representing the entire address that was passed into the constructor. PO Boxes are normalized in this format `PO BOX <number>`.
## Installation (requires Pro 2.7+)
<!-- Current conda install arcpy -c esri seems to be wonky; just clone to be safe -->
1. clone arcgis conda environment
   - `conda create --name sweeper --clone arcgispro-py3`
1. activate environment
   - `activate sweeper`
1. install sweeper
   - `pip install ugrc-sweeper`
1. Optionally duplicate `config.sample.json` as `config.json` in the folder where you will run sweeper.
> [!CAUTION]
> A `config.json` file is required for the following features:
>
> - `--scheduled` argument (required for sending emails)
> - `--change-detect` argument
> - using user-specific connection files via the `CONNECTIONS_FOLDER` config value
## Exclusions
Tables can be skipped by adding values to the `EXCLUSIONS.<sweeper_key>` config array. These values are matched against table names using [fnmatch](https://docs.python.org/3/library/fnmatch.html#fnmatch.fnmatch). Note that these do not apply when using the `--table-name` argument.
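To show how `fnmatch` patterns behave against table names, here is a small sketch. The `duplicates` sweeper key, the patterns, the table names, and the `is_excluded` helper are all hypothetical; only the `EXCLUSIONS` config key and the use of `fnmatch` come from the description above.

```python
from fnmatch import fnmatch

# Hypothetical config contents, shaped like EXCLUSIONS.<sweeper_key>.
config = {
    "EXCLUSIONS": {
        "duplicates": ["*.Temp_*", "SGID.Cadastre.Parcels_*"],
    }
}


def is_excluded(table_name, sweeper_key, config):
    """Return True if the table name matches any exclusion pattern for the sweeper."""
    patterns = config.get("EXCLUSIONS", {}).get(sweeper_key, [])
    return any(fnmatch(table_name, pattern) for pattern in patterns)


for table in ["SGID.Cadastre.Parcels_Utah", "SGID.Boundaries.Counties"]:
    print(table, is_excluded(table, "duplicates", config))
```

Note that `fnmatch.fnmatch` normalizes case according to the operating system, so matching is case-insensitive on Windows and case-sensitive on most other platforms.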
## Development
1. clone arcgis conda environment
   - `conda create --name sweeper --clone arcgispro-py3`
1. activate environment
   - `activate sweeper`
1. install required dependencies to work on sweeper
   - `pip install -e ".[tests]"`
1. `test_metadata.py` uses a SQL database that needs to be restored via `src/sweeper/tests/data/Sweeper.bak` to your local SQL Server.
1. run sweeper: `sweeper`
1. test: `pytest`
1. lint: `ruff check .`
1. format: `ruff format .`