| Field | Value |
|---|---|
| Name | soildb |
| Version | 0.2.0 |
| Summary | A Python client for accessing USDA Soil Data Access (SDA) web service |
| Author email | Andrew Brown (brown.andrewg@gmail.com) |
| Homepage | https://github.com/brownag/py-soildb |
| Repository | https://github.com/brownag/py-soildb |
| Documentation | https://py-soildb.readthedocs.io |
| Bug tracker | https://github.com/brownag/py-soildb/issues |
| License | MIT |
| Requires Python | >=3.8 |
| Keywords | agriculture, gis, sda, soil, ssurgo, usda |
| Upload time | 2025-10-19 19:55:43 |
| Requirements | No requirements were recorded. |
# soildb
[PyPI](https://pypi.org/project/soildb/)
[License: MIT](https://opensource.org/licenses/MIT)
Python client for the USDA-NRCS Soil Data Access (SDA) web service and
other National Cooperative Soil Survey data sources.
## Overview
`soildb` provides Python access to the USDA Soil Data Access (SDA) web
service <https://sdmdataaccess.nrcs.usda.gov/>.
Query soil survey data, export to pandas/polars DataFrames, and handle
spatial queries.
## Installation
``` bash
pip install soildb
```
For spatial functionality:
``` bash
pip install soildb[spatial]
```
For all optional features:
``` bash
pip install soildb[all]
```
## Features
- Query soil survey data from SDA
- Export to pandas and polars DataFrames
- Build custom SQL queries with a fluent interface
- Spatial queries with points, bounding boxes, and polygons
- Bulk data fetching with automatic pagination
- Async I/O for high performance and concurrency
## Quick Start
### Query Builder
This is a basic example of building a custom query and getting the
results:
``` python
import soildb
from soildb import Query

query = (Query()
    .select("mukey", "muname", "musym")
    .from_("mapunit")
    .inner_join("legend", "mapunit.lkey = legend.lkey")
    .where("areasymbol = 'IA109'")
    .limit(5))

# inspect the generated SQL
print(query.to_sql())

# top-level await works in Jupyter/IPython; see "Async Setup" below for plain scripts
result = await soildb.SDAClient().execute(query)

df = result.to_pandas()
print(df.head())
```

    SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
        mukey                                              muname  musym
    0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
    1  408339         Colo silty clay loam, 0 to 2 percent slopes    133
    2  408340         Colo silty clay loam, 2 to 4 percent slopes   133B
    3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
    4  408348           Harpster silt loam, 0 to 2 percent slopes   1595

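The Features list also advertises polars export. The method below is an assumption mirroring `to_pandas()`; check the API docs if the name differs in your version:

``` python
# Assumed sketch: export the query result above to a polars DataFrame.
# to_polars() is an assumption based on the advertised polars support.
pl_df = result.to_polars()
print(pl_df.head())
```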
## Async Setup
You may have noticed that the query execution above had to be `await`ed:
all soildb functions are async. Here's how to run them in different
environments such as Jupyter notebooks, VS Code, or plain Python scripts.
### Basic Async Execution
``` python
import asyncio
import soildb

async def main():
    # Your async code here
    mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
    df = mapunits.to_pandas()
    return df

# Handle different environments
try:
    # Check if there's already an event loop (Jupyter, etc.)
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    result = loop.run_until_complete(main())
except RuntimeError:
    # No existing loop, use asyncio.run()
    result = asyncio.run(main())

result
```
For comprehensive async usage, see the [Async Programming
Guide](docs/async.md).
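The try/except pattern above is repeated in the examples below; if you prefer, you can factor it into a small helper. This is a sketch using only the standard library plus the optional `nest_asyncio` package, not part of the soildb API:

``` python
import asyncio

def run_async(coro):
    """Run a coroutine from a plain script or an already-running loop (Jupyter, VS Code)."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running event loop: a regular script, so asyncio.run() is enough
        return asyncio.run(coro)
    # A loop is already running (notebook/IDE): allow nested execution
    import nest_asyncio
    nest_asyncio.apply()
    return loop.run_until_complete(coro)
```

With this helper, `mapunits = run_async(soildb.get_mapunit_by_areasymbol("IA109"))` works the same way in a notebook or a script.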
### Convenience Functions
soildb provides several high-level functions for common tasks:
``` python
mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

    Found 80 map units

| | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|----|----|----|----|----|----|----|----|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |
If you have suggestions for new convenience functions, please file a
[feature request on
GitHub](https://github.com/brownag/py-soildb/issues/new).
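The responses convert to ordinary pandas DataFrames, so standard pandas operations apply to the map units retrieved above. A small sketch (note: `muacres` may arrive as text depending on the conversion, so it is coerced to numeric first):

``` python
import pandas as pd

# Continuing from the `df` returned above by get_mapunit_by_areasymbol("IA109")
df["muacres"] = pd.to_numeric(df["muacres"], errors="coerce")

# total mapped acres by map unit kind
print(df.groupby("mukind")["muacres"].sum().sort_values(ascending=False))

# the five most extensive map units in the survey area
print(df.nlargest(5, "muacres")[["musym", "muname", "muacres"]])
```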
### Spatial Queries
soildb also supports queries by location via `spatial_query()`. You can
supply arbitrary geometry (as WKT) and target several spatial and
tabular result types.
``` python
import asyncio
import soildb

async def spatial_query_example():
    from soildb import spatial_query

    # Point query (WKT geometry)
    async with soildb.SDAClient() as client:
        response = await spatial_query(
            geometry="POINT (-93.6 42.0)",
            table="mupolygon",
            spatial_relation="intersects"
        )
        df = response.to_pandas()
        print(f"Point query found {len(df)} results")
        return df

# Handle different environments
try:
    # Check if there's already an event loop (Jupyter, etc.)
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    result = loop.run_until_complete(spatial_query_example())
except RuntimeError:
    # No existing loop, use asyncio.run()
    result = asyncio.run(spatial_query_example())

result
```

    Point query found 1 results

| | mukey | areasymbol | musym | nationalmusym | muname | mukind |
|----|----|----|----|----|----|----|
| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |
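Bounding boxes and polygons can be queried the same way by passing a different WKT string. This is a sketch under the assumption that polygon WKT goes through the same `geometry` parameter (the Features list advertises point, bounding box, and polygon queries); the coordinates are an arbitrary box around the point used above:

``` python
import asyncio
from soildb import spatial_query

# Arbitrary example bounding box around (-93.6, 42.0), expressed as polygon WKT
bbox_wkt = "POLYGON ((-93.7 41.9, -93.5 41.9, -93.5 42.1, -93.7 42.1, -93.7 41.9))"

async def polygon_query_example():
    response = await spatial_query(
        geometry=bbox_wkt,
        table="mupolygon",
        spatial_relation="intersects",
    )
    return response.to_pandas()

# df = asyncio.run(polygon_query_example())  # or use the event-loop handling shown above
```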
### Bulk Data Fetching
soildb makes it easy to retrieve large datasets efficiently, using
concurrent requests and built-in functions that automatically handle
pagination.
``` python
import asyncio

async def bulk_fetch_example():
    from soildb import fetch_by_keys, get_mukey_by_areasymbol

    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    mukeys_tasks = [
        get_mukey_by_areasymbol([area])
        for area in areas
    ]

    # Execute all mukey requests concurrently
    mukeys_results = await asyncio.gather(*mukeys_tasks)

    # Flatten the results (each task returns a list)
    all_mukeys = []
    for mukeys in mukeys_results:
        all_mukeys.extend(mukeys)

    print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")

    # Fetch data in chunks automatically
    response = await fetch_by_keys(
        all_mukeys,
        "component",
        key_column="mukey",
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
    )
    df = response.to_pandas()
    print(f"Fetched {len(df)} component records")
    return df

# Handle different environments
try:
    # Check if there's already an event loop (Jupyter, etc.)
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    result = loop.run_until_complete(bulk_fetch_example())
except RuntimeError:
    # No existing loop, use asyncio.run()
    result = asyncio.run(bulk_fetch_example())

result.head(10)
```

    Found 410 mukeys across 3 areas
    Fetching 410 keys in 5 chunks of 100
    Fetched 1067 component records

| | mukey | cokey | compname | localphase | comppct_r |
|-----|--------|----------|----------|------------|-----------|
| 0 | 408333 | 25562547 | Kingston | \<NA\> | 2 |
| 1 | 408333 | 25562548 | Okoboji | \<NA\> | 5 |
| 2 | 408333 | 25562549 | Spicer | \<NA\> | 90 |
| 3 | 408333 | 25562550 | Madelia | \<NA\> | 3 |
| 4 | 408334 | 25562837 | Okoboji | \<NA\> | 5 |
| 5 | 408334 | 25562838 | Glencoe | \<NA\> | 3 |
| 6 | 408334 | 25562839 | Canisteo | \<NA\> | 2 |
| 7 | 408334 | 25562840 | Webster | \<NA\> | 85 |
| 8 | 408334 | 25562841 | Nicollet | \<NA\> | 5 |
| 9 | 408335 | 25562135 | Biscay | \<NA\> | 1 |
The `component` table has a hierarchical relationship:
- `mukey` (map unit key) is the parent
- `cokey` (component key) is the child
So when fetching components, you typically want to filter by mukey to
get all components for specific map units.
The specialized `fetch_component_by_mukey()` convenience function
handles this, but above we use the lower-level `fetch_by_keys()` with
`"mukey"` as the `key_column` to achieve the same result and to
demonstrate automatic pagination in chunks of 100 keys (a sketch of the
convenience route follows below).
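A sketch of the higher-level route, with the caveat that `fetch_component_by_mukey()` is only named in the text above and its exact parameters are an assumption; consult the API docs for the real signature:

``` python
from soildb import fetch_component_by_mukey, get_mukey_by_areasymbol

async def components_for_area():
    # Assumed usage: pass the map unit keys for one survey area directly
    mukeys = await get_mukey_by_areasymbol(["IA109"])
    response = await fetch_component_by_mukey(mukeys)
    return response.to_pandas()

# df = asyncio.run(components_for_area())  # or use the event-loop handling shown above
```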
## Examples
See the [`examples/` directory](examples/) and [documentation](docs/)
for detailed usage patterns.
## License
This project is licensed under the MIT License. See the
[LICENSE](LICENSE) file for details.