soildb


Namesoildb JSON
Version 0.2.0 PyPI version JSON
download
home_pageNone
SummaryA Python client for accessing USDA Soil Data Access (SDA) web service
upload_time2025-10-19 19:55:43
maintainerNone
docs_urlNone
authorNone
requires_python>=3.8
licenseMIT
keywords agriculture gis sda soil ssurgo usda
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # soildb


[![PyPI
version](https://badge.fury.io/py/soildb.svg)](https://pypi.org/project/soildb/)
[![License:
MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

Python client for the USDA-NRCS Soil Data Access (SDA) web service and
other National Cooperative Soil Survey data sources.

## Overview

`soildb` provides Python access to the USDA Soil Data Access (SDA) web
service <https://sdmdataaccess.nrcs.usda.gov/>.

Query soil survey data, export to pandas/polars DataFrames, and handle
spatial queries.

## Installation

``` bash
pip install soildb
```

For spatial functionality:

``` bash
pip install soildb[spatial]
```

For all optional features support:

``` bash
pip install soildb[all]
```

## Features

- Query soil survey data from SDA
- Export to pandas and polars DataFrames
- Build custom SQL queries with fluent interface
- Spatial queries with points, bounding boxes, and polygons
- Bulk data fetching with automatic pagination
- Async I/O for high performance and concurrency

## Quick Start

### Query Builder

This is a basic example of building a custom query and getting the
results:

``` python
from soildb import Query
    
query = (Query()
        .select("mukey", "muname", "musym")
        .from_("mapunit")
        .inner_join("legend", "mapunit.lkey = legend.lkey")
        .where("areasymbol = 'IA109'")
        .limit(5))
    
# inspect query
print(query.to_sql())

result = await soildb.SDAClient().execute(query)

df = result.to_pandas()
print(df.head())
```

    SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
        mukey                                             muname  musym
    0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
    1  408339        Colo silty clay loam, 0 to 2 percent slopes    133
    2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B
    3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
    4  408348          Harpster silt loam, 0 to 2 percent slopes   1595

## Async Setup

You may have noticed that we need to `await` the query execution result.

All soildb functions are async. Here’s how to run them in different
environments like Jupyter notebooks, VSCode, or regular Python scripts.

### Basic Async Execution

``` python
import asyncio
import soildb

async def main():
    # Your async code here
    mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
    df = mapunits.to_pandas()
    return df

# Handle different environments
try:
    # Check if there's already an event loop (Jupyter, etc.)
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    result = loop.run_until_complete(main())
except RuntimeError:
    # No existing loop, use asyncio.run()
    result = asyncio.run(main())

result
```

For comprehensive async usage, see the [Async Programming
Guide](docs/async.md).

### Convenience Functions

soildb provides several high-level functions for common tasks:

``` python
mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

    Found 80 map units

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|----|----|----|----|----|----|----|----|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |

</div>

If you have suggestions for new convenience functions please file a
[“feature request” on
GitHub](https://github,com/brownag/py-soildb/issues/new).

### Spatial Queries

soildb also offers support for queries by location via
`spatial_query()`. You can specify arbitrary geometry to target several
spatial and tabular types of results.

``` python
import asyncio

async def spatial_query_example():
    from soildb import spatial_query
    
    # Point query
    async with soildb.SDAClient() as client:
        response = await spatial_query(
            geometry="POINT (-93.6 42.0)",
            table="mupolygon",
            spatial_relation="intersects"
        )
        df = response.to_pandas()
        print(f"Point query found {len(df)} results")
        return df

# Handle different environments
try:
    # Check if there's already an event loop (Jupyter, etc.)
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    result = loop.run_until_complete(spatial_query_example())
except RuntimeError:
    # No existing loop, use asyncio.run()
    result = asyncio.run(spatial_query_example())

result
```

    Point query found 1 results

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | mukey | areasymbol | musym | nationalmusym | muname | mukind |
|----|----|----|----|----|----|----|
| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |

</div>

### Bulk Data Fetching

soildb makes it easy to retrieve large datasets efficiently, using
concurrent requests and built-in functions that automatically handle
pagination.

``` python
import asyncio

async def bulk_fetch_example():
    from soildb import fetch_by_keys, get_mukey_by_areasymbol
    
    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    mukeys_tasks = [
        get_mukey_by_areasymbol([area]) 
        for area in areas
    ]
    
    # Execute all mukey requests concurrently
    mukeys_results = await asyncio.gather(*mukeys_tasks)
    
    # Flatten the results (each task returns a list)
    all_mukeys = []
    for mukeys in mukeys_results:
        all_mukeys.extend(mukeys)
    
    print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")
    
    # Fetch data in chunks automatically
    response = await fetch_by_keys(
        all_mukeys, 
        "component", 
        key_column="mukey", 
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
    )
    df = response.to_pandas()
    print(f"Fetched {len(df)} component records")
    return df

# Handle different environments
try:
    # Check if there's already an event loop (Jupyter, etc.)
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    result = loop.run_until_complete(bulk_fetch_example())
except RuntimeError:
    # No existing loop, use asyncio.run()
    result = asyncio.run(bulk_fetch_example())

result.head(10)
```

    Found 410 mukeys across 3 areas
    Fetching 410 keys in 5 chunks of 100
    Fetched 1067 component records

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | mukey  | cokey    | compname | localphase | comppct_r |
|-----|--------|----------|----------|------------|-----------|
| 0   | 408333 | 25562547 | Kingston | \<NA\>     | 2         |
| 1   | 408333 | 25562548 | Okoboji  | \<NA\>     | 5         |
| 2   | 408333 | 25562549 | Spicer   | \<NA\>     | 90        |
| 3   | 408333 | 25562550 | Madelia  | \<NA\>     | 3         |
| 4   | 408334 | 25562837 | Okoboji  | \<NA\>     | 5         |
| 5   | 408334 | 25562838 | Glencoe  | \<NA\>     | 3         |
| 6   | 408334 | 25562839 | Canisteo | \<NA\>     | 2         |
| 7   | 408334 | 25562840 | Webster  | \<NA\>     | 85        |
| 8   | 408334 | 25562841 | Nicollet | \<NA\>     | 5         |
| 9   | 408335 | 25562135 | Biscay   | \<NA\>     | 1         |

</div>

The `component` table has a hierarchical relationship:

- mukey (map unit key) is the parent
- cokey (component key) is the child

So when fetching components, you typically want to filter by mukey to
get all components for specific map units.

The specialized `fetch_component_by_mukey()` convenience function
handles this, but above we use the lower-level `fetch_by_keys()` with
the `"mukey"` as the `key_column` to achieve the same result and
demonstrate pagination over chunks with `100` rows each.

# Examples

See the [`examples/` directory](examples/) and [documentation](docs/)
for detailed usage patterns.

## License

This project is licensed under the MIT License. See the
[LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "soildb",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "agriculture, gis, sda, soil, ssurgo, usda",
    "author": null,
    "author_email": "Andrew Brown <brown.andrewg@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e8/f2/2f515b9b0b56b9f8c91deedb87b371d83f2a56f9beb5a381bdd1b1442487/soildb-0.2.0.tar.gz",
    "platform": null,
    "description": "# soildb\n\n\n[![PyPI\nversion](https://badge.fury.io/py/soildb.svg)](https://pypi.org/project/soildb/)\n[![License:\nMIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n\nPython client for the USDA-NRCS Soil Data Access (SDA) web service and\nother National Cooperative Soil Survey data sources.\n\n## Overview\n\n`soildb` provides Python access to the USDA Soil Data Access (SDA) web\nservice <https://sdmdataaccess.nrcs.usda.gov/>.\n\nQuery soil survey data, export to pandas/polars DataFrames, and handle\nspatial queries.\n\n## Installation\n\n``` bash\npip install soildb\n```\n\nFor spatial functionality:\n\n``` bash\npip install soildb[spatial]\n```\n\nFor all optional features support:\n\n``` bash\npip install soildb[all]\n```\n\n## Features\n\n- Query soil survey data from SDA\n- Export to pandas and polars DataFrames\n- Build custom SQL queries with fluent interface\n- Spatial queries with points, bounding boxes, and polygons\n- Bulk data fetching with automatic pagination\n- Async I/O for high performance and concurrency\n\n## Quick Start\n\n### Query Builder\n\nThis is a basic example of building a custom query and getting the\nresults:\n\n``` python\nfrom soildb import Query\n    \nquery = (Query()\n        .select(\"mukey\", \"muname\", \"musym\")\n        .from_(\"mapunit\")\n        .inner_join(\"legend\", \"mapunit.lkey = legend.lkey\")\n        .where(\"areasymbol = 'IA109'\")\n        .limit(5))\n    \n# inspect query\nprint(query.to_sql())\n\nresult = await soildb.SDAClient().execute(query)\n\ndf = result.to_pandas()\nprint(df.head())\n```\n\n    SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'\n        mukey                                             muname  musym\n    0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133\n    1  408339        Colo silty clay loam, 0 to 2 percent slopes    133\n    2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B\n    3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2\n    4  408348          Harpster silt loam, 0 to 2 percent slopes   1595\n\n## Async Setup\n\nYou may have noticed that we need to `await` the query execution result.\n\nAll soildb functions are async. Here\u2019s how to run them in different\nenvironments like Jupyter notebooks, VSCode, or regular Python scripts.\n\n### Basic Async Execution\n\n``` python\nimport asyncio\nimport soildb\n\nasync def main():\n    # Your async code here\n    mapunits = await soildb.get_mapunit_by_areasymbol(\"IA109\")\n    df = mapunits.to_pandas()\n    return df\n\n# Handle different environments\ntry:\n    # Check if there's already an event loop (Jupyter, etc.)\n    loop = asyncio.get_running_loop()\n    import nest_asyncio\n    nest_asyncio.apply()\n    result = loop.run_until_complete(main())\nexcept RuntimeError:\n    # No existing loop, use asyncio.run()\n    result = asyncio.run(main())\n\nresult\n```\n\nFor comprehensive async usage, see the [Async Programming\nGuide](docs/async.md).\n\n### Convenience Functions\n\nsoildb provides several high-level functions for common tasks:\n\n``` python\nmapunits = await soildb.get_mapunit_by_areasymbol(\"IA109\")\ndf = mapunits.to_pandas()\nprint(f\"Found {len(df)} map units\")\ndf.head()\n```\n\n    Found 80 map units\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n&#10;    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n&#10;    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n\n|  | mukey | musym | muname | mukind | muacres | areasymbol | areaname |\n|----|----|----|----|----|----|----|----|\n| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |\n| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |\n| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |\n| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |\n| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |\n\n</div>\n\nIf you have suggestions for new convenience functions please file a\n[\u201cfeature request\u201d on\nGitHub](https://github,com/brownag/py-soildb/issues/new).\n\n### Spatial Queries\n\nsoildb also offers support for queries by location via\n`spatial_query()`. You can specify arbitrary geometry to target several\nspatial and tabular types of results.\n\n``` python\nimport asyncio\n\nasync def spatial_query_example():\n    from soildb import spatial_query\n    \n    # Point query\n    async with soildb.SDAClient() as client:\n        response = await spatial_query(\n            geometry=\"POINT (-93.6 42.0)\",\n            table=\"mupolygon\",\n            spatial_relation=\"intersects\"\n        )\n        df = response.to_pandas()\n        print(f\"Point query found {len(df)} results\")\n        return df\n\n# Handle different environments\ntry:\n    # Check if there's already an event loop (Jupyter, etc.)\n    loop = asyncio.get_running_loop()\n    import nest_asyncio\n    nest_asyncio.apply()\n    result = loop.run_until_complete(spatial_query_example())\nexcept RuntimeError:\n    # No existing loop, use asyncio.run()\n    result = asyncio.run(spatial_query_example())\n\nresult\n```\n\n    Point query found 1 results\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n&#10;    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n&#10;    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n\n|  | mukey | areasymbol | musym | nationalmusym | muname | mukind |\n|----|----|----|----|----|----|----|\n| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |\n\n</div>\n\n### Bulk Data Fetching\n\nsoildb makes it easy to retrieve large datasets efficiently, using\nconcurrent requests and built-in functions that automatically handle\npagination.\n\n``` python\nimport asyncio\n\nasync def bulk_fetch_example():\n    from soildb import fetch_by_keys, get_mukey_by_areasymbol\n    \n    # Get mukeys for multiple areas concurrently\n    areas = [\"IA109\", \"IA113\", \"IA117\"]\n    mukeys_tasks = [\n        get_mukey_by_areasymbol([area]) \n        for area in areas\n    ]\n    \n    # Execute all mukey requests concurrently\n    mukeys_results = await asyncio.gather(*mukeys_tasks)\n    \n    # Flatten the results (each task returns a list)\n    all_mukeys = []\n    for mukeys in mukeys_results:\n        all_mukeys.extend(mukeys)\n    \n    print(f\"Found {len(all_mukeys)} mukeys across {len(areas)} areas\")\n    \n    # Fetch data in chunks automatically\n    response = await fetch_by_keys(\n        all_mukeys, \n        \"component\", \n        key_column=\"mukey\", \n        chunk_size=100,\n        columns=[\"mukey\", \"cokey\", \"compname\", \"localphase\", \"comppct_r\"]\n    )\n    df = response.to_pandas()\n    print(f\"Fetched {len(df)} component records\")\n    return df\n\n# Handle different environments\ntry:\n    # Check if there's already an event loop (Jupyter, etc.)\n    loop = asyncio.get_running_loop()\n    import nest_asyncio\n    nest_asyncio.apply()\n    result = loop.run_until_complete(bulk_fetch_example())\nexcept RuntimeError:\n    # No existing loop, use asyncio.run()\n    result = asyncio.run(bulk_fetch_example())\n\nresult.head(10)\n```\n\n    Found 410 mukeys across 3 areas\n    Fetching 410 keys in 5 chunks of 100\n    Fetched 1067 component records\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n&#10;    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n&#10;    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n\n|     | mukey  | cokey    | compname | localphase | comppct_r |\n|-----|--------|----------|----------|------------|-----------|\n| 0   | 408333 | 25562547 | Kingston | \\<NA\\>     | 2         |\n| 1   | 408333 | 25562548 | Okoboji  | \\<NA\\>     | 5         |\n| 2   | 408333 | 25562549 | Spicer   | \\<NA\\>     | 90        |\n| 3   | 408333 | 25562550 | Madelia  | \\<NA\\>     | 3         |\n| 4   | 408334 | 25562837 | Okoboji  | \\<NA\\>     | 5         |\n| 5   | 408334 | 25562838 | Glencoe  | \\<NA\\>     | 3         |\n| 6   | 408334 | 25562839 | Canisteo | \\<NA\\>     | 2         |\n| 7   | 408334 | 25562840 | Webster  | \\<NA\\>     | 85        |\n| 8   | 408334 | 25562841 | Nicollet | \\<NA\\>     | 5         |\n| 9   | 408335 | 25562135 | Biscay   | \\<NA\\>     | 1         |\n\n</div>\n\nThe `component` table has a hierarchical relationship:\n\n- mukey (map unit key) is the parent\n- cokey (component key) is the child\n\nSo when fetching components, you typically want to filter by mukey to\nget all components for specific map units.\n\nThe specialized `fetch_component_by_mukey()` convenience function\nhandles this, but above we use the lower-level `fetch_by_keys()` with\nthe `\"mukey\"` as the `key_column` to achieve the same result and\ndemonstrate pagination over chunks with `100` rows each.\n\n# Examples\n\nSee the [`examples/` directory](examples/) and [documentation](docs/)\nfor detailed usage patterns.\n\n## License\n\nThis project is licensed under the MIT License. See the\n[LICENSE](LICENSE) file for details.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python client for accessing USDA Soil Data Access (SDA) web service",
    "version": "0.2.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/brownag/py-soildb/issues",
        "Documentation": "https://py-soildb.readthedocs.io",
        "Homepage": "https://github.com/brownag/py-soildb",
        "Repository": "https://github.com/brownag/py-soildb"
    },
    "split_keywords": [
        "agriculture",
        " gis",
        " sda",
        " soil",
        " ssurgo",
        " usda"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "96378e338dc5720845c9847c32199d5ba2ca69f06c418f83b2397262577d1af0",
                "md5": "2f8862b1154fbdde70c2807e25680a76",
                "sha256": "1c504be453576fb544f72aab045960ec009d168e9e2bb0584e89c8ec07f65e69"
            },
            "downloads": -1,
            "filename": "soildb-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2f8862b1154fbdde70c2807e25680a76",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 49338,
            "upload_time": "2025-10-19T19:55:42",
            "upload_time_iso_8601": "2025-10-19T19:55:42.175040Z",
            "url": "https://files.pythonhosted.org/packages/96/37/8e338dc5720845c9847c32199d5ba2ca69f06c418f83b2397262577d1af0/soildb-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e8f22f515b9b0b56b9f8c91deedb87b371d83f2a56f9beb5a381bdd1b1442487",
                "md5": "9f41a152fdeda87372c876c95b541ba1",
                "sha256": "b9bb7d6858ef4d18cf9aaef613597115d2d61aa0aa13d916babd67bb9bd4ba09"
            },
            "downloads": -1,
            "filename": "soildb-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "9f41a152fdeda87372c876c95b541ba1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 96982,
            "upload_time": "2025-10-19T19:55:43",
            "upload_time_iso_8601": "2025-10-19T19:55:43.762585Z",
            "url": "https://files.pythonhosted.org/packages/e8/f2/2f515b9b0b56b9f8c91deedb87b371d83f2a56f9beb5a381bdd1b1442487/soildb-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-19 19:55:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "brownag",
    "github_project": "py-soildb",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "soildb"
}
        
Elapsed time: 1.03633s