soilspecdata


- Name: soilspecdata
- Version: 0.0.9
- Home page: https://github.com/franckalbinet/soilspecdata
- Summary: Download and load soil spectral data
- Upload time: 2025-02-01 16:22:32
- Author: Franck Albinet
- Requires Python: >=3.7
- License: Apache Software License 2.0
- Keywords: nbdev, jupyter, notebook, python
# SoilSpecData


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

> A Python package for handling soil spectroscopy data, with a focus on
> the [Open Soil Spectral Library
> (OSSL)](https://explorer.soilspectroscopy.org/).

## Installation

``` sh
pip install -U soilspecdata
```

The `-U` flag upgrades the package to the latest version if it is already
installed, so that you get the latest features and bug fixes.

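To install the development version, clone the repository and run the
following from the project root (the quotes keep shells such as zsh from
interpreting the square brackets):

``` sh
pip install -e ".[dev]"
```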

## Features

- Easy loading and handling of the OSSL dataset
- Support for both VISNIR (Visible Near-Infrared) and MIR (Mid-Infrared)
  spectral data
- Flexible wavenumber range filtering
- Convenient access to soil properties and metadata
- Automatic caching of downloaded data
- Get aligned spectra and target variable(s)
- *Further datasets to come …*

## Quick Start

``` python
# Import the package
from soilspecdata.datasets.ossl import get_ossl
```

### Load OSSL dataset

``` python
ossl = get_ossl()
```

The spectral data cover both the MIR `(400-4000 cm⁻¹)` and VISNIR
`(4000-28571 cm⁻¹)` regions. Both are reported in order of increasing
wavenumber, for consistency across the entire spectral range.
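
VISNIR instruments often report wavelengths in nanometres rather than wavenumbers. The two are related by wavenumber (cm⁻¹) = 10⁷ / wavelength (nm), so the VISNIR bounds above correspond to roughly 350 to 2500 nm. The sketch below shows the conversion; the helper names are hypothetical and are not part of `soilspecdata`.

``` python
def wavelength_nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm⁻¹."""
    return 1e7 / wavelength_nm


def wavenumber_to_wavelength_nm(wavenumber: float) -> float:
    """Convert a wavenumber in cm⁻¹ to a wavelength in nanometres."""
    return 1e7 / wavenumber


# Bounds of the VISNIR region quoted above, expressed as wavelengths
visnir_upper_nm = wavenumber_to_wavelength_nm(4000)    # 2500.0 nm
visnir_lower_nm = wavenumber_to_wavelength_nm(28571)   # about 350 nm
```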

Ranges of interest can further be filtered using the `wmin` and `wmax`
parameters in the `get_mir` and `get_visnir` methods.

### MIR spectra

``` python
mir_data = ossl.get_mir()
```

### VISNIR spectra

Using custom wavenumber range:

``` python
visnir_data = ossl.get_visnir(wmin=4000, wmax=25000)
```
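
Conceptually, range filtering amounts to selecting the spectral columns whose wavenumber falls inside `[wmin, wmax]`. The following is a minimal NumPy sketch on toy arrays, not the package's actual implementation; treating both bounds as inclusive is an assumption.

``` python
import numpy as np

# Toy data: 5 wavenumbers (cm⁻¹) and 2 spectra (rows = samples)
wavenumbers = np.array([4000, 10000, 20000, 25000, 28000])
spectra = np.array([
    [0.10, 0.20, 0.30, 0.40, 0.50],
    [0.15, 0.25, 0.35, 0.45, 0.55],
])

wmin, wmax = 4000, 25000

# Boolean mask of the columns inside [wmin, wmax] (bounds assumed inclusive)
mask = (wavenumbers >= wmin) & (wavenumbers <= wmax)

filtered_wavenumbers = wavenumbers[mask]
filtered_spectra = spectra[:, mask]

print(filtered_wavenumbers.tolist())  # [4000, 10000, 20000, 25000]
print(filtered_spectra.shape)         # (2, 4)
```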

### VISNIR \| MIR dataclass member variables

``` python
print(visnir_data)
```

    SpectraData attributes:
    ----------------------
    Available attributes: wavenumbers, spectra, measurement_type, sample_ids

    Wavenumbers:
    -----------
    [4000, 4003, 4006, 4009, 4012, 4016, 4019, 4022, 4025, 4029]
    Shape: (1051,)

    Spectra:
    -------
    [[0.3859, 0.3819, 0.3792, 0.3776, 0.3769],
     [0.3429, 0.3419, 0.3414, 0.3413, 0.3415],
     [0.3425, 0.3384, 0.3354, 0.3334, 0.3323],
     [0.2745, 0.2754, 0.2759, 0.2761, 0.276 ],
     [0.285 , 0.2794, 0.2755, 0.273 , 0.2718]]
    Shape: (64644, 1051)

    Measurement type (Reflectance or Absorbance):
    --------------------------------------------
    ref

    Sample IDs:
    ----------
    ['FS15R_FS4068', 'FS15R_FS4069', 'FS15R_FS4070', 'FS15R_FS4071',
     'FS15R_FS4072', 'FS15R_FS4073', 'FS15R_FS4074', 'FS15R_FS4075',
     'FS15R_FS4076', 'FS15R_FS4077']
    Total samples: 64644
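
The measurement type matters when spectra from different sources are combined. A common convention for converting reflectance `R` to apparent absorbance is `A = log10(1 / R)`. The sketch below applies it to a few of the values printed above; it is a generic NumPy illustration, and whether `soilspecdata` offers such a conversion itself is not stated here.

``` python
import numpy as np

# First reflectance values of the first two spectra printed above
reflectance = np.array([
    [0.3859, 0.3819, 0.3792, 0.3776, 0.3769],
    [0.3429, 0.3419, 0.3414, 0.3413, 0.3415],
])

# Apparent absorbance: A = log10(1 / R)
absorbance = np.log10(1.0 / reflectance)

# Inverse transform, back to reflectance: R = 10 ** (-A)
recovered = 10.0 ** (-absorbance)
```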

### Getting soil properties and other metadata

Example: get **Cation Exchange Capacity (CEC)** measurements (in
cmolc/kg) for every sample that has one. Results are returned as a
`pd.DataFrame` indexed by sample ID (`id`):

``` python
properties = ossl.get_properties(['cec_usda.a723_cmolc.kg'], require_complete=True)
```

``` python
properties.head()
```

| id     | cec_usda.a723_cmolc.kg |
|--------|------------------------|
| S40857 | 6.633217               |
| S40858 | 3.822628               |
| S40859 | 3.427324               |
| S40860 | 1.906545               |
| S40861 | 13.403203              |

> [!NOTE]
>
> `require_complete=True` ensures that only rows with non-null values in
> the selected columns (here `cec_usda.a723_cmolc.kg`) are returned.
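
The effect described in the note can be pictured with plain pandas: select the columns, then drop every row that has a null in any of them. This is a sketch on a toy table under that assumption, not the package's own code.

``` python
import numpy as np
import pandas as pd

# Toy table indexed by sample ID; S2 has no CEC measurement
df = pd.DataFrame(
    {
        "cec_usda.a723_cmolc.kg": [6.63, np.nan, 3.43],
        "ph.h2o_usda.a268_index": [5.1, 6.2, np.nan],
    },
    index=pd.Index(["S1", "S2", "S3"], name="id"),
)

cols = ["cec_usda.a723_cmolc.kg"]

# Keep only the selected columns, then drop rows with a null in any of them
complete = df[cols].dropna(subset=cols)

print(complete.index.tolist())  # ['S1', 'S3']
```

Note that `S3` is kept even though its pH value is missing, because only the selected columns are checked.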

For more details on the OSSL dataset and its variables, see the [OSSL
documentation](https://soilspectroscopy.github.io/ossl-manual/db-desc.html).
Any column name in the `ossl.properties_cols` list can be used as a
target or metadata variable.

``` python
ossl.properties_cols
```

    ['dataset.code_ascii_txt',
     'id.layer_local_c',
     'id.layer_uuid_txt',
     'id.project_ascii_txt',
     'id.location_olc_txt',
     'id.dataset.site_ascii_txt',
     'id.scan_local_c',
     'longitude.point_wgs84_dd',
     'latitude.point_wgs84_dd',
     'layer.sequence_usda_uint16',
     'layer.upper.depth_usda_cm',
     'layer.lower.depth_usda_cm',
     'observation.date.begin_iso.8601_yyyy.mm.dd',
     'observation.date.end_iso.8601_yyyy.mm.dd',
     'surveyor.title_utf8_txt',
     'layer.texture_usda_txt',
     'pedon.taxa_usda_txt',
     'horizon.designation_usda_txt',
     'longitude.county_wgs84_dd',
     'latitude.county_wgs84_dd',
     'location.point.error_any_m',
     'location.country_iso.3166_txt',
     'observation.ogc.schema.title_ogc_txt',
     'observation.ogc.schema_idn_url',
     'surveyor.contact_ietf_email',
     'surveyor.address_utf8_txt',
     'dataset.title_utf8_txt',
     'dataset.owner_utf8_txt',
     'dataset.address_idn_url',
     'dataset.doi_idf_url',
     'dataset.license.title_ascii_txt',
     'dataset.license.address_idn_url',
     'dataset.contact.name_utf8_txt',
     'dataset.contact_ietf_email',
     'acidity_usda.a795_cmolc.kg',
     'aggstb_usda.a1_w.pct',
     'al.dith_usda.a65_w.pct',
     'al.ext_aquaregia_g.kg',
     'al.ext_usda.a1056_mg.kg',
     'al.ext_usda.a69_cmolc.kg',
     'al.ox_usda.a59_w.pct',
     'awc.33.1500kPa_usda.c80_w.frac',
     'b.ext_mel3_mg.kg',
     'bd_iso.11272_g.cm3',
     'bd_usda.a21_g.cm3',
     'bd_usda.a4_g.cm3',
     'c.tot_iso.10694_w.pct',
     'c.tot_usda.a622_w.pct',
     'ca.ext_aquaregia_mg.kg',
     'ca.ext_usda.a1059_mg.kg',
     'ca.ext_usda.a722_cmolc.kg',
     'caco3_iso.10693_w.pct',
     'caco3_usda.a54_w.pct',
     'cec_iso.11260_cmolc.kg',
     'cec_usda.a723_cmolc.kg',
     'cf_iso.11464_w.pct',
     'cf_usda.c236_w.pct',
     'clay.tot_iso.11277_w.pct',
     'clay.tot_usda.a334_w.pct',
     'cu.ext_usda.a1063_mg.kg',
     'ec_iso.11265_ds.m',
     'ec_usda.a364_ds.m',
     'efferv_usda.a479_class',
     'fe.dith_usda.a66_w.pct',
     'fe.ext_aquaregia_g.kg',
     'fe.ext_usda.a1064_mg.kg',
     'fe.ox_usda.a60_w.pct',
     'file_sequence',
     'k.ext_aquaregia_mg.kg',
     'k.ext_usda.a1065_mg.kg',
     'k.ext_usda.a725_cmolc.kg',
     'mg.ext_aquaregia_mg.kg',
     'mg.ext_usda.a1066_mg.kg',
     'mg.ext_usda.a724_cmolc.kg',
     'mn.ext_aquaregia_mg.kg',
     'mn.ext_usda.a1067_mg.kg',
     'mn.ext_usda.a70_mg.kg',
     'n.tot_iso.11261_w.pct',
     'n.tot_iso.13878_w.pct',
     'n.tot_usda.a623_w.pct',
     'na.ext_aquaregia_mg.kg',
     'na.ext_usda.a1068_mg.kg',
     'na.ext_usda.a726_cmolc.kg',
     'oc_iso.10694_w.pct',
     'oc_usda.c1059_w.pct',
     'oc_usda.c729_w.pct',
     'p.ext_aquaregia_mg.kg',
     'p.ext_iso.11263_mg.kg',
     'p.ext_usda.a1070_mg.kg',
     'p.ext_usda.a270_mg.kg',
     'p.ext_usda.a274_mg.kg',
     'p.ext_usda.a652_mg.kg',
     'ph.cacl2_iso.10390_index',
     'ph.cacl2_usda.a477_index',
     'ph.cacl2_usda.a481_index',
     'ph.h2o_iso.10390_index',
     'ph.h2o_usda.a268_index',
     's.ext_mel3_mg.kg',
     's.tot_usda.a624_w.pct',
     'sand.tot_iso.11277_w.pct',
     'sand.tot_usda.c405_w.pct',
     'sand.tot_usda.c60_w.pct',
     'silt.tot_iso.11277_w.pct',
     'silt.tot_usda.c407_w.pct',
     'silt.tot_usda.c62_w.pct',
     'wr.10kPa_usda.a414_w.pct',
     'wr.10kPa_usda.a8_w.pct',
     'wr.1500kPa_usda.a417_w.pct',
     'wr.33kPa_usda.a415_w.pct',
     'wr.33kPa_usda.a9_w.pct',
     'zn.ext_usda.a1073_mg.kg',
     'scan.mir.date.begin_iso.8601_yyyy.mm.dd',
     'scan.mir.date.end_iso.8601_yyyy.mm.dd',
     'scan.mir.model.name_utf8_txt',
     'scan.mir.model.code_any_txt',
     'scan.mir.method.optics_any_txt',
     'scan.mir.method.preparation_any_txt',
     'scan.mir.license.title_ascii_txt',
     'scan.mir.license.address_idn_url',
     'scan.mir.doi_idf_url',
     'scan.mir.contact.name_utf8_txt',
     'scan.mir.contact.email_ietf_txt',
     'scan.visnir.date.begin_iso.8601_yyyy.mm.dd',
     'scan.visnir.date.end_iso.8601_yyyy.mm.dd',
     'scan.visnir.model.name_utf8_txt',
     'scan.visnir.model.code_any_txt',
     'scan.visnir.method.optics_any_txt',
     'scan.visnir.method.preparation_any_txt',
     'scan.visnir.license.title_ascii_txt',
     'scan.visnir.license.address_idn_url',
     'scan.visnir.doi_idf_url',
     'scan.visnir.contact.name_utf8_txt',
     'scan.visnir.contact.email_ietf_txt']

- Get metadata (e.g., geographical coordinates):

``` python
metadata = ossl.get_properties(['longitude.point_wgs84_dd', 'latitude.point_wgs84_dd'], require_complete=False)
```

### Preparing data for a machine learning pipeline

To get aligned spectra and target variable(s) directly:

``` python
X, y, ids = ossl.get_aligned_data(
    spectra_data=mir_data,
    target_cols='cec_usda.a723_cmolc.kg'
)

X.shape, y.shape, ids.shape
```

    ((57064, 1701), (57064, 1), (57064,))
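
The returned shapes show fewer rows than the full dataset, which is consistent with keeping only samples that have both a spectrum and a target value. The sketch below illustrates that kind of ID-based alignment on toy data with NumPy and pandas; it is an assumption about the general idea, not a copy of `get_aligned_data`.

``` python
import numpy as np
import pandas as pd

# Toy spectra: 4 samples x 3 wavenumbers, with their sample IDs
sample_ids = np.array(["S1", "S2", "S3", "S4"])
spectra = np.arange(12, dtype=float).reshape(4, 3)

# Toy target table indexed by sample ID; S2 is null, S4 is absent
targets = pd.DataFrame(
    {"cec_usda.a723_cmolc.kg": [6.63, np.nan, 3.43]},
    index=pd.Index(["S1", "S2", "S3"], name="id"),
).dropna()

# Keep spectra rows whose ID has a non-null target
keep = pd.Index(sample_ids).isin(targets.index)
ids = sample_ids[keep]
X = spectra[keep]
# Reorder the targets so that row i of y matches row i of X
y = targets.loc[ids].to_numpy()

print(X.shape, y.shape, ids.shape)  # (2, 3) (2, 1) (2,)
```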

And plot the first 20 MIR spectra:

``` python
from matplotlib import pyplot as plt

plt.figure(figsize=(12, 3))
plt.plot(mir_data.wavenumbers, mir_data.spectra[:20,:].T, alpha=0.3, color='steelblue', lw=1)
plt.gca().invert_xaxis()
plt.grid(True, linestyle='--', alpha=0.7)

plt.xlabel('Wavenumber (cm⁻¹)')
plt.ylabel('Absorbance');
```

![](index_files/figure-commonmark/cell-12-output-1.png)

## Data Structure

The package returns spectral data in a structured format containing:

- Wavenumbers
- Spectra measurements
- Measurement type (reflectance/absorbance)
- Sample IDs

Properties and metadata are returned as pandas DataFrames indexed by
sample ID.
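
As a rough mental model, the spectra container can be thought of as a dataclass with those four fields. The class below, `SpectraSketch`, is a hypothetical stand-in written for illustration; the shape checks are an assumption and the real class may differ.

``` python
from dataclasses import dataclass

import numpy as np


@dataclass
class SpectraSketch:
    """Hypothetical stand-in mirroring the four attributes listed above."""
    wavenumbers: np.ndarray       # shape (n_wavenumbers,)
    spectra: np.ndarray           # shape (n_samples, n_wavenumbers)
    measurement_type: str         # e.g. 'ref' for reflectance
    sample_ids: np.ndarray        # shape (n_samples,)

    def __post_init__(self):
        # One spectrum row per sample, one column per wavenumber
        n_samples, n_wavenumbers = self.spectra.shape
        if n_wavenumbers != len(self.wavenumbers):
            raise ValueError("spectra columns must match wavenumbers")
        if n_samples != len(self.sample_ids):
            raise ValueError("spectra rows must match sample_ids")


data = SpectraSketch(
    wavenumbers=np.array([4000, 4003, 4006]),
    spectra=np.array([[0.3859, 0.3819, 0.3792],
                      [0.3429, 0.3419, 0.3414]]),
    measurement_type="ref",
    sample_ids=np.array(["FS15R_FS4068", "FS15R_FS4069"]),
)
```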

## Cache Management

By default, the OSSL dataset is cached in `~/.soilspecdata/`. To force a
fresh download:

``` python
ossl = get_ossl(force_download=True)
```
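
The caching behaviour follows a familiar pattern: download only when the file is missing or a refresh is forced. The sketch below shows that pattern with the standard library; `fetch_cached`, the file name `ossl.bin` and the fake downloader are all hypothetical and say nothing about how `get_ossl` is implemented internally.

``` python
from pathlib import Path
import tempfile


def fetch_cached(cache_dir: Path, filename: str, download, force_download: bool = False) -> Path:
    """Return the cached file, calling `download` only when needed."""
    target = cache_dir / filename
    if force_download or not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download())
    return target


# Stand-in for a real network download; records how often it is called
calls = []

def fake_download() -> bytes:
    calls.append(1)
    return b"spectral data"


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / ".soilspecdata"
    fetch_cached(cache_dir, "ossl.bin", fake_download)   # downloads
    fetch_cached(cache_dir, "ossl.bin", fake_download)   # served from cache
    fetch_cached(cache_dir, "ossl.bin", fake_download, force_download=True)  # downloads again

print(len(calls))  # 2
```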

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

Apache Software License 2.0

## Citation(s)

- [OSSL
  Library](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0296545):
  Safanelli, J.L., Hengl, T., Parente, L.L., Minarik, R., Bloom, D.E.,
  Todd-Brown, K., Gholizadeh, A., Mendes, W. de S., Sanderman, J., 2025.
  Open Soil Spectral Library (OSSL): Building reproducible soil
  calibration models through open development and community engagement.
  PLOS ONE 20, e0296545. https://doi.org/10.1371/journal.pone.0296545

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/franckalbinet/soilspecdata",
    "name": "soilspecdata",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "nbdev jupyter notebook python",
    "author": "Franck Albinet",
    "author_email": "franckalbinet@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/10/ea/b0b5581299c0d6d48876cf129ae7851883ab98d2705c072d14d1b58654e5/soilspecdata-0.0.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache Software License 2.0",
    "summary": "Download and load soil spectral data",
    "version": "0.0.9",
    "project_urls": {
        "Homepage": "https://github.com/franckalbinet/soilspecdata"
    },
    "split_keywords": [
        "nbdev",
        "jupyter",
        "notebook",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "74baf90e4728c0618e2d351775aaf33638219f8a2ed4cb17ad3fffdeec80953f",
                "md5": "e43a35e5f970b0629f8c9d1de3e466ef",
                "sha256": "388d619b862bd0052a33f4c1e515aac997a28031ac6171ec8134f1cb880a76b3"
            },
            "downloads": -1,
            "filename": "soilspecdata-0.0.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e43a35e5f970b0629f8c9d1de3e466ef",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 13968,
            "upload_time": "2025-02-01T16:22:30",
            "upload_time_iso_8601": "2025-02-01T16:22:30.954644Z",
            "url": "https://files.pythonhosted.org/packages/74/ba/f90e4728c0618e2d351775aaf33638219f8a2ed4cb17ad3fffdeec80953f/soilspecdata-0.0.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "10eab0b5581299c0d6d48876cf129ae7851883ab98d2705c072d14d1b58654e5",
                "md5": "427fd7e654c5a70c422acc43b69cbaa5",
                "sha256": "8ce5f384a5aa76d2592f3cb77bb451998ce6b28253ced85f04a2c1dbda06d02c"
            },
            "downloads": -1,
            "filename": "soilspecdata-0.0.9.tar.gz",
            "has_sig": false,
            "md5_digest": "427fd7e654c5a70c422acc43b69cbaa5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 17550,
            "upload_time": "2025-02-01T16:22:32",
            "upload_time_iso_8601": "2025-02-01T16:22:32.955678Z",
            "url": "https://files.pythonhosted.org/packages/10/ea/b0b5581299c0d6d48876cf129ae7851883ab98d2705c072d14d1b58654e5/soilspecdata-0.0.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-01 16:22:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "franckalbinet",
    "github_project": "soilspecdata",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "soilspecdata"
}
        