# Large Soil Spectral Models (LSSM)
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
This Python package makes it possible to reproduce the research carried out by
[Franck Albinet](https://www.linkedin.com/in/franckalbinet) as part of
a PhD at [KU Leuven](https://www.kuleuven.be/) titled
**“Multiscale Characterization of Exchangeable Potassium Content in Soil
to Remediate Agricultural Land Affected by Radioactive Contamination
using Machine Learning, Soil Spectroscopy and Remote Sensing”**.
**Our first paper** [Albinet, F., Peng, Y., Eguchi, T., Smolders, E.,
Dercon, G., 2022. Prediction of exchangeable potassium in soil through
mid-infrared spectroscopy and deep learning: From prediction to
explainability. Artificial Intelligence in Agriculture 6,
230–241.](https://www.sciencedirect.com/science/article/pii/S2589721722000186)
investigated whether exchangeable potassium in soil can be predicted
from large Mid-infrared soil spectral libraries using Deep Learning. The code
is available [here](https://github.com/franckalbinet/mirzai).
We are now **exploring the potential to characterize and predict
exchangeable potassium using both Near- and Mid-infrared soil
spectroscopy, with a focus on leveraging advanced Deep Learning models
such as ResNet and Vision Transformers (ViT) through transfer learning**.
*Our Deep Learning pipeline is primarily based on the approach described
by [Jeremy Howard](https://github.com/fastai/course22p2)*.
## Install
``` sh
pip install lssm
```
## Getting started
We demonstrate a typical workflow below to showcase our method.
``` python
from pathlib import Path
from functools import partial
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from torch import optim, nn
import timm
from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm
```
### Loading training & validation data
1. Load a pre-trained, State-Of-The-Art (SOTA) Deep Learning model
from the `timm` Python package:
``` python
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
```
2. Automatically download large spectral libraries developed by our
colleagues at [WCRC](https://www.woodwellclimate.org). We focus on
exchangeable potassium in the example below:
``` python
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
```
Reading & selecting data ...
3. Preprocess the features (conversion to absorbance, then continuum removal) and log-transform the target:
``` python
X = Pipeline([('to_abs', ToAbsorbance()),
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)
y = Log1p().fit_transform(y)
```
100%|██████████| 44489/44489 [00:15<00:00, 2850.84it/s]
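The two simplest transforms in this step can be sketched with NumPy. This is an illustration only: it assumes that `ToAbsorbance` follows the common reflectance-to-absorbance convention `A = log10(1 / R)` and that `Log1p` behaves like `numpy.log1p`; the actual `lssm` implementations may differ. The helper `to_absorbance` and the toy values are hypothetical.

```python
import numpy as np

def to_absorbance(reflectance):
    """Convert reflectance (0 < R <= 1) to absorbance, A = log10(1 / R).

    Assumption: common spectroscopy convention, not necessarily what
    lssm's ToAbsorbance does internally.
    """
    return np.log10(1.0 / np.asarray(reflectance, dtype=float))

# Toy reflectance spectra: 2 samples x 3 wavelengths (made-up values)
R = np.array([[1.0, 0.1, 0.01],
              [0.5, 0.5, 0.5]])
A = to_absorbance(R)

# Toy target values; log1p compresses a long right tail and is defined at zero
y = np.array([0.0, 0.3, 12.5])
y_log = np.log1p(y)

# Values predicted in log space can be mapped back with expm1
y_back = np.expm1(y_log)
```

If the target is modelled in log space as above, predictions would need the inverse transform (`expm1`) before being compared with measured values.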
4. Typical train/test split to get a train and valid dataset:
``` python
n_smp = 5000  # For demo purposes (in reality we have > 50K samples)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp],
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp],
                                                      random_state=41)
```
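To illustrate what the `stratify` argument does in the split above, here is a small self-contained sketch on synthetic data. It assumes that `ds_name` holds one label per sample identifying its source dataset; the labels `"A"` and `"B"` and all values below are made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples from two hypothetical source datasets,
# 80 labelled "A" and 20 labelled "B"
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.arange(100, dtype=float)
source = np.array(["A"] * 80 + ["B"] * 20)

# Passing `source` as an extra array returns its train/valid parts as well
X_train, X_valid, y_train, y_valid, src_train, src_valid = train_test_split(
    X, y, source, test_size=0.1, stratify=source, random_state=41)

# Both subsets keep the 80/20 ratio of the full dataset
n_a_valid = int((src_valid == "A").sum())
n_b_valid = int((src_valid == "B").sum())
```

Without `stratify`, a small validation set could by chance contain very few samples from the less represented sources.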
5. Finally, create the PyTorch datasets and dataloaders:
``` python
train_ds, valid_ds = [SpectralDataset(X, y)
                      for X, y in [(X_train, y_train), (X_valid, y_valid)]]
# Then PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)
```
### Training
``` python
epochs = 1
lr = 5e-3
# We use `r2` alone to assess performance
metrics = MetricsCB(r2=R2Score())
# We use the One Cycle learning rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)
# A series of preprocessing steps performed on the GPU:
# - move the batch to the GPU
# - transform 1D spectra to 2D images using Gramian Angular Difference Field (GADF)
# - resize the 2D version
# - apply the pre-trained model's stats
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))
cbs = [DeviceCB(), gadf, resize, stats, TrainCB(),
       metrics, ProgressCB(plot=False)]
learn = Learner(model, dls, nn.MSELoss(), lr=lr,
                cbs=cbs+xtra, opt_func=optim.AdamW)
learn.fit(epochs)
```
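The 1D-to-2D step mentioned in the comments above can be sketched with NumPy. The function below follows the textbook definition of the Gramian Angular Difference Field and is not the code of `GADFTfm`, which may rescale or batch the spectra differently; the name `gadf` and the toy spectrum are hypothetical.

```python
import numpy as np

def gadf(x):
    """Gramian Angular Difference Field of a 1D series (textbook definition).

    1. rescale x to [-1, 1]
    2. phi = arccos(x)
    3. GADF[i, j] = sin(phi_i - phi_j)
    """
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    # Min-max rescaling to [-1, 1]; assumes x is not constant
    x = (2.0 * x - x_max - x_min) / (x_max - x_min)
    x = np.clip(x, -1.0, 1.0)  # guard against rounding error before arccos
    phi = np.arccos(x)
    # Pairwise angular differences via broadcasting: result has shape (n, n)
    return np.sin(phi[:, None] - phi[None, :])

spectrum = np.array([0.2, 0.5, 0.9, 0.4])  # made-up absorbance values
image = gadf(spectrum)
```

A spectrum of length `n` becomes an `n x n` single-channel image, which is consistent with the model being created with `in_chans=1` and with the resize step that follows in the pipeline.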
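The learning-rate schedule used during training can be pictured with a few lines of plain Python. This sketch reproduces, as far as we understand them, the cosine-annealed shape and the default parameters (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`) of PyTorch's `OneCycleLR`; treat those defaults as assumptions. The helpers `n_batches` and `one_cycle_lr` are hypothetical, and the batch count assumes that `get_dls` keeps the last partial batch.

```python
import math

def n_batches(n_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` (0-based) for a cosine one-cycle schedule.

    Assumes total_steps is large enough for the warm-up phase to span
    at least one step.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # end of the warm-up phase
    if step <= warmup_end:
        start, end, pct = initial_lr, max_lr, step / warmup_end
    else:
        start, end = max_lr, min_lr
        pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    # Cosine interpolation from `start` (pct=0) to `end` (pct=1)
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

epochs, lr = 1, 5e-3
steps_per_epoch = n_batches(4500, 32)  # 5000 demo samples minus 10% validation
total_steps = epochs * steps_per_epoch
schedule = [one_cycle_lr(s, total_steps, lr) for s in range(total_steps)]
```

The rate starts at `lr / 25`, rises to about `lr` over roughly the first 30% of the steps, then decays to a very small value, which is why `total_steps` must match the number of batches actually seen during training.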
Raw data
{
"_id": null,
"home_page": "https://github.com/franckalbinet/lssm",
"name": "lssm",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "nbdev jupyter notebook python",
"author": "Franck Albinet",
"author_email": "franckalbinet@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/54/f6/6c240e9ef64381201532e7cc6871fb837591dfac3edcd8aec012aa43c2fe/lssm-0.0.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "Modelling pipeline to develop and monitor Large Soil Spectral Models (LSSM)",
"version": "0.0.10",
"project_urls": {
"Homepage": "https://github.com/franckalbinet/lssm"
},
"split_keywords": [
"nbdev",
"jupyter",
"notebook",
"python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "34bb8dd71ec49df5a4eb2ae65b815bc239cef9ecd38dbc733d9eb1c10e20d32c",
"md5": "84af23ed6d29fbc262b202ff2ef84d10",
"sha256": "268cac6fc5d10e7b8c36fc29c4e616deb9fb2397974935c363cdfdcdd1b6b167"
},
"downloads": -1,
"filename": "lssm-0.0.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "84af23ed6d29fbc262b202ff2ef84d10",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 25102,
"upload_time": "2024-07-02T15:02:08",
"upload_time_iso_8601": "2024-07-02T15:02:08.440538Z",
"url": "https://files.pythonhosted.org/packages/34/bb/8dd71ec49df5a4eb2ae65b815bc239cef9ecd38dbc733d9eb1c10e20d32c/lssm-0.0.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "54f66c240e9ef64381201532e7cc6871fb837591dfac3edcd8aec012aa43c2fe",
"md5": "a5f8d14f5237147e1218be1f45b5af59",
"sha256": "553274c34c245d6da643cd5cc4819747541860936e0206c22992b1e561b6b496"
},
"downloads": -1,
"filename": "lssm-0.0.10.tar.gz",
"has_sig": false,
"md5_digest": "a5f8d14f5237147e1218be1f45b5af59",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 25669,
"upload_time": "2024-07-02T15:02:11",
"upload_time_iso_8601": "2024-07-02T15:02:11.435650Z",
"url": "https://files.pythonhosted.org/packages/54/f6/6c240e9ef64381201532e7cc6871fb837591dfac3edcd8aec012aa43c2fe/lssm-0.0.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-02 15:02:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "franckalbinet",
"github_project": "lssm",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "lssm"
}