lssm

Name	lssm
Version	0.0.10
home_page	https://github.com/franckalbinet/lssm
Summary	Modelling pipeline to develop and monitor Large Soil Spectral Models (LSSM)
upload_time	2024-07-02 15:02:11
maintainer	None
docs_url	None
author	Franck Albinet
requires_python	>=3.7
license	Apache Software License 2.0
keywords	nbdev jupyter notebook python
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Large Soil Spectral Models (LSSM)



This Python package makes it possible to reproduce the research carried out by
[Franck Albinet](https://www.linkedin.com/in/franckalbinet) as part of a
PhD at [KU Leuven](https://www.kuleuven.be/) entitled
**“Multiscale Characterization of Exchangeable Potassium Content in Soil
to Remediate Agricultural Land Affected by Radioactive Contamination
using Machine Learning, Soil Spectroscopy and Remote Sensing”**.

**Our first paper** [Albinet, F., Peng, Y., Eguchi, T., Smolders, E.,
Dercon, G., 2022. Prediction of exchangeable potassium in soil through
mid-infrared spectroscopy and deep learning: From prediction to
explainability. Artificial Intelligence in Agriculture 6,
230–241.](https://www.sciencedirect.com/science/article/pii/S2589721722000186)
investigated whether exchangeable potassium in soil can be predicted
from large mid-infrared soil spectral libraries using Deep Learning. The
code is available [here](https://github.com/franckalbinet/mirzai).

We are now **exploring the potential to characterize and predict
exchangeable potassium using both near- and mid-infrared soil
spectroscopy, with a focus on leveraging advanced Deep Learning models
such as ResNet and Vision Transformers (ViT) through transfer learning**.

*Our Deep Learning pipeline is primarily based on the approach described
by [Jeremy Howard](https://github.com/fastai/course22p2)*.

## Install

``` sh
pip install lssm
```

## Getting started

The typical workflow below showcases our method.

``` python
from pathlib import Path
from functools import partial

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from torch import optim, nn

import timm

from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm
```

### Loading training & validation data

1.  Load a pre-trained model from the `timm` Python package, which
    provides state-of-the-art (SOTA) Deep Learning models:

``` python
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
```
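Spectra turned into images have a single channel, hence `in_chans=1`. To our understanding, timm's weight-loading helpers adapt the pre-trained RGB weights of the first convolution to one channel by summing them over the input-channel axis. The NumPy sketch below, with hypothetical helper names and PyTorch's `(out_channels, in_channels, kh, kw)` weight layout, illustrates why this is sensible: for a grayscale patch replicated into three channels, the summed 1-channel kernel gives exactly the same response as the original 3-channel kernel.

```python
import numpy as np

def sum_rgb_kernel(weight: np.ndarray) -> np.ndarray:
    """Collapse an (out, 3, kh, kw) kernel into (out, 1, kh, kw) by summing over input channels."""
    return weight.sum(axis=1, keepdims=True)

def conv_response(weight: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Response of each output filter to one (in, kh, kw) patch, i.e. a single convolution position."""
    return np.einsum("oikl,ikl->o", weight, patch)

rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(4, 3, 3, 3))   # stand-in for pre-trained RGB weights
gray = rng.normal(size=(1, 3, 3))       # single-channel patch
rgb = np.repeat(gray, 3, axis=0)        # the same patch replicated into 3 channels

w_gray = sum_rgb_kernel(w_rgb)
print(w_gray.shape)  # (4, 1, 3, 3)
print(np.allclose(conv_response(w_gray, gray), conv_response(w_rgb, rgb)))  # True
```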

2.  Automatically download the large spectral libraries developed by our
    colleagues at the Woodwell Climate Research Center
    ([WCRC](https://www.woodwellclimate.org)). The example below focuses
    on exchangeable potassium:

``` python
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
```

    Reading & selecting data ...

3.  Preprocess the spectral features and the target:

``` python
X = Pipeline([('to_abs', ToAbsorbance()), 
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)

y = Log1p().fit_transform(y)
```

    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44489/44489 [00:15<00:00, 2850.84it/s]
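The exact conventions of `lssm.preprocessing` are not documented in this README. As a hedged illustration, the NumPy sketch below implements common textbook versions of the three steps: absorbance as A = log10(1/R), continuum removal as division of a reflectance spectrum by its upper convex hull (the textbook case; how `ContinuumRemoval` handles spectra that were already converted to absorbance may differ), and the `log1p` target transform together with its inverse `expm1`, which brings predictions back to cmol(c)/kg. All function names and the synthetic spectrum are hypothetical.

```python
import numpy as np

def to_absorbance(reflectance: np.ndarray) -> np.ndarray:
    """Convert reflectance values in (0, 1] to absorbance A = log10(1 / R)."""
    return np.log10(1.0 / reflectance)

def upper_hull_indices(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Indices of the upper convex hull of points (x, y); x must be sorted ascending."""
    hull = []
    for i in range(len(x)):
        # Drop the last hull point while it lies on or below the segment hull[-2] -> i
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = (x[a] - x[o]) * (y[i] - y[o]) - (y[a] - y[o]) * (x[i] - x[o])
            if cross < 0:
                break
            hull.pop()
        hull.append(i)
    return np.array(hull)

def continuum_removed(wavelengths: np.ndarray, spectrum: np.ndarray) -> np.ndarray:
    """Divide a positive spectrum by its continuum: the upper hull interpolated at every wavelength."""
    idx = upper_hull_indices(wavelengths, spectrum)
    continuum = np.interp(wavelengths, wavelengths[idx], spectrum[idx])
    return spectrum / continuum

# Synthetic reflectance spectrum with one absorption feature around 1400 nm
wavelengths = np.linspace(400, 2500, 211)
reflectance = 0.6 - 0.2 * np.exp(-((wavelengths - 1400) / 50) ** 2)

absorbance = to_absorbance(reflectance)
cr = continuum_removed(wavelengths, reflectance)

# Target transform and its inverse (made-up exchangeable K values)
y = np.array([0.0, 0.35, 1.2, 4.8])
y_t = np.log1p(y)
y_back = np.expm1(y_t)
```

After continuum removal the spectrum equals 1 wherever it touches its hull and drops below 1 inside absorption features, which makes feature depths comparable across samples.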

4.  Split the data into training and validation sets:

``` python
n_smp = 5000 # For demo purposes only (in reality we have > 50K samples)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp], 
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp], 
                                                      random_state=41)
```
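Passing `stratify=ds_name[:n_smp]` asks scikit-learn to preserve the proportion of samples coming from each source spectral library in both subsets. The NumPy sketch below illustrates the idea with a hypothetical `stratified_split` helper and made-up library labels; it is not scikit-learn's actual algorithm.

```python
import numpy as np

def stratified_split(groups: np.ndarray, test_size: float = 0.1, seed: int = 41):
    """Return (train_idx, valid_idx) so that each group contributes ~test_size of its samples to validation."""
    rng = np.random.default_rng(seed)
    train_idx, valid_idx = [], []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        rng.shuffle(idx)  # random order within the group
        n_valid = int(round(test_size * len(idx)))
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return np.array(train_idx), np.array(valid_idx)

# Hypothetical source-library labels: 60% / 30% / 10% of the samples
ds_name = np.array(["lib_a"] * 600 + ["lib_b"] * 300 + ["lib_c"] * 100)
train_idx, valid_idx = stratified_split(ds_name)

for g in np.unique(ds_name):
    print(g, (ds_name[valid_idx] == g).mean())
```

Without stratification, a small library could be missing from the validation set by chance, making the validation score unrepresentative.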

5.  Finally, create the datasets and custom PyTorch `DataLoader`s:

``` python
# Build the training and validation datasets
train_ds, valid_ds = [SpectralDataset(X_, y_)
                      for X_, y_ in [(X_train, y_train), (X_valid, y_valid)]]

# Then wrap them in PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)
```
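The internals of `SpectralDataset` and `get_dls` are not shown here. Assuming they follow PyTorch's map-style dataset convention (`__len__` plus `__getitem__`), the sketch below shows the idea with a hypothetical `ToySpectralDataset` and a simple batching generator standing in for `torch.utils.data.DataLoader`; the leading channel axis and the `float32` dtype are illustrative choices, not `lssm`'s documented behaviour.

```python
import numpy as np

class ToySpectralDataset:
    """Hypothetical map-style dataset returning one (spectrum, target) pair per index."""

    def __init__(self, X: np.ndarray, y: np.ndarray):
        assert len(X) == len(y), "features and targets must have the same number of samples"
        self.X = X.astype(np.float32)
        self.y = y.astype(np.float32)

    def __len__(self) -> int:
        return len(self.X)

    def __getitem__(self, i: int):
        # Add a leading channel axis so each spectrum is a 1-channel signal
        return self.X[i][None, :], self.y[i]

def iterate_batches(ds, bs: int = 32, shuffle: bool = True, seed: int = 0):
    """Yield (xb, yb) batches, similar in spirit to a DataLoader."""
    order = np.arange(len(ds))
    if shuffle:
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(ds), bs):
        items = [ds[i] for i in order[start:start + bs]]
        xb = np.stack([x for x, _ in items])
        yb = np.array([t for _, t in items], dtype=np.float32)
        yield xb, yb

X_demo = np.random.default_rng(1).random((100, 200))  # 100 made-up spectra, 200 wavelengths
y_demo = np.random.default_rng(2).random(100)
ds = ToySpectralDataset(X_demo, y_demo)
batches = list(iterate_batches(ds, bs=32))
print(len(batches), batches[0][0].shape)  # 4 (32, 1, 200)
```

With PyTorch installed, such an object can be passed directly to `torch.utils.data.DataLoader(ds, batch_size=32, shuffle=True)`, which handles batching and tensor conversion.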

### Training

``` python
epochs = 1
lr = 5e-3

# We use the coefficient of determination (R²) to assess performance
metrics = MetricsCB(r2=R2Score())

# We use the One Cycle learning rate scheduling policy
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)

# A series of batch transforms performed on the GPU:
#    - move each batch to the GPU
#    - turn 1D spectra into 2D images using the Gramian Angular Difference Field (GADF)
#    - resize the 2D images
#    - normalize them with the pre-trained model's statistics
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))

cbs = [DeviceCB(), gadf, resize, stats, TrainCB(), 
       metrics, ProgressCB(plot=False)]

learn = Learner(model, dls, nn.MSELoss(), lr=lr, 
                cbs=cbs+xtra, opt_func=optim.AdamW)

learn.fit(epochs)
```
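The GADF step turns each 1D spectrum into a 2D image that an image model such as ResNet can consume. The Gramian Angular Difference Field is commonly defined by rescaling the series to [-1, 1], encoding each value as an angle φ = arccos(x), and forming the matrix sin(φ_i − φ_j). The NumPy sketch below implements that textbook definition with a hypothetical `gadf` function; `lssm`'s `GADFTfm` works on GPU batches and may differ in details such as the rescaling.

```python
import numpy as np

def gadf(x: np.ndarray) -> np.ndarray:
    """Gramian Angular Difference Field of a non-constant 1D series: G[i, j] = sin(phi_i - phi_j)."""
    # Min-max rescale to [-1, 1] so that arccos is defined
    x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    x_scaled = np.clip(x_scaled, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    # With phi = arccos(x): cos(phi) = x and sin(phi) = sqrt(1 - x^2), so
    # sin(phi_i - phi_j) = sin(phi_i) cos(phi_j) - cos(phi_i) sin(phi_j)
    sin_phi = np.sqrt(1.0 - x_scaled ** 2)
    return np.outer(sin_phi, x_scaled) - np.outer(x_scaled, sin_phi)

# Made-up 64-point "spectrum"
spectrum = np.sin(np.linspace(0, 3 * np.pi, 64)) + 0.1 * np.linspace(0, 1, 64)
image = gadf(spectrum)
print(image.shape)  # (64, 64)
```

Because sin(φ_i − φ_j) = −sin(φ_j − φ_i), the resulting image is antisymmetric with a zero diagonal; the pipeline then resizes it before normalization, as the comments in the training code indicate.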



Franck Albinet