iclearn


| Field | Value |
| --- | --- |
| Name | iclearn |
| Version | 0.1.5 |
| Summary | A collection of utilities for machine learning applications. |
| Upload time | 2025-09-17 16:09:37 |
| Requires Python | >=3.10 |
| Keywords | machine learning, workflow, hpc |
| Requirements | No requirements were recorded. |

# iclearn

`iclearn` is a tool for standardizing distributed machine-learning workflows at ICHEC. It will allow us to develop a common set of performance benchmarking, profiling and optimization tools and apply them to ML workflows across scientific domains.

## Design ##

![Top-level library architecture](./docs/media/iclearn_toplevel_arch.png)

The top-level library architecture is shown above. A machine learning experiment is defined via a YAML file and launched via the CLI. Resources addressed in the YAML are loaded from a range of libraries, which are built out per-domain (e.g. Earth Observation) or per-framework (e.g. PyTorch). Libraries can include ML models and datasets, but also specialized metrics calculators, output handlers and profiling tools.

Once resources are loaded, a machine learning experiment is executed in a `Session` using one of the supported frameworks. At the moment this is primarily the PyTorch ecosystem, but others are planned.

![Library integration](./docs/media/iclearn_library_integration.png)

Practical integration of a third-party library is shown in the figure above. A config file is read through the CLI. Models, dataloaders and similar resources are loaded from third-party libraries by 'provider' callbacks, which take 'resource IDs' from the config and return the corresponding Python objects. These objects are derived from `iclearn` base classes and implement event handlers for the different stages of a machine learning workflow, such as training steps, testing or inference.
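The provider pattern described above can be sketched in plain Python. This is only an illustration of the idea, not the `iclearn` API: the names `Provider`, `LinearModel`, `my_library_provider` and `resolve`, and the resource ID `my_library.linear`, are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout: this is not the iclearn API.
Provider = Callable[[str], Optional[object]]


class LinearModel:
    """Stand-in for a model class exposed by a third-party library."""


def my_library_provider(resource_id: str) -> Optional[object]:
    """Return an object for IDs this library owns, otherwise None."""
    registry: Dict[str, Callable[[], object]] = {
        "my_library.linear": LinearModel,
    }
    factory = registry.get(resource_id)
    return factory() if factory is not None else None


def resolve(resource_id: str, providers: List[Provider]) -> object:
    """Ask each provider in turn; the first non-None result wins."""
    for provider in providers:
        resource = provider(resource_id)
        if resource is not None:
            return resource
    raise KeyError(f"No provider found for resource ID '{resource_id}'")


model = resolve("my_library.linear", [my_library_provider])
print(type(model).__name__)
```

Returning `None` for unknown IDs lets several libraries be chained without any of them needing to know about the others.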

A sample YAML file for a machine learning training session is shown below:

``` yaml
name: linear_train
dataloader:
  batch_size: 64
  dataset:
    name: linear
model:
  name: "torch.linear"
  framework: "pytorch"
  optimizer:
    name: "torch.SGD"
    learning_rate: 0.001
  loss_function: "torch.MSELoss"
outputs:
  - name: "logging"
  - name: "plotting"
    active: false
with_profiling: false
num_epochs: 10
num_batches: 0
```

This includes named PyTorch models or model elements (e.g. `torch.linear` and `torch.SGD`) together with their parameters, a named dataset `linear`, and named output handlers `logging` and `plotting`.
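To make the structure of the sample config concrete, the sketch below writes it out as the dictionary a YAML parser would produce and collects the named resources from it. The helper names `resource_ids` and `active_outputs` are hypothetical, and treating an output without an `active` key as enabled is an assumption, not documented `iclearn` behaviour.

```python
from typing import List

# The sample config, written as the dict a YAML parser would return.
config = {
    "name": "linear_train",
    "dataloader": {"batch_size": 64, "dataset": {"name": "linear"}},
    "model": {
        "name": "torch.linear",
        "framework": "pytorch",
        "optimizer": {"name": "torch.SGD", "learning_rate": 0.001},
        "loss_function": "torch.MSELoss",
    },
    "outputs": [{"name": "logging"}, {"name": "plotting", "active": False}],
    "with_profiling": False,
    "num_epochs": 10,
    "num_batches": 0,
}


def resource_ids(cfg: dict) -> List[str]:
    """Collect the named resources that a loader would need to resolve."""
    model = cfg["model"]
    ids = [
        cfg["dataloader"]["dataset"]["name"],
        model["name"],
        model["optimizer"]["name"],
        model["loss_function"],
    ]
    ids.extend(output["name"] for output in cfg["outputs"])
    return ids


def active_outputs(cfg: dict) -> List[str]:
    """Assumption: an output without an `active` key is enabled."""
    return [o["name"] for o in cfg["outputs"] if o.get("active", True)]


print(resource_ids(config))
print(active_outputs(config))
```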

A third-party library may expose custom datasets such as `my_library.my_dataset`, or output handlers such as `my_library.mlflow` and `my_library.my_grid_plotter`, with a simple implementation via inheritance from `iclearn` templates, as shown below.

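```python
from pathlib import Path

from iclearn.data import Dataloader, Splits
from iclearn.model import Model, Metrics


class MyModel(Model):

    def __init__(self, metrics: Metrics):
        # MyOptimizer and MyLossFunc stand in for your own classes.
        # Check Model.__init__ in your installed version for the
        # exact signature before relying on this argument order.
        super().__init__(metrics, MyOptimizer(MyLossFunc()))

    def predict(self, x):
        return ...


class MyDataloader(Dataloader):

    def load_dataset(self, root: Path, name: str, splits: Splits):
        return ...

    def load_dataloader(self, name: str):
        return ...
```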

As a real example of launching the CLI with a config, you can train a simple built-in linear regression model with:

``` shell
iclearn train --config test/data/experiments/linear_train.yaml
```

In practice you would launch your own program that includes functionality for providing your custom library resources via callbacks, giving something like:

``` shell
my_custom_pipeline train --config my_experiment.yaml
```
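A custom entry point of this shape could be sketched with the standard library's `argparse`. The program name `my_custom_pipeline` comes from the example above; `build_parser` and `main` are hypothetical, and the hand-off to `iclearn` is left out because its entry points are not shown in this README.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a `train` subcommand taking `--config`."""
    parser = argparse.ArgumentParser(prog="my_custom_pipeline")
    subparsers = parser.add_subparsers(dest="command", required=True)
    train = subparsers.add_parser("train", help="Run a training session")
    train.add_argument(
        "--config",
        type=Path,
        required=True,
        help="Path to the experiment YAML file",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # A real program would register its provider callbacks here and
    # pass the config on to iclearn; that step is omitted in this sketch.
    print(f"{args.command}: {args.config}")
    return 0


# Equivalent to: my_custom_pipeline train --config my_experiment.yaml
main(["train", "--config", "my_experiment.yaml"])
```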

## Installing ##

The package is available on PyPI. You can install the base package with:

``` shell
pip install iclearn
```

Most functionality so far uses PyTorch. You can install the PyTorch add-ons with:

``` shell
pip install 'iclearn[torch]'
```
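If you are unsure which variant is installed, a quick check from Python is one option. This is a minimal sketch that assumes the `torch` extra pulls in the `torch` package; `has_module` is a hypothetical helper, not part of `iclearn`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if has_module("torch"):
    print("PyTorch found: the PyTorch add-ons should be usable")
else:
    print("PyTorch missing: try pip install 'iclearn[torch]'")
```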

## License ##

This software is Copyright ICHEC 2024 and can be re-used under the terms of the GPL v3+. See the included `LICENSE` file for details.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "iclearn",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "Machine Learning, Workflow, HPC",
    "author": null,
    "author_email": "\"James Grogan, Irish Centre for High End Computing\" <james.grogan@ichec.ie>",
    "download_url": "https://files.pythonhosted.org/packages/b4/8c/0a5f3448eca0415b9fa1062a81447f91440154a739420d18a20f2b4a3aeb/iclearn-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A collection of utilities for machine learning applications.",
    "version": "0.1.5",
    "project_urls": {
        "Homepage": "https://git.ichec.ie/performance/toolshed/iclearn",
        "Repository": "https://git.ichec.ie/performance/toolshed/iclearn"
    },
    "split_keywords": [
        "machine learning",
        " workflow",
        " hpc"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6a71343588e7e1112963f6e7709c6cd40499281e2f588d2bfa407dfb1bf87af9",
                "md5": "123531b4fad203b1367f94002aafd2a1",
                "sha256": "c7de9b74bc43c8114c30a8b8359e18a8844bc9794ad06cb24e355f71034a5cc1"
            },
            "downloads": -1,
            "filename": "iclearn-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "123531b4fad203b1367f94002aafd2a1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 56507,
            "upload_time": "2025-09-17T16:09:35",
            "upload_time_iso_8601": "2025-09-17T16:09:35.899091Z",
            "url": "https://files.pythonhosted.org/packages/6a/71/343588e7e1112963f6e7709c6cd40499281e2f588d2bfa407dfb1bf87af9/iclearn-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b48c0a5f3448eca0415b9fa1062a81447f91440154a739420d18a20f2b4a3aeb",
                "md5": "4d58318010c6a868b6aefa099008b909",
                "sha256": "04c8e6dfbb449695138b018ef5350628a0a1e5a727e86ec892b9dcaa5b18007e"
            },
            "downloads": -1,
            "filename": "iclearn-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "4d58318010c6a868b6aefa099008b909",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 47172,
            "upload_time": "2025-09-17T16:09:37",
            "upload_time_iso_8601": "2025-09-17T16:09:37.770495Z",
            "url": "https://files.pythonhosted.org/packages/b4/8c/0a5f3448eca0415b9fa1062a81447f91440154a739420d18a20f2b4a3aeb/iclearn-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-17 16:09:37",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "iclearn"
}