# Epoch Engine - Python Library for training PyTorch models
[PyPI](https://pypi.org/project/epoch-engine/)
[Build](https://github.com/spolivin/epoch-engine/actions/workflows/publish.yml)
[License](https://github.com/spolivin/epoch-engine/blob/master/LICENSE.txt)
[pre-commit](https://pre-commit.com/)
This project is my attempt at a convenient way to train neural networks written in PyTorch. While libraries for training PyTorch models already exist (e.g. PyTorch Lightning), the idea here is to make training more visual and easier to follow, so that it is clear what is going on at each step.
The project is currently at an early stage; more changes are expected.
## Features
* TQDM progress bar support for both training and validation loops
* Intermediate metric computation after each forward pass (loss is computed by default, plus support for flexibly registering metrics from `scikit-learn`)
* Saving/loading checkpoints directly from/into the Trainer without having to touch the model, optimizer, or scheduler separately
* Resuming training from a loaded checkpoint, with the last epoch number restored automatically so there is no need to remember where training originally stopped
* Ready-to-use neural network architectures coded from scratch (currently a 4-layer Encoder-Decoder and a 20-layer ResNet)
## Installation
The package can be installed as follows:
```bash
# Installing the main package
pip install epoch-engine
# Installing additional optional dependencies
pip install epoch-engine[build,linters]
```
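> Note that some shells (e.g. `zsh`) treat the square brackets in the extras specifier as glob patterns, so it may need to be quoted:

```bash
# Quoting the extras specifier for shells such as zsh
pip install "epoch-engine[build,linters]"
```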
### Development mode
```bash
# Cloning the repo and moving to the repo dir
git clone https://github.com/spolivin/epoch-engine.git
cd epoch-engine
# Installing the package and optional deps (dev mode)
pip install -e .
pip install -e .[build,linters]
```
### Pre-commit support
The repository provides support for running pre-commit checks via the hooks defined in `.pre-commit-config.yaml`. These can be installed into the current git repository by running:
```bash
pre-commit install
```
> `pre-commit` will already be installed in the virtual environment after running `pip install epoch-engine[linters]` or `pip install -e .[linters]`
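Once installed, the hooks run automatically on each commit; they can also be run manually against the whole repository:

```bash
# Running all configured hooks on all files
pre-commit run --all-files
```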
## Python API
Let's suppose that we have constructed a PyTorch model called `net` and set up the loss function we would like to optimize:
```python
import torch.nn as nn
from epoch_engine.models.architectures import BasicBlock, ResNet
# Instantiating a ResNet model for gray-scale images
net = ResNet(
    in_channels=1,
    block=BasicBlock,
    num_blocks=[3, 3, 3],
    num_classes=10,
)
loss_func = nn.CrossEntropyLoss()
```
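The Trainer works with standard PyTorch dataloaders. As an illustration only (the dataset and transform below are assumptions, not part of `epoch-engine`), the two loaders could be built from `torchvision`'s MNIST dataset, which matches the gray-scale, 10-class ResNet configured above:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Converting gray-scale images to tensors with values in [0, 1]
transform = transforms.ToTensor()

# MNIST is used purely as an example of a gray-scale, 10-class dataset
train_set = datasets.MNIST(root="data", train=True, download=True, transform=transform)
valid_set = datasets.MNIST(root="data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=64, shuffle=False)
```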
With the dataloaders `train_loader` and `valid_loader` prepared for the training and validation sets respectively, we can set up the Trainer as follows:
```python
import torch
from sklearn.metrics import accuracy_score

# Importing the Trainer (NOTE: exact import path assumed here)
from epoch_engine import Trainer

# Instantiating a Trainer (with auto device detection)
trainer = Trainer(
    model=net,
    criterion=loss_func,
    train_loader=train_loader,
    valid_loader=valid_loader,
    train_on="auto",
)
# Setting up an additional metric to track (loss is computed and displayed by default)
trainer.register_metric("accuracy", accuracy_score)
# Setting up optimizer and scheduler
trainer.configure_optimizers(
    optimizer_class=torch.optim.SGD,
    optimizer_params={
        "lr": 0.25,
        "momentum": 0.75,
    },
    scheduler_class=torch.optim.lr_scheduler.StepLR,
    scheduler_params={"step_size": 2, "gamma": 0.1},
)
```
> When initialized with `"auto"`, the `train_on` parameter automatically detects whether CUDA, MPS, or CPU should be used. We can register additional metrics to track; they will be shown for the training and validation sets after each epoch. Lastly, we configure the optimizer and scheduler to use (scheduler-related parameters can be omitted if no scheduler is needed).
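If a scikit-learn metric needs extra keyword arguments (for example, multi-class `f1_score`), it can be adapted with `functools.partial` before registration. This is a sketch and assumes registered metrics follow the usual scikit-learn `metric(y_true, y_pred)` calling convention:

```python
from functools import partial

from sklearn.metrics import f1_score

# Assumption: registered metrics are called as metric(y_true, y_pred),
# so extra keyword arguments are bound in advance with functools.partial
trainer.register_metric("f1_macro", partial(f1_score, average="macro"))
```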
Now, we launch the training which will show the TQDM-progress bar (if enabled):
```python
# Launching training
trainer.run(
    epochs=5,
    seed=42,
    enable_tqdm=True,
)
```
### Loading/saving checkpoints
After running training, we can save the state dicts of the model and optimizer for later use (e.g. resuming training) as follows:
```python
trainer.save_checkpoint("checkpoints/checkpoint.pt")
```
> This method also saves the epoch at which training stopped (useful for displaying the correct epoch number during the next training run).
We can also load the saved state dicts back into the model and optimizer before running a new training session:
```python
trainer.load_checkpoint(path="checkpoints/checkpoint.pt")
```
> In this case, make sure that the optimizer configured in the Trainer matches the one the checkpoint was saved with.
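Putting the two together, a possible resume flow (a sketch using only the calls shown above) is to load the checkpoint into a configured Trainer and launch another run; the remembered epoch is then used when displaying progress:

```python
# Loading the previously saved state into the configured Trainer
trainer.load_checkpoint(path="checkpoints/checkpoint.pt")

# Launching a new run; the epoch stored in the checkpoint is picked up automatically
trainer.run(epochs=3, seed=42, enable_tqdm=True)
```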
### Test script
The basics of the API are demonstrated in the `run_trainer.py` script in the root of the repository. It can be run, for instance, as follows:
```bash
python run_trainer.py --model=resnet --epochs=3 --batch-size=16
```
> Training will be launched on a device derived automatically from CUDA availability, and the final training checkpoint will be saved in the `checkpoints` directory.
Training can also be resumed from a saved checkpoint:
```bash
python run_trainer.py --model=resnet --epochs=4 --resume-training=True --ckpt-path=checkpoints/ckpt_3.pt
```
> Training will be resumed from the loaded checkpoint, with the TQDM progress bar showing the next training epoch.