lc-checkpoint

Name: lc-checkpoint
Version: 0.5.2
Home page: https://pypi.org/project/lc-checkpoint/
Summary: A package for compressing PyTorch model checkpoints using the LC-Checkpoint method
Author: Dedy Van Hauten
Requires Python: >=3.8
Upload time: 2023-08-06 11:02:51

# LC-Checkpoint

LC-Checkpoint is a Python package that implements the LC-Checkpoint method for compressing and checkpointing PyTorch models during training.
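
The underlying idea (from the paper cited in the Acknowledgements) is to checkpoint the *delta* between successive model states and quantize it into a small number of buckets. The following standalone sketch illustrates that idea with plain NumPy; it is not the package's actual implementation, and the bucketing scheme here (uniform value ranges) is a simplification:

```
# Illustrative sketch only -- NOT the package's internals. It shows the core
# idea: encode each entry of a parameter delta as one of k bucket indices.
import numpy as np

def quantize_delta(delta: np.ndarray, k: int = 5):
    """Map each delta entry to one of k buckets; return (indices, bucket means)."""
    edges = np.linspace(delta.min(), delta.max(), k + 1)
    idx = np.clip(np.digitize(delta, edges) - 1, 0, k - 1)
    means = np.array([delta[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(k)])
    return idx.astype(np.uint8), means

def dequantize(idx: np.ndarray, means: np.ndarray) -> np.ndarray:
    # Lossy reconstruction: every entry becomes its bucket's mean.
    return means[idx]

delta = np.random.randn(1000).astype(np.float32) * 0.01
idx, means = quantize_delta(delta, k=5)
approx = dequantize(idx, means)
# idx costs 1 byte per entry vs. 4 bytes for float32, before entropy coding.
```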

## Installation

You can install LC-Checkpoint using pip:

```
pip install lc_checkpoint
```

## Usage

To use LC-Checkpoint in your PyTorch training script, follow these steps:

1.  Import the LC-Checkpoint module:
    
    ```
    from lc_checkpoint import main as lc
    ```

2.  Initialize the LC-Checkpoint method with your PyTorch model, optimizer, loss function, and other hyperparameters:
    
    ```
    import torch.nn as nn
    import torch.optim as optim

    model = Net()  # placeholder: instantiate your own PyTorch model class here
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    checkpoint_dir = 'checkpoints/lc-checkpoint'
    num_buckets = 5
    num_bits = 32

    lc.initialize(model, optimizer, criterion, checkpoint_dir, num_buckets, num_bits)
    ```
    
    
3.  Use the LC-Checkpoint method in your training loop:
    
    ```
    import glob
    import os
    import time

    import numpy as np
    import torch

    # Save the base model weights once, before training starts
    init_save_dir = "checkpoints/initialstate.pt"
    torch.save(model.state_dict(), init_save_dir)
    # Clone the tensors: state_dict() returns references to the live
    # parameters, which optimizer.step() mutates in place.
    prev_state_dict = {k: v.clone() for k, v in model.state_dict().items()}

    start_time = time.time()
    total_restore_time = 0.0

    # `epochs` and `trainloader` come from your own training setup
    for epoch in range(epochs):  # loop over the dataset multiple times

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # Load the previous checkpoint if one exists
            try:
                # Find the latest checkpoint file
                lc_checkpoint_files = glob.glob(os.path.join(checkpoint_dir, 'lc_checkpoint_epoch*.pt'))
                latest_checkpoint_file = max(lc_checkpoint_files, key=os.path.getctime)
                prev_state_dict, epoch_loaded = lc.load_checkpoint(latest_checkpoint_file)
                print('Restored latest checkpoint:', latest_checkpoint_file)
                latest_checkpoint_file_size_kb = os.path.getsize(latest_checkpoint_file) / 1024
                print('Latest checkpoint file size:', latest_checkpoint_file_size_kb, 'KB')
                restore_time = time.time() - start_time
                total_restore_time += restore_time
                print('Time taken to restore checkpoint:', restore_time)
                start_time = time.time()  # reset the start time
                print('-' * 50)
            except Exception:
                pass  # no checkpoint yet; continue from the current weights

            # Get the inputs and labels
            inputs, labels = data

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward + backward + optimize
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # Flatten both states into single vectors and take their difference
            new_state_dict = {k: v.clone() for k, v in model.state_dict().items()}
            new_state_weights = np.concatenate([t.cpu().numpy().flatten() for t in new_state_dict.values()])
            prev_state_weights = np.concatenate([t.cpu().numpy().flatten() for t in prev_state_dict.values()])
            δt = new_state_weights - prev_state_weights  # the delta to be compressed
            prev_state_dict = new_state_dict

            # Compress the delta and save the checkpoint
            compressed_data, encoder = lc.compress_data(δt, num_bits=num_bits, k=num_buckets)
            save_start_time = time.time()
            lc.save_checkpoint('checkpoint.pt', compressed_data, epoch, i, encoder)
            save_time = time.time() - save_start_time

            # Print statistics after every mini-batch
            running_loss += loss.item()
            print('[Epoch: %d, Iteration: %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss))
            running_loss = 0.0
            print('Time taken to save checkpoint:', save_time)
    ```
4.  Use LC-Checkpoint to restore a model:

    ```
    last_epoch, last_iter = 30, 8
    max_iter = 10
    restored_model = lc.restore_model(Net(), last_epoch, last_iter, max_iter, init_save_dir)
    ```

## API Reference

### `lc.initialize(model, optimizer, criterion, checkpoint_dir, num_buckets, num_bits)`

Initializes the LC-Checkpoint method with the given PyTorch model, optimizer, loss function, checkpoint directory, number of buckets, and number of bits.

### `lc.compress_data(δt, num_bits=num_bits, k=num_buckets, treshold=True)`

Compresses the parameter delta `δt` and returns the compressed data together with the encoder used.

### `lc.decode_data(encoded)`

Decodes the compressed data and returns the reconstructed parameter delta.
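
Taken together, `compress_data` and `decode_data` form a round trip over a delta vector. A minimal sketch assuming the signatures listed in this README (the exact return values, and whether `decode_data` needs only the compressed payload, are assumptions):

```
import numpy as np
from lc_checkpoint import main as lc

delta = np.random.randn(10_000).astype(np.float32) * 1e-3  # a fake parameter delta
compressed, encoder = lc.compress_data(delta, num_bits=32, k=5)
restored = lc.decode_data(compressed)  # approximate reconstruction of `delta`
```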

### `lc.save_checkpoint(filename, compressed_data, epoch, iteration, encoder)`

Saves the compressed data, together with its encoder, to a file tagged with the given epoch and iteration.

### `lc.load_checkpoint(filename)`

Loads a checkpoint from the given file and returns the decoded state together with the epoch at which it was saved.

### `lc.restore_model(model, last_epoch, last_iter, max_iter, init_filename, ckpt_name)`

Loads the compressed checkpoint data into a PyTorch model, starting from the initial state saved at `init_filename`.

### `lc.restore_model_async(model, last_epoch, last_iter, max_iter, init_filename, ckpt_name, n_cores)`

Loads the compressed checkpoint data into a PyTorch model asynchronously, using `n_cores` cores.
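
A hedged sketch mirroring step 4 above: `Net` is a placeholder model class, `ckpt_name` is assumed to be the checkpoint directory used during training, and `n_cores=4` is an arbitrary choice.

```
restored_model = lc.restore_model_async(
    Net(), last_epoch=30, last_iter=8, max_iter=10,
    init_filename='checkpoints/initialstate.pt',
    ckpt_name='checkpoints/lc-checkpoint',  # assumed: the training checkpoint dir
    n_cores=4,
)
```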

### `lc.calculate_compression_rate(prev_state_dict, num_bits=num_bits, num_buckets=num_buckets)`

Calculates the compression rate of the LC-Checkpoint method based on the previous state dictionary and the current number of bits and buckets.
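
For example, to report the compression rate for the settings used in the steps above (a usage sketch; the return type is not documented here and is assumed to be a ratio):

```
rate = lc.calculate_compression_rate(prev_state_dict, num_bits=32, num_buckets=5)
print('Estimated compression rate:', rate)
```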

## License

LC-Checkpoint is licensed under the MIT License. See the LICENSE file for more information.

## Acknowledgements

LC-Checkpoint is based on the paper "On Efficient Constructions of Checkpoints" by Yu Chen, Zhenming Liu, Bin Ren, and Xin Jin.
