# LC-Checkpoint
LC-Checkpoint is a Python package that implements the LC-Checkpoint method for compressing and checkpointing PyTorch models during training.
## Installation
You can install LC-Checkpoint using pip:
```
pip install lc_checkpoint
```
## Usage
To use LC-Checkpoint in your PyTorch training script, you can follow these steps:
1. Import the LC-Checkpoint module:
```
from lc_checkpoint import main as lc
```
2. Initialize the LC-Checkpoint method with your PyTorch model, optimizer, loss function, and other hyperparameters:
```
import torch.nn as nn
import torch.optim as optim

model = model()  # instantiate your nn.Module subclass
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()
checkpoint_dir = 'checkpoints/lc-checkpoint'
num_buckets = 5
num_bits = 32

lc.initialize(model, optimizer, criterion, checkpoint_dir, num_buckets, num_bits)
```
3. Use the LC-Checkpoint method in your training loop:
```
import glob
import os
import time

import numpy as np
import torch

# `epochs` and `trainloader` are defined by your own training script.

# Save base model weights
init_save_dir = "checkpoints/initialstate.pt"
os.makedirs(os.path.dirname(init_save_dir), exist_ok=True)
torch.save(model.state_dict(), init_save_dir)
# state_dict() returns references to the live parameters, which
# optimizer.step() updates in place, so keep a cloned snapshot instead.
prev_state_dict = {k: v.clone() for k, v in model.state_dict().items()}

start_time = time.time()  # reference point for the restore timing
total_restore_time = 0.0

for epoch in range(epochs):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Load the previous checkpoint if one exists
        try:
            # Find the latest checkpoint file
            lc_checkpoint_files = glob.glob(os.path.join('checkpoints/lc-checkpoint', 'lc_checkpoint_epoch*.pt'))
            latest_checkpoint_file = max(lc_checkpoint_files, key=os.path.getctime)
            prev_state_dict, epoch_loaded = lc.load_checkpoint(latest_checkpoint_file)
            print('Restored latest checkpoint:', latest_checkpoint_file)
            latest_checkpoint_file_size = os.path.getsize(latest_checkpoint_file)
            latest_checkpoint_file_size_kb = latest_checkpoint_file_size / 1024
            print('Latest checkpoint file size:', latest_checkpoint_file_size_kb, 'KB')
            restore_time = time.time() - start_time
            total_restore_time += restore_time
            print('Time taken to restore checkpoint:', restore_time)
            start_time = time.time()  # reset the start time
            print('-' * 50)
        except Exception:
            # No usable checkpoint yet (for example, max() of an empty list);
            # continue with the in-memory state.
            pass

        # Get the inputs and labels
        inputs, labels = data

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        new_state_dict = model.state_dict()
        # Convert each tensor to a NumPy array and concatenate them into one flat vector
        new_state_weights = np.concatenate([tensor.cpu().numpy().flatten() for tensor in new_state_dict.values()])
        prev_state_weights = np.concatenate([tensor.cpu().numpy().flatten() for tensor in prev_state_dict.values()])
        δt = new_state_weights - prev_state_weights  # weight delta for this iteration
        # Snapshot the new weights so the next delta is not computed against live tensors
        prev_state_dict = {k: v.clone() for k, v in new_state_dict.items()}

        # Save the checkpoint
        compressed_data, encoder = lc.compress_data(δt, num_bits=num_bits, k=num_buckets)
        save_start_time = time.time()  # record the start time
        lc.save_checkpoint('checkpoint.pt', compressed_data, epoch, i, encoder)
        save_time = time.time() - save_start_time  # time taken to save the checkpoint

        # Print statistics
        running_loss += loss.item()
        if i % 1 == 0:  # print every mini-batch
            print('[Epoch: %d, Iteration: %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 1))
            running_loss = 0.0
        print('Time taken to save checkpoint:', save_time)
```
4. Use LC-Checkpoint to restore the model:
```
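last_epoch, last_iter = 30, 8  # epoch and iteration of the checkpoint to restore
max_iter = 10
# The first argument is a fresh instance of your model class
restored_model = lc.restore_model(model(), last_epoch, last_iter, max_iter, init_save_dir)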
```
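The training loop in step 3 looks up the newest checkpoint file inside a `try`/`except` block. As a minimal sketch, the same lookup can be written as a small helper that returns `None` when no checkpoint exists yet. The helper name `find_latest_checkpoint` is hypothetical and not part of the package; only the file-name pattern is taken from the example above, and the sketch sorts by modification time rather than creation time.

```python
import glob
import os


def find_latest_checkpoint(checkpoint_dir, pattern="lc_checkpoint_epoch*.pt"):
    """Return the most recently modified checkpoint path, or None if there is none.

    Hypothetical helper for illustration; it is not provided by lc_checkpoint.
    """
    candidates = glob.glob(os.path.join(checkpoint_dir, pattern))
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    latest = find_latest_checkpoint("checkpoints/lc-checkpoint")
    print("No checkpoint found" if latest is None else f"Latest checkpoint: {latest}")
```

Returning `None` makes the "no checkpoint yet" case explicit, so the caller does not need to catch the `ValueError` that `max()` raises on an empty list.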
## API Reference
### `lc.initialize(model, optimizer, criterion, checkpoint_dir, num_buckets, num_bits)`
Initializes the LC-Checkpoint method with the given PyTorch model, optimizer, loss function, checkpoint directory, number of buckets, and number of bits.
### `lc.compress_data(δt, num_bits=num_bits, k=num_buckets, treshold=True)`
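Compresses the weight delta `δt` using the given number of bits and buckets, and returns the compressed data together with an encoder object. The training loop above passes both return values to `lc.save_checkpoint`.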
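To give a feel for the kind of lossy step involved, the sketch below illustrates exponent-based bucketing of a delta vector in plain Python: values are grouped by sign and binary exponent, only the `k` buckets with the largest exponents are kept, and every kept value is replaced by its bucket mean. This is an illustration of the general idea under stated assumptions, not the package's code; the function name `quantize_delta` and the exact ranking rule are hypothetical.

```python
import math
from collections import defaultdict


def quantize_delta(delta, k):
    """Quantize a list of floats by (sign, binary exponent) buckets.

    Illustrative sketch only: keeps the k buckets with the largest exponent,
    replaces each kept value with its bucket mean, and zeroes everything else.
    """
    buckets = defaultdict(list)
    for idx, value in enumerate(delta):
        if value == 0.0:
            continue  # zeros stay zero
        _, exponent = math.frexp(abs(value))  # abs(value) = m * 2**exponent
        buckets[(value > 0, exponent)].append(idx)

    # Rank buckets by exponent, largest magnitudes first, and keep the top k.
    kept = sorted(buckets, key=lambda key: key[1], reverse=True)[:k]

    quantized = [0.0] * len(delta)
    for key in kept:
        indices = buckets[key]
        mean = sum(delta[i] for i in indices) / len(indices)
        for i in indices:
            quantized[i] = mean
    return quantized


if __name__ == "__main__":
    print(quantize_delta([0.5, 0.75, -0.5, 0.001, 0.0], k=2))
```

With only a handful of distinct values left, the quantized vector can be stored as small bucket indices plus a short table of bucket means, which is where the size reduction comes from.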
### `lc.decode_data(encoded)`
Decodes the compressed data and returns the original model parameters.
### `lc.save_checkpoint(filename, compressed_data, epoch, iteration)`
Saves the compressed data to a file with the given filename, epoch, and iteration.
### `lc.load_checkpoint(filename)`
Loads the compressed data from a file with the given filename.
### `lc.restore_model(model, last_epoch, last_iter, max_iter, init_filename, ckpt_name)`
Loads the compressed data into a PyTorch model.
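Because the usage example stores the initial weights once and then one compressed delta per iteration, restoring presumably amounts to adding the decoded deltas back onto the base weights in order. The following plain-Python sketch shows that replay on flat lists of floats. It is an assumption about the general approach, not the package's implementation, and `replay_deltas` is a hypothetical name.

```python
def replay_deltas(base_weights, deltas):
    """Apply a sequence of weight deltas to a flat list of base weights.

    Illustrative sketch only: each delta must have the same length as the
    weight vector, and deltas are applied in the order given.
    """
    weights = list(base_weights)  # copy so the caller's list is not modified
    for step, delta in enumerate(deltas):
        if len(delta) != len(weights):
            raise ValueError(f"delta {step} has length {len(delta)}, expected {len(weights)}")
        weights = [w + d for w, d in zip(weights, delta)]
    return weights


if __name__ == "__main__":
    base = [1.0, 2.0]
    deltas = [[0.5, -1.0], [0.25, 0.0]]
    print(replay_deltas(base, deltas))
```

Since each checkpoint holds only a delta, every checkpoint between the base state and the target iteration is needed to reconstruct the weights.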
### `lc.restore_model_async(model, last_epoch, last_iter, max_iter, init_filename, ckpt_name, n_cores)`
Loads the compressed data into a PyTorch model asynchronously.
### `lc.calculate_compression_rate(prev_state_dict, num_bits=num_bits, num_buckets=num_buckets)`
Calculates the compression rate of the LC-Checkpoint method based on the previous state dictionary and the current number of bits and buckets.
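As a back-of-the-envelope illustration of how the number of bits and buckets affects size, the sketch below estimates a compression ratio by assuming each value is stored as a fixed-width bucket index plus a small table of bucket means. This is a simplified model, not necessarily what `lc.calculate_compression_rate` computes; the function name and the cost model are assumptions, and any entropy coding applied afterwards is ignored.

```python
import math


def estimate_compression_ratio(num_values, num_bits=32, num_buckets=5):
    """Estimate raw size / compressed size under a simple fixed-width index model.

    Illustrative sketch only: each value is stored as an index into
    num_buckets + 1 symbols (the extra symbol stands for zero), plus one
    num_bits-wide mean per bucket.
    """
    index_bits = math.ceil(math.log2(num_buckets + 1))
    raw_bits = num_values * num_bits
    compressed_bits = num_values * index_bits + num_buckets * num_bits
    return raw_bits / compressed_bits


if __name__ == "__main__":
    ratio = estimate_compression_ratio(1000, num_bits=32, num_buckets=5)
    print(f"Estimated compression ratio: {ratio:.2f}x")
```

Under this model, 1000 values at 32 bits with 5 buckets need 3 bits per index, giving 32000 raw bits against 3160 compressed bits.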
## License
LC-Checkpoint is licensed under the MIT License. See the LICENSE file for more information.
## Acknowledgements
LC-Checkpoint is based on the paper "On Efficient Constructions of Checkpoints" by Yu Chen, Zhenming Liu, Bin Ren, and Xin Jin.
Raw data
{
"_id": null,
"home_page": "https://pypi.org/project/lc-checkpoint/",
"name": "lc-checkpoint",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "",
"author": "Dedy Van Hauten",
"author_email": "dedy.van@ui.ac.id",
"download_url": "https://files.pythonhosted.org/packages/fa/0a/7ac86c51cc2ff188627404dfcbca58ce54fae9a37b883354e12d92136f80/lc-checkpoint-0.5.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "A package for compressing PyTorch model checkpoints using the LC-Checkpoint method",
"version": "0.5.2",
"project_urls": {
"Homepage": "https://pypi.org/project/lc-checkpoint/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e4ed7f77544087ff28b57b7942c732cbc546acfe610bff751d8f5305759d7656",
"md5": "6db6bb887f9eaea047068b4901ab336c",
"sha256": "a63e89a7e062a5aed28c8687446b919d366e8c7173939f04d4cfb29718082bdf"
},
"downloads": -1,
"filename": "lc_checkpoint-0.5.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6db6bb887f9eaea047068b4901ab336c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 6287,
"upload_time": "2023-08-06T11:02:50",
"upload_time_iso_8601": "2023-08-06T11:02:50.054787Z",
"url": "https://files.pythonhosted.org/packages/e4/ed/7f77544087ff28b57b7942c732cbc546acfe610bff751d8f5305759d7656/lc_checkpoint-0.5.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "fa0a7ac86c51cc2ff188627404dfcbca58ce54fae9a37b883354e12d92136f80",
"md5": "0379e5d642db51be2305afea6920a204",
"sha256": "8621a2bdba48afc0725db1a485b51a8c4a1649d6b01031be670d9afa8a5d0a70"
},
"downloads": -1,
"filename": "lc-checkpoint-0.5.2.tar.gz",
"has_sig": false,
"md5_digest": "0379e5d642db51be2305afea6920a204",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 6761,
"upload_time": "2023-08-06T11:02:51",
"upload_time_iso_8601": "2023-08-06T11:02:51.742360Z",
"url": "https://files.pythonhosted.org/packages/fa/0a/7ac86c51cc2ff188627404dfcbca58ce54fae9a37b883354e12d92136f80/lc-checkpoint-0.5.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-06 11:02:51",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "lc-checkpoint"
}