# PyTorch Matrix Product Operators
A modern PyTorch implementation of Matrix Product Operators (MPO) for neural network compression, based on the paper "Compressing deep neural networks by matrix product operators" by Ze-Feng Gao et al.
## Overview
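This library provides PyTorch implementations of tensor-train (TT) decomposed neural network layers. Replacing dense layers with their TT counterparts can cut the parameter count of a deep network substantially (roughly 3x to 25x in the benchmarks below), with the accuracy impact governed by the chosen TT-ranks.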
## Features
- **TT-decomposed layers**: `TTLinear` and `TTConv2d` for compressed fully-connected and convolutional layers
- **Modern PyTorch**: Full compatibility with PyTorch 2.0+, type hints, device-agnostic
- **Pretrained model compression**: Convert existing PyTorch models to MPO format
- **Multiple architectures**: VGG-16/19, ResNet-18/34/50/101/152, and custom models
- **Automatic factorization**: Smart dimension factorization for optimal compression
- **Comprehensive examples**: MNIST, CIFAR-10, ImageNet training scripts
- **Analysis tools**: Compression ratio calculation, performance benchmarks
## Installation
```bash
# Clone the repository
git clone https://github.com/krzysztofwos/torch-mpo
cd torch-mpo
# Install with uv (recommended)
uv sync # Install base dependencies
uv sync --all-extras # Install with all extras (dev, docs)
# Or install with pip (development mode)
pip install -e .
pip install -e ".[dev]" # With development dependencies
```
## Quick Start
### Basic Usage
```python
import torch
from torch_mpo import TTLinear, TTConv2d

# Create a TT-decomposed linear layer
linear = TTLinear(
    in_features=1024,
    out_features=512,
    tt_ranks=8,  # Higher rank = better accuracy, more parameters
    bias=True,
)

# Create a TT-decomposed convolutional layer
conv = TTConv2d(
    in_channels=128,
    out_channels=256,
    kernel_size=3,
    padding=1,
    tt_ranks=8,
)

# Use them like standard PyTorch layers
x = torch.randn(32, 1024)
y = linear(x)  # Shape: [32, 512]

x = torch.randn(32, 128, 32, 32)
y = conv(x)  # Shape: [32, 256, 32, 32]
```
### Compress Existing Models
```python
import torch
import torchvision.models as models
from torch_mpo import compress_model

# Load a pretrained model (the `pretrained=True` flag is deprecated in
# recent torchvision releases in favor of the `weights` argument)
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Compress it with MPO
compressed_model = compress_model(
    model,
    compression_ratio=0.1,  # Target 10x compression
    compress_linear=True,   # Compress Linear layers
    compress_conv=True,     # Compress Conv2d layers
    verbose=True,
)

# Fine-tune the compressed model
optimizer = torch.optim.Adam(compressed_model.parameters(), lr=1e-4)
# ... continue with training
```
### Use Pre-built Architectures
```python
from torch_mpo.models import vgg16_mpo, resnet50_mpo

# VGG-16 with MPO compression
model = vgg16_mpo(
    num_classes=10,
    tt_ranks_conv=8,  # TT-rank for conv layers
    tt_ranks_fc=16,   # TT-rank for FC layers
    compress_conv=True,
    compress_fc=True,
)

# ResNet-50 with MPO compression
model = resnet50_mpo(
    num_classes=1000,
    tt_ranks_conv=16,
    tt_ranks_fc=32,
    use_mpo_conv=True,
    use_mpo_fc=True,
)
```
## Examples
The `examples/` directory contains complete training scripts:
### MNIST with LeNet-5 MPO
```bash
python examples/mnist_lenet5_mpo.py --tt-rank 8 --epochs 10
```
### CIFAR-10 with VGG-16 MPO
```bash
python examples/cifar10_vgg16_mpo.py --tt-rank-conv 8 --tt-rank-fc 16 --epochs 20
```
### ImageNet with ResNet-50 MPO
```bash
python examples/imagenet_resnet50_mpo.py /path/to/imagenet \
--tt-rank-conv 16 --tt-rank-fc 32 --epochs 90
```
### Compress Pretrained VGG
```bash
python examples/compress_vgg.py --model vgg16 --compression-ratio 0.1
```
## Performance Benchmarks
Run benchmarks to compare MPO layers with standard layers:
```bash
python benchmarks/benchmark_layers.py
```
### Typical Results
| Layer | Original Params | MPO Params (rank=8) | Compression | Speedup |
| ---------------------- | --------------- | ------------------- | ----------- | ------- |
| Linear(4096, 4096) | 16.8M | 655K | 25.6x | 0.8x |
| Conv2d(256, 512, 3) | 1.2M | 123K | 9.7x | 0.9x |
| VGG-16 (full model) | 138M | 15M | 9.2x | 0.85x |
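| ResNet-50 (full model) | 25.6M | 8.2M | 3.1x | 0.95x |

Speedup is measured relative to the standard layer, so values below 1.0x mean the MPO version runs slower. In these results the gain is in parameter count, not in latency.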
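The "Original Params" and "Compression" columns follow from simple arithmetic on the layer shapes. The sketch below reproduces the two single-layer rows; the helper names are illustrative and not part of torch-mpo, and it assumes the table counts weights without biases (the table does not say either way).

```python
def dense_linear_params(in_features: int, out_features: int, bias: bool = False) -> int:
    """Parameter count of a standard fully-connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def dense_conv2d_params(
    in_channels: int, out_channels: int, kernel_size: int, bias: bool = False
) -> int:
    """Parameter count of a standard 2D convolution with a square kernel."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def compression_factor(original_params: float, compressed_params: float) -> float:
    """How many times smaller the compressed layer is (original / compressed)."""
    return original_params / compressed_params


print(dense_linear_params(4096, 4096))              # 16777216, about 16.8M
print(dense_conv2d_params(256, 512, 3))             # 1179648, about 1.2M
print(round(compression_factor(16.8e6, 655e3), 1))  # 25.6
```

By the same arithmetic, the `compression_ratio=0.1` used in the Quick Start corresponds to a factor of 1 / 0.1 = 10x.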
## Documentation
See the comprehensive tutorial in `docs/mpo_tutorial.md` covering:
- Mathematical foundations of TT decomposition
- How MPO compression works
- Implementation details
- Best practices and tips
- Advanced topics
## Key Concepts
### TT-Ranks
The `tt_ranks` parameter controls the trade-off between compression and accuracy:
- **Lower ranks** (4-8): High compression, some accuracy loss
- **Medium ranks** (8-16): Good balance
- **Higher ranks** (16-32): Less compression, minimal accuracy loss
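To see why the rank drives the parameter count, the sketch below counts parameters for a TT-matrix whose core `k` has shape `(r[k], inp_modes[k], out_modes[k], r[k+1])`. That is the standard TT-matrix layout and is an assumption here; the exact core layout used inside `TTLinear` is not described in this README. The function name `tt_linear_params` is hypothetical, and the modes are the 784 → 256 factorization from the Custom Factorization section.

```python
def tt_linear_params(inp_modes, out_modes, tt_ranks):
    """Count parameters of a TT-matrix with cores of shape (r_k, m_k, n_k, r_{k+1})."""
    if len(inp_modes) != len(out_modes):
        raise ValueError("inp_modes and out_modes must have the same length")
    if len(tt_ranks) != len(inp_modes) + 1:
        raise ValueError("tt_ranks needs one more entry than there are cores")
    return sum(
        tt_ranks[k] * inp_modes[k] * out_modes[k] * tt_ranks[k + 1]
        for k in range(len(inp_modes))
    )


inp_modes = [7, 4, 7, 4]  # 784 input features
out_modes = [4, 4, 4, 4]  # 256 output features
dense = 784 * 256         # 200704 weights in the uncompressed layer

for rank in (4, 8, 16, 32):
    ranks = [1, rank, rank, rank, 1]
    count = tt_linear_params(inp_modes, out_modes, ranks)
    print(f"rank={rank:2d}: {count:6d} params, {dense / count:.1f}x smaller")
# rank=8 gives 3168 parameters
```

The interior cores scale with the square of the rank, so doubling the rank roughly quadruples the parameter count.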
### Automatic Factorization
The library automatically factorizes dimensions for optimal compression:
```python
from torch_mpo import TTLinear

# 1024 = 4 × 16 × 16 (automatic factorization)
layer = TTLinear(1024, 512, tt_ranks=8)
```
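One way such a factorization could be computed is to split the dimension into prime factors and hand each one to the currently smallest mode. This is only an illustrative sketch: `balanced_modes` is a hypothetical helper, not the library's algorithm, and it may pick different modes than torch-mpo does (for 1024 it returns 8 × 8 × 16 rather than the 4 × 16 × 16 shown above).

```python
from math import prod


def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n in ascending order, with multiplicity."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def balanced_modes(n: int, num_modes: int) -> list[int]:
    """Greedily split n into num_modes factors of similar size."""
    if n < 1 or num_modes < 1:
        raise ValueError("n and num_modes must be positive")
    modes = [1] * num_modes
    # Place large primes first so the small ones can even things out.
    for p in sorted(prime_factors(n), reverse=True):
        smallest = modes.index(min(modes))
        modes[smallest] *= p
    return sorted(modes)


print(balanced_modes(1024, 3))  # [8, 8, 16]
print(balanced_modes(784, 4))   # [4, 4, 7, 7]
assert prod(balanced_modes(784, 4)) == 784
```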
### Custom Factorization
You can also specify custom factorizations:
```python
from torch_mpo import TTLinear

layer = TTLinear(
    in_features=784,          # 28×28 MNIST
    out_features=256,
    inp_modes=[7, 4, 7, 4],   # 7×4×7×4 = 784
    out_modes=[4, 4, 4, 4],   # 4×4×4×4 = 256
    tt_ranks=[1, 8, 8, 8, 1],
)
```
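A custom factorization has to satisfy a few consistency rules: the modes must multiply to the feature sizes, and the rank list needs one more entry than there are cores, with boundary ranks of 1. The sketch below checks these before a layer is built. `check_factorization` is a hypothetical helper, and expanding a scalar rank such as `8` into `[1, 8, 8, 8, 1]` is an assumption about how a scalar `tt_ranks` is interpreted.

```python
from math import prod


def check_factorization(in_features, out_features, inp_modes, out_modes, tt_ranks):
    """Validate a TT factorization and return the full rank list."""
    if prod(inp_modes) != in_features:
        raise ValueError(f"inp_modes multiply to {prod(inp_modes)}, not {in_features}")
    if prod(out_modes) != out_features:
        raise ValueError(f"out_modes multiply to {prod(out_modes)}, not {out_features}")
    if len(inp_modes) != len(out_modes):
        raise ValueError("inp_modes and out_modes must have the same length")

    num_cores = len(inp_modes)
    if isinstance(tt_ranks, int):
        # Assumed convention: one shared interior rank, boundary ranks of 1.
        ranks = [1] + [tt_ranks] * (num_cores - 1) + [1]
    else:
        ranks = list(tt_ranks)

    if len(ranks) != num_cores + 1:
        raise ValueError(f"expected {num_cores + 1} ranks, got {len(ranks)}")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("the first and last TT-rank must be 1")
    return ranks


print(check_factorization(784, 256, [7, 4, 7, 4], [4, 4, 4, 4], 8))
# [1, 8, 8, 8, 1]
```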
### Initialization and Numerical Stability
Proper initialization is crucial for TT-decomposed layers to maintain stable gradients during training.
#### TTLinear Initialization
- Uses standard Xavier/Kaiming initialization for each core
- No additional scaling needed as the decomposition naturally regularizes
#### TTConv2d Initialization
- More complex due to spatial convolution followed by TT cores
- **Key insight**: Variance accumulates through both spatial conv and TT cores
- **Solution**: TT cores are scaled by `1/d^0.25` where `d` is the number of cores
- This empirically maintains output variance similar to standard Conv2d layers
Without proper initialization scaling, deep networks can experience:
- **Exploding activations**: Outputs growing exponentially through layers
- **Vanishing gradients**: Making training impossible
- **Poor convergence**: Model stuck at random performance
The library handles this automatically, but when implementing custom layers, careful attention to initialization is essential.
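A scalar toy model shows why variance compounds across cores. For independent zero-mean factors, the variance of a product is the product of the variances, so chaining `d` factors with standard deviation `s` gives variance `s ** (2 * d)`. The sketch below is a general illustration under that simplification, not the library's initialization code; real cores are tensors with fan-in sums, and only the `d ** -0.25` factor comes from the description above. All function names are hypothetical.

```python
def chained_variance(core_std: float, num_cores: int) -> float:
    """Variance of a product of num_cores independent zero-mean scalars."""
    return core_std ** (2 * num_cores)


def core_std_for_target(target_var: float, num_cores: int) -> float:
    """Per-core std that makes the chained product hit target_var."""
    return target_var ** (1.0 / (2 * num_cores))


def tt_core_scale(num_cores: int) -> float:
    """The 1/d^0.25 scale factor quoted in the TTConv2d notes above."""
    return num_cores ** -0.25


# A std slightly above or below 1 explodes or vanishes as cores are chained.
for std in (0.5, 1.0, 2.0):
    print(std, [chained_variance(std, d) for d in (1, 2, 4)])

# Picking the per-core std from the target keeps the product on target.
std = core_std_for_target(0.01, num_cores=4)
print(round(chained_variance(std, 4), 6))  # 0.01

print(round(tt_core_scale(4), 4))  # 0.7071
```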
## Contributing
Contributions are welcome. Please feel free to submit a Pull Request.
## Citation
If you use this code in your research, please cite both the original paper and this implementation:
### Original Paper
```bibtex
@article{gao2020compressing,
  title={Compressing deep neural networks by matrix product operators},
  author={Gao, Ze-Feng and Cheng, Song and He, Rong-Qiang and Xie, Z. Y. and Zhao, Hui-Hai and Lu, Zhong-Yi and Xiang, Tao},
  journal={Physical Review Research},
  volume={2},
  number={2},
  pages={023300},
  year={2020}
}
```
### This Implementation
```bibtex
@software{torch-mpo2024,
  title={torch-mpo: PyTorch Matrix Product Operators},
  author={Woś, Krzysztof},
  year={2024},
  url={https://github.com/krzysztofwos/torch-mpo},
  version={0.1.0}
}
```
## License
MIT License