# TorchDiff
<div align="center">
<img src="imgs/logo_.png" alt="TorchDiff Logo" width="300"/>
</div>
---
## 🔎 Overview
**TorchDiff** is a PyTorch-based library for building and experimenting with diffusion models. Its implementations are based on the research papers cited under Implemented Models.
The **TorchDiff 2.0.0** release includes implementations of five major diffusion model families:
- **DDPM** (Denoising Diffusion Probabilistic Models)
- **DDIM** (Denoising Diffusion Implicit Models)
- **SDE-based Diffusion**
- **LDM** (Latent Diffusion Models)
- **UnCLIP** (the model powering OpenAI’s *DALL·E 2*)
These models support both **conditional** (e.g., text-to-image) and **unconditional** generation.
<div align="center">
<img src="imgs/mount.png" alt="Diffusion Model Process" width="1000"/>
<br>
<em>Image generated using Sora</em>
<br><br>
</div>
TorchDiff is designed with **modularity** in mind. Each model is broken down into reusable components:
- **Forward Diffusion**: Adds noise (e.g., `ForwardDDPM`).
- **Reverse Diffusion**: Removes noise to recover data (e.g., `ReverseDDPM`).
- **Variance Scheduler**: Controls noise schedules (e.g., `VarianceSchedulerDDPM`).
- **Training**: Full training pipelines (e.g., `TrainDDPM`).
- **Sampling**: Efficient inference and generation (e.g., `SampleDDPM`).
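
To make the forward-diffusion and variance-scheduler roles concrete, here is a minimal sketch in plain PyTorch of the closed-form noising step from Ho et al. (2020). It is not TorchDiff's internal code: `linear_beta_schedule` and `q_sample` are hypothetical helper names, and the linear schedule from 1e-4 to 0.02 over 1000 steps is the paper's setting, assumed here for illustration.

```python
import torch

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical helper: linearly spaced noise variances (beta_t)."""
    return torch.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, noise=None):
    """Hypothetical helper: draw x_t ~ q(x_t | x_0) in one shot.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    """
    if noise is None:
        noise = torch.randn_like(x0)
    signal_scale = alpha_bar[t].sqrt().view(-1, 1, 1, 1)
    noise_scale = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    return signal_scale * x0 + noise_scale * noise, noise

betas = linear_beta_schedule()
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

x0 = torch.randn(4, 3, 32, 32)           # stand-in batch of images
t = torch.tensor([0, 250, 500, 999])     # one timestep per sample
xt, eps = q_sample(x0, t, alpha_bar)
print(xt.shape)
```

The later the timestep, the smaller `alpha_bar[t]` becomes, so `xt` moves from almost-clean data toward almost-pure Gaussian noise.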
Additional utilities:
- **Noise Predictor**: A U-Net-like model with attention and time embeddings.
- **Text Encoder**: Transformer-based (e.g., BERT) for conditional generation.
- **Metrics**: Evaluation suite including MSE, PSNR, SSIM, FID, and LPIPS.
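
As a rough illustration of what the simplest of these metrics measure, the sketch below computes MSE and PSNR in plain PyTorch. The function name `psnr` and its `max_val` parameter are assumptions made for this example and are not TorchDiff's metrics API; SSIM, FID, and LPIPS need dedicated models or libraries and are not shown.

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Hypothetical helper: peak signal-to-noise ratio in dB.

    PSNR = 10 * log10(max_val^2 / MSE); higher means closer to the target.
    """
    mse = torch.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()

target = torch.zeros(1, 3, 32, 32)
pred = torch.full_like(target, 0.1)  # constant error of 0.1 per pixel
score = psnr(pred, target)
print(f"PSNR: {score:.2f} dB")
```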
---
## ⚡ Quick Start
Here’s a minimal working example that trains and samples with **DDPM** on CIFAR-10:
```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torchdiff.ddpm import VarianceSchedulerDDPM, ForwardDDPM, ReverseDDPM, TrainDDPM, SampleDDPM
from torchdiff.utils import NoisePredictor
# Dataset (CIFAR10 for demo)
transform = transforms.Compose([
    transforms.Resize(32),
    transforms.ToTensor(),
    # Scale each RGB channel from [0, 1] to [-1, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
train_dataset = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# Model components
noise_pred = NoisePredictor(in_channels=3)
vs = VarianceSchedulerDDPM(num_steps=1000)
fwd, rev = ForwardDDPM(vs), ReverseDDPM(vs)
# Optimizer & loss
optim = torch.optim.Adam(noise_pred.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
# Training
trainer = TrainDDPM(
    noise_predictor=noise_pred, forward_diffusion=fwd, reverse_diffusion=rev,
    conditional_model=None, optimizer=optim, objective=loss_fn,
    data_loader=train_loader, max_epochs=1, device="cpu"
)
trainer()
# Sampling
sampler = SampleDDPM(reverse_diffusion=rev, noise_predictor=noise_pred,
                     image_shape=(32, 32), batch_size=4, in_channels=3, device="cpu")
images = sampler()
print("Generated images shape:", images.shape)
```
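
To view or save the generated batch, the tensor usually has to be mapped back to 8-bit pixel values. The sketch below assumes the sampler returns values in roughly the same `[-1, 1]` range as the normalized training data; that range is an assumption, so check the API reference for what `SampleDDPM` actually returns. `to_uint8` is a hypothetical helper, and a random tensor stands in for the real samples.

```python
import torch

def to_uint8(images):
    """Hypothetical helper: map a float batch in [-1, 1] to uint8 in [0, 255]."""
    images = (images.clamp(-1.0, 1.0) + 1.0) / 2.0  # undo Normalize(0.5, 0.5)
    return (images * 255.0).round().to(torch.uint8)

# Stand-in for the output of the sampler in the example above
fake_samples = torch.rand(4, 3, 32, 32) * 2.0 - 1.0
pixels = to_uint8(fake_samples)
print(pixels.shape, pixels.dtype)
```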
For detailed examples, check the [examples/](https://github.com/LoqmanSamani/TorchDiff/tree/systembiology/examples) directory.
---
## 📚 Resources
- 🌐 [Project Website](https://loqmansamani.github.io/torchdiff/)
- 📖 [API Reference](https://torchdiff.readthedocs.io/en/latest/index.html)
---
## ⚡ Installation
Install from **PyPI (recommended):**
```bash
pip install torchdiff
```
Or install from source for development:
```bash
# Clone repository
git clone https://github.com/LoqmanSamani/TorchDiff.git
cd TorchDiff
# Install dependencies
pip install -r requirements.txt
# Install package (use `pip install -e .` instead for an editable development install)
pip install .
```
> Requires **Python 3.8+**. For GPU acceleration, ensure PyTorch is installed with the correct CUDA version.
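
A quick way to confirm which PyTorch build is installed and whether a GPU is visible is sketched below. It uses only standard PyTorch calls; the resulting `device` string can then be passed wherever the Quick Start uses `device="cpu"`.

```python
import torch

# Pick the GPU when PyTorch was built with CUDA and a device is visible
device = "cuda" if torch.cuda.is_available() else "cpu"

print("PyTorch version:", torch.__version__)
print("CUDA build:", torch.version.cuda)  # None for CPU-only builds
print("Using device:", device)
```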
---
## 🧩 Implemented Models
### 1. Denoising Diffusion Probabilistic Models (DDPM)
**Paper**: [Ho et al., 2020](https://arxiv.org/abs/2006.11239)
DDPMs learn to reverse a gradual noise-adding process to generate high-quality images. TorchDiff provides a modular implementation for both unconditional and conditional (text-guided) generation.
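
The reverse process can be sketched as a single ancestral sampling step from Ho et al. (2020), written in plain PyTorch. This is an illustration under stated assumptions and not TorchDiff's `ReverseDDPM` implementation: `ddpm_reverse_step` is a hypothetical name, the variance is fixed to `beta_t` (one of the two choices discussed in the paper), and a zero tensor stands in for the noise predictor's output.

```python
import torch

def ddpm_reverse_step(xt, eps_pred, t, betas, alpha_bar):
    """Hypothetical helper: one step x_t -> x_{t-1} given predicted noise.

    mean = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_t)
    """
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    mean = (xt - beta_t / (1.0 - alpha_bar[t]).sqrt() * eps_pred) / alpha_t.sqrt()
    if t == 0:
        return mean  # no noise is added on the final step
    return mean + beta_t.sqrt() * torch.randn_like(xt)

betas = torch.linspace(1e-4, 0.02, 1000)       # linear schedule from the paper
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

xt = torch.randn(2, 3, 8, 8)
eps_pred = torch.zeros_like(xt)                # placeholder for a trained model
x_prev = ddpm_reverse_step(xt, eps_pred, 500, betas, alpha_bar)
print(x_prev.shape)
```

Sampling repeats this step from `t = T - 1` down to `t = 0`, calling the noise predictor once per step.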
📓 [DDPM Example Notebook](https://github.com/LoqmanSamani/TorchDiff/blob/systembiology/examples/ddpm.ipynb)
---
### 2. Denoising Diffusion Implicit Models (DDIM)
**Paper**: [Song et al., 2021](https://arxiv.org/abs/2010.02502)
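DDIM accelerates sampling by replacing the Markovian reverse chain with a non-Markovian, optionally deterministic one that can skip timesteps, so far fewer denoising steps are needed with little loss in image quality. TorchDiff supports both conditional and unconditional DDIM generation.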
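
The step-skipping idea can be sketched with the deterministic (eta = 0) DDIM update from Song et al. (2021). This is plain PyTorch and not TorchDiff's DDIM classes: `ddim_step` is a hypothetical name, the 50-step subsequence is an arbitrary choice, and the true noise is used as an oracle in place of a trained noise predictor so that the example is self-checking.

```python
import torch

def ddim_step(xt, eps_pred, ab_t, ab_prev):
    """Hypothetical helper: deterministic DDIM update between two noise levels."""
    # Estimate the clean image implied by the current sample and predicted noise
    x0_pred = (xt - (1.0 - ab_t).sqrt() * eps_pred) / ab_t.sqrt()
    # Re-noise that estimate to the (lower) noise level of the previous step
    return ab_prev.sqrt() * x0_pred + (1.0 - ab_prev).sqrt() * eps_pred

betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# Visit only 50 of the 1000 training timesteps, from t = 999 down to t = 0
timesteps = torch.linspace(999, 0, 50).round().long()

x0 = torch.randn(2, 3, 8, 8)
noise = torch.randn_like(x0)
x = alpha_bar[999].sqrt() * x0 + (1.0 - alpha_bar[999]).sqrt() * noise

for i, t in enumerate(timesteps):
    ab_t = alpha_bar[t]
    # After the last timestep the target noise level is zero (alpha_bar = 1)
    ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else torch.tensor(1.0)
    x = ddim_step(x, noise, ab_t, ab_prev)  # oracle noise replaces the model

print(torch.allclose(x, x0, atol=1e-2))
```

With a perfect noise estimate the 50-step trajectory lands back on `x0`; with a trained model the quality depends on how accurate its noise predictions are.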
📓 [DDIM Example Notebook](https://github.com/LoqmanSamani/TorchDiff/blob/systembiology/examples/ddim.ipynb)
---
### 3. Score-Based Generative Modeling through Stochastic Differential Equations (SDE)
**Paper**: [Song et al., 2021](https://arxiv.org/abs/2011.13456)
SDE-based models generalize diffusion to a continuous-time stochastic process. TorchDiff supports the **VE**, **VP**, and **sub-VP** formulations as well as deterministic **ODE** variants, and includes full training and sampling pipelines for both conditional and unconditional use cases.
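
The difference between the VP and VE formulations shows up in their perturbation kernels, sketched below in plain PyTorch following the closed forms in Song et al. (2021). The function names are hypothetical, and the default values (`beta_min=0.1`, `beta_max=20`, `sigma_min=0.01`, `sigma_max=50`) are common settings from that paper, assumed here and not taken from TorchDiff.

```python
import torch

def vp_marginal(x0, t, beta_min=0.1, beta_max=20.0):
    """Hypothetical helper: mean and std of p(x_t | x_0) for the VP SDE."""
    log_coeff = -0.25 * t ** 2 * (beta_max - beta_min) - 0.5 * t * beta_min
    mean = torch.exp(log_coeff).view(-1, 1, 1, 1) * x0   # signal decays
    std = torch.sqrt(1.0 - torch.exp(2.0 * log_coeff))   # variance stays <= 1
    return mean, std

def ve_marginal(x0, t, sigma_min=0.01, sigma_max=50.0):
    """Hypothetical helper: mean and std of p(x_t | x_0) for the VE SDE."""
    std = sigma_min * (sigma_max / sigma_min) ** t        # variance grows
    return x0, std                                        # signal is kept

x0 = torch.ones(2, 3, 4, 4)
t = torch.tensor([0.0, 1.0])  # start and end of the continuous time interval

vp_mean, vp_std = vp_marginal(x0, t)
ve_mean, ve_std = ve_marginal(x0, t)
print("VP std:", vp_std.tolist())
print("VE std:", ve_std.tolist())
```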
📓 [SDE Example Notebook](https://github.com/LoqmanSamani/TorchDiff/blob/systembiology/examples/sde.ipynb)
---
### 4. Latent Diffusion Models (LDM)
**Paper**: [Rombach et al., 2022](https://arxiv.org/abs/2112.10752)
LDMs operate in a compressed latent space using a VAE, enabling **efficient high-resolution image synthesis** with reduced computational cost. TorchDiff supports using DDPM, DDIM, or SDE as the diffusion backbone in latent space.
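
The cost saving comes from how much smaller the latent tensor is than the image. The arithmetic below is a sketch with illustrative values: a spatial downsampling factor of 8 and 4 latent channels are one configuration discussed in Rombach et al. (2022), not necessarily what TorchDiff's VAE uses, and both function names are hypothetical.

```python
def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Hypothetical helper: (C, H, W) of the latent for a given image size."""
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample_factor, width // downsample_factor)

def compression_ratio(height, width, image_channels=3, **kwargs):
    """Hypothetical helper: image elements divided by latent elements."""
    c, h, w = latent_shape(height, width, **kwargs)
    return (image_channels * height * width) / (c * h * w)

shape = latent_shape(256, 256)
ratio = compression_ratio(256, 256)
print(shape)   # (4, 32, 32)
print(ratio)   # 48.0
```

Under these assumed settings the diffusion model processes 48 times fewer values per sample than it would in pixel space.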
📓 [LDM Example Notebook](https://github.com/LoqmanSamani/TorchDiff/blob/systembiology/examples/ldm.ipynb)
---
### 5. UnCLIP (Hierarchical Text-Conditional Image Generation with CLIP Latents)
**Paper**: [Ramesh et al., 2022](https://arxiv.org/abs/2204.06125)
UnCLIP, the architecture behind *DALL·E 2*, leverages **CLIP latents** to enable hierarchical text-to-image generation. It first encodes the text with CLIP; a prior then generates a CLIP image embedding from the text embedding, a diffusion decoder turns that embedding into an image, and upsampling refines the result in pixel space.
Training UnCLIP is significantly more complex than other diffusion families, and thus a minimal example is not shown here.
📓 [UnCLIP Example Notebook](https://github.com/LoqmanSamani/TorchDiff/blob/systembiology/examples/unclip.ipynb)
---
## 🔐 License
Released under the [MIT License](https://github.com/LoqmanSamani/TorchDiff/blob/systembiology/LICENSE).
---
## 🚧 Roadmap / Future Work
TorchDiff is under active development. Planned features include:
- 🧠 New diffusion variants and improved training algorithms.
- ⚡ Faster and more memory-efficient sampling.
- 🎯 Additional utilities to simplify experimentation.
---
## 🤝 Contributing
Contributions are welcome!
- Open an [Issue](https://github.com/LoqmanSamani/TorchDiff/issues) to report bugs or request features.
- Submit a PR with improvements or new features.
Your feedback helps make TorchDiff better for the community.
---
## 📖 Citation
If you use **TorchDiff** in your research or project, please cite the original papers and this repository.
### Core Diffusion Papers
```bibtex
@inproceedings{ho2020denoising,
  title={Denoising Diffusion Probabilistic Models},
  author={Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}
@inproceedings{song2021denoising,
  title={Denoising Diffusion Implicit Models},
  author={Song, Jiaming and Meng, Chenlin and Ermon, Stefano},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}
@inproceedings{song2021score,
  title={Score-Based Generative Modeling through Stochastic Differential Equations},
  author={Song, Yang and Sohl-Dickstein, Jascha and Kingma, Diederik P and Kumar, Abhishek and Ermon, Stefano and Poole, Ben},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}
@inproceedings{rombach2022high,
  title={High-Resolution Image Synthesis with Latent Diffusion Models},
  author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Björn},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
@article{ramesh2022hierarchical,
  title={Hierarchical Text-Conditional Image Generation with CLIP Latents},
  author={Ramesh, Aditya and Dhariwal, Prafulla and Nichol, Alex and Chu, Casey and Chen, Mark},
  journal={arXiv preprint arXiv:2204.06125},
  year={2022}
}
```
### TorchDiff Repository
```bibtex
@misc{torchdiff2025,
  author = {Samani, Loghman},
  title = {TorchDiff: A Modular Diffusion Modeling Library in PyTorch},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LoqmanSamani/TorchDiff}},
}
```