lightning-thunder


Name: lightning-thunder
Version: 0.2.6
Summary: Lightning Thunder is a source-to-source compiler for PyTorch, enabling PyTorch programs to run on different hardware accelerators and graph compilers.
Homepage: https://github.com/Lightning-AI/lightning-thunder
Author: Lightning AI <support@lightning.ai>
License: Apache 2.0
Requires Python: <3.14,>=3.10
Keywords: deep learning, AI, compiler
Upload time: 2025-10-22 04:54:03
            <div align='center'>

# Give your PyTorch models superpowers ⚡

</div>

<div align="center">
<img alt="Thunder" src="https://github.com/Lightning-AI/lightning-thunder/raw/0.2.6/docs/source/_static/images/LightningThunderLightModewByline.png#gh-light-mode-only" width="400px" style="max-width: 100%;">
<img alt="Thunder" src="https://github.com/Lightning-AI/lightning-thunder/raw/0.2.6/docs/source/_static/images/LightningThunderDarkModewByline.png#gh-dark-mode-only" width="400px" style="max-width: 100%;">
<br/>
<br/>

&#160;

<strong>Source-to-source compiler for PyTorch.</strong>
Understandable. Inspectable. Extensible.

</div>

<div align='center'>

<pre>
✅ Run PyTorch 40% faster   ✅ Quantization                ✅ Kernel fusion        
✅ Training recipes         ✅ FP4/FP6/FP8 precision       ✅ Distributed TP/PP/DP 
✅ Inference recipes        ✅ Ready for NVIDIA Blackwell  ✅ CUDA Graphs          
✅ LLMs, non LLMs and more  ✅ Custom Triton kernels       ✅ Compose all the above
</pre>

</div>

Thunder is a source-to-source deep learning compiler for PyTorch that focuses on making it simple to optimize models for training and inference.

It provides:

- a simple, Pythonic IR capturing the entire computation
- a rich system of transforms that simultaneously operate on the computation IR, the model, and the weights
- an extensible dispatch mechanism to fusers and optimized kernel libraries

With Thunder you can:

- profile deep learning programs easily, map individual ops to kernels and inspect programs interactively
- programmatically replace sequences of operations with optimized ones and see the effect on performance
- acquire full computation graphs without graph breaks by flexibly extending the interpreter
- modify programs to fully utilize bleeding edge kernel libraries on specific hardware
- write models for single GPU and transform them to run distributed
- quickly iterate on mixed precision and quantization strategies to search for combinations that minimally affect quality
- bundle all optimizations in composable recipes, so they can be ported across model families

Ultimately, think of Thunder as a highly efficient tool for going from “unoptimized” to “optimized”.

If that is of interest to you, read on to install Thunder and get started quickly.

<div align='center'>

[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/lightning-thunder/blob/main/LICENSE)
[![CI testing](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-testing.yml/badge.svg?event=push)](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-testing.yml)
[![General checks](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-checks.yml/badge.svg?event=push)](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-checks.yml)
[![Documentation Status](https://readthedocs.org/projects/lightning-thunder/badge/?version=latest)](https://lightning-thunder.readthedocs.io/en/latest/?badge=latest)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/Lightning-AI/lightning-thunder/main.svg)](https://results.pre-commit.ci/latest/github/Lightning-AI/lightning-thunder/main)

</div>

<div align="center">
  <div style="text-align: center;">
    <a target="_blank" href="#quick-start" style="margin: 0 10px;">Quick start</a> •
    <a target="_blank" href="#examples" style="margin: 0 10px;">Examples</a> •
    <a target="_blank" href="#performance" style="margin: 0 10px;">Performance</a> •
    <!-- <a target="_blank" href="#hosting-options" style="margin: 0 10px;">Hosting</a> • -->
    <a target="_blank" href="https://lightning.ai/docs/thunder/latest/" style="margin: 0 10px;">Docs</a>
  </div>
</div>

&#160;

<!--
<div align="center">
<a target="_blank" href="https://lightning.ai/docs/thunder/home/get-started">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/get-started-badge.svg" height="36px" alt="Get started"/>
</a>
</div>
-->

&#160;

<div align="center">
<img alt="Thunder" src="https://github.com/Lightning-AI/lightning-thunder/raw/0.2.6/docs/source/_static/images/pretrain_perf.png" width="800px" style="max-width: 100%;">
</div>

# Quick start

Install Thunder via pip ([more options](https://lightning.ai/docs/thunder/latest/fundamentals/installation.html)):

```bash
pip install lightning-thunder

pip install -U torch torchvision
pip install nvfuser-cu128-torch28 nvidia-cudnn-frontend  # if NVIDIA GPU is present
```
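
To verify the install, import the package and print its version (a quick sanity check; this assumes the usual `__version__` attribute is exposed):

```python
import thunder

print(thunder.__version__)  # e.g. 0.2.6 for this release
```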

<details>
  <summary>For older versions of <code>torch</code></summary>

<code>torch==2.7</code> + CUDA 12.8

```bash
pip install lightning-thunder

pip install torch==2.7.0 torchvision==0.22
pip install nvfuser-cu128-torch27 nvidia-cudnn-frontend  # if NVIDIA GPU is present
```

<code>torch==2.6</code> + CUDA 12.6

```bash
pip install lightning-thunder

pip install torch==2.6.0 torchvision==0.21
pip install nvfuser-cu126-torch26 nvidia-cudnn-frontend  # if NVIDIA GPU is present
```

<code>torch==2.5</code> + CUDA 12.4

```bash
pip install lightning-thunder

pip install torch==2.5.0 torchvision==0.20
pip install nvfuser-cu124-torch25 nvidia-cudnn-frontend  # if NVIDIA GPU is present
```

</details>

<details>
  <summary>Advanced install options</summary>

### Install optional executors

```bash
# Float8 support (this will compile from source, be patient)
pip install "transformer_engine[pytorch]"
```

### Install Thunder bleeding edge

```bash
pip install git+https://github.com/Lightning-AI/lightning-thunder.git@main
```

### Install Thunder for development

```bash
git clone https://github.com/Lightning-AI/lightning-thunder.git
cd lightning-thunder
pip install -e .
```

</details>

### Hello world

Define a function or a torch module:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(2048, 4096), nn.ReLU(), nn.Linear(4096, 64))
```

Optimize it with Thunder:

```python
import thunder
import torch

thunder_model = thunder.compile(model)

x = torch.randn(64, 2048)

y = thunder_model(x)

torch.testing.assert_close(y, model(x))
```
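
As noted above, you can compile a plain function as well as a module. A minimal sketch, assuming `thunder.compile` accepts plain callables the same way it accepts modules (the function and tensor names here are illustrative):

```python
import thunder
import torch


def mlp(x, w1, w2):
    # a purely functional two-layer MLP
    return torch.relu(x @ w1) @ w2


compiled_mlp = thunder.compile(mlp)

x = torch.randn(64, 2048)
w1 = torch.randn(2048, 4096)
w2 = torch.randn(4096, 64)

torch.testing.assert_close(compiled_mlp(x, w1, w2), mlp(x, w1, w2))
```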

## Examples

### LLM training

Install LitGPT (without updating other dependencies)

```bash
pip install --no-deps 'litgpt[all]'
```

and run

```python
import thunder
import torch
import litgpt

with torch.device("cuda"):
    model = litgpt.GPT.from_name("Llama-3.2-1B").to(torch.bfloat16)

thunder_model = thunder.compile(model)

inp = torch.ones((1, 2048), device="cuda", dtype=torch.int64)

out = thunder_model(inp)
out.sum().backward()
```
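
From there, a full training step is plain PyTorch, since the compiled module shares the original parameters. A sketch of an optimizer loop (the loss and batch are placeholders, and `model.config.vocab_size` assumes LitGPT's config layout):

```python
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for _ in range(10):
    # random token ids as a stand-in for real batches
    inp = torch.randint(0, model.config.vocab_size, (1, 2048), device="cuda")
    loss = thunder_model(inp).sum()  # placeholder loss, as above
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```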

### HuggingFace BERT inference

Install Hugging Face Transformers (version `4.50.2` or newer is recommended)

```bash
pip install -U transformers
```

and run

```python
import thunder
import torch
import transformers

model_name = "bert-large-uncased"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

with torch.device("cuda"):
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    )
    model.requires_grad_(False)
    model.eval()

    inp = tokenizer(["Hello world!"], return_tensors="pt")

thunder_model = thunder.compile(model)

out = thunder_model(**inp)
print(out)
```

### HuggingFace DeepSeek R1 distill inference

Install Hugging Face Transformers (version `4.50.2` or newer is recommended)

```bash
pip install -U transformers
```

and run

```python
import torch
import transformers
import thunder

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

with torch.device("cuda"):
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    )
    model.requires_grad_(False)
    model.eval()

    inp = tokenizer(["Hello world! Here's a long story"], return_tensors="pt")

thunder_model = thunder.compile(model)

out = thunder_model.generate(
    **inp, do_sample=False, cache_implementation="static", max_new_tokens=100
)
print(out)
```
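
`generate` returns token ids; decode them with the same tokenizer to get text back:

```python
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```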

### Vision Transformer inference

```python
import thunder
import torch
import torchvision as tv

with torch.device("cuda"):
    model = tv.models.vit_b_16()
    model.requires_grad_(False)
    model.eval()

    inp = torch.randn(128, 3, 224, 224)

out = model(inp)

thunder_model = thunder.compile(model)

out = thunder_model(inp)
```
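
To see what compilation buys on a snippet like this, a simple CUDA-event timing comparison works. A sketch (the warm-up calls also trigger compilation on the first invocation):

```python
def time_fn(fn, *args, iters=20):
    for _ in range(3):  # warm up; the first call compiles
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters


print(f"eager:   {time_fn(model, inp):.2f} ms/iter")
print(f"thunder: {time_fn(thunder_model, inp):.2f} ms/iter")
```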

### Benchmarks

Although Thunder is a tool for optimizing models rather than an opaque compiler that gets you speedups out of the box, here is a set of benchmarks.

Performance-wise, out of the box Thunder is in the ballpark of `torch.compile`, especially when using CUDA Graphs. Note, however, that Thunder is not a competitor to `torch.compile`: it can actually use `torch.compile` as one of its fusion executors.

The script `examples/quickstart/hf_llm.py` demonstrates how to benchmark a model for text generation, forward pass, forward pass with loss, and a full forward + backward computation.

On an H100 with `torch==2.8.0`, `nvfuser-cu128-torch28`, and Transformers 4.55.4, running Llama 3.2 1B, we see the following timings:

```
Transformers with torch.compile and CUDAGraphs (reduce-overhead mode):  521ms
Transformers with torch.compile but no CUDAGraphs (default mode):       814ms
Transformers without torch.compile:                                    1493ms
Thunder with CUDAGraphs:                                                542ms
```

## Plugins

Plugins are a way to apply optimizations to a model, such as parallelism and quantization.

Thunder comes with a few plugins included out of the box, but it's easy to write new ones:

- scale up with distributed strategies such as DDP, FSDP, and TP
- optimize numerical precision with FP8, MXFP8
- save memory with quantization
- reduce latency with CUDA Graphs
- debug and profile

For example, to reduce CPU overheads via CUDA Graphs, you can pass "reduce-overhead"
as the `plugins=` argument of `thunder.compile`:

```python
thunder_model = thunder.compile(model, plugins="reduce-overhead")
```

This may or may not make a big difference. The point of Thunder is that you can easily
swap optimizations in and out and explore the best combination for your setup.
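
Plugins are meant to compose. A sketch of stacking two of them, assuming `plugins=` also accepts a list; the `"fp8"` name here is illustrative, not a confirmed plugin identifier:

```python
# "reduce-overhead" is documented above; "fp8" is an illustrative name,
# and passing a list to plugins= is an assumption
thunder_model = thunder.compile(model, plugins=["reduce-overhead", "fp8"])
```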

## How it works

Thunder works in three stages:

1. ⚡️ It acquires your model by interpreting Python bytecode and producing a straight-line Python program

1. ⚡️ It transforms the model and the computation trace, for example to distribute the computation or change its precision

1. ⚡️ It routes parts of the trace for execution to:

   - fusion (`NVFuser`, `torch.compile`)
   - specialized libraries (e.g. `cuDNN SDPA`, `TransformerEngine`)
   - custom Triton and CUDA kernels
   - PyTorch eager operations

&#160;

<div align="center">
<img alt="Thunder" src="https://github.com/Lightning-AI/lightning-thunder/raw/0.2.6/docs/source/_static/images/how_it_works.png" width="800px" style="max-width: 100%;">
</div>

&#160;

This is what the trace looks like for a simple MLP:

```python
import thunder
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 2048), nn.ReLU(), nn.Linear(2048, 256))

thunder_model = thunder.compile(model)
y = thunder_model(torch.randn(4, 1024))

print(thunder.last_traces(thunder_model)[-1])
```

This is the acquired trace, ready to be transformed and executed:

```python
def computation(input, t_0_bias, t_0_weight, t_2_bias, t_2_weight):
    # input: "cuda:0 f32[4, 1024]"
    # t_0_bias: "cuda:0 f32[2048]"
    # t_0_weight: "cuda:0 f32[2048, 1024]"
    # t_2_bias: "cuda:0 f32[256]"
    # t_2_weight: "cuda:0 f32[256, 2048]"
    t3 = ltorch.linear(input, t_0_weight, t_0_bias)  # t3: "cuda:0 f32[4, 2048]"
    t6 = ltorch.relu(t3, False)  # t6: "cuda:0 f32[4, 2048]"
    t10 = ltorch.linear(t6, t_2_weight, t_2_bias)  # t10: "cuda:0 f32[4, 256]"
    return (t10,)
```

Note how Thunder's intermediate representation is just (a subset of) Python!
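
`last_traces` returns the full pipeline of traces, so you can compare the program as acquired with the program as executed; `thunder.last_backward_traces` does the same for the backward pass. A sketch, continuing from the MLP above:

```python
traces = thunder.last_traces(thunder_model)
print(traces[0])   # the trace as acquired, before transforms
print(traces[-1])  # the trace as executed, after transforms and fusion

# backward counterpart (assumes the compile produced a backward trace,
# i.e. the model's parameters require gradients)
print(thunder.last_backward_traces(thunder_model)[-1])
```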

## Performance

Thunder is fast. Here are the speed-ups obtained on a pre-training task using LitGPT on H100 and B200 hardware, relative to PyTorch eager.

<div align="center">
<img alt="Thunder" src="https://github.com/Lightning-AI/lightning-thunder/raw/0.2.6/docs/source/_static/images/pretrain_perf.png" width="800px" style="max-width: 100%;">
</div>

# Community

Thunder is an open-source project developed in collaboration with the community, with significant contributions from NVIDIA.

💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)

            
