Name | xfuser |
Version | 0.4.4 |
home_page | https://github.com/xdit-project/xDiT. |
Summary | A Scalable Inference Engine for Diffusion Transformers (DiTs) on Multiple Computing Devices |
upload_time | 2025-07-25 09:19:06 |
maintainer | None |
docs_url | None |
author | xDiT Team |
requires_python | >=3.10 |
license | None |
<div align="center">
<p align="center">
<picture>
<img alt="xDiT" src="https://raw.githubusercontent.com/xdit-project/xdit_assets/main/XDiTlogo.png" width="50%">
</picture>
</p>
<h3>A Scalable Inference Engine for Diffusion Transformers (DiTs) on Multiple Computing Devices</h3>
<a href="#cite-us">📝 Papers</a> | <a href="#QuickStart">🚀 Quick Start</a> | <a href="#support-dits">🎯 Supported DiTs</a> | <a href="#dev-guide">📚 Dev Guide</a> | <a href="https://github.com/xdit-project/xDiT/discussions">📈 Discussion</a> | <a href="https://medium.com/@xditproject">📝 Blogs</a>
<p></p>
[Discord](https://discord.gg/YEWzWfCF9S)
</div>
<h2 id="agenda">Table of Contents</h2>
- [🔥 Meet xDiT](#meet-xdit)
- [📢 Open-source Community](#updates)
- [🎯 Supported DiTs](#support-dits)
- [📈 Performance](#perf)
- [🚀 QuickStart](#QuickStart)
- [🖼️ ComfyUI with xDiT](#comfyui)
- [✨ xDiT's Arsenal](#secrets)
  - [Parallel Methods](#parallel)
    - [1. PipeFusion](#PipeFusion)
    - [2. Unified Sequence Parallel](#USP)
    - [3. Hybrid Parallel](#hybrid_parallel)
    - [4. CFG Parallel](#cfg_parallel)
    - [5. Parallel VAE](#parallel_vae)
  - [Single GPU Acceleration](#1gpuacc)
    - [Compilation Acceleration](#compilation)
    - [Cache Acceleration](#cache_acceleration)
- [📚 Develop Guide](#dev-guide)
- [🚧 History and Looking for Contributions](#history)
- [📝 Cite Us](#cite-us)
<h2 id="meet-xdit">🔥 Meet xDiT</h2>
Diffusion Transformers (DiTs) are driving advancements in high-quality image and video generation.
With the escalating input context length in DiTs, the computational demand of the Attention mechanism grows **quadratically**!
Consequently, multi-GPU and multi-machine deployments are essential to meet the **real-time** requirements in online services.
<h3 id="meet-xdit-parallel">Parallel Inference</h3>
To meet the real-time demands of DiT applications, parallel inference is a must.
xDiT is an inference engine designed for the parallel deployment of DiTs on a large scale.
xDiT provides a suite of efficient parallel approaches for Diffusion Models, as well as computation accelerations.
The overview of xDiT is shown as follows.
<picture>
<img alt="xDiT" src="https://raw.githubusercontent.com/xdit-project/xdit_assets/main/methods/xdit_overview.png">
</picture>
1. Sequence Parallelism: [USP](https://arxiv.org/abs/2405.07719) is a unified sequence parallel approach that we proposed, combining DeepSpeed-Ulysses and Ring-Attention.
2. [PipeFusion](https://arxiv.org/abs/2405.14430): a sequence-level pipeline parallelism, similar to [TeraPipe](https://arxiv.org/abs/2102.07988), that additionally takes advantage of the input temporal redundancy of diffusion models.
3. Data Parallel: processes multiple prompts, or generates multiple images from a single prompt, in parallel across images.
4. CFG Parallel, also known as Split Batch: activates when classifier-free guidance (CFG) is used, with a constant parallel degree of 2.
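To make item 4 concrete, here is a toy sketch of why CFG parallelism has a fixed degree of 2: classifier-free guidance needs exactly two forward passes per step (conditional and unconditional), and they are independent until the final combination. The function names `denoise`, `cfg_combine`, and `cfg_parallel_step` are hypothetical, the "model" is a stand-in arithmetic function, and two threads play the role of two devices. This is not xDiT's implementation.

```python
# Illustrative sketch only: a toy stand-in, not xDiT code.
from concurrent.futures import ThreadPoolExecutor


def denoise(latent, prompt_embedding):
    """Hypothetical toy 'model' call; a real DiT forward pass goes here."""
    return [x * 0.5 + prompt_embedding for x in latent]


def cfg_combine(uncond, cond, guidance_scale):
    """Standard classifier-free guidance: uncond + s * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


def cfg_parallel_step(latent, cond_emb, uncond_emb, guidance_scale):
    """Run the two CFG branches concurrently, then merge their outputs."""
    # Two workers stand in for the two devices of the CFG parallel group.
    with ThreadPoolExecutor(max_workers=2) as pool:
        cond_future = pool.submit(denoise, latent, cond_emb)
        uncond_future = pool.submit(denoise, latent, uncond_emb)
        return cfg_combine(uncond_future.result(), cond_future.result(),
                           guidance_scale)


if __name__ == "__main__":
    print(cfg_parallel_step([2.0, 4.0], cond_emb=1.0, uncond_emb=0.0,
                            guidance_scale=3.0))
```

Because there are only two branches, adding more devices to this dimension cannot help, which is why the remaining devices are assigned to the other parallel methods.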
The four parallel methods in xDiT can be configured in a hybrid manner, optimizing communication patterns to best suit the underlying network hardware.
xDiT offers a set of APIs that adapt DiT models in [huggingface/diffusers](https://github.com/huggingface/diffusers) to a hybrid parallel implementation through simple wrappers.
If the model you need is not available in the model zoo, adding it yourself is not difficult; please refer to our [Dev Guide](#dev-guide).
We have also implemented the following parallel strategies for reference:
1. Tensor Parallelism
2. [DistriFusion](https://arxiv.org/abs/2402.19481)
<h3 id="meet-xdit-cache">Cache Acceleration</h3>
Cache methods, including [TeaCache](https://github.com/ali-vilab/TeaCache.git), [First-Block-Cache](https://github.com/chengzeyi/ParaAttention.git), and [DiTFastAttn](https://github.com/thu-nics/DiTFastAttn), exploit computational redundancies between different steps of the diffusion model to accelerate inference on a single GPU.
<h3 id="meet-xdit-perf">Computing Acceleration</h3>
Computing optimization is orthogonal to parallelism and focuses on accelerating performance on a single GPU.
xDiT employs a series of kernel acceleration methods. In addition to utilizing well-known attention optimization libraries, we leverage compilation acceleration technologies such as `torch.compile` and `onediff`.
<h2 id="updates">📢 Open-source Community </h2>
The following open-source DiT models were released with xDiT support on day one.
[HunyuanVideo](https://github.com/Tencent/HunyuanVideo) 
[StepVideo](https://github.com/stepfun-ai/Step-Video-T2V) 
[SkyReels-V1](https://github.com/SkyworkAI/SkyReels-V1) 
[Wan2.1](https://github.com/Wan-Video/Wan2.1) 
<h2 id="support-dits">🎯 Supported DiTs</h2>
<div align="center">
| Model Name | CFG | SP | PipeFusion | TP | Performance Report Link |
| --- | --- | --- | --- | --- | --- |
| [🎬 StepVideo](https://huggingface.co/stepfun-ai/stepvideo-t2v) | NA | ✔️ | ❎ | ✔️ | [Report](./docs/performance/stepvideo.md) |
| [🎬 HunyuanVideo](https://github.com/Tencent/HunyuanVideo) | NA | ✔️ | ❎ | ❎ | [Report](./docs/performance/hunyuanvideo.md) |
| [🎬 ConsisID-Preview](https://github.com/PKU-YuanGroup/ConsisID) | ✔️ | ✔️ | ❎ | ❎ | [Report](./docs/performance/consisid.md) |
| [🎬 CogVideoX1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) | ✔️ | ✔️ | ❎ | ❎ | [Report](./docs/performance/cogvideo.md) |
| [🎬 Mochi-1](https://github.com/xdit-project/mochi-xdit) | ✔️ | ✔️ | ❎ | ❎ | [Report](https://github.com/xdit-project/mochi-xdit) |
| [🎬 CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) | ✔️ | ✔️ | ❎ | ❎ | [Report](./docs/performance/cogvideo.md) |
| [🎬 Latte](https://huggingface.co/maxin-cn/Latte-1) | ❎ | ✔️ | ❎ | ❎ | [Report](./docs/performance/latte.md) |
| [🔵 HunyuanDiT-v1.2-Diffusers](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers) | ✔️ | ✔️ | ✔️ | ❎ | [Report](./docs/performance/hunyuandit.md) |
| [🟠 Flux](https://huggingface.co/black-forest-labs/FLUX.1-schnell) | NA | ✔️ | ✔️ | ❎ | [Report](./docs/performance/flux.md) |
| [🔴 PixArt-Sigma](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS) | ✔️ | ✔️ | ✔️ | ❎ | [Report](./docs/performance/pixart_alpha_legacy.md) |
| [🟢 PixArt-alpha](https://huggingface.co/PixArt-alpha/PixArt-alpha) | ✔️ | ✔️ | ✔️ | ❎ | [Report](./docs/performance/pixart_alpha_legacy.md) |
| [🟠 Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers) | ✔️ | ✔️ | ✔️ | ❎ | [Report](./docs/performance/sd3.md) |
| [🟤 SANA](https://github.com/NVlabs/Sana/blob/main/asset/docs/model_zoo.md) | ✔️ | ✔️ | ✔️ | ❎ | [Report](./docs/performance/sana.md) |
| [⚫ SANA Sprint](https://github.com/NVlabs/Sana/blob/main/asset/docs/model_zoo.md#sana-sprint) | NA | ✔️ | ❎ | ❎ | NA |
| [🟣 SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) | ✔️ | ❎ | ❎ | ❎ | NA |
</div>
[🔴 DiT-XL](https://huggingface.co/facebook/DiT-XL-2-256) is supported by the legacy version only, which includes DistriFusion and Tensor Parallel as standalone parallel strategies.
<h2 id="comfyui">🖼️ TACO-DiT: ComfyUI with xDiT</h2>
ComfyUI is the most popular web-based diffusion model interface optimized for workflows.
It provides users with a UI platform for image generation, supporting plugins like LoRA, ControlNet, and IPAdaptor. Yet its native single-GPU design leaves it struggling with the demands of today's large DiTs, resulting in unacceptably high latency for users of models like Flux.1.
Using our commercial project **TACO-DiT**, a closed-source ComfyUI variant built with xDiT, we've successfully implemented a multi-GPU parallel processing workflow within ComfyUI, effectively addressing Flux.1's performance challenges. Below is an example of using TACO-DiT to accelerate a Flux workflow with LoRA:

By using TACO-DiT, you can significantly reduce your ComfyUI workflow inference latency and boost throughput with multiple GPUs. It is now compatible with multiple plugins, including ControlNet and LoRAs.
More features and details can be found in our Intro Video:
+ [[YouTube] TACO-DiT: Accelerating Your ComfyUI Generation Experience](https://www.youtube.com/watch?v=7DXnGrARqys)
+ [[Bilibili] TACO-DiT: Accelerate Your ComfyUI Generation Experience](https://www.bilibili.com/video/BV18tU7YbEra/?vd_source=59c1f990379162c8f596974f34224e4f)
The blog article is also available: [Supercharge Your AIGC Experience: Leverage xDiT for Multiple GPU Parallel in ComfyUI Flux.1 Workflow](https://medium.com/@xditproject/supercharge-your-aigc-experience-leverage-xdit-for-multiple-gpu-parallel-in-comfyui-flux-1-54b34e4bca05).
<h2 id="QuickStart">🚀 QuickStart</h2>
### 1. Install from pip
We set `diffusers` and `flash_attn` as two optional installation requirements.
About `diffusers` version:
- If you only use the USP interface, `diffusers` is not required. Models are typically released as `nn.Module` first, before being integrated into diffusers. xDiT is sometimes applied as a USP plugin to existing projects.
- Different models may require different diffusers versions. Model implementations can vary between diffusers versions, especially for the latest models, which affects parallel processing. When encountering model execution errors, you may need to try several recent diffusers versions.
- While we specify a diffusers version in `setup.py`, newer models may require later versions or even need to be installed from the main branch.
About `flash_attn` version:
- Without `flash_attn` installed, xDiT falls back to a PyTorch implementation of ring attention, which helps NPU users with compatibility.
- However, not using `flash_attn` on GPUs may result in suboptimal performance. For best GPU performance, we strongly recommend installing `flash_attn`.
```
pip install xfuser # Basic installation
pip install "xfuser[diffusers,flash-attn]" # With both diffusers and flash attention
```
### 2. Install from source
```
pip install -e .
# Or optionally, with diffusers and flash attention
pip install -e ".[diffusers,flash-attn]"
```
Note that we use two self-maintained packages:
1. [yunchang](https://github.com/feifeibear/long-context-attention)
2. [DistVAE](https://github.com/xdit-project/DistVAE)
The [flash_attn](https://github.com/Dao-AILab/flash-attention) version used with yunchang should be >= 2.6.0.
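Before launching a job, it can be useful to check which attention path your environment will take. The snippet below is a minimal sketch, not part of xDiT: `parse_version`, `meets_minimum`, and `flash_attn_status` are hypothetical helper names, and it assumes the package is registered under the distribution name `flash-attn`. It only reads installed package metadata from the standard library and does not import `flash_attn` itself.

```python
# Illustrative sketch only: helper names are hypothetical, not xDiT APIs.
from importlib import metadata

MIN_FLASH_ATTN = (2, 6, 0)  # minimum version stated in this README


def parse_version(text):
    """Return the first three numeric components, e.g. '2.6.3.post1' -> (2, 6, 3)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad short versions such as '2.6'
        parts.append(0)
    return tuple(parts[:3])


def meets_minimum(version_text, minimum=MIN_FLASH_ATTN):
    """True when the version string is at least the required minimum."""
    return parse_version(version_text) >= minimum


def flash_attn_status():
    """Describe the local flash_attn installation in one line."""
    try:
        # Assumption: the distribution is registered as 'flash-attn'.
        installed = metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        return "flash_attn not installed: expect the PyTorch fallback path"
    if meets_minimum(installed):
        return f"flash_attn {installed} satisfies >= 2.6.0"
    return f"flash_attn {installed} is older than 2.6.0"


print(flash_attn_status())
```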
### 3. Docker
We provide a docker image for developers to develop with xDiT. The docker image is [thufeifeibear/xdit-dev](https://hub.docker.com/r/thufeifeibear/xdit-dev).
### 4. Usage
We provide examples demonstrating how to run models with xDiT in the [./examples/](./examples/) directory.
You can easily modify the model type, model directory, and parallel options in the [examples/run.sh](examples/run.sh) script to run the DiT models that are already supported.
```bash
bash examples/run.sh
```
Combining multiple parallelism techniques is essential for efficient scaling.
It's important that **the product of all parallel degrees matches the number of devices**.
Note that `use_cfg_parallel` means a CFG parallel degree of 2. For instance, you can combine CFG, PipeFusion, and sequence parallelism with the command below to generate an image of a cute dog through hybrid parallelism.
Here `ulysses_degree * pipefusion_parallel_degree * cfg_degree` (set by `use_cfg_parallel`) equals the number of devices, which is 8.
```bash
torchrun --nproc_per_node=8 \
examples/pixartalpha_example.py \
--model models/PixArt-XL-2-1024-MS \
--pipefusion_parallel_degree 2 \
--ulysses_degree 2 \
--num_inference_steps 20 \
--warmup_steps 0 \
--prompt "A cute dog" \
--use_cfg_parallel
```
⚠️ Applying PipeFusion requires setting `warmup_steps` (also required in DistriFusion), typically to a small number compared with `num_inference_steps`.
Warmup steps reduce the efficiency of PipeFusion because they cannot be executed in parallel and therefore degrade to serial execution.
We observed that a warmup of 0 had no effect on the PixArt model.
Users can tune this value according to their specific tasks.
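The device-count rule above is easy to get wrong when several degrees are combined, so a small pre-flight check can help. The sketch below is an assumption-laden illustration rather than xDiT code: `check_parallel_config` is a hypothetical helper, and the `ring_degree` and `data_parallel_degree` parameters are included only as placeholders for the ring-attention and data-parallel dimensions mentioned earlier.

```python
# Illustrative sketch only: check_parallel_config is hypothetical, not an xDiT API.
from math import prod


def check_parallel_config(world_size, ulysses_degree=1, ring_degree=1,
                          pipefusion_parallel_degree=1,
                          use_cfg_parallel=False, data_parallel_degree=1):
    """Raise ValueError unless the product of all degrees equals world_size."""
    degrees = {
        "ulysses": ulysses_degree,
        "ring": ring_degree,
        "pipefusion": pipefusion_parallel_degree,
        "cfg": 2 if use_cfg_parallel else 1,  # CFG parallel is always degree 2
        "data": data_parallel_degree,
    }
    total = prod(degrees.values())
    if total != world_size:
        raise ValueError(
            f"product of parallel degrees is {total}, "
            f"but {world_size} devices were requested: {degrees}"
        )
    return degrees


if __name__ == "__main__":
    # Mirrors the torchrun command above: 2 (ulysses) * 2 (pipefusion) * 2 (cfg) = 8.
    print(check_parallel_config(8, ulysses_degree=2,
                                pipefusion_parallel_degree=2,
                                use_cfg_parallel=True))
```

Running such a check before `torchrun` gives a readable error when, for example, `--use_cfg_parallel` is dropped but `--nproc_per_node` is left at 8.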
### 5. Launch an HTTP Service
You can also launch an HTTP service to generate images with xDiT.
[Launching a Text-to-Image Http Service](./docs/developer/Http_Service.md)
<h2 id="dev-guide">📚 Develop Guide</h2>
We provide a step-by-step guide for adding new models; please refer to the following tutorial.
[Apply xDiT to new models](./docs/developer/adding_models/readme.md)
A high-level design of the xDiT framework is provided below, which may help you understand how it works.
[The implementation and design of the xDiT framework](./docs/developer/The_implement_design_of_xdit_framework.md)
<h2 id="secrets">✨ xDiT's Arsenal</h2>
The performance of xDiT is attributed to two key facets.
First, it leverages parallelization techniques, including innovations such as USP, PipeFusion, and hybrid parallelism, to scale DiT inference to large numbers of devices.
Second, we employ compilation technologies to enhance execution on GPUs, integrating established solutions like `torch.compile` and `onediff` to optimize xDiT's performance.
<h3 id="parallel">1. Parallel Methods</h3>
As illustrated in the accompanying image, xDiT offers a comprehensive set of parallelization techniques. For the DiT backbone, the foundational methods (Data, USP, PipeFusion, and CFG parallel) operate in a hybrid fashion. In addition, two distinct methods, Tensor Parallel and DistriFusion, function independently.
For the VAE module, xDiT offers a parallel implementation, [DistVAE](https://github.com/xdit-project/DistVAE), designed to prevent out-of-memory (OOM) issues.
The (<span style="color: red;">xDiT</span>) label highlights the methods first proposed by us.
<div align="center">
<img src="https://raw.githubusercontent.com/xdit-project/xdit_assets/main/methods/xdit_method.png" alt="xdit methods">
</div>
The communication and memory costs of the aforementioned intra-image parallel methods in DiTs are detailed in the table below; CFG and DP are excluded because they are inter-image parallel. (* denotes that communication can be overlapped with computation.)
As we can see, PipeFusion and Sequence Parallel achieve the lowest communication cost on different scales and hardware configurations, making them suitable foundational components for a hybrid approach.
$p$: Number of pixels;\
$hs$: Model hidden size;\
$L$: Number of model layers;\
$P$: Total model parameters;\
$N$: Number of parallel devices;\
$M$: Number of patch splits;\
$QO$: Query and Output parameter count;\
$KV$: KV Activation parameter count;\
$A = Q = O = K = V$: Equal parameters for Attention, Query, Output, Key, and Value;
| | attn-KV | communication cost | param memory | activations memory | extra buff memory |
|:-------------------------:|:-------:|:----------------------------:|:--------------:|:------------------------------:|:----------------------------------:|
| Tensor Parallel | fresh | $4O(p \times hs)L$ | $\frac{1}{N}P$ | $\frac{2}{N}A = \frac{1}{N}QO$ | $\frac{2}{N}A = \frac{1}{N}KV$ |
| DistriFusion* | stale | $2O(p \times hs)L$ | $P$ | $\frac{2}{N}A = \frac{1}{N}QO$ | $2AL = (KV)L$ |
| Ring Sequence Parallel* | fresh | $2O(p \times hs)L$ | $P$ | $\frac{2}{N}A = \frac{1}{N}QO$ | $\frac{2}{N}A = \frac{1}{N}KV$ |
| Ulysses Sequence Parallel | fresh | $\frac{4}{N}O(p \times hs)L$ | $P$ | $\frac{2}{N}A = \frac{1}{N}QO$ | $\frac{2}{N}A = \frac{1}{N}KV$ |
| PipeFusion* | stale- | $2O(p \times hs)$ | $\frac{1}{N}P$ | $\frac{2}{M}A = \frac{1}{M}QO$ | $\frac{2L}{N}A = \frac{1}{N}(KV)L$ |
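To get a feel for the communication column, the sketch below evaluates each formula's multiplier on the shared $O(p \times hs)$ term. The function name `communication_multipliers` and the sample values (28 layers, 8 devices) are illustrative assumptions, not measurements or xDiT APIs; the coefficients are copied from the table above.

```python
# Illustrative sketch only: evaluates the table's coefficients, nothing more.
def communication_multipliers(num_layers, num_devices):
    """Return each method's multiplier on the O(p x hs) communication term."""
    L, N = num_layers, num_devices
    return {
        "Tensor Parallel": 4 * L,
        "DistriFusion": 2 * L,
        "Ring Sequence Parallel": 2 * L,
        "Ulysses Sequence Parallel": 4 * L / N,
        "PipeFusion": 2,  # independent of the number of layers
    }


if __name__ == "__main__":
    # Hypothetical model size chosen only for illustration.
    costs = communication_multipliers(num_layers=28, num_devices=8)
    for method, cost in sorted(costs.items(), key=lambda item: item[1]):
        print(f"{method:28s} {cost:g} x O(p*hs)")
```

With these sample values, PipeFusion's multiplier stays at 2 while the other methods scale with the layer count, and Ulysses shrinks as devices are added. Note that the starred methods can additionally overlap communication with computation, which this arithmetic does not capture.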
<h4 id="PipeFusion">1.1. PipeFusion</h4>
[PipeFusion: Displaced Patch Pipeline Parallelism for Diffusion Models](./docs/methods/pipefusion.md)
<h4 id="USP">1.2. USP: Unified Sequence Parallelism</h4>
[USP: A Unified Sequence Parallelism Approach for Long Context Generative AI](./docs/methods/usp.md)
<h4 id="hybrid_parallel">1.3. Hybrid Parallel</h4>
[Hybrid Parallelism](./docs/methods/hybrid.md)
<h4 id="cfg_parallel">1.4. CFG Parallel</h4>
[CFG Parallel](./docs/methods/cfg_parallel.md)
<h4 id="parallel_vae">1.5. Parallel VAE</h4>
[Patch Parallel VAE](./docs/methods/parallel_vae.md)
<h3 id="1gpuacc">Single GPU Acceleration</h3>
<h4 id="compilation">Compilation Acceleration</h4>
We utilize two compilation acceleration techniques, [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) and [onediff](https://github.com/siliconflow/onediff), to enhance runtime speed on GPUs. These compilation accelerations are used in conjunction with parallelization methods.
We employ the nexfort backend of onediff. Please install it before use:
```
pip install onediff
pip install -U nexfort
```
For usage instructions, refer to [examples/run.sh](./examples/run.sh). Simply append `--use_torch_compile` or `--use_onediff` to your command. Note that these options are mutually exclusive, and their performance varies across different scenarios.
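If you wrap the example scripts in your own launcher, the mutual exclusivity can be enforced at argument-parsing time. The following is a minimal sketch using the standard library's `argparse`; `build_parser` is a hypothetical helper and this is not how xDiT's own argument parser is necessarily written. Only the two flag names come from this README.

```python
# Illustrative sketch only: a hypothetical wrapper parser, not xDiT's parser.
import argparse


def build_parser():
    """Build a parser that accepts at most one compilation backend flag."""
    parser = argparse.ArgumentParser(description="compile backend selection")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--use_torch_compile", action="store_true",
                       help="enable torch.compile acceleration")
    group.add_argument("--use_onediff", action="store_true",
                       help="enable onediff (nexfort backend) acceleration")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    backend = ("torch.compile" if args.use_torch_compile
               else "onediff" if args.use_onediff
               else "none")
    print(f"compilation backend: {backend}")
```

Passing both flags makes `argparse` exit with a usage error instead of silently picking one backend.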
<h4 id="cache_acceleration">Cache Acceleration</h4>
You can use `--use_teacache` or `--use_fbcache` in examples/run.sh, which apply TeaCache and First-Block-Cache respectively.
Note that the cache methods are only supported for the FLUX model with USP. They are currently not applicable to PipeFusion.
xDiT also provides DiTFastAttn for single-GPU acceleration. It can reduce the computation cost of attention layers by leveraging redundancies between different steps of the diffusion model.
[DiTFastAttn: Attention Compression for Diffusion Transformer Models](./docs/methods/ditfastattn.md)
<h2 id="history">🚧 History and Looking for Contributions</h2>
We conducted a major upgrade of this project in August 2024, introducing a new set of APIs that are now the preferred choice for all users.
The legacy APIs were used in the early stage of xDiT to explore and compare different parallelization methods.
They are located in the [legacy](https://github.com/xdit-project/xDiT/tree/legacy) branch, are now considered outdated, and do not support hybrid parallelism. Despite this limitation, they offer a broader range of individual parallelization methods, including PipeFusion, Sequence Parallel, DistriFusion, and Tensor Parallel.
If you work with PixArt models, you can still run the examples in the [scripts/](https://github.com/xdit-project/xDiT/tree/legacy/scripts) directory under the `legacy` branch. For all other models, we strongly recommend adopting the formal APIs to ensure optimal performance and compatibility.
We also warmly welcome developers to join us in enhancing the project. If you have ideas for new features or models, please share them in our [issues](https://github.com/xdit-project/xDiT/issues). Your contributions are invaluable in driving the project forward and ensuring it meets the needs of the community.
<h2 id="cite-us">📝 Cite Us</h2>
[xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism](https://arxiv.org/abs/2411.01738)
```
@article{fang2024xdit,
title={xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism},
author={Fang, Jiarui and Pan, Jinzhe and Sun, Xibo and Li, Aoyu and Wang, Jiannan},
journal={arXiv preprint arXiv:2411.01738},
year={2024}
}
```
[PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference](https://arxiv.org/abs/2405.14430)
```
@article{fang2024pipefusion,
title={PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference},
author={Jiarui Fang and Jinzhe Pan and Jiannan Wang and Aoyu Li and Xibo Sun},
journal={arXiv preprint arXiv:2405.14430},
year={2024}
}
```
[USP: A Unified Sequence Parallelism Approach for Long Context Generative AI](https://arxiv.org/abs/2405.07719)
```
@article{fang2024unified,
title={A Unified Sequence Parallelism Approach for Long Context Generative AI},
author={Fang, Jiarui and Zhao, Shangchun},
journal={arXiv preprint arXiv:2405.07719},
year={2024}
}
```
[Unveiling Redundancy in Diffusion Transformers (DiTs): A Systematic Study](https://arxiv.org/abs/2411.13588)
```
@article{sun2024unveiling,
title={Unveiling Redundancy in Diffusion Transformers (DiTs): A Systematic Study},
author={Sun, Xibo and Fang, Jiarui and Li, Aoyu and Pan, Jinzhe},
journal={arXiv preprint arXiv:2411.13588},
year={2024}
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/xdit-project/xDiT.",
"name": "xfuser",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": "xDiT Team",
"author_email": "fangjiarui123@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d1/9a/5ef30ee1938553ef1f8253c03ccdd1da7592182f427274e60296ea1ee3bf/xfuser-0.4.4.tar.gz",
"platform": null,
"description": "<div align=\"center\">\n <!-- <h1>KTransformers</h1> -->\n <p align=\"center\">\n\n <picture>\n <img alt=\"xDiT\" src=\"https://raw.githubusercontent.com/xdit-project/xdit_assets/main/XDiTlogo.png\" width=\"50%\">\n\n </p>\n <h3>A Scalable Inference Engine for Diffusion Transformers (DiTs) on Multiple Computing Devices</h3>\n <a href=\"#cite-us\">\ud83d\udcdd Papers</a> | <a href=\"#QuickStart\">\ud83d\ude80 Quick Start</a> | <a href=\"#support-dits\">\ud83c\udfaf Supported DiTs</a> | <a href=\"#dev-guide\">\ud83d\udcda Dev Guide </a> | <a href=\"https://github.com/xdit-project/xDiT/discussions\">\ud83d\udcc8 Discussion </a> | <a href=\"https://medium.com/@xditproject\">\ud83d\udcdd Blogs</a></strong>\n <p></p>\n\n[](https://discord.gg/YEWzWfCF9S)\n\n</div>\n\n<h2 id=\"agenda\">Table of Contents</h2>\n\n- [\ud83d\udd25 Meet xDiT](#meet-xdit)\n- [\ud83d\udce2 Open-source Community](#updates)\n- [\ud83c\udfaf Supported DiTs](#support-dits)\n- [\ud83d\udcc8 Performance](#perf)\n- [\ud83d\ude80 QuickStart](#QuickStart)\n- [\ud83d\uddbc\ufe0f ComfyUI with xDiT](#comfyui)\n- [\u2728 xDiT's Arsenal](#secrets)\n - [Parallel Methods](#parallel)\n - [1. PipeFusion](#PipeFusion)\n - [2. Unified Sequence Parallel](#USP)\n - [3. Hybrid Parallel](#hybrid_parallel)\n - [4. CFG Parallel](#cfg_parallel)\n - [5. Parallel VAE](#parallel_vae)\n - [Single GPU Acceleration](#1gpuacc)\n - [Compilation Acceleration](#compilation)\n - [Cache Acceleration](#cache_acceleration)\n- [\ud83d\udcda Develop Guide](#dev-guide)\n- [\ud83d\udea7 History and Looking for Contributions](#history)\n- [\ud83d\udcdd Cite Us](#cite-us)\n\n\n<h2 id=\"meet-xdit\">\ud83d\udd25 Meet xDiT</h2>\n\nDiffusion Transformers (DiTs) are driving advancements in high-quality image and video generation. \nWith the escalating input context length in DiTs, the computational demand of the Attention mechanism grows **quadratically**! 
\nConsequently, multi-GPU and multi-machine deployments are essential to meet the **real-time** requirements in online services.\n\n\n<h3 id=\"meet-xdit-parallel\">Parallel Inference</h3>\n\nTo meet real-time demand for DiTs applications, parallel inference is a must.\nxDiT is an inference engine designed for the parallel deployment of DiTs on a large scale. \nxDiT provides a suite of efficient parallel approaches for Diffusion Models, as well as computation accelerations.\n\nThe overview of xDiT is shown as follows.\n\n<picture>\n <img alt=\"xDiT\" src=\"https://raw.githubusercontent.com/xdit-project/xdit_assets/main/methods/xdit_overview.png\">\n</picture>\n\n\n1. Sequence Parallelism, [USP](https://arxiv.org/abs/2405.07719) is a unified sequence parallel approach proposed by us combining DeepSpeed-Ulysses, Ring-Attention.\n\n2. [PipeFusion](https://arxiv.org/abs/2405.14430), a sequence-level pipeline parallelism, similar to [TeraPipe](https://arxiv.org/abs/2102.07988) but takes advantage of the input temporal redundancy characteristics of diffusion models.\n\n3. Data Parallel: Processes multiple prompts or generates multiple images from a single prompt in parallel across images.\n\n4. CFG Parallel, also known as Split Batch: Activates when using classifier-free guidance (CFG) with a constant parallelism of 2.\n\nThe four parallel methods in xDiT can be configured in a hybrid manner, optimizing communication patterns to best suit the underlying network hardware.\n\nAs shown in the following picture, xDiT offers a set of APIs to adapt DiT models in [huggingface/diffusers](https://github.com/huggingface/diffusers) to hybrid parallel implementation through simple wrappers. \nIf the model you require is not available in the model zoo, developing it by yourself is not so difficult; please refer to our [Dev Guide](#dev-guide).\n\nWe also have implemented the following parallel strategies for reference:\n\n1. Tensor Parallelism\n2. 
[DistriFusion](https://arxiv.org/abs/2402.19481)\n\n<h3 id=\"meet-xdit-cache\">Cache Acceleration</h3>\n\nCache method, including [TeaCache](https://github.com/ali-vilab/TeaCache.git), [First-Block-Cache](https://github.com/chengzeyi/ParaAttention.git) and [DiTFastAttn](https://github.com/thu-nics/DiTFastAttn), which exploits computational redundancies between different steps of the Diffusion Model to accelerate inference on a single GPU.\n\n<h3 id=\"meet-xdit-perf\">Computing Acceleration</h3>\n\nOptimization is orthogonal to parallel and focuses on accelerating performance on a single GPU.\n\nFirst, xDiT employs a series of kernel acceleration methods. In addition to utilizing well-known Attention optimization libraries, we leverage compilation acceleration technologies such as `torch.compile` and `onediff`.\n\n\n<h2 id=\"updates\">\ud83d\udce2 Open-source Community </h2>\n\nThe following open-sourced DiT Models are released with xDiT in day 1.\n\n[HunyuanVideo](https://github.com/Tencent/HunyuanVideo) \n\n[StepVideo](https://github.com/stepfun-ai/Step-Video-T2V) \n\n[SkyReels-V1](https://github.com/SkyworkAI/SkyReels-V1) \n\n[Wan2.1](https://github.com/Wan-Video/Wan2.1) \n\n\n\n<h2 id=\"support-dits\">\ud83c\udfaf Supported DiTs</h2>\n\n<div align=\"center\">\n\n| Model Name | CFG | SP | PipeFusion | TP | Performance Report Link |\n| --- | --- | --- | --- | --- | --- |\n| [\ud83c\udfac StepVideo](https://huggingface.co/stepfun-ai/stepvideo-t2v) | NA | \u2714\ufe0f | \u274e | \u2714\ufe0f | [Report](./docs/performance/stepvideo.md) |\n| [\ud83c\udfac HunyuanVideo](https://github.com/Tencent/HunyuanVideo) | NA | \u2714\ufe0f | \u274e | \u274e | [Report](./docs/performance/hunyuanvideo.md) |\n| [\ud83c\udfac ConsisID-Preview](https://github.com/PKU-YuanGroup/ConsisID) | \u2714\ufe0f | \u2714\ufe0f | \u274e | \u274e | [Report](./docs/performance/consisid.md) |\n| [\ud83c\udfac CogVideoX1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) | \u2714\ufe0f | \u2714\ufe0f 
| \u274e | \u274e | [Report](./docs/performance/cogvideo.md) |\n| [\ud83c\udfac Mochi-1](https://github.com/xdit-project/mochi-xdit) | \u2714\ufe0f | \u2714\ufe0f | \u274e | \u274e | [Report](https://github.com/xdit-project/mochi-xdit) |\n| [\ud83c\udfac CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) | \u2714\ufe0f | \u2714\ufe0f | \u274e | \u274e | [Report](./docs/performance/cogvideo.md) |\n| [\ud83c\udfac Latte](https://huggingface.co/maxin-cn/Latte-1) | \u274e | \u2714\ufe0f | \u274e | \u274e | [Report](./docs/performance/latte.md) |\n| [\ud83d\udd35 HunyuanDiT-v1.2-Diffusers](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers) | \u2714\ufe0f | \u2714\ufe0f | \u2714\ufe0f | \u274e | [Report](./docs/performance/hunyuandit.md) |\n| [\ud83d\udfe0 Flux](https://huggingface.co/black-forest-labs/FLUX.1-schnell) | NA | \u2714\ufe0f | \u2714\ufe0f | \u274e | [Report](./docs/performance/flux.md) |\n| [\ud83d\udd34 PixArt-Sigma](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS) | \u2714\ufe0f | \u2714\ufe0f | \u2714\ufe0f | \u274e | [Report](./docs/performance/pixart_alpha_legacy.md) |\n| [\ud83d\udfe2 PixArt-alpha](https://huggingface.co/PixArt-alpha/PixArt-alpha) | \u2714\ufe0f | \u2714\ufe0f | \u2714\ufe0f | \u274e | [Report](./docs/performance/pixart_alpha_legacy.md) |\n| [\ud83d\udfe0 Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers) | \u2714\ufe0f | \u2714\ufe0f | \u2714\ufe0f | \u274e | [Report](./docs/performance/sd3.md) |\n| [\ud83d\udfe4 SANA](https://github.com/NVlabs/Sana/blob/main/asset/docs/model_zoo.md) | \u2714\ufe0f | \u2714\ufe0f | \u2714\ufe0f | \u274e | [Report](./docs/performance/sana.md) |\n| [\u26ab SANA Sprint](https://github.com/NVlabs/Sana/blob/main/asset/docs/model_zoo.md#sana-sprint) | NA | \u2714\ufe0f | \u274e | \u274e | NA |\n| [\ud83d\udfe3 SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) | \u2714\ufe0f | \u274e | \u274e | \u274e | NA 
|\n\n</div>\n\n\n[\ud83d\udd34 DiT-XL](https://huggingface.co/facebook/DiT-XL-2-256) is supported by legacy version only, including DistriFusion and Tensor Parallel as the standalone parallel strategies:\n\n\n\n<h2 id=\"comfyui\">\ud83d\uddbc\ufe0f TACO-DiT: ComfyUI with xDiT</h2>\n\nComfyUI, is the most popular web-based Diffusion Model interface optimized for workflow. \nIt provides users with a UI platform for image generation, supporting plugins like LoRA, ControlNet, and IPAdaptor. Yet, its design for native single-GPU usage leaves it struggling with the demands of today's large DiTs, resulting in unacceptably high latency for users like Flux.1. \n\nUsing our commercial project **TACO-DiT**, a close-sourced ComfyUI variant built with xDiT, we've successfully implemented a multi-GPU parallel processing workflow within ComfyUI, effectively addressing Flux.1's performance challenges. Below is an example of using TACO-DiT to accelerate a Flux workflow with LoRA:\n\n\n\nBy using TACO-DiT, you could significantly reduce your ComfyUI workflow inference latency, and boosting the throughput with Multi-GPUs. Now it is compatible with multiple Plug-ins, including ControlNet and LoRAs.\n\nMore features and details can be found in our Intro Video: \n+ [[YouTube] TACO-DiT: Accelerating Your ComfyUI Generation Experience](https://www.youtube.com/watch?v=7DXnGrARqys) \n+ [[Bilibili] TACO-DiT: \u52a0\u901f\u4f60\u7684ComfyUI\u751f\u6210\u4f53\u9a8c](https://www.bilibili.com/video/BV18tU7YbEra/?vd_source=59c1f990379162c8f596974f34224e4f)\n\nThe blog article is also available: [Supercharge Your AIGC Experience: Leverage xDiT for Multiple GPU Parallel in ComfyUI Flux.1 Workflow](https://medium.com/@xditproject/supercharge-your-aigc-experience-leverage-xdit-for-multiple-gpu-parallel-in-comfyui-flux-1-54b34e4bca05). \n\n<h2 id=\"QuickStart\">\ud83d\ude80 QuickStart</h2>\n\n### 1. 
Install from pip\n\nWe set `diffusers` and `flash_attn` as two optional installation requirements.\n\nAbout `diffusers` version: \n- If you only use the USP interface, `diffusers` is not required. Models are typically released as `nn.Module`\n first, before being integrated into diffusers. xDiT sometimes is applied as an USP plugin to existing projects.\n- Different models may require different diffusers versions. Model implementations can vary between diffusers versions, especially for latest models, which affects parallel processing. When encountering model execution errors, you may need to try several recent diffusers versions.\n- While we specify a diffusers version in `setup.py`, newer models may require later versions or even need to be installed from main branch.\n\nAbout `flash_attn` version:\n- Without `flash_attn` installed, xDiT falls back to a PyTorch implementation of ring attention, which helps NPU users with compatibility\n- However, not using `flash_attn` on GPUs may result in suboptimal performance. For best GPU performance, we strongly recommend installing `flash_attn`.\n\n```\npip install xfuser # Basic installation\npip install \"xfuser[diffusers,flash-attn]\" # With both diffusers and flash attention\n```\n\n### 2. Install from source \n\n```\npip install -e .\n# Or optionally, with diffusers\npip install -e \".[diffusers,flash-attn]\"\n```\n\nNote that we use two self-maintained packages:\n\n1. [yunchang](https://github.com/feifeibear/long-context-attention)\n2. [DistVAE](https://github.com/xdit-project/DistVAE)\n\nThe [flash_attn](https://github.com/Dao-AILab/flash-attention) used for yunchang should be >= 2.6.0\n\n### 3. Docker\n\nWe provide a docker image for developers to develop with xDiT. The docker image is [thufeifeibear/xdit-dev](https://hub.docker.com/r/thufeifeibear/xdit-dev).\n\n### 4. Usage\n\nWe provide examples demonstrating how to run models with xDiT in the [./examples/](./examples/) directory. 
\nYou can easily modify the model type, model directory, and parallel options in [examples/run.sh](examples/run.sh) to run the DiT models that are already supported.\n\n```bash\nbash examples/run.sh\n```\n\nCombining multiple parallelism techniques is essential for efficient scaling. \nIt's important that **the product of all parallel degrees matches the number of devices**. \nNote that `use_cfg_parallel` means cfg_parallel=2. For instance, you can combine CFG, PipeFusion, and sequence parallelism with the command below to generate an image of a cute dog through hybrid parallelism. \nHere ulysses_degree * pipefusion_parallel_degree * cfg_degree(use_cfg_parallel) == number of devices == 8.\n\n\n```bash\ntorchrun --nproc_per_node=8 \\\nexamples/pixartalpha_example.py \\\n--model models/PixArt-XL-2-1024-MS \\\n--pipefusion_parallel_degree 2 \\\n--ulysses_degree 2 \\\n--num_inference_steps 20 \\\n--warmup_steps 0 \\\n--prompt \"A cute dog\" \\\n--use_cfg_parallel\n```\n\n\u26a0\ufe0f Applying PipeFusion requires setting `warmup_steps`, which is also required in DistriFusion and is typically set to a small number compared with `num_inference_steps`.\nThe warmup steps reduce the efficiency of PipeFusion because they cannot be executed in parallel and therefore degrade to serial execution. \nWe observed that a warmup of 0 had no effect on the PixArt model.\nUsers can tune this value according to their specific tasks.\n\n### 5. 
Launch an HTTP Service\n\nYou can also launch an HTTP service to generate images with xDiT.\n\n[Launching a Text-to-Image HTTP Service](./docs/developer/Http_Service.md)\n\n<h2 id=\"dev-guide\">\ud83d\udcda Develop Guide</h2>\n\nWe provide a step-by-step guide for adding new models; please refer to the following tutorial.\n\n[Apply xDiT to new models](./docs/developer/adding_models/readme.md)\n\nA high-level design of the xDiT framework is provided below, which may help you understand how it works.\n\n[The implementation and design of the xDiT framework](./docs/developer/The_implement_design_of_xdit_framework.md)\n\n<h2 id=\"secrets\">\u2728 xDiT's Arsenal</h2>\n\nThe remarkable performance of xDiT is attributed to two key facets.\nFirst, it leverages parallelization techniques, pioneering innovations such as USP, PipeFusion, and hybrid parallelism, to scale DiT inference to unprecedented scales.\n\nSecond, we employ compilation technologies to enhance execution on GPUs, integrating established solutions like `torch.compile` and `onediff` to optimize xDiT's performance.\n\n<h3 id=\"parallel\">1. Parallel Methods</h3>\n\nAs illustrated in the accompanying images, xDiT offers a comprehensive set of parallelization techniques. For the DiT backbone, the foundational methods (Data, USP, PipeFusion, and CFG parallel) operate in a hybrid fashion. 
Additionally, the distinct methods, Tensor and DistriFusion parallel, function independently.\nFor the VAE module, xDiT offers a parallel implementation, [DistVAE](https://github.com/xdit-project/DistVAE), designed to prevent out-of-memory (OOM) issues.\nThe (<span style=\"color: red;\">xDiT</span>) label highlights the methods first proposed by us.\n\n<div align=\"center\">\n <img src=\"https://raw.githubusercontent.com/xdit-project/xdit_assets/main/methods/xdit_method.png\" alt=\"xdit methods\">\n</div>\n\nThe table below details the communication and memory costs of the intra-image parallel methods above for DiTs; CFG and DP are excluded because they are inter-image parallel. (* denotes that communication can be overlapped with computation.)\n\nAs we can see, PipeFusion and Sequence Parallel achieve the lowest communication cost on different scales and hardware configurations, making them suitable foundational components for a hybrid approach.\n\n\ud835\udc91: Number of pixels;\\\n\ud835\udc89\ud835\udc94: Model hidden size;\\\n\ud835\udc73: Number of model layers;\\\n\ud835\udc77: Total model parameters;\\\n\ud835\udc75: Number of parallel devices;\\\n\ud835\udc74: Number of patch splits;\\\n\ud835\udc78\ud835\udc76: Query and Output parameter count;\\\n\ud835\udc72\ud835\udc7d: KV Activation parameter count;\\\n\ud835\udc68 = \ud835\udc78 = \ud835\udc76 = \ud835\udc72 = \ud835\udc7d: Equal parameters for Attention, Query, Output, Key, and Value;\n\n\n| | attn-KV | communication cost | param memory | activations memory | extra buffer memory |\n|:-------------------------:|:-------:|:----------------------------:|:--------------:|:------------------------------:|:----------------------------------:|\n| Tensor Parallel | fresh | $4O(p \\times hs)L$ | $\\frac{1}{N}P$ | $\\frac{2}{N}A = \\frac{1}{N}QO$ | $\\frac{2}{N}A = \\frac{1}{N}KV$ |\n| DistriFusion* | stale | $2O(p \\times hs)L$ | $P$ | $\\frac{2}{N}A = \\frac{1}{N}QO$ | $2AL = (KV)L$ |\n| Ring 
Sequence Parallel* | fresh | $2O(p \\times hs)L$ | $P$ | $\\frac{2}{N}A = \\frac{1}{N}QO$ | $\\frac{2}{N}A = \\frac{1}{N}KV$ |\n| Ulysses Sequence Parallel | fresh | $\\frac{4}{N}O(p \\times hs)L$ | $P$ | $\\frac{2}{N}A = \\frac{1}{N}QO$ | $\\frac{2}{N}A = \\frac{1}{N}KV$ |\n| PipeFusion* | stale- | $2O(p \\times hs)$ | $\\frac{1}{N}P$ | $\\frac{2}{M}A = \\frac{1}{M}QO$ | $\\frac{2L}{N}A = \\frac{1}{N}(KV)L$ |\n\n\n<h4 id=\"PipeFusion\">1.1. PipeFusion</h4>\n\n[PipeFusion: Displaced Patch Pipeline Parallelism for Diffusion Models](./docs/methods/pipefusion.md)\n\n<h4 id=\"USP\">1.2. USP: Unified Sequence Parallelism</h4>\n\n[USP: A Unified Sequence Parallelism Approach for Long Context Generative AI](./docs/methods/usp.md)\n\n<h4 id=\"hybrid_parallel\">1.3. Hybrid Parallel</h4>\n\n[Hybrid Parallelism](./docs/methods/hybrid.md)\n\n<h4 id=\"cfg_parallel\">1.4. CFG Parallel</h4>\n\n[CFG Parallel](./docs/methods/cfg_parallel.md)\n\n<h4 id=\"parallel_vae\">1.5. Parallel VAE</h4>\n\n[Patch Parallel VAE](./docs/methods/parallel_vae.md)\n\n<h3 id=\"1gpuacc\">Single GPU Acceleration</h3>\n\n\n<h4 id=\"compilation\">Compilation Acceleration</h4>\n\nWe utilize two compilation acceleration techniques, [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) and [onediff](https://github.com/siliconflow/onediff), to enhance runtime speed on GPUs. These compilation accelerations are used in conjunction with parallelization methods.\n\nWe employ the nexfort backend of onediff. Please install it before use:\n\n```\npip install onediff\npip install -U nexfort\n```\n\nFor usage instructions, refer to [examples/run.sh](./examples/run.sh). Simply append `--use_torch_compile` or `--use_onediff` to your command. 
Note that these options are mutually exclusive, and their performance varies across different scenarios.\n\n<h4 id=\"cache_acceleration\">Cache Acceleration</h4>\n\nYou can use `--use_teacache` or `--use_fbcache` in examples/run.sh, which apply TeaCache and First-Block-Cache respectively. \nNote that the cache methods are only supported for the FLUX model with USP. They are currently not applicable to PipeFusion.\n\nxDiT also provides DiTFastAttn for single GPU acceleration. It can reduce the computation cost of attention layers by leveraging redundancies between different steps of the Diffusion Model.\n\n[DiTFastAttn: Attention Compression for Diffusion Transformer Models](./docs/methods/ditfastattn.md)\n\n<h2 id=\"history\">\ud83d\udea7 History and Looking for Contributions</h2>\n\nWe conducted a major upgrade of this project in August 2024, introducing a new set of APIs that are now the preferred choice for all users.\n\nThe legacy APIs were used in the early stage of xDiT to explore and compare different parallelization methods.\nThey are located in the [legacy](https://github.com/xdit-project/xDiT/tree/legacy) branch, are now considered outdated, and do not support hybrid parallelism. Despite this limitation, they offer a broader range of individual parallelization methods, including PipeFusion, Sequence Parallel, DistriFusion, and Tensor Parallel.\n\nIf you work with Pixart models, you can still run the examples in the [scripts/](https://github.com/xdit-project/xDiT/tree/legacy/scripts) directory under the `legacy` branch. However, for all other models, we strongly recommend adopting the new APIs to ensure optimal performance and compatibility.\n\nWe also warmly welcome developers to join us in enhancing the project. If you have ideas for new features or models, please share them in our [issues](https://github.com/xdit-project/xDiT/issues). 
Your contributions are invaluable in driving the project forward and ensuring it meets the needs of the community.\n\n<h2 id=\"cite-us\">\ud83d\udcdd Cite Us</h2>\n\n\n[xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism](https://arxiv.org/abs/2411.01738)\n\n```\n@article{fang2024xdit,\n title={xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism},\n author={Fang, Jiarui and Pan, Jinzhe and Sun, Xibo and Li, Aoyu and Wang, Jiannan},\n journal={arXiv preprint arXiv:2411.01738},\n year={2024}\n}\n\n```\n\n[PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference](https://arxiv.org/abs/2405.14430)\n\n```\n@article{fang2024pipefusion,\n title={PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference},\n author={Jiarui Fang and Jinzhe Pan and Jiannan Wang and Aoyu Li and Xibo Sun},\n journal={arXiv preprint arXiv:2405.14430},\n year={2024}\n}\n\n```\n\n[USP: A Unified Sequence Parallelism Approach for Long Context Generative AI](https://arxiv.org/abs/2405.07719)\n\n\n```\n@article{fang2024unified,\n title={A Unified Sequence Parallelism Approach for Long Context Generative AI},\n author={Fang, Jiarui and Zhao, Shangchun},\n journal={arXiv preprint arXiv:2405.07719},\n year={2024}\n}\n\n```\n\n[Unveiling Redundancy in Diffusion Transformers (DiTs): A Systematic Study](https://arxiv.org/abs/2411.13588)\n\n```\n@article{sun2024unveiling,\n title={Unveiling Redundancy in Diffusion Transformers (DiTs): A Systematic Study},\n author={Sun, Xibo and Fang, Jiarui and Li, Aoyu and Pan, Jinzhe},\n journal={arXiv preprint arXiv:2411.13588},\n year={2024}\n}\n\n```\n",
"bugtrack_url": null,
"license": null,
"summary": "A Scalable Inference Engine for Diffusion Transformers (DiTs) on Multiple Computing Devices",
"version": "0.4.4",
"project_urls": {
"Homepage": "https://github.com/xdit-project/xDiT."
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "03e355fb59f3f48ffe1f5533293ab56c37264e3e93a4f74c5a2cf64112869db0",
"md5": "9dd6bad6d01d8c72b442785fcfc5d079",
"sha256": "83d29fcf2566f9d7f442e3e2864a66ae671f2a1bb52f5605cf83a281c0aa04d5"
},
"downloads": -1,
"filename": "xfuser-0.4.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9dd6bad6d01d8c72b442785fcfc5d079",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 236201,
"upload_time": "2025-07-25T09:19:05",
"upload_time_iso_8601": "2025-07-25T09:19:05.160515Z",
"url": "https://files.pythonhosted.org/packages/03/e3/55fb59f3f48ffe1f5533293ab56c37264e3e93a4f74c5a2cf64112869db0/xfuser-0.4.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d19a5ef30ee1938553ef1f8253c03ccdd1da7592182f427274e60296ea1ee3bf",
"md5": "b9deccea9853a9d893a12a537226d8e8",
"sha256": "4c450b84a05178a8c81ec7ad03aa17acdefcb30537e6bbd34b54ec83ec170c35"
},
"downloads": -1,
"filename": "xfuser-0.4.4.tar.gz",
"has_sig": false,
"md5_digest": "b9deccea9853a9d893a12a537226d8e8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 179609,
"upload_time": "2025-07-25T09:19:06",
"upload_time_iso_8601": "2025-07-25T09:19:06.512265Z",
"url": "https://files.pythonhosted.org/packages/d1/9a/5ef30ee1938553ef1f8253c03ccdd1da7592182f427274e60296ea1ee3bf/xfuser-0.4.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-25 09:19:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "xdit-project",
"github_project": "xDiT.",
"github_not_found": true,
"lcname": "xfuser"
}