ppdiffusers

Name: ppdiffusers
Version: 0.24.0
Home page: https://github.com/PaddlePaddle/PaddleMIX/ppdiffusers
Summary: PPDiffusers: Diffusers toolbox implemented based on PaddlePaddle
Upload time: 2024-04-18 03:44:17
Author: PaddleMIX Team
Requires Python: >=3.6
License: Apache 2.0
Keywords: ppdiffusers, paddle, paddlemix
<div align="center">
  <img src="https://user-images.githubusercontent.com/11793384/215372703-4385f66a-abe4-44c7-9626-96b7b65270c8.png" width="40%" height="40%" />
</div>

<p align="center">
    <a href="https://pypi.org/project/ppdiffusers/"><img src="https://img.shields.io/pypi/pyversions/ppdiffusers"></a>
    <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg"></a>
    <a href="https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
</p>

<h4 align="center">
  <a href=#features> Features </a> |
  <a href=#installation> Installation </a> |
  <a href=#quick-start> Quick Start </a> |
  <a href=#model-deployment> Model Deployment</a>
</h4>

# PPDiffusers: Diffusers toolbox implemented based on PaddlePaddle

**PPDiffusers** is a toolbox for training and inference of diffusion models across multiple modalities (such as text-image cross-modal, image, and audio), built on the [**PaddlePaddle**](https://www.paddlepaddle.org.cn/) framework and the [**PaddleNLP**](https://github.com/PaddlePaddle/PaddleNLP) natural language processing library.

## News 📢
* 🔥 **2024.04.17: Released version 0.24.0, adding support for [Sora-related techniques](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/sora), training and inference for [DiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT), [SiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT#exploring-flow-and-diffusion-based-generative-models-with-scalable-interpolant-transformers-sit), and [UViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_mscoco_uvit), plus the new [NaViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/navit) and [MAGVIT-v2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/video_tokenizer/magvit2) models;
video generation capabilities fully upgraded;
added the video generation model [SVD](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/stable_video_diffusion), with support for fine-tuning and inference;
added the pose-controllable video generation model [AnimateAnyone](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/AnimateAnyone), the plug-and-play video generation model [AnimateDiff](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/inference/text_to_video_generation_animediff.py), and the GIF video generation model [Hotshot-XL](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community/Hotshot-XL);
added the fast-inference text-to-image model [LCM](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/consistency_distillation), with SD/SDXL training and inference;
[model inference and deployment](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/deploy) fully upgraded, with new peft and accelerate backends;
weight loading/saving fully upgraded, now supporting distributed training, model sharding, safetensors, and more, with these capabilities integrated into DiT, [IP-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ip_adapter), [PhotoMaker](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/PhotoMaker), [InstantID](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/InstantID), and more.**
* 🔥 **2023.12.12: Released version 0.19.4, fixing several known bugs, fixing the 0-D Tensor warning, and adding a FastDeploy pipeline for SDXL.**
* 🔥 **2023.09.27: Released version 0.19.3, adding [SDXL](#text-image-multimodal), supporting Text2Image, Img2Img, Inpainting, and InstructPix2Pix tasks as well as DreamBooth LoRA training;
added [UniDiffuser](#text-image-multimodal), which supports text-to-image, image-to-text, and other tasks through a unified multimodal diffusion process;
added the text-conditional video generation model [LVDM](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_video_lvdm), with support for training and inference;
added the text-to-image models [Kandinsky 2.2](#text-image-multimodal) and [Consistency Models](#text-image-multimodal);
Stable Diffusion now supports [BF16 O2 training](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/stable_diffusion), matching FP32 quality;
[LoRA loading upgraded](#loading-hf-lora-weights), with support for SDXL LoRA weights;
[ControlNet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/ppdiffusers/pipelines/controlnet) upgraded, supporting ControlNetImg2Img, ControlNetInpaint, StableDiffusionXLControlNet, and more.**

* 🔥 **2023.06.20: Released version 0.16.1, adding [T2I-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/t2i-adapter) with support for training and inference; upgraded ControlNet with [reference-only inference](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#controlnet-reference-only); added [WebUIStableDiffusionPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#automatic1111-webui-stable-diffusion),
which supports dynamically loading lora and textual_inversion weights through the prompt;
added [StableDiffusionHiresFixPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#stable-diffusion-with-high-resolution-fixing), with support for high-resolution fixing;
added [COCOeval](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/scripts/cocoeval_keypoints_score), an evaluation metric for keypoint-controlled generation;
added pipelines for more diffusion modalities, including video generation ([Text-to-Video-Synth](#text-video-multimodal), [Text-to-Video-Zero](#text-video-multimodal)) and audio generation ([AudioLDM](#text-audio-multimodal), [Spectrogram Diffusion](#audio)); added the text-to-image model [IF](#text-image-multimodal).**



## Features
#### 📦 A collection of SOTA diffusion model Pipelines
We provide a collection of **state-of-the-art (SOTA)** diffusion model Pipelines.
**PPDiffusers** currently integrates **100+ Pipelines**, covering **more than 10** tasks, including Text-to-Image Generation, Text-Guided Image Inpainting, Image-to-Image Text-Guided Generation, Text-to-Video Generation, Super-Resolution, and Text-to-Audio Generation, across the **text, image, video, and audio** modalities.
For the complete list of supported **Pipelines** and their sources, see the [🔥 PPDiffusers Pipelines](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/pipelines/README.md) documentation.


#### 🔊 A rich set of Noise Schedulers
We provide a rich set of **noise schedulers** that trade off **speed** against **quality**, so users can quickly switch between them at inference time as needed.
**PPDiffusers** currently integrates **14+ schedulers**, supporting not only [DDPM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_ddpm.py), [DDIM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_ddim.py), and [PNDM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_pndm.py), but also the latest [🔥 DPMSolver](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_dpmsolver_multistep.py)!
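
For example, switching the default scheduler to DPMSolver is a one-line change. A minimal sketch, reusing the `runwayml/stable-diffusion-v1-5` weights and the `from_config` pattern that also appears in the video examples below:

```python
from ppdiffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Rebuild the scheduler from the current scheduler's config so timestep settings stay consistent
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# DPMSolver typically needs far fewer steps than the default scheduler for comparable quality
image = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=20).images[0]
image.save("astronaut_dpmsolver.png")
```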

#### 🎛️ Multiple diffusion model components
We provide **multiple diffusion model** components, such as [UNet1DModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_1d.py), [UNet2DModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_2d.py), [UNet2DConditionModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_2d_condition.py), [UNet3DConditionModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_3d_condition.py), [VQModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/vae.py), and [AutoencoderKL](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/vae.py).
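
Each component can also be loaded and used on its own. A minimal sketch, assuming the standard subfolder layout of Stable Diffusion weight repositories:

```python
from ppdiffusers import AutoencoderKL, UNet2DConditionModel

# Load individual components from a pipeline repository; the "unet"/"vae"
# subfolder names follow the standard Stable Diffusion weight layout
unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
print(type(unet).__name__, type(vae).__name__)
```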


#### 📖 Rich training and inference tutorials
We provide rich training tutorials. They cover not only fine-tuning diffusion models for customization, such as teaching a model a new style or subject from just 3-5 images with [Textual Inversion](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/textual_inversion) and [DreamBooth](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/dreambooth), but also training diffusion models such as [🔥 Latent Diffusion Model](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_laion400m), [🔥 ControlNet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/controlnet), and [🔥 T2I-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/t2i-adapter)!
We also provide a rich set of [🔥 Pipelines inference examples](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/inference).

#### 🚀 High-performance deployment with FastDeploy
We provide a [🔥 high-performance Stable Diffusion Pipeline](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/pipelines/stable_diffusion/pipeline_fastdeploy_stable_diffusion.py) based on [FastDeploy](https://github.com/PaddlePaddle/FastDeploy). For more on high-performance deployment across multiple inference-engine backends with FastDeploy, see the [🔥 FastDeploy inference tutorial](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/deploy).

## Installation

### Requirements
```
pip install -r requirements.txt
```
For detailed instructions on installing PaddlePaddle, see the [Installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html).
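
For example (a sketch only; choose the wheel matching your OS and CUDA version from the guide above):

```shell
# GPU build from PyPI; CPU-only machines should install the `paddlepaddle` package instead
python -m pip install paddlepaddle-gpu
```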

### Install with pip

```shell
pip install --upgrade ppdiffusers
```

### Install from source
```shell
git clone https://github.com/PaddlePaddle/PaddleMIX
cd PaddleMIX/ppdiffusers
python setup.py install
```

## Quick Start
We will use **Stable Diffusion**, a representative diffusion model, as an example to get you started with PPDiffusers.

**Stable Diffusion** is based on **Latent Diffusion Models** and targets the **Text-to-Image Generation** task. It was developed jointly by engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/), and [RunwayML](https://runwayml.com/), and has been released in two versions, v1 and v2. v1 was trained on a subset of the LAION-5B dataset (at 512x512 resolution) with the following architecture: an autoencoder with a downsampling factor of 8, an 860M-parameter UNet, and a CLIP ViT-L/14 text encoder. v2 improves over v1 in the quality and resolution of generated images, among other aspects.
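
The components described above map directly onto pipeline attributes, which makes it easy to check what a given checkpoint contains. A minimal sketch (the attribute names follow the examples later in this README):

```python
from ppdiffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2")

# The text encoder, UNet, and VAE are plain attributes of the pipeline
print(type(pipe.text_encoder).__name__)  # the CLIP text encoder
print(type(pipe.unet).__name__)          # UNet2DConditionModel
print(type(pipe.vae).__name__)           # AutoencoderKL
# Parameter count of the UNet (roughly the ~860M figure quoted above for v1)
print(sum(p.size for p in pipe.unet.parameters()))
```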

### Key Stable Diffusion model weights

<details><summary>&emsp; Supported Stable Diffusion weights (English) </summary>

**Simply replace the "xxxx" below with the desired weight name to get started!**
```python
from ppdiffusers import *

pipe_text2img = StableDiffusionPipeline.from_pretrained("xxxx")
pipe_img2img = StableDiffusionImg2ImgPipeline.from_pretrained("xxxx")
pipe_inpaint_legacy = StableDiffusionInpaintPipelineLegacy.from_pretrained("xxxx")
pipe_mega = StableDiffusionMegaPipeline.from_pretrained("xxxx")

# pipe_mega.text2img() is equivalent to pipe_text2img()
# pipe_mega.img2img() is equivalent to pipe_img2img()
# pipe_mega.inpaint_legacy() is equivalent to pipe_inpaint_legacy()
```

| Model name supported by PPDiffusers | Pipelines that can load it | Notes | huggingface.co link |
| :-------------------------------------------: | :--------------------------------------------------------------------: | --- | :-----------------------------------------: |
| CompVis/stable-diffusion-v1-4           | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | Stable-Diffusion-v1-4 is initialized from the Stable-Diffusion-v1-2 weights, then fine-tuned for **225k** steps at **512x512** resolution on the "laion-aesthetics v2 5+" dataset with **10%** text dropout (i.e., during training, the text of an image-text pair is replaced by the empty string with 10% probability). The model uses [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14) as its text encoder. | [link](https://huggingface.co/CompVis/stable-diffusion-v1-4) |
| CompVis/ldm-text2im-large-256               | LDMTextToImagePipeline | The LDM-KL-8-G* weights from the [LDM paper](https://arxiv.org/pdf/2112.10752.pdf). | [link](https://huggingface.co/CompVis/ldm-text2im-large-256) |
| CompVis/ldm-super-resolution-4x-openimages  | LDMSuperResolutionPipeline | The LDM-VQ-4 weights from the [LDM paper](https://arxiv.org/pdf/2112.10752.pdf); [original weights](https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip). | [link](https://huggingface.co/CompVis/ldm-super-resolution-4x-openimages) |
| runwayml/stable-diffusion-v1-5              | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | Stable-Diffusion-v1-5 is initialized from the Stable-Diffusion-v1-2 weights, then fine-tuned for **595k** steps at **512x512** resolution on the "laion-aesthetics v2 5+" dataset with **10%** text dropout (i.e., during training, the text of an image-text pair is replaced by the empty string with 10% probability). It likewise uses [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14) as its text encoder. | [link](https://huggingface.co/runwayml/stable-diffusion-v1-5) |
| runwayml/stable-diffusion-inpainting        | StableDiffusionInpaintPipeline | Stable-Diffusion-Inpainting is initialized from the Stable-Diffusion-v1-2 weights. It was first trained normally for **595k** steps (effectively the Stable-Diffusion-v1-5 weights), then trained for **440k** steps on inpainting. For the inpainting training, the UNet receives **5** extra input channels (**4** for the masked image, **1** for the mask itself). During training, masks are generated randomly, and with **25%** probability the whole image is masked. | [link](https://huggingface.co/runwayml/stable-diffusion-inpainting) |
| stabilityai/stable-diffusion-2-base         | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | The model is first trained from scratch for **550k** steps on a [256x256 subset of LAION-5B](https://laion.ai/blog/laion-5b/) (filtered with the [LAION-NSFW classifier at punsafe = 0.1](https://github.com/LAION-AI/CLIP-based-NSFW-Detector) and an aesthetic score >= 4.5), then trained for a further **850k** steps on the same dataset at resolutions **>= 512x512**. | [link](https://huggingface.co/stabilityai/stable-diffusion-2-base) |
| stabilityai/stable-diffusion-2              | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | stable-diffusion-2 is initialized from the stable-diffusion-2-base weights, first trained for **150k** steps on the same dataset (at **512x512**) using the [v-objective](https://arxiv.org/abs/2202.00512), then trained for another **140k** steps at **768x768** resolution with the [v-objective](https://arxiv.org/abs/2202.00512). | [link](https://huggingface.co/stabilityai/stable-diffusion-2) |
| stabilityai/stable-diffusion-2-inpainting   | StableDiffusionInpaintPipeline | stable-diffusion-2-inpainting is initialized from the stable-diffusion-2-base weights and trained for an additional **200k** steps, using the mask-generation strategy proposed in [LAMA](https://github.com/saic-mdal/lama) and conditioning additionally on the latent representation (VAE-encoded) of the masked image. | [link](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) |
| stabilityai/stable-diffusion-x4-upscaler    | StableDiffusionUpscalePipeline | The model was trained for 1.25M steps on a **LAION 10M** subset (>2048x2048). It was trained as a [Text-guided Latent Upscaling Diffusion Model](https://arxiv.org/abs/2112.10752) on images at **512x512** resolution. Besides the **text input**, it also takes **noise_level** as an input parameter, so a [predefined scheduler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/low_res_scheduler/scheduler_config.json) can be used to add noise to the low-resolution input image. | [link](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) |
| hakurei/waifu-diffusion    | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | waifu-diffusion-v1-2 is initialized from the stable-diffusion-v1-4 weights and fine-tuned on a dataset of **high-quality anime** images: **680k** text-image samples downloaded from **booru sites**. | [link](https://huggingface.co/hakurei/waifu-diffusion) |
| hakurei/waifu-diffusion-v1-3    | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | waifu-diffusion-v1-3 continues training from waifu-diffusion-v1-2, with extra dataset processing: (1) underscores removed; (2) brackets removed; (3) each booru tag separated by commas; (4) tag order randomized. | [link](https://huggingface.co/hakurei/waifu-diffusion) |
| naclbit/trinart_stable_diffusion_v2_60k    | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | trinart_stable_diffusion is initialized from the stable-diffusion-v1-4 weights and fine-tuned for 8 epochs on a dataset of 40k **high-resolution manga/anime-style** images. The V2 model was trained **longer** with **dropouts**, **10k+ images**, and a **new tagging strategy**. | [link](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |
| naclbit/trinart_stable_diffusion_v2_95k    | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | The **95k**-step checkpoint; otherwise as above. | [link](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |
| naclbit/trinart_stable_diffusion_v2_115k    | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | The **115k**-step checkpoint; otherwise as above. | [link](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |
| Deltaadams/Hentai-Diffusion    | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | None | [link](https://huggingface.co/Deltaadams/Hentai-Diffusion) |
| ringhyacinth/nail-set-diffuser    | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | A diffusion model for nail art; training data courtesy of [Weekend](https://weibo.com/u/5982308498). | [link](https://huggingface.co/ringhyacinth/nail-set-diffuser) |
| Linaqruf/anything-v3.0    | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | Generates **high-quality, highly detailed anime-style images** from just a few prompt words, with support for **danbooru tag text** as input. | [link](https://huggingface.co/Linaqruf/anything-v3.0) |

</details>
<details><summary>&emsp; Supported Stable Diffusion weights (Chinese and multilingual) </summary>


| Model name supported by PPDiffusers | Pipelines that can load it | Notes | huggingface.co link |
| :-------------------------------------------: | :--------------------------------------------------------------------: | --- | :-----------------------------------------: |
| BAAI/AltDiffusion                           | AltDiffusionPipeline, AltDiffusionImg2ImgPipeline | The model uses [AltCLIP](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) as its text encoder and trains a **bilingual diffusion model** on top of Stable Diffusion, with training data from the [WuDao dataset](https://data.baai.ac.cn/details/WuDaoCorporaText) and [LAION](https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus). | [link](https://huggingface.co/BAAI/AltDiffusion) |
| BAAI/AltDiffusion-m9                        | AltDiffusionPipeline, AltDiffusionImg2ImgPipeline | The model uses the nine-language [AltCLIP-m9](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) as its text encoder; otherwise as above. | [link](https://huggingface.co/BAAI/AltDiffusion-m9) |
| IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1 | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | They used the [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) dataset (100M) and the [Zero](https://zero.so.com/) dataset (23M) for pre-training, first scoring the image-text pairs of both datasets with [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) and keeping pairs with a CLIP Score above 0.2 as the training set. They used [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) as the initial text encoder, froze the rest of the [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) ([paper](https://arxiv.org/abs/2112.10752)) model, and trained only the text encoder, preserving the original model's generative ability while aligning it with Chinese concepts. The model has so far been trained for one epoch on the 20M image-text pairs, roughly 100 hours on 32 x A100; this is a preliminary version. | [link](https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1) |
| IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1 | StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipelineLegacy, StableDiffusionMegaPipeline, StableDiffusionPipelineAllinOne | They used the [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) dataset (100M) and the [Zero](https://zero.so.com/) dataset (23M) for pre-training, first scoring the image-text pairs of both datasets with [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) and keeping pairs with a CLIP Score above 0.2 as the training set. They continued training the [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) ([paper](https://arxiv.org/abs/2112.10752)) model in **two stages**. In **stage one**, everything except the text encoder is frozen, preserving the original generative ability while aligning Chinese concepts. In **stage two**, the whole model is unfrozen and the text encoder and diffusion model are trained together so the diffusion model adapts better to Chinese guidance. Stage one ran for 80 hours and stage two for 100 hours, both on 8 x A100; this is a preliminary version. | [link](https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1) |
</details>


### Loading HF Diffusers weights
```python
from ppdiffusers import StableDiffusionPipeline
# Set from_hf_hub=True to download from the huggingface hub, and from_diffusers=True to indicate the weights are PyTorch weights in the diffusers format
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2", from_hf_hub=True, from_diffusers=True)
```

### Loading original-repo Lightning weights
```python
from ppdiffusers import StableDiffusionPipeline
# Accepts a URL or a local ckpt/safetensors file
pipe = StableDiffusionPipeline.from_single_file("https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/ppdiffusers/chilloutmix_NiPrunedFp32Fix.safetensors")
```

### Loading HF LoRA weights
```python
import paddle
from ppdiffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", paddle_dtype=paddle.float16)

pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0",
    weight_name="sd_xl_offset_example-lora_1.0.safetensors",
    from_diffusers=True)
```

### Loading LoRA weights from the Civitai community
```python
from ppdiffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("TASUKU2023/Chilloutmix")
# Load the LoRA weights
pipe.load_lora_weights("./",
    weight_name="Moxin_10.safetensors",
    from_diffusers=True)
pipe.fuse_lora()
```

### XFormers acceleration
To use **XFormers acceleration**, install the `develop` version of `paddle`. On Linux:
```sh
python -m pip install paddlepaddle-gpu==0.0.0.post117 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
```

```python
import paddle
from ppdiffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("TASUKU2023/Chilloutmix", paddle_dtype=paddle.float16)
# Enable xformers acceleration; the default backend is "cutlass"
pipe.enable_xformers_memory_efficient_attention()
# "flash" requires an A100, A10, 3060, 3070, 3080, 3090 or better GPU.
# pipe.enable_xformers_memory_efficient_attention("flash")
```

### ToMe + ControlNet
```python
# Install ppdiffusers from the develop branch
# pip install "ppdiffusers>=0.24.0"
import paddle
from ppdiffusers import ControlNetModel, StableDiffusionControlNetPipeline
from ppdiffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", safety_checker=None, controlnet=controlnet, paddle_dtype=paddle.float16
)

# Apply ToMe with a 50% merging ratio
pipe.apply_tome(ratio=0.5) # Can also use pipe.unet in place of pipe here

# Optionally enable xformers acceleration
# pipe.enable_xformers_memory_efficient_attention()
generator = paddle.Generator().manual_seed(0)
prompt = "bird"
image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
)

image = pipe(prompt, image, generator=generator).images[0]

image.save("bird.png")
```

### Text-to-Image Generation

```python
import paddle
from ppdiffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2")

# Set a random seed so the result below is reproducible!
paddle.seed(5232132133)
prompt = "a portrait of shiba inu with a red cap growing on its head. intricate. lifelike. soft light. sony a 7 r iv 5 5 mm. cinematic post - processing "
image = pipe(prompt, guidance_scale=7.5, height=768, width=768).images[0]

image.save("shiba_dog_with_a_red_cap.png")
```
<div align="center">
<img width="500" alt="image" src="https://user-images.githubusercontent.com/50394665/204796701-d7911f76-8670-47d5-8d1b-8368b046c5e4.png">
</div>

### Image-to-Image Text-Guided Generation

<details><summary>&emsp;Image-to-Image Text-Guided Generation Demo </summary>

```python
import paddle
from ppdiffusers import StableDiffusionImg2ImgPipeline
from ppdiffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("Linaqruf/anything-v3.0", safety_checker=None)

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/image_Kurisu.png"
image = load_image(url).resize((512, 768))

# Set a random seed so the result below is reproducible!
paddle.seed(42)
prompt = "Kurisu Makise, looking at viewer, long hair, standing, 1girl, hair ornament, hair flower, cute, jacket, white flower, white dress"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

image = pipe(prompt=prompt, negative_prompt=negative_prompt, image=image, strength=0.75, guidance_scale=7.5).images[0]
image.save("image_Kurisu_img2img.png")
```
<div align="center">
<img width="500" alt="image" src="https://user-images.githubusercontent.com/50394665/204799529-cd89dcdb-eb1d-4247-91ac-b0f7bad777f8.png">
</div>
</details>

### Text-Guided Image Inpainting

Note: there are currently two versions of the inpainting code, a legacy version and an official version. Both are shown below.

<details><summary>&emsp;Legacy version</summary>

```python
import paddle
from ppdiffusers import StableDiffusionInpaintPipelineLegacy
from ppdiffusers.utils import load_image

# Available model weights
# CompVis/stable-diffusion-v1-4
# runwayml/stable-diffusion-v1-5
# stabilityai/stable-diffusion-2-base (original strategy, 512x512)
# stabilityai/stable-diffusion-2 (v-objective, 768x768)
# Linaqruf/anything-v3.0
# ......
img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"

image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipelineLegacy.from_pretrained("stabilityai/stable-diffusion-2-base", safety_checker=None)

# Set a random seed so the result below is reproducible!
paddle.seed(10245)
prompt = "a red cat sitting on a bench"
image = pipe(prompt=prompt, image=image, mask_image=mask_image, strength=0.75).images[0]

image.save("a_red_cat_legacy.png")
```
<div align="center">
<img width="900" alt="image" src="https://user-images.githubusercontent.com/50394665/204802186-5a6d302b-83aa-4247-a5bb-ebabfcc3abc4.png">
</div>

</details>

<details><summary>&emsp;Official version</summary>

Tips: the code below is the new, officially recommended version. Note that it must be used with the **runwayml/stable-diffusion-inpainting** or **stabilityai/stable-diffusion-2-inpainting** weights.
```python
import paddle
from ppdiffusers import StableDiffusionInpaintPipeline
from ppdiffusers.utils import load_image

# Available model weights
# runwayml/stable-diffusion-inpainting
# stabilityai/stable-diffusion-2-inpainting
img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"

image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained("stabilityai/stable-diffusion-2-inpainting")

# Set a random seed so the result below is reproducible!
paddle.seed(1024)
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]

image.save("a_yellow_cat.png")
```
<div align="center">
<img width="900" alt="image" src="https://user-images.githubusercontent.com/50394665/204801946-6cd043bc-f3db-42cf-82cd-6a6171484523.png">
</div>
</details>

### Text-Guided Image Upscaling & Super-Resolution

<details><summary>&emsp;Text-Guided Image Upscaling Demo</summary>

```python
import paddle
from ppdiffusers import StableDiffusionUpscalePipeline
from ppdiffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler")

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/low_res_cat.png"
# We manually downscale the original image to 128x128; the saved result is upscaled 4x!
low_res_img = load_image(url).resize((128, 128))

prompt = "a white cat"
image = pipe(prompt=prompt, image=low_res_img).images[0]

image.save("upscaled_white_cat.png")
```
<div align="center">
<img width="200" alt="image" src="https://user-images.githubusercontent.com/50394665/204806180-b7f1b9cf-8a62-4577-b5c4-91adda08a13b.png">
<img width="400" alt="image" src="https://user-images.githubusercontent.com/50394665/204806202-8c110be3-5f48-4946-95ea-21ad5a9a2340.png">
</div>
</details>

<details><summary>&emsp;Super-Resolution Demo</summary>

```python
import paddle
from ppdiffusers import LDMSuperResolutionPipeline
from ppdiffusers.utils import load_image

pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"

# We manually downscale the original image to 128x128; the saved result is upscaled 4x!
low_res_img = load_image(url).resize((128, 128))

image = pipe(image=low_res_img, num_inference_steps=100).images[0]

image.save("ldm-super-resolution-image.png")
```
<div align="center">
<img width="200" alt="image" src="https://user-images.githubusercontent.com/50394665/204804426-5e28b571-aa41-4f56-ba26-68cca75fdaae.png">
<img width="400" alt="image" src="https://user-images.githubusercontent.com/50394665/204804148-fe7c293b-6cd7-4942-ae9c-446369fe8410.png">
</div>

</details>

## Model Deployment
Besides running in **Paddle dynamic-graph mode**, many models can be exported and run with an inference engine. We provide **StableDiffusion** deployment examples based on [FastDeploy](https://github.com/PaddlePaddle/FastDeploy), covering text-to-image, image-to-image, and image inpainting. Export a model by following the [StableDiffusion model export tutorial](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/deploy/export.md), then use `FastDeployStableDiffusionMegaPipeline` for high-performance inference and deployment!

<details><summary>&emsp; Pre-exported FastDeploy Stable Diffusion weights </summary>

**Note: the currently exported vae encoder contains a source of randomness!**

- CompVis/stable-diffusion-v1-4@fastdeploy
- runwayml/stable-diffusion-v1-5@fastdeploy
- runwayml/stable-diffusion-inpainting@fastdeploy
- stabilityai/stable-diffusion-2-base@fastdeploy
- stabilityai/stable-diffusion-2@fastdeploy
- stabilityai/stable-diffusion-2-inpainting@fastdeploy
- Linaqruf/anything-v3.0@fastdeploy
- hakurei/waifu-diffusion-v1-3@fastdeploy

</details>

<details><summary>&emsp; FastDeploy Demo </summary>

```python
import paddle
import fastdeploy as fd
from ppdiffusers import FastDeployStableDiffusionMegaPipeline
from ppdiffusers.utils import load_image

def create_runtime_option(device_id=0, backend="paddle", use_cuda_stream=True):
    option = fd.RuntimeOption()
    if backend == "paddle":
        option.use_paddle_backend()
    else:
        option.use_ort_backend()
    if device_id == -1:
        option.use_cpu()
    else:
        option.use_gpu(device_id)
        if use_cuda_stream:
            paddle_stream = paddle.device.cuda.current_stream(device_id).cuda_stream
            option.set_external_raw_stream(paddle_stream)
    return option

runtime_options = {
    "text_encoder": create_runtime_option(0, "paddle"),  # use gpu:0
    "vae_encoder": create_runtime_option(0, "paddle"),  # use gpu:0
    "vae_decoder": create_runtime_option(0, "paddle"),  # use gpu:0
    "unet": create_runtime_option(0, "paddle"),  # use gpu:0
}

fd_pipe = FastDeployStableDiffusionMegaPipeline.from_pretrained(
    "Linaqruf/anything-v3.0@fastdeploy", runtime_options=runtime_options
)

# text2img
prompt = "a portrait of shiba inu with a red cap growing on its head. intricate. lifelike. soft light. sony a 7 r iv 5 5 mm. cinematic post - processing "
image_text2img = fd_pipe.text2img(prompt=prompt, num_inference_steps=50).images[0]
image_text2img.save("image_text2img.png")

# img2img
url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/image_Kurisu.png"
image = load_image(url).resize((512, 512))
prompt = "Kurisu Makise, looking at viewer, long hair, standing, 1girl, hair ornament, hair flower, cute, jacket, white flower, white dress"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

image_img2img = fd_pipe.img2img(
    prompt=prompt, negative_prompt=negative_prompt, image=image, strength=0.75, guidance_scale=7.5
).images[0]
image_img2img.save("image_img2img.png")

# inpaint_legacy
img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"
image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))
prompt = "a red cat sitting on a bench"

image_inpaint_legacy = fd_pipe.inpaint_legacy(
    prompt=prompt, image=image, mask_image=mask_image, strength=0.75, num_inference_steps=50
).images[0]
image_inpaint_legacy.save("image_inpaint_legacy.png")
```
</details>
<div align="center">
<img width="900" alt="image" src="https://user-images.githubusercontent.com/50394665/205297240-46b80992-34af-40cd-91a6-ae76589d0e21.png">
</div>


## More Tasks by Category
### Text-Image Multimodal

<details open>
<summary>&emsp;Text-to-Image Generation</summary>

#### text_to_image_generation-stable_diffusion

```python
from ppdiffusers import StableDiffusionPipeline

# Load the model and scheduler
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Run the pipeline
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

# Save the image
image.save("astronaut_rides_horse_sd.png")
```
<div align="center">
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209322401-6ecfeaaa-6878-4302-b592-07a31de4e590.png">
</div>

#### text_to_image_generation-stable_diffusion_xl

```python
import paddle
from ppdiffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
     paddle_dtype=paddle.float16,
     variant="fp16"
)
prompt = "a photo of an astronaut riding a horse on mars"
generator = paddle.Generator().manual_seed(42)
image = pipe(prompt=prompt, generator=generator, num_inference_steps=50).images[0]
image.save('sdxl_text2image.png')
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/d72729f9-8685-48f9-a238-e4ddf6d264f3">
</div>

#### text_to_image_generation-sdxl_base_with_refiner

```python
from ppdiffusers import DiffusionPipeline
import paddle

# load both base & refiner
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    paddle_dtype=paddle.float16,
)
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    paddle_dtype=paddle.float16,
    variant="fp16",
)

# Define the total number of steps and the fraction of them to run on each expert (80/20)
n_steps = 40
high_noise_frac = 0.8

prompt = "a photo of an astronaut riding a horse on mars"
generator = paddle.Generator().manual_seed(42)

# run both experts: the base model denoises the first 80% of the steps ...
image = base(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_end=high_noise_frac,
    output_type="latent",
    generator=generator,
).images

# ... and the refiner finishes the remaining 20%
image = refiner(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_start=high_noise_frac,
    image=image,
    generator=generator,
).images[0]
image.save('text_to_image_generation-sdxl-base-with-refiner-result.png')
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/8ef36826-ed94-4856-a356-af1677f60d1b">
</div>

#### text_to_image_generation-kandinsky2_2
```python
from ppdiffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

pipe_prior = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior")
prompt = "red cat, 4k photo"
out = pipe_prior(prompt)
image_emb = out.image_embeds
zero_image_emb = out.negative_image_embeds
pipe = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder")
image = pipe(
    image_embeds=image_emb,
    negative_image_embeds=zero_image_emb,
    height=768,
    width=768,
    num_inference_steps=50,
).images
image[0].save("text_to_image_generation-kandinsky2_2-result-cat.png")
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/188f76dd-4bd7-4a33-8f30-b893c7a9e249">
</div>

#### text_to_image_generation-unidiffuser
```python
import paddle
from paddlenlp.trainer import set_seed

from ppdiffusers import UniDiffuserPipeline

model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, paddle_dtype=paddle.float16)
set_seed(42)

# Text variation can be performed with a text-to-image generation followed by a image-to-text generation:
# 1. Text-to-image generation
prompt = "an elephant under the sea"
sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)
t2i_image = sample.images[0]
t2i_image.save("t2i_image.png")
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/a6eb11d2-ad27-4263-8cb4-b0d8dd42b36c">
</div>

#### text_to_image_generation-deepfloyd_if

```python
import paddle

from ppdiffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline
from ppdiffusers.utils import pd_to_pil

# Stage 1: generate images
pipe = IFPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", paddle_dtype=paddle.float16)
pipe.enable_xformers_memory_efficient_attention()
prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    output_type="pd",
).images

# save intermediate image
pil_image = pd_to_pil(image)
pil_image[0].save("text_to_image_generation-deepfloyd_if-result-if_stage_I.png")
# save gpu memory
pipe.to(paddle_device="cpu")

# Stage 2: super resolution stage1
super_res_1_pipe = IFSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", paddle_dtype=paddle.float16
)
super_res_1_pipe.enable_xformers_memory_efficient_attention()

image = super_res_1_pipe(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    output_type="pd",
).images
# save intermediate image
pil_image = pd_to_pil(image)
pil_image[0].save("text_to_image_generation-deepfloyd_if-result-if_stage_II.png")
# save gpu memory
super_res_1_pipe.to(paddle_device="cpu")
```
<div align="center">
<img alt="image" src="https://user-images.githubusercontent.com/20476674/246785766-700dfad9-159d-4bfb-bfc7-c18df938a052.png">
</div>
<div align="center">
<center>if_stage_I</center>
</div>
<div align="center">
<img alt="image" src="https://user-images.githubusercontent.com/20476674/246785773-3359ca5f-dadf-4cc8-b318-ff1f9d4a2d35.png">
</div>
<div align="center">
<center>if_stage_II</center>
<!-- <img alt="image" src="https://user-images.githubusercontent.com/20476674/246785774-8870829a-354b-4a87-9d67-93af315f51e6.png">
<center>if_stage_III</center> -->
</div>
</details>


<details><summary>&emsp;Text-Guided Image Upscaling</summary>

#### text_guided_image_upscaling-stable_diffusion_2

```python
from ppdiffusers import StableDiffusionUpscalePipeline
from ppdiffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler")

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/low_res_cat.png"
low_res_img = load_image(url).resize((128, 128))

prompt = "a white cat"
upscaled_image = pipe(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat_sd2.png")
```
<div align="center">
<img alt="image" src="https://user-images.githubusercontent.com/20476674/209324085-0d058b70-89b0-43c2-affe-534eedf116cf.png">
<center>Original image</center>
<img alt="image" src="https://user-images.githubusercontent.com/20476674/209323862-ce2d8658-a52b-4f35-90cb-aa7d310022e7.png">
<center>Generated image</center>
</div>
</details>

<details><summary>&emsp;Text-Guided Image Inpainting</summary>

#### image_guided_image_inpainting-paint_by_example

```python
import paddle

from ppdiffusers import PaintByExamplePipeline
from ppdiffusers.utils import load_image

img_url = "https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/image_example_1.png"
mask_url = "https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/mask_example_1.png"
example_url = "https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/reference_example_1.jpeg"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))
example_image = load_image(example_url).resize((512, 512))

pipe = PaintByExamplePipeline.from_pretrained("Fantasy-Studio/Paint-by-Example")

# Use fp16 to speed up generation
with paddle.amp.auto_cast(True):
    image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0]
image.save("image_guided_image_inpainting-paint_by_example-result.png")
```
<div align="center">
<img alt="image" src="https://user-images.githubusercontent.com/20476674/247118364-5d91f433-f9ac-4514-b5f0-cb4599905847.png" width=300>
<center>Original image</center>
<img alt="image" src="https://user-images.githubusercontent.com/20476674/247118361-0f78d6db-6896-4f8d-b1bd-8350192f7a4e.png" width=300>
<center>Mask image</center>
<img alt="image" src="https://user-images.githubusercontent.com/20476674/247118368-305a048d-ddc3-4a5f-8915-58591ef680f0.jpeg" width=300>
<center>Reference image</center>
<img alt="image" src="https://user-images.githubusercontent.com/20476674/247117963-e5b9b754-39a3-480b-a557-46a2f9310e79.png" width=300>
<center>Generated image</center>
</div>
</details>


<details><summary>&emsp;Image-to-Image Text-Guided Generation</summary>

#### text_guided_image_inpainting-kandinsky2_2
```python
import numpy as np
import paddle

from ppdiffusers import KandinskyV22InpaintPipeline, KandinskyV22PriorPipeline
from ppdiffusers.utils import load_image

pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", paddle_dtype=paddle.float16
)
prompt = "a hat"
image_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)
pipe = KandinskyV22InpaintPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", paddle_dtype=paddle.float16
)
init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png"
)
mask = np.zeros((768, 768), dtype=np.float32)
mask[:250, 250:-250] = 1
out = pipe(
    image=init_image,
    mask_image=mask,
    image_embeds=image_emb,
    negative_image_embeds=zero_image_emb,
    height=768,
    width=768,
    num_inference_steps=50,
)
image = out.images[0]
image.save("text_guided_image_inpainting-kandinsky2_2-result-cat_with_hat.png")
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/64a943d5-167b-4433-91c3-3cf9279714db">
<center>Original image</center>
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/f469c127-52f4-4173-a693-c06b92a052aa">
<center>Generated image</center>
</div>

#### image_to_image_text_guided_generation-stable_diffusion
```python
import paddle

from ppdiffusers import StableDiffusionImg2ImgPipeline
from ppdiffusers.utils import load_image

# Load the pipeline
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Download the initial image
url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"

init_image = load_image(url).resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"
# Use fp16 to speed up generation
with paddle.amp.auto_cast(True):
    image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]

image.save("fantasy_landscape.png")
```
<div align="center">
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209327142-d8e1d0c7-3bf8-4a08-a0e8-b11451fc84d8.png">
<center>Original image</center>
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209325799-d9ff279b-0d57-435f-bda7-763e3323be23.png">
<center>Generated image</center>
</div>

#### image_to_image_text_guided_generation-stable_diffusion_xl
```python
import paddle
from ppdiffusers import StableDiffusionXLImg2ImgPipeline
from ppdiffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    paddle_dtype=paddle.float16,
    # from_hf_hub=True,
    # from_diffusers=True,
    variant="fp16"
)
url = "https://paddlenlp.bj.bcebos.com/models/community/westfish/develop-0-19-3/000000009.png"
init_image = load_image(url).convert("RGB")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, image=init_image).images[0]
image.save('sdxl_image2image.png')
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/41bd9381-2799-4bed-a5e2-ba312a2f8da9">
<center>Original image</center>
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/db672d03-2e3a-46ac-97fd-d80cca18dbbe">
<center>Generated image</center>
</div>

#### image_to_image_text_guided_generation-kandinsky2_2
```python
import paddle

from ppdiffusers import KandinskyV22Img2ImgPipeline, KandinskyV22PriorPipeline
from ppdiffusers.utils import load_image

pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", paddle_dtype=paddle.float16
)
prompt = "A red cartoon frog, 4k"
image_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)
pipe = KandinskyV22Img2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", paddle_dtype=paddle.float16
)

init_image = load_image(
    "https://hf-mirror.com/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/frog.png"
)
image = pipe(
    image=init_image,
    image_embeds=image_emb,
    negative_image_embeds=zero_image_emb,
    height=768,
    width=768,
    num_inference_steps=100,
    strength=0.2,
).images
image[0].save("image_to_image_text_guided_generation-kandinsky2_2-result-red_frog.png")
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/aae57109-94ad-408e-ae75-8cce650cebe5">
<center>Original image</center>
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/23cf2c4e-416f-4f21-82a6-e57de11b5e83">
<center>Generated image</center>
</div>

</details>

<details><summary>&emsp;Dual Text and Image Guided Generation</summary>

#### dual_text_and_image_guided_generation-versatile_diffusion
```python
from ppdiffusers import VersatileDiffusionDualGuidedPipeline
from ppdiffusers.utils import load_image

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/benz.jpg"
image = load_image(url)
text = "a red car in the sun"

pipe = VersatileDiffusionDualGuidedPipeline.from_pretrained("shi-labs/versatile-diffusion")
pipe.remove_unused_weights()

text_to_image_strength = 0.75
image = pipe(prompt=text, image=image, text_to_image_strength=text_to_image_strength).images[0]
image.save("versatile-diffusion-red_car.png")
```
<div align="center">
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209325965-2475e9c4-a524-4970-8498-dfe10ff9cf24.jpg" >
<center>Original image</center>
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209325293-049098d0-d591-4abc-b151-9291ac2636da.png">
<center>Generated image</center>
</div>
</details>

### Text-Video Multimodal

<details open>
<summary>&emsp;Text-to-Video Generation</summary>

#### text_to_video_generation-lvdm

```python
import paddle

from ppdiffusers import LVDMTextToVideoPipeline

# Load the model and scheduler
pipe = LVDMTextToVideoPipeline.from_pretrained("westfish/lvdm_text2video_orig_webvid_2m")

# Run the pipeline
seed = 2013
generator = paddle.Generator().manual_seed(seed)
samples = pipe(
    prompt="cutting in kitchen",
    num_frames=16,
    height=256,
    width=256,
    num_inference_steps=50,
    generator=generator,
    guidance_scale=15,
    eta=1,
    save_dir=".",
    save_name="text_to_video_generation-lvdm-result-ddim_lvdm_text_to_video_ucf",
    encoder_type="2d",
    scale_factor=0.18215,
    shift_factor=0,
)
```
<div align="center">
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/270906907-2b9d53c1-0272-4c7a-81b2-cd962d23bbee.gif">
</div>

#### text_to_video_generation-synth

```python
import imageio

from ppdiffusers import DPMSolverMultistepScheduler, TextToVideoSDPipeline

pipe = TextToVideoSDPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "An astronaut riding a horse."
video_frames = pipe(prompt, num_inference_steps=25).frames
imageio.mimsave("text_to_video_generation-synth-result-astronaut_riding_a_horse.mp4", video_frames, fps=8)
```
<div align="center">
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/281259277-0ebe29a3-4eba-48ee-a98b-292e60de3c98.gif">
</div>


#### text_to_video_generation-synth with zeroscope_v2_XL

```python
import imageio

from ppdiffusers import DPMSolverMultistepScheduler, TextToVideoSDPipeline

# from ppdiffusers.utils import export_to_video

pipe = TextToVideoSDPipeline.from_pretrained("cerspense/zeroscope_v2_XL")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "An astronaut riding a horse."
video_frames = pipe(prompt, num_inference_steps=50, height=320, width=576, num_frames=24).frames
imageio.mimsave("text_to_video_generation-synth-result-astronaut_riding_a_horse.mp4", video_frames, fps=8)
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/43ebbca0-9f07-458b-809a-acf296a2539b">
</div>

#### text_to_video_generation-zero

```python
import imageio

# pip install imageio[ffmpeg]
import paddle

from ppdiffusers import TextToVideoZeroPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = TextToVideoZeroPipeline.from_pretrained(model_id, paddle_dtype=paddle.float16)

prompt = "A panda is playing guitar on times square"
result = pipe(prompt=prompt).images
result = [(r * 255).astype("uint8") for r in result]
imageio.mimsave("text_to_video_generation-zero-result-panda.mp4", result, fps=4)
```
<div align="center">
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/246779321-c2b0c2b4-e383-40c7-a4d8-f417e8062b35.gif">
</div>

</details>

### Text-Audio Multimodal
<details>
<summary>&emsp;Text-to-Audio Generation</summary>

#### text_to_audio_generation-audio_ldm

```python
import paddle
import scipy

from ppdiffusers import AudioLDMPipeline

pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm", paddle_dtype=paddle.float16)

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

output_path = "text_to_audio_generation-audio_ldm-techno.wav"
# save the audio sample as a .wav file
scipy.io.wavfile.write(output_path, rate=16000, data=audio)
```
<div align="center">
  <a href="https://paddlenlp.bj.bcebos.com/models/community/westfish/develop_ppdiffusers_data/techno.wav" rel="nofollow">
    <img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200" style="max-width: 100%;">
  </a>
</div>
</details>

### Image

<details><summary>&emsp;Unconditional Image Generation</summary>

#### unconditional_image_generation-latent_diffusion_uncond

```python
from ppdiffusers import LDMPipeline

# Load the model and scheduler
pipe = LDMPipeline.from_pretrained("CompVis/ldm-celebahq-256")

# Run the pipeline
image = pipe(num_inference_steps=200).images[0]

# Save the image
image.save("ldm_generated_image.png")
```
<div align="center">
<img width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209327936-7fe914e0-0ea0-4e21-a433-24eaed6ee94c.png">
</div>
</details>

<details><summary>&emsp;Super-Resolution</summary>

#### super_resolution-latent_diffusion
```python
import paddle

from ppdiffusers import LDMSuperResolutionPipeline
from ppdiffusers.utils import load_image

# Load the pipeline
pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")

# Download the initial image
url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"

init_image = load_image(url).resize((128, 128))
init_image.save("original-image.png")

# Use fp16 to speed up generation
with paddle.amp.auto_cast(True):
    image = pipe(init_image, num_inference_steps=100, eta=1).images[0]

image.save("super-resolution-image.png")
```
<div align="center">
<img  alt="image" src="https://user-images.githubusercontent.com/20476674/209328660-9700fdc3-72b3-43bd-9a00-23b370ba030b.png">
<center>Original image</center>
<img  alt="image" src="https://user-images.githubusercontent.com/20476674/209328479-4eaea5d8-aa4a-4f31-aa2a-b47e3c730f15.png">
<center>Generated image</center>
</div>
</details>


<details><summary>&emsp;Image Inpainting</summary>

#### image_inpainting-repaint
```python
from ppdiffusers import RePaintPipeline, RePaintScheduler
from ppdiffusers.utils import load_image

img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/celeba_hq_256.png"
mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/mask_256.png"

# Load the original image and the mask as PIL images
original_image = load_image(img_url).resize((256, 256))
mask_image = load_image(mask_url).resize((256, 256))

scheduler = RePaintScheduler.from_pretrained("google/ddpm-ema-celebahq-256", subfolder="scheduler")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)

output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
)
inpainted_image = output.images[0]

inpainted_image.save("repaint-image.png")
```
<div align="center">
<img  alt="image" src="https://user-images.githubusercontent.com/20476674/209329052-b6fc2aaf-1a59-49a3-92ef-60180fdffd81.png">
<center>Original image</center>
<img  alt="image" src="https://user-images.githubusercontent.com/20476674/209329048-4fe12176-32a0-4800-98f2-49bd8d593799.png">
<center>Mask image</center>
<img  alt="image" src="https://user-images.githubusercontent.com/20476674/209329241-b7e4d99e-468a-4b95-8829-d77ee14bfe98.png">
<center>Generated image</center>
</div>
</details>



<details><summary>&emsp;Image Variation</summary>

#### image_variation-versatile_diffusion
```python
from ppdiffusers import VersatileDiffusionImageVariationPipeline
from ppdiffusers.utils import load_image

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/benz.jpg"
image = load_image(url)

pipe = VersatileDiffusionImageVariationPipeline.from_pretrained("shi-labs/versatile-diffusion")

image = pipe(image).images[0]
image.save("versatile-diffusion-car_variation.png")
```
<div align="center">
<img  width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209331434-51f6cdbd-b8e4-4faa-8e49-1cc852e35603.jpg">
<center>Original image</center>
<img  width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209331591-f6cc4cd8-8430-4627-8d22-bf404fb2bfdd.png">
<center>Generated image</center>
</div>
</details>





### Audio
<details>
<summary>&emsp;Unconditional Audio Generation</summary>

#### unconditional_audio_generation-audio_diffusion

```python
from scipy.io.wavfile import write
from ppdiffusers import AudioDiffusionPipeline
import paddle

# Load the model and scheduler
pipe = AudioDiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256")
pipe.set_progress_bar_config(disable=None)
generator = paddle.Generator().manual_seed(42)

output = pipe(generator=generator)
image = output.images[0]

# Save each generated audio clip locally
for i, audio in enumerate(output.audios):
    write(f"audio_diffusion_test{i}.wav", pipe.mel.config.sample_rate, audio.transpose())

# Save the image
image.save("audio_diffusion_test.png")
```
<div align="center">
  <a href="https://paddlenlp.bj.bcebos.com/models/community/teticio/data/audio_diffusion_test0.wav" rel="nofollow">
    <img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200" style="max-width: 100%;">
  </a>
</div>

<div align="center">
<img  width="300" alt="image" src="https://user-images.githubusercontent.com/20476674/209342125-93e8715e-895b-4115-9e1e-e65c6c2cd95a.png">
</div>


#### unconditional_audio_generation-spectrogram_diffusion

```python
import paddle
import scipy

from ppdiffusers import MidiProcessor, SpectrogramDiffusionPipeline
from ppdiffusers.utils.download_utils import ppdiffusers_url_download

# Download MIDI from: wget https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid
mid_file_path = ppdiffusers_url_download(
    "https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid", cache_dir="."
)
pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion", paddle_dtype=paddle.float16)
processor = MidiProcessor()
output = pipe(processor(mid_file_path))
audio = output.audios[0]

output_path = "unconditional_audio_generation-spectrogram_diffusion-result-beethoven_hammerklavier_2.wav"
# save the audio sample as a .wav file
scipy.io.wavfile.write(output_path, rate=16000, data=audio)
```
<div align = "center">
  <thead>
  </thead>
  <tbody>
   <tr>
      <td align = "center">
      <a href="https://paddlenlp.bj.bcebos.com/models/community/westfish/develop_ppdiffusers_data/beethoven_hammerklavier_2.wav" rel="nofollow">
            <img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200 style="max-width: 100%;"></a><br>
      </td>
    </tr>
  </tbody>
</div>
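
A minimal sketch (assuming the pipeline accepts a `generator` argument, as its diffusers counterpart does) that fixes the seed and renders a local MIDI file of your own (`my_song.mid` is a hypothetical path):

```python
import paddle

# `pipe` and `processor` come from the demo above; MidiProcessor converts
# the file into the note-token segments the pipeline consumes.
generator = paddle.Generator().manual_seed(0)
output = pipe(processor("my_song.mid"), generator=generator)
scipy.io.wavfile.write("my_song.wav", rate=16000, data=output.audios[0])
```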
</details>



## License
PPDiffusers is licensed under the [Apache 2.0 License](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/LICENSE).

Stable Diffusion is released under [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license).
> The CreativeML OpenRAIL M is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying out in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which this license is based.

## Acknowledgements
We drew on 🤗 Hugging Face's excellent [Diffusers](https://github.com/huggingface/diffusers) design for working with pretrained diffusion models, and we thank the Hugging Face authors and their open-source community.

## Citation

```bibtex
@misc{ppdiffusers,
  author = {PaddlePaddle Authors},
  title = {PPDiffusers: State-of-the-art diffusion model toolkit based on PaddlePaddle},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers}}
}
```


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/PaddlePaddle/PaddleMIX/ppdiffusers",
    "name": "ppdiffusers",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "ppdiffusers, paddle, paddlemix",
    "author": "PaddleMIX Team",
    "author_email": "paddlemix@baidu.com",
    "download_url": "https://files.pythonhosted.org/packages/13/b5/7ea2119bf9eea4d570a47927dc5e4cf1de9d49d2d9b88944045bbc91416e/ppdiffusers-0.24.0.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n  <img src=\"https://user-images.githubusercontent.com/11793384/215372703-4385f66a-abe4-44c7-9626-96b7b65270c8.png\" width=\"40%\" height=\"40%\" />\n</div>\n\n<p align=\"center\">\n    <a href=\"https://pypi.org/project/ppdiffusers/\"><img src=\"https://img.shields.io/pypi/pyversions/ppdiffusers\"></a>\n    <a href=\"\"><img src=\"https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg\"></a>\n    <a href=\"https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/LICENSE\"><img src=\"https://img.shields.io/badge/license-Apache%202-dfd.svg\"></a>\n</p>\n\n<h4 align=\"center\">\n  <a href=#\u7279\u6027> \u7279\u6027 </a> |\n  <a href=#\u5b89\u88c5> \u5b89\u88c5 </a> |\n  <a href=#\u5feb\u901f\u5f00\u59cb> \u5feb\u901f\u5f00\u59cb </a> |\n  <a href=#\u6a21\u578b\u90e8\u7f72> \u6a21\u578b\u90e8\u7f72</a>\n</h4>\n\n# PPDiffusers: Diffusers toolbox implemented based on PaddlePaddle\n\n**PPDiffusers**\u662f\u4e00\u6b3e\u652f\u6301\u591a\u79cd\u6a21\u6001\uff08\u5982\u6587\u672c\u56fe\u50cf\u8de8\u6a21\u6001\u3001\u56fe\u50cf\u3001\u8bed\u97f3\uff09\u6269\u6563\u6a21\u578b\uff08Diffusion Model\uff09\u8bad\u7ec3\u548c\u63a8\u7406\u7684\u56fd\u4ea7\u5316\u5de5\u5177\u7bb1\uff0c\u4f9d\u6258\u4e8e[**PaddlePaddle**](https://www.paddlepaddle.org.cn/)\u6846\u67b6\u548c[**PaddleNLP**](https://github.com/PaddlePaddle/PaddleNLP)\u81ea\u7136\u8bed\u8a00\u5904\u7406\u5f00\u53d1\u5e93\u3002\n\n## News \ud83d\udce2\n* \ud83d\udd25 **2024.04.17 \u53d1\u5e03 0.24.0 \u7248\u672c\uff0c\u652f\u6301[Sora\u76f8\u5173\u6280\u672f](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/sora)\uff0c\u652f\u6301[DiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT)\u3001[SiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT#exploring-flow-and-diffusion-based-generative-models-with-scalable-interpolant-transformers-sit)\u3001[UViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_mscoco_uvit)\u8bad\u7ec3\u63a8\u7406\uff0c\u65b0\u589e[NaViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/navit)\u3001[MAGVIT-v2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/video_tokenizer/magvit2)\u6a21\u578b\uff1b\n\u89c6\u9891\u751f\u6210\u80fd\u529b\u5168\u9762\u5347\u7ea7\uff1b\n\u65b0\u589e\u89c6\u9891\u751f\u6210\u6a21\u578b[SVD](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/stable_video_diffusion)\uff0c\u652f\u6301\u6a21\u578b\u5fae\u8c03\u548c\u63a8\u7406\uff1b\n\u65b0\u589e\u59ff\u6001\u53ef\u63a7\u89c6\u9891\u751f\u6210\u6a21\u578b[AnimateAnyone](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/AnimateAnyone)\u3001\u5373\u63d2\u5373\u7528\u89c6\u9891\u751f\u6210\u6a21\u578b[AnimateDiff](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/inference/text_to_video_generation_animediff.py)\u3001GIF\u89c6\u9891\u751f\u6210\u6a21\u578b[Hotshot-XL](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community/Hotshot-XL)\uff1b\n\u65b0\u589e\u9ad8\u901f\u63a8\u7406\u6587\u56fe\u751f\u6210\u6a21\u578b[LCM](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/consistency_distillation)\uff0c\u652f\u6301SD/SDXL\u8bad\u7ec3\u548c\u63a8\u7406\uff1b\n[\u6a21\u578b\u63a8\u7406\u90e8\u7f72](https://github.com/
PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/deploy)\u5168\u9762\u5347\u7ea7\uff1b\u65b0\u589epeft\uff0caccelerate\u540e\u7aef\uff1b\n\u6743\u91cd\u52a0\u8f7d/\u4fdd\u5b58\u5168\u9762\u5347\u7ea7\uff0c\u652f\u6301\u5206\u5e03\u5f0f\u3001\u6a21\u578b\u5207\u7247\u3001safetensors\u7b49\u573a\u666f\uff0c\u76f8\u5173\u80fd\u529b\u5df2\u96c6\u6210DiT\u3001 [IP-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ip_adapter)\u3001[PhotoMaker](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/PhotoMaker)\u3001[InstantID](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/InstantID)\u7b49\u3002**\n* \ud83d\udd25 **2023.12.12 \u53d1\u5e03 0.19.4 \u7248\u672c\uff0c\u4fee\u590d\u5df2\u77e5\u7684\u90e8\u5206 BUG\uff0c\u4fee\u590d 0D Tensor \u7684 Warning\uff0c\u65b0\u589e SDXL \u7684 FastdeployPipeline\u3002**\n* \ud83d\udd25 **2023.09.27 \u53d1\u5e03 0.19.3 \u7248\u672c\uff0c\u65b0\u589e[SDXL](#\u6587\u672c\u56fe\u50cf\u591a\u6a21)\uff0c\u652f\u6301Text2Image\u3001Img2Img\u3001Inpainting\u3001InstructPix2Pix\u7b49\u4efb\u52a1\uff0c\u652f\u6301DreamBooth Lora\u8bad\u7ec3\uff1b\n\u65b0\u589e[UniDiffuser](#\u6587\u672c\u56fe\u50cf\u591a\u6a21)\uff0c\u901a\u8fc7\u7edf\u4e00\u7684\u591a\u6a21\u6001\u6269\u6563\u8fc7\u7a0b\u652f\u6301\u6587\u751f\u56fe\u3001\u56fe\u751f\u6587\u7b49\u4efb\u52a1\uff1b\n\u65b0\u589e\u6587\u672c\u6761\u4ef6\u89c6\u9891\u751f\u6210\u6a21\u578b[LVDM](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_video_lvdm)\uff0c\u652f\u6301\u8bad\u7ec3\u4e0e\u63a8\u7406\uff1b\n\u65b0\u589e\u6587\u56fe\u751f\u6210\u6a21\u578b[Kandinsky 2.2](#\u6587\u672c\u56fe\u50cf\u591a\u6a21)\uff0c[Consistency models](#\u6587\u672c\u56fe\u50cf\u591a\u6a21)\uff1b\nStable Diffusion\u652f\u6301[BF16 O2\u8bad\u7ec3](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/stable_diffusion)\uff0c\u6548\u679c\u5bf9\u9f50FP32\uff1b\n[LoRA\u52a0\u8f7d\u5347\u7ea7](#\u52a0\u8f7dHF-LoRA\u6743\u91cd)\uff0c\u652f\u6301\u52a0\u8f7dSDXL\u7684LoRA\u6743\u91cd\uff1b\n[Controlnet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/ppdiffusers/pipelines/controlnet)\u5347\u7ea7\uff0c\u652f\u6301ControlNetImg2Img\u3001ControlNetInpaint\u3001StableDiffusionXLControlNet\u7b49\u3002**\n\n* \ud83d\udd25 **2023.06.20 \u53d1\u5e03 0.16.1 \u7248\u672c\uff0c\u65b0\u589e[T2I-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/t2i-adapter)\uff0c\u652f\u6301\u8bad\u7ec3\u4e0e\u63a8\u7406\uff1bControlNet\u5347\u7ea7\uff0c\u652f\u6301[reference 
only\u63a8\u7406](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#controlnet-reference-only)\uff1b\u65b0\u589e[WebUIStableDiffusionPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#automatic1111-webui-stable-diffusion)\uff0c\n\u652f\u6301\u901a\u8fc7prompt\u7684\u65b9\u5f0f\u52a8\u6001\u52a0\u8f7dlora\u3001textual_inversion\u6743\u91cd\uff1b\n\u65b0\u589e[StableDiffusionHiresFixPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#stable-diffusion-with-high-resolution-fixing)\uff0c\u652f\u6301\u9ad8\u5206\u8fa8\u7387\u4fee\u590d\uff1b\n\u65b0\u589e\u5173\u952e\u70b9\u63a7\u5236\u751f\u6210\u4efb\u52a1\u8bc4\u4ef7\u6307\u6807[COCOeval](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/scripts/cocoeval_keypoints_score)\uff1b\n\u65b0\u589e\u591a\u79cd\u6a21\u6001\u6269\u6563\u6a21\u578bPipeline\uff0c\u5305\u62ec\u89c6\u9891\u751f\u6210\uff08[Text-to-Video-Synth](#\u6587\u672c\u89c6\u9891\u591a\u6a21)\u3001[Text-to-Video-Zero](#\u6587\u672c\u89c6\u9891\u591a\u6a21)\uff09\u3001\u97f3\u9891\u751f\u6210\uff08[AudioLDM](#\u6587\u672c\u97f3\u9891\u591a\u6a21)\u3001[Spectrogram Diffusion](#\u97f3\u9891)\uff09\uff1b\u65b0\u589e\u6587\u56fe\u751f\u6210\u6a21\u578b[IF](#\u6587\u672c\u56fe\u50cf\u591a\u6a21)\u3002**\n\n\n\n## \u7279\u6027\n#### \ud83d\udce6 SOTA\u6269\u6563\u6a21\u578bPipelines\u96c6\u5408\n\u6211\u4eec\u63d0\u4f9b**SOTA\uff08State-of-the-Art\uff09** \u7684\u6269\u6563\u6a21\u578bPipelines\u96c6\u5408\u3002\n\u76ee\u524d**PPDiffusers**\u5df2\u7ecf\u96c6\u6210\u4e86**100+Pipelines**\uff0c\u652f\u6301\u6587\u56fe\u751f\u6210\uff08Text-to-Image Generation\uff09\u3001\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u7f16\u8f91\uff08Text-Guided Image Inpainting\uff09\u3001\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u53d8\u6362\uff08Image-to-Image Text-Guided Generation\uff09\u3001\u6587\u672c\u6761\u4ef6\u7684\u89c6\u9891\u751f\u6210\uff08Text-to-Video Generation\uff09\u3001\u8d85\u5206\uff08Super Superresolution\uff09\u3001\u6587\u672c\u6761\u4ef6\u7684\u97f3\u9891\u751f\u6210\uff08Text-to-Audio Generation\uff09\u5728\u5185\u7684**10\u4f59\u9879**\u4efb\u52a1\uff0c\u8986\u76d6**\u6587\u672c\u3001\u56fe\u50cf\u3001\u89c6\u9891\u3001\u97f3\u9891**\u7b49\u591a\u79cd\u6a21\u6001\u3002\n\u5982\u679c\u60f3\u8981\u4e86\u89e3\u5f53\u524d\u652f\u6301\u7684\u6240\u6709**Pipelines**\u4ee5\u53ca\u5bf9\u5e94\u7684\u6765\u6e90\u4fe1\u606f\uff0c\u53ef\u4ee5\u9605\u8bfb[\ud83d\udd25 PPDiffusers Pipelines](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/pipelines/README.md)\u6587\u6863\u3002\n\n\n#### \ud83d\udd0a \u63d0\u4f9b\u4e30\u5bcc\u7684Noise Scheduler\n\u6211\u4eec\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684**\u566a\u58f0\u8c03\u5ea6\u5668\uff08Noise Scheduler\uff09**\uff0c\u53ef\u4ee5\u5bf9**\u901f\u5ea6**\u4e0e**\u8d28\u91cf**\u8fdb\u884c\u6743\u8861\uff0c\u7528\u6237\u53ef\u5728\u63a8\u7406\u65f6\u6839\u636e\u9700\u6c42\u5feb\u901f\u5207\u6362\u4f7f\u7528\u3002\n\u5f53\u524d**PPDiffusers**\u5df2\u7ecf\u96c6\u6210\u4e86**14+Scheduler**\uff0c\u4e0d\u4ec5\u652f\u6301 [DDPM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_ddpm.py)\u3001[DDIM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_ddim.py) \u548c 
[PNDM](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_pndm.py)\uff0c\u8fd8\u652f\u6301\u6700\u65b0\u7684 [\ud83d\udd25 DPMSolver](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/schedulers/scheduling_dpmsolver_multistep.py)\uff01\n\n#### \ud83c\udf9b\ufe0f \u63d0\u4f9b\u591a\u79cd\u6269\u6563\u6a21\u578b\u7ec4\u4ef6\n\u6211\u4eec\u63d0\u4f9b\u4e86**\u591a\u79cd\u6269\u6563\u6a21\u578b**\u7ec4\u4ef6\uff0c\u5982[UNet1DModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_1d.py)\u3001[UNet2DModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_2d.py)\u3001[UNet2DConditionModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_2d_condition.py)\u3001[UNet3DConditionModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/unet_3d_condition.py)\u3001[VQModel](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/vae.py)\u3001[AutoencoderKL](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/models/vae.py)\u7b49\u3002\n\n\n#### \ud83d\udcd6 \u63d0\u4f9b\u4e30\u5bcc\u7684\u8bad\u7ec3\u548c\u63a8\u7406\u6559\u7a0b\n\u6211\u4eec\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684\u8bad\u7ec3\u6559\u7a0b\uff0c\u4e0d\u4ec5\u652f\u6301\u6269\u6563\u6a21\u578b\u7684\u4e8c\u6b21\u5f00\u53d1\u5fae\u8c03\uff0c\u5982\u57fa\u4e8e[Textual Inversion](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/textual_inversion)\u548c[DreamBooth](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/dreambooth)\u4f7f\u75283-5\u5f20\u56fe\u5b9a\u5236\u5316\u8bad\u7ec3\u751f\u6210\u56fe\u50cf\u7684\u98ce\u683c\u6216\u7269\u4f53\uff0c\u8fd8\u652f\u6301[\ud83d\udd25 Latent Diffusion Model](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_laion400m)\u3001[\ud83d\udd25 ControlNet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/controlnet)\u3001[\ud83d\udd25 T2I-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/t2i-adapter)  \u7b49\u6269\u6563\u6a21\u578b\u7684\u8bad\u7ec3\uff01\n\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684[\ud83d\udd25 Pipelines\u63a8\u7406\u6837\u4f8b](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/inference)\u3002\n\n#### \ud83d\ude80 \u652f\u6301FastDeploy\u9ad8\u6027\u80fd\u90e8\u7f72\n\u6211\u4eec\u63d0\u4f9b\u57fa\u4e8e[FastDeploy](https://github.com/PaddlePaddle/FastDeploy)\u7684[\ud83d\udd25 \u9ad8\u6027\u80fdStable Diffusion Pipeline](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/ppdiffusers/pipelines/stable_diffusion/pipeline_fastdeploy_stable_diffusion.py)\uff0c\u66f4\u591a\u6709\u5173FastDeploy\u8fdb\u884c\u591a\u63a8\u7406\u5f15\u64ce\u540e\u7aef\u9ad8\u6027\u80fd\u90e8\u7f72\u7684\u4fe1\u606f\u8bf7\u53c2\u8003[\ud83d\udd25 \u9ad8\u6027\u80fdFastDeploy\u63a8\u7406\u6559\u7a0b](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/deploy)\u3002\n\n## \u5b89\u88c5\n\n### \u73af\u5883\u4f9d\u8d56\n```\npip install -r requirements.txt\n```\n\u5173\u4e8ePaddlePaddle\u5b89\u88c5\u7684\u8be6\u7ec6\u6559\u7a0b\u8bf7\u67e5\u770b[Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html)\u3002\n\n### 
pip\u5b89\u88c5\n\n```shell\npip install --upgrade ppdiffusers\n```\n\n### \u624b\u52a8\u5b89\u88c5\n```shell\ngit clone https://github.com/PaddlePaddle/PaddleMIX\ncd PaddleMIX/ppdiffusers\npython setup.py install\n```\n\n## \u5feb\u901f\u5f00\u59cb\n\u6211\u4eec\u5c06\u4ee5\u6269\u6563\u6a21\u578b\u7684\u5178\u578b\u4ee3\u8868**Stable Diffusion**\u4e3a\u4f8b\uff0c\u5e26\u4f60\u5feb\u901f\u4e86\u89e3PPDiffusers\u3002\n\n**Stable Diffusion**\u57fa\u4e8e**\u6f5c\u5728\u6269\u6563\u6a21\u578b\uff08Latent Diffusion Models\uff09**\uff0c\u4e13\u95e8\u7528\u4e8e**\u6587\u56fe\u751f\u6210\uff08Text-to-Image Generation\uff09\u4efb\u52a1**\u3002\u8be5\u6a21\u578b\u662f\u7531\u6765\u81ea [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/)\u4ee5\u53ca[RunwayML](https://runwayml.com/)\u7684\u5de5\u7a0b\u5e08\u5171\u540c\u5f00\u53d1\u5b8c\u6210\uff0c\u76ee\u524d\u53d1\u5e03\u4e86v1\u548cv2\u4e24\u4e2a\u7248\u672c\u3002v1\u7248\u672c\u91c7\u7528\u4e86LAION-5B\u6570\u636e\u96c6\u5b50\u96c6\uff08\u5206\u8fa8\u7387\u4e3a 512x512\uff09\u8fdb\u884c\u8bad\u7ec3\uff0c\u5e76\u5177\u6709\u4ee5\u4e0b\u67b6\u6784\u8bbe\u7f6e\uff1a\u81ea\u52a8\u7f16\u7801\u5668\u4e0b\u91c7\u6837\u56e0\u5b50\u4e3a8\uff0cUNet\u5927\u5c0f\u4e3a860M\uff0c\u6587\u672c\u7f16\u7801\u5668\u4e3aCLIP ViT-L/14\u3002v2\u7248\u672c\u76f8\u8f83\u4e8ev1\u7248\u672c\u5728\u751f\u6210\u56fe\u50cf\u7684\u8d28\u91cf\u548c\u5206\u8fa8\u7387\u7b49\u8fdb\u884c\u4e86\u6539\u5584\u3002\n\n### Stable Diffusion\u91cd\u70b9\u6a21\u578b\u6743\u91cd\n\n<details><summary>&emsp; Stable Diffusion \u6a21\u578b\u652f\u6301\u7684\u6743\u91cd\uff08\u82f1\u6587\uff09 </summary>\n\n**\u6211\u4eec\u53ea\u9700\u8981\u5c06\u4e0b\u9762\u7684\"xxxx\"\uff0c\u66ff\u6362\u6210\u6240\u9700\u7684\u6743\u91cd\u540d\uff0c\u5373\u53ef\u5feb\u901f\u4f7f\u7528\uff01**\n```python\nfrom ppdiffusers import *\n\npipe_text2img = StableDiffusionPipeline.from_pretrained(\"xxxx\")\npipe_img2img = StableDiffusionImg2ImgPipeline.from_pretrained(\"xxxx\")\npipe_inpaint_legacy = StableDiffusionInpaintPipelineLegacy.from_pretrained(\"xxxx\")\npipe_mega = StableDiffusionMegaPipeline.from_pretrained(\"xxxx\")\n\n# pipe_mega.text2img() \u7b49\u4e8e pipe_text2img()\n# pipe_mega.img2img() \u7b49\u4e8e pipe_img2img()\n# pipe_mega.inpaint_legacy() \u7b49\u4e8e pipe_inpaint_legacy()\n```\n\n| PPDiffusers\u652f\u6301\u7684\u6a21\u578b\u540d\u79f0                     | \u652f\u6301\u52a0\u8f7d\u7684Pipeline                                    | \u5907\u6ce8 | huggingface.co\u5730\u5740 |\n| :-------------------------------------------: | :--------------------------------------------------------------------: | --- | :-----------------------------------------: |\n| CompVis/stable-diffusion-v1-4           | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | Stable-Diffusion-v1-4 \u4f7f\u7528 Stable-Diffusion-v1-2 \u7684\u6743\u91cd\u8fdb\u884c\u521d\u59cb\u5316\u3002\u968f\u540e\u5728\"laion-aesthetics v2 5+\"\u6570\u636e\u96c6\u4e0a\u4ee5 **512x512** \u5206\u8fa8\u7387\u5fae\u8c03\u4e86 **225k** \u6b65\u6570\uff0c\u5bf9\u6587\u672c\u4f7f\u7528\u4e86 **10%** \u7684dropout\uff08\u5373\uff1a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6587\u56fe\u5bf9\u4e2d\u7684\u6587\u672c\u6709 10% \u7684\u6982\u7387\u4f1a\u53d8\u6210\u7a7a\u6587\u672c\uff09\u3002\u6a21\u578b\u4f7f\u7528\u4e86[CLIP 
ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14)\u4f5c\u4e3a\u6587\u672c\u7f16\u7801\u5668\u3002| [\u5730\u5740](https://huggingface.co/CompVis/stable-diffusion-v1-4) |\n| CompVis/ldm-text2im-large-256               | LDMTextToImagePipeline | [LDM\u8bba\u6587](https://arxiv.org/pdf/2112.10752.pdf) LDM-KL-8-G* \u6743\u91cd\u3002| [\u5730\u5740](https://huggingface.co/CompVis/ldm-text2im-large-256) |\n| CompVis/ldm-super-resolution-4x-openimages  | LDMSuperResolutionPipeline | [LDM\u8bba\u6587](https://arxiv.org/pdf/2112.10752.pdf) LDM-VQ-4 \u6743\u91cd\uff0c[\u539f\u59cb\u6743\u91cd\u94fe\u63a5](https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip)\u3002| [\u5730\u5740](https://huggingface.co/CompVis/ldm-super-resolution-4x-openimages) |\n| runwayml/stable-diffusion-v1-5              | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | Stable-Diffusion-v1-5 \u4f7f\u7528 Stable-Diffusion-v1-2 \u7684\u6743\u91cd\u8fdb\u884c\u521d\u59cb\u5316\u3002\u968f\u540e\u5728\"laion-aesthetics v2 5+\"\u6570\u636e\u96c6\u4e0a\u4ee5 **512x512** \u5206\u8fa8\u7387\u5fae\u8c03\u4e86 **595k** \u6b65\u6570\uff0c\u5bf9\u6587\u672c\u4f7f\u7528\u4e86 **10%** \u7684dropout\uff08\u5373\uff1a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6587\u56fe\u5bf9\u4e2d\u7684\u6587\u672c\u6709 10% \u7684\u6982\u7387\u4f1a\u53d8\u6210\u7a7a\u6587\u672c\uff09\u3002\u6a21\u578b\u540c\u6837\u4e5f\u4f7f\u7528\u4e86[CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14)\u4f5c\u4e3a\u6587\u672c\u7f16\u7801\u5668\u3002| [\u5730\u5740](https://huggingface.co/runwayml/stable-diffusion-v1-5) |\n| runwayml/stable-diffusion-inpainting        | StableDiffusionInpaintPipeline | Stable-Diffusion-Inpainting \u4f7f\u7528 Stable-Diffusion-v1-2 \u7684\u6743\u91cd\u8fdb\u884c\u521d\u59cb\u5316\u3002\u9996\u5148\u8fdb\u884c\u4e86 **595k** \u6b65\u7684\u5e38\u89c4\u8bad\u7ec3\uff08\u5b9e\u9645\u4e5f\u5c31\u662f Stable-Diffusion-v1-5 \u7684\u6743\u91cd\uff09\uff0c\u7136\u540e\u8fdb\u884c\u4e86 **440k** \u6b65\u7684 inpainting \u4fee\u590d\u8bad\u7ec3\u3002\u5bf9\u4e8e inpainting \u4fee\u590d\u8bad\u7ec3\uff0c\u7ed9 UNet \u989d\u5916\u589e\u52a0\u4e86 **5** \u8f93\u5165\u901a\u9053\uff08\u5176\u4e2d **4** \u4e2a\u7528\u4e8e\u88ab Mask \u906e\u76d6\u4f4f\u7684\u56fe\u7247\uff0c**1** \u4e2a\u7528\u4e8e Mask \u672c\u8eab\uff09\u3002\u5728\u8bad\u7ec3\u671f\u95f4\uff0c\u4f1a\u968f\u673a\u751f\u6210 Mask\uff0c\u5e76\u6709 **25%** \u6982\u7387\u4f1a\u5c06\u539f\u59cb\u56fe\u7247\u5168\u90e8 Mask \u6389\u3002| [\u5730\u5740](https://huggingface.co/runwayml/stable-diffusion-inpainting) |\n| stabilityai/stable-diffusion-2-base         | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | \u8be5\u6a21\u578b\u9996\u5148\u5728 [LAION-5B 256x256 \u5b50\u96c6\u4e0a](https://laion.ai/blog/laion-5b/) \uff08\u8fc7\u6ee4\u6761\u4ef6\uff1a[punsafe = 0.1 \u7684 LAION-NSFW \u5206\u7c7b\u5668](https://github.com/LAION-AI/CLIP-based-NSFW-Detector) \u548c \u5ba1\u7f8e\u5206\u6570\u5927\u4e8e\u7b49\u4e8e 4.5 \uff09\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3 **550k** \u6b65\uff0c\u7136\u540e\u53c8\u5728\u5206\u8fa8\u7387 **>= 512x512** \u7684\u540c\u4e00\u6570\u636e\u96c6\u4e0a\u8fdb\u4e00\u6b65\u8bad\u7ec3 **850k** \u6b65\u3002| [\u5730\u5740](https://huggingface.co/stabilityai/stable-diffusion-2-base) |\n| 
stabilityai/stable-diffusion-2              | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | stable-diffusion-2 \u4f7f\u7528 stable-diffusion-2-base \u6743\u91cd\u8fdb\u884c\u521d\u59cb\u5316\uff0c\u9996\u5148\u5728\u540c\u4e00\u6570\u636e\u96c6\u4e0a\uff08**512x512** \u5206\u8fa8\u7387\uff09\u4f7f\u7528 [v-objective](https://arxiv.org/abs/2202.00512) \u8bad\u7ec3\u4e86 **150k** \u6b65\u3002\u7136\u540e\u53c8\u5728 **768x768** \u5206\u8fa8\u7387\u4e0a\u4f7f\u7528 [v-objective](https://arxiv.org/abs/2202.00512) \u7ee7\u7eed\u8bad\u7ec3\u4e86 **140k** \u6b65\u3002| [\u5730\u5740](https://huggingface.co/stabilityai/stable-diffusion-2) |\n| stabilityai/stable-diffusion-2-inpainting   | StableDiffusionInpaintPipeline |stable-diffusion-2-inpainting \u4f7f\u7528 stable-diffusion-2-base \u6743\u91cd\u521d\u59cb\u5316\uff0c\u5e76\u4e14\u989d\u5916\u8bad\u7ec3\u4e86 **200k** \u6b65\u3002\u8bad\u7ec3\u8fc7\u7a0b\u4f7f\u7528\u4e86 [LAMA](https://github.com/saic-mdal/lama) \u4e2d\u63d0\u51fa\u7684 Mask \u751f\u6210\u7b56\u7565\uff0c\u5e76\u4e14\u4f7f\u7528 Mask \u56fe\u7247\u7684 Latent \u8868\u793a\uff08\u7ecf\u8fc7 VAE \u7f16\u7801\uff09\u4f5c\u4e3a\u9644\u52a0\u6761\u4ef6\u3002| [\u5730\u5740](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) |\n| stabilityai/stable-diffusion-x4-upscaler    | StableDiffusionUpscalePipeline | \u8be5\u6a21\u578b\u5728**LAION 10M** \u5b50\u96c6\u4e0a\uff08>2048x2048\uff09\u8bad\u7ec3\u4e86 1.25M \u6b65\u3002\u8be5\u6a21\u578b\u8fd8\u5728\u5206\u8fa8\u7387\u4e3a **512x512** \u7684\u56fe\u50cf\u4e0a\u4f7f\u7528 [Text-guided Latent Upscaling Diffusion Model](https://arxiv.org/abs/2112.10752) \u8fdb\u884c\u4e86\u8bad\u7ec3\u3002\u9664\u4e86**\u6587\u672c\u8f93\u5165**\u4e4b\u5916\uff0c\u5b83\u8fd8\u63a5\u6536 **noise_level** \u4f5c\u4e3a\u8f93\u5165\u53c2\u6570\uff0c\u56e0\u6b64\u6211\u4eec\u53ef\u4ee5\u4f7f\u7528 [\u9884\u5b9a\u4e49\u7684 Scheduler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/low_res_scheduler/scheduler_config.json) \u5411\u4f4e\u5206\u8fa8\u7387\u7684\u8f93\u5165\u56fe\u7247\u6dfb\u52a0\u566a\u58f0\u3002| [\u5730\u5740](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) |\n| hakurei/waifu-diffusion    | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | waifu-diffusion-v1-2 \u4f7f\u7528 stable-diffusion-v1-4 \u6743\u91cd\u521d\u59cb\u5316\uff0c\u5e76\u4e14\u5728**\u9ad8\u8d28\u91cf\u52a8\u6f2b**\u56fe\u50cf\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\u5f97\u5230\u7684\u6a21\u578b\u3002\u7528\u4e8e\u5fae\u8c03\u7684\u6570\u636e\u662f **680k** \u6587\u672c\u56fe\u50cf\u6837\u672c\uff0c\u8fd9\u4e9b\u6837\u672c\u662f\u901a\u8fc7 **booru \u7f51\u7ad9** \u4e0b\u8f7d\u7684\u3002| [\u5730\u5740](https://huggingface.co/hakurei/waifu-diffusion) |\n| hakurei/waifu-diffusion-v1-3    | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | waifu-diffusion-v1-3 \u662f waifu-diffusion-v1-2 
\u57fa\u7840\u4e0a\u8fdb\u4e00\u6b65\u8bad\u7ec3\u5f97\u5230\u7684\u3002\u4ed6\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u4e86\u989d\u5916\u64cd\u4f5c\uff1a\uff081\uff09\u5220\u9664\u4e0b\u5212\u7ebf\uff1b\uff082\uff09\u5220\u9664\u62ec\u53f7\uff1b\uff083\uff09\u7528\u9017\u53f7\u5206\u9694\u6bcf\u4e2abooru \u6807\u7b7e\uff1b\uff084\uff09\u968f\u673a\u5316\u6807\u7b7e\u987a\u5e8f\u3002| [\u5730\u5740](https://huggingface.co/hakurei/waifu-diffusion) |\n| naclbit/trinart_stable_diffusion_v2_60k    | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | trinart_stable_diffusion \u4f7f\u7528 stable-diffusion-v1-4 \u6743\u91cd\u521d\u59cb\u5316\uff0c\u5728 40k **\u9ad8\u5206\u8fa8\u7387\u6f2b\u753b/\u52a8\u6f2b\u98ce\u683c**\u7684\u56fe\u7247\u6570\u636e\u96c6\u4e0a\u5fae\u8c03\u4e86 8 \u4e2a epoch\u3002V2 \u7248\u6a21\u578b\u4f7f\u7528 **dropouts**\u3001**10k+ \u56fe\u50cf**\u548c**\u65b0\u7684\u6807\u8bb0\u7b56\u7565**\u8bad\u7ec3\u4e86**\u66f4\u957f\u65f6\u95f4**\u3002| [\u5730\u5740](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |\n| naclbit/trinart_stable_diffusion_v2_95k    | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | **95k** \u6b65\u6570\u7684\u7ed3\u679c\uff0c\u5176\u4ed6\u540c\u4e0a\u3002| [\u5730\u5740](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |\n| naclbit/trinart_stable_diffusion_v2_115k    | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | **115k** \u6b65\u6570\u7684\u7ed3\u679c\uff0c\u5176\u4ed6\u540c\u4e0a\u3002| [\u5730\u5740](https://huggingface.co/naclbit/trinart_stable_diffusion_v2) |\n| Deltaadams/Hentai-Diffusion    | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | None| [\u5730\u5740](https://huggingface.co/Deltaadams/Hentai-Diffusion) |\n| ringhyacinth/nail-set-diffuser    | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | \u7f8e\u7532\u9886\u57df\u7684\u6269\u6563\u6a21\u578b\uff0c\u8bad\u7ec3\u6570\u636e\u4f7f\u7528\u4e86 [Weekend](https://weibo.com/u/5982308498)| [\u5730\u5740](https://huggingface.co/ringhyacinth/nail-set-diffuser) |\n| Linaqruf/anything-v3.0    | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | \u8be5\u6a21\u578b\u53ef\u901a\u8fc7\u8f93\u5165\u51e0\u4e2a\u6587\u672c\u63d0\u793a\u8bcd\u5c31\u80fd\u751f\u6210**\u9ad8\u8d28\u91cf\u3001\u9ad8\u5ea6\u8be6\u7ec6\u7684\u52a8\u6f2b\u98ce\u683c\u56fe\u7247**\uff0c\u8be5\u6a21\u578b\u652f\u6301\u4f7f\u7528 **danbooru \u6807\u7b7e\u6587\u672c** \u751f\u6210\u56fe\u50cf\u3002| [\u5730\u5740](https://huggingface.co/Linaqruf/anything-v3.0) |\n\n</details>\n<details><summary>&emsp; Stable Diffusion \u6a21\u578b\u652f\u6301\u7684\u6743\u91cd\uff08\u4e2d\u6587\u548c\u591a\u8bed\u8a00\uff09 </summary>\n\n\n| PPDiffusers\u652f\u6301\u7684\u6a21\u578b\u540d\u79f0                     | \u652f\u6301\u52a0\u8f7d\u7684Pipeline                               
     | \u5907\u6ce8 | huggingface.co\u5730\u5740 |\n| :-------------------------------------------: | :--------------------------------------------------------------------: | --- | :-----------------------------------------: |\n| BAAI/AltDiffusion                           | AltDiffusionPipeline\u3001AltDiffusionImg2ImgPipeline | \u8be5\u6a21\u578b\u4f7f\u7528 [AltCLIP](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) \u4f5c\u4e3a\u6587\u672c\u7f16\u7801\u5668\uff0c\u5728 Stable Diffusion \u57fa\u7840\u4e0a\u8bad\u7ec3\u4e86**\u53cc\u8bedDiffusion\u6a21\u578b**\uff0c\u5176\u4e2d\u8bad\u7ec3\u6570\u636e\u6765\u81ea [WuDao\u6570\u636e\u96c6](https://data.baai.ac.cn/details/WuDaoCorporaText) \u548c [LAION](https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus) \u3002| [\u5730\u5740](https://huggingface.co/BAAI/AltDiffusion) |\n| BAAI/AltDiffusion-m9                        | AltDiffusionPipeline\u3001AltDiffusionImg2ImgPipeline |\u8be5\u6a21\u578b\u4f7f\u75289\u79cd\u8bed\u8a00\u7684 [AltCLIP-m9](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) \u4f5c\u4e3a\u6587\u672c\u7f16\u7801\u5668\uff0c\u5176\u4ed6\u540c\u4e0a\u3002| [\u5730\u5740](https://huggingface.co/BAAI/AltDiffusion-m9) |\n| IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1 | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | \u4ed6\u4eec\u5c06 [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) \u6570\u636e\u96c6 (100M) \u548c [Zero](https://zero.so.com/) \u6570\u636e\u96c6 (23M) \u7528\u4f5c\u9884\u8bad\u7ec3\u7684\u6570\u636e\u96c6\uff0c\u5148\u7528 [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) \u5bf9\u8fd9\u4e24\u4e2a\u6570\u636e\u96c6\u7684\u56fe\u6587\u5bf9\u76f8\u4f3c\u6027\u8fdb\u884c\u6253\u5206\uff0c\u53d6 CLIP Score \u5927\u4e8e 0.2 \u7684\u56fe\u6587\u5bf9\u4f5c\u4e3a\u8bad\u7ec3\u96c6\u3002 \u4ed6\u4eec\u4f7f\u7528 [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) \u4f5c\u4e3a\u521d\u59cb\u5316\u7684text encoder\uff0c\u51bb\u4f4f [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) ([\u8bba\u6587](https://arxiv.org/abs/2112.10752)) \u6a21\u578b\u7684\u5176\u4ed6\u90e8\u5206\uff0c\u53ea\u8bad\u7ec3 text encoder\uff0c\u4ee5\u4fbf\u4fdd\u7559\u539f\u59cb\u6a21\u578b\u7684\u751f\u6210\u80fd\u529b\u4e14\u5b9e\u73b0\u4e2d\u6587\u6982\u5ff5\u7684\u5bf9\u9f50\u3002\u8be5\u6a21\u578b\u76ee\u524d\u57280.2\u4ebf\u56fe\u6587\u5bf9\u4e0a\u8bad\u7ec3\u4e86\u4e00\u4e2a epoch\u3002 \u5728 32 x A100 \u4e0a\u8bad\u7ec3\u4e86\u5927\u7ea6100\u5c0f\u65f6\uff0c\u8be5\u7248\u672c\u53ea\u662f\u4e00\u4e2a\u521d\u6b65\u7684\u7248\u672c\u3002| [\u5730\u5740](https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1) |\n| IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1 | StableDiffusionPipeline\u3001StableDiffusionImg2ImgPipeline\u3001StableDiffusionInpaintPipelineLegacy\u3001StableDiffusionMegaPipeline\u3001StableDiffusionPipelineAllinOne | \u4ed6\u4eec\u5c06 [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) \u6570\u636e\u96c6 (100M) \u548c [Zero](https://zero.so.com/) \u6570\u636e\u96c6 (23M) \u7528\u4f5c\u9884\u8bad\u7ec3\u7684\u6570\u636e\u96c6\uff0c\u5148\u7528 
[IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) \u5bf9\u8fd9\u4e24\u4e2a\u6570\u636e\u96c6\u7684\u56fe\u6587\u5bf9\u76f8\u4f3c\u6027\u8fdb\u884c\u6253\u5206\uff0c\u53d6 CLIP Score \u5927\u4e8e 0.2 \u7684\u56fe\u6587\u5bf9\u4f5c\u4e3a\u8bad\u7ec3\u96c6\u3002 \u4ed6\u4eec\u4f7f\u7528 [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) ([\u8bba\u6587](https://arxiv.org/abs/2112.10752)) \u6a21\u578b\u8fdb\u884c\u7ee7\u7eed\u8bad\u7ec3\uff0c\u5176\u4e2d\u8bad\u7ec3\u5206\u4e3a**\u4e24\u4e2astage**\u3002**\u7b2c\u4e00\u4e2astage** \u4e2d\u51bb\u4f4f\u6a21\u578b\u7684\u5176\u4ed6\u90e8\u5206\uff0c\u53ea\u8bad\u7ec3 text encoder \uff0c\u4ee5\u4fbf\u4fdd\u7559\u539f\u59cb\u6a21\u578b\u7684\u751f\u6210\u80fd\u529b\u4e14\u5b9e\u73b0\u4e2d\u6587\u6982\u5ff5\u7684\u5bf9\u9f50\u3002**\u7b2c\u4e8c\u4e2astage** \u4e2d\u5c06\u5168\u90e8\u6a21\u578b\u89e3\u51bb\uff0c\u4e00\u8d77\u8bad\u7ec3 text encoder \u548c diffusion model \uff0c\u4ee5\u4fbf diffusion model \u66f4\u597d\u7684\u9002\u914d\u4e2d\u6587\u5f15\u5bfc\u3002\u7b2c\u4e00\u4e2a stage \u4ed6\u4eec\u8bad\u7ec3\u4e86 80 \u5c0f\u65f6\uff0c\u7b2c\u4e8c\u4e2a stage \u8bad\u7ec3\u4e86 100 \u5c0f\u65f6\uff0c\u4e24\u4e2astage\u90fd\u662f\u7528\u4e868 x A100\uff0c\u8be5\u7248\u672c\u662f\u4e00\u4e2a\u521d\u6b65\u7684\u7248\u672c\u3002| [\u5730\u5740](https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1) |\n</details>\n\n\n### \u52a0\u8f7dHF Diffusers\u6743\u91cd\n```python\nfrom ppdiffusers import StableDiffusionPipeline\n# \u8bbe\u7f6efrom_hf_hub\u4e3aTrue\uff0c\u8868\u793a\u4ecehuggingface hub\u4e0b\u8f7d\uff0cfrom_diffusers\u4e3aTrue\u8868\u793a\u52a0\u8f7d\u7684\u662fdiffusers\u7248Pytorch\u6743\u91cd\npipe = StableDiffusionPipeline.from_pretrained(\"stabilityai/stable-diffusion-2\", from_hf_hub=True, from_diffusers=True)\n```\n\n### \u52a0\u8f7d\u539f\u5e93\u7684Lightning\u6743\u91cd\n```python\nfrom ppdiffusers import StableDiffusionPipeline\n# \u53ef\u8f93\u5165\u7f51\u5740 \u6216 \u672c\u5730ckpt\u3001safetensors\u6587\u4ef6\npipe = StableDiffusionPipeline.from_single_file(\"https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/ppdiffusers/chilloutmix_NiPrunedFp32Fix.safetensors\")\n```\n\n### \u52a0\u8f7dHF LoRA\u6743\u91cd\n```python\nfrom ppdiffusers import DiffusionPipeline\n\npipe = DiffusionPipeline.from_pretrained(\"stabilityai/stable-diffusion-xl-base-1.0\", paddle_dtype=paddle.float16)\n\npipe.load_lora_weights(\"stabilityai/stable-diffusion-xl-base-1.0\",\n    weight_name=\"sd_xl_offset_example-lora_1.0.safetensors\",\n    from_diffusers=True)\n```\n\n### \u52a0\u8f7dCivitai\u793e\u533a\u7684LoRA\u6743\u91cd\n```python\nfrom ppdiffusers import StableDiffusionPipeline\npipe = StableDiffusionPipeline.from_pretrained(\"TASUKU2023/Chilloutmix\")\n# \u52a0\u8f7dlora\u6743\u91cd\npipe.load_lora_weights(\"./\",\n    weight_name=\"Moxin_10.safetensors\",\n    from_diffusers=True)\npipe.fuse_lora()\n```\n\n### XFormers\u52a0\u901f\n\u4e3a\u4e86\u4f7f\u7528**XFormers\u52a0\u901f**\uff0c\u6211\u4eec\u9700\u8981\u5b89\u88c5`develop`\u7248\u672c\u7684`paddle`\uff0cLinux\u7cfb\u7edf\u7684\u5b89\u88c5\u547d\u4ee4\u5982\u4e0b\uff1a\n```sh\npython -m pip install paddlepaddle-gpu==0.0.0.post117 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html\n```\n\n```python\nimport paddle\nfrom ppdiffusers import StableDiffusionPipeline\npipe = StableDiffusionPipeline.from_pretrained(\"TASUKU2023/Chilloutmix\", 
paddle_dtype=paddle.float16)\n# \u5f00\u542fxformers\u52a0\u901f \u9ed8\u8ba4\u9009\u62e9\"cutlass\"\u52a0\u901f\npipe.enable_xformers_memory_efficient_attention()\n# flash \u9700\u8981\u4f7f\u7528 A100\u3001A10\u30013060\u30013070\u30013080\u30013090 \u7b49\u4ee5\u4e0a\u663e\u5361\u3002\n# pipe.enable_xformers_memory_efficient_attention(\"flash\")\n```\n\n### ToME + ControlNet\n```python\n# \u5b89\u88c5develop\u7684ppdiffusers\n# pip install \"ppdiffusers>=0.24.0\"\nimport paddle\nfrom ppdiffusers import ControlNetModel, StableDiffusionControlNetPipeline\nfrom ppdiffusers.utils import load_image\n\ncontrolnet = ControlNetModel.from_pretrained(\"lllyasviel/sd-controlnet-canny\")\npipe = StableDiffusionControlNetPipeline.from_pretrained(\n    \"runwayml/stable-diffusion-v1-5\", safety_checker=None, controlnet=controlnet, paddle_dtype=paddle.float16\n)\n\n# Apply ToMe with a 50% merging ratio\npipe.apply_tome(ratio=0.5) # Can also use pipe.unet in place of pipe here\n\n# \u6211\u4eec\u53ef\u4ee5\u5f00\u542f xformers\n# pipe.enable_xformers_memory_efficient_attention()\ngenerator = paddle.Generator().manual_seed(0)\nprompt = \"bird\"\nimage = load_image(\n    \"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png\"\n)\n\nimage = pipe(prompt, image, generator=generator).images[0]\n\nimage.save(\"bird.png\")\n```\n\n### \u6587\u56fe\u751f\u6210 \uff08Text-to-Image Generation\uff09\n\n```python\nimport paddle\nfrom ppdiffusers import StableDiffusionPipeline\n\npipe = StableDiffusionPipeline.from_pretrained(\"stabilityai/stable-diffusion-2\")\n\n# \u8bbe\u7f6e\u968f\u673a\u79cd\u5b50\uff0c\u6211\u4eec\u53ef\u4ee5\u590d\u73b0\u4e0b\u9762\u7684\u7ed3\u679c\uff01\npaddle.seed(5232132133)\nprompt = \"a portrait of shiba inu with a red cap growing on its head. intricate. lifelike. soft light. sony a 7 r iv 5 5 mm. 
cinematic post - processing \"\nimage = pipe(prompt, guidance_scale=7.5, height=768, width=768).images[0]\n\nimage.save(\"shiba_dog_with_a_red_cap.png\")\n```\n<div align=\"center\">\n<img width=\"500\" alt=\"image\" src=\"https://user-images.githubusercontent.com/50394665/204796701-d7911f76-8670-47d5-8d1b-8368b046c5e4.png\">\n</div>\n\n### \u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u53d8\u6362\uff08Image-to-Image Text-Guided Generation\uff09\n\n<details><summary>&emsp;Image-to-Image Text-Guided Generation Demo </summary>\n\n```python\nimport paddle\nfrom ppdiffusers import StableDiffusionImg2ImgPipeline\nfrom ppdiffusers.utils import load_image\n\npipe = StableDiffusionImg2ImgPipeline.from_pretrained(\"Linaqruf/anything-v3.0\", safety_checker=None)\n\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/image_Kurisu.png\"\nimage = load_image(url).resize((512, 768))\n\n# \u8bbe\u7f6e\u968f\u673a\u79cd\u5b50\uff0c\u6211\u4eec\u53ef\u4ee5\u590d\u73b0\u4e0b\u9762\u7684\u7ed3\u679c\uff01\npaddle.seed(42)\nprompt = \"Kurisu Makise, looking at viewer, long hair, standing, 1girl, hair ornament, hair flower, cute, jacket, white flower, white dress\"\nnegative_prompt = \"lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry\"\n\nimage = pipe(prompt=prompt, negative_prompt=negative_prompt, image=image, strength=0.75, guidance_scale=7.5).images[0]\nimage.save(\"image_Kurisu_img2img.png\")\n```\n<div align=\"center\">\n<img width=\"500\" alt=\"image\" src=\"https://user-images.githubusercontent.com/50394665/204799529-cd89dcdb-eb1d-4247-91ac-b0f7bad777f8.png\">\n</div>\n</details>\n\n### \u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u7f16\u8f91\uff08Text-Guided Image Inpainting\uff09\n\n\u6ce8\u610f\uff01\u5f53\u524d\u6709\u4e24\u79cd\u7248\u672c\u7684\u56fe\u50cf\u7f16\u8f91\u4ee3\u7801\uff0c\u4e00\u4e2a\u662fLegacy\u7248\u672c\uff0c\u4e00\u4e2a\u662f\u6b63\u5f0f\u7248\u672c\uff0c\u4e0b\u9762\u5c06\u5206\u522b\u4ecb\u7ecd\u4e24\u79cd\u4ee3\u7801\u5982\u4f55\u4f7f\u7528\uff01\n\n<details><summary>&emsp;Legacy\u7248\u672c\u4ee3\u7801</summary>\n\n```python\nimport paddle\nfrom ppdiffusers import StableDiffusionInpaintPipelineLegacy\nfrom ppdiffusers.utils import load_image\n\n# \u53ef\u9009\u6a21\u578b\u6743\u91cd\n# CompVis/stable-diffusion-v1-4\n# runwayml/stable-diffusion-v1-5\n# stabilityai/stable-diffusion-2-base \uff08\u539f\u59cb\u7b56\u7565 512x512\uff09\n# stabilityai/stable-diffusion-2 \uff08v-objective 768x768\uff09\n# Linaqruf/anything-v3.0\n# ......\nimg_url = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png\"\nmask_url = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png\"\n\nimage = load_image(img_url).resize((512, 512))\nmask_image = load_image(mask_url).resize((512, 512))\n\npipe = StableDiffusionInpaintPipelineLegacy.from_pretrained(\"stabilityai/stable-diffusion-2-base\", safety_checker=None)\n\n# \u8bbe\u7f6e\u968f\u673a\u79cd\u5b50\uff0c\u6211\u4eec\u53ef\u4ee5\u590d\u73b0\u4e0b\u9762\u7684\u7ed3\u679c\uff01\npaddle.seed(10245)\nprompt = \"a red cat sitting on a bench\"\nimage = pipe(prompt=prompt, image=image, mask_image=mask_image, strength=0.75).images[0]\n\nimage.save(\"a_red_cat_legacy.png\")\n```\n<div align=\"center\">\n<img width=\"900\" alt=\"image\" 
src=\"https://user-images.githubusercontent.com/50394665/204802186-5a6d302b-83aa-4247-a5bb-ebabfcc3abc4.png\">\n</div>\n\n</details>\n\n<details><summary>&emsp;\u6b63\u5f0f\u7248\u672c\u4ee3\u7801</summary>\n\nTips: \u4e0b\u9762\u7684\u4f7f\u7528\u65b9\u6cd5\u662f\u65b0\u7248\u672c\u7684\u4ee3\u7801\uff0c\u4e5f\u662f\u5b98\u65b9\u63a8\u8350\u7684\u4ee3\u7801\uff0c\u6ce8\u610f\u5fc5\u987b\u914d\u5408 **runwayml/stable-diffusion-inpainting** \u548c **stabilityai/stable-diffusion-2-inpainting** \u624d\u53ef\u6b63\u5e38\u4f7f\u7528\u3002\n```python\nimport paddle\nfrom ppdiffusers import StableDiffusionInpaintPipeline\nfrom ppdiffusers.utils import load_image\n\n# \u53ef\u9009\u6a21\u578b\u6743\u91cd\n# runwayml/stable-diffusion-inpainting\n# stabilityai/stable-diffusion-2-inpainting\nimg_url = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png\"\nmask_url = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png\"\n\nimage = load_image(img_url).resize((512, 512))\nmask_image = load_image(mask_url).resize((512, 512))\n\npipe = StableDiffusionInpaintPipeline.from_pretrained(\"stabilityai/stable-diffusion-2-inpainting\")\n\n# \u8bbe\u7f6e\u968f\u673a\u79cd\u5b50\uff0c\u6211\u4eec\u53ef\u4ee5\u590d\u73b0\u4e0b\u9762\u7684\u7ed3\u679c\uff01\npaddle.seed(1024)\nprompt = \"Face of a yellow cat, high resolution, sitting on a park bench\"\nimage = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]\n\nimage.save(\"a_yellow_cat.png\")\n```\n<div align=\"center\">\n<img width=\"900\" alt=\"image\" src=\"https://user-images.githubusercontent.com/50394665/204801946-6cd043bc-f3db-42cf-82cd-6a6171484523.png\">\n</div>\n</details>\n\n### \u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u653e\u5927 & \u8d85\u5206\uff08Text-Guided Image Upscaling & Super-Resolution\uff09\n\n<details><summary>&emsp;Text-Guided Image Upscaling Demo</summary>\n\n```python\nimport paddle\nfrom ppdiffusers import StableDiffusionUpscalePipeline\nfrom ppdiffusers.utils import load_image\n\npipe = StableDiffusionUpscalePipeline.from_pretrained(\"stabilityai/stable-diffusion-x4-upscaler\")\n\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/low_res_cat.png\"\n# \u6211\u4eec\u4eba\u5de5\u5c06\u539f\u59cb\u56fe\u7247\u7f29\u5c0f\u6210 128x128 \u5206\u8fa8\u7387\uff0c\u6700\u7ec8\u4fdd\u5b58\u7684\u56fe\u7247\u4f1a\u653e\u59274\u500d\uff01\nlow_res_img = load_image(url).resize((128, 128))\n\nprompt = \"a white cat\"\nimage = pipe(prompt=prompt, image=low_res_img).images[0]\n\nimage.save(\"upscaled_white_cat.png\")\n```\n<div align=\"center\">\n<img width=\"200\" alt=\"image\" src=\"https://user-images.githubusercontent.com/50394665/204806180-b7f1b9cf-8a62-4577-b5c4-91adda08a13b.png\">\n<img width=\"400\" alt=\"image\" src=\"https://user-images.githubusercontent.com/50394665/204806202-8c110be3-5f48-4946-95ea-21ad5a9a2340.png\">\n</div>\n</details>\n\n<details><summary>&emsp;Super-Resolution Demo</summary>\n\n```python\nimport paddle\nfrom ppdiffusers import LDMSuperResolutionPipeline\nfrom ppdiffusers.utils import load_image\n\npipe = LDMSuperResolutionPipeline.from_pretrained(\"CompVis/ldm-super-resolution-4x-openimages\")\n\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png\"\n\n# \u6211\u4eec\u4eba\u5de5\u5c06\u539f\u59cb\u56fe\u7247\u7f29\u5c0f\u6210 128x128 
\u5206\u8fa8\u7387\uff0c\u6700\u7ec8\u4fdd\u5b58\u7684\u56fe\u7247\u4f1a\u653e\u59274\u500d\uff01\nlow_res_img = load_image(url).resize((128, 128))\n\nimage = pipe(image=low_res_img, num_inference_steps=100).images[0]\n\nimage.save(\"ldm-super-resolution-image.png\")\n```\n<div align=\"center\">\n<img width=\"200\" alt=\"image\" src=\"https://user-images.githubusercontent.com/50394665/204804426-5e28b571-aa41-4f56-ba26-68cca75fdaae.png\">\n<img width=\"400\" alt=\"image\" src=\"https://user-images.githubusercontent.com/50394665/204804148-fe7c293b-6cd7-4942-ae9c-446369fe8410.png\">\n</div>\n\n</details>\n\n## \u6a21\u578b\u63a8\u7406\u90e8\u7f72\n\u9664\u4e86**Paddle\u52a8\u6001\u56fe**\u8fd0\u884c\u4e4b\u5916\uff0c\u5f88\u591a\u6a21\u578b\u8fd8\u652f\u6301\u5c06\u6a21\u578b\u5bfc\u51fa\u5e76\u4f7f\u7528\u63a8\u7406\u5f15\u64ce\u8fd0\u884c\u3002\u6211\u4eec\u63d0\u4f9b\u57fa\u4e8e[FastDeploy](https://github.com/PaddlePaddle/FastDeploy)\u4e0a\u7684**StableDiffusion**\u6a21\u578b\u90e8\u7f72\u793a\u4f8b\uff0c\u6db5\u76d6\u6587\u751f\u56fe\u3001\u56fe\u751f\u56fe\u3001\u56fe\u50cf\u7f16\u8f91\u7b49\u4efb\u52a1\uff0c\u7528\u6237\u53ef\u4ee5\u6309\u7167\u6211\u4eec\u63d0\u4f9b[StableDiffusion\u6a21\u578b\u5bfc\u51fa\u6559\u7a0b](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/deploy/export.md)\u5c06\u6a21\u578b\u5bfc\u51fa\uff0c\u7136\u540e\u4f7f\u7528`FastDeployStableDiffusionMegaPipeline`\u8fdb\u884c\u9ad8\u6027\u80fd\u63a8\u7406\u90e8\u7f72\uff01\n\n<details><summary>&emsp; \u5df2\u9884\u5148\u5bfc\u51fa\u7684FastDeploy\u7248Stable Diffusion\u6743\u91cd </summary>\n\n**\u6ce8\u610f\uff1a\u5f53\u524d\u5bfc\u51fa\u7684vae encoder\u5e26\u6709\u968f\u673a\u56e0\u7d20\uff01**\n\n- CompVis/stable-diffusion-v1-4@fastdeploy\n- runwayml/stable-diffusion-v1-5@fastdeploy\n- runwayml/stable-diffusion-inpainting@fastdeploy\n- stabilityai/stable-diffusion-2-base@fastdeploy\n- stabilityai/stable-diffusion-2@fastdeploy\n- stabilityai/stable-diffusion-2-inpainting@fastdeploy\n- Linaqruf/anything-v3.0@fastdeploy\n- hakurei/waifu-diffusion-v1-3@fastdeploy\n\n</details>\n\n<details><summary>&emsp; FastDeploy Demo </summary>\n\n```python\nimport paddle\nimport fastdeploy as fd\nfrom ppdiffusers import FastDeployStableDiffusionMegaPipeline\nfrom ppdiffusers.utils import load_image\n\ndef create_runtime_option(device_id=0, backend=\"paddle\", use_cuda_stream=True):\n    option = fd.RuntimeOption()\n    if backend == \"paddle\":\n        option.use_paddle_backend()\n    else:\n        option.use_ort_backend()\n    if device_id == -1:\n        option.use_cpu()\n    else:\n        option.use_gpu(device_id)\n        if use_cuda_stream:\n            paddle_stream = paddle.device.cuda.current_stream(device_id).cuda_stream\n            option.set_external_raw_stream(paddle_stream)\n    return option\n\nruntime_options = {\n    \"text_encoder\": create_runtime_option(0, \"paddle\"),  # use gpu:0\n    \"vae_encoder\": create_runtime_option(0, \"paddle\"),  # use gpu:0\n    \"vae_decoder\": create_runtime_option(0, \"paddle\"),  # use gpu:0\n    \"unet\": create_runtime_option(0, \"paddle\"),  # use gpu:0\n}\n\nfd_pipe = FastDeployStableDiffusionMegaPipeline.from_pretrained(\n    \"Linaqruf/anything-v3.0@fastdeploy\", runtime_options=runtime_options\n)\n\n# text2img\nprompt = \"a portrait of shiba inu with a red cap growing on its head. intricate. lifelike. soft light. sony a 7 r iv 5 5 mm. 
cinematic post - processing \"\nimage_text2img = fd_pipe.text2img(prompt=prompt, num_inference_steps=50).images[0]\nimage_text2img.save(\"image_text2img.png\")\n\n# img2img\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/image_Kurisu.png\"\nimage = load_image(url).resize((512, 512))\nprompt = \"Kurisu Makise, looking at viewer, long hair, standing, 1girl, hair ornament, hair flower, cute, jacket, white flower, white dress\"\nnegative_prompt = \"lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry\"\n\nimage_img2img = fd_pipe.img2img(\n    prompt=prompt, negative_prompt=negative_prompt, image=image, strength=0.75, guidance_scale=7.5\n).images[0]\nimage_img2img.save(\"image_img2img.png\")\n\n# inpaint_legacy\nimg_url = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png\"\nmask_url = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png\"\nimage = load_image(img_url).resize((512, 512))\nmask_image = load_image(mask_url).resize((512, 512))\nprompt = \"a red cat sitting on a bench\"\n\nimage_inpaint_legacy = fd_pipe.inpaint_legacy(\n    prompt=prompt, image=image, mask_image=mask_image, strength=0.75, num_inference_steps=50\n).images[0]\nimage_inpaint_legacy.save(\"image_inpaint_legacy.png\")\n```\n</details>\n<div align=\"center\">\n<img width=\"900\" alt=\"image\" src=\"https://user-images.githubusercontent.com/50394665/205297240-46b80992-34af-40cd-91a6-ae76589d0e21.png\">\n</div>\n\n\n## \u66f4\u591a\u4efb\u52a1\u5206\u7c7b\u5c55\u793a\n### \u6587\u672c\u56fe\u50cf\u591a\u6a21\n\n<details open>\n<summary>&emsp;\u6587\u56fe\u751f\u6210\uff08Text-to-Image Generation\uff09</summary>\n\n#### text_to_image_generation-stable_diffusion\n\n```python\nfrom ppdiffusers import StableDiffusionPipeline\n\n# \u52a0\u8f7d\u6a21\u578b\u548cscheduler\npipe = StableDiffusionPipeline.from_pretrained(\"runwayml/stable-diffusion-v1-5\")\n\n# \u6267\u884cpipeline\u8fdb\u884c\u63a8\u7406\nprompt = \"a photo of an astronaut riding a horse on mars\"\nimage = pipe(prompt).images[0]\n\n# \u4fdd\u5b58\u56fe\u7247\nimage.save(\"astronaut_rides_horse_sd.png\")\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209322401-6ecfeaaa-6878-4302-b592-07a31de4e590.png\">\n</div>\n\n#### text_to_image_generation-stable_diffusion_xl\n\n```python\nimport paddle\nfrom ppdiffusers import StableDiffusionXLPipeline\n\npipe = StableDiffusionXLPipeline.from_pretrained(\n     \"stabilityai/stable-diffusion-xl-base-1.0\",\n     paddle_dtype=paddle.float16,\n     variant=\"fp16\"\n)\nprompt = \"a photo of an astronaut riding a horse on mars\"\ngenerator = paddle.Generator().manual_seed(42)\nimage = pipe(prompt=prompt, generator=generator, num_inference_steps=50).images[0]\nimage.save('sdxl_text2image.png')\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/d72729f9-8685-48f9-a238-e4ddf6d264f3\">\n</div>\n\n#### text_to_image_generation-sdxl_base_with_refiner\n\n```python\nfrom ppdiffusers import DiffusionPipeline\nimport paddle\n\n# load both base & refiner\nbase = DiffusionPipeline.from_pretrained(\n    \"stabilityai/stable-diffusion-xl-base-1.0\",\n    paddle_dtype=paddle.float16,\n)\nrefiner = 
DiffusionPipeline.from_pretrained(\n    \"stabilityai/stable-diffusion-xl-refiner-1.0\",\n    text_encoder_2=base.text_encoder_2,\n    vae=base.vae,\n    paddle_dtype=paddle.float16,\n    variant=\"fp16\",\n)\n\n# Define how many steps and what % of steps to be run on each experts (80/20) here\nn_steps = 40\nhigh_noise_frac = 0.8\n\nprompt = \"A majestic lion jumping from a big stone at night\"\nprompt = \"a photo of an astronaut riding a horse on mars\"\ngenerator = paddle.Generator().manual_seed(42)\n\n# run both experts\nimage = base(\n    prompt=prompt,\n    output_type=\"latent\",\n    generator=generator,\n).images\n\nimage = refiner(\n    prompt=prompt,\n    image=image,\n    generator=generator,\n).images[0]\nimage.save('text_to_image_generation-sdxl-base-with-refiner-result.png')\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/8ef36826-ed94-4856-a356-af1677f60d1b\">\n</div>\n\n#### text_to_image_generation-kandinsky2_2\n```python\nfrom ppdiffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline\n\npipe_prior = KandinskyV22PriorPipeline.from_pretrained(\"kandinsky-community/kandinsky-2-2-prior\")\nprompt = \"red cat, 4k photo\"\nout = pipe_prior(prompt)\nimage_emb = out.image_embeds\nzero_image_emb = out.negative_image_embeds\npipe = KandinskyV22Pipeline.from_pretrained(\"kandinsky-community/kandinsky-2-2-decoder\")\nimage = pipe(\n    image_embeds=image_emb,\n    negative_image_embeds=zero_image_emb,\n    height=768,\n    width=768,\n    num_inference_steps=50,\n).images\nimage[0].save(\"text_to_image_generation-kandinsky2_2-result-cat.png\")\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/188f76dd-4bd7-4a33-8f30-b893c7a9e249\">\n</div>\n\n#### text_to_image_generation-unidiffuser\n```python\nimport paddle\nfrom paddlenlp.trainer import set_seed\n\nfrom ppdiffusers import UniDiffuserPipeline\n\nmodel_id_or_path = \"thu-ml/unidiffuser-v1\"\npipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, paddle_dtype=paddle.float16)\nset_seed(42)\n\n# Text variation can be performed with a text-to-image generation followed by a image-to-text generation:\n# 1. 
Text-to-image generation\nprompt = \"an elephant under the sea\"\nsample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)\nt2i_image = sample.images[0]\nt2i_image.save(\"t2i_image.png\")\n````\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/a6eb11d2-ad27-4263-8cb4-b0d8dd42b36c\">\n</div>\n\n#### text_to_image_generation-deepfloyd_if\n\n```python\nimport paddle\n\nfrom ppdiffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline\nfrom ppdiffusers.utils import pd_to_pil\n\n# Stage 1: generate images\npipe = IFPipeline.from_pretrained(\"DeepFloyd/IF-I-XL-v1.0\", variant=\"fp16\", paddle_dtype=paddle.float16)\npipe.enable_xformers_memory_efficient_attention()\nprompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says \"very deep learning\"'\nprompt_embeds, negative_embeds = pipe.encode_prompt(prompt)\nimage = pipe(\n    prompt_embeds=prompt_embeds,\n    negative_prompt_embeds=negative_embeds,\n    output_type=\"pd\",\n).images\n\n# save intermediate image\npil_image = pd_to_pil(image)\npil_image[0].save(\"text_to_image_generation-deepfloyd_if-result-if_stage_I.png\")\n# save gpu memory\npipe.to(paddle_device=\"cpu\")\n\n# Stage 2: super resolution stage1\nsuper_res_1_pipe = IFSuperResolutionPipeline.from_pretrained(\n    \"DeepFloyd/IF-II-L-v1.0\", text_encoder=None, variant=\"fp16\", paddle_dtype=paddle.float16\n)\nsuper_res_1_pipe.enable_xformers_memory_efficient_attention()\n\nimage = super_res_1_pipe(\n    image=image,\n    prompt_embeds=prompt_embeds,\n    negative_prompt_embeds=negative_embeds,\n    output_type=\"pd\",\n).images\n# save intermediate image\npil_image = pd_to_pil(image)\npil_image[0].save(\"text_to_image_generation-deepfloyd_if-result-if_stage_II.png\")\n# save gpu memory\nsuper_res_1_pipe.to(paddle_device=\"cpu\")\n```\n<div align=\"center\">\n<img alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/246785766-700dfad9-159d-4bfb-bfc7-c18df938a052.png\">\n</div>\n<div align=\"center\">\n<center>if_stage_I</center>\n</div>\n<div align=\"center\">\n<img alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/246785773-3359ca5f-dadf-4cc8-b318-ff1f9d4a2d35.png\">\n</div>\n<div align=\"center\">\n<center>if_stage_II</center>\n<!-- <img alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/246785774-8870829a-354b-4a87-9d67-93af315f51e6.png\">\n<center>if_stage_III</center> -->\n</div>\n</details>\n\n\n<details><summary>&emsp;\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u653e\u5927\uff08Text-Guided Image Upscaling\uff09</summary>\n\n#### text_guided_image_upscaling-stable_diffusion_2\n\n```python\nfrom ppdiffusers import StableDiffusionUpscalePipeline\nfrom ppdiffusers.utils import load_image\n\npipe = StableDiffusionUpscalePipeline.from_pretrained(\"stabilityai/stable-diffusion-x4-upscaler\")\n\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/low_res_cat.png\"\nlow_res_img = load_image(url).resize((128, 128))\n\nprompt = \"a white cat\"\nupscaled_image = pipe(prompt=prompt, image=low_res_img).images[0]\nupscaled_image.save(\"upsampled_cat_sd2.png\")\n```\n<div align=\"center\">\n<img alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209324085-0d058b70-89b0-43c2-affe-534eedf116cf.png\">\n<center>\u539f\u56fe\u50cf</center>\n<img alt=\"image\" 
src=\"https://user-images.githubusercontent.com/20476674/209323862-ce2d8658-a52b-4f35-90cb-aa7d310022e7.png\">\n<center>\u751f\u6210\u56fe\u50cf</center>\n</div>\n</details>\n\n<details><summary>&emsp;\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u7f16\u8f91\uff08Text-Guided Image Inpainting\uff09</summary>\n\n#### text_guided_image_inpainting-stable_diffusion_2\n\n```python\nimport paddle\n\nfrom ppdiffusers import PaintByExamplePipeline\nfrom ppdiffusers.utils import load_image\n\nimg_url = \"https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/image_example_1.png\"\nmask_url = \"https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/mask_example_1.png\"\nexample_url = \"https://paddlenlp.bj.bcebos.com/models/community/Fantasy-Studio/data/reference_example_1.jpeg\"\n\ninit_image = load_image(img_url).resize((512, 512))\nmask_image = load_image(mask_url).resize((512, 512))\nexample_image = load_image(example_url).resize((512, 512))\n\npipe = PaintByExamplePipeline.from_pretrained(\"Fantasy-Studio/Paint-by-Example\")\n\n# \u4f7f\u7528fp16\u52a0\u5feb\u751f\u6210\u901f\u5ea6\nwith paddle.amp.auto_cast(True):\n    image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0]\nimage.save(\"image_guided_image_inpainting-paint_by_example-result.png\")\n```\n<div align=\"center\">\n<img alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/247118364-5d91f433-f9ac-4514-b5f0-cb4599905847.png\" width=300>\n<center>\u539f\u56fe\u50cf</center>\n<div align=\"center\">\n<img alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/247118361-0f78d6db-6896-4f8d-b1bd-8350192f7a4e.png\" width=300>\n<center>\u63a9\u7801\u56fe\u50cf</center>\n<div align=\"center\">\n<img alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/247118368-305a048d-ddc3-4a5f-8915-58591ef680f0.jpeg\" width=300>\n<center>\u53c2\u8003\u56fe\u50cf</center>\n<img alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/247117963-e5b9b754-39a3-480b-a557-46a2f9310e79.png\" width=300>\n<center>\u751f\u6210\u56fe\u50cf</center>\n</div>\n</details>\n\n\n<details><summary>&emsp;\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u53d8\u6362\uff08Image-to-Image Text-Guided Generation\uff09</summary>\n\n#### text_guided_image_inpainting-kandinsky2_2\n```python\nimport numpy as np\nimport paddle\n\nfrom ppdiffusers import KandinskyV22InpaintPipeline, KandinskyV22PriorPipeline\nfrom ppdiffusers.utils import load_image\n\npipe_prior = KandinskyV22PriorPipeline.from_pretrained(\n    \"kandinsky-community/kandinsky-2-2-prior\", paddle_dtype=paddle.float16\n)\nprompt = \"a hat\"\nimage_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)\npipe = KandinskyV22InpaintPipeline.from_pretrained(\n    \"kandinsky-community/kandinsky-2-2-decoder-inpaint\", paddle_dtype=paddle.float16\n)\ninit_image = load_image(\n    \"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png\"\n)\nmask = np.zeros((768, 768), dtype=np.float32)\nmask[:250, 250:-250] = 1\nout = pipe(\n    image=init_image,\n    mask_image=mask,\n    image_embeds=image_emb,\n    negative_image_embeds=zero_image_emb,\n    height=768,\n    width=768,\n    num_inference_steps=50,\n)\nimage = out.images[0]\nimage.save(\"text_guided_image_inpainting-kandinsky2_2-result-cat_with_hat.png\")\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" 
src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/64a943d5-167b-4433-91c3-3cf9279714db\">\n<center>\u539f\u56fe\u50cf</center>\n<img width=\"300\" alt=\"image\" src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/f469c127-52f4-4173-a693-c06b92a052aa\">\n<center>\u751f\u6210\u56fe\u50cf</center>\n</div>\n\n#### image_to_image_text_guided_generation-stable_diffusion\n```python\nimport paddle\n\nfrom ppdiffusers import StableDiffusionImg2ImgPipeline\nfrom ppdiffusers.utils import load_image\n\n# \u52a0\u8f7dpipeline\npipe = StableDiffusionImg2ImgPipeline.from_pretrained(\"runwayml/stable-diffusion-v1-5\")\n\n# \u4e0b\u8f7d\u521d\u59cb\u56fe\u7247\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png\"\n\ninit_image = load_image(url).resize((768, 512))\n\nprompt = \"A fantasy landscape, trending on artstation\"\n# \u4f7f\u7528fp16\u52a0\u5feb\u751f\u6210\u901f\u5ea6\nwith paddle.amp.auto_cast(True):\n    image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]\n\nimage.save(\"fantasy_landscape.png\")\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209327142-d8e1d0c7-3bf8-4a08-a0e8-b11451fc84d8.png\">\n<center>\u539f\u56fe\u50cf</center>\n<img width=\"300\" alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209325799-d9ff279b-0d57-435f-bda7-763e3323be23.png\">\n<center>\u751f\u6210\u56fe\u50cf</center>\n</div>\n\n#### image_to_image_text_guided_generation-stable_diffusion_xl\n```python\nimport paddle\nfrom ppdiffusers import StableDiffusionXLImg2ImgPipeline\nfrom ppdiffusers.utils import load_image\n\npipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(\n    \"stabilityai/stable-diffusion-xl-refiner-1.0\",\n    paddle_dtype=paddle.float16,\n    # from_hf_hub=True,\n    # from_diffusers=True,\n    variant=\"fp16\"\n)\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/westfish/develop-0-19-3/000000009.png\"\ninit_image = load_image(url).convert(\"RGB\")\nprompt = \"a photo of an astronaut riding a horse on mars\"\nimage = pipe(prompt, image=init_image).images[0]\nimage.save('sdxl_image2image.png')\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/41bd9381-2799-4bed-a5e2-ba312a2f8da9\">\n<center>\u539f\u56fe\u50cf</center>\n<img width=\"300\" alt=\"image\" src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/db672d03-2e3a-46ac-97fd-d80cca18dbbe\">\n<center>\u751f\u6210\u56fe\u50cf</center>\n</div>\n\n#### image_to_image_text_guided_generation-kandinsky2_2\n```python\nimport paddle\n\nfrom ppdiffusers import KandinskyV22Img2ImgPipeline, KandinskyV22PriorPipeline\nfrom ppdiffusers.utils import load_image\n\npipe_prior = KandinskyV22PriorPipeline.from_pretrained(\n    \"kandinsky-community/kandinsky-2-2-prior\", paddle_dtype=paddle.float16\n)\nprompt = \"A red cartoon frog, 4k\"\nimage_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)\npipe = KandinskyV22Img2ImgPipeline.from_pretrained(\n    \"kandinsky-community/kandinsky-2-2-decoder\", paddle_dtype=paddle.float16\n)\n\ninit_image = load_image(\n    \"https://hf-mirror.com/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/frog.png\"\n)\nimage = pipe(\n    image=init_image,\n    image_embeds=image_emb,\n    negative_image_embeds=zero_image_emb,\n    height=768,\n    width=768,\n    num_inference_steps=100,\n    

#### image_to_image_text_guided_generation-stable_diffusion_xl
```python
import paddle
from ppdiffusers import StableDiffusionXLImg2ImgPipeline
from ppdiffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    paddle_dtype=paddle.float16,
    # from_hf_hub=True,
    # from_diffusers=True,
    variant="fp16",
)
url = "https://paddlenlp.bj.bcebos.com/models/community/westfish/develop-0-19-3/000000009.png"
init_image = load_image(url).convert("RGB")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, image=init_image).images[0]
image.save('sdxl_image2image.png')
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/41bd9381-2799-4bed-a5e2-ba312a2f8da9">
<center>Original image</center>
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/db672d03-2e3a-46ac-97fd-d80cca18dbbe">
<center>Generated image</center>
</div>

#### image_to_image_text_guided_generation-kandinsky2_2
```python
import paddle

from ppdiffusers import KandinskyV22Img2ImgPipeline, KandinskyV22PriorPipeline
from ppdiffusers.utils import load_image

pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", paddle_dtype=paddle.float16
)
prompt = "A red cartoon frog, 4k"
image_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)
pipe = KandinskyV22Img2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", paddle_dtype=paddle.float16
)

init_image = load_image(
    "https://hf-mirror.com/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/frog.png"
)
image = pipe(
    image=init_image,
    image_embeds=image_emb,
    negative_image_embeds=zero_image_emb,
    height=768,
    width=768,
    num_inference_steps=100,
    strength=0.2,
).images
image[0].save("image_to_image_text_guided_generation-kandinsky2_2-result-red_frog.png")
```
<div align="center">
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/aae57109-94ad-408e-ae75-8cce650cebe5">
<center>Original image</center>
<img width="300" alt="image" src="https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/23cf2c4e-416f-4f21-82a6-e57de11b5e83">
<center>Generated image</center>
</div>
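Every Kandinsky 2.2 example above wires the prior and the decoder together by hand. Assuming ppdiffusers mirrors the diffusers combined-pipeline API (worth checking against your installed version), `KandinskyV22CombinedPipeline` collapses both stages into a single text-to-image call; a sketch:

```python
import paddle

# assumed API: KandinskyV22CombinedPipeline bundles the prior and decoder,
# so one call goes straight from the text prompt to an image
from ppdiffusers import KandinskyV22CombinedPipeline

pipe = KandinskyV22CombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", paddle_dtype=paddle.float16
)
image = pipe(prompt="red cat, 4k photo", num_inference_steps=50).images[0]
image.save("kandinsky2_2-combined-result.png")
```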
src=\"https://user-images.githubusercontent.com/20476674/281259277-0ebe29a3-4eba-48ee-a98b-292e60de3c98.gif\">\n</div>\n\n\n#### text_to_video_generation-synth with zeroscope_v2_XL\n\n```python\nimport imageio\n\nfrom ppdiffusers import DPMSolverMultistepScheduler, TextToVideoSDPipeline\n\n# from ppdiffusers.utils import export_to_video\n\npipe = TextToVideoSDPipeline.from_pretrained(\"cerspense/zeroscope_v2_XL\")\npipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)\n\nprompt = \"An astronaut riding a horse.\"\nvideo_frames = pipe(prompt, num_inference_steps=50, height=320, width=576, num_frames=24).frames\nimageio.mimsave(\"text_to_video_generation-synth-result-astronaut_riding_a_horse.mp4\", video_frames, fps=8)\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://github.com/PaddlePaddle/PaddleMIX/assets/35400185/43ebbca0-9f07-458b-809a-acf296a2539b\">\n</div>\n\n#### text_to_video_generation-zero\n\n```python\nimport imageio\n\n# pip install imageio[ffmpeg]\nimport paddle\n\nfrom ppdiffusers import TextToVideoZeroPipeline\n\nmodel_id = \"runwayml/stable-diffusion-v1-5\"\npipe = TextToVideoZeroPipeline.from_pretrained(model_id, paddle_dtype=paddle.float16)\n\nprompt = \"A panda is playing guitar on times square\"\nresult = pipe(prompt=prompt).images\nresult = [(r * 255).astype(\"uint8\") for r in result]\nimageio.mimsave(\"text_to_video_generation-zero-result-panda.mp4\", result, fps=4)\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/246779321-c2b0c2b4-e383-40c7-a4d8-f417e8062b35.gif\">\n</div>\n\n</details>\n\n### \u6587\u672c\u97f3\u9891\u591a\u6a21\n<details>\n<summary>&emsp;\u6587\u672c\u6761\u4ef6\u7684\u97f3\u9891\u751f\u6210\uff08Text-to-Audio Generation\uff09</summary>\n\n#### text_to_audio_generation-audio_ldm\n\n```python\nimport paddle\nimport scipy\n\nfrom ppdiffusers import AudioLDMPipeline\n\npipe = AudioLDMPipeline.from_pretrained(\"cvssp/audioldm\", paddle_dtype=paddle.float16)\n\nprompt = \"Techno music with a strong, upbeat tempo and high melodic riffs\"\naudio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]\n\noutput_path = \"text_to_audio_generation-audio_ldm-techno.wav\"\n# save the audio sample as a .wav file\nscipy.io.wavfile.write(output_path, rate=16000, data=audio)\n```\n<div align = \"center\">\n  <thead>\n  </thead>\n  <tbody>\n   <tr>\n      <td align = \"center\">\n      <a href=\"https://paddlenlp.bj.bcebos.com/models/community/westfish/develop_ppdiffusers_data/techno.wav\" rel=\"nofollow\">\n            <img align=\"center\" src=\"https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png\" width=\"200 style=\"max-width: 100%;\"></a><br>\n      </td>\n    </tr>\n  </tbody>\n</div>\n</details>\n\n### \u56fe\u50cf\n\n<details><summary>&emsp;\u65e0\u6761\u4ef6\u56fe\u50cf\u751f\u6210\uff08Unconditional Image Generation\uff09</summary>\n\n#### unconditional_image_generation-latent_diffusion_uncond\n\n```python\nfrom ppdiffusers import LDMPipeline\n\n# \u52a0\u8f7d\u6a21\u578b\u548cscheduler\npipe = LDMPipeline.from_pretrained(\"CompVis/ldm-celebahq-256\")\n\n# \u6267\u884cpipeline\u8fdb\u884c\u63a8\u7406\nimage = pipe(num_inference_steps=200).images[0]\n\n# \u4fdd\u5b58\u56fe\u7247\nimage.save(\"ldm_generated_image.png\")\n```\n<div align=\"center\">\n<img width=\"300\" alt=\"image\" 
src=\"https://user-images.githubusercontent.com/20476674/209327936-7fe914e0-0ea0-4e21-a433-24eaed6ee94c.png\">\n</div>\n</details>\n\n<details><summary>&emsp;\u8d85\u5206\uff08Super Superresolution\uff09</summary>\n\n#### super_resolution-latent_diffusion\n```python\nimport paddle\n\nfrom ppdiffusers import LDMSuperResolutionPipeline\nfrom ppdiffusers.utils import load_image\n\n# \u52a0\u8f7dpipeline\npipe = LDMSuperResolutionPipeline.from_pretrained(\"CompVis/ldm-super-resolution-4x-openimages\")\n\n# \u4e0b\u8f7d\u521d\u59cb\u56fe\u7247\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png\"\n\ninit_image = load_image(url).resize((128, 128))\ninit_image.save(\"original-image.png\")\n\n# \u4f7f\u7528fp16\u52a0\u5feb\u751f\u6210\u901f\u5ea6\nwith paddle.amp.auto_cast(True):\n    image = pipe(init_image, num_inference_steps=100, eta=1).images[0]\n\nimage.save(\"super-resolution-image.png\")\n```\n<div align=\"center\">\n<img  alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209328660-9700fdc3-72b3-43bd-9a00-23b370ba030b.png\">\n<center>\u539f\u56fe\u50cf</center>\n<img  alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209328479-4eaea5d8-aa4a-4f31-aa2a-b47e3c730f15.png\">\n<center>\u751f\u6210\u56fe\u50cf</center>\n</div>\n</details>\n\n\n<details><summary>&emsp;\u56fe\u50cf\u7f16\u8f91\uff08Image Inpainting\uff09</summary>\n\n#### image_inpainting-repaint\n```python\nfrom ppdiffusers import RePaintPipeline, RePaintScheduler\nfrom ppdiffusers.utils import load_image\n\nimg_url = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/celeba_hq_256.png\"\nmask_url = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/mask_256.png\"\n\n# Load the original image and the mask as PIL images\noriginal_image = load_image(img_url).resize((256, 256))\nmask_image = load_image(mask_url).resize((256, 256))\n\nscheduler = RePaintScheduler.from_pretrained(\"google/ddpm-ema-celebahq-256\", subfolder=\"scheduler\")\npipe = RePaintPipeline.from_pretrained(\"google/ddpm-ema-celebahq-256\", scheduler=scheduler)\n\noutput = pipe(\n    original_image=original_image,\n    mask_image=mask_image,\n    num_inference_steps=250,\n    eta=0.0,\n    jump_length=10,\n    jump_n_sample=10,\n)\ninpainted_image = output.images[0]\n\ninpainted_image.save(\"repaint-image.png\")\n```\n<div align=\"center\">\n<img  alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209329052-b6fc2aaf-1a59-49a3-92ef-60180fdffd81.png\">\n<center>\u539f\u56fe\u50cf</center>\n<img  alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209329048-4fe12176-32a0-4800-98f2-49bd8d593799.png\">\n<center>mask\u56fe\u50cf</center>\n<img  alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209329241-b7e4d99e-468a-4b95-8829-d77ee14bfe98.png\">\n<center>\u751f\u6210\u56fe\u50cf</center>\n</div>\n</details>\n\n\n\n<details><summary>&emsp;\u56fe\u50cf\u53d8\u5316\uff08Image Variation\uff09</summary>\n\n#### image_variation-versatile_diffusion\n```python\nfrom ppdiffusers import VersatileDiffusionImageVariationPipeline\nfrom ppdiffusers.utils import load_image\n\nurl = \"https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/benz.jpg\"\nimage = load_image(url)\n\npipe = VersatileDiffusionImageVariationPipeline.from_pretrained(\"shi-labs/versatile-diffusion\")\n\nimage = pipe(image).images[0]\nimage.save(\"versatile-diffusion-car_variation.png\")\n```\n<div align=\"center\">\n<img  
width=\"300\" alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209331434-51f6cdbd-b8e4-4faa-8e49-1cc852e35603.jpg\">\n<center>\u539f\u56fe\u50cf</center>\n<img  width=\"300\" alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209331591-f6cc4cd8-8430-4627-8d22-bf404fb2bfdd.png\">\n<center>\u751f\u6210\u56fe\u50cf</center>\n</div>\n</details>\n\n\n\n\n\n### \u97f3\u9891\n<details>\n<summary>&emsp;\u65e0\u6761\u4ef6\u97f3\u9891\u751f\u6210\uff08Unconditional Audio Generation\uff09</summary>\n\n#### unconditional_audio_generation-audio_diffusion\n\n```python\nfrom scipy.io.wavfile import write\nfrom ppdiffusers import AudioDiffusionPipeline\nimport paddle\n\n# \u52a0\u8f7d\u6a21\u578b\u548cscheduler\npipe = AudioDiffusionPipeline.from_pretrained(\"teticio/audio-diffusion-ddim-256\")\npipe.set_progress_bar_config(disable=None)\ngenerator = paddle.Generator().manual_seed(42)\n\noutput = pipe(generator=generator)\naudio = output.audios[0]\nimage = output.images[0]\n\n# \u4fdd\u5b58\u97f3\u9891\u5230\u672c\u5730\nfor i, audio in enumerate(audio):\n    write(f\"audio_diffusion_test{i}.wav\", pipe.mel.config.sample_rate, audio.transpose())\n\n# \u4fdd\u5b58\u56fe\u7247\nimage.save(\"audio_diffusion_test.png\")\n```\n<div align = \"center\">\n  <thead>\n  </thead>\n  <tbody>\n   <tr>\n      <td align = \"center\">\n      <a href=\"https://paddlenlp.bj.bcebos.com/models/community/teticio/data/audio_diffusion_test0.wav\" rel=\"nofollow\">\n            <img align=\"center\" src=\"https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png\" width=\"200 style=\"max-width: 100%;\"></a><br>\n      </td>\n    </tr>\n  </tbody>\n</div>\n\n<div align=\"center\">\n<img  width=\"300\" alt=\"image\" src=\"https://user-images.githubusercontent.com/20476674/209342125-93e8715e-895b-4115-9e1e-e65c6c2cd95a.png\">\n</div>\n\n\n#### unconditional_audio_generation-spectrogram_diffusion\n\n```python\nimport paddle\nimport scipy\n\nfrom ppdiffusers import MidiProcessor, SpectrogramDiffusionPipeline\nfrom ppdiffusers.utils.download_utils import ppdiffusers_url_download\n\n# Download MIDI from: wget https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid\nmid_file_path = ppdiffusers_url_download(\n    \"https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid\", cache_dir=\".\"\n)\npipe = SpectrogramDiffusionPipeline.from_pretrained(\"google/music-spectrogram-diffusion\", paddle_dtype=paddle.float16)\nprocessor = MidiProcessor()\noutput = pipe(processor(mid_file_path))\naudio = output.audios[0]\n\noutput_path = \"unconditional_audio_generation-spectrogram_diffusion-result-beethoven_hammerklavier_2.wav\"\n# save the audio sample as a .wav file\nscipy.io.wavfile.write(output_path, rate=16000, data=audio)\n```\n<div align = \"center\">\n  <thead>\n  </thead>\n  <tbody>\n   <tr>\n      <td align = \"center\">\n      <a href=\"https://paddlenlp.bj.bcebos.com/models/community/westfish/develop_ppdiffusers_data/beethoven_hammerklavier_2.wav\" rel=\"nofollow\">\n            <img align=\"center\" src=\"https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png\" width=\"200 style=\"max-width: 100%;\"></a><br>\n      </td>\n    </tr>\n  </tbody>\n</div>\n</details>\n\n\n\n## License\nPPDiffusers \u9075\u5faa [Apache-2.0\u5f00\u6e90\u534f\u8bae](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/LICENSE)\u3002\n\nStable 

#### unconditional_audio_generation-spectrogram_diffusion

```python
import paddle
import scipy

from ppdiffusers import MidiProcessor, SpectrogramDiffusionPipeline
from ppdiffusers.utils.download_utils import ppdiffusers_url_download

# Download MIDI from: wget https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid
mid_file_path = ppdiffusers_url_download(
    "https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid", cache_dir="."
)
pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion", paddle_dtype=paddle.float16)
processor = MidiProcessor()
output = pipe(processor(mid_file_path))
audio = output.audios[0]

output_path = "unconditional_audio_generation-spectrogram_diffusion-result-beethoven_hammerklavier_2.wav"
# save the audio sample as a .wav file
scipy.io.wavfile.write(output_path, rate=16000, data=audio)
```
<div align="center">
<a href="https://paddlenlp.bj.bcebos.com/models/community/westfish/develop_ppdiffusers_data/beethoven_hammerklavier_2.wav" rel="nofollow">
<img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200" style="max-width: 100%;"></a>
</div>
</details>



## License
PPDiffusers is released under the [Apache-2.0 license](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/LICENSE).

Stable Diffusion is released under the [CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license).
> The CreativeML OpenRAIL M is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying out in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which this license is based.

## Acknowledgements
We drew on the excellent design of 🤗 Hugging Face's [Diffusers](https://github.com/huggingface/diffusers) for working with pretrained diffusion models, and we thank the Hugging Face authors and their open-source community.

## Citation

```bibtex
@misc{ppdiffusers,
  author = {PaddlePaddle Authors},
  title = {PPDiffusers: State-of-the-art diffusion model toolkit based on PaddlePaddle},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers}}
}
```
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "PPDiffusers: Diffusers toolbox implemented based on PaddlePaddle",
    "version": "0.24.0",
    "project_urls": {
        "Homepage": "https://github.com/PaddlePaddle/PaddleMIX/ppdiffusers"
    },
    "split_keywords": [
        "ppdiffusers",
        " paddle",
        " paddlemix"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "928e3559a40532febdfc32b015d60054427164ee64e6bc186bea9f27510f8d2a",
                "md5": "11e6b9339a33fc04741a52f17447e65a",
                "sha256": "488ca2b3cf824eb7782c22146f02d4f50fb2451053d7f1a857366e0b2361c1e6"
            },
            "downloads": -1,
            "filename": "ppdiffusers-0.24.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "11e6b9339a33fc04741a52f17447e65a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 2717537,
            "upload_time": "2024-04-18T03:44:12",
            "upload_time_iso_8601": "2024-04-18T03:44:12.254423Z",
            "url": "https://files.pythonhosted.org/packages/92/8e/3559a40532febdfc32b015d60054427164ee64e6bc186bea9f27510f8d2a/ppdiffusers-0.24.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "13b57ea2119bf9eea4d570a47927dc5e4cf1de9d49d2d9b88944045bbc91416e",
                "md5": "92f9608be8b13e3861bb9b58bf78a441",
                "sha256": "78ee20a955536ee026d272fdeb5e32b94a8ee8c27a9868ec1e543a01bd93d9c6"
            },
            "downloads": -1,
            "filename": "ppdiffusers-0.24.0.tar.gz",
            "has_sig": false,
            "md5_digest": "92f9608be8b13e3861bb9b58bf78a441",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 1888275,
            "upload_time": "2024-04-18T03:44:17",
            "upload_time_iso_8601": "2024-04-18T03:44:17.599162Z",
            "url": "https://files.pythonhosted.org/packages/13/b5/7ea2119bf9eea4d570a47927dc5e4cf1de9d49d2d9b88944045bbc91416e/ppdiffusers-0.24.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-18 03:44:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "PaddlePaddle",
    "github_project": "PaddleMIX",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "ppdiffusers"
}
        