tags:
- text-to-image
- stable-diffusion
- lora
- diffusers
- template:sd-lora
base_model: stabilityai/stable-diffusion-xl-base-1.0
license: cc-by-nc-nd-4.0
inference: false
⚡ Flash Diffusion: FlashSDXL ⚡
Flash Diffusion is a diffusion distillation method proposed by Clément Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin from Jasper Research in the paper Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation.
This model is a 108M-parameter LoRA-distilled version of SDXL that is able to generate images in 4 steps, and is intended to reproduce the main results of the paper.
Check out our live demo and the official GitHub repository.
Usage
The model can be used directly with the DiffusionPipeline from the diffusers library, reducing the number of sampling steps to 4.
from diffusers import DiffusionPipeline, LCMScheduler

adapter_id = "jasperai/flash-sdxl"

# Load the SDXL base model.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_safetensors=True,
)

# Swap in the LCM scheduler with trailing timestep spacing,
# which the distilled model expects for few-step sampling.
pipe.scheduler = LCMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="scheduler",
    timestep_spacing="trailing",
)

pipe.to("cuda")

# Load the FlashSDXL LoRA and fuse it into the base weights.
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A raccoon reading a book in a lush forest."

# Sample in only 4 steps, with classifier-free guidance disabled.
image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
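The snippet above runs in full precision; if VRAM is tight, the same setup also works with SDXL's fp16 weights. A minimal sketch (the half-precision settings are our suggestion, not part of the original snippet):

import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Same pipeline as above, but loading the fp16 weight variant to
# roughly halve GPU memory usage.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.scheduler = LCMScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
pipe.to("cuda")
pipe.load_lora_weights("jasperai/flash-sdxl")
pipe.fuse_lora()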
Usage with ComfyUI
To run FlashSDXL locally with ComfyUI, follow these steps:
- Make sure your ComfyUI installation is up to date
- Download the model file from Hugging Face (go to the "Files and versions" tab, open the comfy/ folder, and click the download button next to FlashSDXL.safetensors); a scripted alternative is sketched below
- Move the model file to your local comfyUI/models/loras/ folder
- Load it as a LoRA on top of the sd_xl_base_1.0_0.9vae.safetensors base model; a simple ComfyUI workflow file is provided in this repository (in the comfy/ folder)
Note: the model comes configured with cfg scale = 1 and an LCM scheduler by default, but these parameters can be tweaked.
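If you prefer scripting the download, the same file can be fetched with the huggingface_hub client; a minimal sketch assuming the file layout described above (the destination path is an example, adjust it to your installation):

import shutil
from huggingface_hub import hf_hub_download

# Fetch the ComfyUI checkpoint into the local Hub cache, then copy it
# into ComfyUI's loras folder.
cached = hf_hub_download(
    repo_id="jasperai/flash-sdxl",
    filename="comfy/FlashSDXL.safetensors",
)
shutil.copy(cached, "comfyUI/models/loras/FlashSDXL.safetensors")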
Combining with existing LoRAs 🎨
FlashSDXL can be combined with existing LoRAs, with no training required, to enable few-step generation. Below is an example of how to integrate it into a Hugging Face pipeline:
from diffusers import DiffusionPipeline, LCMScheduler
import torch

user_lora_id = "TheLastBen/Papercut_SDXL"
trigger_word = "papercut"

flash_lora_id = "jasperai/flash-sdxl"

# Load the SDXL base model (fp16 weight variant).
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16"
)

# Use the LCM scheduler required for few-step sampling.
pipe.scheduler = LCMScheduler.from_config(
    pipe.scheduler.config
)

# Load both adapters and activate them together.
pipe.load_lora_weights(flash_lora_id, adapter_name="flash")
pipe.load_lora_weights(user_lora_id, adapter_name="lora")
pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 1.0])

pipe.to(device="cuda", dtype=torch.float16)

prompt = f"{trigger_word} a cute corgi"

image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0
).images[0]
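The relative strength of the two adapters can be re-balanced at any point without reloading anything. A minimal sketch reusing the pipeline above (the weights are illustrative starting points, not tuned values):

# Dial the style LoRA down relative to the distillation LoRA and resample.
pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 0.8])
image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]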
Tip 💡: Additional LoRAs can also be loaded through the provided Comfy workflow for local testing.
Combining with ControlNet 🎨
FlashSDXL can also be combined with ControlNet for few-step generation. Below is an example of how to integrate it into a Hugging Face pipeline:
import torch
import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

flash_lora_id = "jasperai/flash-sdxl"

# Build a Canny edge map from the conditioning image.
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((1024, 1024))
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None].repeat(3, axis=2)  # single channel -> 3-channel image
canny_image = Image.fromarray(image)

# Load the Canny ControlNet for SDXL.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16"
).to("cuda")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load the FlashSDXL LoRA and fuse it into the base weights.
pipe.load_lora_weights(flash_lora_id)
pipe.fuse_lora()

image = pipe(
    "picture of the Mona Lisa",
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    controlnet_conditioning_scale=0.5,
    cross_attention_kwargs={"scale": 1},
).images[0]

make_image_grid([canny_image, image], rows=1, cols=2)
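controlnet_conditioning_scale controls how strongly the Canny edges constrain the output. A quick way to pick a value is to sweep it; a minimal sketch reusing the pipeline above (the scale values are illustrative):

# One image per conditioning strength, shown side by side with the edge map.
images = [
    pipe(
        "picture of the Mona Lisa",
        image=canny_image,
        num_inference_steps=4,
        guidance_scale=0,
        controlnet_conditioning_scale=scale,
    ).images[0]
    for scale in (0.3, 0.5, 0.7)
]
make_image_grid([canny_image] + images, rows=1, cols=4)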
Training details
The model was trained for 20k iterations on 4 H100 GPUs (representing around 176 GPU hours). Please refer to the paper for further training details.
Metrics on the COCO 2014 validation set (Table 3 of the paper)
- FID-10k: 21.62 (4 steps)
- CLIP Score: 0.327 (4 steps)
Citation
If you find this work useful or use it in your research, please consider citing us:
@misc{chadebec2024flash,
  title={Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation},
  author={Clement Chadebec and Onur Tasar and Eyal Benaroche and Benjamin Aubin},
  year={2024},
  eprint={2406.02347},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
License
This model is released under the Creative Commons BY-NC license.