DMD2开源图像合成模型 - 用改进方法实现快速高效图像生成

首页

DMD2

由 tianweiy 开发

DMD2是一种基于扩散模型的快速图像合成技术，通过改进的分布匹配蒸馏方法实现高效的图像生成。

图像生成 #4步快速图像生成 #1步超快合成 #扩散蒸馏技术

下载量 39.89k

发布时间 : 5/23/2024

模型简介

DMD2是一种基于稳定扩散XL模型的改进版本，专注于通过分布匹配蒸馏技术实现快速图像合成。该模型支持4步甚至1步推理生成高质量图像，显著提高了生成效率。

模型特点

快速图像合成

支持4步甚至1步推理即可生成高质量图像，显著提高生成效率

多种推理模式

提供UNet直接生成、LoRA生成和T2I适配器等多种使用方式

高质量输出

即使在大幅减少推理步数的情况下仍能保持较高的图像质量

兼容性强

可与Stable Diffusion XL基础模型和各种适配器配合使用

模型能力

文本生成图像

快速图像合成

图像风格转换

条件图像生成

使用案例

创意设计

概念艺术创作

快速生成各种风格的概念艺术图像

4步内生成高质量概念图

产品设计原型

为产品设计快速生成视觉原型

高效迭代设计概念

内容创作

社交媒体内容生成

快速生成社交媒体所需的视觉内容

高质量图像快速产出

🚀 DMD2模型卡片

DMD2模型基于改进的分布匹配蒸馏技术，可实现快速图像合成，在图像生成领域具有高效、优质的特点，能为相关研究和应用提供有力支持。

image/jpeg

Improved Distribution Matching Distillation for Fast Image Synthesis
Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman

📞 联系方式

如果您对论文有任何疑问，请随时与我们联系！

Tianwei Yin tianweiy@mit.edu

💻 使用示例

基础用法

我们可以使用标准的扩散器管道：

4步UNet生成

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"
# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt="a photo of a cat"

# LCMScheduler's default timesteps are different from the one we used for training 
image=pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]

4步LoRA生成

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_lora_fp16.safetensors"
# Load model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=1.0)  # we might want to make the scale smaller for community models

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt="a photo of a cat"

# LCMScheduler's default timesteps are different from the one we used for training 
image=pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]

1步UNet生成

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_1step_unet_fp16.bin"
# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[399]).images[0]

4步T2I适配器

from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, AutoencoderKL, UNet2DConditionModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.canny import CannyDetector
from huggingface_hub import hf_hub_download
import torch

# load adapter
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, varient="fp16").to("cuda")

vae=AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"
# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    base_model_id, unet=unet, vae=vae, adapter=adapter, torch_dtype=torch.float16, variant="fp16", 
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()

canny_detector = CannyDetector()

url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg"
image = load_image(url)

# Detect the canny map in low resolution to avoid high-frequency details
image = canny_detector(image, detect_resolution=384, image_resolution=1024)#.resize((1024, 1024))

prompt = "Mystical fairy in real, magic, 4k picture, high quality"

gen_images = pipe(
  prompt=prompt,
  image=image,
  num_inference_steps=4,
  guidance_scale=0, 
  adapter_conditioning_scale=0.8, 
  adapter_conditioning_factor=0.5,
  timesteps=[999, 749, 499, 249]
).images[0]
gen_images.save('out_canny.png')

更多信息，请参考代码仓库

📄 许可证

改进的分布匹配蒸馏技术遵循知识共享署名 - 非商业性使用 - 相同方式共享 4.0 国际许可协议。

📚 引用

如果您发现DMD2对您的研究有用或相关，请引用我们的论文：

@article{yin2024improved,
    title={Improved Distribution Matching Distillation for Fast Image Synthesis},
    author={Yin, Tianwei and Gharbi, Micha{\"e}l and Park, Taesung and Zhang, Richard and Shechtman, Eli and Durand, Fredo and Freeman, William T},
    journal={arXiv:2405.14867},
    year={2024}
}

@inproceedings{yin2024onestep,
    title={One-step Diffusion with Distribution Matching Distillation},
    author={Yin, Tianwei and Gharbi, Micha{\"e}l and Zhang, Richard and Shechtman, Eli and Durand, Fr{\'e}do and Freeman, William T and Park, Taesung},
    booktitle={CVPR},
    year={2024}
}

🙏 致谢

这项工作是在Tianwei Yin作为麻省理工学院全日制学生期间完成的。它基于我们对原始DMD论文的重新实现而开发。这项工作得到了美国国家科学基金会合作协议PHY - 2019786（NSF人工智能与基础相互作用研究所，http://iaifi.org/）、NSF资助2105819、NSF CISE奖1955864以及谷歌、GIST、亚马逊和广达电脑的资助。