controlnet-canny-sdxl-1.0: an open-source image generation model that uses Canny edge detection to produce precisely controlled, high-quality images

Controlnet Canny Sdxl 1.0

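Developed by xinsir

A powerful ControlNet model that generates high-resolution images visually comparable to Midjourney, with precise control through Canny edge detection.

Image generation | License: Apache-2.0 | #High-resolution image generation #Canny edge control #Midjourney-level quality

Downloads: 25.79k

Released: 5/10/2024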

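Model Overview

The model is fine-tuned from Stable Diffusion XL 1.0 for text-to-image generation. It is particularly good at using a Canny edge map to control the generation of detailed, high-quality images.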

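Model Features

High-quality generation

Trained on more than 10 million curated images; output quality is comparable to Midjourney

Precise control

Uses Canny edge detection to control composition and supports complex scenes

Multiple styles

Supports both photorealistic and anime styles (anime requires switching the base model)

Advanced training techniques

Uses data augmentation, multiple losses, and multi-resolution training to improve model performance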

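Model Capabilities

Text-to-image generation

Composition control through edge maps

High-resolution image generation

Multi-style image generation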

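Use Cases

Artistic creation

Concept art design

Generate complete concept art from line drawings

Can produce complex, ornate compositions (such as the Day of the Dead example)

Illustration

Turn simple line drawings into finished illustrations

Supports watercolor, oil painting, and other styles (such as the Waterhouse-style example)

Commercial design

Product presentation

Generate product promotional images

Can produce professional-grade food photography (such as the pizza example)

Advertising design

Quickly generate advertising concept images

Supports holiday themes and other commercial scenarios (such as the star background example)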

🚀 ControlNet Canny SDXL 1.0

This model is a powerful ControlNet that generates high-resolution images visually comparable to Midjourney, advancing the application of Stable Diffusion models.

🚀 Quick Start

Use the following code to run the model:

from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import EulerAncestralDiscreteScheduler
from PIL import Image
import torch
import numpy as np
import cv2

def HWC3(x):
    # Convert a uint8 image with 1, 3, or 4 channels to 3-channel HWC; alpha is composited over white.
    assert x.dtype == np.uint8
    if x.ndim == 2:
        x = x[:, :, None]
    assert x.ndim == 3
    H, W, C = x.shape
    assert C == 1 or C == 3 or C == 4
    if C == 3:
        return x
    if C == 1:
        return np.concatenate([x, x, x], axis=2)
    if C == 4:
        color = x[:, :, 0:3].astype(np.float32)
        alpha = x[:, :, 3:4].astype(np.float32) / 255.0
        y = color * alpha + 255.0 * (1.0 - alpha)
        y = y.clip(0, 255).astype(np.uint8)
        return y

controlnet_conditioning_scale = 1.0  
prompt = "your prompt, the longer the better; describe the image in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'

eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")

controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)

# When testing with another base model, change the VAE as well.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)
# float16 weights need a GPU; this assumes a CUDA device is available.
pipe.to("cuda")

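# For best results, resize the input to about 1024 * 1024 pixels, or to a resolution in the same bucket.

# Load the source image (OpenCV reads it in BGR order) and fail early on a bad path.
controlnet_img = cv2.imread("your image path")
if controlnet_img is None:
    raise FileNotFoundError("could not read the input image; check the path")
height, width, _ = controlnet_img.shape
# Rescale so the pixel count is about 1024 * 1024 while keeping the aspect ratio.
ratio = np.sqrt(1024. * 1024. / (width * height))
new_width, new_height = int(width * ratio), int(height * ratio)
controlnet_img = cv2.resize(controlnet_img, (new_width, new_height))

# Extract Canny edges, expand to 3 channels, and convert to a PIL image for the pipeline.
controlnet_img = cv2.Canny(controlnet_img, 100, 200)
controlnet_img = HWC3(controlnet_img)
controlnet_img = Image.fromarray(controlnet_img)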

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
    ).images

# PNG usually preserves image quality better than JPG or WebP, but the file is much larger.
images[0].save("your_image_save_path.png")
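
The resizing step in the code above keeps the aspect ratio while bringing the pixel count close to 1024 * 1024. A minimal sketch of that arithmetic as a standalone helper follows. The name `target_size` and the rounding to a multiple of 8 are assumptions added here for illustration; the original code simply truncates with `int`.

```python
import math


def target_size(width, height, target_area=1024 * 1024, multiple=8):
    """Scale (width, height) so the area is close to target_area, keeping the aspect ratio."""
    ratio = math.sqrt(target_area / (width * height))
    # Assumption: snap each side to the nearest multiple of 8, the SDXL VAE downscale factor.
    new_width = max(multiple, round(width * ratio / multiple) * multiple)
    new_height = max(multiple, round(height * ratio / multiple) * multiple)
    return new_width, new_height


print(target_size(1920, 1080))  # (1368, 768)
```

A 1920 * 1080 input maps to 1368 * 768, which has nearly the same aspect ratio and roughly the same pixel count as 1024 * 1024.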

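✨ Key Features

High-quality image generation: trained on a large, high-quality dataset (more than 10,000,000 images), the model generates high-resolution images visually comparable to Midjourney.
Rich training techniques: training used data augmentation, multiple losses, and multi-resolution training. After a single training stage, the model outperforms other open-source Canny models in the authors' evaluation.
Broad applicability: as one of the key models in the ControlNet family, it can be applied to many painting and design tasks.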

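📚 Documentation

Model Details

Model Description

Developed by: xinsir
Model type: ControlNet_SDXL
License: apache-2.0
Finetuned from model (optional): stabilityai/stable-diffusion-xl-base-1.0

Model Sources (optional)

Paper (optional): https://arxiv.org/abs/2302.05543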

Uses

Examples

Prompt: A closeup of two day of the dead models, looking to the side, large flowered headdress, full dia de Los muertoe make up, lush red lips, butterflies, flowers, pastel colors, looking to the side, jungle, birds, color harmony , extremely detailed, intricate, ornate, motion, stunning, beautiful, unique, soft lighting
Prompt: ghost with a plague doctor mask in a venice carnaval hyper realistic
Prompt: A picture surrounded by blue stars and gold stars, glowing, dark navy blue and gray tones, distributed in light silver and gold, playful, festive atmosphere, pure fabric, chalk, FHD 8K
Prompt: Delicious vegetarian pizza with champignon mushrooms, tomatoes, mozzarella, peppers and black olives, isolated on white background , transparent isolated white background , top down view, studio photo, transparent png, Clean sharp focus. High end retouching. Food magazine photography. Award winning photography. Advertising photography. Commercial photography
Prompt: a blonde woman in a wedding dress in a maple forest in summer with a flower crown laurel. Watercolor painting in the style of John William Waterhouse. Romanticism. Ethereal light.

Anime examples (note: change the base model to CounterfeitXL; everything else stays the same)

Evaluation Metrics

Laion aesthetic score [https://laion.ai/blog/laion-aesthetics/]
Perceptual similarity [https://github.com/richzhang/PerceptualSimilarity]

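Evaluation Data

The test data is randomly sampled from Midjourney upscaled images, together with their prompts. The goal of the project is to let people draw images the way Midjourney does. Midjourney's users include many professional designers, and its upscaled images tend to have higher aesthetic scores and better prompt consistency, so they make a suitable test set for judging a ControlNet. We randomly selected 300 prompt-image pairs and generated 4 images per prompt, for 1,200 images in total. We computed the Laion aesthetic score to measure image beauty and perceptual similarity to measure control ability, and found that image quality agrees well with the metric values. We compared our method with other SOTA models on Hugging Face and list the results below. Our model has the highest aesthetic score and can generate visually appealing images when prompted well.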
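
The protocol described above (300 prompt-image pairs, 4 images per prompt, two metrics) can be sketched as a small loop. This is only an illustration under stated assumptions: `generate`, `aesthetic_score`, and `perceptual_distance` are hypothetical callables standing in for the real pipeline and scorers, and averaging over all images is an assumption, since the text does not say how the scores are aggregated.

```python
from statistics import mean


def evaluate(pairs, generate, aesthetic_score, perceptual_distance, samples_per_prompt=4):
    """Score every generated image and average both metrics (averaging is an assumption)."""
    aesthetics, distances = [], []
    for prompt, reference in pairs:
        for seed in range(samples_per_prompt):
            # Hypothetical generator: conditions on the prompt and the reference image.
            image = generate(prompt, reference, seed)
            aesthetics.append(aesthetic_score(image))
            distances.append(perceptual_distance(image, reference))
    return {
        "num_images": len(aesthetics),
        "laion_aesthetic": mean(aesthetics),
        "perceptual_similarity": mean(distances),
    }


# Stub callables only, to show the bookkeeping: 300 pairs * 4 samples = 1200 images.
pairs = [(f"prompt {i}", f"reference {i}") for i in range(300)]
summary = evaluate(
    pairs,
    generate=lambda prompt, reference, seed: reference,
    aesthetic_score=lambda image: 6.0,
    perceptual_distance=lambda image, reference: 0.5,
)
print(summary["num_images"])  # 1200
```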

Quantitative Results

Metric	xinsir/controlnet-canny-sdxl-1.0	diffusers/controlnet-canny-sdxl-1.0	TheMistoAI/MistoLine
laion_aesthetic	6.03	5.93	5.82
perceptual similarity	0.4200	0.5053	0.5387

laion_aesthetic (higher is better)
perceptual similarity (lower is better)

Note: these values were computed on images saved as WebP. Saving as PNG raises the aesthetic score by 0.1 to 0.3, but the relative ranking stays the same.
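
Because the two metrics point in opposite directions, it can help to encode the direction explicitly when comparing models. The sketch below uses the values from the table above; the names `results`, `higher_is_better`, and `best_model` are illustrative only.

```python
# Values copied from the quantitative results table.
results = {
    "xinsir/controlnet-canny-sdxl-1.0": {"laion_aesthetic": 6.03, "perceptual_similarity": 0.4200},
    "diffusers/controlnet-canny-sdxl-1.0": {"laion_aesthetic": 5.93, "perceptual_similarity": 0.5053},
    "TheMistoAI/MistoLine": {"laion_aesthetic": 5.82, "perceptual_similarity": 0.5387},
}

# Direction of each metric, as stated under the table.
higher_is_better = {"laion_aesthetic": True, "perceptual_similarity": False}


def best_model(metric):
    """Return the model name with the best value for the given metric."""
    pick = max if higher_is_better[metric] else min
    return pick(results, key=lambda name: results[name][metric])


for metric in higher_is_better:
    print(metric, "->", best_model(metric))
```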

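Training Details

The model was trained on high-quality data in a single stage, at the same resolution as sdxl-base, 1024 * 1024. We generate the Canny images with random thresholds, as Lvmin Zhang did. Finding suitable hyperparameters for data augmentation is critical: augmentation that is too easy or too hard hurts model performance. We also apply random masks that hide a random percentage of each Canny image, which forces the model to learn more of the semantic relationship between the prompt and the lines. We used more than 10,000,000 carefully annotated images, and CogVLM proved to be a powerful image captioning model [https://github.com/THUDM/CogVLM?tab=readme-ov-file]. For comic images, we recommend using waifu tagger to generate special tags [https://huggingface.co/spaces/SmilingWolf/wd-tagger]. Training used more than 64 A100 GPUs, and the effective batch size with gradient accumulation was 2560.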
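
The two augmentations mentioned above, random Canny thresholds and random masking of the edge map, might look roughly like the sketch below. It is not the authors' implementation: the threshold ranges, the block-wise masking scheme, the patch size, and the function names are all assumptions chosen for illustration. The sampled thresholds would be passed to `cv2.Canny`, as in the quick-start code.

```python
import random

import numpy as np


def random_canny_thresholds(rng, low_range=(50, 150), high_range=(150, 300)):
    """Sample a (low, high) threshold pair with low < high. The ranges are hypothetical."""
    low = rng.randint(*low_range)
    high = rng.randint(max(low + 1, high_range[0]), high_range[1])
    return low, high


def random_mask(edge_map, rng, max_fraction=0.5, patch=64):
    """Zero out a random share of patch-sized blocks in an edge map (assumed masking scheme)."""
    masked = edge_map.copy()
    fraction = rng.uniform(0.0, max_fraction)  # random percentage of blocks to hide
    height, width = masked.shape[:2]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            if rng.random() < fraction:
                masked[top:top + patch, left:left + patch] = 0
    return masked


rng = random.Random(0)
low, high = random_canny_thresholds(rng)
edges = np.full((256, 256), 255, dtype=np.uint8)  # stand-in for a real Canny edge map
masked = random_mask(edges, rng)
print(low, high, masked.shape)
```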

Training Data

The data comes from several sources, including Midjourney, laion 5B, and danbooru. It was carefully filtered and annotated.

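Conclusion

In our evaluation, the model achieved a better aesthetic score than stabilityai/stable-diffusion-xl-base-1.0 on real images and comparable performance on cartoon-style images. Thanks to stronger data augmentation and more training steps, it shows better control ability when tested with perceptual similarity. The model is also less likely to generate abnormal images, which tend to contain distorted human anatomy.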