license: mit
tags:
- stable-diffusion
- stable-diffusion-diffusers
- image-to-image
- art
widget:
- src: >-
    https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png
  prompt: Cartoonize the following image
datasets:
- instruction-tuning-sd/cartoonization
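Instruction-tuned Stable Diffusion for Cartoonization (Fine-tuned)
This pipeline is an "instruction-tuned" version of Stable Diffusion (v1.5). It was fine-tuned from the existing InstructPix2Pix checkpoints.
Pipeline description
The motivation for this pipeline comes partly from FLAN and partly from InstructPix2Pix. The main idea is to first create an instruction-prompted dataset (as described in our blog) and then run InstructPix2Pix-style training. The end goal is to make Stable Diffusion better at following specific instructions that involve image-transformation operations.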
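To make the idea of an instruction-prompted dataset concrete, here is a minimal sketch of how such samples could be assembled. It is not the recipe from the blog: the `InstructionSample` type, the `build_samples` helper, the file paths, and every instruction other than the first are hypothetical, chosen only for illustration. Each (original, transformed) image pair is given one randomly chosen instruction, which yields the (instruction, input image, target image) triplets that InstructPix2Pix-style training consumes.

```python
import random
from dataclasses import dataclass

# Hypothetical instruction paraphrases. Only the first one appears in this card;
# the actual prompt list is described in the blog post.
INSTRUCTIONS = [
    "Cartoonize the following image",
    "Turn this picture into a cartoon",
    "Render the image in a cartoon style",
]


@dataclass(frozen=True)
class InstructionSample:
    instruction: str     # natural-language edit instruction
    original_image: str  # path to the input image
    edited_image: str    # path to the target (cartoonized) image


def build_samples(pairs, instructions=INSTRUCTIONS, seed=0):
    """Attach one randomly chosen instruction to each (original, edited) pair."""
    rng = random.Random(seed)  # seeded so the pairing is reproducible
    return [
        InstructionSample(rng.choice(instructions), original, edited)
        for original, edited in pairs
    ]


# Hypothetical file paths, used only to show the shape of the output.
samples = build_samples([("img/0.png", "cartoon/0.png"), ("img/1.png", "cartoon/1.png")])
for sample in samples:
    print(sample)
```

Varying the wording of the instruction across samples is one plausible way to keep the model from tying the transformation to a single fixed phrase.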
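To learn more, see the blog post cited at the end of this card.
Training procedure and results
Training was conducted on the instruction-tuning-sd/cartoonization dataset. Refer to this repository for details. The training logs can be found here.
Here are some results produced by the pipeline:
Intended uses & limitations
You can use the pipeline to cartoonize an input image, guided by an input prompt.
How to use
Here is how to use this model: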
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

# Load the fine-tuned cartoonization checkpoint in half precision on the GPU.
# Note: newer diffusers releases take `token` in place of `use_auth_token`.
model_id = "instruction-tuning-sd/cartoonizer"
pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, use_auth_token=True
).to("cuda")

# Fetch the input image and apply the edit instruction.
image_path = "https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"
image = load_image(image_path)
image = pipeline("Cartoonize the following image", image=image).images[0]
image.save("image.png")
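The usage example above relies on the pipeline's default sampling settings. As a hedged sketch, the optional arguments `num_inference_steps`, `guidance_scale`, and `image_guidance_scale` of the InstructPix2Pix pipeline call can be collected in a small helper. The helper name `build_call_kwargs`, its default of 20 steps, and its validation thresholds are assumptions made for this illustration, not recommendations from the model authors.

```python
def build_call_kwargs(num_inference_steps=20, guidance_scale=7.5, image_guidance_scale=1.5):
    """Collect optional sampling arguments for the pipeline call after basic checks.

    guidance_scale weights adherence to the text instruction;
    image_guidance_scale weights fidelity to the input image.
    """
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    # Illustrative threshold chosen for this helper, not a documented library limit.
    if guidance_scale < 1.0 or image_guidance_scale < 1.0:
        raise ValueError("guidance scales below 1.0 are rejected by this helper")
    return {
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "image_guidance_scale": image_guidance_scale,
    }


kwargs = build_call_kwargs(num_inference_steps=20, image_guidance_scale=2.0)
print(kwargs)

# With the `pipeline` and `image` objects from the usage example, the call would be:
# result = pipeline("Cartoonize the following image", image=image, **kwargs).images[0]
```

Raising `image_guidance_scale` generally keeps the output closer to the input image, while raising `guidance_scale` pushes it further toward the instruction, so the two are usually tuned together.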
For notes on limitations, misuse, malicious use, and out-of-scope use, please refer to the model card here.
Citation
FLAN
@inproceedings{
wei2022finetuned,
title={Finetuned Language Models are Zero-Shot Learners},
author={Jason Wei and Maarten Bosma and Vincent Zhao and Kelvin Guu and Adams Wei Yu and Brian Lester and Nan Du and Andrew M. Dai and Quoc V Le},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=gEZrGCozdqR}
}
InstructPix2Pix
@InProceedings{
brooks2022instructpix2pix,
author = {Brooks, Tim and Holynski, Aleksander and Efros, Alexei A.},
title = {InstructPix2Pix: Learning to Follow Image Editing Instructions},
booktitle = {CVPR},
year = {2023},
}
Instruction-tuning for Stable Diffusion blog
@article{
Paul2023instruction-tuning-sd,
author = {Paul, Sayak},
title = {Instruction-tuning Stable Diffusion with InstructPix2Pix},
journal = {Hugging Face Blog},
year = {2023},
note = {https://huggingface.co/blog/instruction-tuning-sd},
}