---
license: apache-2.0
library_name: diffusers
datasets:
- VisualCloze/Graph200K
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: image-to-image
tags:
- text-to-image
- image-to-image
- flux
- lora
- in-context-learning
- universal-image-generation
- ai-tools
---
# VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Diffusers implementation)

If you find VisualCloze helpful, please consider giving the GitHub repository a ⭐ star. Thank you!

## 📰 News

## 🌠 Key Features

An in-context learning based universal image generation framework that:
- Supports various in-domain tasks
- Generalizes to unseen tasks through in-context learning
- Unifies multiple tasks into a single generation step, producing both the target image and intermediate results
- Supports reverse generation, i.e. inferring a set of conditions from a target image

🔥 See the project page for more examples.
## 🔧 Installation

We recommend installing the official diffusers from source:

```bash
pip install git+https://github.com/huggingface/diffusers.git
```
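`VisualClozePipeline` is only available in recent diffusers builds, so a quick import check can confirm the install picked it up. This is a minimal sanity-check sketch, not part of the official instructions:

```python
# Verify that the installed diffusers exposes VisualClozePipeline.
# Older PyPI releases may not include it, which is why a source install is recommended.
import diffusers

print(diffusers.__version__)

from diffusers import VisualClozePipeline  # raises ImportError if the install is too old
```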
## 💻 Usage with Diffusers

We have also released a model with a resolution of 512 (see its model card), while this model uses a resolution of 384. The `resolution` argument specifies the size each image is resized to before concatenation, in order to avoid out-of-memory errors. When generating high-resolution images, we use the SDEdit technique to upsample the generated results.
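If the concatenated in-context grid still does not fit in GPU memory at this resolution, the standard diffusers offloading helper can be used in place of `pipe.to("cuda")`. This is a generic memory-saving sketch using the regular diffusers API, not a VisualCloze-specific recommendation:

```python
import torch
from diffusers import VisualClozePipeline

pipe = VisualClozePipeline.from_pretrained(
    "VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16
)
# Keep sub-models on the CPU and move each one to the GPU only while it runs,
# trading some speed for a lower peak memory footprint.
pipe.enable_model_cpu_offload()
```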
Example with depth-to-image:
```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# Each inner list is one row of the in-context grid: the first row is a complete
# example (condition + target), the last row is the query with the target left as None.
image_paths = [
    # in-context example
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2_depth-anything-v2_Large.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2.jpg'),
    ],
    # query with the target image left empty
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/79f2ee632f1be3ad64210a641c4e201b/79f2ee632f1be3ad64210a641c4e201b_depth-anything-v2_Large.jpg'),
        None,
    ],
]

task_prompt = "Each row outlines a logical process, starting from [IMAGE1] a gray-based depth map with detailed object contours, to achieve [IMAGE2] an image with flawless clarity."
content_prompt = """A serene portrait of a young woman with long hair, wearing a beige dress with golden embroidery, standing at the center of a softly lit room. She holds a large bouquet of pale pink roses in a black box; a tall green plant stands in the background on the left and a framed artwork hangs on the wall to the right. Natural light enters through a window on the left, and the woman gazes down at the bouquet with a calm expression. Soft natural lighting, warm color palette, high contrast, photorealistic, intimate and elegant, visually balanced, serene atmosphere."""

pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
pipe.to("cuda")

image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_width=1024,
    upsampling_height=1024,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0][0]

image_result.save("visualcloze.png")
```
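The pipeline returns a nested list of images, one inner list per query row, which is why the example above indexes `.images[0][0]`. The following sketch, reusing the setup from the example above, saves every returned image under that assumption about the output structure:

```python
# Assumes `result.images` is a list of lists (one list per query row),
# consistent with the `.images[0][0]` indexing used in the example above.
result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_width=1024,
    upsampling_height=1024,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
)
for row_idx, row in enumerate(result.images):
    for img_idx, img in enumerate(row):
        img.save(f"visualcloze_{row_idx}_{img_idx}.png")
```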
Example with virtual try-on:
```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# First row: a complete in-context example (person, clothing, try-on result).
# Second row: the query, with the try-on result left as None to be generated.
image_paths = [
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/03673_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00_tryon_catvton_0.jpg'),
    ],
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00555_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/12265_00.jpg'),
        None,
    ],
]

task_prompt = "Each row shows a virtual try-on process that puts [IMAGE2] the clothing onto [IMAGE1] the person, producing [IMAGE3] the person wearing the clothing."
content_prompt = None

pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
pipe.to("cuda")

image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_height=1632,
    upsampling_width=1232,
    upsampling_strength=0.3,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0][0]

image_result.save("visualcloze.png")
```
## Citation

If you find VisualCloze useful for your research, please cite:

```bibtex
@article{li2025visualcloze,
  title={VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning},
  author={Li, Zhong-Yu and Du, Ruoyi and Yan, Juncheng and Zhuo, Le and Li, Zhen and Gao, Peng and Ma, Zhanyu and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2504.07960},
  year={2025}
}
```