CogView4-6B-Edit-LoRA-v0: an open-source image editing model for text-instructed style transfer and content editing

Cogview4 6B Edit LoRA V0

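Developed by finetrainers

An image-editing control LoRA fine-tuned on top of THUDM/CogView4-6B. It applies style transfer and content edits to an image according to a text instruction.

#Text-to-image #Image style editing #LoRA fine-tuning #Multi-condition control

Downloads: 20

Released: 4/6/2025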

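Model Overview

This is an experimental image-editing fine-tune. It adapts CogView4-6B with LoRA so that the model can modify an image's style and content according to a text instruction.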

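Model Features

Text-guided image editing

Applies style transfer and content edits to an image from natural-language instructions.

LoRA fine-tuning

Uses low-rank adaptation (LoRA) to fine-tune the large base model efficiently.

Multiple styles

Supports a range of style conversions, including impasto painting, seasonal changes, and space scenes.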
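
The low-rank adaptation idea named in the features above can be sketched in a few lines of plain Python. This is a minimal illustration, not the finetrainers or diffusers implementation: the layer dimensions, the rank, and the helper names `lora_param_count`, `matmul`, and `apply_lora` are all hypothetical. It shows why training two small matrices costs far fewer parameters than training the full weight, and how a scale factor weights the learned update.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Hypothetical layer size, chosen only to show the savings.
d_in, d_out, rank = 4096, 4096, 16
full_params = d_in * d_out                          # parameters in the full weight
lora_params = lora_param_count(d_in, d_out, rank)   # parameters in the LoRA pair

# Tiny worked example: a 2x3 weight with a rank-1 update.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]       # shape (d_out=2, rank=1)
lora_a = [[0.5, 0.0, 1.0]]    # shape (rank=1, d_in=3)
merged = apply_lora(weight, lora_b, lora_a, scale=0.9)
```

With these assumed sizes the LoRA pair holds 131,072 parameters against 16,777,216 for the full matrix. The `scale` argument plays a role comparable to an adapter weight: a value below 1.0 applies the learned update at reduced strength.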

Model Capabilities

Text-to-image generation

Image style transfer

Image content editing

Prompt-based image modification

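Use Cases

Creative design

Artistic style transfer

Convert an ordinary image into a specific artistic style, such as an impasto painting or an ancient Egyptian mural.

output1.png

Scene changes

Change the season or environment in an image, for example spring with blooming trees or a stormy space setting.

output2.png, output3.png

Content creation

Concept design

Quickly produce concept art in different styles.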

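🚀 Image-Editing Control LoRA Project

This project is a control LoRA (Low-Rank Adaptation) for making small edits to images with the THUDM/CogView4-6B model. Given a text prompt, it applies operations such as style transfer to an input image.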

🚀 Quick Start

Basic Information

Property	Details
Base model	THUDM/CogView4-6B
Training dataset	sayapaul/OmniEdit-mini
Library	diffusers

Project Links

Code repository: https://github.com/a-r-r-o-w/finetrainers

⚠️ Important

This is an experimental checkpoint, and it is known to generalize poorly.

Inference Code

💻 Usage Example

Basic usage

# For now, must use this branch of finetrainers: https://github.com/a-r-r-o-w/finetrainers/blob/f3e27cc39a2bc804cb373ea15522576e57f46d23/finetrainers/models/cogview4/control_specification.py

import torch
from diffusers import CogView4Pipeline
from diffusers.utils import load_image
from finetrainers.models.utils import _expand_linear_with_zeroed_weights
from finetrainers.patches import load_lora_weights
from finetrainers.patches.dependencies.diffusers.control import control_channel_concat

dtype = torch.bfloat16
device = torch.device("cuda")
generator = torch.Generator().manual_seed(0)

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=dtype)

in_channels = pipe.transformer.config.in_channels
patch_channels = pipe.transformer.patch_embed.proj.in_features
pipe.transformer.patch_embed.proj = _expand_linear_with_zeroed_weights(pipe.transformer.patch_embed.proj, new_in_features=2 * patch_channels)

load_lora_weights(pipe, "finetrainers/CogView4-6B-Edit-LoRA-v0", "cogview4-lora")
pipe.set_adapters("cogview4-lora", 0.9)
pipe.to(device)

prompt = "Make the image look like it's from an ancient Egyptian mural."
control_image = load_image("examples/training/control/cogview4/omni_edit/validation_dataset/0.png")
height, width = 1024, 1024

with torch.no_grad():
    latents = pipe.prepare_latents(1, in_channels, height, width, dtype, device, generator)
    control_image = pipe.image_processor.preprocess(control_image, height=height, width=width)
    control_image = control_image.to(device=device, dtype=dtype)
    control_latents = pipe.vae.encode(control_image).latent_dist.sample(generator=generator)
    control_latents = (control_latents - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor

with control_channel_concat(pipe.transformer, ["hidden_states"], [control_latents], dims=[1]):
    image = pipe(prompt, latents=latents, num_inference_steps=30, generator=generator).images[0]

image.save("output.png")
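
The inference code above widens the patch-embedding projection to twice its input width before loading the LoRA, then concatenates the control latents with the noisy latents along the channel dimension. The sketch below illustrates the idea behind that widening in plain Python. It is an assumption about what `_expand_linear_with_zeroed_weights` does, inferred from its name and usage, and it is not the finetrainers source. The helper names `linear` and `expand_with_zeroed_weights`, the tiny dimensions, and the "trained" values are hypothetical, and the sketch assumes the original features come first in the concatenated input.

```python
def linear(weight, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def expand_with_zeroed_weights(weight, new_in_features):
    """Widen W to accept more input features; the new columns start at zero."""
    old_in_features = len(weight[0])
    if new_in_features < old_in_features:
        raise ValueError("new_in_features must be >= the current input width")
    padding = [0.0] * (new_in_features - old_in_features)
    return [row + padding for row in weight]


# Hypothetical tiny projection: 3 input features -> 2 outputs.
weight = [[0.2, -0.5, 1.0], [0.7, 0.1, -0.3]]
bias = [0.05, -0.1]

noisy_features = [1.0, 2.0, 3.0]      # stands in for the patchified noisy latents
control_features = [9.0, -4.0, 0.5]   # stands in for the patchified control latents

expanded = expand_with_zeroed_weights(weight, new_in_features=6)

# Before any fine-tuning, the widened layer ignores the control half entirely.
before = linear(weight, bias, noisy_features)
after = linear(expanded, bias, noisy_features + control_features)

# Hypothetical post-training state: once a control column becomes nonzero,
# the control features start to influence the output.
trained = [row[:3] + [0.1, 0.0, 0.0] for row in expanded]
changed = linear(trained, bias, noisy_features + control_features)
```

Starting the new columns at zero means the widened layer initially reproduces the base model's output exactly, so fine-tuning begins from the pretrained behavior and learns how much weight to give the control image.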

Example Results

Below are some example prompts and their corresponding output images:

Prompt: Change it to look like it's in the style of an impasto painting.
- Output image: output1.png
Prompt: change the setting to spring with blooming trees
- Output image: output2.png
Prompt: transform the setting to a stormy space
- Output image: output3.png