LeX-Enhancer: Open-Source Lightweight Prompt Enhancement Model for High-Quality Text-to-Image Prompts

LeX-Enhancer

Developed by X-ART

LeX-Enhancer is a lightweight prompt enhancement model distilled from DeepSeek-R1. It produces high-quality, visually rich prompts suited to text-to-image generation.

Large Language Model

Transformers

License: MIT #text-to-image prompt enhancement #LoRA fine-tuning #high-fidelity visual description

Downloads: 160

Released: 3/10/2025

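Model Overview

LeX-Enhancer is a prompt enhancement model that turns simple captions into detailed visual descriptions for high-fidelity text-to-image generation.

Model Features

Efficient prompt enhancement

Turns a simple caption into a detailed visual description, improving the quality of text-to-image generation.

Lightweight model

Fine-tuned with LoRA, so the model is lightweight and efficient enough for large-scale generation.

High-quality output

The generated prompts are visually rich and detailed, suited to high-fidelity image generation.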

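Model Capabilities

Text generation

Prompt enhancement

Visual description generation

Use Cases

Text-to-image generation

Thank-you card design

Turns a simple thank-you card caption into a detailed visual description used to generate a high-quality image.

The generated description covers visual elements such as background tones, text styling, and lighting and shadow effects.

Art creation

Artistic prompt generation

Provides artists with detailed visual descriptions to support the creative process.

The generated prompts are artistic and visually appealing.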

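🚀 🎨 LeX-Enhancer: Prompt Enhancer

LeX-Enhancer is a lightweight prompt enhancement model distilled from DeepSeek-R1. We collected 60,856 prompt pairs (each prompt before and after enhancement by DeepSeek-R1) and fine-tuned DeepSeek-R1-Distill-Qwen-14B with LoRA to reproduce high-quality, visually rich prompt outputs.

This enables efficient, large-scale prompt enhancement, which makes the model well suited to high-fidelity text-to-image generation.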
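
To give a rough sense of why LoRA fine-tuning is inexpensive, the sketch below compares the trainable parameters LoRA adds for a single weight matrix with the cost of updating that matrix directly. The matrix size and rank are hypothetical values chosen for illustration; the source does not state the LoRA rank or the layers that were adapted for LeX-Enhancer.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    # LoRA trains two small factors, B (d_out x rank) and A (rank x d_in),
    # while the original weight matrix stays frozen.
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters updated when the same matrix is fine-tuned directly."""
    return d_in * d_out


# Hypothetical sizes chosen for illustration, not LeX-Enhancer's real config.
D_IN, D_OUT, RANK = 4096, 4096, 16

lora = lora_param_count(D_IN, D_OUT, RANK)
full = full_param_count(D_IN, D_OUT)
print(f"LoRA: {lora:,} trainable parameters, full: {full:,} ({lora / full:.2%})")
```

Under these assumed sizes the low-rank factors account for well under 1% of the matrix's parameters, which is the general reason LoRA keeps training cost low even on a 14B base model.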

🚀 Quick Start

Basic Information

Property	Details
Base model	deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
Dataset	X-ART/LeX-R1-60K
Library	transformers
License	mit
Task type	text-to-image
Tags	art, text-rendering

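Example: From a Simple Prompt to an Enhanced Prompt

🧾 Input (simple prompt):
A thank you card with the words very much, with the text on it: "VERY" in black, "MUCH" in yellow.

🧾 Output (enhanced prompt):
The image shows a rectangular thank-you card with a soft, warm-toned background, a gradient of beige and pale pink, with a subtle texture that adds depth without distracting from the content. At the top center, the word "VERY" appears in bold, sans-serif black uppercase letters, slightly curved to blend naturally with the overall composition. Directly below it, "MUCH" is shown in vivid bright yellow, in the same typeface as "VERY" but slightly larger, neatly aligned with the lower edge of "VERY" to form a balanced visual hierarchy. Both words carry a faint drop shadow, which keeps them crisp and legible against the softly lit background. The card has clean edges and a thin, minimalist border in a light shade close to the background tone, and the symmetrical layout gives it a refined, modern look. Soft ambient light casts gentle, diffuse shadows at the corners of the card, suggesting a lightly textured surface beneath. With little additional decoration, the focus stays entirely on the text. The palette blends warm and neutral tones, making the text stand out while creating a calm, inviting atmosphere.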

💻 Usage Example

Basic Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# System instruction for reasoning + answering
SYSTEM_TEMPLATE = (
    "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. "
    "The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. "
    "The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>."
)

model_path = 'X-ART/LeX-Enhancer'

# Your simple caption goes here
simple_caption = "A thank you card with the words very much, with the text on it: \"VERY\" in black, \"MUCH\" in yellow."

def create_chat_template(user_prompt):
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": "<think>"}
    ]

def create_direct_template(user_prompt):
    return user_prompt + "<think>"

def create_user_prompt(simple_caption):
    # Wrap the simple caption in the instruction text the model expects
    return (
        "Below is the simple caption of an image with text. Please deduce the detailed description of the image based on this simple caption. "
        "Note: 1. The description should only include visual elements and should not contain any extended meanings. "
        "2. The visual elements should be as rich as possible, such as the main objects in the image, their respective attributes, "
        "the spatial relationships between the objects, lighting and shadows, color style, any text in the image and its style, etc. "
        "3. The output description should be a single paragraph and should not be structured. "
        "4. The description should avoid certain situations, such as pure white or black backgrounds, blurry text, excessive rendering of text, "
        "or harsh visual styles. "
        "5. The detailed caption should be human readable and fluent. "
        "6. Avoid using vague expressions such as \"may be\" or \"might be\"; the generated caption must be in a definitive, narrative tone. "
        "7. Do not use negative sentence structures, such as \"there is nothing in the image,\" etc. The entire caption should directly describe the content of the image. "
        "8. The entire output should be limited to 200 words.\n\n"
        f"SIMPLE CAPTION: {simple_caption}"
    )

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)

# Prepare input prompt
messages = create_direct_template(create_user_prompt(simple_caption))
input_ids = tokenizer.encode(messages, return_tensors="pt").to(model.device)

# Stream output
streamer = TextStreamer(tokenizer, skip_special_tokens=True, clean_up_tokenization_spaces=True)
output = model.generate(
    input_ids,
    max_length=2048,
    num_return_sequences=1,
    do_sample=True,
    temperature=0.6,
    repetition_penalty=1.1,
    streamer=streamer
)

print("*" * 80)
# Output will stream via TextStreamer
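
The script above streams the reasoning and the final description together. If only the enhanced prompt is needed, for example to pass it to an image generator, the decoded text can be post-processed. The helper below is a minimal sketch: `extract_enhanced_prompt` is a hypothetical name, and it assumes the output follows the `<think> ... </think> <answer> ... </answer>` layout described in `SYSTEM_TEMPLATE`, falling back to the raw text when the tags are absent.

```python
import re


def extract_enhanced_prompt(generated_text: str) -> str:
    """Return the text after the reasoning block, without <answer> tags."""
    # Keep only what follows the last closing </think> tag, if there is one.
    _, sep, tail = generated_text.rpartition("</think>")
    body = tail if sep else generated_text
    # If the answer is wrapped in <answer> tags, keep the inside; an
    # unclosed <answer> tag is tolerated and read to the end of the text.
    match = re.search(r"<answer>(.*?)(?:</answer>|$)", body, flags=re.DOTALL)
    if match:
        body = match.group(1)
    return body.strip()


sample = "<think>plan the layout</think> <answer> A rectangular card. </answer>"
print(extract_enhanced_prompt(sample))
```

In the script above, the text to pass in could be obtained with `tokenizer.decode(output[0], skip_special_tokens=True)`. Whether the tags survive decoding depends on the tokenizer configuration, which is why the helper also accepts text without them.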

Citation

@article{zhao2025lexart,
    title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
    author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
    journal={arXiv preprint arXiv:2503.21749},
    year={2025}
}