🚀 🎨 LeX-Enhancer: A Prompt Enhancer
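LeX-Enhancer is a lightweight prompt enhancement model distilled from DeepSeek-R1. We collected 60,856 prompt pairs, each consisting of a prompt before and after enhancement by DeepSeek-R1, and fine-tuned DeepSeek-R1-Distill-Qwen-14B with LoRA so that it reproduces these high-quality, visually rich prompt expansions.
This makes it possible to generate prompt enhancements efficiently and at scale, which is well suited to high-fidelity text-to-image generation.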
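For readers new to LoRA: instead of updating a full weight matrix W of shape d × k, LoRA trains two small matrices B (d × r) and A (r × k) and adds their scaled product (alpha / r) · B A to W. The plain-Python sketch below only illustrates that arithmetic. The layer size 5120 × 5120 and rank r = 16 are illustrative assumptions, not the hyperparameters actually used to train LeX-Enhancer, and the helper names are hypothetical.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Number of parameters in the LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, b, a, alpha: float, r: int):
    """Return W + (alpha / r) * B @ A, the merged weight used at inference."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Hypothetical projection size and rank, chosen only for illustration.
d, k, r = 5120, 5120, 16
full = d * k
lora = lora_trainable_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")

# Tiny 2x2 example with rank 1. LoRA initialises B to zero,
# so before any training step the merged weight equals W.
W = [[1.0, 0.0], [0.0, 1.0]]
B0 = [[0.0], [0.0]]
A = [[0.5, -0.5]]
assert merge_lora(W, B0, A, alpha=2.0, r=1) == W
```

Running it prints `full: 26,214,400  lora: 163,840  ratio: 0.6250%`, which shows why LoRA makes fine-tuning a 14B model on tens of thousands of prompt pairs comparatively cheap: only the small factors are trained.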

🚀 Quick Start
Basic Information
| Property | Details |
|---|---|
| Base model | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
| Dataset | X-ART/LeX-R1-60K |
| Library | transformers |
| License | MIT |
| Task type | text-to-image |
| Tags | art, text-rendering |
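Example: From a Simple Prompt to an Enhanced Prompt
🧾 Input (simple prompt):
A thank you card with the words very much, with the text on it: "VERY" in black, "MUCH" in yellow.
🧾 Output (enhanced prompt):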
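The image shows a rectangular thank-you card with a soft, warm-toned background that blends from beige to pale pink; its surface carries a fine texture that adds depth without overpowering the design. At the top center, the word "VERY" is rendered in bold, black, sans-serif capital letters, slightly curved so that it blends naturally into the overall composition. Directly below, "MUCH" appears in bright lemon yellow, in the same typeface as "VERY" but slightly larger, neatly aligned with the lower edge of "VERY" to form a balanced visual hierarchy. Both words carry a faint drop shadow that keeps them crisp and legible against the softly lit background. The card has clean edges and a minimalist light border close in tone to the background, and its symmetrical layout gives it a refined, modern look. Soft ambient light casts gentle, diffuse shadows at the card's corners, hinting at a lightly textured surface beneath. With no excessive decoration, the focus rests entirely on the text. The palette blends warm and neutral tones, making the text stand out while creating a calm, inviting atmosphere.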
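The enhancement instruction used in the usage example asks for a single paragraph of at most 200 words, with no hedging phrases such as "may be" or "might be". A small post-check like the one below can flag outputs that drift from those rules. This is a minimal sketch: the function name is hypothetical and not part of LeX-Enhancer, and the whitespace-based word count is an assumption that only approximates the 200-word limit.

```python
import re

# Hedging phrases that the enhancement instruction explicitly forbids.
FORBIDDEN_HEDGES = ("may be", "might be")


def check_enhanced_prompt(text: str, max_words: int = 200) -> list[str]:
    """Return a list of rule violations (empty if the prompt passes).

    Hypothetical helper: words are counted by splitting on whitespace.
    """
    problems = []
    stripped = text.strip()
    word_count = len(stripped.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words > {max_words}")
    # A line break inside the text means it is not a single paragraph.
    if "\n" in stripped:
        problems.append("not a single paragraph")
    lowered = stripped.lower()
    for hedge in FORBIDDEN_HEDGES:
        # Match on word boundaries so words like "dismay" do not trigger a false positive.
        if re.search(r"\b" + re.escape(hedge) + r"\b", lowered):
            problems.append(f"hedging phrase: {hedge!r}")
    return problems


sample = ("The image shows a rectangular thank-you card. The word \"VERY\" is "
          "in black and the word \"MUCH\" might be yellow.")
print(check_enhanced_prompt(sample))
```

The enhanced prompt shown above passes all three checks; the sample is flagged only for its "might be".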
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# System prompt describing the <think> ... </think> <answer> ... </answer> output format.
SYSTEM_TEMPLATE = (
    "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. "
    "The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. "
    "The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>."
)

model_path = 'X-ART/LeX-Enhancer'

simple_caption = "A thank you card with the words very much, with the text on it: \"VERY\" in black, \"MUCH\" in yellow."


def create_chat_template(user_prompt):
    # Chat-style messages, with the assistant turn pre-filled with "<think>".
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": "<think>"}
    ]


def create_direct_template(user_prompt):
    # Plain-text prompt: the user prompt followed by "<think>" to start the reasoning.
    return user_prompt + "<think>"


def create_user_prompt(simple_caption):
    # Instruction asking the model to expand a simple caption into a detailed description.
    return (
        "Below is the simple caption of an image with text. Please deduce the detailed description of the image based on this simple caption. "
        "Note: 1. The description should only include visual elements and should not contain any extended meanings. "
        "2. The visual elements should be as rich as possible, such as the main objects in the image, their respective attributes, "
        "the spatial relationships between the objects, lighting and shadows, color style, any text in the image and its style, etc. "
        "3. The output description should be a single paragraph and should not be structured. "
        "4. The description should avoid certain situations, such as pure white or black backgrounds, blurry text, excessive rendering of text, "
        "or harsh visual styles. "
        "5. The detailed caption should be human readable and fluent. "
        "6. Avoid using vague expressions such as \"may be\" or \"might be\"; the generated caption must be in a definitive, narrative tone. "
        "7. Do not use negative sentence structures, such as \"there is nothing in the image,\" etc. The entire caption should directly describe the content of the image. "
        "8. The entire output should be limited to 200 words.\n"
        f"SIMPLE CAPTION: {simple_caption}"
    )


tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)

# This example uses the direct template; create_chat_template is defined but not used here.
messages = create_direct_template(create_user_prompt(simple_caption))
input_ids = tokenizer.encode(messages, return_tensors="pt").to(model.device)

# Print decoded text to stdout as tokens are generated.
streamer = TextStreamer(tokenizer, skip_special_tokens=True, clean_up_tokenization_spaces=True)

output = model.generate(
    input_ids,
    max_length=2048,  # total length, prompt tokens included
    num_return_sequences=1,
    do_sample=True,
    temperature=0.6,
    repetition_penalty=1.1,
    streamer=streamer,
)

print("*" * 80)
```
Related Links
- GitHub repository: https://github.com/zhaoshitian/LeX-Art
Citation
@article{zhao2025lexart,
title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
journal={arXiv preprint arXiv:2503.21749},
year={2025}
}