🚀 🖼️🔗 Diagram to Knowledge Graph Model
This model is a research-driven project developed during an internship at Zackariya Solution. It focuses on extracting structured data (JSON) from images, specifically nodes, edges, and their sub-attributes, so that visual information can be represented as a knowledge graph.
🚀 Note: This model is intended for learning purposes only, not for production applications. The extracted structured data may vary depending on project requirements.
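For illustration, the target output is a single JSON object containing lists of nodes and edges with their attributes. The field names in the sketch below are assumptions chosen for this example, not a guaranteed schema:

```python
# Illustrative only: the actual schema may differ depending on project requirements.
example_output = {
    "nodes": [
        {"id": "n1", "label": "Start", "type": "start_event"},
        {"id": "n2", "label": "Review order", "type": "task"},
    ],
    "edges": [
        {"source": "n1", "target": "n2", "label": ""},
    ],
}
```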
🚀 Quick Start
Install the dependencies
```
%pip install -q "transformers>=4.49.0" accelerate datasets "qwen-vl-utils[decord]==0.0.8"
```
Run the inference code
```python
import json

import torch
from qwen_vl_utils import process_vision_info
from transformers import Qwen2_5_VLForConditionalGeneration, Qwen2_5_VLProcessor

MODEL_ID = "zackriya/diagram2graph-adapters"
MAX_PIXELS = 1280 * 28 * 28
MIN_PIXELS = 256 * 28 * 28

# Load the fine-tuned model and the matching processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
processor = Qwen2_5_VLProcessor.from_pretrained(
    MODEL_ID,
    min_pixels=MIN_PIXELS,
    max_pixels=MAX_PIXELS,
)

SYSTEM_MESSAGE = """You are a Vision Language Model specialized in extracting structured data from visual representations of process and flow diagrams.
Your task is to analyze the provided image of a diagram and extract the relevant information into a well-structured JSON format.
The diagram includes details such as nodes and edges. each of them have their own attributes.
Focus on identifying key data fields and ensuring the output adheres to the requested JSON structure.
Provide only the JSON output based on the extracted information. Avoid additional explanations or comments."""


def run_inference(image):
    # Build a chat-style prompt that pairs the system message with the diagram image
    messages = [
        {
            "role": "system",
            "content": [{"type": "text", "text": SYSTEM_MESSAGE}],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": image,
                },
                {
                    "type": "text",
                    "text": "Extract data in JSON format, Only give the JSON",
                },
            ],
        },
    ]

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        return_tensors="pt",
    )
    inputs = inputs.to("cuda")  # assumes a CUDA-capable GPU

    # Generate, then drop the prompt tokens so only the newly generated output remains
    generated_ids = model.generate(**inputs, max_new_tokens=512)
    generated_ids_trimmed = [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )
    return output_text


# `eval_dataset` is assumed to be an already-loaded dataset of diagram images
image = eval_dataset[9]["image"]
output = run_inference(image)
json.loads(output[0])
```
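The snippet above ends by reading an image from an `eval_dataset` object that is not defined in this card. As a minimal sketch, assuming a local diagram file instead (the path `diagram.png` is a placeholder), the same function can be called on a PIL image:

```python
from PIL import Image

# "diagram.png" is a placeholder path; use any local process/flow diagram image.
image = Image.open("diagram.png").convert("RGB")

output = run_inference(image)
graph_json = json.loads(output[0])  # may raise if the model emits non-JSON text
print(graph_json)
```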
✨ Key Features
- Focuses on extracting structured data (JSON) from images and representing the visual information as a knowledge graph (a graph-building sketch follows this list).
- Suitable for experimenting with diagram-to-knowledge-graph conversion and for understanding AI-driven structured extraction from images.
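As a sketch of the knowledge-graph representation mentioned above, the extracted JSON can be loaded into a graph library. `networkx` is not part of this project, and the field names below are carried over from the illustrative (assumed) schema shown earlier:

```python
import networkx as nx


def json_to_graph(graph_json: dict) -> nx.DiGraph:
    """Build a directed graph from extracted nodes/edges (field names assumed)."""
    g = nx.DiGraph()
    for node in graph_json.get("nodes", []):
        g.add_node(node["id"], **{k: v for k, v in node.items() if k != "id"})
    for edge in graph_json.get("edges", []):
        g.add_edge(edge["source"], edge["target"], label=edge.get("label", ""))
    return g


# Example using the illustrative schema from above
g = json_to_graph({
    "nodes": [{"id": "n1", "label": "Start"}, {"id": "n2", "label": "Review order"}],
    "edges": [{"source": "n1", "target": "n2", "label": ""}],
})
print(g.number_of_nodes(), g.number_of_edges())  # 2 1
```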
📋 Model Details
| Attribute | Details |
|-----------|---------|
| Developed by | Zackariya Solution internship team (Mohammed Safvan) |
| Fine-tuned from | Qwen/Qwen2.5-VL-3B-Instruct |
| License | Apache 2.0 |
| Language(s) | Multilingual (focused on structured extraction) |
| Model type | Vision-language transformer (PEFT fine-tuned; see the loading sketch below the table) |
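The Quick Start above loads the adapters directly via `from_pretrained`. Since the model is a PEFT fine-tune of Qwen/Qwen2.5-VL-3B-Instruct, the adapters can also be attached to the base model explicitly. A minimal sketch, assuming the repository ships standard PEFT (LoRA) adapter files and that the `peft` package is installed (it is not in the install command above):

```python
import torch
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the base model, then attach the fine-tuned adapters on top of it.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, "zackriya/diagram2graph-adapters")

# Optionally fold the adapter weights into the base model for faster inference.
model = model.merge_and_unload()
```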
🎯 Use Cases
✅ Direct Use
- Experimenting with diagram-to-knowledge-graph conversion 📊
- Understanding AI-driven structured extraction from images
🚀 Downstream Use (Potential)
- Enhancing BPMN/flowchart analysis 🏗️
- Supporting automated document processing 📄
❌ Out-of-Scope Use
- Not intended for real-world production deployment ⚠️
- May not generalize well to all diagram types
🏗️ Training Details
- Dataset: internally curated diagram dataset 🖼️
- Fine-tuning method: LoRA-based optimization ⚡ (an illustrative configuration sketch follows this list)
- Precision: bf16 mixed-precision training 🎯
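For reference, a LoRA setup of this kind is commonly described with `peft.LoraConfig`. The rank, alpha, dropout, and target modules below are illustrative assumptions, not the values actually used for training:

```python
from peft import LoraConfig

# Illustrative hyperparameters only; the actual training configuration is not published here.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```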
📈 Evaluation
Evaluation Metrics
- Metric: F1 score 🏆 (a minimal computation sketch follows this list)
- Limitation: may struggle with complex, dense diagrams ⚠️
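As a sketch of how a node (or edge) F1 score of this kind can be computed, extraction is treated as exact matching between predicted and ground-truth items. The matching rule actually used for the results below is not documented here, so the set-based comparison is an assumption:

```python
def f1_score(predicted: set, ground_truth: set) -> float:
    """F1 over exact matches between predicted and ground-truth items."""
    if not predicted or not ground_truth:
        return 0.0
    true_positives = len(predicted & ground_truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)


# e.g. predicted vs. annotated node labels for one diagram
print(f1_score({"Start", "Review order", "End"}, {"Start", "Approve order", "End"}))  # ~0.667
```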
Evaluation Results

| Sample | Node F1 (base) | Node F1 (fine-tuned) | Edge F1 (base) | Edge F1 (fine-tuned) |
|--------|----------------|----------------------|----------------|----------------------|
| image_sample_1 | 0.46 | 1.0 | 0.59 | 0.71 |
| image_sample_2 | 0.67 | 0.57 | 0.25 | 0.25 |
| image_sample_3 | 1.0 | 1.0 | 0.25 | 0.75 |
| image_sample_4 | 0.5 | 0.83 | 0.15 | 0.62 |
| image_sample_5 | 0.72 | 0.78 | 0.0 | 0.48 |
| image_sample_6 | 0.75 | 0.75 | 0.29 | 0.67 |
| image_sample_7 | 0.6 | 1.0 | 1.0 | 1.0 |
| image_sample_8 | 0.6 | 1.0 | 1.0 | 1.0 |
| image_sample_9 | 1.0 | 1.0 | 0.55 | 0.77 |
| image_sample_10 | 0.67 | 0.8 | 0.0 | 1.0 |
| image_sample_11 | 0.8 | 0.8 | 0.5 | 1.0 |
| image_sample_12 | 0.67 | 1.0 | 0.62 | 0.75 |
| image_sample_13 | 1.0 | 1.0 | 0.73 | 0.67 |
| image_sample_14 | 0.74 | 0.95 | 0.56 | 0.67 |
| image_sample_15 | 0.86 | 0.71 | 0.67 | 0.67 |
| image_sample_16 | 0.75 | 1.0 | 0.8 | 0.75 |
| image_sample_17 | 0.8 | 1.0 | 0.63 | 0.73 |
| image_sample_18 | 0.83 | 0.83 | 0.33 | 0.43 |
| image_sample_19 | 0.75 | 0.8 | 0.06 | 0.22 |
| image_sample_20 | 0.81 | 1.0 | 0.23 | 0.75 |
| Average | 0.749 | 0.891 | 0.4605 | 0.6945 |
🤝 Collaboration
If you are interested in fine-tuning this model for your own use case, or would like to explore how we can help, we welcome collaboration.
Zackriya Solutions
🔗 References
🚀 Stay curious, keep exploring! 🚀