library_name: transformers
tags: [OCR]
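Model Card for qwen-for-jawi-v1
Model Description
This model is a fine-tuned version of Qwen/Qwen2-VL-7B-Instruct, specialized for optical character recognition (OCR) of historical Malay texts written in Jawi script (Malay written in the Arabic alphabet).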
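Model Architecture
Intended Use
Primary Uses
- OCR of historical Malay documents written in Jawi script
- Digital preservation of Malay cultural heritage
- Supporting computational analysis of historical Malay texts
Out-of-Scope Uses
- General-purpose Arabic text recognition
- Processing of modern Malay text
- Real-time OCR applications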
Training Data
Dataset Description
The model was trained and evaluated using
Training Procedure
- Hardware: 1 × H100 GPU
- Training time: 6 hours
Performance and Limitations
Performance Metrics
- Character Error Rate (CER): 8.66%
- Word Error Rate (WER): 25.50%
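The card does not say how CER and WER were computed. The sketch below assumes the conventional definitions: the Levenshtein edit distance between the reference and the model output, divided by the reference length (in characters for CER, in whitespace-separated words for WER), expressed as a percentage. The function names and the sample strings are hypothetical and are not taken from the evaluation set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent (assumed definition, non-empty reference)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace (assumed definition)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Made-up example strings for illustration only.
reference = "ini kitab lama"
hypothesis = "ini kitb lama"
print(f"CER: {cer(reference, hypothesis):.2f}%  WER: {wer(reference, hypothesis):.2f}%")
```

A single wrong character makes the whole word count as an error, so WER is usually much higher than CER on the same output. That is consistent with the gap between the two figures reported above.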
Comparison with Other Models
We compared the model against https://github.com/VikParuchuri/surya, which performs well on Arabic but poorly on Jawi data:
- Character Error Rate (CER): 70.89%
- Word Error Rate (WER): 91.73%
Usage
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch
from qwen_vl_utils import process_vision_info
from PIL import Image
model_name = 'mevsg/qwen-for-jawi-v1'
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto'
)
# Load the processor from the base model named in the model description
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
image_path = 'path/to/image'
image = Image.open(image_path).convert('RGB')

# Build a single-turn chat message containing the image and the OCR prompt
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image,
            },
            {"type": "text", "text": "Convert this image to text"},
        ],
    }
]

# Render the chat template and prepare the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move inputs to the model's device rather than hard-coding "cuda"
inputs = inputs.to(model.device)

# Generate, then strip the prompt tokens so only the new text is decoded
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
Citation
@misc{qwen-for-jawi-v1,
  title = {Qwen for Jawi v1: a model for Jawi OCR},
  author = {Miguel Escobar Varela},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/mevsg/qwen-for-Jawi-v1},
  note = {Model created at National University of Singapore}
}
Acknowledgements
Special thanks to William Mattingly, whose fine-tuning script served as the basis for this project's fine-tuning approach: https://github.com/wjbmattingly/qwen2-vl-finetune-huggingface