Bpe-vocab-n-OCR: Open-Source Text Extraction Tool - Free to Deploy, Generates Structured Tokenized Output


Bpe Vocab N OCR

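Developed by prithivMLmods

Bpe-vocab-n-OCR is an advanced OCR-based text extraction tool optimized for producing structured, tokenized output.

Image-to-Text

Transformers

Multilingual · Open-source license: Apache-2.0 · #StructuredOCR #MultilingualTokenization #ImageToText

Downloads: 76

Release date: 2/18/2025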

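Model overview

Built on a powerful vision-language architecture with enhanced OCR and multilingual support, the tool extracts text from images and returns it as a comma-separated sequence.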

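Model features

Advanced OCR engine

Fine-tuned on extensive datasets for precise text recognition and tokenization.

Optimized tokenized output

Produces structured, comma-separated text suited to downstream NLP tasks, automated pipelines, and database integration.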
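Because the output is a single comma-separated string, downstream code usually has to split it before use. The sketch below shows one way to do that and to wrap the tokens in a JSON-serializable record. The function name `to_record` and the record fields are hypothetical and are not part of the model's API.

```python
import json


def to_record(ocr_text: str, source: str) -> dict:
    """Turn a comma-separated OCR string into a JSON-serializable record (hypothetical schema)."""
    # Split on ASCII commas, trim surrounding whitespace, and drop empty pieces
    tokens = [t.strip() for t in ocr_text.split(",") if t.strip()]
    return {"source": source, "tokens": tokens, "token_count": len(tokens)}


record = to_record("Invoice, No., 12345, Total, $99.00", "scan_001.jpg")
print(json.dumps(record, ensure_ascii=False))
```

A plain comma split has a known weakness. Any recognized value that itself contains a comma, such as `1,000`, will be broken into separate tokens, so numeric fields may need extra validation.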

Enhanced multilingual OCR

Extracts text in multiple languages, including English, Chinese, Japanese, Korean, Arabic, and others.
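The card does not say which comma character the model emits for non-Latin scripts. The following is a defensive sketch that assumes CJK or Arabic output might use the full-width comma (，), the ideographic comma (、), or the Arabic comma (،) instead of the ASCII comma. `split_tokens` is a hypothetical helper.

```python
import re

# Assumption: besides ",", multilingual output may use the full-width comma (，),
# the ideographic comma (、) or the Arabic comma (،) as a token separator.
SEPARATORS = re.compile(r"[,，、،]")


def split_tokens(ocr_text: str) -> list[str]:
    """Split OCR output on any of the comma variants and drop empty tokens."""
    return [t.strip() for t in SEPARATORS.split(ocr_text) if t.strip()]


print(split_tokens("Hello, 你好，世界、مرحبا، 안녕"))
```

In Japanese text the ideographic comma often appears inside a sentence. If the model only ever separates tokens with ASCII commas, drop `、` from the character class to avoid over-splitting.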

Multimodal processing

Accepts combined image and text inputs and returns structured, tokenized output.
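Image and text inputs are passed together as one chat message containing an `image` entry and a `text` entry. The sketch below builds that structure for either a URL or a local file. `build_messages` is a hypothetical helper. The `file://` prefix for local images follows the convention used in Qwen2-VL examples, and we assume it applies to this model as well.

```python
from pathlib import Path

DEFAULT_PROMPT = (
    "Extract and return the tokenized OCR text from the image, "
    "ensuring each word is accurately recognized and separated by commas."
)


def build_messages(image: str, prompt: str = DEFAULT_PROMPT) -> list[dict]:
    """Build a single-turn chat message pairing one image with a text prompt."""
    # URLs, data URIs and existing file:// paths are passed through unchanged;
    # any other string is treated as a local path and prefixed with file://
    if not image.startswith(("http://", "https://", "file://", "data:")):
        image = "file://" + str(Path(image).absolute())
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = build_messages("scans/page_01.png")
```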

Secure, optimized model weights

Weights are stored in the safetensors format for efficient and safe loading.

Model capabilities

Text extraction

Image analysis

Multilingual support

Structured output

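Use cases

Automated workflows

Document processing

Extracts text from scanned documents and turns it into structured data.

Speeds up document processing and reduces manual intervention.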
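For multi-page scans, one simple way to get structured data is one row per token, tagged with its page and position. The sketch assumes each page has already been run through the model and yields one comma-separated string. `pages_to_rows` and its row layout are hypothetical.

```python
from collections.abc import Iterable


def pages_to_rows(pages: Iterable[tuple[str, str]]) -> list[dict]:
    """Flatten (page_id, ocr_text) pairs into one row per token, preserving order."""
    rows = []
    for page_id, ocr_text in pages:
        tokens = [t.strip() for t in ocr_text.split(",") if t.strip()]
        for position, token in enumerate(tokens):
            rows.append({"page": page_id, "position": position, "token": token})
    return rows


rows = pages_to_rows([
    ("page_1.jpg", "Purchase, Order, 2025-02-18"),
    ("page_2.jpg", "Qty, 3"),
])
```

Keeping the position index lets you rebuild the original token order later, for example to re-join adjacent tokens into a field value.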

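Database integration

Data entry

Converts text found in images into structured data and enters it into a database.

Streamlines data entry and improves accuracy.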
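As an illustration of the data-entry use case, the sketch below stores each token of an OCR result in SQLite using only Python's standard `sqlite3` module. The table name `ocr_tokens`, its columns, and `store_tokens` are hypothetical. A production schema would depend on your documents.

```python
import sqlite3


def store_tokens(conn: sqlite3.Connection, source: str, ocr_text: str) -> int:
    """Insert each comma-separated token as a row; return the number inserted."""
    tokens = [t.strip() for t in ocr_text.split(",") if t.strip()]
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ocr_tokens ("
            " source TEXT NOT NULL, position INTEGER NOT NULL, token TEXT NOT NULL,"
            " PRIMARY KEY (source, position))"
        )
        # Parameterized queries keep OCR'd text from being interpreted as SQL
        conn.executemany(
            "INSERT INTO ocr_tokens (source, position, token) VALUES (?, ?, ?)",
            [(source, i, tok) for i, tok in enumerate(tokens)],
        )
    return len(tokens)


conn = sqlite3.connect(":memory:")
store_tokens(conn, "receipt_07.jpg", "Coffee, 2, 4.50")
print(conn.execute("SELECT position, token FROM ocr_tokens ORDER BY position").fetchall())
```

The `(source, position)` primary key makes a second insert of the same image fail with `sqlite3.IntegrityError` instead of silently duplicating rows.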

🚀 Bpe-vocab-n-OCR

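Bpe-vocab-n-OCR is an advanced OCR-based text extraction tool optimized to produce structured, tokenized output. It is built on a powerful vision-language architecture with enhanced OCR and multilingual support. It extracts text from images and returns it as a comma-separated sequence.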

🚀 Quick start

Code example

import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

# Load the Bpe-vocab-n-OCR model; dtype and device placement are chosen automatically
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Tokenized-OCR", torch_dtype="auto", device_map="auto"
)

# Optional: bfloat16 plus FlashAttention-2 for faster, lower-memory inference (requires the flash-attn package):
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "prithivMLmods/Tokenized-OCR",
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

# Load the default processor for Bpe-vocab-n-OCR
processor = AutoProcessor.from_pretrained("prithivMLmods/Tokenized-OCR")

# Define the input messages with both an image and a text prompt
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://flux-generated.com/sample_image.jpeg",
            },
            {"type": "text", "text": "Extract and return the tokenized OCR text from the image, ensuring each word is accurately recognized and separated by commas."},
        ],
    }
]

# Prepare the input for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)  # move tensors to the device the model was placed on

# Generate the output
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
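The trimming step can look opaque. For this decoder-only model, `generate()` returns each prompt followed by its newly generated tokens, so slicing off the first `len(in_ids)` IDs leaves only the answer. The toy example below uses made-up token IDs to show the same list comprehension on plain lists. Because `padding=True` pads every prompt in the batch to the same length, the slice lines up for every row.

```python
# Made-up token IDs: each output row = padded prompt IDs + newly generated IDs
input_ids = [[101, 7, 8, 9], [101, 5, 6, 0]]
generated_ids = [[101, 7, 8, 9, 42, 43, 2], [101, 5, 6, 0, 44, 2, 2]]

# Same slicing as the example above: drop the first len(prompt) IDs from each row
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)
]
print(generated_ids_trimmed)
```

`processor.batch_decode` returns one string per batch row, so with a single image `output_text[0]` holds the comma-separated result.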