llama-7b-v1-Receipt-Key-Extraction open-source model: free key information extraction from English and Arabic receipts


Llama 7b V1 Receipt Key Extraction

Developed by abdoelsayed

A 7-billion-parameter model based on LLaMA v1 for extracting key information from English and Arabic receipt items

Large language model

Transformers

Multilingual #Receipt key information extraction #Multilingual support #Retail data analysis

Downloads: 41

Released: 9/21/2023

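Model Overview

This is a 7-billion-parameter model built on the LLaMA v1 architecture, specialized in extracting key information from receipt text in English and Arabic.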

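Model Features

Multilingual support

Extracts key information from both English and Arabic receipts

High-accuracy extraction

Designed to extract multiple key fields from receipt items, such as class, brand, weight, and unit price

Based on the AMuRD dataset

Trained on an annotated multilingual receipt dataset, making it suitable for cross-lingual key information extraction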

Model Capabilities

Text information extraction

Multilingual processing

Structured data generation
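
The model card does not document the exact shape of the model's response. As a hedged illustration of turning generated text into structured data, the sketch below assumes the response is a comma-separated list of "Field: value" pairs using field names from the example instruction (Brand, Weight, and so on). The helper parse_fields and the sample string are hypothetical; adapt the parsing to the real output format.

```python
def parse_fields(response: str) -> dict[str, str]:
    """Split a hypothetical 'Field: value, Field: value' response into a dict.

    The input format is an assumption; the model card does not specify it.
    """
    fields: dict[str, str] = {}
    for part in response.split(","):
        # Split only on the first colon so values such as "12:30" stay intact
        key, sep, value = part.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical response text, not actual model output
sample = "Brand: Americana, Weight: 400 gm"
print(parse_fields(sample))
```

Splitting on commas is the simplest choice under this assumption; it would break on values that themselves contain commas (for example a price written as "1,250"), so a real integration should follow whatever delimiter the model actually emits.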

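Use Cases

Retail

Automated receipt processing

Automatically extract product information from retail receipts

Speeds up data processing and reduces manual data-entry errors

Financial systems

Expense reimbursement automation

Automatically identify and classify expense items in reimbursement documents

Streamlines the reimbursement process and makes financial processing more efficient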
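
To illustrate the expense-classification use case, the sketch below groups already-extracted line items by their class and sums their total price. The record layout (a dict with "class" and "T.Price" keys, prices as decimal strings) and the sample items are hypothetical: they borrow field names from the example instruction but are not model output.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_class(items: list[dict[str, str]]) -> dict[str, Decimal]:
    """Sum the hypothetical 'T.Price' field for each 'class' label."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for item in items:
        # Decimal avoids binary floating-point rounding on currency amounts
        totals[item["class"]] += Decimal(item["T.Price"])
    return dict(totals)


# Hypothetical extracted records, not real model output
items = [
    {"class": "Vegetables", "T.Price": "12.50"},
    {"class": "Dairy", "T.Price": "30.00"},
    {"class": "Vegetables", "T.Price": "7.25"},
]
print(totals_by_class(items))
```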

🚀 llama-7b-v1-Receipt-Key-Extraction

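llama-7b-v1-Receipt-Key-Extraction is a 7-billion-parameter model based on LLaMA v1, intended mainly for research on key information extraction from English and Arabic receipts.

🚀 Quick Start

Model Usage

This model is intended only for research on extracting key information from items in English and Arabic receipts.

Getting Started with the Model

Use the following code to get started with the model: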

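# pip install -q torch transformers sentencepiece

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

checkpoint = "abdoelsayed/llama-7b-v1-Receipt-Key-Extraction"
device = "cuda" if torch.cuda.is_available() else "cpu"

# use_fast=False loads the SentencePiece-based LLaMA tokenizer (requires sentencepiece)
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint,
    model_max_length=512,
    padding_side="right",
    use_fast=False,
)
# float16 on GPU roughly halves the memory needed for the 7B weights compared with float32
dtype = torch.float16 if device == "cuda" else torch.float32
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype).to(device)
model.eval()

def generate_response(instruction, input_text, max_new_tokens=100, temperature=0.1,
                      top_p=0.75, top_k=40, num_beams=4):
    # Alpaca-style prompt: instruction, input, then the response marker
    prompt = f"Below is an instruction that describes a task, paired with an input that provides further context.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # top_p was used but never defined in the original snippet; the 0.75 default is an assumption.
    # do_sample is not set, so decoding is deterministic (beam search when num_beams > 1)
    # and temperature, top_p and top_k have no effect on the result.
    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
    )
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,  # passes input_ids and attention_mask
            generation_config=generation_config,
            max_new_tokens=max_new_tokens,
        )
    # generate() returns a tensor of token ids here, not an object with .sequences
    output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return output.split("### Response:")[-1].strip()

instruction = "Extract the class, Brand, Weight, Number of units, Size of units, Price, T.Price, Pack, Unit from the following sentence"
input_text = "Americana Okra zero 400 gm"

response = generate_response(instruction, input_text)
print(response)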
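
The prompt template and response post-processing used by generate_response do not depend on the model, so they can be factored out and checked without downloading the 7B weights. The names PROMPT_TEMPLATE, build_prompt and extract_response below are illustrative helpers, not part of the published model code; the template text is copied from the snippet above, and the simulated output string is hypothetical.

```python
# Same Alpaca-style template as in generate_response
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input_text}\n\n"
    "### Response:"
)


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the template; braces inside the arguments are inserted literally."""
    return PROMPT_TEMPLATE.format(instruction=instruction, input_text=input_text)


def extract_response(decoded: str) -> str:
    """Keep only the text after the last '### Response:' marker."""
    return decoded.split("### Response:")[-1].replace("</s>", "").strip()


prompt = build_prompt(
    "Extract the class, Brand, Weight from the following sentence",
    "Americana Okra zero 400 gm",
)
print(prompt)
# Simulated decoded output (hypothetical text, not produced by the model)
print(extract_response(prompt + " Brand: Americana</s>"))
```

Splitting on the last marker means any echo of the prompt in the decoded text is discarded, which is why the same logic works whether or not special tokens were skipped during decoding.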

📚 Detailed Documentation

Citation

Please cite this model as follows:

@misc{abdallah2023amurd,
    title={AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification},
    author={Abdelrahman Abdallah and Mahmoud Abdalla and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
    year={2023},
    eprint={2309.09800},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

📄 License

This model is released under the llama2 license.

📋 Model Information

Attribute	Details
Model type	llama-7b-v1-Receipt-Key-Extraction
Languages	English, Arabic
Evaluation metrics	Accuracy, F1 score
Library	transformers
Reference paper	AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification