t5-large-generation-squad-QuestionAnswer open-source model: generate question-answer pairs from text context for free


T5 Large Generation Squad QuestionAnswer

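Developed by potsawee

A text generation model based on the t5-large architecture and fine-tuned on the SQuAD dataset. Given a context passage, it generates a question together with its answer.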

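Question answering

Transformers

Language: English
License: Apache-2.0
Tags: #question-answer-generation #context-understanding #multiple-choice-distractor-generation

Downloads: 376

Released: 3/12/2023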

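Model Overview

The model generates questions and their corresponding answers from a text context. It is best suited to extractive question-answer generation, where the answer is taken from the source passage.

Model Features

Source-grounded question-answer generation

Generated answers depend heavily on the input context, which suits scenarios that require precise information extraction.

Supports diverse generation

Setting do_sample=True produces different question-answer pairs across runs, increasing diversity.

Compatible with distractor generation models

Can be paired with a distractor generation model to produce the incorrect options for multiple-choice questions.

Model Capabilities

Text generation

Question generation

Answer generation

Use Cases

Education

Automatic generation of reading-comprehension questions

Generates questions and answers from a given article or passage for educational assessment.

Because the questions and answers are grounded in the source text, they are suited to testing students' understanding of it.

Content creation

FAQ generation

Generates frequently asked questions and their answers from product descriptions or technical documentation.

Helps users quickly understand product features or technical details.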

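🚀 t5-large fine-tuned on SQuAD to generate question + answer

This model is t5-large fine-tuned on the SQuAD dataset. Given an input context, it generates a question and the corresponding answer.

🚀 Quick Start

Model inputs and outputs

Input: context (e.g. a news article)
Output: question <sep> answer

Answers in the training data (SQuAD) are highly extractive, so this model generates extractive answers. If you want abstractive questions/answers, you can use our model trained on the RACE dataset: https://huggingface.co/potsawee/t5-large-generation-race-QuestionAnswer .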
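
The `question <sep> answer` output format and the extractive nature of the answers can be illustrated with a small post-processing sketch. The helper names `parse_qa` and `is_extractive` are hypothetical (they are not part of the model or of any library), and the literal separator string `"<sep>"` is assumed from the output format shown above; in practice `tokenizer.sep_token` can be passed instead.

```python
from typing import Tuple


def parse_qa(decoded: str, sep: str = "<sep>") -> Tuple[str, str]:
    """Split a decoded 'question <sep> answer' string into (question, answer).

    `sep` defaults to the literal "<sep>" (an assumption based on the
    documented output format); pass tokenizer.sep_token to be safe.
    """
    if sep not in decoded:
        raise ValueError(f"separator {sep!r} not found in model output")
    # Split only on the first separator so the answer is kept whole.
    question, answer = decoded.split(sep, 1)
    return question.strip(), answer.strip()


def is_extractive(answer: str, context: str) -> bool:
    """Return True if the answer appears verbatim (case-insensitive) in the context."""
    needle = answer.strip().lower()
    return bool(needle) and needle in context.lower()


context = "Mateo Kovacic sealed the win as he volleyed in from Mudryk's header."
question, answer = parse_qa(" Who scored the winner for Chelsea? <sep> Mateo Kovacic")
print(question)                        # Who scored the winner for Chelsea?
print(answer)                          # Mateo Kovacic
print(is_extractive(answer, context))  # True
```

A check like `is_extractive` is only a rough filter: it confirms that the answer string occurs in the context, not that it correctly answers the question.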

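✨ Key Features

Model Details

The t5-large model was fine-tuned on the SQuAD dataset. Its input is a context/passage and its output is a question followed by the answer. It is the first component of the question generation pipeline (i.e. g1) in our MQAG paper; see also the project's GitHub repository: https://github.com/potsawee/mqag0 .

📦 Installation Guide

Use the code below to get started with the model. You can also set do_sample=True in generate() to obtain different question-answer pairs.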

>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")

>>> context = r"""Chelsea's mini-revival continued with a third victory in a row as they consigned struggling Leicester City to a fifth consecutive defeat.
Buoyed by their Champions League win over Borussia Dortmund, Chelsea started brightly and Ben Chilwell volleyed in from a tight angle against his old club.
Chelsea's Joao Felix and Leicester's Kiernan Dewsbury-Hall hit the woodwork in the space of two minutes, then Felix had a goal ruled out by the video assistant referee for offside.
Patson Daka rifled home an excellent equaliser after Ricardo Pereira won the ball off the dawdling Felix outside the box.
But Kai Havertz pounced six minutes into first-half injury time with an excellent dinked finish from Enzo Fernandez's clever aerial ball.
Mykhailo Mudryk thought he had his first goal for the Blues after the break but his effort was disallowed for offside.
Mateo Kovacic sealed the win as he volleyed in from Mudryk's header.
The sliding Foxes, who ended with 10 men following Wout Faes' late dismissal for a second booking, now just sit one point outside the relegation zone.
""".replace('\n', ' ')

>>> inputs = tokenizer(context, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_length=100)
>>> question_answer = tokenizer.decode(outputs[0], skip_special_tokens=False)
>>> question_answer = question_answer.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
>>> question, answer = question_answer.split(tokenizer.sep_token)

>>> print("question:", question)
question:  Who scored the winner for Chelsea?
>>> print("answer:", answer)
answer:  Mateo Kovacic
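
To obtain several different question-answer pairs from one context, sampling can be enabled as mentioned above. The following is a minimal sketch, not the authors' code: `sample_qa_pairs` is a hypothetical helper that expects the `model` and `tokenizer` objects loaded in the example above, and it relies on the standard `do_sample` and `num_return_sequences` arguments of `generate()`. Outputs without a separator are skipped and duplicates are dropped, which is a design choice of this sketch.

```python
from typing import List, Tuple


def sample_qa_pairs(
    model,
    tokenizer,
    context: str,
    num_samples: int = 3,
    max_length: int = 100,
) -> List[Tuple[str, str]]:
    """Sample up to `num_samples` distinct (question, answer) pairs for one context."""
    inputs = tokenizer(context, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        do_sample=True,                    # sample instead of greedy decoding
        num_return_sequences=num_samples,  # several candidates in one call
    )
    pairs: List[Tuple[str, str]] = []
    for ids in outputs:
        text = tokenizer.decode(ids, skip_special_tokens=False)
        text = text.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
        if tokenizer.sep_token not in text:
            continue  # malformed sample: no question/answer separator
        question, answer = (part.strip() for part in text.split(tokenizer.sep_token, 1))
        if (question, answer) not in pairs:
            pairs.append((question, answer))
    return pairs
```

With the objects from the example above, a call such as `sample_qa_pairs(model, tokenizer, context, num_samples=5)` returns a list of pairs; because sampling is random, the results differ between runs and the list may contain fewer than `num_samples` entries.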

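📚 Detailed Documentation

Generating distractors (the other options in a multiple-choice question)

Context ---> Question + (A) Answer (B) Distractor1 (C) Distractor2 (D) Distractor3

See our distractor generation model, for example: https://huggingface.co/potsawee/t5-large-generation-race-Distractor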
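
Once a question, its answer, and a set of distractors are available, they can be assembled into the multiple-choice layout shown above. The sketch below is illustrative only: `build_multiple_choice` is a hypothetical helper, the distractor strings are placeholders rather than real model output, and the optional shuffling is an assumption about how one might avoid always placing the correct answer at (A).

```python
import random
from typing import List, Optional, Tuple

LABELS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_multiple_choice(
    question: str,
    answer: str,
    distractors: List[str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Format a question with labeled options; return (text, label of the correct option).

    Without `rng` the answer stays at (A), matching the layout above.
    With `rng` the options are shuffled.
    """
    options = [answer] + list(distractors)
    if rng is not None:
        rng.shuffle(options)
    lines = [question] + [f"({LABELS[i]}) {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines), LABELS[options.index(answer)]


text, key = build_multiple_choice(
    "Who scored the winner for Chelsea?",
    "Mateo Kovacic",
    ["Distractor1", "Distractor2", "Distractor3"],  # placeholders, not model output
)
print(text)
print("correct option:", key)  # correct option: A
```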

📄 License

This project is released under the Apache-2.0 license.

🔖 Citation

@article{manakul2023mqag,
  title={MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization},
  author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
  journal={arXiv preprint arXiv:2301.12307},
  year={2023}
}