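🚀 t5-large fine-tuned on SQuAD for question + answer generation
This model is t5-large fine-tuned on the SQuAD dataset. Given an input context, it generates a question together with the corresponding answer.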
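🚀 Quick Start
Model inputs and outputs
- Input: context (e.g. a news article)
- Output: question <sep> answer
Answers in the training data (SQuAD) are highly extractive, so this model generates extractive answers. If you would like more abstractive questions and answers, you can use our model trained on the RACE dataset: https://huggingface.co/potsawee/t5-large-generation-race-QuestionAnswer .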
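The output format and the extractive property described above can be illustrated with a small sketch. The helper names `split_question_answer` and `is_extractive` are hypothetical and not part of the model or of transformers, and the literal `<sep>` default is an assumption; with a loaded tokenizer, `tokenizer.sep_token` would normally be passed instead.

```python
def split_question_answer(decoded, sep_token="<sep>"):
    """Split a decoded 'question <sep> answer' string into (question, answer)."""
    # partition() splits on the first separator only, so any later
    # separators remain part of the answer instead of raising an error.
    question, found, answer = decoded.partition(sep_token)
    if not found:
        raise ValueError(f"separator {sep_token!r} not found in {decoded!r}")
    return question.strip(), answer.strip()


def is_extractive(answer, context):
    """Return True if a non-empty answer appears verbatim in the context (case-insensitive)."""
    return bool(answer) and answer.lower() in context.lower()


# Illustrative input: one sentence taken from the example context used in this README.
context = "Mateo Kovacic sealed the win as he volleyed in from Mudryk's header."
question, answer = split_question_answer("Who sealed the win? <sep> Mateo Kovacic")
print("question:", question)
print("answer:", answer)
print("extractive:", is_extractive(answer, context))
```

A plain substring test is only a rough check: an answer that differs from the context in whitespace or punctuation would be reported as non-extractive.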
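✨ Key Features
Model Details
The t5-large model is fine-tuned on the SQuAD dataset: the input is a context/passage, and the output is a question followed by its answer. This is the first component (g1) of the question generation pipeline in our MQAG paper; see also the project's GitHub repository: https://github.com/potsawee/mqag0 .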
📦 Installation Guide
Use the code below to get started with the model. You can also set do_sample=True in generate() to obtain different question-answer pairs.
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")
>>> context = r"""Chelsea's mini-revival continued with a third victory in a row as they consigned struggling Leicester City to a fifth consecutive defeat.
Buoyed by their Champions League win over Borussia Dortmund, Chelsea started brightly and Ben Chilwell volleyed in from a tight angle against his old club.
Chelsea's Joao Felix and Leicester's Kiernan Dewsbury-Hall hit the woodwork in the space of two minutes, then Felix had a goal ruled out by the video assistant referee for offside.
Patson Daka rifled home an excellent equaliser after Ricardo Pereira won the ball off the dawdling Felix outside the box.
But Kai Havertz pounced six minutes into first-half injury time with an excellent dinked finish from Enzo Fernandez's clever aerial ball.
Mykhailo Mudryk thought he had his first goal for the Blues after the break but his effort was disallowed for offside.
Mateo Kovacic sealed the win as he volleyed in from Mudryk's header.
The sliding Foxes, who ended with 10 men following Wout Faes' late dismissal for a second booking, now just sit one point outside the relegation zone.
""".replace('\n', ' ')
>>> inputs = tokenizer(context, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_length=100)
>>> question_answer = tokenizer.decode(outputs[0], skip_special_tokens=False)
>>> question_answer = question_answer.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
>>> question, answer = question_answer.split(tokenizer.sep_token)
>>> print("question:", question)
question: Who scored the winner for Chelsea?
>>> print("answer:", answer)
answer: Mateo Kovacic
📚 Detailed Documentation
Generating distractors (the other options in a multiple-choice question)
Context ---> Question + (A) Answer (B) Distractor1 (C) Distractor2 (D) Distractor3
Please refer to our distractor generation model, e.g. https://huggingface.co/potsawee/t5-large-generation-race-Distractor
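The layout shown above, with the answer as option (A) followed by the distractors, can be assembled once a question, an answer, and distractors are available. This is a minimal sketch: `format_multiple_choice` is a hypothetical helper, and the distractors in the example are hand-picked names from the example context, not outputs of the distractor model.

```python
def format_multiple_choice(question, answer, distractors):
    """Lay out a question with the answer as (A) and the distractors as (B), (C), ..."""
    options = [answer] + list(distractors)
    labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if len(options) > len(labels):
        raise ValueError("too many options to label")
    parts = [f"({label}) {option}" for label, option in zip(labels, options)]
    return question + " " + " ".join(parts)


# Illustrative values only: the distractors below were chosen by hand.
print(format_multiple_choice(
    "Who scored the winner for Chelsea?",
    "Mateo Kovacic",
    ["Kai Havertz", "Ben Chilwell", "Joao Felix"],
))
```

The sketch keeps the answer in position (A) to mirror the layout above; shuffling the options before presenting them to a reader is left out for brevity.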
📄 License
This project is licensed under the Apache-2.0 license.
🔖 Citation
@article{manakul2023mqag,
title={MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization},
author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
journal={arXiv preprint arXiv:2301.12307},
year={2023}
}