T5_11b_trueteacher_and_anli open-source model - free factual consistency evaluation for summaries


T5 11b Trueteacher And Anli

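Developed by Google

TrueTeacher is a factual consistency evaluation model built on the T5-11B architecture. It is designed to assess whether a summary is factually consistent with its source document.

Large language model

Transformers

English #summary factuality evaluation #large model fine-tuning #news summary quality control

Downloads: 444

Release date: 8/14/2023

Model overview

The model is fine-tuned on a mixture of the TrueTeacher and ANLI datasets to evaluate the factual consistency of English summaries. It predicts a binary label (1 = consistent, 0 = inconsistent).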

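Model features

High-accuracy factual consistency evaluation

Achieves an average ROC AUC of 87.8 on the summarization subset of the TRUE benchmark.

Fine-tuned from a large pre-trained model

Based on T5-11B and fine-tuned on a mixture of the TrueTeacher and ANLI datasets.

Long-input handling

A max_length of 2048 tokens is recommended to accommodate the input lengths of common summarization datasets.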

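Model capabilities

Factual consistency evaluation

Text classification

Natural language inference

Use cases

Text summarization evaluation

News summary fact-checking

Checks whether a summary of a news article is factually consistent with the original article.

Reaches ROC AUC scores of 89.4 and 88.5 on the CNN/DailyMail-based QAGS-C and SummEval subsets of TRUE.

Automatic summary quality evaluation

Serves as an evaluation metric for automatic summarization systems.

Detects factual errors in summaries.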
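One possible way to locate errors, rather than only flag a whole summary, is to score each summary sentence separately against the source document. This is a hypothetical sketch and not part of the model card. It assumes a `score_fn(premise, hypothesis)` callable that returns the probability of label '1', for example a wrapper around the scoring code in the usage examples. The naive regex sentence splitter and the 0.5 threshold are also assumptions.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical scorer type: (premise, hypothesis) -> P(label '1').
ScoreFn = Callable[[str, str], float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter on whitespace after '.', '!' or '?' (an assumption, not part of the model)."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def flag_inconsistent_sentences(
    document: str, summary: str, score_fn: ScoreFn, threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (sentence, score) pairs whose consistency score falls below the threshold."""
    flagged = []
    for sentence in split_sentences(summary):
        score = score_fn(document, sentence)
        if score < threshold:
            flagged.append((sentence, score))
    return flagged


if __name__ == '__main__':
    # Stub scorer for illustration only: 1.0 if every word of the sentence appears in the document.
    def stub_score(premise: str, hypothesis: str) -> float:
        known = set(re.findall(r'\w+', premise.lower()))
        words = re.findall(r'\w+', hypothesis.lower())
        return 1.0 if all(w in known for w in words) else 0.0

    print(flag_inconsistent_sentences(
        'the sun is shining', 'The sun is shining. The cat is shiny.', stub_score))
```

Scoring per sentence costs one model call per sentence. It also assumes that each sentence can be judged in isolation, which may not hold for summaries that contain cross-sentence references.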

🚀 TrueTeacher

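This is a factual consistency evaluation model introduced in the TrueTeacher paper (Gekhman et al., 2023). It targets factual consistency evaluation for text summarization and gives researchers a practical evaluation tool.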

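✨ Key features

Optimized for evaluating factual consistency in summaries.
Fine-tuned from T5-11B on a mixture of datasets (TrueTeacher and ANLI).
Input format: "premise: GROUNDING_DOCUMENT hypothesis: HYPOTHESIS_SUMMARY"; setting max_length to 2048 is recommended.
Predicts a binary label ('1' = factually consistent, '0' = factually inconsistent).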
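The input format and label convention above can be captured in two small helpers. This is a sketch: the names `build_input` and `parse_label` are hypothetical. Treating any decoded output other than '1' or '0' as an error is also an assumption.

```python
def build_input(premise: str, hypothesis: str) -> str:
    """Format a (grounding document, summary) pair in the model's expected input format."""
    return f'premise: {premise} hypothesis: {hypothesis}'


def parse_label(decoded: str) -> bool:
    """Map the decoded model output to a boolean: '1' -> consistent, '0' -> inconsistent."""
    label = decoded.strip()
    if label == '1':
        return True
    if label == '0':
        return False
    raise ValueError(f'unexpected model output: {decoded!r}')


if __name__ == '__main__':
    print(build_input('the sun is shining', 'the sun is out in the sky'))
    print(parse_label('1'), parse_label('0'))
```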


💻 Usage examples

Basic usage

from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer  # requires the sentencepiece package

model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
# Note: the 11B-parameter checkpoint needs over 40 GB of memory in fp32.
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'
for hypothesis, expected in [('the sun is out in the sky', '1'),
                             ('the cat is shiny', '0')]:
  # Build the "premise: ... hypothesis: ..." input, truncated to the recommended 2048 tokens.
  input_ids = tokenizer(
      f'premise: {premise} hypothesis: {hypothesis}',
      return_tensors='pt',
      truncation=True,
      max_length=2048).input_ids
  # The model generates the label as text: '1' (consistent) or '0' (inconsistent).
  outputs = model.generate(input_ids)
  result = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(f'premise: {premise}')
  print(f'hypothesis: {hypothesis}')
  print(f'result: {result} (expected: {expected})\n')

Advanced usage (computing a consistency score)

from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer
import torch

model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'
for hypothesis, expected in [('the sun is out in the sky', '>> 0.5'),
                             ('the cat is shiny', '<< 0.5')]:
  input_ids = tokenizer(
      f'premise: {premise} hypothesis: {hypothesis}',
      return_tensors='pt',
      truncation=True,
      max_length=2048).input_ids
  # T5 starts decoding from the pad token, so one decoder step gives
  # the distribution over the first output token.
  decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]])
  with torch.no_grad():  # inference only; no gradients needed
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
  logits = outputs.logits  # shape: (batch=1, decoder_len=1, vocab_size)
  probs = torch.softmax(logits[0], dim=-1)
  # Token id of '1' (the first id; the tokenizer also appends </s>).
  one_token_id = tokenizer('1').input_ids[0]
  # The probability of label '1' is used as the factual consistency score.
  entailment_prob = probs[0, one_token_id].item()
  print(f'premise: {premise}')
  print(f'hypothesis: {hypothesis}')
  print(f'score: {entailment_prob:.3f} (expected: {expected})\n')

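📚 Detailed documentation

Model details

This is the main model from the paper (see "T5-11B w. ANLI + TrueTeacher full" in Table 1). It is based on T5-11B (Raffel et al., 2020) and fine-tuned on a mixture of the following datasets:

TrueTeacher (Gekhman et al., 2023)
ANLI (Nie et al., 2020)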

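The TrueTeacher dataset contains model-generated summaries of articles from the training split of the CNN/DailyMail dataset (Hermann et al., 2015). These summaries were annotated for factual consistency by FLAN-PaLM 540B (Chung et al., 2022). The summaries themselves were produced by summarization models trained on the XSum dataset (Narayan et al., 2018).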
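The data-generation recipe described above can be sketched as a simple loop. In that recipe, XSum-trained summarizers write summaries of CNN/DailyMail training articles, and an LLM annotator labels each summary. Everything in this sketch is hypothetical: `summarizers` and `llm_label` are placeholder callables standing in for the real models, and the record layout is an assumption. Only the `premise: ... hypothesis: ...` input format and the '1'/'0' labels come from this card.

```python
from typing import Callable, Dict, Iterable, List

Summarizer = Callable[[str], str]    # article -> summary (placeholder for an XSum-trained model)
Labeler = Callable[[str, str], int]  # (article, summary) -> 1/0 (placeholder for the LLM annotator)


def build_synthetic_dataset(
    articles: Iterable[str],
    summarizers: List[Summarizer],
    llm_label: Labeler,
) -> List[Dict[str, str]]:
    """Pair every article with every summarizer's output and attach the annotator's label."""
    records = []
    for article in articles:
        for summarize in summarizers:
            summary = summarize(article)
            label = llm_label(article, summary)
            records.append({
                'input': f'premise: {article} hypothesis: {summary}',
                'target': str(label),  # '1' = consistent, '0' = inconsistent
            })
    return records


if __name__ == '__main__':
    # Toy stand-ins: one "summarizer" copies the first word, the other invents a word.
    demo = build_synthetic_dataset(
        ['the sun is shining'],
        [lambda a: a.split()[1], lambda a: 'cat'],
        lambda a, s: int(s in a.split()),
    )
    for record in demo:
        print(record)
```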

Evaluation results

On the summarization subset of the TRUE benchmark (Honovich et al., 2022), the model achieves the following ROC AUC results:

Dataset	MNBM	QAGS-X	FRANK	SummEval	QAGS-C	Average
ROC AUC	78.1	89.4	93.6	88.5	89.4	87.8
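ROC AUC can be read as the probability that a randomly chosen consistent example scores higher than a randomly chosen inconsistent one, with ties counting as half. The table reports it multiplied by 100. The sketch below computes ROC AUC directly from that pairwise definition. Its O(n·m) cost is fine for illustration, though a library routine such as scikit-learn's `roc_auc_score` is the usual choice. Treating label 1 (consistent) as the positive class with P('1') as the score is an assumption. The sketch also checks that the reported average of 87.8 is the unweighted mean of the five per-dataset values.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC AUC: P(score of a positive > score of a negative), ties counted as 0.5.

    Assumes positives (label 1) are consistent examples and scores are P('1') from the model.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError('ROC AUC needs at least one positive and one negative example')
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Per-dataset ROC AUC values (x100) from the table above.
reported = {'MNBM': 78.1, 'QAGS-X': 89.4, 'FRANK': 93.6, 'SummEval': 88.5, 'QAGS-C': 89.4}
average = sum(reported.values()) / len(reported)
print(f'{average:.1f}')  # 87.8
```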

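Intended use

This model is intended for research (non-commercial) use in English. The recommended use case is evaluating factual consistency in summaries.

Out-of-scope use

Any use case that violates the cc-by-nc-4.0 license.
Use with languages other than English.

🔧 Technical details

The model's input format is "premise: GROUNDING_DOCUMENT hypothesis: HYPOTHESIS_SUMMARY". To accommodate the input lengths of common summarization datasets, setting max_length to 2048 is recommended. The model predicts a binary label ('1' = factually consistent, '0' = factually inconsistent).

📄 License

This model is licensed under cc-by-nc-4.0.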

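📚 Citation

If you use this model in a research publication, please cite the TrueTeacher paper (BibTeX entry below), as well as the ANLI, CNN/DailyMail, XSum, T5, and FLAN papers mentioned above.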

@misc{gekhman2023trueteacher,
      title={TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models}, 
      author={Zorik Gekhman and Jonathan Herzig and Roee Aharoni and Chen Elkind and Idan Szpektor},
      year={2023},
      eprint={2305.11171},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}