🚀 MLongT5 (Transient Global Attention, Base-Sized Model)
MLongT5 is a model pretrained on a multilingual corpus. It processes long input sequences efficiently and performs well on text generation tasks in multilingual settings.
✨ Key Features
- Multilingual pretraining: pretrained on a multilingual corpus (mC4), covering many languages.
- Efficient attention: supports either local attention or transient global attention for processing long inputs efficiently (see the configuration sketch after this list).
- Long-sequence handling: processes input sequences of up to 16,384 tokens, which makes it effective for text generation tasks.
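Which attention pattern the encoder uses is controlled through the model configuration. A minimal sketch, assuming the `encoder_attention_type` field of `LongT5Config` in transformers (this checkpoint already defaults to transient global attention, so the explicit override below is shown only for illustration):

```python
from transformers import LongT5Config, LongT5Model

# Select the encoder attention pattern explicitly; "local" is the other
# supported value for encoder_attention_type.
config = LongT5Config.from_pretrained(
    "agemagician/mlong-t5-tglobal-base",
    encoder_attention_type="transient-global",
)
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base", config=config)
```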
📦 Model Information

| Attribute | Details |
| --- | --- |
| Model type | Encoder-decoder Transformer model |
| Training data | mC4 dataset |
| License | Apache-2.0 |
💻 Usage Examples
Basic Usage
The following code shows how to extract the model's last hidden states:
```python
import torch
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# LongT5 is an encoder-decoder model, so the forward pass also needs decoder
# inputs; the decoder start token (the pad token for T5-family models) is enough.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
outputs = model(**inputs, decoder_input_ids=decoder_input_ids)
last_hidden_states = outputs.last_hidden_state  # decoder's last hidden states
```
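If you need one fixed-size vector per input (e.g. for retrieval or clustering), a common follow-up, sketched here assuming the variables from the snippet above are still in scope, is to mean-pool the encoder's hidden states over non-padding positions:

```python
# Mean-pool the encoder's hidden states, masking out padding positions.
encoder_states = outputs.encoder_last_hidden_state  # (batch, seq_len, d_model)
mask = inputs.attention_mask.unsqueeze(-1)          # (batch, seq_len, 1)
embedding = (encoder_states * mask).sum(dim=1) / mask.sum(dim=1)
```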
Advanced Usage
S-Denoising
For S-denoising, make sure to prepend your text with the prefix [S2S] as shown below:
```python
import torch
from transformers import LongT5ForConditionalGeneration, T5Tokenizer

model = LongT5ForConditionalGeneration.from_pretrained(
    "agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")

input_string = "[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere <extra_id_0>"

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
R-Denoising
For R-denoising, make sure to prepend your text with the prefix [NLU] as shown below:
```python
import torch
from transformers import LongT5ForConditionalGeneration, T5Tokenizer

model = LongT5ForConditionalGeneration.from_pretrained(
    "agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")

input_string = "[NLU] Mr. Dursley was the director of a firm called <extra_id_0>, which made <extra_id_1>. He was a big, solid man with a bald head. Mrs. Dursley was thin and <extra_id_2> of neck, which came in very useful as she spent so much of her time <extra_id_3>. The Dursleys had a small son called Dudley and <extra_id_4>"

inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
X-Denoising
For X-denoising, make sure to prepend your text with the prefix [NLG] as shown below:
```python
import torch
from transformers import LongT5ForConditionalGeneration, T5Tokenizer

model = LongT5ForConditionalGeneration.from_pretrained(
    "agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")

input_string = "[NLG] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere. <extra_id_0>"

inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
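The three denoising modes differ only in the prefix prepended to the text, so the shared boilerplate can be factored into a small helper. The `denoise` function below is an illustrative convenience that reuses the model and tokenizer loaded above; it is not part of the transformers API:

```python
def denoise(text: str, mode: str = "[NLU]", max_length: int = 200) -> str:
    """Run one denoising mode with prefix `mode`: [S2S], [NLU], or [NLG]."""
    input_ids = tokenizer(
        f"{mode} {text}", return_tensors="pt", add_special_tokens=False
    ).input_ids.to("cuda")
    output_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(output_ids[0])

print(denoise("Paris is the <extra_id_0> of France.", mode="[NLU]"))
```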
📚 Details
MLongT5 is an encoder-decoder Transformer model pretrained in a text-to-text denoising generation setting (PEGASUS-like generative pretraining). It is an extension of the LongT5 model and supports two different efficient attention mechanisms: (1) local attention, or (2) transient global attention. These sparse attention patterns allow the model to handle long input sequences efficiently.
The model is particularly effective when fine-tuned for text generation tasks (e.g. summarization, question answering) that require processing long input sequences (up to 16,384 tokens).
The model is mainly intended to be fine-tuned on supervised datasets. See the model hub to look for versions fine-tuned on a task that interests you.
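As a rough orientation only, a single supervised training step for summarization could look like the sketch below; the document/summary pair and the hyperparameters are placeholders, not recommendations:

```python
import torch
from transformers import LongT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Placeholder pair; real fine-tuning iterates over batches of a dataset.
inputs = tokenizer("A very long document ...", return_tensors="pt",
                   max_length=16384, truncation=True)
labels = tokenizer("A short summary.", return_tensors="pt").input_ids

loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```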
📄 License
This model is released under the Apache-2.0 license.
📖 Citation
If you use this model in your research, please cite it with the following BibTeX entry:
@misc{uthus2023mlongt5,
title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
year={2023},
eprint={2305.11129},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
This model card was written by Ahmed Elnaggar. The original release team did not write a model card for this model.
Author: Ahmed Elnaggar / @Elnaggar_AI | LinkedIn