🚀 MLongT5 (Transient Global Attention, Base-Sized Model)
MLongT5 is a model pretrained on a multilingual corpus. It processes long input sequences efficiently and performs well on text generation tasks in multilingual settings.
✨ Key Features
- Multilingual pretraining: pretrained on a multilingual corpus (mC4), covering many languages.
- Efficient attention: supports either local attention or transient global attention for processing long inputs efficiently (see the configuration sketch after this list).
- Long-sequence handling: processes input sequences of up to 16,384 tokens, which makes it effective for text generation tasks.
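Which attention pattern the encoder uses is controlled through the model configuration. A minimal sketch, assuming the `encoder_attention_type` field of `LongT5Config` in transformers (this checkpoint already defaults to transient global attention, so the explicit override below is shown only for illustration):

```python
from transformers import LongT5Config, LongT5Model

# Select the encoder attention pattern explicitly; "local" is the other
# supported value for encoder_attention_type.
config = LongT5Config.from_pretrained(
    "agemagician/mlong-t5-tglobal-base",
    encoder_attention_type="transient-global",
)
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base", config=config)
```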
📦 Model Information

| Attribute | Details |
| --- | --- |
| Model type | Encoder-decoder Transformer model |
| Training data | mC4 dataset |
| License | Apache-2.0 |
💻 Usage Examples
Basic Usage
The following code shows how to extract the model's last hidden states:
```python
import torch
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# LongT5 is an encoder-decoder model, so the forward pass also needs decoder
# inputs; the decoder start token (the pad token for T5-family models) is enough.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
outputs = model(**inputs, decoder_input_ids=decoder_input_ids)
last_hidden_states = outputs.last_hidden_state  # decoder's last hidden states
```
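If you need one fixed-size vector per input (e.g. for retrieval or clustering), a common follow-up, sketched here assuming the variables from the snippet above are still in scope, is to mean-pool the encoder's hidden states over non-padding positions:

```python
# Mean-pool the encoder's hidden states, masking out padding positions.
encoder_states = outputs.encoder_last_hidden_state  # (batch, seq_len, d_model)
mask = inputs.attention_mask.unsqueeze(-1)          # (batch, seq_len, 1)
embedding = (encoder_states * mask).sum(dim=1) / mask.sum(dim=1)
```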
Advanced Usage
S-Denoising
For S-denoising, make sure to prepend your text with the prefix [S2S] as shown below:
```python
import torch
from transformers import LongT5ForConditionalGeneration, T5Tokenizer

model = LongT5ForConditionalGeneration.from_pretrained(
    "agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")

input_string = "[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere <extra_id_0>"

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
R-Denoising
For R-denoising, make sure to prepend your text with the prefix [NLU] as shown below:
```python
import torch
from transformers import LongT5ForConditionalGeneration, T5Tokenizer

model = LongT5ForConditionalGeneration.from_pretrained(
    "agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")

input_string = "[NLU] Mr. Dursley was the director of a firm called <extra_id_0>, which made <extra_id_1>. He was a big, solid man with a bald head. Mrs. Dursley was thin and <extra_id_2> of neck, which came in very useful as she spent so much of her time <extra_id_3>. The Dursleys had a small son called Dudley and <extra_id_4>"

inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
X-Denoising
For X-denoising, make sure to prepend your text with the prefix [NLG] as shown below:
```python
import torch
from transformers import LongT5ForConditionalGeneration, T5Tokenizer

model = LongT5ForConditionalGeneration.from_pretrained(
    "agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")

input_string = "[NLG] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere. <extra_id_0>"

inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
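The three denoising modes differ only in the prefix prepended to the text, so the shared boilerplate can be factored into a small helper. The `denoise` function below is an illustrative convenience that reuses the model and tokenizer loaded above; it is not part of the transformers API:

```python
def denoise(text: str, mode: str = "[NLU]", max_length: int = 200) -> str:
    """Run one denoising mode with prefix `mode`: [S2S], [NLU], or [NLG]."""
    input_ids = tokenizer(
        f"{mode} {text}", return_tensors="pt", add_special_tokens=False
    ).input_ids.to("cuda")
    output_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(output_ids[0])

print(denoise("Paris is the <extra_id_0> of France.", mode="[NLU]"))
```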
📚 Details
MLongT5 is an encoder-decoder Transformer model pretrained in a text-to-text denoising generation setting (PEGASUS-like generative pretraining). It is an extension of the LongT5 model and supports two different efficient attention mechanisms: (1) local attention, or (2) transient global attention. These sparse attention patterns allow the model to handle long input sequences efficiently.
The model is particularly effective when fine-tuned for text generation tasks (e.g. summarization, question answering) that require processing long input sequences (up to 16,384 tokens).
The model is mainly intended to be fine-tuned on supervised datasets. See the model hub to look for versions fine-tuned on a task that interests you.
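As a rough orientation only, a single supervised training step for summarization could look like the sketch below; the document/summary pair and the hyperparameters are placeholders, not recommendations:

```python
import torch
from transformers import LongT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Placeholder pair; real fine-tuning iterates over batches of a dataset.
inputs = tokenizer("A very long document ...", return_tensors="pt",
                   max_length=16384, truncation=True)
labels = tokenizer("A short summary.", return_tensors="pt").input_ids

loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```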
📄 License
This model is released under the Apache-2.0 license.
📖 Citation
If you use this model in your research, please cite it with the following BibTeX entry:
@misc{uthus2023mlongt5,
title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
year={2023},
eprint={2305.11129},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
This model card was written by Ahmed Elnaggar. The original release team did not write a model card for this model.
Author: Ahmed Elnaggar / @Elnaggar_AI | LinkedIn