t5-small-finetuned-summarization-xsum: Open-Source Text Summarization Model

T5 Small Finetuned Summarization Xsum

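Developed by Rahmat82

A lightweight text summarization model fine-tuned from t5-small on the xsum dataset, with fast generation and low resource usage

Text generation

Transformers

License: Apache-2.0 #FastSummaryGeneration #LightweightT5 #NewsSummaryOptimization

Downloads: 16

Released: 2/25/2024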

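Model overview

This model is built specifically for text summarization. It is fast and efficient, which makes it a good fit for applications that need summaries in real time.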

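Model features

Fast and efficient

Sub-second response times, well suited to applications that need real-time summarization

Lightweight

Based on the t5-small architecture, with low resource usage; runs efficiently on both CPU and GPU

ONNX acceleration support

Supports ONNX Runtime acceleration for a further performance boost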

Model capabilities

Text summary generation

Long-text compression

Key information extraction

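Use cases

News summarization

News article summarization

Compresses long news articles into concise summaries

Produces news summaries that read naturally to human readers

Technical document processing

Technical document summarization

Extracts key information from technical documents

Helps readers quickly grasp a document's core content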

🚀 t5-small-finetuned-summarization-xsum

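This model is fine-tuned from t5-small on the xsum dataset. It is fast and lightweight, and handles text summarization efficiently.

🚀 Quick start

This model is a fine-tuned version of t5-small on the xsum dataset. It is very fast and lightweight: it can summarize an entire text in under 1 second, and it performs well in low-resource settings.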

Model demo

Click here to view the model demo

The model achieves the following results on the evaluation set:

Loss: 2.2425
Rouge1: 31.3222
Rouge2: 10.0614
RougeL: 25.0513
RougeLsum: 25.0446
Generation length (Gen Len): 18.8044
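To make the ROUGE figures above more concrete, here is a minimal sketch of a ROUGE-1 F1 computation based on unigram overlap. It is a simplification for illustration only: the function name `rouge1_f1` and the lowercase whitespace tokenization are assumptions, and the reported scores were presumably produced with a standard ROUGE implementation (with its own tokenization and optional stemming), not with this code. The reported values also look like they are on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two texts.

    Illustrative only: lowercase whitespace tokenization, no stemming.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "koalas are now listed as endangered"
    cand = "koalas are endangered"
    print(f"ROUGE-1 F1: {rouge1_f1(ref, cand):.4f}")
```

ROUGE-2 follows the same idea with bigrams instead of single words, and ROUGE-L is based on the longest common subsequence rather than n-gram counts.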

✨ Key features

Lightweight and very fast: on both GPU and CPU, it can summarize a text in under 1 second. With optimum, it may be faster still.
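The speed claim above depends on hardware and input length, so it is worth checking on your own machine. The following is a minimal timing sketch, not a benchmark from the model author. The helper name `measure_latency` and the stand-in `fake_summarizer` are hypothetical; the stand-in only exists so the sketch runs without downloading the model.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(summarize: Callable[[str], object], text: str,
                    warmup: int = 1, runs: int = 5) -> List[float]:
    """Return per-call wall-clock latencies in seconds for summarize(text)."""
    for _ in range(warmup):
        summarize(text)  # warm-up calls are executed but not timed
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        summarize(text)
        latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Hypothetical stand-in; replace with the real summarization pipeline.
    def fake_summarizer(text: str) -> str:
        return text[:40]

    timings = measure_latency(fake_summarizer, "some long article text " * 50)
    print(f"median latency: {statistics.median(timings):.6f} s")
```

Reporting the median over several runs, after a warm-up call, tends to give a steadier number than a single measurement, since the first call often includes one-time setup cost.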

📦 Installation

Before using the model, make sure the required libraries are installed:

pip install transformers accelerate optimum onnxruntime onnx
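After installing, a quick check can confirm that the packages are importable in the current environment. This is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical, and it assumes each package's import name matches the name used in the install command.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    required = ["transformers", "accelerate", "optimum", "onnxruntime", "onnx"]
    missing = missing_modules(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```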

💻 Usage examples

Basic usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model_id = "Rahmat82/t5-small-finetuned-summarization-xsum"

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# Wrap both in a summarization pipeline
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

text_to_summarize = """
The koala is regarded as the epitome of cuddliness. However, animal lovers
will be saddened to hear that this lovable marsupial has been moved to the
endangered species list. The Australian Koala Foundation estimates there are
somewhere between 43,000-100,000 koalas left in the wild. Their numbers have
been dwindling rapidly due to disease, loss of habitat, bushfires, being hit
by cars, and other threats. Stuart Blanch from the World Wildlife Fund in
Australia said: "Koalas have gone from no listing to vulnerable to endangered
within a decade. That is a shockingly fast decline." He added that koalas risk
"sliding toward extinction" 
"""

print(summarizer(text_to_summarize)[0]["summary_text"])
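For inputs much longer than a short news article, one common workaround is to split the text into chunks, summarize each chunk, and join the partial summaries. The sketch below illustrates that idea under stated assumptions: the helper names `split_into_chunks` and `summarize_long_text` are hypothetical, the 300-word chunk size is an arbitrary placeholder rather than a tuned value, and word counts are only a rough proxy for the model's token-based input length.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 300) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, summarize: Callable[[str], str],
                        max_words: int = 300) -> str:
    """Summarize each chunk separately and join the partial summaries."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarize(chunk) for chunk in chunks)


if __name__ == "__main__":
    # Hypothetical stand-in that keeps the first five words of each chunk;
    # with the real pipeline, pass something like
    #   lambda t: summarizer(t)[0]["summary_text"]
    def fake_summarize(chunk: str) -> str:
        return " ".join(chunk.split()[:5])

    long_text = "word " * 1000
    print(summarize_long_text(long_text, fake_summarize))
```

Splitting on fixed word counts can cut sentences in half; splitting on sentence or paragraph boundaries would likely give cleaner partial summaries at the cost of a little more code.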

Advanced usage

Use optimum/onnxruntime for even faster summarization:

#!pip install -q transformers accelerate optimum onnxruntime onnx

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from optimum.pipelines import pipeline
import accelerate

model_name = "Rahmat82/t5-small-finetuned-summarization-xsum"

# export=True converts the PyTorch checkpoint to ONNX when the model is loaded
model = ORTModelForSeq2SeqLM.from_pretrained(model_name, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer,
                      device_map="auto", batch_size=12)

text_to_summarize = """
The koala is regarded as the epitome of cuddliness. However, animal lovers
will be saddened to hear that this lovable marsupial has been moved to the
endangered species list. The Australian Koala Foundation estimates there are
somewhere between 43,000-100,000 koalas left in the wild. Their numbers have
been dwindling rapidly due to disease, loss of habitat, bushfires, being hit
by cars, and other threats. Stuart Blanch from the World Wildlife Fund in
Australia said: "Koalas have gone from no listing to vulnerable to endangered
within a decade. That is a shockingly fast decline." He added that koalas risk
"sliding toward extinction" 
"""

print(summarizer(text_to_summarize)[0]["summary_text"])
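The `batch_size=12` argument in the pipeline above only matters when several texts are passed in one call. As a hedged sketch of how that might look, the helper below sends a list of texts to a summarizer and collects the `summary_text` fields. The helper name `summarize_batch` and the stand-in `fake_summarizer` are hypothetical, and the assumed output shape (one dict per input, possibly wrapped in a single-item list) should be checked against the installed transformers version.

```python
from typing import Callable, Iterable, List


def summarize_batch(summarizer: Callable, texts: Iterable[str]) -> List[str]:
    """Pass several texts to a summarizer in one call and return the summaries."""
    outputs = summarizer(list(texts))
    summaries = []
    for out in outputs:
        # Assumption: some versions may wrap each result in a single-item list.
        if isinstance(out, list):
            out = out[0]
        summaries.append(out["summary_text"])
    return summaries


if __name__ == "__main__":
    # Hypothetical stand-in mimicking the assumed pipeline output shape;
    # replace with the real `summarizer` built in the example above.
    def fake_summarizer(texts):
        return [{"summary_text": t.split(".")[0] + "."} for t in texts]

    articles = [
        "Koalas are now endangered. Their numbers are falling.",
        "A new bridge opened today. Traffic is expected to improve.",
    ]
    for summary in summarize_batch(fake_summarizer, articles):
        print(summary)
```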