flan-t5-small-keywords: an open-source model for extracting keywords from paragraphs


Flan T5 Small Keywords

Developed by agentlans

A keyword extraction model fine-tuned from Flan-T5 small, built specifically to extract keywords from paragraphs

Large Language Model

Transformers

Language: English
License: MIT
#English keyword extraction #Text summarization #SEO tools

Downloads: 1,101

Released: 9/10/2024

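Model Overview

The model uses the T5 architecture to identify and output key phrases that capture the core content of a text. It is suited to summarizing long texts, generating article tags, and identifying a document's main topics.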

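Model Features

Paragraph keyword extraction

Extracts keywords or key phrases that capture the core content of a long paragraph

Multi-purpose

Suited to text summarization, tag generation, SEO keyword identification, and similar scenarios

Based on the Flan-T5 architecture

Uses the T5 architecture to handle the task as sequence-to-sequence generation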

Model Capabilities

Text keyword extraction

Long-text summarization

Key phrase generation

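Use Cases

Content management

Article tag generation

Automatically generate tags for blog posts or articles

Makes content easier to classify and retrieve

Metadata generation

Generate metadata for content management systems

Simplifies the content management workflow

SEO

SEO keyword identification

Identify a document's core keywords for use in SEO work

Can support efforts to improve a page's search ranking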

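🚀 Keyword Extraction Model

This model is a fine-tuned version of Flan-T5 small, built specifically to extract keywords from paragraphs. It uses the T5 architecture to identify and output key phrases that capture the core content of the input text.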

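🚀 Quick Start

The model takes a paragraph as input and generates a series of keywords or key phrases that summarize the text's main topics. It is particularly useful for:

Summarizing long texts
Generating tags for articles or blog posts
Identifying the main topics of a document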

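✨ Key Features

Intended Uses

Quickly summarizing long paragraphs
Generating metadata for content management systems
Assisting with SEO keyword identification

Limitations

The model may sometimes generate irrelevant keywords.
Performance may vary with the length and complexity of the input text:
- For best results, use longer, clearly written text.
- Input length is capped at 512 tokens, a limit inherited from the Flan-T5 architecture.
The model was trained on English text and may perform poorly on other languages.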
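One way to work within the 512-token cap is to split a long document into sentence-aligned chunks and extract keywords from each chunk separately. The sketch below is an illustration, not part of the model's documentation: `chunk_text` and `whitespace_token_count` are hypothetical helper names, and the default counter simply counts whitespace-separated words as a rough stand-in for real token counts.

```python
import re
from typing import Callable, List


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_text(
    text: str,
    max_tokens: int = 512,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens is kept as its own oversized
    chunk; the tokenizer's truncation would still have to handle it.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For counts that match the model more closely, a caller could pass something like `count_tokens=lambda s: len(tokenizer(s)["input_ids"])`, where `tokenizer` is the tokenizer loaded for this model. Because short, fragmentary inputs tend to give weaker results, chunks should probably stay reasonably long rather than being cut sentence by sentence.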


💻 Usage Examples

Basic usage

Here is a simple example of using the model:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "agentlans/flan-t5-small-keywords"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

input_text = "Your paragraph here..."
# Truncate the input to the model's 512-token limit
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=512)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Split on the '||' separator, trim whitespace, and drop empty entries and
# duplicates while keeping the order in which the model produced them
keywords = list(dict.fromkeys(k.strip() for k in decoded_output.split("||") if k.strip()))
print(keywords)

Example input paragraph:

In the heart of the bustling city, a hidden gem awaits discovery: a quaint little bookstore that seems to have escaped the relentless march of time. As you step inside, the scent of aged paper and rich coffee envelops you, creating an inviting atmosphere that beckons you to explore its shelves. Each corner is adorned with carefully curated collections, from classic literature to contemporary bestsellers, inviting readers of all tastes to lose themselves in the pages of a good book. The soft glow of warm lighting casts a cozy ambiance, while the gentle hum of conversation among fellow book lovers adds to the charm. This bookstore is not just a place to buy books; it's a sanctuary for those seeking solace, inspiration, and a sense of community in the fast-paced world outside.

Example output keywords:

['old paper coffee scent', 'cosy hum of conversation', 'quaint bookstore', 'community in the fast-paced world', 'solace inspiration', 'curated collections']
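For the tagging and metadata use cases, a keyword list like the one above usually needs a little post-processing before it goes into a content management system. The following is a minimal sketch under stated assumptions: `slugify` and `meta_keywords_tag` are hypothetical helpers introduced here for illustration, and whether a given CMS or search engine makes use of a keywords meta tag is outside what the model card covers.

```python
import html
import re
from typing import List


def slugify(keyword: str) -> str:
    """Lowercase a phrase and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", keyword.lower()))


def meta_keywords_tag(keywords: List[str]) -> str:
    """Build an HTML <meta name="keywords"> element from a keyword list."""
    # Skip blank entries and trim stray whitespace around each keyword.
    content = ", ".join(k.strip() for k in keywords if k.strip())
    # Escape quotes and angle brackets so the attribute value stays valid HTML.
    return f'<meta name="keywords" content="{html.escape(content, quote=True)}">'
```

Slugs such as `slugify("quaint bookstore")` are convenient as tag URLs, while the original phrases remain better suited to display.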

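📚 Detailed Documentation

Training and Evaluation

The model was fine-tuned on a dataset of English Wikipedia paragraphs paired with their keywords. The dataset covers a wide range of topics so that the model applies broadly.

Limitations and Bias

The model was trained on English Wikipedia paragraphs, which may introduce bias. Generated keywords may reflect that bias, so outputs should be used with caution.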
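One simple way to treat the output with caution is to keep only the keywords whose words mostly appear in the source paragraph. The sketch below is a heuristic suggested here, not something the model card prescribes: `grounded_keywords` and its `min_overlap` threshold are hypothetical, and exact word matching will also reject legitimate paraphrases (for instance "cosy" when the text says "cozy").

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    """Lowercase a string and return its word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def grounded_keywords(
    keywords: List[str], source_text: str, min_overlap: float = 0.5
) -> List[str]:
    """Keep keywords whose share of words found in source_text is >= min_overlap."""
    source_vocab = set(_words(source_text))
    kept: List[str] = []
    for keyword in keywords:
        words = _words(keyword)
        if not words:
            continue  # nothing to check in an empty or punctuation-only keyword
        overlap = sum(w in source_vocab for w in words) / len(words)
        if overlap >= min_overlap:
            kept.append(keyword)
    return kept
```

A stricter threshold such as `min_overlap=1.0` keeps only phrases built entirely from words in the text; a looser one tolerates some rewording at the cost of letting more unrelated keywords through.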

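Training Details

Property	Details
Model type	Keyword extraction model fine-tuned from Flan-T5 small
Training data	Dataset of Wikipedia paragraphs and keywords
Training procedure	Fine-tuning of google/flan-t5-small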