KM45L6V2OC: an open-source software requirement classification model that distinguishes functional from non-functional requirements

KM45L6V2OC

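Developed by kasrahabib

A software requirement classification model fine-tuned from sentence-transformers/all-MiniLM-L6-v2 that distinguishes functional (F) from non-functional (NF) requirements

Text classification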

Transformers

Language: English | License: Apache-2.0 | #requirement-classification #high-F1 #software-engineering

Downloads: 27

Released: 3/3/2023

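Model overview

The model is built for software requirement classification: it labels each requirement as functional or non-functional. It was trained on the Software Requirements Dataset (SWARD).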

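Model features

High-precision classification

Reaches a macro F1 score of 0.99 on the evaluation set when separating functional from non-functional requirements

Fine-tuned from an established model

Fine-tuned from sentence-transformers/all-MiniLM-L6-v2, so it inherits that model's compact architecture and sentence representations

Easy to use

Comes with complete pipeline usage examples for quick integration into existing systems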

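Model capabilities

Software requirement classification

Functional requirement identification

Non-functional requirement identification

Natural language processing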

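Use cases

Software development

Requirements document analysis

Automatically analyze software requirements documents and classify each requirement as functional or non-functional

Macro F1 of 0.99 on the evaluation set

Requirements management tool integration

Integrate the model into a requirements management system to tag requirement types automatically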

🚀 kasrahabib/KM45L6V2OC

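This model is a fine-tuned version of sentence-transformers/all-MiniLM-L6-v2 that classifies software requirements from the Software Requirements Dataset (SWARD) as functional (F) or non-functional (NF). It reports the following training and evaluation results:

Training loss: 0.0107
Validation loss: 0.0404
Epoch: 14
Final macro F1 score: 0.99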
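
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so F and NF count equally regardless of how many examples each class has. The sketch below shows the computation on toy labels that are invented for illustration; they are not SWARD data, and `macro_f1` is a hypothetical helper, not the evaluation code used for this model.

```python
def macro_f1(y_true, y_pred, labels=("F", "NF")):
    """Return the unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels invented for illustration only.
y_true = ["F", "F", "F", "NF", "NF", "NF"]
y_pred = ["F", "F", "NF", "NF", "NF", "NF"]
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.8286
```

Here F scores 0.8 and NF scores about 0.857, so one misclassified requirement out of six already pulls the macro average down to about 0.83.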

Labels: 0 or F -> functional; 1 or NF -> non-functional.
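
The label convention above can be captured in a small lookup so downstream code does not hard-code it in several places. This is a minimal sketch: `ID2LABEL`, `LABEL_NAMES`, and `describe_label` are hypothetical names introduced here, and only the 0/F and 1/NF mapping comes from the model card.

```python
# Mapping stated on the model card: 0 / "F" -> functional, 1 / "NF" -> non-functional.
ID2LABEL = {0: "F", 1: "NF"}
LABEL_NAMES = {"F": "functional", "NF": "non-functional"}


def describe_label(label):
    """Accept a class id (0 or 1) or a short label ("F" or "NF") and return a readable name."""
    if isinstance(label, int):
        label = ID2LABEL[label]
    return LABEL_NAMES[label]


print(describe_label(0), describe_label("NF"))  # prints "functional non-functional"
```

NumPy integers are not instances of `int`, so convert class ids with `int(...)` before passing them in.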

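🚀 Quick start

✨ Key features

Fine-tuned from a pretrained model, it classifies software requirements as functional or non-functional and reaches a macro F1 score of 0.99 on the evaluation set.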

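📦 Installation

To use the model locally, follow these steps:

Clone the repository:

git lfs install
git clone url_of_repo

Find the path of the downloaded directory.
Assign that path to the model_ckpt variable.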
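
The last two steps can be sketched as a small helper that prefers a local clone and otherwise falls back to the Hub ID. This is an illustration under assumptions: `resolve_model_ckpt`, `load_classifier`, and the `./KM45L6V2OC` path are hypothetical, and the check for `config.json` is simply a cheap way to tell whether a directory looks like a Transformers checkpoint.

```python
from pathlib import Path

HUB_ID = "kasrahabib/KM45L6V2OC"


def resolve_model_ckpt(local_dir, hub_id=HUB_ID):
    """Return the local clone if it looks like a model directory, else the Hub ID."""
    path = Path(local_dir).expanduser()
    if (path / "config.json").is_file():
        return str(path)
    return hub_id


def load_classifier(local_dir):
    """Build the text-classification pipeline from the resolved checkpoint."""
    # Imported lazily so the path logic can be used without transformers/TensorFlow installed.
    from transformers import pipeline

    model_ckpt = resolve_model_ckpt(local_dir)
    return pipeline(task="text-classification", model=model_ckpt, framework="tf")


# Hypothetical clone location; adjust to wherever the repository was cloned.
print(resolve_model_ckpt("./KM45L6V2OC"))
```

Calling `load_classifier("./KM45L6V2OC")` would then return a pipeline equivalent to the one in the basic usage example.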

💻 Usage examples

Basic usage

from transformers import pipeline

# Load the TensorFlow weights; model_ckpt may be the Hub ID or the path of a local clone.
frame_work = 'tf'
task = 'text-classification'
model_ckpt = 'kasrahabib/KM45L6V2OC'

# The pipeline bundles the tokenizer and the classification model.
software_requirement_cls = pipeline(task = task, model = model_ckpt, framework = frame_work)

# One functional and one non-functional requirement.
example_1_f = 'The START NEW PROJECT function shall allow the user to create a new project.'
example_2_nf = 'The email string consists of x@x.x and is less than 31 characters in length and is not empty.'
software_requirement_cls([example_1_f, example_2_nf])

Output:

[{'label': 'F', 'score': 0.9998922348022461},
 {'label': 'NF', 'score': 0.999846339225769}]
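
When tagging requirements automatically, it can help to route low-confidence predictions to a human reviewer. The sketch below works on the list-of-dicts shape shown in the output above; `split_by_confidence` and the 0.9 threshold are assumptions made for illustration, not part of the model or the Transformers API.

```python
def split_by_confidence(requirements, predictions, threshold=0.9):
    """Pair each requirement with its prediction and separate low-confidence cases."""
    accepted, needs_review = [], []
    for text, pred in zip(requirements, predictions):
        record = {"text": text, "label": pred["label"], "score": pred["score"]}
        if pred["score"] >= threshold:
            accepted.append(record)
        else:
            needs_review.append(record)
    return accepted, needs_review


requirements = [
    "The START NEW PROJECT function shall allow the user to create a new project.",
    "The email string consists of x@x.x and is less than 31 characters in length and is not empty.",
]
# Predictions copied from the pipeline output shown above.
predictions = [
    {"label": "F", "score": 0.9998922348022461},
    {"label": "NF", "score": 0.999846339225769},
]

accepted, needs_review = split_by_confidence(requirements, predictions)
print(len(accepted), len(needs_review))  # prints "2 0"
```

A suitable threshold depends on how costly a wrong tag is in the target workflow.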

Advanced usage

import numpy as np
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Load the tokenizer and the TensorFlow classification model separately.
model_ckpt = 'kasrahabib/KM45L6V2OC'
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = TFAutoModelForSequenceClassification.from_pretrained(model_ckpt)

example_1_f = 'The START NEW PROJECT function shall allow the user to create a new project.'
example_2_nf = 'The email string consists of x@x.x and is less than 31 characters in length and is not empty.'
requirements = [example_1_f, example_2_nf]

# Tokenize the batch, padding every input to the longest requirement.
encoded_requirements = tokenizer(requirements, return_tensors = 'np', padding = 'longest')

# Forward pass: one row of logits per requirement, one column per class.
y_pred = model(encoded_requirements).logits
classifications = np.argmax(y_pred, axis = 1)

# Map class ids (0, 1) to the label names stored in the model config (F, NF).
classifications = [model.config.id2label[output] for output in classifications]
print(classifications)

Output:

['F', 'NF']
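
The advanced example returns labels only. To also obtain a confidence score like the one in the basic example, the logits can be passed through a softmax. The sketch below is written in plain Python so it runs without TensorFlow; the logit values are made up for illustration, and `softmax` and `classify` are hypothetical helpers.

```python
import math


def softmax(logits):
    """Convert one row of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logit_rows, id2label):
    """Return the most likely label and its probability for each row of logits."""
    results = []
    for row in logit_rows:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append({"label": id2label[best], "score": probs[best]})
    return results


# Made-up logits for two requirements; real values come from model(...).logits.
example_logits = [[4.0, -4.0], [-3.5, 3.5]]
results = classify(example_logits, {0: "F", 1: "NF"})
print([r["label"] for r in results])  # prints ['F', 'NF']
```

With the real model, something like `classify(y_pred.numpy().tolist(), model.config.id2label)` should work, assuming `y_pred` is an eager TensorFlow tensor as in the example above.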

🔧 Technical details

Training hyperparameters

The following hyperparameters were used during training:

Optimizer: {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 9030, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False}
Training precision: float32
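
The learning-rate entry in the optimizer configuration describes a polynomial decay with power 1.0, which is a straight line from 2e-05 down to 0.0 over 9030 steps. The sketch below reimplements that schedule in plain Python to show the values it yields; `polynomial_decay` is a hypothetical stand-in for illustration, not the Keras class itself.

```python
def polynomial_decay(step, initial_lr=2e-05, end_lr=0.0, decay_steps=9030, power=1.0):
    """Learning rate at a given step for a non-cycling polynomial decay schedule."""
    step = min(step, decay_steps)  # cycle=False: stay at end_lr once decay_steps is reached
    remaining = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * remaining ** power + end_lr


for step in (0, 4515, 9030):
    print(step, polynomial_decay(step))
```

The rate starts at 2e-05, is about 1e-05 at the halfway point (step 4515), and reaches 0.0 at step 9030.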