DeBERTa-v3-xsmall-mnli: an open-source model for zero-shot classification

Deberta V3 Xsmall Mnli Fever Anli Ling Binary

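Developed by MoritzLaurer

A binary natural language inference (NLI) model based on DeBERTa-v3-xsmall, optimized for zero-shot classification

Text classification

Transformers

Language: English · License: MIT · #zero-shot-classification #binary-NLI #multi-dataset-training

Downloads: 10.77k

Released: 3/2/2022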

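Model Overview

The model was trained on four NLI datasets to predict "entailment" or "not entailment" as a binary classification task, which makes it well suited to zero-shot classification.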

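Model Features

Binary classification

Designed for the binary "entailment" vs. "not entailment" setting, which simplifies the conventional three-class NLI task

Multi-dataset training

Trained on 782,357 hypothesis-premise pairs from four datasets: MultiNLI, Fever-NLI, LingNLI, and ANLI

Efficient inference

The xsmall variant gives faster inference than the larger DeBERTa-v3 models at some cost in accuracy; the evaluation table below reports 230 to 586 texts per second on a Tesla P100 GPU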

Model Capabilities

Zero-shot text classification

Natural language inference

Binary text classification

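Use Cases

Text analysis

Sentiment analysis

Determines whether a text entails a particular sentiment

Reference accuracy: 0.925 on the mnli-m-2c test set (an NLI benchmark, not a sentiment benchmark)

Fact checking

Verifies whether a statement is entailed by an evidence text

Reaches 0.892 accuracy on fever-nli-2c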

🚀 DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary

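This model is intended for text classification and zero-shot classification. It was trained on several natural language inference (NLI) datasets to predict whether a premise entails a hypothesis ("entailment") or not ("not entailment").

🚀 Quick Start

The following example code runs a prediction with the model: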

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model_name = "MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The model must live on the same device as its inputs
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was good."

# Encode the premise-hypothesis pair and move all input tensors to the device
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, so gradients are not needed
    output = model(**inputs)
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "not_entailment"]
# Report each class probability as a percentage
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)
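
Because the model scores whether a premise entails a hypothesis, zero-shot classification can be built on top of it by turning each candidate label into a hypothesis and keeping the label with the highest entailment probability. The following is a minimal sketch of that idea, not the author's code. The names `zero_shot_classify` and `toy_entail_prob` and the hypothesis template are hypothetical and introduced only for illustration. In practice the scorer would wrap the model call shown above and return the "entailment" probability.

```python
from typing import Callable, Dict, List, Tuple


def zero_shot_classify(
    text: str,
    candidate_labels: List[str],
    entail_prob: Callable[[str, str], float],
    template: str = "This text is about {}.",
) -> Tuple[str, Dict[str, float]]:
    """Pick the label whose hypothesis is most strongly entailed by the text."""
    # One premise-hypothesis pair is scored per candidate label.
    scores = {
        label: entail_prob(text, template.format(label))
        for label in candidate_labels
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (an assumption, not the real model).

    Returns a high probability when the last word of the hypothesis
    occurs in the premise.
    """
    keyword = hypothesis.rstrip(".").split()[-1].lower()
    return 0.9 if keyword in premise.lower() else 0.1


label, scores = zero_shot_classify(
    "Politics dominated the evening news.",
    ["politics", "sports", "weather"],
    toy_entail_prob,
)
print(label, scores)
```

Each candidate label adds one forward pass per text, so the cost grows linearly with the number of labels.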

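✨ Key Features

Trained on 782,357 hypothesis-premise pairs from 4 natural language inference (NLI) datasets.
Designed specifically for zero-shot classification: it predicts "entailment" or "not entailment" as a binary NLI task.
Based on Microsoft's DeBERTa-v3-xsmall; the v3 variant of DeBERTa substantially outperforms earlier versions thanks to a different pre-training objective.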

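📦 Installation

The model card gives no specific installation steps. The usage example imports both the Hugging Face Transformers library and PyTorch, which can be installed with the command below. Depending on the Transformers version, the DeBERTa-v3 tokenizer may also require the sentencepiece package.

pip install transformers torch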

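📚 Documentation

Model description

The model was trained on 782,357 hypothesis-premise pairs from 4 NLI datasets: MultiNLI, Fever-NLI, LingNLI, and ANLI.

Note that the model was trained on binary NLI to predict either "entailment" or "not entailment". This design is specific to zero-shot classification, where the distinction between "neutral" and "contradiction" is irrelevant.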
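
The binary setup can be illustrated with a small sketch that merges the standard three NLI classes into two. This is an assumption about what such a mapping might look like, not the author's preprocessing code. The function name `to_binary_label` is hypothetical, and the output names follow the `label_names` list used in the example code above.

```python
def to_binary_label(three_way_label: str) -> str:
    """Map a three-class NLI label onto the binary scheme."""
    if three_way_label == "entailment":
        return "entailment"
    if three_way_label in ("neutral", "contradiction"):
        # In both cases the premise does not support the hypothesis.
        return "not_entailment"
    raise ValueError(f"unknown NLI label: {three_way_label!r}")


examples = ["entailment", "neutral", "contradiction"]
binary = [to_binary_label(label) for label in examples]
print(binary)
```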

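The base model is DeBERTa-v3-xsmall from Microsoft. The v3 variant of DeBERTa substantially outperforms previous versions of the model by including a different pre-training objective; see the DeBERTa-V3 paper for details.

For higher performance at lower speed, https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli is recommended.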

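Intended uses and limitations

How to use the model

See the Quick Start section above.

Training data

The model was trained on 782,357 hypothesis-premise pairs from 4 NLI datasets: MultiNLI, Fever-NLI, LingNLI, and ANLI.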

Training procedure

DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary was trained with the Hugging Face Trainer using the following hyperparameters:

from transformers import TrainingArguments

training_args = TrainingArguments(
    num_train_epochs=5,              # total number of training epochs
    learning_rate=2e-05,             # learning rate for the optimizer
    per_device_train_batch_size=32,  # batch size per device during training
    per_device_eval_batch_size=32,   # batch size for evaluation
    warmup_ratio=0.1,                # fraction of total training steps used for learning-rate warmup
    weight_decay=0.06,               # strength of weight decay
    fp16=True                        # mixed precision training
)
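
To make `warmup_ratio` concrete, the sketch below estimates how many optimizer steps these settings imply. It assumes a single device, no gradient accumulation, and that all 782,357 training pairs are seen once per epoch. None of these details is stated in the source, so the numbers are illustrative only.

```python
import math

# Values taken from the model card
num_pairs = 782_357
batch_size = 32
num_epochs = 5
warmup_ratio = 0.1

# Assumption: one device, no gradient accumulation, last partial batch kept
steps_per_epoch = math.ceil(num_pairs / batch_size)
total_steps = steps_per_epoch * num_epochs
# The warmup phase covers the first fraction of all training steps
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the learning rate would ramp up over roughly the first tenth of training before the scheduler starts decaying it.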

Evaluation results

The model was evaluated on the binary test sets for MultiNLI, ANLI, and LingNLI and on the binary dev set for Fever-NLI (two classes instead of three). The metric is accuracy.

Dataset	mnli-m-2c	mnli-mm-2c	fever-nli-2c	anli-all-2c	anli-r3-2c	lingnli-2c
Accuracy	0.925	0.922	0.892	0.676	0.665	0.888
Speed (texts/sec, CPU, batch size 128)	6.0	6.3	3.0	5.8	5.0	7.6
Speed (texts/sec, GPU Tesla P100, batch size 128)	473	487	230	390	340	586
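
The throughput figures give a rough way to budget inference time. The sketch below uses a hypothetical helper, `estimated_seconds`, that divides a workload size by a reported texts-per-second value. Real throughput depends on hardware, sequence length, and batch size, so the result is an order-of-magnitude estimate at best.

```python
def estimated_seconds(num_texts: int, texts_per_second: float) -> float:
    """Estimate wall-clock seconds needed to score num_texts inputs."""
    if texts_per_second <= 0:
        raise ValueError("texts_per_second must be positive")
    return num_texts / texts_per_second


# Reported mnli-m-2c speeds from the table (texts/sec)
cpu_speed = 6.0
gpu_speed = 473.0

workload = 10_000  # hypothetical number of premise-hypothesis pairs
cpu_time = estimated_seconds(workload, cpu_speed)
gpu_time = estimated_seconds(workload, gpu_speed)
print(f"CPU: {cpu_time / 60:.1f} min, GPU: {gpu_time:.1f} s")
```

In zero-shot use, each candidate label adds one premise-hypothesis pair per text, so the workload is the number of texts multiplied by the number of labels.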

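🔧 Technical Details

The model is based on the DeBERTa-v3-xsmall architecture, whose v3 pre-training objective differs from that of earlier DeBERTa versions.
It was trained on binary NLI to predict "entailment" or "not entailment".
Training used the Hugging Face Trainer with 5 epochs, a learning rate of 2e-05, a per-device batch size of 32, a warmup ratio of 0.1, weight decay of 0.06, and fp16 mixed precision.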

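📄 License

This model is released under the MIT license.

Limitations and bias

Please consult the original DeBERTa paper and the literature on the individual NLI datasets for potential biases.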

Citation

If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. ‘Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI’. Preprint, June. Open Science Framework. https://osf.io/74b8k.