MiniLM-L12-H384-uncased open-source text classifier: free to deploy for AI content safety detection


Minilm L12 H384 Uncased Nvidia Aegis AI Safety

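Developed by AC

A multi-label text classifier fine-tuned from microsoft/MiniLM-L12-H384-uncased for AI content safety detection

Text Classification

Transformers

#AIContentSafety #MultiLabelTextClassification #HarmfulContentDetection

Downloads: 14.56k

Released: 6/23/2024

Model Overview

The model is a MiniLM-based text classifier that detects 14 categories of unsafe content, including violence, hate speech, and privacy leaks. It is suited to content moderation and AI safety use cases.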

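Model Features

Multi-label classification

Detects several types of unsafe content in the same text at once

High accuracy

Reaches 95.15% accuracy on the test set, counted over individual per-label decisions

Recall-focused evaluation

Evaluated with AI safety use in mind, where recall is the key metric; the reported test-set recall is 0.44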

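Model Capabilities

Text content safety detection

Multi-label classification

Harmful content identification

Use Cases

Content moderation

Social media content filtering

Automatically detects harmful information in user-generated content

Identifies 14 categories of unsafe content

AI chat safety

Detects unsafe content in AI-generated replies

Reduces the risk of harmful output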

🚀 AC/MiniLM-L12-H384-uncased_Nvidia-Aegis-AI-Safety

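This is a multi-label text classifier fine-tuned from microsoft/MiniLM-L12-H384-uncased on the nvidia/Aegis-AI-Content-Safety-Dataset-1.0 dataset. It scores text against 14 categories to help identify unsafe content.

🚀 Quick Start

Model loading and prediction example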

from accelerate import Accelerator
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import numpy as np
import torch

accelerator = Accelerator()
device = accelerator.device

def load_model(model_path, accelerator_device=None):
    # Load the classifier in multi-label mode (one independent score per label)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        problem_type="multi_label_classification",
    )

    if accelerator_device is not None:
        model.to(accelerator_device)

    tokenizer = AutoTokenizer.from_pretrained(model_path)

    return model, tokenizer

def predict(model, tokenizer, text, accelerator_device=None, threshold=0.5):
    # Truncate to the 512-token limit used during fine-tuning
    inputs = tokenizer([text], return_tensors="pt", truncation=True, max_length=512)
    if accelerator_device is not None:
        inputs = inputs.to(accelerator_device)

    with torch.no_grad():  # no gradients are needed for inference
        outputs = model(**inputs)
    # Apply a sigmoid per label rather than a softmax across labels
    probs = torch.sigmoid(outputs.logits.squeeze(0)).cpu().numpy()
    predictions = np.zeros(probs.shape)
    predictions[np.where(probs >= threshold)] = 1
    return [model.config.id2label[idx] for idx, label in enumerate(predictions) if label == 1.0]

# Predict on CPU
hf_model, tokenizer = load_model("AC/MiniLM-L12-H384-uncased_Nvidia-Aegis-AI-Safety")
predict(hf_model, tokenizer, "How to make a bomb?")

# Predict on the device selected by Accelerator (a GPU when one is available)
hf_model, tokenizer = load_model("AC/MiniLM-L12-H384-uncased_Nvidia-Aegis-AI-Safety", device)
predict(hf_model, tokenizer, "How to make a bomb?", device)

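✨ Key Features

Multi-label classification: the model scores each text against 14 categories at once. The categories are:
- "0": "Controlled/Regulated Substances"
- "1": "Criminal Planning/Confessions"
- "2": "Deception/Fraud"
- "3": "Guns and Illegal Weapons"
- "4": "Harassment"
- "5": "Hate/Identity Hate"
- "6": "Needs Caution"
- "7": "PII/Privacy"
- "8": "Profanity"
- "9": "Sexual"
- "10": "Sexual (minor)"
- "11": "Suicide and Self Harm"
- "12": "Threat"
- "13": "Violence"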
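
As a rough illustration of how one vector of 14 logits becomes a list of categories, the sketch below applies a per-label sigmoid and a threshold in plain Python. The function name `decode_multilabel` and the example logits are hypothetical and are not real model output; the label order follows the list above.

```python
import math

# Index -> category name, in the order listed above.
ID2LABEL = dict(enumerate([
    "Controlled/Regulated Substances", "Criminal Planning/Confessions",
    "Deception/Fraud", "Guns and Illegal Weapons", "Harassment",
    "Hate/Identity Hate", "Needs Caution", "PII/Privacy", "Profanity",
    "Sexual", "Sexual (minor)", "Suicide and Self Harm", "Threat", "Violence",
]))

def decode_multilabel(logits, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid probability meets the threshold."""
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per category")
    # Each label gets its own sigmoid, so several labels can fire at once.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [(ID2LABEL[i], p) for i, p in enumerate(probs) if p >= threshold]

# Hypothetical logits for a single input (illustrative values only).
example_logits = [-4.0] * 14
example_logits[3] = 2.0    # Guns and Illegal Weapons
example_logits[13] = 0.5   # Violence

decoded = decode_multilabel(example_logits)
print([name for name, _ in decoded])
```

Because each label is thresholded independently, the result can be empty (text judged safe) or contain several categories.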

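📚 Documentation

Evaluation Metrics

The model was evaluated on the test split of nvidia/Aegis-AI-Content-Safety-Dataset-1.0, which contains 359 samples.

In AI safety settings, a false negative (the text is harmful but the model predicts it is safe) is worse than a false positive (the text is safe but the model predicts it is harmful).

Accuracy: the share of all predictions that are correct.
F1: the harmonic mean of precision and recall.
Precision: of all texts predicted as harmful, the share that are actually harmful.
Recall: of all texts that are actually harmful, the share that are predicted as harmful.

Because the goal is to reduce false negatives, recall is the metric to watch most closely.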
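
One way to act on this priority is to lower the decision threshold, which `predict` exposes as its `threshold` argument (default 0.5). The sketch below uses made-up probabilities and ground-truth flags for a single label, purely to illustrate the trade-off; the helper `precision_recall` is hypothetical and the numbers are not measurements of this model.

```python
def precision_recall(probs, truths, threshold):
    """Precision and recall for one label at a given decision threshold."""
    tp = sum(1 for p, t in zip(probs, truths) if p >= threshold and t == 1)
    fp = sum(1 for p, t in zip(probs, truths) if p >= threshold and t == 0)
    fn = sum(1 for p, t in zip(probs, truths) if p < threshold and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical scores for eight texts (1 = actually harmful, 0 = actually safe).
probs = [0.92, 0.61, 0.44, 0.35, 0.40, 0.30, 0.12, 0.55]
truths = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.5, 0.3):
    precision, recall = precision_recall(probs, truths, threshold)
    print(f"threshold={threshold}: precision={precision:.3f}, recall={recall:.3f}")
```

In this toy data, dropping the threshold from 0.5 to 0.3 catches every harmful text at the cost of more false positives. Whether a similar shift helps on real traffic would need to be checked on held-out data.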

Metric	Value
Accuracy	0.9514524472741743
F1	0.5325670498084292
Precision	0.668269230769
Recall	0.442675159235668
True positives (TP)	139
True negatives (TN)	4643
False positives (FP)	69
False negatives (FN)	175
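
The reported scores can be cross-checked from the confusion counts. The sketch below assumes 139 true positives and 4643 true negatives, which is the assignment that reproduces the reported precision and recall, and it treats every (sample, label) pair as one binary decision, so 359 samples × 14 labels give 5026 decisions.

```python
# Confusion counts over all (sample, label) decisions.
# Assumption: 139 are true positives and 4643 are true negatives.
tp, tn, fp, fn = 139, 4643, 69, 175

total = tp + tn + fp + fn              # 359 samples x 14 labels
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"decisions={total}")
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

The check also shows why accuracy is so much higher than recall: most of the 5026 decisions are true negatives, so accuracy stays high even when more than half of the positive labels are missed.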

Fine-tuning Steps

The script below shows the steps used to fine-tune the model:

from accelerate import Accelerator
from datasets import Dataset, DatasetDict
from datetime import datetime
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer, DataCollatorWithPadding

import numpy as np
import torch
import pandas as pd
import evaluate

accelerator = Accelerator()
device = accelerator.device

def load_model(model_path, accelerator_device):
    # Attach a 14-way multi-label head and record the label mapping in the config
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path, 
        problem_type="multi_label_classification", 
        num_labels=len(all_labels),
        id2label=id2label,
        label2id=label2id
    )

    model.to(accelerator_device)
    
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    return model, tokenizer

def predict(model, tokenizer, text, threshold=0.5):
    inputs = tokenizer([text], return_tensors="pt", truncation=True, max_length=512).to(device)
    with torch.no_grad():  # no gradients are needed for inference
        outputs = model(**inputs)
    # Apply a sigmoid per label rather than a softmax across labels
    probs = torch.sigmoid(outputs.logits.squeeze(0)).cpu().numpy()
    predictions = np.zeros(probs.shape)
    predictions[np.where(probs >= threshold)] = 1
    return [id2label[idx] for idx, label in enumerate(predictions) if label == 1.0]

def tokenize_text(examples):
    # Build a multi-hot float vector from the per-label columns of one row
    final_labels = np.zeros(len(all_labels))
    for idx, label in enumerate(all_labels):
        final_labels[idx] = examples[label]
        
    examples["labels"] = final_labels
    return tokenizer(examples["text"], truncation=True, max_length=512)

### Data preprocessing
all_labels = [
    'Controlled/Regulated Substances',
    'Criminal Planning/Confessions',
    'Deception/Fraud',
    'Guns and Illegal Weapons',
    'Harassment',
    'Hate/Identity Hate',
    'Needs Caution',
    'PII/Privacy',
    'Profanity',
    'Sexual',
    'Sexual (minor)',
    'Suicide and Self Harm',
    'Threat',
    'Violence'
]

id2label = {idx:label for idx, label in enumerate(all_labels)}
label2id = {label:idx for idx, label in enumerate(all_labels)}

base_model, tokenizer = load_model("microsoft/MiniLM-L12-H384-uncased", device)

train_df = pd.read_csv("nvidia_train.csv")
test_df = pd.read_csv("nvidia_test.csv")

dataset = DatasetDict({
    'train': Dataset.from_pandas(train_df),
    'test': Dataset.from_pandas(test_df)}
)

preprocessed_dataset = dataset.map(tokenize_text)

### Evaluation metrics for multi-label classification
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])

def sigmoid(x):
   return 1/(1 + np.exp(-x))

def compute_metrics(eval_pred):
   predictions, labels = eval_pred
   predictions = sigmoid(predictions)
   # Flatten so that every (sample, label) pair counts as one binary decision
   predictions = (predictions > 0.5).astype(int).reshape(-1)
   return clf_metrics.compute(predictions=predictions, references=labels.astype(int).reshape(-1))

### Fine-tuning
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

output_dir = f'./minilm_finetuned/minilm-{datetime.now().strftime("%d-%m-%Y_%H-%M")}' # output directory where training checkpoints are stored

final_output_dir = './minilm_finetuned' # where the Trainer's best model is saved

training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=2e-5,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    num_train_epochs=20,
    weight_decay=0.01,
    fp16=True,
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=base_model,
    args=training_args,
    train_dataset=preprocessed_dataset["train"],
    eval_dataset=preprocessed_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

print("Saving model...")
trainer.save_model(final_output_dir)

### Evaluate the model
base_model, tokenizer = load_model(final_output_dir, device)
predict(base_model, tokenizer, "How to make a bomb?")
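
Since `compute_metrics` flattens all (sample, label) pairs into one binary problem, it does not show which categories are being missed. A per-label breakdown could be sketched as follows; the helper `per_label_recall` and the tiny arrays are hypothetical and are not drawn from the dataset.

```python
def per_label_recall(y_true, y_pred, label_names):
    """Recall per label column; None when a label has no positive examples."""
    recalls = {}
    for j, name in enumerate(label_names):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        recalls[name] = tp / (tp + fn) if (tp + fn) else None
    return recalls

# Hypothetical multi-hot ground truth and predictions (rows = texts).
labels = ["Harassment", "Threat", "Violence"]
y_true = [[1, 0, 1], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1]]

recalls = per_label_recall(y_true, y_pred, labels)
for name, value in recalls.items():
    print(name, value)
```

Returning `None` for labels without positive examples avoids reporting a misleading 0.0 for categories that the evaluation split does not cover.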