ai-detector: Open-Source AI-Generated Content Detection Model - Accurate Identification of AI-Generated Text

Ai Detector

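Developed by SuperAnnotate

A generated-text detection model fine-tuned from RoBERTa Large, used to identify AI-generated content

Text Classification

Transformers

Language: English · License: Other · #generated-text-detection #multi-model-coverage #academic-integrity

Downloads: 2,160

Released: 9/25/2024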

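Model overview

The model is designed to detect generated (synthetic) text. This capability matters for curating training data and for identifying fraud and cheating in scientific and educational settings.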

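Model features

Balanced training data

Trained on 44k balanced samples covering human-written text and output from 14 LLMs

Multi-domain coverage

The custom-generated half of the training data draws on three domains: Wikipedia, Reddit ELI5 Q&A, and scientific papers

Overfitting safeguards

Key n-grams identified by a chi-squared test are removed from the training data, so the model learns genuine features rather than surface patterns

Good calibration

A tailored loss function with label smoothing keeps predicted confidence in line with actual accuracy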

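Model capabilities

Detecting AI-generated text

Identifying large language model output

Distinguishing human writing from machine-generated text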

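Use cases

Education

Academic integrity checks

Identifying AI-generated content in student work

98.5% accuracy on GPT-4-generated text in the RAID-based evaluation

Data filtering

Training data cleaning

Filtering synthetic text out of datasets

98% accuracy on LLaMA-Chat-generated text in the RAID-based evaluation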

🚀 SuperAnnotate - AI Detector

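SuperAnnotate's AI Detector is a model fine-tuned from RoBERTa Large to detect generated or synthetic text. This capability is important for determining text authorship and for detecting fraud and cheating, with significant applications in scientific research and education.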

🚀 Quick start

Prerequisites

Install generated_text_detector by running the following command:

pip install git+https://github.com/superannotateai/generated_text_detector.git@v1.1.0

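✨ Key features

Detects generated or synthetic text, which is important for determining text authorship and for detecting fraud and cheating.
Applicable to training data curation, scientific research, and education.
Fine-tuned from pretrained RoBERTa Large, reaching an average accuracy of 0.852 in the RAID-based evaluation.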

💻 Usage examples

Basic usage

import torch
from transformers import AutoTokenizer

from generated_text_detector.utils.model.roberta_classifier import RobertaClassifier
from generated_text_detector.utils.preprocessing import preprocessing_text


# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = RobertaClassifier.from_pretrained("SuperAnnotate/ai-detector")
tokenizer = AutoTokenizer.from_pretrained("SuperAnnotate/ai-detector")

model.eval()  # inference mode: disables dropout

text_example = "It's not uncommon for people to develop allergies or intolerances to certain foods as they get older. It's possible that you have always had a sensitivity to lactose (the sugar found in milk and other dairy products), but it only recently became a problem for you. This can happen because our bodies can change over time and become more or less able to tolerate certain things. It's also possible that you have developed an allergy or intolerance to something else that is causing your symptoms, such as a food additive or preservative. In any case, it's important to talk to a doctor if you are experiencing new allergy or intolerance symptoms, so they can help determine the cause and recommend treatment."

# Normalize the text with the preprocessing helper shipped with the package
text_example = preprocessing_text(text_example)

# Tokenize, truncating to the 512-token limit
tokens = tokenizer.encode_plus(
    text_example,
    add_special_tokens=True,
    max_length=512,
    padding='longest',
    truncation=True,
    return_token_type_ids=True,
    return_tensors="pt"
)

# The model returns a tuple; the second element holds the logits
with torch.no_grad():
    _, logits = model(**tokens)

# Sigmoid maps the single output logit to a score between 0 and 1
proba = torch.sigmoid(logits).squeeze(1).item()

print(proba)
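
The basic example stops at a single score. As a minimal sketch of how such a score could drive the data-filtering use case, the helper below splits texts by a threshold. It assumes the score is the probability that a text is machine-generated; the `split_by_score` name, the 0.5 threshold, and the `dummy_score` stand-in are hypothetical and not part of generated_text_detector.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_score(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, tune on your own data
) -> Tuple[List[str], List[str]]:
    """Split texts into (likely_human, likely_generated) by score threshold."""
    likely_human: List[str] = []
    likely_generated: List[str] = []
    for text in texts:
        if score_fn(text) >= threshold:
            likely_generated.append(text)
        else:
            likely_human.append(text)
    return likely_human, likely_generated


def dummy_score(text: str) -> float:
    """Stand-in scorer for demonstration only.

    In practice this would wrap the tokenizer, model, and sigmoid steps
    from the basic usage example.
    """
    return 0.9 if "as an ai" in text.lower() else 0.1


human, generated = split_by_score(
    ["I walked the dog at dawn.", "As an AI language model, I cannot comment."],
    dummy_score,
)
print(len(human), len(generated))
```

Passing the scorer in as a function keeps the filtering logic independent of how the model is loaded, so the same helper works with either usage style.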

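Advanced usage

import torch

from generated_text_detector.utils.text_detector import GeneratedTextDetector


# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

detector = GeneratedTextDetector(
    "SuperAnnotate/ai-detector",
    device=device,
    preprocessing=True
)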

text_example = "It's not uncommon for people to develop allergies or intolerances to certain foods as they get older. It's possible that you have always had a sensitivity to lactose (the sugar found in milk and other dairy products), but it only recently became a problem for you. This can happen because our bodies can change over time and become more or less able to tolerate certain things. It's also possible that you have developed an allergy or intolerance to something else that is causing your symptoms, such as a food additive or preservative. In any case, it's important to talk to a doctor if you are experiencing new allergy or intolerance symptoms, so they can help determine the cause and recommend treatment."

res = detector.detect_report(text_example)

print(res)

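📚 Detailed documentation

Model details

Model description

Property	Details
Model type	Custom architecture for binary classification built on pretrained RoBERTa, with a single output label.
Language	Primarily English.
License	SAIPL
Fine-tuned from	RoBERTa Large

Model sources

Repository: GitHub (HTTP service)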

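Training data

The training dataset for this version contains 44k text-label pairs, split evenly into two parts:

Custom generation: the first half of the dataset was generated with custom, specially designed prompts. The human-written versions come from three domains:
- Wikipedia
- Reddit ELI5 QA
- Scientific papers (extended to full texts with all sections)
The texts were generated by 14 different models from four major LLM families (GPT, LLaMA, Anthropic, and Mistral). Each sample pairs a single prompt with one human-written response and one generated response; the prompts themselves are excluded from the training input.
RAID train data stratified subset: the second half is a carefully selected stratified subset of the RAID train dataset, giving equal representation to domains, model types, and attack methods. Each example pairs a human-written text with the corresponding machine-generated response (produced by a single model with specific parameters and attacks).

This balanced structure keeps the proportions of human-written and generated samples approximately equal and ensures that every prompt is aligned with one authentic answer and one generated answer.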

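⚠️ Important note

In addition, a chi-squared test was used to identify the key n-grams (n from 2 to 5) most strongly correlated with the target label, and these n-grams were then removed from the training data.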
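
The n-gram removal step can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: whitespace tokenization, document-level presence counts, a 2x2 chi-squared statistic without continuity correction, and the `top_k` cut-off are all choices made here for the example, and every function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n_min: int = 2, n_max: int = 5) -> Set[NGram]:
    """Return the set of n-grams (n_min..n_max) present in a token sequence."""
    return {
        tuple(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def top_ngrams(texts: List[str], labels: List[int], top_k: int) -> List[NGram]:
    """Rank n-grams by association between their presence and the label."""
    present = [Counter(), Counter()]  # document frequency per class (0, 1)
    totals = [labels.count(0), labels.count(1)]
    for text, label in zip(texts, labels):
        present[label].update(ngrams(text.lower().split()))
    scores: Dict[NGram, float] = {}
    for gram in set(present[0]) | set(present[1]):
        a, c = present[1][gram], present[0][gram]  # documents containing it
        b, d = totals[1] - a, totals[0] - c        # documents without it
        scores[gram] = chi2_2x2(a, b, c, d)
    # Ties are broken alphabetically so the ranking is deterministic
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]


def remove_ngrams(text: str, grams: List[NGram]) -> str:
    """Delete every occurrence of the given n-grams from a text."""
    tokens = text.split()
    lowered = [t.lower() for t in tokens]
    drop = [False] * len(tokens)
    for gram in grams:
        n = len(gram)
        for i in range(len(tokens) - n + 1):
            if tuple(lowered[i:i + n]) == gram:
                drop[i:i + n] = [True] * n
    return " ".join(t for t, flag in zip(tokens, drop) if not flag)


# Toy corpus: label 1 = generated, label 0 = human-written
texts = [
    "as an ai model i think cats are nice",
    "as an ai model i think dogs are loud",
    "honestly cats are nice",
    "my dogs are loud",
]
labels = [1, 1, 0, 0]

selected = top_ngrams(texts, labels, top_k=14)  # top_k chosen for this toy corpus
cleaned = [remove_ngrams(t, selected) for t in texts]
print(cleaned[0])
```

In the toy corpus the shared opener "as an ai model i think" perfectly separates the classes, so its n-grams rank first and are stripped, leaving the classifier nothing but the remaining content to learn from.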

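Notable characteristics

During training, one of our priorities was not only to maximize prediction quality but also to avoid overfitting and obtain an adequately confident predictor. We are pleased with the resulting model calibration and the high prediction accuracy.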
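
One common way to quantify calibration is the expected calibration error (ECE), which compares average confidence with observed accuracy inside confidence bins. The sketch below is a generic illustration; the source does not state which calibration metric was used, and the bin count and 0.5 decision threshold are assumptions.

```python
from typing import List


def expected_calibration_error(
    probs: List[float], labels: List[int], n_bins: int = 10
) -> float:
    """ECE for a binary classifier.

    probs are predicted probabilities of class 1; labels are 0 or 1.
    """
    bins: List[list] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if bucket:
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(h for _, h in bucket) / len(bucket)
            ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece


# Well calibrated: 90% confidence, 9 of 10 predictions correct
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
# Overconfident: 99% confidence, only half of the predictions correct
overconfident = expected_calibration_error([0.99] * 4, [1, 1, 0, 0])
print(calibrated, overconfident)
```

A value near zero means the stated confidence can be read as an accuracy estimate, which is the property the text describes.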

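Training details

The custom architecture was chosen because it performs binary classification with a single model output and integrates a customizable smoothing setting in its loss function.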
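
To make the smoothing setting concrete, here is a minimal sketch of binary cross-entropy on a single logit with a smoothed target. It assumes the common symmetric form, where a hard label is pulled toward 0.5; the source does not give the exact formula, and the function names are hypothetical. The default of 0.38 is the label smoothing value listed in the training parameters.

```python
import math


def smooth_label(label: int, smoothing: float) -> float:
    """Pull a hard 0/1 label toward 0.5 by the smoothing factor."""
    return label * (1.0 - smoothing) + 0.5 * smoothing


def smoothed_bce_with_logit(logit: float, label: int, smoothing: float = 0.38) -> float:
    """Binary cross-entropy of one logit against a smoothed target."""
    target = smooth_label(label, smoothing)
    # log(sigmoid(x)) in a numerically stable form
    if logit >= 0:
        log_p = -math.log1p(math.exp(-logit))
    else:
        log_p = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(x)) = log(sigmoid(x)) - x
    log_not_p = log_p - logit
    return -(target * log_p + (1.0 - target) * log_not_p)


positive_target = smooth_label(1, 0.38)
negative_target = smooth_label(0, 0.38)
# The loss is lowest where sigmoid(logit) equals the smoothed target
best_logit = math.log(positive_target / (1.0 - positive_target))
print(positive_target, negative_target)
print(smoothed_bce_with_logit(best_logit, 1), smoothed_bce_with_logit(5.0, 1))
```

Under this form the targets become roughly 0.81 and 0.19, so very large logits are penalized rather than rewarded, which discourages overconfident predictions.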

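Training parameters:

Base model: FacebookAI/roberta-large
Epochs: 20
Learning rate: 5e-05
Weight decay: 0.0033
Label smoothing: 0.38
Warmup epochs: 2
Optimizer: SGD
Gradient clipping: 3.0
Scheduler: cosine with hard restarts
Number of scheduler cycles: 6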
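
The scheduler settings above can be read as a learning-rate curve. The sketch below computes the multiplier for a cosine schedule with hard restarts and linear warmup, the same shape as the schedule Hugging Face Transformers exposes as `get_cosine_with_hard_restarts_schedule_with_warmup`. Whether the authors used that implementation is an assumption, and `steps_per_epoch` is a hypothetical value picked only for illustration.

```python
import math


def cosine_hard_restarts_multiplier(
    step: int, warmup_steps: int, total_steps: int, num_cycles: int = 6
) -> float:
    """Learning-rate multiplier in [0, 1] for the given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 to 1
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from 1 to 0 along a half cosine, then restarts at 1
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


base_lr = 5e-05
steps_per_epoch = 100                # hypothetical, depends on batch size
total_steps = 20 * steps_per_epoch   # 20 epochs
warmup_steps = 2 * steps_per_epoch   # 2 warmup epochs

for step in (0, 100, 200, 350, 2000):
    lr = base_lr * cosine_hard_restarts_multiplier(step, warmup_steps, total_steps)
    print(step, lr)
```

With these numbers the 18 post-warmup epochs are divided into 6 cycles of 3 epochs each, and the learning rate jumps back to its peak at the start of every cycle.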

Performance

The solution was validated on a stratified subset of the RAID train dataset. This benchmark includes a diverse dataset covering:

11 LLMs
11 adversarial attacks
8 domains

The detector's performance is as follows:

Model	Accuracy
Human	0.731
ChatGPT	0.992
GPT-2	0.649
GPT-3	0.945
GPT-4	0.985
LLaMA-Chat	0.980
Mistral	0.644
Mistral-Chat	0.975
Cohere	0.823
Cohere-Chat	0.906
MPT	0.757
MPT-Chat	0.943
Average	0.852
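
A per-source breakdown like the table above can be produced by grouping predictions by the source of each text. The sketch below is a generic illustration with made-up records; it assumes label 1 means generated and a 0.5 decision threshold, neither of which is confirmed by the source, and `accuracy_by_source` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_source(
    records: Iterable[Tuple[str, int, float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy per source.

    Each record is (source, true_label, score), where label 1 = generated.
    """
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for source, label, score in records:
        predicted = 1 if score >= threshold else 0
        hits[source] += int(predicted == label)
        counts[source] += 1
    return {source: hits[source] / counts[source] for source in counts}


# Made-up records for illustration only, not real evaluation data
records = [
    ("Human", 0, 0.10),
    ("Human", 0, 0.70),   # human text wrongly flagged as generated
    ("GPT-4", 1, 0.95),
    ("GPT-4", 1, 0.88),
]
report = accuracy_by_source(records)
print(report)
```

Reporting accuracy separately for each source makes weak spots visible that a single average would hide.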