Comment-Moderation: an open-source content moderation system for detecting and classifying harmful comments

Comment Moderation

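Developed by Vrandan

A multi-label content moderation system built on the DistilBERT architecture. It detects and classifies potentially harmful content in user comments, with high accuracy and a lightweight footprint.

Text classification

Transformers

Language: English · License: OpenRAIL · #multi-label content moderation #lightweight deployment #edge-device optimization

Downloads: 45.47k

Released: 1/23/2025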

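Model overview

The model is an efficient content moderation tool that identifies 9 types of potentially harmful content and is suited to deployment on edge devices and in mobile applications.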

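Model features

High accuracy

The model reaches 95.4% accuracy on the text moderation task.

Multi-label classification

Identifies 9 types of harmful content at the same time.

Lightweight deployment

A compact size of 67M parameters makes it suitable for edge devices and mobile applications.

Low-latency inference

The optimized architecture responds quickly, which suits real-time content analysis.

Trained on consumer hardware

The model was trained on an NVIDIA RTX 3080, showing that it can be developed on consumer-grade hardware.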

Model capabilities

Text classification

Content moderation

Comment moderation

Harmful content detection

Multi-label classification

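Use cases

Social media moderation

User comment moderation

Automatically detects whether user comments on social media platforms contain harmful content.

Identifies 9 types of harmful content with 95.4% accuracy.

Online community management

Forum content filtering

Automatically filters inappropriate posts in forums.

Reduces manual review workload and improves the quality of community content.

Application integration

Mobile app content moderation

Integrates into mobile apps to moderate user-generated content in real time.

The lightweight model suits mobile deployment, and its low latency does not degrade the user experience.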

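🛡️ Comment Moderation Model

A multi-label content moderation system built on the DistilBERT architecture, designed to detect and classify potentially harmful content in user-generated comments with high accuracy. According to the author, it is currently the best-performing model on the provided text moderation dataset. It also has the smallest footprint, which makes it well suited to deployment on edge devices. The author further states that it is currently the only model on Hugging Face trained to reach this level of performance while keeping a minimal size relative to the training data.

🚀 Quick Start

Local Python implementation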

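import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("Vrandan/Comment-Moderation")
tokenizer = AutoTokenizer.from_pretrained("Vrandan/Comment-Moderation")
model.eval()

def analyze_text(text):
    # Truncate so long comments do not exceed the model's maximum input length
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, so gradients are not needed
        outputs = model(**inputs)
    # Convert logits to plain Python floats, one probability per label
    probabilities = outputs.logits.softmax(dim=-1).squeeze(0).tolist()

    # Pair each probability with its label and sort from most to least likely
    labels = [model.config.id2label[i] for i in range(len(probabilities))]
    predictions = sorted(zip(labels, probabilities), key=lambda x: x[1], reverse=True)

    return predictions

# Example usage
text = "Your text here"
results = analyze_text(text)
for label, prob in results:
    print(f"Label: {label} - Probability: {prob:.4f}")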

Example output:

Label: OK - Probability: 0.9840
Label: H - Probability: 0.0043
Label: SH - Probability: 0.0039
Label: V - Probability: 0.0019
Label: S - Probability: 0.0018
Label: HR - Probability: 0.0015
Label: V2 - Probability: 0.0011
Label: S3 - Probability: 0.0010
Label: H2 - Probability: 0.0006
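
The local example applies softmax to the model's logits, which is why the nine probabilities in the example output add up to roughly 1. The sketch below reproduces that step in plain Python so the behaviour is easier to see. The logits are hypothetical values chosen for illustration, not output from the model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three labels, for illustration only
example_logits = [4.0, 1.0, 0.5]
example_probs = softmax(example_logits)

for logit, prob in zip(example_logits, example_probs):
    print(f"logit {logit:+.1f} -> probability {prob:.4f}")
```

Because softmax normalizes across all labels, the scores compete with one another: a high probability for one label necessarily lowers the others. That is worth keeping in mind when reading the output as a multi-label result.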

Serverless Python implementation

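import requests

API_URL = "https://api-inference.huggingface.co/models/Vrandan/Comment-Moderation"
# Replace the placeholder with your own Hugging Face access token
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

output = query({
    "inputs": "Your text here",
})
print(output)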
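
The serverless call returns JSON rather than sorted (label, probability) pairs. A minimal sketch of a parser follows, assuming the response for a single input is a list containing one list of `{"label", "score"}` objects, and that failures come back as an object with an `"error"` key. Both shapes are assumptions to verify against a real response, and `parse_classification_response` is a hypothetical helper name.

```python
def parse_classification_response(response):
    """Return (label, score) pairs sorted from highest to lowest score."""
    # Assumed error shape: {"error": "..."}
    if isinstance(response, dict) and "error" in response:
        raise RuntimeError(f"Inference API error: {response['error']}")
    if not isinstance(response, list):
        raise ValueError("Unexpected response shape")

    # Assumed success shape: [[{"label": ..., "score": ...}, ...]] for one input;
    # a flat list of objects is accepted as well.
    items = response
    if items and isinstance(items[0], list):
        items = items[0]

    pairs = [(item["label"], float(item["score"])) for item in items]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Hand-written sample in the assumed shape, not a real API response
sample_response = [[{"label": "H", "score": 0.0043}, {"label": "OK", "score": 0.9840}]]
parsed = parse_classification_response(sample_response)
for label, score in parsed:
    print(f"Label: {label} - Probability: {score:.4f}")
```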

Node.js JavaScript implementation

require('dotenv').config();
const { HfInference } = require('@huggingface/inference');
const readline = require('readline');

// Initialize the Hugging Face client
// To use this, follow these steps:
// 1. Create a `.env` file in the root directory of your project.
// 2. Visit https://huggingface.co/settings/tokens to generate your access token (you may need to create an account if you haven't already).
// 3. Add the token to your `.env` file like this:
//    HUGGING_FACE_ACCESS_TOKEN=your_token_here
// 4. Install dotenv & huggingface/inference package (`npm install dotenv` & `npm install @huggingface/inference`) and load it in your project.
const hf = new HfInference(process.env.HUGGING_FACE_ACCESS_TOKEN);

// Create readline interface
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

async function analyzeText(text) {
    try {
        const result = await hf.textClassification({
            model: 'Vrandan/Comment-Moderation',
            inputs: text
        });
        
        console.log('\nResults:');
        result.forEach(pred => {
            console.log(`Label: ${pred.label} - Probability: ${pred.score.toFixed(4)}`);
        });
    } catch (error) {
        console.error('Error analyzing text:', error.message);
    }
}

async function main() {
    while (true) {
        try {
            const text = await new Promise(resolve => {
                rl.question('\nEnter text to analyze (or "quit" to exit): ', resolve);
            });

            if (text.toLowerCase() === 'quit') break;
            if (text.trim()) await analyzeText(text);
        } catch (error) {
            console.error('Error:', error.message);
        }
    }
    rl.close();
}

main().catch(console.error);

Serverless JavaScript implementation

async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/Vrandan/Comment-Moderation",
        {
            headers: {
                Authorization: "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                "Content-Type": "application/json",
            },
            method: "POST",
            body: JSON.stringify(data),
        }
    );
    const result = await response.json();
    return result;
}

query({"inputs": "Your text here"}).then((response) => {
    console.log(JSON.stringify(response));
});

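✨ Key Features

Multi-label classification
Real-time content analysis
95.4% accuracy
9 distinct content categories
Easy integration through the API or a local implementation
Lightweight deployment footprint
Suitable for edge devices and mobile applications
Low-latency inference
Resource-efficient while maintaining high accuracy
Runs on consumer-grade hardware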

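📊 Content Categories

The model identifies the following types of potentially harmful content:

Category	Label	Definition
Sexual	`S`	Content meant to arouse sexual excitement, such as descriptions of sexual activity, or that promotes sexual services (excluding sex education and wellness).
Hate	`H`	Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
Violence	`V`	Content that promotes or glorifies violence, or celebrates the suffering or humiliation of others.
Harassment	`HR`	Content that may be used to torment or harass individuals in real life, or that makes harassment more likely to occur.
Self-harm	`SH`	Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
Sexual/minors	`S3`	Sexual content that includes an individual under 18 years old.
Hate/threatening	`H2`	Hateful content that also includes violence or serious harm towards the targeted group.
Violence/graphic	`V2`	Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.
Safe content	`OK`	Appropriate content that does not violate any guidelines.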
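
Since the model returns only the short label codes, a small lookup table can make results easier to read in logs or dashboards. The sketch below maps each code from the table above to an English category name. `CATEGORY_NAMES` and `describe` are hypothetical names introduced for illustration.

```python
# Label codes from the category table, mapped to readable names
CATEGORY_NAMES = {
    "S": "Sexual",
    "H": "Hate",
    "V": "Violence",
    "HR": "Harassment",
    "SH": "Self-harm",
    "S3": "Sexual/minors",
    "H2": "Hate/threatening",
    "V2": "Violence/graphic",
    "OK": "Safe content",
}

def describe(predictions):
    """Format (label, probability) pairs with readable category names."""
    # Unknown codes fall back to the raw label instead of raising
    return [
        f"{CATEGORY_NAMES.get(label, label)} ({label}): {probability:.4f}"
        for label, probability in predictions
    ]

for line in describe([("OK", 0.9840), ("H", 0.0043)]):
    print(line)
```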

📈 Performance Metrics

Accuracy: 95.4%
Mean ROC AUC: 0.912
Macro F1 score: 0.407
Micro F1 score: 0.802

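📊 Detailed Model Performance

The model has been evaluated extensively with standard classification metrics:

Loss: 0.641
Accuracy: 0.954 (95.4%)
Macro F1 score: 0.407
Micro F1 score: 0.802
Weighted F1 score: 0.763
Macro precision: 0.653
Micro precision: 0.875
Weighted precision: 0.838
Macro recall: 0.349
Micro recall: 0.740
Weighted recall: 0.740
Mean ROC AUC: 0.912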
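
The gap between the micro F1 score (0.802) and the macro F1 score (0.407) is easier to interpret with the formulas in hand. The sketch below first checks that the reported micro precision and recall reproduce the reported micro F1, then uses hypothetical per-class F1 values to show how an unweighted average can be pulled down by weak classes. The per-class numbers are invented for illustration; the source does not publish per-class scores.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported micro precision and recall from the list above
micro_f1 = f1_score(0.875, 0.740)
print(f"Micro F1 from reported precision/recall: {micro_f1:.3f}")  # 0.802

# Hypothetical per-class F1 scores: one strong common class, several weak rare ones
per_class_f1 = [0.95, 0.40, 0.10, 0.00]

# Macro F1 is the unweighted mean, so every class counts equally
macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(f"Macro F1 of the hypothetical classes: {macro_f1:.4f}")
```

Micro averaging pools every prediction, so frequent classes dominate it. Macro averaging gives each class equal weight, so a macro score well below the micro score usually indicates that some of the rarer categories are detected much less reliably than the common ones.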

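🔧 Technical Details

The model was trained on an NVIDIA RTX 3080 GPU in a home setup, which shows that effective content moderation models can be developed on consumer-grade hardware. This makes model development more accessible to individual developers and small organizations.

Key training specifications:

Hardware: NVIDIA RTX 3080
Base model: DistilBERT
Model size: 67 million parameters (optimized for efficient deployment)
Training environment: local workstation
Training type: fine-tuning

Although the model is relatively compact (67 million parameters), it achieves strong performance metrics and can be deployed across a range of devices and environments. Its ratio of efficiency to performance suggests that effective content moderation does not require large computational resources.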
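
To put the 67 million parameter figure in deployment terms, the following back-of-the-envelope sketch estimates the size of the weights at different numeric precisions. It covers only the weights, not activations, the tokenizer, or runtime overhead. The source does not state which precision the published weights use, so the float16 and int8 rows are hypothetical scenarios.

```python
PARAMETER_COUNT = 67_000_000  # from the training specifications above

# Bytes needed to store one parameter at each precision
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "int8": 1}

def weight_size_mb(parameter_count, dtype):
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return parameter_count * BYTES_PER_PARAMETER[dtype] / 1_000_000

for dtype in BYTES_PER_PARAMETER:
    print(f"{dtype}: {weight_size_mb(PARAMETER_COUNT, dtype):.0f} MB")
```

Under these assumptions the weights come to 268 MB in float32, 134 MB in float16, and 67 MB in int8.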

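⚠️ Important Notes

Ethical use

Monitor for bias regularly
Take context into account when implementing
Put privacy first

Limitations

May miss contextual nuance
False positives are possible
Affected by differences in cultural context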
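
One common way to work around these limitations is to act automatically only on confident predictions and send borderline cases to a human moderator. The sketch below shows such a routing step under stated assumptions: `route_comment` is a hypothetical helper, and both thresholds are illustrative values that would need tuning on real data.

```python
def route_comment(predictions, block_threshold=0.90, review_threshold=0.30):
    """Decide what to do with a comment given (label, probability) pairs.

    Returns "block", "review", or "allow". Thresholds are illustrative only.
    """
    # Ignore the safe label and look at the strongest harmful signal
    harmful = [(label, p) for label, p in predictions if label != "OK"]
    if not harmful:
        return "allow"

    _, top_probability = max(harmful, key=lambda pair: pair[1])
    if top_probability >= block_threshold:
        return "block"
    if top_probability >= review_threshold:
        return "review"  # uncertain: let a human moderator decide
    return "allow"

# Probabilities taken from the example output in the Quick Start section
example = [("OK", 0.9840), ("H", 0.0043), ("SH", 0.0039)]
print(route_comment(example))
```

Keeping a review band between the two thresholds trades some manual workload for fewer wrongly blocked comments, which matters most where context or cultural differences make the model's score unreliable.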

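📚 Dataset Information

The model was trained on the dataset released by OpenAI and described in their paper "A Holistic Approach to Undesired Content Detection".

Dataset sources

📄 Original paper (PDF)
💾 Dataset repository

Citation

If you use this model or dataset in your research, please cite: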

@article{openai2022moderation,
  title={A Holistic Approach to Undesired Content Detection},
  author={Todor Markov and Chong Zhang and Sandhini Agarwal and Tyna Eloundou and Teddy Lee and Steven Adler and Angela Jiang and Lilian Weng},
  journal={arXiv preprint arXiv:2208.03274},
  year={2022}
}