nsfw_image_detection open-source model: distinguishing normal from NSFW content for a safer visual environment


Nsfw Image Detection

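Developed by Falconsai

An NSFW image classification model based on the ViT architecture. The base model was pretrained with supervision on the ImageNet-21k dataset and then fine-tuned on about 80,000 images to distinguish normal from NSFW content.

Image classification

Transformers

License: Apache-2.0 #NSFW-image-detection #high-accuracy-classification #content-moderation

Downloads: 82.4M

Released: October 13, 2023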

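Model overview

The model classifies NSFW (not safe for work) images and is suited to filtering explicit or inappropriate content in a range of applications.

Model features

High classification accuracy

Reaches 98.04% accuracy on the evaluation set when separating normal from NSFW content.

ViT architecture

Built on the Vision Transformer (ViT) architecture, which applies a Transformer encoder to sequences of image patches.

Large-scale training data

Fine-tuned on a proprietary dataset of about 80,000 images with highly varied content.

Model capabilities

NSFW image classification

Image content recognition

Sensitive content filtering

Use cases

Content moderation

Social media content filtering

Automatically detects and filters inappropriate content on social media platforms.

Reduces manual review workload and raises content safety standards.

Workplace content management

Deployed in internal enterprise systems to keep NSFW content from circulating in the workplace.

Helps maintain a professional work environment and lowers legal risk.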

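🚀 Model Card: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

This model is a fine-tuned Vision Transformer (ViT) built specifically for NSFW (not safe for work) image classification. It uses a Transformer architecture to separate normal images from inappropriate ones, which makes it useful for content safety and moderation.

🚀 Quick start

Model use

The model is intended for NSFW image classification and can filter explicit or inappropriate content in a range of applications.

How to use

The following examples show how to classify an image with the model:

Basic usage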

# Classify an image with the pipeline API
from PIL import Image
from transformers import pipeline

img = Image.open("<path_to_image_file>")
classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
print(classifier(img))  # list of {"label": ..., "score": ...} dicts
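
The pipeline returns a list of dictionaries, each holding a `label` and a `score`. A minimal sketch of turning that output into a moderation decision follows. It assumes the label names are `normal` and `nsfw` (the two classes named in this card), and the helper `is_nsfw` and its default threshold of 0.5 are hypothetical; tune the threshold for your own application.

```python
def is_nsfw(results, threshold=0.5):
    """Return True when the 'nsfw' score meets the threshold.

    `results` is the list of {"label": ..., "score": ...} dicts returned by
    an image-classification pipeline for one image. The label name "nsfw"
    and the default threshold are assumptions; adjust both to your setup.
    """
    scores = {r["label"]: r["score"] for r in results}
    # A missing "nsfw" entry is treated as a score of 0.0.
    return scores.get("nsfw", 0.0) >= threshold


# Hand-written result in the pipeline's output format (not real model output).
sample = [{"label": "normal", "score": 0.93}, {"label": "nsfw", "score": 0.07}]
print(is_nsfw(sample))
print(is_nsfw(sample, threshold=0.05))
```

A lower threshold flags more images for review at the cost of more false positives, which may suit workflows where a human checks flagged content.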

Advanced usage

# Load the model directly and classify an image
import torch
from PIL import Image
from transformers import AutoModelForImageClassification, ViTImageProcessor

img = Image.open("<path_to_image_file>")
model = AutoModelForImageClassification.from_pretrained("Falconsai/nsfw_image_detection")
processor = ViTImageProcessor.from_pretrained('Falconsai/nsfw_image_detection')
with torch.no_grad():
    inputs = processor(images=img, return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits

predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
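
The snippet above reports only the winning label. If a confidence value is also needed, the logits can be passed through a softmax. The sketch below uses plain Python so it runs without a model download; the logits are made up, and the `id2label` mapping is an assumption that should be read from `model.config.id2label` in practice. With PyTorch tensors, `torch.softmax(logits, dim=-1)` performs the same computation.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image and an assumed label mapping.
example_logits = [2.0, -1.0]
id2label = {0: "normal", 1: "nsfw"}  # assumption; use model.config.id2label

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 4))
```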

Using the YOLO version

import os
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import onnxruntime as ort
import json

# Predict using YOLOv9 model
def predict_with_yolov9(image_path, model_path, labels_path, input_size):
    """
    Run inference using the converted YOLOv9 model on a single image.

    Args:
        image_path (str): Path to the input image file.
        model_path (str): Path to the ONNX model file.
        labels_path (str): Path to the JSON file containing class labels.
        input_size (tuple): The expected input size (height, width) for the model.

    Returns:
        str: The predicted class label.
        PIL.Image.Image: The original loaded image.
    """
    def load_json(file_path):
        with open(file_path, "r") as f:
            return json.load(f)

    # Load labels
    labels = load_json(labels_path)

    # Preprocess image
    original_image = Image.open(image_path).convert("RGB")
    # PIL's resize expects (width, height); input_size is (height, width)
    image_resized = original_image.resize((input_size[1], input_size[0]), Image.Resampling.BILINEAR)
    image_np = np.array(image_resized, dtype=np.float32) / 255.0
    image_np = np.transpose(image_np, (2, 0, 1))  # [C, H, W]
    input_tensor = np.expand_dims(image_np, axis=0).astype(np.float32)

    # Load YOLOv9 model
    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name # Assuming classification output

    # Run inference
    outputs = session.run([output_name], {input_name: input_tensor})
    predictions = outputs[0]

    # Postprocess predictions (assuming classification output)
    # Adapt this section if your model output is different (e.g., detection boxes)
    predicted_index = np.argmax(predictions)
    predicted_label = labels[str(predicted_index)] # Assumes labels are indexed by string numbers

    return predicted_label, original_image

# Display prediction for a single image
def display_single_prediction(image_path, model_path, labels_path, input_size):
    """
    Predicts the class for a single image and displays the image with its prediction.

    Args:
        image_path (str): Path to the input image file.
        model_path (str): Path to the ONNX model file.
        labels_path (str): Path to the JSON file containing class labels.
        input_size (tuple): The expected input size (height, width) for the model.
    """
    try:
        # Run prediction
        prediction, img = predict_with_yolov9(image_path, model_path, labels_path, input_size)

        # Display image and prediction
        fig, ax = plt.subplots(1, 1, figsize=(8, 8)) # Create a single plot
        ax.imshow(img)
        ax.set_title(f"Prediction: {prediction}", fontsize=14)
        ax.axis("off") # Hide axes ticks and labels

        plt.tight_layout()
        plt.show()

    except FileNotFoundError:
        print(f"Error: Image file not found at {image_path}")
    except Exception as e:
        print(f"An error occurred: {e}")


# --- Main Execution ---

# Paths and parameters - **MODIFY THESE**
single_image_path = "path/to/your/single_image.jpg"  # <--- Replace with the actual path to your image file
model_path = "path/to/your/yolov9_model.onnx"    # <--- Replace with the actual path to your ONNX model
labels_path = "path/to/your/labels.json"        # <--- Replace with the actual path to your labels JSON file
input_size = (224, 224)                         # Standard input size, adjust if your model differs

# Check if the image file exists before proceeding (optional but recommended)
if os.path.exists(single_image_path):
    # Run prediction and display for the single image
    display_single_prediction(single_image_path, model_path, labels_path, input_size)
else:
    print(f"Error: The specified image file does not exist: {single_image_path}")
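
The function above looks up the predicted class with `labels[str(predicted_index)]`, so the labels file must be a JSON object keyed by string indices. The sketch below shows one file layout that satisfies that lookup. The mapping itself is hypothetical; the real file ships with the converted model and may order the classes differently.

```python
import json
import os
import tempfile

# Hypothetical label mapping, keyed by string indices as the lookup expects.
example_labels = {"0": "normal", "1": "nsfw"}

with tempfile.TemporaryDirectory() as tmp_dir:
    labels_file = os.path.join(tmp_dir, "labels.json")

    # Write the mapping the way a labels.json file would store it.
    with open(labels_file, "w", encoding="utf-8") as f:
        json.dump(example_labels, f)

    # Read it back exactly as load_json does in the example above.
    with open(labels_file, "r", encoding="utf-8") as f:
        loaded = json.load(f)

predicted_index = 1  # stands in for np.argmax(predictions)
predicted_label = loaded[str(predicted_index)]
print(predicted_label)
```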

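✨ Key features

Task-specific fine-tuning: fine-tuned specifically for NSFW image classification, so it separates normal images from inappropriate ones.
High accuracy: reaches an evaluation accuracy of 0.980375.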

📦 Installation

The model card does not include installation instructions. Follow the general installation steps for Hugging Face libraries.
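
The Transformers examples in this document import `transformers`, `torch`, and `PIL`, which are typically installed with `pip install transformers torch pillow`. As a quick check before running them, the sketch below reports which of those modules cannot be imported. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Modules imported by the Transformers examples in this document.
required = ["transformers", "torch", "PIL"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Note that the `PIL` module is provided by the `pillow` package, so the import name and the install name differ.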

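📚 Detailed documentation

Model description

The Vision Transformer (ViT) is a variant of the Transformer encoder architecture, similar to BERT, adapted for image classification. This model is fine-tuned from the base checkpoint "google/vit-base-patch16-224-in21k", which was pretrained with supervision on the ImageNet-21k dataset. Pretraining images were resized to 224x224 pixels, which makes the checkpoint suitable for a wide range of image recognition tasks.

During training, the hyperparameters were tuned carefully. The model was fine-tuned with a batch size of 16, which balances computational efficiency against exposure to a diverse range of images. The learning rate was set to 5e-5, a compromise between fast convergence and stable optimization.

Training used a proprietary dataset of about 80,000 images with high variability, split into two classes, "normal" and "nsfw". Training on this dataset lets the model pick up subtle visual patterns and distinguish safe content from explicit content.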
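
To make the stated hyperparameters concrete, the sketch below collects them in a small config object and works out how many batches one pass over the data would take. Only the checkpoint name, batch size, and learning rate come from this card. The class `FineTuneConfig`, the helper `steps_per_epoch`, and the training-set size used in the example are assumptions, since the card does not state the train/eval split.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters stated in the card; nothing else is documented."""
    base_checkpoint: str = "google/vit-base-patch16-224-in21k"
    batch_size: int = 16
    learning_rate: float = 5e-5


def steps_per_epoch(num_train_images, batch_size):
    """Number of batches needed to see every training image once."""
    return math.ceil(num_train_images / batch_size)


config = FineTuneConfig()
# Assumption: all ~80,000 images are used for training. The card does not
# give the split, so treat this figure as an upper bound.
print(steps_per_epoch(80_000, config.batch_size))
```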

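Intended uses and limitations

Intended uses

NSFW image classification: the model's primary use is classifying NSFW images so that explicit or inappropriate content can be filtered in a range of applications.

Limitations

Task specialization: the model performs well on NSFW image classification, but its performance on other tasks may vary. For other tasks, look for a suitably fine-tuned version on the model hub.

Training data

The training data is a proprietary dataset of about 80,000 images with high variability, divided into two classes, "normal" and "nsfw". The training goal is for the model to distinguish safe content from explicit content.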

Training statistics

- eval_loss: 0.07463177293539047
- eval_accuracy: 0.980375
- eval_runtime: 304.9846
- eval_samples_per_second: 52.462
- eval_steps_per_second: 3.279
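
These figures can be cross-checked against each other. The sketch below multiplies the runtime by the two throughput values to estimate the size of the evaluation set and the number of evaluation steps, then converts the accuracy into a count of correct predictions. It assumes `eval_runtime` is in seconds, and the derived counts are inferences from the reported numbers, not values stated in the card.

```python
eval_accuracy = 0.980375
eval_runtime = 304.9846            # assumed to be seconds
eval_samples_per_second = 52.462
eval_steps_per_second = 3.279

# Throughput multiplied by runtime recovers the approximate totals.
eval_samples = round(eval_runtime * eval_samples_per_second)
eval_steps = round(eval_runtime * eval_steps_per_second)

# Accuracy multiplied by the sample count gives the correct predictions.
correct = round(eval_accuracy * eval_samples)
misclassified = eval_samples - correct

print(eval_samples, eval_steps, correct, misclassified)
```

Under these assumptions the numbers point to roughly 16,000 evaluation images processed in about 1,000 steps, which is consistent with the batch size of 16 mentioned in the model description, and to about 314 misclassified images.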