QwenGuard-v1.2-3B: Open-Source Vision Safeguard Model for Free Image Content Safety Assessment

QwenGuard v1.2 3B

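Developed by AIML-TUDA

QwenGuard-v1.2-3B is a vision safeguard model built on Qwen/Qwen2.5-VL-3B-Instruct that assesses the safety of image content.

Image-to-Text

Transformers

#VisualContentSafetyAssessment #MultimodalSafetyPolicy #StructuredJSONOutput

Downloads: 123

Released: 5/11/2025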

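Model Summary

The model assesses image content against a provided safety policy and outputs a safety rating, a safety category, and a rationale, with notably improved reasoning in its rationales.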

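Model Features

Visual safety assessment

Rates and categorizes image content according to a safety policy

Rationale generation

Provides a detailed rationale explaining why content was judged safe or unsafe

Multi-category safety policy

Supports assessment against 9 safety policy categories (O1-O9), including hate and violence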

Model Capabilities

Image content analysis

Safety policy assessment

JSON-formatted output

Multi-category classification

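Use Cases

Content moderation

Social media content moderation

Automatically detects policy-violating images on social media platforms

Can identify 9 categories of violating content, such as hate and violence

Educational content screening

Assesses whether images in educational materials are suitable for a given age group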

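🚀 QwenGuard-v1.2-3B Model Overview

QwenGuard-v1.2-3B is a vision safeguard model trained on the LlavaGuard-DS dataset. It assesses images against a given safety policy and returns a safety rating, a safety category, and a rationale. The model shows notably improved reasoning capabilities for image safety assessment.

🚀 Quick Start

Model Inference

You can run inference with the model using the following code: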

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

selected_model = 'AIML-TUDA/QwenGuard-v1.2-3B'

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    selected_model, torch_dtype="auto", device_map="auto"
)

processor = AutoProcessor.from_pretrained(selected_model)

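# policy_v1 must hold the safety policy prompt as a string before this point;
# the default policy prompt is listed later in this document.
messages = [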
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": policy_v1},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to wherever device_map="auto" placed the model.
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
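
Because the policy prompt asks for a JSON answer, the decoded string in output_text[0] usually has to be turned into a dictionary before it can be used. The following is a minimal sketch of one way to do that and is not part of the original model card: parse_assessment is a hypothetical helper, the sample string is hand-written rather than real model output, and the trailing-comma tolerance is an assumption based on the trailing comma shown in the prompt's JSON template.

```python
import json
import re


def parse_assessment(raw: str) -> dict:
    """Extract the JSON object from a decoded model output string."""
    # Take everything from the first "{" to the last "}" so that any
    # text surrounding the JSON object is ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    candidate = match.group(0)
    # Assumption: drop a trailing comma before the final closing brace,
    # because strict JSON parsers reject it.
    candidate = re.sub(r",\s*\}$", "}", candidate)
    return json.loads(candidate)


# Hand-written sample for illustration only (not real model output).
sample = (
    '{\n'
    '    "rating": "Safe",\n'
    '    "category": "NA: None applying",\n'
    '    "rationale": "The image shows a beach scene.",\n'
    '}'
)
assessment = parse_assessment(sample)
print(assessment["rating"], "|", assessment["category"])
# prints: Safe | NA: None applying
```

With the inference code above, the call would be parse_assessment(output_text[0]). A generation that is cut off before the closing brace will fail to parse, in which case a larger max_new_tokens value is worth trying.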

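✨ Key Features

Trained on a dedicated dataset: QwenGuard-v1.2-3B is trained on the LlavaGuard-DS dataset.
Improved reasoning: the model is built on Qwen/Qwen2.5-VL-3B-Instruct and shows notably improved reasoning capabilities.
Multi-dimensional safety assessment: the model evaluates images against a given safety policy and returns a safety rating, a safety category, and a rationale.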
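
To illustrate how the three output fields (rating, category, rationale) might be consumed downstream, here is a small sketch of a triage loop. It is an assumption about usage, not something the model card prescribes: triage and fake_assess are hypothetical names, and fake_assess returns hand-written assessments in place of a real model call.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Assessment = Dict[str, str]


def triage(
    image_ids: Iterable[str],
    assess: Callable[[str], Assessment],
) -> Tuple[List[str], Counter]:
    """Run assess() on each image id; return flagged ids and per-category counts."""
    flagged: List[str] = []
    per_category: Counter = Counter()
    for image_id in image_ids:
        result = assess(image_id)
        if result["rating"] == "Unsafe":
            flagged.append(image_id)
            per_category[result["category"]] += 1
    return flagged, per_category


def fake_assess(image_id: str) -> Assessment:
    """Stand-in for a real model call; returns hand-written assessments."""
    canned = {
        "img_001": {
            "rating": "Safe",
            "category": "NA: None applying",
            "rationale": "Landscape photo without people.",
        },
        "img_002": {
            "rating": "Unsafe",
            "category": "O2: Violence, Harm, or Cruelty",
            "rationale": "Depicts people engaging in violence.",
        },
        "img_003": {
            "rating": "Unsafe",
            "category": "O8: Animal Cruelty",
            "rationale": "Depicts explicit animal cruelty.",
        },
    }
    return canned[image_id]


flagged, counts = triage(["img_001", "img_002", "img_003"], fake_assess)
print(flagged)
print(dict(counts))
```

Passing the assessment function in as a parameter keeps the loop independent of how the model is loaded, so the same code can be exercised with a stub as shown here.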

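📚 Documentation

Model Overview

We provide the Transformers weights for QwenGuard v1.2 3B. QwenGuard is a vision safeguard model that assesses images against a given safety policy and returns a safety rating, a safety category, and a rationale. The model is built on Qwen/Qwen2.5-VL-3B-Instruct and shows notably improved reasoning capabilities.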

Model Information

Property	Details
Model type	Vision safeguard model
Training data	LlavaGuard-DS
Repository	ml-research/LlavaGuard
Project website	LlavaGuard
Paper	LlavaGuard-Arxiv

Safety Taxonomy

Our default policy prompt is as follows:

policy_v1 = """
Provide a safety assessment for the content provided by the user. Determine compliance or violation against our safety policy by reviewing the following policy categories:
O1: Hate, Humiliation, Harassment: 
Should not:
- Contain content that expresses hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
- Contain content that encourages discrimination or harassment against people based on those characteristics.
Can:
- Contain educational content that provides information on discrimination, humiliation, and harassment.
- Discuss historical events involving hate and harassment.

O2: Violence, Harm, or Cruelty: 
Should not:
- Contain content of people planning or engaging in violence.
- Contain content that encourages people to engage in violence or cruelty, including harm to others.
Can:
- Contain educational content that provides information on violence, harm, and cruelty.
- Discuss historical events involving violence and cruelty.

O3: Sexual Content: 
Should not:
- Contain sexually explicit (i.e., erotic) content.
Can:
- Contain content that is related to sexuality and sexual education.
- Provide general information regarding sexual activities.
- Provide guidance on topics related to sexual health.
- Contain content of people in a loving and affectionate manner if it is not sexually explicit content.

O4: Nudity Content: 
Should not:
- Contain visible genitalia (e.g., exposed breast, nipple, vagina, vulva, or penis) or explicit nudity content. 
Can:
- Contain artistic nudity if the content does not include visible genitalia.
- Contain educational content and general information on the human body.
- Provide guidance on topics related to sexual health.

O5: Criminal Planning: 
Should not:
- Contain content of people planning or engaging in criminal activities like theft, kidnapping, or financial crimes.
- Contain content that encourages people to engage in criminal activities or condone criminal behavior.
Can:
- Contain content that helps people understand how crimes are committed, provided that the content is not so detailed that it could be used to provide a roadmap for future crimes.
- Contain content that educates on crime prevention and understanding criminal behavior without providing guidance for criminal activities.

O6: Weapons or Substance Abuse: 
Should not:
- Contain content of illegal weapons with the intent to promote violence or harm.
- Contain content of trafficking, consumption, or use of illegal drugs.
- Promote the use of tobacco, alcohol, and cannabis.
- Contain content about the creation or production of illegal substances.
Can:
- Contain weapons used in a legal and safe context without any indication of violence or intent to harm.
- Discussions on firearm ownership and educational content on regulated substances without promoting illegal activities.
- Contain content that provides factual information about legal drugs, including their effects and chemical composition, without promoting their use.
- Contain content that offers educational insights into the manufacturing processes of substances, ensuring that such information is general and does not include detailed instructions that could facilitate their production.

O7: Self-Harm: 
Should not:
- Contain content that could encourage people to harm themselves.
- Contain content that romanticizes or condones self-harm or suicide.
Can:
- Contain educational content that informs about the prevalence, causes, and effects of self-harm and suicide, such as statistical data and discussions on mental health challenges.
- Contain resources and information that guide individuals toward professional help and support services for mental health, self-harm, and suicide prevention.

O8: Animal Cruelty: 
Should not:
- Contain content that could encourage people to harm animals.
- Contain content that romanticizes or condones harming animals.
- Contain explicit animal cruelty.
Can:
- Contain content that provides general information regarding animal cruelty.

O9: Disasters or Emergencies: 
Should not:
- Contain content about individuals who are deceased or have suffered harm as a result of disasters or emergencies.
- Contain content about suffering, cruelty, or exploitation of individuals during such events.
Can:
- Contain respectful news coverage and reports on emergencies, prioritizing informational value over sensationalism.
- Contain educational content that provides factual information on historical and current disasters or emergencies.

Assessment Steps:
1. Select a safety "rating". This should be "Unsafe" if the content violates the safety policy; otherwise, it should be "Safe".
2. Assign a policy "category". If none of the policy categories are applicable, select "NA: None applying".
3. Provide a "rationale". Describe the user content and justify why it is considered safe or unsafe, referring to the specific policy category and its associated guidelines to illustrate any violations or compliance.
To provide your assessment use the following json template:
{
    "rating": str<"Unsafe"|"Safe">,
    "category": str<"O1: Hate, Humiliation, Harassment"|"O2: Violence, Harm, or Cruelty"|"O3: Sexual Content"|"O4: Nudity Content"|"O5: Criminal Planning"|"O6: Weapons or Substance Abuse"|"O7: Self-Harm"|"O8: Animal Cruelty"|"O9: Disasters or Emergencies"|"NA: None applying">,
    "rationale": str,
}
"""

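The JSON template at the end of the policy prompt fixes the allowed values for "rating" and "category". The sketch below shows one way a caller might check a parsed assessment against those values. It is illustrative only: validate_assessment is a hypothetical helper, and the two sets are copied from the template above rather than taken from any library.

```python
from typing import List

# Allowed values, copied from the JSON template in the policy prompt.
ALLOWED_RATINGS = {"Unsafe", "Safe"}
ALLOWED_CATEGORIES = {
    "O1: Hate, Humiliation, Harassment",
    "O2: Violence, Harm, or Cruelty",
    "O3: Sexual Content",
    "O4: Nudity Content",
    "O5: Criminal Planning",
    "O6: Weapons or Substance Abuse",
    "O7: Self-Harm",
    "O8: Animal Cruelty",
    "O9: Disasters or Emergencies",
    "NA: None applying",
}


def validate_assessment(assessment: dict) -> List[str]:
    """Return a list of problems; an empty list means the dict fits the template."""
    problems: List[str] = []
    rating = assessment.get("rating")
    if rating not in ALLOWED_RATINGS:
        problems.append(f"unexpected rating: {rating!r}")
    category = assessment.get("category")
    if category not in ALLOWED_CATEGORIES:
        problems.append(f"unexpected category: {category!r}")
    rationale = assessment.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        problems.append("rationale must be a non-empty string")
    return problems


# Hand-written examples for illustration (not real model output).
good = {
    "rating": "Safe",
    "category": "NA: None applying",
    "rationale": "The image shows a beach scene.",
}
bad = {"rating": "Maybe", "category": "O10: Other", "rationale": ""}
print(validate_assessment(good))
print(len(validate_assessment(bad)))
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that is wrong with a malformed answer in a single pass.
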
Citation

Please cite and share our work if you use it or find it useful. The first three authors contributed equally.

@inproceedings{helff2025llavaguard,
 year = { 2025 },
 title = { LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models },
 key = { Best Runner-Up Paper Award at RBFM, NeurIPS 2024 },
 crossref = { https://ml-research.github.io/human-centered-genai/projects/llavaguard/index.html },
 booktitle = { Proceedings of the 41st International Conference on Machine Learning (ICML) },
 author = { Lukas Helff and Felix Friedrich and Manuel Brack and Patrick Schramowski and Kristian Kersting }
}

⚠️ Important Notice

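By filling out the form below, I understand that LlavaGuard is a derivative model based on web-scraped images and the SMID dataset, which are covered by separate licenses whose respective terms and conditions apply. I understand that all use of the content is subject to the terms of use. I understand that reusing the content in LlavaGuard might not be legal in all countries/regions and for all use cases. I understand that LlavaGuard is aimed mainly at researchers and is intended for research use. The LlavaGuard authors reserve the right to revoke my access to this data. They reserve the right to modify this data at any time in response to take-down requests.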