🚀 LlavaGuard
LlavaGuard is a vision-language-model (VLM) based safeguard for vision dataset curation and safety assessment. It helps researchers curate visual datasets and evaluate the safety of visual data.
🚀 Quick Start
Overview
Here we provide the LlavaGuard-7B model with weights converted for SGLang (https://github.com/sgl-project/sglang). The model is compatible with LoRA tuning as well as full fine-tuning; for tuning, you can adopt the training scripts provided in our repository (see https://github.com/ml-research/LlavaGuard). This version is not compatible with the HF transformers implementation and must be used with SGLang or the LLaVA implementation; an HF implementation will follow, and further help with this is welcome. A suitable docker image can also be found in our GitHub repository.
Requirements
- For inference, you can use the SGLang docker image and then proceed with step 1 below (a setup sketch follows this list).
- Alternatively, you can install SGLang via pip or from source (see the SGLang repository linked above).
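A minimal setup sketch for both routes; the lmsysorg/sglang image name and the sglang[all] pip extra are assumptions based on the SGLang README at the time of writing, so check the linked repositories for the exact current commands:

# Option A: run inside the SGLang docker image (image name assumed), exposing the server port
docker pull lmsysorg/sglang:latest
docker run --gpus all -it -p 10000:10000 lmsysorg/sglang:latest bash

# Option B: install SGLang via pip (extra name assumed; building from source is described in the SGLang repo)
pip install "sglang[all]"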
Step 1: Select the model and start the SGLang server
CUDA_VISIBLE_DEVICES=0 python3 -m sglang.launch_server --model-path AIML-TUDA/LlavaGuard-7B --tokenizer-path llava-hf/llava-1.5-7b-hf --port 10000
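Once the server reports ready, it can be sanity-checked before any inference call. A small sketch, assuming the server exposes SGLang's /get_model_info route (present in recent SGLang versions; adjust if yours differs):

import requests

# Ask the running SGLang server which model it has loaded (route name assumed).
info = requests.get("http://localhost:10000/get_model_info", timeout=5)
print(info.json())  # expected to mention AIML-TUDA/LlavaGuard-7B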
Model Inference
For model inference, you can access the server by running the code below, e.g. with python my_script.py:
import sglang as sgl
from sglang import RuntimeEndpoint

@sgl.function
def guard_gen(s, image_path, prompt):
    # Send the image together with the policy prompt as the user turn.
    s += sgl.user(sgl.image(image_path) + prompt)
    hyperparameters = {
        'temperature': 0.2,
        'top_p': 0.95,
        'top_k': 50,
        'max_tokens': 500,
    }
    # Generate the safety assessment as a JSON string.
    s += sgl.assistant(sgl.gen("json_output", **hyperparameters))

im_path = 'path/to/your/image'
prompt = safety_taxonomy_below  # the default policy prompt, see "Safety Taxonomy" below
backend = RuntimeEndpoint("http://localhost:10000")
sgl.set_default_backend(backend)
out = guard_gen.run(image_path=im_path, prompt=prompt)
print(out['json_output'])
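The model is prompted (see the safety taxonomy below) to answer with a fixed JSON template, so downstream code usually parses out['json_output'] as JSON. A minimal sketch; the parse_assessment helper is our own illustration, not part of the official scripts:

import json

def parse_assessment(raw: str) -> dict:
    """Parse LlavaGuard's JSON answer and verify the expected keys are present."""
    assessment = json.loads(raw)
    for key in ("rating", "category", "rationale"):
        if key not in assessment:
            raise ValueError(f"missing key in model output: {key}")
    return assessment

assessment = parse_assessment(out['json_output'])
if assessment["rating"] == "Unsafe":
    print("flagged:", assessment["category"], "-", assessment["rationale"])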
✨ Key Features
- Provides the SGLang-converted weights of LlavaGuard-7B.
- Compatible with LoRA tuning as well as full fine-tuning.
- Suitable for vision dataset curation and safety assessment.
📚 Documentation
Safety Taxonomy
Our default policy prompt is the following:
prompt = """
Provide a safety assessment for the content provided by the user. Determine compliance or violation against our safety policy by reviewing the following policy categories:
O1: Hate, Humiliation, Harassment:
Should not:
- Contain content that expresses hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
- Contain content that encourages discrimination or harassment against people based on those characteristics.
Can:
- Contain educational content that provides information on discrimination, humiliation, and harassment.
- Discuss historical events involving hate and harassment.
O2: Violence, Harm, or Cruelty:
Should not:
- Contain content of people planning or engaging in violence.
- Contain content that encourages people to engage in violence or cruelty, including harm to others.
Can:
- Contain educational content that provides information on violence, harm, and cruelty.
- Discuss historical events involving violence and cruelty.
O3: Sexual Content:
Should not:
- Contain sexually explicit (i.e., erotic) content.
Can:
- Contain content that is related to sexuality and sexual education.
- Provide general information regarding sexual activities.
- Provide guidance on topics related to sexual health.
- Contain content of people in a loving and affectionate manner if it is not sexually explicit content.
O4: Nudity Content:
Should not:
- Contain visible genitalia (e.g., exposed breast, nipple, vagina, vulva, or penis) or explicit nudity content.
Can:
- Contain artistic nudity if the content does not include visible genitalia.
- Contain educational content and general information on the human body.
- Provide guidance on topics related to sexual health.
O5: Criminal Planning:
Should not:
- Contain content of people planning or engaging in criminal activities like theft, kidnapping, or financial crimes.
- Contain content that encourages people to engage in criminal activities or condone criminal behavior.
Can:
- Contain content that helps people understand how crimes are committed, provided that the content is not so detailed that it could be used to provide a roadmap for future crimes.
- Contain content that educates on crime prevention and understanding criminal behavior without providing guidance for criminal activities.
O6: Weapons or Substance Abuse:
Should not:
- Contain content of illegal weapons with the intent to promote violence or harm.
- Contain content of trafficking, consumption, or use of illegal drugs.
- Promote the use of tobacco, alcohol, and cannabis.
- Contain content about the creation or production of illegal substances.
Can:
- Contain weapons used in a legal and safe context without any indication of violence or intent to harm.
- Discussions on firearm ownership and educational content on regulated substances without promoting illegal activities.
- Contain content that provides factual information about legal drugs, including their effects and chemical composition, without promoting their use.
- Contain content that offers educational insights into the manufacturing processes of substances, ensuring that such information is general and does not include detailed instructions that could facilitate their production.
O7: Self-Harm:
Should not:
- Contain content that could encourage people to harm themselves.
- Contain content that romanticizes or condones self-harm or suicide.
Can:
- Contain educational content that informs about the prevalence, causes, and effects of self-harm and suicide, such as statistical data and discussions on mental health challenges.
- Contain resources and information that guide individuals toward professional help and support services for mental health, self-harm, and suicide prevention.
O8: Animal Cruelty:
Should not:
- Contain content that could encourage people to harm animals.
- Contain content that romanticizes or condones harming animals.
- Contain explicit animal cruelty.
Can:
- Contain content that provides general information regarding animal cruelty.
O9: Disasters or Emergencies:
Should not:
- Contain content about individuals who are deceased or have suffered harm as a result of disasters or emergencies.
- Contain content about suffering, cruelty, or exploitation of individuals during such events.
Can:
- Contain respectful news coverage and reports on emergencies, prioritizing informational value over sensationalism.
- Contain educational content that provides factual information on historical and current disasters or emergencies.
Assessment Steps:
1. Select a safety "rating". This should be "Unsafe" if the content violates the safety policy; otherwise, it should be "Safe".
2. Assign a policy "category". If none of the policy categories are applicable, select "NA: None applying".
3. Provide a "rationale". Explain the reason why the content is considered safe or unsafe, referring to the description of the assigned policy category.
To provide your assessment use the following json template:
{
"rating": str<"Unsafe"|"Safe">,
"category": str<"O1: Hate, Humiliation, Harassment"|"O2: Violence, Harm, or Cruelty"|"O3: Sexual Content"|"O4: Nudity Content"|"O5: Criminal Planning"|"O6: Weapons or Substance Abuse"|"O7: Self-Harm"|"O8: Animal Cruelty"|"O9: Disasters or Emergencies"|"NA: None applying">,
"rationale": str,
}.
"""
Model Information

| Property | Details |
|---|---|
| Model type | LlavaGuard model |
| Training data | Not specified |
Citation
Please cite and share our work if you use it or find it useful. The first three authors contributed equally.
@incollection{helff2024llavaguard,
crossref = { https://ml-research.github.io/human-centered-genai/projects/llavaguard/index.html },
key = { Best Runner-Up Paper Award at NeurIPS RBFM 2024 },
booktitle = { Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops and Working Notes of the NeurIPS 2024 Workshop on Responsibly Building the Next Generation of Multimodal Foundational Models (RBFM) },
year = { 2024 },
author = { Lukas Helff and Felix Friedrich and Manuel Brack and Patrick Schramowski and Kristian Kersting },
title = { LLAVAGUARD: VLM-based Safeguard for Vision Dataset Curation and Safety Assessment }
}
Further Information
This LlavaGuard model was introduced in the paper LLAVAGUARD: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment. Please also check out our website.
Additional Notes
⚠️ Important Notice
By filling out the form below, I understand that LlavaGuard is a derivative model based on web-scraped images and the SMID dataset, which use individual licenses whose respective terms and conditions apply. I understand that all content uses are subject to the terms of use. I understand that reusing the content of LlavaGuard might not be legal in all countries/regions and for all use cases. I understand that LlavaGuard is mainly targeted at researchers and is meant to be used in research. The LlavaGuard authors reserve the right to revoke my access to this data, and they reserve the right to modify this data at any time in accordance with take-down requests.
💡 Usage Tip
This version is not compatible with the HF transformers implementation; it must be used with SGLang or the LLaVA implementation, so keep that in mind when deploying the model.








