HateBERT open-source model - free detection of abusive language in English for healthier online communication

Hatebert

Developed by GroNLP

HateBERT is a pre-trained BERT model for abusive language detection in English, trained on posts from banned Reddit communities.

Large language model

Transformers

English #Reddit hate speech detection #English offensive language identification #BERT fine-tuning

Downloads: 5,818

Released: 3/2/2022

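Model overview

HateBERT is an English pre-trained BERT model adapted for detecting abusive language, hate speech, and offensive language.

Model features

Specialized for abusive language

The model was further trained on more than 1 million posts from banned Reddit communities, which adapts it to the vocabulary and style of abusive language.

Outperforms general-purpose BERT

On all three English datasets evaluated in the paper (offensive language, abusive language, and hate speech detection), HateBERT outperforms the corresponding general BERT model.

Portability study

The accompanying paper compares how well fine-tuned models transfer across datasets. The results suggest that portability depends on how compatible the annotated phenomena are between datasets.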
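A portability comparison of this kind can be sketched as a grid in which each fine-tuned model is scored on every test set, including those it was not trained on. The sketch below is illustrative only: macro-averaged F1 is an assumed metric, `portability_matrix` is a hypothetical helper, and the two stand-in predictors and the toy labels are invented so the example runs without any model.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over labels seen in gold or pred."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def portability_matrix(
    predictors: Dict[str, Callable[[List[str]], List[int]]],
    test_sets: Dict[str, Tuple[List[str], List[int]]],
) -> Dict[Tuple[str, str], float]:
    """Score every predictor (hypothetical helper) on every test set."""
    return {
        (model_name, test_name): macro_f1(gold, predict(texts))
        for model_name, predict in predictors.items()
        for test_name, (texts, gold) in test_sets.items()
    }


# Toy stand-ins (invented for illustration, not real models or data).
predictors = {
    "keyword": lambda texts: [int("bad" in t) for t in texts],
    "always_positive": lambda texts: [1] * len(texts),
}
test_sets = {
    "set_1": (["bad x", "ok y"], [1, 0]),
    "set_2": (["bad z", "fine", "rude"], [1, 0, 1]),
}

matrix = portability_matrix(predictors, test_sets)
for (model_name, test_name), score in sorted(matrix.items()):
    print(f"{model_name} -> {test_name}: {score:.3f}")
```

In a real study, each predictor would wrap a model fine-tuned on one dataset, and the off-diagonal cells of the grid would show how well that model carries over to the others.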

Model capabilities

Text classification

Abusive language detection

Hate speech identification

Offensive language identification

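Use cases

Content moderation

Social media content filtering

Automatically identify and filter abusive and hateful content on social media platforms.

Improves moderation efficiency and reduces the manual review workload.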
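One way such filtering could reduce manual work is to act automatically only on high-confidence cases and send borderline ones to human reviewers. The following is a minimal sketch under stated assumptions: `triage`, the `Decision` record, and both thresholds are hypothetical, and `toy_score` is a stand-in for a fine-tuned classifier that returns the probability that a text is abusive.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    text: str
    score: float  # estimated probability that the text is abusive
    action: str   # "remove", "review", or "allow"


def triage(
    texts: List[str],
    score_fn: Callable[[str], float],
    remove_at: float = 0.9,   # hypothetical threshold for automatic removal
    review_at: float = 0.5,   # hypothetical threshold for human review
) -> List[Decision]:
    """Assign each text an action based on its abusiveness score."""
    if not 0.0 <= review_at <= remove_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= remove_at <= 1")
    decisions = []
    for text in texts:
        score = score_fn(text)
        if score >= remove_at:
            action = "remove"
        elif score >= review_at:
            action = "review"
        else:
            action = "allow"
        decisions.append(Decision(text, score, action))
    return decisions


def toy_score(text: str) -> float:
    """Stand-in scorer with made-up values; a real classifier would go here."""
    return {"post a": 0.95, "post b": 0.60, "post c": 0.10}[text]


decisions = triage(["post a", "post b", "post c"], toy_score)
print([d.action for d in decisions])  # ['remove', 'review', 'allow']
```

The thresholds would need to be tuned on held-out data for each platform, since the cost of wrongly removing a post differs from the cost of missing an abusive one.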

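Academic research

Research on online language behavior

Study patterns of abusive language in online communities and how they change over time.

Provides a quantitative analysis tool for sociology and linguistics.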

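🚀 HateBERT model

HateBERT is a pre-trained model for English text classification. It focuses on detecting abusive language, hate speech, and offensive language in English text, and can support efforts to keep online discussion healthy.

🚀 Quick start

HateBERT is an English pre-trained BERT model obtained by further training the English BERT base uncased model on more than 1 million posts from banned Reddit communities. The model was developed jointly by the University of Groningen, the University of Turin, and the University of Passau.

For details, see the paper published at WOAH 2021. The code and the fine-tuned models are available on OSF.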
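As a starting point, the checkpoint can be loaded with the Hugging Face Transformers library. This is a minimal sketch that assumes the model is published on the Hugging Face Hub as `GroNLP/hateBERT` and that this checkpoint is the re-trained language model, not one of the fine-tuned classifiers hosted on OSF. The helper names `load_fill_mask` and `load_classifier` are hypothetical.

```python
"""Minimal sketch: load the HateBERT checkpoint with Hugging Face Transformers."""

# Assumption: the checkpoint is available on the Hugging Face Hub under this id.
MODEL_ID = "GroNLP/hateBERT"


def load_fill_mask(model_id: str = MODEL_ID):
    """Return a fill-mask pipeline for the re-trained language model."""
    # Imported lazily so that defining this helper does not need the library.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def load_classifier(model_id: str = MODEL_ID, num_labels: int = 2):
    """Return (tokenizer, model) as a starting point for fine-tuning.

    The classification head is newly initialised, so its predictions are
    not meaningful until the model has been fine-tuned on labelled data.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=num_labels
    )
    return tokenizer, model


# Example usage (downloads the weights on first call):
#   unmasker = load_fill_mask()
#   for candidate in unmasker("You are a [MASK] person."):
#       print(candidate["token_str"], round(candidate["score"], 3))
```

For detection tasks, the model would first need to be fine-tuned on a labelled dataset, for example via `load_classifier` followed by a standard training loop, or replaced by one of the fine-tuned models from OSF.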

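📚 Detailed documentation

Author information

Tommaso Caselli, Valerio Basile, Jelena Mitrović, Michael Granitzer

BibTeX citation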

@inproceedings{caselli-etal-2021-hatebert,
    title = "{H}ate{BERT}: Retraining {BERT} for Abusive Language Detection in {E}nglish",
    author = "Caselli, Tommaso  and
      Basile, Valerio  and
      Mitrovi{\'c}, Jelena  and
      Granitzer, Michael",
    booktitle = "Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.woah-1.3",
    doi = "10.18653/v1/2021.woah-1.3",
    pages = "17--25",
    abstract = "We introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have curated and made available to the public. We present the results of a detailed comparison between a general pre-trained language model and the retrained version on three English datasets for offensive, abusive language and hate speech detection tasks. In all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the fine-tuned models across the datasets, suggesting that portability is affected by compatibility of the annotated phenomena.",
}