emotions-analyzer-bert: open-source emotion analysis model that recognizes 28 emotions

Emotions Analyzer Bert

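Developed by logasanjeev

A multi-label emotion classification model fine-tuned from the BERT-base-uncased architecture, recognizing 28 emotions

Text Classification

Transformers

Language: English | License: MIT | #Multi-label emotion analysis #Reddit comment processing #Focal Loss optimization

Downloads: 3,764

Released: 4/12/2025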

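Model Overview

The model is trained on the GoEmotions dataset and is designed to detect multiple emotions in a piece of text. It suits emotion analysis tasks such as analyzing social media comments.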

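Model Features

Multi-label emotion classification

Detects several emotions in the same text, with support for 28 distinct emotion labels

Efficient inference support

Offers both PyTorch and ONNX inference to meet different performance needs

Optimized threshold handling

Applies optimized per-emotion classification thresholds to improve prediction accuracy

Emoji handling

Detects emoji in the text and converts them to descriptive text that serves as an emotional signal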

Model Capabilities

Emotion analysis

Multi-label classification

Text preprocessing

Emoji recognition

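Use Cases

Social media analysis

Reddit comment emotion analysis

Analyzes the emotional tone of Reddit user comments

Recognizes 28 different emotions, such as joy, anger, and sadness

Customer feedback analysis

Product review emotion analysis

Analyzes the emotions expressed in customer reviews of products or services

Helps identify customer satisfaction levels and potential problems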

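🚀 Emotion Analysis BERT Model

This project fine-tunes BERT-base-uncased on the GoEmotions dataset for multi-label classification over 28 emotions. This updated version improves the macro F1 score, supports ONNX for efficient inference, and adds visualizations for better interpretability.

🚀 Quick Start

You can get accurate predictions from the Gradio demo, which uses the optimized thresholds. Alongside the thresholded predictions, the demo now also shows the preprocessed text and the top 5 predicted emotions. Some example predictions: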

Input: "I’m thrilled to win this award! 😄"
- Output: excitement: 0.5836, joy: 0.5290
Input: "This is so frustrating, nothing works. 😣"
- Output: annoyance: 0.6147, anger: 0.4669
Input: "I feel so sorry for what happened. 😢"
- Output: sadness: 0.5321, remorse: 0.9107
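
The two views the demo reports, predictions that clear a per-emotion threshold and the top 5 emotions by score, can be sketched in plain Python. This is only an illustration: `select_emotions` is a hypothetical helper, the excitement and joy scores come from the first example above, and the remaining scores are made-up placeholders. The thresholds are the per-emotion values listed in the per-class table.

```python
def select_emotions(probs, thresholds, top_k=5):
    """Return (above_threshold, top_k) as lists of (label, prob), highest first."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    # Multi-label: every label is judged against its own threshold.
    above = [(label, p) for label, p in ranked if p >= thresholds[label]]
    return above, ranked[:top_k]


# excitement and joy are taken from the example; the rest are placeholders.
probs = {
    "excitement": 0.5836, "joy": 0.5290, "admiration": 0.21,
    "gratitude": 0.12, "pride": 0.08, "neutral": 0.03,
}
thresholds = {
    "excitement": 0.40, "joy": 0.45, "admiration": 0.45,
    "gratitude": 0.55, "pride": 0.40, "neutral": 0.40,
}

above, top5 = select_emotions(probs, thresholds)
print(above)  # labels that pass their own threshold
print(top5)   # five highest-scoring labels, regardless of threshold
```

Because each label has its own threshold, a label can appear in the top 5 without being reported as a prediction.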

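✨ Key Features

Built on the BERT-base-uncased architecture with 110 million parameters.
Trained on the GoEmotions dataset, which contains 58,000 Reddit comments and 28 emotions.
Uses Focal Loss as the loss function (alpha=1, gamma=2).
Uses the AdamW optimizer (learning rate 2e-5, weight decay 0.01).
Trained for 5 epochs with a batch size of 16 and a maximum sequence length of 128.
Trained on a Kaggle P100 GPU (16GB).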
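
Focal Loss scales the usual binary cross-entropy by (1 - p_t)^gamma, so labels the model already predicts confidently contribute little to the loss. Below is a minimal pure-Python sketch of the standard per-label form, FL = -alpha * (1 - p_t)^gamma * log(p_t), with the alpha=1 and gamma=2 settings listed above. The function names are illustrative, and averaging over labels is an assumption: this document does not state the exact formulation or reduction used during training.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_focal_loss(logit, target, alpha=1.0, gamma=2.0):
    """Focal loss for one label; target is 0 or 1."""
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p   # probability given to the true class
    p_t = max(p_t, 1e-12)                 # guard against log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(logits, targets, alpha=1.0, gamma=2.0):
    """Mean focal loss over all labels of one example (assumed reduction)."""
    losses = [binary_focal_loss(z, y, alpha, gamma) for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)


# A confident correct label (logit 3.0) is penalized far less than a wrong one (logit -3.0).
print(binary_focal_loss(3.0, 1), binary_focal_loss(-3.0, 1))
print(multilabel_focal_loss([3.0, -2.0, 0.5], [1, 0, 0]))
```

With gamma=0 the expression reduces to plain binary cross-entropy, which is a convenient sanity check.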

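📦 Installation

Quick Inference (PyTorch recommended)

The simplest way to use the model with PyTorch is to fetch inference.py from the repository programmatically and call it. The script handles all preprocessing, model loading, and inference for you.

Programmatic Download and Inference

Run the following Python script to download inference.py and make a prediction: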

# Notebook cell syntax; in a terminal, run the same command without the leading "!".
!pip install transformers torch huggingface_hub emoji -q

import shutil
import os
from huggingface_hub import hf_hub_download
from importlib import import_module

# Download inference.py from the model repository into the local Hugging Face cache.
repo_id = "logasanjeev/emotions-analyzer-bert"
local_file = hf_hub_download(repo_id=repo_id, filename="inference.py")

# Copy the script into the working directory so that import_module can find it.
current_dir = os.getcwd()
destination = os.path.join(current_dir, "inference.py")
shutil.copy(local_file, destination)

inference_module = import_module("inference")
predict_emotions = inference_module.predict_emotions

text = "I’m thrilled to win this award! 😄"
result, processed = predict_emotions(text)
print(f"Input: {text}")
print(f"Processed: {processed}")
print("Predicted Emotions:")
print(result)

Expected output:

Input: I’m thrilled to win this award! 😄
Processed: i’m thrilled to win this award ! grinning_face_with_smiling_eyes
Predicted Emotions:
excitement: 0.5836
joy: 0.5290
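
As an alternative to copying the script into the working directory, a Python file can be imported straight from its path with the standard library's importlib.util. The sketch below is an illustration under stated assumptions, not part of the repository: `load_module_from_path` is a hypothetical helper, and it is demonstrated on a throwaway stand-in file rather than the real inference.py.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(path, module_name=None):
    """Import a Python source file from an arbitrary path (hypothetical helper)."""
    path = Path(path)
    name = module_name or path.stem
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demonstration on a stand-in file; with the real script, pass the path
# returned by hf_hub_download(...) instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_inference.py"
    demo_path.write_text(
        "def predict_emotions(text):\n"
        "    return text.upper(), text.lower()\n"
    )
    demo = load_module_from_path(demo_path)
    result, processed = demo.predict_emotions("Hello")
    print(result, processed)
```

This avoids leaving a copy of the script in the working directory, at the cost of a few more lines of setup.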

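Alternative: Manual Download

If you prefer to download inference.py manually, follow these steps:

Install the required dependencies:

pip install transformers torch huggingface_hub emoji

Download inference.py from the repository.
Use it from Python or from the command line.

Python example: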

from inference import predict_emotions

text = "I’m thrilled to win this award! 😄"
result, processed = predict_emotions(text)
print(f"Input: {text}")
print(f"Processed: {processed}")
print("Predicted Emotions:")
print(result)

Command-line example:

python inference.py "I’m thrilled to win this award! 😄"

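ONNX Inference

For faster, more efficient inference with ONNX, use onnx_inference.py. The script runs inference through ONNX Runtime, which is typically more lightweight than PyTorch.

Programmatic Download and Inference

Run the following Python script to download onnx_inference.py and make a prediction: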

# Notebook cell syntax; in a terminal, run the same command without the leading "!".
!pip install transformers onnxruntime huggingface_hub emoji numpy -q

import shutil
import os
from huggingface_hub import hf_hub_download
from importlib import import_module

# Download onnx_inference.py from the model repository into the local Hugging Face cache.
repo_id = "logasanjeev/emotions-analyzer-bert"
local_file = hf_hub_download(repo_id=repo_id, filename="onnx_inference.py")

# Copy the script into the working directory so that import_module can find it.
current_dir = os.getcwd()
destination = os.path.join(current_dir, "onnx_inference.py")
shutil.copy(local_file, destination)

onnx_inference_module = import_module("onnx_inference")
predict_emotions = onnx_inference_module.predict_emotions

text = "I’m thrilled to win this award! 😄"
result, processed = predict_emotions(text)
print(f"Input: {text}")
print(f"Processed: {processed}")
print("Predicted Emotions:")
print(result)

Expected output:

Input: I’m thrilled to win this award! 😄
Processed: i’m thrilled to win this award ! grinning_face_with_smiling_eyes
Predicted Emotions:
excitement: 0.5836
joy: 0.5290

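Alternative: Manual Download

If you prefer to download onnx_inference.py manually, follow these steps:

Install the required dependencies:

pip install transformers onnxruntime huggingface_hub emoji numpy

Download onnx_inference.py from the repository.
Use it from Python or from the command line.

Python example: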

from onnx_inference import predict_emotions

text = "I’m thrilled to win this award! 😄"
result, processed = predict_emotions(text)
print(f"Input: {text}")
print(f"Processed: {processed}")
print("Predicted Emotions:")
print(result)

Command-line example:

python onnx_inference.py "I’m thrilled to win this award! 😄"

💻 Usage Examples

Basic Usage

from transformers import BertForSequenceClassification, BertTokenizer
import torch
import requests
import re
import emoji

def preprocess_text(text):
    """Mask Reddit-specific tokens, convert emoji to text names, and lowercase."""
    text = re.sub(r'u/\w+', '[USER]', text)             # user mentions
    text = re.sub(r'r/\w+', '[SUBREDDIT]', text)        # subreddit references
    text = re.sub(r'http[s]?://\S+', '[URL]', text)     # links
    text = emoji.demojize(text, delimiters=(" ", " "))  # emoji -> descriptive name
    text = text.lower()
    return text

repo_id = "logasanjeev/emotions-analyzer-bert"
model = BertForSequenceClassification.from_pretrained(repo_id)
tokenizer = BertTokenizer.from_pretrained(repo_id)

# Load the label names and the per-emotion decision thresholds.
thresholds_url = f"https://huggingface.co/{repo_id}/raw/main/optimized_thresholds.json"
response = requests.get(thresholds_url, timeout=30)
response.raise_for_status()
thresholds_data = response.json()
emotion_labels = thresholds_data["emotion_labels"]
thresholds = thresholds_data["thresholds"]

text = "I’m just chilling today."
processed_text = preprocess_text(text)
encodings = tokenizer(processed_text, padding='max_length', truncation=True, max_length=128, return_tensors='pt')
with torch.no_grad():
    # Sigmoid (not softmax): each emotion is scored independently.
    probs = torch.sigmoid(model(**encodings).logits).numpy()[0]

# Keep only the emotions whose probability reaches their own threshold.
predictions = [
    (emotion_labels[i], round(float(prob), 4))
    for i, (prob, thresh) in enumerate(zip(probs, thresholds))
    if prob >= thresh
]
predictions = sorted(predictions, key=lambda x: x[1], reverse=True)
print(predictions)
# Output: [('neutral', 0.8147)]

Advanced Usage

# Continues from the basic example: reuses repo_id, requests, tokenizer,
# preprocess_text, emotion_labels, and thresholds.
import onnxruntime as ort
import numpy as np

# Use /resolve/ rather than /raw/: for files stored with Git LFS,
# /raw/ returns only the small pointer file, not the model itself.
onnx_url = f"https://huggingface.co/{repo_id}/resolve/main/model.onnx"
response = requests.get(onnx_url, timeout=300)
response.raise_for_status()
with open("model.onnx", "wb") as f:
    f.write(response.content)

text = "I’m thrilled to win this award! 😄"
processed_text = preprocess_text(text)
encodings = tokenizer(processed_text, padding='max_length', truncation=True, max_length=128, return_tensors='np')
session = ort.InferenceSession("model.onnx")
inputs = {
    'input_ids': encodings['input_ids'].astype(np.int64),
    'attention_mask': encodings['attention_mask'].astype(np.int64)
}
logits = session.run(None, inputs)[0][0]
probs = 1 / (1 + np.exp(-logits))  # sigmoid turns logits into per-emotion probabilities
predictions = [
    (emotion_labels[i], round(float(prob), 4))
    for i, (prob, thresh) in enumerate(zip(probs, thresholds))
    if prob >= thresh
]
predictions = sorted(predictions, key=lambda x: x[1], reverse=True)
print(predictions)
# Output: [('excitement', 0.5836), ('joy', 0.5290)]

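📚 Detailed Documentation

Model Details

Property	Details
Model architecture	BERT-base-uncased (110 million parameters)
Training data	GoEmotions (58,000 Reddit comments, 28 emotions)
Loss function	Focal Loss (alpha=1, gamma=2)
Optimizer	AdamW (learning rate 2e-5, weight decay 0.01)
Epochs	5
Batch size	16
Maximum length	128
Hardware	Kaggle P100 GPU (16GB)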

Performance Metrics

Micro F1: 0.6006 (with optimized thresholds)
Macro F1: 0.5390
Precision: 0.5371
Recall: 0.6812
Hamming loss: 0.0377
Average number of positive predictions: 1.4789
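
The reported precision and recall are consistent with the micro F1 score: their harmonic mean works out to 0.6006, which suggests (though this document does not say so explicitly) that both are micro-averaged. The sketch below checks that arithmetic and shows how Hamming loss is defined on a tiny made-up label matrix. The helper names and the toy data are illustrative only.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


micro_f1 = f1_from_precision_recall(0.5371, 0.6812)
print(f"{micro_f1:.4f}")  # 0.6006

# Toy data: 2 samples x 4 labels, with 2 of the 8 slots wrong.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0]]
y_pred = [[1, 0, 1, 1], [0, 0, 0, 0]]
print(hamming_loss(y_true, y_pred))  # 0.25
```

A Hamming loss of 0.0377 therefore means that, on average, fewer than 4% of the 28 label decisions per comment are wrong, which is easy to achieve when most labels are absent for most comments.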

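For detailed evaluation metrics, including per-class accuracy, precision, recall, F1 score, MCC, support, and thresholds, as well as visualizations, see the Kaggle notebook.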

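Per-Class Performance

The table below shows per-class metrics on the test set using the optimized thresholds (see optimized_thresholds.json):

Emotion	Accuracy	Precision	Recall	F1 Score	MCC	Support	Threshold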
admiration	0.9410	0.6649	0.7361	0.6987	0.6672	504	0.4500
amusement	0.9801	0.7635	0.8561	0.8071	0.7981	264	0.4500
anger	0.9694	0.6176	0.4242	0.5030	0.4970	198	0.4500
annoyance	0.9121	0.3297	0.4750	0.3892	0.3502	320	0.3500
approval	0.8843	0.2966	0.5755	0.3915	0.3572	351	0.3500
caring	0.9759	0.5196	0.3926	0.4473	0.4396	135	0.4500
confusion	0.9711	0.4861	0.4575	0.4714	0.4567	153	0.4500
curiosity	0.9368	0.4442	0.8275	0.5781	0.5783	284	0.4000
desire	0.9865	0.5714	0.4819	0.5229	0.5180	83	0.4000
disappointment	0.9565	0.2906	0.3907	0.3333	0.3150	151	0.3500
disapproval	0.9235	0.3405	0.5918	0.4323	0.4118	267	0.3500
disgust	0.9810	0.6250	0.4065	0.4926	0.4950	123	0.5500
embarrassment	0.9947	0.7000	0.3784	0.4912	0.5123	37	0.5000
excitement	0.9790	0.4486	0.4660	0.4571	0.4465	103	0.4000
fear	0.9836	0.4599	0.8077	0.5860	0.6023	78	0.3000
gratitude	0.9888	0.9450	0.8778	0.9102	0.9049	352	0.5500
grief	0.9985	0.3333	0.3333	0.3333	0.3326	6	0.3000
joy	0.9768	0.6061	0.6211	0.6135	0.6016	161	0.4500
love	0.9825	0.7826	0.8319	0.8065	0.7978	238	0.5000
nervousness	0.9952	0.4348	0.4348	0.4348	0.4324	23	0.4000
optimism	0.9689	0.5436	0.5699	0.5564	0.5405	186	0.4000
pride	0.9980	0.8571	0.3750	0.5217	0.5662	16	0.4000
realization	0.9737	0.5217	0.1655	0.2513	0.2838	145	0.4500
relief	0.9982	0.5385	0.6364	0.5833	0.5845	11	0.3000
remorse	0.9912	0.5426	0.9107	0.6800	0.6992	56	0.3500
sadness	0.9757	0.5845	0.5321	0.5570	0.5452	156	0.4500
surprise	0.9724	0.4772	0.6667	0.5562	0.5504	141	0.3500
neutral	0.7485	0.5821	0.8372	0.6867	0.5102	1787	0.4000
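
Macro F1 is the unweighted mean of the per-class F1 scores, so it can be recomputed from the table. The sketch below copies the F1 column from the table above and reproduces the reported macro F1 of 0.5390. It also lists the weakest classes, which shows how rare or ambiguous emotions pull the macro average down. The variable names are illustrative.

```python
# F1 scores copied from the per-class table.
per_class_f1 = {
    "admiration": 0.6987, "amusement": 0.8071, "anger": 0.5030,
    "annoyance": 0.3892, "approval": 0.3915, "caring": 0.4473,
    "confusion": 0.4714, "curiosity": 0.5781, "desire": 0.5229,
    "disappointment": 0.3333, "disapproval": 0.4323, "disgust": 0.4926,
    "embarrassment": 0.4912, "excitement": 0.4571, "fear": 0.5860,
    "gratitude": 0.9102, "grief": 0.3333, "joy": 0.6135,
    "love": 0.8065, "nervousness": 0.4348, "optimism": 0.5564,
    "pride": 0.5217, "realization": 0.2513, "relief": 0.5833,
    "remorse": 0.6800, "sadness": 0.5570, "surprise": 0.5562,
    "neutral": 0.6867,
}

# Every class counts equally, regardless of how many examples it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.4f}")  # 0.5390

# The three lowest-scoring emotions.
weakest = sorted(per_class_f1, key=per_class_f1.get)[:3]
print(weakest)
```

Because every class has equal weight, grief (support 6) influences macro F1 as much as neutral (support 1787), whereas micro F1 is dominated by the frequent classes.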