---
license: cc-by-sa-4.0
datasets:
- Kostya165/ru_emotion_dvach
language:
- ru
metrics:
- accuracy
base_model:
- cointegrated/rubert-tiny2
pipeline_tag: text-classification
tags:
- russian
- emotion
- sentiment
- sentiment-analysis
- emotion-analysis
- emotion-classification
- emotion-detection
- rubert
- rubert-tiny
---

# rubert_tiny2_russian_emotion_sentiment
## Model Description

rubert_tiny2_russian_emotion_sentiment is an emotion classification model fine-tuned from the lightweight cointegrated/rubert-tiny2. It recognizes five emotions in Russian text:

- 0: aggression (агрессия)
- 1: anxiety (тревожность)
- 2: neutral (нейтральное состояние)
- 3: positive (позитив)
- 4: sarcasm (сарказм)
## Validation Results

| Metric   | Value  |
|----------|--------|
| Accuracy | 0.8911 |
| Macro F1 | 0.8910 |
| Micro F1 | 0.8911 |
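Macro F1 averages the per-class F1 scores with equal weight per class, while micro F1 pools all true/false positives across classes; for single-label multiclass tasks like this one, micro F1 equals accuracy, which is why the table shows the same value for both. A small from-scratch illustration in pure Python (the labels below are hypothetical, not the model's outputs):

```python
def f1_scores(y_true, y_pred, n_classes):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    per_class = []
    tp_total = fp_total = fn_total = 0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tp_total += tp
        fp_total += fp
        fn_total += fn
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    macro = sum(per_class) / n_classes           # unweighted mean over classes
    micro = 2 * tp_total / (2 * tp_total + fp_total + fn_total)  # pooled
    return macro, micro

# Toy example with 3 classes: one of five predictions is wrong.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
macro, micro = f1_scores(y_true, y_pred, n_classes=3)
print(round(macro, 4), round(micro, 4))  # 0.8222 0.8 — micro F1 == accuracy (4/5)
```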
Per-class accuracy:

- aggression (0): 0.9120
- anxiety (1): 0.9462
- neutral (2): 0.8663
- positive (3): 0.8884
- sarcasm (4): 0.8426
## Usage

```bash
pip install transformers torch
```

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "Kostya165/rubert_tiny2_russian_emotion_sentiment"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# The model expects Russian input.
texts = [
    "Сегодня чудесный день!",            # "What a wonderful day!"
    "Всё это приводит меня в ярость.",   # "All of this makes me furious."
]
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
preds = logits.argmax(dim=-1).tolist()
id2label = model.config.id2label
labels = [id2label[p] for p in preds]
print(labels)
```
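If you need confidence scores rather than only the top label, apply a softmax over the logits. A minimal sketch on dummy logits (no model download required); the 5-class shape matches the label set above, and the values are invented for illustration:

```python
import torch

# Dummy logits for two texts over the five classes
# (aggression, anxiety, neutral, positive, sarcasm).
logits = torch.tensor([
    [0.2, 0.1, 0.3, 2.5, 0.1],   # clearly "positive"
    [2.0, 0.4, 0.2, 0.1, 1.8],   # "aggression" narrowly beats "sarcasm"
])

probs = torch.softmax(logits, dim=-1)           # each row sums to 1
preds = probs.argmax(dim=-1).tolist()           # index of the top class
confidences = probs.max(dim=-1).values.tolist() # probability of the top class

print(preds)  # [3, 0]
```

A near-tie between classes (as in the second row) is a useful signal to treat the prediction as uncertain.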
## Training Details

- Base model: cointegrated/rubert-tiny2
- Training data: Kostya165/ru_emotion_dvach
- Epochs: 2
- Batch size: 32
- Learning rate: 1e-5
- Mixed precision: FP16
- Regularization: dropout 0.1, weight_decay 0.01, warmup_ratio 0.1
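The hyperparameters above map directly onto a `transformers.TrainingArguments` configuration. A sketch of that setup, assuming the standard Trainer API was used (the output directory name is a placeholder; the card does not state the exact training script):

```python
import torch
from transformers import TrainingArguments

# Hyperparameters as reported in this card.
args = TrainingArguments(
    output_dir="rubert_tiny2_emotion_ft",  # placeholder path
    num_train_epochs=2,
    per_device_train_batch_size=32,
    learning_rate=1e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,                      # 10% of steps as LR warmup
    fp16=torch.cuda.is_available(),        # FP16 per the card; needs a GPU
)
```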
## Requirements

```
transformers>=4.30.0
torch>=1.10.0
datasets
evaluate
```
## License

CC-BY-SA 4.0
## Citation

```bibtex
@misc{rubert_tiny2_russian_emotion_sentiment,
  title = {Russian Emotion Sentiment Classification with RuBERT-tiny2},
  author = {Kostya165},
  year = {2024},
  howpublished = {\url{https://huggingface.co/Kostya165/rubert_tiny2_russian_emotion_sentiment}}
}
```