SER-Odyssey-Baseline-WavLM-Valence open-source model: predicts speech valence on a 0-to-1 scale

SER Odyssey Baseline WavLM Valence

Developed by 3loi

A speech emotion recognition model based on the WavLM architecture. It predicts the valence of speech on a scale of roughly 0 to 1.

Audio classification

Transformers

Language: English
License: MIT
Tags: speech valence recognition, WavLM architecture, MSP-Podcast dataset

Downloads: 71

Released: 3/15/2024

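Model overview

This is the baseline model developed for the Odyssey 2024 Speech Emotion Recognition Challenge. It is a single-task model dedicated to valence prediction and was trained on the MSP-Podcast dataset.

Model features

Valence prediction

Predicts the valence of speech, with outputs between roughly 0 and 1.

Trained on MSP-Podcast

Trained on the MSP-Podcast dataset, a corpus built for speech emotion recognition research.

Challenge baseline

Serves as the baseline for the Odyssey 2024 Speech Emotion Recognition Challenge, giving other systems a reference point for comparison.

Model capabilities

Speech emotion recognition

Valence prediction

Audio classification

Use cases

Speech emotion analysis

Estimates the emotional valence of speech for affective computing and human-computer interaction.

Reaches a CCC of 0.607 on Test3 and 0.709 on the development set of the Odyssey challenge.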

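🚀 Audio classification model

This model is loaded through the audio classification interface and predicts the emotional valence of speech. It is a Transformer-based speech emotion recognition model that serves as the baseline for the Odyssey 2024 Speech Emotion Recognition Challenge.

🚀 Quick start

This model was trained on the MSP-Podcast dataset as a baseline for the Odyssey 2024 Speech Emotion Recognition Challenge. This particular model is a single-task valence specialist: it predicts valence in a range of approximately 0 to 1.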
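Because the output range is only approximately 0 to 1, a prediction may land slightly outside it. The sketch below shows one possible way to post-process a score: clamp it to [0, 1] and, optionally, map it onto a 1-to-7 rating scale. The function names `clamp_valence` and `to_seven_point` are hypothetical, and the linear mapping assumes the training target was a min-max rescaling of 1-to-7 annotations, which this page does not confirm.

```python
def clamp_valence(score, low=0.0, high=1.0):
    """Clip a raw prediction into [low, high]."""
    return max(low, min(high, score))


def to_seven_point(score):
    """Map a 0-to-1 score onto a 1-to-7 scale.

    ASSUMPTION: the 0-to-1 output is a linear rescaling of a 1-to-7
    rating scale. Verify this against the training code before relying
    on the converted values.
    """
    return 1.0 + 6.0 * clamp_valence(score)


if __name__ == "__main__":
    raw = 0.3670  # example value taken from the usage example on this page
    print(clamp_valence(raw))
    print(to_seven_point(raw))
```

With a PyTorch tensor of predictions, `pred.clamp(0.0, 1.0)` performs the same clipping for a whole batch.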

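📚 Documentation

Benchmarks

Concordance correlation coefficient (CCC) on Test3 and the development set of the Odyssey challenge.

Single-task setup	Test 3	Development
Valence (Val)	0.607	0.709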
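For readers unfamiliar with the metric, the concordance correlation coefficient measures both correlation and agreement in mean and scale between predictions and labels: CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The following is a minimal pure-Python sketch of that formula using population (divide-by-n) statistics. The function name `concordance_cc` is hypothetical, and this is not the challenge's official scoring script.

```python
from math import fsum


def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = fsum(y_true) / n
    mean_p = fsum(y_pred) / n
    # Population variances and covariance (divide by n, not n - 1)
    var_t = fsum((t - mean_t) ** 2 for t in y_true) / n
    var_p = fsum((p - mean_p) ** 2 for p in y_pred) / n
    cov = fsum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2.0 * cov / denom


if __name__ == "__main__":
    labels = [0.1, 0.5, 0.9]
    shifted = [0.3, 0.7, 1.1]  # perfectly correlated, but offset by 0.2
    print(concordance_cc(labels, labels))
    print(concordance_cc(labels, shifted))
```

Unlike Pearson correlation, a constant offset between predictions and labels lowers the CCC, which is why the shifted example scores below 1 even though it is perfectly correlated.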

For more details, see the demo, the paper, and the GitHub repository.

Citation

@InProceedings{Goncalves_2024,
            author={L. Goncalves and A. N. Salman and A. {Reddy Naini} and L. Moro-Velazquez and T. Thebaud and L. {Paola Garcia} and N. Dehak and B. Sisman and C. Busso},
            title={Odyssey2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results},
            booktitle={Odyssey 2024: The Speaker and Language Recognition Workshop},
            volume={To appear},
            year={2024},
            month={June},
            address={Quebec, Canada},
}

💻 Usage example

Basic usage

import librosa
import torch
from transformers import AutoModelForAudioClassification

# Load the model; trust_remote_code=True allows the repository's custom model code to run
model = AutoModelForAudioClassification.from_pretrained(
    "3loi/SER-Odyssey-Baseline-WavLM-Valence", trust_remote_code=True
)

# Mean and standard deviation used to normalize the waveform
mean = model.config.mean
std = model.config.std

# Load an audio file, resampled to the model's sampling rate
audio_path = "/path/to/audio.wav"
raw_wav, _ = librosa.load(audio_path, sr=model.config.sampling_rate)

# Normalize the audio by mean/std (the small epsilon avoids division by zero)
norm_wav = (raw_wav - mean) / (std + 0.000001)

# Build the mask: all ones, because a single clip has no padding
mask = torch.ones(1, len(norm_wav))

# Add a batch dimension
wavs = torch.tensor(norm_wav).unsqueeze(0)

# Predict without tracking gradients
with torch.no_grad():
    pred = model(wavs, mask)

print(model.config.id2label)
print(pred)
# {0: 'valence'}
# tensor([[0.3670]])
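The basic example handles one clip, where the mask is all ones. To score several clips of different lengths in one call, the clips would need padding to a common length, with the mask marking which samples are real. The sketch below illustrates that preparation step in plain Python. The helper name `pad_batch` is hypothetical, and it assumes the model follows the common convention of 1 for real samples and 0 for padding, which this page does not state explicitly.

```python
def pad_batch(waveforms, mean, std, eps=1e-6):
    """Normalize each waveform, zero-pad to the longest one, and build a 0/1 mask.

    ASSUMPTION: mask value 1.0 marks real samples and 0.0 marks padding.
    """
    if not waveforms:
        raise ValueError("waveforms must contain at least one clip")
    max_len = max(len(w) for w in waveforms)
    padded, mask = [], []
    for w in waveforms:
        # Same normalization as the single-clip example
        norm = [(x - mean) / (std + eps) for x in w]
        pad = max_len - len(norm)
        padded.append(norm + [0.0] * pad)
        mask.append([1.0] * len(norm) + [0.0] * pad)
    return padded, mask


if __name__ == "__main__":
    clips = [[0.2, 0.4, 0.6], [0.1]]  # toy "waveforms" of different lengths
    padded, mask = pad_batch(clips, mean=0.0, std=1.0)
    print(mask)
    # With the model from the basic example, the batch could then be scored as:
    #   wavs = torch.tensor(padded)
    #   pred = model(wavs, torch.tensor(mask))
```

Padding is applied after normalization so that padded positions stay at exactly zero rather than being shifted by the mean.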