base_model: facebook/w2v-bert-2.0
library_name: transformers
language:
- uk
license: "apache-2.0"
task_categories:
- automatic-speech-recognition
tags:
- audio
datasets:
- Yehor/openstt-uk
metrics:
- wer
model-index:
- name: w2v-bert-uk-v2.1
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_10_0
      type: common_voice_10_0
      config: uk
      split: test
      args: uk
    metrics:
    - name: WER
      type: wer
      value: 17.34
    - name: CER
      type: cer
      value: 3.33
w2v-bert-uk v2.1
Community
- Discord: https://bit.ly/discord-uds
- Ukrainian speech recognition chat: https://t.me/speech_recognition_uk
- Ukrainian speech synthesis chat: https://t.me/speech_synthesis_uk
See other Ukrainian models: https://github.com/egorsmkv/speech-recognition-uk
Overview
This model is the next iteration of https://huggingface.co/Yehor/w2v-bert-uk
Metrics
- Acoustic model (F16 precision):
  - Word error rate (WER): 0.1734 (17.34%)
  - Character error rate (CER): 0.0333 (3.33%)
- Word accuracy: 82.66%
- Character accuracy: 96.67%
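The accuracy figures above are consistent with 100% minus the corresponding error rate (100 - 17.34 = 82.66 and 100 - 3.33 = 96.67). As an illustration of how such numbers can be computed, here is a minimal sketch of WER and CER based on Levenshtein distance. The helper names and the sample sentences are hypothetical, and the text normalization used for the reported evaluation is not stated in this card, so the sketch applies none.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical reference/hypothesis pair with one wrong character in one word
reference = "добрий день усім вам"
hypothesis = "добрий день усим вам"

word_error = wer(reference, hypothesis)
char_error = cer(reference, hypothesis)
print(f"WER: {word_error:.4f}, word accuracy: {1 - word_error:.2%}")
print(f"CER: {char_error:.4f}, character accuracy: {1 - char_error:.2%}")
```

A single wrong character costs one whole word at the word level but only one character at the character level, which is why CER is typically much lower than WER for the same output.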
Demo
Visit https://huggingface.co/spaces/Yehor/w2v-bert-uk-v2.1-demo to try the model on your own audio files.
Usage
import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

# Model and runtime configuration
model_name = 'Yehor/w2v-bert-uk-v2.1'
# Fall back to CPU and float32 when no GPU is available
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
sampling_rate = 16_000

# Load the model and the processor
asr_model = AutoModelForCTC.from_pretrained(model_name, torch_dtype=torch_dtype).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

paths = [
    'sample_audio_1.wav',
]

# Read the audio files (expected at 16 kHz; sf.read does not resample)
audio_inputs = []
for path in paths:
    audio_input, _ = sf.read(path)
    audio_inputs.append(audio_input)

# Extract features and match the model's device and dtype
inputs = processor(audio_inputs, sampling_rate=sampling_rate).input_features
features = torch.tensor(inputs).to(device=device, dtype=torch_dtype)

# Run inference
with torch.inference_mode():
    logits = asr_model(features).logits

# Pick the most likely token per frame and decode to text
predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids)

print('Predictions:')
print(predictions)
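The last two steps of the usage code (argmax over the logits, then batch_decode) amount to greedy CTC decoding. The following is a minimal sketch of that idea in plain Python, not the tokenizer's actual implementation: the function name, the toy vocabulary, and the blank id of 0 are all assumptions made for illustration, and the real values come from the model's tokenizer files.

```python
def ctc_greedy_decode(ids, id_to_token, blank_id=0):
    """Collapse consecutive repeated ids, then drop blank tokens."""
    tokens = []
    previous = None
    for idx in ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank
vocab = {0: "<pad>", 1: "т", 2: "а", 3: "к"}

# Hypothetical per-frame argmax ids, one id per audio frame
frame_ids = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_decode(frame_ids, vocab))
```

The blank token is what lets CTC output genuinely doubled letters: the sequence [2, 0, 2] decodes to two characters, while [2, 2] collapses to one.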
Cite this work
@misc{smoliakov_2025,
  author    = { {Smoliakov} },
  title     = { w2v-bert-uk-v2.1 (Revision 094c59d) },
  year      = 2025,
  url       = { https://huggingface.co/Yehor/w2v-bert-uk-v2.1 },
  doi       = { 10.57967/hf/4554 },
  publisher = { Hugging Face }
}