Base models:
- antony66/whisper-large-v3-russian
- bond005/whisper-large-v3-ru-podlodka
Language:
- Russian (ru)
Library: transformers
Tags:
- automatic speech recognition (asr)
- whisper
- russian
- mergekit
- model merging
Datasets:
- mozilla-foundation/common_voice_17_0
- bond005/taiga_speech_v2
- bond005/podlodka_speech
- bond005/rulibrispeech
Metrics:
- word error rate (wer)
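Model Details
This model was produced by merging the two base models with the TIES merge method.
Merge method: ties
Parameters:
  Density: 0.9
  Encoder weights:
  - 0.8
  - 0.2
  Decoder weights:
  - 0.2
  - 0.8
Model sources:
  Model A: "/mnt/cloud/llm/whisper/whisper-large-v3-russian"
  Model B: "/mnt/cloud/llm/whisper/whisper-large-v3-ru-podlodka"
Output directory: "/mnt/cloud/llm/whisper/whisper-large-v3-russian-ties-podlodka"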
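To make the configuration above more concrete, here is a minimal sketch of what a TIES merge does to a single weight tensor: trim each model's change to its largest entries, elect a sign per entry, then average only the entries that agree with that sign. It is a toy illustration on flat Python lists, not the code used to build this model. The function name `ties_merge` is hypothetical, and the `base` argument is an assumption: the configuration does not name a base checkpoint, so the sketch simply takes one as input.

```python
def ties_merge(base, finetuned, weights, density):
    """Toy TIES merge over flat lists of floats (one list per model).

    Assumes all weights are positive and every list has the same length.
    """
    n = len(base)
    # Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[ft[i] - base[i] for i in range(n)] for ft in finetuned]

    # 1) Trim: keep only the top `density` fraction of entries by magnitude.
    keep = int(round(density * n))
    trimmed = []
    for d in deltas:
        top = set(sorted(range(n), key=lambda i: abs(d[i]), reverse=True)[:keep])
        trimmed.append([d[i] if i in top else 0.0 for i in range(n)])

    merged = []
    for i in range(n):
        contrib = [w * t[i] for w, t in zip(weights, trimmed)]
        # 2) Elect sign: the sign of the weighted sum wins for this entry.
        total = sum(contrib)
        # 3) Disjoint merge: weighted mean of the entries that share that sign.
        agree = [(c, w) for c, w in zip(contrib, weights) if c * total > 0]
        weight_sum = sum(w for _, w in agree)
        delta = sum(c for c, _ in agree) / weight_sum if weight_sum else 0.0
        merged.append(base[i] + delta)
    return merged


# Hypothetical toy tensors standing in for one encoder and one decoder weight.
base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -2.0, 0.5, 0.0]
model_b = [1.0, 2.0, -0.1, 3.0]
encoder_like = ties_merge(base, [model_a, model_b], weights=[0.8, 0.2], density=0.9)
decoder_like = ties_merge(base, [model_a, model_b], weights=[0.2, 0.8], density=0.9)
```

With the weights listed above, encoder tensors lean toward model A and decoder tensors lean toward model B wherever the two models disagree on the sign of a change.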
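Simple API Server
The model can be served through a simple OpenAPI-compatible server: https://github.com/kreolsky/whisper-api-server/
Usage
For phone call recordings, it is strongly recommended to preprocess the audio volume before running speech recognition. For example: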
sox record.wav -r 8000 record-normalized.wav norm -0.5 compand 0.3,1 -90,-90,-70,-50,-40,-15,0,0 -7 0 0.15
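When many recordings need the same treatment, the command above can be driven from Python. The sketch below only wraps the exact sox arguments quoted above; the helper names `build_sox_command` and `normalize_recording` are hypothetical, and it assumes the `sox` binary is installed and on `PATH`.

```python
import shlex
import subprocess


def build_sox_command(src, dst, rate=8000):
    """Return the argv list for the normalization command quoted above."""
    return [
        "sox", src, "-r", str(rate), dst,
        "norm", "-0.5",
        "compand", "0.3,1", "-90,-90,-70,-50,-40,-15,0,0", "-7", "0", "0.15",
    ]


def normalize_recording(src, dst):
    """Run sox on one file; raises CalledProcessError if sox fails."""
    cmd = build_sox_command(src, dst)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `normalize_recording("record.wav", "record-normalized.wav")` runs the same command as the shell example. Passing the arguments as a list avoids shell quoting problems with unusual file names.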
Example speech recognition code:
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

torch_dtype = torch.bfloat16

# Pick the best available device: CUDA, then Apple MPS, then CPU.
device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
    # Monkey patch: report torch.distributed as not initialized.
    setattr(torch.distributed, "is_initialized", lambda: False)
device = torch.device(device)

# Load the model weights and the matching processor (tokenizer + feature extractor).
whisper = WhisperForConditionalGeneration.from_pretrained(
    "antony66/whisper-large-v3-russian",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
processor = WhisperProcessor.from_pretrained("antony66/whisper-large-v3-russian")

# Build a chunked ASR pipeline: 30-second chunks, processed 16 at a time.
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model=whisper,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=256,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)
# Read the normalized recording as raw bytes; the pipeline accepts bytes and decodes them with ffmpeg.
with open('record-normalized.wav', 'rb') as f:
    wav = f.read()

asr = asr_pipeline(wav, generate_kwargs={"language": "russian", "max_new_tokens": 256}, return_timestamps=False)
print(asr['text'])
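The metadata lists word error rate (WER) as the evaluation metric. To check a transcript such as `asr['text']` against a reference you trust, a minimal sketch of the standard word-level edit distance is given below. The function name `word_error_rate` is hypothetical, and the sketch does no text normalization (case, punctuation), which is an assumption that may not match how the reported metric was computed.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate(reference_text, asr['text'])` gives the error rate for one recording, where `reference_text` is a manually checked transcript.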
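Development Status
This model is still under development. The main goal is to optimize speech recognition for phone calls. If you have a suitable high-quality dataset or would like to contribute, please get in touch; your help would greatly advance the project.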