Whisper Large V3 Russian Ties Podlodka V1.2

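Developed by Apel-sin

A Russian speech recognition model that merges two Russian variants of Whisper-large-v3 with the TIES method. It is optimized for telephone recordings.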

Speech Recognition

Transformers

Other

#RussianTelephonyASR #TIESMerge #LowResourceOptimization

Downloads: 2,408

Released: 4/2/2025

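Model overview

This model uses the TIES merge method to combine two Russian Whisper models. The goal is higher Russian speech recognition accuracy, with particular attention to phone-call audio.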

Model features

TIES merging

Uses the TIES merge method with a density of 0.9 and separate encoder/decoder weights (0.8/0.2 and 0.2/0.8).
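TIES merging works in three steps. First, each task vector is trimmed to its largest-magnitude entries. Second, a sign is elected for each parameter. Third, only the entries that agree with that sign are averaged. The NumPy sketch below illustrates these steps. It is a simplified illustration, not mergekit's implementation, and rests on these assumptions:

- Both fine-tuned models share a base checkpoint, and task vectors are computed from it. This card does not name the base.
- `density` is the fraction of entries kept.
- Agreeing models are combined with a weighted mean.

`trim` and `ties_merge` are hypothetical names.

```python
import numpy as np


def trim(tau: np.ndarray, density: float) -> np.ndarray:
    """Keep roughly the top `density` fraction of entries by magnitude; zero the rest."""
    k = int(round(density * tau.size))
    if k <= 0:
        return np.zeros_like(tau)
    threshold = np.sort(np.abs(tau).ravel())[-k]
    return np.where(np.abs(tau) >= threshold, tau, 0.0)


def ties_merge(base, finetuned, weights, density):
    """Simplified TIES merge of several fine-tuned tensors onto a shared base tensor (assumed)."""
    # 1. Task vectors: difference from the base, trimmed by magnitude, then scaled by model weight
    taus = np.stack([w * trim(ft - base, density) for ft, w in zip(finetuned, weights)])
    # 2. Sign election: the sign of the weighted sum decides each parameter's direction
    elected = np.sign(taus.sum(axis=0))
    # 3. Disjoint merge: only non-zero entries that agree with the elected sign contribute
    agree = (np.sign(taus) == elected) & (taus != 0)
    w = np.asarray(weights, dtype=float).reshape((-1,) + (1,) * base.ndim)
    numerator = (taus * agree).sum(axis=0)
    denominator = (w * agree).sum(axis=0)
    merged = np.divide(numerator, denominator,
                       out=np.zeros_like(numerator), where=denominator != 0)
    return base + merged
```

Where the two models disagree in sign on a parameter, the model whose weighted update dominates the sum decides the direction. The other model's conflicting update is dropped instead of being averaged in.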

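Telephony optimization

Tuned specifically for phone-call recordings. Audio preprocessing before recognition is recommended.

Multi-dataset training

Draws on several Russian speech datasets, including Common Voice 17.0, Taiga Speech, and Podlodka.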

Model capabilities

Russian speech-to-text

Chunked processing of long audio

Timestamp generation

Support for low-resource devices
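If the transformers ASR pipeline is called with `return_timestamps=True`, it returns a `chunks` list alongside `text`. Each chunk holds a `timestamp` tuple `(start, end)` in seconds and its `text`. The end time of the last chunk can be `None`.

The hypothetical `format_segments` helper below turns those chunks into a readable transcript. It is only a sketch, and the sample data in the test is made up for illustration.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS.ss; None (an open-ended segment) becomes '??:??'."""
    if seconds is None:
        return "??:??"
    minutes, secs = divmod(float(seconds), 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_segments(chunks):
    """Turn pipeline chunks [{'timestamp': (start, end), 'text': ...}] into transcript lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}")
    return "\n".join(lines)
```

With the pipeline from the usage example, you would call it with `return_timestamps=True` and pass `result["chunks"]` to `format_segments`.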

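Use cases

Speech transcription

Phone call transcription

Converts Russian phone conversations into text transcripts

Recognition accuracy tuned for telephone audio

Meeting minutes

Automatically transcribes Russian meeting recordings

Supports chunked processing of long audio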
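Long recordings are split into overlapping windows; the usage example sets `chunk_length_s=30`. As far as I understand transformers' chunked ASR inference:

- If `stride_length_s` is not given, the stride defaults to `chunk_length_s / 6` on each side.
- Consecutive windows advance by the chunk length minus both strides.

The hypothetical `chunk_windows` helper below reproduces that window arithmetic in seconds as an illustration. It is not the library's code, which works in samples and also merges the overlapping transcriptions when stitching them together.

```python
def chunk_windows(duration_s, chunk_s=30.0, stride_s=None):
    """Return (start, end) windows in seconds for overlapping chunked inference.

    Assumes a default stride of chunk_s / 6 on each side, so windows advance by
    chunk_s - 2 * stride_s and neighbouring windows overlap by 2 * stride_s.
    """
    if stride_s is None:
        stride_s = chunk_s / 6
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("stride is too large for the chunk length")
    windows = []
    start = 0.0
    while start < duration_s:
        windows.append((start, min(start + chunk_s, duration_s)))
        if start + chunk_s >= duration_s:
            break  # this window already reaches the end of the audio
        start += step
    return windows
```

For a 70-second call with the defaults, this gives three windows: 0 to 30 s, 20 to 50 s, and 40 to 70 s. Each pair of neighbours overlaps by 10 seconds.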

Base models:
  - antony66/whisper-large-v3-russian
  - bond005/whisper-large-v3-ru-podlodka
Language: Russian (ru)
Library: transformers
Tags:
  - automatic speech recognition (asr)
  - whisper
  - russian
  - mergekit
  - merge
Datasets:
  - mozilla-foundation/common_voice_17_0
  - bond005/taiga_speech_v2
  - bond005/podlodka_speech
  - bond005/rulibrispeech
Metrics:
  - word error rate (wer)
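The card lists word error rate (WER) as its evaluation metric. WER counts the word substitutions S, deletions D, and insertions I needed to turn the hypothesis into the reference, then divides by the number of reference words N: WER = (S + D + I) / N.

Below is a minimal sketch based on word-level Levenshtein distance. `wer` is a hypothetical helper, not a specific library's function. It applies no text normalization such as case folding or punctuation removal, which real evaluations usually do.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance: (S + D + I) / N."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the edit distances for the reference words processed so far
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, the reference "мама мыла раму" against the hypothesis "папа мыла раму утром" has one substitution and one insertion. That gives a WER of 2/3.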

Model details

This model was produced with the TIES merge method.

Merge method: ties
Parameters:
  density: 0.9
  encoder weights:
    - 0.8
    - 0.2
  decoder weights:
    - 0.2
    - 0.8
Source models:
  Model A: "/mnt/cloud/llm/whisper/whisper-large-v3-russian"
  Model B: "/mnt/cloud/llm/whisper/whisper-large-v3-ru-podlodka"
Output directory: "/mnt/cloud/llm/whisper/whisper-large-v3-russian-ties-podlodka"
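One reading of the separate weights above is per component. Assume the first value in each list applies to Model A and the second to Model B. Encoder tensors would then be combined with weights (0.8, 0.2) and decoder tensors with (0.2, 0.8).

The hypothetical `merge_weights_for` helper below sketches that routing by parameter name. It relies on the Hugging Face Whisper naming convention (`model.encoder.…`, `model.decoder.…`). This is an interpretation, not mergekit's configuration logic.

```python
# Assumed reading of the config: (Model A weight, Model B weight) per component.
ENCODER_WEIGHTS = (0.8, 0.2)
DECODER_WEIGHTS = (0.2, 0.8)


def merge_weights_for(param_name, default=None):
    """Pick per-model merge weights for a parameter from its Hugging Face Whisper name."""
    parts = param_name.split(".")
    if "encoder" in parts:
        return ENCODER_WEIGHTS
    if "decoder" in parts:
        return DECODER_WEIGHTS
    # e.g. 'proj_out.weight' is neither; the config above does not say how it is treated
    return default
```

Each returned pair would then serve as the per-model weights in the TIES merge of that tensor.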

Simple API server

The model can be served through a simple OpenAI-compatible API server: https://github.com/kreolsky/whisper-api-server/

Usage

For phone-call recordings, it is strongly recommended to normalize the volume before running speech recognition, for example:

sox record.wav -r 8000 record-normalized.wav norm -0.5 compand 0.3,1 -90,-90,-70,-50,-40,-15,0,0 -7 0 0.15
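To apply the same preprocessing to many recordings from Python, the command can be wrapped with `subprocess`. The sox options work as follows:

- `-r 8000`, placed after the input file, resamples the output to 8 kHz.
- `norm -0.5` normalizes the peak level to -0.5 dBFS.
- `compand` applies dynamic-range compression with the given attack/decay times and transfer curve.

`sox_normalize_args` and `normalize_recording` are hypothetical names, and the sketch assumes the `sox` binary is on `PATH`.

```python
import shutil
import subprocess


def sox_normalize_args(src, dst):
    """Build the sox argument list for the telephony preprocessing command shown above."""
    return [
        "sox", src,
        "-r", "8000", dst,                  # output resampled to 8 kHz
        "norm", "-0.5",                     # peak-normalize to -0.5 dBFS
        "compand", "0.3,1",                 # attack 0.3 s, decay 1 s
        "-90,-90,-70,-50,-40,-15,0,0",      # transfer function as (input dB, output dB) pairs
        "-7", "0", "0.15",                  # gain, initial volume, delay
    ]


def normalize_recording(src, dst):
    """Run sox on one file; raises if sox is missing or exits with an error."""
    if shutil.which("sox") is None:
        raise FileNotFoundError("sox was not found on PATH")
    subprocess.run(sox_normalize_args(src, dst), check=True)
```

Passing the arguments as a list, rather than a shell string, avoids quoting problems with file names that contain spaces.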

Example speech recognition code:

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

# Set the compute dtype
torch_dtype = torch.bfloat16

# Select the device automatically: CUDA, then Apple MPS, otherwise CPU
device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
    setattr(torch.distributed, "is_initialized", lambda : False)  # compatibility patch for MPS
device = torch.device(device)

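# Load the model
# Note: as written, this loads the base model antony66/whisper-large-v3-russian;
# replace the repository ID with the merged model's ID to use the TIES merge itself
whisper = WhisperForConditionalGeneration.from_pretrained(
    "antony66/whisper-large-v3-russian",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    # If your GPU supports it, you can add attn_implementation="flash_attention_2"
)

# Load the processor (feature extractor + tokenizer)
processor = WhisperProcessor.from_pretrained("antony66/whisper-large-v3-russian")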

# Build the speech recognition pipeline
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model=whisper,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=256,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Read the audio file as raw bytes (the pipeline decodes byte input with ffmpeg, which must be installed)
with open('record-normalized.wav', 'rb') as f:
    wav = f.read()

# Get the transcription, forcing Russian as the decoding language
asr = asr_pipeline(wav, generate_kwargs={"language": "russian", "max_new_tokens": 256}, return_timestamps=False)

print(asr['text'])