German-RAG-WHISPER-LARGE-v3-TURBO-HESSIAN-AI open-source model - improving German speech recognition accuracy


German RAG WHISPER LARGE V3 TURBO HESSIAN AI

Developed by avemio

A German speech recognition model based on Whisper Large v3 Turbo, fine-tuned on a curated 13-hour dataset to improve German recognition accuracy.

Speech recognition

Transformers

Language: German
License: MIT
Tags: #GermanSpeechRecognition #LowWordErrorRate #BusinessOptimized

Downloads: 282

Released: 1/7/2025

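Model overview

A model dedicated to German speech recognition. Fine-tuning on carefully selected training data improves its German recognition, which makes it well suited to German speech-to-text tasks.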

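Model features

Optimized German recognition

Fine-tuned on a curated 13-hour German dataset to improve German recognition accuracy.

Strong results across datasets

Outperforms the base model on Tuda-De, Common Voice, and Multilingual LibriSpeech.

Mixed-content handling

Handles German conversation mixed with English business phrases.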

Model capabilities

German speech recognition

English phrase recognition

Long-form audio processing

Transcription with timestamps

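Use cases

Speech transcription

German meeting minutes

Automatically transcribes recordings of German meetings into text.

Word error rate is lower than the base model's (2.633 versus 3.641 overall in the evaluation table).

Multimedia subtitles

Generates subtitles for German video content.

Supports transcription with timestamps.

Business applications

Cross-language business communication

Handles business conversations that mix German and English.

Recognizes content in which the two languages are interleaved.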

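🚀 German-RAG-WHISPER-LARGE-v3-TURBO-HESSIAN-AI

This model was fine-tuned for automatic speech recognition on a carefully curated 13-hour dataset, which improves the accuracy of German speech recognition.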

🚀 Quick start

The model is used through the transformers library. The following example runs automatic speech recognition with it:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU with half precision when available; otherwise fall back to CPU and float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "avemio/German-RAG-WHISPER-LARGE-v3-TURBO"

# Load the fine-tuned Whisper weights and move them to the selected device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Build the ASR pipeline: 30-second chunks for long audio, batched inference, timestamps enabled.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Demo sample from the original example (English LibriSpeech audio); substitute your own German audio.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
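Because the pipeline is created with return_timestamps=True, the result also carries a "chunks" list whose entries hold a "timestamp" pair (start, end) in seconds and the chunk "text". The following is a minimal sketch of turning such a list into SRT subtitle text. The helper names format_srt_time and chunks_to_srt are illustrative and not part of transformers, and treating a missing end time as equal to the start time is an assumption made here for simplicity.

```python
def format_srt_time(seconds: float) -> str:
    """Convert a time in seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks) -> str:
    """Build SRT text from dicts shaped like {"timestamp": (start, end), "text": str}."""
    entries = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if start is None:
            # Skip chunks without a usable start time.
            continue
        if end is None:
            # Assumption: an open-ended final chunk is given zero duration.
            end = start
        index = len(entries) + 1
        entries.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)


# Hypothetical usage with the pipeline output from the example above:
# srt_text = chunks_to_srt(result["chunks"])
```

Writing the returned string to a .srt file should be enough for most video players to pick it up as a subtitle track.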

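✨ Key features

- Careful fine-tuning: fine-tuned on a carefully curated 13-hour dataset to optimize recognition performance.
- Strong results across datasets: achieves low word error rates on several test datasets.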

📦 Installation

Before using the model, install the following dependencies:

- Transformers 4.47.1
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
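To confirm that an environment matches the versions listed above, a small check along the following lines may help. It is a sketch only: the PINNED mapping and the check_versions helper are illustrative names, the PyPI distribution names (for example torch for PyTorch) are assumptions about how the packages were installed, and local build suffixes such as +cu121 are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the dependency list; keys are assumed PyPI distribution names.
PINNED = {
    "transformers": "4.47.1",
    "torch": "2.5.1",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def check_versions(pins):
    """Return {name: (expected, installed_or_None, matches)} for each pinned package."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Compare only the public version, dropping a local suffix such as "+cu121".
        matches = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, matches)
    return report


for name, (expected, installed, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "differs or missing"
    print(f"{name}: expected {expected}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones the page reports.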

📚 Documentation

Evaluation - word error rate (WER, lower is better)

Test dataset	openai-whisper-large-v3-turbo	German-RAG-WHISPER-LARGE-v3-TURBO	primeline-whisper-large-v3-turbo-german
Tuda-De	8.195	6.360	6.441
common_voice_19_0	3.839	3.249	3.217
multilingual librispeech	3.202	2.071	2.067
All	3.641	2.633	2.630
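To put the table in perspective, the relative WER reduction of the fine-tuned model over the base model can be computed directly from the reported numbers. The snippet below is a small worked calculation; the relative_reduction helper and the dictionary layout are illustrative, and only the figures themselves come from the table.

```python
# (base model WER, fine-tuned model WER) per test dataset, copied from the table.
WER_TABLE = {
    "Tuda-De": (8.195, 6.360),
    "common_voice_19_0": (3.839, 3.249),
    "multilingual librispeech": (3.202, 2.071),
    "All": (3.641, 2.633),
}


def relative_reduction(baseline: float, finetuned: float) -> float:
    """Relative WER reduction in percent: (baseline - finetuned) / baseline * 100."""
    return (baseline - finetuned) / baseline * 100.0


for dataset_name, (baseline, finetuned) in WER_TABLE.items():
    gain = relative_reduction(baseline, finetuned)
    print(f"{dataset_name}: {baseline:.3f} -> {finetuned:.3f} ({gain:.1f}% relative reduction)")
```

The same function can be pointed at the primeline column to compare the two fine-tuned models, whose scores differ only in the second or third decimal place.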

The evaluation data and code are available here.
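For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation code referenced above; the function name word_error_rate is illustrative, and no text normalization (casing, punctuation) is applied, although evaluation scripts usually normalize first. The table values appear to be this ratio expressed as a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two strings divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical usage: multiply by 100 to express the result as a percentage.
# wer_percent = 100 * word_error_rate("das ist ein Test", "das ist Test")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.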

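Training data

The training data consists of spoken German conversations mixed with some English business phrases. It was carefully selected and processed to optimize recognition performance. The dataset will not be published, because it is unclear whether the data could be used for voice cloning. The collected data is used only for its intended purpose of training speech-to-text models.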

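Framework versions

Property	Details
Model type	Automatic speech recognition
Training data	Dataset of spoken German conversations and English business phrases
Framework versions	Transformers 4.47.1, PyTorch 2.5.1+cu121, Datasets 3.2.0, Tokenizers 0.21.0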