Monsoon-Whisper-Medium-GigaSpeech2: Open-Source Thai Speech Recognition Model

Monsoon Whisper Medium Gigaspeech2

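Developed by scb10x

Monsoon-Whisper-Medium-GigaSpeech2 is a Thai automatic speech recognition (ASR) model. It is based on Whisper-Medium, fine-tuned on the GigaSpeech2 dataset, and intended for speech recognition in real-world conditions.

Speech Recognition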

Transformers

License: Apache-2.0 #ThaiSpeechRecognition #LowWordErrorRate #NoisyEnvironments

Downloads: 546

Release date: 7/12/2024

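Model Overview

The model focuses on Thai automatic speech recognition and performs well on YouTube audio and on speech recorded in noisy environments.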

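Model Features

Thai speech recognition

Focused on Thai speech recognition, with strong performance in real-world conditions.

Fine-tuned from Whisper-Medium

Built on the Whisper-Medium architecture and fine-tuned on the GigaSpeech2 dataset.

High performance

On the GS2 benchmark it has the lowest WER (22.74) and CER (14.15) among the models compared in the evaluation table. On CV17 the biodatlab fine-tuned models score lower error rates.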

Model Capabilities

Thai speech recognition

Speech recognition in noisy environments

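Use Cases

Speech recognition

YouTube audio transcription

Suitable for transcribing Thai speech in YouTube videos.

Speech recognition in noisy environments

Maintains relatively high recognition accuracy in noisy environments.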

🚀 Monsoon-Whisper-Medium-Gigaspeech2

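Monsoon-Whisper-Medium-Gigaspeech2 is a 🇹🇭 Thai automatic speech recognition (ASR) model. It is built on Whisper-Medium and fine-tuned on GigaSpeech2.

The model was originally developed as a scale experiment for research on emergent capabilities in ASR tasks. It performs well in practical applications, including audio from YouTube and audio recorded in noisy environments.

More details are available in our Typhoon-Audio release blog.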

🚀 Quick Start

This section shows how to get started with Monsoon-Whisper-Medium-Gigaspeech2 for Thai automatic speech recognition.

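✨ Key Features

Based on the Whisper-Medium architecture and fine-tuned on the GigaSpeech2 dataset for Thai automatic speech recognition.
Originally developed as a scale experiment for studying emergent capabilities in ASR tasks.
Performs well in practical applications, including YouTube audio and audio from noisy environments.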

📦 Installation

Requirement: transformers 4.38.0 or later.
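To make the version requirement concrete, the following is a minimal sketch of a check that could run before loading the model. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of transformers. The sketch compares only the numeric release components, so a pre-release such as `4.38.0rc1` is treated the same as `4.38.0`.

```python
import re

MIN_TRANSFORMERS = (4, 38, 0)  # minimum version stated in this model card


def parse_version(version):
    """Return the leading numeric release components, e.g. '4.41.0.dev0' -> (4, 41, 0).

    Hypothetical helper: suffixes such as 'rc1' or '.dev0' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so that '4.38' compares equal to '4.38.0'.
    return parts + (0,) * (3 - len(parts))


def meets_minimum(version, minimum=MIN_TRANSFORMERS):
    """True if the given version string is at least the required minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("transformers")
        print(installed, "OK" if meets_minimum(installed) else "too old")
    except PackageNotFoundError:
        print("transformers is not installed")
```

If the check reports a version that is too old, upgrading the transformers package before running the usage example avoids version-related failures.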

💻 Usage Examples

Basic usage

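from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio
import torch

model_path = "scb10x/monsoon-whisper-medium-gigaspeech2"
# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
filepath = "audio.wav"
target_sr = 16000  # Whisper's feature extractor expects 16 kHz audio

processor = WhisperProcessor.from_pretrained(model_path)
model = WhisperForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16
)
model.to(device)
model.eval()

# Force Thai transcription instead of language auto-detection or translation.
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="th", task="transcribe"
)

# Load the audio, downmix to mono, and resample to 16 kHz if needed.
array, sr = torchaudio.load(filepath)  # shape: (channels, samples)
array = array.mean(dim=0)
if sr != target_sr:
    array = torchaudio.functional.resample(array, sr, target_sr)

# Compute log-mel input features and match the model's device and dtype.
input_features = (
    processor(array.numpy(), sampling_rate=target_sr, return_tensors="pt")
    .to(device)
    .to(torch.bfloat16)
    .input_features
)
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)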
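Whisper models operate on 30-second windows, and by default the feature extractor pads or truncates each input to that length, so the basic example transcribes at most the first 30 seconds of a file. The following is a minimal sketch of one way to handle longer recordings: split the samples into consecutive 30-second chunks and run the basic example on each chunk. The helper `split_into_chunks` is hypothetical, and the 16 kHz default is an assumption matching Whisper's expected input rate.

```python
def split_into_chunks(samples, sample_rate=16000, chunk_seconds=30.0):
    """Split a 1-D sample sequence into consecutive chunks of at most chunk_seconds.

    Hypothetical helper: works on anything that supports len() and slicing
    (Python lists, NumPy arrays, 1-D torch tensors). The last chunk may be shorter.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size == 0:
        raise ValueError("chunk is shorter than one sample")
    return [
        samples[start:start + chunk_size]
        for start in range(0, len(samples), chunk_size)
    ]


if __name__ == "__main__":
    # 70 seconds of silent dummy audio at 16 kHz.
    dummy = [0.0] * (70 * 16000)
    chunks = split_into_chunks(dummy)
    print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 10.0]
```

A fixed-size split like this can cut a word in half at a chunk boundary. Overlapping chunks, or the `chunk_length_s` option of the transformers automatic-speech-recognition pipeline, are alternatives worth evaluating for long audio.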

📚 Documentation

Model Description

Attribute	Details
Model type	Whisper Medium
Requirement	transformers 4.38.0 or later
Primary language	Thai 🇹🇭
License	Apache 2.0

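Evaluation Results

Word error rate (WER) and character error rate (CER) in percent; lower is better. GS2 refers to GigaSpeech2 and CV17 to Common Voice 17.

Model	WER (GS2)	WER (CV17)	CER (GS2)	CER (CV17)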
whisper-large-v3	37.02	22.63	24.03	8.49
whisper-medium	55.64	43.01	37.55	16.41
biodatlab-whisper-th-medium-combined	31.00	14.25	21.20	5.69
biodatlab-whisper-th-large-v3-combined	29.02	15.72	19.96	6.32
monsoon-whisper-medium-gigaspeech2	22.74	20.79	14.15	6.92
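For readers who want to compute comparable numbers on their own data, the following is a minimal sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, over words or characters respectively. It is not the evaluation script behind the table. In particular, it assumes the text is already segmented into space-separated words; written Thai does not use spaces between words, and the source does not say which segmentation was used for the reported WER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference unit missing in hypothesis)
                curr[j - 1] + 1,     # insertion (extra unit in hypothesis)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance as a percentage of the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate; assumes words are separated by spaces (an assumption for Thai)."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate over Unicode code points, ignoring spaces (an assumption)."""
    return error_rate(
        list(reference.replace(" ", "")), list(hypothesis.replace(" ", ""))
    )


if __name__ == "__main__":
    # Pre-segmented Thai reference with one word dropped in the hypothesis.
    print(wer("ฉัน ชอบ กิน ข้าว", "ฉัน ชอบ ข้าว"))  # 25.0
```

Because the result depends on text normalization and word segmentation, scores computed this way are only comparable to each other when the same preprocessing is applied to every model.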

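Intended Uses & Limitations

This is an experimental model and may not always be accurate. Developers should carefully assess the potential risks in the context of their specific application.

Follow Us & Support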

https://twitter.com/opentyphoon
https://discord.gg/us5gAYmrxw

Typhoon Team

Kunat Pipatanakul, Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Parinthapat Pengpun, Krisanapong Jirayoot, Pathomporn Chokchainant, Kasima Tharnpipitchai