teochew-whisper-medium: an open-source Teochew speech recognition model for the Teochew dialect of the Southern Min family

Teochew Whisper Medium

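Developed by efficient-nlp

A speech recognition model for Teochew (the Chaozhou dialect), fine-tuned from the Whisper medium model. It is built specifically to transcribe Teochew, a Southern Min variety spoken in southern China.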

Speech recognition

Transformers

License: MIT #Teochew recognition #Dialect speech transcription #Short audio processing

Downloads: 194

Released: 1/26/2024

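Model Overview

This model is an automatic speech recognition (ASR) system tuned for Teochew and is intended for Teochew speech-to-text tasks.

Model Features

Dialect-specific tuning

Fine-tuned specifically on Teochew, so it recognizes the dialect better than general-purpose speech models

Medium size

Based on the Whisper medium model, balancing accuracy against compute requirements

Limited clip length

Works best on short audio clips of 10 seconds or less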
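
Because the model works best on clips of 10 seconds or less, longer recordings may need to be cut into shorter windows before transcription. The following is a minimal sketch of that idea, not part of the original model card: the helper name chunk_bounds and its parameters are hypothetical, and the 16 kHz default mirrors the sample rate Whisper expects. It only computes sample index ranges, so each range can then be sliced from the waveform and transcribed separately.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16000,
    max_seconds: float = 10.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample index pairs that cover a clip in
    consecutive windows of at most max_seconds each.

    Hypothetical helper for illustration; fixed-size cuts are an assumption.
    """
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    # Window length in samples, never smaller than one sample
    window = max(1, int(sample_rate * max_seconds))
    return [
        (start, min(start + window, num_samples))
        for start in range(0, num_samples, window)
    ]


if __name__ == "__main__":
    # A 25-second clip at 16 kHz is split into 10 s, 10 s, and 5 s windows
    for start, end in chunk_bounds(25 * 16000):
        print(start, end)
```

Fixed-size windows can cut a word in half at a boundary, so splitting on silence would likely give cleaner transcripts; the fixed windows here are only the simplest option.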

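Model Capabilities

Teochew speech recognition

Speech-to-text

Dialect processing

Use Cases

Media processing

Film and TV subtitle generation

Automatically generate subtitles for Teochew films and TV shows

WER of 0.31 on clearly articulated speech

Comedy show transcription

Transcribe the content of Teochew comedy shows

WER of 0.68 on everyday conversational speech

Language research

Dialect speech archiving

Convert spoken Teochew material into text for archiving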
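
The WER figures quoted in the use cases above are error rates: the number of edits (substitutions, deletions, insertions) needed to turn the model output into the reference transcript, divided by the reference length. The page does not say how the scores were computed, so the sketch below is only an illustration. It assumes each character counts as one token, a common choice for Chinese text that has no spaces, and the function names are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev: List[int] = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length.

    Assumption: tokens are single characters and whitespace is ignored.
    """
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of five reference characters
    print(error_rate("我爱潮汕话", "我爱潮州话"))
```

Under this definition a score of 0.31 means roughly three edits per ten reference tokens. The rate can exceed 1.0 when the output contains many insertions.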

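🚀 Teochew Whisper Medium Model

This model is a fine-tuned version of the Whisper medium model for recognizing Teochew (the Chaozhou dialect), a Southern Min language spoken in southern China. It is intended for speech recognition and other speech-to-text tasks involving Teochew.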

🚀 Quick Start

Example Code

The following script downloads the model and launches a Gradio demo that runs it:

import torch
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import gradio as gr

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
WHISPER_SAMPLE_RATE = 16000

processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = WhisperForConditionalGeneration.from_pretrained(
    "efficient-nlp/teochew-whisper-medium"
).to(DEVICE)


def preprocess_audio(audio_path: str) -> torch.Tensor:
    """Load an audio file and return a mono 1-D waveform at 16 kHz."""
    # torchaudio.load returns a tensor of shape (channels, samples)
    audio, sample_rate = torchaudio.load(audio_path)
    # Resample if necessary
    if sample_rate != WHISPER_SAMPLE_RATE:
        resampler = torchaudio.transforms.Resample(
            orig_freq=sample_rate, new_freq=WHISPER_SAMPLE_RATE
        )
        audio = resampler(audio)
    # Convert to mono by averaging the channels
    if audio.shape[0] > 1:
        audio = torch.mean(audio, dim=0)
    return audio.squeeze()


def transcribe(audio_path: str) -> str:
    """Transcribe a single audio file to text."""
    audio_input = preprocess_audio(audio_path)
    # Compute the input features for the model
    input_features = processor(
        audio_input,
        sampling_rate=WHISPER_SAMPLE_RATE,
        return_tensors="pt",
        language="Chinese",
    ).input_features.to(DEVICE)

    # Generate token IDs, then decode them to text without special tokens
    predicted_ids = model.generate(input_features)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    return transcription


iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
    title="Teochew Speech Recognition",
)
iface.launch()