whisper-th-medium-combined open-source model - free Thai automatic speech recognition


Whisper Th Medium Combined

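Developed by biodatlab

Fine-tuned from openai/whisper-medium on augmented Thai datasets for Thai automatic speech recognition

Speech recognition

Transformers

License: Apache-2.0 #ThaiSpeechRecognition #LowWERTranscription #MultiDatasetFineTuning

Downloads: 4,167

Release date: 12/14/2022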

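Model overview

This is a Thai automatic speech recognition model fine-tuned from openai/whisper-medium on an augmented version of the Thai subset of mozilla-foundation/common_voice_13_0, the google/fleurs dataset, and curated datasets.

Model features

High-accuracy Thai recognition

Achieves a word error rate (WER) of 7.42 on the Common Voice 13 test set

Multi-dataset fine-tuning

Fine-tuned on mozilla-foundation/common_voice_13_0, google/fleurs, and curated datasets

Long-audio support

Transcribes long audio by splitting it into 30-second chunks (chunk_length_s=30)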
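To illustrate what chunk_length_s=30 means, the sketch below splits a 16 kHz waveform into consecutive windows of at most 30 seconds, with a small overlap between neighbouring windows so that a word cut at a boundary still appears whole in one of them. This is a simplified, assumed version of chunked inference: the transformers pipeline does its own chunking and stride handling internally, and the function name `split_into_chunks` and the 5-second overlap are illustrative choices, not the pipeline's implementation.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def split_into_chunks(waveform, chunk_length_s=30.0, overlap_s=5.0, sample_rate=SAMPLE_RATE):
    """Split a 1-D waveform into windows of at most chunk_length_s seconds.

    Consecutive windows overlap by overlap_s seconds (hypothetical helper,
    a simplification of what a chunked ASR pipeline does).
    """
    chunk = int(chunk_length_s * sample_rate)           # samples per window
    step = chunk - int(overlap_s * sample_rate)          # hop between window starts
    if step <= 0:
        raise ValueError("overlap_s must be smaller than chunk_length_s")
    chunks = []
    start = 0
    while True:
        chunks.append(waveform[start:start + chunk])
        if start + chunk >= len(waveform):               # this window reaches the end
            break
        start += step
    return chunks
```

For a 70-second clip this yields three windows: two full 30-second windows and a final 20-second one.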

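Model capabilities

Thai speech recognition

Long-audio transcription

Use cases

Speech transcription

Thai speech-to-text

Converts Thai speech audio files to text

WER: 7.42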

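🚀 Whisper Medium (Thai): Combined V3

This model is fine-tuned from openai/whisper-medium on an augmented version of the Thai subset of mozilla-foundation/common_voice_13_0, the google/fleurs dataset, and curated datasets. It achieves the following result on the Common Voice 13 test set:

Word error rate (WER): 7.42 (using the Deepcut tokenizer)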
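WER counts the word-level substitutions S, deletions D and insertions I needed to turn the model output into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. Because written Thai has no spaces between words, both texts must first be segmented into words, which is why the tokenizer (here Deepcut) is reported alongside the score. The sketch below assumes the texts are already segmented into token lists and computes WER with a standard edit-distance dynamic program; it is an illustration, not the authors' evaluation script, and it returns a fraction rather than a percentage.

```python
def word_error_rate(reference, hypothesis):
    """Return (S + D + I) / N for two lists of word tokens."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[n][m] / n


# Hypothetical pre-segmented example: one substitution and one deletion.
ref = ["สวัสดี", "ครับ", "ยินดี", "ที่", "ได้", "รู้จัก"]
hyp = ["สวัสดี", "คับ", "ยินดี", "ได้", "รู้จัก"]
```

Here word_error_rate(ref, hyp) is 2 / 6, about 0.333.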

🚀 Quick start

Model description

The model can be used with the Hugging Face transformers library as follows:

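import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-th-medium-combined"  # model name on the Hugging Face Hub
lang = "th"  # set the transcription language to Thai

device = 0 if torch.cuda.is_available() else "cpu"  # first GPU if available, else CPU

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,  # split long audio into 30-second chunks
    device=device,
)
# Force Thai transcription instead of automatic language detection or translation
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
    language=lang,
    task="transcribe"
)
text = pipe("audio.mp3")["text"]  # transcribe an audio file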
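The call above passes a file path, which transformers decodes with ffmpeg. If the audio is already in memory as a NumPy array, the ASR pipeline also accepts a dictionary with "raw" and "sampling_rate" keys; when the rate differs from the model's 16 kHz, the pipeline documentation states that it resamples the audio itself. The helper below is a hypothetical convenience function, not part of transformers: it converts a mono or multi-channel array to mono float32 and packages it in that dictionary form. Averaging channels to mono is an assumed choice.

```python
import numpy as np


def to_pipeline_input(audio, sampling_rate):
    """Package an in-memory waveform for the transformers ASR pipeline.

    audio: array of shape (num_samples,) or (num_samples, num_channels).
    """
    waveform = np.asarray(audio, dtype=np.float32)
    if waveform.ndim == 2:
        waveform = waveform.mean(axis=1)  # average channels down to mono
    elif waveform.ndim != 1:
        raise ValueError("expected a 1-D or 2-D array")
    return {"raw": waveform, "sampling_rate": int(sampling_rate)}
```

With the pipe object created above, this would be used as text = pipe(to_pipeline_input(samples, 16000))["text"], where samples is your own array.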

💻 Usage examples

Basic usage

Basic usage is the same as the example in the Quick start section above.


Advanced usage

No advanced usage example is provided yet.
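As one possible extension (our own sketch, not an example from the model authors), many files can be transcribed in one call: the transformers pipeline accepts a list of inputs together with a batch_size argument and returns one {"text": ...} dictionary per input. The function name `transcribe_files`, the set of audio suffixes and the default batch size of 8 are hypothetical; larger batches need more GPU memory.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}  # assumed set of audio formats to pick up


def transcribe_files(pipe, folder, batch_size=8):
    """Transcribe every audio file in folder and return {file name: text}."""
    paths = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
    if not paths:
        return {}
    # Pass all file paths at once and let the pipeline batch them.
    outputs = pipe([str(p) for p in paths], batch_size=batch_size)
    return {p.name: out["text"] for p, out in zip(paths, outputs)}
```

With the pipe object from the Quick start section, this would be called as results = transcribe_files(pipe, "recordings/").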

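🔧 Technical details

Training hyperparameters

The following hyperparameters were used during training:

Learning rate (learning_rate): 1e-05
Train batch size (train_batch_size): 16
Eval batch size (eval_batch_size): 16
Random seed (seed): 42
Optimizer (optimizer): AdamW with betas=(0.9, 0.999) and epsilon=1e-08
Learning-rate scheduler type (lr_scheduler_type): linear
Learning-rate scheduler warmup steps (lr_scheduler_warmup_steps): 500
Training steps (training_steps): 10000
Mixed-precision training (mixed_precision_training): Native AMP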
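With a linear scheduler, 500 warmup steps and 10,000 training steps, the learning rate rises linearly from 0 to the peak of 1e-05 during warmup, then decays linearly back to 0 at the final step. The function below is a plain-Python sketch of that shape, written to mirror the linear-with-warmup schedule in transformers (get_linear_schedule_with_warmup); it is for illustration only and is not taken from the training code.

```python
PEAK_LR = 1e-05
WARMUP_STEPS = 500
TRAINING_STEPS = 10_000


def linear_warmup_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS,
                     total_steps=TRAINING_STEPS):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

For example, halfway through warmup (step 250) and halfway through the decay (step 5,250) the rate is 5e-06.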

Framework versions

Transformers 4.37.2
Pytorch 2.1.0
Datasets 2.16.1
Tokenizers 0.15.1
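To reproduce the training setup, it can help to compare the local environment with the versions listed above. The sketch below uses the standard-library importlib.metadata for that comparison; the function name `check_versions` is hypothetical, and the keys are our assumption of the usual PyPI distribution names (torch for PyTorch). Local build tags such as "+cu118" are ignored. Newer versions may well work, so a mismatch is only a hint.

```python
from importlib import metadata

# Versions listed in the model card, keyed by (assumed) PyPI distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.37.2",
    "torch": "2.1.0",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def check_versions(expected=EXPECTED_VERSIONS, get_version=metadata.version):
    """Return {package: installed version, or None if missing} for mismatches."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        # Drop a local build suffix such as "+cu118" before comparing.
        base = installed.split("+", 1)[0] if installed else None
        if base != wanted:
            mismatches[package] = installed
    return mismatches
```

Calling check_versions() with no arguments inspects the current Python environment.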

📄 License

This model is released under the Apache-2.0 license.

📚 Detailed documentation

Citation

To cite this model, use the following BibTeX entry:

@misc {thonburian_whisper_med,
    author       = { Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut },
    title        = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
    year         = 2022,
    url          = { https://huggingface.co/biodatlab/whisper-th-medium-combined },
    doi          = { 10.57967/hf/0226 },
    publisher    = { Hugging Face }
}

Information table

Attribute	Details
Model type	Fine-tuned Whisper model for Thai automatic speech recognition
Training data	mozilla-foundation/common_voice_13_0, google/fleurs, and curated datasets