whisper-medium-pt open-source speech recognition model - accurate Portuguese speech recognition


Whisper Medium Pt

Developed by jlondonobo

A Whisper Medium speech recognition model optimized for Portuguese, achieving a word error rate (WER) of 6.579 on the Common Voice 11 dataset

Speech recognition

Transformers

Open-source license: Apache-2.0 #high-accuracy speech transcription #Portuguese-optimized #low word error rate

Downloads: 85

Release date: 12/6/2022

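Model overview

A Portuguese speech recognition model fine-tuned from OpenAI's whisper-medium, providing efficient and accurate transcription of Portuguese audio to text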

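Model features

High-performance recognition

Reaches a WER of 6.579 on the Common Voice 11 test set, about 1.2× lower than the original whisper-medium (8.100)

Multi-scenario support

Transcribes short audio directly and processes long audio in chunks

Optimized training

Fine-tuned specifically for Portuguese on the mozilla-foundation/common_voice_11 dataset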
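Chunked processing of long audio can be pictured with a small sketch. It assumes 16 kHz mono audio (the sampling rate Whisper models expect) and splits the sample range into 30-second windows that overlap, so a word cut at one window boundary still appears whole in a neighboring window. The helper name chunk_windows and the 5-second overlap are illustrative assumptions; this is a simplified picture of the idea, not the Transformers pipeline's actual implementation.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio


def chunk_windows(
    num_samples: int,
    chunk_s: float = 30.0,
    overlap_s: float = 5.0,  # assumed overlap, chosen for illustration
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the audio.

    Consecutive windows share `overlap_s` seconds; the last window is
    truncated at the end of the audio.
    """
    chunk = int(chunk_s * sample_rate)
    overlap = int(overlap_s * sample_rate)
    if overlap >= chunk:
        raise ValueError("overlap must be shorter than the chunk length")
    step = chunk - overlap
    windows = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        windows.append((start, end))
        if end >= num_samples:
            break
        start += step
    return windows
```

For a 70-second clip this yields three windows covering 0 to 30 s, 25 to 55 s and 50 to 70 s, while a 10-second clip fits in a single window and needs no chunking.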

Model capabilities

Portuguese audio transcription

Long-audio processing

High-accuracy speech recognition

Use cases

Speech transcription

Meeting minutes

Automatically converts recordings of Portuguese meetings into written transcripts

Word accuracy: 93.42% (100 minus the 6.579% WER on Common Voice 11)

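Media subtitle generation

Automatically generates subtitles for Portuguese video content

Voice assistants

Portuguese voice command recognition

Recognizes Portuguese voice commands for smart homes or apps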

🚀 Whisper Medium Portuguese 🇧🇷🇵🇹

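This model transcribes Portuguese audio to text quickly and reliably. On the Common Voice 11 dataset it achieves a state-of-the-art word error rate (WER) of 6.579, about 1.7× lower than the previous state-of-the-art wav2vec2 model (11.310) and about 1.2× lower than the original whisper-medium model (8.100) 🚀.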

🚀 Quick start

If you want to transcribe Portuguese audio to text quickly and reliably, you've come to the right place!

✨ Key features

This model is a version of openai/whisper-medium fine-tuned on the mozilla-foundation/common_voice_11 dataset.
It achieves a WER of 6.579 on Common Voice 11, the lowest of the models compared in the table below.
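WER counts the word substitutions S, deletions D and insertions I needed to turn the model's output into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. A minimal pure-Python sketch computing it with a word-level Levenshtein distance follows; the function name word_error_rate is illustrative, and it applies no text normalization, so differences in casing or punctuation count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)
```

Expressed as a percentage, a WER of 6.579 means roughly 6.6 word errors per 100 reference words.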

The table below compares this model with the most-downloaded Portuguese automatic speech recognition models on the Hugging Face Hub 🗣:

Model	WER (Common Voice 11)	Parameters
openai/whisper-medium	8.100	769M
jlondonobo/whisper-medium-pt	6.579 🤗	769M
jonatasgrosman/wav2vec2-large-xlsr-53-portuguese	11.310	317M
Edresson/wav2vec2-large-xlsr-coraa-portuguese	20.080	317M
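The relative figures quoted on this page can be checked directly against the table. The sketch below divides each model's WER by that of jlondonobo/whisper-medium-pt, so a ratio above 1 means the other model makes proportionally more word errors; it also derives the 93.42% word accuracy cited in the use cases as 100 minus the WER. The helper name relative_wer is illustrative.

```python
# WER values copied from the comparison table (Common Voice 11, lower is better)
WER = {
    "openai/whisper-medium": 8.100,
    "jlondonobo/whisper-medium-pt": 6.579,
    "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese": 11.310,
    "Edresson/wav2vec2-large-xlsr-coraa-portuguese": 20.080,
}

BASELINE = "jlondonobo/whisper-medium-pt"


def relative_wer(wer_table: dict, baseline: str) -> dict:
    """Ratio of each model's WER to the baseline's WER (1.0 for the baseline itself)."""
    base = wer_table[baseline]
    return {name: round(wer / base, 2) for name, wer in wer_table.items()}


ratios = relative_wer(WER, BASELINE)
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")

# Word accuracy in the loose sense used above: 100 minus the WER percentage
accuracy = round(100 - WER[BASELINE], 2)
print(f"accuracy: {accuracy}%")
```

This gives a ratio of 1.23 for whisper-medium, 1.72 for the XLSR-53 wav2vec2 model and 3.05 for the CORAA wav2vec2 model.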

📦 Installation

pip install git+https://github.com/huggingface/transformers --force-reinstall
pip install torch
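Both the Transformers pipeline (when it is given a file path such as audio.m4a) and the openai-whisper package decode audio by calling the ffmpeg command-line tool, so ffmpeg has to be installed separately from the pip packages. A small standard-library check is sketched below, assuming ffmpeg is expected on the PATH; the helper name ffmpeg_available is illustrative.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is on PATH and runs successfully."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    try:
        # `ffmpeg -version` prints build information and exits with status 0
        result = subprocess.run([path, "-version"], capture_output=True, check=False)
    except OSError:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found: install it with your system package manager")
```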

💻 Usage examples

Basic usage

>>> from transformers import pipeline
>>> import torch

>>> device = 0 if torch.cuda.is_available() else "cpu"

# Load the pipeline
>>> transcribe = pipeline(
...     task="automatic-speech-recognition",
...     model="jlondonobo/whisper-medium-pt",
...     chunk_length_s=30,  # split long audio into 30-second chunks
...     device=device,
... )

# Force the model to transcribe in Portuguese
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="pt", task="transcribe")

# Transcribe an audio file
>>> transcribe("audio.m4a")["text"]
'Eu falo português.'
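The forced_decoder_ids line pins the first tokens of Whisper's decoder prompt, so the model does not rely on automatic language detection. For Portuguese transcription without timestamps, that prompt is the special-token sequence <|startoftranscript|><|pt|><|transcribe|><|notimestamps|>. The sketch below builds the sequence as strings and pairs each forced token with its decoder position, mirroring the (position, token) shape of forced_decoder_ids. It uses token strings instead of real token IDs, and the helpers decoder_prompt and forced_positions are hypothetical.

```python
from typing import List, Tuple

# Hypothetical helpers: Whisper's special tokens written out as strings, not IDs
SOT = "<|startoftranscript|>"


def decoder_prompt(language: str, task: str = "transcribe",
                   timestamps: bool = False) -> List[str]:
    """Return Whisper's decoder prompt tokens as strings."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = [SOT, f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


def forced_positions(prompt: List[str]) -> List[Tuple[int, str]]:
    """Pair every token after <|startoftranscript|> with its decoder position.

    Position 0 holds the decoder start token, so forcing begins at position 1.
    """
    return [(pos, tok) for pos, tok in enumerate(prompt) if pos > 0]


print(forced_positions(decoder_prompt("pt")))
```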

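Advanced usage

Long-form transcription

To improve performance when transcribing long audio, you can convert the Hugging Face model into an openai-whisper model and use the long-form transcription algorithm from the original Whisper paper. This requires installing whisper and a set of tools developed by @bayartsogt.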

pip install git+https://github.com/openai/whisper.git
pip install git+https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets

Then convert the Hugging Face model and transcribe:

>>> import torch
>>> import whisper
>>> from multiple_datasets.hub_default_utils import convert_hf_whisper

>>> device = "cuda" if torch.cuda.is_available() else "cpu"

# Convert the Hugging Face checkpoint into a local whisper model file
>>> convert_hf_whisper("jlondonobo/whisper-medium-pt", "local_whisper_model.pt")

# Load the whisper model
>>> model = whisper.load_model("local_whisper_model.pt", device=device)

# Transcribe audio of any length
>>> model.transcribe("long_audio.m4a", language="pt")["text"]
'Olá eu sou o José. Tenho 23 anos e trabalho...'
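Whichever long-form strategy is used, the model itself only ever sees 30-second windows: at 16 kHz that is 30 × 16,000 = 480,000 samples, and openai-whisper's whisper.pad_or_trim (default length N_SAMPLES = 480000) zero-pads shorter input or cuts longer input to that size. A pure-Python sketch of the same idea is shown below; the name pad_or_trim_samples is illustrative, and it works on a plain list rather than the NumPy arrays or tensors the real function accepts.

```python
from typing import List

SAMPLE_RATE = 16_000
CHUNK_LENGTH_S = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH_S  # 480,000 samples in one 30-second window


def pad_or_trim_samples(samples: List[float], length: int = N_SAMPLES) -> List[float]:
    """Zero-pad or truncate a mono sample list to exactly `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))
```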