distil-whisper-large-v3-ptbr: an open-source model for free automatic speech recognition in Brazilian Portuguese


Distil Whisper Large V3 Ptbr

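Developed by freds0

A fine-tuned version of distil-whisper-large-v3 for automatic speech recognition (ASR) in Brazilian Portuguese. It was trained on a combination of the Common Voice 16 dataset and a private dataset.

Speech Recognition

Safetensors

License: MIT #BrazilianPortugueseASR #HighAccuracyTranscription #LowWERModel

Downloads: 580

Release date: 9/17/2024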

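Model Overview

The model is designed for high-accuracy automatic transcription of Brazilian Portuguese speech. It combines public and private training data to improve transcription accuracy.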

Model Features

High-accuracy transcription

The model achieves a word error rate (WER) of 8.221% on the Common Voice 16 validation set.
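The 8.221% figure is a word error rate. It counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the model's transcript into the reference. That count is divided by the number of reference words N, so WER = (S + D + I) / N.

The minimal sketch below computes this with a word-level edit distance. It splits on whitespace and applies no text normalization. The card does not say how transcripts were normalized for the reported score, so this sketch is not guaranteed to reproduce that number. The Portuguese sentences are made-up examples.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level WER = (S + D + I) / N, via Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


ref = "o gato subiu no telhado"
hyp = "o gato subiu telhado ontem"
print(f"WER: {word_error_rate(ref, hyp):.3f}")  # one deletion + one insertion over 5 words -> WER: 0.400
```

For real evaluations, an established package such as jiwer (jiwer.wer(reference, hypothesis)) is a common alternative to hand-written code.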

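Multi-dataset training

The model was trained on the Common Voice 16 dataset together with a private dataset transcribed by Whisper Large v3. Mixing the two sources is intended to improve the model's generalization.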
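The card only states two things about the training data: the private recordings were transcribed with Whisper Large v3, and they were combined with Common Voice 16. It does not describe the procedure.

The sketch below shows the general pseudo-labelling idea under that assumption. Human-labelled Common Voice rows (using that dataset's path and sentence columns) are merged with private clips whose text comes from a teacher model. Example, build_training_set, teacher_transcribe and the empty-transcript filter are hypothetical names and choices, not the author's pipeline. In practice, teacher_transcribe could wrap a Whisper Large v3 speech-recognition pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    audio_path: str
    text: str
    source: str  # "common_voice" (human label) or "private_pseudo" (teacher label)


def build_training_set(
    common_voice_rows: Iterable[dict],
    private_audio_paths: Iterable[str],
    teacher_transcribe: Callable[[str], str],
) -> List[Example]:
    """Merge human-labelled Common Voice rows with pseudo-labelled private clips."""
    examples = [
        Example(row["path"], row["sentence"], "common_voice")
        for row in common_voice_rows
    ]
    for path in private_audio_paths:
        # Hypothetical: teacher_transcribe stands in for Whisper Large v3
        text = teacher_transcribe(path).strip()
        if text:  # assumed filter: drop clips the teacher left empty
            examples.append(Example(path, text, "private_pseudo"))
    return examples


# Usage with a stand-in teacher (a real one would run Whisper Large v3)
def fake_teacher(path: str) -> str:
    return {"a.wav": "bom dia", "b.wav": "   "}[path]


train = build_training_set(
    [{"path": "cv_0001.mp3", "sentence": "olá mundo"}],
    ["a.wav", "b.wav"],
    fake_teacher,
)
print([(e.source, e.text) for e in train])
# [('common_voice', 'olá mundo'), ('private_pseudo', 'bom dia')]
```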

Model Capabilities

Brazilian Portuguese speech recognition

High-accuracy speech transcription

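Use Cases

Speech transcription

Brazilian Portuguese speech-to-text

Converts spoken Brazilian Portuguese into text, for scenarios such as meeting minutes and voice notes.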
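Meeting recordings and voice notes are often longer than the 30 seconds of 16 kHz audio that a Whisper model processes in one pass. The card does not say how long recordings should be handled.

A minimal sketch follows. It assumes the audio is already a 16 kHz mono NumPy array. It splits the audio into windows (optionally overlapping), and each window can then be transcribed as in the Quick Start code. The helper name and default values are illustrative.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper's feature extractor expects 16 kHz audio
WINDOW_SECONDS = 30   # Whisper encodes at most 30 s of audio per input


def split_into_windows(audio, window_s=WINDOW_SECONDS, overlap_s=0.0, sr=SAMPLE_RATE):
    """Split a 1-D waveform into windows of window_s seconds, overlapping by overlap_s."""
    window = int(window_s * sr)
    step = window - int(overlap_s * sr)
    if step <= 0:
        raise ValueError("overlap_s must be shorter than window_s")
    chunks = []
    for start in range(0, len(audio), step):
        chunks.append(audio[start:start + window])
        if start + window >= len(audio):  # this window already reaches the end
            break
    return chunks


# 70 s of silence -> windows of 30 s, 30 s and 10 s
audio = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)
print([len(c) / SAMPLE_RATE for c in split_into_windows(audio)])  # [30.0, 30.0, 10.0]
```

Naively joining the texts of overlapping windows repeats words at the boundaries. The Transformers automatic-speech-recognition pipeline handles this merging itself through its chunk_length_s and stride_length_s arguments.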

Word error rate (WER) on the Common Voice 16 validation set: 8.221%.

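🚀 Distil-Whisper-Large-v3 for Brazilian Portuguese

This model is a fine-tuned version of distil-whisper-large-v3 for automatic speech recognition (ASR) in Brazilian Portuguese. It was trained on a combination of the Common Voice 16 dataset and a private dataset transcribed with Whisper Large v3. Its goal is high-accuracy automatic transcription of Brazilian Portuguese speech.

🚀 Quick Start

You can run the model with the Transformers library: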

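from datasets import Audio, load_dataset
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the validation split of Portuguese Common Voice 11
# (a gated dataset: accept its terms on the Hugging Face Hub and log in first).
# Note: the reported WER was measured on Common Voice 16, not on this version.
common_voice = load_dataset("mozilla-foundation/common_voice_11_0", "pt", split="validation")

# Common Voice clips are 48 kHz, but Whisper's feature extractor requires 16 kHz.
# Without this cast, the processor call below raises a ValueError.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))

# Load the fine-tuned model and its processor
processor = WhisperProcessor.from_pretrained("freds0/distil-whisper-large-v3-ptbr")
model = WhisperForConditionalGeneration.from_pretrained("freds0/distil-whisper-large-v3-ptbr")

# Pick one sample from the dataset
sample = common_voice[0]  # change the index to choose a different sample

# Get the (resampled) audio array and its sampling rate
audio_input = sample["audio"]["array"]
sampling_rate = sample["audio"]["sampling_rate"]

# Convert the waveform into log-Mel input features
input_features = processor(audio_input, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate token IDs and decode them into text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print("Transcription:", transcription[0])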

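✨ Key Features

High-accuracy transcription: the model combines Common Voice 16 data with an automatically transcribed private dataset. It reaches a word error rate (WER) of 8.221% on the Common Voice 16 validation set.

📦 Installation

The original documentation does not describe installation steps. The examples here require the transformers, datasets and torch Python packages.

💻 Usage Examples

Basic Usage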

Basic usage is identical to the Quick Start example above.

Advanced Usage

The original documentation does not provide an advanced usage example.
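Since no advanced example is given, here is a hedged sketch of long-form, batched transcription with the Transformers pipeline API. The chunk length of 25 seconds and the batch size of 8 are assumed values, not settings from the card. Half precision is used only when a CUDA GPU is available.

The imports sit inside the function so the helpers can be defined, and tested with a stand-in, without loading the model. The function names are illustrative.

```python
def load_asr_pipeline(model_id: str = "freds0/distil-whisper-large-v3-ptbr"):
    """Build a Transformers ASR pipeline for long-form, batched transcription."""
    import torch
    from transformers import pipeline

    use_gpu = torch.cuda.is_available()
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.float16 if use_gpu else torch.float32,
        device=0 if use_gpu else -1,  # first GPU, or CPU
        chunk_length_s=25,            # assumed value: split long audio into ~25 s chunks
        batch_size=8,                 # assumed value: chunks decoded in parallel
    )


def transcribe_file(path: str, asr=None) -> str:
    """Transcribe one audio file; builds the pipeline on demand if none is given."""
    if asr is None:
        asr = load_asr_pipeline()
    result = asr(path, return_timestamps=True)  # result also has per-chunk "chunks"
    return result["text"]
```

For example, transcribe_file("reuniao.wav") would download the model on first use and return the full transcript. The file reuniao.wav is hypothetical, and decoding audio files this way requires ffmpeg to be installed. To transcribe many files, build the pipeline once with load_asr_pipeline() and pass it as asr.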

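📚 Detailed Documentation

Model Description

The model is intended for high-accuracy automatic transcription of Brazilian Portuguese speech. It combines Common Voice 16 data with an automatically transcribed private dataset. It achieves a word error rate (WER) of 8.221% on the Common Voice 16 validation set.

Property	Details
Model type	Speech recognition model based on distil-whisper-large-v3
Language (NLP)	Brazilian Portuguese (pt-BR)
License	MIT
Fine-tuned from	distil-whisper/distil-large-v3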