---
library_name: transformers
base_model: openai/whisper-tiny
tags:
- generated_from_trainer
datasets:
- common_voice_11_0
model-index:
- name: whisper-fa-tinyyy
  results: []
license: mit
language:
- fa
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# whisper-fa-tinyyy

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the common_voice_11_0 dataset. It achieves the following results on the evaluation set:
- Loss: 0.0246
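The card lists WER as its evaluation metric but reports only the loss; a minimal sketch of how WER could be computed with the `evaluate` library (assuming `evaluate` and its `jiwer` backend are installed) looks like this:

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcripts: in practice, references come from the
# common_voice_11_0 test split and predictions from this model.
references = ["reference transcript"]
predictions = ["predicted transcript"]

print(wer_metric.compute(predictions=predictions, references=references))
```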
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 1
- mixed_precision_training: Native AMP
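For reference, here is a minimal sketch of how these values might be expressed with transformers' `Seq2SeqTrainingArguments`; `output_dir` is a placeholder, not the directory actually used:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-fa-tinyyy",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=1,
    fp16=True,  # Native AMP mixed precision
)
```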
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0186        | 0.9998 | 2357 | 0.0246          |
### Framework versions
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
## How to use this model in Colab

```python
# Install the required packages (google-colab is preinstalled in Colab
# and should not be pip-installed)
!pip install torch torchaudio transformers pydub
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from pydub import AudioSegment
import os
from google.colab import files
# Load the model and processor
model_id = "hackergeek98/whisper-fa-tinyyy"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)
# Build the ASR pipeline
whisper_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if torch.cuda.is_available() else -1,
)
# Convert any input audio file to WAV
def convert_to_wav(audio_path):
    audio = AudioSegment.from_file(audio_path)
    wav_path = "converted_audio.wav"
    audio.export(wav_path, format="wav")
    return wav_path
# Split long audio into chunks
def split_audio(audio_path, chunk_length_ms=30000):  # 30 seconds per chunk by default
    audio = AudioSegment.from_wav(audio_path)
    chunks = [audio[i:i + chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]
    chunk_paths = []
    for i, chunk in enumerate(chunks):
        chunk_path = f"chunk_{i}.wav"
        chunk.export(chunk_path, format="wav")
        chunk_paths.append(chunk_path)
    return chunk_paths
# Transcribe a long audio file chunk by chunk
def transcribe_long_audio(audio_path):
    wav_path = convert_to_wav(audio_path)
    chunk_paths = split_audio(wav_path)
    transcription = ""
    for chunk in chunk_paths:
        result = whisper_pipe(chunk)
        transcription += result["text"] + "\n"
        os.remove(chunk)  # remove the processed chunk
    os.remove(wav_path)  # clean up the converted file
    # Save the transcription (UTF-8 so Persian text is written safely)
    text_path = "transcription.txt"
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(transcription)
    return text_path
# Upload and process an audio file in Colab
uploaded = files.upload()
audio_file = list(uploaded.keys())[0]
transcription_file = transcribe_long_audio(audio_file)
# Download the transcription file
files.download(transcription_file)
```
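As an alternative to the manual chunking above, the transformers ASR pipeline can also perform chunked long-form inference on its own via its `chunk_length_s` argument; a minimal sketch (the audio path is a placeholder):

```python
from transformers import pipeline

# Let the pipeline handle long-form audio in 30-second chunks itself.
pipe = pipeline(
    "automatic-speech-recognition",
    model="hackergeek98/whisper-fa-tinyyy",
    chunk_length_s=30,
)
print(pipe("audio.wav")["text"])  # "audio.wav" is a placeholder path
```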