Kotoba Whisper V2.0 Faster

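Developed by kotoba-tech

A Whisper speech recognition model for CTranslate2, optimized for Japanese, that provides efficient speech-to-text.

Speech recognition · Japanese · License: MIT · #JapaneseSpeechRecognition #EfficientInference #LongAudioProcessing

Downloads: 202

Released: 9/17/2024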

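Model Overview

This model is the CTranslate2 conversion of kotoba-whisper-v2.0. It supports Japanese speech recognition and suits applications that need fast, efficient speech-to-text.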

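Model Features

Efficient inference

Optimized with CTranslate2 for faster inference than the original Whisper model.

Japanese optimization

Tuned specifically for Japanese speech recognition to give more accurate transcriptions.

FP16 quantization

Model weights are stored in FP16, and the compute type can be chosen flexibly at load time.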

Model Capabilities

Japanese speech recognition

Long-form audio processing

Real-time transcription

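Use Cases

Speech-to-text

Japanese meeting transcription

Automatically transcribe recordings of Japanese meetings into text for later organization and reference.

Efficient, accurate transcription that can handle long recordings.

Japanese podcast transcription

Transcribe Japanese podcast content into text so that it can be indexed and searched.

Supports chunked processing, which improves transcription accuracy on long audio.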

🚀 Whisper kotoba-whisper-v2.0 model for CTranslate2

This repository contains the conversion of kotoba-tech/kotoba-whisper-v2.0 to the CTranslate2 model format.

The model can be used in CTranslate2 or in projects based on CTranslate2, such as faster-whisper.

🚀 Quick Start

Install the library and download the sample audio:

pip install faster-whisper
wget https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main/sample_ja_speech.wav

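Run inference with kotoba-whisper-v2.0-faster:

from faster_whisper import WhisperModel

# Load the CTranslate2 model (downloaded from the Hugging Face Hub on first use).
model = WhisperModel("kotoba-tech/kotoba-whisper-v2.0-faster")

# Transcribe Japanese audio with a 15-second chunk length, without conditioning on previous text.
segments, info = model.transcribe("sample_ja_speech.wav", language="ja", chunk_length=15, condition_on_previous_text=False)
# `segments` is a generator, so transcription runs as the loop iterates.
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))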
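For use cases such as meeting notes, the segments printed above can be collected into timestamped lines instead. The following is a minimal sketch: `format_ts` and `segments_to_lines` are hypothetical helper names, not part of faster-whisper, and the code relies only on the `start`, `end`, and `text` attributes used in the example above. Stand-in objects replace real segments so that the sketch runs without downloading the model.

```python
from types import SimpleNamespace


def format_ts(seconds):
    """Convert seconds to HH:MM:SS.mmm (hypothetical helper, not a faster-whisper API)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_lines(segments):
    """Turn objects with .start, .end and .text into timestamped text lines."""
    return [
        f"[{format_ts(s.start)} -> {format_ts(s.end)}] {s.text.strip()}"
        for s in segments
    ]


# Stand-ins for the segments yielded by model.transcribe(...); assumption for illustration only.
demo_segments = [
    SimpleNamespace(start=0.0, end=2.5, text=" こんにちは"),
    SimpleNamespace(start=75.5, end=78.0, text=" ありがとうございます"),
]

for line in segments_to_lines(demo_segments):
    print(line)
```

With a loaded model, the same function should accept the `segments` generator returned by `model.transcribe(...)` directly.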

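🔧 Technical Details

Benchmark

We measured the inference speed of different kotoba-whisper-v2.0 implementations on four Japanese speech audio files, using a MacBook Pro with the following configuration:

Apple M2 Pro
32GB memory
14-inch, 2023 model
OS: macOS Sonoma 14.4.1 (23E224)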

Audio file	Audio duration (min)	whisper.cpp (sec)	faster-whisper (sec)	hf pipeline (sec)
audio 1	50.3	581	2601	807
audio 2	5.6	41	73	61
audio 3	4.9	30	141	54
audio 4	5.6	35	126	69
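To compare the implementations across recordings of different lengths, the timings can be normalized by audio duration. The sketch below computes a real-time factor (processing time divided by audio duration, so values below 1 mean faster than real time). This derived metric and the helper names are additions for illustration. The numbers are copied from the table above.

```python
# Rows copied from the benchmark table: duration in minutes, processing time in seconds.
BENCHMARK = {
    "audio 1": (50.3, {"whisper.cpp": 581, "faster-whisper": 2601, "hf pipeline": 807}),
    "audio 2": (5.6, {"whisper.cpp": 41, "faster-whisper": 73, "hf pipeline": 61}),
    "audio 3": (4.9, {"whisper.cpp": 30, "faster-whisper": 141, "hf pipeline": 54}),
    "audio 4": (5.6, {"whisper.cpp": 35, "faster-whisper": 126, "hf pipeline": 69}),
}


def real_time_factor(duration_min, processing_sec):
    """Processing time divided by audio duration (both in seconds)."""
    return processing_sec / (duration_min * 60)


def rtf_table(benchmark):
    """Map each audio file to {implementation: real-time factor}."""
    return {
        name: {impl: real_time_factor(duration, sec) for impl, sec in times.items()}
        for name, (duration, times) in benchmark.items()
    }


for name, row in rtf_table(BENCHMARK).items():
    cells = ", ".join(f"{impl}: {rtf:.2f}" for impl, rtf in row.items())
    print(f"{name}: {cells}")
```

On this machine every implementation runs faster than real time on all four files, with whisper.cpp having the lowest factor in each row.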

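Scripts to re-run the experiment can be found at the following locations:

In addition, whisper.cpp and faster-whisper currently support sequential long-form decoding, while only the Hugging Face pipeline supports chunked long-form decoding, which we found empirically to perform better than sequential long-form decoding.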
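To make the distinction concrete: sequential decoding walks through the audio one window at a time, and where each window starts depends on the output of the previous one. Chunked decoding instead cuts the audio into fixed, overlapping windows that can be transcribed independently and merged afterwards. The sketch below only computes such window boundaries. It is an illustration, not the Hugging Face pipeline's implementation, and the function name, chunk length, and overlap are assumptions rather than settings from the benchmark.

```python
def chunk_windows(total_sec, chunk_sec=30.0, overlap_sec=5.0):
    """Return (start, end) windows of fixed length with a fixed overlap.

    Illustrative only: the values and the merging strategy used by real
    chunked decoders are not specified in this document.
    """
    if chunk_sec <= 0 or not 0 <= overlap_sec < chunk_sec:
        raise ValueError("need chunk_sec > 0 and 0 <= overlap_sec < chunk_sec")
    windows = []
    start = 0.0
    while start < total_sec:
        end = min(start + chunk_sec, total_sec)
        windows.append((start, end))
        if end >= total_sec:
            break  # the last window reached the end of the audio
        start += chunk_sec - overlap_sec  # step forward, keeping the overlap
    return windows


# A 40-second clip split into 15-second windows that overlap by 3 seconds.
for start, end in chunk_windows(40, chunk_sec=15, overlap_sec=3):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

Because the windows do not depend on each other's output, they can be processed in a batch, which is the main practical difference from the sequential approach.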

Conversion Details

The original model was converted with the following command:

ct2-transformers-converter --model kotoba-tech/kotoba-whisper-v2.0 --output_dir kotoba-whisper-v2.0-faster \
    --copy_files tokenizer.json preprocessor_config.json --quantization float16

Note that the model weights are saved in FP16. This type can be changed when the model is loaded, using the compute_type option in CTranslate2.
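As a minimal sketch of that option, the example below passes `compute_type` through faster-whisper's `WhisperModel` constructor. The device-to-type mapping and the helper names `pick_compute_type` and `load_model` are assumptions for illustration, not recommendations from the model authors. Which types are actually supported depends on the hardware.

```python
def pick_compute_type(device):
    """Choose a CTranslate2 compute type for a device (illustrative mapping only)."""
    # Assumption: FP16 on GPU, 8-bit integers on CPU; anything else keeps
    # the type the weights were saved in ("default").
    choices = {"cuda": "float16", "cpu": "int8"}
    return choices.get(device, "default")


def load_model(device="cpu"):
    """Load the converted model with a compute type chosen for the device."""
    # Imported lazily so the helper above can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        "kotoba-tech/kotoba-whisper-v2.0-faster",
        device=device,
        compute_type=pick_compute_type(device),
    )


print(pick_compute_type("cpu"))
```

Calling `load_model("cpu")` requires faster-whisper to be installed and downloads the model on first use.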