Whisper-large-v2-onnx-int4-inc Open-Source Speech Recognition Model - Free, Accurate Speech Recognition and Translation


Whisper Large V2 Onnx Int4 Inc

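Developed by Intel

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680,000 hours of labeled data, it generalizes well. This repository contains an INT4 weight-only quantized version of the Whisper large v2 model in ONNX format.

Speech recognition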

Transformers

License: Apache-2.0 #low-precision quantization #multilingual ASR #efficient inference

Downloads: 19

Released: October 8, 2023

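Model Overview

Whisper is a powerful automatic speech recognition and speech translation model that adapts to many datasets and domains without fine-tuning. This model is the INT4 weight-only quantized version, powered by Intel® Neural Compressor.

Model Features

INT4 weight-only quantization

The model's weights are quantized to INT4, which significantly reduces model size while keeping recognition accuracy high.

Strong generalization

Trained on 680,000 hours of labeled data, the model adapts to many datasets and domains without fine-tuning.

ONNX format

The model is provided in ONNX format, which makes it easier to deploy and run inference on different platforms.

Model Capabilities

Automatic speech recognition

Speech translation

Use Cases

Speech recognition

Speech to text

Converts speech into text, suitable for meeting notes, subtitle generation, and similar scenarios.

Word error rate as low as 2.99%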

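🚀 INT4 Whisper large-v2 ONNX Model

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680,000 hours of labeled data, it generalizes well to many datasets and domains without fine-tuning. This repository provides an INT4 weight-only quantized version of the Whisper large v2 model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers.

This INT4 ONNX model was generated with the weight-only quantization method of Intel® Neural Compressor.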
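
To give a rough sense of the size reduction, the following back-of-the-envelope sketch estimates the average storage per quantized weight. It uses the group size of 32 from the quantization configuration in this card, and additionally assumes one FP32 scale per group and no stored zero point (plausible for a symmetric scheme, but not confirmed by this card). The function names are illustrative only.

```python
def bits_per_weight(weight_bits=4, group_size=32, scale_bits=32):
    """Average bits stored per weight when each group shares one scale.

    Assumption: one scale of `scale_bits` per group and no zero point.
    """
    return weight_bits + scale_bits / group_size


def compression_ratio(baseline_bits=32, weight_bits=4, group_size=32, scale_bits=32):
    """How many times smaller a quantized weight tensor is than the baseline."""
    return baseline_bits / bits_per_weight(weight_bits, group_size, scale_bits)


int4_bits = bits_per_weight()
ratio = compression_ratio()
print(f"{int4_bits:.2f} bits per weight, about {ratio:.1f}x smaller than FP32")
```

Under these assumptions a quantized weight costs about 5 bits instead of 32. The saving for the whole model is smaller, because the operator statistics in this card show that some MatMul weights remain in FP32.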

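✨ Key Features

Model Details

Property	Details
Model authors - Company	Intel
Date	October 8, 2023
Version	1
Type	Speech recognition
Paper or other resources	-
License	Apache 2.0
Questions or comments	Community tab

Intended Use

Property	Details
Primary intended uses	The raw model can be used for automatic speech recognition inference
Primary intended users	Anyone running automatic speech recognition inference
Out-of-scope uses	In most cases, this model will need to be fine-tuned for your particular task. The model must not be used to intentionally create hostile or alienating environments for people.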

📦 Installation

Export to ONNX

Export the FP32 model from openai/whisper-large-v2:

optimum-cli export onnx --model openai/whisper-large-v2 whisper-large-v2-with-past/ --task automatic-speech-recognition-with-past --opset 13
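
After the export finishes, it can be useful to confirm that the output directory holds the three ONNX files that the quantization script in this card iterates over. The check below is a minimal sketch: the helper name `missing_onnx_files` is hypothetical, and the expected file names are taken from that script rather than from the exporter's documentation, so other optimum versions may produce a different set of files.

```python
from pathlib import Path

# File names taken from the quantization script in this card (an assumption
# about what the exporter writes, not a documented guarantee).
EXPECTED_FILES = (
    "encoder_model.onnx",
    "decoder_model.onnx",
    "decoder_with_past_model.onnx",
)


def missing_onnx_files(export_dir):
    """Return the expected ONNX file names that are absent from export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_onnx_files("whisper-large-v2-with-past")
    print("export looks complete" if not missing else f"missing: {missing}")
```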

Install ONNX Runtime

Install onnxruntime>=1.16.0 to support the MatMulFpQ4 operator:

pip install "onnxruntime>=1.16.0"
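
To confirm that the installed package meets the 1.16.0 minimum before loading the INT4 model, a small version check along these lines may help. It is a sketch using only the standard library. The helper names are hypothetical, and it looks up the distribution named `onnxruntime`, so a differently named build (for example a GPU package) would be reported as not installed.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.16.0' into (1, 16, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:  # pad '1.16' to (1, 16, 0)
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="1.16.0"):
    """Compare versions numerically, so that '1.9.0' is older than '1.16.0'."""
    return parse_version(installed) >= parse_version(minimum)


def onnxruntime_ok():
    """True if a distribution named 'onnxruntime' is installed and recent enough."""
    try:
        return meets_minimum(metadata.version("onnxruntime"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("onnxruntime >= 1.16.0:", onnxruntime_ok())
```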

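💻 Usage Examples

Run quantization

Build Intel® Neural Compressor from the main branch and run INT4 weight-only quantization.

The weight-only quantization configuration is as follows:

Data type	Group size	Scheme	Algorithm
INT4	32	sym	RTN

The key code is shown below; for the complete script, refer to the whisper example: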

from neural_compressor import quantization, PostTrainingQuantConfig
from neural_compressor.utils.constant import FP32
import os

# The three ONNX graphs produced by the export step.
model_list = ['encoder_model.onnx', 'decoder_model.onnx', 'decoder_with_past_model.onnx']
for model in model_list:
    # Weight-only INT4 settings: RTN algorithm, symmetric scheme, group size 32.
    config = PostTrainingQuantConfig(
        approach="weight_only",
        calibration_sampling_size=[8],
        op_type_dict={".*": {"weight": {"bits": 4,
                                        "algorithm": ["RTN"],
                                        "scheme": ["sym"],
                                        "group_size": 32}}},)
    # `dataloader` is not defined in this excerpt; it must be created
    # beforehand (see the complete script).
    q_model = quantization.fit(
        os.path.join("/path/to/whisper-large-v2-with-past", model), # FP32 model path
        config,
        calib_dataloader=dataloader)
    q_model.save(os.path.join("/path/to/whisper-large-v2-onnx-int4", model)) # INT4 model path
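
To illustrate what the configuration above means numerically, here is a minimal pure-Python sketch of symmetric round-to-nearest (RTN) quantization with a group size of 32. It is not Intel® Neural Compressor's implementation. The scale formula (largest absolute value divided by 7), the clamping range of [-8, 7], and the function names are assumptions made for illustration, and the library's actual scaling and weight packing may differ.

```python
import math


def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of floats."""
    qmax = 2 ** (bits - 1) - 1      # 7 for INT4
    qmin = -(2 ** (bits - 1))       # -8 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clamp to the INT4 range.
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return q, scale


def quantize_rtn(weights, group_size=32, bits=4):
    """Split a flat weight list into groups and quantize each one separately."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate float weights from (integers, scale) pairs."""
    return [q * scale for qs, scale in groups for q in qs]


weights = [math.sin(i) for i in range(64)]   # toy data: two groups of 32
groups = quantize_rtn(weights)
restored = dequantize(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(groups)} groups, max reconstruction error {max_error:.4f}")
```

Because every group gets its own scale, a single large weight only reduces the precision of the 32 values it shares a group with, which is the usual motivation for group-wise quantization.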

Evaluation

Operator statistics

The table below shows the operator statistics of the INT4 ONNX model:

Model	Operator type	Total	INT4 weights	FP32 weights
encoder_model	MatMul	256	192	64
decoder_model	MatMul	449	321	128
decoder_with_past_model	MatMul	385	257	128
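
The share of MatMul operators that use INT4 weights can be derived directly from the table above. The short sketch below copies the counts from the table and computes that share per model; the dictionary and the function name `int4_share` are illustrative.

```python
# (total MatMul, INT4-weight MatMul, FP32-weight MatMul), copied from the table above
matmul_stats = {
    "encoder_model": (256, 192, 64),
    "decoder_model": (449, 321, 128),
    "decoder_with_past_model": (385, 257, 128),
}


def int4_share(total, int4, fp32):
    """Fraction of MatMul nodes whose weights are stored in INT4."""
    if int4 + fp32 != total:
        raise ValueError("INT4 and FP32 counts do not add up to the total")
    return int4 / total


for name, counts in matmul_stats.items():
    print(f"{name}: {int4_share(*counts):.1%} of MatMul nodes use INT4 weights")
```

In each of the three graphs, roughly two thirds to three quarters of the MatMul nodes are quantized, and the rest keep FP32 weights.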

WER evaluation

Use the following code to evaluate the model on the librispeech_asr dataset:

import os
from datasets import load_dataset
from evaluate import load
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import PretrainedConfig, WhisperProcessor

model_name = 'openai/whisper-large-v2'
model_path = 'whisper-large-v2-onnx-int4'  # directory holding the INT4 ONNX files

# The processor and config come from the original checkpoint.
processor = WhisperProcessor.from_pretrained(model_name)
model_config = PretrainedConfig.from_pretrained(model_name)
wer = load("wer")
librispeech_test_clean = load_dataset("librispeech_asr", "clean", split="test")

# Create ONNX Runtime sessions for the encoder, decoder, and decoder-with-past graphs.
# Note: this load_model / constructor usage may differ across optimum releases.
sessions = ORTModelForSpeechSeq2Seq.load_model(
            os.path.join(model_path, 'encoder_model.onnx'),
            os.path.join(model_path, 'decoder_model.onnx'),
            os.path.join(model_path, 'decoder_with_past_model.onnx'))
model = ORTModelForSpeechSeq2Seq(sessions[0], sessions[1], model_config, model_path, sessions[2])

predictions = []
references = []
for idx, batch in enumerate(librispeech_test_clean):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    # Normalize reference and prediction the same way before scoring.
    reference = processor.tokenizer._normalize(batch['text'])
    references.append(reference)
    predicted_ids = model.generate(input_features)[0]
    transcription = processor.decode(predicted_ids)
    prediction = processor.tokenizer._normalize(transcription)
    predictions.append(prediction)

wer_result = wer.compute(references=references, predictions=predictions)
print(f"Result wer: {wer_result * 100}")
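
For readers unfamiliar with the metric, the sketch below shows in plain Python what a word error rate computation does. It is not the implementation behind `load("wer")`. The function names `edit_distance` and `word_error_rate` are illustrative, and the sketch assumes the text has already been normalized and can be split on whitespace.

```python
def edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, deletions, and insertions."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            curr.append(min(
                prev[j] + 1,                           # delete a reference word
                curr[j - 1] + 1,                       # insert a hypothesis word
                prev[j - 1] + (ref_word != hyp_word),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, predictions):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, predictions):
        ref_words, hyp_words = ref.split(), hyp.split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    return total_edits / total_words


score = word_error_rate(["the cat sat on the mat"], ["the cat sit on mat"])
print(f"WER: {score * 100:.2f}")
```

In this toy example one word is substituted and one is dropped out of six reference words, so the rate is two sixths. The evaluation script above multiplies the rate by 100 in the same way to report a percentage.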