Open-Source Whisper-large-onnx-int4-inc Model: Free Automatic Speech Recognition and Translation

Whisper Large Onnx Int4 Inc

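Developed by Intel

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. This repository provides an INT4 weight-only quantized version of the Whisper large model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers.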

Speech recognition

Transformers

License: Apache-2.0 #INT4 quantization #Multi-domain ASR #Low-resource inference

Downloads: 44

Released: October 8, 2023

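Model overview

Whisper is a pre-trained model trained on 680,000 hours of labeled data. It generalizes well and adapts to many datasets and domains without fine-tuning. This model is the INT4 quantized version, intended for automatic speech recognition inference.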

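Model features

INT4 quantization

INT4 weight-only quantization reduces the model size from 8.8 GB to 1.9 GB while keeping accuracy close to the FP32 model.

ONNX format

The model is provided in ONNX format, which makes it easier to deploy and run on different platforms.

High accuracy

On the librispeech_asr dataset, the quantized model reaches a word error rate of 3.05%, nearly identical to the FP32 version (3.04%).

No fine-tuning required

The model generalizes well and adapts to many datasets and domains without fine-tuning.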
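
As a quick sanity check on the size and accuracy figures listed above, the following sketch computes the compression ratio and the accuracy gap. The numbers come from this page; the variable names are only illustrative.

```python
# Figures quoted on this page.
fp32_size_gb = 8.8
int4_size_gb = 1.9
fp32_wer_pct = 3.04
int4_wer_pct = 3.05

# How many times smaller the INT4 model is, and the relative size reduction.
compression_ratio = fp32_size_gb / int4_size_gb
size_reduction_pct = (1 - int4_size_gb / fp32_size_gb) * 100

# Accuracy cost of quantization, in percentage points of word error rate.
wer_gap_pct_points = int4_wer_pct - fp32_wer_pct

print(f"Compression ratio: {compression_ratio:.2f}x")
print(f"Size reduction: {size_reduction_pct:.1f}%")
print(f"WER gap: {wer_gap_pct_points:.2f} percentage points")
```

In other words, the INT4 model is about 4.6 times smaller than the FP32 model, at a cost of roughly 0.01 percentage points of word error rate on this benchmark.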

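Model capabilities

Automatic speech recognition

Speech translation

Use cases

Speech recognition

Speech to text

Converts speech into text, for example for meeting notes or subtitle generation.

Word error rate: 3.05%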

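🚀 INT4 Whisper large ONNX model

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680,000 hours of labeled data, it generalizes well to many datasets and domains without fine-tuning. This repository provides an INT4 weight-only quantized version of the Whisper large model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers.

This INT4 ONNX model was generated with the weight-only quantization method of Intel® Neural Compressor.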

🚀 Quick start

Export to ONNX

Export the FP32 model from openai/whisper-large:

optimum-cli export onnx --model openai/whisper-large whisper-large-with-past/ --task automatic-speech-recognition-with-past --opset 13
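
After the export finishes, it can help to confirm that the output directory contains the three ONNX files the quantization step expects. The sketch below is a hypothetical helper, not part of the original scripts; the file names are taken from the quantization code on this page, and the directory name matches the export command above. Other Optimum versions may write additional files.

```python
from pathlib import Path

# File names as used by the quantization step on this page.
EXPECTED_FILES = (
    "encoder_model.onnx",
    "decoder_model.onnx",
    "decoder_with_past_model.onnx",
)


def missing_exports(export_dir):
    """Return the expected ONNX files that are not present in export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Directory name assumed from the export command above.
    missing = missing_exports("whisper-large-with-past")
    print("Missing files:", missing if missing else "none")
```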

Install ONNX Runtime

Install onnxruntime>=1.16.0, which is required for the MatMulFpQ4 operator.
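
One way to verify the version requirement before running anything is a small check like the one below. This is a minimal sketch using only the standard library; it assumes the package is installed under the distribution name `onnxruntime` (variants such as GPU builds use a different name), and the helper names are illustrative.

```python
from importlib import metadata
from itertools import takewhile

MIN_VERSION = (1, 16, 0)  # minimum stated on this page


def parse_version(text):
    """Turn '1.16.3' into (1, 16, 3); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # suffix such as 'rc1' ends the numeric version
            break
    return tuple(parts)


def meets_minimum(version_text, minimum=MIN_VERSION):
    """Return True if version_text is at least the minimum version."""
    parsed = parse_version(version_text)
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("onnxruntime")
    except metadata.PackageNotFoundError:
        print("onnxruntime is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old for MatMulFpQ4"
        print(f"onnxruntime {installed}: {status}")
```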

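Run quantization

Build Intel® Neural Compressor from the main development branch and run INT4 weight-only quantization.

The weight-only quantization configuration is as follows:

Data type	Group size	Scheme	Algorithm
INT4	32	Symmetric	RTN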
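
To make the settings in the table concrete, here is a minimal pure-Python sketch of round-to-nearest (RTN) symmetric quantization with a group size of 32. It illustrates the general technique only and is not Intel® Neural Compressor's implementation; the function names, the signed range [-8, 7], and the choice of scale (largest absolute weight in the group mapped to 7) are assumptions made for illustration.

```python
def rtn_quantize_group(weights, bits=4):
    """Quantize one group of floats to signed integers with a shared scale."""
    qmax = 2 ** (bits - 1) - 1   # 7 for INT4
    qmin = -(2 ** (bits - 1))    # -8 for INT4
    max_abs = max(abs(w) for w in weights)
    # Symmetric scheme: zero maps to zero, one scale per group, no zero point.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return quantized, scale


def rtn_quantize(weights, group_size=32, bits=4):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        rtn_quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate float weights from (integers, scale) pairs."""
    return [q * scale for quantized, scale in groups for q in quantized]


if __name__ == "__main__":
    example = [((i * 37) % 101 - 50) / 500 for i in range(64)]
    groups = rtn_quantize(example)
    restored = dequantize(groups)
    worst = max(abs(a - b) for a, b in zip(example, restored))
    print(f"{len(groups)} groups, worst reconstruction error {worst:.5f}")
```

Smaller groups give each scale fewer weights to cover, which tends to reduce rounding error at the cost of storing more scales.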

The key code is shown below. For the complete script, see the whisper example.

import os

from neural_compressor import quantization, PostTrainingQuantConfig

# `dataloader` (the calibration dataloader) is defined in the complete example script.
model_list = ['encoder_model.onnx', 'decoder_model.onnx', 'decoder_with_past_model.onnx']
for model in model_list:
    # INT4 weight-only quantization: RTN, symmetric, group size 32, for all op types
    config = PostTrainingQuantConfig(
        approach="weight_only",
        calibration_sampling_size=[8],
        op_type_dict={".*": {"weight": {"bits": 4,
                                        "algorithm": ["RTN"],
                                        "scheme": ["sym"],
                                        "group_size": 32}}},
    )
    q_model = quantization.fit(
        os.path.join("/path/to/whisper-large-with-past", model),  # path to the FP32 model
        config,
        calib_dataloader=dataloader)
    q_model.save(os.path.join("/path/to/whisper-large-onnx-int4", model))  # path for the INT4 model

Evaluation

Operator statistics

The table below shows the operator statistics for the INT4 ONNX model:

Model	Operator type	Total	INT4 weights	FP32 weights
encoder_model	MatMul	256	192	64
decoder_model	MatMul	449	321	128
decoder_with_past_model	MatMul	385	257	128
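
The share of MatMul operators that ended up with INT4 weights can be derived directly from the table. The sketch below copies the counts from the table above; the dictionary layout and function name are illustrative.

```python
# (total MatMul ops, INT4 weights, FP32 weights), copied from the table above.
operator_stats = {
    "encoder_model": (256, 192, 64),
    "decoder_model": (449, 321, 128),
    "decoder_with_past_model": (385, 257, 128),
}


def int4_share(total, int4, fp32):
    """Percentage of MatMul operators whose weights are stored as INT4."""
    if int4 + fp32 != total:
        raise ValueError("INT4 and FP32 counts do not add up to the total")
    return int4 / total * 100


for name, counts in operator_stats.items():
    print(f"{name}: {int4_share(*counts):.1f}% of MatMul ops use INT4 weights")
```

Roughly two thirds to three quarters of the MatMul operators in each sub-model carry INT4 weights; the rest remain in FP32.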

Word error rate (WER) evaluation

Use the following code to evaluate the model on the librispeech_asr dataset:

import os

from datasets import load_dataset
from evaluate import load
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import PretrainedConfig, WhisperProcessor

model_name = 'openai/whisper-large'
model_path = 'whisper-large-onnx-int4'  # directory containing the INT4 ONNX files

# The processor handles feature extraction and tokenization; inference uses the ONNX model only.
processor = WhisperProcessor.from_pretrained(model_name)
wer = load("wer")
librispeech_test_clean = load_dataset("librispeech_asr", "clean", split="test")

model_config = PretrainedConfig.from_pretrained(model_name)
predictions = []
references = []
sessions = ORTModelForSpeechSeq2Seq.load_model(
            os.path.join(model_path, 'encoder_model.onnx'),
            os.path.join(model_path, 'decoder_model.onnx'),
            os.path.join(model_path, 'decoder_with_past_model.onnx'))
model = ORTModelForSpeechSeq2Seq(sessions[0], sessions[1], model_config, model_path, sessions[2])
for idx, batch in enumerate(librispeech_test_clean):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    reference = processor.tokenizer._normalize(batch['text'])
    references.append(reference)
    predicted_ids = model.generate(input_features)[0]
    transcription = processor.decode(predicted_ids)
    prediction = processor.tokenizer._normalize(transcription)
    predictions.append(prediction)
wer_result = wer.compute(references=references, predictions=predictions)
print(f"Word error rate (%): {wer_result * 100}")
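
For readers unfamiliar with the metric, the sketch below shows what a word error rate measures: the number of word-level substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the number of reference words. It follows the standard definition and is not the `evaluate` library's code; the function names are illustrative.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between reference and hypothesis."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of an extra word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(references, predictions):
    """Total word errors across all pairs divided by total reference words."""
    errors = 0
    total_words = 0
    for reference, prediction in zip(references, predictions):
        ref_words = reference.split()
        errors += edit_distance(ref_words, prediction.split())
        total_words += len(ref_words)
    return errors / total_words


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("mat") over 5 words.
    print(word_error_rate(["the cat sat on mat"], ["the cat sit on"]))
```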

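✨ Key features

Strong generalization: trained on 680,000 hours of labeled data, the model applies to many datasets and domains without fine-tuning.
Low-precision quantization: INT4 weight-only quantization effectively reduces the model size.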

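📦 Model details

Attribute	Details
Model author - Company	Intel
Date	October 8, 2023
Version	1
Model type	Speech recognition
Paper or other resources	-
License	Apache 2.0
Questions or comments	Community tab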

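📚 Intended use

Intended use	Description
Primary intended uses	The raw model can be used for automatic speech recognition inference
Primary intended users	Anyone running automatic speech recognition inference
Out-of-scope uses	In most cases, this model needs to be fine-tuned for a specific task. The model must not be used to intentionally create hostile or alienating environments for people.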