whisper-large-v3-turbo-atcosim-finetune open-source model - optimized for transcribing air traffic control communications

Whisper Large V3 Turbo Atcosim Finetune

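Developed by tclin

A model fine-tuned from OpenAI Whisper Large V3 Turbo and optimized specifically for transcribing air traffic control (ATC) communications.

Speech recognition

Transformers

Language: English · License: MIT · #ATC speech recognition #Low word error rate (3.7%) #Domain terminology optimization

Downloads: 28

Released: 4/15/2025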

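Model overview

The model was fine-tuned on the ATCOSIM dataset. It focuses on transcribing ATC radio communications and supports aviation safety research and airspace management decisions.

Model highlights

ATC communication optimization

Fine-tuned specifically on air traffic control communications to improve recognition of ATC terminology and transcription of callsigns.

Noise handling

Better handling of radio transmission noise, which improves recognition accuracy in noisy conditions.

Efficient fine-tuning strategy

A partial-freezing strategy balances efficiency and adaptability: the first 24 encoder layers stay frozen.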

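Model capabilities

Speech recognition

ATC communication transcription

Aviation terminology recognition

Speech processing in noisy conditions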

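Use cases

Aviation safety

ATC communication transcription

Transcribe the content of air traffic control radio communications

Word error rate: 3.73%

Congestion pattern analysis

Analyze congestion patterns in ATC communications

Airspace management

Decision support

Provide data-driven decision support for airspace management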

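🚀 Whisper Large V3 Turbo: fine-tuned for air traffic control

This model is a fine-tuned version of OpenAI's Whisper Large V3 Turbo speech recognition model, optimized specifically for transcribing air traffic control (ATC) communications. It was fine-tuned on the ATCOSIM dataset, which the model card describes as containing real ATC communications from operational environments.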

📦 Model information

Property	Details
Model type	whisper-large-v3-turbo-atcosim-finetune
Training data	jlvdoorn/atco2-asr-atcosim
Evaluation metric	Word Error Rate (WER)
Library	transformers

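🚀 Intended use

The model is designed for the following purposes:

Transcribing ATC radio communications
Supporting aviation safety research
Analyzing congestion patterns in ATC communications
Providing data-driven decision support for airspace management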

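✨ Key features

The model is reported to transcribe aviation communications more accurately than the base Whisper model, particularly in the following areas:

Recognition of ATC terminology
Callsign transcription accuracy
Handling of radio transmission noise
Recognition of standardized phraseology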

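🔧 Technical details

Training method

The model was fine-tuned with a partial-freezing approach to balance efficiency and adaptability:

The first 24 encoder layers are frozen
All convolutional layers and positional embeddings are frozen
The remaining encoder layers and the decoder are fine-tuned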
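
The freezing rules above can be sketched as a filter over parameter names. This is a minimal illustration, not the author's training script: it assumes parameter names follow the Hugging Face Whisper implementation (for example `model.encoder.layers.3.self_attn.q_proj.weight`), and it assumes "positional embeddings" refers to the encoder side, since the decoder is listed as fine-tuned. The helper names `should_freeze` and `apply_partial_freeze` are hypothetical.

```python
import re

# Assumption: names follow the Hugging Face Whisper layout,
# e.g. "model.encoder.layers.3.self_attn.q_proj.weight".
_ENCODER_LAYER = re.compile(r"^model\.encoder\.layers\.(\d+)\.")

# Assumption: only encoder-side convolutions and positional embeddings are frozen.
_ALWAYS_FROZEN = (
    "model.encoder.conv1.",
    "model.encoder.conv2.",
    "model.encoder.embed_positions.",
)


def should_freeze(name, n_frozen_layers=24):
    """Return True if the named parameter belongs to a frozen module."""
    if name.startswith(_ALWAYS_FROZEN):
        return True
    match = _ENCODER_LAYER.match(name)
    # Encoder layers are numbered from 0, so layers 0..23 are the first 24.
    return match is not None and int(match.group(1)) < n_frozen_layers


def apply_partial_freeze(model, n_frozen_layers=24):
    """Set requires_grad on every parameter and return the trainable names."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = not should_freeze(name, n_frozen_layers)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

With a model loaded through `WhisperForConditionalGeneration.from_pretrained(...)`, calling `apply_partial_freeze(model)` before building the optimizer would leave only the later encoder layers, the encoder's final layer norm, and the decoder trainable.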

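Training hyperparameters

Learning rate: 1e-5
Training steps: 5000
Warmup steps: 500
Gradient checkpointing enabled
FP16 precision
Per-device batch size: 16
Evaluation metric: word error rate (WER)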
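
To make the warmup setting concrete, the sketch below computes the learning rate at a given step. The scheduler type is not stated in the source; the code assumes linear warmup followed by linear decay to zero, which is the default scheduler of the Hugging Face `Trainer`. The function name `lr_at` is hypothetical.

```python
def lr_at(step, peak_lr=1e-5, warmup_steps=500, total_steps=5000):
    """Learning rate at an optimizer step: linear warmup, then linear decay.

    Assumption: a linear schedule. The source only gives the peak rate,
    the warmup length, and the total number of steps.
    """
    if step < warmup_steps:
        # Ramp from 0 up to the peak rate over the warmup steps.
        return peak_lr * step / warmup_steps
    # Decay from the peak rate down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 2750, 5000):
    print(step, lr_at(step))
```

Under these assumptions the rate reaches its peak of 1e-5 at step 500 and has fallen back to half that value by step 2750.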

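📈 Performance

Training metrics

Over 5000 training steps (10 epochs), the model progressed as follows:

Step	Training loss	Validation loss	WER (%)
1000	0.090100	0.081074	5.81697
2000	0.021100	0.080030	4.00939
3000	0.010000	0.080892	5.67438
4000	0.002500	0.080460	3.88357
5000	0.001400	0.080753	3.73678

The final model reaches a word error rate (WER) of 3.73678% at step 5000, down from 5.81697% at step 1000. The improvement is not monotonic: WER climbs back to 5.67438% at step 3000 before falling again, and validation loss stays close to 0.080 throughout while training loss drops from 0.0901 to 0.0014.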
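
For reference, WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. The source does not say which text normalization was applied before scoring, so this version only splits on whitespace; the function name and the example utterances are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "lufthansa four five six descend flight level one two zero"
hypothesis = "lufthansa four five six descend level one two zero"
print(f"WER: {100 * word_error_rate(reference, hypothesis):.2f}%")
```

The table above reports WER as a percentage, so the returned fraction is multiplied by 100 when printed.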

💻 Usage examples

Basic usage

import torch
from transformers import pipeline

# Choose the device and precision
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the model through a pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="tclin/whisper-large-v3-turbo-atcosim-finetune",
    chunk_length_s=30,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device
)

# Transcribe an audio file
result = transcriber("path_to_atc_audio.wav")
print(f"Transcription: {result['text']}")

Advanced usage

import torch
import torchaudio
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

# Load and preprocess the audio
audio_path = "path_to_atc_audio.wav"
waveform, sample_rate = torchaudio.load(audio_path)

# Resample to 16 kHz (required by Whisper models)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)

# Downmix multi-channel audio to mono if needed
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# Convert to a 1-D numpy array
waveform_np = waveform.squeeze().cpu().numpy()

# Choose the device and precision
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the model and processor
model = WhisperForConditionalGeneration.from_pretrained("tclin/whisper-large-v3-turbo-atcosim-finetune")
model = model.to(device=device, dtype=torch_dtype)  # set device and dtype explicitly
processor = WhisperProcessor.from_pretrained("tclin/whisper-large-v3-turbo-atcosim-finetune")

# Method 1: use the processor directly (recommended for fine-grained control).
# The feature extractor pads or truncates to a single 30-second window.
input_features = processor(waveform_np, sampling_rate=16000, return_tensors="pt").input_features
input_features = input_features.to(device=device, dtype=torch_dtype)

generated_ids = model.generate(input_features, max_new_tokens=128)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Transcription: {transcription}")

# Method 2: run a pipeline on the preprocessed audio.
# A raw numpy array is assumed to already be sampled at 16 kHz.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    torch_dtype=torch_dtype,
    device=device
)

result = pipe(waveform_np)
print(f"Transcription: {result['text']}")

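⚠️ Important notes

Always resample audio to 16 kHz before processing.
When running on a GPU, set the device and data type explicitly with model.to(device=device, dtype=torch_dtype).
Use the chunk_length_s parameter when processing longer audio files.
The model performs best on clear ATC communications that use standard phraseology.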
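
To illustrate what chunking a long recording involves, the sketch below computes overlapping 30-second windows over a 16 kHz signal. It is an illustration of the idea rather than the pipeline's own implementation. The 5-second overlap on each side mirrors the documented pipeline default for `stride_length_s` (one sixth of `chunk_length_s`); the function name `chunk_bounds` is hypothetical.

```python
def chunk_bounds(n_samples, sample_rate=16000, chunk_s=30.0, stride_s=5.0):
    """Return (start, end) sample indices of overlapping windows.

    Consecutive windows overlap by 2 * stride_s seconds, so each window
    advances by chunk_s - 2 * stride_s seconds.
    """
    chunk = int(chunk_s * sample_rate)
    step = chunk - 2 * int(stride_s * sample_rate)
    if step <= 0:
        raise ValueError("stride_s must be less than half of chunk_s")
    if n_samples <= 0:
        return []

    bounds = []
    start = 0
    while True:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end >= n_samples:  # the last window reaches the end of the audio
            break
        start += step
    return bounds


# A 70-second recording at 16 kHz
for start, end in chunk_bounds(70 * 16000):
    print(f"{start / 16000:.0f}s to {end / 16000:.0f}s")
```

Each window fits Whisper's 30-second input, and the overlap gives the decoder context on both sides of every boundary so that words cut at a chunk edge can be recovered when the pieces are merged.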

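🌐 Broader application

This model is one component of a speech-to-analysis pipeline for ATC communications, which consists of:

Audio-to-text transcription (this model)
Domain-specific text reformatting using contextual knowledge
Congestion analysis based on the transcribed communications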
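
The source does not describe how the reformatting step works. As one hypothetical example of what domain-specific reformatting could involve, the sketch below collapses runs of spoken digits into numerals, including the radiotelephony pronunciations "tree", "fife", and "niner". The mapping and the function name are assumptions made for illustration only.

```python
# Hypothetical digit table; "tree", "fife", and "niner" are the
# radiotelephony pronunciations of 3, 5, and 9.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "tree": "3",
    "four": "4", "five": "5", "fife": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9", "niner": "9",
}


def collapse_spoken_digits(text):
    """Lowercase the text and join consecutive digit words into one number."""
    out = []
    run = []  # digits collected from the current run of digit words
    for word in text.lower().split():
        if word in DIGITS:
            run.append(DIGITS[word])
            continue
        if run:
            out.append("".join(run))
            run = []
        out.append(word)
    if run:  # flush a run that ends the utterance
        out.append("".join(run))
    return " ".join(out)


print(collapse_spoken_digits(
    "Lufthansa four five six descend flight level one two zero"
))
```

A real reformatting stage would need context to decide when a digit word is part of a callsign, a flight level, a heading, or ordinary speech; this sketch treats every digit word the same way.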

📄 License

This model is released under the MIT license.

📚 Citation

If you use this model in your research, please cite:

@misc{ta-chun_lin_2025,
	author       = { Ta-Chun Lin },
	title        = { whisper-large-v3-turbo-atcosim-finetune (Revision 4b2d400) },
	year         = 2025,
	url          = { https://huggingface.co/tclin/whisper-large-v3-turbo-atcosim-finetune },
	doi          = { 10.57967/hf/5272 },
	publisher    = { Hugging Face }
}