language: es
datasets:
- ciempiess/ciempiess_light
- ciempiess/ciempiess_balance
- ciempiess/ciempiess_fem
- common_voice
- hub4ne_es_LDC98S74
- callhome_es_LDC96S35
tags:
- audio
- automatic-speech-recognition
- spanish
- xlrs-53-spanish
- ciempiess
- ciempiess-unam
license: cc-by-4.0
model-index:
- name: wav2vec2-large-xlsr-53-spanish-ep5-944h
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 10.0 (Test)
      type: mozilla-foundation/common_voice_10_0
      split: test
      args:
        language: es
    metrics:
    - name: WER
      type: wer
      value: 9.20
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 10.0 (Dev)
      type: mozilla-foundation/common_voice_10_0
      split: validation
      args:
        language: es
    metrics:
    - name: WER
      type: wer
      value: 8.02
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CIEMPIESS-TEST
      type: ciempiess/ciempiess_test
      split: test
      args:
        language: es
    metrics:
    - name: WER
      type: wer
      value: 11.17
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: 1997 Spanish Broadcast News Speech (HUB4-NE)
      type: HUB4NE_LDC98S74
      split: test
      args:
        language: es
    metrics:
    - name: WER
      type: wer
      value: 7.48
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CALLHOME Spanish Speech (Test)
      type: callhome_LDC96S35
      split: test
      args:
        language: es
    metrics:
    - name: WER
      type: wer
      value: 39.12
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CALLHOME Spanish Speech (Dev)
      type: callhome_LDC96S35
      split: validation
      args:
        language: es
    metrics:
    - name: WER
      type: wer
      value: 40.39
wav2vec2-large-xlsr-53-spanish-ep5-944h
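Paper: The state of end-to-end systems for Mexican Spanish speech recognition
"wav2vec2-large-xlsr-53-spanish-ep5-944h" is an acoustic model for automatic speech recognition in Spanish. It was obtained by fine-tuning "facebook/wav2vec2-large-xlsr-53" for 5 epochs on around 944 hours of Spanish data gathered or developed by the CIEMPIESS-UNAM Project since 2012. Most of the data is available from the CIEMPIESS-UNAM Project homepage, http://www.ciempiess.org/; the rest can be found in public repositories such as LDC or OpenSLR.
The specific list of corpora used to fine-tune the model is:
The fine-tuning process was carried out in November 2022 on the servers of the Language and Voice Laboratory (https://lvl.ru.is/) at Reykjavík University (Iceland) by Carlos Daniel Hernández Mena.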
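For readers who only want to transcribe a single recording, the following is a minimal sketch rather than the author's own recipe. It assumes the input is a mono waveform already sampled at 16 kHz (the rate the evaluation code resamples to), and it uses plain greedy decoding. The helper name `transcribe` and the constant `EXPECTED_RATE` are hypothetical names introduced here for illustration.

```python
MODEL_NAME = "carlosdanielhernandezmena/wav2vec2-large-xlsr-53-spanish-ep5-944h"
EXPECTED_RATE = 16_000  # assumption: 16 kHz input, matching the evaluation code


def transcribe(waveform, sampling_rate=EXPECTED_RATE):
    """Return the greedy transcription of one mono waveform (hypothetical helper).

    `waveform` is a 1-D sequence or NumPy array of float samples.
    """
    if sampling_rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz audio, got {sampling_rate} Hz; resample first"
        )

    # Imported lazily so the rate check above works without the heavy dependencies.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

    # Normalize the waveform and wrap it as a batch of one PyTorch tensor.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: pick the most likely token per frame, then let the
    # processor collapse repeated tokens and drop padding tokens.
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]
```

This sketch reloads the processor and model on every call, which keeps it short but is wasteful. When transcribing more than one recording, load both once and reuse them, as the evaluation code does.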
Evaluation
import torch
from transformers import Wav2Vec2Processor
from transformers import Wav2Vec2ForCTC

# Load the processor and the model.
MODEL_NAME = "carlosdanielhernandezmena/wav2vec2-large-xlsr-53-spanish-ep5-944h"
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

# Load the test set.
# Note: load_metric is deprecated and absent from recent datasets releases;
# evaluate.load("wer") from the evaluate package is its replacement.
from datasets import load_dataset, load_metric, Audio
ds = load_dataset("ciempiess/ciempiess_test", split="test")

# Resample the audio to 16 kHz.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Turn each example into model inputs and label ids.
def prepare_dataset(batch):
    audio = batch["audio"]
    # The processor returns a batch; keep only its single element.
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    with processor.as_target_processor():
        batch["labels"] = processor(batch["normalized_text"]).input_ids
    return batch

ds = ds.map(prepare_dataset, remove_columns=ds.column_names, num_proc=1)

# Define the evaluation metric.
import numpy as np
wer_metric = load_metric("wer")

# Trainer-style metric helper; it is not called in the rest of this script.
def compute_metrics(pred):
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)
    # Replace the ignore index (-100) with the pad token id before decoding.
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids)
    # Do not group repeated tokens when decoding the reference labels.
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
    wer = wer_metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}

# Run inference on the GPU, one utterance at a time.
model = model.to(torch.device("cuda"))

def map_to_result(batch):
    with torch.no_grad():
        input_values = torch.tensor(batch["input_values"], device="cuda").unsqueeze(0)
        logits = model(input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_str"] = processor.batch_decode(pred_ids)[0]
    batch["sentence"] = processor.decode(batch["labels"], group_tokens=False)
    return batch

results = ds.map(map_to_result, remove_columns=ds.column_names)

# Compute the overall WER.
print("Test WER: {:.3f}".format(wer_metric.compute(predictions=results["pred_str"], references=results["sentence"])))
Test Result: 0.112
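The score above is a fraction, while the metadata reports WER values as percentages; 0.112 appears to be the rounded form of the 11.17 listed for CIEMPIESS-TEST. As a reminder of what the number measures, here is a minimal pure-Python sketch of word error rate, assuming corpus-level aggregation (total word edits divided by total reference words). The function names and the two Spanish example sentences are made up for illustration and are not taken from the test set or from the metric library's implementation.

```python
def edit_distance(ref_words, hyp_words):
    """Levenshtein distance over words: substitutions, deletions, insertions."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    return total_edits / total_words


# Made-up example: one substitution plus one deletion over six reference words.
references = ["hola buenos días", "cómo estás hoy"]
hypotheses = ["hola buenas días", "cómo estás"]
wer = word_error_rate(references, hypotheses)
print("WER: {:.3f} ({:.1f}%)".format(wer, 100 * wer))  # WER: 0.333 (33.3%)
```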
BibTeX entry and citation info
When publishing results based on these models, please cite:
@misc{mena2022xlrs53spanish,
  title={Acoustic Model in Spanish: wav2vec2-large-xlsr-53-spanish-ep5-944h.},
  author={Hernandez Mena, Carlos Daniel},
  url={https://huggingface.co/carlosdanielhernandezmena/wav2vec2-large-xlsr-53-spanish-ep5-944h},
  year={2022}
}
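Acknowledgements
The author thanks the "Speech Technology Development" social service program at the School of Engineering (Facultad de Ingeniería, FI) of the National Autonomous University of Mexico (UNAM), and all the social service students for their hard work.
Special thanks to Jón Guðnason, head of the Language and Voice Lab, for providing the computational resources that made this model possible. The author also thanks the "Language Technology Programme for Icelandic 2019-2023", which is managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture.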