Thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
Language: Japanese
License: Apache-2.0
Dataset: reazon-research/reazonspeech
Pipeline tag: feature extraction
Inference: not supported
Tags:
rinna/japanese-wav2vec2-base

Overview
This is a Japanese wav2vec 2.0 Base model trained by rinna Co., Ltd.
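- Model summary
The model architecture is the same as the original wav2vec 2.0 Base model: 12 Transformer layers, each with 12 attention heads.
The model was trained with code from the official repository; the detailed training configuration can be found in that repository and in the original paper.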
- Training data
The model was trained on ReazonSpeech v1, a Japanese speech corpus of approximately 19,000 hours.
- Contributors
- Release date
March 7, 2024
How to use the model
import soundfile as sf
from transformers import AutoFeatureExtractor, AutoModel

# Load the feature extractor and the pretrained model from the Hugging Face Hub.
model_name = "rinna/japanese-wav2vec2-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Placeholder path (not in the original card): point this at a 16 kHz audio file.
audio_file = "path/to/audio.wav"
raw_speech_16kHz, sr = sf.read(audio_file)
inputs = feature_extractor(
    raw_speech_16kHz,
    return_tensors="pt",
    sampling_rate=sr,
)
outputs = model(**inputs)

print(f"Input:  {inputs.input_values.size()}")
print(f"Output: {outputs.last_hidden_state.size()}")
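The variable name raw_speech_16kHz suggests that the input should be sampled at 16 kHz. As a minimal sketch, assuming the audio is an uncompressed PCM WAV file, the standard-library wave module can check the header before the file is passed to the feature extractor. The helper names describe_wav, is_16khz_mono and write_silence are hypothetical and not part of this model card or of transformers; the mono check is an added assumption for single-utterance input.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read sample rate, channel count and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "seconds": frames / rate,
        }


def is_16khz_mono(path: str) -> bool:
    """True if the file is single-channel audio sampled at 16 kHz."""
    info = describe_wav(path)
    return info["sample_rate"] == 16_000 and info["channels"] == 1


def write_silence(path: str, sample_rate: int, seconds: float = 1.0, channels: int = 1) -> None:
    """Write a silent 16-bit PCM WAV file (used here only for the demo)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds) * channels)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.wav")
        write_silence(demo, 16_000)
        print(describe_wav(demo), is_16khz_mono(demo))
```

Files that fail the check would need to be resampled or downmixed first. The wave module reads only PCM WAV, so other formats need a different reader such as soundfile.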
A fairseq-format model checkpoint file is also available here.
How to cite
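The shapes printed by the usage example above can be estimated ahead of time. The sketch below assumes this checkpoint keeps the standard wav2vec 2.0 Base settings: seven unpadded 1-D convolutions with kernel sizes (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2), and a hidden size of 768. These values are taken from the original wav2vec 2.0 Base configuration, not read from this model's config, and the function names are hypothetical.

```python
# Assumed feature-encoder settings: the standard wav2vec 2.0 Base values.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
HIDDEN_SIZE = 768


def num_output_frames(num_samples: int) -> int:
    """Number of frames left after the unpadded 1-D convolution stack."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded convolution with this kernel and stride.
        length = (length - kernel) // stride + 1
    return length


def expected_output_shape(num_samples: int, batch_size: int = 1) -> tuple:
    """Expected (batch, frames, hidden) shape of last_hidden_state."""
    return (batch_size, num_output_frames(num_samples), HIDDEN_SIZE)


if __name__ == "__main__":
    one_second = 16_000  # samples in one second of 16 kHz audio
    print(expected_output_shape(one_second))  # (1, 49, 768)
```

Under these assumptions one second of 16 kHz audio yields 49 frames, roughly one frame every 20 ms. Comparing this with the printed output size is a quick way to confirm that the assumed settings match the loaded model.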
@misc{rinna-japanese-wav2vec2-base,
title = {rinna/japanese-wav2vec2-base},
author = {Hono, Yukiya and Mitsui, Kentaro and Sawada, Kei},
url = {https://huggingface.co/rinna/japanese-wav2vec2-base}
}
@inproceedings{sawada2024release,
title = {Release of Pre-Trained Models for the {J}apanese Language},
author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = {5},
year = {2024},
pages = {13898--13905},
url = {https://aclanthology.org/2024.lrec-main.1213},
note = {\url{https://arxiv.org/abs/2404.01657}}
}
References
@inproceedings{baevski2020wav2vec,
title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
booktitle = {Advances in Neural Information Processing Systems},
year = {2020},
volume = {33},
pages = {12449--12460},
url = {https://proceedings.neurips.cc/paper/2020/hash/92d1e1eb1cd6f9fba3227870bb6d7f07-Abstract.html}
}
License
The Apache 2.0 license