ArabicTransformer small model (B6-6-6 with decoder)
Paper:
ArabicTransformer: Efficient Large Arabic Language Model with Funnel Transformer and ELECTRA Objective
Abstract
Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, as demonstrated by AraBERT and AraELECTRA, yields impressive results on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside the Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and the ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using fewer computational resources than other BERT-based models.
Description
This model was pre-trained on 44GB of Arabic corpora using Funnel Transformer with the ELECTRA objective. It has more parameters (1.39x) than the ELECTRA-base architecture, while its inference and fine-tuning times are similar or only slightly longer. Pre-training this model consumed significantly fewer resources than state-of-the-art models.
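As a minimal usage sketch, the checkpoint can be loaded with the Hugging Face `transformers` library, which supports the Funnel Transformer architecture. The model identifier `sultan/ArabicTransformer-small` below is an assumption; replace it with the actual Hub id or a local path to the converted checkpoint.

```python
# Minimal sketch: load ArabicTransformer with Hugging Face transformers and
# extract contextual hidden states for an Arabic sentence.
# NOTE: the model id "sultan/ArabicTransformer-small" is an assumption; use the
# actual Hub identifier or a local checkpoint path.
from transformers import AutoTokenizer, AutoModel

model_id = "sultan/ArabicTransformer-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Tokenize an Arabic sentence and run a forward pass.
inputs = tokenizer("العربية لغة جميلة", return_tensors="pt")
outputs = model(**inputs)

# (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```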
Results on Arabic TyDi QA

| Model | EM | F1 |
|---|---|---|
| AraBERT02-Large | 73.72 | 86.03 |
| AraELECTRA-Base | 74.91 | 86.68 |
| ArabicTransformer-Small | 74.70 | 85.89 |
| ArabicTransformer-Base | 75.57 | 87.22 |
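The TyDi QA numbers above come from fine-tuning the model for extractive question answering; the Google Colab and GitHub examples linked below are the authoritative reference for that recipe. As a hedged sketch of what inference with such a fine-tuned checkpoint could look like (the checkpoint path is purely hypothetical):

```python
# Hedged sketch: extractive QA inference with a fine-tuned ArabicTransformer checkpoint.
# NOTE: "path/to/ArabicTransformer-QA-checkpoint" is a hypothetical placeholder; see the
# Google Colab / GitHub examples for the actual fine-tuning and evaluation code.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="path/to/ArabicTransformer-QA-checkpoint",      # hypothetical fine-tuned model
    tokenizer="path/to/ArabicTransformer-QA-checkpoint",  # hypothetical tokenizer path
)

result = qa(
    question="ما هي عاصمة فرنسا؟",    # "What is the capital of France?"
    context="باريس هي عاصمة فرنسا.",  # "Paris is the capital of France."
)
print(result["answer"], result["score"])
```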
Google Colab Example
GitHub Page
https://github.com/salrowili/ArabicTransformer
Acknowledgement
We would like to acknowledge the support we received from the TPU Research Cloud (TRC) team, which granted us access to TPUv3 units.
@inproceedings{alrowili-shanker-2021-arabictransformer-efficient,
title = "{A}rabic{T}ransformer: Efficient Large {A}rabic Language Model with Funnel Transformer and {ELECTRA} Objective",
author = "Alrowili, Sultan and
Shanker, Vijay",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-emnlp.108",
pages = "1255--1261",
abstract = "Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, demonstrated by both AraBERT and AraELECTRA, shows an impressive result on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in the pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using less computational resources compared to other BERT-based models.",
}