🚀 ESCOXLM-R: A Multilingual Pre-trained Model for the Job Market Domain
ESCOXLM-R is a language model based on XLM-R-large that uses domain-adaptive pre-training on the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, covering 27 languages. It achieves state-of-the-art results on a range of sequence labeling and classification tasks, providing strong support for natural language processing in the job market domain.
🚀 Quick Start
ESCOXLM-R is intended for natural language processing tasks in the job market domain. It is based on XLM-R-large and domain-adaptively pre-trained on the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy across 27 languages. Its pre-training objectives are dynamic masked language modeling and a novel objective for inducing multilingual taxonomical ESCO relations.
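A minimal sketch of querying the model for masked-token prediction with the Hugging Face transformers library. The Hub model ID below is an assumption, not confirmed by this card — verify the exact checkpoint name in the GitHub repository linked under "More Information".

```python
# Minimal usage sketch (assumption: the checkpoint is published on the
# Hugging Face Hub as "jjzha/esco-xlm-roberta-large"; verify the exact
# model ID in the project's GitHub repository).
from transformers import pipeline

fill = pipeline("fill-mask", model="jjzha/esco-xlm-roberta-large")

# ESCOXLM-R builds on XLM-R, so the mask token is "<mask>".
for pred in fill("Teamwork is an essential <mask> for software engineers."):
    print(pred["token_str"], round(pred["score"], 3))
```

For downstream use (skill extraction, job title classification, etc.) the same checkpoint can instead be loaded with `AutoModelForTokenClassification` or `AutoModelForSequenceClassification` and fine-tuned.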
📚 Documentation
Paper
This model accompanies the following paper:
ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain
Mike Zhang, Rob van der Goot, and Barbara Plank. In ACL (2023).
Model Description
We introduce a language model called ESCOXLM-R, based on XLM-R-large, which uses domain-adaptive pre-training on the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, covering 27 languages. The pre-training objectives for ESCOXLM-R include dynamic masked language modeling and a novel additional objective for inducing multilingual taxonomical ESCO relations (see the paper for details).
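The "dynamic" in dynamic masked language modeling means the corruption is re-sampled every time a sentence is seen, rather than fixed once at preprocessing time. A stdlib-only sketch of the idea, assuming the standard BERT-style 80/10/10 replacement scheme; the token IDs and function name are illustrative, not the paper's implementation:

```python
import random

MASK_ID = 250_001      # <mask> id in an XLM-R-sized vocabulary (illustrative)
VOCAB_SIZE = 250_002   # illustrative vocabulary size

def dynamic_mask(token_ids, mask_prob=0.15, seed=None):
    """Re-sample the MLM corruption each time a sentence is seen
    (e.g. once per epoch), instead of fixing it at preprocessing time.
    Uses the common BERT-style 80/10/10 replacement scheme."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)                 # model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK_ID)      # 80%: replace with <mask>
            elif r < 0.9:
                corrupted.append(rng.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                corrupted.append(tok)          # 10%: keep unchanged
        else:
            corrupted.append(tok)
            labels.append(-100)                # -100 is ignored by the MLM loss
    return corrupted, labels

# Each "epoch" draws a fresh mask for the same sentence.
sentence = list(range(2000, 2030))
epoch_1_view, _ = dynamic_mask(sentence, seed=1)
epoch_2_view, _ = dynamic_mask(sentence, seed=2)
```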
Evaluation
We comprehensively evaluate ESCOXLM-R on 6 sequence labeling and 3 classification tasks in 4 languages and find that it achieves state-of-the-art results on 6 of the 9 datasets.
Citation
If you use this work, please cite:
@inproceedings{zhang-etal-2023-escoxlm,
title = "{ESCOXLM}-{R}: Multilingual Taxonomy-driven Pre-training for the Job Market Domain",
author = "Zhang, Mike and
van der Goot, Rob and
Plank, Barbara",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.662",
pages = "11871--11890",
abstract = "The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification. While some approaches have been developed that are specific to the job market domain, there is a lack of generalized, multilingual models and benchmarks for these tasks. In this study, we introduce a language model called ESCOXLM-R, based on XLM-R-large, which uses domain-adaptive pre-training on the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, covering 27 languages. The pre-training objectives for ESCOXLM-R include dynamic masked language modeling and a novel additional objective for inducing multilingual taxonomical ESCO relations. We comprehensively evaluate the performance of ESCOXLM-R on 6 sequence labeling and 3 classification tasks in 4 languages and find that it achieves state-of-the-art results on 6 out of 9 datasets. Our analysis reveals that ESCOXLM-R performs better on short spans and outperforms XLM-R-large on entity-level and surface-level span-F1, likely due to ESCO containing short skill and occupation titles, and encoding information on the entity-level.",
}
More Information
More information can be found in the GitHub repository: https://github.com/jjzha/escoxlmr
📄 License
This model is released under the Apache-2.0 license.
📦 Model Information

| Property | Details |
| --- | --- |
| Model type | Multilingual pre-trained model based on XLM-R-large |
| Training data | European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, covering 27 languages |
| Supported tasks | Sequence labeling, classification |