language: it
license: afl-3.0
widget:
- text: Il [MASK] ha chiesto revocarsi l'obbligo di pagamento
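ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law
ITALIAN-LEGAL-BERT is based on bert-base-italian-xxl-cased, with additional pre-training on an Italian civil law corpus. It outperforms the general-purpose Italian BERT on several domain-specific tasks.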
ITALIAN-LEGAL-BERT variants [NEW!!!]

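For long documents:
Note: We are working on an extended version of the paper with more details and experimental results for the new models. Stay tuned.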
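Training procedure
We initialized ITALIAN-LEGAL-BERT with ITALIAN XXL BERT and pre-trained it for 4 additional epochs on 3.7 GB of preprocessed text from the National Jurisprudential Archive, using the Huggingface PyTorch-Transformers library. We used the BERT architecture with a language modeling head on top, the AdamW optimizer, an initial learning rate of 5e-5 (with linear decay, ending at 2.525e-9), a sequence length of 512, a batch size of 10 (imposed by GPU capacity), 8.4 million training steps, and a single V100 16GB GPU.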
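The hyperparameters above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not the authors' training script: it assumes a plain linear decay to zero with no warmup and no gradient accumulation, neither of which the card confirms (the reported final value of 2.525e-9 suggests the real schedule was read off before reaching exactly zero). The function name `linear_decay_lr` is hypothetical.

```python
# Values taken from the training description above.
INITIAL_LR = 5e-5
TOTAL_STEPS = 8_400_000
BATCH_SIZE = 10
EPOCHS = 4


def linear_decay_lr(step, initial_lr=INITIAL_LR, total_steps=TOTAL_STEPS):
    """Learning rate at `step` under an assumed linear decay to zero (no warmup)."""
    remaining = max(0, total_steps - step)
    return initial_lr * remaining / total_steps


# Assuming one optimizer step per batch (no gradient accumulation):
sequences_seen = TOTAL_STEPS * BATCH_SIZE        # total 512-token sequences processed
sequences_per_epoch = sequences_seen // EPOCHS   # sequences in one pass over the corpus

if __name__ == "__main__":
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, linear_decay_lr(step))
    print(sequences_seen, sequences_per_epoch)
```

Under these assumptions, the run processes 84 million sequences in total, or 21 million per epoch.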
Usage
The ITALIAN-LEGAL-BERT model can be loaded as follows:
from transformers import AutoModel, AutoTokenizer
model_name = "dlicari/Italian-Legal-BERT"
# Load the tokenizer and the encoder weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
You can use the Transformers library fill-mask pipeline to run inference with ITALIAN-LEGAL-BERT:
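from transformers import pipeline
model_name = "dlicari/Italian-Legal-BERT"
# Build a fill-mask pipeline backed by ITALIAN-LEGAL-BERT
fill_mask = pipeline("fill-mask", model=model_name)
# Returns the top candidate tokens for the [MASK] position
fill_mask("Il [MASK] ha chiesto revocarsi l'obbligo di pagamento")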
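The fill-mask pipeline returns a list of dictionaries, one per candidate, with the keys `score`, `token`, `token_str`, and `sequence`. As a minimal sketch of how that output might be condensed into ranked (token, score) pairs, consider the helpers below. The names `top_tokens` and `predict_mask` are hypothetical and not part of the model card or of Transformers.

```python
def top_tokens(predictions, k=3):
    """Return the k highest-scoring (token_str, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 4)) for p in ranked[:k]]


def predict_mask(text, model_name="dlicari/Italian-Legal-BERT", k=3):
    """Run the fill-mask pipeline on `text` (must contain one [MASK]) and summarize it."""
    # Imported lazily so that top_tokens can be used without Transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_name)
    return top_tokens(fill_mask(text, top_k=k), k)


if __name__ == "__main__":
    print(predict_mask("Il [MASK] ha chiesto revocarsi l'obbligo di pagamento"))
```

Running the script downloads the model from the Hugging Face Hub on first use.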
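The COLAB notebook "ITALIAN-LEGAL-BERT: Quick Start for Italian Legal Downstream Tasks" shows how to use the model for sentence similarity, text classification, and named entity recognition: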
- https://colab.research.google.com/drive/1ZOWaWnLaagT_PX6MmXMP2m3MAOVXkyRK?usp=sharing
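For a rough idea of what a sentence-similarity workflow can look like, here is a minimal sketch that mean-pools the encoder's last hidden states and compares two sentences by cosine similarity. This is an assumed, generic approach and not necessarily the one used in the notebook. The function names `embed` and `cosine_similarity` and the second example sentence are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(sentences, model_name="dlicari/Italian-Legal-BERT"):
    """Mean-pool last hidden states over non-padding tokens; returns a list of vectors."""
    # Imported lazily so cosine_similarity works without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # ignore padding positions
    return pooled.tolist()


if __name__ == "__main__":
    first, second = embed([
        "Il ricorrente ha chiesto revocarsi l'obbligo di pagamento",
        "Il ricorrente ha chiesto la revoca dell'obbligo di pagamento",
    ])
    print(cosine_similarity(first, second))
```

Embeddings from a model that has only been pre-trained with a language modeling objective are not tuned for similarity, so scores from this sketch should be treated as a baseline rather than a calibrated measure.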
Citation
If you find our resource or paper useful, please consider including the following citation in your paper:
@inproceedings{licari_italian-legal-bert_2022,
address = {Bozen-Bolzano, Italy},
series = {{CEUR} {Workshop} {Proceedings}},
title = {{ITALIAN}-{LEGAL}-{BERT}: {A} {Pre}-trained {Transformer} {Language} {Model} for {Italian} {Law}},
volume = {3256},
shorttitle = {{ITALIAN}-{LEGAL}-{BERT}},
url = {https://ceur-ws.org/Vol-3256/#km4law3},
language = {en},
urldate = {2022-11-19},
booktitle = {Companion {Proceedings} of the 23rd {International} {Conference} on {Knowledge} {Engineering} and {Knowledge} {Management}},
publisher = {CEUR},
author = {Licari, Daniele and Comandè, Giovanni},
editor = {Symeonidou, Danai and Yu, Ran and Ceolin, Davide and Poveda-Villalón, María and Audrito, Davide and Caro, Luigi Di and Grasso, Francesca and Nai, Roberto and Sulis, Emilio and Ekaputra, Fajar J. and Kutz, Oliver and Troquard, Nicolas},
month = sep,
year = {2022},
note = {ISSN: 1613-0073},
file = {Full Text PDF:https://ceur-ws.org/Vol-3256/km4law3.pdf},
}