camembert-base-squadFR-fquad-piaf
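Model description
A French question-answering model, obtained by fine-tuning the base CamemBERT model on a combination of three French Q&A datasets: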
- PIAFv1.1
- FQuADv1.0
- SQuAD-FR (the SQuAD dataset translated into French)
Training hyperparameters
python run_squad.py \
--model_type camembert \
--model_name_or_path camembert-base \
--do_train --do_eval \
--train_file data/SQuAD+fquad+piaf.json \
--predict_file data/fquad_valid.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 4 \
--max_seq_length 384 \
--doc_stride 128 \
--save_steps 10000
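The `--train_file` argument above points to a single file combining the three datasets. How that file was built is not described here; the following is a minimal sketch, assuming each dataset is available in the standard SQuAD JSON layout (a top-level `"data"` list of articles). The input file names in the usage comment are hypothetical.

```python
import json


def merge_squad(datasets, version="merged"):
    """Concatenate the "data" lists of several SQuAD-format dictionaries."""
    merged = []
    for dataset in datasets:
        merged.extend(dataset["data"])
    return {"version": version, "data": merged}


def merge_squad_files(paths, out_path):
    """Read SQuAD-format JSON files, merge them, and write one combined file."""
    datasets = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            datasets.append(json.load(f))
    combined = merge_squad(datasets)
    with open(out_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented French characters readable
        json.dump(combined, f, ensure_ascii=False)
    return len(combined["data"])


# Hypothetical usage (the input file names are assumptions, not from this card):
# merge_squad_files(
#     ["data/squad_fr.json", "data/fquad_train.json", "data/piaf.json"],
#     "data/SQuAD+fquad+piaf.json",
# )
```

This sketch only concatenates articles; it does not deduplicate question IDs or reconcile differing `version` fields.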
Evaluation results
FQuAD v1.0 Evaluation
{"f1": 79.81, "exact_match": 55.14}
SQuAD-FR Evaluation
{"f1": 80.61, "exact_match": 59.54}
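For readers unfamiliar with the two metrics, here is a simplified sketch of SQuAD-style exact match and token-level F1 for a single prediction. It is an illustration under stated assumptions, not the script that produced the numbers above: the official evaluation also strips articles during normalization and takes the maximum score over all reference answers before averaging over the dataset and reporting percentages.

```python
import collections
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        # If either side is empty, score 1.0 only when both are empty
        return float(pred_tokens == truth_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

The gap between F1 and exact match in the results above is typical: a prediction that overlaps the reference but has slightly different boundaries earns partial F1 credit and zero exact match.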
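Usage
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
nlp = pipeline('question-answering', model='etalab-ia/camembert-base-squadFR-fquad-piaf', tokenizer='etalab-ia/camembert-base-squadFR-fquad-piaf')

# The model was fine-tuned on French data, so the question and context are given in French
nlp({
    'question': "Qui est Claude Monet?",
    'context': "Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 à Giverny, est un peintre français et l'un des fondateurs de l'impressionnisme."
})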
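For a single answer, the question-answering pipeline returns a dictionary with the keys `score`, `start`, `end`, and `answer`, where `start` and `end` are character offsets into the context. The helper below is a hypothetical convenience function (not part of `transformers`) that uses those offsets to show the answer in its surrounding text; the example result in the usage comment is hand-built for illustration, not real model output.

```python
def show_answer(context, result, margin=20):
    """Return the answer span in brackets with up to `margin` characters of context on each side."""
    start, end = result["start"], result["end"]
    left = context[max(0, start - margin):start]
    right = context[end:end + margin]
    return f"{left}[{context[start:end]}]{right}"


# Hypothetical usage with a hand-built result dictionary:
# context = "Claude Monet est un peintre français."
# result = {"score": 0.9, "start": 17, "end": 36, "answer": "un peintre français"}
# print(show_answer(context, result, margin=5))
```

Checking that `context[start:end]` equals `result["answer"]` is a quick sanity test when the context has been preprocessed before being passed to the pipeline.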
Acknowledgments
This work was performed using HPC resources from GENCI–IDRIS (Grant 2020-AD011011224).
Citations
PIAF
@inproceedings{KeraronLBAMSSS20,
author = {Rachel Keraron and
Guillaume Lancrenon and
Mathilde Bras and
Fr{\'{e}}d{\'{e}}ric Allary and
Gilles Moyse and
Thomas Scialom and
Edmundo{-}Pavel Soriano{-}Morales and
Jacopo Staiano},
title = {Project {PIAF:} Building a Native French Question-Answering Dataset},
booktitle = {{LREC}},
pages = {5481--5490},
publisher = {European Language Resources Association},
year = {2020}
}
FQuAD
@article{dHoffschmidt2020FQuADFQ,
title={FQuAD: French Question Answering Dataset},
author={Martin d'Hoffschmidt and Maxime Vidal and Wacim Belblidia and Tom Brendl{\'e} and Quentin Heinrich},
journal={ArXiv},
year={2020},
volume={abs/2002.06071}
}
SQuAD-FR
@MISC{kabbadj2018,
author = "Kabbadj, Ali",
title = "Something new in French Text Mining and Information Extraction (Universal Chatbot): Largest Q&A French training dataset (110 000+)",
editor = "linkedin.com",
month = "November",
year = "2018",
url = "\url{https://www.linkedin.com/pulse/something-new-french-text-mining-information-chatbot-largest-kabbadj/}",
note = "[Online; posted 11-November-2018]",
}
CamemBERT
HF model card: https://huggingface.co/camembert-base
@inproceedings{martin2020camembert,
title={CamemBERT: a Tasty French Language Model},
author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}
}