french-camembert-postag-model: an open-source French part-of-speech tagging model

French Camembert Postag Model

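Developed by gilf

A French part-of-speech tagging model based on camembert-base, trained on the free-french-treebank dataset

Sequence labeling

Transformers

French #French POS tagging #CamemBERT-base #High-accuracy tagging

Downloads: 950.03k

Released: 3/2/2022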

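Model overview

This is a French part-of-speech tagging model. It assigns a part-of-speech tag to each word in a French text and supports a range of tags.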

Model features

Broad tag coverage

Supports 29 French part-of-speech tags, including nouns, verbs, adjectives and adverbs.

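Trained on a public treebank

Trained on the free-french-treebank dataset, a publicly available French treebank resource.

Built on CamemBERT

Uses camembert-base as the base model and inherits its French language understanding.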

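Model capabilities

French part-of-speech tagging

Text analysis

Natural language processing

Use cases

Natural language processing

French text analysis

Tags each word of a French text with its part of speech, as input for downstream syntactic or semantic analysis.

Identifies the part-of-speech category of each word in the text

Linguistic research

Supports French linguistic research, such as analyzing how parts of speech are distributed in a text.

Education

French learning aid

Helps learners of French understand the grammatical function of each word in a sentence.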
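As an illustration of the linguistic-research use case, the sketch below computes the relative frequency of each tag in a tagged sentence. The `tagged` list is hand-copied sample data in the shape the pipeline returns (five entries from the usage example later on this page, with scores rounded), and `tag_distribution` is a hypothetical helper name, not part of the model or of Transformers.

```python
from collections import Counter

# Sample entries in the shape returned by the tagging pipeline (scores rounded)
tagged = [
    {"entity_group": "DET", "score": 0.9996, "word": "les"},
    {"entity_group": "NC", "score": 0.9996, "word": "mesures"},
    {"entity_group": "VPP", "score": 0.9989, "word": "mises"},
    {"entity_group": "P", "score": 0.9996, "word": "en"},
    {"entity_group": "NC", "score": 0.9995, "word": "place"},
]


def tag_distribution(entries):
    """Return the relative frequency of each tag in a tagged sentence."""
    counts = Counter(entry["entity_group"] for entry in entries)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


distribution = tag_distribution(tagged)
print(distribution)
```

For corpus-level statistics, the same function could be applied to the concatenated output of many sentences.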

🚀 French Part-of-Speech Tagging Model

french-camembert-postag-model is a part-of-speech tagging model for French. It assigns a part-of-speech tag to each word of a French sentence and can serve as a building block for French natural language processing.

🚀 Quick Start

The model was trained on the free-french-treebank dataset, available on [github](https://github.com/nicolashernandez/free-french-treebank). Both the base tokenizer and the base model used for training are 'camembert-base'.

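✨ Main Features

The model tags words with the following part-of-speech labels:

Tag	Category	Extra info
ADJ	Adjective
ADJWH	Adjective	Interrogative
ADV	Adverb
ADVWH	Adverb	Interrogative
CC	Coordinating conjunction
CLO	Pronoun	Object
CLR	Pronoun	Reflexive
CLS	Pronoun	Subject
CS	Subordinating conjunction
DET	Determiner
DETWH	Determiner	Interrogative
ET	Foreign word
I	Interjection
NC	Common noun
NPP	Proper noun
P	Preposition
P+D	Preposition + determiner
PONCT	Punctuation
PREF	Prefix
PRO	Other pronoun
PROREL	Other pronoun	Relative
PROWH	Other pronoun	Interrogative
U	Unknown
V	Verb
VIMP	Imperative verb
VINF	Infinitive verb
VPP	Past participle
VPR	Present participle
VS	Subjunctive verb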

More information about these tags is available here: [http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf](http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf)
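When the fine-grained distinctions are not needed, the tags can be collapsed into coarser classes. The sketch below shows one way to do that with a plain dictionary. The names `COARSE_CLASS` and `coarse_tag` and the grouping itself are illustrative assumptions, not part of the model, which only emits the fine-grained tags listed above.

```python
# Illustrative grouping of the fine-grained tags into coarse classes.
# This mapping is an assumption made for the example, not defined by the model.
COARSE_CLASS = {
    "ADJ": "adjective", "ADJWH": "adjective",
    "ADV": "adverb", "ADVWH": "adverb",
    "CC": "conjunction", "CS": "conjunction",
    "CLO": "pronoun", "CLR": "pronoun", "CLS": "pronoun",
    "PRO": "pronoun", "PROREL": "pronoun", "PROWH": "pronoun",
    "DET": "determiner", "DETWH": "determiner",
    "NC": "noun", "NPP": "noun",
    "P": "preposition", "P+D": "preposition",
    "V": "verb", "VIMP": "verb", "VINF": "verb",
    "VPP": "verb", "VPR": "verb", "VS": "verb",
}


def coarse_tag(tag):
    """Map a fine-grained tag to its coarse class; unmapped tags become 'other'."""
    return COARSE_CLASS.get(tag, "other")


print(coarse_tag("VPP"))    # verb
print(coarse_tag("PONCT"))  # other
```

Tags left out of the mapping (ET, I, PONCT, PREF, U) fall through to "other" in this sketch.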

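💻 Usage Examples

Basic usage

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("gilf/french-camembert-postag-model")
model = AutoModelForTokenClassification.from_pretrained("gilf/french-camembert-postag-model")

# aggregation_strategy="simple" groups adjacent tokens that share a tag,
# like the deprecated grouped_entities=True that it replaces
nlp_token_class = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Tag a French sentence; each returned dict holds a tag, a score and a word
nlp_token_class('Face à un choc inédit, les mesures mises en place par le gouvernement ont permis une protection forte et efficace des ménages')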

Running the code above in a Jupyter notebook produces output similar to the following:

[{'entity_group': 'NC', 'score': 0.5760144591331482, 'word': '<s>'},
 {'entity_group': 'U', 'score': 0.9946700930595398, 'word': 'Face'},
 {'entity_group': 'P', 'score': 0.999615490436554, 'word': 'à'},
 {'entity_group': 'DET', 'score': 0.9995906352996826, 'word': 'un'},
 {'entity_group': 'NC', 'score': 0.9995531439781189, 'word': 'choc'},
 {'entity_group': 'ADJ', 'score': 0.999183714389801, 'word': 'inédit'},
 {'entity_group': 'P', 'score': 0.3710663616657257, 'word': ','},
 {'entity_group': 'DET', 'score': 0.9995903968811035, 'word': 'les'},
 {'entity_group': 'NC', 'score': 0.9995649456977844, 'word': 'mesures'},
 {'entity_group': 'VPP', 'score': 0.9988670349121094, 'word': 'mises'},
 {'entity_group': 'P', 'score': 0.9996246099472046, 'word': 'en'},
 {'entity_group': 'NC', 'score': 0.9995329976081848, 'word': 'place'},
 {'entity_group': 'P', 'score': 0.9996233582496643, 'word': 'par'},
 {'entity_group': 'DET', 'score': 0.9995935559272766, 'word': 'le'},
 {'entity_group': 'NC', 'score': 0.9995369911193848, 'word': 'gouvernement'},
 {'entity_group': 'V', 'score': 0.9993771314620972, 'word': 'ont'},
 {'entity_group': 'VPP', 'score': 0.9991101026535034, 'word': 'permis'},
 {'entity_group': 'DET', 'score': 0.9995885491371155, 'word': 'une'},
 {'entity_group': 'NC', 'score': 0.9995636343955994, 'word': 'protection'},
 {'entity_group': 'ADJ', 'score': 0.9991781711578369, 'word': 'forte'},
 {'entity_group': 'CC', 'score': 0.9991298317909241, 'word': 'et'},
 {'entity_group': 'ADJ', 'score': 0.9992275238037109, 'word': 'efficace'},
 {'entity_group': 'P+D', 'score': 0.9993300437927246, 'word': 'des'},
 {'entity_group': 'NC', 'score': 0.8353511393070221, 'word': 'ménages</s>'}]
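
The sample output contains the tokenizer's special tokens: a standalone `<s>` entry and a `</s>` suffix fused onto the last word. A minimal cleanup sketch follows. `strip_special_tokens` is a hypothetical helper, and the two marker strings are taken from the output above rather than read from the tokenizer.

```python
SPECIAL_TOKENS = ("<s>", "</s>")  # markers as they appear in the sample output


def strip_special_tokens(entries):
    """Remove special-token markers from words and drop entries left empty."""
    cleaned = []
    for entry in entries:
        word = entry["word"]
        for token in SPECIAL_TOKENS:
            word = word.replace(token, "")
        word = word.strip()
        if word:  # skip entries that consisted only of a special token
            cleaned.append((word, entry["entity_group"]))
    return cleaned


# First two and last two entries copied from the output above
sample = [
    {"entity_group": "NC", "score": 0.5760144591331482, "word": "<s>"},
    {"entity_group": "U", "score": 0.9946700930595398, "word": "Face"},
    {"entity_group": "P+D", "score": 0.9993300437927246, "word": "des"},
    {"entity_group": "NC", "score": 0.8353511393070221, "word": "ménages</s>"},
]

pairs = strip_special_tokens(sample)
print(pairs)
```

The scores on the affected entries (about 0.58 and 0.84) are lower than on most other words, so a downstream consumer may also want to treat the tag on the last word with some caution.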