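Model Overview
GLiNER offers a practical alternative to traditional NER models, which are restricted to predefined entities, and to large language models (LLMs), which are flexible but too costly and too large for resource-constrained scenarios.
Model Features
Flexible entity recognition
Identifies any user-defined entity type rather than only a fixed set of predefined entities.
Efficiency
Smaller and more efficient than large language models in resource-constrained scenarios.
Multilingual support
Supports named entity recognition in multiple languages.
Model Capabilities
Named entity recognition
Multilingual text processing
Custom entity type recognition
Use Cases
Information extraction
Person extraction
Identifies and extracts person names and related information from text.
For example, recognizing 'Cristiano Ronaldo dos Santos Aveiro' as a 'person' entity.
Award extraction
Identifies and extracts award names from text.
For example, recognizing 'Ballon d'Or' as an 'award' entity.
Date extraction
Identifies and extracts dates from text.
For example, recognizing '5 February 1985' as a 'date' entity.
Sports news analysis
Team extraction
Identifies and extracts team names from sports news.
For example, recognizing 'Al Nassr' and 'Portugal national team' as 'teams' entities.
Competition extraction
Identifies and extracts competition names from sports news.
For example, recognizing 'UEFA Champions Leagues' and 'UEFA European Championship' as 'competitions' entities.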
license: apache-2.0
language:
- multilingual
library_name: gliner
datasets:
- urchade/pile-mistral-v0.1
pipeline_tag: token-classification
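About
GLiNER is a Named Entity Recognition (NER) model that can identify any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which are flexible but too costly and too large for resource-constrained scenarios.
Links
- Paper: https://arxiv.org/abs/2311.08526
- Repository: https://github.com/urchade/GLiNER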
Installation
To use this model, you must install the GLiNER Python library:
!pip install gliner -U
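The leading `!` in the command above is notebook syntax; from a regular shell the same command is `pip install gliner -U`. As a minimal sketch for confirming that the install succeeded, the snippet below looks up the installed version with the standard library. The helper name `installed_version` is hypothetical and not part of GLiNER.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "gliner" is the distribution name used in the pip command above.
gliner_version = installed_version("gliner")
print("gliner version:", gliner_version or "not installed")
```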
Usage
Once you have installed the GLiNER library, you can import the GLiNER class. You can then load this model with GLiNER.from_pretrained and predict entities with predict_entities.
from gliner import GLiNER
model = GLiNER.from_pretrained("gliner-community/gliner_small-v2.5", load_tokenizer=True)
text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""
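# Entity types to extract; any user-defined label can be used.
labels = ["person", "award", "date", "competitions", "teams"]
# Each returned entity is a dict that includes the matched text and its label.
entities = model.predict_entities(text, labels)
for entity in entities:
    print(entity["text"], "=>", entity["label"])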
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
UEFA Champions Leagues => competitions
UEFA European Championship => competitions
UEFA Nations League => competitions
Champions League => competitions
European Championship => competitions
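The flat list printed above is often easier to consume once it is grouped by label. The following is a minimal sketch of such post-processing, not part of the GLiNER library: `group_by_label` is a hypothetical helper, it assumes each entity is a dict with `"text"` and `"label"` keys as in the loop above, and it assumes an optional `"score"` key holding a confidence value. The sample list is hand-written to mirror a few lines of the output, and its scores are invented for illustration only.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, skipping duplicates and low-confidence hits."""
    grouped = defaultdict(list)
    for entity in entities:
        # Assumption: "score" is a confidence in [0, 1]; entities without it are kept.
        if entity.get("score", 1.0) < min_score:
            continue
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


# Hand-written sample in the same shape as the entities above (scores are made up).
sample = [
    {"text": "Cristiano Ronaldo dos Santos Aveiro", "label": "person", "score": 0.95},
    {"text": "Al Nassr", "label": "teams", "score": 0.90},
    {"text": "Portugal national team", "label": "teams", "score": 0.88},
    {"text": "Al Nassr", "label": "teams", "score": 0.85},
    {"text": "Ballon d'Or", "label": "award", "score": 0.40},
]

grouped = group_by_label(sample, min_score=0.5)
for label, texts in grouped.items():
    print(label, "=>", texts)
```

With real predictions, pass the `entities` list returned by `predict_entities` in place of `sample`.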
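Named Entity Recognition benchmark results
Below is a comparison of the results of the current model version against previous versions:
Results on other datasets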
| Model | Dataset | Precision | Recall | F1 Score |
|---|---|---|---|---|
| gliner-community/gliner_small-v2.5 | ACE 2004 | 35.18% | 22.81% | 27.67% |
| | ACE 2005 | 35.89% | 22.39% | 27.58% |
| | AnatEM | 49.12% | 31.31% | 38.24% |
| | Broad Tweet Corpus | 59.51% | 77.85% | 67.46% |
| | CoNLL 2003 | 63.16% | 70.43% | 66.60% |
| | FabNER | 23.78% | 22.55% | 23.15% |
| | FindVehicle | 37.46% | 40.06% | 38.72% |
| | GENIA_NER | 45.90% | 54.11% | 49.67% |
| | HarveyNER | 13.20% | 32.58% | 18.78% |
| | MultiNERD | 45.87% | 87.01% | 60.07% |
| | Ontonotes | 23.05% | 41.16% | 29.55% |
| | PolyglotNER | 31.88% | 67.22% | 43.25% |
| | TweetNER7 | 40.98% | 39.91% | 40.44% |
| | WikiANN en | 55.35% | 60.06% | 57.61% |
| | WikiNeural | 64.52% | 86.24% | 73.81% |
| | bc2gm | 51.70% | 49.99% | 50.83% |
| | bc4chemd | 30.78% | 57.56% | 40.11% |
| | bc5cdr | 63.48% | 69.65% | 66.42% |
| | ncbi | 63.36% | 66.67% | 64.97% |
| | Average | | | 46.58% |
| urchade/gliner_small-v2.1 | ACE 2004 | 38.89% | 23.53% | 29.32% |
| | ACE 2005 | 42.09% | 26.82% | 32.76% |
| | AnatEM | 63.71% | 19.45% | 29.80% |
| | Broad Tweet Corpus | 57.01% | 70.49% | 63.04% |
| | CoNLL 2003 | 57.11% | 62.66% | 59.76% |
| | FabNER | 32.41% | 12.33% | 17.87% |
| | FindVehicle | 43.47% | 33.02% | 37.53% |
| | GENIA_NER | 61.03% | 37.25% | 46.26% |
| | HarveyNER | 23.12% | 15.16% | 18.32% |
| | MultiNERD | 43.63% | 83.60% | 57.34% |
| | Ontonotes | 23.25% | 35.41% | 28.07% |
| | PolyglotNER | 29.47% | 64.41% | 40.44% |
| | TweetNER7 | 44.78% | 30.83% | 36.52% |
| | WikiANN en | 52.58% | 58.31% | 55.30% |
| | WikiNeural | 53.38% | 82.19% | 64.72% |
| | bc2gm | 66.64% | 30.56% | 41.90% |
| | bc4chemd | 42.01% | 56.03% | 48.02% |
| | bc5cdr | 72.03% | 58.58% | 64.61% |
| | ncbi | 68.88% | 46.71% | 55.67% |
| | Average | | | 43.54% |
| EmergentMethods/gliner_small-v2.1 | ACE 2004 | 39.92% | 17.50% | 24.34% |
| | ACE 2005 | 38.53% | 16.58% | 23.18% |
| | AnatEM | 55.95% | 25.69% | 35.22% |
| | Broad Tweet Corpus | 66.63% | 72.00% | 69.21% |
| | CoNLL 2003 | 62.89% | 58.96% | 60.86% |
| | FabNER | 32.76% | 13.33% | 18.95% |
| | FindVehicle | 42.93% | 43.20% | 43.06% |
| | GENIA_NER | 51.28% | 43.75% | 47.22% |
| | HarveyNER | 24.82% | 21.52% | 23.05% |
| | MultiNERD | 59.27% | 80.69% | 68.34% |
| | Ontonotes | 32.97% | 37.59% | 35.13% |
| | PolyglotNER | 33.60% | 63.30% | 43.90% |
| | TweetNER7 | 46.90% | 28.66% | 35.58% |
| | WikiANN en | 51.91% | 55.43% | 53.61% |
| | WikiNeural | 70.65% | 82.21% | 75.99% |
| | bc2gm | 49.95% | 43.13% | 46.29% |
| | bc4chemd | 35.88% | 71.64% | 47.81% |
| | bc5cdr | 68.41% | 68.90% | 68.65% |
| | ncbi | 55.31% | 59.87% | 57.50% |
| | Average | | | 46.20% |
| gliner-community/gliner_medium-v2.5 | ACE 2004 | 33.06% | 20.96% | 25.66% |
| | ACE 2005 | 33.65% | 19.65% | 24.81% |
| | AnatEM | 52.03% | 35.28% | 42.05% |
| | Broad Tweet Corpus | 60.57% | 79.09% | 68.60% |
| | CoNLL 2003 | 63.80% | 68.31% | 65.98% |
| | FabNER | 26.20% | 22.26% | 24.07% |
| | FindVehicle | 41.95% | 40.68% | 41.30% |
| | GENIA_NER | 51.83% | 62.34% | 56.60% |
| | HarveyNER | 14.04% | 32.17% | 19.55% |
| | MultiNERD | 47.63% | 88.78% | 62.00% |
| | Ontonotes | 21.68% | 38.41% | 27.71% |
| | PolyglotNER | 32.73% | 68.27% | 44.24% |
| | TweetNER7 | 40.39% | 37.64% | 38.97% |
| | WikiANN en | 56.41% | 59.90% | 58.10% |
| | WikiNeural | 65.61% | 86.28% | 74.54% |
| | bc2gm | 55.20% | 56.71% | 55.95% |
| | bc4chemd | 35.94% | 63.67% | 45.94% |
| | bc5cdr | 63.50% | 70.09% | 66.63% |
| | ncbi | 62.96% | 68.55% | 65.63% |
| | Average | | | 47.81% |
| urchade/gliner_medium-v2.1 | ACE 2004 | 36.33% | 22.74% | 27.97% |
| | ACE 2005 | 40.49% | 25.46% | 31.27% |
| | AnatEM | 59.75% | 16.87% | 26.31% |
| | Broad Tweet Corpus | 60.89% | 67.25% | 63.91% |
| | CoNLL 2003 | 60.62% | 62.39% | 61.50% |
| | FabNER | 27.72% | 12.24% | 16.98% |
| | FindVehicle | 41.55% | 31.31% | 35.71% |
| | GENIA_NER | 60.86% | 43.93% | 51.03% |
| | HarveyNER | 23.20% | 23.16% | 23.18% |
| | MultiNERD | 41.25% | 83.74% | 55.27% |
| | Ontonotes | 20.58% | 34.11% | 25.67% |
| | PolyglotNER | 31.32% | 64.22% | 42.11% |
| | TweetNER7 | 44.52% | 33.42% | 38.18% |
| | WikiANN en | 54.57% | 56.47% | 55.51% |
| | WikiNeural | 57.60% | 81.57% | 67.52% |
| | bc2gm | 67.98% | 33.45% | 44.84% |
| | bc4chemd | 45.66% | 52.00% | 48.62% |
| | bc5cdr | 72.20% | 58.12% | 64.40% |
| | ncbi | 73.12% | 49.74% | 59.20% |
| | Average | | | 44.17% |
| EmergentMethods/gliner_news_medium-v2.1 | ACE 2004 | 39.21% | 17.24% | 23.95% |
| | ACE 2005 | 39.82% | 16.48% | 23.31% |
| | AnatEM | 57.67% | 23.57% | 33.46% |
| | Broad Tweet Corpus | 69.52% | 65.94% | 67.69% |
| | CoNLL 2003 | 68.26% | 58.45% | 62.97% |
| | FabNER | 30.74% | 15.51% | 20.62% |
| | FindVehicle | 40.33% | 37.37% | 38.79% |
| | GENIA_NER | 53.70% | 47.73% | 50.54% |
| | HarveyNER | 26.29% | 27.05% | 26.67% |
| | MultiNERD | 56.78% | 81.96% | 67.08% |
| | Ontonotes | 30.90% | 35.86% | 33.19% |
| | PolyglotNER | 35.98% | 60.96% | 45.25% |
| | TweetNER7 | 52.37% | 30.50% | 38.55% |
| | WikiANN en | 53.81% | 52.29% | 53.04% |
| | WikiNeural | 76.84% | 78.92% | 77.86% |
| | bc2gm | 62.97% | 44.24% | 51.96% |
| | bc4chemd | 44.90% | 65.56% | 53.30% |
| | bc5cdr | 73.93% | 67.03% | 70.31% |
| | ncbi | 69.53% | 60.82% | 64.88% |
| | Average | | | 47.55% |
| gliner-community/gliner_large-v2.5 | ACE 2004 | 31.64% | 22.81% | 26.51% |
| | ACE 2005 | 32.10% | 22.56% | 26.49% |
| | AnatEM | 53.64% | 27.82% | 36.64% |
| | Broad Tweet Corpus | 61.93% | 76.85% | 68.59% |
| | CoNLL 2003 | 62.83% | 67.71% | 65.18% |
| | FabNER | 24.54% | 27.03% | 25.73% |
| | FindVehicle | 40.71% | 56.24% | 47.23% |
| | GENIA_NER | 43.56% | 52.56% | 47.64% |
| | HarveyNER | 14.85% | 27.05% | 19.17% |
| | MultiNERD | 38.04% | 89.17% | 53.33% |
| | Ontonotes | 17.28% | 40.16% | 24.16% |
| | PolyglotNER | 32.88% | 63.31% | 43.28% |
| | TweetNER7 | 38.03% | 41.43% | 39.66% |
| | WikiANN en | 57.80% | 60.54% | 59.14% |
| | WikiNeural | 67.72% | 83.94% | 74.96% |
| | bc2gm | 54.74% | 48.54% | 51.45% |
| | bc4chemd | 40.20% | 58.66% | 47.71% |
| | bc5cdr | 66.27% | 71.95% | 69.00% |
| | ncbi | 68.09% | 61.55% | 64.65% |
| | Average | | | 46.87% |
| urchade/gliner_large-v2.1 | ACE 2004 | 37.52% | 25.38% | 30.28% |
| | ACE 2005 | 39.02% | 29.00% | 33.27% |
| | AnatEM | 52.86% | 13.64% | 21.68% |
| | Broad Tweet Corpus | 51.44% | 71.73% | 59.91% |
| | CoNLL 2003 | 54.86% | 64.98% | 59.49% |
| | FabNER | 23.98% | 16.00% | 19.19% |
| | FindVehicle | 47.04% | 57.53% | 51.76% |
| | GENIA_NER | 58.10% | 49.98% | 53.74% |
| | HarveyNER | 16.29% | 21.93% | 18.69% |
| | MultiNERD | 34.09% | 85.43% | 48.74% |
| | Ontonotes | 14.02% | 32.01% | 19.50% |
| | PolyglotNER | 28.53% | 64.92% | 39.64% |
| | TweetNER7 | 38.00% | 34.34% | 36.08% |
| | WikiANN en | 51.69% | 59.92% | 55.50% |
| | WikiNeural | 50.94% | 82.08% | 62.87% |
| | bc2gm | 64.48% | 32.47% | 43.19% |
| | bc4chemd | 48.66% | 57.52% | 52.72% |
| | bc5cdr | 72.19% | 64.27% | 68.00% |
| | ncbi | 69.54% | 52.25% | 59.67% |
| | Average | | | 43.89% |
| EmergentMethods/gliner_news_large-v2.1 | ACE 2004 | 43.19% | 18.39% | 25.80% |
| | ACE 2005 | 45.24% | 21.20% | 28.87% |
| | AnatEM | 61.51% | 21.66% | 32.04% |
| | Broad Tweet Corpus | 69.38% | 68.99% | 69.18% |
| | CoNLL 2003 | 61.47% | 52.18% | 56.45% |
| | FabNER | 27.42% | 19.11% | 22.52% |
| | FindVehicle | 46.30% | 62.48% | 53.19% |
| | GENIA_NER | 54.13% | 54.02% | 54.07% |
| | HarveyNER | 15.91% | 15.78% | 15.84% |
| | MultiNERD | 53.73% | 79.07% | 63.98% |
| | Ontonotes | 26.78% | 39.77% | 32.01% |
| | PolyglotNER | 34.28% | 55.87% | 42.49% |
| | TweetNER7 | 48.06% | 28.18% | 35.53% |
| | WikiANN en | 53.66% | 51.34% | 52.47% |
| | WikiNeural | 69.81% | 70.75% | 70.28% |
| | bc2gm | 59.83% | 37.62% | 46.20% |
| | bc4chemd | 46.24% | 69.15% | 55.42% |
| | bc5cdr | 71.94% | 70.37% | 71.15% |
| | ncbi | 70.17% | 61.44% | 65.52% |
| | Average | | | 47.00% |
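The card does not say how the F1 and average figures are derived. The sketch below assumes the standard definitions: F1 as the harmonic mean of precision and recall, and the average as the unweighted mean of the 19 per-dataset F1 scores. Under that assumption the average for gliner-community/gliner_small-v2.5 reproduces the reported 46.58%. An F1 recomputed from the rounded precision and recall can differ from the table in the last digit.

```python
from statistics import mean


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-dataset F1 scores (%) for gliner-community/gliner_small-v2.5, copied from the table.
SMALL_V25_F1 = {
    "ACE 2004": 27.67, "ACE 2005": 27.58, "AnatEM": 38.24,
    "Broad Tweet Corpus": 67.46, "CoNLL 2003": 66.60, "FabNER": 23.15,
    "FindVehicle": 38.72, "GENIA_NER": 49.67, "HarveyNER": 18.78,
    "MultiNERD": 60.07, "Ontonotes": 29.55, "PolyglotNER": 43.25,
    "TweetNER7": 40.44, "WikiANN en": 57.61, "WikiNeural": 73.81,
    "bc2gm": 50.83, "bc4chemd": 40.11, "bc5cdr": 66.42, "ncbi": 64.97,
}

# Unweighted (macro) average over datasets; this is an assumption about the "Average" row.
macro_f1 = mean(SMALL_V25_F1.values())

# ACE 2004 row: precision 35.18%, recall 22.81%, reported F1 27.67%.
print(f"ACE 2004 F1 recomputed from P/R: {f1_score(35.18, 22.81):.2f}%")
print(f"Average F1 over {len(SMALL_V25_F1)} datasets: {macro_f1:.2f}%")
```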
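Other available models
| Release | Model Name | Parameters | Language | License |
|---|---|---|---|---|
| v0 | urchade/gliner_base | 209M | English | cc-by-nc-4.0 |
| v0 | urchade/gliner_multi | 209M | Multilingual | cc-by-nc-4.0 |
| v1 | urchade/gliner_small-v1 | 166M | English | cc-by-nc-4.0 |
| v1 | urchade/gliner_medium-v1 | 209M | English | cc-by-nc-4.0 |
| v1 | urchade/gliner_large-v1 | 459M | English | cc-by-nc-4.0 |
| v2 | urchade/gliner_small-v2 | 166M | English | apache-2.0 |
| v2 | urchade/gliner_medium-v2 | 209M | English | apache-2.0 |
| v2 | urchade/gliner_large-v2 | 459M | English | apache-2.0 |
| v2.1 | urchade/gliner_small-v2.1 | 166M | English | apache-2.0 |
| v2.1 | urchade/gliner_medium-v2.1 | 209M | English | apache-2.0 |
| v2.1 | urchade/gliner_large-v2.1 | 459M | English | apache-2.0 |
| v2.1 | urchade/gliner_multi-v2.1 | 209M | Multilingual | apache-2.0 |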
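When choosing among these checkpoints, language and license are usually the deciding columns. The snippet below is an illustrative sketch only: it hard-codes the table as plain data, and `ModelInfo` and `find_models` are hypothetical names, not part of the GLiNER library.

```python
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    release: str
    name: str
    parameters: str
    language: str
    license_id: str


# Rows transcribed from the "Other available models" table.
MODELS = [
    ModelInfo("v0", "urchade/gliner_base", "209M", "English", "cc-by-nc-4.0"),
    ModelInfo("v0", "urchade/gliner_multi", "209M", "Multilingual", "cc-by-nc-4.0"),
    ModelInfo("v1", "urchade/gliner_small-v1", "166M", "English", "cc-by-nc-4.0"),
    ModelInfo("v1", "urchade/gliner_medium-v1", "209M", "English", "cc-by-nc-4.0"),
    ModelInfo("v1", "urchade/gliner_large-v1", "459M", "English", "cc-by-nc-4.0"),
    ModelInfo("v2", "urchade/gliner_small-v2", "166M", "English", "apache-2.0"),
    ModelInfo("v2", "urchade/gliner_medium-v2", "209M", "English", "apache-2.0"),
    ModelInfo("v2", "urchade/gliner_large-v2", "459M", "English", "apache-2.0"),
    ModelInfo("v2.1", "urchade/gliner_small-v2.1", "166M", "English", "apache-2.0"),
    ModelInfo("v2.1", "urchade/gliner_medium-v2.1", "209M", "English", "apache-2.0"),
    ModelInfo("v2.1", "urchade/gliner_large-v2.1", "459M", "English", "apache-2.0"),
    ModelInfo("v2.1", "urchade/gliner_multi-v2.1", "209M", "Multilingual", "apache-2.0"),
]


def find_models(language: Optional[str] = None, license_id: Optional[str] = None):
    """Return the names of models matching the given language and/or license."""
    return [
        m.name
        for m in MODELS
        if (language is None or m.language == language)
        and (license_id is None or m.license_id == license_id)
    ]


# Example: multilingual checkpoints released under Apache 2.0.
print(find_models(language="Multilingual", license_id="apache-2.0"))
```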
Model Authors
The model authors are:
- Urchade Zaratiana
- Ihor Stepanov
- Nadi Tomeh
- Pierre Holat
- Thierry Charnois
Citation
@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}