🚀 BERTimbau Base (aka "bert-base-portuguese-cased")
BERTimbau Base is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: named entity recognition, sentence textual similarity, and recognizing textual entailment. It is available in two sizes: Base and Large.
🚀 Quick Start
For more information or to submit requests, please visit the BERTimbau repository.
✨ Key Features
BERTimbau Base achieves state-of-the-art performance on three downstream NLP tasks (named entity recognition, sentence textual similarity, and recognizing textual entailment) and is available in two sizes, Base and Large.
📦 Installation
The source documentation does not list installation steps. The examples below only assume the Hugging Face `transformers` library and PyTorch are installed (e.g. via `pip install transformers torch`).
💻 Usage Examples
Basic usage

```python
from transformers import AutoModel, AutoModelForPreTraining, AutoTokenizer

model = AutoModelForPreTraining.from_pretrained('neuralmind/bert-base-portuguese-cased')
tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)
```
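As a quick sanity check, the loaded tokenizer can be applied directly. A minimal sketch (the exact subword split depends on the released WordPiece vocabulary, so no output is shown here):

```python
# Split a sentence into WordPiece subwords and map them to vocabulary ids.
tokens = tokenizer.tokenize('Tinha uma pedra no meio do caminho.')
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)
print(ids)
```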
Advanced usage
Masked language modeling prediction example

```python
from transformers import pipeline

pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
pipe('Tinha uma [MASK] no meio do caminho.')
```
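The pipeline returns a list of candidate completions for the `[MASK]` position. A small follow-up sketch that prints them; the `token_str` and `score` fields follow the standard `fill-mask` pipeline output format:

```python
# Print each predicted token with its probability, highest-scoring first.
for pred in pipe('Tinha uma [MASK] no meio do caminho.'):
    print(f"{pred['token_str']}\t{pred['score']:.4f}")
```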
Getting BERT embeddings

```python
import torch

model = AutoModel.from_pretrained('neuralmind/bert-base-portuguese-cased')
input_ids = tokenizer.encode('Tinha uma pedra no meio do caminho.', return_tensors='pt')

with torch.no_grad():
    outs = model(input_ids)
    # Last hidden states, ignoring the [CLS] and [SEP] special tokens
    encoded = outs[0][0, 1:-1]
```
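Since sentence textual similarity is one of the tasks mentioned above, it may help to see how these token embeddings reduce to a single sentence vector. The sketch below uses mean pooling and cosine similarity, which is one common recipe rather than the method from the paper; it reuses the `model` and `tokenizer` objects loaded above, and the second example sentence is made up for illustration:

```python
import torch

def sentence_embedding(text: str) -> torch.Tensor:
    # Encode the sentence and average the token embeddings,
    # skipping the [CLS] and [SEP] special tokens.
    input_ids = tokenizer.encode(text, return_tensors='pt')
    with torch.no_grad():
        hidden = model(input_ids)[0]  # shape: (1, seq_len, hidden_size)
    return hidden[0, 1:-1].mean(dim=0)

a = sentence_embedding('Tinha uma pedra no meio do caminho.')
b = sentence_embedding('Havia uma pedra no meio do caminho.')
print(torch.cosine_similarity(a, b, dim=0).item())
```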
📚 Detailed Documentation
Available models

| Model | Architecture | #Layers | #Params |
|-------|--------------|---------|---------|
| neuralmind/bert-base-portuguese-cased | BERT-Base | 12 | 110M |
| neuralmind/bert-large-portuguese-cased | BERT-Large | 24 | 335M |
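Both checkpoints load through the same API; only the checkpoint name changes. For example, for the large variant:

```python
from transformers import AutoModel, AutoTokenizer

# Same loading code as the base model; the name selects the 24-layer variant.
tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-large-portuguese-cased', do_lower_case=False)
model = AutoModel.from_pretrained('neuralmind/bert-large-portuguese-cased')
```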
📄 License
This project is licensed under the MIT License.
🔗 Citation
If you use our work, please cite:

```bibtex
@inproceedings{souza2020bertimbau,
  author    = {F{\'a}bio Souza and
               Rodrigo Nogueira and
               Roberto Lotufo},
  title     = {{BERT}imbau: pretrained {BERT} models for {B}razilian {P}ortuguese},
  booktitle = {9th Brazilian Conference on Intelligent Systems, {BRACIS}, Rio Grande do Sul, Brazil, October 20-23 (to appear)},
  year      = {2020}
}
```