---
language: en
---
# SKEP-Roberta

## Introduction

SKEP (Sentiment Knowledge Enhanced Pre-training), proposed by Baidu in 2020, is a pre-training model designed specifically for sentiment analysis. It introduces sentiment masking and three sentiment-oriented pre-training objectives to embed multiple types of sentiment knowledge during pre-training.

More details: https://aclanthology.org/2020.acl-main.374.pdf
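To make the sentiment-masking idea concrete, here is a minimal, hypothetical sketch: instead of masking tokens uniformly at random as in standard masked language modeling, words from a sentiment lexicon are masked preferentially. The hard-coded lexicon and the masking ratio below are illustrative assumptions; SKEP actually mines its sentiment knowledge automatically (e.g. via pointwise mutual information), as described in the paper.

```python
import random

# Toy sentiment lexicon; SKEP mines its lexicon automatically, so this
# hard-coded set is purely illustrative.
SENTIMENT_WORDS = {"good", "bad", "great", "terrible", "love", "hate"}

def sentiment_mask(tokens, mask_token="<mask>", max_ratio=0.1):
    """Preferentially mask sentiment words, falling back to random
    positions until roughly `max_ratio` of the sequence is masked."""
    budget = max(1, int(len(tokens) * max_ratio))
    ids = [i for i, t in enumerate(tokens) if t.lower() in SENTIMENT_WORDS]
    random.shuffle(ids)
    chosen = set(ids[:budget])
    # Fall back to random positions if too few sentiment words were found.
    while len(chosen) < budget:
        chosen.add(random.randrange(len(tokens)))
    return [mask_token if i in chosen else t for i, t in enumerate(tokens)]

print(sentiment_mask("this movie is great and I love it".split()))
```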
## Released Model Info

| Model Name | Language | Model Structure |
| --- | --- | --- |
| skep-roberta-large | English | Layers: 24, Hidden size: 1024, Attention heads: 16 |
This released PyTorch model is converted from the officially released PaddlePaddle SKEP model, and a series of experiments have verified the accuracy of the conversion.

- Official PaddlePaddle SKEP resources:
  - https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/skep
  - https://github.com/baidu/Senta
- PyTorch conversion code: not released yet
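The conversion code itself is not public, but a check in the spirit of the verification mentioned above might look like the sketch below. It assumes PaddleNLP's `SkepModel`/`SkepTokenizer` with the pretrained name `skep_ernie_2.0_large_en`, and that a faithful conversion keeps the two tokenizers consistent for the test input; none of these details come from this card.

```python
import numpy as np
import paddle
import torch
from paddlenlp.transformers import SkepModel, SkepTokenizer
from transformers import AutoModel, AutoTokenizer

text = "the food is delicious"

# Reference forward pass through the official PaddlePaddle model.
pd_tok = SkepTokenizer.from_pretrained("skep_ernie_2.0_large_en")
pd_model = SkepModel.from_pretrained("skep_ernie_2.0_large_en")
pd_model.eval()
pd_ids = paddle.to_tensor([pd_tok(text)["input_ids"]])
pd_hidden, _ = pd_model(pd_ids)

# Same input through the converted PyTorch model.
pt_tok = AutoTokenizer.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
pt_model = AutoModel.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
pt_model.eval()
with torch.no_grad():
    pt_hidden = pt_model(**pt_tok(text, return_tensors="pt")).last_hidden_state

# A faithful conversion should make the hidden states agree to within
# floating-point tolerance.
print(np.abs(pd_hidden.numpy() - pt_hidden.numpy()).max())
```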
## How to use

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
model = AutoModel.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
```
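Continuing from the snippet above, the model can be used as a plain RoBERTa-style encoder; the input sentence here is chosen arbitrarily for illustration:

```python
import torch

inputs = tokenizer("the food is delicious", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 1024-dimensional hidden vector per input token.
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 1024])
```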
A complete masked-language-modeling example:

```python
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')

# Special tokens are included in the string, so the text is tokenized as-is.
input_tx = "<s> He likes to play with students, so he became a <mask> after graduation </s>"
tokenized_text = tokenizer.tokenize(input_tx)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([[0] * len(tokenized_text)])

model = RobertaForMaskedLM.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')
model.eval()

with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]

# Take the most likely token at every position, then report the
# predictions for the positions between <s> and </s>.
predicted_index = [torch.argmax(predictions[0, i]).item() for i in range(len(tokenized_text))]
predicted_token = [tokenizer.convert_ids_to_tokens([predicted_index[x]])[0]
                   for x in range(1, len(tokenized_text) - 1)]
print('Predicted tokens:', predicted_token)
```
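The same mask prediction can also be done with the higher-level `fill-mask` pipeline from Transformers; this alternative is not part of the original card, and the returned candidates depend on the checkpoint:

```python
from transformers import pipeline

fill_mask = pipeline('fill-mask', model='Yaxin/roberta-large-ernie2-skep-en')
for candidate in fill_mask('He likes to play with students, so he became a <mask> after graduation'):
    # Each candidate carries the predicted token string and its score.
    print(candidate['token_str'], candidate['score'])
```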
## Citation

```bibtex
@article{tian2020skep,
  title={SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis},
  author={Tian, Hao and Gao, Can and Xiao, Xinyan and Liu, Hao and He, Bolei and Wu, Hua and Wang, Haifeng and Wu, Feng},
  journal={arXiv preprint arXiv:2005.05635},
  year={2020}
}
```
## Reference

https://github.com/nghuyong/ERNIE-Pytorch