Language: Chinese
Tags:
- roformer-v2
- pytorch
- tf2.0
Inference: false
Introduction
TensorFlow version
https://github.com/ZhuiyiTechnology/roformer-v2
PyTorch and TensorFlow 2.0 versions
https://github.com/JunnYu/RoFormer_pytorch
Installation
- Run: pip install roformer==0.4.3
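
After installing, it can help to confirm which version of the package is active in the current environment. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of the roformer package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("roformer")
    if found is None:
        print("roformer is not installed; run: pip install roformer==0.4.3")
    elif found != "0.4.3":
        print(f"roformer {found} is installed, but this document pins 0.4.3")
    else:
        print("roformer 0.4.3 is installed")
```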
Performance comparison
CLUE dev-set classification results (base and large versions)
| Model | IFLYTEK | TNEWS | AFQMC | CMNLI | OCNLI | WSC | CSL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BERT | 60.06 | 56.80 | 72.41 | 79.56 | 73.93 | 78.62 | 83.93 |
| RoBERTa | 60.64 | 58.06 | 74.05 | 81.24 | 76.00 | 87.50 | 84.50 |
| RoFormer | 60.91 | 57.54 | 73.52 | 80.92 | 76.07 | 86.84 | 84.63 |
| RoFormerV2* | 60.87 | 56.54 | 72.75 | 80.34 | 75.36 | 80.92 | 84.67 |
| GAU-α | 61.41 | 57.76 | 74.17 | 81.82 | 75.86 | 79.93 | 85.67 |
| RoFormer-pytorch (this repo) | 60.60 | 57.51 | 74.44 | 80.79 | 75.67 | 86.84 | 84.77 |
| RoFormerV2-pytorch (this repo) | 62.87 | 59.03 | 76.20 | 80.85 | 79.73 | 87.82 | 91.87 |
| GAU-α-pytorch (Adafactor) | 61.18 | 57.52 | 73.42 | 80.91 | 75.69 | 80.59 | 85.5 |
| GAU-α-pytorch (AdamW, weight decay 0.01, warmup 0.1) | 60.68 | 57.95 | 73.08 | 81.02 | 75.36 | 81.25 | 83.93 |
| RoFormerV2-large-pytorch (this repo) | 61.75 | 59.21 | 76.14 | 82.35 | 81.73 | 91.45 | 91.5 |
| Chinesebert-large-pytorch | 61.25 | 58.67 | 74.70 | 82.65 | 79.63 | 87.83 | 84.97 |
CLUE-1.0 test-set classification results (base and large versions)
| Model | IFLYTEK | TNEWS | AFQMC | CMNLI | OCNLI | WSC | CSL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RoFormer-pytorch (this repo) | 59.54 | 57.34 | 74.46 | 80.23 | 73.67 | 80.69 | 84.57 |
| RoFormerV2-pytorch (this repo) | 63.15 | 58.24 | 75.42 | 80.59 | 74.17 | 83.79 | 83.73 |
| GAU-α-pytorch (Adafactor) | 61.38 | 57.08 | 74.05 | 80.37 | 73.53 | 74.83 | 85.6 |
| GAU-α-pytorch (AdamW, weight decay 0.01, warmup 0.1) | 60.54 | 57.67 | 72.44 | 80.32 | 72.97 | 76.55 | 84.13 |
| RoFormerV2-large-pytorch (this repo) | 61.85 | 59.13 | 76.38 | 80.97 | 76.23 | 85.86 | 84.33 |
| Chinesebert-large-pytorch | 61.54 | 58.57 | 74.8 | 81.94 | 76.93 | 79.66 | 85.1 |
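
As a rough single-number summary, the test-set rows above can be averaged across the seven tasks. The snippet below is only an illustrative sketch: the unweighted mean is an assumption made here, not a metric reported by the source, and the names `TEST_SCORES`, `macro_average`, and `rank_by_average` are hypothetical.

```python
# Test-set scores copied from the table above, in the table's column order
# (the seven CLUE classification tasks, left to right).
TEST_SCORES = {
    "RoFormer-pytorch": [59.54, 57.34, 74.46, 80.23, 73.67, 80.69, 84.57],
    "RoFormerV2-pytorch": [63.15, 58.24, 75.42, 80.59, 74.17, 83.79, 83.73],
    "GAU-α-pytorch (Adafactor)": [61.38, 57.08, 74.05, 80.37, 73.53, 74.83, 85.6],
    "GAU-α-pytorch (AdamW)": [60.54, 57.67, 72.44, 80.32, 72.97, 76.55, 84.13],
    "RoFormerV2-large-pytorch": [61.85, 59.13, 76.38, 80.97, 76.23, 85.86, 84.33],
    "Chinesebert-large-pytorch": [61.54, 58.57, 74.8, 81.94, 76.93, 79.66, 85.1],
}


def macro_average(scores):
    """Unweighted mean over tasks (an assumption of this sketch)."""
    return sum(scores) / len(scores)


def rank_by_average(table):
    """Return (model, average) pairs sorted from highest to lowest average."""
    pairs = [(name, macro_average(scores)) for name, scores in table.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, average in rank_by_average(TEST_SCORES):
        print(f"{name:<28} {average:.2f}")
```

An unweighted mean treats every task equally regardless of dataset size, so it should be read as a convenience figure rather than an official CLUE score.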
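Notes:
- RoFormerV2* is the version trained without multi-task learning; that model has not been open-sourced (thanks to Jianlin Su for the clarification).
- Results without the pytorch suffix are taken from the GAU-alpha repository.
- Results with the pytorch suffix come from our own training runs.
- The original code classifies directly on the [CLS] token output. This repository instead uses the classification head below, which adds two dropout steps, one dense layer, and an activation (ReLU according to the original note; the code looks the function up from config.hidden_act):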
from torch import nn
from transformers.activations import ACT2FN  # maps activation names to functions


class RoFormerClassificationHead(nn.Module):
    """Classification head applied to the first ([CLS]) token."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)
        self.config = config

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # hidden state of the first ([CLS]) token
        x = self.dropout(x)
        x = self.dense(x)
        x = ACT2FN[self.config.hidden_act](x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
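
To make the tensor shapes concrete, here is a minimal sketch that runs a head of this form on random input. It assumes PyTorch is installed; the `SimpleNamespace` config, its field values, and the small `ACTIVATIONS` lookup are stand-ins chosen for illustration, not values or APIs taken from the repository.

```python
from types import SimpleNamespace

import torch
from torch import nn

# Stand-in for the activation lookup used by the head (assumption of this sketch).
ACTIVATIONS = {"relu": torch.relu, "gelu": nn.functional.gelu, "tanh": torch.tanh}


class ClassificationHead(nn.Module):
    """dropout -> dense -> activation -> dropout -> projection on the first token."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)
        self.activation = ACTIVATIONS[config.hidden_act]

    def forward(self, features):
        x = features[:, 0, :]  # (batch, hidden): first-token representation
        x = self.dropout(x)
        x = self.activation(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)  # (batch, num_labels)


# Hypothetical config values, chosen only to keep the example small.
config = SimpleNamespace(
    hidden_size=32, hidden_dropout_prob=0.1, hidden_act="relu", num_labels=3
)
head = ClassificationHead(config).eval()  # eval() disables dropout

features = torch.randn(4, 16, config.hidden_size)  # (batch, seq_len, hidden)
with torch.no_grad():
    logits = head(features)
print(logits.shape)  # torch.Size([4, 3])
```

Only position 0 of the sequence feeds the head, so the remaining 15 positions in `features` have no effect on `logits`.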
PyTorch and TensorFlow 2.0 usage example
import torch
import tensorflow as tf
from transformers import BertTokenizer
from roformer import RoFormerForMaskedLM, TFRoFormerForMaskedLM

text = "今天[MASK]很好,我[MASK]去公园玩。"
tokenizer = BertTokenizer.from_pretrained("junnyu/roformer_v2_chinese_char_base")
pt_model = RoFormerForMaskedLM.from_pretrained("junnyu/roformer_v2_chinese_char_base")
# from_pt=True converts the PyTorch checkpoint to TensorFlow weights on load
tf_model = TFRoFormerForMaskedLM.from_pretrained(
    "junnyu/roformer_v2_chinese_char_base", from_pt=True
)
pt_inputs = tokenizer(text, return_tensors="pt")
tf_inputs = tokenizer(text, return_tensors="tf")

# PyTorch: show the top-5 candidates for each [MASK] position
with torch.no_grad():
    pt_outputs = pt_model(**pt_inputs).logits[0]
pt_outputs_sentence = "PyTorch prediction: "
for i, token_id in enumerate(tokenizer.encode(text)):
    if token_id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(pt_outputs[i].topk(k=5)[1])
        pt_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        pt_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([token_id], skip_special_tokens=True)
        )
print(pt_outputs_sentence)

# TensorFlow: same procedure with tf.math.top_k
tf_outputs = tf_model(**tf_inputs, training=False).logits[0]
tf_outputs_sentence = "TensorFlow prediction: "
for i, token_id in enumerate(tokenizer.encode(text)):
    if token_id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(tf.math.top_k(tf_outputs[i], k=5)[1])
        tf_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        tf_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([token_id], skip_special_tokens=True)
        )
print(tf_outputs_sentence)
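
The PyTorch and TensorFlow loops above share the same reconstruction logic, so it can be factored into one function that works on plain Python lists. This is only a sketch: `render_predictions` and its parameters are hypothetical names, and the toy vocabulary exists purely so the example runs without downloading a model.

```python
def render_predictions(token_ids, mask_id, topk_ids_per_position, id_to_token,
                       special_ids=()):
    """Rebuild the sentence, replacing each mask with its candidate tokens.

    token_ids: input ids of the sentence, including special tokens.
    topk_ids_per_position: for each position, candidate ids ranked best first.
    id_to_token: callable mapping one id to its token string.
    special_ids: ids (such as [CLS] and [SEP]) to drop from the output.
    """
    pieces = []
    for position, token_id in enumerate(token_ids):
        if token_id == mask_id:
            candidates = [id_to_token(i) for i in topk_ids_per_position[position]]
            pieces.append("[" + "||".join(candidates) + "]")
        elif token_id not in special_ids:
            pieces.append(id_to_token(token_id))
    return "".join(pieces)


# Toy vocabulary (hypothetical): 0=[CLS], 1=[SEP], 2=[MASK], the rest are characters.
vocab = ["[CLS]", "[SEP]", "[MASK]", "a", "b", "c", "d"]
token_ids = [0, 3, 2, 4, 1]
topk = [[], [], [5, 6], [], []]  # only the masked position needs candidates
sentence = render_predictions(token_ids, 2, topk, vocab.__getitem__, special_ids={0, 1})
print(sentence)  # a[c||d]b
```

With the real models, `topk_ids_per_position` could plausibly be built from `pt_outputs.topk(k=5)[1].tolist()`, with `tokenizer.convert_ids_to_tokens` as `id_to_token` and `set(tokenizer.all_special_ids)` as `special_ids`. The mask check runs first, so including the mask id among the special ids does no harm.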
Citation
BibTeX:
@misc{su2021roformer,
  title={RoFormer: Enhanced Transformer with Rotary Position Embedding},
  author={Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
  year={2021},
  eprint={2104.09864},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
@techreport{roformerv2,
  title={RoFormerV2: A Faster and Better RoFormer - ZhuiyiAI},
  author={Jianlin Su and Shengfeng Pan and Bo Wen and Yunfeng Liu},
  year={2022},
  url="https://github.com/ZhuiyiTechnology/roformer-v2",
}