license: mit
tags:
- generated_from_trainer
base_model: neuralmind/bert-base-portuguese-cased
model-index:
- name: output
  results: []
output
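This model is a fine-tuned version of neuralmind/bert-base-portuguese-cased; the training dataset was not recorded (the auto-generated card lists it as None).
It achieves the following results on the evaluation set:
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed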
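Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 15.0
- mixed_precision_training: Native AMP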
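Two of the values above follow from the others, and a small sketch can make the relationship explicit. The snippet below is illustrative only: it assumes a single training device (so the total batch size is the per-device batch size times the gradient accumulation steps) and the usual linear schedule with warmup (ramp from 0 to the peak learning rate over the warmup steps, then decay linearly to 0). The names `total_train_batch_size`, `linear_lr`, `NUM_DEVICES`, and `TOTAL_STEPS` are hypothetical, and the run length is a placeholder rather than a value stated in this card.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 10_000

# Assumption: a single device, so there is no data-parallel multiplier.
NUM_DEVICES = 1

# Hypothetical run length, used only to illustrate the schedule's shape.
TOTAL_STEPS = 170_000


def total_train_batch_size(per_device: int, accum: int,
                           devices: int = NUM_DEVICES) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * accum * devices


def linear_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup)


if __name__ == "__main__":
    print("total batch size:",
          total_train_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 5_000, 10_000, 90_000, 170_000):
        print(step, linear_lr(step, TOTAL_STEPS))
```

With 10000 warmup steps, the learning rate only reaches its peak of 0.0001 at the fourth logged evaluation (step 10000), so the earliest rows of the results table were recorded while the rate was still ramping up.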
Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 1.1985 | 0.22 | 2500 | 1.0940 |
| 1.0937 | 0.44 | 5000 | 1.0033 |
| 1.0675 | 0.66 | 7500 | 0.9753 |
| 1.0565 | 0.87 | 10000 | 0.9801 |
| 1.0244 | 1.09 | 12500 | 0.9526 |
| 0.9943 | 1.31 | 15000 | 0.9298 |
| 0.9799 | 1.53 | 17500 | 0.9035 |
| 0.95 | 1.75 | 20000 | 0.8835 |
| 0.933 | 1.97 | 22500 | 0.8636 |
| 0.9079 | 2.18 | 25000 | 0.8507 |
| 0.8938 | 2.4 | 27500 | 0.8397 |
| 0.8781 | 2.62 | 30000 | 0.8195 |
| 0.8647 | 2.84 | 32500 | 0.8088 |
| 0.8422 | 3.06 | 35000 | 0.7954 |
| 0.831 | 3.28 | 37500 | 0.7871 |
| 0.8173 | 3.5 | 40000 | 0.7721 |
| 0.8072 | 3.71 | 42500 | 0.7611 |
| 0.8011 | 3.93 | 45000 | 0.7532 |
| 0.7828 | 4.15 | 47500 | 0.7431 |
| 0.7691 | 4.37 | 50000 | 0.7367 |
| 0.7659 | 4.59 | 52500 | 0.7292 |
| 0.7606 | 4.81 | 55000 | 0.7245 |
| 0.8082 | 5.02 | 57500 | 0.7696 |
| 0.8114 | 5.24 | 60000 | 0.7695 |
| 0.8022 | 5.46 | 62500 | 0.7613 |
| 0.7986 | 5.68 | 65000 | 0.7558 |
| 0.8018 | 5.9 | 67500 | 0.7478 |
| 0.782 | 6.12 | 70000 | 0.7435 |
| 0.7743 | 6.34 | 72500 | 0.7367 |
| 0.774 | 6.55 | 75000 | 0.7313 |
| 0.7692 | 6.77 | 77500 | 0.7270 |
| 0.7604 | 6.99 | 80000 | 0.7200 |
| 0.7468 | 7.21 | 82500 | 0.7164 |
| 0.7486 | 7.43 | 85000 | 0.7117 |
| 0.7399 | 7.65 | 87500 | 0.7043 |
| 0.7306 | 7.86 | 90000 | 0.6956 |
| 0.7243 | 8.08 | 92500 | 0.6959 |
| 0.7132 | 8.3 | 95000 | 0.6916 |
| 0.71 | 8.52 | 97500 | 0.6853 |
| 0.7128 | 8.74 | 100000 | 0.6855 |
| 0.7088 | 8.96 | 102500 | 0.6809 |
| 0.7002 | 9.18 | 105000 | 0.6784 |
| 0.6953 | 9.39 | 107500 | 0.6737 |
| 0.695 | 9.61 | 110000 | 0.6714 |
| 0.6871 | 9.83 | 112500 | 0.6687 |
| 0.7161 | 10.05 | 115000 | 0.6961 |
| 0.7265 | 10.27 | 117500 | 0.7006 |
| 0.7284 | 10.49 | 120000 | 0.6941 |
| 0.724 | 10.7 | 122500 | 0.6887 |
| 0.7266 | 10.92 | 125000 | 0.6931 |
| 0.7051 | 11.14 | 127500 | 0.6846 |
| 0.7106 | 11.36 | 130000 | 0.6816 |
| 0.7011 | 11.58 | 132500 | 0.6830 |
| 0.6997 | 11.8 | 135000 | 0.6784 |
| 0.6969 | 12.02 | 137500 | 0.6734 |
| 0.6968 | 12.23 | 140000 | 0.6709 |
| 0.6867 | 12.45 | 142500 | 0.6656 |
| 0.6925 | 12.67 | 145000 | 0.6661 |
| 0.6795 | 12.89 | 147500 | 0.6606 |
| 0.6774 | 13.11 | 150000 | 0.6617 |
| 0.6756 | 13.33 | 152500 | 0.6563 |
| 0.6728 | 13.54 | 155000 | 0.6547 |
| 0.6732 | 13.76 | 157500 | 0.6520 |
| 0.6704 | 13.98 | 160000 | 0.6492 |
| 0.6666 | 14.2 | 162500 | 0.6446 |
| 0.6615 | 14.42 | 165000 | 0.6488 |
| 0.6638 | 14.64 | 167500 | 0.6523 |
| 0.6588 | 14.85 | 170000 | 0.6415 |
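The epoch and step columns can be combined for a rough estimate of the run's scale, which the card does not state directly. The sketch below is a back-of-the-envelope calculation, not a reported figure: it assumes each logged step is one optimizer update over a total batch of 128 examples, and it inherits the rounding of the epoch column (two decimals). The names `LOGGED` and `steps_per_epoch` are hypothetical.

```python
# A few (epoch, step) pairs copied from the results table.
LOGGED = [(0.22, 2_500), (7.21, 82_500), (14.85, 170_000)]

TOTAL_TRAIN_BATCH_SIZE = 128  # from the hyperparameter list
NUM_EPOCHS = 15.0             # from the hyperparameter list


def steps_per_epoch(epoch: float, step: int) -> float:
    """Estimate optimizer steps per epoch from one logged (epoch, step) pair."""
    return step / epoch


# The last row has the smallest relative rounding error in the epoch column.
last_epoch, last_step = LOGGED[-1]
spe = steps_per_epoch(last_epoch, last_step)

# Assumption: every step consumes one full batch of 128 examples.
approx_examples_per_epoch = spe * TOTAL_TRAIN_BATCH_SIZE
approx_total_steps = spe * NUM_EPOCHS

if __name__ == "__main__":
    print(f"steps per epoch   ~ {spe:,.0f}")
    print(f"examples per epoch ~ {approx_examples_per_epoch:,.0f}")
    print(f"total steps        ~ {approx_total_steps:,.0f}")
```

Under these assumptions the run works out to roughly 11,400 optimizer steps per epoch, on the order of 1.5 million training examples per epoch. Treat both numbers as estimates until the training data is documented.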
Framework versions
- Transformers 4.12.5
- Pytorch 1.10.1+cu113
- Datasets 1.17.0
- Tokenizers 0.10.3
Citation and authors
If you use this work, please cite:
@incollection{Viegas_2023,
doi = {10.1007/978-3-031-36805-9_24},
url = {https://doi.org/10.1007%2F978-3-031-36805-9_24},
year = 2023,
publisher = {Springer Nature Switzerland},
pages = {349--365},
author = {Charles F. O. Viegas and Bruno C. Costa and Renato P. Ishii},
title = {{JurisBERT}: A New Approach that~Converts a~Classification Corpus into~an~{STS} One},
booktitle = {Computational Science and Its Applications {\textendash} {ICCSA} 2023}
}