library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: wiki_13
  results: []
wiki_13
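This model was fine-tuned on an unknown dataset; the base model is not stated. It achieves the following results on the evaluation set:
- Loss: 2.1200 (validation loss at step 100000, the final entry of the training results)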
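Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training: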
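- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 40000
- training_steps: 100000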
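To make the interaction between these settings concrete, here is a minimal pure-Python sketch of how the total batch size and the per-step learning rate follow from the listed values. It assumes a single training device (consistent with 16 × 2 = 32) and the usual linear-warmup, linear-decay shape; the function names `effective_batch_size` and `linear_lr` are illustrative and not part of any library.

```python
# Values copied from the hyperparameter list in this card.
BASE_LR = 1e-4
WARMUP_STEPS = 40_000
TOTAL_STEPS = 100_000
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def linear_lr(
    step: int,
    base_lr: float = BASE_LR,
    warmup: int = WARMUP_STEPS,
    total: int = TOTAL_STEPS,
) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 20_000, 40_000, 70_000, 100_000):
        print(f"step {s:>6}: lr = {linear_lr(s):.2e}")
```

Under these assumptions the peak learning rate is only reached at step 40000, that is, 40% of the way through the run, after which it decays linearly to zero at step 100000.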
Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 1.5662 | 2000 | 7.6811 |
| 7.6872 | 3.1323 | 4000 | 6.6323 |
| 7.6872 | 4.6985 | 6000 | 6.5117 |
| 6.5145 | 6.2647 | 8000 | 6.4461 |
| 6.5145 | 7.8309 | 10000 | 6.3755 |
| 6.3572 | 9.3970 | 12000 | 6.3094 |
| 6.3572 | 10.9632 | 14000 | 6.2737 |
| 6.2339 | 12.5294 | 16000 | 6.2091 |
| 6.2339 | 14.0955 | 18000 | 6.1964 |
| 6.1124 | 15.6617 | 20000 | 6.1261 |
| 6.1124 | 17.2279 | 22000 | 5.9661 |
| 5.9136 | 18.7940 | 24000 | 5.5933 |
| 5.9136 | 20.3602 | 26000 | 5.0449 |
| 5.109 | 21.9264 | 28000 | 4.4819 |
| 5.109 | 23.4926 | 30000 | 4.0711 |
| 4.1502 | 25.0587 | 32000 | 3.7598 |
| 4.1502 | 26.6249 | 34000 | 3.4685 |
| 3.5347 | 28.1911 | 36000 | 3.3009 |
| 3.5347 | 29.7572 | 38000 | 3.1496 |
| 3.1576 | 31.3234 | 40000 | 3.0139 |
| 3.1576 | 32.8896 | 42000 | 2.9557 |
| 2.8847 | 34.4558 | 44000 | 2.8395 |
| 2.8847 | 36.0219 | 46000 | 2.7659 |
| 2.6809 | 37.5881 | 48000 | 2.6953 |
| 2.6809 | 39.1543 | 50000 | 2.6246 |
| 2.5261 | 40.7204 | 52000 | 2.5583 |
| 2.5261 | 42.2866 | 54000 | 2.5142 |
| 2.4073 | 43.8528 | 56000 | 2.4925 |
| 2.4073 | 45.4190 | 58000 | 2.4343 |
| 2.3129 | 46.9851 | 60000 | 2.4278 |
| 2.3129 | 48.5513 | 62000 | 2.3707 |
| 2.23 | 50.1175 | 64000 | 2.3806 |
| 2.23 | 51.6836 | 66000 | 2.3299 |
| 2.1662 | 53.2498 | 68000 | 2.3031 |
| 2.1662 | 54.8160 | 70000 | 2.2718 |
| 2.1093 | 56.3821 | 72000 | 2.2745 |
| 2.1093 | 57.9483 | 74000 | 2.2610 |
| 2.0596 | 59.5145 | 76000 | 2.2490 |
| 2.0596 | 61.0807 | 78000 | 2.1928 |
| 2.0165 | 62.6468 | 80000 | 2.1660 |
| 2.0165 | 64.2130 | 82000 | 2.1797 |
| 1.9818 | 65.7792 | 84000 | 2.1873 |
| 1.9818 | 67.3453 | 86000 | 2.1384 |
| 1.9505 | 68.9115 | 88000 | 2.1419 |
| 1.9505 | 70.4777 | 90000 | 2.1471 |
| 1.9231 | 72.0439 | 92000 | 2.1419 |
| 1.9231 | 73.6100 | 94000 | 2.1390 |
| 1.9072 | 75.1762 | 96000 | 2.1414 |
| 1.9072 | 76.7424 | 98000 | 2.1240 |
| 1.8894 | 78.3085 | 100000 | 2.1200 |
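The training results report both epochs and steps, so the number of optimizer steps per epoch, and from it a rough training-set size, can be recovered. The sketch below does that arithmetic from the first and last logged entries, and also converts the final validation loss to a perplexity. Two assumptions are not confirmed by this card: that the total train batch size of 32 applies to every step, and that the loss is a mean per-token cross-entropy in nats (which is what makes `exp(loss)` a perplexity). The helper names are illustrative.

```python
import math

# (epoch, step, validation_loss) copied from the first and last logged entries.
FIRST_ROW = (1.5662, 2000, 7.6811)
LAST_ROW = (78.3085, 100000, 2.1200)
TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list


def steps_per_epoch(epoch: float, step: int) -> int:
    """Optimizer steps per epoch implied by one logged (epoch, step) pair."""
    return round(step / epoch)


def approx_train_examples(steps: int, batch_size: int = TOTAL_TRAIN_BATCH_SIZE) -> int:
    """Upper-bound estimate of training examples (the last batch may be partial)."""
    return steps * batch_size


def perplexity(loss: float) -> float:
    """exp(loss); only meaningful if loss is mean cross-entropy in nats (assumption)."""
    return math.exp(loss)


if __name__ == "__main__":
    spe = steps_per_epoch(LAST_ROW[0], LAST_ROW[1])
    print("steps per epoch:", spe)
    print("approx. training examples:", approx_train_examples(spe))
    print(f"final validation perplexity: {perplexity(LAST_ROW[2]):.2f}")
```

Both the first and the last entry imply the same steps-per-epoch figure, which suggests the epoch column is simply the step count divided by a fixed epoch length.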
Framework versions
- Transformers 4.45.2
- Pytorch 2.5.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
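When trying to reproduce the run, it can help to check the local environment against the versions listed above. The following is a small sketch using only the standard library's `importlib.metadata`; the distribution names in `EXPECTED` (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption here, as is the choice of strict string equality (the `+cu124` suffix on the PyTorch version depends on which build is installed). `find_mismatches` is an illustrative helper, not a library function.

```python
from importlib import metadata

# Versions copied from this card; package names are the usual PyPI names (assumption).
EXPECTED = {
    "transformers": "4.45.2",
    "torch": "2.5.1+cu124",
    "datasets": "3.0.1",
    "tokenizers": "0.20.1",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, installed_or_None)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    if not problems:
        print("All listed framework versions match.")
    for package, (wanted, installed) in problems.items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily prevent the model from loading; the check only flags differences from the environment recorded in this card.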