模型简介
模型特点
模型能力
使用案例
base_model: meta-llama/Meta-Llama-3-70B inference: false model_creator: astronomer-io model_name: Meta-Llama-3-70B model_type: llama pipeline_tag: text-generation license: other license_name: llama-3 license_link: https://huggingface.co/meta-llama/Meta-Llama-3-70B/blob/main/README.md tags:
- llama
- llama-3
- meta
- astronomer
- pretrained
- finetuned
- autotrain_compatible
- endpoints_compatible

本模型由Astronomer慷慨创建并开源。
Astronomer是Apache Airflow的实际维护者,这是数据编排和MLOps领域最受信赖的开源框架。
Llama-3-70B-特殊标记调整版
- 专为微调优化的稳定版Llama-3-70B
- 原始模型创建者:Meta
- 原始模型:meta-llama/Meta-Llama-3-70B
- 使用本模型需遵守Llama 3社区许可协议
- 基于Meta Llama 3构建
- 由Astronomer的David Xue创建
模型说明
本模型与原始模型(meta-llama/Meta-Llama-3-70B)完全相同,但对输入输出嵌入矩阵中未训练特殊标记的权重进行了调整——使用已训练标记的均值进行填充。这些未训练标记曾导致用户在微调基础模型(无论是添加新标记还是使用现有特殊标记)时普遍出现问题。
开发动机
Llama 3基础版(非指令微调版)模型虽功能强大,但其架构中部分用于指令跟随的特殊标记未被训练,可能破坏后续微调过程。该问题最初由Daniel Han在X平台指出,揭示了这一被广泛使用模型的关键但可修复缺陷。

发布此修补版模型的主要目的是解决该问题,使社区能直接使用Llama 3模型而无需面对训练不稳定性(如梯度爆炸或NaN
梯度),也不必在微调前自行修复模型。
注:对于70B模型,未训练特殊标记的嵌入权重并非全零值,因此该问题的影响可能不如8B基础版严重。本模型应社区要求制作,理论上直接微调原模型也可行。
调整细节
从HuggingFace直接获取meta-llama/Meta-Llama-3-70B模型后,通过transformers加载。随后通过model.get_input_embeddings().weight.data
和model.get_output_embeddings().weight.data
获取输入输出嵌入矩阵。这两个矩阵形状相同,每行代表一个标记ID,每列代表一个嵌入特征。
通过定位嵌入值整行~全零~小于9e-7的行(70B模型中无全零行,故采用9e-7阈值筛选欠训练标记),可发现特殊(未训练且有问题的)标记。这些标记在Meta的预训练阶段未被训练,可能导致下游任务微调时出现严重计算问题(如梯度爆炸或NaN
梯度)。
点击查看符合"未训练"特征的标记列表:
['À', 'Á', 'õ', 'ö', '÷', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'þ', 'ÿ', '">ččĊ', ';čččĊ', 'ĉTokenNameIdentifier', 'ĠForCanBeConverted', 'ĠForCanBeConvertedToF', 'PostalCodesNL', '$PostalCodesNL', 'useRalative', 'Û±Û', 'аÑĢакÑĤ', 'аÑĤиÑģÑı', 'иÑĤиÑģÑı', 'ávajÃŃcÃŃ', 'İTESİ', 'илакÑĤи', 'илаÑģÑı', 'ÑĭÑŁN', 'ÐİÑĭÑŁN', 'ılmaktadır', 'ÐİÑĭÑŁNÐİÑĭÑŁN', 'ıldıģında', '<|reserved_special_token_0|>', '<|reserved_special_token_1|>', '<|reserved_special_token_2|>', '<|reserved_special_token_3|>', '<|start_header_id|>', '<|end_header_id|>', '<|reserved_special_token_4|>', '<|eot_id|>', '<|reserved_special_token_5|>', '<|reserved_special_token_6|>', '<|reserved_special_token_7|>', '<|reserved_special_token_8|>', '<|reserved_special_token_9|>', '<|reserved_special_token_10|>', '<|reserved_special_token_11|>', '<|reserved_special_token_12|>', '<|reserved_special_token_13|>', '<|reserved_special_token_14|>', '<|reserved_special_token_15|>', '<|reserved_special_token_16|>', '<|reserved_special_token_17|>', '<|reserved_special_token_18|>', '<|reserved_special_token_19|>', '<|reserved_special_token_20|>', '<|reserved_special_token_21|>', '<|reserved_special_token_22|>', '<|reserved_special_token_23|>', '<|reserved_special_token_24|>', '<|reserved_special_token_25|>', '<|reserved_special_token_26|>', '<|reserved_special_token_27|>', '<|reserved_special_token_28|>', '<|reserved_special_token_29|>', '<|reserved_special_token_30|>', '<|reserved_special_token_31|>', '<|reserved_special_token_32|>', '<|reserved_special_token_33|>', '<|reserved_special_token_34|>', '<|reserved_special_token_35|>', '<|reserved_special_token_36|>', '<|reserved_special_token_37|>', '<|reserved_special_token_38|>', '<|reserved_special_token_39|>', '<|reserved_special_token_40|>', '<|reserved_special_token_41|>', '<|reserved_special_token_42|>', '<|reserved_special_token_43|>', '<|reserved_special_token_44|>', '<|reserved_special_token_45|>', '<|reserved_special_token_46|>', '<|reserved_special_token_47|>', '<|reserved_special_token_48|>', '<|reserved_special_token_49|>', '<|reserved_special_token_50|>', '<|reserved_special_token_51|>', '<|reserved_special_token_52|>', '<|reserved_special_token_53|>', '<|reserved_special_token_54|>', '<|reserved_special_token_55|>', '<|reserved_special_token_56|>', '<|reserved_special_token_57|>', '<|reserved_special_token_58|>', '<|reserved_special_token_59|>', '<|reserved_special_token_60|>', '<|reserved_special_token_61|>', '<|reserved_special_token_62|>', '<|reserved_special_token_63|>', '<|reserved_special_token_64|>', '<|reserved_special_token_65|>', '<|reserved_special_token_66|>', '<|reserved_special_token_67|>', '<|reserved_special_token_68|>', '<|reserved_special_token_69|>', '<|reserved_special_token_70|>', '<|reserved_special_token_71|>', '<|reserved_special_token_72|>', '<|reserved_special_token_73|>', '<|reserved_special_token_74|>', '<|reserved_special_token_75|>', '<|reserved_special_token_76|>', '<|reserved_special_token_77|>', '<|reserved_special_token_78|>', '<|reserved_special_token_79|>', '<|reserved_special_token_80|>', '<|reserved_special_token_81|>', '<|reserved_special_token_82|>', '<|reserved_special_token_83|>', '<|reserved_special_token_84|>', '<|reserved_special_token_85|>', '<|reserved_special_token_86|>', '<|reserved_special_token_87|>', '<|reserved_special_token_88|>', '<|reserved_special_token_89|>', '<|reserved_special_token_90|>', '<|reserved_special_token_91|>', '<|reserved_special_token_92|>', '<|reserved_special_token_93|>', '<|reserved_special_token_94|>', '<|reserved_special_token_95|>', '<|reserved_special_token_96|>', '<|reserved_special_token_97|>', '<|reserved_special_token_98|>', '<|reserved_special_token_99|>', '<|reserved_special_token_100|>', '<|reserved_special_token_101|>', '<|reserved_special_token_102|>', '<|reserved_special_token_103|>', '<|reserved_special_token_104|>', '<|reserved_special_token_105|>', '<|reserved_special_token_106|>', '<|reserved_special_token_107|>', '<|reserved_special_token_108|>', '<|reserved_special_token_109|>', '<|reserved_special_token_110|>', '<|reserved_special_token_111|>', '<|reserved_special_token_112|>', '<|reserved_special_token_113|>', '<|reserved_special_token_114|>', '<|reserved_special_token_115|>', '<|reserved_special_token_116|>', '<|reserved_special_token_117|>', '<|reserved_special_token_118|>', '<|reserved_special_token_119|>', '<|reserved_special_token_120|>', '<|reserved_special_token_121|>', '<|reserved_special_token_122|>', '<|reserved_special_token_123|>', '<|reserved_special_token_124|>', '<|reserved_special_token_125|>', '<|reserved_special_token_126|>', '<|reserved_special_token_127|>', '<|reserved_special_token_128|>', '<|reserved_special_token_129|>', '<|reserved_special_token_130|>', '<|reserved_special_token_131|>', '<|reserved_special_token_132|>', '<|reserved_special_token_133|>', '<|reserved_special_token_134|>', '<|reserved_special_token_135|>', '<|reserved_special_token_136|>', '<|reserved_special_token_137|>', '<|reserved_special_token_138|>', '<|reserved_special_token_139|>', '<|reserved_special_token_140|>', '<|reserved_special_token_141|>', '<|reserved_special_token_142|>', '<|reserved_special_token_143|>', '<|reserved_special_token_144|>', '<|reserved_special_token_145|>', '<|reserved_special_token_146|>', '<|reserved_special_token_147|>', '<|reserved_special_token_148|>', '<|reserved_special_token_149|>', '<|reserved_special_token_150|>', '<|reserved_special_token_151|>', '<|reserved_special_token_152|>', '<|reserved_special_token_153|>', '<|reserved_special_token_154|>', '<|reserved_special_token_155|>', '<|reserved_special_token_156|>', '<|reserved_special_token_157|>', '<|reserved_special_token_158|>', '<|reserved_special_token_159|>', '<|reserved_special_token_160|>', '<|reserved_special_token_161|>', '<|reserved_special_token_162|>', '<|reserved_special_token_163|>', '<|reserved_special_token_164|>', '<|reserved_special_token_165|>', '<|reserved_special_token_166|>', '<|reserved_special_token_167|>', '<|reserved_special_token_168|>', '<|reserved_special_token_169|>', '<|reserved_special_token_170|>', '<|reserved_special_token_171|>', '<|reserved_special_token_172|>', '<|reserved_special_token_173|>', '<|reserved_special_token_174|>', '<|reserved_special_token_175|>', '<|reserved_special_token_176|>', '<|reserved_special_token_177|>', '<|reserved_special_token_178|>', '<|reserved_special_token_179|>', '<|reserved_special_token_180|>', '<|reserved_special_token_181|>', '<|reserved_special_token_182|>', '<|reserved_special_token_183|>', '<|reserved_special_token_184|>', '<|reserved_special_token_185|>', '<|reserved_special_token_186|>', '<|reserved_special_token_187|>', '<|reserved_special_token_188|>', '<|reserved_special_token_189|>', '<|reserved_special_token_190|>', '<|reserved_special_token_191|>', '<|reserved_special_token_192|>', '<|reserved_special_token_193|>', '<|reserved_special_token_194|>', '<|reserved_special_token_195|>', '<|reserved_special_token_196|>', '<|reserved_special_token_197|>', '<|reserved_special_token_198|>', '<|reserved_special_token_199|>', '<|reserved_special_token_200|>', '<|reserved_special_token_201|>', '<|reserved_special_token_202|>', '<|reserved_special_token_203|>', '<|reserved_special_token_204|>', '<|reserved_special_token_205|>', '<|reserved_special_token_206|>', '<|reserved_special_token_207|>', '<|reserved_special_token_208|>', '<|reserved_special_token_209|>', '<|reserved_special_token_210|>', '<|reserved_special_token_211|>', '<|reserved_special_token_212|>', '<|reserved_special_token_213|>', '<|reserved_special_token_214|>', '<|reserved_special_token_215|>', '<|reserved_special_token_216|>', '<|reserved_special_token_217|>', '<|reserved_special_token_218|>', '<|reserved_special_token_219|>', '<|reserved_special_token_220|>', '<|reserved_special_token_221|>', '<|reserved_special_token_222|>', '<|reserved_special_token_223|>', '<|reserved_special_token_224|>', '<|reserved_special_token_225|>', '<|reserved_special_token_226|>', '<|reserved_special_token_227|>', '<|reserved_special_token_228|>', '<|reserved_special_token_229|>', '<|reserved_special_token_230|>', '<|reserved_special_token_231|>', '<|reserved_special_token_232|>', '<|reserved_special_token_233|>', '<|reserved_special_token_234|>', '<|reserved_special_token_235|>', '<|reserved_special_token_236|>', '<|reserved_special_token_237|>', '<|reserved_special_token_238|>', '<|reserved_special_token_239|>', '<|reserved_special_token_240|>', '<|reserved_special_token_241|>', '<|reserved_special_token_242|>', '<|reserved_special_token_243|>', '<|reserved_special_token_244|>', '<|reserved_special_token_245|>', '<|reserved_special_token_246|>', '<|reserved_special_token_247|>', '<|reserved_special_token_248|>', '<|reserved_special_token_249|>', '<|reserved_special_token_250|>']识别这些未训练标记后,通过计算已训练标记各特征列嵌入值的总和除以已训练标记数量,得到均值。该操作同时对输入输出矩阵执行。
最后,将两个嵌入矩阵中有问题标记的行替换为计算所得的均值,完成调整。
贡献者
- David Xue,Astronomer机器学习工程师


