🚀 TinyStories-656K
This is a language model trained from scratch on the TinyStoriesV2 dataset. The goal is to build a Transformer language model that can generate stories with only about 600k parameters.
🚀 Quick Start
This project builds a lightweight Transformer language model trained on the TinyStoriesV2 dataset for story generation. You can get the project code via the following link: Here
✨ Key Features
- Architecture: Llama architecture.
- Attention: GQA (Grouped Query Attention).
- Configuration: hidden size of 128; `tie_word_embeddings` enabled; vocabulary size of 2048 (a BPE tokenizer trained from scratch on TinyStoriesV2); 2 Transformer layers. A configuration sketch follows below.
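A minimal sketch of how these settings might map onto a Hugging Face `LlamaConfig`. The hidden size, layer count, vocabulary size, and `tie_word_embeddings` come from the list above; the head counts and `intermediate_size` are assumptions, since the original does not state them:

```python
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=2048,          # BPE tokenizer trained from scratch on TinyStoriesV2
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=8,    # assumption, not stated in the original
    num_key_value_heads=4,    # GQA: fewer KV heads than query heads (assumption)
    intermediate_size=384,    # assumption
    tie_word_embeddings=True,
)

model = LlamaForCausalLM(config)
# With these assumed values the total lands at roughly 656k parameters.
print(sum(p.numel() for p in model.parameters()))
```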
📦 Installation
No installation steps have been provided yet.
💻 Usage Examples
Basic Usage
The full training-argument setup is shown below:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    do_train=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=0.004629403549377777,
    lr_scheduler_type="constant",   # constant learning rate, no decay
    bf16=True,                      # bfloat16 mixed precision
    logging_steps=5,
    num_train_epochs=2,
    save_steps=10000000,            # effectively disables intermediate checkpoints
    seed=3407,
    report_to=None,
)
```
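As a rough illustration of how these arguments fit into a standard Hugging Face training loop; `model`, `tokenizer`, and `train_dataset` below are placeholders for the 656k-parameter model, the 2048-token BPE tokenizer, and the tokenized TinyStoriesV2 split, all prepared elsewhere:

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# Placeholder sketch: model, tokenizer and train_dataset are assumed to be
# defined elsewhere; this only shows how training_args would be consumed.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```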
Advanced Usage
Generation template
<|start_story|>Once upon a time,
Generation example
Once upon a time, there was a little boy named Tim. Tim had a toy car that he loved to play with. One day, he went to the park with his mom. Tim saw a toy car on the ground. Tim wanted to play with the car to his mom and said, "Mom, can I play with your car with my car too?"
His mom said, "Yes, but we must not take turns." Tim felt sad, but he knew he had to go. He asked his mom for help. His mom said, "Okay, let's clean it together." They went to play together and played the toy car. They had a lot of fun.
After they finished the car together, Tim and his mom were surprised. They did not know that the car was not a toy car like it was a magic car. Tim had an idea. He put the car in the car and put the car on it. He pushed the car on the car on the car car and pulled it down. Tim was so happy. He played with the car with his car all day long, and Tim was very happy.<|end_story|>
Recommended generation settings
```python
do_sample=True,
top_k=40,
top_p=0.9,
temperature=0.6
```
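Putting the template and the recommended sampling settings together, generation might look like the sketch below; the model path and `max_new_tokens` value are placeholders, not taken from the original:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to the trained checkpoint; adjust to where it actually lives.
model_path = "path/to/TinyStories-656K"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prompt follows the generation template above.
prompt = "<|start_story|>Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,   # assumption; not specified in the original
    do_sample=True,
    top_k=40,
    top_p=0.9,
    temperature=0.6,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```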
🔧 Technical Details
The model is trained from scratch on the TinyStoriesV2 dataset, aiming to generate stories with relatively few parameters (about 600k). It uses the Llama architecture with GQA, a hidden size of 128, `tie_word_embeddings`, a vocabulary size of 2048 (BPE trained on TinyStoriesV2), and 2 Transformer layers. Training uses the specific settings listed above, such as the learning rate and batch size, to keep the run stable and performant.
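For reference, a 2048-token BPE tokenizer of the kind described here could be trained from scratch with the `tokenizers` library roughly as follows. The byte-level pre-tokenization and the training file name are assumptions; the special tokens are the ones used in the generation template above:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Byte-level BPE is an assumption; the original only states "BPE, vocab 2048".
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=2048,
    special_tokens=["<|start_story|>", "<|end_story|>"],  # tokens from the template
)

# Placeholder file name for the TinyStoriesV2 training text.
tokenizer.train(files=["TinyStoriesV2-GPT4-train.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```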
📄 License
This project is released under the Apache-2.0 license.