🚀 TinyStories-656K
This is a language model trained from scratch on the TinyStoriesV2 dataset. The goal is to build a Transformer language model that can generate stories with only about 600k parameters.
🚀 Quick Start
This project builds a lightweight Transformer language model trained on the TinyStoriesV2 dataset for story generation. You can get the project code via the following link: Here
✨ Key Features
- Architecture: Llama architecture.
- Attention: GQA (Grouped Query Attention).
- Configuration: hidden size of 128; `tie_word_embeddings` enabled; vocabulary size of 2048 (a BPE tokenizer trained from scratch on TinyStoriesV2); 2 Transformer layers. A configuration sketch follows below.
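A minimal sketch of how these settings might map onto a Hugging Face `LlamaConfig`. The hidden size, layer count, vocabulary size, and `tie_word_embeddings` come from the list above; the head counts and `intermediate_size` are assumptions, since the original does not state them:

```python
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=2048,          # BPE tokenizer trained from scratch on TinyStoriesV2
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=8,    # assumption, not stated in the original
    num_key_value_heads=4,    # GQA: fewer KV heads than query heads (assumption)
    intermediate_size=384,    # assumption
    tie_word_embeddings=True,
)

model = LlamaForCausalLM(config)
# With these assumed values the total lands at roughly 656k parameters.
print(sum(p.numel() for p in model.parameters()))
```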
📦 Installation
No installation steps have been provided yet.
💻 Usage Examples
Basic Usage
The full training-argument setup is shown below:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    do_train=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=0.004629403549377777,
    lr_scheduler_type="constant",   # constant learning rate, no decay
    bf16=True,                      # bfloat16 mixed precision
    logging_steps=5,
    num_train_epochs=2,
    save_steps=10000000,            # effectively disables intermediate checkpoints
    seed=3407,
    report_to=None,
)
```
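As a rough illustration of how these arguments fit into a standard Hugging Face training loop; `model`, `tokenizer`, and `train_dataset` below are placeholders for the 656k-parameter model, the 2048-token BPE tokenizer, and the tokenized TinyStoriesV2 split, all prepared elsewhere:

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# Placeholder sketch: model, tokenizer and train_dataset are assumed to be
# defined elsewhere; this only shows how training_args would be consumed.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```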
Advanced Usage
Generation template
<|start_story|>Once upon a time,
Generation example
Once upon a time, there was a little boy named Tim. Tim had a toy car that he loved to play with. One day, he went to the park with his mom. Tim saw a toy car on the ground. Tim wanted to play with the car to his mom and said, "Mom, can I play with your car with my car too?"
His mom said, "Yes, but we must not take turns." Tim felt sad, but he knew he had to go. He asked his mom for help. His mom said, "Okay, let's clean it together." They went to play together and played the toy car. They had a lot of fun.
After they finished the car together, Tim and his mom were surprised. They did not know that the car was not a toy car like it was a magic car. Tim had an idea. He put the car in the car and put the car on it. He pushed the car on the car on the car car and pulled it down. Tim was so happy. He played with the car with his car all day long, and Tim was very happy.<|end_story|>
Recommended generation settings
```python
do_sample=True,
top_k=40,
top_p=0.9,
temperature=0.6
```
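Putting the template and the recommended sampling settings together, generation might look like the sketch below; the model path and `max_new_tokens` value are placeholders, not taken from the original:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to the trained checkpoint; adjust to where it actually lives.
model_path = "path/to/TinyStories-656K"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prompt follows the generation template above.
prompt = "<|start_story|>Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,   # assumption; not specified in the original
    do_sample=True,
    top_k=40,
    top_p=0.9,
    temperature=0.6,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```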
🔧 Technical Details
The model is trained from scratch on the TinyStoriesV2 dataset, aiming to generate stories with relatively few parameters (about 600k). It uses the Llama architecture with GQA, a hidden size of 128, `tie_word_embeddings`, a vocabulary size of 2048 (BPE trained on TinyStoriesV2), and 2 Transformer layers. Training uses the specific settings listed above, such as the learning rate and batch size, to keep the run stable and performant.
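For reference, a 2048-token BPE tokenizer of the kind described here could be trained from scratch with the `tokenizers` library roughly as follows. The byte-level pre-tokenization and the training file name are assumptions; the special tokens are the ones used in the generation template above:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Byte-level BPE is an assumption; the original only states "BPE, vocab 2048".
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=2048,
    special_tokens=["<|start_story|>", "<|end_story|>"],  # tokens from the template
)

# Placeholder file name for the TinyStoriesV2 training text.
tokenizer.train(files=["TinyStoriesV2-GPT4-train.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```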
📄 License
This project is released under the Apache-2.0 license.