Qwen2.5-0.5B-Instruct-Gensyn-Swarm Open-Source Chat Model: Fine-Tuned for Better Dialogue

Qwen2.5 0.5B Instruct Gensyn Swarm Fierce Placid Whale

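Developed by gangchen

A fine-tuned version of Gensyn/Qwen2.5-0.5B-Instruct, trained with the TRL framework and the GRPO algorithm

Large Language Model

Transformers

#RL-fine-tuning #GRPO-optimization #small-instruct-model

Downloads: 3,053

Release date: 4/2/2025

Model Overview

An instruction-tuned language model trained with reinforcement learning in a swarm, focused on text generation tasks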

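Model Features

GRPO Training

Trained with the GRPO method introduced in the DeepSeekMath paper

TRL Framework

Trained with Hugging Face's Transformer Reinforcement Learning (TRL) framework

Reinforcement Learning Swarm

Model performance is optimized through swarm-based training

Model Capabilities

Text generation

Instruction following

Dialogue generation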

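Use Cases

Creative Writing

Time machine choice scenario

Generates creative answers to a question about a one-way time-travel choice

Can produce imaginative text output

Dialogue Systems

Open-domain dialogue

Used to build open-domain dialogue systems

Can understand instructions and generate coherent replies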

🚀 Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_placid_whale

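This is a fine-tuned language model based on the Transformer architecture, intended for natural language processing tasks such as question answering and text generation. It is further optimized from a pretrained base model with the aim of understanding and generating text more accurately.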

🚀 Quick Start

from transformers import pipeline

# Prompt sent to the model as a single user turn
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the fine-tuned checkpoint; device="cuda" requires a CUDA-capable GPU
generator = pipeline("text-generation", model="gangchen/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_placid_whale", device="cuda")
# Chat-style input; return_full_text=False returns only the newly generated reply
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
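The quick-start snippet hard-codes device="cuda", which fails on machines without a GPU. The following is a minimal sketch of a fallback, assuming PyTorch is the backend. The helper names pick_device and build_messages are hypothetical and are not part of Transformers.

```python
from typing import Dict, List, Optional


def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def build_messages(question: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a chat-style message list (hypothetical helper).

    The result has the same shape as the list passed to the pipeline
    in the quick-start snippet: one dict per turn with "role" and "content".
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages
```

To use it, pass device=pick_device() when creating the pipeline and build_messages(question) as the pipeline input.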

✨ Key Features

This model is a fine-tuned version of Gensyn/Qwen2.5-0.5B-Instruct.
It was trained with TRL.
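For orientation, here is a hedged sketch of how a GRPO fine-tune of the base model could be wired up with TRL's GRPOTrainer. The reward function, its target_chars parameter, the output directory name, and the dataset are all hypothetical. This card does not describe the rewards or data actually used in swarm training.

```python
def length_reward(completions, target_chars=200, **kwargs):
    """Toy reward (hypothetical): highest (0.0) when a completion has target_chars characters.

    Assumes plain-text completions, i.e. a standard (non-conversational) dataset.
    Extra keyword arguments passed by the trainer (such as prompts) are ignored.
    """
    return [-abs(len(text) - target_chars) / target_chars for text in completions]


def build_trainer(train_dataset, output_dir="qwen2.5-0.5b-grpo-sketch"):
    """Create a GRPOTrainer; train_dataset is assumed to have a "prompt" column."""
    from trl import GRPOConfig, GRPOTrainer  # lazy import: requires trl to be installed

    config = GRPOConfig(output_dir=output_dir)
    return GRPOTrainer(
        model="Gensyn/Qwen2.5-0.5B-Instruct",  # base model named in this card
        reward_funcs=length_reward,
        args=config,
        train_dataset=train_dataset,
    )
```

With a suitable dataset loaded, training would start with build_trainer(dataset).train().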

🔧 Technical Details

Training Method

This model was trained with GRPO, a method introduced in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
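The central idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions for each prompt and to score each completion relative to the rest of its group, instead of relying on a learned value function. The sketch below illustrates only that group-normalization step. The function name, the eps term, and the choice of population standard deviation are assumptions made for illustration, and this is not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may use sample std instead
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Completions that score above the group average receive a positive advantage and are reinforced, while those below average receive a negative one.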

Framework Versions

TRL: 0.15.2
Transformers: 4.51.3
PyTorch: 2.5.1
Datasets: 3.5.0
Tokenizers: 0.21.1
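To compare a local environment against the versions listed above, a small standard-library check can help. This is a sketch: check_versions is a hypothetical helper, and the package names are assumed to be the usual PyPI distribution names (for example, torch for PyTorch).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name
EXPECTED = {
    "trl": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.5.1",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, installed, matches)} for each expected package.

    installed is None when the package is missing. A local build suffix
    such as "+cu121" is ignored when comparing.
    """
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, matches) in check_versions().items():
        status = "ok" if matches else "differs"
        print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones recorded at training time.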

📄 License

This model is subject to its license terms; the specific license is not named here.

📚 Detailed Documentation

Model Information

Attribute	Details
Base model	Gensyn/Qwen2.5-0.5B-Instruct
Library name	transformers
Model name	Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_placid_whale
Tags	generated_from_trainer, rl-swarm, grpo, gensyn, I am fierce placid whale, trl

Citation

Citing GRPO

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Citing TRL

@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}