OpenRS3-GRPO-ja: Open-Source AI Model for Japanese Mathematical Reasoning

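OpenRS3-GRPO-ja

Developed by EQUES

OpenRS3-GRPO-ja is a version of SakanaAI/TinySwallow-1.5B-Instruct fine-tuned on a Japanese math instruction dataset. It was trained with the GRPO method and focuses on mathematical reasoning tasks.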

Large Language Model

Transformers

#MathReasoningOptimization #JapaneseInstructionFineTuning #GRPOTraining

Downloads: 25

Released: 4/4/2025

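Model Overview

This is a Japanese language model optimized for mathematical reasoning. It is suited to generating responses to math-related instructions.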

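Model Features

GRPO training method

Trained with the GRPO method proposed in the DeepSeekMath paper to improve mathematical reasoning.

Japanese math instruction tuning

Fine-tuned on the OpenMathInstruct-1-1.8m-ja Japanese math instruction dataset, so it is well suited to math problems posed in Japanese.

Trained with TRL

Trained with the TRL (Transformer Reinforcement Learning) framework for a total of 300 steps.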
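The group-relative idea behind GRPO can be illustrated with a small sketch. For each prompt, several completions are sampled and scored, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration, not details taken from this model's training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own group of sampled completions.

    `eps` is a hypothetical stabilizer that avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two judged correct (1.0), two incorrect (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.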

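Model Capabilities

Japanese text generation

Math problem solving

Instruction understanding and response

Use Cases

Education

Math problem solving

Helps students understand and solve math problems

Generates detailed solution steps and explanations

Research

Mathematical reasoning research

Used to study and evaluate mathematical reasoning ability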

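🚀 OpenRS3-GRPO-ja

OpenRS3-GRPO-ja is a fine-tuned model: the base model SakanaAI/TinySwallow-1.5B-Instruct was further trained on a Japanese math instruction dataset. It can be used for text generation tasks, in particular answering math questions in Japanese.

🚀 Quick Start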

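from transformers import pipeline

# Example prompt from the original card; any chat-style user message works here.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the model as a text-generation pipeline (device="cuda" assumes a GPU is available).
generator = pipeline("text-generation", model="stardust-eques/OpenRS-GRPO-ja", device="cuda")
# Send the question as a single user message and return only the newly generated reply.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])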
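Since the model targets Japanese math problems, a prompt of that kind may be a more representative test than a general question. The sketch below reuses the model ID from the quick start. The helper names `build_messages` and `solve`, the sample question, and the `max_new_tokens` default are illustrative assumptions. The pipeline is only created when `solve` is called, so defining the helpers downloads nothing.

```python
MODEL_ID = "stardust-eques/OpenRS-GRPO-ja"  # model ID as given in the quick start


def build_messages(question):
    """Wrap a question in the chat-message format used by the quick start."""
    return [{"role": "user", "content": question}]


def solve(question, max_new_tokens=256, device="cuda"):
    """Generate an answer; hypothetical helper, requires transformers and a GPU."""
    from transformers import pipeline  # imported lazily so the module loads without it

    generator = pipeline("text-generation", model=MODEL_ID, device=device)
    output = generator(
        build_messages(question),
        max_new_tokens=max_new_tokens,
        return_full_text=False,
    )[0]
    return output["generated_text"]


# Sample Japanese word problem: "There are 3 apples. If you receive 5 more,
# how many are there in total?"
question_ja = "りんごが3個あります。さらに5個もらうと、全部で何個になりますか？"
messages = build_messages(question_ja)
print(messages)
```

Calling `solve(question_ja)` would then print the model's reply. Pass `device="cpu"` if no GPU is available.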

✨ Key Features

This model is a fine-tuned version of SakanaAI/TinySwallow-1.5B-Instruct trained on the kunishou/OpenMathInstruct-1-1.8m-ja dataset.
It was trained with TRL for 300 steps.
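The source does not say which reward signal was used during training. As an illustration only, a correctness reward for math answers could look like the sketch below, written in the style TRL's GRPO trainer expects for custom reward functions: it receives the completions plus extra dataset columns as keyword arguments and returns one float per completion. The column name `answer`, the number-matching rule, and both function names are hypothetical.

```python
import re

# Matches integers and decimals, optionally negative.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def last_number(text):
    """Return the last number found in the text, or None if there is none."""
    matches = NUMBER.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def correctness_reward(completions, answer, **kwargs):
    """Hypothetical reward: 1.0 if the last number in a completion equals the reference."""
    rewards = []
    for completion, ref in zip(completions, answer):
        # Chat-style completions arrive as a list of message dicts; plain ones as strings.
        text = completion[0]["content"] if isinstance(completion, list) else completion
        predicted = last_number(text)
        is_correct = predicted is not None and abs(predicted - float(ref)) < 1e-6
        rewards.append(1.0 if is_correct else 0.0)
    return rewards


completions = [
    [{"role": "assistant", "content": "3 + 5 = 8 なので、答えは 8 個です。"}],
    [{"role": "assistant", "content": "答えは 7 です。"}],
]
print(correctness_reward(completions, answer=["8", "8"]))
```

Taking the last number as the final answer is a deliberately crude rule. A real setup would more likely parse a dedicated answer format.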

📚 Documentation

Training Procedure

This model was trained with GRPO, a method introduced in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
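For reference, the objective that GRPO maximizes can be written as below. This is a transcription of the formulation in the DeepSeekMath paper for outcome-level rewards, with $\rho_{i,t}$ introduced here as shorthand for the token-level probability ratio. The group size $G$, clipping range $\varepsilon$, and KL coefficient $\beta$ used for this model are not stated in the source.

```latex
\mathcal{J}_{\mathrm{GRPO}}(\theta)
= \mathbb{E}_{q \sim P(Q),\ \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\mathrm{old}}}(\cdot \mid q)}
\left[ \frac{1}{G} \sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|}
\Big( \min\big( \rho_{i,t}\,\hat{A}_{i,t},\
\operatorname{clip}(\rho_{i,t},\, 1-\varepsilon,\, 1+\varepsilon)\,\hat{A}_{i,t} \big)
- \beta\, \mathbb{D}_{\mathrm{KL}}\!\left[ \pi_\theta \,\|\, \pi_{\mathrm{ref}} \right] \Big) \right],
\qquad
\rho_{i,t} = \frac{\pi_\theta(o_{i,t} \mid q, o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t} \mid q, o_{i,<t})},
\qquad
\hat{A}_{i,t} = \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G)}
```

Here $q$ is a prompt, $o_1,\dots,o_G$ are the completions sampled for it, $r_i$ is the reward of completion $o_i$, and $\pi_{\mathrm{ref}}$ is the frozen reference policy.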

Framework Versions

TRL: 0.16.0.dev0
Transformers: 4.49.0
PyTorch: 2.5.1
Datasets: 3.5.0
Tokenizers: 0.21.1
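To compare a local environment against the versions listed above, a small standard-library check along these lines may help. The helper name `check_versions` is illustrative, and the distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the usual PyPI package names.

```python
from importlib import metadata

# Versions listed in the model card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "trl": "0.16.0.dev0",
    "transformers": "4.49.0",
    "torch": "2.5.1",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed or None, versions_match)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in check_versions().items():
    status = "OK" if ok else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load. The listed versions are the ones recorded for training.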

📄 License

This project is released under the license specified for the model (licence: license).

📚 Citations

Citing GRPO

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Citing TRL

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Model Information

Property	Details
Base model	SakanaAI/TinySwallow-1.5B-Instruct
Training dataset	kunishou/OpenMathInstruct-1-1.8m-ja
Library name	transformers
Model name	OpenRS3-GRPO-ja
Tags	generated_from_trainer, open-r1, trl, grpo