🚀 STILL-3-1.5B-preview: A Slow-Thinking Reasoning Model
We release STILL-3-1.5B-preview, a slow-thinking reasoning model that achieves 39.33% accuracy on the AIME benchmark! We apply reinforcement learning to a 1.5B-parameter model and observe that its performance keeps improving as the number of training steps increases. To facilitate reproduction of our work and advance research in this area, we open-source the code, model, and data.
Code: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs
🚀 Quick Start
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Load the tokenizer to format the prompt with the model's chat template
model_path = "RUC-AIBOX/STILL-3-1.5B-preview"
tokenizer = AutoTokenizer.from_pretrained(model_path)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"

input_prompts = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

# Load the model with vLLM for inference
llm = LLM(model=model_path, tensor_parallel_size=1, dtype="bfloat16")

sampling_params = SamplingParams(
    temperature=0.6, top_p=0.95, max_tokens=32768, seed=42, skip_special_tokens=False
)

responses = llm.generate(input_prompts, sampling_params)
print(responses[0].outputs[0].text)
✨ Key Features
We evaluate the model on four benchmarks: MATH, AIME, OMNI, and LiveAOPS. For MATH and AIME, we use a sampling decoding setup with a sampling temperature of 0.6 and a top-p sampling probability of 0.95. Each question is sampled 64 times, and the average score is reported (a rough sketch of this protocol is given after the table below). For OMNI and LiveAOPS (August–November 2024), we randomly sample a subset of questions whose answers are integers to enable automated evaluation, and evaluate them with greedy decoding. The trained model, STILL-3-1.5B-preview, achieves significant improvements: accuracy on AIME rises from 28.67% to 39.33%, a relative improvement of 37.18%.
| | MATH | AIME | OMNI | LiveAOPS | Avg. |
| --- | --- | --- | --- | --- | --- |
| Backbone | 84.04 | 28.67 | 25.60 | 33.33 | 42.91 |
| STILL-3-1.5B-preview | 85.48 | 39.33 | 33.00 | 39.50 | 49.33 |
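As a rough illustration of the sampled-decoding protocol described above (this is a minimal sketch, not the team's released evaluation code), the snippet below averages accuracy over 64 sampled completions per question with vLLM. The names `prompts`, `gold_answers`, and `is_correct` are hypothetical placeholders you would need to supply.

from vllm import LLM, SamplingParams

# Hypothetical sketch: 64 samples per question at temperature 0.6, top-p 0.95,
# then average per-question accuracy; `prompts`, `gold_answers`, and `is_correct`
# are placeholders and not part of the release.
llm = LLM(model="RUC-AIBOX/STILL-3-1.5B-preview", dtype="bfloat16")
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, n=64, max_tokens=32768)

outputs = llm.generate(prompts, sampling_params)  # one chat-formatted prompt per question

per_question_scores = []
for request_output, gold in zip(outputs, gold_answers):
    # Fraction of the 64 samples that answer this question correctly
    hits = sum(is_correct(sample.text, gold) for sample in request_output.outputs)
    per_question_scores.append(hits / len(request_output.outputs))

print(f"Average score: {100 * sum(per_question_scores) / len(per_question_scores):.2f}")

For the greedy-decoding benchmarks, the same loop applies with SamplingParams(temperature=0.0, n=1).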
📚 Detailed Documentation
If our report is helpful for your research, please kindly cite:
@article{Slow_Thinking_with_LLMs_3_Preview,
title={STILL-3-1.5B-preview: Enhancing Slow Thinking Abilities of Small Models through Reinforcement Learning},
author={RUCAIBox STILL Team},
url={https://github.com/RUCAIBox/Slow_Thinking_with_LLMs},
year={2025}
}
@article{Slow_Thinking_with_LLMs_1,
title={Enhancing LLM Reasoning with Reward-guided Tree Search},
author={Jiang, Jinhao and Chen, Zhipeng and Min, Yingqian and Chen, Jie and Cheng, Xiaoxue and Wang, Jiapeng and Tang, Yiru and Sun, Haoxiang and Deng, Jia and Zhao, Wayne Xin and Liu, Zheng and Yan, Dong and Xie, Jian and Wang, Zhongyuan and Wen, Ji-Rong},
journal={arXiv preprint arXiv:2411.11694},
year={2024}
}
@article{Slow_Thinking_with_LLMs_2,
title={Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems},
author={Min, Yingqian and Chen, Zhipeng and Jiang, Jinhao and Chen, Jie and Deng, Jia and Hu, Yiwen and Tang, Yiru and Wang, Jiapeng and Cheng, Xiaoxue and Song, Huatong and Zhao, Wayne Xin and Liu, Zheng and Wang, Zhongyuan and Wen, Ji-Rong},
journal={arXiv preprint arXiv:2412.09413},
year={2024}
}