---
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: >-
https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
language:
- en
tags:
- nvidia
- reasoning
- math
- code
- reinforcement learning
- pytorch
---
# AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

We're thrilled to introduce AceReason-Nemotron-7B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-7B. It delivers impressive results, achieving 69.0% on AIME 2024 (+14.5%), 53.6% on AIME 2025 (+17.4%), 51.8% on LiveCodeBench v5 (+8%), and 44.1% on LiveCodeBench v6 (+7%). We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first RL training on math-only prompts, then RL training on code-only prompts. Notably, we find that math-only RL not only significantly enhances the performance of strong distilled models on math benchmarks, but also improves code reasoning tasks. In addition, extended code-only RL further boosts code benchmark performance while causing minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.

We share the training recipe and training logs in our technical report.
## Results

We evaluate our model against comparably sized competitive reasoning models from the Qwen2.5 and Llama3.1 model families on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More evaluation results can be found in our technical report.
| Model | AIME 2024 (avg@64) | AIME 2025 (avg@64) | LCB v5 (avg@8) | LCB v6 (avg@8) |
| :--- | :---: | :---: | :---: | :---: |
| QwQ-32B | 79.5 | 65.8 | 63.4 | - |
| DeepSeek-R1-671B | 79.8 | 70.0 | 65.9 | - |
| Llama-Nemotron-Ultra-253B | 80.8 | 72.5 | 66.3 | - |
| o3-mini (medium) | 79.6 | 76.7 | 67.4 | - |
| Light-R1-7B | 59.1 | 44.3 | 40.6 | 36.4 |
| Light-R1-14B | 74 | 60.2 | 57.9 | 51.5 |
| DeepCoder-14B (32K Inference) | 71 | 56.1 | 57.9 | 50.4 |
| OpenMath-Nemotron-7B | 74.8 | 61.2 | - | - |
| OpenCodeReasoning-Nemotron-7B | - | - | 51.3 | 46.1 |
| Llama-Nemotron-Nano-8B-v1 | 61.3 | 47.1 | 46.6 | 46.2 |
| DeepSeek-R1-Distilled-Qwen-7B | 55.5 | 39.0 | 37.6 | 34.1 |
| DeepSeek-R1-Distilled-Qwen-14B | 69.7 | 50.2 | 53.1 | 47.9 |
| DeepSeek-R1-Distilled-Qwen-32B | 72.6 | 54.9 | 57.2 | - |
| **AceReason-Nemotron-7B** 🤗 | 69.0 | 53.6 | 51.8 | 44.1 |
| **AceReason-Nemotron-14B** 🤗 | 78.6 | 67.4 | 61.1 | 54.9 |
## How to Use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-7B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

# Build the chat-formatted prompt; no system prompt (see usage recommendations below).
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to("cuda")

# Recommended sampling settings: temperature=0.6, top_p=0.95, up to 32K new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,  # ensure temperature/top_p take effect
    temperature=0.6,
    top_p=0.95
)

# Strip the prompt tokens and decode only the newly generated completion.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
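```

For math questions, the final answer is expected inside `\boxed{}` in the generated text. Below is a minimal, illustrative sketch for pulling it out of `response`; the `extract_boxed_answer` helper is ours for illustration, not part of the released evaluation code:

```python
import re

def extract_boxed_answer(text):
    # Grab the contents of the last \boxed{...} in the output.
    # Simple pattern: does not handle nested braces such as \boxed{\frac{1}{2}}.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

# For the AIME problem above, a correct completion should yield "116".
print(extract_boxed_answer(response))
```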
## Usage Recommendations

- Do not include a system prompt; put all instructions directly in the user prompt.
- For math questions, we recommend using the following instruction: Please reason step by step, and put your final answer within \boxed{}.
- For code questions, we recommend using the following instruction:

````python
question = ""        # code question
starter_code = ""    # starter code function header, if any

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""

if starter_code != "":
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

final_prompt = "<|User|>" + question + "<|Assistant|><think>\n"
````

- Our evaluation inference engine is vLLM==0.7.3, with top_p=0.95, temperature=0.6, and max_tokens=32768 (a minimal sketch follows this list).
- We use the AceMath scorer for math evaluation and the official LiveCodeBench scripts for code evaluation.
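
As a minimal sketch of the evaluation setup described above (assuming `vllm` is installed; the question is a placeholder, and `llm.chat` simply applies the model's own chat template, mirroring the transformers example):

```python
from vllm import LLM, SamplingParams

# Recommended math setup: no system prompt, instruction appended to the user question.
question = "What is 7 * 8?"  # placeholder question for illustration
math_instruction = "Please reason step by step, and put your final answer within \\boxed{}."
messages = [{"role": "user", "content": question + "\n\n" + math_instruction}]

# Evaluation settings listed above: temperature=0.6, top_p=0.95, max_tokens=32768.
llm = LLM(model="nvidia/AceReason-Nemotron-7B")
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=32768)

outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```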
## Contact
Yang Chen (yachen@nvidia.com), Zhuolin Yang (zhuoliny@nvidia.com), Zihan Liu (zihanl@nvidia.com), Chankyu Lee (chankyul@nvidia.com), Wei Ping (wping@nvidia.com)
## License

Your use of this model is governed by the NVIDIA Open Model License.
## Citation

```bibtex
@article{acereason2025,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2025}
}
```