AceMath-7B-Instruct Open-Source Math Reasoning Model - Free to Deploy, Solves English Math Problems

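AceMath 7B Instruct

Developed by NVIDIA

AceMath-7B-Instruct is NVIDIA's instruction-tuned model for mathematical reasoning. It builds on Qwen and is strong at solving English math problems through chain-of-thought (CoT) reasoning.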

Large language model

Safetensors

English #Math reasoning #Chain-of-thought optimization #Multi-stage fine-tuning

Downloads: 1,454

Released: 1/13/2025

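Model Overview

The AceMath family is designed for mathematical reasoning and includes instruct models and reward models at several sizes. The instruct models solve math problems through chain-of-thought reasoning, while the reward models evaluate and score math solutions.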

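Model Features

Math-specific optimization

Designed specifically for mathematical reasoning; a multi-stage supervised fine-tuning (SFT) process improves its math problem-solving ability.

Chain-of-thought reasoning

Solves complex math problems through chain-of-thought (CoT) reasoning.

Strong performance

Across multiple math reasoning benchmarks, the 7B model clearly outperforms the previous best model of its size, Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9), and approaches Qwen2.5-Math-72B-Instruct, a model roughly 10 times larger (67.2 vs. 68.2).

Fully open training data

All training data is released publicly to support related research.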

Model Capabilities

Math problem solving

Chain-of-thought reasoning

English text generation

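Use Cases

Education

Math problem solving

Helps students understand and solve complex math problems.

Performs well on multiple math reasoning benchmarks.

Research

Math reasoning research

Supports research on mathematical reasoning and chain-of-thought methods.

The public training data can be used for further research.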

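🚀 AceMath - Frontier Math Reasoning Models

AceMath is a family of frontier models designed for mathematical reasoning. The models perform strongly on math reasoning benchmarks and can solve a wide range of math problems, in support of mathematical research and applications.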

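🚀 Quick Start

Model Introduction

The AceMath family consists of AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, all of which build on Qwen. The AceMath-1.5B/7B/72B-Instruct models excel at solving English math problems using chain-of-thought (CoT) reasoning, while the AceMath-7B/72B-RM models are outcome reward models that specialize in evaluating and scoring math solutions.

The AceMath-1.5B/7B/72B-Instruct models are developed from the Qwen2.5-Math-1.5B/7B/72B-Base models through a multi-stage supervised fine-tuning (SFT) process: first on general-purpose SFT data, then on math-specific SFT data. We are releasing all training data to support further research in this field.

We recommend using the AceMath models only for math problems. To support other tasks, we also release AceInstruct-1.5B/7B/72B, a series of general-purpose SFT models for code, math, and general knowledge tasks, built on Qwen2.5-1.5B/7B/72B-Base.

For more information about AceMath, see our website and paper.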

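✨ Key Features

Strong math reasoning

Across a range of math reasoning benchmarks, AceMath-7B-Instruct substantially outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) and approaches the performance of Qwen2.5-Math-72B-Instruct, a model 10 times larger (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a clear margin.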
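
The margins quoted above are easy to recompute. The short Python sketch below uses only the average pass@1 figures stated in this section; the dictionary and function names are illustrative and are not part of any AceMath tooling.

```python
# Average pass@1 scores quoted in this section (higher is better).
avg_pass_at_1 = {
    "AceMath-7B-Instruct": 67.2,
    "AceMath-72B-Instruct": 71.8,
    "Qwen2.5-Math-7B-Instruct": 62.9,
    "Qwen2.5-Math-72B-Instruct": 68.2,
    "GPT-4o": 67.4,
    "Claude 3.5 Sonnet": 65.6,
}


def margin(model: str, baseline: str) -> float:
    """Return the pass@1 gap in points between a model and a baseline."""
    return round(avg_pass_at_1[model] - avg_pass_at_1[baseline], 1)


# The comparisons drawn in the text above.
comparisons = [
    ("AceMath-7B-Instruct", "Qwen2.5-Math-7B-Instruct"),
    ("AceMath-7B-Instruct", "Qwen2.5-Math-72B-Instruct"),
    ("AceMath-72B-Instruct", "Qwen2.5-Math-72B-Instruct"),
    ("AceMath-72B-Instruct", "GPT-4o"),
    ("AceMath-72B-Instruct", "Claude 3.5 Sonnet"),
]

for model, baseline in comparisons:
    print(f"{model} vs. {baseline}: {margin(model, baseline):+.1f} points")
```

A positive value means the first model scores higher; the 7B model's gap to Qwen2.5-Math-72B-Instruct comes out negative, which matches the "approaches" wording above.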

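Strong reward model

Our reward model, AceMath-72B-RM, sets a new record for rm@8 accuracy (best of 8) on these reasoning benchmarks, excluding OpenAI's o1 model, which relies on large-scale inference-time compute.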
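
To make the rm@8 metric concrete: for each problem, eight candidate solutions are sampled, the reward model scores each one, and the top-scored candidate is checked for correctness. The following is a minimal sketch of that selection step, assuming a hypothetical `score_fn` callable that stands in for a real reward-model call and a toy exact-match correctness check. It is not the AceMath evaluation code, and the toy data uses fewer than eight candidates for brevity.

```python
from typing import Callable, Sequence, Tuple


def best_of_n(candidates: Sequence[str], score_fn: Callable[[str], float]) -> str:
    """Return the candidate that the scoring function rates highest."""
    return max(candidates, key=score_fn)


def rm_at_n_accuracy(
    problems: Sequence[Tuple[Sequence[str], str]],
    score_fn: Callable[[str], float],
) -> float:
    """Fraction of problems whose top-scored candidate matches the reference.

    Each problem is a pair (candidate_answers, reference_answer).
    """
    hits = sum(best_of_n(cands, score_fn) == ref for cands, ref in problems)
    return hits / len(problems)


# Hypothetical scores standing in for reward-model outputs.
toy_scores = {"116": 0.9, "115": 0.4, "23": 0.1, "7": 0.8, "8": 0.3}

toy_problems = [
    (["115", "116", "23"], "116"),  # top-scored candidate is correct
    (["7", "8"], "8"),              # top-scored candidate is wrong
]

accuracy = rm_at_n_accuracy(toy_problems, toy_scores.__getitem__)
print(accuracy)
```

With a real reward model, `score_fn` would score a full question-and-solution pair rather than a bare answer string.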

📦 All Resources

AceMath instruct models

[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct)
[AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct)
[AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)

AceMath reward models

[AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM)
[AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)

Evaluation and training data

[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)
[AceMath-Instruct training data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data)
[AceMath-RM training data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)

General-purpose instruct models

[AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B)
[AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B)
[AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)

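📊 Benchmark Results (AceMath-Instruct + AceMath-72B-RM)

![AceMath benchmark results](acemath-pic.png)

The table above compares AceMath with leading proprietary and open-access math models. Our AceMath-7B-Instruct substantially outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct across a range of math reasoning benchmarks and approaches the performance of Qwen2.5-Math-72B-Instruct, a model 10 times larger. Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct, GPT-4o, and Claude 3.5 Sonnet by a clear margin. We also report the rm@8 accuracy achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks.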

💻 Usage Examples

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
model_name = "nvidia/AceMath-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

# Render the conversation with the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to the model's device instead of hard-coding "cuda".
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
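
As a sanity check on the model's output, the example problem can be solved exactly by counting. The sketch below is independent of the model and uses only the Python standard library: it counts the draws that share exactly k numbers with Jen's four picks, then conditions on having at least two matches. The helper name `ways` is illustrative.

```python
from fractions import Fraction
from math import comb


def ways(k: int) -> int:
    """Number of 4-number draws from S sharing exactly k numbers with Jen's picks.

    Choose k of her 4 numbers and the remaining 4 - k from the other 6.
    """
    return comb(4, k) * comb(6, 4 - k)


total_draws = comb(10, 4)                        # all possible draws
prize_draws = sum(ways(k) for k in (2, 3, 4))    # at least two matches
grand_draws = ways(4)                            # all four match

# P(grand prize | prize) = P(grand prize) / P(prize)
p = Fraction(grand_draws, total_draws) / Fraction(prize_draws, total_draws)
answer = p.numerator + p.denominator

print(p, answer)
```

The conditional probability is 1/115, so a correct response to the prompt should arrive at m + n = 116.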

📬 Contact

Zihan Liu (zihanl@nvidia.com)
Yang Chen (yachen@nvidia.com)
Wei Ping (wping@nvidia.com)

📚 Citation

If you find our work helpful, please cite our paper:

@article{acemath2024,
  title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
  author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2024}
}

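📄 License

All models in the AceMath family are for non-commercial use only and are subject to the [Terms of Use](https://openai.com/policies/row-terms-of-use/) for data generated by OpenAI. We release the AceMath models under the [Creative Commons Attribution-NonCommercial 4.0 International license](https://spdx.org/licenses/CC-BY-NC-4.0).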