AceMath-72B-Instruct: An Open-Source Math Reasoning Model for English Math Problems

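AceMath-72B-Instruct

Developed by NVIDIA

AceMath is a family of frontier models built for mathematical reasoning. The models are improved from Qwen and use chain-of-thought (CoT) reasoning to solve English math problems.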

Large language model

Safetensors

English · #math reasoning #chain-of-thought reasoning #reward evaluation

Downloads: 3,141

Released: January 14, 2025

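Model Overview

The AceMath family targets mathematical reasoning. It includes instruction models for solving complex math problems and reward models for evaluating math solutions.

Model Features

Strong mathematical reasoning

Across a range of math reasoning benchmarks, the AceMath instruction models outperform the previous best-in-class models. For example, AceMath-7B-Instruct reaches an average pass@1 of 67.2 versus 62.9 for Qwen2.5-Math-7B-Instruct.

Specialized reward models

The AceMath-RM models are built to evaluate and score math solutions, and AceMath-72B-RM sets a new record for rm@8 accuracy (best of 8) on reasoning benchmarks.

Multi-stage supervised fine-tuning

Training uses a multi-stage supervised fine-tuning (SFT) process: general SFT data first, then math-specific SFT data.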

Model Capabilities

Math problem solving

Mathematical reasoning

Math solution evaluation

Chain-of-thought (CoT) reasoning

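Use Cases

Education

Solving competition math problems

Solves complex competition-style problems, such as probability and combinatorics questions.

On math reasoning benchmarks, AceMath-72B-Instruct (71.8 average pass@1) outperforms GPT-4o (67.4) and Claude 3.5 Sonnet (65.6).

Research

Evaluating math solutions

Uses the reward models to evaluate and score math solutions.

AceMath-72B-RM sets a new record for rm@8 accuracy (best of 8) on reasoning benchmarks.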
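
To make the rm@8 (best of 8) metric concrete, here is a minimal sketch of how such a score could be computed. It assumes each problem already has several sampled solutions, each paired with a reward score and a correctness flag. The names `Scored`, `pick_best`, and `rm_at_n`, as well as the toy data, are illustrative and not taken from the AceMath release. With 8 samples per problem, the result corresponds to rm@8.

```python
from typing import List, Tuple

# (reward score assigned by a reward model, whether the final answer is correct)
Scored = Tuple[float, bool]


def pick_best(candidates: List[Scored]) -> Scored:
    """Return the candidate with the highest reward score."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=lambda c: c[0])


def rm_at_n(problems: List[List[Scored]]) -> float:
    """Percentage of problems whose highest-scored candidate is correct."""
    picked = [pick_best(candidates) for candidates in problems]
    return 100.0 * sum(ok for _, ok in picked) / len(picked)


# Toy data (hypothetical): two problems with a handful of candidates each.
toy = [
    [(0.2, False), (0.9, True), (0.4, False)],  # top-scored candidate is correct
    [(0.7, False), (0.1, True)],                # top-scored candidate is wrong
]
accuracy = rm_at_n(toy)
```

In this toy run, the reward model picks a correct solution for the first problem but not for the second, so `accuracy` is 50.0.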

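🚀 AceMath: Frontier Math Reasoning Models

AceMath is a family of frontier models built for mathematical reasoning. The family comprises AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, all improved from Qwen. The AceMath-1.5B/7B/72B-Instruct models use chain-of-thought (CoT) reasoning to solve English math problems, while the AceMath-7B/72B-RM models are outcome reward models specialized in evaluating and scoring math solutions.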

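🚀 Quick Start

The AceMath models are intended for solving math problems. For other tasks, we also release AceInstruct-1.5B/7B/72B, a series of general-purpose supervised fine-tuning (SFT) models that handle code, math, and general-knowledge tasks. The AceInstruct models are built on Qwen2.5-1.5B/7B/72B-Base.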

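✨ Key Features

Strong mathematical reasoning: The AceMath-1.5B/7B/72B-Instruct models perform well across a range of math reasoning benchmarks. On average pass@1, AceMath-7B-Instruct clearly outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (67.2 vs. 62.9) and approaches the 10x larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). AceMath-72B-Instruct goes further, surpassing the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6).
Specialized reward models: AceMath-7B/72B-RM are outcome reward models that evaluate and score math solutions. AceMath-72B-RM sets a new record for rm@8 accuracy (best of 8) on reasoning benchmarks.
Multi-stage supervised fine-tuning: The AceMath-1.5B/7B/72B-Instruct models are developed from the Qwen2.5-Math-1.5B/7B/72B-Base models through a multi-stage supervised fine-tuning (SFT) process that uses general SFT data first and math-specific SFT data second.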

💻 Usage Example

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s)
model_name = "nvidia/AceMath-72B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
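messages = [{"role": "user", "content": prompt}]

# Render the conversation with the model's chat template and append the generation prompt
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to the model's device instead of hard-coding "cuda"
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)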

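# Generate up to 2048 new tokens
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)
# Drop the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the completion to text
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)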
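
The decoded response is a full chain-of-thought solution, so a downstream script usually needs only the final answer. The sketch below assumes the model wraps its final answer in `\boxed{...}`, a common convention for math models that the source does not confirm for AceMath. The helper `extract_boxed` and the sample string are illustrative: the sample is hand-written and is not actual model output.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


# Hand-written stand-in for a model response (assumed format).
sample = r"The probability is $\tfrac{1}{115}$, so $m+n = \boxed{116}$."
final_answer = extract_boxed(sample)
```

Brace counting is used rather than a simple regular expression so that nested expressions such as `\boxed{\frac{1}{2}}` are returned whole.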

📚 Documentation

All resources

AceMath instruction models

AceMath reward models

Evaluation and training data

General instruction models

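Benchmark Results

Figure: AceMath benchmark results

We compare AceMath against leading proprietary and open-access math models. Across a range of math reasoning benchmarks, AceMath-7B-Instruct clearly outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) and approaches the 10x larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, AceMath-72B-Instruct surpasses the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a clear margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model AceMath-72B-RM, which sets a new record on these reasoning benchmarks. The comparison excludes OpenAI's o1 model, which relies on scaled inference computation.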
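
The gaps behind these comparisons are easy to check. The sketch below uses only the average pass@1 figures quoted above. The `scores` dictionary and the `margin` helper are illustrative names, not part of any AceMath tooling.

```python
# Average pass@1 scores as quoted in the benchmark summary.
scores = {
    "AceMath-7B-Instruct": 67.2,
    "AceMath-72B-Instruct": 71.8,
    "Qwen2.5-Math-7B-Instruct": 62.9,
    "Qwen2.5-Math-72B-Instruct": 68.2,
    "GPT-4o": 67.4,
    "Claude 3.5 Sonnet": 65.6,
}


def margin(model: str, baseline: str) -> float:
    """Difference in average pass@1 (model minus baseline), rounded to 0.1."""
    return round(scores[model] - scores[baseline], 1)


gap_7b_vs_qwen_7b = margin("AceMath-7B-Instruct", "Qwen2.5-Math-7B-Instruct")
gap_7b_vs_qwen_72b = margin("AceMath-7B-Instruct", "Qwen2.5-Math-72B-Instruct")
gap_72b_vs_qwen_72b = margin("AceMath-72B-Instruct", "Qwen2.5-Math-72B-Instruct")
gap_72b_vs_gpt4o = margin("AceMath-72B-Instruct", "GPT-4o")
gap_72b_vs_claude = margin("AceMath-72B-Instruct", "Claude 3.5 Sonnet")
```

The 7B model leads its same-size baseline by 4.3 points and trails the 72B Qwen model by 1.0 point. The 72B model leads Qwen2.5-Math-72B-Instruct, GPT-4o, and Claude 3.5 Sonnet by 3.6, 4.4, and 6.2 points respectively.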

Contact

Zihan Liu (zihanl@nvidia.com), Yang Chen (yachen@nvidia.com), Wei Ping (wping@nvidia.com)

Citation

If you find our work helpful, please cite it:

@article{acemath2024,
  title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
  author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2024}
}