Skywork-o1-Open-PRM-Qwen-2.5-1.5B open-source model - improving small-scale complex problem solving

Skywork O1 Open PRM Qwen 2.5 1.5B

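Developed by Skywork

Skywork o1 Open-PRM-Qwen-2.5-1.5B is an incremental process reward model (PRM) trained on Qwen2.5-Math-1.5B-Instruct, designed to strengthen complex problem solving at small scale.

Large language model

PyTorch

License: Other #math-reasoning-enhancement #process-reward-model #efficient-small-scale-reasoning

Downloads: 4,368

Released: 11/26/2024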

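Model Overview

The model is part of the Skywork o1 Open model series. It focuses on improving reasoning through incremental process rewards and is particularly suited to math and code reasoning tasks.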

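Model Features

Incremental process rewards

Uses the PRM mechanism to strengthen reasoning, especially for complex problem solving

Small and efficient

Delivers strong reasoning performance at 1.5B parameters

Mathematical reasoning

With Skywork-o1-Open-8B as the base model, Best-of-N@64 with this PRM scores 94.5 on GSM8K and 85.0 on MATH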

Model Capabilities

Math problem solving

Code generation

Complex reasoning

Multi-step problem answering

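Use Cases

Education

Math problem answering

Helps students solve math problems with detailed solution steps

Reaches 94.5% accuracy on GSM8K under Best-of-N@64 when reranking samples from Skywork-o1-Open-8B

Programming

Code generation and optimization

Generates code solutions from problem descriptions

Code benchmarks such as HumanEval are covered in the evaluation, which reports code results for Skywork-o1-Open-PRM-7B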

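🚀 Skywork o1 Open Model Series

The Skywork o1 Open model series is developed by the Skywork team at Kunlun Inc. The models incorporate o1-like slow thinking and reasoning capabilities, offering a new approach for the field and markedly improving performance on reasoning tasks.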

🤗 Hugging Face • 🤖 ModelScope

🚀 Quick Start

For detailed PRM usage instructions, see the Skywork-o1-PRM-Inference repository.

Hugging Face Inference

Clone the Skywork PRM inference repository

git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference

Run PRM inference

from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards

prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n   Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n   - She eats 3 eggs for breakfast every morning.\n   - She bakes muffins for her friends every day with 4 eggs.\n   So, the total number of eggs used per day is:\n   \\[\n   3 + 4 = 7 \\text{ eggs}\n   \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n   Subtract the number of eggs used from the total number of eggs laid:\n   \\[\n   16 - 7 = 9 \\text{ eggs}\n   \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n   She sells each egg for $2, so the total amount of money she makes is:\n   \\[\n   9 \\times 2 = 18 \\text{ dollars}\n   \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    },
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n   - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n   - Janet eats 3 eggs for breakfast.\n   - She bakes 4 eggs for muffins.\n   - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n   - Total eggs laid: 16\n   - Eggs used: 7\n   - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n   - Price per egg: $2\n   - Number of eggs sold: 9\n   - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
    }
]


# Tokenize each problem/response pair; "\n" is the step separator.
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)

# Load the PRM and score both responses as one padded batch.
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, reward_flags, tokenizer.pad_token_id)
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)

# Derive the per-step rewards of each response from the model output and reward_flags.
step_rewards = derive_step_rewards(rewards, reward_flags)
print("step_rewards:", step_rewards[0])
print("step_rewards:", step_rewards[1])
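
One common way to use per-step rewards is to locate the first step that the PRM scores poorly. The following is a minimal sketch, assuming each entry of `step_rewards` is a list of floats in [0, 1] (suggested by `return_probs=True` but not stated here). The helper `first_low_step` and the 0.5 threshold are hypothetical and not part of the inference repository.

```python
from typing import Optional, Sequence


def first_low_step(step_rewards: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first step scored below `threshold`, or None.

    Hypothetical helper: assumes one float reward per reasoning step.
    """
    for index, reward in enumerate(step_rewards):
        if reward < threshold:
            return index
    return None


# Made-up rewards for a four-step response (illustrative values only).
example_rewards = [0.92, 0.88, 0.31, 0.40]
print(first_low_step(example_rewards))      # 2
print(first_low_step([0.9, 0.8, 0.95]))     # None
```

The threshold is a tuning choice; a suitable value would depend on how the rewards are calibrated for a given task.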

vLLM Server Inference

Install vLLM and the vLLM PRM plugin

pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .

Start the vLLM server

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /path/to/prm_model \
    --host 0.0.0.0 \
    --port 8081 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.9 \
    --enable-prefix-caching \
    --dtype auto

Send an inference request to the server

from openai import OpenAI
from transformers import AutoTokenizer
from model_utils.io_utils import prepare_input, derive_step_rewards_vllm

prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n   Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n   - She eats 3 eggs for breakfast every morning.\n   - She bakes muffins for her friends every day with 4 eggs.\n   So, the total number of eggs used per day is:\n   \\[\n   3 + 4 = 7 \\text{ eggs}\n   \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n   Subtract the number of eggs used from the total number of eggs laid:\n   \\[\n   16 - 7 = 9 \\text{ eggs}\n   \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n   She sells each egg for $2, so the total amount of money she makes is:\n   \\[\n   9 \\times 2 = 18 \\text{ dollars}\n   \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    },
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n   - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n   - Janet eats 3 eggs for breakfast.\n   - She bakes 4 eggs for muffins.\n   - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n   - Total eggs laid: 16\n   - Eggs used: 7\n   - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n   - Price per egg: $2\n   - Number of eggs sold: 9\n   - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
    }
]

# Preprocess the data
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8081/v1"
client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
rewards = client.embeddings.create(
    input=input_ids,
    model=model,
)

step_rewards = derive_step_rewards_vllm(rewards, reward_flags)
print("step_rewards:",step_rewards[0])
print("step_rewards:",step_rewards[1])
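
The request above sends every token sequence in a single embeddings call. For larger workloads it may be preferable to send the sequences in smaller batches. The sketch below shows one way to split a sequence into batches; `chunked` and the batch size are hypothetical and not part of the Skywork repository. `reward_flags` would need to be split the same way so that each batch can be passed to `derive_step_rewards_vllm` together with its own flags.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in token-id lists; in practice these would come from prepare_input.
token_lists = [[1, 2], [3], [4, 5, 6], [7], [8, 9]]
for batch in chunked(token_lists, batch_size=2):
    print(len(batch), batch)
```
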

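✨ Key Features

Innovative reasoning: incorporates o1-like slow thinking and reasoning, showing thinking, planning and reflection in its outputs, which markedly improves reasoning skill.
Multiple models: the series includes three models trained on different base models, covering reasoning tasks of different scale and complexity.
Strong performance: reasoning ability improves markedly on standard benchmarks, moving the base models toward state-of-the-art results on reasoning tasks.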

📦 Installation

Hugging Face Inference

git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference

vLLM Server Inference

pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .

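📚 Documentation

Model Introduction

The Skywork o1 Open model series is developed by the Skywork team at Kunlun Inc. and includes three models:

Skywork o1 Open-Llama-3.1-8B: a chat model trained on Llama-3.1-8B, with reasoning markedly strengthened by "o1-style" data.
Skywork o1 Open-PRM-Qwen-2.5-1.5B: a specialized model designed to improve reasoning through incremental process rewards, suited to complex problem solving at small scale.
Skywork o1 Open-PRM-Qwen-2.5-7B: scales up from the 1.5B model to handle more challenging reasoning tasks.

Model Information

The Skywork-o1-Open-PRM series is trained on Qwen2.5-Math-1.5B-Instruct and Qwen2.5-Math-7B-Instruct.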

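PRM Evaluation

Evaluation Settings

Math evaluation: uses the Qwen2.5-Math evaluation scripts and follows their configuration. The datasets are GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AMC-23 and AIME-24. GaoKao and CN-Middle School 24 are Chinese datasets; the rest are English. OlympiadBench, AIME-24 and AMC-23 are competition-level datasets.
Code evaluation: uses the Qwen2.5-Coder evaluation scripts with the same configuration. The datasets are HumanEval, MBPP and LiveCodeBench, with LiveCodeBench in its 2024.01-2024.11 version. Because earlier versions had issues with tests and code sanitization, the latest evalplus release (0.3.1) is used.

Evaluated Base Models

The RM is evaluated on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct and Skywork-o1-Open-8B. Responses are sampled from each base model to verify RM performance across models, with a sampling temperature of 0.7 for math problems and 1.0 for code tasks.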

Compared RMs

Qwen2.5-Math-RM-72B: an open-source ORM provided by the Qwen team.
OpenR-MATH-psa-PRM-7B: an open-source PRM from the OpenR project.
RLHFlow-Deepseek-Data-PRM-8B: an open-source PRM from the RLHFlow project.

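Evaluation Metrics

Greedy Sampling Pass@1: generates a single answer with greedy decoding.
Majority Voting@64: randomly samples 64 answers and takes the most frequent final answer.
Best-of-N@64: ranks the 64 answers by the output values of the reward model (RM). ORMs and PRMs are weighted differently:
- ORM: uses only the reward of the last step.
- PRM: uses the average reward over all steps.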
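
The selection rules above can be sketched as follows. This is a minimal illustration rather than the evaluation code used for the reported numbers: the function names are hypothetical, each candidate is assumed to come with a list of per-step float rewards, and the sample data is made up to show how the ORM-style and PRM-style scores can disagree.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer among the sampled candidates."""
    return Counter(answers).most_common(1)[0][0]


def prm_score(step_rewards: Sequence[float]) -> float:
    """PRM-style score: average reward over all steps."""
    return mean(step_rewards)


def orm_score(step_rewards: Sequence[float]) -> float:
    """ORM-style score: reward of the last step only."""
    return step_rewards[-1]


def best_of_n(
    answers: Sequence[str],
    rewards_per_candidate: Sequence[Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
) -> str:
    """Return the answer of the candidate with the highest score."""
    scores = [score_fn(rewards) for rewards in rewards_per_candidate]
    best_index = max(range(len(answers)), key=scores.__getitem__)
    return answers[best_index]


# Four made-up candidates (the evaluation uses 64) with made-up step rewards.
answers = ["18", "18", "20", "16"]
step_rewards = [
    [0.90, 0.80, 0.90],
    [0.70, 0.90, 0.60],
    [0.90, 0.20, 0.95],
    [0.40, 0.30, 0.20],
]

print(majority_vote(answers))                        # 18
print(best_of_n(answers, step_rewards, prm_score))   # 18
print(best_of_n(answers, step_rewards, orm_score))   # 20
```

In this toy data the third candidate has a weak middle step but a high final reward, so last-step scoring prefers it while averaging over steps does not.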

Evaluation Results

Math Evaluation

Base model	Reward model	Method	GSM8K	MATH	GaoKao	CN-Middle School 24	OlympiadBench	AIME-24	AMC-23	Avg
Skywork-o1-Open-8B	N/A	Greedy Sampling Pass@1	91.6	78.1	63.6	67.3	43.1	13.3	55.0	58.9
Skywork-o1-Open-8B	N/A	Majority Voting@64	93.9	84.3	69.5	73.3	50.4	16.7	52.5	62.9
Skywork-o1-Open-8B	OpenR-MATH-psa-PRM-7B	Best-of-N@64	95.1	82.7	67.1	70.3	47.6	20.0	57.5	62.9
Skywork-o1-Open-8B	RLHFlow-Deepseek-Data-PRM-8B	Best-of-N@64	94.4	80.1	59.1	74.3	43.0	20.0	50.0	60.1
Skywork-o1-Open-8B	Qwen2.5-Math-RM-72B	Best-of-N@64	96.1	86.9	76.0	76.2	53.3	26.7	65.0	68.6
Skywork-o1-Open-8B	Skywork-o1-Open-PRM-1.5B	Best-of-N@64	94.5	85.0	65.6	73.3	49.9	16.7	62.5	63.9
Skywork-o1-Open-8B	Skywork-o1-Open-PRM-7B	Best-of-N@64	96.7	87.0	70.3	76.2	52.3	23.3	65.0	67.3
Qwen2.5-7B-Instruct	N/A	Greedy Sampling Pass@1	91.9	75.2	55.6	75.2	39.1	13.3	45.0	56.5
Qwen2.5-7B-Instruct	N/A	Majority Voting@64	93.5	78.4	55.3	78.2	40.1	13.3	62.5	60.2
Qwen2.5-7B-Instruct	OpenR-MATH-psa-PRM-7B	Best-of-N@64	93.9	77.9	52.4	73.3	40.7	10.0	55.0	57.6
Qwen2.5-7B-Instruct	RLHFlow-Deepseek-Data-PRM-8B	Best-of-N@64	94.1	78.1	53.2	75.2	39.1	16.7	55.0	58.8
Qwen2.5-7B-Instruct	Qwen2.5-Math-RM-72B	Best-of-N@64	94.8	82.4	65.2	80.2	45.0	13.3	62.5	63.4
Qwen2.5-7B-Instruct	Skywork-o1-Open-PRM-1.5B	Best-of-N@64	93.3	79.8	56.1	74.3	43.9	10.0	67.5	60.7
Qwen2.5-7B-Instruct	Skywork-o1-Open-PRM-7B	Best-of-N@64	94.9	81.9	56.3	75.2	44.9	13.3	65.0	61.6
Llama3.1-8B-Instruct	N/A	Greedy Sampling Pass@1	85.3	49.7	25.3	47.5	16.6	6.7	27.5	36.9
Llama3.1-8B-Instruct	N/A	Majority Voting@64	90.9	62.9	28.0	56.4	26.4	13.3	37.5	45.1
Llama3.1-8B-Instruct	OpenR-MATH-psa-PRM-7B	Best-of-N@64	91.8	59.4	24.7	47.5	23.0	13.3	35.0	42.1
Llama3.1-8B-Instruct	RLHFlow-Deepseek-Data-PRM-8B	Best-of-N@64	89.8	56.1	24.0	40.6	20.4	0.0	35.0	38.0
Llama3.1-8B-Instruct	Qwen2.5-Math-RM-72B	Best-of-N@64	94.9	72.5	44.9	65.3	34.4	23.3	60.0	56.5
Llama3.1-8B-Instruct	Skywork-o1-Open-PRM-1.5B	Best-of-N@64	91.7	65.6	26.8	49.5	27.0	16.7	60.0	48.2
Llama3.1-8B-Instruct	Skywork-o1-Open-PRM-7B	Best-of-N@64	94.0	69.8	32.0	56.4	29.9	16.7	52.5	50.2
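
The last column of the table appears to be the unweighted mean of the seven dataset scores. The short check below recomputes it for the Skywork-o1-Open-8B + Skywork-o1-Open-PRM-1.5B row, using the values copied from that row; the unweighted-mean interpretation is an assumption that the reported figures are consistent with.

```python
# Scores from the row: Skywork-o1-Open-8B, Skywork-o1-Open-PRM-1.5B, Best-of-N@64.
scores = {
    "GSM8K": 94.5,
    "MATH": 85.0,
    "GaoKao": 65.6,
    "CN-Middle School 24": 73.3,
    "OlympiadBench": 49.9,
    "AIME-24": 16.7,
    "AMC-23": 62.5,
}

# Unweighted mean across the seven datasets.
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}")  # 63.9
```
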

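Code Evaluation

Because the compared PRMs were not trained on code tasks, this section reports only the performance of Skywork-o1-Open-PRM.

Base model	Reward model	Method	MBPP	MBPP+	HumanEval	HumanEval+	LiveCodeBench-2024.01-2024.11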
Skywork-o1-Open-8B	N/A	Greedy Sampling Pass@1	79.9	65.9	82.9	78.7	26.0
Skywork-o1-Open-8B	Skywork-o1-Open-PRM-7B	Best-of-N@64	81.2	68.5	81.1	74.4	31.3
Qwen2.5-7B-Instruct	N/A	Greedy Sampling Pass@1	81.7	69.3	84.8	78.0	25.3
Qwen2.5-7B-Instruct	Skywork-o1-Open-PRM-7B	Best-of-N@64	84.9	72.5	83.5	78.0	30.7
Llama3.1-8B-Instruct	N/A	Greedy Sampling Pass@1	75.7	64.0	68.3	63.4	16.1
Llama3.1-8B-Instruct	Skywork-o1-Open-PRM-7B	Best-of-N@64	77.8	66.1	73.8	67.7	23.3

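🔧 Technical Details

Evaluation scripts: math evaluation uses the Qwen2.5-Math evaluation scripts; code evaluation uses the Qwen2.5-Coder evaluation scripts.
Sampling: responses are sampled from each base model, with a sampling temperature of 0.7 for math problems and 1.0 for code tasks.
Reward model weighting: an ORM uses only the reward of the last step; a PRM uses the average reward over all steps.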

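📄 License

Community use of Skywork models is subject to the Skywork Community License. The models support commercial use; if you plan to use Skywork models or their derivatives for commercial purposes, you must comply with the terms and conditions of the Skywork Community License.

Disclaimer

We declare that Skywork models must not be used for any activity that endangers national or social security or for any illegal purpose. We also ask users not to deploy Skywork models in internet services without appropriate safety review and registration. We hope all users will follow this principle so that the technology develops in a regulated and lawful environment.

We have done our best to ensure that the data used in training is compliant. However, given the complexity of the model and the data, unforeseeable risks and issues may remain. We therefore accept no responsibility for any problem arising from the use of the Skywork open-source models, including but not limited to data security issues, public opinion risks, or any risks and issues caused by the model being misled, abused, disseminated or improperly used.

Citation

If you find our work helpful, please cite us using the following BibTeX entry: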

@misc{skyworkopeno12024,
  title={Skywork-o1 Open Series},
  author={He, Jujie and Wei, Tianwen and Yan, Rui and Liu, Jiacai and Wang, Chaojie and Gan, Yimeng and Tu, Shiwen and Liu, Chris Yuhao and Zeng, Liang and Wang, Xiaokun and Wang, Boyang and Li, Yongcong and Zhang, Fuxiang and Xu, Jiacheng and An, Bo and Liu, Yang and Zhou, Yahui},
  year={2024},
  month={November},
  howpublished={\url{https://huggingface.co/Skywork}},
  url={https://huggingface.co/Skywork},
}