math-shepherd-mistral-7b-prm: an open-source model for checking the correctness of math solution steps

Math Shepherd Mistral 7b Prm

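Developed by peiyi9979

A process reward model fine-tuned from Mistral-7B that evaluates the correctness of the steps in a solution to a math problem

Large language model

Transformers

#math-reasoning-evaluation #step-level-verification #process-reward-model

Downloads: 3,536

Released: 1/3/2024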

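Model overview

This model is part of the Math-Shepherd project. It scores each step of a step-by-step solution to a math problem: a special tag marks the end of every step, and the logits the model produces at each tag are converted into a correctness score for that step.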

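Model features

Step-level evaluation

The special tag 'ки' marks each solution step, and every step receives its own score

Clearly separated scores

In the usage example, correct steps and an incorrect step receive very different scores (for example, 0.9983 versus 0.0240)

Targeted fine-tuning

Fine-tuned from Mistral-7B specifically for step-level scoring of math solutions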

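Model capabilities

Judging the correctness of individual math steps

Evaluating multi-step solutions step by step

Checking numerical calculations

Checking logical reasoning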

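Use cases

Education technology

Automated homework grading

Evaluate the solution process in students' math homework, not just the final answer

Pinpoint the specific step that is wrong and give targeted feedback

Intelligent tutoring systems

Check the correctness of solution steps in real time on online learning platforms

Help students understand the root cause of a mistake and improve their approach

Academic research

Math reasoning research

Analyze typical error patterns of large language models in mathematical reasoning

Provide data to support improving models' mathematical ability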

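🚀 Math reward model (Mistral-7B)

This is the process reward model (mistral-7b) used in Math-Shepherd. Given a question and a step-by-step solution, the model outputs logits; post-processing them yields a score for each step.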

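🚀 Quick start

Input format

The input is a question followed by a step-by-step solution in which every step ends with the special step tag ки, for example: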

Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes .... ? Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18 ки
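
The layout above can be produced programmatically. The following is a minimal sketch, assuming each step is written as "Step i: ..." and followed by a space and the step tag, as in the example. The helper name `format_prm_input` is hypothetical and not part of the model's repository.

```python
STEP_TAG = "ки"  # step tag taken from the input-format example


def format_prm_input(question: str, steps: list[str], step_tag: str = STEP_TAG) -> str:
    """Join a question and its solution steps into one tagged input string.

    Hypothetical helper: each step is prefixed with "Step i: " and followed by
    the step tag, mirroring the layout of the example input.
    """
    tagged = [
        f"Step {i}: {step.strip()} {step_tag}"
        for i, step in enumerate(steps, start=1)
    ]
    # The question and the first step share a line; later steps start new lines.
    return f"{question.strip()} " + "\n".join(tagged)


example = format_prm_input(
    "What is 2 + 3?",
    ["2 + 3 = 5.", "The answer is: 5"],
)
print(example)
```

The number of step tags in the resulting string equals the number of scores the model is expected to return for it.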

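Output format

The model outputs logits. To obtain a score for each step, post-process them as in the usage example: keep the logits of the '+' and '-' tokens, apply a softmax over the two, and read the probability of '+' at each step-tag position.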
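
As a minimal sketch of that post-processing for a single step, assume the score is the softmax probability of the good token over the two candidate tokens, which is what the usage example in this document computes with tensors. The function name `step_score` and the logit values are made up for illustration.

```python
import math


def step_score(good_logit: float, bad_logit: float) -> float:
    """Probability of the 'good' token under a softmax over the two candidates."""
    # Subtract the larger logit before exponentiating, for numerical stability.
    m = max(good_logit, bad_logit)
    e_good = math.exp(good_logit - m)
    e_bad = math.exp(bad_logit - m)
    return e_good / (e_good + e_bad)


# Made-up logits, for illustration only.
print(step_score(2.0, 2.0))         # equal logits give 0.5
print(step_score(6.0, 0.0) > 0.99)  # a large margin pushes the score towards 1
```

Because only two tokens are compared, the score depends only on the difference between the two logits, so it always lies between 0 and 1.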

💻 Usage example

Basic usage

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
import torch

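good_token = '+'  # token whose probability is used as the step score
bad_token = '-'   # token for the opposite (incorrect) label
step_tag = 'ки'   # marks the end of each solution step

tokenizer = AutoTokenizer.from_pretrained('peiyi9979/math-shepherd-mistral-7b-prm')
# Token ids of '+' and '-'; [1:] skips the first id, the BOS token the tokenizer prepends.
candidate_tokens = tokenizer.encode(f"{good_token} {bad_token}")[1:] # [648, 387]
# Token id of the step tag (the last id of the encoded tag).
step_tag_id = tokenizer.encode(f"{step_tag}")[-1] # 12902
model = AutoModelForCausalLM.from_pretrained('peiyi9979/math-shepherd-mistral-7b-prm').eval()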

question = """Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"""
output1 = """Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18 ки""" # 18 is right
output2 = """Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $17 every day at the farmers' market. The answer is: 17 ки""" # 17 is wrong

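for output in [output1, output2]:
    # The PRM input is the question followed by the tagged step-by-step solution.
    input_for_prm = f"{question} {output}"
    input_id = torch.tensor([tokenizer.encode(input_for_prm)])

    with torch.no_grad():
        # Keep only the logits of the '+' and '-' tokens at every position.
        logits = model(input_id).logits[:, :, candidate_tokens]
        # Softmax over the two candidates; index 0 is the probability of '+'.
        scores = logits.softmax(dim=-1)[:, :, 0]
        # Select the scores at the step-tag positions: one score per step.
        step_scores = scores[input_id == step_tag_id]
        print(step_scores)

# tensor([0.9955, 0.9958, 0.9983, 0.9957])
# tensor([0.9955, 0.9958, 0.9983, 0.0240])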
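
The per-step scores printed above can be turned into a simple diagnosis. The sketch below is an illustration rather than part of the original example: the helper `first_suspect_step` and the 0.5 threshold are assumptions, and taking the minimum step score as a score for the whole solution is one possible aggregation, not something this page specifies.

```python
def first_suspect_step(step_scores, threshold=0.5):
    """Return the 1-based index of the first step scoring below threshold, or None.

    Hypothetical helper; the default threshold of 0.5 is an assumption.
    """
    for i, score in enumerate(step_scores, start=1):
        if score < threshold:
            return i
    return None


# Scores copied from the printed output of the usage example.
scores_correct = [0.9955, 0.9958, 0.9983, 0.9957]
scores_wrong = [0.9955, 0.9958, 0.9983, 0.0240]

print(first_suspect_step(scores_correct))  # None
print(first_suspect_step(scores_wrong))    # 4
# Assumed aggregation: a solution is only as good as its weakest step.
print(min(scores_wrong))                   # 0.024
```

With the scores from the example, the second solution is flagged at Step 4, which is the step containing the wrong product 9 * $2 = $17.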