---
library_name: transformers
license: other
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: ReasonFlux-F1-7B
  results: []
---
# ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

A revolutionary template-augmented reasoning paradigm that enables a 32B model to outperform o1-mini and the DeepSeek-R1 distilled models on reasoning tasks.
| Task / Pass@1 | ReasonFlux-F1-32B | ReasonFlux-Zero-32B | R1-Distill-32B | o1-mini | LIMO-32B | s1-32B |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| MATH500 | 96.0 | 91.2 | 94.3 | 90.0 | 90.6 | 93.0 |
| AIME 2024 | 76.7 | 56.7 | 72.6 | 56.7 | 50.0 | 56.7 |
| AIME 2025 | 53.3 | 37.2 | 46.67 | 50.8 | 37.2 | 49.3 |
| GPQA-Diamond | 67.2 | 61.2 | 62.1 | 60.0 | 65.2 | 59.6 |
## ReasonFlux-F1-7B

ReasonFlux-F1-7B is our SOTA-level reasoning LLM, fine-tuned on the template-augmented reasoning trajectories collected from ReasonFlux-Zero.
## Evaluation

We present the evaluation results of ReasonFlux-F1-32B on challenging reasoning tasks including AIME2024, AIME2025, MATH500, and GPQA-Diamond. To ensure a fair comparison, the results of all compared models were generated with ReasonFlux-F1's evaluation scripts.
| Model | AIME2024@pass1 | AIME2025@pass1 | MATH500@pass1 | GPQA@pass1 |
| :--- | :---: | :---: | :---: | :---: |
| QwQ-32B-Preview | 46.7 | 37.2 | 90.6 | 65.2 |
| LIMO-32B | 56.3 | 44.5 | 94.8 | 58.1 |
| s1-32B | 56.7 | 49.3 | 93.0 | 59.6 |
| OpenThinker-32B | 66.0 | 53.3 | 94.8 | 60.1 |
| R1-Distill-32B | 70.0 | 46.7 | 92.0 | 59.6 |
| ReasonFlux-Zero-32B | 56.7 | 37.2 | 91.2 | 61.2 |
| ReasonFlux-F1-32B | 76.7 | 53.3 | 96.0 | 67.2 |
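The tables above report pass@1 accuracies. The card does not include the scoring code, but the commonly used unbiased pass@k estimator, which reduces to the simple first-sample accuracy `c / n` at k=1, can be sketched as follows (the function name `pass_at_k` is our own, illustrative choice):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n samples per problem of which
    c are correct, return the probability that at least one of k drawn
    samples is correct, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 16 samples of which 8 are correct, pass@1 is simply c/n:
print(pass_at_k(16, 8, 1))  # -> 0.5
```

Averaging this quantity over all problems in a benchmark yields the reported score.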
## Quick Start with vLLM

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = 'Gen-Verse/ReasonFlux-F1-7B'

# Load the model with tensor parallelism across 8 GPUs
model = LLM(
    model_id,
    tensor_parallel_size=8,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

sampling_params = SamplingParams(
    max_tokens=32768,
)

question = """Let \(x, y\) and \(z\) be positive real numbers that satisfy the following system of equations:
\[
\begin{array}{c}
\sqrt{2x-xy} + \sqrt{2y-xy} = 1 \\
\sqrt{2y-yz} + \sqrt{2z-yz} = \sqrt{2} \\
\sqrt{2z-zx} + \sqrt{2x-zx} = \sqrt{3} .
\end{array}
\]
Then \(\left[(1-x)(1-y)(1-z)\right]^{2}\) can be written as \(\frac{m}{n}\), where \(m\) and \(n\) are relatively prime positive integers. Find \(m+n\)."""

ds_prompt = "\n" + question + "\n"
output = model.generate(ds_prompt, sampling_params=sampling_params)
print(output[0].outputs[0].text)
```
## Citation

```bibtex
@article{yang2025reasonflux,
  title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
  author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
  journal={arXiv preprint arXiv:2502.06772},
  year={2025}
}
```