---
license: apache-2.0
datasets:
- open-thoughts/OpenThoughts2-1M
- Vinnnf/Hybrid-OpenThoughts2-1M-1.5B
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
library_name: transformers
---
# Thinkless: LLM Learns When to Think

## Introduction

> [!NOTE]
> Can LLMs learn when to think?

We propose Thinkless, a learnable framework that enables an LLM to adaptively choose between short-form and long-chain reasoning based on task complexity and its own capability. The framework is trained with reinforcement learning and uses two control tokens: `<short>` triggers a concise answer, while `<think>` triggers detailed reasoning. At its core is the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the hybrid-reasoning objective into (1) a control-token loss that governs the choice of reasoning mode, and (2) a response loss that improves the accuracy of the generated answer. This decoupled formulation allows fine-grained control over each objective's contribution, stabilizes training, and avoids the collapse observed with vanilla GRPO. On benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless reduces long-chain reasoning usage by 50%-90%, substantially lowering the inference cost of reasoning language models.
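The decoupling idea can be sketched in a few lines. The toy function below (an illustration, not the authors' implementation; `degrpo_loss` and the `alpha` weighting are hypothetical names chosen here) normalizes the response-token loss by its length and weights the single control-token loss separately, so mode selection is not drowned out by the much longer response term:

```python
def degrpo_loss(ctrl_logprob, resp_logprobs, advantage, alpha=0.001):
    """Toy sketch of a decoupled GRPO-style objective.

    ctrl_logprob:  log-prob of the chosen control token (<think> or <short>)
    resp_logprobs: per-token log-probs of the response
    advantage:     group-normalized reward advantage for this rollout
    alpha:         assumed weighting between mode selection and accuracy
    """
    # Control-token term: governs the <short> vs. <think> decision.
    ctrl_loss = -advantage * ctrl_logprob
    # Response term: averaged over response tokens, so its scale does not
    # grow with response length.
    resp_loss = -advantage * sum(resp_logprobs) / len(resp_logprobs)
    # Decoupling: the two terms are normalized and weighted separately,
    # instead of being pooled into one per-token average.
    return alpha * ctrl_loss + resp_loss
```

In pooled GRPO, a single mode token shares one average with thousands of response tokens, so its gradient signal is tiny; separating the terms is what lets the mode policy learn at a stable rate.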
## Pipeline

## QuickStart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vinnnf/Thinkless-1.5B-Warmup"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = f"{instruction}\nThe arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Append a control token to select the reasoning mode:
# <think> triggers long-chain reasoning, <short> a concise answer.
think_mode = True
if think_mode:
    text = f"{text}<think>"
else:
    text = f"{text}<short>"

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
# Strip the prompt tokens, keeping only the newly generated ones.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

num_tokens = len(generated_ids[0])
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(text + response)
print(f"\nThink mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")
```
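Since the prompt asks the model to place its final answer inside `\boxed{}`, a small helper (a hypothetical utility sketched here, not part of the model card) can pull that answer out of the decoded response:

```python
import re

def extract_boxed(response: str):
    """Return the content of the last \\boxed{...} span, or None if absent.

    Handles one level of nested braces, which covers typical math answers
    such as \\boxed{\\frac{1}{2}}.
    """
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", response)
    return matches[-1] if matches else None
```

For the sample question above (mean of 7, 2, $x$, 10 equal to 9), a correct response would yield `extract_boxed(response) == "17"`.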
## Citation

If this work is helpful, please cite:

```bibtex
@article{fang2025thinkless,
  title={Thinkless: LLM Learns When to Think},
  author={Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
  journal={arXiv preprint arXiv:2505.13379},
  year={2025}
}
```