Thinkless-1.5B-RL-DeepScaleR open-source large language model - adaptive reasoning modes that reduce compute cost

Thinkless 1.5B RL DeepScaleR

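Developed by Vinnnf

Thinkless is a large language model trained with reinforcement learning that adaptively chooses between a short answer and long chain-of-thought reasoning, substantially reducing inference compute cost.

Large language model

Transformers

License: Apache-2.0 #adaptive-reasoning #compute-efficiency #math-reasoning

Downloads: 197

Released: 5/16/2025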

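Model Overview

The Thinkless framework trains a large language model with the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm so that it adaptively chooses, based on problem complexity, between a short answer (the <short> token) and long chain-of-thought reasoning (the <think> token).

Model Features

Adaptive reasoning mode

The model automatically chooses between short-answer and long-chain reasoning based on problem complexity, reducing the use of long-chain reasoning by 50%-90%.

Decoupled Group Relative Policy Optimization

DeGRPO decomposes the hybrid-reasoning objective into a control-token loss and a response loss, avoiding the mode collapse seen with vanilla GRPO.

Efficient inference

The model substantially lowers inference compute cost while maintaining accuracy, which suits applications that need fast responses.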
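
To make the cost claim concrete, here is a back-of-the-envelope sketch of how the share of long-chain responses drives the average generation length. The function name `expected_tokens` and every number below are hypothetical illustrations, not measurements from Thinkless.

```python
def expected_tokens(p_think: float, think_len: float, short_len: float) -> float:
    """Average tokens per response when a fraction p_think of queries use <think>."""
    if not 0.0 <= p_think <= 1.0:
        raise ValueError("p_think must be in [0, 1]")
    return p_think * think_len + (1.0 - p_think) * short_len


# Hypothetical average response lengths, for illustration only.
THINK_LEN, SHORT_LEN = 4000, 400

always_think = expected_tokens(1.0, THINK_LEN, SHORT_LEN)
adaptive = expected_tokens(0.3, THINK_LEN, SHORT_LEN)  # 70% fewer <think> responses
saving = 1.0 - adaptive / always_think
print(f"{always_think:.0f} -> {adaptive:.0f} tokens ({saving:.0%} fewer)")
```

The token saving is smaller than the reduction in long-chain usage because short answers still cost tokens; the gap narrows as short answers get shorter relative to long ones.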

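Model Capabilities

Mathematical reasoning

Question answering

Adaptive reasoning-mode selection

Use Cases

Education

Math problem solving

Solves math problems ranging from simple to complex, automatically choosing a short answer or detailed reasoning according to difficulty.

Performs well on benchmarks such as Minerva Algebra, MATH-500, and GSM8K

Intelligent assistants

Efficient Q&A systems

Build question-answering systems that adapt response length to question complexity.

Cuts unnecessary long-chain reasoning and improves response speed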

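🚀 Thinkless: LLM Learns When to Think

Thinkless is a learnable framework that lets a large language model adaptively choose between short-form and long-form reasoning based on task complexity and the model's own ability. It is trained under a reinforcement learning paradigm and can substantially reduce the compute cost of reasoning language models.

🚀 Quick Start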

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vinnnf/Thinkless-1.5B-RL-DeepScaleR"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"
# prompt = "What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?"
# prompt = "How many r's are in the word \"strawberry\""

messages = [
    {"role": "user", "content": f"{instruction}\n{prompt}"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# text = text + "<think>" # Uncomment this to force thinking mode

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
num_tokens = len(generated_ids[0])

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

think_mode = ("<think>" in response)

print(text+response)
print(f"\nThink Mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")
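
The commented-out line in the script above forces thinking mode by appending a control token to the templated prompt, and the final check infers the mode from the decoded text. The helpers below are a minimal sketch of both steps as plain string operations. The names `force_mode` and `detect_mode` are hypothetical, forcing `<short>` the same way as `<think>` is an assumption by analogy, and the sketch assumes the control tokens survive decoding as literal text.

```python
CONTROL_TOKENS = {"think": "<think>", "short": "<short>"}


def force_mode(chat_text: str, mode: str) -> str:
    """Append a control token to the templated prompt so generation starts in that mode."""
    if mode not in CONTROL_TOKENS:
        raise ValueError(f"mode must be one of {sorted(CONTROL_TOKENS)}")
    return chat_text + CONTROL_TOKENS[mode]


def detect_mode(response: str) -> str:
    """Return 'think', 'short', or 'unknown' based on the leading control token."""
    stripped = response.lstrip()
    for mode, token in CONTROL_TOKENS.items():
        if stripped.startswith(token):
            return mode
    return "unknown"
```

When a mode is forced, the control token is part of the prompt rather than the generated response, so `detect_mode` should be applied to the end of the prompt or skipped in that case.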

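✨ Key Features

We propose Thinkless, a framework that lets a large language model adaptively select short-form or long-form reasoning based on task complexity and its own ability. Thinkless is trained under a reinforcement learning paradigm and uses two control tokens: <short> for concise responses and <think> for detailed reasoning. At its core is the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the hybrid-reasoning learning objective into a control-token loss and a response loss. This gives fine-grained control over each objective's contribution, stabilizes training, and effectively prevents the collapse observed with vanilla GRPO. On several benchmarks, Thinkless reduces the use of long-chain thinking by 50%-90%, substantially lowering the compute cost of reasoning language models.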
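
The decoupling idea can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes each sampled response contributes one term for its control token followed by one term per response token, that the response terms are averaged per sequence, and that a hypothetical weight `alpha` scales the control-token term. It contrasts this with a single average over all tokens, where a long response dilutes the control token's contribution. See the paper for the exact objective.

```python
from typing import Sequence


def decoupled_objective(token_terms: Sequence[Sequence[float]], alpha: float = 1.0) -> float:
    """Combine per-token terms for a group of sampled responses.

    token_terms[i][0] is the term for the control token (<short> or <think>);
    token_terms[i][1:] are the terms for the response tokens that follow.
    """
    total = 0.0
    for terms in token_terms:
        control, response = terms[0], terms[1:]
        # Response tokens are averaged on their own, so response length
        # does not dilute the control-token term.
        response_mean = sum(response) / len(response) if response else 0.0
        total += alpha * control + response_mean
    return total / len(token_terms)


def coupled_objective(token_terms: Sequence[Sequence[float]]) -> float:
    """Baseline for comparison: one average over control and response tokens."""
    return sum(sum(terms) / len(terms) for terms in token_terms) / len(token_terms)


# Toy group: only the control token carries signal; responses differ in length.
group = [[1.0, 0.0, 0.0, 0.0], [1.0] + [0.0] * 99]
print("coupled:  ", coupled_objective(group))
print("decoupled:", decoupled_objective(group, alpha=1.0))
```

In the coupled average, the control token of the 100-token sequence counts for only 1/100 of that sequence's term, whereas the decoupled form gives both control tokens the same weight regardless of response length.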

💻 Usage Examples

Case Studies

Simple question comparison

User question:

1+1=?

Thinkless-1.5B answer:

<short>
To solve the problem \(1 + 1\), we will follow a step-by-step approach:

1. **Identify the numbers involved**: We have the numbers 1 and 1.
2. **Understand the operation**: The operation we need to perform is addition.
3. **Perform the addition**: When we add 1 and 1 together, we get 2.

So, the final answer is \(\boxed{2}\).

DeepSeek-R1 answer:

Okay, so I need to figure out what 1 plus 1 equals. Let me start by thinking about basic addition. When you add two numbers together, you're combining their values. So if I have one apple and someone gives me another apple, how many apples do I have in total? That should be two apples.

Wait, let me make sure I'm not missing something here. In the simplest terms, addition is taking two quantities and putting them together. So 1 plus 1 means combining a single unit with another single unit. If I use my fingers, holding up one finger on my left hand and one on my right, then when I put them together, I have two fingers up. That seems right.

But maybe there's a trick to this question. Sometimes problems that look easy might have a different answer depending on the context. For example, in binary, 1 + 1 equals 10. But the question doesn't specify a number system, so I should assume it's in the decimal system, which is the default. So in decimal, 1 + 1 is definitely 2.

Another way to check is by using the number line. Starting at 1, if I move one unit to the right, I land on 2. That confirms it. Or using basic arithmetic rules: the sum of 1 and 1 is the next number in the sequence after 1, which is 2.

I guess there's no ambiguity here. Unless the question is trying to be philosophical or something, but mathematically, it's straightforward. So the answer should be 2.

The sum of 1 and 1 is calculated as follows:

1 + 1 = 2

In decimal arithmetic, adding the numbers 1 and 1 results in the number 2.

Complex question example

User question:

What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?

Thinkless-1.5B answer:
Note: The long reasoning process is omitted.

<think>
Okay, let's try to figure out the smallest positive perfect cube that can be written as the sum of three consecutive integers. Hmm, let's start by understanding the problem step by step.

First, a perfect cube is a number raised to the power of 3. So, we need to find the smallest positive integer \( n \) such that \( n = a + (a+1) + (a+2) \), where \( a \) is an integer. Then, \( n \) should also be a perfect cube. Let me write that down:

\( n = a + (a+1) + (a+2) \)

...

Next, 27:

Is 27 a perfect cube? Yes, \( 3^3 = 27 \). Let's check:

...
**Final Answer**
The smallest positive perfect cube that can be written as the sum of three consecutive integers is \boxed{27}.
</think>

To find the smallest positive perfect cube that can be written as the sum of three consecutive integers, we start by expressing the sum of three consecutive integers. Let the three consecutive integers be \( a \), \( a+1 \), and \( a+2 \). Their sum is:

\[
a + (a+1) + (a+2) = 3a + 3
\]

This sum must be a perfect cube. Let \( n \) be the perfect cube, so we have:

\[
n = 3(a + 1)
\]

Thus, \( n \) must be a multiple of 3. The smallest perfect cubes are 1, 8, 27, 64, 125, etc. We check each of these to see if they can be written as \( 3(a + 1) \):

- For \( n = 1 \): \( 3(a + 1) = 1 \) gives \( a = -\frac{2}{3} \), not an integer.
- For \( n = 8 \): \( 3(a + 1) = 8 \) gives \( a = \frac{5}{3} \), not an integer.
- For \( n = 27 \): \( 3(a + 1) = 27 \) gives \( a = 8 \), which is an integer. The three consecutive integers are 8, 9, and 10, and their sum is \( 8 + 9 + 10 = 27 \).

Checking the next perfect cubes (64, 125, etc.) confirms they do not yield integer values for \( a \). Therefore, the smallest positive perfect cube that can be written as the sum of three consecutive integers is:

\[
\boxed{27}
\]
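
The boxed answer can be checked independently with a short brute-force search. This is an added sanity check rather than part of the model's output, and the function name is hypothetical.

```python
from itertools import count


def smallest_cube_sum_of_three_consecutive():
    """Return (cube, a) for the smallest positive cube equal to a + (a+1) + (a+2)."""
    for k in count(1):
        cube = k ** 3
        # a + (a+1) + (a+2) = 3a + 3, so the cube must be divisible by 3.
        if cube % 3 == 0:
            return cube, cube // 3 - 1


cube, a = smallest_cube_sum_of_three_consecutive()
print(cube, (a, a + 1, a + 2))  # 27 (8, 9, 10)
```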

📚 Documentation

Model Information

Property	Details
Model type	Text generation
Training data	agentica-org/DeepScaleR-Preview-Dataset
Base model	Vinnnf/Thinkless-1.5B-Warmup
Library	transformers

Content	Link
Paper	ArXiv
GitHub repository	VainF/Thinkless
RL model	Thinkless-1.5B-RL-DeepScaleR
Warmup model	Thinkless-1.5B-Warmup
Warmup data	Hybrid-OpenThoughts2-1M-1.5B
RL data	agentica-org/DeepScaleR-Preview-Dataset

Model Pipeline

📄 License

This project is licensed under Apache-2.0.

📚 Citation

If you find this work helpful, please cite:

@article{fang2025thinkless,
  title={Thinkless: LLM Learns When to Think},
  author={Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
  journal={arXiv preprint arXiv:2505.13379},
  year={2025}
}