Skywork o1 Open-Llama-3.1-8B: an open-source chat model with enhanced reasoning for question answering

Skywork O1 Open Llama 3.1 8B

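Developed by Skywork

Skywork o1 Open-Llama-3.1-8B is a chat model trained on Llama-3.1-8B. Training on "slow thinking" reasoning-style data significantly strengthens its reasoning ability.

Large language model

Safetensors

License: other #slow-thinking reasoning #math and coding enhancement #progressive reward training

Downloads: 583

Released: 11/26/2024

Model overview

A model with o1-like slow-thinking reasoning ability that specializes in complex problems in mathematics, coding, and logical reasoning.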

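Model features

Slow-thinking reasoning

Deep thinking, planning, and reflection, enabled by proprietary training methods

Three-stage training scheme

Comprises reflective reasoning training, reinforcement learning for reasoning, and reasoning planning

Q* online reasoning algorithm

The first implementation and public release of this reasoning algorithm, used to search for better reasoning paths

Model capabilities

Complex problem solving

Mathematical reasoning

Logical reasoning

Coding problem solving

Ethical decision analysis

Self-verification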

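Use cases

Education

Math problem solving

Solves a wide range of math word problems and equations

Performs well on math benchmarks such as GSM8K

Coding

Algorithm problem solving

Analyzes and solves algorithmic programming problems

Performs well on coding benchmarks such as HumanEval

Logical analysis

Logical reasoning problems

Answers questions about logical relations and analogical reasoning

Demonstrates strong logical analysis ability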

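🚀 Skywork o1 Open model series

The Skywork o1 Open model series is developed by the Skywork team at Kunlun Inc. The models incorporate o1-like slow thinking and reasoning capabilities, which lets them tackle complex reasoning problems in mathematics, logic, and other domains.

🚀 Quick start

To run inference with Skywork-o1-Open-Llama3.1-8B, refer to the following code: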

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

system_prompt = """You are Skywork-o1, a thinking model developed by Skywork AI, specializing in solving complex problems involving mathematics, coding, and logical reasoning through deep thought. When faced with a user's request, you first engage in a lengthy and in-depth thinking process to explore possible solutions to the problem. After completing your thoughts, you then provide a detailed explanation of the solution process in your response."""

# An Example Case
problem = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"

user_message = problem

conversation = [
    {
        "role": "system",
        "content": system_prompt
    },
    {
        "role": "user", 
        "content": user_message
    }
]

model_name = "Skywork-o1-Open-Llama3.1-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer.apply_chat_template(
    conversation, 
    tokenize=True, 
    add_generation_prompt=True,
    return_tensors="pt").to(model.device)

generation = model.generate(
    input_ids=input_ids,
    max_new_tokens=2048,
    do_sample=False,  # greedy decoding; temperature is not used in this mode
    pad_token_id=128009)  # <|eot_id|> in the Llama 3.1 tokenizer

completion = tokenizer.decode(
    generation[0][len(input_ids[0]):], 
    skip_special_tokens=True, 
    clean_up_tokenization_spaces=True)

print(completion)
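
The conversation list above can be wrapped in a small helper so that different problems reuse the same system prompt. This is a minimal sketch, not part of the released code: `build_conversation` and `EXPECTED_APPLES` are hypothetical names introduced here for illustration, and the expected answer is simply the arithmetic of the sample problem worked by hand.

```python
def build_conversation(problem, system_prompt):
    """Return the two-message chat list that apply_chat_template expects."""
    if not problem.strip():
        raise ValueError("problem must be a non-empty string")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem},
    ]


# Reference answer for the sample problem: 12 apples, give away 4,
# buy 1 more, then split equally among 3 people (Jane and 2 siblings).
EXPECTED_APPLES = (12 - 4 + 1) // 3

conversation = build_conversation(
    "Jane has 12 apples. ...",  # shortened placeholder for the full problem text
    "You are Skywork-o1, ...",  # shortened placeholder for the full system prompt
)
```

The returned list can be passed to `tokenizer.apply_chat_template` exactly as in the quick-start code, and `EXPECTED_APPLES` gives a value to compare against the number in the model's final answer.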

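✨ Key features

Enhanced thinking and planning: the model thinks through and plans its approach to complex problems in depth to find a good solution.
Self-reflection and self-verification: the model reflects on and verifies its own reasoning process and results to improve the accuracy of its answers.
Strong reasoning ability: compared with earlier large models, the Skywork o1 Open series handles a wide range of reasoning challenges, including common-sense, logical, mathematical, and ethical-decision problems as well as logic traps.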

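📚 Detailed documentation

Model introduction

The Skywork o1 Open model series includes three models:

Skywork o1 Open-Llama-3.1-8B: a robust chat model trained on Llama-3.1-8B, whose reasoning skills are significantly improved with "o1-style" data.
Skywork o1 Open-PRM-Qwen-2.5-1.5B: a specialized model designed to enhance reasoning capability through incremental process rewards, suited to complex problem solving at a smaller scale.
Skywork o1 Open-PRM-Qwen-2.5-7B: scales up the 1.5B model to handle more demanding reasoning tasks.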

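Training methods

The cognitive abilities of the Skywork o1 Open series come from a three-stage training scheme:

Reflective reasoning training: a proprietary multi-agent system generates high-quality, diverse data for long-thinking tasks, followed by continued pre-training and supervised fine-tuning.
Reinforcement learning for reasoning: introduces the Skywork o1 Process Reward Model (PRM), tailored to enhance step-by-step reasoning. Experiments confirm that Skywork-PRM, combined with proprietary reasoning reinforcement algorithms, effectively captures the influence of intermediate reasoning steps on the final outcome.
Reasoning planning: deploys Tiangong's proprietary Q* online reasoning algorithm alongside model-based thinking to search for optimal reasoning paths. This is the first implementation and public release of a Q* algorithm, and it significantly improves the model's online reasoning capability.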
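
The reasoning-planning stage is described only as a search for good reasoning paths, and the details of the proprietary Q* algorithm are not given here. As a rough illustration of the general idea, the following sketch runs a generic best-first search over partial "reasoning paths" on a toy arithmetic task. It is not Skywork's Q* implementation: the task, the `expand` function, and the `score` heuristic are all hypothetical stand-ins (in a real system the scores would come from a learned model).

```python
import heapq
from itertools import count


def best_first_search(start, expand, score, is_final, max_expansions=1000):
    """Generic best-first search: always expand the highest-scoring partial path."""
    tie = count()  # tie-breaker so heapq never has to compare paths directly
    frontier = [(-score(start), next(tie), start)]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, path = heapq.heappop(frontier)
        if is_final(path):
            return path
        for nxt in expand(path):
            heapq.heappush(frontier, (-score(nxt), next(tie), nxt))
    return None


# Toy task (assumption for illustration): reach 14 from 1 using "+3" or "*2" steps.
TARGET = 14


def value(path):
    v = 1
    for op in path:
        v = v + 3 if op == "+3" else v * 2
    return v


def expand(path):
    # Candidate next steps; paths that overshoot the target are pruned.
    return [path + (op,) for op in ("+3", "*2") if value(path + (op,)) <= TARGET]


def score(path):
    # Hypothetical heuristic: closer to the target is better, shorter is better.
    return -abs(TARGET - value(path)) - 0.1 * len(path)


result = best_first_search((), expand, score, lambda p: value(p) == TARGET)
print(result, value(result))
```

The search returns a path that reaches the target, though a greedy heuristic like this one does not guarantee the shortest path. The quality of the scoring function largely determines which paths are explored first.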

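Model highlights

Skywork o1 Open 8B: the model shows significant improvements on a range of mathematical and coding benchmarks, pushing Llama-3.1-8B to the front of its size class and outperforming the prior SOTA model of similar size, Qwen-2.5-7B instruct.
Skywork o1 Open PRM: two process reward models (PRMs) for reasoning tasks:
- Skywork o1 Open-PRM-Qwen2.5-1.5B: reaches the performance of 8B models and competes with models such as RLHFlow's Llama3.1-8B-PRM-Deepseek-Data and OpenR's Math-psa-7B.
- Skywork o1 Open-PRM-Qwen2.5-7B: matches or surpasses larger models such as Qwen2.5-Math-RM-72B on most benchmarks.

The inference code is available here.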
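
A common way to use a process reward model is to score every step of several candidate solutions and keep the candidate with the best aggregated score (best-of-N selection). The sketch below illustrates that pattern only. `toy_step_scorer` is a hypothetical stand-in that checks arithmetic and is not the Skywork PRM, and the step format `(a, op, b, claimed_result)` is an assumption made for this example.

```python
import operator
from math import prod

OPS = {"+": operator.add, "-": operator.sub, "/": operator.truediv}


def toy_step_scorer(prefix):
    """Stand-in for a PRM: score the last step of a solution prefix in [0, 1]."""
    a, op, b, claimed = prefix[-1]
    return 1.0 if OPS[op](a, b) == claimed else 0.1


def select_best(candidates, step_scorer, aggregate=min):
    """Return the candidate whose aggregated per-step scores are highest."""
    def solution_score(steps):
        return aggregate(step_scorer(steps[: i + 1]) for i in range(len(steps)))
    return max(candidates, key=solution_score)


candidates = [
    [(12, "-", 4, 8), (8, "+", 1, 9), (9, "/", 3, 3)],    # every step is correct
    [(12, "-", 4, 8), (8, "+", 1, 10), (10, "/", 2, 5)],  # second step is wrong
]

best = select_best(candidates, toy_step_scorer)                  # weakest-step rule
best_by_product = select_best(candidates, toy_step_scorer, prod)  # product rule
```

Taking the minimum penalizes a solution for its weakest step, while the product penalizes every doubtful step. Which aggregation works better depends on the reward model and the task.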

💻 Usage examples

Basic usage

The following are sample inference results from Skywork-o1-Open-Llama3.1-8B on different types of problems:

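Chinese math problem (input and output translated from Chinese)

Input: A project takes the Shandong team 15 days to complete alone and the Jiangsu team 10 days to complete alone. If the two teams work together, how many days will it take?

Output: OK, let's solve this interesting work problem. I'll keep a relaxed tone and analyze it step by step.

First, let's understand the information given:
1. The Shandong team needs 15 days to complete the project alone
2. The Jiangsu team needs 10 days to complete the project alone

Now we need to find the completion time when the two teams work together. Let's start with each team's work rate:

1. The Shandong team's rate:
   1 project in 15 days, so the amount of work done per day is 1/15

2. The Jiangsu team's rate:
   1 project in 10 days, so the amount of work done per day is 1/10

Wait, let's double-check this calculation... Yes, it is correct.

Now, when the two teams work together, the work done per day is the sum of the two rates:
1/15 + 1/10 = 2/30 + 3/30 = 5/30 = 1/6

This means the two teams together complete 1/6 of the project per day.

Next, we need to work out how many days it takes to complete 1 project. We can use the following equation:
(1/6) × x = 1

Solving this equation:
x = 1 ÷ (1/6) = 6

So the two teams need 6 days to complete the project together.

Let's verify:
- 6 days × (1/6) = 1
- This does equal 1, so our calculation is correct.

Therefore, the final answer is: the two teams need 6 days to complete the project together.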
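
The work-rate arithmetic in the sample output above can be checked with exact fractions. This is a small sketch written for this document. `days_together` is a hypothetical helper name, and it assumes each team works at a constant rate.

```python
from fractions import Fraction


def days_together(*solo_days):
    """Days needed when teams with the given solo completion times work together."""
    combined_rate = sum(Fraction(1, d) for d in solo_days)  # projects per day
    return 1 / combined_rate


days = days_together(15, 10)
print(days)
```

Using `Fraction` avoids floating-point rounding, so the combined rate stays exactly 1/15 + 1/10 = 1/6.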

English math problem

Input: Determine the value of $y$ in its simplest form, where $y = \frac{p + q\sqrt{r}}{s}$, given that $\frac{7y}{5} - 2 = \frac{4}{y}$, and $p, q, r, s$ are integers. What is $\frac{pqr}{s}$?

Output: To determine the value of \( y \) in the form \( y = \frac{p + q\sqrt{r}}{s} \), we start with the given equation:

\[
\frac{7y}{5} - 2 = \frac{4}{y}
\]

First, we eliminate the denominators by multiplying both sides of the equation by \( 5y \):

\[
7y^2 - 10y = 20
\]

Next, we rearrange the equation to set it to zero:

\[
7y^2 - 10y - 20 = 0
\]

We solve this quadratic equation using the quadratic formula \( y = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \), where \( a = 7 \), \( b = -10 \), and \( c = -20 \). Plugging in these values, we get:

\[
y = \frac{-(-10) \pm \sqrt{(-10)^2 - 4 \cdot 7 \cdot (-20)}}{2 \cdot 7} = \frac{10 \pm \sqrt{100 + 560}}{14} = \frac{10 \pm \sqrt{660}}{14}
\]

We simplify \( \sqrt{660} \) as follows:

\[
\sqrt{660} = \sqrt{4 \cdot 165} = 2\sqrt{165}
\]

Substituting this back into the quadratic formula, we have:

\[
y = \frac{10 \pm 2\sqrt{165}}{14} = \frac{5 \pm \sqrt{165}}{7}
\]

Since we are asked for the value of \( y \) in the form \( \frac{p + q\sqrt{r}}{s} \), we can choose either the positive or negative root. Let's take the positive root for simplicity:

\[
y = \frac{5 + \sqrt{165}}{7}
\]

Here, \( p = 5 \), \( q = 1 \), \( r = 165 \), and \( s = 7 \). We need to find the value of \( \frac{pqr}{s} \):

\[
\frac{pqr}{s} = \frac{5 \cdot 1 \cdot 165}{7} = \frac{825}{7}
\]

Thus, the final answer is:

\[
\boxed{\frac{825}{7}}
\]
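
The roots in the sample output above can be checked numerically by substituting them back into the original equation. This is a minimal verification sketch written for this document. Both roots of the quadratic satisfy the equation, and the sample output takes the positive one (q = 1) when computing pqr/s.

```python
import math
from fractions import Fraction


def residual(y):
    """Left side minus right side of 7y/5 - 2 = 4/y; zero at a solution."""
    return 7 * y / 5 - 2 - 4 / y


# y = (5 ± sqrt(165)) / 7, from the quadratic 7y^2 - 10y - 20 = 0
roots = [(5 + sign * math.sqrt(165)) / 7 for sign in (1, -1)]
residuals = [residual(y) for y in roots]

# pqr/s for the positive root, with p = 5, q = 1, r = 165, s = 7
answer = Fraction(5 * 1 * 165, 7)
print(residuals, answer)
```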

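Chinese logic problem (input and output translated from Chinese)

Input: Gunpowder : firecracker : double-bang firecracker (ertijiao) corresponds to which of the following options: A. Black iron : knife : three-edged knife\nB. Red brick : building : siheyuan (courtyard house)\nC. Clear water : baijiu : Wuliangye\nD. Fir wood : musical instrument : six-string guitar

Output: OK, let's solve this logical-relation question together. We need to find the relation most similar to "gunpowder : firecracker : double-bang firecracker".

First, let's analyze the given relation:
Gunpowder → firecracker → double-bang firecracker

1. Gunpowder is the raw material for making firecrackers.
2. Firecrackers are made from gunpowder.
3. A double-bang firecracker is a kind of firecracker.

Now, let's check the options one by one:

A. Black iron : knife : three-edged knife
   Black iron is the raw material for making knives, knives are made from black iron, and a three-edged knife is a kind of knife.
   This relation looks very similar!

B. Red brick : building : siheyuan
   Red brick is one building material, but not all buildings use red brick.
   A siheyuan is a kind of building, but not all buildings are called siheyuan.
   This relation does not match well.

C. Clear water : baijiu : Wuliangye
   Clear water is a raw material for making baijiu, and baijiu is made from clear water.
   But Wuliangye is a kind of baijiu, not a raw material of baijiu.
   This relation does not match well.

D. Fir wood : musical instrument : six-string guitar
   Fir wood is a raw material for making musical instruments, and musical instruments are made from fir wood.
   But a six-string guitar is a kind of musical instrument, not a raw material of musical instruments.
   This relation does not match well either.

Wait, let me think again more carefully...

After re-examining these options, I am even more confident that option A is the best answer. Let me explain why:

1. The relation of black iron as raw material, knife as product, and three-edged knife as specific product is the most similar to the relation among gunpowder, firecracker, and double-bang firecracker.
2. Each option follows the pattern "raw material → product → specific product".
3. The other options either lack the raw-material-to-product relation (such as B and C) or reverse the product-to-specific-product relation (such as D).

Therefore, the final answer is: A. Black iron : knife : three-edged knife.

This option most accurately reflects the logical structure of the original relation.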

English logic problem

Input: How many `r`s  in strawberry?

Output: To determine how many `r`s are in the word "strawberry," we will follow a step-by-step approach:

1. **Identify the word**: The word we need to analyze is "strawberry."
2. **List each letter**: Write down each letter in the word to ensure we don't miss any.
   - s
   - t
   - r
   - a
   - w
   - b
   - e
   - r
   - r
   - y
3. **Count the occurrences of `r`**: Now, we count how many times the letter `r` appears in the list.
   - The first `r` is in the third position.
   - The second `r` is in the seventh position.
   - The third `r` is in the ninth position.
4. **Verify the count**: Double-check that we have counted all the `r`s correctly and haven't missed any or counted any other letters as `r`.

After following these steps, we find that there are three `r`s in the word "strawberry."

Therefore, the number of `r`s in strawberry is \(\boxed{3}\).
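
The letter count in the sample output above can be confirmed directly in code. This short check was written for this document. It also reports the 1-based positions, which come out as the third, eighth, and ninth letters, so the "seventh position" stated in the sample output is off by one even though the final count of 3 is correct.

```python
word = "strawberry"

# Total occurrences of the letter "r"
r_count = word.count("r")

# 1-based positions of each "r" in the word
r_positions = [i for i, ch in enumerate(word, start=1) if ch == "r"]

print(r_count, r_positions)
```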

Coding problem

Input: Given an array of integers `nums` with a length of `n` and an integer `k`, please determine if there exist two adjacent subarrays of length `k` that are strictly increasing. Specifically, you need to check if there are two subarrays starting from indices `a` and `b` (where `a < b`) that meet the following conditions:
- Both subarrays `nums[a..a + k - 1]` and `nums[b..b + k - 1]` are strictly increasing.
- These two subarrays must be adjacent, meaning `b = a + k`.
If such two subarrays can be found, return `true`; otherwise, return `false`.
A subarray is a contiguous sequence of elements within an array.

Example 1:
Input: `nums = [2,5,7,8,9,2,3,4,3,1]`, `k = 3`
Output: `true`
Explanation:
The subarray starting from index 2 is `[7,8,9]`, which is strictly increasing. The subarray starting from index 5 is `[2,3,4]`, which is also strictly increasing. The two subarrays are adjacent, so the result is `true`.

Example 2:
Input: `nums = [1,2,3,4,4,4,4,5,6,7]`, `k = 5`
Output: `false`

Hints:
- `2 <= nums.length <= 1000`
- `1 <= 2 * k <= nums.length`
- `-1000 <= nums[i] <= 1000`

Please output the answer in the following format:

Output: ### Planning Phase

#### Key Steps:
1. **Iterate through the array**: Start from the first element and check every possible starting point for a subarray of length `k`.
2. **Check for strictly increasing subarray**: For each starting point, check if the subarray of length `k` is strictly increasing.
3. **Check the next subarray**: If the current subarray is strictly increasing, check if the next subarray of length `k` starting from the next element is also strictly increasing.
4. **Return the result**: If both subarrays are strictly increasing and adjacent, return `true`. If no such pair is found after checking all possible starting points, return `false`.

#### Important Considerations:
- **Edge Cases**: 
  - Arrays with fewer than `2*k` elements.
  - Arrays where no subarray of length `k` is strictly increasing.
  - Arrays where multiple subarrays of length `k` are strictly increasing, but none are adjacent.
- **Pitfalls**:
  - Ensure that the subarrays are strictly increasing, not just non-decreasing.
  - Handle cases where the array length is exactly `2*k`.

### Code Solution


class Solution(object):
    def hasincreasingSubarrays(self, nums, k):
        # Iterate through the array, stopping at the point where the next subarray would go out of bounds
        for i in range(len(nums) - k + 1):
            # Check if the current subarray is strictly increasing
            is_increasing = True
            for j in range(1, k):
                if nums[i + j] <= nums[i + j - 1]:
                    is_increasing = False
                    break
            
            # If the current subarray is strictly increasing, check the next subarray
            if is_increasing:
                # Check if the next subarray is strictly increasing
                for j in range(1, k):
                    if nums[i + k + j] <= nums[i + k + j - 1]:
                        break
                else:
                    # If the next subarray is also strictly increasing, return True
                    return True
        
        # If no such pair of subarrays is found, return False
        return False

### Solution Explanation

#### Detailed Explanation:
1. **Iterate through the array**:
   - We use a loop to iterate through the array from the first element to the point where the next subarray of length `k` would go out of bounds (`len(nums) - k + 1`).

2. **Check for strictly increasing subarray**:
   - For each starting index `i`, we check if the subarray `nums[i:i+k]` is strictly increasing.
   - We use a nested loop to compare each element with the previous one. If any element is not greater than the previous one, we set `is_increasing` to `False` and break out of the loop.

3. **Check the next subarray**:
   - If the current subarray is strictly increasing, we then check the next subarray starting from `i + k`.
   - We use another nested loop to check if `nums[i+k:i+2*k]` is strictly increasing.
   - If the next subarray is also strictly increasing, we return `True`.

4. **Return the result**:
   - If we complete the loop without finding any such pair of subarrays, we return `False`.

#### Clarity and Breakdown:
- **Outer Loop**: The outer loop iterates through each possible starting point for a subarray of length `k`.
- **Inner Loop for Current Subarray**: The inner loop checks if the current subarray is strictly increasing.
- **Inner Loop for Next Subarray**: If the current subarray is strictly increasing, the inner loop checks if the next subarray is also strictly increasing.
- **Return True/False**: If both subarrays are strictly increasing and adjacent, we return `True`. If no such pair is found, we return `False`.

This approach ensures that we efficiently check all possible pairs of adjacent strictly increasing subarrays of length `k` in the given array.
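
A note on the sample output above: its outer loop runs `i` up to `len(nums) - k`, so the second check can read `nums[i + k + j]` past the end of the array, for example with `nums = [5, 4, 1, 2]` and `k = 2`. The following is a minimal corrected sketch written for this document rather than model output. The function name `has_increasing_subarrays` is chosen here for illustration, and the only substantive change is limiting the start index so that both windows fit.

```python
def has_increasing_subarrays(nums, k):
    """Return True if two adjacent length-k windows are both strictly increasing."""

    def is_strictly_increasing(start):
        # Compare each element of nums[start:start + k] with its successor.
        return all(nums[i] < nums[i + 1] for i in range(start, start + k - 1))

    # The first window starts at a and the second at a + k,
    # so a valid start must satisfy a + 2 * k <= len(nums).
    for a in range(len(nums) - 2 * k + 1):
        if is_strictly_increasing(a) and is_strictly_increasing(a + k):
            return True
    return False


print(has_increasing_subarrays([2, 5, 7, 8, 9, 2, 3, 4, 3, 1], 3))
print(has_increasing_subarrays([1, 2, 3, 4, 4, 4, 4, 5, 6, 7], 5))
```

This version still checks each window from scratch, which takes O(n·k) time. That is adequate for the stated limit of `nums.length <= 1000`.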

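📄 License

Community use of the Skywork models is governed by the Skywork Community License. The Skywork models support commercial use. If you plan to use the Skywork models or their derivatives for commercial purposes, you must abide by the terms and conditions of the Skywork Community License.

📞 Contact

If you have any questions, please feel free to contact us at {jujie.he, liang.zhao, liang.zeng, tianwen.wei}@kunlun-inc.com.

⚠️ Disclaimer

We hereby declare that the Skywork models must not be used for any activity that threatens national or societal security or for any unlawful conduct. We also ask users not to deploy the Skywork models in internet services without appropriate security review and record-filing. We hope all users will follow this principle so that technological progress takes place in a regulated and lawful environment.

We have done our utmost to ensure the compliance of the data used in training the model. However, despite these efforts, unpredictable risks and issues may remain because of the complexity of the model and the data. Therefore, we assume no responsibility for any problems arising from the use of the Skywork open-source models, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, abused, disseminated, or improperly used.

📖 Citation

If you find our work helpful, please cite us using the following BibTeX entry: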

@misc{skyworkopeno12024,
  title={Skywork-o1 Open Series},
  author={He, Jujie and Wei, Tianwen and Yan, Rui and Liu, Jiacai and Wang, Chaojie and Gan, Yimeng and Tu, Shiwen and Liu, Chris Yuhao and Zeng, Liang and Wang, Xiaokun and Wang, Boyang and Li, Yongcong and Zhang, Fuxiang and Xu, Jiacheng and An, Bo and Liu, Yang and Zhou, Yahui},
  year={2024},
  month={November},
  howpublished={\url{https://huggingface.co/Skywork}},
  url={https://huggingface.co/Skywork},
}