360Zhinao3-7B-O1.5 Open-Source Model: Free to Deploy, Long Chain-of-Thought for Complex Reasoning Tasks


360zhinao3 7B O1.5

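Developed by qihoo360

360Zhinao3-7B-O1.5 is a long chain-of-thought model open-sourced by Qihoo 360. It is fine-tuned from 360Zhinao3-7B-Instruct and supports complex reasoning tasks.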

Large language model

Transformers

Supports multiple languages | License: Apache-2.0 | #multilingual-LLM #long-text-reasoning #commercial-use

Downloads: 35

Release date: 4/23/2025

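Model Overview

The 360Zhinao3 series are 7B-parameter large language models open-sourced by Qihoo 360, available in base, instruct, and long chain-of-thought variants. The O1.5 version is optimized for complex reasoning tasks and supports long chain-of-thought reasoning.

Model Features

Long chain-of-thought reasoning

Optimized specifically for complex reasoning tasks, with support for long chain-of-thought reasoning

Multilingual support

Handles Chinese and English

Open source, commercial use

Released under the Apache 2.0 license; free for commercial use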

Model Capabilities

Text generation

Complex reasoning

Question answering

Mathematical computation

Code generation

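Use Cases

Education

Math problem solving

Solves complex math word problems

Scores 54.2 on AIME24

Research

Scientific reasoning

Handles scientific problems that require multi-step reasoning

Scores 40.0 on GPQA Diamond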

🚀 360Zhinao3 (360智脑)

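360Zhinao3 is an upgraded model open-sourced by Qihoo 360. It offers a range of capabilities, is free for commercial use, and performs strongly on multiple benchmarks, providing solid support for natural language processing and related fields.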

🤗 HuggingFace | 💬 WeChat

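Visit the official 360Zhinao website at https://ai.360.com to try more.

🚀 Quick Start

Quick start with 🤗 Transformers

Below are simple examples of running 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and 360Zhinao3-7B-O1.5 with 🤗 Transformers.

Base model inference example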

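from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B"

# trust_remote_code=True lets Transformers load the model's own tokenizer/modeling code.
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

# Load the model and move it to the default CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

# Start from the generation settings shipped with the model, then cap the output length.
generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 1024

# The base model does plain text continuation. The prompt lists the first five of
# China's 24 solar terms (Beginning of Spring, Rain Water, Awakening of Insects,
# Spring Equinox, Qingming) and lets the model continue the list.
inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
# The decoded text includes the prompt followed by the continuation.
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))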

Instruct model inference example

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048

messages = []

# Round 1: "Briefly introduce Andy Lau"
print("user: 简单介绍一下刘德华")
messages.append({"role": "user", "content": "简单介绍一下刘德华"})
# Render the full history with the chat template and append the assistant prompt.
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
# Keep the reply in the history so the next round has context.
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")


# Round 2: "What are his best-known works?" ("he" refers back to round 1)
print("user: 他有什么代表作?")
messages.append({"role": "user", "content": "他有什么代表作?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")

Long chain-of-thought model inference example

import re
import json
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-O1.5"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
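generation_config.max_new_tokens = 2048
# Long reasoning traces can exceed this limit; if generation stops before </think>,
# the parser below returns the whole response as the answer.


def extract_thinking_and_answer(input_string):
    """Split a response into (thinking, answer) around the <think>...</think> tags."""
    thinking, answer = "", ""
    # Extract the answer: everything after the last </think>
    pattern_answer = r'.*</think>(.*)$'
    match_answer = re.search(pattern_answer, input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        # No closing tag: treat the whole response as the answer
        return thinking, input_string

    # Extract the thinking process between <think> and </think>
    pattern_thinking = r'<think>(.*?)</think>'
    match_thinking = re.search(pattern_thinking, input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)

    return thinking, answer


messages = []
# Prompt (classic "chickens and rabbits" puzzle): a cage holds some chickens and rabbits,
# with 14 heads and 38 legs in total. How many chickens and how many rabbits are there?
messages.append({"role": "user", "content": "现有一笼子，里面有鸡和兔子若干只，数一数，共有头14个，腿38条，求鸡和兔子各有多少只？"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
thinking, answer = extract_thinking_and_answer(response)
# Store the final answer and the reasoning separately in the assistant turn.
messages.append({"role": "assistant", "content": answer, "reasoning_content": thinking})
print(json.dumps(messages, ensure_ascii=False, indent=4))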
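To see what the parser returns without loading the model, the sketch below runs a condensed copy of extract_thinking_and_answer on a hand-written response (hypothetical, not real model output) and on a response cut off before </think>. It also brute-forces the chickens-and-rabbits prompt (heads c + r = 14, legs 2c + 4r = 38) so the model's final answer can be checked; solve_chickens_rabbits is an illustrative helper, not part of the 360Zhinao code.

```python
import re


def extract_thinking_and_answer(input_string):
    """Condensed copy of the parser above: split a response around </think>."""
    # Answer: everything after the last </think> (greedy .* with re.S spans newlines).
    match_answer = re.search(r'.*</think>(.*)$', input_string, re.S)
    if not match_answer:
        # No closing tag (e.g. generation was truncated): the whole text is the answer.
        return "", input_string
    # Thinking: the text inside the first <think>...</think> pair, if present.
    match_thinking = re.search(r'<think>(.*?)</think>', input_string, re.S)
    thinking = match_thinking.group(1) if match_thinking else ""
    return thinking, match_answer.group(1)


def solve_chickens_rabbits(heads, legs):
    """Try every split of heads into chickens (2 legs) and rabbits (4 legs)."""
    return [(c, heads - c) for c in range(heads + 1)
            if 2 * c + 4 * (heads - c) == legs]


# Hypothetical response in the <think>...</think> format.
sample = ("<think>If all 14 were chickens: 28 legs; "
          "10 extra legs / 2 = 5 rabbits.</think>9 chickens, 5 rabbits.")
thinking, answer = extract_thinking_and_answer(sample)
print(repr(answer))                      # → '9 chickens, 5 rabbits.'
print(solve_chickens_rabbits(14, 38))    # → [(9, 5)]

# A response truncated before </think> falls back to (thinking="", answer=whole text).
truncated = "<think>Let me set up equations"
print(extract_thinking_and_answer(truncated))
```

The fallback matters for long chain-of-thought output: if max_new_tokens is reached mid-reasoning, the "answer" field will contain the unfinished reasoning rather than a final result.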

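✨ Key Features

Open-source upgrade: Qihoo 360 has open-sourced and upgraded its in-house 7B-parameter model, 360Zhinao3-7B, now published on GitHub as 360zhinao3 and free for commercial use.
Performance gains: 360Zhinao3-7B is continually pretrained from 360Zhinao2-7B on 700B high-quality tokens; the improvement comes mainly from higher-quality training data.
Multiple variants: base, instruct, and long chain-of-thought models are provided to cover different use cases.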

📦 Downloads

Size	Model	BF16
7B	360Zhinao3-7B	🤗
7B	360Zhinao3-7B-Instruct	🤗
7B	360Zhinao3-7B-O1.5	🤗

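📚 Documentation

Model Evaluation

Base Model

We evaluated the model along multiple dimensions with the open-source tool OpenCompass. Its average benchmark score ranks first among the sub-10B-parameter models compared below, making it competitive with models of the same scale.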

Category	Dataset	Language	glm4-9b	Qwen2.5-7B	internlm2.5-7b	Yi1.5-9B	gemma2-9b	Llama3.1-8B	360Zhinao2-7B	360Zhinao3-7B
Exam	ceval	zh	75.83	81.41	77.71	73.51	56.36	51.67	83.04	84.7
Exam	mmlu	en	75.5	75.5	71.55	71.43	72.22	66.75	67.84	75.42
Exam	cmmlu	zh	74.24	81.79	78.77	74.2	58.89	52.49	73.8	82.17
Exam	ARC-c	en	94.92	80	85.08	87.46	77.63	80.68	87.12	88.14
Exam	ARC-e	en	98.41	84.83	95.24	94.53	78.84	89.77	92.77	94
Language	WiC	en	51.57	52.82	50.78	50.63	50.47	50	49.84	50.31
Language	WSC	en	68.27	68.27	69.23	66.35	68.27	67.31	65.38	71.15
Knowledge	BoolQ	en	81.8	83.88	89.51	84.46	85.6	82.2	88.29	88.38
Knowledge	commonsense_qa	en	71.17	73.22	68.55	71.58	68.47	71.25	69.78	71.33
Understanding	C3	zh	91.51	92	93.04	85.86	81.64	83.51	93.26	92.77
Understanding	race-middle	en	91.99	91.02	92.06	91.16	88.09	81.69	90.46	90.04
Understanding	race-high	en	90.71	87.91	90.08	88.34	82.08	78.73	86.74	85.96
Understanding	lcsts	zh	18.29	15.82	15.96	16.49	10.62	17.29	18.61	18.85
Understanding	eprstmt-dev	zh	91.88	86.88	91.25	91.88	48.12	83.12	90	92.50
Understanding	lambada	en	71.67	71.14	69.98	70.64	75.43	74.23	72.56	68.17
Reasoning	hellaswag	en	70.25	72.76	70.38	71.55	66.83	74.65	71.49	73.61
Reasoning	siqa	en	81.73	72.52	78.97	76.2	58.96	64.18	77.12	79.02
Reasoning	bbh	en	73.68	54.63	59.43	67.86	68.45	59.9	46.54	73.74
Code	humaneval	en	69.51	75	60.37	26.22	5.49	27.44	60.98	64.63
Code	mbpp	en	60	60	43.6	56.8	51.2	42.6	54	67.80
Math	math	en	26.86	38	27.14	27.06	28.52	15.32	38.34	37.60
Math	gsm8k	en	78.54	79.76	52.54	71.11	73.09	56.25	75.51	78.77
Overall	avg_zh		70.35	71.58	71.35	68.39	51.13	57.62	71.74	74.20
Overall	avg_all		73.11	71.78	69.60	68.88	61.60	62.32	70.61	74.83
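The table does not say how avg_zh is computed. Assuming it is the unweighted mean of the five zh rows (ceval, cmmlu, C3, lcsts, eprstmt-dev), the sketch below reproduces the published avg_zh values for the four models it includes. The weighting behind avg_all is not stated in the source, so it is not reproduced here.

```python
# Per-dataset zh scores copied from the base-model table above.
ZH_SCORES = {
    "glm4-9b":       {"ceval": 75.83, "cmmlu": 74.24, "C3": 91.51, "lcsts": 18.29, "eprstmt-dev": 91.88},
    "Qwen2.5-7B":    {"ceval": 81.41, "cmmlu": 81.79, "C3": 92.00, "lcsts": 15.82, "eprstmt-dev": 86.88},
    "360Zhinao2-7B": {"ceval": 83.04, "cmmlu": 73.80, "C3": 93.26, "lcsts": 18.61, "eprstmt-dev": 90.00},
    "360Zhinao3-7B": {"ceval": 84.70, "cmmlu": 82.17, "C3": 92.77, "lcsts": 18.85, "eprstmt-dev": 92.50},
}


def unweighted_mean(scores):
    """Plain arithmetic mean over datasets (assumed aggregation for avg_zh)."""
    return sum(scores.values()) / len(scores)


for model, scores in ZH_SCORES.items():
    # Formatted to two decimals to compare against the avg_zh row.
    print(f"{model}: avg_zh = {unweighted_mean(scores):.2f}")
```

Because lcsts scores sit around 15 to 19 while the other zh datasets are in the 70s to 90s, a plain mean like this is pulled down noticeably by that single dataset.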

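Instruct Model

We evaluated and compared 360Zhinao3-7B-Instruct on three popular benchmarks: IFEval, MT-bench, and CFBench. It ranks first among open-source models of its class on both MT-bench and CFBench. On IFEval (prompt strict) it is second only to glm4-9b and scores highest at the 7B scale.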

Model	MT-bench	IFEval (prompt strict)	CFBench CSR	CFBench ISR	CFBench PSR
Qwen2.5-7B-Instruct	8.07	0.556	0.81	0.46	0.57
Yi-9B-16k-Chat	7.44	0.455	0.75	0.4	0.52
GLM4-9B-Chat	8.08	0.634	0.82	0.48	0.61
InternLM2.5-7B-Chat	7.39	0.540	0.78	0.4	0.54
360Zhinao2-7B-Chat-4k	7.86	0.577	0.8	0.44	0.57
360Zhinao3-7B-Instruct	8.17	0.626	0.83	0.52	0.64

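Long Chain-of-Thought Model

Using our previously open-sourced [Light-R1](https://github.com/Qihoo360/Light-R1) method, we continued fine-tuning 360Zhinao3-7B-Instruct on long chain-of-thought data, followed by RFT and GRPO. It still trails the latest OpenThinker2-7B, most clearly on GPQA Diamond, but outperforms all earlier models built on the general-purpose Qwen2.5-7B-Instruct on AIME24 and AIME25.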

Model	Date	Base model	AIME24	AIME25	GPQA Diamond
OpenThinker2-7B	2025-04-03	Qwen2.5-7B-Instruct	50	33.3	49.3
OpenThinker-7B	2025-01-28	Qwen2.5-7B-Instruct	31.3	23.3	42.4
360Zhinao3-7B-O1.5	2025-04-14	360Zhinao3-7B-Instruct	54.2	36.3	40.0
OpenR1-Qwen-7B	2025-02-11	Qwen2.5-Math-7B-Instruct	48.7	34.7	21.2
DeepSeek-R1-Distill-Qwen-7B	2025-01-20	Qwen2.5-Math-7B-Instruct	57.3	33.3	47.3
Light-R1-7B-DS	2025-03-12	DeepSeek-R1-Distill-Qwen-7B	59.1	44.3	49.4
Areal-boba-RL-7B	2025-03-31	DeepSeek-R1-Distill-Qwen-7B	61.9	48.3	47.6

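Model Inference

Deployment

vLLM Installation

We recommend vllm==0.6.0.

If you are using CUDA 12.1 and PyTorch 2.1, you can install vLLM directly with:

pip install vllm==0.6.0

Otherwise, refer to the official vLLM installation instructions.

After installation, complete the following steps:

Copy vllm/zhinao.py into vllm/model_executor/models under the vLLM installation directory (in your Python/conda environment).
Then add the following line to vllm/model_executor/models/__init__.py: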

"ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),

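The two manual steps above can be scripted. The sketch below is a hypothetical helper (install_zhinao is not part of vLLM or the 360Zhinao repository): it copies zhinao.py into the models package of a given vLLM installation and reports whether __init__.py already mentions ZhinaoForCausalLM. It deliberately does not edit __init__.py, because the line has to go inside the existing model registry mapping, which is safer to place by hand.

```python
import shutil
from pathlib import Path

# The registry entry the installation steps ask you to add (for reference).
REGISTRY_LINE = '"ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),'


def install_zhinao(src_file, vllm_pkg_dir):
    """Hypothetical helper: copy zhinao.py into <vllm_pkg_dir>/model_executor/models.

    Returns True if models/__init__.py already contains the ZhinaoForCausalLM
    entry, False if it still has to be added manually.
    """
    models_dir = Path(vllm_pkg_dir) / "model_executor" / "models"
    shutil.copy(src_file, models_dir / "zhinao.py")
    init_text = (models_dir / "__init__.py").read_text(encoding="utf-8")
    return '"ZhinaoForCausalLM"' in init_text
```

For a real installation, vllm_pkg_dir would be the directory containing the installed package, for example os.path.dirname(vllm.__file__); if the function returns False, add REGISTRY_LINE to the registry mapping in __init__.py as described above.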
Starting the vLLM Server

Start the server:

python -m vllm.entrypoints.openai.api_server \
    --model qihoo360/360Zhinao3-7B-O1.5 \
    --served-model-name 360Zhinao3-7B-O1.5 \
    --port 8360 \
    --host 0.0.0.0 \
    --dtype bfloat16 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.8 \
    --trust-remote-code

Query the server with curl:

curl http://localhost:8360/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "360Zhinao3-7B-O1.5",
    "max_tokens": 200,
    "top_k": -1,
    "top_p": 0.8,
    "temperature": 1.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ],
    "stop": [
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ]
}'

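Query the server from Python:

from openai import OpenAI

# Point the OpenAI client at the local vLLM server. vLLM only checks the key
# if the server was started with --api-key, so any placeholder works here.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8360/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Same sampling settings as the curl example.
chat_response = client.chat.completions.create(
    model="360Zhinao3-7B-O1.5",  # must match --served-model-name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"},  # "Hello"
    ],
    max_tokens=200,
    temperature=1.0,
    top_p=0.8,
    # top_k is not a standard OpenAI parameter, so it is passed to vLLM via extra_body.
    extra_body={"top_k": -1},
    stop=[
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ],
    presence_penalty=0.0,
    frequency_penalty=0.0
)
print("Chat response:", chat_response.choices[0].message.content)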

⚠️ Important Note

To enable a repetition penalty, set presence_penalty and frequency_penalty rather than repetition_penalty.
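To make the difference between the two kinds of penalty concrete, the sketch below applies each to a toy logit vector. It follows the additive formula documented for the OpenAI API's presence_penalty and frequency_penalty, and the multiplicative rule used by the Transformers RepetitionPenaltyLogitsProcessor; that vLLM implements these parameters the same way is an assumption here, not something the source states.

```python
from collections import Counter


def apply_presence_frequency(logits, generated, presence_penalty, frequency_penalty):
    """Additive penalties (OpenAI API formula):
    logit[j] -= count[j] * frequency_penalty + (count[j] > 0) * presence_penalty
    """
    counts = Counter(generated)
    return [
        logit - counts[j] * frequency_penalty - (counts[j] > 0) * presence_penalty
        for j, logit in enumerate(logits)
    ]


def apply_repetition_penalty(logits, seen, penalty):
    """Multiplicative penalty (as in Transformers' RepetitionPenaltyLogitsProcessor):
    for previously seen tokens, divide positive logits and multiply negative ones."""
    seen = set(seen)
    return [
        (logit / penalty if logit > 0 else logit * penalty) if j in seen else logit
        for j, logit in enumerate(logits)
    ]


logits = [2.0, -1.0, 0.5]   # toy logits for a 3-token vocabulary
history = [0, 0, 1]         # token 0 generated twice, token 1 once, token 2 never
print(apply_presence_frequency(logits, history, presence_penalty=0.5, frequency_penalty=0.25))
# → [1.0, -1.75, 0.5]
print([round(x, 3) for x in apply_repetition_penalty(logits, history, penalty=1.5)])
# → [1.333, -1.5, 0.5]
```

With the additive rule the penalty grows with how often a token has appeared, whereas the multiplicative rule applies the same factor to every previously seen token regardless of count, and its effect depends on the sign and magnitude of the logit.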

Model Fine-tuning

Training Data

Training data: data/training_data_sample.json. This sample contains 10,000 rows sampled from multiturn_chat_0.8M and converted to the format below.

Data format:

[
  {
    "id": 1,
    "conversations": [
      {
        "from": "system",
        "value": "You are a helpful assistant."
      },
      {
        "from": "user",
        "value": "您好啊"
      },
      {
        "from": "assistant",
        "value": "你好！我今天能为您做些什么？有什么问题或需要帮助吗? 我在这里为您提供服务。"
      }
    ]
  }
]
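When preparing your own data in this format, a small builder and checker can catch malformed records before training. The sketch below is illustrative: make_sample and validate_sample are hypothetical helpers, and the rule that user and assistant turns strictly alternate after an optional system turn (ending on an assistant turn) is an assumption, not a documented requirement of finetune.py.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def make_sample(sample_id, turns, system="You are a helpful assistant."):
    """Build one record in the conversations format shown above.

    `turns` is a list of (user_text, assistant_text) pairs (hypothetical helper).
    """
    conversations = [{"from": "system", "value": system}] if system else []
    for user_text, assistant_text in turns:
        conversations.append({"from": "user", "value": user_text})
        conversations.append({"from": "assistant", "value": assistant_text})
    return {"id": sample_id, "conversations": conversations}


def validate_sample(sample):
    """Assumed rules: known roles, optional leading system turn, then user/assistant pairs."""
    roles = [turn["from"] for turn in sample["conversations"]]
    if any(role not in ALLOWED_ROLES for role in roles):
        return False
    if roles and roles[0] == "system":
        roles = roles[1:]
    expected = ["user", "assistant"] * (len(roles) // 2)
    return len(roles) > 0 and len(roles) % 2 == 0 and roles == expected


data = [make_sample(1, [("Hello there", "Hi! How can I help you today?")])]
assert all(validate_sample(sample) for sample in data)
print(json.dumps(data, ensure_ascii=False, indent=2))
```

json.dump(data, f, ensure_ascii=False, indent=2) on an open file would write the same structure to disk while keeping Chinese text readable rather than escaped.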

Fine-tuning Script

set -x

HOSTFILE=hostfile
DS_CONFIG=./finetune/ds_config_zero2.json

# PARAMS
LR=5e-6
EPOCHS=3
MAX_LEN=32768
BATCH_SIZE=4
NUM_NODES=1
NUM_GPUS=8
MASTER_PORT=29500

IS_CONCAT=False # Whether to concatenate to maximum length (MAX_LEN)

DATA_PATH="./data/training_data_sample.json"
MODEL_PATH="qihoo360/360Zhinao3-7B-Instruct"
OUTPUT_DIR="./outputs/"

deepspeed --hostfile ${HOSTFILE} \
        --master_port ${MASTER_PORT} \
        --num_nodes ${NUM_NODES} \
        --num_gpus ${NUM_GPUS} \
        finetune.py \
        --report_to "tensorboard" \
        --data_path ${DATA_PATH} \
        --model_name_or_path ${MODEL_PATH} \
        --output_dir ${OUTPUT_DIR} \
        --model_max_length ${MAX_LEN} \
        --num_train_epochs ${EPOCHS} \
        --per_device_train_batch_size ${BATCH_SIZE} \
        --gradient_accumulation_steps 1 \
        --save_strategy steps \
        --save_steps 200 \
        --learning_rate ${LR} \
        --lr_scheduler_type cosine \
        --adam_beta1 0.9 \
        --adam_beta2 0.95 \
        --adam_epsilon 1e-8 \
        --max_grad_norm 1.0 \
        --weight_decay 0.1 \
        --warmup_ratio 0.01 \
        --gradient_checkpointing True \
        --bf16 True \
        --tf32 True \
        --deepspeed ${DS_CONFIG} \
        --is_concat ${IS_CONCAT} \
        --logging_steps 1 \
        --log_on_each_node False
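The script's parameters determine the effective batch size and step counts. The sketch below works these out for the values shown, assuming training on the 10,000-sample file described above, one sample per sequence (IS_CONCAT=False), and Hugging Face Trainer-style accounting (a distributed sampler padding each rank to ceil(N / world_size) samples, no dropped last batch, warmup = ceil(total_steps * warmup_ratio)). The real numbers depend on finetune.py, which is not shown here.

```python
import math

# Values from the fine-tuning script above.
BATCH_SIZE = 4        # --per_device_train_batch_size
NUM_GPUS = 8
NUM_NODES = 1
GRAD_ACCUM = 1        # --gradient_accumulation_steps
MAX_LEN = 32768       # --model_max_length
EPOCHS = 3
WARMUP_RATIO = 0.01
NUM_SAMPLES = 10_000  # assumed: the 10,000-row sample file

world_size = NUM_GPUS * NUM_NODES
global_batch = BATCH_SIZE * world_size * GRAD_ACCUM   # sequences per optimizer step
max_tokens_per_step = global_batch * MAX_LEN          # upper bound; shorter samples use fewer

# Assumed Trainer-style step accounting (see lead-in).
samples_per_rank = math.ceil(NUM_SAMPLES / world_size)
steps_per_epoch = math.ceil(samples_per_rank / BATCH_SIZE) // GRAD_ACCUM
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print(global_batch, max_tokens_per_step, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions each optimizer step sees 32 sequences (at most 1,048,576 tokens at MAX_LEN=32768), one epoch takes 313 steps, and the 3 epochs total 939 steps with 10 warmup steps; --save_steps 200 would then write checkpoints at steps 200, 400, 600, and 800.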