Devstral-Small-2505-fp8 open-source LLM - powering software engineering by intelligently exploring and editing codebases


Devstral Small 2505 Fp8

Uploaded by bullerwins

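Devstral is an agentic large language model for software engineering tasks, developed through a collaboration between Mistral AI and All Hands AI. It excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.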

Large language model

Safetensors

Multilingual · License: Apache-2.0 · #agentic-coding-assistant #128k-long-context #SWE-bench-leader

Downloads: 243

Released: 5/21/2025

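Model overview

Devstral is a large language model designed for agentic coding tasks. It is well suited for building software engineering agents and has a 128k-token context window.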

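Model features

Agentic coding

Designed specifically for agentic coding tasks, making it an ideal choice for building software engineering agents.

Lightweight

With a compact size of only 24 billion parameters, it can run on a single RTX 4090 or a Mac with 32GB of RAM.

Long context

Supports a context window of up to 128k tokens.

Open-source license

Released under the Apache 2.0 license, which allows both commercial and non-commercial use.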

Capabilities

Code generation

Code editing

Multi-file handling

Powering software engineering agents

Multilingual support

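Use cases

Software development

Building a to-do app

Build a single-page app with FastAPI and React that supports adding, deleting, and marking tasks.

Generates the complete application code and deploys it automatically.

Code maintenance

Code refinement

Iteratively improves existing code, for example by adding checkboxes to improve the user experience.

Improves code functionality and the user experience.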

🚀 Devstral-Small-2505

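Devstral is an agentic LLM for software engineering tasks, built through a collaboration between Mistral AI and All Hands AI. It excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark.

It is fine-tuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed before fine-tuning from Mistral-Small-3.1.

For enterprises that need specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Devstral in our blog post.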

🚀 Quick start

This model has been quantized to FP8 with LLMCompressor.

It is well suited for running on a dual-GPU system (e.g. 2x3090) with vLLM or SGLang:

vllm serve bullerwins/Devstral-Small-2505-fp8 --max-model-len 16000 --host 0.0.0.0 --port 5000 -tp 2 --tokenizer_mode mistral
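As a rough sanity check on the `--max-model-len 16000` and `-tp 2` choices above, the sketch below estimates per-GPU memory for the weights plus the KV cache. The architecture numbers (40 layers, 8 key/value heads, head dimension 128) are assumptions about the Mistral-Small-3.1 text backbone and should be checked against the checkpoint's config.json; the 16-bit KV cache and the even split across GPUs are also assumptions, and activations, CUDA graphs, and framework overhead are ignored.

```python
# Rough per-GPU memory budget for serving the FP8 checkpoint with tensor parallelism.
# ASSUMED architecture values (check config.json): 40 layers, 8 KV heads,
# head_dim 128. The KV cache is ASSUMED to be stored in 16-bit (2 bytes/element).
GB = 1e9  # decimal gigabytes

def kv_cache_bytes_per_token(layers=40, kv_heads=8, head_dim=128, bytes_per_elem=2):
    # Factor 2: each layer stores one key and one value vector per KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def per_gpu_gb(params=24e9, weight_bytes=1, tokens=16_000, tp=2):
    """Weights (1 byte/param for FP8) plus KV cache, split evenly over `tp` GPUs."""
    weights = params * weight_bytes
    kv_cache = kv_cache_bytes_per_token() * tokens
    return (weights + kv_cache) / tp / GB

if __name__ == "__main__":
    print(f"KV cache per token: {kv_cache_bytes_per_token() / 1024:.0f} KiB")
    # 131_072 is ASSUMED to be the exact size of the "128k" window.
    for ctx in (16_000, 131_072):
        print(f"{ctx:>7} tokens -> ~{per_gpu_gb(tokens=ctx):.1f} GB per GPU (tp=2)")
```

Under these assumptions, 16000 tokens come to roughly 13 GB per GPU, while the full window would approach 23 GB on each 24 GB card before any overhead, which is consistent with capping the context length on a 2x3090 system.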

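✨ Key features

Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
Lightweight: with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it suitable for local deployment and on-device use.
Apache 2.0 license: an open license that allows use and modification for both commercial and non-commercial purposes.
Context window: a 128k context window.
Tokenizer: uses a Tekken tokenizer with a 131k vocabulary size.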
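Whether 24 billion parameters fit on a single RTX 4090 or a 32GB Mac depends heavily on the weight precision. A weight-only estimate, assuming exactly 24e9 parameters and ignoring the KV cache and runtime overhead; the ~4.8 bits/weight figure for Q4_K_M is an approximate average and an assumption.

```python
# Weight-only memory estimate for a 24B-parameter model.
# KV cache and runtime overhead are NOT included.
PARAMS = 24e9

BITS_PER_WEIGHT = {
    "bf16": 16,
    "fp8": 8,
    "Q4_K_M (approx.)": 4.8,  # ASSUMED average; real GGUF files vary slightly
}

def weight_gb(params: float, bits: float) -> float:
    """Bytes needed for the weights, expressed in decimal gigabytes."""
    return params * bits / 8 / 1e9

if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name:<17} ~{weight_gb(PARAMS, bits):5.1f} GB")
```

By this estimate the bf16 weights (~48 GB) are far beyond a single 24 GB RTX 4090, and the FP8 weights alone (~24 GB) would fill one, so single-GPU or 32GB-Mac use presumably relies on a roughly 4-bit quantization such as the Q4_K_M GGUF used in the LMStudio section.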

📚 Detailed documentation

Benchmark results

SWE-Bench

Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming the previous open-source state of the art by more than 6 percentage points.

Model	Scaffold	SWE-Bench Verified (%)
Devstral	OpenHands Scaffold	46.8
GPT-4.1-mini	OpenAI Scaffold	23.6
Claude 3.5 Haiku	Anthropic Scaffold	40.6
SWE-smith-LM 32B	SWE-agent Scaffold	40.2

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI), Devstral outperforms far larger models such as DeepSeek-V3-0324 and Qwen3 235B-A22B.
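The comparisons in the table reduce to simple subtraction. A short sketch using only the scores copied from the table; treating the stated margin over the previous open-source best as Devstral's lead over SWE-smith-LM 32B (the only other open-weight model listed) is an interpretation.

```python
# SWE-Bench Verified scores (%) copied from the table above.
SCORES = {
    "Devstral": 46.8,
    "GPT-4.1-mini": 23.6,
    "Claude 3.5 Haiku": 40.6,
    "SWE-smith-LM 32B": 40.2,
}

def margins(scores: dict[str, float], reference: str) -> dict[str, float]:
    """Percentage-point lead of `reference` over every other model, rounded to 0.1."""
    ref = scores[reference]
    return {name: round(ref - s, 1) for name, s in scores.items() if name != reference}

if __name__ == "__main__":
    for name, gap in margins(SCORES, "Devstral").items():
        print(f"Devstral vs {name}: +{gap} points")
```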

[Figure: SWE Benchmark]

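Usage

We recommend using Devstral with the OpenHands scaffold. You can use it either through our API or by running it locally.

API

Follow these instructions to create a Mistral account and get an API key.

Then run these commands to start the OpenHands Docker container: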

export MISTRAL_API_KEY=<MY_KEY>

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik

mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"mistral/devstral-small-2505","llm_api_key":"'$MISTRAL_API_KEY'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true}' > ~/.openhands-state/settings.json

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.39
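The `echo` one-liner splices the API key into a hand-written JSON string, which produces invalid JSON if the key ever contains a quote or backslash. A sketch of the same step in Python, using `json.dumps` so values are escaped correctly; the keys and values are copied verbatim from the command above, and the function names are hypothetical.

```python
import json
from pathlib import Path

def build_settings(api_key: str) -> str:
    """Return the settings.json contents from the command above as a JSON string."""
    settings = {
        "language": "en",
        "agent": "CodeActAgent",
        "max_iterations": None,
        "security_analyzer": None,
        "confirmation_mode": False,
        "llm_model": "mistral/devstral-small-2505",
        "llm_api_key": api_key,
        "remote_runtime_resource_factor": None,
        "github_token": None,
        "enable_default_condenser": True,
    }
    # json.dumps escapes quotes/backslashes in the key, unlike the shell splice.
    return json.dumps(settings)

def write_openhands_settings(state_dir: Path, api_key: str) -> Path:
    """Equivalent of `mkdir -p ~/.openhands-state && echo ... > settings.json`."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / "settings.json"
    path.write_text(build_settings(api_key), encoding="utf-8")
    return path
```

For example, `write_openhands_settings(Path.home() / ".openhands-state", os.environ["MISTRAL_API_KEY"])` (with `import os`) could replace the `mkdir`/`echo` line before `docker run`.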

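Local inference

The model can also be deployed with the following libraries:

OpenHands (recommended)

Launch a server to deploy Devstral-Small-2505

Make sure you have launched an OpenAI-compatible server such as vLLM or Ollama (the vLLM section explains how to start one). Then you can use OpenHands to interact with Devstral-Small-2505.

For this tutorial, we spun up a vLLM server by running: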

vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2

The server address should be in the following format: http://<your-server-url>:8000/v1
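Before wiring OpenHands to the server, it can help to confirm that the endpoint answers. OpenAI-compatible servers such as vLLM expose `GET /v1/models`, which returns a JSON object whose `data` list holds the served model IDs. A stdlib-only sketch; the helper names are hypothetical, and the default `token` key mirrors the Python client example in the vLLM section (vLLM only checks it when started with `--api-key`).

```python
import json
import urllib.request

def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]

def fetch_model_ids(base_url: str, api_key: str = "token", timeout: float = 10.0) -> list[str]:
    """Query <base_url>/models, e.g. base_url='http://<your-server-url>:8000/v1'."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

With the vLLM command above running locally, `fetch_model_ids("http://localhost:8000/v1")` should list `mistralai/Devstral-Small-2505`.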

Launch OpenHands

You can follow the OpenHands installation instructions here.

The easiest way to launch OpenHands is to use the Docker image:

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.38

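Then, you can access the OpenHands UI at http://localhost:3000.

Connect to the server

When you access the OpenHands UI, you will be prompted to connect to a server. You can use the advanced mode to connect to the server you launched earlier.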

Fill in the following fields:

Custom Model: openai/mistralai/Devstral-Small-2505
Base URL: http://<your-server-url>:8000/v1
API Key: token (or whichever token you used when launching the server, if any)
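The three fields follow a simple pattern: the `openai/` prefix in front of the model name served by vLLM tells OpenHands, which routes model names through LiteLLM, to treat the server as a generic OpenAI-compatible endpoint, and the base URL is the `/v1` root rather than a full route. A hypothetical helper that assembles them, as a sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OpenHandsLLMFields:
    custom_model: str
    base_url: str
    api_key: str

def openhands_fields(served_model: str, host: str, port: int = 8000,
                     api_key: str = "token") -> OpenHandsLLMFields:
    """Values to enter in OpenHands' advanced LLM settings (hypothetical helper)."""
    return OpenHandsLLMFields(
        custom_model=f"openai/{served_model}",  # provider prefix + served model ID
        base_url=f"http://{host}:{port}/v1",    # API root, not /v1/chat/completions
        api_key=api_key,                        # must match --api-key if the server set one
    )
```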

Use OpenHands powered by Devstral

Now you're ready to use Devstral Small inside OpenHands by starting a new conversation. Let's build a to-do list app.

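To-do list app

1. Ask Devstral to generate the app with the following prompt:

```txt
Build a to-do list app with the following requirements:
- Built using FastAPI and React.
- Make it a single-page app that:
  - Allows adding a task.
  - Allows deleting a task.
  - Allows marking a task as done.
  - Displays the list of tasks.
- Store the tasks in a SQLite database.
```

[Figure: Agent prompting]

2. See the result. You should see the agent build the app and be able to explore the code it generated.

If it doesn't deploy the app automatically, ask Devstral to deploy it or do it manually, then open the front-end deployment URL to see the app.

[Figures: Agent working; App UI]

3. Iterate. Now that you have a first result, you can iterate on it by asking your agent to improve it. For example, in the generated app you can click a task to mark it as checked, but a checkbox would improve the UX. You could also ask it to add a feature for editing a task, or for filtering tasks by status.

Enjoy building with Devstral Small and OpenHands!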

vLLM (recommended)

We recommend using the vLLM library to implement production-ready inference pipelines.

Installation

Make sure you install vLLM >= 0.8.5:

pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.5.5. To check:

python -c "import mistral_common; print(mistral_common.__version__)"

You can also use a ready-made Docker image, or pull one from Docker Hub.
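Both version floors mentioned here (vLLM >= 0.8.5 and mistral_common >= 1.5.5) can be checked with the standard library alone. A sketch using `importlib.metadata`; the comparison is deliberately simplified (an assumption: only the leading numeric components count, so suffixes such as `rc1` or `.post1` are ignored), and `packaging.version.Version` is the more robust option when that package is installed.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUMS = {"vllm": "0.8.5", "mistral_common": "1.5.5"}

def numeric_parts(v: str) -> tuple[int, ...]:
    """'0.8.5.post1' -> (0, 8, 5); stops at the first non-numeric component."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # e.g. '0rc1': keep 0, then stop
            break
    return tuple(parts)

def meets_minimum(installed: str, minimum: str) -> bool:
    return numeric_parts(installed) >= numeric_parts(minimum)

def check(minimums: dict[str, str] = MINIMUMS) -> dict[str, str]:
    """Map each package to its installed version and whether it meets the floor."""
    report = {}
    for pkg, floor in minimums.items():
        try:
            got = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "not installed"
            continue
        status = "ok" if meets_minimum(got, floor) else f"needs >= {floor}"
        report[pkg] = f"{got} ({status})"
    return report

if __name__ == "__main__":
    for pkg, status in check().items():
        print(f"{pkg}: {status}")
```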

Server

We recommend using Devstral in a server/client setting.

Spin up a server:

vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2

To test the client, you can use this simple Python snippet:

import requests
import json
from huggingface_hub import hf_hub_download

url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Devstral-Small-2505"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "<your-command>",
            },
        ],
    },
]

data = {"model": model, "messages": messages, "temperature": 0.15}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
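The server command above passes `--tool-call-parser mistral --enable-auto-tool-choice`, which lets the OpenAI-compatible endpoint return structured tool calls. A sketch of a request that offers the model a tool, and of pulling the calls back out of the response, following the OpenAI chat-completions format that vLLM mirrors; the `list_files` tool is purely hypothetical, and in an agent setup such as OpenHands the scaffold defines the real tools.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
LIST_FILES_TOOL = {
    "type": "function",
    "function": {
        "name": "list_files",
        "description": "List files in a directory of the repository.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

def build_tool_request(model: str, messages: list[dict], tools: list[dict]) -> dict:
    """Request body for POST /v1/chat/completions with tools enabled."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",  # relies on --enable-auto-tool-choice on the server
        "temperature": 0.15,
    }

def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (function name, parsed arguments) pairs from the first choice."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # Arguments arrive as a JSON-encoded string, so decode them.
    return [(c["function"]["name"], json.loads(c["function"]["arguments"])) for c in calls]
```

Reusing the names from the snippet above, `requests.post(url, headers=headers, json=build_tool_request(model, messages, [LIST_FILES_TOOL]))` sends the request, and `extract_tool_calls(response.json())` returns any calls the model made.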

Mistral-inference

We recommend using mistral-inference to quickly try out Devstral.

Installation

Make sure to install mistral_inference >= 1.6.0.

pip install mistral_inference --upgrade

Download

from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Devstral-Small-2505", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
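Before running `mistral-chat`, it may be worth confirming that the three files requested via `allow_patterns` actually landed in the target folder, since a pattern that matches nothing does not necessarily raise an error. A minimal stdlib sketch; `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# The files requested via allow_patterns in the download step.
REQUIRED_FILES = ("params.json", "consolidated.safetensors", "tekken.json")

def missing_files(model_dir: Path, required=REQUIRED_FILES) -> list[str]:
    """Names from `required` that are not present as files in model_dir."""
    return [name for name in required if not (model_dir / name).is_file()]

if __name__ == "__main__":
    target = Path.home() / "mistral_models" / "Devstral"
    missing = missing_files(target)
    print("all files present" if not missing else f"missing: {missing}")
```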

Python

You can run the model with the following command:

mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300

You can then prompt it with anything you like.

Transformers

To make the best use of our model with transformers, make sure mistral-common >= 1.5.5 is installed so you can use our tokenizer.

pip install mistral-common --upgrade

Then load our tokenizer along with the model and generate:

import torch

from mistral_common.protocol.instruct.messages import (
    SystemMessage, UserMessage
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

model_id = "mistralai/Devstral-Small-2505"
tekken_file = hf_hub_download(repo_id=model_id, filename="tekken.json")
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

tokenizer = MistralTokenizer.from_file(tekken_file)

model = AutoModelForCausalLM.from_pretrained(model_id)

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            SystemMessage(content=SYSTEM_PROMPT),
            UserMessage(content="<your-command>"),
        ],
    )
)

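output = model.generate(
    # Place the prompt on the same device as the model weights.
    input_ids=torch.tensor([tokenized.tokens], device=model.device),
    max_new_tokens=1000,
)[0]

# Drop the prompt tokens and pass plain Python ints to the Mistral tokenizer.
decoded_output = tokenizer.decode(output[len(tokenized.tokens):].tolist())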
print(decoded_output)

LMStudio

Download the weights from Hugging Face:

pip install -U "huggingface_hub[cli]"
huggingface-cli download \
"mistralai/Devstral-Small-2505_gguf" \
--include "devstralQ4_K_M.gguf" \
--local-dir "mistralai/Devstral-Small-2505_gguf/"

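You can serve the model locally with LMStudio.

Download and install LM Studio.
Install the lms CLI: ~/.lmstudio/bin/lms bootstrap.
In a bash terminal, run lms import devstralQ4_K_M.gguf in the directory where you downloaded the model checkpoint (e.g. mistralai/Devstral-Small-2505_gguf).
Open the LMStudio application and click the terminal icon to enter the developer tab. Click "Select a model to load" and choose Devstral Q4 K M. Toggle the status button to start the model, and in Settings turn on "Serve on Local Network".
In the right-hand tab, you will see an API identifier (which should be devstralq4_k_m) and an API address. Note this address; we will use it in the next step.

Launch OpenHands. You can now interact with the model served from LM Studio using OpenHands. Start the OpenHands server with Docker: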

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.38

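Click "See advanced settings". In the new tab, toggle Advanced mode on. Set the custom model to mistral/devstralq4_k_m and the base URL to the API address you noted in LM Studio in the previous step. Set the API key to dummy. Click "Save Changes".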