🚀 Qwen3-30B-A1.5B-High-Speed GGUF Models
The Qwen3-30B-A1.5B-High-Speed GGUF models are generated with specific tooling and are available in multiple formats to suit different hardware, covering needs such as inference speed and memory footprint. The models can also be used in AI network-monitoring tests that probe the limits of small open-source models in that domain.
🚀 Quick Start
Model generation in this project depends on specific tools and code. Before use, review the format-selection guidance below and pick a model format that matches your hardware and use case. To take part in the AI network-monitoring tests, follow the instructions in the testing section.
✨ Key Features
- High speed: by reducing the number of active experts, the model runs at close to twice the speed, achieving efficient inference with fewer active parameters.
- Multiple formats: GGUF, GPTQ, EXL2, AWQ, HQQ and other formats can be generated to suit different hardware and scenarios.
- Large context: a 32K + 8K context (40K total) supports more complex tasks.
- Deep reasoning: a system role can be set to make the model think and reason in depth before answering.
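As a rough illustration of the 32K + 8K split, a request can be budget-checked before submission. The exact token limits below are assumptions derived from the stated 32K/8K figures:

```python
# Assumed token budgets from the stated 32K input + 8K output (40K total) context.
MAX_INPUT_TOKENS = 32_768
MAX_OUTPUT_TOKENS = 8_192

def within_context_budget(n_prompt_tokens: int, n_generate_tokens: int) -> bool:
    """Check a request against the model's stated context split."""
    return (n_prompt_tokens <= MAX_INPUT_TOKENS
            and n_generate_tokens <= MAX_OUTPUT_TOKENS)
```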
📦 Installation
The source document does not describe installation steps, so none are provided here.
💻 Usage Examples
Basic usage
Choose a model format and parameter settings that fit your needs. For example, for CPU inference, a Q4_K quantized model reduces memory use:
```python
# Sketch assuming llama-cpp-python loading a Q4_K quantized GGUF;
# adjust the model path and parameters to your setup.
from llama_cpp import Llama

llama = Llama(model_path="path/to/Qwen3-30B-A1.5B-High-Speed-Q4_K.gguf", n_ctx=40000)
output = llama("Your input prompt here", max_tokens=200)
print(output["choices"][0]["text"])
```
Advanced usage
To make the model think and reason in depth, set a system role:
```python
# Sketch assuming llama-cpp-python; the system role is prepended to the prompt.
from llama_cpp import Llama

system_role = "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."
llama = Llama(model_path="path/to/Qwen3-30B-A1.5B-High-Speed.gguf", n_ctx=40000)
output = llama(system_role + "\nYour input prompt here", max_tokens=500)
print(output["choices"][0]["text"])
```
📚 Documentation
Model generation details
This model was generated with llama.cpp at commit 0d398442.
Choosing a model format
BF16 (Brain Float 16)
- When to use: recommended if your hardware supports BF16 acceleration. BF16 is a 16-bit floating-point format designed for faster computation while retaining good precision; it has a dynamic range similar to FP32 but a smaller memory footprint, making it well suited to high-performance inference.
- Use it when: your hardware has native BF16 support (e.g. newer GPUs or TPUs), you want higher precision while saving memory, or you plan to requantize the model into other formats.
- Avoid it when: your hardware does not support BF16 (it may fall back to FP32 and run slower), or you need compatibility with older devices that lack BF16 optimizations.
F16 (Float 16)
- When to use: more widely supported than BF16 and works on most devices with FP16 acceleration (including many GPUs and some CPUs). Numerical precision is slightly lower than BF16 but usually sufficient for inference.
- Use it when: your hardware supports FP16 but not BF16, you need a balance of speed, memory use, and accuracy, or you are running on a GPU or other device optimized for FP16 compute.
- Avoid it when: your device lacks native FP16 support (it may run slower than expected) or memory is tight.
Quantized models (Q4_K, Q6_K, Q8, etc.)
- When to use: quantization reduces model size and memory use while preserving as much accuracy as possible. Lower-bit models (e.g. Q4_K) minimize memory at some cost in precision; higher-bit models (e.g. Q6_K, Q8_0) are more accurate but need more memory.
- Use them when: you are running inference on a CPU or low-VRAM device, need a smaller model, or want to cut memory use while keeping reasonable accuracy.
- Avoid them when: you need maximum accuracy (full-precision models are better suited) or your hardware has enough VRAM for higher-precision formats (BF16/F16).
Very-low-bit quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)
- When to use: these models are optimized for extreme memory efficiency, for ultra-low-memory devices or large-scale deployments where memory is the key constraint.
- Per-type characteristics:
  - IQ3_XS: ultra-low-bit (3-bit) quantization with extreme memory efficiency, for ultra-low-memory devices where even Q4_K is too large; accuracy is lower.
  - IQ3_S: small block size for maximum memory efficiency, for low-memory devices where IQ3_XS is too aggressive.
  - IQ3_M: medium block size with better accuracy than IQ3_S, for low-memory devices where IQ3_S is too limiting.
  - Q4_K: 4-bit quantization with block-wise optimization and better accuracy, for low-memory devices where Q6_K is too large.
  - Q4_0: plain 4-bit quantization optimized for ARM devices, for ARM-based or low-memory environments.
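The guidance above can be condensed into a small helper. This is a sketch of the recommendations, not an official selection algorithm, and the VRAM thresholds are illustrative assumptions:

```python
def pick_gguf_format(has_bf16: bool, has_fp16: bool, vram_gb: float) -> str:
    """Rough GGUF format recommendation following the guidance above.

    Thresholds are illustrative; a ~30B model at 16 bits needs roughly 60 GB.
    """
    if has_bf16 and vram_gb >= 60:
        return "BF16"      # native BF16 support, memory to spare
    if has_fp16 and vram_gb >= 60:
        return "F16"       # FP16 fallback when BF16 is unavailable
    if vram_gb >= 30:
        return "Q8_0"      # best accuracy among the quantized formats
    if vram_gb >= 16:
        return "Q4_K"      # memory/accuracy balance for CPU or low-VRAM devices
    return "IQ3_XS"        # extreme memory efficiency, lower accuracy
```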
Format selection summary
| Model format | Precision | Memory use | Device requirements | Best use case |
|---|---|---|---|---|
| BF16 | Highest | High | GPU/CPU with BF16 support | High-speed inference with reduced memory |
| F16 | High | High | Devices with FP16 support | GPU inference when BF16 is unavailable |
| Q4_K | Medium-low | Low | CPU or low-VRAM devices | Memory-constrained environments |
| Q6_K | Medium | Moderate | CPU or low-VRAM devices | Better accuracy among quantized models |
| Q8_0 | High | Moderate | CPU or low-VRAM devices | Best accuracy among quantized models |
| IQ3_XS | Very low | Very low | Ultra-low-memory devices | Extreme memory efficiency, lower accuracy |
| Q4_0 | Low | Low | ARM or low-memory devices | ARM-based or low-memory deployments |
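A back-of-envelope size estimate helps when comparing these formats. The bits-per-weight figures below are approximations I am assuming for illustration; real GGUF files also contain metadata and some non-quantized tensors, so actual sizes differ:

```python
# Approximate bits-per-weight for common GGUF quant types (assumed values;
# actual file sizes also include metadata and non-quantized tensors).
BITS_PER_WEIGHT = {"BF16": 16.0, "F16": 16.0, "Q8_0": 8.5,
                   "Q6_K": 6.6, "Q4_K": 4.5, "IQ3_XS": 3.3, "Q4_0": 4.5}

def approx_size_gib(n_params: float, quant: str) -> float:
    """Back-of-envelope GGUF file size in GiB for a given parameter count."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30
```

Under these assumptions, a 30B-parameter model is roughly 16 GiB at Q4_K versus roughly 56 GiB at BF16.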
Model testing
What is being tested
If you find these models useful, please help test the AI network-monitoring assistant. When testing, choose an assistant type: TurboLLM (GPT-4o-mini), HugLLM (Hugging Face open-source), or TestLLM (an experimental CPU-only model). The tests explore the limits of small open-source models in AI network monitoring, specifically:
- Function calling against live network services.
- How small a model can be while still handling automated Nmap scans, quantum-readiness checks, and network-monitoring tasks.
Assistant characteristics
- TestLLM: the current experimental model (llama.cpp on 2 CPU threads); zero-configuration setup, about 30 s load time (slow inference but no API cost). Collaboration from anyone interested in edge-device AI is welcome.
- TurboLLM: uses gpt-4o-mini; can create custom cmd processors to run .net code on the free network-monitoring agent, and performs real-time network diagnostics and monitoring, security audits, and penetration testing (Nmap/Metasploit).
- HugLLM: based on the latest open-source models, running on the Hugging Face inference API.
Example test commands
"Give me info on my websites SSL certificate"
"Check if my server is using quantum safe encryption for communication"
"Run a comprehensive security audit on my server"
"Create a cmd processor to .. (what ever you want)"
(Installing the free network-monitoring agent is required to run .net code.)
Other model information
This repository contains the full-precision source code in "safe tensors" format, from which GGUF, GPTQ, EXL2, AWQ, HQQ and other formats can be generated; the source code can also be used directly. The model is a simple "fine-tune" of Qwen's "Qwen 30B-A3B" (MoE) model that reduces the number of active experts from 8 to 4 (out of 128), nearly doubling the model's speed and using 1.5B (of 30B) parameters instead of 3B (of 30B). No loss of function was observed in regular (though not extensive) testing.
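The expert-reduction idea can be pictured with a minimal top-k routing sketch. This is illustrative only: the real router is a learned gating network inside the model, and shared (non-expert) layers are ignored in the parameter arithmetic:

```python
def route_topk(gate_logits, k=4):
    """Select the k highest-scoring experts for one token (top-k MoE routing)."""
    return sorted(range(len(gate_logits)),
                  key=gate_logits.__getitem__, reverse=True)[:k]

def active_expert_params(total_expert_params: float, n_experts: int, k: int) -> float:
    """Per-token expert parameters actually used, ignoring shared layers."""
    return total_expert_params * k / n_experts

# Halving the active experts from 8 to 4 (of 128) roughly halves the per-token
# expert compute, which is why ~3B active parameters drop to ~1.5B.
```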
Context size and templates
- Context size: 32K plus 8K for output (40K total).
- Templates: either the Jinja template or the CHATML template can be used.
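When driving the model through a raw completion API, the CHATML template can be applied by hand. A minimal sketch, following the standard ChatML tag layout (not code from this repository):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Format one system + user turn in ChatML, ending at the assistant header."""
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")
```

The resulting string can be passed to a completion endpoint as-is; chat-aware front ends apply this formatting for you.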
Important notes
- Operation: because of this model's unique nature (MoE, size, number of activated experts, expert size), the GGUF quantizations can run on CPU, on GPU, or with partial GPU "offload", up to full precision.
- Imatrix processing: this model is difficult to process with an imatrix; it needs a larger imatrix file with multilingual, multi-content (e.g. code and text) coverage.
- GPU speed: GPU inference is 4-8x or more faster than CPU-only, and the model is very fast relative to other "30B" models (token-per-second speed roughly equal to a regular 1.5B model).
Setting a system role
In most cases Qwen3 generates its reasoning/thinking blocks on its own, so a system role may not be needed. If you want one, use the following:
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
For how to "set" a system role in various LLM/AI apps, see the document "Maximizing-Model-Performance-All...".
Settings and operation guide
To use this model (source, GGUF, or any other quantization), see the documentation below for critical parameters, samplers, and advanced sampler settings (for multiple AI/LLM apps). This is a "Class 1" (settings will enhance operation) model. For all settings used with this model (including the specifics of its "class"), example generations, and an advanced settings guide (which often resolves any model issues), including methods to improve model performance for all use cases (chat, role-play, and so on), see [https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters](https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters). The parameters, samplers, and advanced sampler settings detailed there can also be used with any model, any repository, any quantization (including source/full precision) to enhance its operation.
More/fewer expert versions
- 12 experts: [https://huggingface.co/DavidAU/Qwen3-30B-A4.5B-12-Cooks](https://huggingface.co/DavidAU/Qwen3-30B-A4.5B-12-Cooks)
- 16 experts: [https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme](https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme)
- 16 experts, 128K context: [https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme-128k-context](https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme-128k-context)
- 24 experts: [https://huggingface.co/DavidAU/Qwen3-30B-A7.5B-24-Grand-Brainstorm](https://huggingface.co/DavidAU/Qwen3-30B-A7.5B-24-Grand-Brainstorm)
Example generation
The example below was generated with the following settings:
- Parameters: temp 1.2, rep pen 1.06, rep pen range 64, topk 100, topp .95, minp .05
- Model: Q4KS [non-imatrix], CPU only (Windows 11), LMSTUDIO. This is a mid-level quant; expect stronger performance from an imatrix Q4KS, higher quants, or full precision.
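For readers using llama-cpp-python rather than LMSTUDIO, these settings map onto its sampling keyword arguments roughly as below. The mapping is an assumption on my part; in particular, "rep pen range" is not a per-call argument there and corresponds to the `last_n_tokens_size` constructor parameter:

```python
# Sampler values from the example generation above, expressed as assumed
# llama-cpp-python keyword arguments for Llama.__call__().
SAMPLER_KWARGS = {
    "temperature": 1.2,
    "repeat_penalty": 1.06,
    "top_k": 100,
    "top_p": 0.95,
    "min_p": 0.05,
}
REP_PEN_RANGE = 64  # pass as Llama(..., last_n_tokens_size=REP_PEN_RANGE)

# Usage sketch: output = llama("prompt", max_tokens=500, **SAMPLER_KWARGS)
```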
Prompt
Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. If the situation calls for it, have the character(s) curse and swear to further the reader's emotional connection to them. 800-1000 words.
Thinking
Before generating the story, the model reasoned in detail about scene setting, themes, structure, conflict, character development, and dialogue:
Thought for 47.29 seconds
Okay, I need to write a science fiction story based on the given prompt. The user wants the story to take place entirely in a spaceship's cockpit, focusing on the last surviving crew member trying to send a final message before the ship's power runs out. Themes are isolation, sacrifice, and human connection. Also, include some cursing to add emotional weight.
First, I should start by setting the scene. The cockpit needs to feel cramped and desperate. Maybe describe the flickering lights and the hissing sounds of the ship's systems failing. The main character is alone, so I need to show their loneliness. Maybe they have a log or something to talk to themselves, but since it's the cockpit, perhaps they're just trying to get through the message.
Next, the themes. Isolation is straightforward—they're the only one left. But how to show sacrifice? Maybe they stayed behind to make sure the message gets through, knowing they won't make it. Human connection could come through flashbacks or memories, but since the story is in the cockpit, maybe they're remembering a loved one, like a child or partner, to keep them going.
I need to include some swearing. Maybe the protagonist curses when they realize something's not working, or when they're under stress. But it shouldn't be too over the top. Maybe they curse at the ship for failing them, or at themselves for not being able to fix it.
The structure: Start with the character working on the controls, trying to send the message. Then show their internal struggle. Maybe flashbacks? Wait, the user said the story should take place entirely within the cockpit. So maybe no flashbacks. Just the present moment, with internal monologue.
Conflict: The ship is failing, power is running out. They need to send the message before they lose power. Maybe there's a countdown or a limited time. Also, maybe the ship is damaged in a way that it's irreversible.
Character development: The protagonist is determined, maybe a bit stubborn. They care about relaying the message, which could be a warning about something, like a virus or an alien threat. But since the prompt doesn't specify, maybe it's more about the human aspect—like they're the last, and they want to make sure someone knows what happened.
Dialogue: Since it's the cockpit, maybe they talk to themselves, or to the ship. Or maybe there's a voice in the comms? Wait, but the prompt says the story is entirely within the cockpit. So no one else is there except the main character. So it's all internal, except for maybe a log entry.
Wait, the user said "the sole surviving crew member attempts to send a final message back to Earth." So the message is for Earth. So maybe the protagonist is trying to record a message, but the ship is failing. Or maybe they're trying to send a distress signal, but the systems are failing.
Let me outline:
- Open with the cockpit's deteriorating condition. Maybe the life support is failing, or the power is out. The protagonist is trying to keep the comms on.
- They're alone, so they talk to the ship, maybe. "Come on, you piece of crap, hold together."
- Flashbacks aren't allowed, per the user's instruction (since it's all in the cockpit), so internal thoughts are key.
- The message they need to send is important. Maybe a warning about something, but also their own fear.
- The story should end with them managing to send the message, but knowing they won't survive, or maybe the message gets through just in time.
Including swearing: Maybe when they're frustrated with the systems not working, or when they realize the power is dying. "F***ing hell," "God damn it," etc.
Need to make sure the themes are present. Isolation: being the only one left. Sacrifice: giving their life to send the message. Human connection: maybe they think of someone before they die, or the act of sending the message is about connecting with others back home.
Possible structure:
1. Start in the cockpit, power failing, protagonist working frantically.
2. Flashbacks? Wait, no—since it's all in the cockpit, maybe not. But internal monologue can reflect on their past.
3. The protagonist's internal struggle: fear of death, but also duty to send the message.
4. Technical difficulties with the ship, leading to a climax where they get the message out, then succumb.
Now, writing the story with these elements. Keep it tight, 800-1000 words. Make the protagonist relatable, their fear and determination clear. Use short, punchy sentences to build tension. Include specific details about the ship's systems, maybe some technical jargon to make it realistic. Maybe the protagonist is an engineer or someone who knows the ship well, which helps in the struggle to keep it running.
Possible name for the character? Maybe not necessary, but a few names could be used. Or just "they" and "the pilot."
Also, the title: "The Last Transmission" is given, so the story should reflect that—sending a message, even if no one's there to receive it, or in this case, maybe they do.
Need to avoid clichés, but it's a common sci-fi trope. Make it fresh with emotional depth.
Potential pitfalls: Making the protagonist too passive, not showing their struggle enough. Need to show, not tell. Also, ensuring the themes are woven naturally into the narrative.
Okay, time to start drafting.
Output
The cockpit reeked of burnt circuitry and sweat, a stench that clung to my skin like a second, cursed flesh. My fingers danced over the controls, but the readouts spat
🔧 Technical Details
The source document does not provide enough implementation detail to document here.
📄 License
The source document does not state license information.
⚠️ Important Notes
- Because of this model's unique nature, the GGUF quantizations can run on CPU, on GPU, or with partial GPU "offload", up to full precision.
- Imatrix processing is difficult for this model and needs a larger imatrix file with multilingual, multi-content (e.g. code/text) coverage.
- Before using the model, consult the settings documentation for critical parameters, samplers, and advanced sampler settings.
💡 Usage Tips
- Choose a model format that matches your hardware and needs.
- When joining the tests, pick the assistant type whose characteristics fit your use case.
- To set a system role, follow the setup instructions in the documentation referenced above.



