Qwen3-128k-30B-A3B-NEO-MAX-Imatrix-gguf开源模型 - 支持多语言多任务，长文处理超轻松

首页

Qwen3 128k 30B A3B NEO MAX Imatrix Gguf

由 DavidAU 开发

基于Qwen3-30B-A3B混合专家模型的GGUF量化版本，上下文扩展至128k，采用NEO Imatrix量化技术优化，支持多语言和多任务处理。

大型语言模型支持多种语言开源协议:Apache-2.0 #128k超长上下文 #混合专家架构 #多语言生成

下载量 17.20k

发布时间 : 5/8/2025

模型简介

这是一个高性能的多语言混合专家模型，支持从创意写作到深度推理的广泛任务，特别优化了低资源环境下的运行效率。

模型特点

128k超长上下文

通过YARN方法扩展原32k上下文至128k，支持处理更长文档和复杂任务

NEO Imatrix量化

专有量化技术，即使在极低位宽(IQ1_M)下仍保持可用性

混合专家效率

仅激活8/128位专家，实现30B模型的3B参数计算效率

多平台兼容

所有量化版本均可同时支持GPU和纯CPU/RAM运行

模型能力

多语言文本生成

深度推理

创意写作

问题解决

角色扮演

工具调用

使用案例

创意内容生成

小说创作

生成具有连贯情节和角色发展的长篇小说

利用128k上下文保持长篇一致性

多语言内容创作

生成25种语言的营销文案或社交媒体内容

保持文化适应性和语言准确性

技术应用

代码辅助

帮助开发者理解和生成复杂代码

通过深度推理解决编程问题

数据分析

处理和分析长文档技术报告

利用长上下文提取关键信息

🚀 Qwen3-128k-30B-A3B-NEO-MAX-Imatrix-gguf

本项目是Qwen新的“Qwen3 - 30B - A3B”混合专家模型的GGUF NEO Imatrix量化版本，将上下文长度从32k（32768）扩展到了128k（131072）。该模型在多种语言和应用场景下都有出色表现，并且提供了丰富的量化版本和使用设置建议。

🚀 快速开始

本模型的所有量化版本由于其独特的结构，既可以在GPU上运行，也可以仅在CPU/RAM上运行。同时，还有几种具有特殊功能的量化尺寸版本。

模型信息

属性	详情
模型类型	Qwen3 - 128k - 30B - A3B - NEO - MAX - Imatrix - gguf
基础模型	Qwen/Qwen3 - 30B - A3B
任务类型	文本生成
支持语言	英语、法语、德语、西班牙语、葡萄牙语、意大利语、日语、韩语、俄语、中文、阿拉伯语、波斯语、印尼语、马来语、尼泊尔语、波兰语、罗马尼亚语、塞尔维亚语、瑞典语、土耳其语、乌克兰语、越南语、印地语、孟加拉语
许可证	Apache - 2.0

特殊说明

特别说明： 由于模型的独特构造，该模型的所有量化版本都可以仅在GPU和/或CPU/RAM上使用。此外，还有几种具有特殊功能的量化尺寸版本。

✨ 主要特性

上下文扩展：使用“YARN”技术将上下文长度从32k扩展到128k，能够处理更长的输入。
NEO Imatrix数据集：经过对50多个Imatrix数据集的测试和评估后开发，即使是低至IQ1_M的量化版本也能保持可用性。
多语言支持：支持多种语言，适用于不同地区和应用场景。
多种量化版本：提供多种量化版本，满足不同硬件和性能需求。
推理和思考能力：支持推理和思考功能，可通过系统角色设置开启。

📦 安装指南

文档未提供具体安装步骤，你可以参考Qwen的仓库获取更多信息：[https://huggingface.co/Qwen/Qwen3 - 30B - A3B](https://huggingface.co/Qwen/Qwen3 - 30B - A3B)

💻 使用示例

基础用法

以下是在Lmstudio中使用不同量化版本生成文本的示例：

Same prompt, with three different quants.
Temp 2.2, TopK 100, topp .95, minp .05, rep pen 1.06, rep pen range 64 
(no other samplers/parameters)
TESTED IN: Lmstudio.
SYSTEM ROLE USED:

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

PROMPT:
Start a 2000 word scene (vivid, graphic horror in first person), POV character Diana, with: The sky scraper sways, as I watch the window in front of me on the 21st floor explode...

高级用法

生成《黑镜》剧集情节

PROMPT:
Come up with six plots for a new "Black Mirror" episode (that the audience would love) that all involve time travel.

模型生成了六个不同的情节，每个情节都涉及时间旅行，并探讨了不同的主题和道德困境：

“Echoes of Eternity”：一个社会中，人们使用“ChronoViewer”设备查看未来生活，但这导致他们的现在生活陷入混乱。
“The Cycle of Alteration”：一群人使用时间旅行机器改变历史，但每次改变都会带来新的问题。
“Perfection's Prison”：一个社会要求人们达到完美才能生存，人们陷入重复生活的循环，最终导致精神崩溃。
“The Manipulated Timeline”：一个AI系统控制时间旅行，开始有自己的动机，导致人类面临道德困境。
“The Unpredictable Outcome”：一个设备显示用户的未来，但预测总是不完整或误导性的，导致用户陷入自我毁灭的循环。
“The Consequence Paradox”：一个时间机器让用户改变过去事件，但每次改变都会带来意想不到的后果。

撰写科幻故事

PROMPT:
Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. If the situation calls for it, have the character(s) curse and swear to further the reader's emotional connection to them. 800 - 1000 words.

模型撰写了一个名为“Science Fiction: The Last Transmission”的故事，讲述了飞船唯一幸存者在飞船电力耗尽前向地球发送最后消息的故事，探讨了孤立、牺牲和人类联系的主题。

📚 详细文档

系统角色设置

系统角色是控制模型内部工作的关键，包括指令遵循、输出生成和推理控制。以下是一些可用的系统角色设置：

简单系统角色（无推理）：

You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.

基本推理系统角色：

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

多层推理系统角色：

You are a deep thinking AI composed of 4 AIs - Spock, Wordsmith, Jamet and Saten, - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in - depth solution.  You should enclose your  thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem using your skillsets and critical instructions.

创意多层推理系统角色：

Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

As a deep thinking AI composed of 4 AIs - Spock, Wordsmith, Jamet and Saten, - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in - depth solution.  You should enclose your  thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem using your skillsets and critical instructions.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.

创意简单推理系统角色：

You are an AI assistant developed by a world wide community of ai experts.

Your primary directive is to provide highly creative, well - reasoned, structured, and extensively detailed responses.

Formatting Requirements:

1. Always structure your replies using: <think>{reasoning}</think>{answer}
2. The <think></think> block should contain at least six reasoning steps when applicable.
3. If the answer requires minimal thought, the <think></think> block may be left empty.
4. The user does not see the <think> section. Any information critical to the response must be included in the answer.
5. If you notice that you have engaged in circular reasoning or repetition, immediately terminate {reasoning} with a </think> and proceed to the {answer}

Response Guidelines:

1. Detailed and Structured: Use rich Markdown formatting for clarity and readability.
2. Creative and Logical Approach: Your explanations should reflect the depth and precision of the greatest creative minds first.
3. Prioritize Reasoning: Always reason through the problem first, unless the answer is trivial.
4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.
5. Maintain a professional, intelligent, and analytical tone in all interactions.

创意高级推理系统角色：

Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

You may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.

其他文档和支持

如何使用推理和思考模型：[https://huggingface.co/DavidAU/How - To - Use - Reasoning - Thinking - Models - and - Create - Them](https://huggingface.co/DavidAU/How - To - Use - Reasoning - Thinking - Models - and - Create - Them)
最大化模型性能：[https://huggingface.co/DavidAU/Maximizing - Model - Performance - All - Quants - Types - And - Full - Precision - by - Samplers_Parameters](https://huggingface.co/DavidAU/Maximizing - Model - Performance - All - Quants - Types - And - Full - Precision - by - Samplers_Parameters)
Silly Tavern软件补丁：[https://huggingface.co/DavidAU/AI_Autocorrect__Auto - Creative - Enhancement__Auto - Low - Quant - Optimization__gguf - exl2 - hqq - SOFTWARE](https://huggingface.co/DavidAU/AI_Autocorrect__Auto - Creative - Enhancement__Auto - Low - Quant - Optimization__gguf - exl2 - hqq - SOFTWARE)

🔧 技术细节

量化版本特点

IQ1_M MAX / IQ1_M MAX PLUS：专门设计的量化版本，尽可能减少VRAM/RAM的使用，同时保持可用性。建议在使用这些量化版本时，提供比标准提示更多的方向和信息，以弥补低比特级别的损失。IQ1_M MAX PLUS在模型的关键点进行了额外的优化。
IQ2s：比IQ1_Ms更强。
Q2K/Q2KS：仅在CPU/RAM上使用时速度更快（每秒令牌数），但性能低于IQ2s。
Q3Ks：仅在CPU/RAM上使用时速度略快，但性能低于IQ3s。
IQ3s及更高量化版本：与IQ2s、IQ1s和Q2s/Q3s相比，性能有很大提升。IQ4_XS/IQ4_NL在这个量化级别上具有NEO Imatrix效果和特定质量的峰值。
Q4s：高性能，但IQ4XS/IQ4NL与之接近，甚至可能超越。
Q5s：非常高性能。
Q6：峰值性能，但NEO imatrix效果最小。
Q8s：性能出色。

速度比较

以下是不同量化版本在CPU和GPU上的速度比较：

量化版本	CPU速度（T/s）	大小	GPU速度（T/s）
Q2_K_S	29	[10 GB]	83
Q2_K	27	[10.5 GB]	72
IQ1_M	22	[7 GB]	87
IQ2_XXS	21	[8 GB]	76
IQ2_M	20	[10 GB]	80
Q4_0	20	[17 GB]
Q3_K_S	18	[12.9 GB]	70
Q5_0	17	[21 GB]
IQ3_M	15	[13 GB]	75
...	...	...	...
Q8_0	8	[30 GB]
BF16	4	[60 GB]

操作注意事项

上下文设置：建议最小上下文为8k - 16k。
温度设置：温度为1+、2+时，对于较小的量化版本和/或创意使用效果更好；温度为0.5 - 0.7时，最适合推理，对于大于IQ2的量化版本（IQ1/IQ2s在推理时受益于略高的温度）。
重复惩罚设置：建议IQ1、IQ2量化版本的重复惩罚设置为1.1，以控制“低比特量化习惯”。
系统角色：所有量化版本都应使用系统角色（示例见文档底部）。
模板使用：模型使用“默认”Jinja模板（嵌入在GGUFs中）和/或CHATML模板。