Qwen3-4B-NEO-Imatrix-Max-GGUF开源模型 - 支持长文对话，提升推理输出能力

首页

Qwen3 4B NEO Imatrix Max GGUF

由 DavidAU 开发

这是基于Qwen3-4B模型的NEO Imatrix量化版本，采用BF16格式的MAX输出张量以提升推理和输出生成能力，支持32k上下文长度。

大型语言模型开源协议:Apache-2.0 #长上下文推理 #思维链可视化 #创意文本生成

下载量 1,152

发布时间 : 4/29/2025

模型简介

该模型是Qwen3-4B的量化版本，专注于提升推理和文本生成能力，特别适用于创意用例。支持32k上下文长度，并可扩展至128k。

模型特点

NEO Imatrix量化

采用BF16格式的MAX输出张量量化，提升推理和输出生成能力。

长上下文支持

支持32k上下文长度，并可扩展至128k，适用于长文本生成任务。

深度推理能力

模型默认开启推理功能，可生成详细的思考过程和内心独白。

创意用例优化

在创意用例中表现突出，特别适合故事生成和对话写作。

模型能力

文本生成

深度推理

长上下文处理

创意写作

对话生成

使用案例

创意写作

故事生成

生成具有复杂情节和角色发展的故事。

可生成包含50%对话、25%叙述、15%肢体语言和10%内心活动的故事。

对话写作

生成具有潜台词和情感深度的对话。

通过展示而非讲述的方式，生成生动的对话内容。

推理任务

复杂问题解决

通过系统性推理过程解决复杂问题。

生成详细的思考过程和解决方案。

🚀 Qwen3-4B-NEO-Imatrix-Max-GGUF

Qwen3-4B-NEO-Imatrix-Max-GGUF 是基于“Qwen 3 - 4B”模型的 NEO Imatrix 量化版本，在 BF16 下具有最大“输出张量”，可显著提升推理和输出生成能力。

✨ 主要特性

NEO Imatrix 量化：使用内部生成的 NEO Imatrix 数据集进行量化，不同量化方式对 Imatrix 效果和输出质量有不同影响。
强大的上下文长度：支持 32K 上下文长度，输出生成可达 8K，并且可以扩展到 128K。
多样化的应用场景：不同量化方式适用于不同场景，如创意场景和强推理场景。
灵活的模板选择：可使用 Jinja 模板或 CHATML 模板，LMSTUDIO 用户可更新 Jinja 模板。
自定义系统角色：可根据需要设置系统角色，帮助模型更好地进行推理和思考。

📦 安装指南

文档未提供具体安装步骤，可参考原模型卡片 https://huggingface.co/Qwen/Qwen3-4B 获取相关信息。

💻 使用示例

基础用法

文档未提供具体代码示例，可参考原模型卡片 https://huggingface.co/Qwen/Qwen3-4B 获取使用示例。

📚 详细文档

量化说明

Imatrix 效果与量化关系：量化越低，Imatrix 效果越强。IQ4XS/IQ4NL 是质量和 Imatrix 效果平衡最佳的量化方式，适用于创意场景。对于强推理场景，建议使用较高的量化方式。Q8_0 量化仅为最大值，Imatrix 对该量化无影响，F16 为全精度。
上下文长度：支持 32K 上下文长度 + 8K 输出生成，可扩展到 128K。对于 65K、128K 或 256K 上下文的 4B 模型，可参考 https://huggingface.co/DavidAU/Qwen3-4B-Q8_0-65k-128k-256k-context-GGUF。

模板使用

Jinja 模板问题：如果 Jinja “自动模板” 出现问题，可使用 CHATML 模板。LMSTUDIO 用户可更新 Jinja 模板，参考 https://lmstudio.ai/neil/qwen3-thinking。
系统角色建议：大多数情况下，Qwen3 会自动生成推理/思考块，系统角色设置可选。建议的系统角色如下：

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

具体设置方法可参考文档 “Maximizing-Model-Performance-All...”。

可选增强

可使用以下内容替代“系统提示”或“系统角色”，进一步增强模型性能：

Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.

此内容可用于场景生成和场景延续功能的增强。

另一个可使用的系统提示如下，可通过更改“名称”调整其性能：

You are a deep thinking AI composed of 4 AIs - [MODE: Spock], [MODE: Wordsmith], [MODE: Jamet] and [MODE: Saten], - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in-depth solution. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.