Granite-3.2-2B-Instruct open-source AI model: free deployment for thinking and reasoning tasks

Granite 3.2 2b Instruct GGUF

GGUF conversion by Mungert

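Granite-3.2-2B-Instruct is a 2-billion-parameter, long-context AI model fine-tuned for thinking (reasoning) capabilities. Built on Granite-3.1-2B-Instruct, it was trained on a mix of permissively licensed open-source datasets and internally generated synthetic data to improve performance on reasoning tasks.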

LLM license: Apache-2.0 #long-context reasoning #multilingual instruction following #commercial AI assistant

Downloads: 754

Released: 3/18/2025

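Model overview

The model is designed for general instruction-following tasks and can be integrated into AI assistants, including commercial applications. Its thinking capability is controllable, so it is applied only when needed.

Model features

Long-context support

Handles long-context tasks such as summarizing long documents or meetings and answering questions over long documents.

Multilingual capability

Supports 12 languages, including English, Chinese, and Japanese, and can be fine-tuned for other languages.

Thinking/reasoning optimization

Fine-tuned specifically for thinking (reasoning), which can be switched on or off as needed.

Business-friendly license

Released under the Apache 2.0 license, suitable for integration into commercial applications.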

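Model capabilities

Thinking/reasoning

Summarization

Text classification

Text extraction

Question answering

Retrieval-augmented generation (RAG)

Code-related tasks

Function-calling tasks

Multilingual dialogue

Long-context processing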

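Use cases

Business assistants

Intelligent customer service

Integrated into business customer-service systems to provide multilingual customer support.

Can improve support efficiency and reduce staffing costs.

Meeting minutes generation

Automatically generates meeting summaries and action items.

Saves note-taking time and improves productivity.

Developer tools

Coding assistance

Helps developers understand, generate, and optimize code.

Can speed up development and reduce coding errors.

Content creation

Multilingual content generation

Generates marketing copy, product descriptions, and other content in different languages.

Simplifies multilingual content workflows.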

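🚀 Granite-3.2-2B-Instruct GGUF models

Granite-3.2-2B-Instruct is a fine-tuned AI model with 2 billion parameters and long-context capability. Built on Granite-3.1-2B-Instruct and trained on open-source datasets plus internally generated synthetic data, it lets you control whether its thinking capability is used and suits a wide range of instruction-following tasks.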

🚀 Quick start

Install dependencies

pip install torch torchvision torchaudio
pip install accelerate
pip install transformers

Code example

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

model_path = "ibm-granite/granite-3.2-2b-instruct"
# Use the GPU when one is available; otherwise fall back to the (much slower) CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

conv = [{"role": "user", "content": "You have 10 liters of a 30% acid solution. How many liters of a 70% acid solution must be added to achieve a 50% acid mixture?"}]

# thinking=True enables the model's reasoning mode; pass thinking=False to turn it off.
inputs = tokenizer.apply_chat_template(
    conv,
    return_tensors="pt",
    thinking=True,
    return_dict=True,
    add_generation_prompt=True,
).to(device)

set_seed(42)
output = model.generate(
    **inputs,
    max_new_tokens=8192,
)

# Decode only the newly generated tokens, skipping the prompt.
prediction = tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(prediction)
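With `thinking=True`, the sample output further down separates the reasoning from the final answer using the literal phrases "Here is my thought process:" and "Here is my response:". The minimal sketch below splits `prediction` into those two parts. It assumes the model reproduces the second phrase verbatim, which is not guaranteed; the helper name `split_thinking` is hypothetical and not part of transformers.

```python
# Marker phrases as they appear in the sample thinking=True output
# (assumption: the model emits them verbatim).
THOUGHT_MARKER = "Here is my thought process:"
RESPONSE_MARKER = "Here is my response:"


def split_thinking(prediction: str) -> tuple[str, str]:
    """Split a thinking=True generation into (thoughts, response).

    If the response marker is missing, the whole text is treated as the
    response and the thoughts part is returned empty.
    """
    head, sep, tail = prediction.partition(RESPONSE_MARKER)
    if not sep:
        return "", prediction.strip()
    # Drop the leading thought-process marker, if present.
    thoughts = head.replace(THOUGHT_MARKER, "", 1).strip()
    return thoughts, tail.strip()


if __name__ == "__main__":
    sample = (
        "Here is my thought process:\n"
        "3 + 0.70x = 5 + 0.50x, so x = 10.\n\n"
        "Here is my response:\n"
        "Add 10 liters of the 70% solution."
    )
    thoughts, response = split_thinking(sample)
    print(thoughts)  # 3 + 0.70x = 5 + 0.50x, so x = 10.
    print(response)  # Add 10 liters of the 70% solution.
```

In the quick-start script you would call `thoughts, answer = split_thinking(prediction)` right after decoding; with `thinking=False` the helper simply returns the whole text as the answer.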

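✨ Key features

Controllable thinking: the model's thinking capability can be switched on or off as needed, so it is used only when necessary.
Multilingual support: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
Broad task coverage: summarization, text classification, text extraction, question answering, retrieval-augmented generation (RAG), code-related tasks, function-calling tasks, multilingual dialogue, and long-context tasks.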

Example output of the code above

thinking=True

Here is my thought process:

1. First, let's understand the composition of the initial 10 liters of 30% acid solution:
   - Acid content = 30% of 10 liters = 0.30 * 10 = 3 liters

2. We want to end up with a 50% acid solution. Let's denote the volume of the 70% acid solution to be added as 'x' liters.

3. The amount of acid in the final mixture (50% solution) should equal the initial acid content plus the acid from the added 70% solution.
   - Total acid in final mixture = 0.50 * (10 + x)

4. The equation representing the total acid content is:
   3 (from initial solution) + 0.70 * x (from added solution) = 0.50 * (10 + x)

5. Now, we solve for 'x':
   3 + 0.70x = 5 + 0.50x
   0.70x - 0.50x = 5 - 3
   0.20x = 2
   x = 2 / 0.20
   x = 10 liters

6. Therefore, 10 liters of a 70% acid solution must be added to the initial 10 liters of 30% acid solution to achieve a 50% acid mixture.

Here is my response:

To solve this problem, we need to determine how much of a 70% acid solution must be added to a 10-liter solution that is already 30% acid to create a new mixture that is 50% acid. Here's how to calculate it:

1. **Initial Acid Content**:
   - The initial 10-liter solution contains 3 liters of acid (since 30% of 10 liters).

2. **Desired Final Acid Content**:
   - We want the final mixture to be 50% acid.

3. **Equation Setup**:
   - Let 'x' be the volume (in liters) of the 70% acid solution to be added.
   - The equation representing the total acid content in the final mixture (which is 10 liters from the initial solution plus 'x' liters of the added solution) should equal 50% of the total volume (10 + x).
   - This gives us: 3 (acid from initial solution) + 0.70x (acid from added solution) = 0.50(10 + x)

4. **Solving the Equation**:
   - First, simplify the right side of the equation: 0.50(10 + x) = 5 + 0.50x
   - Now, the equation is: 3 + 0.70x = 5 + 0.50x
   - Subtract 0.50x from both sides: 0.20x = 2
   - Divide both sides by 0.20: x = 2 / 0.20
   - x = 10 liters

5. **Conclusion**:
   - To achieve a 50% acid mixture from the initial 30% solution, you need to add 10 liters of a 70% acid solution.

So, the final answer is that 10 liters of a 70% acid solution must be added to the initial 10 liters of 30% acid solution.

thinking=False

To solve this problem, we'll use the concept of volume and concentration. Let's denote the amount of 70% acid solution we need to add as "x" liters.

First, let's find out how much acid is in the initial 10-liter solution:

Acid in initial solution = 30% of 10 liters = 0.30 * 10 = 3 liters

Now, we want to end up with a 50% acid solution in a total volume of (10 + x) liters. Let's denote the final volume as V.

Final acid concentration = 50%
Final acid amount = 50% of V = 0.50 * V

We know the initial acid amount and the final acid amount, so we can set up an equation:

Initial acid amount + Acid from added solution = Final acid amount
3 liters + (70% of x) = 0.50 * (10 + x)

Now, let's solve for x:

0.70x + 3 = 0.50 * 10 + 0.50x
0.70x - 0.50x = 0.50 * 10 - 3
0.20x = 5 - 3
0.20x = 2
x = 2 / 0.20
x = 10 liters

So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30% acid solution to achieve a 50% acid mixture.
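Both outputs reduce the problem to the same acid balance, 0.30 × 10 + 0.70x = 0.50(10 + x). As a quick sanity check of the arithmetic, the sketch below solves the general form of that balance with exact fractions, avoiding floating-point rounding. The function `volume_to_add` is an illustrative helper, not something provided by the model or any library.

```python
from fractions import Fraction


def volume_to_add(v0, c0, c_add, c_target):
    """Volume of a c_add solution to mix into v0 liters at concentration c0
    so that the mixture reaches concentration c_target.

    From the acid balance c0*v0 + c_add*x = c_target*(v0 + x):
        x = v0 * (c_target - c0) / (c_add - c_target)
    """
    if not (min(c0, c_add) < c_target < max(c0, c_add)):
        raise ValueError("target concentration must lie strictly between c0 and c_add")
    return v0 * (c_target - c0) / (c_add - c_target)


x = volume_to_add(Fraction(10), Fraction(3, 10), Fraction(7, 10), Fraction(1, 2))
print(x)  # 10
```

The range check reflects the physical constraint that mixing two solutions can only produce a concentration between the two inputs.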

📚 Detailed documentation

Choosing the right model format

The right format depends on your hardware capabilities and memory constraints.

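BF16 (Brain Float 16): use if BF16 acceleration is available

A 16-bit floating-point format designed for faster computation while retaining good precision.
Offers a dynamic range similar to FP32 with lower memory use.
Recommended if your hardware supports BF16 acceleration (check your device's specifications).
Well suited to high-performance inference with a smaller memory footprint than FP32.

📌 Use BF16 if: ✔ your hardware has native BF16 support (e.g., newer GPUs, TPUs); ✔ you want higher precision while still saving memory; ✔ you plan to requantize the model into another format.

📌 Avoid BF16 if: ❌ your hardware does not support BF16 (it may fall back to FP32 and run more slowly); ❌ you need compatibility with older devices that lack BF16 optimization.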

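F16 (Float 16): more widely supported than BF16

A 16-bit floating-point format with finer precision than BF16 (10 mantissa bits versus 7) but a narrower range of representable values.
Works on most devices with FP16 acceleration (including many GPUs and some CPUs).
Its narrower dynamic range can cause overflow or underflow for some values, but it is generally sufficient for inference.

📌 Use F16 if: ✔ your hardware supports FP16 but not BF16; ✔ you need a balance of speed, memory use, and accuracy; ✔ you run on a GPU or other device optimized for FP16 compute.

📌 Avoid F16 if: ❌ your device lacks native FP16 support (it may run slower than expected); ❌ you are memory-constrained.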

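Quantized models (Q4_K, Q6_K, Q8_0, etc.): for CPU and low-VRAM inference

Quantization reduces model size and memory use while preserving as much accuracy as possible.

Lower-bit models (Q4_K) → lowest memory use, but possibly lower precision.
Higher-bit models (Q6_K, Q8_0) → better accuracy, but more memory required.

📌 Use quantized models if: ✔ you run inference on a CPU and need an optimized model; ✔ your device has too little VRAM to load a full-precision model; ✔ you want to reduce memory use while keeping reasonable accuracy.

📌 Avoid quantized models if: ❌ you need maximum accuracy (full-precision models are better); ❌ your hardware has enough VRAM for higher-precision formats (BF16/F16).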
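Read together, the BF16, F16, and quantized-model guidance above amounts to a small decision rule. A minimal sketch of one possible reading follows. The function `choose_format` and its boolean inputs are hypothetical: it does not detect hardware (you supply what you know about your device), and the ordering of the checks is an interpretation of this section rather than a rule stated by the author.

```python
def choose_format(native_bf16: bool, native_fp16: bool, fits_16bit: bool,
                  prefer_accuracy: bool = False, arm_cpu: bool = False) -> str:
    """Return a suggested weight format (hypothetical helper).

    native_bf16 / native_fp16: whether the device accelerates BF16 / FP16.
    fits_16bit: whether a full 16-bit model fits in the available (V)RAM.
    prefer_accuracy: among quantized formats, trade memory for accuracy.
    arm_cpu: running on an ARM CPU, where Q4_0 is suggested.
    """
    # Full 16-bit precision when both hardware support and memory allow it.
    if fits_16bit and native_bf16:
        return "BF16"
    if fits_16bit and native_fp16:
        return "F16"
    # Otherwise fall back to a quantized model (CPU or low-VRAM inference).
    if arm_cpu:
        return "Q4_0"
    return "Q8_0" if prefer_accuracy else "Q4_K"


print(choose_format(native_bf16=True, native_fp16=True, fits_16bit=True))     # BF16
print(choose_format(native_bf16=False, native_fp16=True, fits_16bit=True))    # F16
print(choose_format(native_bf16=False, native_fp16=False, fits_16bit=False))  # Q4_K
```

The 16-bit branches come first because the section recommends full precision whenever the hardware and memory allow it; everything else lands on a quantized file.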

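Very low-bit quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)

These models are optimized for extreme memory efficiency, making them well suited to low-power devices or large-scale deployments where memory is the main constraint.

IQ3_XS: ultra-low-bit (3-bit) quantization with extreme memory efficiency.
- Use case: ultra-low-memory devices where even Q4_K is too large.
- Trade-off: lower accuracy than higher-bit quantizations.
IQ3_S: small block size for maximum memory efficiency.
- Use case: low-memory devices where IQ3_XS is too aggressive.
IQ3_M: medium block size, more accurate than IQ3_S.
- Use case: low-memory devices where IQ3_S is too limiting.
Q4_K: 4-bit quantization with block-wise optimization for better accuracy.
- Use case: low-memory devices where Q6_K is too large.
Q4_0: plain 4-bit quantization, optimized for ARM devices.
- Use case: ARM-based devices or low-memory environments.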
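Several formats above are described in terms of blocks. To make the idea concrete, here is a simplified sketch of block-wise absmax quantization in the spirit of llama.cpp's Q8_0 (blocks of 32 weights sharing one scale). It is not the llama.cpp code: the real Q4_0, K-quant, and IQ formats use different offsets, rounding, and block layouts. The weights are synthetic, and the sketch only shows why fewer bits per weight means larger rounding error.

```python
import random

BLOCK = 32  # weights per block; llama.cpp's Q8_0 and Q4_0 also use 32


def quantize_block(block, bits):
    """Symmetric absmax quantization of one block to signed ints (simplified)."""
    qmax = 2 ** (bits - 1) - 1                # 127 for 8 bits, 7 for 4 bits
    amax = max(abs(w) for w in block)
    scale = amax / qmax if amax > 0 else 1.0  # one shared scale per block
    return scale, [round(w / scale) for w in block]


def dequantize_block(scale, q):
    return [scale * v for v in q]


def mean_abs_error(weights, bits):
    """Average |w - dequant(quant(w))| over all weights, block by block."""
    total = 0.0
    for start in range(0, len(weights), BLOCK):
        block = weights[start:start + BLOCK]
        scale, q = quantize_block(block, bits)
        total += sum(abs(w - r) for w, r in zip(block, dequantize_block(scale, q)))
    return total / len(weights)


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK * 64)]  # synthetic weights
for bits in (8, 6, 4, 3):
    print(f"{bits}-bit blocks: mean abs error {mean_abs_error(weights, bits):.2e}")
```

The error grows as the bit width shrinks, which is the accuracy/memory trade-off described above; the real low-bit formats narrow that gap with more elaborate encodings than a single per-block scale.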

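Model format selection summary

Format	Precision	Memory use	Device requirements	Best use case
BF16	Highest	High	GPUs/CPUs with BF16 support	High-speed inference with reduced memory
F16	High	High	Devices with FP16 support	GPU inference when BF16 is unavailable
Q4_K	Medium-low	Low	CPUs or low-VRAM devices	Best choice for memory-constrained environments
Q6_K	Medium	Moderate	CPUs with more memory	Better accuracy among quantized models
Q8_0	High	Moderate	CPUs or GPUs with enough VRAM	Highest accuracy among quantized models
IQ3_XS	Very low	Very low	Ultra-low-memory devices	Extreme memory efficiency, low accuracy
Q4_0	Low	Low	ARM or low-memory devices	llama.cpp can optimize it for ARM devices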
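The "Memory use" column can be made roughly quantitative: size ≈ number of weights × bits per weight / 8. A back-of-the-envelope sketch follows. It assumes about 2 × 10^9 weights (taken from the "2B" in the model name; the exact count differs) and the nominal bits per weight of llama.cpp's block formats, including their per-block scales. It ignores file metadata and the fact that some files keep embeddings and output at higher precision, so treat the results as estimates, not actual file sizes.

```python
# Nominal bits per weight, including per-block scale overhead.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale per block
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
    "Q4_K": 4.5,     # 144 bytes per 256-weight super-block
    "Q4_0": 4.5,     # 32 4-bit weights + one fp16 scale per block
}


def estimated_size_gb(n_weights: float, fmt: str) -> float:
    """Rough weight-storage size in GB (10**9 bytes)."""
    return n_weights * BITS_PER_WEIGHT[fmt] / 8 / 1e9


N_WEIGHTS = 2e9  # assumption: "2B" from the model name; the real count differs
for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:5s} ~{estimated_size_gb(N_WEIGHTS, fmt):.2f} GB")
```

Under these assumptions BF16 comes to about 4 GB and Q4_K to about 1.1 GB. Memory use at runtime is higher, because the context (KV cache) also needs space.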

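Included files and details

`granite-3.2-2b-instruct-bf16.gguf`

Model weights stored in BF16.
Use this file if you want to requantize the model into a different format.
Best choice if your device supports BF16 acceleration.

`granite-3.2-2b-instruct-f16.gguf`

Model weights stored in F16.
Use this file if your device supports FP16, especially when BF16 is unavailable.

`granite-3.2-2b-instruct-bf16-q8_0.gguf`

Output and embeddings kept in BF16.
All other layers quantized to Q8_0.
Use this file if your device supports BF16 and you want a quantized version.

`granite-3.2-2b-instruct-f16-q8_0.gguf`

Output and embeddings kept in F16.
All other layers quantized to Q8_0.
Use this file if your device supports FP16 and you want a quantized version.

`granite-3.2-2b-instruct-q4_k.gguf`

Output and embeddings quantized to Q8_0.
All other layers quantized to Q4_K.
Suited to CPU inference with limited memory.

`granite-3.2-2b-instruct-q4_k_s.gguf`

The smallest Q4_K variant, using less memory at the cost of accuracy.
Best for very-low-memory setups.

`granite-3.2-2b-instruct-q6_k.gguf`

Output and embeddings quantized to Q8_0.
All other layers quantized to Q6_K.

`granite-3.2-2b-instruct-q8_0.gguf`

Fully Q8_0-quantized model for better accuracy.
Needs more memory but offers higher precision.

`granite-3.2-2b-instruct-iq3_xs.gguf`

IQ3_XS quantization, optimized for extreme memory efficiency.
Best for ultra-low-memory devices.

`granite-3.2-2b-instruct-iq3_m.gguf`

IQ3_M quantization with a medium block size for better accuracy.
Suited to low-memory devices.

`granite-3.2-2b-instruct-q4_0.gguf`

Plain Q4_0 quantization, optimized for ARM devices.
Best for low-memory environments.
For better accuracy, consider IQ4_NL.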

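Testing the models

If you find these models useful, please help test my AI-powered network monitor assistant with quantum-ready security checks: 👉 Free Network Monitor

💬 How to test:

Click the chat icon (bottom right of any page).
Choose an AI assistant type:
- TurboLLM (GPT-4-mini)
- FreeLLM (open source)
- TestLLM (experimental, CPU-only)

What I'm testing

I'm exploring the limits of small open-source models for AI network monitoring, specifically:

Function calling against live network services.
How small a model can be while still handling:
- Automated Nmap scans.
- Quantum-readiness checks.
- Metasploit integration.

🟡 TestLLM: the current experimental model (llama.cpp on 6 CPU threads):

✅ Zero-configuration setup
⏳ 30-second load time (slow inference, but no API costs)
🔧 Help wanted! If you are interested in edge-device AI, let's collaborate!

Other assistants

🟢 TurboLLM: uses gpt-4-mini for:

Real-time network diagnostics
Automated penetration testing (Nmap/Metasploit)
🔑 Get more tokens by downloading our free network monitor agent.

🔵 HugLLM: open-source models (about 8 billion parameters):

2x more tokens than TurboLLM
AI log analysis
🌐 Runs on the Hugging Face Inference API.

Example AI commands to test

"Give me info on my website's SSL certificate"
"Check if my server is using quantum-safe encryption for communication"
"Run a quick Nmap vulnerability test"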

Evaluation results

模型	ArenaHard	Alpaca-Eval-2	MMLU	PopQA	TruthfulQA	BigBenchHard	DROP	GSM8K	HumanEval	HumanEval+	IFEval	AttaQ
Llama-3.1-8B-Instruct	36.43	27.22	69.15	28.79	52.79	72.66	61.48	83.24	85.32	80.15	79.10	83.43
DeepSeek-R1-Distill-Llama-8B	17.17	21.85	45.80	13.25	47.43	65.71	44.46	72.18	67.54	62.91	66.50	42.87
Qwen-2.5-7B-Instruct	25.44	30.34	74.30	18.12	63.06	70.40	54.71	84.46	93.35	89.91	74.90	81.90
DeepSeek-R1-Distill-Qwen-7B	10.36	15.35	50.72	9.94	47.14	65.04	42.76	78.47	79.89	78.43	59.10	42.45
Granite-3.1-8B-Instruct	37.58	30.34	66.77	28.7	65.84	68.55	50.78	79.15	89.63	85.79	73.20	85.73
Granite-3.1-2B-Instruct	23.3	27.17	57.11	20.55	59.79	54.46	18.68	67.55	79.45	75.26	63.59	84.7
Granite-3.2-8B-Instruct	55.25	61.19	66.79	28.04	66.92	64.77	50.95	81.65	89.35	85.72	74.31	85.42
Granite-3.2-2B-Instruct	24.86	34.51	57.18	20.56	59.8	52.27	21.12	67.02	80.13	73.39	61.55	83.23
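The two Granite 2B rows can be compared directly. This short script uses only the numbers copied from the table above and prints the per-benchmark change from Granite-3.1-2B-Instruct to Granite-3.2-2B-Instruct. A positive value means the 3.2 model scores higher.

```python
BENCHMARKS = ["ArenaHard", "Alpaca-Eval-2", "MMLU", "PopQA", "TruthfulQA", "BigBenchHard",
              "DROP", "GSM8K", "HumanEval", "HumanEval+", "IFEval", "AttaQ"]
# Scores copied from the evaluation table.
GRANITE_31_2B = [23.3, 27.17, 57.11, 20.55, 59.79, 54.46, 18.68, 67.55, 79.45, 75.26, 63.59, 84.7]
GRANITE_32_2B = [24.86, 34.51, 57.18, 20.56, 59.8, 52.27, 21.12, 67.02, 80.13, 73.39, 61.55, 83.23]

# Per-benchmark difference (3.2 minus 3.1), rounded to the table's two decimals.
deltas = {
    name: round(new - old, 2)
    for name, old, new in zip(BENCHMARKS, GRANITE_31_2B, GRANITE_32_2B)
}

# Print from largest gain to largest drop.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {delta:+.2f}")
```

On these numbers the largest gain is on Alpaca-Eval-2 (+7.34) and the largest drop is on BigBenchHard (-2.19); most other benchmarks move by less than about two points.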