Open-source Mistral-7B-Instruct-v0.2 model: pruned and compressed, keeping strong performance without retraining


Mistral 7B Instruct V0.2 Sparsity 20 V0.1

Developed by wang7776

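Mistral-7B-Instruct-v0.2 is an instruction-tuned large language model that improves on Mistral-7B-Instruct-v0.1. This version has been pruned to 20% sparsity with the Wanda pruning method and keeps competitive performance without retraining.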

Large language model

Transformers

License: Apache-2.0 #instruction-tuning #pruning-without-retraining #multi-turn-dialogue

Downloads: 80

Release date: 1/17/2024

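Model overview

This model is an instruction-tuned large language model intended mainly for text generation. It is specifically optimized for following instructions.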

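Model features

Wanda pruning

Pruned to 20% sparsity with the Wanda pruning method, keeping competitive performance without any retraining or weight updates.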

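Improved instruction tuning

Instruction tuning has been improved over v0.1, giving better instruction following.

Efficient attention

Uses grouped-query attention and sliding-window attention to improve inference efficiency.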
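To illustrate what sliding-window attention restricts, the pure-Python sketch below computes which key positions each query position may attend to under causal sliding-window attention. It assumes each position attends to itself and the previous `window - 1` positions. `sliding_window_mask` is a hypothetical helper for illustration only, and the tiny window is chosen just to keep the printout readable.

```python
def sliding_window_mask(seq_len: int, window: int) -> list:
    """Return a seq_len x seq_len boolean mask; mask[i][j] is True if query i may attend to key j.

    Assumption: causal attention limited to the current position plus the
    previous (window - 1) positions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # 'x' marks an allowed (query, key) pair, '.' a masked one
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    # x....
    # xx...
    # xxx..
    # .xxx.
    # ..xxx
```

Each row has at most `window` allowed entries, so the attention work per token stays bounded as the sequence grows. Plain causal attention, by contrast, grows with the full prefix.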

Model capabilities

Text generation

Instruction following

Dialogue systems

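Use cases

Dialogue systems

Cooking assistant

Can act as a cooking assistant that answers questions about condiments and recipes

Can give detailed condiment preferences and recipe suggestions

General Q&A

Knowledge Q&A

Can be used to answer a wide range of factual questions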

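🚀 Mistral-7B-Instruct-v0.2 Model Introduction

This project is an optimized large language model. A pruning method compresses it while keeping competitive performance. This card provides usage instructions and technical details to help you get the most out of it.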

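🚀 Quick Start

This model has been pruned to 20% sparsity using the Wanda pruning method. This method requires no retraining or weight updates and still achieves competitive performance. A link to the base model can be found here.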
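As a rough illustration of the pruning criterion, Wanda scores each weight by its magnitude multiplied by the L2 norm of the matching input feature over a batch of calibration activations. It then zeroes the lowest-scoring weights within each output row. The NumPy sketch below applies that rule to a single linear layer. The function name, the shapes and the random data are illustrative assumptions, not the code used to produce this checkpoint.

```python
import numpy as np


def wanda_prune(weight: np.ndarray, activations: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weight` with the lowest Wanda-scored entries of each row set to zero.

    weight:      (out_features, in_features) matrix of a linear layer
    activations: (n_tokens, in_features) calibration inputs to that layer
    sparsity:    fraction of weights to remove in every output row, e.g. 0.2
    """
    # L2 norm of each input feature across all calibration tokens
    feature_norms = np.linalg.norm(activations, axis=0)
    # Wanda score: |W_ij| * ||X_j||_2 (no gradients, no weight updates)
    scores = np.abs(weight) * feature_norms[np.newaxis, :]
    # Weights are compared within each output row; drop the same count per row
    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.copy()
    if n_prune > 0:
        # Column indices of the n_prune smallest scores in each row
        drop = np.argsort(scores, axis=1)[:, :n_prune]
        np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 10))
    X = rng.normal(size=(32, 10))
    W_pruned = wanda_prune(W, X, sparsity=0.2)
    print("fraction of zeros:", np.mean(W_pruned == 0))
```

The pruned weights are simply zeroed in place, and the remaining weights keep their original values. This is why no retraining step is involved.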

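The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruction fine-tuned version of Mistral-7B-Instruct-v0.1.

For full details of this model, please read our paper and release blog post.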

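✨ Key Features

Pruning: pruned to 20% sparsity with the Wanda pruning method, without retraining, while keeping competitive performance.
Instruction tuning: an improved instruction fine-tuned version of Mistral-7B-Instruct-v0.1 that follows instructions more reliably.
Architecture: based on Mistral-7B-v0.1, using Grouped-Query Attention, Sliding-Window Attention and a Byte-fallback BPE tokenizer.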

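📚 Documentation

Instruction format

To take advantage of instruction fine-tuning, surround your prompt with [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence id; later instructions should not. Each assistant generation ends with the end-of-sentence token id.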

For example:

text = (
    "<s>[INST] What is your favourite condiment? [/INST]"
    "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
    "[INST] Do you have mayonnaise recipes? [/INST]"
)

This format is available as a chat template via the apply_chat_template() method:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# This repo ID points to the base (unpruned) model; substitute the pruned checkpoint's repo ID to use this model
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Multi-turn conversation: user and assistant turns alternate, starting with the user
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Render the conversation in the [INST] ... [/INST] format and tokenize it
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

# Move the inputs and the model to the same device
model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])  # the prompt followed by the generated reply
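The chat template used above renders the conversation into the [INST] format. The pure-Python sketch below builds the same string as the example prompt earlier in this section, so you can see that rendering concretely without loading a tokenizer. `build_mistral_prompt` is a hypothetical helper. Its exact whitespace is copied from that example rather than from the tokenizer's template, so treat `apply_chat_template()` as the authoritative source. If you tokenize such a string yourself, make sure the tokenizer does not add a second begin-of-sentence token.

```python
def build_mistral_prompt(messages: list) -> str:
    """Hypothetical helper: render alternating user/assistant turns in the [INST] format."""
    prompt = "<s>"  # begin-of-sentence marker, placed only before the first instruction
    for i, msg in enumerate(messages):
        expected_role = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected_role:
            raise ValueError(f"message {i} should have role {expected_role!r}, got {msg['role']!r}")
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        else:
            # Each finished assistant turn is closed by the end-of-sentence marker
            prompt += f"{msg['content']}</s> "
    return prompt


if __name__ == "__main__":
    messages = [
        {"role": "user", "content": "What is your favourite condiment?"},
        {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. "
                                         "It adds just the right amount of zesty flavour to whatever "
                                         "I'm cooking up in the kitchen!"},
        {"role": "user", "content": "Do you have mayonnaise recipes?"},
    ]
    print(build_mistral_prompt(messages))
```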

Model Architecture

This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices:

Grouped-Query Attention
Sliding-Window Attention
Byte-fallback BPE tokenizer
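Grouped-query attention lets several query heads share one key/value head, which shrinks the key/value cache kept during generation. The sketch below shows the head mapping and the resulting cache size per token. The figures used (32 layers, 32 query heads, 8 key/value heads, head dimension 128, 16-bit values) are assumptions based on the published Mistral-7B configuration; check the checkpoint's config.json before relying on them. Both function names are hypothetical, and contiguous grouping of query heads is assumed.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the key/value head that query head `q_head` reads from (contiguous groups)."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads sharing one key/value head
    return q_head // group_size


def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of key/value cache stored for one token: keys and values, every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Assumed Mistral-7B shape: 32 layers, 32 query heads, 8 KV heads, head_dim 128
    print([kv_head_for_query_head(h, 32, 8) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
    gqa = kv_cache_bytes_per_token(32, 8, 128)
    mha = kv_cache_bytes_per_token(32, 32, 128)  # hypothetical: one KV head per query head
    print(gqa, mha, mha // gqa)  # 131072 524288 4
```

Under these assumptions, sharing each key/value head across four query heads cuts the per-token cache to a quarter of what full multi-head attention would need.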

Troubleshooting

If you see the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/transformers/models/auto/auto_factory.py", line 482, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/transformers/models/auto/configuration_auto.py", line 1022, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/transformers/models/auto/configuration_auto.py", line 723, in __getitem__
    raise KeyError(key)
KeyError: 'mistral'

Installing transformers from source should solve the issue:

pip install git+https://github.com/huggingface/transformers

This should not be necessary after transformers-v4.33.4.
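To check ahead of time whether your installed release is affected, the sketch below reads the installed transformers version with the standard-library `importlib.metadata` and compares it against 4.33.4, following the note above. The helper names are hypothetical. The version parsing is deliberately simple: it only looks at the leading major.minor.patch numbers and ignores suffixes such as `.dev0`.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '4.33.4', '4.34' or '4.36.0.dev0'."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def needs_source_install(version: str) -> bool:
    """True if `version` is not newer than 4.33.4, the release named in the note above."""
    return parse_version(version) <= (4, 33, 4)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        if needs_source_install(installed):
            print(f"transformers {installed} may not recognise model_type 'mistral'; upgrade it")
        else:
            print(f"transformers {installed} should already support Mistral")
```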

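Limitations

The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We look forward to working with the community on ways to make the model strictly respect guardrails, so that it can be deployed in environments that require moderated outputs.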

Development Team

Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.