OpenHermes 2.5 Open-Source Language Model - Free to Deploy, Efficient Code Generation and General-Purpose Task Handling

Openhermes 2.5 Mistral 7B GPTQ

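Quantized by TheBloke

OpenHermes 2.5 is an advanced language model fine-tuned from Mistral-7B. It focuses on code generation and general-purpose tasks and outperforms its predecessors.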

Large Language Model

Transformers

English

License: Apache-2.0

#Multi-turn dialogue optimization #Enhanced coding ability #Fine-tuned on GPT-4 data

Downloads: 695

Release date: 11/2/2023

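Model Overview

OpenHermes 2.5 is a Mistral-7B fine-tune developed by Teknium. Training on additional code datasets improved its results on several benchmarks, and it is particularly strong at code generation and complex problem solving.

Model Features

Enhanced coding ability

Training on additional code datasets raised the HumanEval pass@1 score from 43% to 50.7%, a clear improvement in solving programming problems.

Gains across multiple benchmarks

Scores also improved on non-code benchmarks such as TruthfulQA, AGIEval, and GPT4All.

ChatML format support

Uses the standardized ChatML prompt template, which makes it easy to integrate into dialogue systems.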

Model Capabilities

Text generation

Code generation

Complex question answering

Role-play dialogue

Knowledge Q&A

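Use Cases

Programming Assistance

Code generation and explanation

Generates working code from natural-language descriptions or explains existing code

HumanEval pass@1 reaches 50.7%

Intelligent Dialogue

Personalized role-play

Simulates a specific character or persona in natural conversation

Can portray anime characters, historical figures, and more

Knowledge Q&A

Complex question answering

Answers a wide range of knowledge questions with detailed explanations

Performs well on the AGIEval benchmark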

🚀 Openhermes 2.5 Mistral 7B - GPTQ

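Openhermes 2.5 Mistral 7B - GPTQ is a quantized model based on the Mistral architecture. It can be used for a variety of natural language processing tasks, such as text generation and question answering, and is offered with several quantization parameter options to suit different hardware and requirements.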

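🚀 Quick Start

Downloading the Model

You can download the model in any of the following ways:

Downloading in text-generation-webui

To download from the main branch, enter TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ in the "Download custom model or LoRA" box.
To download from another branch, append :branchname to the download name, for example TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ:gptq-4bit-32g-actorder_True.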

Downloading from the Command Line

The huggingface-hub Python library is recommended:

pip3 install huggingface-hub

To download the main branch to a folder named OpenHermes-2.5-Mistral-7B-GPTQ:

mkdir OpenHermes-2.5-Mistral-7B-GPTQ
huggingface-cli download TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ --local-dir OpenHermes-2.5-Mistral-7B-GPTQ --local-dir-use-symlinks False

To download from a different branch, add the --revision parameter:

mkdir OpenHermes-2.5-Mistral-7B-GPTQ
huggingface-cli download TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir OpenHermes-2.5-Mistral-7B-GPTQ --local-dir-use-symlinks False
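
The same download can also be scripted from Python. The following is a minimal sketch, assuming the `snapshot_download` function of the `huggingface_hub` package installed above. The helper names `split_model_spec` and `download_model` are hypothetical, and the `repo:branch` parsing only mirrors the `:branchname` convention described for text-generation-webui.

```python
from typing import Optional, Tuple

DEFAULT_BRANCH = "main"


def split_model_spec(spec: str) -> Tuple[str, str]:
    """Split 'org/repo:branch' into (repo_id, revision); the branch defaults to main."""
    repo_id, _, branch = spec.partition(":")
    return repo_id, branch or DEFAULT_BRANCH


def download_model(spec: str, local_dir: Optional[str] = None) -> str:
    """Download every file of the chosen branch and return the local folder path."""
    # Imported lazily so split_model_spec works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, revision = split_model_spec(spec)
    # Default to a folder named after the repository, as in the CLI examples.
    target = local_dir or repo_id.split("/")[-1]
    return snapshot_download(repo_id=repo_id, revision=revision, local_dir=target)


if __name__ == "__main__":
    print(split_model_spec(
        "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ:gptq-4bit-32g-actorder_True"
    ))
```

Calling `download_model("TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ")` would fetch the main branch; appending `:gptq-4bit-32g-actorder_True` selects that branch instead.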

Downloading with `git` (not recommended)

git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ

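Using the Model in text-generation-webui

1. Make sure you are running the latest version of text-generation-webui.
2. Click the Model tab.
3. Under Download custom model or LoRA, enter TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ.
   - To download from a specific branch, enter for example TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ:gptq-4bit-32g-actorder_True.
4. Click Download.
5. When the download finishes, "Done" is displayed.
6. Click the refresh icon next to Model in the top left.
7. In the Model dropdown, choose the model you just downloaded: OpenHermes-2.5-Mistral-7B-GPTQ.
8. The model loads automatically and is then ready to use.
9. If you want custom settings, set them, then click Save settings for this model followed by Reload the Model.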

Serving the Model with Text Generation Inference (TGI)

TGI version 1.1.0 or later is recommended. The official Docker container is ghcr.io/huggingface/text-generation-inference:1.1.0. Example Docker parameters:

--model-id TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
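
With these parameters a prompt may use up to 3696 tokens, and prompt plus generated tokens may not exceed 4096. The sketch below works through that arithmetic, assuming TGI counts `--max-total-tokens` as prompt tokens plus newly generated tokens; the function name is hypothetical.

```python
MAX_INPUT_LENGTH = 3696   # --max-input-length from the example above
MAX_TOTAL_TOKENS = 4096   # --max-total-tokens from the example above


def max_new_tokens_allowed(prompt_tokens: int) -> int:
    """Return how many tokens may still be generated for a prompt of the given length."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > MAX_INPUT_LENGTH:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, limit is {MAX_INPUT_LENGTH}"
        )
    # Whatever the prompt does not consume is left for generation.
    return MAX_TOTAL_TOKENS - prompt_tokens


if __name__ == "__main__":
    print(max_new_tokens_allowed(3696))  # 400
    print(max_new_tokens_allowed(1000))  # 3096
```

A prompt at the maximum input length therefore still leaves room for 400 generated tokens.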

Example Python code for querying the TGI endpoint:

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

# Placeholder system message; replace it with your own instructions.
system_message = "You are a helpful assistant."
prompt = "Tell me about AI"
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

client = InferenceClient(endpoint_url)
# Send the full ChatML-formatted prompt, not the bare user prompt.
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print(f"Model output: {response}")
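
TGI can also be called over plain HTTP. The sketch below builds the JSON body for its `/generate` route using only the standard library. The helper names are hypothetical, the sampling values repeat those used above, and the `stop` entry is an added assumption intended to end generation at the ChatML end-of-turn marker.

```python
import json
import urllib.request


def build_generate_payload(prompt: str, max_new_tokens: int = 128) -> dict:
    """Build the JSON body for a TGI /generate request."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
            "temperature": 0.7,
            "top_p": 0.95,
            "top_k": 40,
            "repetition_penalty": 1.1,
            "stop": ["<|im_end|>"],  # assumption: stop at the end of the assistant turn
        },
    }


def generate(base_url: str, prompt: str) -> str:
    """POST the payload to {base_url}/generate and return the generated text."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]
```

With the Docker parameters shown earlier, a local server would be reached with something like `generate("http://localhost:3000", prompt_template)`; the host name here is illustrative.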

Using This GPTQ Model from Python Code

Install the Required Packages

pip3 install transformers optimum
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7

If you have problems installing AutoGPTQ, install it from source instead:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.4.2
pip3 install .
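
After installing, it can help to confirm that the packages are importable before loading the model. The following is a small sketch using only the standard library; the function name `check_modules` is hypothetical, and it assumes the `auto-gptq` distribution is imported under the module name `auto_gptq`.

```python
import importlib.util
from typing import Dict, Iterable

# Import names of the packages installed above (auto-gptq is imported as auto_gptq).
REQUIRED_MODULES = ("transformers", "optimum", "auto_gptq")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only checks that the modules can be located; it does not verify versions or CUDA support.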

Example Code

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Placeholder system message; replace it with your own instructions.
system_message = "You are a helpful assistant."
prompt = "Tell me about AI"
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using the transformers pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
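
Both print statements above output the prompt together with the completion. The sketch below shows one way to keep only the assistant's reply from such a ChatML transcript. The helper name `extract_reply` is hypothetical, and it makes no assumption about whether the end-of-turn marker survives decoding: it handles both cases.

```python
ASSISTANT_HEADER = "<|im_start|>assistant\n"
END_MARKER = "<|im_end|>"


def extract_reply(full_text: str) -> str:
    """Return the text of the last assistant turn in a ChatML transcript."""
    # Keep only what follows the final assistant header (the whole text if none is found).
    _, _, reply = full_text.rpartition(ASSISTANT_HEADER)
    # Drop the end-of-turn marker and anything after it, if one is present.
    reply, _, _ = reply.partition(END_MARKER)
    return reply.strip()


if __name__ == "__main__":
    transcript = (
        "<|im_start|>user\nTell me about AI<|im_end|>\n"
        "<|im_start|>assistant\nAI is a broad field.<|im_end|>"
    )
    print(extract_reply(transcript))
```

It could be applied as `extract_reply(pipe(prompt_template)[0]['generated_text'])`.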

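✨ Key Features

Multiple quantization options: several quantization parameter sets are provided, so you can choose the best one for your hardware and requirements.
Broad compatibility: known to work with multiple inference servers and web UIs, such as text-generation-webui and KoboldAI United.
Strong benchmark results: performs well on benchmarks such as GPT4All and AGIEval.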

📦 Installation Guide

Install the required dependencies:

pip3 install transformers optimum
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7

If you have problems installing AutoGPTQ, install it from source instead:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.4.2
pip3 install .

💻 Usage Examples

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Placeholder system message; replace it with your own instructions.
system_message = "You are a helpful assistant."
prompt = "Tell me about AI"
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

Advanced Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ"
# Load a different branch
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="gptq-4bit-32g-actorder_True")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Placeholder system message; replace it with your own instructions.
system_message = "You are a helpful assistant."
prompt = "Tell me about AI"
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

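📚 Detailed Documentation

Model Information

Property	Details
Model type	Mistral
Training data	Primarily 1,000,000 entries of GPT-4-generated data, plus other high-quality data from open datasets across the AI landscape
Model creator	Teknium
Quantized by	TheBloke

Prompt Template

Uses the ChatML prompt template: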

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
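
For multi-turn conversations the same template repeats once per message. The sketch below renders a message list into that layout. The function name `to_chatml` and the role/content dictionary shape are assumptions chosen for illustration, not part of the model's tooling.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]]) -> str:
    """Render {'role', 'content'} messages as a ChatML prompt ending with an open assistant turn."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    # Leave the assistant turn open so the model writes the next reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about AI"},
    ]
    print(to_chatml(history))
```

Earlier replies can be kept in the history as messages with the role `assistant`, followed by the next `user` message.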

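Known Compatible Clients / Servers

Provided Files and GPTQ Parameters

Multiple quantization parameter sets are provided, each in its own branch. Most GPTQ files are made with AutoGPTQ; Mistral models are currently made with Transformers.

GPTQ Parameter Reference

Bits: The bit size of the quantized model.
GS: GPTQ group size. Higher numbers use less VRAM but have lower quantization accuracy. "None" means no grouping and uses the least VRAM.
Act Order: True or False. Also known as desc_act. True results in better quantization accuracy. Some GPTQ clients have had issues with models that use Act Order plus group size, but this is generally resolved now.
Damp %: A GPTQ parameter that affects how samples are processed for quantization. The default is 0.01, but 0.1 results in slightly better accuracy.
GPTQ dataset: The calibration dataset used during quantization. Using a dataset closer to the model's training data can improve quantization accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model; refer to the original model repository for details of the training datasets.
Sequence length: The length of the dataset sequences used for quantization. Ideally this is the same as the model's sequence length. For some very long sequence models (16K+), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantized model; it only affects quantization accuracy on longer inference sequences.
ExLlama compatibility: Whether the file can be loaded with ExLlama, which currently only supports Llama and Mistral models in 4-bit.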
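
The link between group size and memory use can be illustrated with a rough estimate of stored bits per weight. This is only a sketch under stated assumptions: each group is assumed to store one 16-bit scale and one zero point as wide as the weights, a missing group size is treated as one group spanning an illustrative input dimension of 4096, and the extra index data used with Act Order as well as unquantized layers are ignored. The function name is hypothetical.

```python
from typing import Optional

SCALE_BITS = 16  # assumption: one 16-bit scale stored per group


def effective_bits(bits: int, group_size: Optional[int], in_features: int = 4096) -> float:
    """Estimate stored bits per weight: the quantized value plus its share of group metadata."""
    # Without a group size, a single group spans the whole input dimension.
    size = in_features if group_size is None else group_size
    # Each group carries one scale and one zero point (the zero point is `bits` wide).
    return bits + (SCALE_BITS + bits) / size


if __name__ == "__main__":
    for bits, gs in [(4, 128), (4, 64), (4, 32), (8, 128), (8, 32), (8, None)]:
        print(f"{bits}-bit, group size {gs}: {effective_bits(bits, gs):.3f} bits/weight")
```

Smaller groups mean more scales and zero points per weight, which is why a group size of 32 costs more memory than 128 at the same bit width. The estimate is not meant to predict the exact file sizes.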

Branch	Bits	GS	Act Order	Damp %	GPTQ Dataset	Seq Len	Size	ExLlama	Description
main	4	128	Yes	0.1	wikitext	4096	4.16 GB	Yes	4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy.
gptq-4bit-32g-actorder_True	4	32	Yes	0.1	wikitext	4096	4.57 GB	Yes	4-bit, with Act Order and group size 32g. Gives the highest inference quality, with maximum VRAM usage.
gptq-8bit--1g-actorder_True	8	None	Yes	0.1	wikitext	4096	4.95 GB	No	8-bit, with Act Order. No group size, to lower VRAM requirements.
gptq-8bit-128g-actorder_True	8	128	Yes	0.1	wikitext	4096	5.00 GB	No	8-bit, with group size 128g for higher inference quality and Act Order for higher accuracy.
gptq-8bit-32g-actorder_True	8	32	Yes	0.1	wikitext	4096	4.97 GB	No	8-bit, with group size 32g and Act Order for maximum inference quality.
gptq-4bit-64g-actorder_True	4	64	Yes	0.1	wikitext	4096	4.30 GB	Yes	4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy.
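
When choosing a branch programmatically, the table can be encoded as data and filtered. The sketch below copies the sizes and ExLlama flags from the table; the `Branch` type and `find_branches` function are hypothetical names, and file size is used only as a download budget, not as a measure of VRAM use or quality.

```python
from typing import List, NamedTuple, Optional


class Branch(NamedTuple):
    name: str
    bits: int
    group_size: Optional[int]
    size_gb: float
    exllama: bool


# Values copied from the table above.
BRANCHES: List[Branch] = [
    Branch("main", 4, 128, 4.16, True),
    Branch("gptq-4bit-32g-actorder_True", 4, 32, 4.57, True),
    Branch("gptq-8bit--1g-actorder_True", 8, None, 4.95, False),
    Branch("gptq-8bit-128g-actorder_True", 8, 128, 5.00, False),
    Branch("gptq-8bit-32g-actorder_True", 8, 32, 4.97, False),
    Branch("gptq-4bit-64g-actorder_True", 4, 64, 4.30, True),
]


def find_branches(max_size_gb: float, need_exllama: bool = False) -> List[Branch]:
    """List branches within the size budget, smallest file first."""
    matches = [
        branch for branch in BRANCHES
        if branch.size_gb <= max_size_gb and (branch.exllama or not need_exllama)
    ]
    return sorted(matches, key=lambda branch: branch.size_gb)


if __name__ == "__main__":
    for branch in find_branches(4.5, need_exllama=True):
        print(branch.name, branch.size_gb)
```

The returned branch name can be passed as the `revision` argument when loading the model.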