DiscoLM_German_7b_v1: Open-Source German Language Model

Discolm German 7b V1 AWQ

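Quantized and published by TheBloke; original model by Disco Research

DiscoLM German 7B v1 is a 7B-parameter German language model built on the Mistral architecture. It supports German and English and is released under the Apache-2.0 license.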

Large language model

Transformers

Multilingual · License: Apache-2.0 · #German dialogue optimization #Multilingual mixed training #ChatML format support

Downloads: 81

Released: 1/18/2024

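Model overview

A German-focused language model built on the Mistral architecture and fine-tuned for German text generation and understanding tasks.

Model highlights

German optimization

Fine-tuned specifically for German, which improves its handling of German text.

Multilingual support

Supports English in addition to German and has some cross-lingual ability.

Efficient inference

Uses AWQ quantization to speed up inference while largely preserving output quality.

Model capabilities

German text generation

English text generation

Dialogue systems

Text understanding

Use cases

Content creation

German article writing

Helps users produce German articles, blog posts, and similar content

Generates fluent, context-appropriate German text

Customer service

German customer-service chatbot

Automated customer service for German-speaking markets

Understands and answers customer inquiries in German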

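🚀 DiscoLM German 7B v1 - AWQ

DiscoLM German 7B v1 - AWQ is the AWQ-quantized version of DiscoLM German 7B v1. AWQ is an efficient, accurate, and fast low-bit weight quantization method. The quantized model can be served efficiently in supported environments and works with several inference tools and platforms.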

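🚀 Quick start

Environment setup

Make sure you are using the latest version of text-generation-webui. Using the text-generation-webui one-click installer is strongly recommended unless you are sure you know how to install it manually.

Download and usage steps

1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/DiscoLM_German_7b_v1-AWQ.
3. Click Download.
4. The model starts downloading. When it has finished, "Done" is shown.
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: DiscoLM_German_7b_v1-AWQ.
7. Select Loader: AutoAWQ.
8. Click Load. The model loads and is then ready for use.
9. If you want any custom settings, set them, click Save settings for this model, and then click Reload the Model in the top right.
10. Once you are ready, click the Text Generation tab and enter a prompt to get started.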

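✨ Key features

Multilingual support: Supports German and English. It is optimized for German use cases, stays fluent in English, and is also good at translation tasks.
Efficient quantization: Uses AWQ with 4-bit quantization. Compared with GPTQ, it offers faster Transformers-based inference at equivalent or better quality.
Broad compatibility: Works with several inference tools and platforms, including Text Generation Webui, vLLM, Hugging Face Text Generation Inference (TGI), Transformers, and AutoAWQ.
Multiple prompt formats: Supports the ChatML prompt format, offers a special retrieval format that improves steerability and reduces hallucinations, and supports structured output/function calling (experimental).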

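📦 Installation guide

Downloading in text-generation-webui

Follow the steps in the quick start section above.

Inference from Python code using Transformers

Install the necessary packages

Requires Transformers 4.35.0 or later.
Requires AutoAWQ 0.1.6 or later.

pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"

Note that if you are using PyTorch 2.0.1, the AutoAWQ command above will automatically upgrade you to PyTorch 2.1.0.

If you are using CUDA 11.8 and want to stay on PyTorch 2.0.1, run this command instead:

pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl

If you have problems installing AutoAWQ from the pre-built wheels, install it from source instead: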

pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .
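
After installing, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch using only the Python standard library. The helper names (version_tuple, meets_minimum, check_environment) are hypothetical, and the version parser is deliberately simple: it only compares the leading numeric components and ignores local suffixes such as +cu118.

```python
import re
from importlib import metadata

# Minimum versions quoted in the installation notes above.
MINIMUMS = {"transformers": "4.35.0", "autoawq": "0.1.6"}


def version_tuple(version):
    """Turn '0.1.6+cu118' into (0, 1, 6); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Return a short status string for each required package."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[package] = installed + " (ok)"
        else:
            report[package] = installed + " (needs >= " + minimum + ")"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(package + ": " + status)
```

For stricter comparisons (pre-releases, release candidates), a dedicated version-parsing library would be a better fit than this simplified parser.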

💻 Usage examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "TheBloke/DiscoLM_German_7b_v1-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0"
)

# Using the text streamer to stream output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

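prompt = "Tell me about AI"
# system_message was undefined in the original snippet; this value is only an example
system_message = "Du bist ein hilfreicher Assistent."
prompt_template = f'''<|im_start|>system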
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

# Convert prompt to tokens
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

generation_params = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    tokens,
    streamer=streamer,
    **generation_params
)

# Generation without a streamer, which will include the prompt in the output
generation_output = model.generate(
    tokens,
    **generation_params
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params
)

pipe_output = pipe(prompt_template)[0]['generated_text']
print("pipeline output: ", pipe_output)

Advanced usage

Multi-user inference with vLLM

from vllm import LLM, SamplingParams

prompts = [
    "Tell me about AI",
    "Write a story about llamas",
    "What is 291 - 150?",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
]
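# system_message was undefined in the original snippet; this value is only an example
system_message = "Du bist ein hilfreicher Assistent."
# A plain (non-f) string, so the placeholders are filled in by .format() below
prompt_template = '''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

prompts = [prompt_template.format(system_message=system_message, prompt=prompt) for prompt in prompts]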

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/DiscoLM_German_7b_v1-AWQ", quantization="awq", dtype="auto")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Multi-user inference with Hugging Face Text Generation Inference (TGI)

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

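prompt = "Tell me about AI"
# system_message was undefined in the original snippet; this value is only an example
system_message = "Du bist ein hilfreicher Assistent."
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

client = InferenceClient(endpoint_url)
# Send the fully formatted ChatML prompt rather than the bare user prompt
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print("Model output: ", response)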

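📚 Detailed documentation

Model information

Property	Details
Model creator	Disco Research
Model name	DiscoLM German 7B v1
Model type	mistral
Base model	DiscoResearch/DiscoLM_German_7b_v1
Quantized by	TheBloke
License	apache-2.0
Supported languages	German, English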
Prompt template	ChatML

Prompt template

ChatML prompt template

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

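This prompt is available as a chat template, which means you can format messages with the tokenizer.apply_chat_template() method:

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Wer bist du?"}
]
# add_generation_prompt=True appends the assistant header so the model answers as the assistant
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
# Here gen_input is a tensor of input IDs (the default return type in Transformers 4.x),
# so it is passed positionally rather than unpacked with **
model.generate(gen_input.to(model.device))

When tokenizing messages for generation, set add_generation_prompt=True when calling apply_chat_template(). This appends <|im_start|>assistant\n to your prompt so that the model continues with an assistant response.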
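
To make the ChatML layout concrete, here is a minimal pure-Python sketch that renders a message list into the template shown above. The function name build_chatml_prompt is hypothetical. The tokenizer's own chat template remains the authoritative formatter, and its exact whitespace may differ from this sketch.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    example = [
        {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
        {"role": "user", "content": "Wer bist du?"},
    ]
    print(build_chatml_prompt(example))
```

A string built this way can be passed wherever the examples above use prompt_template.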

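Retrieval format

You can use a special retrieval format to improve steerability and reduce hallucinations in RAG applications. Other, more default formats should also work; this is purely optional.

Example: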

### System:

Du bist ein hilfreicher Assistent. Für die folgende Aufgabe stehen dir zwischen den Tags BEGININPUT und ENDINPUT mehrere Quellen zur Verfügung. Metadaten zu den einzelnen Quellen wie Autor, URL o.ä. sind zwischen BEGINCONTEXT und ENDCONTEXT zu finden, danach folgt der Text der Quelle. Die eigentliche Aufgabe oder Frage ist zwischen BEGININSTRUCTION und ENDINSTRUCTION zu finden. Beantworte diese ausschließlich mit Informationen aus den gegebenen Quellen und gebe die Information zur genutzten Quelle  unter "Quelle:" an. Sollten die Quellen keine relevanten Informationen enthalten, antworte: "Mit den gegebenen Informationen ist diese Frage nicht zu beantworten."

### User Prompt:

BEGININPUT
BEGINCONTEXT
url: https://this.is.fake.news
time: 2089-09-01
ENDCONTEXT
Buxtehude ist die größte Stadt Deutschlands mit 96.56 Millionen Einwohnern.
ENDINPUT

BEGININSTRUCTION
Was ist die größte deutsche Stadt?
ENDINSTRUCTION

### Model Answer:

Die größte deutsche Stadt ist Buxtehude.

Quelle:
  url: https://this.is.fake.news
  time: 2089-09-01
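
The user prompt in this format can be assembled programmatically. The sketch below is one way to do it in Python. The function name build_retrieval_prompt and the input shape (a list of dicts with "metadata" and "text" keys) are hypothetical. The example above shows only a single source, so separating multiple source blocks with a blank line is an assumption.

```python
def build_retrieval_prompt(sources, instruction):
    """Build the user prompt for the retrieval format.

    sources: list of {"metadata": {...}, "text": "..."} dicts (hypothetical shape).
    instruction: the question or task for the model.
    """
    blocks = []
    for source in sources:
        # Metadata such as url or time goes between BEGINCONTEXT and ENDCONTEXT
        metadata_lines = [
            str(key) + ": " + str(value) for key, value in source["metadata"].items()
        ]
        lines = ["BEGININPUT", "BEGINCONTEXT"]
        lines += metadata_lines
        lines += ["ENDCONTEXT", source["text"], "ENDINPUT"]
        blocks.append("\n".join(lines))
    # The actual question follows the sources
    blocks.append("\n".join(["BEGININSTRUCTION", instruction, "ENDINSTRUCTION"]))
    # Assumption: blocks are separated by one blank line
    return "\n\n".join(blocks)


if __name__ == "__main__":
    prompt = build_retrieval_prompt(
        [{
            "metadata": {"url": "https://this.is.fake.news", "time": "2089-09-01"},
            "text": "Buxtehude ist die größte Stadt Deutschlands mit 96.56 Millionen Einwohnern.",
        }],
        "Was ist die größte deutsche Stadt?",
    )
    print(prompt)
```

The result is meant to be used as the user message, together with the German system message shown in the example.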

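Function calling

The model also supports structured output/function calling, although this is a very experimental feature and your mileage may vary. It will be improved in future versions.

The model prefixes function calls with <functioncall>, and you can return results in a response marked with <functionresponse> for multi-turn applications.

Example: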

### System:

Du bist ein hilfreicher Assistent. Extrahiere alle Personen aus den Eingaben des Users.

Du hast Zugriff auf folgende Funktionen:

{'name': 'PersonList',
 'description': 'Extrahiere die Namen aller im Text vorkommenden Personen',
 'parameters': {'$defs': {'Person': {'description': 'Details über eine person',
    'properties': {'name': {'title': 'Name', 'type': 'string'},
     'job': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'title': 'Job'},
     'age': {'anyOf': [{'type': 'integer'}, {'type': 'null'}],
      'title': 'Age'}},
    'required': ['name', 'job', 'age'],
    'title': 'Person',
    'type': 'object'}},
  'properties': {'person_list': {'items': {'$ref': '#/$defs/Person'},
    'title': 'Person List',
    'type': 'array'}},
  'required': ['person_list'],
  'type': 'object'}}

### User Prompt:

Björn (25) und Jan sind die Gründer von ellamind.

### Model Answer:

<functioncall> {"name": "PersonList", "arguments": '{"person_list": ["{"name": "Björn", "job": "founder", "age": 25}, {"name": "Jan", "job": "founder", "age": null}]}'}
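
Because the feature is experimental, client code should treat the model's function-call output defensively. The sketch below extracts the function name and arguments from a reply. The function name parse_function_call and the regular expression are hypothetical, and the expected layout (a name field followed by a single-quoted arguments string) is inferred from the one example above. If the arguments are not valid JSON, the raw string is kept so the caller can log or repair it.

```python
import json
import re

# Layout inferred from the single example above; this is an assumption.
FUNCTION_CALL = re.compile(
    r"""<functioncall>\s*\{\s*"name":\s*"(?P<name>[^"]+)"\s*,\s*"arguments":\s*'(?P<arguments>.*)'\s*\}\s*$""",
    re.DOTALL,
)


def parse_function_call(text):
    """Return the call's name and arguments, or None if no call is found."""
    match = FUNCTION_CALL.search(text.strip())
    if match is None:
        return None
    raw = match.group("arguments")
    try:
        arguments = json.loads(raw)
    except json.JSONDecodeError:
        arguments = None  # malformed JSON: leave parsing or repair to the caller
    return {"name": match.group("name"), "arguments": arguments, "raw_arguments": raw}


if __name__ == "__main__":
    reply = """<functioncall> {"name": "PersonList", "arguments": '{"person_list": [{"name": "Jan", "job": "founder", "age": null}]}'}"""
    print(parse_function_call(reply))
```

Returning None for replies without a function call lets the caller fall back to treating the reply as plain text.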

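Provided files and AWQ parameters

Currently only 128g GEMM models are released. Adding group size 32 models and GEMV kernel models is being actively considered.

Models are released as sharded safetensors files.

Branch	Bits	Group size	AWQ dataset	Sequence length	Size
main	4	128	German Quad	4096	4.15 GB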
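
As a rough illustration of what "4 bits, group size 128" means for storage, the sketch below estimates the size of group-wise quantized weights. It rests on assumptions that are not stated in this card: each group of weights shares one 16-bit scale and one 4-bit zero point, and the function names are hypothetical. It models only the quantized weights. Anything stored at higher precision is not counted, so it is not expected to reproduce the file size in the table.

```python
def effective_bits_per_weight(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Bits per weight including the per-group scale and zero point (assumed layout)."""
    return bits + (scale_bits + zero_bits) / group_size


def quantized_weight_bytes(num_weights, bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Estimated bytes needed to store num_weights group-quantized weights."""
    per_weight = effective_bits_per_weight(bits, group_size, scale_bits, zero_bits)
    return num_weights * per_weight / 8


if __name__ == "__main__":
    # Group size 128 adds 20 / 128 bits of overhead to each 4-bit weight
    print(effective_bits_per_weight())
    # A smaller group size (32) costs more overhead per weight
    print(effective_bits_per_weight(group_size=32))
    # Hypothetical example: one billion quantized weights, in gigabytes
    print(quantized_weight_bytes(1_000_000_000) / 1e9)
```

Under these assumptions, the per-group overhead is larger for a group size of 32 than for 128, so group size trades storage against quantization granularity.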

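Compatibility

The provided files have been tested with the following tools and platforms:

text-generation-webui, using Loader: AutoAWQ.
vLLM version 0.2.0 and later.
Hugging Face Text Generation Inference (TGI) version 1.1.0 and later.
Transformers version 4.35.0 and later.
AutoAWQ version 0.1.1 and later.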

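🔧 Technical details

AWQ quantization method

AWQ is an efficient, accurate, and fast low-bit weight quantization method that currently supports 4-bit quantization. Compared with GPTQ, it offers faster Transformers-based inference at equivalent or better quality.

Model training

DiscoLM German 7B v1 is a Mistral-based large language model and the successor to the EM German model family. It was trained on a large dataset of German and English instructions, first with an SFT fine-tuning phase and then with additional DPO reinforcement learning.

Evaluation results

Preliminary results on a German version of MT Bench show that DiscoLM German 7B is not far behind GPT-3.5-turbo on many tasks and even significantly outperforms it in the reasoning category. However, current benchmarks do not fully capture the model's capabilities, especially the language quality perceived by native speakers.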

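📄 License

This model is released under the apache-2.0 license.

⚠️ Important notes

This model can produce factually incorrect and offensive output and should not be relied on to produce factually accurate information.
The model was trained on a variety of public datasets. Although great effort went into cleaning the pretraining data, it may still generate biased or offensive output, and users are responsible for implementing a safety/moderation layer. Use with caution.

💡 Usage tips

When using text-generation-webui, use the one-click installer to avoid problems that can arise from a manual installation.
When using vLLM for inference, make sure you are on vLLM 0.2 or later, and pass the --quantization awq parameter when running it as a server.
When using Hugging Face Text Generation Inference (TGI), use TGI version 1.1.0 or later.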
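
As an illustration of the vLLM server tip, the sketch below assembles a launch command that includes --quantization awq. The helper name vllm_server_command is hypothetical. The module path vllm.entrypoints.api_server and the --dtype flag are assumptions based on vLLM releases from around version 0.2 and may differ in later versions, so check them against the vLLM version you have installed.

```python
import shlex


def vllm_server_command(model="TheBloke/DiscoLM_German_7b_v1-AWQ", dtype="auto"):
    """Return the argument list for launching a vLLM API server with AWQ enabled."""
    return [
        "python3", "-m", "vllm.entrypoints.api_server",  # assumed entry point
        "--model", model,
        "--quantization", "awq",  # required for AWQ checkpoints
        "--dtype", dtype,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command; run it yourself once vLLM is installed
    print(shlex.join(vllm_server_command()))
```

Building the command as a list keeps it safe to pass to subprocess.run without shell quoting issues.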

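Discord

For further support, and to discuss these models and AI in general, join TheBloke AI's Discord server.

Thanks, and how to contribute

Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org!

Many people have asked whether they can contribute. I enjoy providing models and helping people, and I would love to be able to spend more time doing it, as well as expanding into new projects such as fine-tuning/training.

If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects.

Donors get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.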

Patreon special mentions: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros

Thank you to all the generous patrons and donors! And thank you again to a16z for their generous grant.