Deepseek Coder 6.7B Instruct: an open-source AI programming assistant that answers computer science questions for free

Deepseek Coder 6.7B Instruct AWQ

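Quantized by TheBloke

Deepseek Coder 6.7B Instruct is an AI assistant model focused on programming tasks, developed by DeepSeek. It is designed to answer computer science questions and refuses non-technical ones.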

Large language model

Transformers

License: Other #programming-assistant #code-generation #computer-science-only

Downloads: 248

Released: 11/5/2023

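Model overview

A 6.7-billion-parameter language model for programming, built on DeepSeek's architecture and optimized for code generation, explanation, and debugging.

Model features

Programming-focused

Optimized for programming tasks and handles code-related questions efficiently.

Safety restrictions

Built-in safeguards make the model refuse politically sensitive questions, security and privacy issues, and other non-technical questions.

Efficient inference

AWQ quantization lets the model run efficiently on consumer-grade hardware.

Long-context support

Supports a context window of up to 16,384 tokens.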

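Model capabilities

Code generation

Code explanation

Answering programming questions

Code debugging

Algorithm implementation

Use cases

Software development

Code autocompletion

Generates code snippets from the surrounding context

Improves development efficiency and reduces repetitive coding

Code review

Analyzes code and suggests improvements

Helps developers raise code quality

Learning to program

Explaining programming concepts

Explains complex programming concepts and algorithms

Helps beginners understand programming fundamentals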

🚀 Deepseek Coder 6.7B Instruct - AWQ

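This project provides the AWQ-quantized version of the Deepseek Coder 6.7B Instruct model. The model was developed by the DeepSeek team and serves as a programming assistant for computer science questions.

🚀 Quick start

Model information

Property	Details
Model creator	DeepSeek
Original model	Deepseek Coder 6.7B Instruct
Model type	deepseek
Quantized by	TheBloke
License	See the License section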

Model repositories

Prompt template

You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
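
The template above can be filled in programmatically. The following is a minimal sketch, not part of the original model card: `build_prompt` and `SYSTEM_PROMPT` are hypothetical names, and the system text and section markers are copied from the template.

```python
# Hypothetical helper: the system text and the "### Instruction:" /
# "### Response:" markers are copied from the prompt template above.
SYSTEM_PROMPT = (
    "You are an AI programming assistant, utilizing the Deepseek Coder model, "
    "developed by Deepseek Company, and you only answer questions related to "
    "computer science. For politically sensitive questions, security and "
    "privacy issues, and other non-computer science questions, you will "
    "refuse to answer."
)


def build_prompt(user_prompt: str) -> str:
    """Return the full single-turn prompt for one user instruction."""
    # Interpolating the value directly (instead of calling str.format on a
    # stored template) keeps braces in user code, such as "{}" in a C
    # snippet, from being treated as placeholders.
    return f"{SYSTEM_PROMPT}\n### Instruction:\n{user_prompt}\n### Response:\n"


if __name__ == "__main__":
    print(build_prompt("Write a function that reverses a string in Python."))
```

The returned string ends with the `### Response:` line, so the model's completion starts directly with its answer.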

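✨ Key features

AWQ quantization

AWQ is an efficient, accurate, and very fast low-bit weight quantization method that currently supports 4-bit quantization. Compared with GPTQ, it offers faster Transformers-based inference, with quality equivalent to or better than the most commonly used GPTQ settings.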
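
To make "4-bit, group-wise weight quantization" concrete, here is a rough sketch of plain round-to-nearest quantization with one scale and zero point per group of weights. It is an illustration only: it omits the activation-aware scaling that distinguishes AWQ, it is not AutoAWQ's code, and the function names are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, int]  # (quantized values, scale, zero point)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Asymmetric round-to-nearest quantization of one group of weights."""
    qmax = (1 << bits) - 1  # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when every weight in the group is equal.
    scale = (w_max - w_min) / qmax or 1.0
    zero = max(0, min(qmax, round(-w_min / scale)))
    q = [max(0, min(qmax, round(w / scale) + zero)) for w in weights]
    return q, scale, zero


def dequantize_group(q: List[int], scale: float, zero: int) -> List[float]:
    """Map stored integers back to approximate floating-point weights."""
    return [(v - zero) * scale for v in q]


def quantize_row(row: List[float], group_size: int = 128, bits: int = 4) -> List[Group]:
    """Quantize a weight row in consecutive groups of `group_size` values."""
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


if __name__ == "__main__":
    row = [((i * 37) % 101 - 50) / 100.0 for i in range(256)]
    for q, scale, zero in quantize_row(row):
        print(len(q), round(scale, 4), zero)
```

A group size of 128 matches the "128g" files described later in this document: each group of 128 weights shares one scale and one zero point, so smaller groups track the weights more closely at the cost of more metadata.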

Supported platforms

Text Generation Webui - using Loader: AutoAWQ
vLLM - Llama and Mistral models only
Hugging Face Text Generation Inference (TGI)
AutoAWQ - for use from Python code

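📦 Installation guide

Using the model in text-generation-webui

Make sure you are running the latest version of text-generation-webui. The one-click installers are strongly recommended unless you are sure you know how to install manually.

1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/deepseek-coder-6.7B-instruct-AWQ.
3. Click Download.
4. The model starts downloading. When it finishes, "Done" is shown.
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: deepseek-coder-6.7B-instruct-AWQ.
7. Select Loader: AutoAWQ.
8. Click Load. The model loads and is then ready for use.
9. If you want custom settings, set them, click Save settings for this model, and then click Reload the Model in the top right.
10. Once you are ready, click the Text Generation tab and enter a prompt to get started!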

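Multi-user inference server: vLLM

See the vLLM documentation for installation and usage instructions.

Use vLLM version 0.2 or later.
When running vLLM as a server, pass the --quantization awq parameter.

For example: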

python3 -m vllm.entrypoints.api_server --model TheBloke/deepseek-coder-6.7B-instruct-AWQ --quantization awq
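
Once the server started by the command above is running, it can be queried over HTTP. The sketch below assumes vLLM's demo `api_server`, which, to the best of our knowledge, listens on port 8000 by default and accepts a JSON body with a `prompt` field plus sampling parameters at `/generate`. The base URL, helper names, and default values are assumptions; check them against your installed vLLM version.

```python
import json
import urllib.request


def build_generate_request(prompt: str,
                           base_url: str = "http://localhost:8000",
                           max_tokens: int = 256,
                           temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint."""
    # Assumed payload shape: the prompt plus SamplingParams-style fields.
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs):
    """Send the request and return the "text" field of the JSON response."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["text"]


if __name__ == "__main__":
    request = build_generate_request("### Instruction:\nExplain recursion.\n### Response:\n")
    print(request.full_url, request.get_method())
```

Calling `generate(...)` needs a running server; pass the full templated prompt, since this endpoint does not apply the prompt template for you.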

Multi-user inference server: Hugging Face Text Generation Inference (TGI)

Use TGI version 1.1.0 or later. The official Docker container is: ghcr.io/huggingface/text-generation-inference:1.1.0

Example Docker parameters:

--model-id TheBloke/deepseek-coder-6.7B-instruct-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
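
With the example limits above, a request's prompt may use at most 3696 tokens, and prompt plus generated tokens may not exceed 4096. A small sketch of that arithmetic follows; the function name is hypothetical and the defaults are copied from the example parameters.

```python
def remaining_new_tokens(prompt_tokens: int,
                         max_input_length: int = 3696,
                         max_total_tokens: int = 4096) -> int:
    """Return the largest max_new_tokens that fits the configured limits."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > max_input_length:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, limit is {max_input_length}"
        )
    # The total budget covers the prompt and the generated tokens together.
    return max_total_tokens - prompt_tokens


if __name__ == "__main__":
    print(remaining_new_tokens(3696))  # a maximum-length prompt
    print(remaining_new_tokens(1000))
```

A maximum-length prompt therefore leaves room for 400 generated tokens. These example limits are well below the 16384-token sequence length listed for the model, so raise both values together if you need longer contexts and have the memory for them.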

Example Python code for interacting with TGI (requires huggingface-hub 0.17.0 or later):

pip3 install huggingface-hub

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''

client = InferenceClient(endpoint_url)
# Send the full templated prompt, not the bare user prompt
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print("Model output: ", response)

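Inference from Python code using AutoAWQ

Install the AutoAWQ package

Requires AutoAWQ 0.1.1 or later.

pip3 install autoawq

If you have problems installing AutoAWQ using the pre-built wheels, install it from source instead: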

pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .

💻 Usage examples

Basic usage

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "TheBloke/deepseek-coder-6.7B-instruct-AWQ"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)
# Load model
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                          trust_remote_code=False, safetensors=True)

prompt = "Tell me about AI"
prompt_template=f'''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''

print("*** Running model.generate:")

token_input = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    token_input,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("LLM output: ", text_output)

"""
# Inference should be possible with transformers pipeline as well in future
# But currently this is not yet supported by AutoAWQ (correct as of September 25th 2023)
from transformers import pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
"""

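In the AutoAWQ example, `tokenizer.decode` is applied to the whole output sequence, so the printed text contains the prompt as well as the answer. The sketch below shows one way to keep only the answer at the string level. `extract_response` is a hypothetical helper, and it assumes the instruction itself does not contain the `### Response:` marker; `<|EOT|>` is the end-of-turn token named in the model usage example.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, eot_token: str = "<|EOT|>") -> str:
    """Return the text after the first response marker, up to the EOT token."""
    # partition() returns the whole string as the head when the marker is
    # missing, so fall back to the full text in that case.
    head, marker, tail = decoded.partition(RESPONSE_MARKER)
    answer = tail if marker else head
    # Drop the end-of-turn token and anything generated after it.
    return answer.split(eot_token, 1)[0].strip()


if __name__ == "__main__":
    decoded = (
        "### Instruction:\nTell me about AI\n"
        "### Response:\nAI is a field of computer science.\n<|EOT|>"
    )
    print(extract_response(decoded))
```

Slicing the generated token ids by the prompt length before decoding, as the model usage example later in this document does, avoids the marker assumption entirely.
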
Advanced usage

When using vLLM from Python code, you can adjust the generated output by setting different sampling parameters:

from vllm import LLM, SamplingParams

prompts = [
    "Tell me about AI",
    "Write a story about llamas",
    "What is 291 - 150?",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
]
# Plain (non-f) string: {prompt} is filled in for each prompt by .format() below
prompt_template='''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''

prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/deepseek-coder-6.7B-instruct-AWQ", quantization="awq", dtype="auto")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

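📚 Detailed documentation

Model details

deepseek-coder-6.7b-instruct is a 6.7B-parameter model initialized from deepseek-coder-6.7b-base and fine-tuned on 2B tokens of instruction data.

Home page: DeepSeek
Repository: deepseek-ai/deepseek-coder
Chat with DeepSeek Coder: DeepSeek-Coder

Model usage example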

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True).cuda()
messages=[
    { 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
# 32021 is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=32021)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))

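License

This code repository is licensed under the MIT License. Use of the DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use. See LICENSE-MODEL for more details.

Contact

If you have any questions, please raise an issue or contact us at agi_code@deepseek.com.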

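🔧 Technical details

Provided files and AWQ parameters

For the first release of AWQ models, only 128g models are provided. 32g models may be added if there is interest and once perplexity and evaluation comparisons have been done; at this time 32g models are still not fully tested with AutoAWQ and vLLM.

Models are released as sharded safetensors files.

Branch	Bits	Group size	AWQ dataset	Sequence length	Size
main	4	128	Evol Instruct Code	16384	3.89 GB

Compatibility

The provided files are tested to work with:

text-generation-webui using Loader: AutoAWQ
vLLM version 0.2.0 and later
Hugging Face Text Generation Inference (TGI) version 1.1.0 and later
AutoAWQ version 0.1.1 and later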

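📄 License

This project follows the DeepSeek Coder license; see LICENSE for details.

Discord

For further support, and for discussions about these models and AI in general, join us at: TheBloke AI's Discord server

Thanks, and how to contribute

Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org!

Many people have asked whether they can contribute. I enjoy providing models and helping people, and would love to spend even more time doing it, as well as expanding into new projects such as fine-tuning and training.

If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects.

Donors get priority support on all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.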

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.

Patreon special mentions: Brandon Frisco, LangChain4j, Spiking Neurons AB, transmissions 11, Joseph William Delisle, Nitin Borwankar, Willem Michiel, Michael Dempsey, vamX, Jeffrey Morgan, zynix, jjj, Omer Bin Jawed, Sean Connelly, jinyuan sun, Jeromy Smith, Shadi, Pawan Osman, Chadd, Elijah Stavena, Illia Dulskyi, Sebastain Graf, Stephen Murray, terasurfer, Edmond Seymore, Celu Ramasamy, Mandus, Alex, biorpg, Ajan Kanaga, Clay Pascal, Raven Klaugh, 阿明, K, ya boyyy, usrbinkat, Alicia Loh, John Villwock, ReadyPlayerEmma, Chris Smitley, Cap'n Zoog, fincy, GodLy, S_X, sidney chen, Cory Kujawski, OG, Mano Prime, AzureBlack, Pieter, Kalila, Spencer Kim, Tom X Nguyen, Stanislav Ovsiannikov, Michael Levine, Andrey, Trailburnt, Vadim, Enrico Ros, Talal Aujan, Brandon Phillips, Jack West, Eugene Pentland, Michael Davis, Will Dee, webtim, Jonathan Leane, Alps Aficionado, Rooh Singh, Tiffany J. Kim, theTransient, Luke @flexchar, Elle, Caitlyn Gatomon, Ari Malik, subjectnull, Johann-Peter Hartmann, Trenton Dambrowitz, Imad Khwaja, Asp the Wyvern, Emad Mostaque, Rainer Wilmers, Alexandros Triantafyllidis, Nicholas, Pedro Madruga, SuperWojo, Harry Royden McLaughlin, James Bentley, Olakabola, David Ziegler, Ai Maven, Jeff Scroggin, Nikolai Manek, Deo Leter, Matthew Berman, Fen Risland, Ken Nordquist, Manuel Alberto Morcote, Luke Pendergrass, TL, Fred von Graf, Randy H, Dan Guido, NimbleBox.ai, Vitor Caleffi, Gabriel Tamborski, knownsqashed, Lone Striker, Erik Bjäreholt, John Detwiler, Leonard Tan, Iucharbius

Thank you to all my generous patrons and donors! And thank you again to a16z for their generous grant.