WizardCoder-33B-V1.1 Open-Source Code LLM - Free to Deploy, a Practical Choice with Strong Benchmark Results

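WizardCoder 33B V1.1

Developed by WizardLMTeam

WizardCoder-33B-V1.1 is an open-source code large language model trained from deepseek-coder-33b-base. It performs strongly on code benchmarks such as HumanEval and MBPP, and at its release it was the state-of-the-art (SOTA) open-source code LLM.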

Large Language Models

Transformers

Other #CodeGenerationSOTA #ProgrammingTaskOptimization #MultilingualCodeSupport

Release date: January 4, 2024

Model Overview

WizardCoder is a code-generation large language model enhanced with the Evol-Instruct method. It focuses on code generation and programming tasks.

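Model Features

High-performance code generation

Achieves 79.9% pass@1 on HumanEval, surpassing ChatGPT 3.5 and Gemini Pro

Evol-Instruct training

Trained on instruction data evolved with Code Evol-Instruct (applied to Code-Alpaca) to strengthen coding ability

Open-source SOTA

The top open-source code LLM on the EvalPlus leaderboard at its release (January 2024)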

Model Capabilities

Code completion

Code generation

Programming Q&A

Code explanation

Code refactoring

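Use Cases

Software development

Automated code generation

Generates runnable code from natural-language descriptions

Achieves 79.9% pass@1 on the HumanEval benchmark

Programming education

Helps students understand and learn programming concepts

Technical interview preparation

Programming problem solving

Generates solutions to programming interview problems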

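🚀 WizardCoder: Empowering Code Large Language Models with Evol-Instruct

WizardCoder is a large language model specialized in code generation. Its performance is improved with the Evol-Instruct technique. It scores strongly on several code evaluation datasets (HumanEval, HumanEval-Plus, MBPP and MBPP-Plus).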

Key Information

Attribute	Details
Model type	WizardCoder
Evaluation metric	code_eval
Training data	Code Evol-Instruct applied to Code-Alpaca data
Test dataset	openai_humaneval (HumanEval)
pass@1	0.799
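pass@1 is the fraction of problems whose generated solution passes all unit tests. With greedy decoding, there is one sample per problem (as in the reproduction scripts), so it reduces to solved / total. With n sampled completions per problem, a commonly used unbiased estimator is pass@k = 1 - C(n-c, k) / C(n, k), from the Codex paper (Chen et al., 2021). The Python sketch below is illustrative. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not part of the WizardCoder or EvalPlus code.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples passing all tests, k: sample budget.
    """
    if n - c < k:
        # Every size-k subset of the samples contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Greedy decoding: n = 1 per problem, so pass@1 is solved / total.
    greedy = [(1, 1), (1, 0), (1, 1), (1, 1)]
    print(mean_pass_at_k(greedy, k=1))  # 0.75
    # Sampling: 10 samples with 3 correct gives pass@1 of about 0.3.
    print(pass_at_k(10, 3, 1))
```

With greedy decoding over HumanEval's 164 problems, 0.799 corresponds to the share of problems whose single completion passed.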

📢 News

[2024/01/04] 🔥 We released WizardCoder-33B-V1.1, trained from deepseek-coder-33b-base. It is the best open-source code LLM on the EvalPlus leaderboard, with pass@1 scores of 0.799 on HumanEval, 0.732 on HumanEval-Plus, 0.789 on MBPP and 0.669 on MBPP-Plus.
[2024/01/04] 🔥 WizardCoder-33B-V1.1 outperforms ChatGPT 3.5, Gemini Pro and DeepSeek-Coder-33B-instruct on HumanEval and HumanEval-Plus pass@1.
[2024/01/04] 🔥 WizardCoder-33B-V1.1 is comparable to ChatGPT 3.5 and outperforms Gemini Pro on MBPP and MBPP-Plus pass@1.

Model Performance Comparison (pass@1, %)

Model	Checkpoint	Paper	HumanEval	HumanEval+	MBPP	MBPP+	License
GPT-4-Turbo (Nov 2023)	-	-	85.4	81.7	83.0	70.7	-
GPT-4 (May 2023)	-	-	88.4	76.8	-	-	-
GPT-3.5-Turbo (Nov 2023)	-	-	72.6	65.9	81.7	69.4	-
Gemini Pro	-	-	63.4	55.5	72.9	57.9	-
DeepSeek-Coder-33B-instruct	-	-	78.7	72.6	78.7	66.7	-
WizardCoder-33B-V1.1	🤗 HF Link	📃 WizardCoder	79.9	73.2	78.9	66.9	MSFTResearch
WizardCoder-Python-34B-V1.0	🤗 HF Link	📃 WizardCoder	73.2	64.6	73.2	59.9	Llama2
WizardCoder-15B-V1.0	🤗 HF Link	📃 WizardCoder	59.8	52.4	--	--	OpenRAIL-M
WizardCoder-Python-13B-V1.0	🤗 HF Link	📃 WizardCoder	64.0	--	--	--	Llama2
WizardCoder-Python-7B-V1.0	🤗 HF Link	📃 WizardCoder	55.5	--	--	--	Llama2
WizardCoder-3B-V1.0	🤗 HF Link	📃 WizardCoder	34.8	--	--	--	OpenRAIL-M
WizardCoder-1B-V1.0	🤗 HF Link	📃 WizardCoder	23.8	--	--	--	OpenRAIL-M

📦 Training Data Construction

We applied our Code Evol-Instruct to the Code-Alpaca data.
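Code Evol-Instruct repeatedly asks an LLM to rewrite a seed instruction into a harder one. The sketch below is minimal and rests on a few assumptions:

- It takes a caller-supplied `llm` function, which is hypothetical. Any text-generation client could be wrapped this way.
- The evolution heuristics are paraphrased from the WizardCoder paper's description. The prompt wording is an assumption, not the exact text used to build the training data.
- The real pipeline's filtering of failed evolutions is reduced here to dropping empty rewrites.

```python
import random
from typing import Callable, List, Optional

# Heuristics paraphrased from the WizardCoder paper's description of
# Code Evol-Instruct (assumption: the wording used in training differs).
EVOLUTION_METHODS = [
    "Add new constraints and requirements to the original problem.",
    "Replace a commonly used requirement with a less common, more specific one.",
    "If the problem can be solved in only a few logical steps, add more reasoning steps.",
    "Provide a piece of erroneous code as a reference to increase misdirection.",
    "Propose higher time or space complexity requirements.",
]

# Hypothetical prompt template asking the LLM for a harder variant.
EVOLVE_TEMPLATE = (
    "Please increase the difficulty of the given programming question a bit, "
    "using this method: {method}\n\nQuestion:\n{question}\n\nRewritten question:"
)


def evolve(
    seed: str,
    llm: Callable[[str], str],
    rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the seed followed by each successfully evolved instruction."""
    rng = rng or random.Random(0)
    history = [seed]
    current = seed
    for _ in range(rounds):
        method = rng.choice(EVOLUTION_METHODS)
        prompt = EVOLVE_TEMPLATE.format(method=method, question=current)
        evolved = llm(prompt).strip()
        if not evolved:
            # Drop empty rewrites (a stand-in for the real filtering step).
            continue
        history.append(evolved)
        current = evolved
    return history
```

A real pipeline would apply each round across the whole seed dataset rather than to a single instruction.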

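❗ Data Contamination Check

Before training, we carefully and rigorously checked all training data. We used multiple deduplication methods to verify and prevent data leakage from the HumanEval and MBPP test sets.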
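The README does not say which deduplication methods were used. One common check, offered here as an illustration rather than the authors' procedure, flags any training sample that shares a long word n-gram with a benchmark problem. The default n = 13 is an assumption, and the function names are hypothetical.

```python
import re
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Lowercased word n-grams of a text (empty if it has fewer than n words)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def find_contaminated(train: List[str], test: List[str], n: int = 13) -> List[int]:
    """Indices of training samples sharing at least one word n-gram with the test set."""
    test_grams: Set[Tuple[str, ...]] = set()
    for item in test:
        test_grams |= ngrams(item, n)
    return [i for i, sample in enumerate(train) if ngrams(sample, n) & test_grams]
```

Samples whose indices are returned would be dropped before training. A smaller n catches more paraphrases but also flags more harmless overlaps.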

⚠️ Important Note

Please use exactly the same system prompt as we do. We do not guarantee the accuracy of quantized versions.

Default version:

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

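Because the model expects this exact prompt, a small helper that fills in `{instruction}` helps avoid formatting drift. This is a minimal sketch:

- The helper names `build_prompt` and `extract_code` are hypothetical.
- The code-extraction heuristic is an assumption, not part of the released scripts. It takes the first fenced block in the response.

```python
import re

# Default WizardCoder-33B-V1.1 prompt, copied verbatim from the README.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
    "\n\n### Instruction:\n{instruction}\n\n### Response:"
)

FENCE = "`" * 3  # Markdown code fence marker, built at runtime


def build_prompt(instruction: str) -> str:
    """Fill the default template with a natural-language instruction."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


def extract_code(response: str) -> str:
    """Return the body of the first fenced code block, or the raw response."""
    pattern = re.escape(FENCE) + r"[\w+-]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1) if match else response


if __name__ == "__main__":
    print(build_prompt("Write a Python function that returns the square of x."))
```

The text the model generates after `### Response:` can then go through `extract_code` before it is executed or scored.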
💻 Reproducing the Performance of WizardCoder-33B-V1.1

Dependencies

transformers==4.36.2
vllm==0.2.5
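The reported results were produced with these exact versions, so it may help to confirm the environment before running the scripts. This minimal sketch uses the standard library's `importlib.metadata`; the `check_pins` helper is hypothetical.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions listed in the README's dependency section.
PINNED = {"transformers": "4.36.2", "vllm": "0.2.5"}


def check_pins(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Map each mismatched package to its installed version ('missing' if absent)."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = "missing"
        if installed != wanted:
            problems[name] = installed
    return problems


if __name__ == "__main__":
    mismatches = check_pins(PINNED)
    print(mismatches or "environment matches the pinned versions")
```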

Code and Results

We provide all the code here.
We also provide all the generated results here.

(1) HumanEval and HumanEval-Plus

Step 1: Code generation (without acceleration)

model="WizardLM/WizardCoder-33B-V1.1"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/T${temp}_N${pred_num}_WizardCoder-33B-V1.1_Greedy_Decode

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

# 164 problems, 21 per GPU if GPU=8
index=0
gpu_num=8
for ((i = 0; i < $gpu_num; i++)); do
  start_index=$((i * 21))
  end_index=$(((i + 1) * 21))

  gpu=$((i))
  echo 'Running process #' ${i} 'from' $start_index 'to' $end_index 'on GPU' ${gpu}
  ((index++))
  (
    CUDA_VISIBLE_DEVICES=$gpu python humaneval_gen.py --model ${model} \
      --start_index ${start_index} --end_index ${end_index} --temperature ${temp} \
      --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --greedy_decode
  ) &
  if (($index % $gpu_num == 0)); then wait; fi
done
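The loop above gives each of the 8 GPUs a fixed slice of 21 problems. Since 8 × 21 = 168 ≥ 164, the last slice (147 to 168) runs past the end of the problem list. Presumably the generation script tolerates an out-of-range `end_index`; Python slicing would, for example.

This sketch computes the same split in Python, deriving the chunk size by ceiling division and clamping the final range. The `shard_ranges` name is illustrative.

```python
from typing import List, Tuple


def shard_ranges(num_items: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split [0, num_items) into contiguous half-open ranges, one per shard."""
    chunk = -(-num_items // num_shards)  # ceiling division
    ranges = []
    for i in range(num_shards):
        start = min(i * chunk, num_items)
        end = min((i + 1) * chunk, num_items)
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # HumanEval: 164 problems on 8 GPUs gives chunks of 21; the last range stops at 164.
    for gpu, (start, end) in enumerate(shard_ranges(164, 8)):
        print(f"GPU {gpu}: problems {start}..{end - 1}")
```

The same function gives 50 problems per GPU for the 399 MBPP-Plus problems. Empty ranges, which can occur with more GPUs than problems, can simply be skipped.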

Step 1: Code generation (with vLLM acceleration)

model="WizardLM/WizardCoder-33B-V1.1"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/T${temp}_N${pred_num}_WizardCoder-33B-V1.1_Greedy_Decode_vllm

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

CUDA_VISIBLE_DEVICES=0,1,2,3 python humaneval_gen_vllm.py --model ${model} \
    --start_index 0 --end_index 164 --temperature ${temp} \
    --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --num_gpus 4 --overwrite
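For reference, greedy generation with vLLM's offline Python API might look like the sketch below. It makes several assumptions:

- `--num_gpus 4` in the command above corresponds to vLLM tensor parallelism.
- `--max_len` bounds the number of generated tokens.
- `prompts` are already formatted with the default prompt template.

`humaneval_gen_vllm.py` itself is not reproduced here, and the `generate_greedy` helper is hypothetical. vLLM is imported inside the function, so the module loads without a GPU.

```python
from typing import List


def generate_greedy(
    prompts: List[str],
    model: str = "WizardLM/WizardCoder-33B-V1.1",
    num_gpus: int = 4,
    max_len: int = 2048,
) -> List[str]:
    """Greedy-decode one completion per prompt with vLLM (sketch)."""
    if not prompts:
        return []
    # Imported lazily so that loading this file does not require vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model, tensor_parallel_size=num_gpus)
    # temperature=0.0 makes vLLM decode greedily, matching temp=0.0 above.
    params = SamplingParams(temperature=0.0, max_tokens=max_len)
    outputs = llm.generate(prompts, params)
    # One RequestOutput per prompt, returned in input order.
    return [out.outputs[0].text for out in outputs]
```

Loading the model once and passing all 164 prompts in a single `generate` call lets vLLM batch them. That batching is where the speedup over the per-GPU loop comes from.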

Step 2: Compute the scores

Install the EvalPlus benchmark:

git clone https://github.com/evalplus/evalplus.git
cd evalplus
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt

Compute the HumanEval and HumanEval-Plus scores:

# If you generated with vLLM in Step 1, append _vllm to this path.
output_path=preds/T0.0_N1_WizardCoder-33B-V1.1_Greedy_Decode

echo 'Output path: '$output_path
python process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt

evalplus.evaluate --dataset humaneval --samples ${output_path}.jsonl
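`evalplus.evaluate` reads a JSONL samples file with one JSON object per line. As described in the EvalPlus documentation (check against your installed version), each object holds a `task_id` and either a `solution` (a complete program) or a `completion` (code appended to the problem prompt).

`process_humaneval.py` is not shown here. The sketch below only illustrates how such a file could be produced from a hypothetical in-memory mapping of task IDs to generated programs.

```python
import json
from pathlib import Path
from typing import Dict


def to_jsonl(solutions: Dict[str, str]) -> str:
    """Serialize {task_id: program} as EvalPlus-style JSONL records."""
    return "".join(
        json.dumps({"task_id": task_id, "solution": code}) + "\n"
        for task_id, code in solutions.items()
    )


def write_samples(solutions: Dict[str, str], out_path: str) -> None:
    """Write the JSONL samples file that evalplus.evaluate --samples expects."""
    Path(out_path).write_text(to_jsonl(solutions), encoding="utf-8")
```

`json.dumps` escapes the newlines inside each program, so every record stays on a single line as JSONL requires.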

(2) MBPP and MBPP-Plus

The preprocessed problems are provided in mbppplus.json.

Step 1: Code generation (without acceleration)

model="WizardLM/WizardCoder-33B-V1.1"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/MBPP_T${temp}_N${pred_num}_WizardCoder-33B-V1.1_Greedy_Decode

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

# 399 problems, 50 per GPU if GPU=8
index=0
gpu_num=8
for ((i = 0; i < $gpu_num; i++)); do
  start_index=$((i * 50))
  end_index=$(((i + 1) * 50))

  gpu=$((i))
  echo 'Running process #' ${i} 'from' $start_index 'to' $end_index 'on GPU' ${gpu}
  ((index++))
  (
    CUDA_VISIBLE_DEVICES=$gpu python mbppplus_gen.py --model ${model} \
      --start_index ${start_index} --end_index ${end_index} --temperature ${temp} \
      --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --mbpp_path "mbppplus.json" --greedy_decode
  ) &
  if (($index % $gpu_num == 0)); then wait; fi
done

Step 1: Code generation (with vLLM acceleration)

model="WizardLM/WizardCoder-33B-V1.1"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/MBPP_T${temp}_N${pred_num}_WizardCoder-33B-V1.1_Greedy_Decode_vllm

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

CUDA_VISIBLE_DEVICES=0,1,2,3 python mbppplus_gen_vllm.py --model ${model} \
    --start_index 0 --end_index 399 --temperature ${temp} \
    --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --mbpp_path "mbppplus.json" --num_gpus 4

Step 2: Compute the scores

Install the EvalPlus benchmark:

git clone https://github.com/evalplus/evalplus.git
cd evalplus
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt

Compute the MBPP and MBPP-Plus scores:

# If you generated with vLLM in Step 1, append _vllm to this path.
output_path=preds/MBPP_T0.0_N1_WizardCoder-33B-V1.1_Greedy_Decode

echo 'Output path: '$output_path
python mbppplus_process_preds.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt

evalplus.evaluate --dataset mbpp --samples ${output_path}.jsonl

📄 Citation

If you use the data, methods, or code in this repository, please cite it:

@article{luo2023wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin},
  journal={arXiv preprint arXiv:2306.08568},
  year={2023}
}