deepseek-coder-1.3B-kexer: an open-source text model for Kotlin code generation, free to deploy


Deepseek Coder 1.3B Kexer

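Developed by JetBrains

An open-source generative text model fine-tuned from Deepseek-coder-1.3B, focused on Kotlin code generation

Large language model

Transformers

Open-source license: Apache-2.0 #Kotlin code generation #Fine-tuning #Function completion

Downloads: 1,937

Released: 5/13/2024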

Model overview

This is a generative text model fine-tuned on the Kotlin Exercises dataset, intended mainly for Kotlin code generation and completion.

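Model features

Optimized for Kotlin

Fine-tuned specifically for the Kotlin programming language

FIM support

Supports fill-in-the-middle (FIM) tasks, so it can complete a gap inside an incomplete code fragment

Improved performance

Raises the Kotlin HumanEval pass rate by about 10 percentage points over the base model (26.71% to 36.65%)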

Model capabilities

Kotlin code generation

Code completion

Function implementation

Code infilling

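Use cases

Software development

Kotlin function implementation

Generates a Kotlin function body from its signature

Reaches a 36.65% pass rate on Kotlin HumanEval

Code completion

Completes code from partial surrounding context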

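🚀 Kexer models

Kexer models are a collection of open-source generative text models fine-tuned on the Kotlin Exercises dataset. This repository contains the fine-tuned Deepseek-coder-1.3b model in Hugging Face Transformers format.

🚀 Quick start

Kexer models can be used for text generation tasks. Usage examples follow.

💻 Usage examples

Basic usage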

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load pre-trained model and tokenizer
model_name = 'JetBrains/deepseek-coder-1.3B-kexer'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to('cuda')

# Create and encode input
input_text = """\
This function takes an integer n and returns factorial of a number:
fun factorial(n: Int): Int {\
"""
input_ids = tokenizer.encode(
    input_text, return_tensors='pt'
).to('cuda')

# Generate with greedy decoding; max_length counts prompt tokens plus generated tokens
output = model.generate(
    input_ids, max_length=60, num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
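
Because generation above stops at a fixed token budget, the decoded text may run past the end of the requested function. The following is a minimal post-processing sketch, not part of the original model card: the helper name `truncate_at_function_end` and the sample string are hypothetical, and the brace counting is deliberately naive (it ignores braces inside string literals and comments).

```python
def truncate_at_function_end(text: str) -> str:
    """Return text up to the brace that closes the first opening brace.

    If the braces never balance (for example, the output was cut off),
    the text is returned unchanged.
    """
    depth = 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                continue  # stray closing brace before any opening brace
            depth -= 1
            if depth == 0:
                return text[: i + 1]
    return text


# Hypothetical decoded output that continues past the first function.
sample = (
    "fun factorial(n: Int): Int {\n"
    "    if (n <= 1) { return 1 }\n"
    "    return n * factorial(n - 1)\n"
    "}\n"
    "\n"
    "fun main() {\n"
    "}"
)
trimmed = truncate_at_function_end(sample)
print(trimmed)
```

In practice this would be applied to `generated_text` from the example above; a real pipeline would likely need a Kotlin-aware parser instead.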

Advanced usage

As with the base model, FIM can be used. To do so, the prompt must follow this format:

'<｜fim▁begin｜>' + prefix + '<｜fim▁hole｜>' + suffix + '<｜fim▁end｜>'
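
The format above can be wrapped in a small helper. This is a minimal sketch: the function name `build_fim_prompt` and the Kotlin fragment are illustrative assumptions, while the three sentinel tokens are copied verbatim from the format line.

```python
# Sentinel tokens copied verbatim from the FIM format above.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the gap in the FIM sentinel tokens."""
    return FIM_BEGIN + prefix + FIM_HOLE + suffix + FIM_END


# Hypothetical example: the gap is the body of `factorial`.
prefix = "fun factorial(n: Int): Int {\n"
suffix = "\n}\n"
fim_prompt = build_fim_prompt(prefix, suffix)
print(fim_prompt)
```

The resulting string would take the place of `input_text` in the basic example; the newly generated tokens are then the model's proposal for the gap between `prefix` and `suffix`.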

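📚 Documentation

Training setup

Property	Details
Model type	Model fine-tuned from Deepseek-coder-1.3B
Training data	Kotlin Exercises dataset

The model was trained on a single A100 GPU with the following hyperparameters:

Hyperparameter	Value
`warmup`	10%
`max_lr`	1e-4
`scheduler`	linear
`total_batch_size`	256 (~130K tokens per step)
`num_epochs`	4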
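
To make the `warmup`, `max_lr`, and `scheduler` settings concrete, here is a minimal sketch of a linear schedule with warmup. It rests on assumptions the source does not state: that the learning rate rises linearly from 0 to `max_lr` over the first 10% of steps and then decays linearly to 0, and that training runs for a hypothetical 200 steps. The function name `linear_warmup_lr` is illustrative.

```python
def linear_warmup_lr(step: int, total_steps: int,
                     max_lr: float = 1e-4,
                     warmup_fraction: float = 0.10) -> float:
    """Learning rate at a 0-based step: linear warmup, then linear decay.

    Assumption: the decay ends at 0; the source only says "linear".
    """
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    remaining = max(1, total_steps - warmup_steps)
    return max_lr * max(0.0, (total_steps - step) / remaining)


TOTAL_STEPS = 200  # hypothetical; the real step count is not given in the source
for step in (0, 10, 20, 110, 200):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```

With these assumed values the rate peaks at `max_lr` on step 20 and falls back to 0 on the final step.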

More details about the fine-tuning can be found in the technical report (coming soon!).

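Fine-tuning data

To fine-tune this model, we used 15K examples from the synthetically generated Kotlin Exercises dataset. Every example follows the HumanEval format. In total, the dataset contains about 3.5M tokens.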

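Evaluation

For evaluation, we used the Kotlin HumanEval dataset, which contains all 161 tasks from HumanEval translated into Kotlin by human experts. The dataset page provides the preprocessing details needed to reproduce our evaluation results, including the code for running the evaluation.

Our evaluation results are as follows: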

Model name	Kotlin HumanEval pass rate (%)
`Deepseek-coder-1.3B`	26.71
`Deepseek-coder-1.3B-Kexer`	36.65
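
The pass rates can be translated back into task counts. The sketch below assumes that each of the 161 tasks counts equally and that the pass rate is simply the share of tasks solved; the helper name `solved_tasks` is illustrative.

```python
TOTAL_TASKS = 161  # size of Kotlin HumanEval, as stated in the Evaluation text

pass_rates = {
    "Deepseek-coder-1.3B": 26.71,
    "Deepseek-coder-1.3B-Kexer": 36.65,
}


def solved_tasks(pass_rate_percent: float, total: int = TOTAL_TASKS) -> int:
    """Number of tasks implied by a pass rate, rounded to the nearest task."""
    return round(total * pass_rate_percent / 100)


base = pass_rates["Deepseek-coder-1.3B"]
tuned = pass_rates["Deepseek-coder-1.3B-Kexer"]

absolute_gain = tuned - base               # in percentage points
relative_gain = absolute_gain / base * 100  # in percent of the base rate

print(solved_tasks(base), solved_tasks(tuned))  # 43 59
print(round(absolute_gain, 2), round(relative_gain, 1))  # 9.94 37.2
```

Under that assumption, fine-tuning moves the model from 43 to 59 solved tasks, a gain of 9.94 percentage points, or roughly 37% relative to the base model.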

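Ethical considerations and limitations

Deepseek-coder-1.3B-Kexer is a new technology that carries risks with use. The testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, the potential outputs of Deepseek-coder-1.3B-Kexer cannot be predicted in advance, and in some cases the model may produce inaccurate or objectionable responses to user prompts. The model was fine-tuned on a specific data format (Kotlin tasks), and deviating from this format can also lead to inaccurate or undesirable responses to user queries. Therefore, before deploying any application of Deepseek-coder-1.3B-Kexer, developers should perform safety testing and tuning tailored to their specific use of the model.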