Rocket-3B open-source large language model: only 3 billion parameters, yet it outperforms many larger models

Rocket 3B

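Developed by pansophic

Rocket-3B is a 3-billion-parameter large language model trained with Direct Preference Optimization (DPO) on publicly available datasets. It outperforms many larger models on MT-Bench and AlpacaEval.

Large language model

Transformers

English

#Efficient small model #DPO optimization #Multi-turn dialogue

Downloads: 26

Released: 11/19/2023

Model overview

An efficient 3-billion-parameter chat model that uses the ChatML prompt format and performs well on the MT-Bench and AlpacaEval benchmarks.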

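Model features

Efficient performance

With 3 billion parameters, it scores higher on MT-Bench (6.56) than several models in the 7B to 65B range

DPO optimization

Trained with Direct Preference Optimization to improve output quality

Compact architecture

At only 3B parameters it approaches the performance of larger models while requiring fewer resources

Model capabilities

Text generation

Dialogue interaction

Instruction following

Use cases

Dialogue systems

Intelligent assistant

Can serve as an efficient conversational assistant

Achieves a 79.75% win rate on AlpacaEval

Text generation

Content creation

Generates many kinds of text content

Average response length of 1242 in the AlpacaEval results, so it can produce detailed replies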

🚀 Rocket-3B 🦝

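Rocket-3B 🦝 is a 3-billion-parameter large language model trained with Direct Preference Optimization (DPO) on a mix of publicly available datasets. The model uses the ChatML prompt format.

🚀 Quick start

You can run the model with the 🤗 Transformers library: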

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

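# Load the weights in bfloat16 and move the model to the GPU
model = AutoModelForCausalLM.from_pretrained("pansophic/rocket-3B", trust_remote_code=True, torch_dtype=torch.bfloat16).to("cuda")
# torch_dtype applies to model weights only, so it is not passed to the tokenizer
tokenizer = AutoTokenizer.from_pretrained("pansophic/rocket-3B", trust_remote_code=True)
# Print tokens to stdout as they are generated
streamer = TextStreamer(tokenizer)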

prompt = """<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
"""

system = "You are a helpful assistant."
user = "How are you?"

# Apply the ChatML format
prompt = prompt.format(system=system, user=user)

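# Tokenize the prompt and move the input tensors to the GPU
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to("cuda")
# Sample a completion; generate() returns token IDs, and the streamer prints the decoded text
output_ids = model.generate(**inputs, max_length=3084, top_p=0.95, do_sample=True, temperature=0.7, use_cache=True, streamer=streamer)

# Example transcript (generated with a different system and user message than the ones set above):
# <|im_start|>system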
# You are a chef who makes everything sound like a secret culinary masterpiece, even everyday meals.<|im_end|>
# <|im_start|>user
# How to cook an omelette?<|im_end|>
# <|im_start|>assistant
# Ah, the art of crafting the perfect omelette, a secret culinary masterpiece indeed.
# Begin by gently whisking two to three eggs in a mixing bowl, and then pour the silky liquid into a non-stick pan.
# Allow the eggs to dance and sizzle as you swiftly tilt the pan to spread the joy throughout the entire omelette universe.
# As the edges begin to set, fold the omelette in half with a gentle flourish, and you'll witness a stunning display of culinary prowess.
# Enjoy this enchanting creation, and you'll be transported to a world of secret culinary mastery.<|im_end|>
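
As the transcript shows, the streamed text contains the whole ChatML conversation, prompt included. If only the assistant's reply is needed, it can be cut out of the decoded string. The following is a minimal sketch, not part of the model card: `extract_assistant_reply` is a hypothetical helper, and it assumes the output was decoded with special tokens kept, so the `<|im_start|>` and `<|im_end|>` markers are still present.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a ChatML transcript."""
    marker = IM_START + "assistant\n"
    start = decoded.rfind(marker)
    if start == -1:
        raise ValueError("no assistant turn found in the transcript")
    reply = decoded[start + len(marker):]
    # The reply may be cut off before <|im_end|> if generation hit the length limit
    end = reply.find(IM_END)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


transcript = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHow are you?<|im_end|>\n"
    "<|im_start|>assistant\nI'm doing well, thank you!<|im_end|>"
)
print(extract_assistant_reply(transcript))
```

Searching from the end with `rfind` means the helper still picks the newest reply when the transcript holds several assistant turns.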

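✨ Key features

Strong performance: despite having only 3 billion parameters, it scores 6.56 on MT-Bench and reaches a 79.75% win rate on AlpacaEval, ahead of several larger models.
Preference-based training: fine-tuned with Direct Preference Optimization (DPO) to improve output quality.
Standard prompt format: uses the ChatML prompt format, which makes the model straightforward to interact with.

📚 Detailed documentation

Model description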

Property	Details
Model type	A 3B-parameter GPT-like model fine-tuned with DPO on a mix of publicly available datasets.
Language (NLP)	Primarily English
License	CC-BY-SA-4.0
Fine-tuned from	Stability AI

Performance

The table below compares Rocket-3B with other chat models on MT-Bench and AlpacaEval:

Model	Size	Alignment	MT-Bench (score)	AlpacaEval (win rate %)
StableLM-Tuned-α 🦜	7B	SFT	2.75	-
MPT-Chat	7B	SFT	5.42	-
Falcon-Instruct 🦅	40B	SFT	5.17	45.71
Orca-2	13B	SFT	6.15	-
Xwin-LMv0.1	7B	PPO	6.19	87.83
Llama2-Chat 🦙	7B	RLHF	6.26	71.37
TÜLU 2 🐫	7B	DPO	6.27	85.1
Guanaco 🦙	65B	SFT	6.41	71.80
Rocket 🦝	3B	DPO	6.56	79.75
Llama2-Chat 🦙	13B	RLHF	6.65	81.09
Zephyr-7b-α 🪁	7B	DPO	6.88	-
Vicuna v1.3 🦙	33B	SFT	7.12	88.99
Zephyr-7b-β 🪁	7B	DPO	7.34	90.60
WizardLM v1.0 🦙	70B	SFT	7.71	-
GPT-3.5-turbo	-	RLHF	7.94	89.37
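
To make the "outperforms larger models" claim concrete, the sketch below counts the rows in the table above that have more parameters than Rocket but a lower MT-Bench score. The numbers are copied from the table; GPT-3.5-turbo is left out because its size is not listed.

```python
# (model, size in billions of parameters, MT-Bench score), copied from the table
rows = [
    ("StableLM-Tuned-α", 7, 2.75),
    ("MPT-Chat", 7, 5.42),
    ("Falcon-Instruct", 40, 5.17),
    ("Orca-2", 13, 6.15),
    ("Xwin-LMv0.1", 7, 6.19),
    ("Llama2-Chat", 7, 6.26),
    ("TÜLU 2", 7, 6.27),
    ("Guanaco", 65, 6.41),
    ("Llama2-Chat", 13, 6.65),
    ("Zephyr-7b-α", 7, 6.88),
    ("Vicuna v1.3", 33, 7.12),
    ("Zephyr-7b-β", 7, 7.34),
    ("WizardLM v1.0", 70, 7.71),
]
ROCKET_SIZE, ROCKET_SCORE = 3, 6.56

# Larger models that Rocket outscores on MT-Bench
outscored = [r for r in rows if r[1] > ROCKET_SIZE and r[2] < ROCKET_SCORE]
largest = max(outscored, key=lambda r: r[1])

print(len(outscored), "larger models score below Rocket on MT-Bench")
print("largest of them:", largest[0], f"({largest[1]}B)")
```

On these figures the largest outscored model is Guanaco at 65B, while the 70B WizardLM v1.0 and several 7B to 33B models still score higher than Rocket.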

MT-Bench detailed scores (first and second turn)

Model	First turn	Second turn	Average
Rocket 🦝	6.99	6.13	6.56
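
The average in the row above is the plain mean of the two turn scores, which can be checked in a few lines. This is only an arithmetic sketch on the published numbers, not a re-run of MT-Bench.

```python
first_turn = 6.99
second_turn = 6.13

# Overall MT-Bench score as the mean of the two turns
average = (first_turn + second_turn) / 2
# Drop from the first to the second turn
turn_gap = first_turn - second_turn

print(f"average: {average:.2f}")
print(f"second turn is {turn_gap:.2f} points lower than the first")
```

The second turn is 0.86 points lower, so follow-up questions are where the model loses most of its score.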

AlpacaEval detailed scores

Model	Win rate (%)	Standard error	Average response length
Rocket 🦝	79.75	1.42	1242
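
The standard error gives a rough sense of how precise the win rate is. The sketch below turns it into an approximate 95% interval, assuming a normal approximation (the `1.96` multiplier is that assumption, not something stated in the model card), and checks which AlpacaEval win rates from the comparison table fall inside it.

```python
win_rate = 79.75
standard_error = 1.42
Z_95 = 1.96  # assumption: normal approximation for a 95% interval

low = win_rate - Z_95 * standard_error
high = win_rate + Z_95 * standard_error
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")

# AlpacaEval win rates of the other models, copied from the comparison table
others = {
    "Falcon-Instruct 40B": 45.71,
    "Xwin-LMv0.1 7B": 87.83,
    "Llama2-Chat 7B": 71.37,
    "TÜLU 2 7B": 85.1,
    "Guanaco 65B": 71.80,
    "Llama2-Chat 13B": 81.09,
    "Vicuna v1.3 33B": 88.99,
    "Zephyr-7b-β 7B": 90.60,
    "GPT-3.5-turbo": 89.37,
}
within = [name for name, rate in others.items() if low <= rate <= high]
print("inside the interval:", within)
```

Only Llama2-Chat 13B lands inside the interval, so its gap to Rocket may not be meaningful; this check ignores the other models' own standard errors, which the table does not list.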

Open LLM Leaderboard evaluation results

Metric	Value
Average	55.77
AI2 Reasoning Challenge (25-Shot)	50.60
HellaSwag (10-Shot)	76.69
MMLU (5-Shot)	47.10
TruthfulQA (0-shot)	55.82
Winogrande (5-shot)	67.96
GSM8k (5-shot)	36.47
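
The reported average is the unweighted mean of the six benchmark scores, as this short check on the table values shows.

```python
# Scores copied from the table above
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 50.60,
    "HellaSwag (10-Shot)": 76.69,
    "MMLU (5-Shot)": 47.10,
    "TruthfulQA (0-shot)": 55.82,
    "Winogrande (5-shot)": 67.96,
    "GSM8k (5-shot)": 36.47,
}

# Unweighted mean across the six benchmarks
mean_score = sum(scores.values()) / len(scores)
weakest = min(scores, key=scores.get)

print(f"mean: {mean_score:.2f}")
print("lowest score:", weakest)
```

GSM8k is the lowest of the six, so the average hides a comparatively weak result on grade-school math.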

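Intended use and limitations

Initially, we fine-tuned the model on a dataset created by merging and curating multiple datasets available on the HuggingFace Hub; this merged dataset will be released to the public soon. We then further improved the model with DPO, selecting samples from the openbmb/UltraFeedback and BAAI/JudgeLM-100K datasets. The result is an efficient chat model at the 3-billion-parameter scale.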
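
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair: the policy is rewarded for raising the log-probability of the chosen response, relative to a frozen reference model, more than that of the rejected one. It is an illustration of the general objective, not the authors' training code; the function name, the example log-probabilities, and `beta=0.1` are all assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Implicit rewards: scaled log-ratio between policy and reference model
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: no preference learned yet, loss is log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favours the chosen response more than the reference does: lower loss
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

In practice the log-probabilities come from summing per-token log-probabilities of each response under the two models, and the loss is averaged over a batch of pairs.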

Input format

The model was trained with the ChatML format:

<|im_start|>system
System message here.<|im_end|>
<|im_start|>user
Your message here!<|im_end|>
<|im_start|>assistant
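
The template above can be produced from a list of messages rather than by hand. The helper below is a minimal sketch: `build_chatml_prompt` is a hypothetical name, and wrapping earlier assistant turns in the same `<|im_start|>`/`<|im_end|>` markers for multi-turn chats is an assumption based on the usual ChatML convention, since the model card only shows a single turn.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "System message here."},
    {"role": "user", "content": "Your message here!"},
]
print(build_chatml_prompt(messages))
```

For the two messages shown, the result is exactly the template above followed by a newline, ready to pass to the tokenizer.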

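Bias, risks, and limitations

Unlike ChatGPT, which filters responses in the loop and is aligned for safe completions during the RLHF phase, our model has neither safeguard. It may therefore produce problematic outputs, especially when prompted in certain ways. The model's score on the Toxigen benchmark is:

Metric	Value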
Toxigen (0-shot)	43.40

The pretraining dataset is a filtered mixture of open-source large-scale datasets available on the HuggingFace Hub: Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both without the Books3 subset, and StarCoder (Li et al., 2023).