jetmoe-8b-chat Open-Source Large Language Model - Low-Cost Training, Outperforms LLaMA2-7B!


Jetmoe 8b Chat

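Developed by jetmoe

JetMoE-8B is an efficient open-source large language model that outperforms LLaMA2-7B after a training run costing only $100,000, and activates only 2.2 billion parameters during inference

Large language model

Transformers

License: Apache-2.0 #low-cost efficient training #sparse-activation inference #open-source and academia-friendly

Downloads: 26

Released: 3/31/2024

Model overview

An open-source large language model based on a Mixture-of-Experts (MoE) architecture, focused on efficient inference and low-cost training and suited to tasks such as dialogue generation and code completion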

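Model features

Low-cost, efficient training

Outperforms LLaMA2-7B at a training cost of only $100,000 (96×H100 GPUs for 2 weeks)

Efficient inference

Only 2.2 billion parameters are activated during inference, significantly reducing computational cost

Fully open source

Trained on public datasets with open-source code; can be fine-tuned on consumer-grade GPUs

Two-phase training recipe

Follows the MiniCPM training approach: phase 1 base pretraining, then phase 2 further training on high-quality data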
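MiniCPM's recipe is usually described as a warmup-stable-decay (WSD) learning-rate schedule: a linear warmup, a long constant-rate phase 1, then a decaying phase 2 during which higher-quality data is mixed in. The sketch below assumes that shape. The 5.0 × 10⁻⁴ peak learning rate, the 1.25T tokens, and the 4M-token global batch come from the model details (reading "4M" as 4 × 10⁶); the warmup length, the 10% phase-2 share, and the exponential decay target are hypothetical placeholders, not JetMoE's published settings.

```python
PEAK_LR = 5.0e-4          # peak learning rate (model details)
TOTAL_TOKENS = 1.25e12    # 1.25T training tokens (model details)
BATCH_TOKENS = 4e6        # assumption: "4M" global batch read as 4 x 10^6 tokens
TOTAL_STEPS = int(TOTAL_TOKENS // BATCH_TOKENS)  # implied optimizer steps

# Hypothetical schedule shape, for illustration only
WARMUP_STEPS = 10_000
DECAY_STEPS = TOTAL_STEPS // 10           # last 10% of steps form phase 2
DECAY_START = TOTAL_STEPS - DECAY_STEPS
FINAL_LR_RATIO = 0.1                      # decay to 10% of the peak by the end


def phase(step: int) -> int:
    """1 = stable phase on the base data mix, 2 = decay phase with higher-quality data."""
    return 2 if step >= DECAY_START else 1


def learning_rate(step: int) -> float:
    """Warmup-stable-decay learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Phase 1, warmup: ramp linearly from near 0 up to the peak
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    if step < DECAY_START:
        # Phase 1, stable: hold the peak learning rate
        return PEAK_LR
    # Phase 2: exponential decay from PEAK_LR toward PEAK_LR * FINAL_LR_RATIO
    progress = (step - DECAY_START) / DECAY_STEPS
    return PEAK_LR * FINAL_LR_RATIO ** progress


for s in (0, WARMUP_STEPS, TOTAL_STEPS // 2, DECAY_START, TOTAL_STEPS - 1):
    print(f"step {s:>7}: phase {phase(s)}, lr {learning_rate(s):.2e}")
```

Keeping the rate flat for most of training means phase 2 can be started (or restarted from a phase-1 checkpoint) with a different data mix without retuning the whole schedule, which is the main appeal of this two-phase design.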

Capabilities

Text generation

Dialogue systems

Code completion

Math problem solving

Multi-turn dialogue

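Use cases

Dialogue systems

Intelligent chatbots

Build friendly, knowledgeable conversational assistants

MT-Bench score of 6.681, higher than Llama-2-13b-chat

Code generation

Programming assistance

Automatically generate and complete code

The JetMoE-8B base model reaches 34.2% Pass@1 on MBPP, ahead of LLaMA2-7B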

🚀 JetMoE: Reaching LLaMA2 Performance with Only $100,000

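JetMoE-8B was trained for less than $100,000, yet it outperforms LLaMA2-7B from Meta AI, which has multi-billion-dollar training resources. It is fully open source and academia-friendly, and with only 2.2B active parameters during inference it substantially reduces computational cost.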
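As a rough sanity check on this budget, the sketch below converts the 96×H100, two-week figure from the feature list into GPU-hours and an implied price per GPU-hour, then estimates training compute with the common 6 × N × D rule of thumb using the 2.2B active parameters and 1.25T tokens from the model details. It assumes the whole budget went to GPU time for the full 1.25T-token run; the 6ND approximation ignores attention-score FLOPs and router overhead, so the throughput figure is only an order-of-magnitude estimate.

```python
# Back-of-the-envelope check of the training budget quoted on this page.
GPUS = 96                 # H100 GPUs (feature list)
DAYS = 14                 # "2 weeks"
BUDGET_USD = 100_000      # "$100,000" upper bound

ACTIVE_PARAMS = 2.2e9     # active parameters per token (model details)
TRAIN_TOKENS = 1.25e12    # training tokens (model details)

gpu_hours = GPUS * DAYS * 24                 # total GPU-hours
usd_per_gpu_hour = BUDGET_USD / gpu_hours    # implied price ceiling per GPU-hour

# Assumption: the "6 * N * D" training-FLOPs rule of thumb, applied to active
# parameters only (ignores attention-score FLOPs and router overhead).
train_flops = 6 * ACTIVE_PARAMS * TRAIN_TOKENS
flops_per_gpu_second = train_flops / (gpu_hours * 3600)

print(f"GPU-hours: {gpu_hours:,}")
print(f"Implied cost: <= ${usd_per_gpu_hour:.2f} per GPU-hour")
print(f"Approx. training compute: {train_flops:.2e} FLOPs")
print(f"Implied sustained throughput: {flops_per_gpu_second / 1e12:.0f} TFLOP/s per GPU")
```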

🚀 Quick start

Here is a quick example of using JetMoE-8B-chat:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Initialize the model and tokenizer
model_name = "jetmoe/jetmoe-8b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, attn_implementation="eager", trust_remote_code=True)
# Check if a GPU is available and move the model to GPU if it is
if torch.cuda.is_available():
    model = model.cuda()
    print("Using GPU:", torch.cuda.get_device_name(torch.cuda.current_device()))
else:
    print("GPU is not available, using CPU instead.")
# Encode the conversation with the model's chat template
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenized_chat)
# Move the input IDs to whichever device the model is on (GPU or CPU)
input_ids = tokenized_chat.to(model.device)
# Generate text
output = model.generate(input_ids, max_length=500, num_return_sequences=1, no_repeat_ngram_size=2)
# Move the output to the CPU for decoding (a no-op if it is already there)
output = output.cpu()
# Decode the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

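✨ Key features

Low cost, high performance: JetMoE-8B was trained for less than $100,000, yet it outperforms Meta AI's LLaMA2-7B. This shows that LLM training can be much cheaper than commonly assumed.
Fully open source and accessible: trained only on public datasets with open-source code, and no proprietary resources are required. It can be fine-tuned on the limited compute budgets most labs can afford, such as consumer-grade GPUs.
Low compute cost: only 2.2B active parameters during inference, with better performance than models of similar inference compute, such as Gemma-2B.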
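The low-compute claim can be made concrete with the common approximation that a decoder's forward pass costs about 2 FLOPs per active parameter per generated token. The sketch below applies it to the active-parameter counts listed in the benchmark table; it is an assumption-laden estimate that ignores attention over the KV cache and router cost.

```python
# Rough per-token inference cost using the ~2 * N FLOPs-per-token rule of thumb
# for a forward pass (assumption: ignores attention over the KV cache and routing).
ACTIVE_PARAMS = {
    "JetMoE-8B": 2.2e9,   # 2.2B of 8B parameters active per token
    "Gemma-2B": 2.0e9,    # listed as 2B activated in the benchmark table
    "LLaMA2-7B": 7.0e9,   # dense model: all 7B parameters active
}


def forward_gflops_per_token(active_params: float) -> float:
    """Approximate forward-pass GFLOPs for one generated token."""
    return 2 * active_params / 1e9


for name, n in ACTIVE_PARAMS.items():
    print(f"{name:10s} ~{forward_gflops_per_token(n):.1f} GFLOPs/token")

ratio = ACTIVE_PARAMS["LLaMA2-7B"] / ACTIVE_PARAMS["JetMoE-8B"]
print(f"LLaMA2-7B needs ~{ratio:.1f}x the per-token compute of JetMoE-8B")
print(f"JetMoE-8B activates {2.2 / 8:.1%} of its 8B parameters per token")
```

Sparse activation reduces compute, not storage: all 8B parameters must still be held in memory, so JetMoE-8B's weight footprint is closer to that of an 8B dense model than a 2B one.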

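📚 Detailed documentation

Benchmarks

We use the same evaluation methodology as the Open LLM Leaderboard. For the MBPP code benchmark, we use the same evaluation methodology as the LLaMA2 and Deepseek-MoE papers. The results are as follows: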

Model	Active params	Training tokens	Open LLM Leaderboard avg.	ARC	HellaSwag	MMLU	TruthfulQA	WinoGrande	GSM8k	MBPP	HumanEval
Shots				25	10	5	0	5	5	3	0
Metric				acc_norm	acc_norm	acc	mc2	acc	acc	Pass@1	Pass@1
LLaMA2-7B	7B	2T	51.0	53.1	78.6	46.9	38.8	74	14.5	20.8	12.8
LLaMA-13B	13B	1T	51.4	56.2	80.9	47.7	39.5	76.2	7.6	22.0	15.8
DeepseekMoE-16B	2.8B	2T	51.1	53.2	79.8	46.3	36.1	73.7	17.3	34.0	25.0
Gemma-2B	2B	2T	46.4	48.4	71.8	41.8	33.1	66.3	16.9	28.0	24.4
JetMoE-8B	2.2B	1.25T	53.0	48.7	80.5	49.2	41.7	70.2	27.8	34.2	14.6
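The Open LLM Leaderboard column is the arithmetic mean of the six leaderboard tasks (ARC, HellaSwag, MMLU, TruthfulQA, WinoGrande, GSM8k); MBPP and HumanEval are not part of it. The short check below recomputes each average from the table rows so the reported figures can be verified to within rounding.

```python
# Scores copied from the table above, in the order
# ARC, HellaSwag, MMLU, TruthfulQA, WinoGrande, GSM8k.
LEADERBOARD_SCORES = {
    "LLaMA2-7B":       [53.1, 78.6, 46.9, 38.8, 74.0, 14.5],
    "LLaMA-13B":       [56.2, 80.9, 47.7, 39.5, 76.2, 7.6],
    "DeepseekMoE-16B": [53.2, 79.8, 46.3, 36.1, 73.7, 17.3],
    "Gemma-2B":        [48.4, 71.8, 41.8, 33.1, 66.3, 16.9],
    "JetMoE-8B":       [48.7, 80.5, 49.2, 41.7, 70.2, 27.8],
}
REPORTED_AVERAGE = {
    "LLaMA2-7B": 51.0,
    "LLaMA-13B": 51.4,
    "DeepseekMoE-16B": 51.1,
    "Gemma-2B": 46.4,
    "JetMoE-8B": 53.0,
}


def leaderboard_average(scores: list[float]) -> float:
    """Unweighted mean of the six Open LLM Leaderboard task scores."""
    return sum(scores) / len(scores)


for model, scores in LEADERBOARD_SCORES.items():
    computed = leaderboard_average(scores)
    print(f"{model:16s} computed {computed:.2f}  reported {REPORTED_AVERAGE[model]}")
```

Recomputing the mean also shows where JetMoE-8B's lead comes from: its GSM8k score (27.8 vs 14.5 for LLaMA2-7B) pulls the average up, while it trails LLaMA2-7B on ARC and WinoGrande.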

Model	MT-Bench score
GPT-4	9.014
GPT-3.5-turbo	7.995
Claude-v1	7.923
JetMoE-8B-chat	6.681
Llama-2-13b-chat	6.650
Vicuna-13b-v1.3	6.413
Wizardlm-13b	6.353
Llama-2-7b-chat	6.269

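Surprisingly, despite its lower training cost and computation, JetMoE-8B achieves a higher Open LLM Leaderboard average than LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B. Compared with models of similar training and inference compute, such as Gemma-2B, JetMoE-8B also performs better.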

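Model details

JetMoE-8B consists of 24 blocks. Each block has two MoE layers: a Mixture of Attention heads (MoA) layer and a Mixture of MLP Experts (MoE) layer. Each MoA and MoE layer has 8 experts, and 2 experts are activated for each input token. The model has 8 billion parameters in total, of which 2.2 billion are active. JetMoE-8B was trained on 1.25T tokens from publicly available datasets, with a learning rate of 5.0 × 10⁻⁴ and a global batch size of 4M tokens.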
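To make the top-2-of-8 routing concrete, here is a minimal, framework-free sketch of a sparse layer that sends each token to its 2 highest-scoring experts out of 8. It assumes one common gating variant (a linear router whose top-k logits are softmax-normalized into gate weights) and uses toy experts that simply scale the input. `top_k_route`, `moe_forward`, and the toy experts are hypothetical illustrations, not JetMoE's implementation; the sketch omits load-balancing losses and the attention-specific details of the MoA layers.

```python
import math
import random

NUM_EXPERTS = 8  # experts per MoA / MoE layer (model details)
TOP_K = 2        # experts activated per input token


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=TOP_K):
    """Hypothetical gating: keep the k highest logits, softmax them into gate weights."""
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(token, router_weights, experts, k=TOP_K):
    """Sparse layer: only the k selected experts are evaluated for this token."""
    # Linear router: one logit per expert, dot(router_weights[e], token)
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    out = [0.0] * len(token)
    for expert_id, gate in top_k_route(logits, k):
        expert_out = experts[expert_id](token)
        # Gate-weighted sum of the selected experts' outputs
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out


# Toy demo: random router, experts that scale the input by 1..8 (hypothetical)
random.seed(0)
DIM = 4
router_weights = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [(lambda x, s=e + 1: [s * v for v in x]) for e in range(NUM_EXPERTS)]
token = [random.gauss(0.0, 1.0) for _ in range(DIM)]
print(moe_forward(token, router_weights, experts))
```

Because only 2 of the 8 experts run for each token in every such layer, per-token compute tracks the 2.2B active parameters rather than the 8B total.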