gemma-2-baku-2b-it open-source model - tuned for instruction following, suited to natural language processing tasks

Gemma 2 Baku 2b It

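Developed by rinna

An instruction-tuned model based on Gemma 2 Baku 2B, tuned for better instruction following and suited to natural language processing tasks.

Large language model

Transformers

Japanese #Japanese instruction tuning #ORPO optimization #Chat vector augmentation

Downloads: 2,555

Published: 10/2/2024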

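Model overview

An instruction-tuned language model built by adding a chat vector to Gemma 2 Baku 2B and then applying ORPO. It is intended for instruction-following natural language processing tasks.

Model features

Instruction tuning

Instruction-tuned from Gemma 2 Baku 2B to improve instruction following.

ORPO optimization

Further tuned with Odds Ratio Preference Optimization (ORPO).

Chat vector

Instruction-following capability is added to the base model through chat vector addition.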

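Model capabilities

Text generation

Instruction following

Natural language processing

Use cases

Question answering

Looking up information about a person

Answers questions about a specific person, such as '西田幾多郎とはどんな人物ですか？' ("What kind of person was Kitaro Nishida?")

Generates a detailed description of the person

Dialogue systems

Multi-turn dialogue

Supports continued conversation that builds on earlier context

Generates coherent, context-appropriate replies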

🚀 `Gemma 2 Baku 2B Instruct (rinna/gemma-2-baku-2b-it)`

Gemma 2 Baku 2B Instruct is a language model tuned for instruction following and intended for natural language processing tasks.

🚀 Quick start

Model usage example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "rinna/gemma-2-baku-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
    attn_implementation="eager",
)

chat = [
    { "role": "user", "content": "西田幾多郎とはどんな人物ですか？" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)

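⚠️ Important note

We recommend using eager attention for batch inference under bfloat16 precision. Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is used together with bfloat16.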
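The note above concerns batched, padded inputs, so a batch-inference variant of the usage example may help. The following is a minimal sketch, not an official recipe: `to_chat` and `generate_batch` are hypothetical helper names, left padding and the `max_new_tokens` default are assumptions, and it presumes a CUDA device with `torch` and `transformers` installed.

```python
from typing import Dict, List

MODEL_ID = "rinna/gemma-2-baku-2b-it"


def to_chat(question: str) -> List[Dict[str, str]]:
    """Wrap a single user question in the chat-message structure."""
    return [{"role": "user", "content": question}]


def generate_batch(questions: List[str], max_new_tokens: int = 256) -> List[str]:
    """Answer several questions in one padded batch (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the module can be imported
    # without loading torch or transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: left padding, so every prompt ends right where generation starts.
    tokenizer.padding_side = "left"

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="cuda",
        torch_dtype=torch.bfloat16,
        # Eager attention, as recommended for padded bfloat16 batches.
        attn_implementation="eager",
    )

    prompts = [
        tokenizer.apply_chat_template(
            to_chat(q), tokenize=False, add_generation_prompt=True
        )
        for q in questions
    ]
    # The chat template already inserts <bos>, so special tokens are not added again.
    inputs = tokenizer(
        prompts, return_tensors="pt", padding=True, add_special_tokens=False
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the (padded) prompt tokens and keep only the newly generated ones.
    new_tokens = outputs[:, inputs["input_ids"].shape[-1]:]
    return tokenizer.batch_decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the model, so it is left commented out):
# answers = generate_batch(["西田幾多郎とはどんな人物ですか？", "京都学派とは何ですか？"])
```

Prompts of different lengths are what introduce padding, which is the case the note warns about.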

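✨ Key features

Instruction tuning: the model is an instruction-tuned variant of rinna/gemma-2-baku-2b, fine-tuned using a chat vector and Odds Ratio Preference Optimization (ORPO). It follows the gemma-2 chat format.
Model architecture: a 26-layer, 2304-hidden-size Transformer-based language model. See the Gemma 2 Model Card for detailed architecture information.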
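To make "follows the gemma-2 chat format" concrete, here is a small pure-Python sketch of how a conversation is laid out as a prompt string. It reflects our understanding of the Gemma 2 turn markers (`<start_of_turn>`, `<end_of_turn>`, with the assistant role rendered as `model`); `render_gemma2_chat` is a hypothetical helper for illustration only, and `tokenizer.apply_chat_template` remains the authoritative implementation.

```python
from typing import Dict, List


def render_gemma2_chat(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Render chat messages as a Gemma 2 style prompt string (illustrative)."""
    parts = ["<bos>"]
    for message in messages:
        # The format names the assistant side "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {message['role']}")
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


# A multi-turn conversation: earlier turns are replayed as context.
chat = [
    {"role": "user", "content": "西田幾多郎とはどんな人物ですか？"},
    {"role": "assistant", "content": "西田幾多郎は日本の哲学者です。"},
    {"role": "user", "content": "代表的な著作を教えてください。"},
]
print(render_gemma2_chat(chat))
```

In real use the string is produced by the tokenizer; the sketch only shows why the usage example passes `add_special_tokens=False` when encoding, since the rendered prompt already begins with `<bos>`.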

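📦 Installation

No specific installation steps are documented. Install the dependencies by following the official installation instructions for the transformers library.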

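📚 Documentation

Model size and training

Size	Continual Pre-Training	Instruction-Tuning
2B	Gemma 2 Baku 2B [HF]	Gemma 2 Baku 2B Instruct [HF]

Training

Model merging

The base model was given instruction-following capability through chat vector addition. The chat vector is obtained by subtracting the parameter vectors of google/gemma-2-2b from those of google/gemma-2-2b-it, and is then added to the base model:

rinna/gemma-2-baku-2b + 1.0 * (google/gemma-2-2b-it - google/gemma-2-2b)

The embedding layer is excluded from both the subtraction and the addition of parameter vectors.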
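The merging arithmetic above can be sketched on toy parameter dictionaries. This is an illustration under stated assumptions, not rinna's actual merging script: `add_chat_vector` is a hypothetical helper, the parameter names are made up, and identifying the embedding layer by the substring `embed_tokens` is an assumption about naming.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def add_chat_vector(
    target: Params,
    pretrained: Params,
    instruct: Params,
    ratio: float = 1.0,
    skip_substrings: Sequence[str] = ("embed_tokens",),
) -> Params:
    """Return target + ratio * (instruct - pretrained), skipping selected layers."""
    merged: Params = {}
    for name, weights in target.items():
        if any(s in name for s in skip_substrings):
            # Excluded layers (here: embeddings) are copied from the target unchanged.
            merged[name] = list(weights)
            continue
        # Chat vector for this parameter: instruction-tuned minus pretrained.
        chat_vector = [i - p for i, p in zip(instruct[name], pretrained[name])]
        merged[name] = [w + ratio * c for w, c in zip(weights, chat_vector)]
    return merged


# Toy stand-ins for the three checkpoints (values are arbitrary).
baku = {"embed_tokens.weight": [1.0, 2.0], "layers.0.mlp.weight": [0.5, -0.5]}
gemma_pt = {"embed_tokens.weight": [0.0, 0.0], "layers.0.mlp.weight": [0.25, 0.25]}
gemma_it = {"embed_tokens.weight": [9.0, 9.0], "layers.0.mlp.weight": [0.75, 0.0]}

merged = add_chat_vector(baku, gemma_pt, gemma_it, ratio=1.0)
print(merged)
```

With real checkpoints the same loop would run over model state dictionaries of tensors rather than lists of floats; the ratio of 1.0 matches the coefficient in the formula above.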

ORPO

ORPO was then applied to the merged model, using a subset of the following dataset:

rinna's internal dataset
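For readers unfamiliar with ORPO, the objective described in the ORPO paper cited in the references combines a supervised loss on the preferred response with an odds-ratio penalty against the rejected one. The sketch below computes it for a single preference pair from length-averaged log-probabilities. `orpo_loss` and `log_odds` are hypothetical names, and the weight `lam=0.1` is an illustrative value; the hyperparameters actually used for this model are not stated.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """ORPO objective for one (chosen, rejected) pair.

    Inputs are the average per-token log-probabilities the model assigns to
    the chosen and rejected responses.
    """
    # Supervised term: negative log-likelihood of the chosen response.
    sft_term = -chosen_avg_logp
    # Odds-ratio term: -log(sigmoid(log-odds ratio)) = log(1 + exp(-ratio)).
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term


# The loss is lower when the chosen response is the more likely one.
good = orpo_loss(math.log(0.8), math.log(0.2))
bad = orpo_loss(math.log(0.2), math.log(0.8))
print(good < bad)
```

Because the penalty is computed from the policy's own odds, no separate reference model is needed, which is the property highlighted in the paper's title.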

Contributors

Release date

October 3, 2024

Benchmarking

Please refer to rinna's LM benchmark page (Sheet 20241003).

Tokenization

The model uses the original google/gemma-2-2b-it tokenizer.

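🔧 Technical details

The model is a Transformer-based language model. Its instruction-following capability comes from chat vector addition followed by ORPO, and the embedding layer is excluded from the chat vector subtraction and addition during merging.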

📄 License

Gemma Terms of Use

📚 How to cite

@misc{rinna-gemma-2-baku-2b-it,
    title = {rinna/gemma-2-baku-2b-it},
    author = {Chen, Xinqi and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/gemma-2-baku-2b-it}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

📚 References

@article{gemma-2-2024,
    title = {Gemma 2},
    url = {https://www.kaggle.com/models/google/gemma-2},
    publisher = {Kaggle},
    author = {Gemma Team},
    year = {2024}
}

@article{huang2023chat,
    title = {Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
    author = {Huang, Shih-Cheng and Li, Pin-Zu and Hsu, Yu-Chi and Chen, Kuang-Ming and Lin, Yu Tung and Hsiao, Shih-Kai and Tzong-Han Tsai, Richard and Lee, Hung-yi},
    year = {2023},
    url = {https://arxiv.org/abs/2310.04799}
}

@article{hong2024orpo,
  title = {ORPO: Monolithic Preference Optimization without Reference Model},
  author = {Hong, Jiwoo and Lee, Noah and Thorne, James},
  year = {2024},
  url = {https://arxiv.org/abs/2403.07691}
}