luvai-phi3 open-source conversational model: built for role-play and vivid, interactive conversations

Luvai Phi3

Developed by luvGPT

A conversational model fine-tuned from microsoft/phi-3-mini-4k-instruct for role-play scenarios. It requires a specific prompt format for best results.

Large language model

Transformers

Language: English · License: MIT · Tags: #role-play-optimization #LoRA-fine-tuning #conversational-generation

Downloads: 98

Released: 3/18/2025

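Model overview

The model is optimized to adapt to different character settings while keeping each character consistent. It excels at creative, character-driven dialogue and adapts closely to the personality traits given in the system prompt.

Model features

Role-play optimization

Designed for role-play conversations with a wide range of characters, while keeping each character consistent.

Specific prompt format

Good output requires strictly following a specific prompt template.

Multi-precision support

Runs at precisions from FP32 down to 4-bit quantization, to suit different hardware.

Expressive replies

Replies often start with an emoji prefix, which makes the conversation more expressive.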
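If an application wants to display or log the emoji prefix separately from the reply text, a small post-processing step is enough. The sketch below is a hypothetical helper, not part of the model or of transformers: it assumes the prefix is the first whitespace-separated token and contains no letters or digits, which is only a heuristic (a reply starting with "..." would also be split).

```python
# Hypothetical helper: split a leading emoji "prefix" off a model reply.
# ASSUMPTION: the prefix is the first whitespace-separated token and contains
# no letters or digits (e.g. "😊 Sure thing!"); this is a heuristic only.


def split_emoji_prefix(reply: str) -> tuple[str, str]:
    """Return (prefix, text); prefix is "" when the reply has no emoji prefix."""
    reply = reply.strip()
    first, _, rest = reply.partition(" ")
    # Treat the first token as a prefix only if it has no alphanumeric
    # characters and some text follows it.
    if first and rest and not any(ch.isalnum() for ch in first):
        return first, rest.strip()
    return "", reply


if __name__ == "__main__":
    print(split_emoji_prefix("😊 Sure thing! Bitcoin operates on a decentralized network."))
    print(split_emoji_prefix("Hello there!"))
```

Keeping the check conservative (no alphanumerics, and something must follow) means ordinary replies pass through unchanged.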

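Model capabilities

Role-play dialogue

Creative text generation

Adapting to multiple characters

Consistency over long conversations

Use cases

Entertainment

Virtual character interaction

Immersive conversations with predefined virtual characters

Generates replies that fit each character's personality

Education

Knowledge explainer

Explains complex concepts in the voice of a specific character

Can explain technical concepts in simple terms, as in the Bitcoin explanation in the example conversation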

🚀 luvai-phi3

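This model is a fine-tuned version of microsoft/phi-3-mini-4k-instruct, optimized for role-play conversations with a variety of character personas. It interacts in a conversational format. Note that following the prompt template guide is essential for getting usable output.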

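🚀 Quick start

The model performs well in role-play conversations, but it needs a specific prompt template to produce good output. Usage is described in detail below.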

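✨ Key features

Character consistency: the model is optimized to stay consistent with each persona it adopts.
Creative dialogue: it excels at creative, character-driven conversation.
High adaptability: it adapts closely to the different personality traits provided in the system prompt.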

💻 Usage examples

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "luvGPT/luvai-phi3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Define character persona - you can customize this!
persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."

# Format the prompt with the raw format (not using chat template)
user_message = "Hi Sophie, can you tell me about yourself?"
prompt = f"{persona}\nUser: {user_message}\nAssistant:"

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

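# Decode only the newly generated tokens; slicing the decoded text by len(prompt)
# can cut in the wrong place if decoding does not reproduce the prompt exactly
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()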

# Sometimes the model may continue with "User:" - need to truncate
if "User:" in response:
    response = response.split("User:")[0].strip()

print(f"Character: {response}")

Advanced usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class CharacterChat:
    def __init__(self, model_path="luvGPT/luvai-phi3", persona=None):
        print(f"Loading model from {model_path}...")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        
        # Default persona or use provided one
        if persona is None:
            self.persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."
        else:
            self.persona = persona
            
        self.conversation_history = []
        print("Character is ready to chat!")
        
    def chat(self, message):
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": message})
        
        # Format the conversation in the raw format that works
        raw_prompt = f"{self.persona}\n"
        
        # Add conversation history
        for msg in self.conversation_history:
            if msg["role"] == "user":
                raw_prompt += f"User: {msg['content']}\n"
            else:
                raw_prompt += f"Assistant: {msg['content']}\n"
        
        # Add the final Assistant: prompt
        raw_prompt += "Assistant:"
        
        # Generate response
        inputs = self.tokenizer(raw_prompt, return_tensors="pt").to(self.model.device)
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=100,
                do_sample=True,
                temperature=0.7,
                top_p=0.95,
                pad_token_id=self.tokenizer.eos_token_id
            )
        
        # Decode only the newly generated tokens (slicing the decoded text by
        # len(raw_prompt) can misalign if decoding does not round-trip the prompt)
        prompt_length = inputs["input_ids"].shape[1]
        response = self.tokenizer.decode(
            outputs[0][prompt_length:], skip_special_tokens=True
        ).strip()

        # Sometimes the model may continue with "User:" - need to truncate
        if "User:" in response:
            response = response.split("User:")[0].strip()

        # Store the response in conversation history
        self.conversation_history.append({"role": "assistant", "content": response})

        return response
    
    def reset_conversation(self):
        self.conversation_history = []
        return "Conversation has been reset."

# Simple interactive chat example
if __name__ == "__main__":
    persona = input("Enter character persona (or press Enter for default): ")
    chat = CharacterChat(persona=persona if persona else None)
    
    print("Chat started! Type 'quit' to exit or 'reset' to restart conversation.")
    
    while True:
        user_input = input("\nYou: ")
        
        if user_input.lower() == 'quit':
            print("Goodbye!")
            break
        elif user_input.lower() == 'reset':
            print(chat.reset_conversation())
            continue
            
        response = chat.chat(user_input)
        print(f"\nCharacter: {response}")
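The CharacterChat class above appends every turn to conversation_history, so in a long session the raw prompt eventually outgrows the base model's context window. One way to handle this is to drop the oldest turns until the prompt fits a token budget. The sketch below is a hypothetical helper, not part of the original code; it assumes a 4k-token context (as the "4k" in the base model name suggests) minus the 100 new tokens used above, and takes any token-counting callable, for example `lambda s: len(self.tokenizer(s)["input_ids"])`.

```python
from typing import Callable

# Hypothetical helper: build the raw Persona/User/Assistant prompt while keeping
# it within a token budget by dropping the oldest messages first.
# ASSUMPTION: 4096-token context, minus 100 tokens reserved for max_new_tokens.


def build_trimmed_prompt(
    persona: str,
    history: list[dict],
    count_tokens: Callable[[str], int],
    max_prompt_tokens: int = 4096 - 100,
) -> str:
    """Return the raw prompt, dropping the oldest messages until it fits."""
    turns = list(history)  # copy so the caller's history is not modified
    while True:
        prompt = f"{persona}\n"
        for msg in turns:
            speaker = "User" if msg["role"] == "user" else "Assistant"
            prompt += f"{speaker}: {msg['content']}\n"
        prompt += "Assistant:"
        # Stop when the prompt fits, or when only the newest message is left.
        if count_tokens(prompt) <= max_prompt_tokens or len(turns) <= 1:
            return prompt
        turns.pop(0)  # drop the oldest message and try again
```

Inside chat(), this call could replace the manual prompt-building loop; the persona line is always kept, since the prompt template guide treats it as essential.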

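📦 Installation

The examples use the transformers library with PyTorch; loading with device_map="auto" also requires accelerate. Install them with:

pip install transformers torch accelerate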

📚 Documentation

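Prompt template guide

Important: this model requires a specific prompt format to work properly.

For best results, follow this exact template structure:

Character's Persona: [detailed persona description]
User: [user message]
Assistant:

The model writes its reply after "Assistant:". This specific format is essential: using a standard chat template or a different format leads to poor or nonsensical output.

Key points:

Start the character description in the "Character's Persona:" format
Prefix every user message with "User: "
Each model reply follows "Assistant:"
The model's replies often start with an emoji prefix (👅, 😈, 🔥, etc.)
Keep the format above exactly, including the line breaks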
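The template rules above can be captured in two small functions: one that assembles the prompt line by line, and one that cuts the generated text where the model starts writing the next "User:" turn (the same truncation the usage examples perform). This is a minimal sketch; the function names are hypothetical, and only the "Persona / User: / Assistant:" layout comes from this guide.

```python
# Minimal sketch of the raw prompt template described in this guide.
# The function names are hypothetical; only the line layout comes from the guide.


def build_prompt(persona: str, turns: list[tuple[str, str]], user_message: str) -> str:
    """Format a persona line, earlier (user, assistant) turns and a new user message."""
    lines = [persona]
    for user_text, assistant_text in turns:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {assistant_text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")  # the model continues after this marker
    return "\n".join(lines)


def clean_reply(generated: str) -> str:
    """Cut the generated text where the model starts a new "User:" turn."""
    return generated.split("User:")[0].strip()


if __name__ == "__main__":
    persona = "Sophie's Persona: Sophie is a friendly, knowledgeable virtual assistant."
    print(build_prompt(persona, [], "Hi Sophie, can you tell me about yourself?"))
    print(clean_reply(" 😊 I'm Sophie!\nUser: cool"))
```

The persona line is passed through as-is, so the same builder works for "Sophie's Persona: ..." or any other character name.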

Example conversation: Sophie (knowledge assistant)

You: i lost a lot of money in crypto
Sophie: 🙄 Sorry to hear that. How long have you been interested in cryptocurrencies?

You: mm im not really
Sophie: 🤔 Ahh okay then. Well, let me ask you this - have you ever heard of Bitcoin?

You: yes i have, why do you ask?
Sophie: 👍 Good! Bitcoin is actually one of the most popular cryptocurrencies out there. Do you know anything about how it works?

You: no tell me about it
Sophie: 😊 Sure thing! Bitcoin operates on a decentralized network called blockchain technology which allows for secure peer-to-peer transactions without the need for intermediaries like banks or governments. Each transaction is verified by nodes in the network before being added to a public ledger called the blockchain. This makes it very difficult for anyone to tamper with the data or manipulate the system.

Hardware requirements

The model has been tested at different precision settings to determine the minimum hardware requirements:

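Precision	VRAM required	Example GPUs	Notes
FP32 (32-bit)	14.24 GB	RTX 3090, 4090, A5000, A6000, etc.	Default loading mode
FP16 (16-bit)	7.12 GB	RTX 3090, 4090, A5000, A6000, etc.	Recommended for most users
8-bit quantization	5.68 GB	RTX 2060 12GB, 3060, 3070, etc.	Good balance between quality and efficiency
4-bit quantization	2.27 GB	Most modern GPUs (GTX 1060+)	Lowest quality, but runs on older hardware

Training data

The dataset is used for luvGPT's proprietary internal development and cannot be open-sourced at this time.

Generation: initial conversations were generated by open-source large language models following specific generation instructions, then filtered by a judge model.
Size: about 13k high-quality examples, filtered from 50k initial conversations.
Format: JSONL; each entry contains a messages array with system, user, and assistant roles.
Quality control: a judge model scored and filtered the initial dataset, keeping only the highest-quality examples with strong persona consistency and engaging replies.
Length: messages average about 240 tokens; conversations typically contain 6-7 messages.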
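As a rough cross-check of the FP32 and FP16 rows in the hardware table above, weight memory is approximately the parameter count multiplied by the bytes per parameter. The sketch assumes about 3.8B parameters (the published size of phi-3-mini) and counts only the weights, so its results are lower bounds: activations, the KV cache, the CUDA context and quantization metadata are not modeled. The FP32 and FP16 estimates come out close to the table (about 14.16 and 7.08 GiB). The measured 8-bit and 4-bit figures are higher than this simple estimate, which is consistent with overhead the sketch leaves out.

```python
# Rough lower-bound estimate of the memory needed just to hold the weights.
# ASSUMPTION: ~3.8B parameters (the published size of phi-3-mini).
# Activations, KV cache, CUDA context and quantization metadata are NOT included.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the weight footprint in GiB (2**30 bytes) for the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    num_params = 3.8e9  # assumed parameter count
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: ~{weight_memory_gib(num_params, precision):.2f} GiB")
```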

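Model description

The model is optimized to stay consistent with each persona it adopts. It excels at creative, character-driven conversation and adapts closely to the different personality traits provided in the system prompt.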

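Performance

Training metrics show steady improvement throughout training (charts: Training Loss, Token Accuracy):

Token accuracy: rose from about 0.48 to about 0.73
Training loss: fell from about 2.2 to about 1.05
Convergence: the model showed strong convergence by the end of training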
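If the reported training loss is the usual mean per-token cross-entropy in natural-log units (an assumption; the card does not say), it converts to a per-token perplexity via exp(loss). This gives a more intuitive reading of the drop: from roughly 9.0 to roughly 2.9. Both values are measured on training data, not on a held-out evaluation set.

```python
import math

# ASSUMPTION: the loss is mean per-token cross-entropy in nats (natural log),
# which is what a standard causal-LM cross-entropy loss reports.


def perplexity_from_loss(loss: float) -> float:
    """Per-token perplexity corresponding to a mean cross-entropy loss in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    for label, loss in [("start", 2.2), ("end", 1.05)]:
        print(f"{label}: loss={loss:.2f} -> perplexity ~ {perplexity_from_loss(loss):.2f}")
```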

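Training details

Base model: microsoft/phi-3-mini-4k-instruct
Method: fine-tuned with LoRA and DeepSpeed, using these parameters:
- LoRA rank: 16
- LoRA alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training process:
- Hardware: a single NVIDIA GPU with 24 GB of VRAM
- Training time: about 3 hours
- Optimizer: AdamW with DeepSpeed ZeRO stage 2 optimization
- Learning rate: 2e-4 with a cosine schedule
- Batch size: 8 (effective)
- Epochs: 3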
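To make these hyperparameters concrete: in standard LoRA, a frozen weight matrix W (d_out × d_in) is augmented with a low-rank update B·A, where A is r × d_in and B is d_out × r, and the update is scaled by alpha / r. With r = 16 and alpha = 32 the scale is 2, and each adapted linear layer gains r × (d_in + d_out) trainable parameters. The cosine schedule is shown as plain cosine decay from 2e-4 to 0. The card does not state a warmup phase or a minimum learning rate, so both are assumptions here, and the 3072 × 3072 layer size is purely illustrative.

```python
import math

LORA_RANK = 16   # from the training details above
LORA_ALPHA = 32  # from the training details above
PEAK_LR = 2e-4   # from the training details above


def lora_scaling(alpha: float, rank: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Plain cosine decay from peak_lr to 0 (ASSUMPTION: no warmup, no minimum LR)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("LoRA scale:", lora_scaling(LORA_ALPHA, LORA_RANK))  # 2.0
    # Illustrative 3072 x 3072 projection (hypothetical size for this example).
    print("Extra params per layer:", lora_extra_params(3072, 3072, LORA_RANK))  # 98304
    for step in (0, 500, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```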

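🔧 Technical details

The model is fine-tuned from microsoft/phi-3-mini-4k-instruct using LoRA for parameter-efficient fine-tuning, with DeepSpeed used to accelerate training. During training, a specific prompt template was used so that the model learns the personas and conversational styles of different characters. With tuned training parameters such as learning rate, batch size, and number of epochs, the model performs well in role-play conversations.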