phi3-uncensored-chat open-source model - free to deploy for role-play conversations with a variety of characters

Phi3 Uncensored Chat

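Developed by luvGPT

A fine-tuned version of microsoft/phi-3-mini-4k-instruct, designed for role-play conversations with a variety of characters.

Large Language Model

Transformers

Language: English  License: MIT  #role-play-fine-tuning #emoji-dialogue-generation #low-VRAM-optimization

Downloads: 77

Released: 3/18/2025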

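Model overview

The model is tuned to stay consistent within a character while adapting to different personas. It is strongest at creative, character-driven dialogue and adapts readily to the personality traits described in the system prompt.

Model features

Role-play optimization

Designed for role-play dialogue; keeps a character consistent and adapts to different personas.

Specific prompt format

The best output requires strictly following a specific prompt template.

Multiple precision support

Supports precision settings from FP32 down to 4-bit quantization to suit different hardware.

Expressive emoji replies

Replies often begin with an emoji prefix, which makes the dialogue more expressive.

Model capabilities

Role-play dialogue

Creative text generation

Adapting to multiple characters

Explaining technical concepts

Use cases

Entertainment

Virtual character interaction

Chat with virtual characters that have different personas

Generates natural dialogue that fits the persona

Education

Explaining concepts

Explains complex concepts in simple, accessible terms

For example, the clear explanation of how Bitcoin works in the example conversation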

🚀 luvai-phi3

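This model is a fine-tune of microsoft/phi-3-mini-4k-instruct optimized for role-play dialogue, and it can portray a wide range of characters. It replies in a conversational style. Note that following the specific prompt template is essential for getting usable output.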

🚀 Quick Start

The model only works properly with a specific prompt format. Here is a usage example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "luvGPT/luvai-phi3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Define the character persona (customize as needed)
persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."

# Format the prompt in the raw format (not the chat template)
user_message = "Hi Sophie, can you tell me about yourself?"
prompt = f"{persona}\nUser: {user_message}\nAssistant:"

# Generate the reply
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

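# Decode only the newly generated tokens. Slicing the decoded string by len(prompt)
# can misalign if decoding does not reproduce the prompt text exactly.
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

# The model sometimes continues with a new "User:" turn; cut the reply off there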
if "User:" in response:
    response = response.split("User:")[0].strip()

print(f"Character: {response}")

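✨ Key Features

Optimized for role-play dialogue; can portray a wide range of characters.
Replies in a conversational style, which makes interaction more engaging.
Training metrics improved steadily throughout training: token accuracy rose from about 0.48 to about 0.73.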

💻 Usage Examples

Advanced usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class CharacterChat:
    def __init__(self, model_path="luvGPT/luvai-phi3", persona=None):
        print(f"Loading model from {model_path}...")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        
        # Use the default persona unless one is provided
        if persona is None:
            self.persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."
        else:
            self.persona = persona
            
        self.conversation_history = []
        print("Character is ready to chat!")
        
    def chat(self, message):
        # Add the user message to the conversation history
        self.conversation_history.append({"role": "user", "content": message})

        # Format the conversation in the raw prompt format
        raw_prompt = f"{self.persona}\n"

        # Append the conversation history
        for msg in self.conversation_history:
            if msg["role"] == "user":
                raw_prompt += f"User: {msg['content']}\n"
            else:
                raw_prompt += f"Assistant: {msg['content']}\n"

        # End with "Assistant:" so the model writes the next reply
        raw_prompt += "Assistant:"

        # Generate the reply
        inputs = self.tokenizer(raw_prompt, return_tensors="pt").to(self.model.device)
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=100,
                do_sample=True,
                temperature=0.7,
                top_p=0.95,
                pad_token_id=self.tokenizer.eos_token_id
            )
        
        # Decode only the newly generated tokens. Slicing the decoded string by
        # len(raw_prompt) can misalign if decoding does not reproduce the prompt exactly.
        prompt_length = inputs["input_ids"].shape[1]
        response = self.tokenizer.decode(
            outputs[0][prompt_length:], skip_special_tokens=True
        ).strip()

        # The model sometimes continues with a new "User:" turn; cut the reply off there
        if "User:" in response:
            response = response.split("User:")[0].strip()

        # Store the reply in the conversation history
        self.conversation_history.append({"role": "assistant", "content": response})

        return response
    
    def reset_conversation(self):
        self.conversation_history = []
        return "Conversation has been reset."

# Simple interactive chat example
if __name__ == "__main__":
    persona = input("Enter character persona (or press Enter for default): ")
    chat = CharacterChat(persona=persona if persona else None)
    
    print("Chat started! Type 'quit' to exit or 'reset' to restart conversation.")
    
    while True:
        user_input = input("\nYou: ")
        
        if user_input.lower() == 'quit':
            print("Goodbye!")
            break
        elif user_input.lower() == 'reset':
            print(chat.reset_conversation())
            continue
            
        response = chat.chat(user_input)
        print(f"\nCharacter: {response}")
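
CharacterChat keeps appending to conversation_history, so a long chat can eventually outgrow the base model's context window (the "4k" in phi-3-mini-4k-instruct refers to a 4K-token context). The following is a minimal sketch of one way to trim old turns before building the prompt. The names trim_history and approx_token_count are hypothetical helpers, not part of the model's code, and the word-count token estimate is only a stand-in; in practice you would pass a counter based on the real tokenizer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def approx_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(
    persona: str,
    history: List[Message],
    max_prompt_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[Message]:
    """Keep the most recent turns that fit in the budget next to the persona."""

    def render(msg: Message) -> str:
        speaker = "User" if msg["role"] == "user" else "Assistant"
        return f"{speaker}: {msg['content']}\n"

    # Reserve room for the persona line and the trailing "Assistant:" cue.
    budget = max_prompt_tokens - count_tokens(persona) - count_tokens("Assistant:")

    kept: List[Message] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for msg in reversed(history):
        cost = count_tokens(render(msg))
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()

    # Drop a leading assistant turn so the trimmed history starts with "User:".
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

With the real tokenizer, count_tokens could be something like lambda s: len(tokenizer(s)["input_ids"]), and max_prompt_tokens should leave room for max_new_tokens.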

📚 Documentation

Prompt template guide

⚠️ Important

The model only works properly with a specific prompt format.

For best results, follow this exact template structure:

Character's Persona: [detailed persona description]
User: [user message]
Assistant:

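The model completes the reply after "Assistant:". This exact format is critical: standard chat templates or other formats produce poor or nonsensical output.

Key points:

Begin the character description in the "Character's Persona:" form, with the character's name in place of "Character" (the code examples use "Sophie's Persona:").
Prefix every user message with "User: ".
Each model reply follows "Assistant:".
Replies usually begin with an emoji prefix (such as 👅, 😈, 🔥).
Follow the format above exactly, including the line breaks.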

The Quick Start and Usage Examples sections above show this template implemented in code.
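
Because the template is sensitive to exact prefixes and line breaks, it can help to build prompts in one place. The sketch below assembles the raw prompt from a character name, a description, and prior turns. build_prompt and its parameters are hypothetical names chosen for illustration; the layout follows the template described above.

```python
from typing import List, Tuple


def build_prompt(
    name: str,
    description: str,
    turns: List[Tuple[str, str]],
    user_message: str,
) -> str:
    """Assemble the raw prompt: persona line, past turns, new message, reply cue."""
    lines = [f"{name}'s Persona: {description}"]
    for user_text, assistant_text in turns:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {assistant_text}")
    lines.append(f"User: {user_message}")
    # No trailing space or newline: the model writes its reply right after this cue.
    lines.append("Assistant:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Sophie", "Sophie is a friendly virtual assistant.", [], "Hi Sophie!"))
```

The returned string can be passed directly to the tokenizer in place of the hand-written f-string.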

Example conversation: Sophie (knowledge assistant)

You: i lost a lot of money in crypto
Sophie: 🙄 Sorry to hear that. How long have you been interested in cryptocurrencies?

You: mm im not really
Sophie: 🤔 Ahh okay then. Well, let me ask you this - have you ever heard of Bitcoin?

You: yes i have, why do you ask?
Sophie: 👍 Good! Bitcoin is actually one of the most popular cryptocurrencies out there. Do you know anything about how it works?

You: no tell me about it
Sophie: 😊 Sure thing! Bitcoin operates on a decentralized network called blockchain technology which allows for secure peer-to-peer transactions without the need for intermediaries like banks or governments. Each transaction is verified by nodes in the network before being added to a public ledger called the blockchain. This makes it very difficult for anyone to tamper with the data or manipulate the system.
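
Each reply in the transcript above opens with an emoji. If an application wants to show that emoji separately (for example as a mood indicator), the prefix can be split off after generation. This is a rough sketch under a simple assumption: the first space-separated token counts as an emoji prefix when it contains no letters or digits and at least one non-ASCII character. split_emoji_prefix is a hypothetical helper, not something the model ships with.

```python
from typing import Tuple


def split_emoji_prefix(reply: str) -> Tuple[str, str]:
    """Return (prefix, text); prefix is "" when the reply starts with ordinary text."""
    cleaned = reply.strip()
    head, _, rest = cleaned.partition(" ")
    has_no_alnum = not any(ch.isalnum() for ch in head)
    has_non_ascii = any(ord(ch) > 127 for ch in head)
    if head and has_no_alnum and has_non_ascii:
        return head, rest.strip()
    return "", cleaned


if __name__ == "__main__":
    print(split_emoji_prefix("😊 Sure thing! Bitcoin operates on a decentralized network."))
```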

Hardware requirements

The model was tested at several precision settings to determine the minimum hardware requirements:

Property	Details
Model type	Fine-tune of microsoft/phi-3-mini-4k-instruct
Training data	Not open-sourced; about 13k examples filtered from 50k generated conversations (details in the Training data section)

Precision	VRAM required	Notes	Suitable GPUs
FP32 (32-bit)	14.24 GB	-	RTX 3090, 4090, A5000, A6000, etc.
FP16 (16-bit)	7.12 GB	Recommended for most users	RTX 3090, 4090, A5000, A6000, etc.
8-bit quantization	5.68 GB	Good balance of quality and efficiency	RTX 2060 12GB, 3060, 3070, etc.
4-bit quantization	2.27 GB	Lowest quality, but runs on older hardware	Most modern GPUs (GTX 1060+)

Recommended loading configuration by hardware:
- High-end GPU (FP16): load in half precision to balance performance and quality.
- Mid-range GPU (8-bit): load with an 8-bit quantization configuration.
- Low-end GPU (4-bit): load with a 4-bit quantization configuration.
- CPU-only inference: runs on any system, but slowly.
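
The loading configurations listed above can be sketched with the BitsAndBytesConfig class from Transformers. This is an illustrative sketch, not the author's exact loading code: pick_precision and loading_kwargs are hypothetical helpers, the thresholds are simply the VRAM figures quoted above (they may not leave headroom for long prompts), and the quantized paths assume the bitsandbytes package and a CUDA GPU are available.

```python
from typing import Any, Dict

# VRAM figures (GB) copied from the hardware requirements above.
VRAM_REQUIRED_GB = {"fp32": 14.24, "fp16": 7.12, "8bit": 5.68, "4bit": 2.27}


def pick_precision(free_vram_gb: float) -> str:
    """Prefer FP16 when it fits, then 8-bit, then 4-bit, else fall back to CPU."""
    for precision in ("fp16", "8bit", "4bit"):
        if free_vram_gb >= VRAM_REQUIRED_GB[precision]:
            return precision
    return "cpu"


def loading_kwargs(precision: str) -> Dict[str, Any]:
    """Build keyword arguments for AutoModelForCausalLM.from_pretrained."""
    # Imported lazily so pick_precision works without torch/transformers installed.
    import torch
    from transformers import BitsAndBytesConfig

    if precision == "fp16":
        return {"torch_dtype": torch.float16, "device_map": "auto"}
    if precision == "8bit":
        config = BitsAndBytesConfig(load_in_8bit=True)
        return {"quantization_config": config, "device_map": "auto"}
    if precision == "4bit":
        config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
        )
        return {"quantization_config": config, "device_map": "auto"}
    if precision == "cpu":
        return {"torch_dtype": torch.float32, "device_map": "cpu"}
    raise ValueError(f"Unknown precision: {precision!r}")


if __name__ == "__main__":
    print(pick_precision(8.0))
```

A model could then be loaded with AutoModelForCausalLM.from_pretrained("luvGPT/luvai-phi3", **loading_kwargs(pick_precision(8.0))).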

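Model description

The model is optimized to keep a character's persona consistent while adapting to different characters. It excels at creative, character-driven dialogue and adapts closely to the personality traits supplied in the system prompt.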

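Training data

The dataset is used for luvGPT's proprietary internal development and cannot be open-sourced at this time. The initial conversations were generated by open-source large language models following specific generation instructions, then filtered by a judge model.

Dataset size: about 13k high-quality examples (filtered from 50k initial conversations).
Data format: JSONL; each record holds a message array with system, user, and assistant roles.
Data filtering: a judge model scored and filtered the initial dataset, keeping only the highest-quality examples with strong persona consistency and engaging replies.
Data characteristics: average message length of about 240 tokens; conversations typically contain 6-7 messages.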
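
Since the dataset itself is not released, its exact schema is unknown. As a hedged illustration only, the sketch below assumes each JSONL line looks like {"messages": [{"role": ..., "content": ...}, ...]} and shows how such a record might be flattened into the raw "User:" / "Assistant:" text that the prompt template uses. The field names and the record_to_prompt_text helper are assumptions, not the author's pipeline.

```python
import json
from typing import List

ROLE_LABELS = {"user": "User", "assistant": "Assistant"}


def record_to_prompt_text(jsonl_line: str) -> str:
    """Flatten one assumed-format JSONL record into raw prompt-style text."""
    record = json.loads(jsonl_line)
    lines: List[str] = []
    for message in record["messages"]:
        if message["role"] == "system":
            # Assumption: the system message carries the persona line verbatim.
            lines.append(message["content"])
        else:
            lines.append(f"{ROLE_LABELS[message['role']]}: {message['content']}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = json.dumps({
        "messages": [
            {"role": "system", "content": "Sophie's Persona: Sophie is friendly."},
            {"role": "user", "content": "Hi"},
            {"role": "assistant", "content": "Hello!"},
        ]
    })
    print(record_to_prompt_text(sample))
```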

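Performance

Training metrics show steady improvement throughout training:

Token accuracy: rose from about 0.48 to about 0.73.
Training loss: fell from about 2.2 to about 1.05.
Convergence: the model showed strong convergence by the end of training.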
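
One way to read the loss figures is as perplexity. Assuming the reported training loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, though the source does not say so explicitly), perplexity is exp(loss). A small worked example:

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


if __name__ == "__main__":
    start = perplexity(2.2)   # approximate loss at the start of training
    end = perplexity(1.05)    # approximate loss at the end of training
    print(f"perplexity: {start:.2f} -> {end:.2f}")  # perplexity: 9.03 -> 2.86
```

Under that assumption, training perplexity on this data dropped from roughly 9 to roughly 3. This is a training-set figure, not a held-out evaluation.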

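Training details

Base model: microsoft/phi-3-mini-4k-instruct
Method: LoRA fine-tuning with DeepSpeed, using the following parameters:
- LoRA rank: 16
- LoRA alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training process:
- Hardware: a single NVIDIA GPU with 24 GB of VRAM.
- Training time: about 3 hours.
- Optimizer: AdamW with DeepSpeed ZeRO stage 2.
- Learning rate: 2e-4 with a cosine schedule.
- Batch size: 8 (effective).
- Epochs: 3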
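
To make the hyperparameters above more concrete, the sketch below works out two derived quantities: the LoRA scaling factor (alpha / rank in the standard LoRA formulation) and the learning rate at a given step under a plain cosine decay. Several things here are assumptions rather than facts from the source: the schedule is modelled without warmup, it decays to zero, and the total step count is estimated from about 13k examples at an effective batch size of 8 over 3 epochs.

```python
import math

PEAK_LR = 2e-4
LORA_RANK = 16
LORA_ALPHA = 32
EFFECTIVE_BATCH_SIZE = 8
EPOCHS = 3
APPROX_EXAMPLES = 13_000  # "about 13k" in the source; the exact count is not given


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA."""
    return alpha / rank


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr to 0 with no warmup (an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = math.ceil(APPROX_EXAMPLES / EFFECTIVE_BATCH_SIZE) * EPOCHS
    print("LoRA scaling:", lora_scaling(LORA_ALPHA, LORA_RANK))
    print("Estimated optimizer steps:", total_steps)
    for step in (0, total_steps // 2, total_steps):
        print(step, cosine_lr(step, total_steps))
```

The estimated step count is only as accurate as the "about 13k" figure and would differ if part of the data was held out for evaluation.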