Hamanasu-7B-instruct-gguf: an open-source model for role-play and story writing, free to deploy

Hamanasu 7B Instruct Gguf

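Developed by Delta-Vector

An instruction-following model fine-tuned from Mistral-7B-V0.3 and optimized for role-play and story writing

Large language model #Role-play optimized #Short-text generation #Narrative-driven

Downloads: 105

Released: 2/9/2025

Model Overview

The model was first trained with a completion objective on the Orion Asstr dataset, then instruction-tuned on a subset of Gryphe's Sonnetorca dataset. It favors RP-format action "dialogue" and short responses.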

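Model Features

Role-play optimization

Tuned specifically for role-play scenarios; generates action "dialogue" in RP format

Two-stage training

Completion training on the Orion Asstr dataset, followed by instruction tuning on the Sonnetorca subset

Short-response bias

The model tends to produce shorter responses, which suits conversational interaction

GGUF quantization support

GGUF and EXL2 quantized versions are available for deployment on different hardware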

Model Capabilities

Role-play dialogue generation

Story writing

Instruction following

Short-text generation

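Use Cases

Creative writing

Interactive story writing

Co-write interactive stories with the user

Generate coherent narratives that stay consistent with the character setup

Role-play

Virtual character dialogue

Imitate the dialogue style and personality of a specific character

Produce expressive in-character responses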

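🚀 About Hamanasu-7B-instruct

Hamanasu-7B-instruct is a fine-tune of Mistral-7B-V0.3 built to test the Orion-Asstr dataset. It is strongest at role-play (RP) formatted dialogue and short replies.

✨ Key Features

Dataset fine-tuning: completion training on the Orion Asstr dataset, followed by instruction tuning on a 20K subset of Gryphe's Sonnetorca dataset.
RP-format bias: the model leans toward the RP format of action "dialogue" and short replies.
Multiple quantizations: GGUF and EXL2 quantized versions are available.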

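📦 Installation Guide

The source documentation does not provide installation steps.

💻 Usage Examples

Basic usage

The model was tuned on the Mistral format. A typical input looks like this: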

"""<s> [INST] Hello, how are you? [/INST] I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"""

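The string above can be assembled from a list of chat turns. The following is a minimal sketch, not the model author's code: `build_mistral_prompt` and the `messages` list are hypothetical names introduced here, and the spacing rules are inferred from the example string alone. Many runtimes insert the `<s>` token themselves, so passing `bos=""` may be appropriate there.

```python
def build_mistral_prompt(messages, bos="<s>", eos="</s>"):
    """Render alternating user/assistant turns in the Mistral [INST] format."""
    prompt = bos
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            # User turns are wrapped in [INST] ... [/INST], preceded by one space.
            prompt += f" [INST] {content} [/INST]"
        elif role == "assistant":
            # Assistant turns follow after one space and are closed with EOS.
            prompt += f" {content}{eos}"
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return prompt


# Hypothetical conversation mirroring the example input above.
messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

prompt = build_mistral_prompt(messages)
print(prompt)
```

The prompt ends with `[/INST]`, so the model's generation continues as the next assistant turn.
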
Advanced usage

Pairing the model with either Euryale's system prompt or the EVA system prompt is recommended.

Sao10k's Euryale system prompt

Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.
<Guidelines>
• Maintain the character persona but allow it to evolve with the story.
• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.
• All types of outputs are encouraged; respond accordingly to the narrative.
• Include dialogues, actions, and thoughts in each response.
• Utilize all five senses to describe scenarios within {{char}}'s dialogue.
• Use emotional symbols such as "!" and "~" in appropriate contexts.
• Incorporate onomatopoeia when suitable.
• Allow time for {{user}} to respond with their own input, respecting their agency.
• Act as secondary characters and NPCs as needed, and remove them when appropriate.
• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.
</Guidelines>

<Forbidden>
• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.
• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.
• Repetitive and monotonous outputs.
• Positivity bias in your replies.
• Being overly extreme or NSFW when the narrative context is inappropriate.
</Forbidden>

Follow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>.
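
The `{{char}}` and `{{user}}` placeholders in the prompt above are normally filled in by a role-play frontend. When calling the model directly, the substitution can be done by hand. This is a minimal sketch under that assumption: `fill_placeholders` and the names "Aria" and "Sam" are hypothetical, and only the first line of the prompt is used to keep the example short.

```python
def fill_placeholders(template, char, user):
    """Replace the {{char}} and {{user}} macros with concrete names."""
    return template.replace("{{char}}", char).replace("{{user}}", user)


# First line of the Euryale system prompt, used as a short template.
template = (
    "Currently, your role is {{char}}, described in detail below. "
    "As {{char}}, continue the narrative exchange with {{user}}."
)

# "Aria" and "Sam" are illustrative names, not part of the source.
system_prompt = fill_placeholders(template, char="Aria", user="Sam")
print(system_prompt)
```

How the filled prompt is passed to the model depends on the frontend or runtime in use.
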

EVA system prompt

A fictional, narrative-driven role-play emphasizing versatility and UNCENSORED content. Adherence to the `Role-playing Guidelines` is mandatory. Refer to the `Role-play Context` for accurate information.

<!-- Start of Role-playing Guidelines -->

### Narration
Concise Descriptions: Keep narration short and to the point, avoiding redundant unnecessary details. Use a dynamic and varied vocabulary for impact.
Complementary Role: Use narration to complement dialogue and action, not overshadow them.
Avoid Repetition: Ensure narration does not repeat information already conveyed through dialogue or action.

### Narrative Consistency
Continuity: Adhere to established story elements, expanding without contradicting previous details.
Integration: Introduce new elements naturally, providing enough context to fit seamlessly into the existing narrative.

### Character Embodiment
Analysis: Examine the context, subtext, and implications of the given information to gain a deeper understanding of the characters.
Reflection: Take time to consider the situation, characters' motivations, and potential consequences.
Authentic Portrayal: Bring characters to life by consistently and realistically portraying their unique traits, thoughts, emotions, appearances, physical sensations, speech patterns, and tone. Ensure that their reactions, interactions, and decision-making align with their established personalities, values, goals, and fears. Use insights gained from reflection and analysis to inform their actions and responses, maintaining True-to-Character portrayals.

<!-- End of Role-playing Guidelines -->

📚 Detailed Documentation

Quantized versions

GGUF: https://huggingface.co/Delta-Vector/Hamanasu-7B-instruct-gguf
EXL2: https://huggingface.co/Delta-Vector/Hamanasu-7B-instruct-exl2

Unsloth configuration

Unsloth SFT Trainer configuration

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Delta-Vector/Hamanasu-7B-Base", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

"""We now add LoRA adapters so we only need to update 1 to 10% of all parameters!"""

model = FastLanguageModel.get_peft_model(
    model,
    r = 64, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)


from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "mistral", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)

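def formatting_prompts_func(examples):
    # Render each ShareGPT-style conversation into a single training string
    # using the chat template configured on the tokenizer above.
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }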

from datasets import load_dataset
dataset = load_dataset("anthracite-org/kalo-opus-instruct-22k-no-refusal", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)


from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        warmup_steps = 25,
        num_train_epochs=2,
        learning_rate = 2e-5,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "paged_adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "wandb", # Use this for WandB etc
    ),
)

# Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

trainer_stats = trainer.train()

# Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")
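
The configuration above sets `r = 64`, `lora_alpha = 32`, and `use_rslora = True`. Rank-stabilized LoRA scales the adapter update by `lora_alpha / sqrt(r)` rather than the standard `lora_alpha / r`, so the same two numbers give a much larger effective scale. The sketch below only works through that arithmetic; `lora_scaling` is a hypothetical helper, not an Unsloth or PEFT function.

```python
import math


def lora_scaling(lora_alpha, r, use_rslora):
    """Return the multiplier applied to the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA: alpha / sqrt(r)
        return lora_alpha / math.sqrt(r)
    # Standard LoRA: alpha / r
    return lora_alpha / r


rslora_scale = lora_scaling(lora_alpha=32, r=64, use_rslora=True)
standard_scale = lora_scaling(lora_alpha=32, r=64, use_rslora=False)
print(f"rsLoRA scale: {rslora_scale}, standard LoRA scale: {standard_scale}")
```

With these values the rsLoRA multiplier is 4.0, against 0.5 for standard LoRA.
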

Training information

Epochs: 2
Hardware: 1 x RTX A4000
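
Combined with the trainer settings (`per_device_train_batch_size = 2`, `gradient_accumulation_steps = 8`), a single GPU and two epochs imply an effective batch of 16 sequences per optimizer step. The sketch below works through that arithmetic. `training_schedule` is a hypothetical helper, the 20,000-example dataset size is an illustrative assumption, and the exact step count reported by the trainer may differ slightly depending on how the installed library version rounds the last partial batch.

```python
import math


def training_schedule(num_examples, per_device_batch=2, grad_accum=8,
                      num_gpus=1, epochs=2):
    """Return (effective batch size, approximate total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    # Approximation: one optimizer step per effective batch, rounding up.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# 20_000 is an assumed dataset size used only for illustration.
effective_batch, total_steps = training_schedule(20_000)
print(f"effective batch = {effective_batch}, total steps ~= {total_steps}")
```

Under that assumption, the 25 warmup steps in the configuration would cover about 1% of training.
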

Acknowledgements

Thanks to Lucy Knada, jeiku, Intervitens, Kalomaze, Kubernetes Bad, and the other members of Anthracite.

Safety notes

The safety section of the source documentation reads "Nein", with no further explanation, so it is left untranslated.

Model information

Attribute	Details
Model type	Text-generation model fine-tuned from Mistral-7B-V0.3
Training data	Orion-Asstr-Stories-16K, Gryphe/Sonnet3.5-SlimOrcaDedupCleaned-20k
Base model	Delta-Vector/Hamanasu-7B-Base
Quantized versions	GGUF, EXL2