Arsh-llm Open-Source Language Model: Free Generation of Creative Stories, Coherent Text, and Practical Code

Arsh Llm

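Developed by Arsh-ai

Arsh-llm is a 50-million-parameter language model based on the Llama architecture, designed to generate creative stories, coherent text, and practical code.

Large Language Model

Transformers

English

License: MIT

#lightweight-story-generation #dialogue-fine-tuning #code-assisted-generation

Downloads: 1,481

Released: 5/27/2025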

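Model Introduction

Arsh-llm is a compact language model, pretrained and then fine-tuned for tasks such as creative writing, code generation, and conversational AI.

Model Features

Compact and Efficient

With only 50 million parameters and trained on a T4 GPU, it keeps resource usage low while still delivering strong performance.

Versatile Generation

Generates creative stories, coherent text, and practical code snippets.

Dialogue-Optimized

Fine-tuned for 20 hours on conversational data, making it suitable for chatbots and similar applications.

Open-Source License

Released under the MIT License, which permits free use and modification.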

Model Capabilities

Creative writing

Text generation

Code generation

Conversational interaction

Math problem solving

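Use Cases

Creative Writing

Short story generation

Generate engaging short stories or narrative prompts.

Programming Assistance

Code snippet generation

Generate practical code snippets for a variety of programming tasks.

Conversational AI

Chatbots

Provide natural conversational ability for chatbots or assistants.

Educational Tools

Math problem solving

Help solve math problems or explain concepts step by step.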

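🚀 Arsh-llm: A Compact yet Powerful 50-Million-Parameter Model

Arsh-llm is a 50-million-parameter language model based on the Llama architecture, built to generate creative stories, coherent text, and practical code. It was pretrained for 35 hours on a T4 GPU using a carefully selected set of small but high-quality datasets, then fine-tuned for 20 hours on conversational data. Think of it as a lean, efficient text-generation machine with plenty of room to grow. Its training loss falls between 1.2 and 1.9, which already looks promising and is expected to improve with further training. Buckle up, this is just the beginning!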
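
For intuition, the reported loss range can be converted to perplexity. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which the model card does not state explicitly; the helper name `loss_to_perplexity` is illustrative.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


# Reported training-loss range from the model card.
for loss in (1.2, 1.9):
    print(f"loss={loss:.1f} -> perplexity ~ {loss_to_perplexity(loss):.2f}")
```

Under that assumption, the range corresponds to a perplexity of roughly 3.3 to 6.7 on the training data.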

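📚 Model Overview

Attribute	Details
Architecture	Llama-based causal language model
Parameters	50 million
Context length	128 tokens
Pretraining time	About 35 hours on an NVIDIA T4 GPU
Fine-tuning time	About 20 hours on conversational datasets
Training loss	1.2 - 1.9 (room for improvement!)
Library	Transformers (Hugging Face)
License	MIT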
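
To see how a Llama-style model can land near 50 million parameters, here is a rough estimate. The hyperparameters (vocabulary size, hidden size, layer count, MLP width) are hypothetical and are not taken from the model's actual configuration. The sketch also assumes standard multi-head attention, a three-projection gated MLP, no bias terms, and tied input/output embeddings.

```python
def estimate_llama_params(vocab_size: int, hidden: int, layers: int,
                          mlp_width: int, tied_embeddings: bool = True) -> int:
    """Rough parameter count for a Llama-style decoder (no biases)."""
    embed = vocab_size * hidden                  # token embedding matrix
    attn = 4 * hidden * hidden                   # q, k, v, and output projections
    mlp = 3 * hidden * mlp_width                 # gate, up, and down projections
    norms = 2 * hidden                           # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    total = embed + layers * per_layer + hidden  # plus the final RMSNorm
    if not tied_embeddings:
        total += vocab_size * hidden             # separate output head
    return total


# Hypothetical configuration, chosen only to illustrate the scale.
print(f"{estimate_llama_params(32000, 512, 10, 1536):,}")  # 50,473,472
```

With these assumed values the embedding table alone accounts for about a third of the total, which is typical for models of this size.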

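📦 Datasets

Arsh-llm was trained on a variety of datasets to make it versatile across storytelling, text generation, and code-related tasks:

roneneldan/TinyStories: short creative stories for narrative generation.
Salesforce/wikitext: Wikipedia-based text for general knowledge and textual coherence.
abhinand/alpaca-gpt4-sharegpt: instruction-based conversational data for task-oriented responses.
shibing624/sharegpt_gpt4: high-quality conversational data for chat-like interactions.
ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions: math problems with step-by-step solutions, used to improve logical reasoning.

Fine-tuning was performed with a structured ShareGPT chat template to strengthen conversational ability, making Arsh-llm a good starting point for dialogue-based applications.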
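
As an illustration of the ShareGPT layout, the sketch below converts one record into the role/content message list that chat templates usually expect. It assumes the common ShareGPT convention of a `conversations` list whose entries carry `from` and `value` keys. The exact preprocessing used for Arsh-llm is not documented here, and `sharegpt_to_messages` and `ROLE_MAP` are hypothetical names.

```python
from typing import Dict, List

# Common ShareGPT speaker labels mapped to chat roles (assumed convention).
ROLE_MAP = {
    "system": "system",
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
}


def sharegpt_to_messages(record: Dict) -> List[Dict[str, str]]:
    """Convert one ShareGPT-style record into a list of chat messages."""
    messages = []
    for turn in record.get("conversations", []):
        role = ROLE_MAP.get(turn.get("from", ""))
        if role is None:
            continue  # skip speakers we do not recognize
        messages.append({"role": role, "content": turn.get("value", "")})
    return messages


example = {
    "conversations": [
        {"from": "human", "value": "Tell me a short story."},
        {"from": "gpt", "value": "Once upon a time..."},
    ]
}
print(sharegpt_to_messages(example))
```

The resulting list has the same shape as the `conversation_history` used in the quick-start script, so it can be passed to a tokenizer's `apply_chat_template()` method.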

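🎯 Use Cases

Arsh-llm is a versatile model suited to the following scenarios:

Creative writing: generate engaging short stories or narrative prompts.
Code generation: generate practical code snippets for a variety of programming tasks.
Conversational AI: provide natural conversational ability for chatbots or assistants.
Educational tools: help solve math problems or explain concepts step by step.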

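⚠️ Important Note

This model is still under development. For production-grade performance, further pretraining on larger datasets and post-training on conversational data are recommended.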

🚀 Quick Start

To use Arsh-llm, you can load it directly from Hugging Face:

Basic Usage

import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "arshiaafshani/Arsh-llm"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Arsh llm is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history
conversation_history = []

conversation_history.append({"role": "system", "content": "You are a helpful assistant."})

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exited from the chat. Bye!")
        break

    # Append user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Prepare the messages with the conversation history and an empty assistant turn
    messages = conversation_history + [{"role": "assistant", "content": ""}]

    # Use the tokenizer's apply_chat_template() method to format the prompt.
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20
    )

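    # The returned 'generated_text' includes the prompt plus the generation.
    full_text = response[0]["generated_text"]
    # Extract the assistant's response by removing the prompt portion.
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Store the reply so later turns see the full conversation.
    conversation_history.append({"role": "assistant", "content": bot_response})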
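
Because the model's context length is listed as 128 tokens, a growing chat history will eventually exceed it. One possible approach, sketched below, keeps the system message and drops the oldest turns until the history fits a token budget. `trim_history` and `whitespace_count` are hypothetical helpers and are not part of Transformers. The whitespace counter is only a stand-in for a real tokenizer, and the sketch ignores any extra tokens the chat template adds.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_tokens: int,
                 count_tokens: Callable[[str], int]) -> List[Message]:
    """Keep a leading system message plus the most recent turns within budget."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    system = messages[:1] if has_system else []
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = count_tokens(msg["content"])
        if cost > budget:
            break               # this turn and everything older is dropped
        kept.append(msg)
        budget -= cost
    return system + kept[::-1]  # restore chronological order


def whitespace_count(text: str) -> int:
    """Stand-in token counter: counts whitespace-separated words."""
    return len(text.split())


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "a b c"},
    {"role": "assistant", "content": "d e"},
    {"role": "user", "content": "f g h i"},
]
print(trim_history(history, max_tokens=12, count_tokens=whitespace_count))
```

In the chat loop above, a tokenizer-based counter such as `lambda text: len(chatbot.tokenizer(text)["input_ids"])` could replace `whitespace_count`, with `max_tokens` set below 128 to leave room for the generated reply.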