typhoon2.1-gemma3-4b: Open-Source Thai Large Language Model - Instruction-Tuned with Function Calling

Typhoon2.1 Gemma3 4b

Developed by scb10x

Thai large language model (instruction-tuned) with 4 billion parameters, a 128K context length, and function-calling capability

Large Language Model

Safetensors

#ThaiLLM #128KLongContext #FunctionCalling

Downloads: 2,083

Release date: 5/1/2025

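Model Overview

A Thai instruction-tuned large language model built on the Gemma3 4B architecture. It supports Thai and English and can perform text generation and function calling.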

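Model Features

Optimized for Thai

A 4-billion-parameter large language model optimized specifically for Thai, supporting mixed Thai and English input and output

Long-Context Support

Supports a context length of 128K tokens, suitable for long documents and complex conversations

Function Calling

Built-in tool calling: the model generates structured calls to external APIs and data-processing tools, which the calling application then executes

Dual-Mode Reasoning

Supports two inference modes, fast response and deep thinking, to suit different scenarios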
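To make the function-calling feature concrete, consider the vLLM server command later in this card. It uses `--tool-call-parser pythonic`. As we understand that parser, the model writes its tool calls as a Python-style list of function calls with keyword arguments, and vLLM converts them into OpenAI `tool_calls`. The sketch below imitates that conversion with the standard `ast` module. It is an illustration only, not vLLM's parser, and `parse_pythonic_calls` is a hypothetical helper.

```python
import ast
import json
from typing import Any, Dict, List

def parse_pythonic_calls(text: str) -> List[Dict[str, Any]]:
    """Turn '[f(a=1, b="x")]' into [{"name": "f", "arguments": '{"a": 1, "b": "x"}'}]."""
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list of function calls")
    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("each list element must be a plain function call")
        if node.args:
            raise ValueError("only keyword arguments are supported")
        args = {}
        for kw in node.keywords:
            if kw.arg is None:
                raise ValueError("**kwargs are not supported")
            # literal_eval only evaluates literals, so no model-written code is executed
            args[kw.arg] = ast.literal_eval(kw.value)
        # Re-encode arguments as a JSON string, as in the OpenAI tool_calls format
        calls.append({"name": node.func.id, "arguments": json.dumps(args)})
    return calls

model_output = '[get_weather(location="San Francisco, CA", unit="celsius")]'
print(parse_pythonic_calls(model_output))
```

Using `ast.literal_eval` rather than `eval` keeps the sketch safe. Model output is untrusted text, so only literal values are ever evaluated.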

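Model Capabilities

Thai text generation

English text generation

Function calling

Long-text processing

Multi-turn dialogue

Code generation

Mathematical computation

Creative writing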

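Use Cases

Customer Service

Thai Customer Service Chatbot

Deploy as an online customer service system that handles inquiries from Thai users

Provides 24/7 natural-language interaction in Thai

Content Creation

Thai Article Generation

Automatically generate Thai marketing copy or press releases from keywords

Quickly produce high-quality content that follows Thai language conventions

Education

Language Learning Assistant

Serves as a Thai-English bilingual learning aid

Provides grammar explanations, example-sentence generation, and conversation practice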

🚀 Typhoon2.1-Gemma3-4B

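Typhoon2.1-Gemma3-4B is a Thai instruction-tuned large language model with 4 billion parameters, a 128K context length, and function-calling capability. It is built on Gemma3 4B.

Note: This is a text-only model. The vision encoder was removed from this release because of its complexity. Stay tuned for a future release that includes the vision encoder.

🚀 Quick Start

The model can be used for text generation in Thai or English. The following example runs it with the transformers library: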

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "scb10x/typhoon2.1-gemma3-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a male AI assistant named Typhoon created by SCB 10X to be helpful, harmless, and honest. Typhoon is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks. Typhoon responds directly to all human messages without unnecessary affirmations or filler phrases like “Certainly!”, “Of course!”, “Absolutely!”, “Great!”, “Sure!”, etc. Specifically, Typhoon avoids starting responses with the word “Certainly” in any way. Typhoon follows this information in all languages, and always responds to the user in the language they use or request. Typhoon is now being connected with a human. Write in fluid, conversational prose, Show genuine interest in understanding requests, Express appropriate emotions and empathy. Also showing information in term that is easy to understand and visualized."},
    {"role": "user", "content": "ขอสูตรไก่ย่าง"},  # Thai for "Give me a grilled chicken recipe"
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=False # Switches between thinking and non-thinking modes. Default is False.
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
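The `generate` call above samples with `temperature=0.6` and `top_p=0.95`. The sketch below illustrates what those two settings do. It is a simplified, stdlib-only version of standard temperature scaling followed by nucleus (top-p) filtering, not the transformers implementation, and the four logits are made-up toy values.

```python
import math
from typing import Dict, List

def temperature_top_p(logits: List[float], temperature: float = 0.6,
                      top_p: float = 0.95) -> Dict[int, float]:
    """Return the renormalized distribution over the tokens that top-p sampling keeps."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by a temperature below 1 sharpens the distribution before softmax
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

print(temperature_top_p([2.0, 1.0, 0.0, -5.0]))  # only tokens 0 and 1 survive the cut
```

With the same logits and `temperature=1.0`, the distribution is flatter and token 2 is also kept. A lower temperature therefore narrows the candidate set before `top_p` is applied.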

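✨ Key Features

4 billion parameters, a 128K context length, and function-calling capability.
Supports both Thai and English.
Offers thinking and non-thinking modes that can be switched as needed.
Supports tool use and can be deployed as an OpenAI-compatible API server with vLLM.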

📦 Installation Guide

Deploying as a Server

You can run Typhoon2.1 as an OpenAI-compatible API server with vllm:

pip install vllm
vllm serve scb10x/typhoon2.1-gemma3-4b --max-model-len 16000 --dtype bfloat16 --tool-call-parser pythonic --enable-auto-tool-choice
# adjust --max-model-len based on your available memory
# you can use --quantization bitsandbytes to reduce memory use at the cost of inference speed
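The command above caps the context window at 16,000 tokens, well below the model's 128K maximum. The prompt and the requested completion must fit inside that cap together. As we understand vLLM's behavior, a request that exceeds `--max-model-len` is rejected rather than silently truncated.

The helpers below are hypothetical and do simple budget arithmetic. The prompt token count is assumed to come from the tokenizer, for example `len(tokenizer.apply_chat_template(messages, add_generation_prompt=True))`.

```python
def max_completion_tokens(prompt_tokens: int, max_model_len: int = 16000) -> int:
    """Largest max_tokens a request can ask for without overflowing the context window."""
    return max(0, max_model_len - prompt_tokens)

def fits_context(prompt_tokens: int, max_new_tokens: int, max_model_len: int = 16000) -> bool:
    """True if the prompt plus the requested completion fit within --max-model-len."""
    return prompt_tokens + max_new_tokens <= max_model_len

print(fits_context(15_000, 512))      # True: 15,512 <= 16,000
print(fits_context(15_600, 512))      # False: 16,112 > 16,000
print(max_completion_tokens(15_600))  # 400
```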

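💻 Usage Examples

Basic Usage

The Quick Start code above shows how to use the transformers library to run Typhoon2.1-Gemma3-4B for Thai or English text generation.

Advanced Usage

Using Tools

You can pass tool definitions to the vLLM-based, OpenAI-compatible API so that the model can call your functions: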

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

def get_weather(location: str, unit: str):
    return f"Getting the weather for {location} in {unit}..."
tool_functions = {"get_weather": get_weather}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
}]

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
    tools=tools,
    tool_choice="auto"
)

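message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0].function
    print(f"Function called: {tool_call.name}")
    print(f"Arguments: {tool_call.arguments}")
    # Dispatch by name through tool_functions, then call with the JSON-decoded arguments
    result = tool_functions[tool_call.name](**json.loads(tool_call.arguments))
    print(f"Result: {result}")
else:
    # The model chose to answer directly instead of calling a tool
    print(message.content)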
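With an OpenAI-compatible API, the usual next step after running a tool is to send its result back, so that the model can write a final natural-language answer. The sketch below builds that follow-up message history in the standard OpenAI chat format: an assistant turn carrying the tool call, followed by a `"tool"` turn carrying the result.

Two assumptions apply. First, it is not confirmed here that Typhoon2.1's chat template handles `"tool"` messages when served through vLLM. Second, `append_tool_result` and the id `"call_0"` are hypothetical; with a real response you would use `message.tool_calls[0].id`.

```python
import json
from typing import Any, Callable, Dict, List

def get_weather(location: str, unit: str) -> str:
    # Same stub as in the example above; a real tool would query a weather service
    return f"Getting the weather for {location} in {unit}..."

TOOL_FUNCTIONS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def append_tool_result(
    messages: List[Dict[str, Any]],
    tool_call_id: str,
    name: str,
    arguments_json: str,
    registry: Dict[str, Callable[..., Any]] = TOOL_FUNCTIONS,
) -> List[Dict[str, Any]]:
    """Run one tool call locally and return a NEW history that contains the
    assistant's tool call plus a "tool" message holding the result."""
    result = registry[name](**json.loads(arguments_json))
    assistant_turn = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            "function": {"name": name, "arguments": arguments_json},
        }],
    }
    tool_turn = {"role": "tool", "tool_call_id": tool_call_id, "content": str(result)}
    return messages + [assistant_turn, tool_turn]  # the input list is not mutated

history = [{"role": "user", "content": "What's the weather like in San Francisco?"}]
history = append_tool_result(
    history, "call_0", "get_weather",
    '{"location": "San Francisco, CA", "unit": "celsius"}',
)
```

The resulting `history` would then be passed back, for example as `client.chat.completions.create(model=..., messages=history, tools=tools)`, to obtain the final answer.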

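Switching Between Thinking and Non-Thinking Modes

Typhoon supports two modes:

Non-thinking mode (default): generates a response quickly, without extra reasoning steps.
Thinking mode: the model first reasons internally, then gives a clearer and potentially more accurate final answer.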
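In thinking mode, the reply contains the reasoning followed by the final answer. The reasoning is delimited by `<think>...</think>`, the tag named in this card's thinking-mode system prompt and used as the stop string in its budget-forcing code. The hypothetical helper below separates the two parts.

It assumes that the reasoning ends with `</think>` and that the opening tag may be missing if it belonged to the prompt. If these tags are special tokens in this tokenizer (not verified here), decoding with `skip_special_tokens=True` could strip them, so `skip_special_tokens=False` is the safer choice.

```python
from typing import Tuple

def split_thinking(text: str) -> Tuple[str, str]:
    """Split a thinking-mode reply into (reasoning, final_answer)."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag: treat the whole reply as the answer (e.g. non-thinking mode)
        return "", text.strip()
    # Drop an opening <think> tag if the model generated one
    reasoning = head.split("<think>", 1)[-1]
    return reasoning.strip(), tail.strip()

reasoning, answer = split_thinking("<think>2 + 2 = 4</think>\nThe answer is 4.")
print(answer)  # The answer is 4.
```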

Thinking mode can be enabled in any of the following ways:

Set enable_thinking=True in apply_chat_template:

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is False.
).to(model.device)

Supply a thinking-mode system prompt manually:

You are a helpful assistant. First, think through the reasoning internally, then present the reasoning within <think>...</think>. After thinking, clearly state a response that addresses the user's request and aligns with their preferences, not just providing a direct answer.

With the vllm-based OpenAI-compatible server, add chat_template_kwargs to the POST request payload:

{
  "model": "scb10x/typhoon2.1-gemma3-4b",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "chat_template_kwargs": {"enable_thinking": true}
}
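The payload above can be sent with nothing but the standard library. This sketch assumes the server from the Installation Guide is listening on vLLM's default address `http://localhost:8000/v1` with no API key. `build_payload` and `post_chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000/v1"  # assumed: vLLM's default host and port

def build_payload(prompt: str, thinking: bool,
                  model: str = "scb10x/typhoon2.1-gemma3-4b") -> Dict[str, Any]:
    """Chat request body; chat_template_kwargs is forwarded to the chat template."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

def post_chat(payload: Dict[str, Any], base_url: str = BASE_URL,
              timeout: float = 120.0) -> Dict[str, Any]:
    """POST the payload to /chat/completions and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

payload = build_payload("Give me a short introduction to large language models.", thinking=True)
# reply = post_chat(payload)  # requires the running server
# print(reply["choices"][0]["message"]["content"])
```

With the official `openai` Python client, the same field should be passable through `extra_body={"chat_template_kwargs": {"enable_thinking": True}}`.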

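Budget Forcing

Budget forcing is an advanced technique that makes the model spend more time and tokens on reasoning before it produces a final answer, which can improve performance on complex problems. The handler below works in three steps. First, it lets the model think until it emits </think> or reaches max_think_token. Each time the model stops early, the handler drops the stop tag and appends "Alternatively" so that reasoning continues, for up to max_ignore rounds. Finally, it closes the thinking block and generates the answer: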

from typing import List

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

class BudgetForcingHandler:
    
    def __init__(self, model_name: str, max_think_token: int, max_ignore=5, temperature=0.6, seed=32):
        self.temperature = temperature
        self.seed = seed
        self.max_think_token = max_think_token
        self.max_ignore = max_ignore
        self.model = LLM(model_name, dtype='bfloat16', enforce_eager=True)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.alternative_str = '\nAlternatively'
        self.system = """You are a reasoning assistant. First, think through the reasoning internally, then present the reasoning within <think>...</think>. After thinking, clearly state the final answer."""
    
    def __call__(self, prompts: List[str]):
        count_prompt = len(prompts)
        prompts = [self.tokenizer.apply_chat_template([{'role': 'system', 'content': self.system}, {'role': 'user', 'content': f'Please solve this math question, and put your final answer within \\boxed{{}}.\n{p}'}], add_generation_prompt=True, tokenize=False) for p in prompts]
        sampling_params = SamplingParams(
            max_tokens=self.max_think_token,
            seed=self.seed,
            stop=["</think>"],
            skip_special_tokens=False,
            temperature=self.temperature,
        )
        o = self.model.generate(
            prompts,
            sampling_params=sampling_params
        )
        
        outputs = [output.outputs[0].text for output in o]
        token_count = [len(output.outputs[0].token_ids) for output in o]
        for i in range(len(prompts)):
            prompts[i] = prompts[i] + outputs[i]
        
        for _ in range(self.max_ignore): # Num of times to skip stop token
            inference_loop_prompts = []
            inference_idx = []
            max_inference_token = 0
            print('current token count: ', token_count)
            for i in range(len(prompts)):
                left_budget = self.max_think_token - token_count[i]
                if left_budget > 0:
                    prompts[i] = prompts[i] + self.alternative_str
                    inference_loop_prompts.append(prompts[i])
                    inference_idx.append(i)
                    if left_budget > max_inference_token:
                        max_inference_token = left_budget
            
            outputs = ['' for _ in range(len(prompts))]
            if max_inference_token == 0 or len(inference_loop_prompts) == 0:
                break
            sampling_params = SamplingParams(
                max_tokens=max_inference_token,
                min_tokens=1,
                seed=self.seed,
                stop=["</think>"],
                skip_special_tokens=False,
                temperature=self.temperature,
            )
            o = self.model.generate(
                inference_loop_prompts,
                sampling_params=sampling_params
            )
            assert len(inference_idx) == len(inference_loop_prompts)
            assert len(inference_idx) == len(o)
            for i, output in zip(inference_idx, o):
                outputs[i] = output.outputs[0].text
            
            for i, idx in enumerate(inference_idx):
                token_count[idx] = token_count[idx] + len(o[i].outputs[0].token_ids)
            
            for i in range(len(prompts)):
                prompts[i] = prompts[i] + outputs[i]
        print('generating answer...')
        prompts = [p + '\nTime\'s up. End of thinking process. Will answer immediately.\n</think>' for i, p in enumerate(prompts)]
        sampling_params = SamplingParams(
            max_tokens=2048,
            min_tokens=0,
            seed=self.seed,
            skip_special_tokens=False,
            temperature=self.temperature,
        )
        o = self.model.generate(
            prompts,
            sampling_params=sampling_params,
        )
        for i in range(len(prompts)):
            prompts[i] = prompts[i] + o[i].outputs[0].text
        assert len(prompts) == count_prompt
        return prompts

handler = BudgetForcingHandler("scb10x/typhoon2.1-gemma3-4b", max_think_token=2048)
results = handler(["How many r in raspberry?"])
print(results[0])  # full transcript: prompt, reasoning, and final answer
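The handler's control flow can be checked without a GPU. Below is a toy, single-prompt simulation of the same loop. `make_fake_generator` is a hypothetical stand-in for the model that always "wants" to think for a fixed number of tokens before closing its reasoning. Like the handler, the loop does not charge the appended "Alternatively" text against the budget.

```python
from typing import Callable, Tuple

Generator = Callable[[int], Tuple[str, int]]

def make_fake_generator(natural_len: int) -> Generator:
    """Fake model: thinks for natural_len tokens (or until max_tokens), then stops."""
    def generate(max_tokens: int) -> Tuple[str, int]:
        n = min(natural_len, max_tokens)
        return " step" * n, n
    return generate

def budget_force(generate: Generator, max_think_token: int,
                 max_ignore: int = 5) -> Tuple[str, int, int]:
    """Return (thinking_text, tokens_used, extensions), mirroring the handler's loop."""
    text, used = generate(max_think_token)  # first pass: stops at </think> or the budget
    extensions = 0
    for _ in range(max_ignore):              # number of times to skip the stop tag
        left = max_think_token - used
        if left <= 0:
            break
        text += "\nAlternatively"            # suppress </think> and keep reasoning
        chunk, n = generate(left)
        text += chunk
        used += n
        extensions += 1
    text += "\nTime's up. End of thinking process. Will answer immediately.\n</think>"
    return text, used, extensions

_, used, ext = budget_force(make_fake_generator(300), max_think_token=1000)
print(used, ext)  # 1000 3: three extensions fill the 1,000-token budget
```

A model that stops early (300 tokens per pass) is pushed on until the budget is spent. A model that stops very early is bounded by `max_ignore` instead.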

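📚 Documentation

Model Description

Property	Details
Model type	4B instruction-tuned, decoder-only model based on the Gemma3 architecture
Requirements	transformers 4.50.0 or later
Primary languages	Thai and English
License	Gemma License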
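The table requires transformers 4.50.0 or later. Below is a small stdlib sketch, with hypothetical helper names, that checks an installed version string against that minimum. It simplifies on purpose: it compares only leading numeric components, so a pre-release such as `4.50.0.dev0` is treated as `4.50.0`. `packaging.version.Version` handles such cases properly if that package is available.

```python
import re
from typing import Tuple

def parse_version(v: str) -> Tuple[int, ...]:
    """Leading numeric components only: '4.50.0.dev0' -> (4, 50, 0)."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break  # stop at suffixes such as '0rc1'
    return tuple(parts)

def meets_minimum(v: str, minimum: Tuple[int, ...] = (4, 50, 0)) -> bool:
    parsed = parse_version(v)
    parsed += (0,) * (len(minimum) - len(parsed))  # treat '4.50' as '4.50.0'
    return parsed >= minimum

if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version
    try:
        installed = version("transformers")
        print(installed, "OK" if meets_minimum(installed) else "is older than 4.50.0")
    except PackageNotFoundError:
        print("transformers is not installed")
```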

Performance

[Figure: 4B model performance]

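Intended Uses & Limitations

This model is an instruction-tuned model and is still under development. It includes some level of guardrails, but it may still produce inaccurate, biased, or otherwise objectionable responses to user prompts. Developers should assess these risks in the context of their use case.

📄 License

This model is released under the Gemma License.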

🔗 Other Links

Follow Us

https://twitter.com/opentyphoon

Support

https://discord.gg/us5gAYmrxw

Citation

If you find Typhoon2 useful for your work, please cite it as follows:

@misc{typhoon2,
      title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models}, 
      author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
      year={2024},
      eprint={2412.13702},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13702}, 
}