xLAM-2-3b-fc-r: an open-source large action model with multi-turn conversation, tool use, and strong function calling

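xLAM-2-3b-fc-r

Developed by Salesforce

The xLAM-2 series are large action models (LAMs) built on an advanced data synthesis and training pipeline. They focus on multi-turn conversation and tool use, and perform strongly on function-calling and agentic tasks.

Large language model

Transformers

English #multi-turn function calling #AI agent brain #128k long context

Downloads: 353

Released: 3/27/2025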

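Model Overview

xLAM-2 is a new generation of large action models trained with the APIGen-MT framework; the series reports state-of-the-art results on the BFCL and τ-bench benchmarks. The chat template and vLLM integration have been refined to make building AI agents easier.

Model Features

Multi-turn conversation

Performs well in complex multi-turn conversations and keeps context consistent across turns

Advanced function calling

Optimized for tool use and function calling; turns user requests into structured API calls

Long-context processing

Default 32k context window, extendable to 128k (for example with YaRN), suited to complex tasks

vLLM compatibility

Compatible with the vLLM inference framework for high-throughput deployment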

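Model Capabilities

Natural language understanding

Function calling

Multi-turn conversation

Task planning

Workflow automation

Use Cases

Intelligent assistants

Weather lookup assistant

Provides real-time weather information by calling a weather API

Accurately extracts the user's location and returns formatted weather data

Travel planning

Plans itineraries and booking services over multiple turns

Can coordinate several APIs to complete complex travel arrangements

Enterprise automation

CRM integration

Integrates with the Salesforce CRM system to handle customer requests

Automates common customer service workflows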

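🚀 xLAM-2 Model Family

Large action models (LAMs) are advanced language models designed to improve decision-making by translating user intent into executable actions. As the "brain" of AI agents, LAMs autonomously plan and execute tasks to achieve specific goals, which makes them valuable for automating workflows across diverse domains. This model release is for research purposes only.

The new xLAM-2 series is built on advanced data synthesis, processing, and training pipelines and marks a significant step forward in multi-turn conversation and tool use. The models are trained with the novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. On the BFCL and τ-bench benchmarks, our models achieve state-of-the-art performance, outperforming frontier models such as GPT-4o and Claude 3.5. Notably, even the smaller models demonstrate strong multi-turn capabilities while staying consistent across repeated trials.

We have also refined the chat template and vLLM integration, making it easier to build advanced AI agents. Compared with previous xLAM models, xLAM-2 offers better performance and seamless deployment across applications.

Figure: Model performance overview. Comparison of the larger xLAM-2-fc-r models (8B-70B, trained with APIGen-MT data) with state-of-the-art baselines on function calling (BFCL v3, as of April 2, 2025) and agentic capabilities (τ-bench).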

🚀 Quick Start

Framework versions

Transformers 4.46.1 (or later)
PyTorch 2.5.1+cu124 (or later)
Datasets 3.1.0 (or later)
Tokenizers 0.20.3 (or later)
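
A quick way to confirm that an environment meets these minimums is to compare the installed versions against the list above. The sketch below is illustrative only: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, the package names are assumed to be the usual PyPI ones, and only the leading numeric components are compared (so a local suffix such as `+cu124` is ignored).

```python
import re
from importlib import metadata

# Minimum versions taken from the list above (PyPI package names are assumed).
MINIMUMS = {
    "transformers": "4.46.1",
    "torch": "2.5.1",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}

def version_tuple(version: str) -> tuple:
    """Return the leading numeric release components, e.g. '2.5.1+cu124' -> (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def meets_minimum(installed: str, required: str) -> bool:
    """True when the installed release is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)

def check_environment() -> dict:
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, required in MINIMUMS.items():
        try:
            report[name] = meets_minimum(metadata.version(name), required)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

if __name__ == "__main__":
    for name, ok in check_environment().items():
        status = "missing" if ok is None else "ok" if ok else "too old"
        print(f"{name}: {status}")
```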

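Basic Usage

Using the Hugging Face chat template

The new xLAM models work seamlessly with the Hugging Face Transformers library and use a natural chat template, which keeps the conversational interface simple and intuitive. The example below shows how to use them: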

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 3B model is Qwen-based, so its repository name has no "Llama-" prefix (see the model table below).
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-2-3b-fc-r")
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-2-3b-fc-r", torch_dtype=torch.bfloat16, device_map="auto")

# Example conversation with a tool call
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "Thanks. I am doing well. How can I help you?"},
    {"role": "user", "content": "What's the weather like in London?"},
]

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    }
]

print("====== prompt after applying chat template ======")
print(tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, tokenize=False))

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
input_ids_len = inputs["input_ids"].shape[-1] # Get the length of the input tokens
inputs = {k: v.to(model.device) for k, v in inputs.items()}
print("====== model response ======")
outputs = model.generate(**inputs, max_new_tokens=256)
generated_tokens = outputs[:, input_ids_len:] # Slice the output to get only the newly generated tokens
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
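
The decoded response is plain text, so an application usually has to decide whether it is an ordinary reply or a tool call. The sketch below assumes the model emits tool calls as a JSON list of objects with `name` and `arguments` keys; that format is an assumption rather than something stated on this page, so compare it with the response printed above before relying on it. The helper name `parse_tool_calls` is hypothetical.

```python
import json

def parse_tool_calls(text: str) -> list:
    """Return a list of {"name", "arguments"} dicts, or [] for a plain-text reply."""
    try:
        payload = json.loads(text.strip())
    except json.JSONDecodeError:
        return []  # not JSON: treat it as an ordinary assistant message
    if isinstance(payload, dict):
        payload = [payload]  # tolerate a single call that is not wrapped in a list
    if not isinstance(payload, list):
        return []
    calls = []
    for item in payload:
        # Keep only entries that look like a tool call.
        if isinstance(item, dict) and isinstance(item.get("name"), str):
            calls.append({"name": item["name"], "arguments": item.get("arguments", {})})
    return calls

# Hand-written strings standing in for decoded model output.
sample = '[{"name": "get_weather", "arguments": {"location": "London"}}]'
print(parse_tool_calls(sample))
print(parse_tool_calls("I'm doing well, thanks!"))
```

An empty list signals that the text should be shown to the user as-is; a non-empty list can be dispatched to the matching local functions.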

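Inference with vLLM

The xLAM models can also be served efficiently with vLLM for high-throughput inference. Use vllm>=0.6.5, since earlier versions cause degraded performance for Qwen-based models.

Setup and serving

Install vLLM at the required version:

pip install "vllm>=0.6.5"

Download the tool parser plugin to a local path:

wget https://huggingface.co/Salesforce/xLAM-2-1b-fc-r/raw/main/xlam_tool_call_parser.py

Start the OpenAI API-compatible endpoint: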

vllm serve Salesforce/xLAM-2-1b-fc-r \
  --enable-auto-tool-choice \
  --tool-parser-plugin ./xlam_tool_call_parser.py \
  --tool-call-parser xlam \
  --tensor-parallel-size 1

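Note: Make sure the tool parser plugin file has been downloaded and that the path passed to --tool-parser-plugin points to your local copy. All xLAM series models use the same tool call parser, so you only need to download it once.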
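
Once the server has started, it can help to confirm that the endpoint is reachable and to see which model name it reports. The sketch below is a minimal check, assuming the default `http://localhost:8000/v1` address used in the client example on this page and the usual OpenAI-style `/models` response shape; the helper names are hypothetical, and the payload at the bottom is hand-written for illustration.

```python
import json
from urllib.request import urlopen

def served_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]

def fetch_served_model_ids(base_url: str = "http://localhost:8000/v1") -> list:
    """Query a running server; raises URLError if it is not reachable."""
    with urlopen(f"{base_url}/models", timeout=10) as response:
        return served_model_ids(json.load(response))

# Offline illustration with a hand-written payload of the expected shape.
example = {"object": "list", "data": [{"id": "Salesforce/xLAM-2-1b-fc-r", "object": "model"}]}
print(served_model_ids(example))
```

Against a live server, `fetch_served_model_ids()` performs the same extraction on the real response.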

Testing with the OpenAI API

Here is a minimal example that tests tool use against the served endpoint:

import openai
import json

# Configure the client to use your local vLLM endpoint
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # Default vLLM server URL
    api_key="empty"  # Can be any string
)

# Define a tool/function
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature to return"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Create a chat completion
response = client.chat.completions.create(
    model="Salesforce/xLAM-2-1b-fc-r",  # Should match the model name that the vLLM server is serving
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can use tools."},
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Print the response
print("Assistant's response:")
print(json.dumps(response.model_dump(), indent=2))
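
The request above stops at the point where the model asks for a tool. A multi-turn agent then has to run the tool and send the result back. The sketch below shows one way to build those follow-up messages from the dictionary form of the assistant message (`response.model_dump()["choices"][0]["message"]`). The `get_weather` body is a stand-in that returns no real data, and `TOOL_REGISTRY` and `tool_messages` are hypothetical names introduced for illustration.

```python
import json

def get_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; a real tool would call a weather service."""
    return {"location": location, "unit": unit, "temperature": None}

# Maps tool names (as declared in `tools`) to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def tool_messages(assistant_message: dict, registry: dict) -> list:
    """Run each requested tool and build the `role: tool` follow-up messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        # In the OpenAI response format, arguments arrive as a JSON string.
        arguments = json.loads(function["arguments"] or "{}")
        handler = registry.get(function["name"])
        if handler is None:
            output = {"error": f"unknown tool: {function['name']}"}
        else:
            output = handler(**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results

# Hand-written message in the shape of an assistant turn that requests a tool.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\": \"San Francisco, CA\"}"},
    }],
}
print(tool_messages(message, TOOL_REGISTRY))
```

The returned messages can be appended to the conversation after the assistant message itself and sent in a second `client.chat.completions.create` call, so the model can phrase the final answer.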

For more advanced configuration and deployment options, see the vLLM documentation.

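✨ Key Features

Stronger multi-turn conversation and tool use: built on advanced data synthesis, processing, and training pipelines.
High performance: outperforms frontier models such as GPT-4o and Claude 3.5 on the BFCL and τ-bench benchmarks.
Easy integration: a refined chat template and vLLM integration simplify building advanced AI agents.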

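📚 Detailed Documentation

Model Series

Attribute	Details
Model type	The xLAM series performs well across general tasks and function calling. At a given parameter count, each model is fine-tuned on a wide range of agentic tasks and scenarios while preserving the capabilities of its base model.
Training data	Salesforce/APIGen-MT-5k, Salesforce/xlam-function-calling-60k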

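Model name	Total parameters	Context length	Category	Model download	GGUF download
Llama-xLAM-2-70b-fc-r	70B	128k	Multi-turn conversation, function calling	🤗 Link	NA
Llama-xLAM-2-8b-fc-r	8B	128k	Multi-turn conversation, function calling	🤗 Link	🤗 Link
xLAM-2-32b-fc-r	32B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	NA
xLAM-2-3b-fc-r	3B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	🤗 Link
xLAM-2-1b-fc-r	1B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	🤗 Link

*Note: The default context length for Qwen-2.5-based models is 32k, but you can use techniques such as YaRN (Yet Another RoPE extensioN) to reach a maximum context length of 128k. See here for more details.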
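
Going from 32k to 128k corresponds to a YaRN scaling factor of 4 (32,768 × 4 = 131,072 positions). The sketch below shows one way to add such a setting to a loaded `config.json` dictionary. The `rope_scaling` keys follow the convention described in the Qwen2.5 model cards and are an assumption here; check them against the reference linked in the note and against your inference framework before relying on them. The helper name `with_yarn_scaling` is hypothetical.

```python
import copy

def with_yarn_scaling(config: dict, factor: float = 4.0) -> dict:
    """Return a copy of a model config dict with YaRN rope scaling added."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    original_length = updated.get("max_position_embeddings", 32768)
    updated["rope_scaling"] = {
        "type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": original_length,
    }
    return updated

base = {"max_position_embeddings": 32768}  # illustrative subset of config.json
extended = with_yarn_scaling(base)
print(extended["rope_scaling"])
# Target context length in tokens: original length times the scaling factor.
print(int(base["max_position_embeddings"] * extended["rope_scaling"]["factor"]))
```

Scaling of this kind is typically enabled only when prompts actually exceed the default window, since it changes how positions are encoded for all inputs.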

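You can also explore our previous xLAM series here.

The -fc suffix indicates that a model is fine-tuned for function-calling tasks, and the -r suffix indicates a research release.

✅ All models are compatible with vLLM and Transformers-based inference frameworks.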

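Benchmark Results

Berkeley Function-Calling Leaderboard (BFCL v3)

Figure (BFCL results): Performance comparison of different models on the [BFCL leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html). Rank is based on overall accuracy, a weighted average across evaluation categories. "FC" denotes function-calling mode, as opposed to using a customized "prompt" to extract the function calls.

τ-bench Benchmark

Figure (τ-bench results): Success rate (pass@1) on the τ-bench benchmark, averaged across at least 5 trials. Our xLAM-2-70b-fc-r model achieves an overall success rate of 56.2% on τ-bench, significantly outperforming the base Llama 3.1 70B Instruct model (38.2%) and other open-source models such as DeepSeek v3 (40.6%). Notably, our best model even outperforms proprietary models such as GPT-4o (52.9%) and approaches the performance of more recent models such as Claude 3.5 Sonnet (new) (60.1%).

Figure (Pass^k curves): Pass^k curves measure the probability that all 5 independent trials succeed for a given task, averaged across all tasks in the τ-retail (left) and τ-airline (right) domains. Higher values indicate better model consistency.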
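
For reference, the τ-bench pass^k metric can be estimated per task from n trials with c successes as C(c, k) / C(n, k), then averaged over tasks; with k = 1 it reduces to the plain success rate (pass@1). The sketch below implements that estimator on made-up trial counts. The numbers are purely illustrative and are not results for any xLAM model, and the function names are hypothetical.

```python
from math import comb

def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the chance that k trials drawn from `trials` all succeed."""
    if not 0 < k <= trials:
        raise ValueError("k must be between 1 and the number of trials")
    return comb(successes, k) / comb(trials, k)

def mean_pass_hat_k(success_counts: list, trials: int, k: int) -> float:
    """Average the per-task estimate over all tasks."""
    return sum(pass_hat_k(c, trials, k) for c in success_counts) / len(success_counts)

# Illustrative data: successes out of 5 trials for three hypothetical tasks.
counts = [5, 4, 0]
for k in range(1, 6):
    print(k, round(mean_pass_hat_k(counts, trials=5, k=k), 3))
```

Because a single failed trial removes a task from the k = 5 count, the curve falls quickly for models that succeed only intermittently, which is why it serves as a consistency measure.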

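Ethical Considerations

This release is for research purposes only, in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend that users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and apply best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly affect people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.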

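🔧 Technical Details

The model is trained with the novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. Together with the advanced data synthesis, processing, and training pipelines behind its multi-turn conversation and tool-use capabilities, this yields state-of-the-art performance on the BFCL and τ-bench benchmarks.

📄 License

This model is released under the CC-BY-NC-4.0 license.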

Citation

If you use our model or dataset in your work, please cite our paper:

@article{prabhakar2025apigen,
  title={APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay},
  author={Prabhakar, Akshara and Liu, Zuxin and Zhu, Ming and Zhang, Jianguo and Awalgaonkar, Tulika and Wang, Shiyu and Liu, Zhiwei and Chen, Haolin and Hoang, Thai and others},
  journal={arXiv preprint arXiv:2504.03601},
  year={2025}
}

Additionally, please check out our other related work on the xLAM series and consider citing it as well:

@article{zhang2025actionstudio,
  title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
  author={Zhang, Jianguo and Hoang, Thai and Zhu, Ming and Liu, Zuxin and Wang, Shiyu and Awalgaonkar, Tulika and Prabhakar, Akshara and Chen, Haolin and Yao, Weiran and Liu, Zhiwei and others},
  journal={arXiv preprint arXiv:2503.22673},
  year={2025}
}

@article{zhang2024xlam,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and others},
  journal={arXiv preprint arXiv:2409.03215},
  year={2024}
}

@article{liu2024apigen,
  title={Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and RN, Rithesh and others},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={54463--54482},
  year={2024}
}

@article{zhang2024agentohana,
  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}