InternLM2.5-7B-Chat-1M

Developed by internlm

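InternLM2.5 open-sources a 7-billion-parameter base model and chat model tailored for practical scenarios, featuring outstanding reasoning capability and a 1-million-token context window.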

Large Language Model

Transformers

License: Other #Million-token long-context processing #Math reasoning optimization #Multi-tool calling

Downloads: 128

Release date: 6/28/2024


Model Overview

InternLM2.5-7B-Chat-1M is the 1M-context version of InternLM2.5-7B-Chat, supporting long-text understanding and tool calling.

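Model Features

Outstanding Reasoning Capability

State-of-the-art math reasoning among models of the same size, surpassing models such as Llama3 and Gemma2-9B.

1M Context Window

Nearly perfect "needle in a haystack" retrieval within a 1M-token context, with leading performance on long-context tasks such as LongBench.

Stronger Tool Use

Supports gathering information from more than 100 web pages, with better tool-use capabilities in instruction following, tool selection, and reflection.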

Model Capabilities

Long-text understanding

Mathematical reasoning

Tool calling

Dialogue generation

Information retrieval

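Use Cases

Long-Text Processing

Long Document Analysis

Processes and analyzes documents of up to 1 million tokens

Accurately locates key information in long documents

Intelligent Assistant

Intelligent Dialogue

Holds natural-language conversations as an intelligent assistant

Provides a fluent and accurate conversational experience

Information Retrieval

Web Information Gathering

Collects and integrates information from multiple web pages

Uses tools effectively to retrieve information from the web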

pipeline_tag: text-generation license: other

InternLM

💻GitHub Repo • 🤔Reporting Issues • 📜Technical Report

👋 Join us on Discord and WeChat

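Introduction

InternLM2.5 has open-sourced a 7-billion-parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:

Outstanding reasoning capability: State-of-the-art math reasoning among models of the same size, surpassing models such as Llama3 and Gemma2-9B.
1M context window: Nearly perfect at finding needles in the haystack within a 1M-token context, with leading performance on long-context tasks such as LongBench. Try 1M-context inference and a file chat demo with LMDeploy.
Stronger tool use: InternLM2.5 supports gathering information from more than 100 web pages; the corresponding implementation will be released in Lagent soon. InternLM2.5 has better tool-use capabilities in instruction following, tool selection, and reflection. See examples.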

InternLM2.5-7B-Chat-1M

InternLM2.5-7B-Chat-1M is the 1M-context version of InternLM2.5-7B-Chat.

Performance Evaluation

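We use the "needle in a haystack" approach to evaluate the model's ability to retrieve information from long texts. Results show that InternLM2.5-7B-Chat-1M can accurately locate key information in documents up to 1M tokens long.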
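The evaluation harness itself is not described here, but the idea can be sketched: hide a short "needle" fact at a chosen relative depth inside long filler text, ask for it, and check the answer. The helper names, the filler sentence, and the substring scoring below are illustrative assumptions, not the authors' harness; a real test would sweep both context length and depth and send each prompt to the model.

```python
def build_haystack_prompt(needle: str, filler: str, n_filler: int, depth: float) -> str:
    """Hide `needle` at relative `depth` (0.0 = start, 1.0 = end) among
    `n_filler` copies of `filler`, then append a retrieval question."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * n_filler
    sentences.insert(round(depth * n_filler), needle)
    context = " ".join(sentences)
    question = "What is the secret passcode mentioned in the text above?"
    return f"{context}\n\n{question}"


def score_answer(answer: str, expected: str) -> bool:
    """Count a retrieval as correct if the expected value appears in the answer."""
    return expected.lower() in answer.lower()


# Sweep the needle through five depths of a small haystack
# (scale n_filler up to approach long contexts).
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_haystack_prompt(
        needle="The secret passcode is 7421.",
        filler="The grass is green and the sky is blue.",
        n_filler=1000,
        depth=depth,
    )
    # In a real run, `prompt` would be sent to the model and its reply scored.
    print(depth, prompt.index("7421"))
```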

We also use the LongBench benchmark to evaluate long-document comprehension. Our model achieves leading results on these tests.

LMDeploy

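Since Hugging Face Transformers does not directly support inference with a 1M-token context, we recommend LMDeploy. Standard Hugging Face Transformers usage is also shown below.

LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.

Here is an example of 1M-context inference. Note: the 1M context requires 4× A100-80G GPUs!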

pip install lmdeploy

You can run batch inference locally with the following Python code:

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
        rope_scaling_factor=2.5,
        session_len=1048576,  # 1M context length
        max_batch_size=1,
        cache_max_entry_count=0.7,
        tp=4)  # tensor parallelism across 4x A100-80G
pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)
prompt = 'Use a long prompt to replace this sentence'
response = pipe(prompt)
print(response)
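A rough memory estimate shows why the 1M context needs four 80 GB GPUs. The sketch below assumes InternLM2.5-7B uses 32 layers, 8 key/value heads (grouped-query attention) with head dimension 128, about 7.7B parameters, and fp16 storage; verify these against the model's config.json. It counts only weights and KV cache, ignoring activations and runtime overhead.

```python
# Assumed InternLM2.5-7B architecture values (verify against config.json).
NUM_LAYERS = 32
NUM_KV_HEADS = 8      # grouped-query attention
HEAD_DIM = 128        # hidden_size 4096 / 32 attention heads
NUM_PARAMS = 7.7e9    # approximate parameter count
BYTES_FP16 = 2


def kv_cache_bytes(num_tokens: int) -> int:
    """One key and one value vector per layer, per KV head, per token."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_FP16
    return num_tokens * per_token


def total_gib(num_tokens: int) -> float:
    """fp16 weights plus KV cache, in GiB."""
    weights = NUM_PARAMS * BYTES_FP16
    return (weights + kv_cache_bytes(num_tokens)) / 2**30


session_len = 1048576  # same value as in the LMDeploy example
print(f"KV cache per token: {kv_cache_bytes(1) / 2**10:.0f} KiB")            # 128 KiB
print(f"KV cache for 1M tokens: {kv_cache_bytes(session_len) / 2**30:.0f} GiB")  # 128 GiB
print(f"Weights + KV cache: {total_gib(session_len):.0f} GiB")               # 142 GiB
```

Under these assumptions the KV cache alone is 128 GiB, far more than a single 80 GB card can hold next to the weights. With tp=4 the cache is split across four GPUs, and cache_max_entry_count=0.7 caps the share of memory LMDeploy reserves for it (in recent LMDeploy versions, a fraction of free GPU memory).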

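For more details, see the LMDeploy documentation.

Loading with Transformers

Since Transformers does not support the 1M-token long context, only usage without long context is shown here. To load the InternLM2.5-7B-Chat-1M model with Transformers, use the following code: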

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat-1m", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded in float32 and may cause an out-of-memory error
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat-1m", torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
# Hello! How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)
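model.chat returns the updated history with each reply, and passing it back in is what makes the second call multi-turn. As an illustration, assuming history is a list of (user, assistant) string pairs and that the model uses a ChatML-style template with <|im_start|>/<|im_end|> markers, a conversation would be flattened into a prompt roughly as below. build_prompt is a hypothetical helper, not part of the model's API; where the tokenizer ships a chat template, tokenizer.apply_chat_template is the authoritative source.

```python
from typing import List, Tuple


def build_prompt(query: str, history: List[Tuple[str, str]], system: str = "") -> str:
    """Hypothetical sketch: render a multi-turn conversation in an assumed
    ChatML-style template (check the tokenizer's chat template before relying on it)."""
    prompt = "<s>"
    if system:
        prompt += f"<|im_start|>system\n{system}<|im_end|>\n"
    for user_turn, assistant_turn in history:
        prompt += f"<|im_start|>user\n{user_turn}<|im_end|>\n"
        prompt += f"<|im_start|>assistant\n{assistant_turn}<|im_end|>\n"
    # Leave the assistant turn open so the model continues from here.
    prompt += f"<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n"
    return prompt


history = [("hello", "Hello! How can I help you today?")]
print(build_prompt("please provide three suggestions about time management", history))
```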

Responses can be streamed with stream_chat:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "internlm/internlm2_5-7b-chat-1m"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = model.eval()
length = 0
for response, history in model.stream_chat(tokenizer, "hello", history=[]):
    print(response[length:], flush=True, end="")
    length = len(response)
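The loop's use of response[length:] implies that stream_chat yields the cumulative reply so far, so only the newly generated suffix is printed at each step. The snippet below isolates that pattern with a stand-in generator so it runs without the model; fake_stream_chat and iter_deltas are hypothetical names, and the stand-in only mimics the cumulative output.

```python
from typing import Iterator, List, Tuple

History = List[Tuple[str, str]]


def fake_stream_chat(query: str, history: History) -> Iterator[Tuple[str, History]]:
    """Hypothetical stand-in for model.stream_chat: yields the reply so far,
    growing by a few characters each step, together with the updated history."""
    reply = "Hello! How can I help you today?"
    for end in range(5, len(reply) + 5, 5):
        partial = reply[:end]
        yield partial, history + [(query, partial)]


def iter_deltas(stream: Iterator[Tuple[str, History]]) -> Iterator[str]:
    """Convert cumulative responses into only the newly generated text."""
    length = 0
    for response, _history in stream:
        yield response[length:]
        length = len(response)


for delta in iter_deltas(fake_stream_chat("hello", [])):
    print(delta, end="", flush=True)
print()
```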

vLLM

Launch an OpenAI-compatible server with vLLM>=0.3.2:

pip install vllm

python -m vllm.entrypoints.openai.api_server --model internlm/internlm2_5-7b-chat-1m --served-model-name internlm2_5-7b-chat-1m --trust-remote-code

If you run out of memory, try reducing --max-model-len or increasing --tensor-parallel-size.
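A back-of-the-envelope KV-cache estimate shows why these two flags matter. The sketch below is a rough upper bound under assumed values (32 layers, 8 KV heads, head dimension 128, about 7.7B fp16 parameters) and vLLM's default --gpu-memory-utilization of 0.9; activations and other overhead are ignored, so the practical limit is lower. max_tokens_that_fit is a hypothetical helper.

```python
# Assumed InternLM2.5-7B values (verify against the model's config.json).
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_FP16 = 2
WEIGHT_BYTES = 7.7e9 * BYTES_FP16  # ~7.7B parameters stored in fp16


def max_tokens_that_fit(num_gpus: int, gpu_gb: float = 80.0, utilization: float = 0.9) -> int:
    """Rough upper bound on KV-cache tokens across tensor-parallel GPUs.

    Subtracts the weights from the usable memory and divides the rest by the
    per-token KV-cache size. Activations and runtime overhead are ignored.
    """
    usable = num_gpus * gpu_gb * 1e9 * utilization
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_FP16  # key + value
    return max(0, int((usable - WEIGHT_BYTES) // per_token))


for gpus in (1, 2, 4):
    print(f"--tensor-parallel-size {gpus}: about {max_tokens_that_fit(gpus):,} tokens")
```

Under these assumptions one 80 GB GPU holds only about 430K tokens of KV cache and two hold just under 1M, so a full 1M-token --max-model-len needs a larger --tensor-parallel-size, or the limit must be lowered.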

Then you can send a chat request to the server:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "internlm2_5-7b-chat-1m",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce deep learning to me."}
    ]
    }'
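The same request can be sent from Python using only the standard library. The sketch below assumes the server started above is reachable at http://localhost:8000 and returns the usual OpenAI-style response body; build_request, extract_reply, and chat are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, List

# Assumes the vLLM server above is running locally on its default port.
BASE_URL = "http://localhost:8000/v1/chat/completions"


def build_request(messages: List[Dict[str, str]],
                  model: str = "internlm2_5-7b-chat-1m",
                  url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )


def extract_reply(body: bytes) -> str:
    """OpenAI-style responses put the text at choices[0].message.content."""
    return json.loads(body)["choices"][0]["message"]["content"]


def chat(messages: List[Dict[str, str]], timeout: float = 600) -> str:
    """Send the request and return the assistant's reply (needs a running server)."""
    with urllib.request.urlopen(build_request(messages), timeout=timeout) as resp:
        return extract_reply(resp.read())
```

With the server running, calling chat with the two messages from the curl example should return the assistant's reply as a plain string.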

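For more details, see the vLLM documentation.

Open Source License

The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, please fill in the application form (English)/application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.

Citation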

@misc{cai2024internlm2,
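      title={InternLM2 Technical Report},
      author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and Keyu Chen and Xin Chen and Xun Chen and Zehui Chen and Zhi Chen and Pei Chu and Xiaoyi Dong and Haodong Duan and Qi Fan and Zhaoye Fei and Yang Gao and Jiaye Ge and Chenya Gu and Yuzhe Gu and Tao Gui and Aijia Guo and Qipeng Guo and Conghui He and Yingfan Hu and Ting Huang and Tao Jiang and Penglong Jiao and Zhenjiang Jin and Zhikai Lei and Jiaxing Li and Jingwen Li and Linyang Li and Shuaibin Li and Wei Li and Yining Li and Hongwei Liu and Jiangning Liu and Jiawei Hong and Kaiwen Liu and Kuikun Liu and Xiaoran Liu and Chengqi Lv and Haijun Lv and Kai Lv and Li Ma and Runyuan Ma and Zerun Ma and Wenchang Ning and Linke Ouyang and Jiantao Qiu and Yuan Qu and Fukai Shang and Yunfan Shao and Demin Song and Zifan Song and Zhihao Sui and Peng Sun and Yu Sun and Huanze Tang and Bin Wang and Guoteng Wang and Jiaqi Wang and Jiayu Wang and Rui Wang and Yudong Wang and Ziyi Wang and Xingjian Wei and Qizhen Weng and Fan Wu and Yingtong Xiong and Chao Xu and Ruiliang Xu and Hang Yan and Yirong Yan and Xiaogui Yang and Haochen Ye and Huaiyuan Ying and Jia Yu and Jing Yu and Yuhang Zang and Chuyu Zhang and Li Zhang and Pan Zhang and Peng Zhang and Ruijie Zhang and Shuo Zhang and Songyang Zhang and Wenjian Zhang and Wenwei Zhang and Xingcheng Zhang and Xinyue Zhang and Hui Zhao and Qian Zhao and Xiaomeng Zhao and Fengzhe Zhou and Zaida Zhou and Jingming Zhuo and Yicheng Zou and Xipeng Qiu and Yu Qiao and Dahua Lin},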
      year={2024},
      eprint={2403.17297},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}