---
license: other
pipeline_tag: visual-question-answering
---
# InternLM-XComposer-2.5-OL

InternLM-XComposer2.5-OL, a comprehensive multimodal system for long-term streaming video and audio interactions.
## Import from Transformers

Use the following code to load the base LLM with Transformers:
```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# `model_dir='base'` selects the base LLM weights inside the repository
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b', model_dir='base',
    torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b', model_dir='base', trust_remote_code=True)
model.tokenizer = tokenizer
```
Use the following code to load the base audio model with MS-Swift:
```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

# `model_dir='audio'` selects the audio model weights inside the repository
model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path,
                                       model_dir='audio', model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
```
## Quickstart

The examples below show how to use InternLM-XComposer-2.5-OL with 🤗 Transformers. Please refer to the full guide for more details.

### Audio Understanding
```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path,
                                       model_dir='audio', model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# The <audio> tag marks where the audio clip is injected into the prompt
query = '<audio>Detect the language and recognize the speech.'
response, _ = inference(model, template, query, audios='examples/audios/chinese.mp3')
print(f'query: {query}')
print(f'response: {response}')
```
### Image Understanding
```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b', model_dir='base',
    torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b', model_dir='base', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detailed manner'
image = ['examples/images/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```
## Citation

If you find InternLM-XComposer-2.5-OL helpful in your research or applications, please cite it using the following BibTeX:
```bibtex
@misc{zhang2024internlmxcomposer25omnilivecomprehensivemultimodallongterm,
      title={InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions},
      author={Pan Zhang and Xiaoyi Dong and Yuhang Cao and Yuhang Zang and Rui Qian and Xilin Wei and Lin Chen and Yifei Li and Junbo Niu and Shuangrui Ding and Qipeng Guo and Haodong Duan and Xin Chen and Han Lv and Zheng Nie and Min Zhang and Bin Wang and Wenwei Zhang and Xinyue Zhang and Jiaye Ge and Wei Li and Jingwen Li and Zhongying Tu and Conghui He and Xingcheng Zhang and Kai Chen and Yu Qiao and Dahua Lin and Jiaqi Wang},
      year={2024},
      eprint={2412.09596},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.09596},
}
```
## License & Contact

The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English/Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.