MiniCPM-V 2.6: Open-Source Multimodal Vision-Language Model - Image-Text-to-Text and Multilingual Support

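MiniCPM-V 2.6 int4

Developed by openbmb

MiniCPM-V 2.6 is a multimodal vision-language model that converts image-and-text input into text and supports multiple languages.

Image-to-Text

Transformers

Other tags: #Multimodal live streaming #Real-time voice conversation #Multilingual support

Downloads: 122.58k

Release date: 8/4/2024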

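Model Overview

MiniCPM-V 2.6 is a multimodal model built on the MiniCPM-V architecture and focused on vision-language tasks. It accepts image, text, and video input and generates text output.

Model Features

Multimodal support

Accepts image, text, and video input, and can handle complex multimodal tasks.

Multilingual processing

Supports multiple languages and cross-lingual processing.

High performance

Delivers a significant performance improvement over the previous generation and supports real-time processing.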

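Model Capabilities

Image-text-to-text conversion

Multilingual text generation

Video content analysis

Optical character recognition (OCR)

Multi-image processing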

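Use Cases

Content generation

Image captioning

Generates a detailed text description of an input image.

Produces accurate, detailed image descriptions.

Video summarization

Analyzes video content and generates a concise text summary.

Produces a text summary of the video so its content can be grasped quickly.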
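This page does not describe how a video is passed to the model. One common approach, assumed here, is to reduce the video to a small list of evenly spaced frames first. The sketch below only computes which frame indices to keep; the helper name `uniform_frame_indices` and the frame cap are hypothetical and are not part of the model's API.

```python
def uniform_frame_indices(total_frames: int, max_frames: int) -> list[int]:
    """Pick at most max_frames indices spread evenly across the video."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    step = total_frames / max_frames
    # Take the midpoint of each of the max_frames equal-width segments.
    return [int(step * i + step / 2) for i in range(max_frames)]


# Example: a 100-frame clip reduced to 4 frames (the cap is an arbitrary assumption).
print(uniform_frame_indices(100, 4))
```

The selected frames could then be decoded with a video library of your choice and placed in the message content ahead of the question, in the same way the single image is placed there in the code example on this page. That usage is an assumption about the model's remote code, not something this page confirms.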

Document processing

Optical character recognition

Extracts text from images or video.

High-accuracy text recognition and extraction.

🚀 MiniCPM-V 2.6 int4

MiniCPM-V 2.6 int4 is the int4-quantized version of MiniCPM-V 2.6. Inference with this version uses less GPU memory (about 7 GB).
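To see why int4 quantization lowers memory use, the raw weight size can be estimated from the parameter count and the bits stored per weight. The sketch below is a back-of-the-envelope calculation only. It assumes roughly 8 billion parameters, a figure this page does not state, and the function name is hypothetical.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: int) -> float:
    """Return the size of the raw weights in GiB at the given bit width."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# ASSUMPTION: roughly 8 billion parameters; this page does not state the count.
ASSUMED_PARAMS = 8e9

for label, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    size = weight_footprint_gib(ASSUMED_PARAMS, bits)
    print(f"{label}: {size:.2f} GiB of weights")
```

Under that assumption the int4 weights alone come to well under the quoted figure of about 7 GB. The rest is plausibly taken up by activations, the generation cache, any layers left unquantized, and the CUDA context, though this page gives no breakdown.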

🚀 Quick Start

Requirements

Inference uses Hugging Face transformers on NVIDIA GPUs. The following dependencies were tested with Python 3.10:

Pillow==10.1.0
torch==2.1.2
torchvision==0.16.2
transformers==4.40.0
sentencepiece==0.1.99
accelerate==0.30.1
bitsandbytes==0.43.1
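Because the versions above are pinned exactly, it can help to confirm what is actually installed before loading the model. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper name `find_mismatches` is hypothetical, and stripping a local build suffix such as `+cu121` before comparing is a convenience choice made here, not something this page specifies.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list on this page.
PINNED = {
    "Pillow": "10.1.0",
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "transformers": "4.40.0",
    "sentencepiece": "0.1.99",
    "accelerate": "0.30.1",
    "bitsandbytes": "0.43.1",
}


def find_mismatches(pins: dict) -> dict:
    """Return {package: (expected, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local build tag such as "+cu121" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pin matches. Other versions may also work, but only the listed set is described as tested.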

Code Example

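# test.py
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the int4 checkpoint; trust_remote_code is needed because the model code ships with the repository.
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6-int4', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6-int4', trust_remote_code=True)
model.eval()

# Build a single-turn message: the image and the question go together in the content list.
image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': [image, question]}]

# image=None because the image is already carried inside msgs.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)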

# To stream the output, set both sampling=True and stream=True.
# model.chat then returns a generator that yields text chunks.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,
    stream=True
)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')
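The streaming loop above can be wrapped in a small helper and exercised without downloading the model. In the sketch below, `fake_stream` is a hypothetical stand-in for the generator returned by `model.chat(..., stream=True)`, and `consume_stream` is an illustrative name, not part of the model's API.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for the generator returned by model.chat(..., stream=True)."""
    yield from ["A ", "cat ", "on ", "a ", "sofa."]


def consume_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Print chunks as they arrive and return the full concatenated text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(parts)


full_text = consume_stream(fake_stream())
```

Collecting chunks in a list and joining once at the end avoids repeated string concatenation on long responses. With a real model, the `res` generator from the example above would take the place of `fake_stream()`.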

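✨ Key Features

Multilingual support: handles multiple languages.
Multimodal capability: accepts image and text input for tasks such as image recognition and OCR.
Low memory footprint: the int4-quantized version significantly reduces GPU memory usage.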

📦 Installation

Install the pinned dependencies listed above with:

pip install Pillow==10.1.0 torch==2.1.2 torchvision==0.16.2 transformers==4.40.0 sentencepiece==0.1.99 accelerate==0.30.1 bitsandbytes==0.43.1

📚 Documentation

Model Information

Property	Details
Model type	Image-text-to-text model
Training data	openbmb/RLAIF-V-Dataset
Base model	openbmb/MiniCPM-V-2_6