---
license: apache-2.0
library_name: transformers
pipeline_tag: object-detection
base_model: THUDM/cogvlm-grounding-generalist-hf
quantized_by: Rodeszones
---
# CogVLM

CogVLM grounding generalist model, quantized to 4-bit precision with bitsandbytes.

CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters. It achieves state-of-the-art performance on 10 classic cross-modal benchmarks (including NoCaps, Flicker30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA, and TDIUC) and ranks second on VQAv2, OKVQA, TextVQA, COCO captioning, and others, surpassing or matching PaLI-X 55B.
## My environment (pip list)

```bash
pip install torch==2.2.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 transformers==4.38.1 accelerate==0.27.2 sentencepiece==0.1.99 einops==0.7.0 xformers==0.0.24 protobuf==3.20.3 triton==2.1.0 bitsandbytes==0.43.0.dev0
```

On Windows, triton and bitsandbytes must be installed from the following wheel files:

```bash
pip install bitsandbytes-0.43.0.dev0-cp310-cp310-win_amd64.whl
pip install triton-2.1.0-cp310-cp310-win_amd64.whl
```
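
Before loading the model, it can help to confirm that CUDA and the quantization backend import cleanly. This is a minimal sanity-check sketch I am adding here (not part of the original card); it only prints versions and GPU availability:

```python
# Minimal environment sanity check (assumes the packages listed above are installed).
import torch
import transformers
import bitsandbytes as bnb  # 4-bit quantization backend used by this checkpoint

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("bitsandbytes:", bnb.__version__)
```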
## Quick Start
```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_path = "'local model folder path' or 'Rodeszones/CogVLM-grounding-generalist-hf-quant4'"
tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).eval()

# Build the conversation inputs for a grounding query.
query = 'Can you provide a description of the image and include the coordinates [[x0,y0,x1,y1]] for each mentioned object?'
image = Image.open("your image path").convert('RGB')
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Strip the prompt tokens and decode only the newly generated part.
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
```
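
The grounded reply embeds bounding boxes as `[[x0,y0,x1,y1]]` after each mentioned object. Below is a small parsing sketch I am adding for illustration (the `parse_boxes` helper is hypothetical, and it assumes the coordinates are normalized to a 0-999 range, as in the upstream CogVLM grounding format; adjust the rescaling if your outputs differ):

```python
import re

def parse_boxes(text: str, width: int, height: int):
    """Extract [[x0,y0,x1,y1]] groups from a grounded reply and rescale to pixels.

    Assumption: coordinates are normalized to 000-999, with multiple boxes
    for the same object separated by ';' inside the double brackets.
    """
    boxes = []
    for group in re.findall(r"\[\[([\d,;]+)\]\]", text):
        for box in group.split(';'):
            x0, y0, x1, y1 = (int(v) for v in box.split(','))
            boxes.append((x0 * width // 1000, y0 * height // 1000,
                          x1 * width // 1000, y1 * height // 1000))
    return boxes

# Usage: boxes = parse_boxes(tokenizer.decode(outputs[0]), *image.size)
```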
## License

The code in this repository is open source under the Apache-2.0 license, while use of the CogVLM model weights must comply with the Model License.
## Citation
```bibtex
@article{wang2023cogvlm,
      title={CogVLM: Visual Expert for Pretrained Language Models},
      author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
      year={2023},
      eprint={2311.03079},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```