License: bsd-3-clause
Tags:
- image captioning
Datasets:
- unography/laion-14k-GPT4V-LIVIS-Captions
Pipeline type: image-to-text
Language:
- English
Examples:
- Image: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
Example title: Savanna
- Image: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
Example title: Football match
- Image: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
Example title: Airport
Inference parameters:
max_length: 250
num_beams: 3
repetition_penalty: 2.5
LongCap: a BLIP model fine-tuned to generate long image captions, suitable for text-to-image prompts and for annotating image datasets
Usage
You can use this model for conditional and unconditional image captioning.
Using the PyTorch model
Running the model on CPU
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
# Load the processor and the fine-tuned captioning model
processor = BlipProcessor.from_pretrained("unography/blip-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-long-cap")
# Download the demo image and convert it to RGB
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
# Preprocess the image into pixel tensors
inputs = processor(raw_image, return_tensors="pt")
pixel_values = inputs.pixel_values
# Generate with beam search, then decode the caption
out = model.generate(pixel_values=pixel_values, max_length=250, num_beams=3, repetition_penalty=2.5)
print(processor.decode(out[0], skip_special_tokens=True))
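>>> a woman sitting on a sandy beach, interacting with a dog wearing a blue and white checkered collar. the dog is positioned to the left of the woman, who appears to be holding something in her hand. the background features a serene beach setting with waves crashing onto the shore. there are no other animals or people visible in the image. the time of day appears to be either early morning or late afternoon, based on the lighting and shadows.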
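The example above produces an unconditional caption. For conditional captioning, a text prompt can be passed to the processor together with the image, and the generated caption then continues from that prompt. The helper below is a minimal sketch of both modes, not part of the model's own API: `generate_caption` is a hypothetical name, and the prompt "a photograph of" is only an illustrative assumption, since how well this fine-tuned checkpoint follows prompts is not stated here.

```python
def generate_caption(processor, model, image, prompt=None,
                     max_length=250, num_beams=3, repetition_penalty=2.5):
    """Caption `image`; if `prompt` is given, the caption continues from it.

    `generate_caption` is a hypothetical helper written for illustration.
    """
    if prompt is None:
        # Unconditional: the processor returns only pixel_values.
        inputs = processor(images=image, return_tensors="pt")
    else:
        # Conditional: the processor also tokenizes the prompt, so the
        # batch additionally holds input_ids and attention_mask.
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_length=max_length,
        num_beams=num_beams,
        repetition_penalty=repetition_penalty,
    )
    return processor.decode(out[0], skip_special_tokens=True)


# Usage with the processor, model and raw_image loaded as in the example above:
# print(generate_caption(processor, model, raw_image))                     # unconditional
# print(generate_caption(processor, model, raw_image, "a photograph of"))  # conditional
```

Passing the processor and model in as arguments keeps the helper independent of where the model runs, so the same function can be reused with a model that has been moved to a GPU, as long as the inputs are moved to the same device.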
Running the model on GPU
In full precision
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
# Load the processor, and load the model onto the GPU
processor = BlipProcessor.from_pretrained("unography/blip-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-long-cap").to("cuda")
# Download the demo image and convert it to RGB
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
# Preprocess the image and move the tensors to the same device as the model
inputs = processor(raw_image, return_tensors="pt").to("cuda")
pixel_values = inputs.pixel_values
# Generate with beam search, then decode the caption
out = model.generate(pixel_values=pixel_values, max_length=250, num_beams=3, repetition_penalty=2.5)
print(processor.decode(out[0], skip_special_tokens=True))
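>>> a woman sitting on a sandy beach, interacting with a dog wearing a blue and white checkered collar. the dog is positioned to the left of the woman, who appears to be holding something in her hand. the background features a serene beach setting with waves crashing onto the shore. there are no other animals or people visible in the image. the time of day appears to be either early morning or late afternoon, based on the lighting and shadows.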
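For dataset annotation, one of the use cases named in the title, captioning images one at a time underuses a GPU. The sketch below captions a list of PIL images in fixed-size batches. `caption_batch` and its `batch_size` default are hypothetical choices made for illustration; the generation settings are the ones used in the examples above.

```python
def caption_batch(processor, model, images, batch_size=4, device="cpu",
                  max_length=250, num_beams=3, repetition_penalty=2.5):
    """Return one caption per image, processing `batch_size` images at a time.

    `caption_batch` is a hypothetical helper written for illustration.
    `device` must match the device the model was moved to.
    """
    captions = []
    for start in range(0, len(images), batch_size):
        chunk = images[start:start + batch_size]
        # The processor resizes every image to the model's input size,
        # so images of different dimensions can share one batch.
        inputs = processor(images=chunk, return_tensors="pt").to(device)
        out = model.generate(
            pixel_values=inputs.pixel_values,
            max_length=max_length,
            num_beams=num_beams,
            repetition_penalty=repetition_penalty,
        )
        # batch_decode turns each generated sequence into a string.
        captions.extend(processor.batch_decode(out, skip_special_tokens=True))
    return captions


# Usage with the processor and model loaded as above (model on "cuda"):
# captions = caption_batch(processor, model, [raw_image, raw_image], device="cuda")
```

A suitable `batch_size` depends on available memory; beam search with `num_beams=3` multiplies the decoder's working set, so a small value is a cautious starting point.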
In half precision (float16)
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
# Load the processor, and load the model weights in float16 on the GPU
processor = BlipProcessor.from_pretrained("unography/blip-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-long-cap", torch_dtype=torch.float16).to("cuda")
# Download the demo image and convert it to RGB
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
# Preprocess the image, then move the tensors to the GPU and cast them to float16 to match the model
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
pixel_values = inputs.pixel_values
# Generate with beam search, then decode the caption
out = model.generate(pixel_values=pixel_values, max_length=250, num_beams=3, repetition_penalty=2.5)
print(processor.decode(out[0], skip_special_tokens=True))
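>>> a woman sitting on a sandy beach, interacting with a dog wearing a blue and white checkered collar. the dog is positioned to the left of the woman, who appears to be holding something in her hand. the background features a serene beach setting with waves crashing onto the shore. there are no other animals or people visible in the image. the time of day appears to be either early morning or late afternoon, based on the lighting and shadows.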
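The three variants above differ only in the device and the dtype. A script meant to run on machines with and without a GPU can choose both at startup. The sketch below is one way to do that under stated assumptions: `pick_device_and_dtype` is a hypothetical helper, and it keeps float16 for the GPU path only, mirroring the examples above, since half precision on CPU is often slower or unsupported for some operations.

```python
def pick_device_and_dtype(cuda_available, prefer_half=True):
    """Return (device, dtype_name) for loading the model.

    Hypothetical helper: float16 is used only when a GPU is available,
    otherwise the model stays in float32 on the CPU.
    """
    if cuda_available:
        return "cuda", ("float16" if prefer_half else "float32")
    return "cpu", "float32"


# Usage, assuming torch and transformers are installed:
# import torch
# from transformers import BlipProcessor, BlipForConditionalGeneration
# device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
# dtype = getattr(torch, dtype_name)  # torch.float16 or torch.float32
# processor = BlipProcessor.from_pretrained("unography/blip-long-cap")
# model = BlipForConditionalGeneration.from_pretrained(
#     "unography/blip-long-cap", torch_dtype=dtype
# ).to(device)
# inputs = processor(raw_image, return_tensors="pt").to(device, dtype)
```

The inputs are cast with the same dtype as the model because mixing float32 pixel tensors with float16 weights raises a dtype mismatch error at the first layer.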