---
pipeline_tag: image-to-text
tags:
- image-captioning
languages:
- en
license: bsd-3-clause
widget:
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
datasets:
- unography/laion-14k-GPT4V-LIVIS-Captions
inference:
  parameters:
    max_length: 300
---
# LongCap: Finetuned BLIP for generating long image captions, suitable for prompts for text-to-image generation and for captioning image datasets

## Usage

You can use this model for conditional and unconditional image captioning.
### Using the PyTorch model

#### Running the model on CPU
```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor and the fine-tuned long-caption checkpoint
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap")

# Download the demo image and preprocess it into pixel values
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

inputs = processor(raw_image, return_tensors="pt")
pixel_values = inputs.pixel_values

# Generate a long caption and decode it to text
out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
>>> a woman sitting on the beach, wearing a plaid shirt and a dog collar. the woman is interacting with the dog on her left, with a calm sea glowing golden in the background.
```
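The snippet above performs unconditional captioning. For conditional captioning, a text prompt can be passed to the processor together with the image, following the standard BLIP pattern; the sketch below is illustrative, and the prompt string is an assumption rather than something taken from this card.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Conditional captioning: the model continues the given text prompt.
# The prompt below is an illustrative placeholder, not from this card.
text = "a photography of"
inputs = processor(raw_image, text, return_tensors="pt")

out = model.generate(**inputs, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
```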
#### Running the model on GPU

##### In full precision
```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor and move the model to the GPU
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap").to("cuda")

# Download the demo image and move the preprocessed tensors to the GPU
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

inputs = processor(raw_image, return_tensors="pt").to("cuda")
pixel_values = inputs.pixel_values

out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
>>> a woman sitting on the beach, wearing a plaid shirt and a dog collar. the woman is interacting with the dog on her left, with a calm sea glowing golden in the background.
```
##### In half precision (`float16`)
```python
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the model weights in float16 and move them to the GPU
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap", torch_dtype=torch.float16).to("cuda")

# Cast the image tensors to float16 so they match the model weights
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
pixel_values = inputs.pixel_values

out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
>>> a woman sitting on the beach, wearing a plaid shirt and a dog collar. the woman is interacting with the dog on her left, with a calm sea glowing golden in the background.
```
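For quick experiments, the model can also be loaded through the Transformers `image-to-text` pipeline, matching the `pipeline_tag` declared in this card's metadata. This is a minimal sketch under that assumption; the generation settings shown are illustrative choices, not values prescribed by the card.

```python
from transformers import pipeline

# Minimal sketch using the image-to-text pipeline; the generation
# settings here are assumptions, not taken from the original card.
captioner = pipeline("image-to-text", model="unography/blip-large-long-cap")

result = captioner(
    "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg",
    generate_kwargs={"max_length": 250},
)
print(result[0]["generated_text"])
```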