---
pipeline_tag: image-to-text
tags:
- image-captioning
languages:
- en
license: bsd-3-clause
widget:
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
datasets:
- unography/laion-14k-GPT4V-LIVIS-Captions
inference:
  parameters:
    max_length: 300
---
# LongCap: Finetuned BLIP for generating long image captions, suitable for prompts for text-to-image generation and for captioning image datasets

## Usage

You can use this model for conditional and unconditional image captioning.
### Using the PyTorch model

#### Running the model on CPU
```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor and the fine-tuned long-caption checkpoint
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap")

# Download the demo image and preprocess it into pixel values
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

inputs = processor(raw_image, return_tensors="pt")
pixel_values = inputs.pixel_values

# Generate a long caption and decode it to text
out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
>>> a woman sitting on the beach, wearing a plaid shirt and a dog collar. the woman is interacting with the dog on her left, with a calm sea glowing golden in the background.
```
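The snippet above performs unconditional captioning. For conditional captioning, a text prompt can be passed to the processor together with the image, following the standard BLIP pattern; the sketch below is illustrative, and the prompt string is an assumption rather than something taken from this card.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Conditional captioning: the model continues the given text prompt.
# The prompt below is an illustrative placeholder, not from this card.
text = "a photography of"
inputs = processor(raw_image, text, return_tensors="pt")

out = model.generate(**inputs, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
```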
#### Running the model on GPU

##### In full precision
```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor and move the model to the GPU
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap").to("cuda")

# Download the demo image and move the preprocessed tensors to the GPU
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

inputs = processor(raw_image, return_tensors="pt").to("cuda")
pixel_values = inputs.pixel_values

out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
>>> a woman sitting on the beach, wearing a plaid shirt and a dog collar. the woman is interacting with the dog on her left, with a calm sea glowing golden in the background.
```
##### In half precision (`float16`)
```python
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the model weights in float16 and move them to the GPU
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap", torch_dtype=torch.float16).to("cuda")

# Cast the image tensors to float16 so they match the model weights
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
pixel_values = inputs.pixel_values

out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
>>> a woman sitting on the beach, wearing a plaid shirt and a dog collar. the woman is interacting with the dog on her left, with a calm sea glowing golden in the background.
```
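For quick experiments, the model can also be loaded through the Transformers `image-to-text` pipeline, matching the `pipeline_tag` declared in this card's metadata. This is a minimal sketch under that assumption; the generation settings shown are illustrative choices, not values prescribed by the card.

```python
from transformers import pipeline

# Minimal sketch using the image-to-text pipeline; the generation
# settings here are assumptions, not taken from the original card.
captioner = pipeline("image-to-text", model="unography/blip-large-long-cap")

result = captioner(
    "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg",
    generate_kwargs={"max_length": 250},
)
print(result[0]["generated_text"])
```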