🚀 Fine-tuned Image Captioning Model
This is a fine-tuned BLIP model for visual question answering on retail product images. It was fine-tuned on a custom dataset of images from online retail platforms, annotated with product descriptions.
This experimental model can answer questions about product images in the retail domain. Possible applications include product metadata enrichment and validating human-written product descriptions.
🚀 Quick Start
Model Information
| Attribute | Details |
| --- | --- |
| Model type | Fine-tuned BLIP model for visual question answering on retail product images |
| Training data | phiyodr/coco2017, plus a custom image dataset from online retail platforms annotated with product descriptions |
| Evaluation metric | BLEU |
| Library name | transformers |
| License | Apache-2.0 |
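The card lists BLEU as the evaluation metric. As a minimal sketch of how generated captions could be scored against reference descriptions, here is one way to compute BLEU with the Hugging Face `evaluate` library; the prediction and reference strings below are hypothetical placeholders, not values from the actual evaluation:

```python
import evaluate

# Load the BLEU metric from the Hugging Face evaluate library.
bleu = evaluate.load("bleu")

# Hypothetical model output and reference product description.
predictions = ["kitchenaid artisan stand mixer"]
references = [["kitchenaid artisan series 5-quart stand mixer"]]

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"])
```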
Example Predictions
| Input image | Prediction |
| --- | --- |
| (product image) | kitchenaid artisann stand mixer |
| (product image) | a bottle of milk sitting on a counter |
| (product image) | dove sensitive skin lotion |
| (product image) | bread bag with blue plastic handl |
| (product image) | bush ' s best white beans |
💻 Usage Examples
Basic Usage
```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the fine-tuned processor and model from the Hugging Face Hub.
processor = BlipProcessor.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
model = BlipForConditionalGeneration.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")

# Fetch a demo image and convert it to RGB.
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Conditional captioning: the text prompt steers the generated caption.
text = "a photography of"
inputs = processor(raw_image, text, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional captioning: no prompt, the model describes the image freely.
inputs = processor(raw_image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
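For faster inference on a CUDA-capable machine, the same pipeline can be moved to the GPU. This is a minimal sketch of standard transformers device handling, not something the model card specifies:

```python
import requests
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Pick the GPU if one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
model = BlipForConditionalGeneration.from_pretrained(
    "quadranttechnologies/qhub-blip-image-captioning-finetuned"
).to(device)

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Move the processed inputs to the same device as the model.
inputs = processor(raw_image, return_tensors="pt").to(device)
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```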
📚 Documentation
BibTeX and Citation
```bibtex
@misc{https://doi.org/10.48550/arxiv.2201.12086,
  doi = {10.48550/ARXIV.2201.12086},
  url = {https://arxiv.org/abs/2201.12086},
  author = {Li, Junnan and Li, Dongxu and Xiong, Caiming and Hoi, Steven},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```