🚀 Fine-tuned Image Captioning Model
This is a fine-tuned BLIP model for visual question answering on retail product images. It was fine-tuned on a custom dataset of images from online retail platforms, annotated with product descriptions.
This experimental model can answer questions about product images in the retail domain. Possible applications include product metadata enrichment and validating human-written product descriptions.
🚀 Quick Start
Model Information
| Attribute | Details |
| --- | --- |
| Model type | Fine-tuned BLIP model for visual question answering on retail product images |
| Training data | phiyodr/coco2017, plus a custom image dataset from online retail platforms annotated with product descriptions |
| Evaluation metric | BLEU |
| Library name | transformers |
| License | Apache-2.0 |
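The card lists BLEU as the evaluation metric. As a minimal sketch of how generated captions could be scored against reference descriptions, here is one way to compute BLEU with the Hugging Face `evaluate` library; the prediction and reference strings below are hypothetical placeholders, not values from the actual evaluation:

```python
import evaluate

# Load the BLEU metric from the Hugging Face evaluate library.
bleu = evaluate.load("bleu")

# Hypothetical model output and reference product description.
predictions = ["kitchenaid artisan stand mixer"]
references = [["kitchenaid artisan series 5-quart stand mixer"]]

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"])
```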
Example Predictions
| Input image | Prediction |
| --- | --- |
| (product image) | kitchenaid artisann stand mixer |
| (product image) | a bottle of milk sitting on a counter |
| (product image) | dove sensitive skin lotion |
| (product image) | bread bag with blue plastic handl |
| (product image) | bush ' s best white beans |
💻 Usage Examples
Basic Usage
```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the fine-tuned processor and model from the Hugging Face Hub.
processor = BlipProcessor.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
model = BlipForConditionalGeneration.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")

# Fetch a demo image and convert it to RGB.
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Conditional captioning: the text prompt steers the generated caption.
text = "a photography of"
inputs = processor(raw_image, text, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional captioning: no prompt, the model describes the image freely.
inputs = processor(raw_image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
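For faster inference on a CUDA-capable machine, the same pipeline can be moved to the GPU. This is a minimal sketch of standard transformers device handling, not something the model card specifies:

```python
import requests
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Pick the GPU if one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
model = BlipForConditionalGeneration.from_pretrained(
    "quadranttechnologies/qhub-blip-image-captioning-finetuned"
).to(device)

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Move the processed inputs to the same device as the model.
inputs = processor(raw_image, return_tensors="pt").to(device)
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```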
📚 Documentation
BibTeX and Citation
```bibtex
@misc{https://doi.org/10.48550/arxiv.2201.12086,
  doi = {10.48550/ARXIV.2201.12086},
  url = {https://arxiv.org/abs/2201.12086},
  author = {Li, Junnan and Li, Dongxu and Xiong, Caiming and Hoi, Steven},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```