license: llama2
模型详情
注意:使用本模型需遵守Meta许可协议。下载模型权重和分词器前,请访问官网并接受Llama 2社区许可协议,再申请访问权限。
模型类型:
基于开源大语言模型生成指令数据训练的LLaVA视觉语言模型。
开发状态:
FireLLaVA 13B版本训练于2023年12月
论文及资源:
更多信息请参阅:https://llava-vl.github.io/
使用指南
本模型已部署在Fireworks.ai平台,可在此体验:https://app.fireworks.ai/models/fireworks/firellava-13b
API调用指南详见:https://readme.fireworks.ai/docs/querying-vision-language-models
若需通过huggingface transformers库本地运行模型,请遵循以下说明:
首先确保transformers版本≥4.35.3。本模型支持多图像多提示词生成,即单次提示可传入多张图片。使用时需遵循标准提示模板(USER: xxx\nASSISTANT:),并在查询图像位置添加<image>标记。
需注意:由于训练时未使用多图像输入,多图场景下模型性能可能下降。
使用pipeline
接口
from transformers import pipeline
from PIL import Image
import requests
model_id = "fireworks-ai/FireLLaVA-13b"
pipe = pipeline("image-to-text", model=model_id)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is the make of the car? Answer with one word or phrase.\n\nASSISTANT:"
outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
>>> [{'generated_text': 'USER: \nWhat is the make of the car? Answer with one word or phrase.\n\nASSISTANT: Volkswagen'}]
使用原生transformers
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
model_id = "fireworks-ai/FireLLaVA-13b"
prompt = "USER: <image>\nWhat is this?\n\nASSISTANT:"
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
model = LlavaForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.float16,
).to(0)
processor = AutoProcessor.from_pretrained(model_id)
raw_image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(prompt, raw_image, return_tensors='pt').to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
>>> "这是辆早期大众甲壳虫汽车,俗称VW bug,停放在砖铺街道上,旁边是带有多扇门的建筑..."