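🚀 Ferret-UI (Gemma-2B version)
Ferret-UI is the first multimodal large language model (MLLM) centered on user interfaces (UI), designed for referring, grounding, and reasoning tasks. It is built on Gemma-2B and Llama-3-8B and can carry out complex UI tasks. This is the Gemma-2B version of Ferret-UI, inspired by Apple's Ferret-UI paper.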
🚀 Quick Start
📦 Installation
First download builder.py, conversation.py, inference.py, model_UI.py, and mm_utils.py to your local machine.
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/model_UI.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/mm_utils.py
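If wget is not available, the same five files can be fetched from Python. The following is a minimal sketch using only the standard library; the helper names raw_url and download_all are hypothetical and are not part of the repository.

```python
import urllib.request
from pathlib import Path

# Base URL and file names are taken from the wget commands above.
REPO_RAW = "https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main"
FILES = ["builder.py", "conversation.py", "inference.py", "model_UI.py", "mm_utils.py"]


def raw_url(filename):
    """Build the raw-file URL for one helper file (hypothetical helper)."""
    return f"{REPO_RAW}/{filename}"


def download_all(dest="."):
    """Download each helper file into dest, skipping files already present."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        target = dest_dir / name
        if not target.exists():
            urllib.request.urlretrieve(raw_url(name), str(target))
    return [dest_dir / name for name in FILES]
```

Calling download_all() once from the directory where you run your script should leave the files next to it, so that the import of inference in the examples below can find them.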
💻 Usage Examples
Basic usage
from inference import inference_and_run
image_path = "appstore_reminders.png"
prompt = "Describe the image in details"
inference_text = inference_and_run(image_path, prompt, conv_mode="ferret_gemma_instruct", model_path="jadechoghari/Ferret-UI-Gemma2b")
print("Inference Text:", inference_text)
Advanced usage
from inference import inference_and_run

image_path = "appstore_reminders.png"
prompt = "What's inside the selected region?"
# Region of the image that the prompt refers to.
box = [189, 906, 404, 970]
inference_text = inference_and_run(
    image_path=image_path,
    prompt=prompt,
    conv_mode="ferret_gemma_instruct",
    model_path="jadechoghari/Ferret-UI-Gemma2b",
    box=box,
)
print("Inference Text:", inference_text)
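Before passing a region to the model, it can help to check that the box is well formed. The sketch below assumes the box is [x1, y1, x2, y2] in pixel coordinates, with (x1, y1) the top-left corner and (x2, y2) the bottom-right corner. That convention is an assumption based on the grounding prompt format and is not confirmed here. validate_box is a hypothetical helper, and the image size in the usage note is illustrative only.

```python
def validate_box(box, width, height):
    """Return box as a list of ints if it is a valid [x1, y1, x2, y2] region.

    Assumes pixel coordinates with the origin at the top-left of the image
    (an assumption, not something the model card states).
    """
    if len(box) != 4:
        raise ValueError("box must have four values: [x1, y1, x2, y2]")
    x1, y1, x2, y2 = (int(v) for v in box)
    # The corners must be ordered and must lie inside the image bounds.
    if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
        raise ValueError(f"box {box} does not fit a {width}x{height} image")
    return [x1, y1, x2, y2]
```

For example, validate_box([189, 906, 404, 970], 500, 1000) returns the box unchanged, while a box with x1 greater than x2 raises ValueError.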
Grounding prompts
GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nInclude the coordinates for each mentioned object.',
    '\nLocate the objects with their coordinates.',
    '\nAnswer in [x1, y1, x2, y2] format.',
    '\nMention the objects and their locations using the format [x1, y1, x2, y2].',
    '\nDraw boxes around the mentioned objects.',
    '\nUse boxes to show where each thing is.',
    '\nTell me where the objects are with coordinates.',
    '\nList where each object is with boxes.',
    '\nShow me the regions with boxes.'
]
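One way to use these templates is to append one of them to a prompt and then pull any [x1, y1, x2, y2] groups out of the returned text. The sketch below is an illustration under assumptions: picking a template at random and the exact shape of the model's answer are both guesses, and add_grounding and extract_boxes are hypothetical helpers. Only two of the templates are repeated here to keep the block self-contained.

```python
import random
import re

# Two of the templates from the list above; the full list works the same way.
GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nAnswer in [x1, y1, x2, y2] format.',
]

# Matches bracketed groups of four integers such as [189, 906, 404, 970].
BOX_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def add_grounding(prompt, rng=random):
    """Append one randomly chosen grounding template to the prompt."""
    return prompt + rng.choice(GROUNDING_TEMPLATES)


def extract_boxes(text):
    """Return every [x1, y1, x2, y2] group found in text as lists of ints."""
    return [[int(v) for v in m.groups()] for m in BOX_PATTERN.finditer(text)]
```

The result of add_grounding("Where is the search button?") can be passed as the prompt argument of inference_and_run, and extract_boxes can then be applied to the returned text. If the model answers in a different format, the pattern would need to be adjusted.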