🚀 Ferret-UI
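Ferret-UI is the first user-interface (UI)-centric multimodal large language model (MLLM), designed for referring, grounding, and reasoning tasks. It is available in versions built on Gemma-2B and Llama-3-8B and can carry out complex UI tasks. This is the Llama-3-8B version of Ferret-UI, inspired by Apple's Ferret-UI paper.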
🚀 Quick Start
📦 Installation
First, download builder.py, conversation.py, inference.py, model_UI.py, and mm_utils.py to your local machine:
```shell
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/model_UI.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/mm_utils.py
```
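If you would rather fetch the helper files from Python than from the shell, a minimal sketch using only the standard library might look like this. It reuses the repository and file list from the wget commands and assumes the same `raw/main` URL pattern; `REPO_ID`, `HELPER_FILES`, `helper_urls`, and `download_helpers` are hypothetical names introduced here, not part of the released scripts.

```python
import os
import urllib.request

# Hypothetical constants mirroring the wget commands (same repo, same file list).
REPO_ID = "jadechoghari/Ferret-UI-Gemma2b"
HELPER_FILES = ["conversation.py", "builder.py", "inference.py", "model_UI.py", "mm_utils.py"]


def helper_urls(repo_id=REPO_ID, files=HELPER_FILES):
    """Build the raw-file URL for each helper script, following the wget URL pattern."""
    return [f"https://huggingface.co/{repo_id}/raw/main/{name}" for name in files]


def download_helpers(dest_dir=".", repo_id=REPO_ID, files=HELPER_FILES):
    """Download each helper script into dest_dir, skipping files that already exist.

    Returns the local paths of all helper files.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for name, url in zip(files, helper_urls(repo_id, files)):
        path = os.path.join(dest_dir, name)
        if not os.path.exists(path):
            # Plain HTTP GET of the raw file; no authentication is attempted.
            urllib.request.urlretrieve(url, path)
        saved.append(path)
    return saved
```

Calling `download_helpers()` from the directory where you plan to run inference places the scripts next to your code, so `from inference import inference_and_run` can find them.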
💻 Usage Examples
Basic Usage
```python
from inference import inference_and_run

image_path = "appstore_reminders.png"
prompt = "Describe the image in details"

# Run the model on the whole screenshot and print its answer.
inference_text = inference_and_run(image_path, prompt)
print("Inference Text:", inference_text)
```
Advanced Usage
```python
from inference import inference_and_run

image_path = "appstore_reminders.png"
prompt = "What's inside the selected region?"
box = [189, 906, 404, 970]  # region of interest as [x1, y1, x2, y2]

inference_text = inference_and_run(
    image_path=image_path,
    prompt=prompt,
    conv_mode="ferret_llama_3",
    model_path="jadechoghari/Ferret-UI-Llama8b",
    box=box,
)
print("Inference Text:", inference_text)
```
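To query several regions of the same screenshot, the advanced call can be wrapped in a loop. The sketch below is a hypothetical helper, not part of the released scripts: `describe_regions` and `validate_box` are illustrative names, and the runner is passed in as a parameter (in practice `inference_and_run` from `inference.py`) so the logic can be exercised without loading the model. It assumes boxes are written in `[x1, y1, x2, y2]` order with the top-left corner first; it does not assume any particular coordinate scale.

```python
from typing import Callable, Dict, List, Sequence


def validate_box(box: Sequence[float]) -> List[float]:
    """Check that a box is [x1, y1, x2, y2] with x1 < x2 and y1 < y2 (assumed convention)."""
    if len(box) != 4:
        raise ValueError(f"expected 4 values [x1, y1, x2, y2], got {len(box)}")
    x1, y1, x2, y2 = box
    if not (x1 < x2 and y1 < y2):
        raise ValueError(f"box must satisfy x1 < x2 and y1 < y2, got {list(box)}")
    return list(box)


def describe_regions(
    run: Callable[..., str],
    image_path: str,
    boxes: Dict[str, Sequence[float]],
    prompt: str = "What's inside the selected region?",
    conv_mode: str = "ferret_llama_3",
    model_path: str = "jadechoghari/Ferret-UI-Llama8b",
) -> Dict[str, str]:
    """Run one region query per named box and collect the answers by name."""
    answers = {}
    for name, box in boxes.items():
        # Same keyword arguments as the advanced-usage call.
        answers[name] = run(
            image_path=image_path,
            prompt=prompt,
            conv_mode=conv_mode,
            model_path=model_path,
            box=validate_box(box),
        )
    return answers
```

With the helper scripts downloaded, a call might look like `describe_regions(inference_and_run, "appstore_reminders.png", {"selected": [189, 906, 404, 970]})`. Injecting the runner keeps the loop testable with a stub function.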
Grounding prompt templates (suffixes that can be appended to a prompt to request bounding-box coordinates in the answer):

```python
GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nInclude the coordinates for each mentioned object.',
    '\nLocate the objects with their coordinates.',
    '\nAnswer in [x1, y1, x2, y2] format.',
    '\nMention the objects and their locations using the format [x1, y1, x2, y2].',
    '\nDraw boxes around the mentioned objects.',
    '\nUse boxes to show where each thing is.',
    '\nTell me where the objects are with coordinates.',
    '\nList where each object is with boxes.',
    '\nShow me the regions with boxes.',
]
```
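One way these templates might be used: append one to a question to ask for coordinates, then pull any `[x1, y1, x2, y2]` lists out of the model's text answer with a regular expression. This is a minimal sketch, not the project's own pipeline. `make_grounding_prompt` and `parse_boxes` are hypothetical helpers, the template list is repeated so the block runs on its own, and the parser assumes the answer writes boxes as bracketed lists of four non-negative numbers, as the templates request; the model's actual output may differ.

```python
import random
import re
from typing import List, Optional

GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nInclude the coordinates for each mentioned object.',
    '\nLocate the objects with their coordinates.',
    '\nAnswer in [x1, y1, x2, y2] format.',
    '\nMention the objects and their locations using the format [x1, y1, x2, y2].',
    '\nDraw boxes around the mentioned objects.',
    '\nUse boxes to show where each thing is.',
    '\nTell me where the objects are with coordinates.',
    '\nList where each object is with boxes.',
    '\nShow me the regions with boxes.',
]

# One non-negative integer or decimal value.
_NUM = r"\s*(\d+(?:\.\d+)?)\s*"
# Matches "[x1, y1, x2, y2]" with optional whitespace around each value.
_BOX_RE = re.compile(r"\[" + ",".join([_NUM] * 4) + r"\]")


def make_grounding_prompt(question: str, rng: Optional[random.Random] = None) -> str:
    """Append a randomly chosen grounding template to the question."""
    rng = rng or random.Random()
    return question + rng.choice(GROUNDING_TEMPLATES)


def parse_boxes(text: str) -> List[List[float]]:
    """Extract every [x1, y1, x2, y2] list found in a model response, in order."""
    return [[float(v) for v in m.groups()] for m in _BOX_RE.finditer(text)]
```

Passing a seeded `random.Random` to `make_grounding_prompt` makes the chosen template reproducible; the resulting string can be used as the `prompt` argument of `inference_and_run`.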