---
license: apache-2.0
pipeline_tag: zero-shot-object-detection
---
# OmDet model

The OmDet model was proposed in *Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head* by Tiancheng Zhao, Peng Liu, Xuan He, Lu Zhang, and Kyusong Lee from Om AI Lab.
## GitHub repository

If you like our models, please consider starring our project OmDet on GitHub for updates and new model releases. We also invite you to explore OmAgent, our latest work on agent frameworks.
## Intended uses

This model is intended for zero-shot (also called open-vocabulary) object detection.
## How to use

### Single-image inference

Here is how to load the model and prepare the inputs to perform zero-shot object detection on a single image:
```python
import requests
from PIL import Image

from transformers import AutoProcessor, OmDetTurboForObjectDetection

processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
classes = ["cat", "remote"]
inputs = processor(image, text=classes, return_tensors="pt")

outputs = model(**inputs)

# Convert the raw outputs (bounding boxes and class logits) to final detections
results = processor.post_process_grounded_object_detection(
    outputs,
    classes=classes,
    target_sizes=[image.size[::-1]],
    score_threshold=0.3,
    nms_threshold=0.3,
)[0]
for score, class_name, box in zip(
    results["scores"], results["classes"], results["boxes"]
):
    box = [round(i, 1) for i in box.tolist()]
    print(
        f"Detected {class_name} with confidence "
        f"{round(score.item(), 2)} at location {box}"
    )
```
### Batched images inference

OmDet-Turbo can perform batched multi-image inference, with support for different text prompts and classes in the same batch:
```python
>>> import torch
>>> import requests

>>> from io import BytesIO
>>> from PIL import Image
>>> from transformers import AutoProcessor, OmDetTurboForObjectDetection

>>> processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
>>> model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")

>>> url1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image1 = Image.open(BytesIO(requests.get(url1).content)).convert("RGB")
>>> classes1 = ["cat", "remote"]
>>> task1 = "Detect {}.".format(", ".join(classes1))

>>> url2 = "http://images.cocodataset.org/train2017/000000257813.jpg"
>>> image2 = Image.open(BytesIO(requests.get(url2).content)).convert("RGB")
>>> classes2 = ["boat"]
>>> task2 = "Detect everything that looks like a boat."

>>> url3 = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
>>> image3 = Image.open(BytesIO(requests.get(url3).content)).convert("RGB")
>>> classes3 = ["statue", "tree"]
>>> task3 = "Focus on the foreground, detect statue and tree."

>>> inputs = processor(
...     images=[image1, image2, image3],
...     text=[classes1, classes2, classes3],
...     task=[task1, task2, task3],
...     return_tensors="pt",
... )

>>> with torch.no_grad():
...     outputs = model(**inputs)

>>> # Convert the raw outputs (bounding boxes and class logits) to final detections
>>> results = processor.post_process_grounded_object_detection(
...     outputs,
...     classes=[classes1, classes2, classes3],
...     target_sizes=[image1.size[::-1], image2.size[::-1], image3.size[::-1]],
...     score_threshold=0.2,
...     nms_threshold=0.3,
... )

>>> for i, result in enumerate(results):
...     for score, class_name, box in zip(
...         result["scores"], result["classes"], result["boxes"]
...     ):
...         box = [round(i, 1) for i in box.tolist()]
...         print(
...             f"Detected {class_name} with confidence "
...             f"{round(score.item(), 2)} at location {box} in image {i}"
...         )
Detected remote with confidence 0.77 at location [39.9, 70.4, 176.7, 118.0] in image 0
Detected cat with confidence 0.72 at location [11.6, 54.2, 314.8, 474.0] in image 0
Detected remote with confidence 0.56 at location [333.4, 75.8, 370.7, 187.0] in image 0
Detected cat with confidence 0.55 at location [345.2, 24.0, 639.8, 371.7] in image 0
Detected boat with confidence 0.32 at location [146.9, 219.8, 209.6, 250.7] in image 1
Detected boat with confidence 0.3 at location [319.1, 223.2, 403.2, 238.4] in image 1
Detected boat with confidence 0.27 at location [37.7, 220.3, 84.0, 235.9] in image 1
Detected boat with confidence 0.22 at location [407.9, 207.0, 441.7, 220.2] in image 1
Detected statue with confidence 0.73 at location [544.7, 210.2, 651.9, 502.8] in image 2
Detected tree with confidence 0.25 at location [3.9, 584.3, 391.4, 785.6] in image 2
Detected tree with confidence 0.25 at location [1.4, 621.2, 118.2, 787.8] in image 2
Detected statue with confidence 0.2 at location [428.1, 205.5, 767.3, 759.5] in image 2
```
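The `score_threshold` and `nms_threshold` arguments above control how raw detections are filtered: boxes scoring below the score threshold are dropped, and non-maximum suppression (NMS) removes any box whose overlap (IoU) with a higher-scoring box exceeds the NMS threshold. A minimal pure-Python sketch of that suppression step (an illustration, not the library's implementation), assuming `[x1, y1, x2, y2]` boxes:

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.3):
    # Visit boxes from highest to lowest score; keep a box only if it does
    # not overlap an already-kept box by more than `threshold`.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep
```

For example, with boxes `[[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]` and scores `[0.9, 0.8, 0.7]`, the second box overlaps the first with an IoU of 0.81 and is suppressed at a 0.3 threshold, while the non-overlapping third box survives.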