---
license: apache-2.0
tags:
- object-detection
- vision
datasets:
- coco
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
---
# YOLOS (tiny-sized) model
YOLOS model fine-tuned on COCO 2017 object detection (118k annotated images). It was introduced in the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Fang et al. and first released in [this repository](https://github.com/hustvl/YOLOS).
Disclaimer: The team releasing YOLOS did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Model description
YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model is able to achieve 42 AP on COCO 2017 validation (similar to DETR and more complex frameworks such as Faster R-CNN).

The model is trained using a "bipartite matching loss": the predicted classes and bounding boxes of each of the N = 100 object queries are compared to the ground-truth annotations, padded up to the same length N (so if an image contains only 4 objects, the remaining 96 annotations have "no object" as class and "no bounding box" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between the queries and the annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU losses (for the bounding boxes) are used to optimize the parameters of the model.
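For intuition, here is a minimal sketch of that matching step, using `scipy.optimize.linear_sum_assignment` as the Hungarian solver. The tensors are random stand-ins for real model outputs, and the cost weights are illustrative, not the exact coefficients used to train YOLOS (the GIoU term is also omitted for brevity):

```python
import torch
from scipy.optimize import linear_sum_assignment

num_queries, num_classes = 100, 92            # 91 COCO classes + "no object"
pred_logits = torch.randn(num_queries, num_classes)
pred_boxes = torch.rand(num_queries, 4)       # normalized box coordinates

# an image with 4 ground-truth objects
tgt_labels = torch.tensor([1, 17, 17, 64])
tgt_boxes = torch.rand(4, 4)

# cost matrix: classification term + L1 box term, shape (100, 4)
prob = pred_logits.softmax(-1)
cost_class = -prob[:, tgt_labels]                    # higher class prob = lower cost
cost_bbox = torch.cdist(pred_boxes, tgt_boxes, p=1)  # pairwise L1 distance
cost = 5.0 * cost_bbox + 1.0 * cost_class            # illustrative weights

# Hungarian algorithm: optimal one-to-one assignment of queries to targets
query_idx, target_idx = linear_sum_assignment(cost.detach().numpy())
# the 96 unmatched queries are supervised with the "no object" class
```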
## Intended uses & limitations
You can use the raw model for object detection. See the model hub to look for all available YOLOS models.
### How to use

Here is how to use this model:
```python
from transformers import YolosImageProcessor, YolosForObjectDetection
from PIL import Image
import torch
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = YolosForObjectDetection.from_pretrained('hustvl/yolos-tiny')
image_processor = YolosImageProcessor.from_pretrained("hustvl/yolos-tiny")

inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# model predicts class logits and bounding boxes for each object query
logits = outputs.logits
bboxes = outputs.pred_boxes

# convert outputs to detections in absolute (xmin, ymin, xmax, ymax) pixel coordinates
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
```
Currently, both the image processor and the model support PyTorch only.
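If you want to inspect the detections visually, one possible follow-up is to draw them on the input image with PIL's `ImageDraw`. This is a sketch, not part of the original card; it assumes the `results`, `model`, and `image` objects from the snippet above:

```python
from PIL import ImageDraw

draw = ImageDraw.Draw(image)
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    xmin, ymin, xmax, ymax = box.tolist()
    draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=2)
    draw.text((xmin, ymin), model.config.id2label[label.item()], fill="red")
image.save("detections.jpg")
```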
## Training data

The YOLOS model was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively.
## Training procedure

The model was pre-trained for 300 epochs on ImageNet-1k and fine-tuned for 300 epochs on COCO.
## Evaluation results

This model achieves an AP (average precision) of 28.7 on COCO 2017 validation. For more details on the evaluation results, we refer to the original paper.
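If you want to reproduce an AP number yourself, COCO-style evaluation is typically run with `pycocotools`. A minimal sketch, assuming you have exported the model's detections to a COCO-format results file (`predictions.json` is a hypothetical path, not something this card ships):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# ground-truth annotations and model predictions in COCO JSON format
coco_gt = COCO("annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("predictions.json")  # hypothetical results file

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP, AP50, AP75, etc.
```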
### BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-2106-00666,
  author     = {Yuxin Fang and Bencheng Liao and Xinggang Wang and Jiemin Fang and Jiyang Qi and Rui Wu and Jianwei Niu and Wenyu Liu},
  title      = {You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection},
  journal    = {CoRR},
  volume     = {abs/2106.00666},
  year       = {2021},
  url        = {https://arxiv.org/abs/2106.00666},
  eprinttype = {arXiv},
  eprint     = {2106.00666},
  timestamp  = {Fri, 29 Apr 2022 19:49:16 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2106-00666.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```