Tags:
- image feature extraction
- bird recognition
- pytorch
Library name: birder
License: apache-2.0
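Model Card for vit_reg4_b16_mim
A ViT reg4 image encoder pre-trained with masked image modeling (MIM). The model has not been fine-tuned for any specific classification task. It is intended as a general-purpose feature extractor or as a backbone for downstream tasks such as object detection, segmentation, or custom classification.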
Model Details
Model Usage
Image Embeddings
import torch
import birder
from PIL import Image

# Load the pretrained model in inference mode
(net, model_info) = birder.load_pretrained_model("vit_reg4_b16_mim_300", inference=True)

# Build the preprocessing transform from the model signature and RGB statistics
size = birder.get_size_from_signature(model_info.signature)
transform = birder.classification_transform(size, model_info.rgb_stats)

image = Image.open("path/to/image.jpeg")
input_tensor = transform(image).unsqueeze(dim=0)  # add a batch dimension
with torch.inference_mode():
    embedding = net.embedding(input_tensor)
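Once embeddings are available, a common next step is comparing two images. The sketch below is one possible way to do that with cosine similarity, not part of the birder API. It assumes each embedding is a `torch.Tensor` of shape `(1, D)`. The helper name `cosine_similarity`, the dimension 768, and the random stand-in tensors are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F


def cosine_similarity(emb_a: torch.Tensor, emb_b: torch.Tensor) -> float:
    """Cosine similarity between two embeddings of shape (1, D).

    Hypothetical helper for illustration; not provided by birder.
    """
    # L2-normalize each embedding so the dot product equals the cosine
    a = F.normalize(emb_a.flatten(start_dim=1), dim=1)
    b = F.normalize(emb_b.flatten(start_dim=1), dim=1)
    return float((a * b).sum(dim=1).item())


# Stand-in tensors (assumed shape); in practice these would be the
# outputs of net.embedding(...) for two different images.
torch.manual_seed(0)
emb_1 = torch.randn(1, 768)
emb_2 = torch.randn(1, 768)

print(cosine_similarity(emb_1, emb_1))  # an embedding compared with itself
print(cosine_similarity(emb_1, emb_2))  # two unrelated embeddings
```

Values close to 1 indicate embeddings pointing in nearly the same direction. Whether a given threshold is meaningful for a particular dataset would need to be checked empirically.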
Alternatively, use the load_model_with_cfg function:
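import torch
import birder
from PIL import Image

# Load the model from a local config file and weights file
(net, cfg) = birder.load_model_with_cfg("models/vit_reg4_b16_mim.json", "models/vit_reg4_b16_mim_300.pt")
net.eval()  # switch to evaluation mode before inference

# Build the preprocessing transform from the config's signature and RGB statistics
size = birder.get_size_from_signature(cfg["signature"])
transform = birder.classification_transform(size, cfg["rgb_stats"])

image = Image.open("path/to/image.jpeg")
input_tensor = transform(image).unsqueeze(dim=0)  # add a batch dimension
with torch.inference_mode():
    embedding = net.embedding(input_tensor)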
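Because the model ships without a task-specific classification head, one way to approach custom classification is a linear probe: keep the encoder frozen and train a small linear layer on precomputed embeddings. The following is a minimal sketch under stated assumptions, not the author's training recipe. The embedding dimension (768), the number of classes (5), the learning rate, and the random stand-in features and labels are all hypothetical. In practice `features` would be embeddings from `net.embedding(...)` stacked along the batch dimension.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumed values for illustration only
embed_dim, num_classes, num_samples = 768, 5, 64

# Stand-ins for precomputed embeddings and their class labels
features = torch.randn(num_samples, embed_dim)
labels = torch.randint(0, num_classes, (num_samples,))

# The linear probe: the only trainable part, the encoder stays frozen
head = nn.Linear(embed_dim, num_classes)
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(head(features), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Computing the embeddings once and caching them keeps this kind of experiment cheap, since the encoder does not need to run during head training.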
Citation
@misc{dosovitskiy2021imageworth16x16words,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
year={2021},
eprint={2010.11929},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2010.11929},
}
@misc{darcet2024visiontransformersneedregisters,
title={Vision Transformers Need Registers},
author={Timothée Darcet and Maxime Oquab and Julien Mairal and Piotr Bojanowski},
year={2024},
eprint={2309.16588},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2309.16588},
}
@misc{he2021maskedautoencodersscalablevision,
title={Masked Autoencoders Are Scalable Vision Learners},
author={Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Dollár and Ross Girshick},
year={2021},
eprint={2111.06377},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2111.06377},
}