license: apache-2.0
library_name: timm
tags:
Model card for vit_large_patch14_dinov2.lvd142m
A Vision Transformer (ViT) image feature model, pretrained on the LVD-142M dataset with the self-supervised DINOv2 method.
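Model Details
- Model type: Image classification / feature backbone
- Model stats:
  - Params (M): 304.4
  - GMACs: 507.1
  - Activations (M): 1058.8
  - Image size: 518 × 518
- Papers:
  - DINOv2: Learning Robust Visual Features without Supervision: https://arxiv.org/abs/2304.07193
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
- Original: https://github.com/facebookresearch/dinov2
- Pretrain dataset: LVD-142M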
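The image size listed above, together with the patch size of 14 in the model name, determines how many tokens the transformer processes per image. A minimal sketch of that arithmetic, assuming non-overlapping square patches and a single class token with no register tokens (the helper `vit_token_count` is hypothetical and not part of timm):

```python
def vit_token_count(image_size: int, patch_size: int, num_prefix_tokens: int = 1) -> tuple[int, int]:
    """Return (patches_per_side, total_tokens) for a square input image.

    Assumes non-overlapping square patches. num_prefix_tokens=1 assumes a
    single class token and no register tokens.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side + num_prefix_tokens


per_side, total = vit_token_count(518, 14)
print(per_side, total)  # 37 1370
```

Under these assumptions, a 518 × 518 input yields a 37 × 37 patch grid, which is why sequence length (and therefore compute) grows quickly with resolution.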
Model Usage
Image Classification
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_large_patch14_dinov2.lvd142m', pretrained=True)
model = model.eval()

# Get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# Unsqueeze the single image into a batch of 1
output = model(transforms(img).unsqueeze(0))

# Top-5 softmax scores (as percentages) and their indices
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
Image Embeddings
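from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_large_patch14_dinov2.lvd142m',
    pretrained=True,
    num_classes=0,  # remove the classifier nn.Linear
)
model = model.eval()

# Get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# Pooled embedding: a (batch_size, num_features) shaped tensor
output = model(transforms(img).unsqueeze(0))

# Or equivalently (without needing to set num_classes=0):
# forward_features returns the unpooled token sequence,
# shaped (batch_size, num_tokens, num_features)
output = model.forward_features(transforms(img).unsqueeze(0))
# forward_head with pre_logits=True pools it to a (1, num_features) shaped tensor
output = model.forward_head(output, pre_logits=True)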
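One common way to use pooled embeddings like these is to compare images by cosine similarity, for example in retrieval or near-duplicate detection. A dependency-free sketch, assuming each embedding has first been converted to a flat list of floats (e.g. with `output[0].tolist()`); the function name `cosine_similarity` and the toy vectors are illustrative and not part of timm:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional stand-ins for real embeddings, which are much longer.
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # scaled copy of query: similarity close to 1
orthogonal = [3.0, 0.0, -1.0]      # dot product with query is 0: similarity 0

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

When the embeddings are still tensors, `torch.nn.functional.cosine_similarity` computes the same quantity over a whole batch.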
Model Comparison
Explore the dataset and runtime metrics of this model in the timm model results.
Citation
@misc{oquab2023dinov2,
title={DINOv2: Learning Robust Visual Features without Supervision},
author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
journal={arXiv:2304.07193},
year={2023}
}
@article{dosovitskiy2020vit,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
journal={ICLR},
year={2021}
}
@misc{rw2019timm,
author = {Ross Wightman},
title = {PyTorch Image Models},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.4414861},
howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}