---
license: cc-by-nc-4.0
library_name: timm
tags:
---
# Model card for convnextv2_large.fcmae

A ConvNeXt-V2 self-supervised feature representation model, pretrained with a fully convolutional masked autoencoder framework (FCMAE). This model has no pretrained head and is only useful for fine-tuning or feature extraction.
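As a rough illustration of the FCMAE pretraining idea (not the actual implementation): the input is split into a grid of patches, a large random fraction is masked out, and the network is trained to reconstruct the hidden patches from the visible ones. A toy, dependency-free sketch of the masking step, assuming a 0.6 mask ratio on a 7x7 patch grid:

```python
import random

def random_patch_mask(num_patches, mask_ratio=0.6, seed=0):
    # Toy sketch of masked-autoencoder-style random masking: hide a fixed
    # fraction of patches; the autoencoder learns to reconstruct the hidden
    # ones from the visible ones. The 0.6 ratio here is an assumption.
    rng = random.Random(seed)
    n_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), n_masked))
    return [i in masked for i in range(num_patches)]

mask = random_patch_mask(49)  # 7x7 grid of patches
print(sum(mask), "of", len(mask), "patches masked")
```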
## Model Details
- **Model Type:** Image classification / feature backbone
- **Model Stats:**
  - Params (M): 196.4
  - GMACs: 34.4
  - Activations (M): 43.1
  - Image size: 224 x 224
- **Papers:**
  - ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders: https://arxiv.org/abs/2301.00808
- **Original:** https://github.com/facebookresearch/ConvNeXt-V2
- **Pretrain Dataset:** ImageNet-1k
## Model Usage
### Image Classification
```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('convnextv2_large.fcmae', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
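The last line above converts logits into percentage probabilities and keeps the five largest. As a dependency-free sketch of what that computes (the helper names here are illustrative, not part of timm or torch):

```python
import math

def softmax_pct(logits):
    # numerically stable softmax, scaled to percentages
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]

def topk(values, k):
    # (values, indices) of the k largest entries, mirroring torch.topk
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order

logits = [2.0, 1.0, 0.1, 3.0]
probs = softmax_pct(logits)
top2_probs, top2_idx = topk(probs, k=2)
print(top2_idx)  # index 3 has the largest logit, then index 0
```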
### Feature Map Extraction
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnextv2_large.fcmae',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    print(o.shape)
```
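With `features_only=True`, timm returns one feature map per backbone stage. As a hedged sketch (assuming the standard ConvNeXt-Large stage widths of 192/384/768/1536 and stage strides of 4/8/16/32; check `model.feature_info` for the authoritative values), the expected shapes for a 224x224 input can be computed as:

```python
# Assumed stage widths and strides for convnextv2_large; verify via
# model.feature_info.channels() and model.feature_info.reduction() in timm.
image_size = 224
channels = [192, 384, 768, 1536]
reductions = [4, 8, 16, 32]

shapes = [(1, c, image_size // r, image_size // r)
          for c, r in zip(channels, reductions)]
for s in shapes:
    print(s)
```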
### Image Embeddings
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnextv2_large.fcmae',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (batch_size, num_features, H, W) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (batch_size, num_features) shaped tensor
```
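The pooled `pre_logits` output is a plain feature vector, commonly compared with cosine similarity for retrieval or clustering. A minimal, dependency-free sketch on toy vectors (the vectors are illustrative, not real model outputs):

```python
import math

def cosine_similarity(a, b):
    # cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

emb_a = [0.2, 0.1, 0.9]   # toy embedding for image A
emb_b = [0.2, 0.1, 0.8]   # toy embedding for a similar image B
emb_c = [-0.9, 0.4, 0.0]  # toy embedding for a dissimilar image C

print(cosine_similarity(emb_a, emb_b))  # close to 1.0
print(cosine_similarity(emb_a, emb_c))  # much lower (negative here)
```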
## Model Comparison
Explore the dataset and runtime metrics of this model in the timm model results.
All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 with AMP enabled.
(Original comparison table omitted here; it reports top-1/top-5 accuracy, image size, and parameter counts per model.)
## Citation
```bibtex
@article{Woo2023ConvNeXtV2,
  title={ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders},
  author={Woo, Sanghyun and Debnath, Shoubhik and Hu, Ronghang and Chen, Xinlei and Liu, Zhuang and Kweon, In So and Xie, Saining},
  year={2023},
  journal={arXiv preprint arXiv:2301.00808},
}
```
```bibtex
@misc{rw2019timm,
  author = {Wightman, Ross},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
```