---
tags:
- siglip
- siglip2
- vision
library_name: open_clip
pipeline_tag: zero-shot-image-classification
license: apache-2.0
datasets:
- webli
---
# Model card for ViT-B-16-SigLIP2-512

A SigLIP 2 vision-language model trained on WebLI.

This model has been converted for use in OpenCLIP from the original JAX checkpoints in Big Vision.
## Model Details
- Model Type: Contrastive image-text, zero-shot image classification.
- Original: https://github.com/google-research/big_vision
- Dataset: WebLI
- Papers:
  - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features: https://arxiv.org/abs/2502.14786
  - Sigmoid Loss for Language Image Pre-Training: https://arxiv.org/abs/2303.15343 (sketched below)
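The sigmoid loss in the second paper scores every image-text pair in a batch as an independent binary classification (matched vs. mismatched), rather than the usual softmax contrastive objective. A minimal sketch of that objective, for illustration only (the function name is hypothetical and this is not the Big Vision training code):

```python
import torch
import torch.nn.functional as F

def siglip_sigmoid_loss(image_features, text_features, logit_scale, logit_bias):
    # Both feature tensors are assumed L2-normalized, shape (batch, dim).
    logits = image_features @ text_features.T * logit_scale + logit_bias
    # Targets: +1 on the diagonal (matched pairs), -1 off-diagonal (mismatches).
    targets = 2 * torch.eye(logits.shape[0], device=logits.device) - 1
    # -log sigmoid(target * logit), summed over all pairs, averaged per image.
    return -F.logsigmoid(targets * logits).sum() / logits.shape[0]
```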
## Model Usage

```python
import torch
import torch.nn.functional as F
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

model, preprocess = create_model_from_pretrained('hf-hub:timm/ViT-B-16-SigLIP2-512')
tokenizer = get_tokenizer('hf-hub:timm/ViT-B-16-SigLIP2-512')

image = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
image = preprocess(image).unsqueeze(0)

labels_list = ["a dog", "a cat", "a donut", "a beignet"]
text = tokenizer(labels_list, context_length=model.context_length)

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image, normalize=True)
    text_features = model.encode_text(text, normalize=True)
    text_probs = torch.sigmoid(image_features @ text_features.T * model.logit_scale.exp() + model.logit_bias)

zipped_list = list(zip(labels_list, [100 * round(p.item(), 3) for p in text_probs[0]]))
print("Label probabilities:", zipped_list)
```
## Citation

```bibtex
@article{tschannen2025siglip,
  title={SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features},
  author={Tschannen, Michael and Gritsenko, Alexey and Wang, Xiao and Naeem, Muhammad Ferjad and Alabdulmohsin, Ibrahim and Parthasarathy, Nikhil and Evans, Talfan and Beyer, Lucas and Xia, Ye and Mustafa, Basil and H{\'e}naff, Olivier and Harmsen, Jeremiah and Steiner, Andreas and Zhai, Xiaohua},
  year={2025},
  journal={arXiv preprint arXiv:2502.14786}
}

@article{zhai2023sigmoid,
  title={Sigmoid Loss for Language Image Pre-Training},
  author={Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas},
  journal={arXiv preprint arXiv:2303.15343},
  year={2023}
}

@misc{big_vision,
  author = {Beyer, Lucas and Zhai, Xiaohua and Kolesnikov, Alexander},
  title = {Big Vision},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/google-research/big_vision}}
}
```