japanese-clip-vit-b-16 open-source model - contrastive learning for Japanese text and images


Japanese Clip Vit B 16

Developed by rinna

A Japanese CLIP model trained by rinna Co., Ltd. for contrastive learning between Japanese text and images

Zero-Shot Image Classification

Transformers

Japanese | License: Apache-2.0 | #Japanese multimodal #zero-shot image classification #ViT-B/16 architecture

Downloads: 26.12k

Published: 4/27/2022

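Model overview

This is a multimodal model based on the CLIP architecture. It maps Japanese text and images into a shared embedding space, which enables cross-modal retrieval and classification.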
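A minimal sketch of what a shared embedding space buys you, in plain Python: once an image and several Japanese texts are encoded into vectors of the same dimension, relevance reduces to cosine similarity between those vectors. The three-dimensional vectors below are hypothetical toy values, not outputs of this model; real CLIP embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for image- and text-encoder outputs.
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = {
    "犬": [0.8, 0.2, 0.1],   # "dog"
    "猫": [0.1, 0.9, 0.2],   # "cat"
    "象": [0.0, 0.1, 0.95],  # "elephant"
}

# Score every text against the image; the best match is the most similar vector.
scores = {label: cosine_similarity(image_embedding, vec) for label, vec in text_embeddings.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Using real features from the two encoders changes only where the vectors come from, not how they are compared.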

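Key features

Japanese-specific

A CLIP model built specifically for Japanese, learning associations between Japanese text and images

Multimodal

Processes both image and text inputs for cross-modal feature extraction and matching

Pretrained

Pretrained on the large-scale CC12M dataset and can be applied directly to downstream tasks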

Capabilities

Image feature extraction

Japanese text feature extraction

Image-text similarity computation

Cross-modal retrieval

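Use cases

Image classification

Zero-shot multi-class image classification

Classify images using Japanese labels

Outputs a probability distribution over the labels

Cross-modal search

Text-to-image search

Search for relevant images using a Japanese text description

Image-to-text search

Use an image to find matching Japanese text descriptions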
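Both search directions come down to the same operation: rank the embeddings on one side by their similarity to a query embedding from the other side. The sketch below assumes the embeddings have already been computed and stored in plain dictionaries; the file names, captions, and vector values are hypothetical placeholders, and `top_k` is an illustrative helper, not part of the japanese_clip package.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Return the k keys in `index` whose embeddings are most similar to `query`."""
    q = normalize(query)

    def score(key: str) -> float:
        return sum(a * b for a, b in zip(q, normalize(index[key])))

    return heapq.nlargest(k, index, key=score)


# Hypothetical precomputed image embeddings, keyed by file name.
image_index = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.1],
    "elephant.jpg": [0.0, 0.2, 0.9],
}
# Hypothetical precomputed embeddings of Japanese captions.
caption_index = {
    "芝生の上の犬": [0.85, 0.15, 0.05],  # "a dog on the lawn"
    "ソファで眠る猫": [0.1, 0.95, 0.0],  # "a cat sleeping on the sofa"
    "サバンナの象": [0.05, 0.1, 0.9],    # "an elephant on the savanna"
}

# Text-to-image: a hypothetical embedding of the query "犬の写真" ("a photo of a dog").
text_query = [0.8, 0.2, 0.0]
print(top_k(text_query, image_index, k=2))  # ['dog.jpg', 'cat.jpg']

# Image-to-text: use an image embedding as the query against the captions.
print(top_k(image_index["cat.jpg"], caption_index, k=1))  # ['ソファで眠る猫']
```

The linear scan keeps the example readable; for a large collection, the embeddings would typically be normalized once at indexing time rather than on every query.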

🚀 rinna/japanese-clip-vit-b-16

This is a Japanese CLIP (Contrastive Language-Image Pre-Training) model trained by rinna Co., Ltd.


For other available models, see japanese-clip (https://github.com/rinnakk/japanese-clip).

🚀 Quick start

Install the package

$ pip install git+https://github.com/rinnakk/japanese-clip.git

Run the code

import io
import requests
from PIL import Image
import torch
import japanese_clip as ja_clip

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = ja_clip.load("rinna/japanese-clip-vit-b-16", cache_dir="/tmp/japanese_clip", device=device)
tokenizer = ja_clip.load_tokenizer()

img = Image.open(io.BytesIO(requests.get('https://images.pexels.com/photos/2253275/pexels-photo-2253275.jpeg?auto=compress&cs=tinysrgb&dpr=3&h=750&w=1260').content))
image = preprocess(img).unsqueeze(0).to(device)
encodings = ja_clip.tokenize(
    texts=["犬", "猫", "象"],
    max_seq_len=77,
    device=device,
    tokenizer=tokenizer,  # optional; if omitted, the tokenizer is reloaded on every call
)

with torch.no_grad():
    image_features = model.get_image_features(image)
    text_features = model.get_text_features(**encodings)
    
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1.0, 0.0, 0.0]]
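The last line of the computation above turns image-text similarities into a probability distribution over the candidate labels. The plain-Python sketch below reproduces that step for a single image under two assumptions: both feature vectors are L2-normalized, so the dot product is a cosine similarity (this README does not state whether `get_image_features` and `get_text_features` already normalize their outputs), and the inputs are hypothetical toy vectors rather than real model outputs. The factor of 100.0 acts as an inverse temperature that sharpens the softmax.

```python
import math


def zero_shot_probs(image_emb, label_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities for a single image.

    Mirrors (logit_scale * image_features @ text_features.T).softmax(dim=-1),
    assuming both sides are L2-normalized.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in label_embs]
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["犬", "猫", "象"]
# Hypothetical toy embeddings, not real model outputs.
image_emb = [0.9, 0.1, 0.0]
label_embs = [[0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.1, 0.95]]

sharp = zero_shot_probs(image_emb, label_embs)                  # scale 100, as above
flat = zero_shot_probs(image_emb, label_embs, logit_scale=1.0)  # no sharpening
print(labels[sharp.index(max(sharp))], [round(p, 3) for p in sharp])  # 犬 [1.0, 0.0, 0.0]
print([round(p, 3) for p in flat])
```

In this toy example, dropping `logit_scale` to 1.0 spreads the probability mass across all three labels, which shows how much of the near one-hot output comes from the scale factor.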

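🔧 Technical details

Model architecture

The model uses a ViT-B/16 Transformer as its image encoder and a 12-layer BERT as its text encoder. The image encoder was initialized from the AugReg vit-base-patch16-224 model.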
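As a worked example of what the image encoder processes: the initialization checkpoint's name, vit-base-patch16-224, indicates 16x16-pixel patches on a 224x224 input. Assuming this model keeps that input resolution and, like a standard ViT, prepends a single [CLS] token, each image becomes 197 tokens. The helper below is an illustrative calculation, not part of the japanese_clip package.

```python
def vit_token_count(image_size: int, patch_size: int, cls_tokens: int = 1) -> int:
    """Tokens a ViT sees for a square image cut into non-overlapping square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + cls_tokens


# vit-base-patch16-224: 224 / 16 = 14 patches per side, 14 * 14 = 196 patches, plus [CLS]
print(vit_token_count(224, 16))  # 197
```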

Training data

The model was trained on CC12M, with its captions translated into Japanese.

📄 License

Apache 2.0

📅 Release date

May 12, 2022

📚 How to cite

@misc{rinna-japanese-clip-vit-b-16,
    title = {rinna/japanese-clip-vit-b-16},
    author = {Shing, Makoto and Zhao, Tianyu and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-clip-vit-b-16}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}