---
language:
- en
license: gpl-3.0
library_name: transformers
tags:
- clip
- vision
- medical
- bert
pipeline_tag: zero-shot-image-classification
widget:
- src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_09402.jpg
  candidate_labels: chest X-ray, brain MRI, abdominal CT scan, ultrasound, panoramic dental X-ray
  example_title: Abdomen CT Scan
- src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_00319.jpg
  candidate_labels: chest X-ray, brain MRI, abdominal CT scan, ultrasound, panoramic dental X-ray
  example_title: Chest X-Ray
- src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_00016.jpg
  candidate_labels: chest X-ray, brain MRI, abdominal CT scan, ultrasound, panoramic dental X-ray
  example_title: MRI
- src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_02259.jpg
  candidate_labels: chest X-ray, brain MRI, abdominal CT scan, ultrasound, panoramic dental X-ray
  example_title: Ultrasound
base_model: openai/clip-vit-large-patch14
---
# RCLIP (CLIP fine-tuned on radiology images and their captions)

This model is a fine-tuned version of CLIP, with [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) as the image encoder and [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) as the text encoder, trained on the ROCO dataset. It achieves the results reported in the Metrics section below on the evaluation set.
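For reference, a dual encoder of this kind can be assembled from the two backbones with `VisionTextDualEncoderModel`. The snippet below is a minimal sketch of that pairing, not the author's actual training setup; in particular, the `trust_remote_code` flags are an assumption, since the CXR-BERT checkpoint ships custom code:

```python
from transformers import (
    AutoImageProcessor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Pair the CLIP vision tower with the CXR-BERT text tower.
# "text_"-prefixed kwargs are routed to the text model's from_pretrained;
# trusting remote code may be required for the CXR-BERT side (assumption).
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "openai/clip-vit-large-patch14",
    "microsoft/BiomedVLP-CXR-BERT-general",
    text_trust_remote_code=True,
)

# A matching processor: CLIP's image preprocessing plus CXR-BERT's tokenizer.
image_processor = AutoImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/BiomedVLP-CXR-BERT-general", trust_remote_code=True
)
processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)
```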
## Heatmap

The following heatmap shows the similarity scores between the images and captions of the first 30 samples of the ROCO test split:
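A heatmap like this can be reproduced along the following lines. This is a minimal sketch: the `image_paths` and `captions` variables are hypothetical placeholders for the first 30 test-split image files and their caption strings, not part of the released artifacts:

```python
import torch
import matplotlib.pyplot as plt
from PIL import Image
from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor

model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")

# Hypothetical placeholders for the first 30 ROCO test images and captions.
image_paths = [f"roco_test/{i:02d}.jpg" for i in range(30)]
captions = [open(f"roco_test/{i:02d}.txt").read() for i in range(30)]

images = [Image.open(p) for p in image_paths]
inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    # logits_per_image[i, j] is the similarity score of image i and caption j.
    sims = model(**inputs).logits_per_image.numpy()

plt.imshow(sims, cmap="viridis")
plt.xlabel("captions")
plt.ylabel("images")
plt.colorbar()
plt.show()
```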

## Image Retrieval

This model can be used for image retrieval tasks, as follows:

### 1. Save the image embeddings
```python
from PIL import Image
import numpy as np
import pickle, os, torch
from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor

model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")

# Embed every .jpg image in the folder with the image encoder.
images_path = "/path/to/images/"
images = [os.path.join(images_path, i) for i in os.listdir(images_path) if i.endswith(".jpg")]

image_embeds = []
for img in images:
    with torch.no_grad():
        inputs = processor(text=None, images=Image.open(img), return_tensors="pt", padding=True)
        outputs = model.get_image_features(**inputs)[0].numpy()
    image_embeds.append(outputs)

# Save the embeddings for later queries.
with open("embeddings.pkl", "wb") as f:
    pickle.dump(np.array(image_embeds), f)
```
### 2. Query the images
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from PIL import Image
import pickle, torch, os
from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor

model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")

# Embed the text query with the text encoder.
query = "chest x-ray photograph"
inputs = processor(text=query, images=None, return_tensors="pt", padding=True)
with torch.no_grad():
    query_embedding = model.get_text_features(**inputs)[0].numpy()

# Load the image embeddings saved in step 1.
with open("embeddings.pkl", "rb") as f:
    image_embeds = pickle.load(f)

def find_k_similar_images(query_embedding, image_embeds, k=2):
    # Rank all images by cosine similarity to the query and return the top-k indices.
    similarities = cosine_similarity(query_embedding.reshape(1, -1), image_embeds)
    return np.argsort(similarities[0])[::-1][:k]

similar_image_indices = find_k_similar_images(query_embedding, image_embeds, k=2)

images_path = "/path/to/images/"
images = [os.path.join(images_path, i) for i in os.listdir(images_path) if i.endswith(".jpg")]
similar_image_names = [images[index] for index in similar_image_indices]
Image.open(similar_image_names[0])
```
## Zero-Shot Image Classification

This model can be used effectively for zero-shot image classification, as follows:
```python
import requests
from PIL import Image
from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor

model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")

# Download a sample radiology image.
url = "https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_09402.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Score the image against each candidate class name.
possible_class_names = ["chest X-ray", "brain MRI", "abdominal CT scan", "ultrasound", "panoramic dental X-ray"]
inputs = processor(text=possible_class_names, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1).squeeze()

for name, prob in zip(possible_class_names, probs):
    print(f"{name}: {prob.item():.4%}")
image
```
## Metrics

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.0974        | 4.13  | 22500 | 0.3388          |

<details>
  <summary>Expand to see all steps</summary>

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.7951        | 0.09  | 500   | 1.1912          |
| ...           | ...   | ...   | ...             |

</details>
## Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 8.0
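As an illustration, these settings map onto the Hugging Face `TrainingArguments` roughly as follows. This is a sketch of an assumed Trainer-based setup, not the author's actual training script:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="rclip-finetune",
    learning_rate=5e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=8.0,
)
```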
## Framework Versions

- Transformers 4.31.0.dev0
- PyTorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3
## Citation

```bibtex
@misc{https://doi.org/10.57967/hf/0896,
  doi       = {10.57967/HF/0896},
  url       = {https://huggingface.co/kaveh/rclip},
  author    = {Kaveh Shahhosseini},
  title     = {rclip},
  publisher = {Hugging Face},
  year      = {2023}
}
```