Library name: transformers
License: mit
Datasets:
- biglab/jitteredwebsites-merged-224-paraphrased
- biglab/jitteredwebsites-merged-224-paraphrased-paired
- biglab/uiclip_human_data_hf
Base models:
- openai/clip-vit-base-patch32
- biglab/uiclip_jitteredwebsites-2-224-paraphrased_webpairs
Model Card
UIClip is a model that quantifies the design quality and relevancy of a user interface (UI) screenshot given a textual description.
Model Description
UIClip is a model that quantifies the design quality and relevancy of a user interface (UI) screenshot given a textual description.
The model can also be used to generate natural language design suggestions (see the paper).
This model is described in the paper "UIClip: A Data-driven Model for Assessing User Interface Design," published at UIST 2024 (https://arxiv.org/abs/2404.12500).
User interface (UI) design is a difficult yet important task for ensuring the usability, accessibility, and aesthetic qualities of applications. In our paper, we develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a UI given its screenshot and a natural language description. To train UIClip, we combined automated crawling, synthetic augmentation, and human ratings to construct a large-scale dataset of UIs, collated by description and ranked by design quality. Through training on this dataset, UIClip implicitly learns properties of good and bad designs by i) assigning a numerical score that represents a UI design's relevance and quality and ii) providing design suggestions. In an evaluation comparing the outputs of UIClip and other baselines to UIs rated by 12 human designers, we found that UIClip achieved the highest agreement with the ground-truth rankings. Finally, we present three example applications that demonstrate how UIClip can facilitate downstream applications that rely on instantaneous assessment of UI design quality: i) UI code generation, ii) UI design tips generation, and iii) quality-aware UI example search.
- Developed by: BigLab
- Model type: CLIP-style multimodal dual-encoder Transformer
- Language (NLP): English
- License: MIT
Example Code
import torch
from transformers import CLIPProcessor, CLIPModel
IMG_SIZE = 224
DEVICE = "cpu"  # "cuda" or "mps" can be used instead if available
LOGIT_SCALE = 100  # logit scale used by CLIP-style models
NORMALIZE_SCORING = True
model_path = "uiclip_jitteredwebsites-2-224-paraphrased_webpairs_humanpairs"  # local checkpoint path; when loading from the Hub this likely needs the org prefix, e.g. "biglab/..."
processor_path = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_path)
model = model.eval()
model = model.to(DEVICE)
processor = CLIPProcessor.from_pretrained(processor_path)
def compute_quality_scores(input_list):
    # input_list is a list of (description, PIL image) pairs
    description_list = ["ui screenshot. well-designed. " + input_item[0] for input_item in input_list]
    img_list = [input_item[1] for input_item in input_list]
    text_embeddings_tensor = compute_description_embeddings(description_list)  # B x H
    img_embeddings_tensor = compute_image_embeddings(img_list)  # B x H
    # normalize embeddings
    text_embeddings_tensor /= text_embeddings_tensor.norm(dim=-1, keepdim=True)
    img_embeddings_tensor /= img_embeddings_tensor.norm(dim=-1, keepdim=True)
    if NORMALIZE_SCORING:
        # contrast the "well-designed" prompt with a "poor design" prompt
        text_embeddings_tensor_poor = compute_description_embeddings([d.replace("well-designed. ", "poor design. ") for d in description_list])
        text_embeddings_tensor_poor /= text_embeddings_tensor_poor.norm(dim=-1, keepdim=True)
        text_embeddings_tensor_all = torch.stack((text_embeddings_tensor, text_embeddings_tensor_poor), dim=1)  # B x 2 x H
    else:
        text_embeddings_tensor_all = text_embeddings_tensor.unsqueeze(1)
    img_embeddings_tensor = img_embeddings_tensor.unsqueeze(1)  # B x 1 x H
    scores = (LOGIT_SCALE * img_embeddings_tensor @ text_embeddings_tensor_all.permute(0, 2, 1)).squeeze(1)
    if NORMALIZE_SCORING:
        scores = scores.softmax(dim=-1)
    return scores[:, 0]
def compute_description_embeddings(descriptions):
    inputs = processor(text=descriptions, return_tensors="pt", padding=True)
    inputs['input_ids'] = inputs['input_ids'].to(DEVICE)
    inputs['attention_mask'] = inputs['attention_mask'].to(DEVICE)
    text_embedding = model.get_text_features(**inputs)
    return text_embedding
def compute_image_embeddings(image_list):
    # split each image into square sub-windows, embed all of them, then average per image
    windowed_batch = [slide_window_over_image(img, IMG_SIZE) for img in image_list]
    inds = []
    for imgi in range(len(windowed_batch)):
        inds.append([imgi for _ in windowed_batch[imgi]])
    processed_batch = [item for sublist in windowed_batch for item in sublist]
    inputs = processor(images=processed_batch, return_tensors="pt")
    # run every sub-window of every image through the model in a single batch
    inputs['pixel_values'] = inputs['pixel_values'].to(DEVICE)
    with torch.no_grad():
        image_features = model.get_image_features(**inputs)
    # the output contains features for all sub-windows; pool the ones belonging to each image
    processed_batch_inds = torch.tensor([item for sublist in inds for item in sublist]).long().to(image_features.device)
    embed_list = []
    for i in range(len(image_list)):
        mask = processed_batch_inds == i
        embed_list.append(image_features[mask].mean(dim=0))
    image_embedding = torch.stack(embed_list, dim=0)
    return image_embedding
def preresize_image(image, image_size):
    # resize so that the shorter side equals image_size, preserving the aspect ratio
    aspect_ratio = image.width / image.height
    if aspect_ratio > 1:
        image = image.resize((int(aspect_ratio * image_size), image_size))
    else:
        image = image.resize((image_size, int(image_size / aspect_ratio)))
    return image
def slide_window_over_image(input_image, img_size):
    # crop overlapping square windows along the longer dimension of the resized image
    input_image = preresize_image(input_image, img_size)
    width, height = input_image.size
    square_size = min(width, height)
    longer_dimension = max(width, height)
    num_steps = (longer_dimension + square_size - 1) // square_size
    if num_steps > 1:
        step_size = (longer_dimension - square_size) // (num_steps - 1)
    else:
        step_size = square_size
    cropped_images = []
    for y in range(0, height - square_size + 1, step_size if height > width else square_size):
        for x in range(0, width - square_size + 1, step_size if width > height else square_size):
            left = x
            upper = y
            right = x + square_size
            lower = y + square_size
            cropped_image = input_image.crop((left, upper, right, lower))
            cropped_images.append(cropped_image)
    return cropped_images
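# Note: "test_descriptions" and "test_images" are not defined above; they are expected
# to be a list of description strings and a matching list of PIL images. A minimal,
# purely illustrative way to build them (the file names below are hypothetical placeholders):
from PIL import Image
test_descriptions = [
    "a news website with a headline and a navigation bar",
    "a login screen for a banking app",
]
test_images = [Image.open("screenshot_1.png"), Image.open("screenshot_2.png")]
# compute quality scores for the (description, image) pairs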
prediction_scores = compute_quality_scores(list(zip(test_descriptions, test_images)))
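As the code above shows, with NORMALIZE_SCORING enabled each returned score is the softmax weight the model assigns to the "well-designed" prompt relative to the contrasting "poor design" prompt, so it lies between 0 and 1, with higher values indicating a better-designed UI for the given description; with normalization disabled, the scores are raw scaled similarities between the image embedding and the "well-designed" description embedding.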