dpt-swinv2-base-384 open-source model: monocular depth estimation with high-accuracy depth prediction

Dpt Swinv2 Base 384

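Developed by Intel

The DPT (Dense Prediction Transformer) model was trained on 1.4 million images for monocular depth estimation. It uses Swinv2 as its backbone and is suited to high-accuracy depth prediction tasks.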

3D Vision

Transformers

License: MIT #MonocularDepthEstimation #Swinv2Backbone #ZeroShotLearning

Downloads: 182

Released: 12/10/2023

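Model overview

DPT is a dense prediction model built on vision transformers and designed for monocular depth estimation. This version uses Swinv2 as its backbone and predicts depth from a single image.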

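Model highlights

High-accuracy depth estimation

Trained on 1.4 million images; predicts accurate depth information from a single image

Swinv2 backbone

Uses the Swinv2 transformer architecture as its backbone for strong feature extraction

Zero-shot prediction

Estimates depth without scene-specific fine-tuning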

Model capabilities

Monocular depth estimation

Image depth prediction

3D scene understanding

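Use cases

Computer vision

3D scene reconstruction

Reconstruct a 3D scene from a single image

Generate accurate depth maps

Augmented reality

Provide scene depth information for AR applications

Enable more realistic placement of virtual objects

Robot vision

Autonomous navigation

Give robots depth perception of their environment

Support path planning and obstacle avoidance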
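To make the 3D reconstruction use case concrete, here is a minimal sketch of back-projecting a depth map into a point cloud with a pinhole camera model. It rests on assumptions that are not stated on this page: the intrinsics `fx`, `fy`, `cx`, `cy` are known, and the depth values have already been calibrated to metric units (the page does not say whether the model's raw predictions are metric). `backproject` is a hypothetical helper name, not part of any library.

```python
import numpy as np


def backproject(depth: np.ndarray, fx: float, fy: float,
                cx: float, cy: float) -> np.ndarray:
    """Turn an (H, W) depth map into an (H*W, 3) array of XYZ points.

    Assumes a pinhole camera and depth values already in metric units.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Each row of the result is one 3D point in the camera frame; points can then be passed to whatever reconstruction or navigation code consumes point clouds.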

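🚀 DPT 3.1 (Swinv2 backbone)

The DPT (Dense Prediction Transformer) model was trained on 1.4 million images for monocular depth estimation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. (2021) and first released in this repository.

Disclaimer: the team that released DPT did not write a model card for this model; this model card was written by the Hugging Face team.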

🚀 Quick start

This section shows how to use the DPT 3.1 (Swinv2 backbone) model for monocular depth estimation.

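✨ Key features

Large-scale training: trained on 1.4 million images for monocular depth estimation.
Architecture: uses the Swinv2 model as a backbone, with a neck and a head added on top for monocular depth estimation.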

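📚 Detailed documentation

Model description

This DPT model uses the Swinv2 model as a backbone and adds a neck and a head on top of it for monocular depth estimation.

Model image

How to use

The following example shows how to run zero-shot depth estimation on an image with this model:

Basic usage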

from transformers import DPTImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = DPTImageProcessor.from_pretrained("Intel/dpt-swinv2-base-384")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-swinv2-base-384")

# prepare image for the model
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

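# interpolate to the original size; PIL's image.size is (width, height),
# so it is reversed to the (height, width) order that interpolate expects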
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
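
The visualization step above scales by the maximum value, which implicitly assumes non-negative predictions with a nonzero maximum. As a hedged alternative, the sketch below applies min-max scaling and guards against a constant map. `depth_to_uint8` is a hypothetical helper written for this example and is not part of transformers.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a floating-point depth map to the 0..255 range."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: there is no range to scale, so return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return (scaled * 255.0).round().astype(np.uint8)
```

With the variables from the example above, `Image.fromarray(depth_to_uint8(output))` would produce an 8-bit grayscale image in the same way as `formatted`.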

Advanced usage

You can also use the pipeline API:

from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="Intel/dpt-swinv2-base-384")
result = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"]
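
In the pipeline result, the `"depth"` entry is a PIL image, so it can be written straight to disk. The sketch below shows one way to do that; `save_depth_map` and the output path are hypothetical names chosen for this example.

```python
from pathlib import Path


def save_depth_map(result: dict, out_path: str) -> Path:
    """Save the "depth" entry of a depth-estimation pipeline result."""
    path = Path(out_path)
    # Create the parent directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # result["depth"] is expected to be a PIL image; Pillow infers the
    # file format from the extension of the path.
    result["depth"].save(path)
    return path
```

For example, `save_depth_map(result, "outputs/depth.png")` would store the depth map from the pipeline call above as a PNG file.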