SegFormer-B2 Open-Source Semantic Segmentation Model - Free Deployment for Coral Reef Ecosystem Image Segmentation

Segformer B2 Finetuned Coralscapes 1024 1024

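Developed by EPFL-ECEO

A semantic segmentation model based on the SegFormer architecture, fine-tuned on the Coralscapes dataset for segmenting images of coral reef ecosystems.

Image Segmentation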

Transformers

License: Apache-2.0 #CoralReefSemanticSegmentation #HighResolutionImageProcessing #EcologicalMonitoring

Released: 3/7/2025

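Model Overview

The model performs semantic segmentation of coral reef scenes, assigning a class label to every pixel of a reef image. It uses a MiT-B2 backbone and was fine-tuned on the Coralscapes dataset at 1024x1024 resolution.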

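Model Features

High-resolution processing

Accepts 1024x1024 inputs, which suits fine-grained segmentation of reef imagery

Reef-specific fine-tuning

Fine-tuned specifically on the Coralscapes dataset, where it reaches a test mean IoU of 54.682

Sliding-window support

Provides a sliding-window segmentation strategy that handles input images of arbitrary size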

Model Capabilities

Coral reef image segmentation

Underwater scene understanding

Ecological monitoring

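Use Cases

Ecological monitoring

Coral reef health assessment

Segments reef images into distinct regions to support assessment of reef health

Predicts 40 classes covering corals and other marine life

Marine ecology research

Used to study changes in coral reef ecosystems and biodiversity

Provides per-class coral cover statistics derived from the predicted masks

Environmental protection

Coral reef conservation monitoring

Monitors reef degradation to provide data in support of conservation measures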
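
The cover statistics mentioned in the use cases can be derived from a predicted label map by counting pixels per class. The following is a minimal sketch, assuming the prediction is a 2D integer array of class ids such as the `label_pred` array produced in the examples further down. The helper name `class_cover` and the toy array are hypothetical. Which class ids correspond to live coral is not stated here and has to be taken from the dataset's label definitions.

```python
import numpy as np

def class_cover(label_map, num_classes=40):
    """Return the fraction of pixels assigned to each class id."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes)
    return counts / label_map.size

# Toy 2x4 label map standing in for a model prediction (hypothetical values)
toy = np.array([[0, 0, 3, 3],
                [0, 3, 3, 3]])
cover = class_cover(toy, num_classes=5)
print(cover[3])  # 0.625, since 5 of the 8 pixels carry class id 3
```

Summing the entries of `cover` over the ids of interest gives the combined cover of those classes for the image.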

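🚀 Coral Reef Semantic Segmentation Model

This project fine-tunes a SegFormer model with a MiT-B2 backbone on the Coralscapes dataset. It performs semantic segmentation of coral reef images in support of reef ecology research.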

🚀 Quick Start

The simplest way to use this model to segment an image from the Coralscapes dataset is as follows:

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
from datasets import load_dataset

# Load an image from the Coralscapes dataset, or load your own image
dataset = load_dataset("EPFL-ECEO/coralscapes") 
image = dataset["test"][42]["image"]

preprocessor = SegformerImageProcessor.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")

inputs = preprocessor(image, return_tensors = "pt")
outputs = model(**inputs)
outputs = preprocessor.post_process_semantic_segmentation(outputs, target_sizes=[(image.size[1], image.size[0])])
label_pred = outputs[0].numpy()
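
The resulting `label_pred` is a 2D array of integer class ids. To inspect which classes were predicted, the ids can be mapped to names. The sketch below assumes a dictionary from id to name. In Transformers, model configurations expose such a mapping as `model.config.id2label`, although whether this checkpoint stores descriptive names there is an assumption. The helper `present_classes` and the toy mapping are hypothetical.

```python
import numpy as np

def present_classes(label_pred, id2label):
    """List (class_id, name, pixel_count) for every class that appears in the map."""
    ids, counts = np.unique(label_pred, return_counts=True)
    return [(int(i), id2label.get(int(i), "unknown"), int(c))
            for i, c in zip(ids, counts)]

# Hypothetical mapping for illustration; with the real model, try model.config.id2label
id2label = {0: "background", 1: "class_a", 2: "class_b"}
toy_pred = np.array([[0, 1], [1, 2]], dtype=np.uint8)
summary = present_classes(toy_pred, id2label)
print(summary)
```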

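The approach above still works for images of other sizes and aspect ratios, but for images that differ substantially from the training size (1024x1024) we recommend the following sliding-window approach for better results: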

import torch 
import torch.nn.functional as F
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import numpy as np
from datasets import load_dataset
device = 'cuda' if torch.cuda.is_available() else 'cpu'

def resize_image(image, target_size=1024):
    """
    Resize a PIL image so that its shorter side equals target_size.
    """
    w_img, h_img = image.size  # PIL reports size as (width, height)
    if w_img < h_img:
        new_w, new_h = target_size, int(h_img * (target_size / w_img))
    else:
        new_w, new_h = int(w_img * (target_size / h_img)), target_size
    resized_img = image.resize((new_w, new_h))
    return resized_img

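def segment_image(image, preprocessor, model, crop_size = (1024, 1024), num_classes = 40, transform=None):
    """
    Finds a stride based on the image size and aspect ratio, creates overlapping
    sliding windows of size 1024x1024, and feeds them to the model.
    """
    h_crop, w_crop = crop_size
    model = model.to(device)  # keep the model on the same device as its inputs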
    
    img = torch.Tensor(np.array(resize_image(image, target_size=1024)).transpose(2, 0, 1)).unsqueeze(0)
    batch_size, _, h_img, w_img = img.size()
    
    if transform:
        img = torch.Tensor(transform(image = img.numpy())["image"]).to(device)    
        
    h_grids = int(np.round(3/2*h_img/h_crop)) if h_img > h_crop else 1
    w_grids = int(np.round(3/2*w_img/w_crop)) if w_img > w_crop else 1
    
    h_stride = int((h_img - h_crop + h_grids -1)/(h_grids -1)) if h_grids > 1 else h_crop
    w_stride = int((w_img - w_crop + w_grids -1)/(w_grids -1)) if w_grids > 1 else w_crop
    
    preds = img.new_zeros((batch_size, num_classes, h_img, w_img))
    count_mat = img.new_zeros((batch_size, 1, h_img, w_img))
    
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y1 = h_idx * h_stride
            x1 = w_idx * w_stride
            y2 = min(y1 + h_crop, h_img)
            x2 = min(x1 + w_crop, w_img)
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            crop_img = img[:, :, y1:y2, x1:x2]
            with torch.no_grad():
                if preprocessor:
                    inputs = preprocessor(crop_img, return_tensors = "pt")
                    inputs["pixel_values"] = inputs["pixel_values"].to(device)
                else:
                    # Without a preprocessor, pass the crop directly as pixel_values
                    inputs = {"pixel_values": crop_img.to(device)}
                outputs = model(**inputs)

            resized_logits = F.interpolate(
                outputs.logits[0].unsqueeze(dim=0), size=crop_img.shape[-2:], mode="bilinear", align_corners=False
            )
            preds += F.pad(resized_logits,
                            (int(x1), int(preds.shape[3] - x2), int(y1),
                            int(preds.shape[2] - y2))).cpu()
            count_mat[:, :, y1:y2, x1:x2] += 1
        
    assert (count_mat == 0).sum() == 0
    preds = preds / count_mat
    preds = preds.argmax(dim=1)
    preds = F.interpolate(preds.unsqueeze(0).type(torch.uint8), size=image.size[::-1], mode='nearest')
    label_pred = preds.squeeze().cpu().numpy()
    return label_pred

# Load an image from the Coralscapes dataset, or load your own image
dataset = load_dataset("EPFL-ECEO/coralscapes") 
image = dataset["test"][42]["image"]

preprocessor = SegformerImageProcessor.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")

label_pred = segment_image(image, preprocessor, model)
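
To make the window placement in `segment_image` easier to follow, the grid and stride arithmetic can be isolated for a single image axis. The sketch below mirrors that arithmetic in plain Python. The helper name `window_spans` is hypothetical, and Python's built-in `round` is used in place of `np.round` (both round halves to the nearest even integer).

```python
def window_spans(img_len, crop_len=1024):
    """Return the (start, end) spans of the sliding windows along one image axis."""
    if img_len <= crop_len:
        return [(0, img_len)]
    # Roughly 1.5 windows per crop length, so neighbouring windows overlap
    grids = int(round(1.5 * img_len / crop_len))
    stride = int((img_len - crop_len + grids - 1) / (grids - 1))
    spans = []
    for idx in range(grids):
        start = idx * stride
        end = min(start + crop_len, img_len)
        start = max(end - crop_len, 0)  # shift the last window back inside the image
        spans.append((start, end))
    return spans

print(window_spans(1536))  # [(0, 1024), (512, 1536)]
print(window_spans(2048))  # [(0, 1024), (513, 1537), (1024, 2048)]
```

Every window keeps the full crop length, and pixels covered by more than one window have their logits averaged through `count_mat` in `segment_image`.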

✨ Key Features

Model type: SegFormer, fine-tuned for semantic segmentation of coral reef images.
Fine-tuned from: the pretrained SegFormer (b2-sized) encoder (nvidia/mit-b2).

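📚 Documentation

Model Details

Model Description

| Property | Details |
|---|---|
| Model type | SegFormer |
| Fine-tuned from | [SegFormer (b2-sized) encoder, pre-trained only (`nvidia/mit-b2`)](https://huggingface.co/nvidia/mit-b2) |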

Model Sources

Repository: [coralscapesScripts](https://github.com/eceo-epfl/coralscapesScripts/)
Demo: [Hugging Face Spaces](https://huggingface.co/spaces/EPFL-ECEO/coralscapes_demo)

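Training and Evaluation Details

Data

The model is trained and evaluated on the [Coralscapes dataset](https://huggingface.co/datasets/EPFL-ECEO/coralscapes), a general-purpose dense semantic segmentation dataset for coral reefs.

Procedure

Training follows the original SegFormer [implementation](https://proceedings.neurips.cc/paper_files/paper/2021/file/64f1f27bf1b4ec22924fd0acb550c235-Paper.pdf), with a batch size of 8 for 265 epochs. The optimizer is AdamW with an initial learning rate of 6e-5 and a weight decay of 1e-2, combined with a polynomial learning rate scheduler with power 1. During training, images are randomly scaled by a factor between 1 and 2, flipped horizontally with probability 0.5, and randomly cropped to 1024×1024 pixels. Input images are normalized using the ImageNet mean and standard deviation. For evaluation, a non-overlapping sliding-window strategy is used with a window size of 1024x1024.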
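
As an illustration of the learning rate schedule described above, a polynomial decay with power 1 reduces to a linear ramp from the initial rate down to a final rate. The sketch below is not the authors' training code. The final rate of 0, the absence of warmup, the per-step update, and the step count in the example are all assumptions, and `poly_lr` is a hypothetical helper name.

```python
def poly_lr(step, total_steps, base_lr=6e-5, power=1.0, end_lr=0.0):
    """Polynomial decay from base_lr to end_lr over total_steps (end_lr is assumed)."""
    progress = min(step, total_steps) / total_steps
    return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr

total = 1000  # hypothetical number of optimizer steps
for step in (0, 500, 1000):
    print(step, poly_lr(step, total))
```

With power 1 the rate at the halfway point is half the initial rate. Larger powers would decay faster early in training.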

Results

Test accuracy: 80.904
Test mean IoU: 54.682
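
For readers who want to compute comparable metrics on their own predictions, pixel accuracy and mean IoU can both be derived from a confusion matrix. The sketch below is a generic formulation, not the evaluation script behind the numbers above. How ignored labels and classes absent from the test set were treated there is not stated, so skipping classes with an empty union is an assumption. The function names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (true class, predicted class) pair."""
    idx = y_true.ravel().astype(np.int64) * num_classes + y_pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy_and_miou(y_true, y_pred, num_classes):
    cm = confusion_matrix(y_true, y_pred, num_classes)
    tp = np.diag(cm)
    accuracy = tp.sum() / cm.sum()
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0  # assumption: skip classes absent from both maps
    miou = (tp[valid] / union[valid]).mean()
    return accuracy, miou

# Toy ground truth and prediction (hypothetical values)
y_true = np.array([[0, 0], [1, 1]])
y_pred = np.array([[0, 1], [1, 1]])
acc, miou = pixel_accuracy_and_miou(y_true, y_pred, num_classes=2)
print(acc, miou)
```

In the toy case three of four pixels match, and the per-class IoUs are 1/2 and 2/3.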

📄 License

This project is licensed under the Apache 2.0 license.

Citation

If you find this project useful, please consider citing:

@misc{sauder2025coralscapesdatasetsemanticscene,
        title={The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs}, 
        author={Jonathan Sauder and Viktor Domazetoski and Guilhem Banc-Prandi and Gabriela Perna and Anders Meibom and Devis Tuia},
        year={2025},
        eprint={2503.20000},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2503.20000}, 
  }