[Semantic Segmentation] ASPP: Rethinking Atrous Convolution for Semantic Image Segmentation

Published: 2023/12/15

Contents

- 1. Main idea
- 2. Implementation
- 3. Code

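1. Main idea

To segment objects of different scales more accurately, the authors apply atrous (dilated) convolutions with different dilation rates, either in cascade or in parallel, so that the network captures multi-scale contextual semantic information.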
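To make the multi-scale point concrete, the sketch below computes how far a 3x3 atrous filter reaches at each dilation rate, using the standard relation k_eff = k + (k - 1)(r - 1). The helper name `effective_kernel_size` is an illustrative assumption; the rates (1, 6, 12, 18) are the `ASPPHead` defaults that appear in the code in section 3.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Hypothetical helper applied to the default ASPPHead rates.
rates = (1, 6, 12, 18)
spans = {r: effective_kernel_size(3, r) for r in rates}
for r, span in spans.items():
    print(f"rate={r:2d} -> a 3x3 kernel spans {span}x{span} pixels")
```

Each branch still has only nine weights, but the area it samples grows with the rate, which is how parallel branches can cover several object scales at once.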

[Figure: Atrous Spatial Pyramid Pooling module]

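The authors open with two challenges:

Current deep convolutional networks extract abstract, high-level semantic information but lose detailed spatial information.
- The paper therefore uses atrous convolution.
Objects appear at many different scales, which makes segmentation harder. The paper lists four families of approaches to this:
- ① Apply a deep convolutional network to every level of an image pyramid to extract features.
- ② Use an encoder-decoder structure: take multi-scale features from the encoder and recover the original spatial resolution in the decoder.
- ③ Add extra modules on top of the original network to capture long-range information.
- ④ Use spatial pyramid pooling with several different rates to capture multi-scale objects in the input feature map.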

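2. Implementation

The problem with how atrous convolution is currently used:

The larger the sampling interval (the dilation rate), the more filter weights become useless: as the interval grows, more of the filter's taps fall outside the feature map and contribute nothing. In the extreme case, the 3x3 convolution behaves like a 1x1 convolution, because only the centre weight still lands on the feature map.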
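This degeneration can be checked by counting, for every position on the feature map, how many of the nine taps of a 3x3 dilated filter land inside the map. The following is a minimal sketch: `valid_tap_histogram` is a hypothetical helper, and the 65x65 map size and the rates are example values chosen for illustration.

```python
def valid_tap_histogram(size: int, rate: int) -> dict:
    """For every centre position on a size x size map, count how many of the
    nine taps of a 3x3 filter with the given dilation rate fall inside the map.
    Returns {number_of_valid_taps: number_of_positions}."""
    # Along one axis the taps sit at offsets -rate, 0 and +rate.
    per_axis = [
        sum(0 <= i + offset < size for offset in (-rate, 0, rate))
        for i in range(size)
    ]
    hist = {}
    for valid_rows in per_axis:
        for valid_cols in per_axis:
            # A tap is valid only if both its row and its column are in range.
            key = valid_rows * valid_cols
            hist[key] = hist.get(key, 0) + 1
    return hist


size = 65  # example feature-map side length
total = size * size
for rate in (1, 12, 36, 65):
    hist = valid_tap_histogram(size, rate)
    print(f"rate={rate:2d}: all 9 taps valid at {hist.get(9, 0) / total:.1%} of positions, "
          f"only the centre tap at {hist.get(1, 0) / total:.1%}")
```

Once the rate reaches the side length of the feature map, every position keeps only its centre tap, which is the 1x1 behaviour described above.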

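To overcome this difficulty, the authors build the ASPP structure shown in the figure below, which uses (a) ASPP, the parallel atrous convolution branches, and (b) image pooling (global average pooling) in parallel.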

3. Code

    import torch
    import torch.nn as nn
    from mmcv.cnn import ConvModule

    from mmseg.ops import resize
    from ..builder import HEADS
    from .decode_head import BaseDecodeHead


    class ASPPModule(nn.ModuleList):
        """Atrous Spatial Pyramid Pooling (ASPP) Module.

        Args:
            dilations (tuple[int]): Dilation rate of each layer.
            in_channels (int): Input channels.
            channels (int): Channels after modules, before conv_seg.
            conv_cfg (dict|None): Config of conv layers.
            norm_cfg (dict|None): Config of norm layers.
            act_cfg (dict): Config of activation layers.
        """

        def __init__(self, dilations, in_channels, channels, conv_cfg, norm_cfg,
                     act_cfg):
            super(ASPPModule, self).__init__()
            self.dilations = dilations  # (1, 12, 24, 36)
            self.in_channels = in_channels  # 2048
            self.channels = channels  # 512
            self.conv_cfg = conv_cfg
            self.norm_cfg = norm_cfg  # BN
            self.act_cfg = act_cfg  # ReLU
            for dilation in dilations:
                # dilation == 1 gives a plain 1x1 conv; every other rate gives
                # a 3x3 atrous conv whose padding equals its dilation.
                self.append(
                    ConvModule(
                        self.in_channels,
                        self.channels,
                        1 if dilation == 1 else 3,
                        dilation=dilation,
                        padding=0 if dilation == 1 else dilation,
                        conv_cfg=self.conv_cfg,
                        norm_cfg=self.norm_cfg,
                        act_cfg=self.act_cfg))

        def forward(self, x):
            """Forward function."""
            aspp_outs = []
            for aspp_module in self:
                aspp_outs.append(aspp_module(x))

            return aspp_outs


    @HEADS.register_module()
    class ASPPHead(BaseDecodeHead):
        """Rethinking Atrous Convolution for Semantic Image Segmentation.

        This head is the implementation of `DeepLabV3
        <https://arxiv.org/abs/1706.05587>`_.

        Args:
            dilations (tuple[int]): Dilation rates for ASPP module.
                Default: (1, 6, 12, 18).
        """

        def __init__(self, dilations=(1, 6, 12, 18), **kwargs):
            super(ASPPHead, self).__init__(**kwargs)
            assert isinstance(dilations, (list, tuple))
            self.dilations = dilations
            # Image-level branch: global average pooling followed by a 1x1 conv.
            self.image_pool = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                ConvModule(
                    self.in_channels,
                    self.channels,
                    1,
                    conv_cfg=self.conv_cfg,
                    norm_cfg=self.norm_cfg,
                    act_cfg=self.act_cfg))
            self.aspp_modules = ASPPModule(
                dilations,
                self.in_channels,
                self.channels,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                act_cfg=self.act_cfg)
            # Fuses the concatenated branches back down to self.channels.
            self.bottleneck = ConvModule(
                (len(dilations) + 1) * self.channels,
                self.channels,
                3,
                padding=1,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                act_cfg=self.act_cfg)

        def forward(self, inputs):
            """Forward function."""
            x = self._transform_inputs(inputs)  # x.shape = [4, 2048, 64, 128]
            # Upsample the pooled 1x1 feature back to the size of x.
            aspp_outs = [
                resize(
                    self.image_pool(x),
                    size=x.size()[2:],
                    mode='bilinear',
                    align_corners=self.align_corners)
            ]
            # len(aspp_outs) = 1
            # aspp_outs[0].shape = [4, 512, 64, 128]
            aspp_outs.extend(self.aspp_modules(x))
            # len(aspp_outs) = 5
            # aspp_outs[0-4].shape = [4, 512, 64, 128]
            aspp_outs = torch.cat(aspp_outs, dim=1)  # [4, 2560, 64, 128]
            output = self.bottleneck(aspp_outs)  # [4, 512, 64, 128]
            output = self.cls_seg(output)  # [4, 19, 64, 128]
            return output
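The shape bookkeeping in the code above can be reproduced without running the model. The sketch below uses the output-size formula documented for `torch.nn.Conv2d`, together with the kernel and padding rule from `ASPPModule`, to show that every branch keeps the 64 x 128 resolution, and then derives the channel count fed to the bottleneck. The helper names `conv_out_size` and `aspp_branch_params` are assumptions for illustration; the numeric values come from the comments in the code.

```python
def conv_out_size(size: int, kernel: int, dilation: int, padding: int,
                  stride: int = 1) -> int:
    """Output length along one axis, following the torch.nn.Conv2d formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def aspp_branch_params(dilation: int) -> tuple:
    """Kernel size and padding that ASPPModule picks for a given dilation."""
    kernel = 1 if dilation == 1 else 3
    padding = 0 if dilation == 1 else dilation
    return kernel, padding


height, width = 64, 128       # spatial size of x in the forward pass
channels = 512                # output channels of each branch
dilations = (1, 12, 24, 36)   # rates noted in the ASPPModule comments

for d in dilations:
    k, p = aspp_branch_params(d)
    out_hw = (conv_out_size(height, k, d, p), conv_out_size(width, k, d, p))
    print(f"dilation={d:2d}: kernel={k}, padding={p}, output HxW={out_hw}")

# One image-pooling branch plus one branch per dilation, concatenated on dim=1.
concat_channels = (len(dilations) + 1) * channels
print("channels fed to the bottleneck:", concat_channels)
```

Setting the padding equal to the dilation is what lets the 3x3 branches be concatenated with the 1x1 branch and the upsampled image-pooling branch without any cropping.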

[Figure: ASPP module]
