[Project 1: xxx Pest and Disease Detection] Part 2. Network Architecture Experiments: ResNet50, SE, CBAM, Feature Fusion

Published: 2023/12/20


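Preface

I'll be job hunting soon, so I want to write up a few small projects I've worked on.

First up is a pest and disease detection project from our lab. The baseline is SSD, and the code is modified from the lufficc/SSD repository. That repo's SSD implementation is really solid, with about 1.3k stars on GitHub.

I picked this codebase for two main reasons:

1. Its backbone code can directly load the official PyTorch pretrained weights, which makes running experiments very convenient.
2. The code is highly modular, similar to mmdetection and Detectron2. It is well engineered and not very beginner-friendly, but working with it really improves your engineering skills.

The original repo mainly implements SSD-VGG16, SSD-Mobilenet-V2, SSD-Mobilenet-V3, and SSD-EfficientNet. On my dataset none of the other variants beat SSD-VGG16, so I ran my own experiments on top of the original repo and added a few tricks. They are not especially fancy, but they do work on my dataset (after a lot of hyperparameter tuning, haha).

Other posts in this series:
[Project 1: xxx Pest and Disease Detection] Part 1. SSD principles and source code analysis.
[Project 1: xxx Pest and Disease Detection] Part 3. Loss function experiments: Focal loss.

This second post covers the changes I made to the network architecture:

Replace VGG16 with ResNet50;

Add the SE and CBAM attention mechanisms to ResNet50;

Finally, add a lighter and more efficient feature fusion layer.

All code is on GitHub: HuKai-cv/FFSSD-ResNet.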

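I. Overview

2.1 Overall network architecture

The figure shows the overall structure of my modified network. The upper-left part is a standard ResNet50 with its classification tail (avg pool, fc, softmax) removed. In the lower left, three feature maps produced by ResNet are fused. The fused feature map is then passed through a series of extra feature layers (one bottleneck each) to obtain multi-scale features, which are fed to the detector; NMS produces the final predictions. The figure also marks Conv5 as removed, but note that in the code in section 2.5, layer4 (conv5_x) is still kept and supplies the third fusion input.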

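2.2 ResNet50

Background / problem addressed: as networks get deeper, vanishing and exploding gradients become likely. The traditional remedy is normalized initialization plus BN. That handles the gradient problem, but deeper networks then run into another issue, network degradation: as depth increases, performance actually gets worse.

Degradation is not overfitting. Overfitting means the test error is large while the training error is small, whereas degradation means both the training error and the test error are large.

To address this, the ResNet paper proposes residual learning. It tackles the degradation problem and, to some extent, also eases vanishing and exploding gradients, which improves network performance.

Residual block: a residual block consists of an identity mapping and a residual branch. The residual branch has two implementations: two stacked 3x3 convs, or a 1x1 conv that reduces the channels, followed by a 3x3 conv and a 1x1 conv that restores them (which effectively reduces the parameter count). The feature maps from the identity mapping and the residual branch are then added element-wise. This "shortcut connection" between earlier and later layers helps gradients propagate backward during training and suppresses degradation.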
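To put a number on "effectively reduces the parameter count", here is a small back-of-the-envelope sketch. The 256-channel width and the 64-channel bottleneck are assumed values chosen for illustration (they match the 4x expansion used by Bottleneck, but are not taken from any measurement in this project), and biases and BN parameters are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution without bias."""
    return k * k * c_in * c_out


channels = 256    # assumed input/output width of the block
bottleneck = 64   # assumed reduced width inside the block (channels // 4)

# Variant 1: two stacked 3x3 convolutions at full width.
two_3x3 = 2 * conv_params(3, channels, channels)

# Variant 2: 1x1 reduce -> 3x3 -> 1x1 restore.
one_three_one = (conv_params(1, channels, bottleneck)
                 + conv_params(3, bottleneck, bottleneck)
                 + conv_params(1, bottleneck, channels))

print(two_3x3)        # 1179648
print(one_three_one)  # 69632
```

At this width the bottleneck variant needs roughly 17x fewer weights, which is why the deeper ResNets use it.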

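The figure below is from the Bilibili uploader 霹靂吧啦Wz: 6.2 使用pytorch搭建ResNet并基于遷移學習訓練 (building ResNet with PyTorch and training it with transfer learning).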

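2.2.1 BasicBlock

This block is used only in resnet18 and resnet34. It comes in two forms: the solid-line residual block and the dashed-line residual block. The dashed-line block adds a 1x1 conv on the identity branch because, in resnet34, the input and output of conv3_1, conv4_1, and conv5_1 differ in channel count (and in spatial size, since these blocks downsample), so they cannot be added element-wise directly. The 1x1 conv on the shortcut adjusts the input to match the output.

The dashed-line block is used for conv3_1, conv4_1, and conv5_1 of resnet34; the solid-line block is used for conv2_x, conv3_2-4, conv4_2-6, and conv5_2-3.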

The code, from ssd/modeling/backbone/resnet_input_512.py:

import torch
import torch.nn as nn


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    # used by resnet18 and resnet34
    expansion = 1  # channel multiplier inside the block; 1 = output channels equal planes

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, reduction=16, se=False, cbam=False):
        """
        Args:
            inplanes: input channels of the block
            planes: output channels of the block
            stride: stride of the first conv; 1 = solid-line block (no downsampling),
                    2 = dashed-line block (downsampling)
            downsample: shortcut projection of the dashed-line block (conv1x1 s=2 + bn)
            groups: number of conv groups; 1 = regular conv (BasicBlock only supports 1)
            base_width: BasicBlock only supports 64
            dilation: dilated conv, not supported by BasicBlock
            norm_layer: normalization layer, BatchNorm2d by default
            reduction: channel reduction ratio of the attention layers
            se: use SE attention
            cbam: use CBAM attention
        """
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only support groups = 1 and base_width = 64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not support in BasicBlock")
        # Both self.conv1 and self.downsample layer downsample the input when stride != 1

        # whether to use attention
        self.se = se
        self.cbam = cbam

        # conv + bn + relu
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)

        # conv + bn
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)

        # attention layers (SELayer, Channel_Attention, Spatial_Attention are defined in 2.3 and 2.4)
        self.se_layer = SELayer(planes, reduction)
        self.ca_layer = Channel_Attention(planes, reduction)
        self.sa_layer = Spatial_Attention()

        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x  # shortcut branch

        out = self.conv1(x)  # conv + bn + relu
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)  # conv + bn
        out = self.bn2(out)

        if self.se and not self.cbam:  # SE
            out = self.se_layer(out)
        if not self.se and self.cbam:  # CBAM: channel attention, then spatial attention
            out = self.ca_layer(out)
            out = self.sa_layer(out)

        # solid-line block: shortcut used as-is; dashed-line block: shortcut is projected
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity  # add
        out = self.relu(out)  # relu
        return out

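2.2.2 Bottleneck

This block is used only in resnet50 and resnet101. It also comes in two forms: the solid-line residual block and the dashed-line residual block. The dashed-line block adds a 1x1 conv on the identity branch because, in resnet50, the input and output channels of conv2_1, conv3_1, conv4_1, and conv5_1 differ, so they cannot be added element-wise directly. The 1x1 conv on the shortcut adjusts the input to match the output.

The solid-line block on the left is used for conv2_2-3, conv3_2-4, conv4_2-6, and conv5_2-3 of resnet50; the dashed-line block on the right is used for conv2_1, conv3_1, conv4_1, and conv5_1.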
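Which blocks get the dashed-line (projection) shortcut follows from the condition in `_make_layer` of the ResNet class listed in section 2.5: `stride != 1 or self.inplanes != planes * block.expansion`. The sketch below replays only that condition in plain Python; `shortcut_plan` is a hypothetical helper written for this illustration and is not part of the repository.

```python
def shortcut_plan(expansion, blocks, first_inplanes=64):
    """Per stage, list which blocks use a 'dashed' (projected) or 'solid' shortcut."""
    plan = {}
    inplanes = first_inplanes
    stage_planes = [64, 128, 256, 512]   # conv2_x .. conv5_x
    stage_strides = [1, 2, 2, 2]
    for idx, (planes, stride, n) in enumerate(zip(stage_planes, stage_strides, blocks)):
        name = "conv{}_x".format(idx + 2)
        # Same test as in _make_layer.
        needs_projection = stride != 1 or inplanes != planes * expansion
        # Only the first block of a stage can change stride or channel count.
        plan[name] = ["dashed" if (i == 0 and needs_projection) else "solid"
                      for i in range(n)]
        inplanes = planes * expansion
    return plan


resnet34 = shortcut_plan(expansion=1, blocks=[3, 4, 6, 3])
resnet50 = shortcut_plan(expansion=4, blocks=[3, 4, 6, 3])

print(resnet34["conv2_x"])  # ['solid', 'solid', 'solid']
print(resnet50["conv2_x"])  # ['dashed', 'solid', 'solid']
print(resnet50["conv3_x"])  # ['dashed', 'solid', 'solid', 'solid']
```

The difference in conv2_x comes from the expansion factor: with Bottleneck the first block must go from 64 to 256 channels even though the stride is 1, so conv2_1 needs a projection in resnet50 but not in resnet34.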

The code, from ssd/modeling/backbone/resnet_input_512.py:

class Bottleneck(nn.Module):
    # used by resnet50 and resnet101
    expansion = 4  # the third conv outputs 4x the channels of the first and second convs

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, reduction=16, se=False, cbam=False):
        """
        Args:
            inplanes: input channels of the block
            planes: base channels of the block (the output has planes * expansion channels)
            stride: stride of the second conv; 1 = no downsampling, 2 = downsampling
            downsample: shortcut projection of the dashed-line block (conv1x1 + bn)
            groups: number of conv groups; 1 = regular conv
            base_width: 64 by default
            dilation: dilation of the 3x3 conv
            norm_layer: normalization layer, BatchNorm2d by default
            reduction: channel reduction ratio of the attention layers, 16 by default
            se: use SE attention, False by default
            cbam: use CBAM attention, False by default
        """
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups  # with the defaults, width == planes

        # whether to use attention
        self.se = se
        self.cbam = cbam

        self.conv1 = conv1x1(inplanes, width)
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)

        # attention layers
        self.se_layer = SELayer(planes * self.expansion, reduction)
        self.ca_layer = Channel_Attention(planes * self.expansion, reduction)
        self.sa_layer = Spatial_Attention()

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x  # shortcut branch

        # conv + bn + relu
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        # conv + bn + relu
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        # conv + bn
        out = self.conv3(out)
        out = self.bn3(out)

        if self.se and not self.cbam:  # SE
            out = self.se_layer(out)
        if not self.se and self.cbam:  # CBAM: channel attention, then spatial attention
            out = self.ca_layer(out)
            out = self.sa_layer(out)

        if self.downsample is not None:  # None = solid-line block, set = dashed-line block
            identity = self.downsample(x)

        out += identity  # add
        out = self.relu(out)  # relu
        return out

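2.3 SE

SE is a classic channel attention mechanism. It consists of three steps: Squeeze, Excitation, and Scale.

Squeeze: apply global average pooling to each channel, giving the mean of all elements in that channel.

Excitation: two fully connected layers (the first followed by ReLU, the second by sigmoid) reduce and then restore the dimensionality. The per-channel weights are learned automatically from the loss (this is the core of SE), so that informative features receive larger weights.

Scale: multiply every element of each channel by the weight that Excitation computed for that channel.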
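The three steps can be traced by hand on a tiny input. The sketch below uses plain Python lists instead of tensors so every intermediate value is visible; the function name `se_forward` and the weights `w1` and `w2` are made up for illustration and are not trained values or repository code.

```python
import math


def se_forward(x, w1, w2):
    """x: list of C channels, each H x W. w1: (C/r) x C weights, w2: C x (C/r) weights."""
    # Squeeze: global average pooling, one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC + ReLU (reduce), then FC + sigmoid (restore).
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Scale: multiply every element of channel c by its weight s[c].
    y = [[[v * s[c] for v in row] for row in ch] for c, ch in enumerate(x)]
    return z, s, y


# Two channels of size 2x2.
x = [[[1.0, 3.0], [5.0, 7.0]],
     [[0.0, 0.0], [2.0, 2.0]]]
w1 = [[0.5, -0.5]]      # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channels

z, s, y = se_forward(x, w1, w2)
print(z)  # [4.0, 1.0]
print(s)  # two weights in (0, 1); the first channel gets the larger one
```

With these weights the first channel is kept at about 82% of its original magnitude and the second at about 18%, which is exactly the "re-weight channels by importance" behaviour described above.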

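A few points to note:
1. Squeeze uses global average pooling in order to keep the overall semantic information, at the cost of losing some salient details. A convolution would need too many parameters, and max pooling would lose too much semantic information.
2. Why does Excitation use two fully connected layers rather than one? (1) It adds non-linearity; (2) it reduces the parameter count, since the two layers can reduce and then restore the dimensionality; (3) it fits the correlations between channels better.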
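The parameter saving in point 2 is easy to check with arithmetic. The channel count of 256 below is an assumed example value; the reduction ratio of 16 is the default of SELayer, and both layers are counted without bias, as in SELayer.

```python
channels = 256   # assumed channel count of the feature map
reduction = 16   # reduction ratio (the SELayer default)

one_fc = channels * channels                      # a single C -> C layer
two_fc = 2 * channels * (channels // reduction)   # C -> C/r followed by C/r -> C

print(one_fc)  # 65536
print(two_fc)  # 8192
```

So the two-layer design costs 2/r of a single full-width layer, here one eighth.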

The code, from ssd/modeling/backbone/resnet_input_512.py:

class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)  # squeeze
        y = self.fc(y).view(b, c, 1, 1)  # excitation
        return x * y.expand_as(x)  # scale

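2.4 CBAM

CBAM is a classic channel + spatial attention mechanism. The module consists of a channel attention module (CAM) and a spatial attention module (SAM).

CAM is similar to SE and still has the Squeeze, Excitation, and Scale steps. The differences are that Squeeze uses global average pooling and global max pooling in parallel, and that Excitation replaces the fully connected layers with 1x1 conv layers: the first conv reduces the channels by 16x (ReLU), the second restores them, and the sigmoid is applied to the sum of the two branches.

SAM first applies average pooling and max pooling across the channel dimension at every pixel, giving two HxWx1 feature maps, and concatenates them into one HxWx2 feature map. A 7x7 or 3x3 conv then reduces this to HxWx1, a sigmoid turns it into the spatial attention map, and the map is multiplied element-wise with the input feature map.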
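The per-pixel pooling in SAM runs across channels, not across spatial positions, which is easy to mix up with the pooling in CAM. A minimal plain-Python sketch of just that pooling step follows; `channel_pool` is a hypothetical helper for illustration, and the conv and sigmoid that come afterwards are left out.

```python
def channel_pool(x):
    """x: list of C channels, each H x W. Returns (avg_map, max_map), both H x W."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    avg_map = [[sum(x[k][i][j] for k in range(c)) / c for j in range(w)]
               for i in range(h)]
    max_map = [[max(x[k][i][j] for k in range(c)) for j in range(w)]
               for i in range(h)]
    return avg_map, max_map


# Two channels of size 2x2.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 0.0], [1.0, 8.0]]]

avg_map, max_map = channel_pool(x)
stacked = [avg_map, max_map]   # "concat" on the channel axis: 2 x H x W

print(avg_map)  # [[3.0, 1.0], [2.0, 6.0]]
print(max_map)  # [[5.0, 2.0], [3.0, 8.0]]
```

Whatever the number of input channels, the result always has two channels, which is why the conv in Spatial_Attention takes 2 input channels.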

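A few points to note:
1. The extra global max pooling in CAM complements the average pooling and puts more focus on salient points. Running the two in parallel loses less information than running them in series.
2. Channel attention is about which channels matter more; spatial attention is about which pixels matter more.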

The code, from ssd/modeling/backbone/resnet_input_512.py:

class Channel_Attention(nn.Module):  # CAM
    def __init__(self, channel, r=16):
        super(Channel_Attention, self).__init__()
        self._avg_pool = nn.AdaptiveAvgPool2d(1)
        self._max_pool = nn.AdaptiveMaxPool2d(1)
        # shared two-layer MLP implemented with 1x1 convs
        self._fc = nn.Sequential(
            nn.Conv2d(channel, channel // r, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channel // r, channel, 1, bias=False)
        )
        self._sigmoid = nn.Sigmoid()

    def forward(self, x):
        y1 = self._avg_pool(x)  # global avg pooling
        y1 = self._fc(y1)
        y2 = self._max_pool(x)  # global max pooling
        y2 = self._fc(y2)
        y = self._sigmoid(y1 + y2)  # add, then sigmoid
        return x * y  # scale


class Spatial_Attention(nn.Module):  # SAM
    def __init__(self, kernel_size=3):
        super(Spatial_Attention, self).__init__()
        assert kernel_size % 2 == 1, "kernel_size = {}".format(kernel_size)
        padding = (kernel_size - 1) // 2
        self._layer = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=kernel_size, padding=padding),
            nn.Sigmoid()
        )

    def forward(self, x):
        avg_mask = torch.mean(x, dim=1, keepdim=True)  # avg pool across channels at every pixel
        max_mask, _ = torch.max(x, dim=1, keepdim=True)  # max pool across channels at every pixel
        mask = torch.cat([avg_mask, max_mask], dim=1)  # concat
        mask = self._layer(mask)  # conv + sigmoid
        return x * mask  # scale

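2.5 Feature Fusion

This feature fusion scheme comes from the paper FSSD: Feature Fusion Single Shot Multibox Detector.

The figure below is from FSSD解讀.

(a) Image pyramid

(b) R-CNN family: predicts only on the last feature layer

(c) FPN: semantic information is passed back level by level, with many addition operations

(d) SSD: predicts directly on the features of each level, with no connection between levels

(e) The FSSD approach: concat the features of all levels, then generate the feature pyramid from the fused feature

More details are shown in the figure below:

The three feature maps are concatenated, the avg pool, fc, and softmax at the end of ResNet50 are dropped, and seven extra layers are attached directly after the fusion layer to generate multi-scale feature maps, which are then used for object detection. In the code below, the three fusion inputs are the outputs of layer2, layer3, and layer4.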
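To see how the fusion lines up spatially, the sketch below traces only shapes through the feature fusion branch of the ResNet class listed in this section. It assumes a 512x512 input (suggested by the file name resnet_input_512.py), the Bottleneck expansion of 4, and the resnet-ff extras list from the docstring; `conv_out` is a hypothetical helper using the standard output-size formula.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 512                        # assumed input resolution
size = conv_out(size, 7, 2, 3)    # conv1   -> 256
size = conv_out(size, 3, 2, 1)    # maxpool -> 128 (layer1 keeps 128)
s2 = conv_out(size, 3, 2, 1)      # layer2 (conv3_x) -> 64
s3 = conv_out(s2, 3, 2, 1)        # layer3 (conv4_x) -> 32
s4 = conv_out(s3, 3, 2, 1)        # layer4 (conv5_x) -> 16

# conv2/conv3/conv4 map each input to 256 channels; bi1/bi2 upsample by 2x/4x.
fused_inputs = [(256, s2), (256, s3 * 2), (256, s4 * 4)]
fused_channels = sum(c for c, _ in fused_inputs)   # 768, reduced to 512 by conv5

extras = [128, 256, 512, 256, 128, 64, 128]        # resnet-ff setting
strides = [1, 2, 2, 2, 2, 2, 2]                    # as in _add_extras_ff
expansion = 4                                      # Bottleneck

maps, size = [], s2
for planes, stride in zip(extras, strides):
    size = conv_out(size, 3, stride, 1)            # the 3x3 conv carries the stride
    maps.append((planes * expansion, size))

print(fused_inputs)  # [(256, 64), (256, 64), (256, 64)]
print(maps)
```

Under these assumptions the seven prediction maps have sizes 64, 32, 16, 8, 4, 2, and 1, so the 2x and 4x upsampling factors are exactly what is needed to bring all three fusion inputs to 64x64 before the concat.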

The code, from ssd/modeling/backbone/resnet_input_512.py:

class ResNet(nn.Module):
    def __init__(self, block=None, blocks=None, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None, extras=None, se=False, cbam=False, ff=False):
        """
        Args:
            block: BasicBlock for res18/34, Bottleneck for res50/101
            blocks: e.g. [3, 4, 6, 3], number of stacked blocks in conv2_x, conv3_x, conv4_x, conv5_x
            zero_init_residual: zero-initialize the last BN of each residual branch
            groups: number of conv groups
            width_per_group: base width per group
            replace_stride_with_dilation: per-stage flags to replace stride with dilation
            norm_layer: normalization layer, BatchNorm2d by default
            extras: output channels of the extra layers
                    resnet:    [512, 256, 128, 64, 128]
                    resnet-ff: [128, 256, 512, 256, 128, 64, 128]
            se: use SE
            cbam: use CBAM
            ff: use the feature fusion structure
        """
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer  # bn
        self.inplanes = 64  # input channels of the first block after max pool
        self.dilation = 1
        self.blocks = blocks  # [3, 4, 6, 3]
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.se = se  # Squeeze-and-Excitation Module
        self.cbam = cbam  # Convolutional Block Attention Module
        self.ff = ff  # Feature Fusion Module
        self.groups = groups
        self.base_width = width_per_group

        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = self._norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(block, 64, self.blocks[0])  # conv2_x
        self.layer2 = self._make_layer(block, 128, self.blocks[1], stride=2,  # conv3_x
                                       dilate=replace_stride_with_dilation[0])
        self.conv2 = nn.Conv2d(512, 256, 1)
        self.layer3 = self._make_layer(block, 256, self.blocks[2], stride=2,  # conv4_x
                                       dilate=replace_stride_with_dilation[1])
        self.conv3 = nn.Conv2d(1024, 256, 1)
        self.bi1 = nn.UpsamplingBilinear2d(scale_factor=2)  # 2x upsampling
        self.layer4 = self._make_layer(block, 512, self.blocks[3], stride=2,  # conv5_x
                                       dilate=replace_stride_with_dilation[2])
        self.conv4 = nn.Conv2d(2048, 256, 1)
        self.bi2 = nn.UpsamplingBilinear2d(scale_factor=4)  # 4x upsampling
        self.conv5 = nn.Conv2d(768, 512, 1)  # applied to the fused (concatenated) feature map
        self.bn2 = nn.BatchNorm2d(512)

        if self.ff:
            self.extra_layers_ff = nn.Sequential(*self._add_extras_ff(block, extras))
        else:
            self.extra_layers = nn.Sequential(*self._add_extras(block, extras))

        for m in self.modules():  # init
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        # the first block needs a projected shortcut if it changes stride or channel count
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer,
                            se=self.se, cbam=self.cbam))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer, se=self.se, cbam=self.cbam))

        return nn.Sequential(*layers)

    def _add_extras(self, block, extras):
        # `layers += nn.Sequential` extends the list with the individual blocks,
        # so this yields 3 x 2 blocks + 1 conv = 7 entries
        layers = []
        layers += self._make_layer(block, extras[1], 2, stride=2)
        layers += self._make_layer(block, extras[2], 2, stride=2)
        layers += self._make_layer(block, extras[3], 2, stride=2)
        in_channels = extras[3] * block.expansion
        layers += [nn.Conv2d(in_channels, extras[4] * block.expansion, kernel_size=2)]
        return layers

    def _add_extras_ff(self, block, extras):
        self.inplanes = 512  # output channels of conv5 on the fused feature map
        layers = []
        layers += self._make_layer(block, extras[0], 1)
        layers += self._make_layer(block, extras[1], 1, stride=2)
        layers += self._make_layer(block, extras[2], 1, stride=2)
        layers += self._make_layer(block, extras[3], 1, stride=2)
        layers += self._make_layer(block, extras[4], 1, stride=2)
        layers += self._make_layer(block, extras[5], 1, stride=2)
        layers += self._make_layer(block, extras[6], 1, stride=2)
        return layers

    def forward(self, x):
        if not self.ff:  # without feature fusion
            features = []  # holds the 7 prediction feature maps
            x = self.conv1(x)  # conv1 + bn + relu
            x = self.bn1(x)
            x = self.relu(x)
            x = self.maxpool(x)  # max pool

            x = self.layer1(x)  # layer1
            x = self.layer2(x)  # layer2
            features.append(x)  # prediction feature map 1
            x = self.layer3(x)  # layer3
            features.append(x)  # prediction feature map 2
            x = self.layer4(x)  # layer4
            features.append(x)  # prediction feature map 3

            x = self.extra_layers[0](x)
            x = self.extra_layers[1](x)
            features.append(x)  # prediction feature map 4
            x = self.extra_layers[2](x)
            x = self.extra_layers[3](x)
            features.append(x)  # prediction feature map 5
            x = self.extra_layers[4](x)
            x = self.extra_layers[5](x)
            features.append(x)  # prediction feature map 6
            x = self.extra_layers[6](x)
            features.append(x)  # prediction feature map 7

            return tuple(features)
        else:  # with feature fusion
            features = []  # holds the 3 fusion inputs
            x = self.conv1(x)  # conv1 + bn + relu
            x = self.bn1(x)
            x = self.relu(x)
            x = self.maxpool(x)  # max pool

            x = self.layer1(x)  # layer1
            x = self.layer2(x)  # layer2
            features.append(self.conv2(x))  # fusion input 1
            x = self.layer3(x)  # layer3
            features.append(self.bi1(self.conv3(x)))  # fusion input 2, 2x upsampling
            x = self.layer4(x)  # layer4
            features.append(self.bi2(self.conv4(x)))  # fusion input 3, 4x upsampling

            x = torch.cat(features, 1)  # feature fusion (concat on the channel axis)
            x = self.conv5(x)  # conv on the fused feature map
            x = self.bn2(x)

            feature_map = []  # holds the 7 prediction feature maps
            x = self.extra_layers_ff[0](x)  # 1
            feature_map.append(x)
            x = self.extra_layers_ff[1](x)  # 2
            feature_map.append(x)
            x = self.extra_layers_ff[2](x)  # 3
            feature_map.append(x)
            x = self.extra_layers_ff[3](x)  # 4
            feature_map.append(x)
            x = self.extra_layers_ff[4](x)  # 5
            feature_map.append(x)
            x = self.extra_layers_ff[5](x)  # 6
            feature_map.append(x)
            x = self.extra_layers_ff[6](x)  # 7
            feature_map.append(x)

            return tuple(feature_map)

Reference

FSSD解讀 (a walkthrough of the FSSD paper).
