【调参实战】 How do BN and Dropout affect small models? What are the drawbacks of global pooling compared with fully connected layers? ...

Published: 2025/3/20

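Hello everyone, and welcome to the column 《調參實戰》 (Hyperparameter Tuning in Practice). Automated hyperparameter tuning is an increasingly popular research topic, but in practice it only swaps in a different set of parameters to tune. Understanding and tuning parameters remains the most basic skill in machine learning work, and in this column we will guide you step by step through understanding and learning how to do it.

This installment covers a comparative tuning experiment with the BN and Dropout layers in an image classification project, as well as a comparison of fully connected and pooling layers.

Author & Editor | 言有三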

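Resources and results for this article

Length: 3,000 characters

Prerequisites: ability to use Python and any open-source deep learning framework

Included materials: Caffe code and one dataset

Also published on: 有三AI知識星球 (within one week)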

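1 Project Background and Preparation

In convolutional neural network design, the Dropout layer, which appeared early on, reduces the risk of overfitting and improves generalization. With the arrival of the Batch Normalization layer, Dropout was gradually replaced: Batch Normalization not only speeds up training but also mitigates the model's overfitting risk to some extent.

Similarly, fully connected layers and global pooling layers are another pair of rivals. In the earliest classification networks the final layers were always fully connected, but because of their huge parameter count they were later replaced by global pooling layers. Does that replacement always bring a positive result? Are there any side effects?

In this installment we put these questions to the test. The project requires the following environment:

(1) A Linux system; ubuntu16.04 or ubuntu18.04 is recommended. Windows also works, but Linux is more efficient.

(2) Preferably a GPU with at least 6 GB of memory; without one, training on the CPU is slow.

(3) An installed copy of the open-source Caffe framework.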
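To make the parameter-count argument concrete, here is a minimal Python sketch that compares a classifier applied directly to a flattened feature map with one applied after global average pooling. The 7x7x256 feature map and the 20 classes are assumptions taken from the network configuration in section 2.1; the function names are hypothetical helpers, not part of any framework.

```python
def fc_params(height, width, channels, num_classes):
    """Weights plus biases of a fully connected layer on the flattened feature map."""
    return height * width * channels * num_classes + num_classes


def gap_fc_params(channels, num_classes):
    """Weights plus biases of a classifier that follows global average pooling.

    Global pooling itself has no learnable parameters; it only reduces each
    channel to a single value, so the classifier sees `channels` inputs.
    """
    return channels * num_classes + num_classes


# Assumed shapes: 7x7 spatial size, 256 channels, 20 output classes.
direct = fc_params(7, 7, 256, 20)
pooled = gap_fc_params(256, 20)
print("fully connected on 7x7x256:", direct)
print("global pooling + classifier:", pooled)
print("ratio: %.1fx" % (direct / pooled))
```

Under these assumptions the pooled classifier is roughly 49 times smaller, which is the saving that motivated the switch; whether it also costs accuracy is the question the experiments address.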

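2 Dropout and BN Layers in Practice

We first experiment with the Dropout and BN layers. If you are not familiar with them, please see these earlier articles:

【AI初識境】 Normalization in deep learning models: how much do you understand?

【AI初識境】 Pooling, shunned by Hinton, DeepMind, and Stanford: what exactly is it?

The dataset and baseline model are the same as in the previous installments. If you are not familiar with them, see the earlier articles:

【調參實戰】 How to start your first deep learning tuning task? Try the learning rate in image classification.

【調參實戰】 How do the various optimization methods actually perform, and how should their parameters be chosen?

2.1 Dropout Layer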

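First we add a Dropout layer to the baseline model. Dropout is usually placed near the end of a network, so we add it after the conv5 layer. The complete network configuration is as follows: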

# Training data: resize to 256x256, random 224 crop, random mirroring.
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104.0
    mean_value: 117.0
    mean_value: 124.0
  }
  image_data_param {
    source: "list_train_shuffle.txt"
    batch_size: 64
    shuffle: true
    new_height: 256
    new_width: 256
  }
}

# Validation data: resize to 224x224, no mirroring, no shuffling.
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_value: 104.0
    mean_value: 117.0
    mean_value: 124.0
  }
  image_data_param {
    source: "list_val_shuffle.txt"
    batch_size: 64
    shuffle: false
    new_height: 224
    new_width: 224
  }
}

layer {
  bottom: "data"
  top: "conv1"
  name: "conv1"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: "ReLU"
}

layer {
  bottom: "conv1"
  top: "conv2"
  name: "conv2"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: "ReLU"
}

layer {
  bottom: "conv2"
  top: "conv3"
  name: "conv3"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  bottom: "conv3"
  top: "conv3"
  name: "relu3"
  type: "ReLU"
}

layer {
  bottom: "conv3"
  top: "conv4"
  name: "conv4"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    stride: 2
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  bottom: "conv4"
  top: "conv4"
  name: "relu4"
  type: "ReLU"
}

layer {
  bottom: "conv4"
  top: "conv5"
  name: "conv5"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    stride: 2
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  bottom: "conv5"
  top: "conv5"
  name: "relu5"
  type: "ReLU"
}

# Dropout applied in place to the conv5 activations (after relu5).
layer {
  name: "drop"
  type: "Dropout"
  bottom: "conv5"
  top: "conv5"
  dropout_param {
    dropout_ratio: 0.5
  }
}

layer {
  bottom: "conv5"
  top: "pool5"
  name: "pool5"
  type: "Pooling"
  pooling_param {
    kernel_size: 7
    stride: 1
    pool: AVE
  }
}

layer {
  bottom: "pool5"
  top: "fc"
  name: "fc"
  type: "InnerProduct"
  inner_product_param {
    num_output: 20
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  name: "accuracy_at_1"
  type: "Accuracy"
  bottom: "fc"
  bottom: "label"
  top: "accuracy_at_1"
  accuracy_param {
    top_k: 1
  }
}

layer {
  name: "accuracy_at_5"
  type: "Accuracy"
  bottom: "fc"
  bottom: "label"
  top: "accuracy_at_5"
  accuracy_param {
    top_k: 5
  }
}

layer {
  bottom: "fc"
  bottom: "label"
  top: "loss"
  name: "loss"
  type: "SoftmaxWithLoss"
}
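The configuration above implies a specific chain of feature-map sizes, and the 7x7 average pooling only acts as global pooling if conv5 really produces a 7x7 map. The following Python sketch checks that. It assumes Caffe's usual output-size rules (round down for convolution, round up for pooling) and a 224x224 crop; the function names are hypothetical helpers, not Caffe APIs.

```python
import math


def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride, pad=0):
    """Spatial output size of a pooling layer (rounds up)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


def trace(input_size=224):
    """Sizes after the crop, conv1..conv5 (3x3, stride 2, pad 1), then pool5 (7x7, stride 1)."""
    sizes = [input_size]
    for _ in range(5):
        sizes.append(conv_out(sizes[-1], kernel=3, stride=2, pad=1))
    sizes.append(pool_out(sizes[-1], kernel=7, stride=1))
    return sizes


names = ["data", "conv1", "conv2", "conv3", "conv4", "conv5", "pool5"]
for name, size in zip(names, trace()):
    print("%-6s %3d x %-3d" % (name, size, size))
```

With a 224 crop the sizes halve at every convolution down to 7x7 at conv5, so pool5 collapses each of the 256 channels to a single value. A different crop size would change the conv5 size, and the fixed 7x7 kernel would then no longer cover the whole map.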

We tried two different ratios, Dropout=0.5 and Dropout=0.9. The solver configuration is as follows:

net: "allconv6.prototxt"
test_interval: 100
test_iter: 15
base_lr: 0.01
lr_policy: "step"
stepsize: 10000
gamma: 0.1
momentum: 0.9
weight_decay: 0.005
display: 100
max_iter: 100000
snapshot: 10000
snapshot_prefix: "models/allconv6_"
solver_mode: GPU
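To see what the "step" policy in this solver does over the course of training, here is a small Python sketch of the schedule. It assumes the usual definition of Caffe's step policy, lr = base_lr * gamma^floor(iter / stepsize); the function name is a hypothetical helper.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=10000):
    """Learning rate at a given iteration under a step decay schedule."""
    return base_lr * gamma ** (iteration // stepsize)


# Print the rate at the start of each 10000-iteration stage up to max_iter.
for it in range(0, 100000, 10000):
    print("iter %6d  lr = %.0e" % (it, step_lr(it)))
```

With these settings the rate drops by a factor of ten every 10000 iterations, so it is already several orders of magnitude below base_lr well before max_iter is reached; most of the learning therefore happens in the first few stages.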

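Comparing the results with the baseline model, we can see that after adding Dropout the model is noticeably more stable, but its performance drops slightly. This is because the baseline model is already small, and Dropout reduces the model's capacity. Performance is about the same with Dropout=0.5 and Dropout=0.9, both of which are commonly used settings. A smaller ratio is expected to reduce performance further; readers are welcome to try it.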
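For readers who want to see the mechanism behind the capacity argument, here is a minimal pure-Python sketch of inverted dropout. It assumes that the ratio is the probability of zeroing an activation, which is how Caffe's Dropout layer interprets dropout_ratio, and that surviving activations are scaled by 1 / (1 - ratio) during training. The function and its signature are hypothetical, not the Caffe implementation.

```python
import random


def dropout(values, ratio, train=True, rng=random):
    """Inverted dropout over a flat list of activations.

    ratio is the probability of dropping (zeroing) each activation.
    At test time the input passes through unchanged.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    if not train or ratio == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - ratio)  # keeps the expected activation unchanged
    return [v * scale if rng.random() >= ratio else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 10
print("train:", dropout(activations, 0.5, rng=rng))
print("test: ", dropout(activations, 0.5, train=False))
```

Because each training step only uses a random subset of the conv5 activations, the network effectively trains with fewer active units per step, which is one way to read the slight drop in accuracy on an already small model.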

2.2 BN Layer
