
ResNet Explained in Detail, with a Keras Implementation

Published: 2025/3/15

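This article, collected and organized by 生活随笔, explains ResNet in detail and implements it in Keras. We found it quite good and are sharing it here as a reference.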

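- Overview of ResNet
  - The Pascal_VOC dataset
    - First-level directory
    - Second-level directory
    - Third-level directory
  - The degradation problem
  - Residual Learning
  - Identity vs Projection Shortcuts
  - Bottleneck architecture
  - ResNet architecture table
  - Results from the ResNet paper
    - Strategy used to build ResNet
    - Overall flow of the code
  - Experimental results
  - Analysis of the experimental results
  - References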

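This post explains the classic ResNet architecture in detail and implements it in code. If you spot shortcomings or have a different view, please leave a comment below.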

Overview of ResNet

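To address the degradation problem that appears when training very deep networks, Kaiming He and colleagues proposed the ResNet architecture. Thanks to residual learning, the depth of networks that can be trained effectively increased dramatically.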

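With shortcut connections, ResNet no longer asks a stack of layers to approximate the desired underlying mapping H(x) directly. Instead, the stack approximates the residual F(x) = H(x) - x, and the block outputs F(x) + x. The authors argue that both formulations can represent the same functions but differ in how hard they are to optimize, and they hypothesize that F(x) is much easier to optimize than H(x): if an identity mapping were optimal, for example, it is easier to push the residual toward zero than to fit an identity mapping with a stack of nonlinear layers. The idea draws on residual representations from image processing, such as residual vector encoding, and on reformulations (such as multigrid methods) that decompose a problem into residual subproblems between multiple scales, which makes optimization much easier.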

For deeper networks (50 layers or more), ResNet introduces the bottleneck block, which reduces the computational cost.

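ResNet uses two kinds of shortcuts: identity shortcuts and projection shortcuts. An identity shortcut adds the input directly and has no parameters; when the dimension increases, it pads the extra channels with zeros so the shapes still match. A projection shortcut instead matches the change in dimension with a linear projection $W_s$:

$$y = F(x, \{W_i\}) + W_s x$$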
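To make the two shortcut types concrete, here is a minimal NumPy sketch (not the paper's or Keras's code) of how each one maps a feature map of shape (56, 56, 64) to (28, 28, 128) when a stage downsamples. It assumes channels-last tensors, as in the Keras code, implements the projection $W_s$ as a strided 1x1 convolution (a per-pixel matrix multiply), and uses random weights for illustration; the function names are hypothetical.

```python
import numpy as np

def zero_pad_shortcut(x, out_channels, stride=2):
    """Identity shortcut for a dimension increase: subsample, then zero-pad channels.
    x has shape (H, W, C_in); no parameters are involved."""
    x = x[::stride, ::stride, :]                      # stride-2 subsampling
    pad = out_channels - x.shape[-1]
    return np.pad(x, ((0, 0), (0, 0), (0, pad)))      # new channels are all zeros

def projection_shortcut(x, w_s, stride=2):
    """Projection shortcut W_s x as a strided 1x1 convolution.
    w_s has shape (C_in, C_out); a 1x1 conv is a matrix multiply at every pixel."""
    x = x[::stride, ::stride, :]
    return x @ w_s                                    # (H', W', C_in) @ (C_in, C_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 64))
padded = zero_pad_shortcut(x, 128)
projected = projection_shortcut(x, rng.standard_normal((64, 128)) * 0.1)
print(padded.shape, projected.shape)                  # (28, 28, 128) (28, 28, 128)
```

The zero-padded version keeps the original 64 channels untouched and adds 64 channels of zeros, so it adds no weights; the projection learns all 128 output channels at the cost of 64 x 128 extra weights.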

ResNet generalizes well across image tasks: in 2015 it took first place in ImageNet classification, ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

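In this post we explain the structure of ResNet in detail and implement it in code. We also bring in another paper, <>, to understand ResNet more deeply. We train, validate, and test the network on the VOC2012 dataset, and for fast development we use Keras as the framework.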

The Pascal_VOC dataset

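Pascal VOC provides a standardized, high-quality set of datasets for image recognition, detection, and segmentation, and ran an annual recognition challenge until 2012. Below is the download address for the VOC2012 training set (which includes the validation set).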

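VOC2012 contains images of 20 object classes, about 17,000 images in total. I split the data into three parts, a training set, a validation set, and a test set, in an 8:1:1 ratio.
Some screenshots are shown below: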

First-level directory

Second-level directory

Third-level directory
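The post does not show how the 8:1:1 split was produced. As a hedged sketch, assuming each class's images are split separately so that every class keeps the same ratio, a hypothetical helper `split_8_1_1` could look like this. The three resulting lists would then be copied into the training, validation, and test roots, each with one subfolder per class, which is the layout `flow_from_directory` expects.

```python
import random

def split_8_1_1(filenames, seed=0):
    """Hypothetical helper: shuffle one class's images and split them 8:1:1.

    Returns (train, validation, test) lists; any remainder from integer
    rounding goes to the test split.
    """
    items = sorted(filenames)                 # fixed starting order for reproducibility
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

files = [f"img_{i:04d}.jpg" for i in range(100)]   # stand-in file names for one class
train, val, test = split_8_1_1(files)
print(len(train), len(val), len(test))               # 80 10 10
```

Using integer arithmetic for the split sizes avoids floating-point rounding surprises, and a seeded `random.Random` makes the split reproducible.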

Next, we load this dataset with Keras:

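```python
from keras.preprocessing.image import ImageDataGenerator

IM_WIDTH = 224    # image width
IM_HEIGHT = 224   # image height
batch_size = 32   # batch size
# dataset roots, each containing one subfolder per class
train_root = '/home/faith/keras/dataset/traindata/'
vaildation_root = '/home/faith/keras/dataset/vaildationdata/'
test_root = '/home/faith/keras/dataset/testdata/'

# train data: augmentation (shifts, shear, zoom, horizontal flips) plus rescaling to [0, 1]
train_datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1,
                                   shear_range=0.1, zoom_range=0.1,
                                   horizontal_flip=True, rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_root, target_size=(IM_HEIGHT, IM_WIDTH),  # target_size is (height, width)
    batch_size=batch_size, shuffle=True)

# validation data: rescale only, read through its own generator
vaild_datagen = ImageDataGenerator(rescale=1./255)
vaild_generator = vaild_datagen.flow_from_directory(
    vaildation_root, target_size=(IM_HEIGHT, IM_WIDTH), batch_size=batch_size)

# test data: rescale only, fixed order
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    test_root, target_size=(IM_HEIGHT, IM_WIDTH), batch_size=batch_size, shuffle=False)
```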

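I use three ImageDataGenerator instances, one each for the training, validation, and test data. ImageDataGenerator must be imported first: `from keras.preprocessing.image import ImageDataGenerator`. It performs data augmentation to make the model more robust, offering transformations such as rotation, flipping, and shifting. Its flow_from_directory method reads images from a directory in batches, so the whole dataset never has to be loaded into memory at once (which could exhaust memory). Augmentation belongs only on the training data; the validation and test generators should only rescale, otherwise the reported metrics are measured on randomly distorted images.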
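`flow_from_directory` derives labels from the folder structure: by default each subdirectory of the given root is one class, and class indices follow the alphanumeric order of the subdirectory names. The generator exposes the resulting mapping as `class_indices` (e.g. `train_generator.class_indices`). The sketch below re-implements only that default labeling rule with the standard library, for illustration; `infer_class_indices` is a hypothetical helper, not part of Keras, and the three folder names are stand-ins for VOC classes.

```python
import os
import tempfile

def infer_class_indices(root):
    """Hypothetical re-implementation of the default labeling rule:
    every subdirectory of `root` is a class, indexed in sorted order."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    return {name: i for i, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    for name in ["person", "aeroplane", "cat"]:   # created in arbitrary order
        os.makedirs(os.path.join(root, name))
    indices = infer_class_indices(root)

print(indices)   # {'aeroplane': 0, 'cat': 1, 'person': 2}
```

Because the order is alphabetical rather than creation order, the training, validation, and test roots must contain the same set of class folders; otherwise the same index would mean different classes in different splits.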

The degradation problem

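Intuition says a deeper network should have greater learning capacity. Degradation, however, refers to the following phenomenon: as depth increases, accuracy first improves until it saturates and then degrades rapidly. The figure below, taken from the ResNet paper, shows this more directly: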

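The figure shows that a 56-layer network has both higher training error and higher test error than a 20-layer network. Because the training error is higher too, the degradation is not caused by overfitting.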

Residual Learning

To address the degradation problem, the paper proposes residual learning, implemented with a residual block. As shown in Figure 2, once the residual structure is introduced, the stacked layers no longer approximate the desired mapping H(x) directly; they approximate the residual F(x) = H(x) - x, and the shortcut adds x back so that the block outputs F(x) + x.

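The identity shortcut in the figure adds the input x element-wise to F(x), the output of the two stacked layers, to form the block's output. It therefore adds no parameters and only negligible computation (a single element-wise addition), and it is easy to implement in common libraries (e.g., Caffe, TensorFlow).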
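A minimal NumPy sketch of the residual block described above, following the paper's two-layer form F(x) = W2 σ(W1 x) with biases omitted and a ReLU applied after the addition. Fully connected weights stand in for convolutions to keep it short, and the weights are random placeholders. It also shows the intuition behind residual learning: if the residual branch is driven to zero, the block reduces exactly to the identity.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Two-layer residual block, biases omitted:
    F(x) = W2 relu(W1 x), output y = relu(F(x) + x)."""
    f = w2 @ relu(w1 @ x)      # residual branch F(x)
    return relu(f + x)         # parameter-free shortcut: element-wise add, then ReLU

rng = np.random.default_rng(0)
x = relu(rng.standard_normal(8))   # non-negative input, e.g. the output of an earlier ReLU
w1 = rng.standard_normal((8, 8))
w2 = np.zeros((8, 8))              # residual branch pushed to zero ...
y = residual_block(x, w1, w2)
print(np.allclose(y, x))           # ... so the block is the identity: True
```

A plain two-layer block would have to learn W1 and W2 so that their product behaves like the identity, which the paper argues is harder for a solver than driving W2 toward zero.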

Below is a comparison between a plain network (one without shortcut connections) and a network with shortcuts:

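Left: the classic VGG-19 network. Middle: a 34-layer plain network inspired by VGG-19. Right: a 34-layer ResNet with shortcut connections. Solid lines are identity shortcuts between layers of the same dimension (the same number of filters); dashed lines are shortcuts that increase the dimension (a different number of filters).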

Identity vs Projection Shortcuts

Besides the simplest identity shortcuts (a direct element-wise addition at the same dimension), the paper also studies projection shortcuts ($y = F(x, \{W_i\}) + W_s x$). It compares three options:

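i. Shortcuts whose dimension does not change are direct connections; when the dimension increases, the shortcut is zero-padded. Since all shortcuts are identities, they add no parameters.

ii. Shortcuts whose dimension does not change are direct connections; when the dimension increases, a projection is used, which adds parameters.

iii. All shortcuts are projections.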

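The paper finds that ii is slightly better than i, which it attributes to the zero-padded dimensions in i having no residual learning, and that iii is marginally better than ii, which it attributes to the extra parameters $W_s$ introduced by the many projection shortcuts. Because the differences are small, projections are not essential for addressing degradation, and the paper does not use option iii in its later experiments, to reduce memory, time complexity, and model size.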
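To see what the three options cost, the sketch below counts the extra weights the shortcuts add in ResNet-34. It uses the stage configuration from Table 1 (3, 4, 6, 3 basic blocks with 64, 128, 256, 512 filters) and counts only the 1x1 projection kernels $W_s$, ignoring BatchNormalization parameters. This is an illustrative calculation, not one taken from the paper: option iii turns all 16 shortcuts into projections, versus 3 in option ii.

```python
# ResNet-34 stages from Table 1: (input channels of the stage, output channels, blocks)
STAGES = [(64, 64, 3), (64, 128, 4), (128, 256, 6), (256, 512, 3)]

def extra_shortcut_weights(option):
    """Number of 1x1 projection weights W_s added by the shortcuts (BN ignored).

    option "i":   identity everywhere, zero-padding when channels grow -> no weights
    option "ii":  projection only where the channel count changes
    option "iii": projection on every shortcut
    """
    total = 0
    for c_in, c_out, n_blocks in STAGES:
        for block in range(n_blocks):
            block_in = c_in if block == 0 else c_out
            if option == "iii" or (option == "ii" and block_in != c_out):
                total += block_in * c_out   # one weight per (input, output) channel pair
    return total

for option in ("i", "ii", "iii"):
    print(option, extra_shortcut_weights(option))
# i 0
# ii 172032
# iii 1085440
```

Option iii adds roughly six times the projection weights of option ii, mostly in the wide conv4_x and conv5_x stages, which is consistent with the paper's choice to avoid it for cost reasons.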

Bottleneck architecture

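As shown on the right of the figure above, when studying deeper networks (50 layers or more), the authors use the bottleneck block. I suspect they took inspiration from the Inception module in GoogLeNet. In a bottleneck block, the first 1x1 convolution reduces the dimension (to cut computation), and the last 1x1 convolution increases it again so that the output has the same dimension as the input (which makes the identity shortcut possible).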
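A quick weight count, ignoring biases and BatchNormalization, illustrates why the bottleneck is cheaper. All three layers run at the same spatial resolution, so the multiply-adds per position are proportional to these counts. The comparison mirrors the paper's Figure 5: a 256-channel bottleneck block costs about the same as a 64-channel basic block, whereas a basic block run directly on 256 channels would cost roughly 17 times more.

```python
def conv_weights(k, c_in, c_out):
    """Weights of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

# Basic block (ResNet-34 style) at 64 channels: 3x3,64 -> 3x3,64
basic = conv_weights(3, 64, 64) + conv_weights(3, 64, 64)

# Bottleneck block (ResNet-50 style) on a 256-d input: 1x1,64 -> 3x3,64 -> 1x1,256
bottleneck = (conv_weights(1, 256, 64)    # reduce the dimension
              + conv_weights(3, 64, 64)   # 3x3 conv on the reduced dimension
              + conv_weights(1, 64, 256)) # restore the dimension for the shortcut

# For comparison: a basic block working directly on 256 channels
basic_256 = 2 * conv_weights(3, 256, 256)

print(basic, bottleneck, basic_256)   # 73728 69632 1179648
```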

ResNet architecture table

Table 1

Table 1 shows how ResNets of different depths are constructed. It also lists the FLOPs of each network: although ResNets are deep, every one of them needs fewer FLOPs than VGG-19 (19.6 × 10^9).
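The depths in the network names follow directly from Table 1: one 7x7 convolution (conv1), the convolutions inside the residual blocks of conv2_x to conv5_x, and the final fully connected layer, with projection shortcuts not counted. A small sketch of that arithmetic, with the per-stage block counts copied from Table 1 (the `resnet_34` and `resnet_50` functions in the code below both use 3, 4, 6, 3 blocks):

```python
# Residual blocks per stage (conv2_x .. conv5_x) and convolutions per block, from Table 1
CONFIG = {
    "resnet18": ([2, 2, 2, 2], 2),
    "resnet34": ([3, 4, 6, 3], 2),
    "resnet50": ([3, 4, 6, 3], 3),
    "resnet101": ([3, 4, 23, 3], 3),
    "resnet152": ([3, 8, 36, 3], 3),
}

def depth(name):
    """conv1 + every conv inside the residual blocks + the final fc layer."""
    blocks, convs_per_block = CONFIG[name]
    return 1 + convs_per_block * sum(blocks) + 1

for name in CONFIG:
    print(name, depth(name))   # each depth matches the number in the name
```

ResNet-34 and ResNet-50 share the same block counts; the extra 16 layers come entirely from swapping two-conv basic blocks for three-conv bottleneck blocks.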

Results from the ResNet paper:

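In the figure above, the left panel shows plain networks and the right panel residual networks; thin curves denote training error and bold curves denote validation error. The plain networks show degradation: the 34-layer network has higher training and validation error than the 18-layer one. The residual networks do not: the 34-layer ResNet has lower error than the 18-layer ResNet. Residual learning therefore effectively addresses the degradation problem.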

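To build ResNet, we used the following strategy:

- Build ResNet-34 with the identity_Block function and ResNet-50 with the bottleneck_Block function.
- Add BatchNormalization after every convolution. BN normalizes each channel to zero mean and unit variance over the mini-batch and then applies a learned scale and shift; this stabilizes and speeds up training and has a mild regularizing effect.
- Follow Table 1 for the concrete configuration, reading off the parameters in the table as you build each stage.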
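A minimal NumPy sketch of what BatchNormalization computes at training time for channels-last input (`axis=3`, as used in `Conv2d_BN`). The function name is hypothetical, and `eps=1e-3` is assumed to mirror the Keras default epsilon. At inference Keras uses moving averages of the mean and variance instead of batch statistics; that part is omitted here.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for x of shape (N, H, W, C).

    Each channel is normalized with the mean and variance of the current
    mini-batch (over N, H, W), then scaled by gamma and shifted by beta.
    """
    mean = x.mean(axis=(0, 1, 2))
    var = x.var(axis=(0, 1, 2))
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(32, 8, 8, 16))   # a fake batch of feature maps
y = batch_norm_train(x, gamma=np.ones(16), beta=np.zeros(16))
print(y.mean(axis=(0, 1, 2)).round(3))   # every channel close to 0
print(y.std(axis=(0, 1, 2)).round(3))    # every channel close to 1
```

With learned `gamma` and `beta` the layer can undo the normalization if that helps, so BN standardizes the statistics of each channel rather than forcing a particular output distribution.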

The overall flow of the code is as follows:

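```mermaid
graph TD
    A(Import libraries) --> Z[Set model parameters and other configuration]
    Z --> B[Create iterators for the training, test, and validation sets]
    B --> C[Write the identity_Block function]
    C --> D[Write the bottleneck_Block function]
    D --> F[Build the network from the ResNet architecture table]
    F --> G[Train and validate the model]
    G --> H[Save the model]
    H --> I(Test the model on the test set)
```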

```python
# coding=utf-8
import math
import os

from keras.layers import (Input, Dense, BatchNormalization, Conv2D, MaxPooling2D,
                          AveragePooling2D, Activation, ZeroPadding2D, Flatten, add)
from keras.metrics import top_k_categorical_accuracy
from keras.models import Model, load_model
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import plot_model

# Global constants
NB_CLASS = 20
IM_WIDTH = 224
IM_HEIGHT = 224
train_root = '/home/faith/keras/dataset/traindata/'
vaildation_root = '/home/faith/keras/dataset/vaildationdata/'
test_root = '/home/faith/keras/dataset/testdata/'
batch_size = 32
EPOCH = 60

# train data (augmented)
train_datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1,
                                   shear_range=0.1, zoom_range=0.1,
                                   horizontal_flip=True, rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_root, target_size=(IM_HEIGHT, IM_WIDTH), batch_size=batch_size, shuffle=True)
# validation data (rescale only, its own generator)
vaild_datagen = ImageDataGenerator(rescale=1./255)
vaild_generator = vaild_datagen.flow_from_directory(
    vaildation_root, target_size=(IM_HEIGHT, IM_WIDTH), batch_size=batch_size)
# test data (rescale only, its own generator)
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    test_root, target_size=(IM_HEIGHT, IM_WIDTH), batch_size=batch_size, shuffle=False)


def Conv2d_BN(x, nb_filter, kernel_size, strides=(1, 1), padding='same',
              activation=True, name=None):
    """Conv2D -> BatchNormalization -> optional ReLU (the paper puts BN before the activation)."""
    conv_name = name + '_conv' if name is not None else None
    bn_name = name + '_bn' if name is not None else None
    x = Conv2D(nb_filter, kernel_size, padding=padding, strides=strides, name=conv_name)(x)
    x = BatchNormalization(axis=3, name=bn_name)(x)  # axis=3: channels-last
    if activation:
        x = Activation('relu')(x)
    return x


def identity_Block(inpt, nb_filter, kernel_size, strides=(1, 1), with_conv_shortcut=False):
    """Basic residual block (two 3x3 convs) used by ResNet-34."""
    x = Conv2d_BN(inpt, nb_filter=nb_filter, kernel_size=kernel_size, strides=strides)
    x = Conv2d_BN(x, nb_filter=nb_filter, kernel_size=kernel_size, activation=False)
    if with_conv_shortcut:
        # projection shortcut: a strided 1x1 conv matches the new channel count and size
        shortcut = Conv2d_BN(inpt, nb_filter=nb_filter, kernel_size=(1, 1),
                             strides=strides, activation=False)
    else:
        shortcut = inpt  # identity shortcut
    x = add([x, shortcut])
    return Activation('relu')(x)  # ReLU after the addition


def bottleneck_Block(inpt, nb_filters, strides=(1, 1), with_conv_shortcut=False):
    """Bottleneck block (1x1 reduce, 3x3, 1x1 restore) used by ResNet-50."""
    k1, k2, k3 = nb_filters
    x = Conv2d_BN(inpt, nb_filter=k1, kernel_size=1, strides=strides)
    x = Conv2d_BN(x, nb_filter=k2, kernel_size=3)
    x = Conv2d_BN(x, nb_filter=k3, kernel_size=1, activation=False)
    if with_conv_shortcut:
        shortcut = Conv2d_BN(inpt, nb_filter=k3, kernel_size=1, strides=strides,
                             activation=False)
    else:
        shortcut = inpt
    x = add([x, shortcut])
    return Activation('relu')(x)


def resnet_34(width, height, channel, classes):
    inpt = Input(shape=(height, width, channel))  # channels-last: (rows, cols, channels)
    x = ZeroPadding2D((3, 3))(inpt)
    # conv1
    x = Conv2d_BN(x, nb_filter=64, kernel_size=(7, 7), strides=(2, 2), padding='valid')
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    # conv2_x .. conv5_x: (filters, number of blocks) from Table 1
    for stage, (nb_filter, n_blocks) in enumerate([(64, 3), (128, 4), (256, 6), (512, 3)]):
        for i in range(n_blocks):
            if stage > 0 and i == 0:
                # first block of conv3_x..conv5_x halves the size and doubles the channels
                x = identity_Block(x, nb_filter=nb_filter, kernel_size=(3, 3),
                                   strides=(2, 2), with_conv_shortcut=True)
            else:
                x = identity_Block(x, nb_filter=nb_filter, kernel_size=(3, 3))
    x = AveragePooling2D(pool_size=(7, 7))(x)
    x = Flatten()(x)
    x = Dense(classes, activation='softmax')(x)
    return Model(inputs=inpt, outputs=x)


def resnet_50(width, height, channel, classes):
    inpt = Input(shape=(height, width, channel))
    x = ZeroPadding2D((3, 3))(inpt)
    # conv1
    x = Conv2d_BN(x, nb_filter=64, kernel_size=(7, 7), strides=(2, 2), padding='valid')
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    # conv2_x .. conv5_x: ([1x1, 3x3, 1x1] filters, number of blocks) from Table 1
    stages = [([64, 64, 256], 3), ([128, 128, 512], 4),
              ([256, 256, 1024], 6), ([512, 512, 2048], 3)]
    for stage, (nb_filters, n_blocks) in enumerate(stages):
        # the first block of every stage changes the channel count, so it needs a projection;
        # conv3_x..conv5_x also downsample with stride 2
        strides = (1, 1) if stage == 0 else (2, 2)
        x = bottleneck_Block(x, nb_filters=nb_filters, strides=strides, with_conv_shortcut=True)
        for _ in range(n_blocks - 1):
            x = bottleneck_Block(x, nb_filters=nb_filters)
    x = AveragePooling2D(pool_size=(7, 7))(x)
    x = Flatten()(x)
    x = Dense(classes, activation='softmax')(x)
    return Model(inputs=inpt, outputs=x)


def acc_top2(y_true, y_pred):
    # optional top-2 accuracy metric (not used below; add it to `metrics` if wanted)
    return top_k_categorical_accuracy(y_true, y_pred, k=2)


def check_print():
    # Create a Keras model
    model = resnet_50(IM_WIDTH, IM_HEIGHT, 3, NB_CLASS)
    model.summary()
    # Save a PNG of the model graph (requires pydot and graphviz)
    plot_model(model, to_file='resnet.png')
    # top_k_categorical_accuracy defaults to k=5, i.e. top-5 accuracy
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['acc', top_k_categorical_accuracy])
    print('Model Compiled')
    return model


if __name__ == '__main__':
    if os.path.exists('resnet_50.h5'):
        model = load_model('resnet_50.h5')
    else:
        model = check_print()
    # step counts must be integers; round up so every image is seen once per epoch
    model.fit_generator(train_generator, validation_data=vaild_generator, epochs=EPOCH,
                        steps_per_epoch=math.ceil(train_generator.n / batch_size),
                        validation_steps=math.ceil(vaild_generator.n / batch_size))
    model.save('resnet_50.h5')
    loss, acc, top_acc = model.evaluate_generator(
        test_generator, steps=math.ceil(test_generator.n / batch_size))
    print('Test result: loss:%f, acc:%f, top_acc:%f' % (loss, acc, top_acc))
```
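Why `AveragePooling2D(pool_size=(7, 7))`? The sketch below traces the spatial size of a 224x224 input through the network using the Keras output-size rules ('valid': floor((n - k) / s) + 1; 'same': ceil(n / s)). The function name `out_size` is just for illustration.

```python
import math

def out_size(n, kernel, stride, padding):
    """Output length along one spatial axis under Keras padding rules."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1           # 'valid'

n = 224 + 2 * 3                                 # ZeroPadding2D((3, 3)) -> 230
n = out_size(n, 7, 2, "valid")                  # conv1, 7x7 stride 2 -> 112
n = out_size(n, 3, 2, "same")                   # MaxPooling2D 3x3 stride 2 -> 56
sizes = [n]                                     # conv2_x keeps 56 x 56
for _ in range(3):                              # conv3_x, conv4_x, conv5_x each start with stride 2
    n = out_size(n, 3, 2, "same")
    sizes.append(n)
final = out_size(n, 7, 7, "valid")              # AveragePooling2D((7, 7)); stride defaults to pool size
print(sizes, final)                             # [56, 28, 14, 7] 1
```

The network reduces 224 to a 7x7 map before pooling, so the 7x7 average pool produces a 1x1 map. With a different input size the pool size would have to change; Keras's `GlobalAveragePooling2D` layer is one way to avoid hard-coding it.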

Experimental results

| Data | Loss | Acc | Top-5 acc |
|---|---|---|---|
| Training set | 1.85 | 39.9% | 85.3% |
| Validation set | 2.01 | 36.6% | 82.0% |
| Test set | 2.08 | 35.7% | 78.1% |

Dataset: VOC2012; classes: 20; model: ResNet; framework: Keras.

Analysis of the experimental results

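The model performs somewhat worse on the test set than on the training set, indicating slight overfitting. To curb overfitting and speed up convergence, this post uses BatchNormalization layers throughout the network. However, training ran for only 60 epochs of roughly 500 iterations each; with so few iterations the model is still far from converged (training accuracy itself is only about 40%), so the benefits have not fully shown yet.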
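A back-of-the-envelope check of the training budget, assuming about 17,000 images and the 8:1:1 split described earlier. It gives roughly 425 iterations per epoch, the same order as the figure of about 500 quoted above, and about 25,500 iterations in total. For scale, the ResNet paper trains on ImageNet for up to 60 × 10^4 iterations with a mini-batch of 256.

```python
import math

n_images = 17_000                 # approximate dataset size quoted earlier (assumption)
n_train = n_images * 8 // 10      # 8:1:1 split -> training share
batch_size = 32
epochs = 60

steps_per_epoch = math.ceil(n_train / batch_size)   # same rounding as in the training script
total_steps = steps_per_epoch * epochs
print(n_train, steps_per_epoch, total_steps)        # 13600 425 25500
```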

References

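The following are the references for this post; my thanks again to each of their authors. Readers who still have questions about ResNet, or who want a deeper understanding, can consult them.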

Referenced blog 1

Referenced blog 2

Reference paper 1: Deep Residual Learning for Image Recognition

Reference paper 2: Residual Networks are Exponential Ensembles of Relatively Shallow Networks
