Xception
Contents
- Paper Information
- Evolution of the Xception Design
- Depthwise Separable Convolution
- Xception Network Architecture
- Fine-Tuning Xception
Paper Information
Paper: Xception: Deep Learning with Depthwise Separable Convolutions
Author: François Chollet
Evolution of the Xception Design
- Xception is Google's follow-up to Inception: another improvement on Inception-v3.
- The author argues that cross-channel correlations and spatial correlations are best handled separately, and therefore replaces the convolutions in Inception-v3 with separable convolutions (an "extreme" Inception module).
The structure evolves as follows:
- In Inception, features can be extracted by 1×1 convolutions, 3×3 convolutions, 5×5 convolutions, pooling, and so on. The Inception block leaves the choice of feature type to the network itself: one input is fed through several extraction paths in parallel, and the results are concatenated. The Inception-v3 module is shown below:
- Simplify Inception-v3: after removing the avg pool, every path starts with a 1×1 convolution:
- Factor out the shared 1×1 convolution:
- Xception (the "extreme" Inception): first apply an ordinary 1×1 convolution, then run a separate 3×3 convolution on each channel of its output, and finally concatenate the results (see the sketch after this list):
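Below is a minimal sketch of the "extreme" Inception module, assuming a 32×32×64 input and 16 channels after the 1×1 convolution. The channel slicing via Lambda is just one way to express the per-channel 3×3 convolutions in the Keras functional API.

```python
# Sketch of the "extreme" Inception module: a 1x1 convolution handles
# cross-channel correlations, then each of its output channels gets its
# own 3x3 convolution for spatial correlations, and the results are
# concatenated. Shapes and channel counts are assumed for illustration.
from keras.models import Model
from keras.layers import Input, Conv2D, Lambda, Concatenate

inp = Input(shape=(32, 32, 64))              # assumed input size
x = Conv2D(16, (1, 1), padding='same')(inp)  # cross-channel mixing

branches = []
for c in range(16):
    # slice out channel c and give it its own 3x3 spatial convolution
    ch = Lambda(lambda t, i=c: t[:, :, :, i:i + 1])(x)
    branches.append(Conv2D(1, (3, 3), padding='same')(ch))

out = Concatenate(axis=-1)(branches)         # back to 16 channels
module = Model(inp, out)
```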
Depthwise Separable Convolution
How a standard convolution is computed:
How a depthwise separable convolution is computed (a minimal sketch follows):
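As a concrete reference, here is the operation assembled from its two steps as individual Keras layers (DepthwiseConv2D requires a recent Keras version; shapes and channel counts are assumptions). keras.layers.SeparableConv2D fuses both steps into one layer.

```python
# Depthwise separable convolution assembled from its two steps.
from keras.models import Model
from keras.layers import Input, DepthwiseConv2D, Conv2D

inp = Input(shape=(32, 32, 64))                   # assumed input size
x = DepthwiseConv2D((3, 3), padding='same')(inp)  # step 1: one 3x3 filter per input channel
x = Conv2D(128, (1, 1), padding='same')(x)        # step 2: 1x1 pointwise conv mixes channels
model = Model(inp, x)

# Equivalent fused layer:
# x = SeparableConv2D(128, (3, 3), padding='same')(inp)
```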
Differences between depthwise separable convolution and the "extreme" Inception module:
- Extreme Inception:
  - Step 1: an ordinary 1×1 convolution.
  - Step 2: a separate 3×3 convolution on each channel of the 1×1 output, with the results concatenated.
- Depthwise Separable Convolution:
  - Step 1: depthwise convolution, i.e. a separate 3×3 convolution on each input channel, with the results concatenated.
  - Step 2: pointwise convolution, i.e. a 1×1 convolution over the concatenated depthwise output.
- The order of the two operations differs: Inception applies the 1×1 convolution first and then the 3×3; depthwise separable convolution applies the 3×3 first and then the 1×1. (The author argues this difference has little effect.) A worked parameter count follows this list.
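Either way, the factored form is much cheaper in parameters than a standard convolution. A quick worked count, assuming 64 input channels, 128 output channels, and a 3×3 kernel (bias terms ignored):

```python
# Parameter count: standard convolution vs. depthwise separable convolution.
c_in, c_out, k = 64, 128, 3                # assumed channel counts and kernel size
standard = k * k * c_in * c_out            # 3*3*64*128 = 73728
separable = k * k * c_in + c_in * c_out    # 576 (depthwise) + 8192 (pointwise) = 8768
print(standard / separable)                # ~8.4x fewer parameters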
The author finds that in the "extreme" Inception module, omitting the nonlinear activation between the 3×3 convolutions (which learn spatial correlations) and the 1×1 convolutions (which learn cross-channel correlations) yields faster convergence and higher accuracy:
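A sketch of the two variants in that ablation, with assumed shapes. Note that Keras's SeparableConv2D applies no nonlinearity between its depthwise and pointwise steps, which matches the better-performing setup:

```python
# Activation placement between the depthwise (spatial) and pointwise
# (cross-channel) steps. Shapes are assumed for illustration.
from keras.layers import Input, DepthwiseConv2D, Conv2D, Activation

inp = Input(shape=(32, 32, 64))

# Variant A (as in Xception): no intermediate activation.
a = DepthwiseConv2D((3, 3), padding='same')(inp)
a = Conv2D(128, (1, 1), padding='same')(a)

# Variant B: ReLU in between -- found to converge slower and less accurately.
b = DepthwiseConv2D((3, 3), padding='same')(inp)
b = Activation('relu')(b)
b = Conv2D(128, (1, 1), padding='same')(b)
```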
Xception Network Architecture
The structure of Xception is based on ResNet, but with its convolution layers replaced by separable convolutions (the "extreme" Inception module), as shown below. The whole network is divided into three parts: Entry, Middle, and Exit.
- On ImageNet, Xception is slightly more accurate than Inception-v3 while using somewhat fewer parameters. The ResNet-style residual connections added to Xception also markedly speed up its convergence and give significantly higher accuracy.
- A potential problem: although depthwise separable convolution can improve accuracy and sharply cut theoretical compute, its computation is fragmented, so existing convolutional-network implementations run it inefficiently. For example, Xception's theoretical compute is far lower than Inception-v3's, yet its training iterations are actually somewhat slower (a small timing sketch follows).
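A rough way to see this for yourself is to time a standard convolution against a separable one of the same output size; the exact numbers depend entirely on hardware and backend, and the shapes below are assumptions:

```python
# Wall-clock timing sketch: lower theoretical FLOPs (SeparableConv2D)
# need not mean faster execution than a dense Conv2D.
import time
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, SeparableConv2D

x = np.random.rand(32, 64, 64, 256).astype('float32')   # assumed batch of feature maps

conv = Sequential([Conv2D(256, (3, 3), padding='same', input_shape=(64, 64, 256))])
sep = Sequential([SeparableConv2D(256, (3, 3), padding='same', input_shape=(64, 64, 256))])

for name, m in [('Conv2D', conv), ('SeparableConv2D', sep)]:
    m.predict(x, batch_size=32)                          # warm-up
    start = time.time()
    for _ in range(10):
        m.predict(x, batch_size=32)
    print(name, (time.time() - start) / 10, 's per pass')
```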
Fine-Tuning Xception
Download the Xception weights file: link
```python
from keras.models import Model
from keras import layers
from keras.layers import Dense, Input, BatchNormalization, Activation
from keras.layers import Conv2D, SeparableConv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.optimizers import SGD
from sklearn.metrics import log_loss
from get_traffic_dataset import TrafficImageDataGenerator  # author's custom data generator

train_file = './citySpace/outData/train/'
val_file = './citySpace/outData/val/'

def Xception(img_rows, img_cols, color_type, num_classes,
             weights_path='xception_weights_tf_dim_ordering_tf_kernels.h5'):
    img_input = Input(shape=(img_rows, img_cols, color_type))

    # Block 1
    x = Conv2D(32, (3, 3), strides=(2, 2), use_bias=False)(img_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(64, (3, 3), use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    residual = Conv2D(128, (1, 1), strides=(2, 2), padding='same', use_bias=False)(x)
    residual = BatchNormalization()(residual)

    # Block 2
    x = SeparableConv2D(128, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(128, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)

    # Block 2 Pool
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    residual = Conv2D(256, (1, 1), strides=(2, 2), padding='same', use_bias=False)(x)
    residual = BatchNormalization()(residual)

    # Block 3
    x = Activation('relu')(x)
    x = SeparableConv2D(256, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(256, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)

    # Block 3 Pool
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    residual = Conv2D(728, (1, 1), strides=(2, 2), padding='same', use_bias=False)(x)
    residual = BatchNormalization()(residual)

    # Block 4
    x = Activation('relu')(x)
    x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    # Blocks 5-12: the Middle flow, repeated eight times
    for i in range(8):
        residual = x
        x = Activation('relu')(x)
        x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False)(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False)(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False)(x)
        x = BatchNormalization()(x)
        x = layers.add([x, residual])

    residual = Conv2D(1024, (1, 1), strides=(2, 2), padding='same', use_bias=False)(x)
    residual = BatchNormalization()(residual)

    # Block 13
    x = Activation('relu')(x)
    x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(1024, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)

    # Block 13 Pool
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    # Block 14
    x = SeparableConv2D(1536, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    # Block 14 part 2
    x = SeparableConv2D(2048, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    # Fully connected layer (ImageNet head, needed to load the pretrained weights)
    x_fc = GlobalAveragePooling2D()(x)
    x_fc = Dense(1000, activation='softmax')(x_fc)

    inputs = img_input

    # Create model and load pretrained weights
    model = Model(inputs, x_fc, name='xception')
    model.load_weights(weights_path)

    # Truncate and replace the softmax layer for transfer learning.
    # Cannot use model.layers.pop() since the model is not Sequential();
    # this works because the pretrained weights live in the layers, not the model.
    x_newfc = GlobalAveragePooling2D()(x)
    x_newfc = Dense(num_classes, activation='softmax', name='fc_out')(x_newfc)

    # Create another model with our customized softmax
    model = Model(img_input, x_newfc)

    # Learning rate is changed to 0.001
    sgd = SGD(lr=1e-3, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

if __name__ == '__main__':
    img_rows, img_cols = 299, 299  # resolution of inputs
    channel = 3
    num_classes = 3
    batch_size = 16
    nb_epoch = 5

    # Initialize the data generators separately for the training and validation sets
    train_generator = TrafficImageDataGenerator(train_file, scale_size=(img_rows, img_cols),
                                                horizontal_flip=True, shuffle=True)
    val_generator = TrafficImageDataGenerator(val_file, scale_size=(img_rows, img_cols),
                                              horizontal_flip=True, shuffle=True)
    X_valid, Y_valid, val_labels = val_generator.all(1000)
    X_train, Y_train, train_labels = train_generator.all(5000)

    # Load our model
    model = Xception(img_rows, img_cols, channel, num_classes)

    # Start fine-tuning
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              epochs=nb_epoch,  # Keras 2 name; 'nb_epoch' is the deprecated Keras 1 argument
              shuffle=True,
              verbose=1,
              validation_data=(X_valid, Y_valid))

    # Make predictions
    predictions_valid = model.predict(X_valid, batch_size=batch_size, verbose=1)

    # Cross-entropy loss score
    score = log_loss(Y_valid, predictions_valid)
```
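One common variant of this fine-tuning recipe (not part of the original script) is to first freeze the pretrained convolutional base and train only the new classifier head. A sketch, continuing from the model returned above:

```python
# Freeze everything except the new classifier head, then recompile.
# The last two layers of the model built above are GlobalAveragePooling2D
# and the new Dense softmax; only the Dense layer actually holds weights.
for layer in model.layers[:-2]:
    layer.trainable = False
model.compile(optimizer=SGD(lr=1e-3, momentum=0.9, nesterov=True),
              loss='categorical_crossentropy', metrics=['accuracy'])
```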