Using a Pretrained Convolutional Neural Network (Cat vs. Dog Image Classification)
The pretrained weights used here come from ImageNet; we use them to make predictions on a new dataset: cat and dog images. We use the VGG16 model, which ships with Keras and can be imported directly.
```python
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))
```

A few words about these three arguments:
- weights: specifies the weight checkpoint from which to initialize the model.
- include_top: specifies whether to include the densely connected classifier on top of the network. By default, this classifier corresponds to the 1,000 classes of ImageNet. Because we intend to use our own classifier (with only two classes: cat and dog), we don't need to include it.
- input_shape: the shape of the image tensors fed into the network (optional). If you don't pass it, the network can process inputs of any shape.
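To make the input_shape behavior concrete, here is a minimal sketch (assuming Keras with a TensorFlow channels-last backend): omitting input_shape yields a convolutional base whose spatial dimensions are unconstrained.

```python
from keras.applications import VGG16

# Without input_shape, the conv base accepts images of any height and width
# (the channel dimension is still fixed at 3 for RGB).
flexible_base = VGG16(weights='imagenet', include_top=False)
print(flexible_base.input_shape)  # -> (None, None, None, 3)
```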
Let's look at the detailed architecture of the VGG16 network:
```python
conv_base.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 150, 150, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
```

The final feature map has shape (4, 4, 512); we will add a densely connected classifier on top of this feature.
Fast feature extraction without data augmentation (computationally cheap)
First, we run an ImageDataGenerator instance to extract images and their labels as Numpy arrays, then call the predict method of the conv_base model to extract features from these images.
```python
import os
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

base_dir = '/Users/fchollet/Downloads/cats_and_dogs_small'

train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
test_dir = os.path.join(base_dir, 'test')

datagen = ImageDataGenerator(rescale=1./255)
batch_size = 20

def extract_features(directory, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow_from_directory(
        directory,
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:
            # The generator yields data indefinitely in a loop,
            # so we must break once every image has been seen.
            break
    return features, labels

train_features, train_labels = extract_features(train_dir, 2000)
validation_features, validation_labels = extract_features(validation_dir, 1000)
test_features, test_labels = extract_features(test_dir, 1000)
```

The extracted features currently have shape (samples, 4, 4, 512). Because we will feed them into a densely connected classifier, we must first flatten them to (samples, 8192):
```python
train_features = np.reshape(train_features, (2000, 4 * 4 * 512))
validation_features = np.reshape(validation_features, (1000, 4 * 4 * 512))
test_features = np.reshape(test_features, (1000, 4 * 4 * 512))
```

Now we define a densely connected classifier and train it on the features and labels we just saved:
```python
from keras import models
from keras import layers
from keras import optimizers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=4 * 4 * 512))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy',
              metrics=['acc'])

history = model.fit(train_features, train_labels,
                    epochs=30,
                    batch_size=20,
                    validation_data=(validation_features, validation_labels))
```

Training is very fast, because we only have to deal with two Dense layers. Let's look at the loss and accuracy curves during training:
```python
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
```
The plots show that validation accuracy reaches about 90%, much better than the small model we trained from scratch earlier. But they also show that, despite the fairly high dropout rate, the model overfits almost from the start. That's because this method doesn't use data augmentation, which is essential for preventing overfitting on small image datasets.
Feature extraction with data augmentation (computationally expensive)
This method is slower and more expensive, but it allows us to use data augmentation during training. The idea is to extend the conv_base model and run it end to end on the input data. (Because this is so computationally expensive, it must be run on a GPU.)
Building on the conv_base we defined earlier:
```python
from keras import models
from keras import layers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Model)                (None, 4, 4, 512)         14714688
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0
_________________________________________________________________
dense_3 (Dense)              (None, 256)               2097408
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 257
=================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
```

As you can see, the VGG16 convolutional base has 14,714,688 parameters, and the classifier we added on top has about 2 million more (for the first Dense layer, 8192 × 256 weights + 256 biases = 2,097,408), which is a lot.
Before compiling and training the model, we need to freeze the convolutional base. Freezing one or more layers means keeping their weights fixed during training. If we don't do this, the representations previously learned by the convolutional base will be modified during training: because the Dense layers on top are randomly initialized, very large weight updates would propagate through the network, severely damaging the previously learned representations.
In Keras, you freeze a network by setting its trainable attribute to False:
```python
print('This is the number of trainable weights '
      'before freezing the conv base:', len(model.trainable_weights))
```

```
This is the number of trainable weights before freezing the conv base: 30
```
```python
conv_base.trainable = False
print('This is the number of trainable weights '
      'after freezing the conv base:', len(model.trainable_weights))
```

```
This is the number of trainable weights after freezing the conv base: 4
```
With this setup, only the weights of the two Dense layers we added will be trained: four weight tensors in total, two per layer (the main weight matrix and the bias vector). Note that if you modify the trainable attribute, you must compile the model afterwards for the change to take effect.
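As a minimal sketch of that rule (reusing the model and conv_base defined above), flipping trainable and recompiling is all that's needed; the weights themselves are untouched until training resumes:

```python
# Changes to `trainable` are picked up when the model is (re)compiled,
# so always recompile after freezing or unfreezing layers.
conv_base.trainable = False
model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy',
              metrics=['acc'])
print(len(model.trainable_weights))  # 4: the two Dense layers' kernels and biases
```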
Now we can train the model, using data augmentation:
```python
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    # This is the target directory
    train_dir,
    # All images will be resized to 150x150
    target_size=(150, 150),
    batch_size=20,
    # Since we use binary_crossentropy loss, we need binary labels
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=2e-5),
              metrics=['acc'])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50,
    verbose=2)

model.save('cats_and_dogs_small_3.h5')
```

Let's look at the validation accuracy again:
```python
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
```
Validation accuracy reaches nearly 96%, and overfitting is reduced.
Fine-tuning the model
Next we use fine-tuning to further improve the model's performance. The steps for fine-tuning a network are as follows (a compact sketch of the whole recipe appears after the list):
- (1) Add your custom network on top of an already-trained base network.
- (2) Freeze the base network.
- (3) Train the part you added.
- (4) Unfreeze some layers in the base network.
- (5) Jointly train both these unfrozen layers and the part you added.
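Here is a skeletal sketch of the five steps in one place. This is a sketch only: it reuses the names defined above (conv_base, train_generator, validation_generator), and the layer-selection one-liner in step (4) is an illustrative variant of the loop shown later in this section.

```python
# (1) Add a custom classifier on top of the trained base.
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

# (2) Freeze the base, then (3) train only the added classifier.
conv_base.trainable = False
model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy', metrics=['acc'])
model.fit_generator(train_generator, steps_per_epoch=100, epochs=30,
                    validation_data=validation_generator, validation_steps=50)

# (4) Unfreeze only the top convolutional layers of the base...
conv_base.trainable = True
for layer in conv_base.layers:
    layer.trainable = layer.name.startswith('block5_conv')

# (5) ...and jointly train them with the classifier, at a low learning rate.
model.compile(optimizer=optimizers.RMSprop(lr=1e-5),
              loss='binary_crossentropy', metrics=['acc'])
model.fit_generator(train_generator, steps_per_epoch=100, epochs=100,
                    validation_data=validation_generator, validation_steps=50)
```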
We already completed the first three steps during feature extraction. Let's continue with step 4: we'll unfreeze conv_base, and then freeze individual layers inside it.
Looking back at the conv_base.summary() output above, we will fine-tune the last three convolutional layers. That is, all layers up to and including block4_pool should be frozen, and the three layers after it (block5_conv1, block5_conv2 and block5_conv3) will be trained.
Keep in mind: the more parameters you train, the greater the risk of overfitting. The convolutional base has about 15 million parameters, so trying to train all of them on our small dataset would be risky. In this situation, the best strategy is to fine-tune only the top two or three layers of the convolutional base.
```python
conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
```

Now we can fine-tune the network. We'll do it with the RMSprop optimizer at a very low learning rate. The reason for the low learning rate is that we want to limit the magnitude of the modifications we make to the representations of the three layers being fine-tuned; updates that are too large could destroy those representations.
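As a quick sanity check (a sketch, assuming the loop above has run), we can list which layers of conv_base are now trainable and how many parameters each contributes:

```python
# Only the block5 conv layers should report trainable weights here;
# block5_pool is also unfrozen but has no weights of its own.
for layer in conv_base.layers:
    if layer.trainable and layer.weights:
        print(layer.name, layer.count_params())
# Expected output (per the summary above):
# block5_conv1 2359808
# block5_conv2 2359808
# block5_conv3 2359808
```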
```python
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-5),
              metrics=['acc'])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)

model.save('cats_and_dogs_small_4.h5')
```

Now let's plot the curves and see the results:
```python
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
```
These curves look noisy. To make them more readable, we can replace each loss and accuracy value with an exponential moving average of the preceding values, which smooths the curves. Here is a simple utility function that does this:
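The function body itself was missing from the text; below is a minimal sketch implementing the exponential moving average just described (the name smooth_curve and the factor of 0.8 are illustrative choices), followed by the re-plotting of the smoothed curves:

```python
def smooth_curve(points, factor=0.8):
    # Each smoothed point is a weighted blend of the previous smoothed
    # point and the current raw value: an exponential moving average.
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    return smoothed_points

plt.plot(epochs, smooth_curve(acc), 'bo', label='Smoothed training acc')
plt.plot(epochs, smooth_curve(val_acc), 'b', label='Smoothed validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, smooth_curve(loss), 'bo', label='Smoothed training loss')
plt.plot(epochs, smooth_curve(val_loss), 'b', label='Smoothed validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
```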
With the exponential moving average, the validation curves are much cleaner. We can see that accuracy improved by about 1%, from roughly 96% to 97%.
Finally, let's evaluate this model on the test set:
```python
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
print('test acc:', test_acc)
```

```
Found 1000 images belonging to 2 classes.
test acc: 0.967999992371
```
We get a test accuracy of about 97%. In the original Kaggle competition on this dataset, this would have been among the best results.
Remarkably, we achieved this using only a small fraction of the training data (about 10%). There is a big difference between training on 20,000 samples and training on 2,000 samples.