Project Summary 2: Face Recognition Project (Face Recognition for the Happy House)

Published: 2025/4/14


I. Face verification vs. face recognition

1. Face verification:

        Input        Database
        Image        Image
        ID           ID

The input ID is used to look up the Image stored in the database, and that Image is then compared with the input Image to decide whether both show the same person. This is a one-to-one (1:1) problem and can be solved with supervised learning. Example: the access gates at a high-speed rail station.
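The 1:1 check described above can be sketched in a few lines. This is only an illustration under stated assumptions: images are assumed to have already been turned into small feature vectors, and the names `distance`, `verify`, `database` and the threshold value are hypothetical rather than taken from the original project.

```python
import math

def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def verify(input_vector, claimed_id, database, threshold=0.7):
    """1:1 check: compare the input only with the entry stored under claimed_id."""
    stored = database[claimed_id]            # look up by the ID supplied with the image
    dist = distance(input_vector, stored)
    return dist, dist < threshold            # small distance means "same person"

# Toy 3-dimensional vectors standing in for real face features (assumption).
database = {"alice": [0.6, 0.8, 0.0], "bob": [0.0, 0.6, 0.8]}
query = [0.6, 0.7, 0.1]

dist_alice, ok_alice = verify(query, "alice", database)
dist_bob, ok_bob = verify(query, "bob", database)
```

Only one database entry is touched per call, which is what makes verification the easier of the two problems.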

2. Face recognition:

        Input        Database
        Image        Image × 100
                     ID × 100

Suppose the database holds 100 images. Compute the distance function d between the input image and every image in the database: if d > threshold τ, the two images show different people; if d < τ, they show the same person. This is a one-to-K (1:K) problem and requires solving the one-shot learning problem, which means that in most face recognition applications you must be able to recognize a person from a single image or a single face example. Example: the access system for Baidu employees demonstrated by Andrew Ng.
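A minimal sketch of the 1:K search follows. It assumes each image has already been reduced to a feature vector and uses Euclidean distance as a stand-in for d; the names `recognize`, `database` and `tau` are hypothetical, and the choice to return the closest entry under the threshold is one reasonable reading of the rule above, not necessarily the original implementation.

```python
import math

def distance(u, v):
    """Stand-in for the d function: Euclidean distance between feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def recognize(query, database, tau):
    """1:K search: return (person_id, d) for the closest entry, or (None, d) if d >= tau."""
    best_id, best_dist = None, float("inf")
    for person_id, stored in database.items():
        d = distance(query, stored)
        if d < best_dist:                    # keep the closest entry seen so far
            best_id, best_dist = person_id, d
    if best_dist < tau:
        return best_id, best_dist
    return None, best_dist                   # nobody in the database is close enough

# Toy database with two enrolled people (assumption; the text uses 100 images).
database = {"alice": [0.6, 0.8, 0.0], "bob": [0.0, 0.6, 0.8]}

known = recognize([0.6, 0.7, 0.1], database, tau=0.7)
unknown = recognize([-1.0, 0.0, 0.0], database, tau=0.7)
```

Because every enrolled person contributes a comparison, a per-comparison error rate that is acceptable for verification can add up to many mistakes here, which is why recognition is the harder problem.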

II. Model

By using a 128-neuron fully connected layer as its last layer, the model ensures that the output is an encoding vector of size 128. By computing the distance between two encodings and thresholding it (at 0.7), you can determine whether the two pictures represent the same person.

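1. Siamese network

Running the same convolutional neural network on two different inputs and then comparing the results is generally called a Siamese network architecture. How is such a network trained? Remember that the two networks share the same parameters, so in practice you train a single network. The encoding it computes is used to calculate a distance, and that distance tells you whether two images show the same person.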
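The parameter sharing can be illustrated with a toy encoder. This is a sketch only: a tiny fixed linear map stands in for the convolutional network, and `WEIGHTS`, `encode` and `siamese_distance` are hypothetical names chosen for this example.

```python
import math

# One set of parameters, used for BOTH inputs (toy values, not trained weights).
WEIGHTS = [[0.5, -0.5, 0.0, 1.0],
           [1.0, 0.0, -1.0, 0.5]]

def encode(x, weights=WEIGHTS):
    """Toy 'network': linear map followed by L2 normalization of the encoding."""
    raw = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    norm = math.sqrt(sum(r * r for r in raw)) or 1.0   # avoid division by zero
    return [r / norm for r in raw]

def siamese_distance(x1, x2):
    """Run the same encoder on both inputs, then compare the two encodings."""
    f1, f2 = encode(x1), encode(x2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [4.0, 3.0, 2.0, 1.0]
d_same = siamese_distance(x1, x1)
d_diff = siamese_distance(x1, x2)
```

Since both branches call the same `encode` with the same `WEIGHTS`, there is only one set of parameters to train, and identical inputs always map to distance zero.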

2. Inception model

(1) Inception module

Combining these modules builds the Inception network.

(2) Code implementing the Inception network

import tensorflow as tf
import numpy as np
import os
from numpy import genfromtxt
from keras import backend as K
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
import fr_utils
from keras.layers.core import Lambda, Flatten, Dense


def inception_block_1a(X):
    """Implementation of an inception block"""
    X_3x3 = Conv2D(96, (1, 1), data_format='channels_first', name='inception_3a_3x3_conv1')(X)
    X_3x3 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3a_3x3_bn1')(X_3x3)
    X_3x3 = Activation('relu')(X_3x3)
    X_3x3 = ZeroPadding2D(padding=(1, 1), data_format='channels_first')(X_3x3)
    X_3x3 = Conv2D(128, (3, 3), data_format='channels_first', name='inception_3a_3x3_conv2')(X_3x3)
    X_3x3 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3a_3x3_bn2')(X_3x3)
    X_3x3 = Activation('relu')(X_3x3)

    X_5x5 = Conv2D(16, (1, 1), data_format='channels_first', name='inception_3a_5x5_conv1')(X)
    X_5x5 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3a_5x5_bn1')(X_5x5)
    X_5x5 = Activation('relu')(X_5x5)
    X_5x5 = ZeroPadding2D(padding=(2, 2), data_format='channels_first')(X_5x5)
    X_5x5 = Conv2D(32, (5, 5), data_format='channels_first', name='inception_3a_5x5_conv2')(X_5x5)
    X_5x5 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3a_5x5_bn2')(X_5x5)
    X_5x5 = Activation('relu')(X_5x5)

    X_pool = MaxPooling2D(pool_size=3, strides=2, data_format='channels_first')(X)
    X_pool = Conv2D(32, (1, 1), data_format='channels_first', name='inception_3a_pool_conv')(X_pool)
    X_pool = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3a_pool_bn')(X_pool)
    X_pool = Activation('relu')(X_pool)
    X_pool = ZeroPadding2D(padding=((3, 4), (3, 4)), data_format='channels_first')(X_pool)

    X_1x1 = Conv2D(64, (1, 1), data_format='channels_first', name='inception_3a_1x1_conv')(X)
    X_1x1 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3a_1x1_bn')(X_1x1)
    X_1x1 = Activation('relu')(X_1x1)

    # CONCAT
    inception = concatenate([X_3x3, X_5x5, X_pool, X_1x1], axis=1)

    return inception


def inception_block_1b(X):
    X_3x3 = Conv2D(96, (1, 1), data_format='channels_first', name='inception_3b_3x3_conv1')(X)
    X_3x3 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3b_3x3_bn1')(X_3x3)
    X_3x3 = Activation('relu')(X_3x3)
    X_3x3 = ZeroPadding2D(padding=(1, 1), data_format='channels_first')(X_3x3)
    X_3x3 = Conv2D(128, (3, 3), data_format='channels_first', name='inception_3b_3x3_conv2')(X_3x3)
    X_3x3 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3b_3x3_bn2')(X_3x3)
    X_3x3 = Activation('relu')(X_3x3)

    X_5x5 = Conv2D(32, (1, 1), data_format='channels_first', name='inception_3b_5x5_conv1')(X)
    X_5x5 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3b_5x5_bn1')(X_5x5)
    X_5x5 = Activation('relu')(X_5x5)
    X_5x5 = ZeroPadding2D(padding=(2, 2), data_format='channels_first')(X_5x5)
    X_5x5 = Conv2D(64, (5, 5), data_format='channels_first', name='inception_3b_5x5_conv2')(X_5x5)
    X_5x5 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3b_5x5_bn2')(X_5x5)
    X_5x5 = Activation('relu')(X_5x5)

    X_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3), data_format='channels_first')(X)
    X_pool = Conv2D(64, (1, 1), data_format='channels_first', name='inception_3b_pool_conv')(X_pool)
    X_pool = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3b_pool_bn')(X_pool)
    X_pool = Activation('relu')(X_pool)
    X_pool = ZeroPadding2D(padding=(4, 4), data_format='channels_first')(X_pool)

    X_1x1 = Conv2D(64, (1, 1), data_format='channels_first', name='inception_3b_1x1_conv')(X)
    X_1x1 = BatchNormalization(axis=1, epsilon=0.00001, name='inception_3b_1x1_bn')(X_1x1)
    X_1x1 = Activation('relu')(X_1x1)

    inception = concatenate([X_3x3, X_5x5, X_pool, X_1x1], axis=1)

    return inception


def inception_block_1c(X):
    X_3x3 = fr_utils.conv2d_bn(X,
                               layer='inception_3c_3x3',
                               cv1_out=128,
                               cv1_filter=(1, 1),
                               cv2_out=256,
                               cv2_filter=(3, 3),
                               cv2_strides=(2, 2),
                               padding=(1, 1))

    X_5x5 = fr_utils.conv2d_bn(X,
                               layer='inception_3c_5x5',
                               cv1_out=32,
                               cv1_filter=(1, 1),
                               cv2_out=64,
                               cv2_filter=(5, 5),
                               cv2_strides=(2, 2),
                               padding=(2, 2))

    X_pool = MaxPooling2D(pool_size=3, strides=2, data_format='channels_first')(X)
    X_pool = ZeroPadding2D(padding=((0, 1), (0, 1)), data_format='channels_first')(X_pool)

    inception = concatenate([X_3x3, X_5x5, X_pool], axis=1)

    return inception


def inception_block_2a(X):
    X_3x3 = fr_utils.conv2d_bn(X,
                               layer='inception_4a_3x3',
                               cv1_out=96,
                               cv1_filter=(1, 1),
                               cv2_out=192,
                               cv2_filter=(3, 3),
                               cv2_strides=(1, 1),
                               padding=(1, 1))

    X_5x5 = fr_utils.conv2d_bn(X,
                               layer='inception_4a_5x5',
                               cv1_out=32,
                               cv1_filter=(1, 1),
                               cv2_out=64,
                               cv2_filter=(5, 5),
                               cv2_strides=(1, 1),
                               padding=(2, 2))

    X_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3), data_format='channels_first')(X)
    X_pool = fr_utils.conv2d_bn(X_pool,
                                layer='inception_4a_pool',
                                cv1_out=128,
                                cv1_filter=(1, 1),
                                padding=(2, 2))

    X_1x1 = fr_utils.conv2d_bn(X,
                               layer='inception_4a_1x1',
                               cv1_out=256,
                               cv1_filter=(1, 1))

    inception = concatenate([X_3x3, X_5x5, X_pool, X_1x1], axis=1)

    return inception


def inception_block_2b(X):
    # inception4e
    X_3x3 = fr_utils.conv2d_bn(X,
                               layer='inception_4e_3x3',
                               cv1_out=160,
                               cv1_filter=(1, 1),
                               cv2_out=256,
                               cv2_filter=(3, 3),
                               cv2_strides=(2, 2),
                               padding=(1, 1))

    X_5x5 = fr_utils.conv2d_bn(X,
                               layer='inception_4e_5x5',
                               cv1_out=64,
                               cv1_filter=(1, 1),
                               cv2_out=128,
                               cv2_filter=(5, 5),
                               cv2_strides=(2, 2),
                               padding=(2, 2))

    X_pool = MaxPooling2D(pool_size=3, strides=2, data_format='channels_first')(X)
    X_pool = ZeroPadding2D(padding=((0, 1), (0, 1)), data_format='channels_first')(X_pool)

    inception = concatenate([X_3x3, X_5x5, X_pool], axis=1)

    return inception


def inception_block_3a(X):
    X_3x3 = fr_utils.conv2d_bn(X,
                               layer='inception_5a_3x3',
                               cv1_out=96,
                               cv1_filter=(1, 1),
                               cv2_out=384,
                               cv2_filter=(3, 3),
                               cv2_strides=(1, 1),
                               padding=(1, 1))

    X_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3), data_format='channels_first')(X)
    X_pool = fr_utils.conv2d_bn(X_pool,
                                layer='inception_5a_pool',
                                cv1_out=96,
                                cv1_filter=(1, 1),
                                padding=(1, 1))

    X_1x1 = fr_utils.conv2d_bn(X,
                               layer='inception_5a_1x1',
                               cv1_out=256,
                               cv1_filter=(1, 1))

    inception = concatenate([X_3x3, X_pool, X_1x1], axis=1)

    return inception


def inception_block_3b(X):
    X_3x3 = fr_utils.conv2d_bn(X,
                               layer='inception_5b_3x3',
                               cv1_out=96,
                               cv1_filter=(1, 1),
                               cv2_out=384,
                               cv2_filter=(3, 3),
                               cv2_strides=(1, 1),
                               padding=(1, 1))

    X_pool = MaxPooling2D(pool_size=3, strides=2, data_format='channels_first')(X)
    X_pool = fr_utils.conv2d_bn(X_pool,
                                layer='inception_5b_pool',
                                cv1_out=96,
                                cv1_filter=(1, 1))
    X_pool = ZeroPadding2D(padding=(1, 1), data_format='channels_first')(X_pool)

    X_1x1 = fr_utils.conv2d_bn(X,
                               layer='inception_5b_1x1',
                               cv1_out=256,
                               cv1_filter=(1, 1))

    inception = concatenate([X_3x3, X_pool, X_1x1], axis=1)

    return inception


def faceRecoModel(input_shape):
    """
    Implementation of the Inception model used for FaceNet

    Arguments:
    input_shape -- shape of the images of the dataset

    Returns:
    model -- a Model() instance in Keras
    """

    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)

    # First Block
    X = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(X)
    X = BatchNormalization(axis=1, name='bn1')(X)
    X = Activation('relu')(X)

    # Zero-Padding + MAXPOOL
    X = ZeroPadding2D((1, 1))(X)
    X = MaxPooling2D((3, 3), strides=2)(X)

    # Second Block
    X = Conv2D(64, (1, 1), strides=(1, 1), name='conv2')(X)
    X = BatchNormalization(axis=1, epsilon=0.00001, name='bn2')(X)
    X = Activation('relu')(X)

    # Zero-Padding + MAXPOOL
    X = ZeroPadding2D((1, 1))(X)

    # Second Block
    X = Conv2D(192, (3, 3), strides=(1, 1), name='conv3')(X)
    X = BatchNormalization(axis=1, epsilon=0.00001, name='bn3')(X)
    X = Activation('relu')(X)

    # Zero-Padding + MAXPOOL
    X = ZeroPadding2D((1, 1))(X)
    X = MaxPooling2D(pool_size=3, strides=2)(X)

    # Inception 1: a/b/c
    X = inception_block_1a(X)
    X = inception_block_1b(X)
    X = inception_block_1c(X)

    # Inception 2: a/b
    X = inception_block_2a(X)
    X = inception_block_2b(X)

    # Inception 3: a/b
    X = inception_block_3a(X)
    X = inception_block_3b(X)

    # Top layer
    X = AveragePooling2D(pool_size=(3, 3), strides=(1, 1), data_format='channels_first')(X)
    X = Flatten()(X)
    X = Dense(128, name='dense_layer')(X)

    # L2 normalization
    X = Lambda(lambda x: K.l2_normalize(x, axis=1))(X)

    # Create model instance
    model = Model(inputs=X_input, outputs=X, name='FaceRecoModel')

    return model
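Each block above ends by concatenating its branches along axis 1, the channel axis, so a block's output depth is the sum of its branch depths. The short trace below works that arithmetic through the seven blocks. It is a sketch under two assumptions: that `fr_utils.conv2d_bn` outputs `cv2_out` channels when that argument is given and `cv1_out` otherwise, and that a pooling branch without a convolution keeps the channel count of the block input. `BLOCKS` and `trace_channels` are hypothetical names used only here.

```python
# Output channels of each branch, read from the last convolution in that branch.
# "input" marks a pooling-only branch, which keeps the block's input channels.
BLOCKS = [
    ("inception_block_1a", [128, 32, 32, 64]),
    ("inception_block_1b", [128, 64, 64, 64]),
    ("inception_block_1c", [256, 64, "input"]),
    ("inception_block_2a", [192, 64, 128, 256]),
    ("inception_block_2b", [256, 128, "input"]),
    ("inception_block_3a", [384, 96, 256]),
    ("inception_block_3b", [384, 96, 256]),
]

def trace_channels(blocks, in_channels):
    """Return {block name: output channels} for blocks applied one after another."""
    result = {}
    for name, branches in blocks:
        # Concatenation along the channel axis adds up the branch depths.
        out = sum(in_channels if b == "input" else b for b in branches)
        result[name] = out
        in_channels = out                    # the next block consumes this output
    return result

# conv3 in faceRecoModel has 192 filters, so the first block sees 192 channels.
channels = trace_channels(BLOCKS, in_channels=192)
for name, depth in channels.items():
    print(name, depth)
```

Under these assumptions the last block produces 736 channels, which the top layers then pool, flatten, and project down to the 128-dimensional encoding.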

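3. Loss function: the triplet loss

Training will use triplets of images $(A, P, N)$:

A is an "Anchor" image--a picture of a person.
P is a "Positive" image--a picture of the same person as the Anchor image.
N is a "Negative" image--a picture of a different person than the Anchor image.

These triplets are picked from our training dataset. We will write $(A^{(i)}, P^{(i)}, N^{(i)})$ to denote the $i$-th training example.

You'd like to make sure that an image $A^{(i)}$ of an individual is closer to the Positive $P^{(i)}$ than to the Negative image $N^{(i)}$ by at least a margin $\alpha$:

$$\| f(A^{(i)}) - f(P^{(i)}) \|_2^2 + \alpha < \| f(A^{(i)}) - f(N^{(i)}) \|_2^2$$

You would thus like to minimize the following "triplet cost":

$$\mathcal{J} = \sum_{i=1}^{m} \Big[ \underbrace{\| f(A^{(i)}) - f(P^{(i)}) \|_2^2}_{(1)} - \underbrace{\| f(A^{(i)}) - f(N^{(i)}) \|_2^2}_{(2)} + \alpha \Big]_+$$

Here, we are using the notation "$[z]_+$" to denote $\max(z, 0)$.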

Notes:

The term (1) is the squared distance between the anchor "A" and the positive "P" for a given triplet; you want this to be small.
The term (2) is the squared distance between the anchor "A" and the negative "N" for a given triplet; you want this to be relatively large, so it makes sense to have a minus sign preceding it.

Most implementations also normalize the encoding vectors to have norm equal to one (i.e., $\| f(img) \|_2 = 1$).
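The triplet cost can be sketched in plain Python as follows. Term (1) and term (2) from the notes above appear as `pos_dist` and `neg_dist`, a margin `alpha` is added, and each triplet's contribution is clipped at zero before summing. The function names and the default `alpha=0.2` are assumptions for illustration; the project itself passes a `triplet_loss` function to Keras, whose exact code is not shown in this article.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchors, positives, negatives, alpha=0.2):
    """Sum over triplets of max(term(1) - term(2) + alpha, 0)."""
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        pos_dist = squared_distance(a, p)    # term (1): should be small
        neg_dist = squared_distance(a, n)    # term (2): should be large
        total += max(pos_dist - neg_dist + alpha, 0.0)
    return total

# Triplet 1: positive coincides with the anchor, negative is far away -> no loss.
# Triplet 2: positive and negative are equally far from the anchor -> loss = alpha.
anchors = [[1.0, 0.0], [0.0, 0.0]]
positives = [[1.0, 0.0], [1.0, 0.0]]
negatives = [[0.0, 1.0], [0.0, 1.0]]

loss = triplet_loss(anchors, positives, negatives)
```

A triplet stops contributing once the negative is farther from the anchor than the positive by at least `alpha`, so training effort concentrates on the triplets that still violate the margin.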

4. Model compile

FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
load_weights_from_FaceNet(FRmodel)

The pretrained model we use is inspired by Victor Sy Wang's implementation and was loaded using his code: https://github.com/iwantooxxoox/Keras-OpenFace.

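5. Input and output data types

(1) Input data: This network uses 96x96 dimensional RGB images as its input. Specifically, it takes a face image (or batch of $m$ face images) as a tensor of shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$.

(2) Output data: It outputs a matrix of shape $(m, 128)$ that encodes each input face image into a 128-dimensional vector.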
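Since the model code uses `data_format='channels_first'`, an image stored as height × width × channel has to be reordered before it is fed to the network. The sketch below shows that reordering on nested Python lists so the shapes can be checked without any library; `hwc_to_chw` and `shape` are hypothetical helpers, and a real pipeline would more likely use an array library for this step.

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from (H, W, C) to (C, H, W)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[r][c][ch] for c in range(width)] for r in range(height)]
            for ch in range(channels)]

def shape(nested):
    """Shape of a regular nested list, e.g. (1, 3, 96, 96)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# Synthetic 96x96 RGB image: channel 0 holds the row index, channel 1 the column index.
image = [[[r, c, 0] for c in range(96)] for r in range(96)]

batch = [hwc_to_chw(image)]                  # add the batch dimension m = 1
batch_shape = shape(batch)
```

Here `batch_shape` comes out as `(1, 3, 96, 96)`, matching the input shape with m = 1.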

6. Summary

Face verification solves an easier 1:1 matching problem; face recognition addresses a harder 1:K matching problem.
The triplet loss is an effective loss function for training a neural network to learn an encoding of a face image.
The same encoding can be used for verification and recognition. Measuring distances between two images' encodings allows you to determine whether they are pictures of the same person.

Reposted from: https://www.cnblogs.com/hezhiyao/p/8653081.html
