Human Emotion and Gesture Detector Using Deep Learning: Part 1
Emotion Gesture Detection
Have you ever looked at someone and tried to work out what emotion they were feeling or what gesture they were making, only to end up confused? Maybe you once tried to approach a baby that looked like this:
You thought it liked you and just wanted a cuddle, so you ended up picking it up, and then this happened!
Source: Brytny.com - Unsplash
Oops! That did not work out as planned. But real-life use cases may not be as simple as the situation above and may require more precise human emotion analysis as well as gesture analysis. This field of application is especially useful in any department where customer satisfaction, or simply knowing what the customer wants, is extremely important.
Today we will be uncovering a couple of deep learning models that do exactly that. The models we will develop can identify several human emotions as well as a few gestures. We will try to identify 6 emotions, namely angry, happy, neutral, fear, sad, and surprise. We will also identify 4 types of gestures: loser, victory, super, and punch. The models will run in real time, and we will get a real-time vocal response from them.
The emotions model will be built using convolutional neural networks from scratch, and for finger gestures I will use transfer learning with the VGG-16 architecture, adding custom layers to improve the performance of the model and achieve higher accuracy. The emotion analysis and finger gestures will each trigger an appropriate vocal as well as text response. The metric we will use is accuracy, and we will try to achieve a validation accuracy of at least 50% for emotions model-1, over 65% for emotions model-2, and over 90% for the gestures model.
Datasets:
Let us now look at the dataset choices we have available to us.
1. Kaggle's fer2013 dataset — The dataset is an open-source dataset that contains 35,887 grayscale images of various emotions, all labeled and of size 48x48. The Facial Expression Recognition dataset was published during the International Conference on Machine Learning (ICML). This Kaggle dataset will be the primary dataset used for emotion analysis in this case study.
The dataset is given as a spreadsheet in .csv format from which the pixels have to be extracted. After extracting the pixels and pre-processing the data, the dataset looks like the image posted below:
Source: Image by author. (Refer to this link in case the first link is not working.)
2. The first Affect-in-the-wild challenge — This can be a secondary dataset considered for this case study. The first Affect-in-the-wild challenge builds on state-of-the-art deep neural architectures, including AffWildNet, which allows us to exploit the Aff-Wild database for learning features that can be used as priors for achieving the best performance in dimensional and categorical emotion recognition. In the download link, we will find a tar.gz file that contains 4 folders named: videos, annotations, boxes, and landmarks. However, for our emotion recognition model, we will strictly consider only the fer2013 dataset.
3. ASL Alphabet dataset — This will be the primary dataset for finger gesture detection. The “American Sign Language” Alphabet dataset consists of a collection of images of the letters of the American Sign Language alphabet, separated into 29 folders that represent the various classes. The training data set contains 87,000 images of 200x200 pixels.
There are 29 classes, of which 26 are for the letters A-Z and 3 classes are for SPACE, DELETE, and NOTHING. These 3 classes are very helpful in real-time applications and classification. However, for our gesture recognition we will use 4 of the A-Z classes from this data for the required finger actions. The model will be trained to recognize 4 specific hand gestures, which are A (punch), F (super), L (loser), and V (victory). We will then train our model to recognize these gestures and give an appropriate vocal response for each of them.
4. Custom Datasets — For both tasks, i.e. emotion analysis and finger gesture detection, we can also use custom datasets of ourselves, friends, or even family for the recognition of various sentiments as well as hand gestures. The images taken will be grayscaled and then resized according to our requirements.
Pre-processing:
For our emotions model we will use Kaggle's fer2013 dataset, and we will use the ASL dataset for gesture identification. We can now begin performing the pre-processing required for the models. For the emotions dataset, let us first look at the libraries required for pre-processing.
Pandas is a fast, flexible, open-source data analysis library that we will use to access the .csv files.
NumPy is used for processing multi-dimensional arrays. For our data pre-processing, we will use NumPy to build an array of the pixel features.
The os module provides us with a way to interact with the operating system.
The cv2 module is the computer vision (OpenCV) module we will use to convert the NumPy arrays of pixels into visual images.
tqdm is an optional library that we can use to visualize the processing speed (iterations per second).
Now let us read the fer2013.csv file using pandas.
We read the fer2013.csv file using pandas. fer2013 is the facial expression recognition .csv file from Kaggle. In the .csv file we have 3 main columns: emotion, pixels, and Usage. The emotion column consists of labels 0–6. The pixels column contains the pixel images in an array format. The Usage column contains Training, PublicTest, and PrivateTest. Let us have a closer look at this.
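A minimal sketch of this step might look like the following (the file name fer2013.csv and its location in the working directory are assumptions; adjust the path to wherever you downloaded the dataset):

```python
import pandas as pd

# Assumed path to the Kaggle fer2013 file in the current working directory.
df = pd.read_csv("fer2013.csv")

# The three columns we rely on: emotion (labels 0-6), pixels (a space-separated
# string of 48*48 grayscale values), and Usage (Training / PublicTest / PrivateTest).
print(df.columns.tolist())
print(df["Usage"].value_counts())
print(df.head())
```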
The Labels are in the range 0–6 where:
標簽的范圍是0–6,其中:
0 = Angry, 1 = Disgust, 2 = Fear, 3 = Happy,
4 = Sad, 5 = Surprise, 6 = Neutral.
The pixels column consists of the pixel values, which we can convert into array form and then turn into an actual image we can visualize using the OpenCV module cv2. The Usage column consists of Training, PublicTest, and PrivateTest. We will use the Training rows to populate the training dataset, and the remaining PublicTest and PrivateTest rows will be used for storing images in a validation folder.
Now let us extract these images accordingly. In the code block below I will show the procedure for one class, for both train and validation. In this code block we extract the images from the pixels column and then create train and validation folders, which can be tracked from the Usage column. Inside each of the train and validation directories we create all 7 folders: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
We loop through the dataset, convert the pixels from string to float, and store all the float values in a NumPy array. We then reshape the array to 48x48, which is our desired image size. (This step is optional because the given pixels are already of the desired size.)
If the Usage is given as Training, then we make a train directory as well as separate directories for each of the emotions. We store the images in the correct emotion directory, which can be determined from the labels of the emotion column.
These steps are repeated similarly for the validation directory, for which we consider the Usage values PublicTest and PrivateTest. The emotions are categorized by the labels from the emotion column, similar to how the train directory works.
After this step, all the data pre-processing for training the emotions model is complete, and we have successfully extracted all the images required for the emotion recognition model, so we can proceed with the further steps. Luckily we don't have to do a lot of pre-processing for the gestures data. Download the ASL dataset and then create the train1 and validation1 folders as below:
The train1 and validation1 directories have 4 sub-directories labeled as shown. We will use the letter ‘L’ for loser, ‘A’ for punch, ‘F’ for super, and ‘V’ for Victory. Summarizing the letters and gestures below:
L = Loser | A = Punch | F = Super | V = Victory
The ASL data set contains 3,000 images for each letter, so we will use the first 2,400 images for the training process and the remaining 600 images for validation purposes. This way we are splitting the data into an 80:20 train:validation ratio. Paste the first 2,400 images of each of the letters ‘L’, ‘A’, ‘F’, and ‘V’ into their respective sub-directories in the train1 folder, and paste the remaining 600 images of each letter into their respective sub-directories in the validation1 folder.
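If you prefer to script the copying rather than move the files by hand, a small sketch along these lines would do it (the source folder name asl_alphabet_train is an assumption based on how the Kaggle archive usually extracts):

```python
import os
import shutil

asl_dir = "asl_alphabet_train"   # assumed path to the extracted ASL training folder
letters = ["L", "A", "F", "V"]   # Loser, Punch, Super, Victory

for letter in letters:
    files = sorted(os.listdir(os.path.join(asl_dir, letter)))
    # First 2400 images go to training, the remaining 600 to validation (80:20 split).
    for split, subset in (("train1", files[:2400]), ("validation1", files[2400:3000])):
        out_dir = os.path.join(split, letter)
        os.makedirs(out_dir, exist_ok=True)
        for f in subset:
            shutil.copy(os.path.join(asl_dir, letter, f), os.path.join(out_dir, f))
```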
EXPLORATORY DATA ANALYSIS (EDA):
Before starting to train our emotion and gesture models, let us look at the images and the overall data we have in our hands after the pre-processing step. Firstly, we will do EDA on the emotions data and then look into the gestures data. Starting with the emotions data, we will plot a bar graph and a scatter plot to see whether the dataset is balanced, fairly balanced, or totally unbalanced. We will be referring to the train directory.
Bar Graph:
Scatter Plot:
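The plots themselves are not reproduced here, but a simple sketch of how they could be generated from the extracted train directory is shown below (matplotlib and the train folder name are assumptions on my part):

```python
import os
import matplotlib.pyplot as plt

train_dir = "train"  # folder created during the pre-processing step
classes = sorted(os.listdir(train_dir))
counts = [len(os.listdir(os.path.join(train_dir, c))) for c in classes]

# Bar graph of the number of training images per emotion.
plt.figure(figsize=(8, 4))
plt.bar(classes, counts)
plt.title("Training images per emotion")
plt.ylabel("Number of images")
plt.show()

# Scatter plot of the same per-class counts.
plt.figure(figsize=(8, 4))
plt.scatter(classes, counts)
plt.title("Training images per emotion (scatter)")
plt.show()
```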
We can notice that this is a fairly balanced dataset, except that the images for “disgust” are comparatively few. For our first emotions model, we will drop this emotion completely and only consider the remaining 6 emotions. Now let us look at what the train and validation directories for our emotions dataset look like.
Train:
The Bar Graph and the scatter plot of the train data are as shown below:
The train images from each class of the dataset are as shown below:
Validation:
The bar graph and the scatter plot of the validation data are as shown below:
The validation images from each class of the dataset are as shown below:
With our emotions dataset analyzed, we can move on to the gestures dataset, perform a similar analysis, and understand the gestures dataset as well. Since the gestures dataset is completely balanced for both train and validation, it is easier to analyze. Both the train and validation data for the gestures dataset will be analyzed in the next part, and similar images will be displayed as well.
This completes our exploratory data analysis for the emotions model. We can now start building our models for emotion recognition. Firstly, we will build an emotions model using image data augmentation, and then we will build the gestures model. Later, we will build a second emotions model directly from the .csv file and try to obtain a higher accuracy. In the end, we will create a final model to run the entire script.
Emotion Model-1:
In model-1, we will be using data augmentation techniques. The formal definition of data augmentation is as follows:
Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data. Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks.
Reference: bair.berkeley.edu
We will now proceed to import the required libraries and specify some parameters which will be needed for training the model.
Import all the important deep learning libraries required to train the emotions model. Keras is an Application Programming Interface (API) that can run on top of TensorFlow, and TensorFlow will be the main deep learning module we use to build our model. The ImageDataGenerator is used for data augmentation, so the model can see more copies of the data; data augmentation creates replications of the original images and uses those transformations in each epoch. The layers used for training are as follows:
1. Input = The input layer in which we pass the input shape.
2. Conv2D = The convolutional layer, combined with Input to provide an output of tensors.
3. MaxPool2D = Downsamples the data from the convolutional layer.
4. BatchNormalization = A technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks.
5. Dropout = A technique where randomly selected neurons are ignored during training. They are “dropped out” randomly, and this prevents over-fitting.
6. Dense = Fully connected layers.
7. Flatten = Flattens the entire structure into a 1-D array.
The models can be built in a functional (model-like) structure or in a sequential manner. We use l2 regularization for fine-tuning. The optimizer used will be Adam, as it performs better than the other optimizers on this model. NumPy is used for numerical array-like operations, pydot_ng and Graphviz are used for making plots, and we also import the os module to make the code compatible with the Windows environment.
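An import block matching that description might look like this; treat it as a sketch, since exact module paths can vary slightly between TensorFlow/Keras versions:

```python
import os
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, MaxPool2D, BatchNormalization,
                                     Dropout, Dense, Flatten)
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
# pydot_ng and Graphviz are only needed if you want to plot the model architecture.
```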
num_classes defines the number of classes we have to predict, namely Angry, Fear, Happy, Sad, Surprise, and Neutral. From the exploratory data analysis we know the dimensions of the images: image height = 48 pixels, image width = 48 pixels, and number of channels = 1 because the images are grayscale. We will use a batch size of 32 for training with image augmentation.
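In code, these parameters could be set up as follows (the variable names are my own choice):

```python
num_classes = 6               # Angry, Fear, Happy, Sad, Surprise, Neutral
img_rows, img_cols = 48, 48   # image height and width from the EDA
num_channels = 1              # grayscale images
batch_size = 32               # batch size for the augmented training data
```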
Specify the train and the validation directories for the stored images. train_dir is the directory that contains the set of images for training, and validation_dir is the directory that contains the set of validation images.
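For example, assuming the folders created during pre-processing are named train and validation:

```python
train_dir = "train"            # extracted training images, one sub-folder per emotion
validation_dir = "validation"  # extracted PublicTest + PrivateTest images
```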
DATA AUGMENTATION:
We will look at the data augmentation code now:
The ImageDataGenerator is used for data augmentation of images. We will be replicating and making copies of the transformations of the original images. The Keras data generator will use the copies and not the originals, which is useful for training at each epoch. We will rescale the images and set all the parameters to suit our model. The parameters are as follows:
1. rescale = Rescaling by 1./255 to normalize each of the pixel values.
2. rotation_range = Specifies the random range of rotation.
3. shear_range = Specifies the intensity of each angle in the counter-clockwise range.
4. zoom_range = Specifies the zoom range.
5. width_shift_range = Specifies the range of horizontal shifts.
6. height_shift_range = Specifies the range of vertical shifts.
7. horizontal_flip = Flips the images horizontally.
8. fill_mode = Fills new pixels according to the closest boundaries.
train_datagen.flow_from_directory takes the path to a directory and generates batches of augmented data. The callable properties are as follows:
1. train_dir = Specifies the directory where we have stored the image data.
2. color_mode = Specifies how our images are read, i.e. grayscale or RGB format. The default is RGB.
3. target_size = The dimensions of the image.
4. batch_size = The number of samples per batch for the flow operation.
5. class_mode = Determines the type of label arrays that are returned. “categorical” will be 2D one-hot encoded labels.
6. shuffle = Whether to shuffle the data (default: True). If set to False, sorts the data in alphanumeric order.
EMOTIONS MODEL-1:
Now we will proceed towards building the model.
We will be using a sequential type of architecture for our model. Our sequential model will have a total of 5 blocks, i.e. three convolutional blocks, one fully connected block, and one output layer.
We will have 3 convolutional blocks with filters of increasing size: 32, 64, and 128 respectively. The kernel_size will be (3,3) and the kernel_initializer will be he_normal. We can also use a kernel_regularizer with l2 normalization. Our preferred choice of activation is elu because it usually performs better on images. The input shape will be the same as the size of each of our train and validation images.
The Batch Normalization layer — batch normalization is a technique for improving the speed, performance, and stability of artificial neural networks. Max pooling is used to downsample the data. The Dropout layer is used for the prevention of over-fitting.
The fully connected block consists of a Dense layer of 64 units and a batch normalization layer followed by a dropout layer. Before passing through the Dense layer, the data is flattened to match the dimensions.
Finally, the output layer consists of a Dense layer with a softmax activation that gives probabilities over num_classes, which represents the number of predictions to be made.
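A sequential sketch consistent with that description is shown below. The dropout rates and the l2 strength are assumptions on my side; the overall block structure follows the text:

```python
model = Sequential()

# Block-1: 32 filters, batch normalization, pooling and dropout.
model.add(Conv2D(32, (3, 3), activation="elu", padding="same",
                 kernel_initializer="he_normal", kernel_regularizer=l2(0.01),
                 input_shape=(img_rows, img_cols, num_channels)))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

# Block-2: 64 filters.
model.add(Conv2D(64, (3, 3), activation="elu", padding="same",
                 kernel_initializer="he_normal", kernel_regularizer=l2(0.01)))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

# Block-3: 128 filters.
model.add(Conv2D(128, (3, 3), activation="elu", padding="same",
                 kernel_initializer="he_normal", kernel_regularizer=l2(0.01)))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

# Fully connected block: flatten, Dense(64), batch normalization, dropout.
model.add(Flatten())
model.add(Dense(64, activation="elu", kernel_initializer="he_normal"))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# Output layer: one softmax probability per emotion class.
model.add(Dense(num_classes, activation="softmax"))

model.summary()
```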
Model Plot:
This is what the overall model we built looks like:
Callbacks:
We will import the 3 callbacks required for training our model. The 3 important callbacks are ModelCheckpoint, ReduceLROnPlateau, and TensorBoard. Let us look at what task each of these callbacks performs.
ModelCheckpoint — This callback is used for saving the weights of our model while training. We save only the best weights of our model by specifying save_best_only=True. We will monitor our training using the accuracy metric.
ReduceLROnPlateau — This callback is used for reducing the learning rate of the optimizer when the monitored metric has stopped improving for a specified number of epochs. Here, we have specified the patience as 10: if the accuracy does not improve after 10 epochs, then our learning rate is reduced accordingly by a factor of 0.2. The metric used for monitoring here is accuracy as well.
Tensorboard — The TensorBoard callback is used for plotting the visualization of the training graphs, namely the graph plots for accuracy and loss.
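A possible configuration of the three callbacks is sketched below (the checkpoint file name emotions.h5 comes from the compile-and-fit section that follows; the log directory is an assumption):

```python
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard

checkpoint = ModelCheckpoint("emotions.h5",
                             monitor="accuracy",   # metric monitored, per the text
                             save_best_only=True,
                             verbose=1)

reduce_lr = ReduceLROnPlateau(monitor="accuracy",
                              factor=0.2,          # shrink the learning rate by this factor
                              patience=10,         # wait 10 epochs without improvement
                              verbose=1)

tensorboard = TensorBoard(log_dir="./logs")        # assumed log directory

callbacks = [checkpoint, reduce_lr, tensorboard]
```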
Compile and fit the model:
We compile and fit our model in the final step. Here, we train the model and save the best weights to emotions.h5, so that we don't have to re-train the model repeatedly and can load the saved model when required. We will train on both the training and validation data. The loss we use is categorical_crossentropy, which computes the cross-entropy loss between the labels and predictions. The optimizer we will use is Adam with a learning rate of 0.001, and we will compile our model with the accuracy metric. We fit the model on the augmented training and validation images, as sketched below. After the fitting step, the results we are able to achieve on train and validation loss and accuracy are shown in the graph that follows.
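A sketch of the compile and fit step (the number of epochs is an assumption, not necessarily the original value):

```python
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=0.001),
              metrics=["accuracy"])

history = model.fit(train_generator,
                    epochs=50,                       # assumed epoch count
                    validation_data=validation_generator,
                    callbacks=callbacks)
```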
Graph:
Observation:
The model is able to perform quite well. We can notice that the train and validation losses decrease steadily, and the train as well as validation accuracy increases steadily. There is no over-fitting in the deep learning model, and we are able to achieve an accuracy of about 51% and a validation accuracy of about 53%.
That's it for the first part, guys! I hope all of you enjoyed reading this as much as I did writing this article. In the next part, we will cover the gestures training model and then also look into a second emotions training model which we can use to achieve a higher accuracy. In the end, we will create a final pipeline to access the models in real time and get a vocal response from the model about the particular emotion or gesture. I will also be posting the GitHub repository for the entire code, scripts, and building blocks. Stay tuned for the next part and have a wonderful day!
Original article: https://towardsdatascience.com/human-emotion-and-gesture-detector-using-deep-learning-part-1-d0023008d0eb