Human Emotion and Gesture Detector Using Deep Learning, Part 2: Finding Gestures in Images
Emotion Gesture Detection
Hello everyone! Welcome back to part 2 of the human emotion and gesture detector using deep learning. In case you haven't already, check out part 1 here. In this article, we will cover the training of our gestures model and also look at a way to achieve higher accuracy on the emotions model. Finally, we will create a final pipeline using computer vision through which we can access our webcam and get a vocal response from the models we have trained. Without further ado, let's start coding and understanding the concepts.
For training the gestures model, we will be using a transfer learning model. We will use the VGG-16 architecture for training the model and exclude the top layer of VGG-16. Then we will proceed to add our own custom layers to improve the accuracy and reduce the loss. We will try to achieve an overall high accuracy of about 95% on our gestures model: we have a fairly balanced dataset, and by using image data augmentation and the VGG-16 transfer learning model, this can be achieved easily and in fewer epochs compared to our emotions model. In a future article, we will cover how exactly the VGG-16 architecture works, but for now let us proceed to analyze the data at hand and perform an exploratory data analysis on the gestures dataset, similar to how we did for the emotions dataset after the extraction of images.
EXPLORATORY DATA ANALYSIS (EDA):
In the next code block, we will look at the contents of the train folder and try to figure out the classes we have, along with the total number of images for each of the gesture categories in the train folder.
Train:
We can look at the four sub-folders we have in the train1 folder. Let us visually look at the number of images in these directories.
Bar Graph:
We can notice from the bar graph that each of the directories contains 2,400 images, so this is a completely balanced dataset. Now, let us proceed to visualize the images in the train directory. We will look at the first image in each of the sub-directories and then check the dimensions and number of channels of the images present in these folders.
The dimensions of the images are as follows:
The height of the image = 200 pixels
The width of the image = 200 pixels
The number of channels = 3
Similarly, we can perform an analysis on the validation1 directory and check what our validation dataset and the validation images look like.
Validation:
Bar Graph:
We can notice from the bar graph that each of the directories contains 600 images, so this is a completely balanced dataset. Now, let us proceed to visualize the images in the validation directory. We will look at the first image in each of the sub-directories. The dimensions and number of channels of the images in these folders are the same as in the train directory.
With this, our exploratory data analysis (EDA) of the gestures dataset is complete. We can proceed to build the gestures training model for appropriate gesture prediction.
Gestures Training Model:
Let us look at the code block below to understand the libraries we are importing, and to set the number of classes along with the image dimensions and the respective directories.
Import all the required deep learning libraries to train the gestures model. Keras is an Application Programming Interface (API) that can run on top of TensorFlow, and TensorFlow will be the main deep learning module we use to build our model. From TensorFlow, we will refer to a pre-trained model called VGG-16. We will use VGG-16 with custom convolutional neural network (CNN) layers, i.e. we will use the transfer learning model VGG-16 alongside our own custom model to train an overall accurate model. The VGG-16 model in Keras is pre-trained with the ImageNet weights.
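As a quick reference, here is a minimal sketch of those imports, assuming TensorFlow 2.x with the bundled Keras API (the exact import style in the original notebook may differ slightly):

```python
import os

import tensorflow as tf
from tensorflow.keras.applications import VGG16                      # pre-trained transfer learning model
from tensorflow.keras.preprocessing.image import ImageDataGenerator  # image data augmentation
from tensorflow.keras.layers import (Input, Conv2D, MaxPool2D, BatchNormalization,
                                     Dropout, Dense, Flatten)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
```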
The ImageDataGenerator is used for data augmentation, so that the model sees more transformed copies of the data. Data augmentation creates replications of the original images and applies those transformations in each epoch. The layers we will use for training are as follows:

1. Input = the input layer, in which we pass the input shape.
2. Conv2D = the convolutional layer, combined with Input to provide an output of tensors.
3. MaxPool2D = downsamples the data coming from the convolutional layer.
4. BatchNormalization = a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This stabilizes the learning process and dramatically reduces the number of training epochs required to train deep networks.
5. Dropout = a technique in which randomly selected neurons are ignored during training. They are "dropped out" randomly, which prevents over-fitting.
6. Dense = fully connected layers.
7. Flatten = flattens the entire structure into a 1-D array.

Models can be built in a model-like (functional) structure, as in this particular model, or in a sequential manner. Here, we will use a functional API structure, which differs from our emotions model, which is a sequential model. We can use l2 regularization for fine-tuning. The optimizer will be Adam, as it performs better than the other optimizers on this model. We also import the os module to make the code compatible with the Windows environment.
We have 4 classes of gestures, namely Punch, Victory, Super, and Loser. Each of the images has a height and width of 200 pixels and is an RGB image, i.e. a 3-dimensional image. We will use a batch_size of 128 for the image data augmentation.
We will also specify the train and validation directories for the stored images. train_dir is the directory that will contain the set of training images, and validation_dir is the directory that will contain the set of validation images.
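A short sketch of these settings; the folder paths are placeholders based on the train1 and validation1 directories mentioned earlier:

```python
# Constants for the gestures model (paths are placeholders for your own folders).
num_classes = 4                 # punch, victory, super, loser
img_rows, img_cols = 200, 200   # image height and width from the EDA
num_channels = 3                # RGB images
batch_size = 128

train_dir = 'train1'            # directory with the training sub-folders
validation_dir = 'validation1'  # directory with the validation sub-folders
```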
DATA AUGMENTATION:
We will look at the image data augmentation for the gestures dataset, which is similar to what we did for the emotions data.
The ImageDataGenerator is used for data augmentation of images. We will be replicating and making copies of the transformations of the original images, and the Keras data generator will use these copies and not the originals. This is useful for training at each epoch. We will rescale the images and set all the parameters to suit our model:

1. rescale = rescaling by 1./255 to normalize each of the pixel values.
2. rotation_range = specifies the random range of rotation in degrees.
3. shear_range = specifies the shear intensity (shear angle in the counter-clockwise direction).
4. zoom_range = specifies the random zoom range.
5. width_shift_range = specifies the range of random horizontal shifts as a fraction of the width.
6. height_shift_range = specifies the range of random vertical shifts as a fraction of the height.
7. horizontal_flip = flips the images horizontally.
8. fill_mode = fills newly created pixels according to the closest boundaries.

train_datagen.flow_from_directory takes the path to a directory and generates batches of augmented data. The callable properties are as follows:

1. train_dir = specifies the directory where we have stored the image data.
2. color_mode = an important argument that specifies how our images are read, i.e. grayscale or RGB format; the default is RGB.
3. target_size = the dimensions of the image.
4. batch_size = the size of the batches of data for the flow operation.
5. class_mode = determines the type of label arrays that are returned; "categorical" will be 2D one-hot encoded labels.
6. shuffle = whether to shuffle the data (default: True); if set to False, the data is sorted in alphanumeric order.
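Putting those parameters together (and continuing from the imports and constants sketched earlier), the two generators might be set up as below; the numeric augmentation ranges are illustrative assumptions rather than the exact values used in the original notebook:

```python
# Augmentation for the training set (ranges are illustrative assumptions).
train_datagen = ImageDataGenerator(
    rescale=1./255,           # normalize pixel values
    rotation_range=30,        # random rotation range in degrees
    shear_range=0.3,          # shear intensity (counter-clockwise)
    zoom_range=0.3,           # random zoom range
    width_shift_range=0.2,    # horizontal shift as a fraction of the width
    height_shift_range=0.2,   # vertical shift as a fraction of the height
    horizontal_flip=True,     # flip images horizontally
    fill_mode='nearest')      # fill new pixels from the closest boundary

validation_datagen = ImageDataGenerator(rescale=1./255)  # only rescale the validation images

train_generator = train_datagen.flow_from_directory(
    train_dir,
    color_mode='rgb',
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    color_mode='rgb',
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)
```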
In the next code block, we import the VGG-16 model into the variable VGG16_MODEL and make sure we load the model without the top layer. Using the VGG-16 architecture without the top layer, we can now add our custom layers. To avoid training the VGG-16 layers, we give the command layers.trainable = False. We will also print out these layers to make sure their trainable attribute is set to False.
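A sketch of loading the frozen VGG-16 base as described:

```python
# Load VGG-16 pre-trained on ImageNet, without its fully connected top layers.
VGG16_MODEL = VGG16(input_shape=(img_rows, img_cols, num_channels),
                    include_top=False,
                    weights='imagenet')

# Freeze the VGG-16 layers so only our custom layers get trained.
for layers in VGG16_MODEL.layers:
    layers.trainable = False

# Verify that every VGG-16 layer is now frozen.
for layers in VGG16_MODEL.layers:
    print(layers.trainable)
```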
FINGER GESTURE MODEL:
Below is the complete code for the custom layers of the finger gesture model we are building:
The finger gesture model we are building will be trained using transfer learning. We will use the VGG-16 model with no top layer, add custom layers on top of it, and then use this transfer learning model for the prediction of finger gestures. The custom head starts from an input that is basically the output of the VGG-16 model. We add a convolutional layer with 32 filters, a kernel_size of (3,3), and default strides of (1,1), using relu activation with he_normal as the initializer. We use a pooling layer to downsample the output of the convolutional layer. Two fully connected (Dense) layers with relu activation follow once the features have been passed through a Flatten layer. The output layer has a softmax activation with num_classes equal to 4, which predicts the probabilities for the classes, namely Punch, Super, Victory, and Loser. The final Model takes the start of the VGG-16 model as its input and the final output layer as its output.
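Based on that description, here is a hedged sketch of the custom head using the functional API; the widths of the two fully connected layers are my assumptions, since the text above does not fix them:

```python
# Custom layers on top of the frozen VGG-16 base (dense-layer widths are assumptions).
x = Conv2D(32, kernel_size=(3, 3), strides=(1, 1), activation='relu',
           kernel_initializer='he_normal')(VGG16_MODEL.output)
x = MaxPool2D(pool_size=(2, 2))(x)        # downsample the convolutional features
x = Flatten()(x)                          # flatten everything into a 1-D array
x = Dense(512, activation='relu')(x)      # first fully connected layer
x = Dense(128, activation='relu')(x)      # second fully connected layer
output = Dense(num_classes, activation='softmax')(x)  # punch, super, victory, loser

model = Model(inputs=VGG16_MODEL.input, outputs=output)
model.summary()
```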
The callbacks are similar to those of the previous emotions model, so let us move directly on to the compilation and training of the gestures model.
Compile and fit the model:
We compile and fit our model in the final step. Here, we train the model and save the best weights to gesturenew.h5, so that we don't have to re-train the model repeatedly and can use the saved model when required. We train on both the training and validation data. The loss used is categorical_crossentropy, which computes the cross-entropy loss between the labels and predictions. The optimizer is Adam with a learning rate of 0.001, and we compile the model with accuracy as the metric. We fit the model on the augmented training and validation images. After the fitting step, these are the results we are able to achieve for the train and validation loss and accuracy.
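A sketch of the compile-and-fit step under these settings; the checkpoint callback stands in for the callbacks mentioned above, and the epoch count is an assumption:

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Save only the best weights to gesturenew.h5, as described above.
checkpoint = ModelCheckpoint('gesturenew.h5', monitor='val_loss',
                             save_best_only=True, verbose=1)

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=0.001),
              metrics=['accuracy'])

# The number of epochs here is an assumption; the original run may differ.
history = model.fit(train_generator,
                    epochs=25,
                    validation_data=validation_generator,
                    callbacks=[checkpoint])
```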
Graph:
Observation:
The model is able to perform extremely well. We can notice that the train and validation losses decrease constantly and that the train as well as validation accuracy increases constantly. There is no over-fitting in the deep learning model, and we are able to achieve a validation accuracy of over 95%.
BONUS:
EMOTIONS MODEL-2:
This is an additional model that we will be looking at. With this method, we can achieve higher accuracy with the exact same model. After some research and experimentation, I found that we could achieve higher accuracy by loading the pixels into numpy arrays and then training on them. There is a wonderful article in which the author has used a similar approach, and I would highly recommend checking out that article as well. Here, we will use this approach with the custom sequential model and see what accuracy we can achieve. Import libraries similar to those of the previous emotions model; refer to the GitHub repository at the end of the post for additional information. Below is the code block for the complete preparation of data for the model.
num_classes = defines the number of classes we have to predict, namely Angry, Fear, Happy, Sad, Surprise, Neutral, and Disgust. From the exploratory data analysis, we know the dimensions of the image: image height = 48 pixels, image width = 48 pixels, number of channels = 1, because it is a grayscale image. We will use a batch size of 64 for the model.
In this method, we convert the pixel strings to lists. We split the data on spaces, take the values as arrays, and reshape them into a 48 x 48 shape. We then expand the dimensions and convert the labels to a categorical matrix.
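A hedged sketch of that preprocessing, assuming a fer2013-style CSV with 'emotion' and 'pixels' columns (the file and column names are assumptions):

```python
import numpy as np
import pandas as pd
from tensorflow.keras.utils import to_categorical

data = pd.read_csv('fer2013.csv')        # assumed fer2013-style CSV

faces = []
for pixel_sequence in data['pixels'].tolist():
    face = [int(pixel) for pixel in pixel_sequence.split(' ')]   # split the string on spaces
    faces.append(np.asarray(face).reshape(48, 48))               # back to a 48 x 48 image

faces = np.expand_dims(np.asarray(faces), -1)                    # add the single grayscale channel
emotions = to_categorical(data['emotion'], num_classes=7)        # labels to a categorical matrix
```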
Finally, we split the data into train, test, and validation sets. This approach is slightly different from our previous model's approach, where we only made use of train and validation sets and divided the data in an 80:20 ratio. Here, we divide the data in an 80:10:10 split. We will use the same sequential model as in the previous part. Let us have a look at the model once again and see how it performs after training.
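One way to get the 80:10:10 split is with scikit-learn, as sketched below (the original notebook may split the data differently):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% first, then split that half into validation and test (80:10:10 overall).
X_train, X_temp, y_train, y_temp = train_test_split(faces, emotions,
                                                    test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp,
                                                test_size=0.5, random_state=42)
```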
The final accuracy, validation accuracy, loss, and validation loss we were able to achieve on all 7 emotions were as follows:
Graph:
Observation:
The model is able to perform quite well. We can notice that the train and validation losses decrease constantly and that the train as well as validation accuracy increases constantly. There is no over-fitting in the deep learning model, and we are able to achieve a validation accuracy of over 65% and an accuracy of almost 70%, while also reducing the overall losses.
Recordings:
In this section, we will create the recordings required for the vocal response from the models. We can create custom recordings for each of the models and for each emotion or gesture. In the code block below, I will show an example recording for one emotion and one gesture.
Understanding the imported libraries:
gTTS = Google Text-to-Speech is a python library that we can use to convert text to a vocal translation response.
playsound = This module is useful for playing sound directly from a specified path with a .mp3 format.
shutil = This module offers several high-level operations on files and collections of files. In particular, functions are provided which support file copying, moving, and removal.
In this python file, we will be creating all the required voice recordings for both the emotions as well as all the gestures and we will be storing them in the reactions directory. I have shown an example of how to create a custom voice recording in the code block for each emotion or gesture. The entire code for the recordings will be posted in the GitHub repository at the end of this post.
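Here is a minimal sketch of creating such recordings with gTTS; the spoken phrases, file names, and the reactions folder layout are assumptions for illustration:

```python
import os
import shutil
from gtts import gTTS

os.makedirs('reactions', exist_ok=True)

# One recording for an emotion and one for a gesture (the text is illustrative).
happy_audio = gTTS(text='You look happy today!', lang='en')
happy_audio.save('happy.mp3')
shutil.move('happy.mp3', os.path.join('reactions', 'happy.mp3'))

victory_audio = gTTS(text='Victory! Nice gesture.', lang='en')
victory_audio.save('victory.mp3')
shutil.move('victory.mp3', os.path.join('reactions', 'victory.mp3'))
```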
Final Pipeline:
Our final pipeline will consist of loading both our saved models and then using them accordingly to predict emotions and gestures. I will be including 2 python files in the GitHub repository. The final_run.py takes the choice from the user and runs either an emotion or gestures model. The final_run1.py runs both the emotions and gestures model simultaneously. Feel free to use whichever is more convenient for you guys. I will be using the saved models from the first emotions trained model and the trained gestures model. We will be using an additional XML file called haarcascade_frontalface_default.xml for the detection of faces. Let us try to understand the code for the final pipeline from the code block below.
In this particular code block, we import all the required libraries that we will use to obtain a vocal response for the label predicted by the model. cv2 is the computer vision (OpenCV) module, which we will use to access our webcam in real time. We import the time module to make sure we get a prediction only after 10 seconds of analysis. We load the saved pre-trained weights of both the emotions and gestures models, then specify the classifier that will be used for the detection of faces, and finally assign all the emotion and gesture labels that can be predicted by our models.
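A sketch of that setup; the emotion weights file name and the label orderings are assumptions, while gesturenew.h5 and the haar cascade file match what was described above:

```python
import time
import cv2
import numpy as np
from playsound import playsound
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import img_to_array

# Saved weights (the emotions file name is an assumption).
emotion_model = load_model('emotions.h5')
gesture_model = load_model('gesturenew.h5')

# Haar cascade classifier for face detection.
face_classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

# Label orderings are assumptions and must match how each model was trained.
emotion_labels = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']
gesture_labels = ['Loser', 'Punch', 'Super', 'Victory']
```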
In the next code block, we will look at a code snippet for the emotions model. For the entire code, refer to the GitHub repository at the end of the article.
In this choice, we will be running the emotions model. While the webcam is active, we read the frames and then draw a rectangular box (similar to a bounding box) whenever the haar cascade classifier detects a face. We convert the facial region into a 48 x 48 grayscale image, similar to the trained images, for better predictions. The prediction is only made when np.sum detects at least one face. The Keras command img_to_array converts the image to an array, and we expand the dimensions before passing it to the model. The predictions are made according to the labels, and the recordings are played accordingly.
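A condensed sketch of that loop, continuing from the setup block above; the window name, ten-second timer, and audio file paths are assumptions:

```python
cap = cv2.VideoCapture(0)
last_spoken = time.time()

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    for (x, y, w, h) in face_classifier.detectMultiScale(gray, 1.3, 5):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)   # box around the face
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))             # 48 x 48 grayscale, like training

        if np.sum([roi]) != 0:                                         # predict only on a non-empty face
            roi = img_to_array(roi.astype('float') / 255.0)
            roi = np.expand_dims(roi, axis=0)                          # add the batch dimension
            label = emotion_labels[np.argmax(emotion_model.predict(roi))]
            cv2.putText(frame, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

            if time.time() - last_spoken > 10:                         # vocal response every ~10 seconds
                playsound('reactions/' + label.lower() + '.mp3')
                last_spoken = time.time()

    cv2.imshow('Emotion Detector', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
```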
Let us look at the code snippet for running the gestures model.
In this choice, we will be running the gestures model. While the webcam is active, we read the frames and then draw a rectangular box in the middle of the screen, unlike the emotions model. The user has to place their fingers in that box for the following to work. The prediction is only made when np.sum detects content in the box. The Keras command img_to_array converts the image to an array, and we expand the dimensions before passing it to the model. The predictions are made according to the labels, and the recordings are played accordingly. With this, our final pipeline is completed, and we have analyzed all the code required for building the human emotion and gesture detector models. We can now release the video capture and destroy all windows, which means we quit the frame loop being run by the computer vision module.
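And a matching sketch for the gesture branch, ending with the clean-up described above; the box coordinates and file paths are assumptions:

```python
cap = cv2.VideoCapture(0)
last_spoken = time.time()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Fixed box in the middle of the frame where the user places their fingers.
    x1, y1, x2, y2 = 220, 140, 420, 340
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    roi = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2RGB)
    roi = cv2.resize(roi, (200, 200))                                  # 200 x 200 RGB, like training

    if np.sum([roi]) != 0:
        roi = img_to_array(roi.astype('float') / 255.0)
        roi = np.expand_dims(roi, axis=0)
        label = gesture_labels[np.argmax(gesture_model.predict(roi))]
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

        if time.time() - last_spoken > 10:
            playsound('reactions/' + label.lower() + '.mp3')
            last_spoken = time.time()

    cv2.imshow('Gesture Detector', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```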
Conclusion:
We have finally completed going through the entire human emotion and gesture detector. The GitHub repository for the entire code can be found here. I would highly recommend experimenting with the various parameters as well as the layers in all 3 models we have built and trying to achieve better results. The various recordings can also be modified as desired by the user. It is also possible to try out various transfer learning models or build your own custom architectures and achieve overall better performance. Have fun experimenting and trying out different and unique things with the models!
Final Thoughts:
I had great fun in writing this 2-part series and it was an absolute blast. I hope all of you enjoyed reading this as much as I did writing this. I look forward to posting more articles in the future as I find it extremely enjoyable. So, any ideas for future articles or any topic you guys want me to cover would be highly appreciated. Thank you everyone for sticking on till the end and I wish you all a wonderful day!
Translated from: https://towardsdatascience.com/human-emotion-and-gesture-detector-using-deep-learning-part-2-471724f7a023