Deep Learning for Video Detection: How Can Videos Be Used to Detect Your Personality?
Videos are the New First Impressions!
Think about the approximate number of video calls you have been a part of since March 2020. Now, compare it to the number of video calls you were a part of before that. I am sure the difference is huge for most of us. Meetings with family, friends, and colleagues have shifted to video calls.
Video calling has also made it possible for us to keep expanding our networks and meet new people while maintaining social distancing. Hence, it is fair to say that we are making quite a few personal as well as professional first impressions over video. Personality perception through first impressions can be quite subjective and can even lead to first-impression bias. Of course, there are self-reported personality assessment tests, but they often suffer from social desirability bias. This gives us an opportunity to leverage AI to find a more objective approach to apparent personality analysis.
Keeping this in mind, the aim of this blog post is to show one such deep learning approach, which uses videos to predict scores for the Big-5 personality traits.
Created by Author

What are the Big-5 Personality Traits?
Most contemporary psychologists believe that there are 5 core dimensions to personality: Extraversion, Agreeableness, Openness, Conscientiousness, and Neuroticism, often referred to by the acronym OCEAN. Unlike many of its predecessors, which treat personality traits as binary, the Big-5 personality trait theory asserts that each personality trait is a spectrum.
Let’s look at how each trait is characterized, followed by a map of how some popular fictional characters would score on the Big-5 personality traits…
Created by Author using Character Images from their Wiki Profiles

An interesting aspect of the Big-5 personality trait theory is that these traits are independent but not mutually exclusive. For example, we can see in the above image that Sheldon Cooper (The Big Bang Theory) would score low on Extraversion but high on Neuroticism, Phoebe Buffay (Friends) would score low on Conscientiousness but high on Openness, and so on…
About the Data Set
The First Impressions Challenge provides a data set of 10k clips from 3k YouTube videos. The aim of this challenge was to understand how a deep learning approach can be used to infer apparent personality traits from videos of subjects speaking in front of a camera.
The training set comprised 6k videos. The validation and test sets had 2k videos each. The average duration of the videos was 15 seconds. The ground truth labels for each video consisted of five scores, one for each of the Big-5 personality traits. These scores were between 0 and 1. The labeling was done by Amazon Mechanical Turk workers. More information about the challenge and the data set can be found in this paper.
Video data is unstructured but rich in multimedia features. The approach explained in this blog post uses audio and visual features from the videos. The analysis and modeling were done on Google Colab. The code can be accessed on GitHub.
Distribution of the Ground Truth Labels
Created by Author

The graph above shows the distributions of personality scores in the training data set. It’s interesting to note that the distributions of the scores are quite similar and even symmetric about the mean. The reason for this symmetry could be that the scores aren’t self-reported. Self-reported personality assessment scores are usually skewed due to social desirability bias.
提取視覺特征 (Extracting Visual Features)
Videos consist of image frames. These frames were extracted from the videos using OpenCV. In apparent personality analysis, visual features include facial cues, hand movements, a person’s posture, etc. Since the data set consisted of videos with an average duration of 15 seconds, 15 random frames were extracted from each video. Each extracted frame was then resized to 150 × 150 and scaled by a factor of 1/255.
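The exact code is in the GitHub repository linked above; a minimal sketch of this step with OpenCV might look like the following (the helper name load_visual_frames is my own, and error handling is omitted):

```python
import cv2
import numpy as np

def load_visual_frames(video_path, n_frames=15, size=(150, 150)):
    """Sample n_frames random frames from a video, resize them, and scale to [0, 1]."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in sorted(np.random.choice(total, n_frames, replace=False)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frame = cv2.resize(frame, size)                 # resize to 150 x 150
            frames.append(frame.astype('float32') / 255.0)  # scale by 1/255
    cap.release()
    return np.array(frames)  # shape: (15, 150, 150, 3)
```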
Created by Author using a Video from the First Impressions Challenge

Extracting Audio Features
The waveform audio was extracted from each video using an ffmpeg subprocess. An open-source toolkit, pyAudioAnalysis, was used to extract audio features from 15 non-overlapping frames (keeping the frame step equal to the frame length in the audioAnalysis subprocess). These included 34 features along with their delta features. The output was a 1 × 68 dimensional vector for each frame, or a 15 × 68 dimensional tensor for the 15 audio frames.
The types of features extracted through pyAudioAnalysis include zero crossing rate, chroma vector, chroma deviation, MFCCs, energy, entropy of energy, spectral centroid, spectral spread, spectral entropy, spectral flux, and spectral rolloff.
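As a sketch of this step, assuming the current pyAudioAnalysis API (ShortTermFeatures.feature_extraction; the original post invoked the older audioAnalysis script as a subprocess instead, and load_audio_features is my own helper name):

```python
import subprocess
import numpy as np
from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

def load_audio_features(video_path, wav_path, n_frames=15):
    """Extract WAV audio with ffmpeg, then compute 68 short-term features
    (34 features + their deltas) over non-overlapping frames."""
    subprocess.run(['ffmpeg', '-y', '-i', video_path, '-vn',
                    '-ac', '1', '-ar', '44100', wav_path], check=True)
    rate, signal = audioBasicIO.read_audio_file(wav_path)
    frame_len = len(signal) // n_frames  # frame step == frame length
    features, _ = ShortTermFeatures.feature_extraction(
        signal, rate, frame_len, frame_len, deltas=True)
    return features.T[:n_frames]  # shape: (15, 68)
```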
Deep Bimodal Regression Model
The Keras Functional API with TensorFlow as the backend was used to define the model. The model was defined in two phases. In the first phase, the image and audio features were extracted; then, the sequential features of the videos were processed. To process the audio and visual features, a bimodal, time-distributed approach was taken in the first phase.
Keras has a TimeDistributed layer which can be used to apply the same layer individually to multiple inputs, resulting in a “many to many” mapping. Simply put, the TimeDistributed wrapper enables any layer to extract features from each frame or time step separately. The result: an additional temporal dimension in the input and the output, representing the index of the time step.
The audio features extracted via pyAudioAnalysis were passed through a dense layer with 32 units inside a TimeDistributed wrapper. Hence, the same dense layer was applied to the 1 × 68 dimensional vector of each audio frame. Similarly, each image frame was passed in parallel through a series of convolutional blocks.
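As a small illustration of the wrapper’s effect on shapes (a sketch of the audio branch only, not the full model):

```python
from tensorflow.keras.layers import Input, Dense, TimeDistributed
from tensorflow.keras.models import Model

# 15 time steps, each a 1 x 68 audio feature vector
inp = Input(shape=(15, 68))
# The same Dense(32) layer is applied to every time step separately
out = TimeDistributed(Dense(32, activation='relu'))(inp)
Model(inp, out).summary()  # output shape: (None, 15, 32)
```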
Created by Author

After this step, the audio and visual models were concatenated. To process the chronological or temporal aspect of the videos, the concatenated outputs were further passed to a stacked LSTM model with a dropout and recurrent dropout rate of 0.2. The output of the stacked LSTM was passed to a dense layer with ReLU activation and a dropout rate of 0.5. The final dense layer had 5 output units (one for each personality trait), along with sigmoid activation to get predicted scores between 0 and 1.
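Putting the pieces together, a sketch of such a bimodal architecture could look like this (the numbers of convolutional filters, LSTM units, and dense units here are my assumptions; the exact values are in the GitHub repository):

```python
from tensorflow.keras.layers import (Input, Dense, Dropout, Conv2D, MaxPooling2D,
                                     Flatten, LSTM, TimeDistributed, concatenate)
from tensorflow.keras.models import Model

# Visual branch: 15 frames of 150 x 150 RGB images per video
input_img = Input(shape=(15, 150, 150, 3))
x = TimeDistributed(Conv2D(32, (3, 3), activation='relu'))(input_img)
x = TimeDistributed(MaxPooling2D((2, 2)))(x)
x = TimeDistributed(Conv2D(64, (3, 3), activation='relu'))(x)
x = TimeDistributed(MaxPooling2D((2, 2)))(x)
x = TimeDistributed(Flatten())(x)

# Audio branch: 15 frames of 68 audio features each
input_aud = Input(shape=(15, 68))
y = TimeDistributed(Dense(32, activation='relu'))(input_aud)

# Concatenate the two modalities per time step, then model the sequence
combined = concatenate([x, y])
z = LSTM(64, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)(combined)
z = LSTM(64, dropout=0.2, recurrent_dropout=0.2)(z)
z = Dense(128, activation='relu')(z)
z = Dropout(0.5)(z)
output = Dense(5, activation='sigmoid')(z)  # one score per Big-5 trait

model = Model([input_img, input_aud], output)
```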
Generator Function
The biggest challenge was managing the limited memory resources. This was accomplished using mini-batch gradient descent. To implement it, a custom generator function was defined, as sketched below:
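(The exact implementation is in the GitHub repository; this sketch reuses the hypothetical load_visual_frames and load_audio_features helpers from above.)

```python
import numpy as np

def data_generator(video_paths, labels, batch_size=8):
    """Yield mini-batches of ([image_frames, audio_features], labels) indefinitely."""
    n = len(video_paths)
    while True:
        indices = np.random.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            img_batch = np.array([load_visual_frames(video_paths[i]) for i in batch])
            aud_batch = np.array([load_audio_features(video_paths[i],
                                                      video_paths[i] + '.wav')
                                  for i in batch])
            lbl_batch = np.array([labels[i] for i in batch])  # shape: (batch, 5)
            # Two inputs in one list, matching Model([input_img, input_aud], output)
            yield [img_batch, aud_batch], lbl_batch
```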
Note: The generator function yields the inputs for the audio and visual models in one list. Correspondingly, the model is defined by passing a list of two inputs to the Keras Model class:
model = Model([input_img, input_aud], output)

Results
The model was compiled using the Adam optimizer with a learning rate of 0.00001. The model was trained for 20 epochs with a mini-batch size of 8. Mean squared error was taken as the loss function. A custom metric called mean accuracy was defined to track the performance of the model. It was calculated as follows:
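(A reconstruction of the formula, following the evaluation measure used in the First Impressions challenge: one minus the mean absolute error, averaged over the five traits.)

$$\text{Mean Accuracy} = \frac{1}{5}\sum_{j=1}^{5}\left(1 - \frac{1}{N}\sum_{i=1}^{N}\left|y_{ij} - \hat{y}_{ij}\right|\right)$$

where \(y_{ij}\) and \(\hat{y}_{ij}\) are the ground truth and predicted scores of trait \(j\) for video \(i\).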
Here N is the number of input videos. Overall, the model performed quite well, with a final test mean accuracy of 0.9047.
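A sketch of the compilation and training step under these settings (the metric below uses the Keras backend; train_paths and train_labels are hypothetical names):

```python
import tensorflow.keras.backend as K
from tensorflow.keras.optimizers import Adam

def mean_accuracy(y_true, y_pred):
    # 1 minus the mean absolute error, averaged over traits and the batch
    return 1.0 - K.mean(K.abs(y_true - y_pred))

model.compile(optimizer=Adam(learning_rate=0.00001),
              loss='mse',
              metrics=[mean_accuracy])

model.fit(data_generator(train_paths, train_labels, batch_size=8),
          steps_per_epoch=len(train_paths) // 8,
          epochs=20)
```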
Created by Author

The table below shows the test mean accuracy for each of the Big-5 personality traits. The model shows similar performance across all 5 personality traits.
Created by Author

The Road Ahead…
The results of the model could be further improved by increasing the frame sizes and sequence lengths, depending on the available processing power. NLP analysis of the video transcriptions could also be used to derive additional features.
While automated apparent personality analysis has important use cases, care must be taken to ensure that algorithmic bias does not affect the results. The aim of such AI applications is to provide a more objective approach. However, such objectivity can only be achieved if bias is excluded at every stage, from data collection to results interpretation.
Original article: https://towardsdatascience.com/can-your-video-be-used-to-detect-your-personality-d8423f6d3cb3