Contrastive Learning Paper Series, CPC for HAR (Part 1): Contrastive Predictive Coding for Human Activity Recognition
0. Abstract
0.1 Sentence-by-sentence reading
Feature extraction is crucial for human activity recognition (HAR) using body-worn movement sensors.
Feature extraction is the key step when recognizing human activities from body-worn movement sensors.
Recently, learned representations have been used successfully, offering promising alternatives to manually engineered features.
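Learned representations have recently been applied successfully and are a promising alternative to hand-engineered features. (In short: deep learning is used to extract the features.)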
Our work focuses on effective use of small amounts of labeled data and the opportunistic exploitation of unlabeled data that are straightforward to collect in mobile and ubiquitous computing scenarios.
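The focus is on making effective use of a small amount of labeled data, and on exploiting the unlabeled data that is easy to collect in mobile and ubiquitous computing settings.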
We hypothesize and demonstrate that explicitly considering the temporality of sensor data at representation level plays an important role for effective HAR in challenging scenarios.
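The authors hypothesize, and then show, that explicitly modeling the temporal nature of sensor data at the representation level matters for effective HAR in challenging scenarios.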
We introduce the Contrastive Predictive Coding (CPC) framework to human activity recognition, which captures the long-term temporal structure of sensor data streams.
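The Contrastive Predictive Coding (CPC) framework is brought to human activity recognition in order to capture the long-term temporal structure of sensor data streams.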
Through a range of experimental evaluations on real-life recognition tasks, we demonstrate its effectiveness for improved HAR.
A series of experiments on real-life recognition tasks shows that it improves HAR. (In short: the experiments show that CPC can be used for HAR.)
CPC-based pre-training is self-supervised, and the resulting learned representations can be integrated into standard activity chains.
CPC-based pre-training is self-supervised, and the learned representations can be plugged into standard activity recognition chains.
It leads to significantly improved recognition performance when only small amounts of labeled training data are available, thereby demonstrating the practical value of our approach.
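When only a small amount of labeled training data is available, it improves recognition performance significantly, which demonstrates the practical value of the approach. (So the pipeline is not purely unsupervised: pre-training uses no labels, but some labeled data is used afterwards. Strictly speaking, isn't the overall setup closer to semi-supervised?)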
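0.2 Summary
The main contribution is bringing contrastive learning, a representation learning approach, to HAR, which makes it possible to train on large amounts of unlabeled data.
Note that the full pipeline is not strictly unsupervised: pre-training uses no labels, but the classifier is trained on labeled data, so the overall setup is closer to semi-supervised.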
1. INTRODUCTION
1.1 Sentence-by-sentence reading
Paragraph 1 (inertial sensors are widespread and widely used)
Body-worn movement sensors, such as accelerometers or full-fledged inertial measurement units (IMU), have been extensively utilized for a wide range of applications in mobile and ubiquitous computing, including but not limited to novel interaction paradigms [67, 82, 84], gesture recognition [83], eating detection [2, 7, 73, 87], and health and well-being assessments in general [24, 54, 76].
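Body-worn movement sensors such as accelerometers or full inertial measurement units (IMUs) are used across mobile and ubiquitous computing, for example in novel interaction paradigms [67, 82, 84], gesture recognition [83], eating detection [2, 7, 73, 87], and health and well-being assessment in general [24, 54, 76].
(In short: IMUs are already used in many application scenarios.)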
They are widely utilized on commodity smartphones, and smartwatches such as Fitbit and the Apple Watch.
They are built into commodity smartphones and into smartwatches such as Fitbit and the Apple Watch.
(That is, the devices themselves are widespread.)
The ubiquitous nature of these devices makes them highly suitable for real-time capturing and analysis of activities as they are being performed.
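Because these devices are everywhere, they are well suited to capturing and analyzing activities in real time, as they happen.
Paragraph 2 (the conventional HAR workflow)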
The workflow for human activity recognition (HAR), i.e., the all encompassing paradigm for aforementioned applications, essentially involves the recording of movement data after which signal processing and machine learning techniques are applied to automatically recognize the activities.
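The HAR workflow, the paradigm underlying all the applications above, consists of recording movement data and then applying signal processing and machine learning to recognize the activities automatically.
(Introduces HAR.)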
This type of workflow is typically supervised in nature, i.e., it requires the labeling of what activities have been performed and when after the data collection is complete [8].
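This workflow is usually supervised, i.e., once data collection is complete, someone has to label which activities were performed and when [8].
(After recording, you need to know what you were doing at each point in time and attach that label to the recorded sensor data.)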
Streams of sensor data are segmented into individual analysis frames using a sliding window approach, and forwarded as input into feature extractors.
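The sensor stream is cut into individual analysis frames with a sliding window, and each frame is passed to the feature extractor.
(This presumably describes the conventional pipeline.)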
The resulting representations are then categorized by a machine learning based classification backend into the activities under study (or the NULL class).
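A machine-learning classification backend then assigns each resulting representation to one of the activities under study, or to the NULL class.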
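The sliding-window segmentation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `sliding_windows`, the window length of 4, and the step of 2 are hypothetical choices made for the example, and the trailing samples that do not fill a complete window are simply dropped.

```python
from typing import List, Sequence


def sliding_windows(
    stream: Sequence[Sequence[float]], window_size: int, step: int
) -> List[List[Sequence[float]]]:
    """Split a (time, channels) sensor stream into fixed-length analysis frames.

    A step smaller than window_size produces overlapping frames.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    frames = []
    # Stop as soon as a full window no longer fits; the remainder is dropped.
    for start in range(0, len(stream) - window_size + 1, step):
        frames.append(list(stream[start:start + window_size]))
    return frames


# Toy stream (assumed data): 10 timesteps of 3-axis accelerometer readings.
stream = [(float(t), 0.0, 9.8) for t in range(10)]
frames = sliding_windows(stream, window_size=4, step=2)  # 50% overlap
```

Each element of `frames` is one analysis frame that would be handed to the feature extractor; with a step of half the window length, consecutive frames share half of their samples.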
Paragraph 3 (dataset size is key)
The availability of large-scale annotated datasets has resulted in astonishing improvements in performance due to the application of deep learning to computer vision [34, 42], speech recognition [3, 26] and natural language tasks [18, 52].
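Large-scale annotated datasets, combined with deep learning, have produced striking performance gains in computer vision [34, 42], speech recognition [3, 26], and natural language tasks [18, 52].
(Large labeled datasets reliably improve recognition in all of these fields.)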
While end-to-end training has also been applied to activity recognition from wearable sensors [27, 29, 59], the depth and complexity is limited by a lack of such large-scale, diverse labeled data.
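End-to-end training has also been applied to activity recognition from wearable sensors [27, 29, 59], but model depth and complexity are limited by the lack of comparably large and diverse labeled data.
(There is plenty of work on sensor data, but most of it is constrained by the size of the labeled datasets.)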
However, due to the ubiquity of sensors (e.g., in phones and commercially available wearables such as watches etc.) the data recording itself is typically straightforward, which is in contrast to obtaining their annotations, thereby resulting in potentially large quantities of unlabeled data.
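Because sensors are ubiquitous (in phones and in commercial wearables such as watches), recording data is usually easy, whereas annotating it is not, so large quantities of unlabeled data can accumulate.
(Labeled data is scarce, but unlabeled data is plentiful.)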
Thus, in our work we look for approaches that can make economic use of the limited labeled data and exploit unlabeled data as effectively as possible.
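The authors therefore look for approaches that use the limited labeled data economically and exploit unlabeled data as effectively as possible.
Paragraph 4 (conventional deep learning struggles with rare, long-tail activities, so the paper replaces conventional feature engineering with unsupervised feature extraction)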
Previous works such as [31, 63, 69] have demonstrated how unlabeled data can be utilized to learn useful representations for wide ranging tasks, including identifying kitchen activities [11], activity tracking in car manufacturing [71], classifying every day activities such as walking or running [4, 10, 51, 66], and medical scenarios involving identifying freeze of gait in patients suffering from Parkinson’s disease [57].
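Earlier work such as [31, 63, 69] has shown that unlabeled data can be used to learn useful representations for a wide range of tasks, including recognizing kitchen activities [11], tracking activities in car manufacturing [71], classifying everyday activities such as walking or running [4, 10, 51, 66], and medical scenarios such as detecting freezing of gait in patients with Parkinson's disease [57].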
In many such applications, the presence of complex and often sparsely occurring movement patterns coupled with limited annotation makes it especially hard for deriving effective recognition systems.
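In many of these applications, movement patterns are complex and often occur only sparsely, and annotation is limited, which makes effective recognition systems especially hard to build.
(In real activity recognition we run into complex, rarely occurring activities that may have no labels at all, so it is hard to train a model that performs well on them.)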
The promising results delivered in these works without the use of labels have resulted in a general direction of integrating unsupervised learning as-is into conventional activity recognition chains (ARC) [8] in the feature extraction step.
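The promising results these works achieved without labels have established a general direction: plug unsupervised learning, as-is, into the feature extraction step of the conventional activity recognition chain (ARC) [8].
(That is, replace conventional feature extraction with an unsupervised method.)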
In this work, we follow this general direction of utilizing (potentially large amounts of) unlabeled data for effective representation learning and subsequently construct activity recognizers from the representations learned.
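This paper follows the same direction: learn representations from (potentially large amounts of) unlabeled data, then build activity recognizers on top of the learned representations.
Paragraph 5 (the proposed approach takes the time-series nature of the data into account)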
Recent work towards such unsupervised pre-training has gone beyond the early introduction using Restricted Boltzmann Machines (RBMs) [63], involving (variants of) autoencoders [31, 74], and self-supervision [32, 69].
Recent work on unsupervised pre-training has moved beyond the early use of Restricted Boltzmann Machines (RBMs) [63] to (variants of) autoencoders [31, 74] and self-supervision [32, 69].
While they result in effective representations, most of these approaches do not specifically target a characteristic inherent to body-worn sensor data – temporality.
These approaches yield effective representations, but most of them do not specifically target a property inherent to body-worn sensor data: temporality.
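Wearable sensor data resemble time-series and we hypothesize that incorporating temporal characteristics directly at the representation learning level results in more discriminative features and more effective modeling, thereby leading to better recognition accuracy for HAR scenarios with limited availability of labeled training data – as they are typical for mobile and ubiquitous computing scenarios.
Wearable sensor data is time-series-like. The hypothesis is that building temporal characteristics directly into representation learning yields more discriminative features and more effective models, and therefore better recognition accuracy when labeled training data is limited, as is typical in mobile and ubiquitous computing.
(Because pre-training is not tied to a fixed set of labeled classes, the representation can adapt to the variety of activities seen in large-scale deployments.)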
Paragraph 6 (masked reconstruction as prior evidence for the hypothesis)
Previous work on masked reconstruction [32] has attempted to address temporality at feature level in a self-supervised learning scenario by regressing to the zeroed sensor data at randomly chosen timesteps.
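Earlier work on masked reconstruction [32] tried to capture temporality at the feature level in a self-supervised setting: sensor data is zeroed at randomly chosen timesteps and the network is trained to regress the original values at those timesteps.
(Note that the goal is to exploit temporal structure, not to remove it: to fill in a zeroed timestep, the network has to rely on the readings around it.)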
This incorporates local temporal characteristics into a pretext task that forces the recognition network to predict missing values based on immediate past and future data.
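This builds local temporal characteristics into the pretext task, which forces the network to predict the missing values from the immediately preceding and following data.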
It was shown that the resulting sensor data representations are beneficial for modeling activities, which provides evidence for our aforementioned hypothesis of temporality at feature level playing a key role for effective modeling in HAR under challenging constraints.
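The resulting sensor data representations were shown to be beneficial for modeling activities, which supports the hypothesis above that temporality at the feature level plays a key role in effective HAR modeling under challenging constraints.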
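To make the masked-reconstruction pretext task concrete, here is a small sketch of its data side: zero out randomly chosen timesteps and score a reconstruction against the original values. The function names, the masking probability of 0.3, and the choice to average the squared error over masked timesteps only are assumptions made for illustration; they are not taken from [32].

```python
import random
from typing import List, Tuple

Window = List[List[float]]  # (time, channels)


def mask_timesteps(
    window: Window, mask_prob: float, rng: random.Random
) -> Tuple[Window, List[int]]:
    """Return a copy of the window with all channels zeroed at randomly
    chosen timesteps, plus the indices of the zeroed timesteps."""
    masked = [list(row) for row in window]  # copy, the input stays untouched
    masked_idx = [t for t in range(len(window)) if rng.random() < mask_prob]
    for t in masked_idx:
        masked[t] = [0.0] * len(window[t])
    return masked, masked_idx


def masked_mse(prediction: Window, target: Window, masked_idx: List[int]) -> float:
    """Mean squared error computed over the masked timesteps only."""
    if not masked_idx:
        return 0.0
    total, count = 0.0, 0
    for t in masked_idx:
        for p, y in zip(prediction[t], target[t]):
            total += (p - y) ** 2
            count += 1
    return total / count


rng = random.Random(0)
window = [[float(t), float(t) + 0.5] for t in range(8)]  # toy 2-channel window
masked, idx = mask_timesteps(window, mask_prob=0.3, rng=rng)
# Loss of a trivial "model" that just echoes its masked input (predicts zeros):
baseline_loss = masked_mse(masked, window, idx)
```

A network trained on this task receives `masked` as input and is optimized to drive `masked_mse` towards zero, which it can only do by using the timesteps surrounding each gap.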
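1.2 Summary
- 1. Inertial sensors are widespread and widely used.
- 2. HAR conventionally cuts the data into segments with a sliding window and feeds them to a downstream feature extractor; this part essentially describes the conventional machine learning pipeline.
- 3. The paper then points out a tension:
Higher accuracy requires large datasets, but labeled activity recognition datasets are rare.
At the same time, everyday life produces large amounts of sensor data that are not being put to good use.
To resolve this, the paper uses contrastive learning, a self-supervised approach, to exploit the unlabeled data.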
2 RELATED WORK ON REPRESENTATIONS OF SENSOR DATA IN HUMAN ACTIVITY RECOGNITION
3 SELF-SUPERVISED PRE-TRAINING WITH CONTRASTIVE PREDICTIVE CODING
3.0
3.0.1 Sentence-by-sentence reading
Paragraph 1 (the main structure of the proposed model)
In this paper, we introduce the Contrastive Predictive Coding (CPC) framework to human activity recognition from wearables.
The paper brings the Contrastive Predictive Coding (CPC) framework to human activity recognition from wearables.
Fig. 1 outlines the overall workflow, which includes:
(i) pre-training (part 1 in Fig. 1), where unlabeled data are utilized to obtain useful representations (i.e., learn encoder weights) via the pretext task; and,
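(i) Pre-training (part 1 in Fig. 1): unlabeled data is used, via the pretext task, to obtain useful representations, i.e., to learn the encoder weights.
(The pretext task is what drives the learning of the representation weights.)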
(ii) fine-tuning, which involves performing activity recognition on the learned representations using a classifier (part 2 in Fig. 1).
(ii) Fine-tuning: a classifier performs activity recognition on the learned representations (part 2 in Fig. 1).
During pre-training, the sliding window approach is applied to large quantities of unlabeled data to segment it into overlapping windows.
They are utilized as input for self-supervised pre-training, which learns useful unsupervised representations.
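During pre-training, a sliding window segments large quantities of unlabeled data into overlapping windows, which serve as the input to self-supervised pre-training and from which useful representations are learned without labels.
(The same sliding-window segmentation is used in the unsupervised stage as well.)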
Once the pre-training is complete, weights from both g_enc and g_ar are frozen and used for feature extraction (part 2 in Fig. 1).
After pre-training, the weights of g_enc and g_ar are frozen and used for feature extraction (part 2 in Fig. 1).
This corresponds to the feature extraction step in the ARC (part 3 in Fig. 1).
This corresponds to the feature extraction step of the ARC (part 3 in Fig. 1).
Paragraph 2 (how the learned representations are validated)
The frozen learned weights are utilized with the backend classifier network (see Sec. 3.2), a three-layer multilayer perceptron (MLP), in order to classify windows of labeled data into activities.
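The backend classifier network, a three-layer multilayer perceptron (MLP), sits on top of the frozen learned weights and classifies windows of labeled data into activities.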
This corresponds to the classification step in the ARC. The learned weights from CPC are frozen and only the classifier is optimized on (potentially smaller amounts of) labeled datasets.
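This corresponds to the classification step of the ARC. The weights learned by CPC stay frozen, and only the classifier is optimized, on (potentially smaller) labeled datasets.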
The resulting performance directly indicates the quality of the learned representations.
The resulting performance is a direct indicator of the quality of the learned representations.
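The evaluation protocol in this paragraph (frozen feature extractor, trainable classifier) can be illustrated with a toy sketch. Everything here is a stand-in chosen to keep the example dependency-free: `frozen_encoder` is a fixed linear map with a tanh, not the pre-trained g_enc and g_ar, and the classifier is a single softmax layer rather than the paper's three-layer MLP. The point is only that the encoder weights are never updated while the classifier is trained with cross-entropy on labeled windows.

```python
import math
from typing import List

# Hypothetical "pre-trained" weights; treated as frozen (never updated below).
ENCODER_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]


def frozen_encoder(window: List[List[float]]) -> List[float]:
    """Stand-in feature extractor: per-channel mean, fixed linear map, tanh."""
    mean = [sum(col) / len(window) for col in zip(*window)]
    return [math.tanh(sum(w * x for w, x in zip(row, mean))) for row in ENCODER_WEIGHTS]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(W, b, x):
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]


def train_classifier(features, labels, n_classes, epochs=200, lr=0.5):
    """Fit a softmax classifier with cross-entropy via plain SGD.
    Only W and b change; the encoder is not touched."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = softmax(scores(W, b, x))
            for c in range(n_classes):
                grad = p[c] - (1.0 if c == y else 0.0)  # dLoss/dscore_c
                for j in range(dim):
                    W[c][j] -= lr * grad * x[j]
                b[c] -= lr * grad
    return W, b


def predict(W, b, x) -> int:
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


# Toy labeled windows (assumed data): 4 timesteps, 2 channels, 2 classes.
windows = [[[1.0, 0.2]] * 4, [[0.8, -0.1]] * 4, [[-1.0, 0.1]] * 4, [[-0.9, -0.2]] * 4]
labels = [0, 0, 1, 1]
features = [frozen_encoder(w) for w in windows]  # extracted once, encoder frozen
W, b = train_classifier(features, labels, n_classes=2)
predictions = [predict(W, b, f) for f in features]
```

Because the encoder is fixed, any difference in classifier performance between two pre-training methods can be attributed to the representations themselves, which is why the paper reads the classifier's score as a measure of representation quality.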
Paragraph 3 (outline of what follows)
In what follows, we first detail our Contrastive Predictive Coding framework as it is applied to HAR, and then describe the backend classifier network used to evaluate the unsupervised representations.
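The following sections first describe in detail how the Contrastive Predictive Coding framework is applied to HAR, and then describe the backend classifier network used to evaluate the unsupervised representations.
3.0.2 Summary
The main components of the proposed approach are:
- 1. Use CPC to learn a feature extractor from unlabeled data (pre-training).
- 2. Use labeled data for fine-tuning, i.e., for training the classifier on the learned representations.
- 3. The learned feature extractor then replaces the feature extraction stage of the conventional activity recognition chain.
A three-layer fully connected network predicts the final activity label, which also serves to test the quality of the learned representations.
In short, this is a very conventional contrastive learning recipe.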
3.1
4. HUMAN ACTIVITY RECOGNITION BASED ON CONTRASTIVE PREDICTIVE CODING
4.0
4.0.1 Sentence-by-sentence reading
Paragraph 1 (recaps the model design from the previous section and leads into the experiments)
In the previous section we have introduced our representation learning framework for movement data based on contrastive predictive coding.
The previous section introduced the representation learning framework for movement data, based on contrastive predictive coding.
This pre-training step is integrated into an overarching human activity recognition framework, that is based on the standard Activity Recognition Chain (ARC) [8].
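This pre-training step is integrated into an overall human activity recognition framework built on the standard Activity Recognition Chain (ARC) [8].
(The authors stress again that the method slots into the conventional activity recognition chain.)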
Addressing our general goal of deriving effective HAR systems from limited amounts of annotated training data, as it is a regular challenge in mobile and ubiquitous computing settings, we conducted extensive experimental evaluations to explore the overall effectiveness of our proposed representation learning approach.
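The overall goal is to derive effective HAR systems from limited amounts of annotated training data, a regular challenge in mobile and ubiquitous computing. To that end, the authors ran extensive experiments to assess the overall effectiveness of the proposed representation learning approach.
(This restates the goal: activity recognition from data collected by widely deployed sensor devices.)
Paragraph 2 (outline of this section)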
In what follows we provide a detailed explanation of our experimental evaluation, which includes descriptions of:
What follows is a detailed explanation of the experimental evaluation, covering:
i) Application scenarios that our work focuses on;
i) the application scenarios the work focuses on;
ii) Implementation details;
ii) implementation details;
iii) Evaluation metrics used for quantitative evaluation; and
iii) the evaluation metrics used for quantitative evaluation; and
iv) Overall experimental procedure. Results of our experiments and discussion thereof are presented in Sec. 5.
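iv) the overall experimental procedure. The results and their discussion are presented in Sec. 5.
4.0.2 Summary
This part mainly serves as a bridge between the method and the experiments.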
4.1 Application Scenarios
4.3 Performance Metric
The test set mean F1-score is utilized as the primary metric to evaluate performance.
The mean F1-score on the test set is the primary performance metric.
The datasets used in this study show substantial class imbalance and thus experiments require evaluation metrics that are less affected negatively by such biased class distributions [64].
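The datasets used in the study are substantially class-imbalanced, so the experiments need an evaluation metric that is less negatively affected by such skewed class distributions [64].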
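The mean F1-score is given by:
$$F_1 = \frac{2}{|c|} \sum_{c} \frac{\mathit{prec}_c \times \mathit{recall}_c}{\mathit{prec}_c + \mathit{recall}_c}$$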
where |𝑐| corresponds to the number of classes
Here |c| is the number of classes,
while prec_c and recall_c are the precision and recall for each class.
prec_c and recall_c are the precision and recall of class c.
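As a worked example of this metric, the sketch below computes the class-averaged F1-score from the per-class precision and recall defined above. The function name `mean_f1` and the toy labels are made up for illustration; classes with no true positives are assumed to contribute an F1 of zero.

```python
from typing import Hashable, Sequence


def mean_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
            classes: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1: (2/|c|) * sum_c prec_c*rec_c/(prec_c+rec_c)."""
    total = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        if prec + rec > 0:
            total += prec * rec / (prec + rec)
    return 2.0 * total / len(classes)


# Imbalanced toy example: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
score = mean_f1(y_true, y_pred, classes=[0, 1])  # class F1s: 0.75 and 0.5 -> 0.625
```

Accuracy on the same predictions is 4/6, about 0.667, while the mean F1 is 0.625: every class counts equally, so the weaker result on the minority class pulls the score down more than it does for accuracy.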
(Plugging in a few numbers shows that the metric behaves sensibly; for why it does, see the paper "Evaluation: from precision, recall and F-measure to ROC".)
5 RESULTS AND DISCUSSION
5.1 Activity Recognition
Paragraph 1 (how the experiments are run)
We perform CPC-based self-supervised pre-training and integrate the learned weights as a feature extractor in the activity recognition chain.
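CPC-based self-supervised pre-training is performed, and the learned weights are integrated into the activity recognition chain as the feature extractor.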
In order to evaluate these learned representations, we compute their performance on the classifier network (Sec. 3.2).
To evaluate the learned representations, their performance is measured with the classifier network (Sec. 3.2).
The performance obtained by CPC is contrasted primarily against previous unsupervised approaches including multi-task self supervised learning [69], convolutional autoencoders [31], and masked reconstruction [32].
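CPC is compared primarily against earlier unsupervised approaches: multi-task self-supervised learning [69], convolutional autoencoders [31], and masked reconstruction [32].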
For reference, we also compare the performance relative to the supervised baseline–DeepConvLSTM [59]– and a network with the same architecture as CPC, albeit trained end-to-end from scratch.
For reference, the comparison also includes the supervised baseline DeepConvLSTM [59] and a network with the same architecture as CPC that is trained end-to-end from scratch.
Once the model was pre-trained using CPC, the learned weights (from both g_enc and g_ar) were frozen and used with the classifier network. Labeled data was utilized to train the classifier network using cross entropy loss and the test set mean F1-score is detailed in Tab. 2.
After CPC pre-training, the learned weights of g_enc and g_ar are frozen and used together with the classifier network. The classifier is trained on labeled data with a cross-entropy loss, and the mean F1-scores on the test set are listed in Tab. 2.
Paragraph 2 (comparison with unsupervised baselines)
We first compare the performance of the CPC-based pre-training to state-of-the-art unsupervised learning approaches.
The first comparison is between CPC-based pre-training and state-of-the-art unsupervised learning approaches.
We note that all unsupervised learning approaches are evaluated on the same classifier network (Sec.3.2), which is optimized during model training for activity recognition.
All unsupervised approaches are evaluated with the same classifier network (Sec. 3.2), which is optimized during model training for activity recognition.
On Mobiact, Motionsense and USC-HAD, CPC-based pre-training outperforms all state-of-the-art unsupervised approaches. For UCI-HAR, the performance is comparable to masked reconstruction. This clearly demonstrates the effectiveness of the pre-training thereby fulfilling one of the goals of the paper – which is to develop effective unsupervised pre-training approaches. It also validates our hypothesis that explicitly incorporating temporality at the representation level itself is beneficial towards learning useful representations.