[Paper reading notes] 2021_KDD_Socially-Aware Self-Supervised Tri-Training for Recommendation
Paper link: https://doi.org/10.1145/3447548.3467340
Venue: KDD
Publish time: 2021
Authors and affiliations:
- Junliang Yu The University of Queensland Brisbane, Australia jl.yu@uq.edu.au
- Hongzhi Yin The University of Queensland Brisbane, Australia h.yin1@uq.edu.au
- Min Gao Chongqing University Chongqing, China gaomin@cqu.edu.cn
- Xin Xia The University of Queensland Brisbane, Australia x.xia@uq.edu.au
- Xiangliang Zhang KAUST Thuwal, Saudi Arabia xiangliang.zhang@kaust.edu.sa
- Nguyen Quoc Viet Hung Griffith University Gold Coast, Australia quocviethung1@gmail.com
Datasets:
- Last.fm http://files.grouplens.org/datasets/hetrec2011/ (link given by the authors)
- Douban-Book https://github.com/librahu/HIN-Datasets-for-Recommendation-and-Network-Embedding
- Yelp https://github.com/Coder-Yu/QRec
Code:
- https://github.com/Coder-Yu/QRec (released by the authors in the paper)
Articles by others:
- Socially-Aware Self-Supervised Tri-Training for Recommendation
- Heterogeneous graph + self-supervised
Key contributions in brief: (the paper brings SSL (self-supervised learning), tri-training, and contrastive learning together and makes them work well as one framework. The authors also wrote MHCN, and some of its ideas carry over, especially the user/item triangular relations.)
- (1) We propose a general socially-aware self-supervised tri-training framework for recommendation. By unifying the recommendation task and the SSL task under this framework, the recommendation performance can achieve significant gains.
- (2) The framework improves recommendation by discovering self-supervision signals from two complementary views of the raw data.
- (3) Under the self-supervised tri-training scheme, the neighbor-discrimination based contrastive learning method is developed to refine user representations with pseudo-labels from the neighbors.
Details
- (1) Tri-training [47] is a popular semi-supervised learning algorithm which exploits unlabeled data using three classifiers.
- (2) Then, in the labeling process of tri-training, for any classifier, an unlabeled example can be labeled for it as long as the other two classifiers agree on the labeling of this example. The generated pseudo-label is then used as the ground-truth to train the corresponding classifier in the next round of labeling.
- (3)LightGCN [11] is the basic encoder in SEPT.
ABSTRACT
- Self-supervised learning (SSL), which can automatically generate ground-truth samples from raw data, holds vast potential to improve recommender systems. Most existing SSL-based methods perturb the raw data graph with uniform node/edge dropout to generate new data views and then conduct the self-discrimination based contrastive learning over different views to learn generalizable representations. Under this scheme, only a bijective mapping is built between nodes in two different views, which means that the self-supervision signals from other nodes are being neglected.
- Due to the widely observed homophily in recommender systems, we argue that the supervisory signals from other nodes are also highly likely to benefit the representation learning for recommendation. To capture these signals, a general socially-aware SSL framework that integrates tri-training is proposed in this paper.
- Technically, our framework first augments the user data views with the user social information.
- And then under the regime of tri-training for multi-view encoding, the framework builds three graph encoders (one for recommendation) upon the augmented views and iteratively improves each encoder with self-supervision signals from other users, generated by the other two encoders.
- Since the tri-training operates on the augmented views of the same data sources for self-supervision signals, we name it self-supervised tri-training.
- Extensive experiments on multiple real-world datasets consistently validate the effectiveness of the self-supervised tri-training framework for improving recommendation. The code is released at https://github.com/Coder-Yu/QRec.
CCS CONCEPTS
• Information systems → Recommender systems; • Theory of computation → Semi-supervised learning.
KEYWORDS
Self-Supervised Learning, Tri-Training, Recommender Systems, Contrastive Learning
1 INTRODUCTION
-
(1) Self-supervised learning (SSL) [17], emerging as a novel learning paradigm that does not require human-annotated labels, recently has received considerable attention in a wide range of fields [5, 8, 16, 21, 23, 27, 45]. As the basic idea of SSL is to learn with the automatically generated supervisory signals from the raw data, which is an antidote to the problem of data sparsity in recommender systems, SSL holds vast potential to improve recommendation quality. The recent progress in self-supervised graph representation learning [14, 27, 40] has identified an effective training scheme for graph-based tasks. That is, performing stochastic augmentation by perturbing the raw graph with uniform node/edge dropout or random feature shuffling/masking to create supplementary views and then maximizing the agreement between the representations of the same node but learned from different views, which is known as graph contrastive learning [40]. Inspired by its effectiveness, a few studies [19, 29, 37, 46] then follow this training scheme and are devoted to transplanting it to recommendation.
-
(2) With these research efforts, the field of self-supervised recommendation recently has demonstrated some promising results showing that mining supervisory signals from stochastic augmentations is desirable [29, 46]. However, in contrast to other graph-based tasks, recommendation is distinct because there is widely observed homophily across users and items [20]. Most existing SSL-based methods conduct the self-discrimination based contrastive learning over the augmented views to learn generalizable representations against the variance in the raw data. Under this scheme, a bijective mapping is built between nodes in two different views, and a given node can just exploit information from itself in another view. Meanwhile, the other nodes are regarded as the negatives that are pushed apart from the given node in the latent space. Obviously, a number of nodes are false negatives which are similar to the given node due to the homophily, and can actually benefit representation learning in the scenario of recommendation if they are recognized as the positives. Conversely, roughly classifying them into the negatives could lead to a performance drop.
-
(3) To tackle this issue, a socially-aware SSL framework which combines the tri-training [47] (multi-view co-training) with SSL is proposed in this paper.
- For supplementary views that can capture the homophily among users, we resort to social relations which can be another data source that implicitly reflects users’ preferences [4, 38, 41–43]. Owing to the prevalence of social platforms in the past decade, social relations are now readily accessible in many recommender systems.
- We exploit the triadic structures in the user-user and user-item interactions to augment two supplementary data views, and socially explain them as profiling users’ interests in expanding social circles and sharing desired items to friends, respectively.
- Given the user-item view which contains users’ historical purchases, we have three views that characterize users’ preferences from different perspectives and also provide us with a scenario to fuse tri-training and SSL.
-
(4) Tri-training [47] is a popular semi-supervised learning algorithm which exploits unlabeled data using three classifiers.
- In this work, we employ it to mine self-supervision signals from other users in recommender systems with the multi-view encoding. Technically, we first build three asymmetric graph encoders over the three views, of which two are only for learning user representations and giving pseudo-labels, and another one working on the user-item view also undertakes the task of generating recommendations.
- Then we dynamically perturb the social network and user-item interaction graph to create an unlabeled example set. Following the regime of tri-training, during each epoch, the encoders over the other two views predict the most probable semantically positive examples in the unlabeled example set for each user in the current view.
- Then the framework refines the user representations by maximizing the agreement between representations of labeled users in the current view and the example set through the proposed neighbor-discrimination based contrastive learning. As all the encoders iteratively improve in this process, the generated pseudo-labels also become more informative, which in turn recursively benefit the encoders again. The recommendation encoder over the user-item view thus becomes stronger in contrast to those only enhanced by the self-discrimination SSL scheme. Since the tri-training operates on the complementary views of the same data sources to learn self-supervision signals, we name it self-supervised tri-training.
-
(5) The major contributions of this paper are summarized as follows:
- We propose a general socially-aware self-supervised tri-training framework for recommendation. By unifying the recommendation task and the SSL task under this framework, the recommendation performance can achieve significant gains.
- We propose to exploit positive self-supervision signals from other users and develop a neighbor-discrimination based contrastive learning method.
- We conduct extensive experiments on multiple real-world datasets to demonstrate the advantages of the proposed SSL framework and investigate the effectiveness of each module in the framework through a comprehensive ablation study.
-
(6) The rest of this paper is structured as follows. Section 2 summarizes the related work of recommendation and SSL. Section 3 introduces the proposed framework. The experimental results are reported in Section 4. Finally, Section 5 concludes this paper.
2 RELATED WORK
2.1 Graph Neural Recommendation Models
- (1) Recently, graph neural networks (GNNs) [7, 34] have gained considerable attention in the field of recommender systems for their effectiveness in solving graph-related recommendation tasks.
- Particularly, GCN [15], as the prevalent formulation of GNNs which is a first-order approximation of spectral graph convolutions, has driven a multitude of graph neural recommendation models like GCMC [2], NGCF [28], and LightGCN [11].
- The basic idea of these GCN-based models is to exploit the high-order neighbors in the user-item graph by aggregating the embeddings of neighbors to refine the target node’s embeddings [33]. In addition to these general models, GNNs also empower other recommendation methods working on specific graphs such as SR-GNN [32] and DHCN [35] over the session-based graph, and DiffNet [31] and MHCN [44] over the social network. It is worth mentioning that GNNs are often used for social computing as the information spreading in social networks can be well captured by the message passing in GNNs [31]. That is the reason why we resort to social networks for self-supervisory signals generated by graph neural encoders.
2.2 Self-Supervised Learning in RS (note: this direction is worth following)
-
(1) Self-supervised learning [17] (SSL) is an emerging paradigm to learn with the automatically generated ground-truth samples from the raw data. It was firstly used in visual representation learning and language modeling [1, 5, 10, 12, 45] for model pretraining. The recent progress in SSL seeks to harness this flexible learning paradigm for graph representation learning [22, 23, 26, 27]. SSL models over graphs mainly mine self-supervision signals by exploiting the graph structure. The dominant regime of this line of research is graph contrastive learning which contrasts multiple views of the same graph where the incongruent views are built by conducting stochastic augmentations on the raw graph [9, 23, 27, 40]. The common types of stochastic augmentations include but are not limited to uniform node/edge dropout, random feature/attribute shuffling, and subgraph sampling using random walk.
-
(2) Inspired by the success of graph contrastive learning, there have been some recent works [19, 29, 37, 46] which transplant the same idea to the scenario of recommendation.
- Zhou et al. [46] devise auxiliary self-supervised objectives by randomly masking attributes of items and skipping items and subsequences of a given sequence for pretraining sequential recommendation model. (This is the paper "Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization".)
- Yao et al. [37] propose a two-tower DNN architecture with uniform feature masking and dropout for self-supervised item recommendation.
- Ma et al. [19] mine extra signals for supervision by looking at the longer-term future and reconstruct the future sequence for self-supervision, which adopts feature masking in essence.
- Wu et al. [29] summarize all the stochastic augmentations on graphs and unify them into a general self-supervised graph learning framework for recommendation.
- Besides, there are also some studies [25, 36, 44] refining user representations with mutual information maximization among a set of certain members (e.g. ad hoc groups) for self-supervised recommendation. However, these methods are used for specific situations and cannot be easily generalized to other scenarios.
3 PROPOSED FRAMEWORK
In this section, we present our SElf-suPervised Tri-training framework, called SEPT, with the goal of mining self-supervision signals from other users by the multi-view encoding. The overview of SEPT is illustrated in Fig. 1.
3.1 Preliminaries
3.1.1 Notations.
- In this paper, we use two graphs as the data sources including the user-item interaction graph $\mathcal{G}_r$ and the user social network $\mathcal{G}_s$.
- $\mathcal{U} = \{u_1, u_2, ..., u_m\}$ ($|\mathcal{U}| = m$) denotes the user nodes across both $\mathcal{G}_r$ and $\mathcal{G}_s$,
- and $\mathcal{I} = \{i_1, i_2, ..., i_n\}$ ($|\mathcal{I}| = n$) denotes the item nodes in $\mathcal{G}_r$.
- As we focus on item recommendation, $\mathbf{R} \in \mathbb{R}^{m \times n}$ is the binary matrix with entries only 0 and 1 that represent user-item interactions in $\mathcal{G}_r$.
- For each entry $(u, i)$ in $\mathbf{R}$, if user $u$ has consumed/clicked item $i$, $r_{ui} = 1$, otherwise $r_{ui} = 0$.
- As for the social relations, we use $\mathbf{S} \in \mathbb{R}^{m \times m}$ to denote the social adjacency matrix which is binary and symmetric because we work on undirected social networks with bidirectional relations.
- We use $\mathbf{P} \in \mathbb{R}^{m \times d}$ and $\mathbf{Q} \in \mathbb{R}^{n \times d}$ to denote the learned final user and item embeddings for recommendation, respectively.
- To facilitate the reading, in this paper, matrices appear in bold capital letters and vectors appear in bold lower letters.
3.1.2 Tri-Training.
- Tri-training [47] is a popular semi-supervised learning algorithm which develops from the co-training paradigm [3] and tackles the problem of determining how to label the unlabeled examples to improve the classifiers.
- In contrast to the standard co-training algorithm which ideally requires two sufficient, redundant and conditionally independent views of the data samples to build two different classifiers, tri-training is easily applied by lifting the restrictions on training sets.
- It does not assume sufficient redundancy among the data attributes, and initializes three diverse classifiers upon three different data views generated via bootstrap sampling [6].
- Then, in the labeling process of tri-training, for any classifier, an unlabeled example can be labeled for it as long as the other two classifiers agree on the labeling of this example. The generated pseudo-label is then used as the ground-truth to train the corresponding classifier in the next round of labeling.
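The labeling rule described in this subsection can be sketched in a few lines. This is only an illustration of the "other two classifiers agree" condition, not the full tri-training algorithm from [47]; the three threshold classifiers and the helper name `pseudo_label_for` are hypothetical stand-ins.

```python
from typing import Callable, Hashable, List, Optional, Sequence

Classifier = Callable[[Sequence[float]], Hashable]


def pseudo_label_for(target: int,
                     classifiers: List[Classifier],
                     example: Sequence[float]) -> Optional[Hashable]:
    """Return a pseudo-label for classifiers[target], or None.

    The example is labeled for the target classifier only when the two
    *other* classifiers predict the same label for it.
    """
    others = [clf for idx, clf in enumerate(classifiers) if idx != target]
    first, second = (clf(example) for clf in others)
    return first if first == second else None


# Three toy threshold classifiers (hypothetical, for illustration only).
classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

unlabeled = [(0.9, 0.8), (0.9, 0.3), (0.2, 0.3)]
# Pseudo-labels offered to the first classifier by the other two.
labels_for_first = [pseudo_label_for(0, classifiers, x) for x in unlabeled]
print(labels_for_first)  # [1, None, 0]
```

The second example gets no label because the two other classifiers disagree on it. In the original algorithm the accepted pseudo-labels would then join the training set of the target classifier for the next round.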
3.2 Data Augmentation
3.2.1 View Augmentation.
-
(1) As has been discussed, there is widely observed homophily in recommender systems. Namely, users and items have many similar counterparts.
- To capture the homophily for self-supervision, we exploit the user social relations for data augmentation as the social network is often known as a reflection of homophily [20, 39] (i.e., users who have similar preferences are more likely to become connected in the social network and vice versa).
- Since many service providers such as Yelp encourage users to interact with others on their platforms, it provides their recommender systems with opportunities to leverage abundant social relations. However, as social relations are inherently noisy [41, 43], for accurate supplementary supervisory information, SEPT only utilizes the reliable social relations by exploiting the ubiquitous triadic closure [13] among users.
- In a socially-aware recommender system, by aligning the user-item interaction graph $\mathcal{G}_r$ and the social network $\mathcal{G}_s$, we can readily get two types of triangles: three users socially connected with each other (e.g. $u_1$, $u_2$ and $u_4$ in Fig. 1) and two socially connected users with the same purchased item (e.g. $u_1$, $u_2$ and $i_1$ in Fig. 1). The former is socially explained as profiling users’ interests in expanding social circles, and the latter is characterizing users’ interests in sharing desired items with their friends. It is straightforward to regard the triangles as strengthened ties because if two persons in real life have mutual friends or common interests, they are more likely to have a close relationship. (Note: this approach in fact carries over from the authors' previous paper, namely the 10 triangles in the MHCN model.)
-
(2) Following our previous work [44], the mentioned two types of triangles can be efficiently extracted in the form of matrix multiplication. Let $\mathbf{A}_f \in \mathbb{R}^{m \times m}$ and $\mathbf{A}_s \in \mathbb{R}^{m \times m}$ denote the adjacency matrices of the users involved in these two types of triangular relations. They can be calculated by:
$$\mathbf{A}_f = (\mathbf{S}\mathbf{S}) \odot \mathbf{S}, \quad \mathbf{A}_s = (\mathbf{R}\mathbf{R}^{\top}) \odot \mathbf{S} \tag{1}$$
-
(3) The multiplication $\mathbf{S}\mathbf{S}$ ($\mathbf{R}\mathbf{R}^{\top}$) accumulates the paths connecting two users via shared friends (items), and the Hadamard product $\odot \mathbf{S}$ makes these paths into triangles.
- Since both $\mathbf{S}$ and $\mathbf{R}$ are sparse matrices, the calculation is not time-consuming.
- The operation $\odot \mathbf{S}$ ensures that the relations in $\mathbf{A}_f$ and $\mathbf{A}_s$ are subsets of the relations in $\mathbf{S}$.
- As $\mathbf{A}_f$ and $\mathbf{A}_s$ are not binary matrices, Eq. (1) can be seen as a special case of bootstrap sampling on $\mathbf{S}$ with the complementary information from $\mathbf{R}$.
- Given $\mathbf{A}_f$ and $\mathbf{A}_s$ as the augmentation of $\mathbf{S}$ and $\mathbf{R}$, we have three views that characterize users’ preferences from different perspectives and also provide us with a scenario to fuse tri-training and SSL. To facilitate the understanding, we name
- the view over the user-item interaction graph the preference view,
- the view over the triangular social relations the friend view,
- and another one the sharing view,
- which are represented by $\mathbf{R}$, $\mathbf{A}_f$, and $\mathbf{A}_s$, respectively.
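The triangle extraction described in this subsection (multiply to count two-step paths, then take the Hadamard product with $\mathbf{S}$ to close them into triangles) can be tried on a toy graph. The sketch below uses dense NumPy arrays for readability, whereas the text assumes sparse matrices; the 4-user, 2-item data and the function name `triangle_views` are made up for illustration.

```python
import numpy as np


def triangle_views(S: np.ndarray, R: np.ndarray):
    """Build the friend view A_f and the sharing view A_s.

    S @ S counts common friends, R @ R.T counts commonly consumed items;
    the element-wise product with S keeps only socially connected pairs,
    which turns the two-step paths into triangles.
    """
    A_f = (S @ S) * S
    A_s = (R @ R.T) * S
    return A_f, A_s


# Toy social network: users 0, 1, 3 form a triangle; user 2 only knows user 3.
S = np.array([
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
])
# Toy interactions: users 0 and 1 share item 0, users 2 and 3 share item 1.
R = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])

A_f, A_s = triangle_views(S, R)
print(A_f)
print(A_s)
```

In this toy case the edge between users 2 and 3 disappears from `A_f` because they have no common friend, but it survives in `A_s` because they share an item. Every nonzero entry of either view is also an edge in `S`, which matches the subset property stated above.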
3.2.2 Unlabeled Example Set.
- To conduct tri-training, an unlabeled example set is required. We follow existing works [29, 40] to perturb the raw graph with edge dropout at a certain probability $\rho$ to create a corrupted graph from where the learned user representations are used as the unlabeled examples. This process can be formulated as: (Note: this is where the unlabeled data comes from.)
$$\tilde{\mathcal{G}} = (\mathcal{N}_r \cup \mathcal{N}_s, \ \mathbf{m} \odot (\mathcal{E}_r \cup \mathcal{E}_s)) \tag{2}$$
- where $\mathcal{N}_r$ and $\mathcal{N}_s$ are nodes,
- $\mathcal{E}_r$ and $\mathcal{E}_s$ are edges in $\mathcal{G}_r$ and $\mathcal{G}_s$,
- and $\mathbf{m} \in \{0,1\}^{|\mathcal{E}_r \cup \mathcal{E}_s|}$ is the masking vector to drop edges.
- Herein we perturb both $\mathcal{G}_r$ and $\mathcal{G}_s$ instead of $\mathcal{G}_r$ only, because the social information is included in the aforementioned two augmented views.
- For integrated self-supervision signals, perturbing the joint graph is necessary.
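A minimal sketch of the edge dropout over the joint graph, assuming edges are stored as an index array and each edge is dropped independently with probability $\rho$. The edge layout (item ids offset by the number of users) and the helper name `edge_dropout` are illustrative choices, not taken from the authors' code.

```python
import numpy as np


def edge_dropout(edges: np.ndarray, rho: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Drop each edge independently with probability rho.

    `mask` plays the role of the binary masking vector over the joint
    edge set: True keeps the edge, False drops it.
    """
    mask = rng.random(len(edges)) >= rho
    return edges[mask]


num_users = 4
# Undirected social edges, each listed once as (user, user).
social_edges = np.array([[0, 1], [0, 3], [1, 3], [2, 3]])
# Interaction edges as (user, item), with item ids offset by num_users.
interaction_edges = np.array([[0, 4], [1, 4], [2, 5], [3, 5]])

# Perturb the joint graph, not the interaction graph alone.
joint_edges = np.concatenate([social_edges, interaction_edges])
rng = np.random.default_rng(0)
perturbed_edges = edge_dropout(joint_edges, rho=0.3, rng=rng)
print(len(joint_edges), "->", len(perturbed_edges), "edges kept")
```

Calling `edge_dropout` again with the same generator yields a different corrupted graph, which is how a fresh unlabeled example set could be produced in every iteration.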
3.3 SEPT: Self-Supervised Tri-Training
3.3.1 Architecture.
- With the augmented views and the unlabeled example set, we follow the setting of tri-training to build three encoders. Architecturally, the proposed self-supervised training framework can be model-agnostic so as to boost a multitude of graph neural recommendation models. But for a concrete framework which can be easily followed, we adopt LightGCN [11] as the basic structure of the encoders due to its simplicity. The general form of encoders is defined as follows:
$$\mathbf{Z} = H(\mathbf{E}, \mathcal{V}) \tag{3}$$
- where $H$ is the encoder,
- $\mathbf{Z} \in \mathbb{R}^{m \times d}$ or $\mathbb{R}^{(m+n) \times d}$ denotes the final representation of nodes,
- $\mathbf{E}$ of the same size denotes the initial node embeddings which are the bottom shared by the three encoders,
- and $\mathcal{V} \in \{\mathbf{R}, \mathbf{A}_s, \mathbf{A}_f\}$ is any of the three views.
- It should be noted that, unlike the vanilla tri-training, SEPT is asymmetric.
- The two encoders $H_f$ and $H_s$ that work on the friend view and sharing view are only in charge of learning user representations through graph convolution and giving pseudo-labels, while the encoder $H_r$ working on the preference view also undertakes the task of generating recommendations and thus learns both user and item representations (shown in Fig. 1).
- Let $H_r$ be the dominant encoder (recommendation model), and $H_f$ and $H_s$ be the auxiliary encoders. Theoretically, given a concrete $H_r$ like LightGCN [11], there should be the optimal structures of $H_f$ and $H_s$.
- However, exploring the optimal structures of the auxiliary encoders is out of the scope of this paper. For simplicity, we assign the same structure to $H_f$ and $H_s$. Besides, to learn representations of the unlabeled examples from the perturbed graph $\tilde{\mathcal{G}}$, another encoder is required, but it is only for graph convolution. All the encoders share the bottom embeddings $\mathbf{E}$ and are built over different views with the LightGCN structure.
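To make the role of the shared bottom embeddings concrete, here is a rough sketch of a LightGCN-style encoder applied to two views. It assumes the usual LightGCN recipe (symmetric degree normalization, no feature transform or nonlinearity, mean over layer outputs); the notes do not spell out the exact propagation or how the augmented views are normalized, so those details and all function names are assumptions.

```python
import numpy as np


def normalize_adjacency(A: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes stay zero."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def lightgcn_encode(E: np.ndarray, A: np.ndarray, num_layers: int = 2) -> np.ndarray:
    """Propagate embeddings linearly and average all layer outputs."""
    A_hat = normalize_adjacency(A.astype(float))
    layers = [E]
    for _ in range(num_layers):
        layers.append(A_hat @ layers[-1])
    return np.mean(layers, axis=0)


def bipartite_adjacency(R: np.ndarray) -> np.ndarray:
    """Stack users and items into one (m+n) x (m+n) adjacency matrix."""
    m, n = R.shape
    return np.block([[np.zeros((m, m)), R], [R.T, np.zeros((n, n))]])


m, n, d = 4, 2, 8
rng = np.random.default_rng(0)
E = rng.normal(size=(m + n, d))          # shared bottom embeddings (toy values)
R = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
A_f = np.array([[0, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 0], [1, 1, 0, 0]])

Z_r = lightgcn_encode(E, bipartite_adjacency(R))  # preference view: users and items
Z_f = lightgcn_encode(E[:m], A_f)                 # friend view: users only
print(Z_r.shape, Z_f.shape)
```

The asymmetry shows up in the shapes: the preference-view encoder returns `m + n` rows, while the auxiliary encoder works on the `m` user rows of the same embedding table.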
3.3.2 Constructing Self-Supervision Signals.
-
(1) By performing graph convolution over the three views, the encoders learn three groups of user representations. As each view reflects a different aspect of the user preference, it is natural to seek supervisory information from the other two views to improve the encoder of the current view. Given a user, we predict its semantically positive examples in the unlabeled example set using the user representations from the other two views. Taking user $u$ in the preference view as an instance, the labeling is formulated as:
$$\mathbf{y}^s_{u+} = \mathrm{Softmax}(\phi(\tilde{\mathbf{Z}}, \mathbf{z}^s_u)), \quad \mathbf{y}^f_{u+} = \mathrm{Softmax}(\phi(\tilde{\mathbf{Z}}, \mathbf{z}^f_u)) \tag{4}$$
- where $\phi$ is the cosine operation,
- $\mathbf{z}^s_u$ and $\mathbf{z}^f_u$ are the representations of user $u$ learned by $H_s$ and $H_f$, respectively,
- $\tilde{\mathbf{Z}}$ is the representations of users in the unlabeled example set obtained through graph convolution,
- and $\mathbf{y}^s_{u+}$ and $\mathbf{y}^f_{u+}$ denote the predicted probability of each user being the semantically positive example of user $u$ in the corresponding views.
-
(2) Under the scheme of tri-training, to avoid noisy examples, only if both $H_s$ and $H_f$ agree on the labeling of a user being the positive sample can the user be labeled for $H_r$. We obey this rule and add up the predicted probabilities from the two views and obtain:
$$\mathbf{y}^r_{u+} = \frac{1}{2}\left(\mathbf{y}^s_{u+} + \mathbf{y}^f_{u+}\right) \tag{5}$$
-
(3) With the probabilities, we can select $K$ positive samples with the highest confidence. This process can be formulated as:
$$\mathcal{P}^r_{u+} = \{\tilde{\mathbf{Z}}_k \mid k \in \text{Top-}K(\mathbf{y}^r_{u+}), \ \tilde{\mathbf{Z}} \sim \tilde{\mathcal{G}}\} \tag{6}$$
-
(4) In each iteration, $\tilde{\mathcal{G}}$ is reconstructed with the random edge dropout for varying user representations.
- SEPT dynamically generates positive pseudo-labels over this data augmentation for each user in every view.
- Then these labels are used as the supervisory signals to refine the shared bottom representations.
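The labeling steps of this subsection (score every unlabeled example from the two other views, combine the two predictions, keep the $K$ most confident examples) can be sketched as follows. The equation images are missing from these notes, so the softmax over cosine similarities and the 1/2 averaging are this sketch's reading of the description; the toy vectors and function names are hypothetical.

```python
import numpy as np


def cosine_to_all(Z_tilde: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Cosine similarity between one user vector and every unlabeled example."""
    norms = np.linalg.norm(Z_tilde, axis=1) * np.linalg.norm(z) + 1e-12
    return (Z_tilde @ z) / norms


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()


def positive_examples(Z_tilde: np.ndarray, z_a: np.ndarray,
                      z_b: np.ndarray, k: int):
    """Pick the top-k unlabeled examples that both other views score highly."""
    y_a = softmax(cosine_to_all(Z_tilde, z_a))
    y_b = softmax(cosine_to_all(Z_tilde, z_b))
    y = 0.5 * (y_a + y_b)            # combined confidence from the two views
    top_k = np.argsort(-y)[:k]       # indices of the k most confident examples
    return top_k, y


# Toy unlabeled example set (4 users, 2-d representations from the perturbed graph).
Z_tilde = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
z_s = np.array([1.0, 0.1])    # user u as seen by the sharing-view encoder
z_f = np.array([1.0, -0.1])   # user u as seen by the friend-view encoder

top_k, y = positive_examples(Z_tilde, z_s, z_f, k=2)
print(sorted(top_k.tolist()), y.round(3))
```

Here the two examples pointing in roughly the same direction as both view-specific representations are selected for the preference view, while the orthogonal and opposite examples are left as negatives.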
3.3.3 Contrastive Learning.
-
(1) Having the generated pseudo-labels, we develop the neighbor-discrimination contrastive learning method to fulfill self-supervision in SEPT.
-
(2) Given a certain user,
- we encourage the consistency between his node representation and the labeled user representations from $\mathcal{P}_{u+}$, (Note: "encourage" here presumably means maximize.)
- and minimize the agreement between his representation and the unlabeled user representations.
- The idea of the neighbor-discrimination is that, given a certain user in the current view,
- the positive pseudo-labels semantically represent his neighbors or potential neighbors in the other two views, then we should also bring these positive pairs together in the current view due to the homophily across different views. And this can be achieved through the neighbor-discrimination contrastive learning.
- Formally, we follow the previous studies [5, 29] to adopt InfoNCE [12], which is effective in mutual information estimation, as our learning objective to maximize the agreement between positive pairs and minimize that of negative pairs:
$$\mathcal{L}_{ssl} = -\mathbb{E} \sum_{v \in \{r, s, f\}} \left[ \log \frac{\sum_{p \in \mathcal{P}^v_{u+}} \psi(\mathbf{z}^v_u, \tilde{\mathbf{z}}_p)}{\sum_{p \in \mathcal{P}^v_{u+}} \psi(\mathbf{z}^v_u, \tilde{\mathbf{z}}_p) + \sum_{j \in \mathcal{U} / \mathcal{P}^v_{u+}} \psi(\mathbf{z}^v_u, \tilde{\mathbf{z}}_j)} \right] \tag{7}$$
- where $\psi(\mathbf{z}^v_u, \tilde{\mathbf{z}}_p) = \exp(\phi(\mathbf{z}^v_u, \tilde{\mathbf{z}}_p) / \tau)$,
- $\phi(\cdot): \mathbb{R}^d \times \mathbb{R}^d \longmapsto \mathbb{R}$ is the discriminator function that takes two vectors as the input and then scores the agreement between them,
- and $\tau$ is the temperature to amplify the effect of discrimination ($\tau = 0.1$ is the best in our implementation).
- We simply implement the discriminator by applying the cosine operation.
- Compared with the self-discrimination, the neighbor-discrimination leverages the supervisory signals from the other users.
- When only one positive example is used and if the user itself in $\tilde{\mathbf{Z}}$ has the highest confidence in $\mathbf{y}_{u+}$, the neighbor-discrimination degenerates to the self-discrimination. So, the self-discrimination can be seen as a special case of the neighbor-discrimination.
- But when a sufficient number of positive examples are used, these two methods could also be simultaneously adopted because the user itself in $\tilde{\mathbf{Z}}$ is often highly likely to be in the Top-K similar examples $\mathcal{P}_{u+}$. With the training proceeding, the encoders iteratively improve to generate evolving pseudo-labels, which in turn recursively benefit the encoders again.
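A small numeric sketch of the neighbor-discrimination objective for one user in one view may help. It assumes the InfoNCE form described in the text: the discriminator is cosine similarity, scores are exponentiated after dividing by the temperature, the pseudo-labeled examples form the numerator, and all unlabeled examples form the denominator. The function name and toy vectors are illustrative only.

```python
import numpy as np


def neighbor_discrimination_loss(z_u: np.ndarray, Z_tilde: np.ndarray,
                                 positive_idx, tau: float = 0.1) -> float:
    """InfoNCE-style loss for a single user in a single view.

    Positives are the pseudo-labeled examples; every other row of Z_tilde
    acts as a negative, so the denominator is the sum over all rows.
    """
    norms = np.linalg.norm(Z_tilde, axis=1) * np.linalg.norm(z_u) + 1e-12
    scores = np.exp((Z_tilde @ z_u) / norms / tau)   # exp(cosine / tau)
    positive_mass = scores[positive_idx].sum()
    return float(-np.log(positive_mass / scores.sum()))


Z_tilde = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
z_u = np.array([1.0, 0.0])

good = neighbor_discrimination_loss(z_u, Z_tilde, [0, 1])   # similar users as positives
bad = neighbor_discrimination_loss(z_u, Z_tilde, [2, 3])    # dissimilar users as positives
print(good, bad)
```

The loss is small when the pseudo-labeled positives already lie close to the user and large when they do not, so minimizing it pulls the user toward its labeled neighbors. With a single positive that happens to be the user's own perturbed copy, this reduces to the usual self-discrimination loss, as noted above.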
- (3) Compared with the vanilla tri-training, it is worth noting that in SEPT we do not add the pseudo-labels into the adjacency matrices for subsequent graph convolution during training. Instead, we adopt a soft and flexible way to guide the user representations via mutual information maximization, which is distinct from the vanilla tri-training that adds the pseudo-labels to the training set for next-round training. The benefits of this modeling are two-fold. Firstly, adding pseudo-labels would require reconstructing the adjacency matrices after each iteration, which is time-consuming; secondly, the pseudo-labels generated at the early stage might not be informative, and repeatedly using them would mislead the framework.
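The neighbor-discrimination objective described above (cosine discriminator, temperature τ, pseudo-labeled positives) can be sketched for a single user in a single view as follows. This is only an illustrative sketch: the function names are hypothetical, and placing the positives in the numerator with all candidates in the denominator is one common multi-positive InfoNCE form, assumed here rather than copied from the paper's equation.

```python
import math

def cosine(a, b):
    """Cosine similarity, standing in for the discriminator phi."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def neighbor_discrimination_loss(z_u, z_tilde, positives, tau=0.1):
    """InfoNCE-style loss for one user in one view (assumed form).

    z_u       -- the user's representation in the current view
    z_tilde   -- candidate representations from the perturbed graph
    positives -- indices into z_tilde that are pseudo-labeled as positive
    tau       -- temperature (0.1 is the value reported in the notes)
    """
    # psi = exp(phi / tau) for every candidate
    scores = [math.exp(cosine(z_u, z) / tau) for z in z_tilde]
    pos = sum(scores[p] for p in positives)
    total = sum(scores)  # positives plus negatives
    return -math.log(pos / total)
```

Passing a single positive index that points at the user's own row in `z_tilde` gives the self-discrimination special case mentioned in the notes.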
3.3.4 Optimization.
- (1) The learning of SEPT consists of two tasks:
- recommendation
- and the neighbor-discrimination based contrastive learning.
- (2) Let $\mathcal{L}_r$ be the BPR pairwise loss function [24], which is defined as:
- $\mathcal{L}_r = \sum_{i \in \mathcal{I}(u),\, j \notin \mathcal{I}(u)} -\log \sigma(\hat{r}_{ui} - \hat{r}_{uj}) + \lambda \lVert E \rVert_2^2$
- where $\mathcal{I}(u)$ is the item set that user $u$ has interacted with,
- $\hat{r}_{ui} = P_u^{\top} Q_i$, where $P$ and $Q$ are obtained by splitting $Z^r$,
- and $\lambda$ is the coefficient controlling the $L_2$ regularization.
- (3) The training of SEPT proceeds in two stages:
- initialization and
- joint learning.
- To start with, we warm up the framework with the recommendation task by optimizing $\mathcal{L}_r$.
- Once trained with $\mathcal{L}_r$, the shared bottom $E$ has gained far stronger representations than randomly initialized embeddings. The self-supervised tri-training then proceeds as described in Eq. (4) - (7),
- acting as an auxiliary task which is unified into a joint learning objective to enhance the performance of the recommendation task. The overall objective of the joint learning is defined as:
- $\mathcal{L} = \mathcal{L}_r + \beta \mathcal{L}_{ssl}$
- where $\beta$ is a hyper-parameter used to control the magnitude of the self-supervised tri-training. The overall process of SEPT is presented in Algorithm 1.
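To make the two-stage optimization concrete, here is a minimal sketch of the per-triple BPR term, the L2 penalty, and the switch from warm-up to joint learning. The function names and the `warmup_done` flag are hypothetical, and the contrastive loss value is assumed to be computed elsewhere and simply passed in.

```python
import math

def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """-log sigmoid(r_ui - r_uj) for one (user, positive item, negative item) triple."""
    r_ui = sum(p * q for p, q in zip(user_vec, pos_item_vec))
    r_uj = sum(p * q for p, q in zip(user_vec, neg_item_vec))
    diff = r_ui - r_uj
    # log(1 + exp(-diff)), written in a numerically stable way
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

def l2_penalty(embeddings, lam=0.001):
    """lambda times the squared L2 norm of the embedding table."""
    return lam * sum(x * x for row in embeddings for x in row)

def joint_loss(l_r, l_ssl, beta, warmup_done):
    """Warm-up stage: only L_r. Joint stage: L_r + beta * L_ssl."""
    return l_r + beta * l_ssl if warmup_done else l_r
```

A training loop would call `joint_loss` with `warmup_done=False` for the initialization stage and `True` afterwards; how long the warm-up lasts is not specified in these notes.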
3.4 Discussions
3.4.1 Connection with Social Regularization.
- (1) Social recommendation [38, 43, 44] integrates social relations into recommender systems to address the data sparsity issue. A common idea in social recommendation is to regularize user representations by minimizing the Euclidean distance between socially connected users, which is termed social regularization [18].
- (2) Although the proposed SEPT also leverages socially-aware supervisory signals to refine user representations, it is distinct from social regularization. The differences are two-fold.
- Firstly, social regularization is a static process that is always performed on the socially connected users, whereas the neighbor-discrimination is dynamic and iteratively improves the supervisory signals imposed on uncertain users;
- secondly, negative social relations (dislike) cannot be readily retrieved in social recommendation, so social regularization can only keep socially connected users close. SEPT, in contrast, can also push apart users who are not semantically positive in the three views.
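For contrast with the neighbor-discrimination, the social regularization idea described above can be sketched as a plain sum of squared Euclidean distances over a fixed edge list. This is a simplified illustration: the function name is hypothetical, and the exact formulation in [18] (for example, any per-edge weighting) is not given in these notes.

```python
def social_regularization(user_emb, social_edges):
    """Sum of squared Euclidean distances between socially connected users."""
    total = 0.0
    for u, v in social_edges:
        total += sum((a - b) ** 2 for a, b in zip(user_emb[u], user_emb[v]))
    return total
```

The edge list never changes during training and the term only pulls connected users together; there is no counterpart that pushes any pair apart, which is the limitation the discussion points to.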
3.4.2 Complexity.
- (1) Architecturally, SEPT can be model-agnostic, and its complexity mainly depends on the structure of the encoders used. In this paper, we present a LightGCN-based architecture. Given $O(|R|d)$ as the time complexity of the recommendation encoder for graph convolution, the total complexity of the graph convolution is less than $3O(|R|d)$ because $A_f$ and $A_s$ are usually sparser than $R$. The prime cost of the labeling process comes from the Top-K operation in Eq. (6), which usually requires $O(m\log(K))$ by using a max heap. To reduce the cost and speed up training, only $c$ users ($c \ll m$, e.g., 1000) are randomly selected in each training batch to form the unlabeled example set for pseudo-labeling; this sampling can also prevent overfitting. The complexity of the neighbor-discrimination based contrastive learning is $O(cd)$.
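The labeling cost discussed above can be illustrated with the standard library: `heapq.nlargest` keeps a bounded heap of size K while scanning the scores, and `random.Random.sample` draws the small candidate set. The function names are hypothetical, and `scores` merely stands in for the per-user confidence vector from Eq. (6).

```python
import heapq
import random

def top_k_positives(scores, k):
    """Return the indices of the k highest scores, best first, in O(m log k)."""
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [idx for idx, _ in best]

def sample_candidates(num_users, c, rng):
    """Randomly pick c distinct candidate users (c << m) for one batch."""
    return rng.sample(range(num_users), min(c, num_users))
```

For example, `top_k_positives([0.1, 0.9, 0.5, 0.7], 2)` returns `[1, 3]`.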
4 EXPERIMENTAL RESULTS
4.1 Experimental Settings
4.1.1 Datasets.
- Three real-world datasets, Last.fm, Douban-Book, and Yelp, are used in our experiments to evaluate SEPT. As SEPT aims to improve Top-N recommendation, we follow the convention in previous research [43, 44]: for Douban-Book, which consists of explicit ratings on a 1-5 scale, we leave out ratings less than 4 and assign 1 to the rest. The statistics of the datasets are shown in Table 1. For precise assessment, 5-fold cross-validation is conducted in all the experiments and the average results are presented.
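The preprocessing described above can be sketched in a few lines. The record layout `(user, item, rating)` and both function names are assumptions made for illustration; the fold assignment shown is one simple way to get five disjoint test folds, not necessarily the one used by the authors' code.

```python
import random

def binarize_ratings(ratings, threshold=4):
    """Drop ratings below the threshold and map the rest to implicit feedback 1."""
    return [(u, i, 1) for (u, i, r) in ratings if r >= threshold]

def k_fold(records, k=5, seed=0):
    """Yield (train, test) pairs; each record lands in exactly one test fold."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        train = [r for g, fold in enumerate(folds) if g != f for r in fold]
        yield train, folds[f]
```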
4.1.2 Baselines.
- (1) Three recent graph neural recommendation models are compared with SEPT to test the effectiveness of the self-supervised tri-training for recommendation:
- LightGCN [11] is a GCN-based general recommendation model that leverages the user-item proximity to learn node representations and generate recommendations, and is reported as the state of the art.
- DiffNet++ [30] is a recent GCN-based social recommendation method that models the recursive dynamic social diffusion in both the user and item spaces.
- MHCN [44] is a recent hypergraph convolutional network-based social recommendation method that models the complex correlations among users with hyperedges to improve recommendation performance.
- (2) LightGCN [11] is the basic encoder in SEPT, so comparing the performance of LightGCN and SEPT is essential. Since LightGCN is a widely acknowledged SOTA baseline reported in many recent papers [29, 44], we do not compare SEPT with weaker baselines such as NGCF [28], GCMC [2], and BPR [24]. Two strong social recommendation models are also compared with SEPT to verify that the self-supervised tri-training, rather than the use of social relations, is the main driving force of the performance improvements.
4.1.3 Metrics.
- To evaluate all the methods, we first perform item ranking on all the candidate items. Then two relevancy-based metrics, Precision@10 and Recall@10, and one ranking-based metric, NDCG@10, are calculated on the truncated recommendation lists, and the values are presented as percentages.
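As a reference for the three metrics named above, here is a minimal sketch using binary relevance. The function name is hypothetical, and details such as dividing precision by n even when the list is shorter are common conventions assumed here, not something these notes specify.

```python
import math

def precision_recall_ndcg_at_n(ranked_items, relevant, n=10):
    """Compute Precision@n, Recall@n and NDCG@n for one user (binary relevance)."""
    top = ranked_items[:n]
    hits = [1 if item in relevant else 0 for item in top]
    precision = sum(hits) / n
    recall = sum(hits) / len(relevant) if relevant else 0.0
    # DCG discounts a hit at 0-based rank r by log2(r + 2)
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg
```

Averaging these per-user values over all test users and multiplying by 100 would give percentages of the kind reported in the tables.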
4.1.4 Settings.
- For a fair comparison, we refer to the best parameter settings reported in the original papers of the baselines and then fine-tune all the hyperparameters of the baselines to ensure their best performance. As for the general settings of all the methods, we empirically set the dimension of latent factors (embeddings) to 50, the regularization parameter $\lambda$ to 0.001, and the batch size to 2000. In Section 4.5, we investigate the parameter sensitivity of SEPT, and the best parameters are used in Sections 4.2 and 4.3. We use Adam to optimize all these models with an initial learning rate of 0.001.
4.2 Overall Performance Comparison
- (1) In this part, we validate whether SEPT can improve recommendation. The performance comparisons are shown in Tables 2 and 3. We conduct experiments with different layer numbers in Table 2. In Table 3, a two-layer setting is adopted for all the methods because they all reach their best performance on the used datasets under this setting. The performance improvement (drop) marked by ↑ (↓) is calculated by dividing the performance difference by the subtrahend. According to the results, we can draw the following observations and conclusions:
- Under all the different layer settings, SEPT can significantly boost LightGCN. In particular, on the sparser datasets, Douban-Book and Yelp, the improvements are higher; the maximum improvement even reaches 11%. This is evidence of the effectiveness of self-supervised learning. Besides, although both LightGCN and SEPT suffer from the over-smoothing problem when the layer number is 3, SEPT still outperforms LightGCN. We think the possible reason is that contrastive learning can, to some degree, alleviate over-smoothing because the dynamically generated unlabeled examples provide sufficient data variance.
- (2) In addition to the comparison with LightGCN, we also compare SEPT with social recommendation models to validate whether the self-supervised tri-training, rather than social relations, primarily promotes the recommendation performance. Since MHCN is also built upon LightGCN, comparing these two models can be more informative. Besides, $S^2$-MHCN, the self-supervised variant of MHCN, is also compared. The improvements (drops) are calculated by comparing the results of SEPT and $S^2$-MHCN. According to the results in Table 3, we make the following observations and conclusions:
- (3) Although integrating social relations into graph neural models is helpful (comparing MHCN with LightGCN), learning under the scheme of SEPT achieves larger performance gains (comparing SEPT with MHCN). DiffNet++ is uncompetitive compared with the other three methods. Its failure can be attributed to its redundant and useless parameters and operations [11]. On both LastFM and Douban-Book, SEPT outperforms $S^2$-MHCN. On Yelp, $S^2$-MHCN exhibits better performance than SEPT does. The superiority of SEPT and $S^2$-MHCN demonstrates that self-supervised learning holds vast capability for improving recommendation. In addition, SEPT does not need to learn any parameters other than the bottom embeddings, whereas $S^2$-MHCN needs to learn a number of other parameters. Meanwhile, SEPT runs much faster than $S^2$-MHCN does in our experiments, which makes it more competitive even though it is beaten by $S^2$-MHCN on Yelp by a small margin.
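The ↑ (↓) percentages mentioned in this section divide the performance difference by the subtrahend, that is, by the reference model's score. A tiny sketch, with a hypothetical function name and made-up numbers that are not taken from the paper's tables:

```python
def relative_change(new_value, reference_value):
    """Change of new_value relative to reference_value, as a percentage."""
    return (new_value - reference_value) / reference_value * 100.0
```

With hypothetical scores of 11.0 for the new model and 10.0 for the reference, this gives an improvement of about 10%; swapping in 9.0 for the new model gives a drop of about 10%.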
4.3 Self-Discrimination vs. Neighbor-Discrimination
- (1) In SEPT, the generated positive examples can include both the user itself and other users in the unlabeled example set. It is not clear which part contributes more to the recommendation performance. In this part, we investigate the self-discrimination and the neighbor-discrimination without the user itself being the positive example. For convenience, we use SEPT-SD to denote the former and SEPT-ND to denote the latter. It should also be mentioned that, for SEPT-ND only, a small $\beta = 0.001$ leads to the best performance on all the datasets. A two-layer setting is used in this case.
- (2) According to Fig. 2, both SEPT-SD and SEPT-ND exhibit better performance than LightGCN does, which shows that the supervisory signals from both the user itself and other users can benefit a self-supervised recommendation model. Our claim about the self-supervision signals from other users is validated. Besides, the importance of the self-discrimination and the neighbor-discrimination varies from dataset to dataset. On LastFM, they contribute almost equally. On Douban-Book, self-discrimination is much more important. On Yelp, neighbor-discrimination is more effective.
- (3) This phenomenon can be explained by Fig. 5. As the number of used positive examples increases, the performance of SEPT remains almost stable on LastFM and Yelp but gradually declines on Douban-Book. We guess that there is widely observed homophily in LastFM and Yelp, so a large number of users share similar preferences and can serve as high-quality positive examples in these two datasets. However, users in Douban-Book may have more diverse interests, which results in a quality drop when the number of used positive examples increases.
4.4 View Study
- (1) In SEPT, we build two augmented views to conduct tri-training for mining supervisory signals. In this part, we ablate the framework to investigate the contribution of each view. A two-layer setting is used in this case. In Fig. 3, ‘Friend’ or ‘Sharing’ means that the corresponding view is detached. When only two views are used, SEPT degenerates to self-supervised co-training. ‘Preference Only’ means that only the preference view is used. In this case, SEPT further degenerates to self-training.
- (2) From Fig. 3, we can observe that on both LastFM and Yelp, all the views contribute, whereas on Douban-Book, the self-supervised co-training setting achieves the best performance. Moreover, when only the preference view is used, SEPT shows lower performance, but it is still better than that of LightGCN. As the number of used views decreases, the performance of SEPT slightly declines on LastFM, and an obvious performance drop is observed on Yelp. On Douban-Book, the performance first rises slightly and then declines obviously when there is only one view. The results demonstrate that, under the semi-supervised setting, even a single view can generate desirable self-supervised signals, which is encouraging since social relations or other side information are not always accessible. Besides, increasing the number of used views may bring more performance gains, but this does not always hold.
4.5 Parameter Sensitivity Analysis
- (1) There are three important hyper-parameters used in SEPT: β, which controls the magnitude of the self-supervised tri-training; K, the number of used positive examples; and ρ, the edge dropout rate of $\tilde{\mathcal{G}}$. We choose some representative values for them to investigate the parameter sensitivity of SEPT. The results are presented in Fig. 4 - 6. When investigating the influence of β, we fix K = 10 and ρ = 0.3. For the influence of K in Fig. 5, we fix β = 0.005 on LastFM and Yelp, β = 0.02 on Douban-Book, and ρ = 0.3. Finally, for the effect of ρ in Fig. 6, the setting of β is the same as in the previous case, and K = 10. A two-layer setting is used in this case.
- (2) As can be observed from Fig. 4, SEPT is sensitive to $\beta$. On different datasets, we need to choose different values of $\beta$ for the best performance. Generally, a small value of $\beta$ leads to desirable performance, and a large value of $\beta$ results in a huge performance drop. Figure 5 has been interpreted in Section 4.3. According to Fig. 6, SEPT is not sensitive to the edge dropout rate. Even a large value of ρ (e.g., 0.8) can create informative self-supervision signals, which is a good property for the possible wide use of SEPT. However, when the perturbed graph is highly sparse, it cannot provide useful information for self-supervised learning.
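The edge dropout rate ρ studied above controls how aggressively the graph is perturbed. A minimal sketch of the operation, assuming each edge is dropped independently with probability ρ (the function name and the edge-list representation are hypothetical):

```python
import random

def edge_dropout(edges, rho, rng):
    """Drop each edge independently with probability rho; return the kept edges."""
    return [edge for edge in edges if rng.random() >= rho]
```

With ρ = 0.3, roughly 70% of the edges survive in each perturbed graph; values close to 1 leave almost no edges, which matches the remark that a highly sparse perturbed graph carries little useful signal.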
5 CONCLUSION AND FUTURE WORK
- (1) Self-supervised graph contrastive learning, which is widely used in the field of graph representation learning, has recently been transplanted to recommendation to improve recommendation performance. However, most SSL-based methods only exploit self-supervision signals through self-discrimination, so SSL cannot fully exert itself in recommendation scenarios by leveraging the widely observed homophily.
- (2) To address this issue, in this paper,
- we propose a socially-aware self-supervised tri-training framework named SEPT to improve recommendation
- by discovering self-supervision signals from two complementary views of the raw data.
- Under the self-supervised tri-training scheme, the neighbor-discrimination based contrastive learning method is developed to refine user representations with pseudo-labels from the neighbors.
- (3) Extensive experiments demonstrate the effectiveness of SEPT, and a thorough ablation study is conducted to verify the rationale of the self-supervised tri-training.
- (4) In this paper, only the self-supervision signals from users are exploited. However, items can analogously provide informative pseudo-labels for self-supervision. This can be implemented by leveraging the multimodality of items. We leave it as our future work. We also believe that the idea of self-supervised multi-view co-training can be generalized to more scenarios beyond recommendation.
ACKNOWLEDGMENT
REFERENCES