Advances in Graph Neural Networks Notes 4: Heterogeneous Graph Neural Networks
諸神緘默不語 - personal CSDN blog post index
Book link: https://link.springer.com/book/10.1007/978-3-031-16174-2
This post contains my study notes for Chapter 4 of the book.
I don't think this chapter is written particularly well. Judged as a grad-student reading-group presentation of papers on heterogeneous graph neural networks, it is acceptable: it introduces the common HGNN paradigm and three classic models. Judged as a textbook or survey, I would not recommend buying it.
Table of Contents
- 1. HGNN
- 2. Heterogeneous Graph Propagation Network (HPN)
- 2.1 Semantic confusion
- 2.2 HPN architecture
- 2.3 HPN loss functions
- 3. Distance Encoding-based Heterogeneous Graph Neural Network (DHN)
- 3.1 HDE definitions
- 3.2 DHN for link prediction
- 4. Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning (HeCo)
- 5. Further Reading
1. HGNN
Two-step message aggregation: for each target node, information is first aggregated from its neighbors within each meta-path, and the resulting meta-path-specific representations are then fused (the book illustrates this with a figure).
The methods covered in this chapter address two issues: the deep degradation phenomenon and discriminative power.
2. Heterogeneous Graph Propagation Network (HPN)
HPN analyzes the deep degradation phenomenon and alleviates semantic confusion.
2.1 Semantic confusion
Semantic confusion is analogous to the over-smoothing problem in homogeneous-graph GNNs.
The book shows clustering results and visualizations of the paper-node representations produced by HAN at different numbers of layers; each color corresponds to one label (research area).
(The book's explanation of the semantic confusion phenomenon is rather muddled, so I skip it.)
The HPN paper proves that an HGNN is equivalent to a multiple meta-paths-based random walk:
Multiple meta-paths-based random walk (this does not feel very different from a homogeneous-graph random walk: each meta-path induces a homogeneous-style random walk, and the walk probabilities of the different meta-paths are then combined):
Taking the number of steps to the limit, I don't fully follow the math. In short, the formal expression still resembles the homogeneous random walk; the point is that the limiting distribution depends only on the meta-paths and not on the initial state:
In contrast, the limiting distribution of the meta-path-based random walk with restart does depend on the initial state:
Meta-path-based random walk with restart:

$$\boldsymbol{\pi}^{\Phi,k}(\boldsymbol{i}) = (1-\gamma)\cdot \mathbf{M}^{\Phi}\cdot \boldsymbol{\pi}^{\Phi,k-1}(\boldsymbol{i}) + \gamma\cdot \boldsymbol{i}$$
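To make the contrast concrete, here is a minimal numpy sketch (the toy transition matrix `M` and restart probability `gamma` are made-up values, not from the book): the plain meta-path walk converges to the same distribution no matter where it starts, while the walk with restart keeps a $\gamma$-weighted imprint of its starting node.

```python
import numpy as np

# Toy meta-path transition matrix M^Phi (column-stochastic so that
# pi_k = M @ pi_{k-1} remains a probability distribution).
M = np.array([[0.1, 0.5, 0.3],
              [0.6, 0.2, 0.3],
              [0.3, 0.3, 0.4]])
gamma = 0.15  # restart probability (assumed value)

def plain_walk(pi0, steps=200):
    pi = pi0.copy()
    for _ in range(steps):
        pi = M @ pi                              # pi^{Phi,k} = M^Phi . pi^{Phi,k-1}
    return pi

def walk_with_restart(pi0, steps=200):
    pi = pi0.copy()
    for _ in range(steps):
        pi = (1 - gamma) * (M @ pi) + gamma * pi0   # restart term keeps pi0 in play
    return pi

i1 = np.array([1.0, 0.0, 0.0])   # start at node 0
i2 = np.array([0.0, 0.0, 1.0])   # start at node 2

print(plain_walk(i1), plain_walk(i2))                # (almost) identical limits
print(walk_with_restart(i1), walk_with_restart(i2))  # different limits
```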
2.2 HPN architecture
HPN consists of two components: a semantic propagation mechanism and a semantic fusion mechanism.
The semantic propagation mechanism draws on the meta-path-based random walk with restart:
For each meta-path, nodes are first transformed linearly and then neighbor information is aggregated (the first equation below is the composition of the two equations under it):
$$\begin{aligned}
\mathbf{Z}^{\Phi} &= \mathcal{P}_{\Phi}(\mathbf{X}) = g_{\Phi}(f_{\Phi}(\mathbf{X}))\\
\mathbf{H}^{\Phi} &= f_{\Phi}(\mathbf{X}) = \sigma(\mathbf{X}\cdot \mathbf{W}^{\Phi} + \mathbf{b}^{\Phi})\\
\mathbf{Z}^{\Phi,k} &= g_{\Phi}(\mathbf{Z}^{\Phi,k-1}) = (1-\gamma)\cdot \mathbf{M}^{\Phi}\cdot \mathbf{Z}^{\Phi,k-1} + \gamma\cdot \mathbf{H}^{\Phi}
\end{aligned}$$
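A minimal PyTorch sketch of the semantic propagation mechanism $\mathcal{P}_\Phi$, assuming `M_phi` is a row-normalized dense meta-path adjacency matrix; class and variable names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SemanticPropagation(nn.Module):
    """P_Phi(X) = g_Phi(f_Phi(X)) for one meta-path Phi (sketch)."""
    def __init__(self, in_dim, out_dim, k=5, gamma=0.1):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)   # f_Phi: semantic projection
        self.k, self.gamma = k, gamma

    def forward(self, X, M_phi):
        H = torch.relu(self.lin(X))             # H^Phi = sigma(X W^Phi + b^Phi)
        Z = H
        for _ in range(self.k):                 # g_Phi: k propagation steps
            Z = (1 - self.gamma) * (M_phi @ Z) + self.gamma * H
        return Z

# toy usage: 4 nodes, 8-dim features, one meta-path adjacency
X = torch.randn(4, 8)
A = torch.tensor([[0., 1., 1., 0.],
                  [1., 0., 0., 1.],
                  [1., 0., 0., 1.],
                  [0., 1., 1., 0.]])
M_phi = A / A.sum(dim=1, keepdim=True)          # row-normalized
Z = SemanticPropagation(8, 16)(X, M_phi)        # -> (4, 16)
```

The semantic fusion mechanism $\mathcal{F}$ then combines the meta-path-specific embeddings $\mathbf{Z}^{\Phi_p}$ with attention weights $\beta_{\Phi_p}$: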
$$\begin{aligned}
\mathbf{Z} &= \mathcal{F}(\mathbf{Z}^{\Phi_1},\mathbf{Z}^{\Phi_2},\ldots,\mathbf{Z}^{\Phi_P})\\
w_{\Phi_p} &= \frac{1}{|\mathcal{V}|}\sum_{i\in\mathcal{V}}\mathbf{q}^{\mathrm{T}}\cdot\tanh\left(\mathbf{W}\cdot\mathbf{z}_i^{\Phi_p}+\mathbf{b}\right)\\
\beta_{\Phi_p} &= \frac{\exp(w_{\Phi_p})}{\sum_{p=1}^{P}\exp(w_{\Phi_p})}\\
\mathbf{Z} &= \sum_{p=1}^{P}\beta_{\Phi_p}\cdot\mathbf{Z}^{\Phi_p}
\end{aligned}$$
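A matching sketch of the fusion step: each meta-path's score is the node-averaged $\mathbf{q}^{\mathrm{T}}\tanh(\mathbf{W}\mathbf{z}_i^{\Phi_p}+\mathbf{b})$, softmax-normalized and used to mix the meta-path-specific embeddings (again just a sketch with made-up names).

```python
import torch
import torch.nn as nn

class SemanticFusion(nn.Module):
    """Z = sum_p beta_{Phi_p} * Z^{Phi_p} (sketch of the fusion equations)."""
    def __init__(self, dim, att_dim=32):
        super().__init__()
        self.W = nn.Linear(dim, att_dim)
        self.q = nn.Parameter(torch.randn(att_dim))

    def forward(self, Z_list):                  # one (N, dim) tensor per meta-path
        # w_{Phi_p} = mean_i q^T tanh(W z_i^{Phi_p} + b)
        w = torch.stack([torch.tanh(self.W(Z)).matmul(self.q).mean() for Z in Z_list])
        beta = torch.softmax(w, dim=0)          # beta_{Phi_p}
        return sum(b * Z for b, Z in zip(beta, Z_list))

Z = SemanticFusion(16)([torch.randn(4, 16) for _ in range(3)])  # -> (4, 16)
```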
2.3 HPN loss functions
Loss for semi-supervised node classification:
$$\mathcal{L} = -\sum_{l\in\mathcal{Y}_L}\mathbf{Y}_l\cdot\ln(\mathbf{Z}_l\cdot\mathbf{C})$$
Loss for unsupervised node recommendation (I'm not sure exactly what this task is), a BPR loss with negative sampling:
$$\mathcal{L} = -\sum_{(u,v)\in\Omega}\log\sigma\left(\mathbf{z}_u^{\top}\mathbf{z}_v\right) - \sum_{(u,v')\in\Omega^{-}}\log\sigma\left(-\mathbf{z}_u^{\top}\mathbf{z}_{v'}\right)$$
The first term runs over observed (positive) node pairs; the second runs over negative node pairs sampled from all unobserved node pairs.
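A small sketch of this loss on made-up embeddings and index pairs (the pair tensors stand in for $\Omega$ and $\Omega^-$):

```python
import torch

def bpr_style_loss(z, pos_pairs, neg_pairs):
    """-sum log sigma(z_u . z_v) over positives - sum log sigma(-z_u . z_v') over negatives."""
    u_p, v_p = pos_pairs[:, 0], pos_pairs[:, 1]
    u_n, v_n = neg_pairs[:, 0], neg_pairs[:, 1]
    pos = torch.sigmoid((z[u_p] * z[v_p]).sum(-1))
    neg = torch.sigmoid(-(z[u_n] * z[v_n]).sum(-1))
    return -(torch.log(pos).sum() + torch.log(neg).sum())

z = torch.randn(5, 16)                                   # node embeddings Z
loss = bpr_style_loss(z,
                      torch.tensor([[0, 1], [2, 3]]),    # observed pairs (Omega)
                      torch.tensor([[0, 4], [1, 3]]))    # sampled negatives (Omega^-)
```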
(I skip the HPN experiments. Briefly: an ablation study, an unsupervised node clustering task, and a check that the model stays robust as more layers are stacked, in contrast with HAN.)
3. Distance Encoding-based Heterogeneous Graph Neural Network (DHN)
Focuses on discriminative power: heterogeneous distance encoding (HDE) is injected into the aggregation (distance encoding (DE) is a concept already used for homogeneous-graph GNNs).
Heterogeneous Graph Neural Network with Distance Encoding
Traditional HGNNs embed each node on its own; this work focuses on the correlation between nodes.
3.1 HDE definitions
Heterogeneous shortest path distance: the number of nodes on the path containing the fewest nodes between the two nodes.
(Figure: illustration of HDE from the book.)
Heterogeneous shortest path distance (Hete-SPD): each dimension counts how many times nodes of one type appear on the shortest path (excluding the first node).
HDE of a target node with respect to a set of anchor nodes: the target node's Hete-SPD to each anchor node is embedded first, and the embeddings are then fused.
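A minimal sketch of computing Hete-SPD with networkx (the toy graph, node types, and the convention of excluding the source node follow the definition above; everything else is assumed):

```python
import networkx as nx
from collections import Counter

# toy heterogeneous graph: node -> type
node_type = {'a1': 'author', 'p1': 'paper', 'p2': 'paper', 'v1': 'venue'}
G = nx.Graph([('a1', 'p1'), ('p1', 'v1'), ('v1', 'p2')])
TYPES = ['author', 'paper', 'venue']

def hete_spd(G, src, dst):
    """Per-type occurrence counts along the shortest path, excluding the first node."""
    path = nx.shortest_path(G, src, dst)            # e.g. ['a1', 'p1', 'v1', 'p2']
    counts = Counter(node_type[n] for n in path[1:])
    return [counts.get(t, 0) for t in TYPES]

print(hete_spd(G, 'a1', 'p2'))   # [0, 2, 1]: two papers and one venue after the source
```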
3.2 DHN for link prediction
Heterogeneous Distance Encoding for Link Prediction
The target nodes are the two nodes of the node pair, and the anchor set consists of these same two nodes.
$$\begin{aligned}
\mathbf{c}_i &= \mathrm{onehot}(\phi(i))\\
\mathbf{e}_i^{\{u,v\}} &= \sigma\left(\mathbf{W}_0\cdot\left[\mathbf{c}_i\,\Vert\,\mathbf{h}_i^{\{u,v\}}\right]+\mathbf{b}_0\right)
\end{aligned}$$
$$\mathbf{x}_{u,l}^{\{u,v\}} = \sigma\left(\mathbf{W}^{l}\cdot\left[\mathbf{x}_{u,l-1}^{\{u,v\}}\,\Vert\,\mathrm{Avg}\{\mathbf{x}_{i,l-1}^{\{u,v\}}\}\right]+\mathbf{b}^{l}\right),\quad \forall i\in\mathcal{N}_u^{\{u,v\}}$$
$$\begin{aligned}
\hat{y}_{u,v} &= \sigma\left(\mathbf{W}_1\cdot\left[\mathbf{z}_u^{\{u,v\}}\,\Vert\,\mathbf{z}_v^{\{u,v\}}\right]+b_1\right)\\
\mathcal{L} &= \sum_{(u,v)\in\mathcal{E}^{+}\cup\mathcal{E}^{-}}\left(y_{u,v}\log\hat{y}_{u,v}+(1-y_{u,v})\log(1-\hat{y}_{u,v})\right)
\end{aligned}$$
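To tie the three equations together, here is a condensed single-layer PyTorch sketch of DHN-style link prediction (mean aggregation over sampled neighbors, the standard negative-sign BCE convention, and all names and dimensions are my assumptions, not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DHNSketch(nn.Module):
    """Condensed one-layer sketch of DHN-style link prediction."""
    def __init__(self, n_types, hde_dim, hid_dim):
        super().__init__()
        self.enc = nn.Linear(n_types + hde_dim, hid_dim)  # W_0: [type one-hot || HDE] -> e_i
        self.agg = nn.Linear(2 * hid_dim, hid_dim)        # W^l: [self || Avg(neighbors)] -> x_{u,l}
        self.pred = nn.Linear(2 * hid_dim, 1)             # W_1: [z_u || z_v] -> score

    def encode(self, type_onehot, hde):
        return torch.relu(self.enc(torch.cat([type_onehot, hde], dim=-1)))   # e_i^{u,v}

    def aggregate(self, e_self, e_nbrs):
        return torch.relu(self.agg(torch.cat([e_self, e_nbrs.mean(dim=0)], dim=-1)))

    def forward(self, z_u, z_v, label):
        y_hat = torch.sigmoid(self.pred(torch.cat([z_u, z_v], dim=-1)))       # \hat{y}_{u,v}
        return y_hat, F.binary_cross_entropy(y_hat, label)                    # standard BCE sign

# toy usage with made-up dimensions and inputs
m = DHNSketch(n_types=3, hde_dim=6, hid_dim=16)
e_u = m.encode(torch.eye(3)[0], torch.randn(6))
e_v = m.encode(torch.eye(3)[1], torch.randn(6))
z_u = m.aggregate(e_u, torch.stack([e_v]))   # pretend v is u's only sampled neighbor
z_v = m.aggregate(e_v, torch.stack([e_u]))
y_hat, loss = m(z_u, z_v, torch.tensor([1.0]))
```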
(Experiments skipped. Link prediction is evaluated in both transductive and inductive settings.)
4. Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning (HeCo)
Focuses on discriminative power: a cross-view contrastive mechanism captures local and high-order structure at the same time.
Previous work contrasts the original network with a corrupted version of it.
The views chosen by HeCo: network schema (local structure) and meta-path structure (high-order structure).
Nodes are encoded from both views (a view mask mechanism is applied during encoding), and contrastive learning is applied between the two view-specific embeddings.
Node features are first projected into a common space with a type-specific transformation:
$$h_i = \sigma\left(W_{\phi_i}\cdot x_i + b_{\phi_i}\right)$$
In the network schema view, attention is applied at two levels: node level and type level.
Node-level attention is applied to the neighbors of each node type (in practice, a sampled subset of them):
$$\begin{aligned}
h_i^{\Phi_m} &= \sigma\left(\sum_{j\in N_i^{\Phi_m}}\alpha_{i,j}^{\Phi_m}\cdot h_j\right)\\
\alpha_{i,j}^{\Phi_m} &= \frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}_{\Phi_m}^{\top}\cdot[h_i\,\Vert\,h_j]\right)\right)}{\sum_{l\in N_i^{\Phi_m}}\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}_{\Phi_m}^{\top}\cdot[h_i\,\Vert\,h_l]\right)\right)}
\end{aligned}$$
Type-level attention is then applied over the embeddings obtained for all node types:
$$\begin{aligned}
w_{\Phi_m} &= \frac{1}{|V|}\sum_{i\in V}\mathbf{a}_{sc}^{\top}\cdot\tanh\left(\mathbf{W}_{sc}h_i^{\Phi_m}+\mathbf{b}_{sc}\right)\\
\beta_{\Phi_m} &= \frac{\exp(w_{\Phi_m})}{\sum_{i=1}^{S}\exp(w_{\Phi_i})}\\
z_i^{sc} &= \sum_{m=1}^{S}\beta_{\Phi_m}\cdot h_i^{\Phi_m}
\end{aligned}$$
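A compact PyTorch sketch of the network schema view encoder for a single target node: node-level attention within each neighbor type, then type-level attention over the per-type embeddings. The $\frac{1}{|V|}$ average in the type-level score is collapsed to a single node here for brevity, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SchemaViewEncoder(nn.Module):
    """Sketch of HeCo's network schema view for one target node (not the authors' code)."""
    def __init__(self, dim, n_types, att_dim=32):
        super().__init__()
        self.a_node = nn.ParameterList([nn.Parameter(torch.randn(2 * dim)) for _ in range(n_types)])
        self.W_sc = nn.Linear(dim, att_dim)
        self.a_sc = nn.Parameter(torch.randn(att_dim))

    def forward(self, h_i, nbrs_per_type):
        # node-level attention within the neighbors of each type Phi_m
        h_types = []
        for a, H in zip(self.a_node, nbrs_per_type):             # H: (n_neighbors, dim)
            cat = torch.cat([h_i.expand_as(H), H], dim=-1)       # [h_i || h_j]
            alpha = torch.softmax(F.leaky_relu(cat @ a), dim=0)  # alpha_{i,j}^{Phi_m}
            h_types.append(torch.relu((alpha.unsqueeze(-1) * H).sum(dim=0)))
        # type-level attention over the per-type embeddings
        w = torch.stack([torch.tanh(self.W_sc(h)) @ self.a_sc for h in h_types])
        beta = torch.softmax(w, dim=0)                           # beta_{Phi_m}
        return sum(b * h for b, h in zip(beta, h_types))         # z_i^{sc}

enc = SchemaViewEncoder(dim=16, n_types=2)
z_sc = enc(torch.randn(16), [torch.randn(3, 16), torch.randn(5, 16)])   # -> (16,)
```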
Meta-paths express semantic similarity, so each meta-path is encoded with a meta-path-specific GCN:
$$h_i^{\mathcal{P}_n} = \frac{1}{d_i+1}h_i + \sum_{j\in N_i^{\mathcal{P}_n}}\frac{1}{\sqrt{(d_i+1)(d_j+1)}}h_j$$
Attention is then applied over the representations from all meta-paths:
$$\begin{aligned}
z_i^{mp} &= \sum_{n=1}^{M}\beta_{\mathcal{P}_n}\cdot h_i^{\mathcal{P}_n}\\
w_{\mathcal{P}_n} &= \frac{1}{|V|}\sum_{i\in V}\mathbf{a}_{mp}^{\top}\cdot\tanh\left(\mathbf{W}_{mp}h_i^{\mathcal{P}_n}+\mathbf{b}_{mp}\right)\\
\beta_{\mathcal{P}_n} &= \frac{\exp(w_{\mathcal{P}_n})}{\sum_{i=1}^{M}\exp(w_{\mathcal{P}_i})}
\end{aligned}$$
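A sketch of the meta-path-specific GCN step (the attention-based fusion over meta-paths is structurally the same as HPN's semantic fusion sketched earlier, so only the propagation is shown; the dense adjacency is an assumed simplification):

```python
import torch

def metapath_gcn(H, A_mp):
    """Symmetric-normalized GCN propagation with self-loops over one meta-path graph:
    h_i = h_i/(d_i+1) + sum_j h_j / sqrt((d_i+1)(d_j+1))."""
    A_hat = A_mp + torch.eye(A_mp.size(0))      # add self-loops
    d = A_hat.sum(dim=1)                        # d_i + 1
    D_inv_sqrt = torch.diag(d.pow(-0.5))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ H

A_mp = torch.tensor([[0., 1., 1.],
                     [1., 0., 0.],
                     [1., 0., 0.]])             # toy meta-path-based adjacency
H_mp = metapath_gcn(torch.randn(3, 16), A_mp)   # h_i^{P_n} for all i -> (3, 16)
```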
Network schema view (view mask): the target node's own embedding from the previous level is not used.
Meta-path view (view mask): embeddings of nodes whose type differs from the target node's (i.e., nodes that are not meta-path-based neighbors of the target node) are not used.
Loss function: conventional contrastive learning adapted to graph data.
The embeddings from the two views are first mapped into the latent space in which the contrastive loss is computed:
$$\begin{aligned}
z_i^{sc}\_proj &= W^{(2)}\sigma\left(W^{(1)}z_i^{sc}+b^{(1)}\right)+b^{(2)}\\
z_i^{mp}\_proj &= W^{(2)}\sigma\left(W^{(1)}z_i^{mp}+b^{(1)}\right)+b^{(2)}
\end{aligned}$$
Count the number of meta-paths connecting two nodes: $\mathbb{C}_i(j)=\sum_{n=1}^{M}\mathbb{1}\left(j\in N_i^{\mathcal{P}_n}\right)$, where $\mathbb{1}(\cdot)$ is the indicator function.
Collect the nodes connected to node $i$ by at least one meta-path into the set $S_i=\{j\mid j\in V \text{ and } \mathbb{C}_i(j)\neq 0\}$ and sort them by $\mathbb{C}_i(\cdot)$ in descending order. If the set has more elements than a given threshold, the top (threshold-many) nodes are selected as positive samples $\mathbb{P}_i$; the nodes not selected form the negative samples $\mathbb{N}_i$.
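A minimal sketch of this positive-sample selection rule (the neighbor sets and threshold are toy values):

```python
from collections import Counter

def select_positives(metapath_neighbors, threshold):
    """Rank candidates by C_i(j), the number of meta-paths linking them to node i."""
    counts = Counter()
    for nbrs in metapath_neighbors:                   # one neighbor set N_i^{P_n} per meta-path
        counts.update(nbrs)
    ranked = [j for j, _ in counts.most_common()]     # descending by C_i(j)
    return set(ranked[:threshold]), set(ranked[threshold:])   # positives P_i, negatives N_i

# toy example: node i's neighbor sets under two meta-paths
pos, neg = select_positives([{1, 2, 3}, {2, 3, 4}], threshold=2)
print(pos, neg)   # {2, 3} are linked by both meta-paths, so they become the positives
```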
Contrastive loss for the network schema view:
$$\mathcal{L}_i^{sc} = -\log\frac{\sum_{j\in\mathbb{P}_i}\exp\left(sim\left(z_i^{sc}\_proj,\,z_j^{mp}\_proj\right)/\tau\right)}{\sum_{k\in\mathbb{P}_i\cup\mathbb{N}_i}\exp\left(sim\left(z_i^{sc}\_proj,\,z_k^{mp}\_proj\right)/\tau\right)}$$
The overall training objective:
$$\mathcal{J} = \frac{1}{|V|}\sum_{i\in V}\left[\lambda\cdot\mathcal{L}_i^{sc}+(1-\lambda)\cdot\mathcal{L}_i^{mp}\right]$$
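Finally, a sketch that strings together the projection head, the per-view contrastive losses, and the overall objective $\mathcal{J}$. The ELU nonlinearity, cosine similarity for $sim$, and a denominator that runs over the whole batch (rather than exactly $\mathbb{P}_i\cup\mathbb{N}_i$) are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossViewContrast(nn.Module):
    """Sketch of HeCo's cross-view contrastive objective (simplified, not the authors' code)."""
    def __init__(self, dim, proj_dim, tau=0.5, lam=0.5):
        super().__init__()
        # shared projection head z -> z_proj (nonlinearity choice is an assumption)
        self.proj = nn.Sequential(nn.Linear(dim, proj_dim), nn.ELU(), nn.Linear(proj_dim, proj_dim))
        self.tau, self.lam = tau, lam

    def view_loss(self, anchor, other, pos_mask):
        sim = F.normalize(anchor) @ F.normalize(other).t()        # cosine similarities
        e = torch.exp(sim / self.tau)
        return -torch.log((e * pos_mask).sum(dim=1) / e.sum(dim=1)).mean()

    def forward(self, z_sc, z_mp, pos_mask):
        p_sc, p_mp = self.proj(z_sc), self.proj(z_mp)             # z^{sc}_proj, z^{mp}_proj
        l_sc = self.view_loss(p_sc, p_mp, pos_mask)               # anchors from the schema view
        l_mp = self.view_loss(p_mp, p_sc, pos_mask)               # anchors from the meta-path view
        return self.lam * l_sc + (1 - self.lam) * l_mp            # J

N, D = 6, 16
pos_mask = torch.eye(N)                                           # toy: each node is its own positive
loss = CrossViewContrast(D, proj_dim=8)(torch.randn(N, D), torch.randn(N, D), pos_mask)
```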
(As with the previous two papers, experiments are skipped. An important baseline is DMGI, a single-view contrastive method. Node classification is evaluated, along with a collaborative trend analysis: tracking how the attention weights change as the number of training epochs grows, which shows that the attention in the two views evolves collaboratively.)
5. Further Reading
As usual, because the reference numbers don't match up, I'm not sure whether the papers I found are the right ones.
- Hierarchical aggregation
- Intent recommendation
- Text classification
- GTN: softly selects edge types and automatically generates meta-paths
- HGT: aggregates meta-relation triplets with heterogeneous mutual attention
- MAGNN: aggregates meta-path instances with a relational rotation encoder