node2vec: Scalable Feature Learning for Networks
Historical significance of node2vec:
- One of the most highly cited papers in the area
- Like DeepWalk, a representative early work on network representation learning, later widely used as a classic baseline
- Inspired a large body of follow-up work on random-walk-based network representation learning
Graph learning (manual feature extraction -> feature selection -> classifier) ---------- DeepWalk, node2vec ---------- deep learning (feature learning and classification in a single model)
(based on feature engineering)                                                                                        (based on neural networks)
The paper is organized as follows:
1. Abstract
Introduces the background and the node2vec model with its flexible, tunable search strategy:
1. Highlights the shortcomings of prior feature-engineering approaches, motivating node2vec and its ability to explore diverse neighborhoods
2. Proposes a tunable search strategy based on a biased random walk, generating node sequences that encode different semantics
3. Discusses the algorithm's efficiency and robustness, characterizing the model through case studies and extensive experiments
4. Shows that node2vec achieves strong results on network datasets from multiple domains
2. Introduction
Explains why graphs matter and contrasts node2vec with earlier methods
3. Related Work
Reviews traditional hand-crafted graph features, dimensionality-reduction methods, and related approaches
4. Feature Learning
Introduces basic graph notation, the skip-gram algorithm, and the objective function.
Model detail 1: the objective function, analogous to skip-gram
Independence assumption: neighborhood nodes are modeled as conditionally independent of one another; optimized with negative sampling and SGD
Core idea: generate the neighborhood N_S(u) with a random-walk strategy
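To make the notes above concrete, the paper's objective (its Eq. 1–2) maximizes the log-probability of each node's walk-sampled neighborhood N_S(u) given the node's embedding f(u); under the conditional-independence and softmax assumptions it becomes:

```latex
\max_{f}\ \sum_{u \in V} \log \Pr\big(N_S(u)\mid f(u)\big)
\;=\; \max_{f}\ \sum_{u \in V} \Big[-\log Z_u \;+\; \sum_{n_i \in N_S(u)} f(n_i)\cdot f(u)\Big],
\qquad Z_u = \sum_{v \in V} \exp\big(f(u)\cdot f(v)\big)
```

The partition function Z_u is expensive to compute exactly over large graphs, which is why the notes mention negative sampling as the approximation.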
5. Search Strategies
Introduces BFS, DFS, and the trade-off between the two search strategies
Model detail 2: BFS vs. DFS
""" BFS算法 數(shù)據(jù)結(jié)構(gòu):queue (隊列) FIFO 先進(jìn)先出 BFS具有結(jié)構(gòu)相似性 (structural equivalence)"""from collections import dequedef iter_bfs(G,s,S=None):S,Q = set(),deque()Q.append(s)while Q:u = Q.popleft()if u in S: continueS.add(u)Q.extend(G[u])yield u""" DFS算法 數(shù)據(jù)結(jié)構(gòu): stack(棧) 先進(jìn)先出 DFS具有同質(zhì)相似性、社群相似性(homophily)"""def iter_dfs(G,s):S,Q = set(),[]Q.append(s)while Q:u = Q.pop()if u in S: continueS.add(u)Q.extend(G[u])yield u?
Model detail 3: biased random walk (2nd order)

d_tx: the shortest-path distance between nodes t and x, with d_tx ∈ {0, 1, 2}; p and q control how quickly the walk leaves the neighborhood of the source node v.
Understanding the hyperparameters:
p (return parameter):
- large p: the walk tends not to backtrack, reducing 2-hop redundancy
- small p: the walk tends to backtrack, so the sampled sequences stay concentrated around the start node
q (in-out parameter):
- q > 1: BFS-like behaviour, local view
- q < 1: DFS-like behaviour, exploring farther from the source
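The bias rule described above can be sketched as a small helper (the function name `alpha_pq` is ours, not from the paper's code):

```python
# Hedged sketch: the node2vec search bias alpha_pq(t, x) from Sec. 3.2 of the
# paper, where d_tx is the shortest-path distance between the previous node t
# and the candidate next node x.
def alpha_pq(d_tx, p, q):
    if d_tx == 0:        # x is t itself: backtracking step, damped by p
        return 1.0 / p
    if d_tx == 1:        # x is also a neighbour of t: stay local
        return 1.0
    if d_tx == 2:        # x moves away from t: outward step, damped by q
        return 1.0 / q
    raise ValueError("d_tx must be 0, 1, or 2")

# Large p discourages backtracking; q > 1 keeps the walk local (BFS-like),
# q < 1 pushes it outward (DFS-like).
print(alpha_pq(0, p=4, q=0.5))  # 0.25
```

The returned values are unnormalized; the walk normalizes them over the current node's neighbours before sampling.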
6. node2vec
Presents the biased random-walk algorithm, how to choose p and q, the time-complexity analysis, and alias sampling
Model detail 4: the node2vec algorithm
Time complexity: O(m)
Model detail 5: alias sampling
Sampling trick for a discrete distribution, e.g. p = [0.3, 0.2, 0.1, 0.4] with cumulative sums sump = [0.3, 0.5, 0.6, 1]:
- O(n): linear search over sump
- O(log n): binary search over sump
- O(1): alias sampling
A good write-up on alias sampling: https://blog.csdn.net/haolexiao/article/details/65157026
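To make the comparison above concrete, here is a minimal sketch (ours, not from the original post) of the O(log n) baseline: sampling an index from p = [0.3, 0.2, 0.1, 0.4] by binary search on the cumulative sums.

```python
import bisect
import random

# Minimal sketch of the O(log n) baseline: draw an index from the discrete
# distribution p by binary search on its cumulative sums.
p = [0.3, 0.2, 0.1, 0.4]
sump = [0.3, 0.5, 0.6, 1.0]   # running sums of p, as in the example above

def draw(sump):
    r = random.random()                   # uniform in [0, 1)
    return bisect.bisect_right(sump, r)   # first index i with sump[i] > r

random.seed(0)
counts = [0] * len(p)
for _ in range(100_000):
    counts[draw(sump)] += 1
# The empirical frequencies counts[i] / 100_000 approach p[i]; alias sampling
# reproduces the same distribution in O(1) per draw after O(n) setup.
```

Alias sampling trades this per-draw binary search for a preprocessed pair of tables, which matters when billions of walk steps each need one draw.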
7. Effectiveness
Experiments probing the model's effectiveness: case study, baselines and parameter settings, node classification and link prediction tasks
8. Conclusion
Key points:
Takeaways:
9. Code
Entry script (main.py):

```python
# -*- coding: utf-8 -*-
# @Time        : 2020/12/14 20:22
# @Author      : TaoWang
# @Description : entry point

import argparse
import networkx as nx
from gensim.models import Word2Vec
from model import Graph


def parse_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(description="Run node2vec.")
    parser.add_argument('--input', nargs='?', default='graph/karate.edgelist', help='Input graph path.')
    parser.add_argument('--output', nargs='?', default='emb/karate.emb', help='Embedding path')
    parser.add_argument('--dimensions', type=int, default=128, help='Number of dimensions. Default is 128.')
    parser.add_argument('--walk-length', type=int, default=80, help='Length of walk per source. Default is 80.')
    parser.add_argument('--num-walks', type=int, default=10, help='Number of walks per source. Default is 10.')
    parser.add_argument('--window-size', type=int, default=10, help='Context size for optimization. Default is 10.')
    parser.add_argument('--iter', type=int, default=1, help='Number of epochs in SGD')
    parser.add_argument('--workers', type=int, default=8, help='Number of parallel workers. Default is 8.')
    parser.add_argument('--p', type=float, default=1, help='Return hyperparameter. Default is 1.')
    parser.add_argument('--q', type=float, default=1, help='Inout hyperparameter. Default is 1.')
    parser.add_argument('--weighted', dest='weighted', action='store_true', help='Default is unweighted.')
    parser.add_argument('--unweighted', dest='unweighted', action='store_false')
    parser.set_defaults(weighted=False)
    parser.add_argument('--directed', dest='directed', action='store_true', help='Default is undirected.')
    parser.add_argument('--undirected', dest='undirected', action='store_false')
    parser.set_defaults(directed=False)
    return parser.parse_args()


def read_graph():
    """Read the edge list into a networkx graph."""
    if args.weighted:
        # weighted graph
        G = nx.read_edgelist(args.input, nodetype=int, data=(('weight', float),), create_using=nx.DiGraph())
    else:
        # unweighted graph: give every edge weight 1
        G = nx.read_edgelist(args.input, nodetype=int, create_using=nx.DiGraph())
        for edge in G.edges():
            G[edge[0]][edge[1]]['weight'] = 1
    # undirected case: merge a->b and b->a into a single edge
    if not args.directed:
        G = G.to_undirected()
    return G


def learn_embeddings(walks):
    """Run skip-gram (word2vec) over the generated walks."""
    # word2vec expects strings, so cast the node ids
    walks = [list(map(str, walk)) for walk in walks]
    # gensim 3.x API: `size` and `iter`; in gensim 4.x these became
    # `vector_size` and `epochs`
    model = Word2Vec(walks,
                     size=args.dimensions,
                     window=args.window_size,
                     min_count=0,
                     sg=1,
                     workers=args.workers,
                     iter=args.iter)
    # save the learned embeddings
    model.wv.save_word2vec_format(args.output)


def main(args):
    nx_G = read_graph()
    G = Graph(nx_G, args.directed, args.p, args.q)
    G.preprocess_transition_probs()
    walks = G.simulate_walks(args.num_walks, args.walk_length)
    learn_embeddings(walks)


if __name__ == "__main__":
    args = parse_args()
    main(args)
```

The `model` module (model.py):

```python
# -*- coding: utf-8 -*-
# @Time        : 2020/12/14 20:49
# @Author      : TaoWang
# @Description : the node2vec walk model

import random
import numpy as np


class Graph:
    def __init__(self, nx_G, is_directed, p, q):
        self.G = nx_G
        self.is_directed = is_directed
        self.p = p
        self.q = q

    def node2vec_walk(self, walk_length, start_node):
        """Simulate one biased walk starting from start_node."""
        G = self.G
        # alias tables computed in preprocessing, so each step is an O(1) draw
        alias_nodes = self.alias_nodes
        alias_edges = self.alias_edges

        walk = [start_node]
        # keep walking until the walk has walk_length nodes
        while len(walk) < walk_length:
            cur = walk[-1]
            # sort the neighbours so the order matches the alias tables
            cur_nbrs = sorted(G.neighbors(cur))
            if len(cur_nbrs) > 0:
                if len(walk) == 1:
                    # first step: no previous node, use the node alias table
                    walk.append(cur_nbrs[alias_draw(*alias_nodes[cur])])
                else:
                    # later steps are 2nd order: condition on the previous
                    # node `prev` (node t in the paper)
                    prev = walk[-2]
                    nxt = cur_nbrs[alias_draw(*alias_edges[(prev, cur)])]
                    walk.append(nxt)
            else:
                break
        return walk

    def simulate_walks(self, num_walks, walk_length):
        """Generate num_walks biased random walks from every node."""
        G = self.G
        walks = []
        nodes = list(G.nodes())
        print("Walk iteration:")
        for walk_iter in range(num_walks):
            # one pass over all nodes per iteration
            print(str(walk_iter + 1), '/', str(num_walks))
            random.shuffle(nodes)
            for node in nodes:
                walks.append(self.node2vec_walk(walk_length=walk_length, start_node=node))
        return walks

    def get_alias_edge(self, src, dst):
        """Alias table for the transition out of dst, having arrived from src.

        Sec. 3.2.2 of the paper:
            alpha_pq(t, x) = 1/p  if d_tx = 0
                           = 1    if d_tx = 1
                           = 1/q  if d_tx = 2
        """
        G, p, q = self.G, self.p, self.q
        unnormalized_probs = []
        for dst_nbr in sorted(G.neighbors(dst)):
            if dst_nbr == src:
                # back to the previous node: d_tx = 0
                unnormalized_probs.append(G[dst][dst_nbr]['weight'] / p)
            elif G.has_edge(dst_nbr, src):
                # common neighbour of src and dst: d_tx = 1
                unnormalized_probs.append(G[dst][dst_nbr]['weight'])
            else:
                # moving away from src: d_tx = 2
                unnormalized_probs.append(G[dst][dst_nbr]['weight'] / q)
        # normalize
        norm_const = sum(unnormalized_probs)
        normalized_probs = [float(u_prob) / norm_const for u_prob in unnormalized_probs]
        return alias_setup(normalized_probs)

    def preprocess_transition_probs(self):
        """Precompute the alias tables for all nodes and edges."""
        G = self.G
        is_directed = self.is_directed

        # node alias tables (1st order, used for the first step of a walk)
        alias_nodes = {}
        for node in G.nodes():
            unnormalized_probs = [G[node][nbr]['weight'] for nbr in sorted(G.neighbors(node))]
            norm_const = sum(unnormalized_probs)
            normalized_probs = [float(u_prob) / norm_const for u_prob in unnormalized_probs]
            alias_nodes[node] = alias_setup(normalized_probs)

        # edge alias tables (2nd order)
        alias_edges = {}
        if is_directed:
            for edge in G.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
        else:
            for edge in G.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
                alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0])

        self.alias_nodes = alias_nodes
        self.alias_edges = alias_edges


def alias_setup(probs):
    """Construct the alias table for a discrete distribution.

    Initialization (naive alias method):
      1. Multiply each probability p_i by K.
      2. Create arrays Alias (J) and Prob (q), each of size K.
      3. Repeatedly pick a scaled probability p_l <= 1 and one p_g >= 1;
         set Prob[l] = p_l and Alias[l] = g, remove p_l from the list,
         and set p_g = p_g - (1 - p_l).
    Generation:
      1. Roll a fair K-sided die to get a column i.
      2. Flip a biased coin that lands heads with probability Prob[i];
         on heads return i, otherwise return Alias[i].
    """
    K = len(probs)
    q = np.zeros(K)              # Prob table
    J = np.zeros(K, dtype=int)   # Alias table
    smaller, larger = [], []

    # split the indices into scaled probabilities below 1 and at least 1
    for kk, prob in enumerate(probs):
        q[kk] = K * prob
        if q[kk] < 1.0:
            smaller.append(kk)
        else:
            larger.append(kk)

    # greedily pair each under-full column with an over-full one
    while len(smaller) > 0 and len(larger) > 0:
        small = smaller.pop()
        large = larger.pop()
        J[small] = large
        # the large column donates the missing mass
        q[large] = q[large] + q[small] - 1.0
        if q[large] < 1.0:
            smaller.append(large)
        else:
            larger.append(large)

    return J, q


def alias_draw(J, q):
    """Draw one sample in O(1) from the alias table (J, q)."""
    K = len(J)
    kk = int(np.floor(np.random.rand() * K))  # fair K-sided die roll
    # biased coin: heads keeps column kk, tails takes its alias
    if np.random.rand() < q[kk]:
        return kk
    else:
        return J[kk]
```