當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

社区发现算法之——Louvain

發布時間：2023/12/10 编程问答 33 豆豆

生活随笔收集整理的這篇文章主要介紹了社区发现算法之——Louvain 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

1、什么是社區

如果一張圖是對一片區域的描述的話，我們將這張圖劃分為很多個子圖。當子圖之內滿足關聯性盡可能大，而子圖之間關聯性盡可能低時，這樣的子圖我們可以稱之為一個社區。

2、社區發現算法及評價標準

社區發現算法有很多，例如LPA，HANP，SLPA以及我們今天的主人公——Louvain。不同的算法劃分社區的效果不盡相同。那么，如何評價這些算法孰優孰劣呢？
用模塊度modularity來衡量。模塊度定義如下：模塊度是評估一個社區網絡劃分好壞的度量方法，它的物理含義是社區內節點的連邊數與隨機情況下的邊數只差，它的取值范圍是 [?1/2,1)。可以簡單地理解為社區內部所有邊權重和減去與社區相連的邊權重和。

Louvain算法

一種基于模塊度的圖算法模型，與普通的基于模塊度和模塊度增益不同的是，該算法速度很快，而且對一些點多邊少的圖，進行聚類效果特別明顯。
算法流程：
1、初始時將每個頂點當作一個社區，社區個數與頂點個數相同。
2、依次將每個頂點與之相鄰頂點合并在一起，計算它們的模塊度增益是否大于0，如果大于0，就將該結點放入該相鄰結點所在社區。
3、迭代第二步，直至算法穩定，即所有頂點所屬社區不再變化。
4、將各個社區所有節點壓縮成為一個結點，社區內點的權重轉化為新結點環的權重，社區間權重轉化為新結點邊的權重。
5、重復步驟1-3，直至算法穩定。

# coding=utf-8 import collections import randomdef load_graph(path):G = collections.defaultdict(dict)with open(path) as text:for line in text:vertices = line.strip().split()v_i = int(vertices[0])v_j = int(vertices[1])w = float(vertices[2])G[v_i][v_j] = wG[v_j][v_i] = wreturn Gclass Vertex():def __init__(self, vid, cid, nodes, k_in=0):self._vid = vidself._cid = cidself._nodes = nodesself._kin = k_in # 結點內部的邊的權重class Louvain():def __init__(self, G):self._G = Gself._m = 0 # 邊數量self._cid_vertices = {} # 需維護的關于社區的信息(社區編號,其中包含的結點編號的集合)self._vid_vertex = {} # 需維護的關于結點的信息(結點編號，相應的Vertex實例)for vid in self._G.keys():self._cid_vertices[vid] = set([vid])self._vid_vertex[vid] = Vertex(vid, vid, set([vid]))self._m += sum([1 for neighbor in self._G[vid].keys() if neighbor > vid])def first_stage(self):mod_inc = False # 用于判斷算法是否可終止visit_sequence = self._G.keys()random.shuffle(list(visit_sequence))while True:can_stop = True # 第一階段是否可終止for v_vid in visit_sequence:v_cid = self._vid_vertex[v_vid]._cidk_v = sum(self._G[v_vid].values()) + self._vid_vertex[v_vid]._kincid_Q = {}for w_vid in self._G[v_vid].keys():w_cid = self._vid_vertex[w_vid]._cidif w_cid in cid_Q:continueelse:tot = sum([sum(self._G[k].values()) + self._vid_vertex[k]._kin for k in self._cid_vertices[w_cid]])if w_cid == v_cid:tot -= k_vk_v_in = sum([v for k, v in self._G[v_vid].items() if k in self._cid_vertices[w_cid]])delta_Q = k_v_in - k_v * tot / self._m # 由于只需要知道delta_Q的正負，所以少乘了1/(2*self._m)cid_Q[w_cid] = delta_Qcid, max_delta_Q = sorted(cid_Q.items(), key=lambda item: item[1], reverse=True)[0]if max_delta_Q > 0.0 and cid != v_cid:self._vid_vertex[v_vid]._cid = cidself._cid_vertices[cid].add(v_vid)self._cid_vertices[v_cid].remove(v_vid)can_stop = Falsemod_inc = Trueif can_stop:breakreturn mod_incdef second_stage(self):cid_vertices = {}vid_vertex = {}for cid, vertices in self._cid_vertices.items():if len(vertices) == 0:continuenew_vertex = Vertex(cid, cid, set())for vid in vertices:new_vertex._nodes.update(self._vid_vertex[vid]._nodes)new_vertex._kin += self._vid_vertex[vid]._kinfor k, v in self._G[vid].items():if k in vertices:new_vertex._kin += v / 2.0cid_vertices[cid] = set([cid])vid_vertex[cid] = new_vertexG = collections.defaultdict(dict)for cid1, vertices1 in self._cid_vertices.items():if len(vertices1) == 0:continuefor cid2, vertices2 in self._cid_vertices.items():if cid2 <= cid1 or len(vertices2) == 0:continueedge_weight = 0.0for vid in vertices1:for k, v in self._G[vid].items():if k in vertices2:edge_weight += vif edge_weight != 0:G[cid1][cid2] = edge_weightG[cid2][cid1] = edge_weightself._cid_vertices = cid_verticesself._vid_vertex = vid_vertexself._G = Gdef get_communities(self):communities = []for vertices in self._cid_vertices.values():if len(vertices) != 0:c = set()for vid in vertices:c.update(self._vid_vertex[vid]._nodes)communities.append(c)return communitiesdef execute(self):iter_time = 1while True:iter_time += 1mod_inc = self.first_stage()if mod_inc:self.second_stage()else:breakreturn self.get_communities()if __name__ == '__main__':G = load_graph(r'C:\\Users\\程勇\\Desktop\\similarity.txt')algorithm = Louvain(G)communities = algorithm.execute()# 按照社區大小從大到小排序輸出communities = sorted(communities, key=lambda b: -len(b)) # 按社區大小排序count = 0for communitie in communities:count += 1print("社區", count, " ", communitie)

測試用例文件如圖：

這是部分測試用例的截圖，一行的前兩個數是頂點編號，第三個數是權重。按照每個社區大小順序從大到小打印：

$\quad$ 需要測試文件的話在評論區留下你的郵箱哦，求關注~

總結

以上是生活随笔為你收集整理的社区发现算法之——Louvain的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：移植linux内核-映像文件,移植Lin
下一篇：手机两列布局，正方形