生活随笔
收集整理的這篇文章主要介紹了
社区发现算法之——Louvain
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
1、什么是社區
如果一張圖是對一片區域的描述的話,我們將這張圖劃分為很多個子圖。當子圖之內滿足關聯性盡可能大,而子圖之間關聯性盡可能低時,這樣的子圖我們可以稱之為一個社區。
2、社區發現算法及評價標準
社區發現算法有很多,例如LPA,HANP,SLPA以及我們今天的主人公——Louvain。不同的算法劃分社區的效果不盡相同。那么,如何評價這些算法孰優孰劣呢?
用模塊度modularity來衡量。模塊度定義如下:模塊度是評估一個社區網絡劃分好壞的度量方法,它的物理含義是社區內節點的連邊數與隨機情況下的邊數只差,它的取值范圍是 [?1/2,1)。可以簡單地理解為社區內部所有邊權重和減去與社區相連的邊權重和。
Louvain算法
一種基于模塊度的圖算法模型,與普通的基于模塊度和模塊度增益不同的是,該算法速度很快,而且對一些點多邊少的圖,進行聚類效果特別明顯。
算法流程:
1、初始時將每個頂點當作一個社區,社區個數與頂點個數相同。
2、依次將每個頂點與之相鄰頂點合并在一起,計算它們的模塊度增益是否大于0,如果大于0,就將該結點放入該相鄰結點所在社區。
3、迭代第二步,直至算法穩定,即所有頂點所屬社區不再變化。
4、將各個社區所有節點壓縮成為一個結點,社區內點的權重轉化為新結點環的權重,社區間權重轉化為新結點邊的權重。
5、重復步驟1-3,直至算法穩定。
import collections
import random
def load_graph(path
):G
= collections
.defaultdict
(dict)with open(path
) as text
:for line
in text
:vertices
= line
.strip
().split
()v_i
= int(vertices
[0])v_j
= int(vertices
[1])w
= float(vertices
[2])G
[v_i
][v_j
] = wG
[v_j
][v_i
] = w
return G
class Vertex():def __init__(self
, vid
, cid
, nodes
, k_in
=0):self
._vid
= vidself
._cid
= cidself
._nodes
= nodesself
._kin
= k_in
class Louvain():def __init__(self
, G
):self
._G
= Gself
._m
= 0 self
._cid_vertices
= {} self
._vid_vertex
= {} for vid
in self
._G
.keys
():self
._cid_vertices
[vid
] = set([vid
])self
._vid_vertex
[vid
] = Vertex
(vid
, vid
, set([vid
]))self
._m
+= sum([1 for neighbor
in self
._G
[vid
].keys
() if neighbor
> vid
])def first_stage(self
):mod_inc
= False visit_sequence
= self
._G
.keys
()random
.shuffle
(list(visit_sequence
))while True:can_stop
= True for v_vid
in visit_sequence
:v_cid
= self
._vid_vertex
[v_vid
]._cidk_v
= sum(self
._G
[v_vid
].values
()) + self
._vid_vertex
[v_vid
]._kincid_Q
= {}for w_vid
in self
._G
[v_vid
].keys
():w_cid
= self
._vid_vertex
[w_vid
]._cid
if w_cid
in cid_Q
:continueelse:tot
= sum([sum(self
._G
[k
].values
()) + self
._vid_vertex
[k
]._kin
for k
in self
._cid_vertices
[w_cid
]])if w_cid
== v_cid
:tot
-= k_vk_v_in
= sum([v
for k
, v
in self
._G
[v_vid
].items
() if k
in self
._cid_vertices
[w_cid
]])delta_Q
= k_v_in
- k_v
* tot
/ self
._m cid_Q
[w_cid
] = delta_Qcid
, max_delta_Q
= sorted(cid_Q
.items
(), key
=lambda item
: item
[1], reverse
=True)[0]if max_delta_Q
> 0.0 and cid
!= v_cid
:self
._vid_vertex
[v_vid
]._cid
= cidself
._cid_vertices
[cid
].add
(v_vid
)self
._cid_vertices
[v_cid
].remove
(v_vid
)can_stop
= Falsemod_inc
= Trueif can_stop
:breakreturn mod_inc
def second_stage(self
):cid_vertices
= {}vid_vertex
= {}for cid
, vertices
in self
._cid_vertices
.items
():if len(vertices
) == 0:continuenew_vertex
= Vertex
(cid
, cid
, set())for vid
in vertices
:new_vertex
._nodes
.update
(self
._vid_vertex
[vid
]._nodes
)new_vertex
._kin
+= self
._vid_vertex
[vid
]._kin
for k
, v
in self
._G
[vid
].items
():if k
in vertices
:new_vertex
._kin
+= v
/ 2.0cid_vertices
[cid
] = set([cid
])vid_vertex
[cid
] = new_vertexG
= collections
.defaultdict
(dict)for cid1
, vertices1
in self
._cid_vertices
.items
():if len(vertices1
) == 0:continuefor cid2
, vertices2
in self
._cid_vertices
.items
():if cid2
<= cid1
or len(vertices2
) == 0:continueedge_weight
= 0.0for vid
in vertices1
:for k
, v
in self
._G
[vid
].items
():if k
in vertices2
:edge_weight
+= v
if edge_weight
!= 0:G
[cid1
][cid2
] = edge_weightG
[cid2
][cid1
] = edge_weightself
._cid_vertices
= cid_verticesself
._vid_vertex
= vid_vertexself
._G
= G
def get_communities(self
):communities
= []for vertices
in self
._cid_vertices
.values
():if len(vertices
) != 0:c
= set()for vid
in vertices
:c
.update
(self
._vid_vertex
[vid
]._nodes
)communities
.append
(c
)return communities
def execute(self
):iter_time
= 1while True:iter_time
+= 1mod_inc
= self
.first_stage
()if mod_inc
:self
.second_stage
()else:breakreturn self
.get_communities
()if __name__
== '__main__':G
= load_graph
(r
'C:\\Users\\程勇\\Desktop\\similarity.txt')algorithm
= Louvain
(G
)communities
= algorithm
.execute
()communities
= sorted(communities
, key
=lambda b
: -len(b
)) count
= 0for communitie
in communities
:count
+= 1print("社區", count
, " ", communitie
)
測試用例文件如圖:
這是部分測試用例的截圖,一行的前兩個數是頂點編號,第三個數是權重。按照每個社區大小順序從大到小打印:
\quad需要測試文件的話在評論區留下你的郵箱哦,求關注~
總結
以上是生活随笔為你收集整理的社区发现算法之——Louvain的全部內容,希望文章能夠幫你解決所遇到的問題。
如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。