生活随笔
收集整理的這篇文章主要介紹了
Python社区发现—Louvain—networkx和community
小編覺得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.
社區(qū)
如果一張圖是對(duì)一片區(qū)域的描述的話,將這張圖劃分為很多個(gè)子圖。當(dāng)子圖之內(nèi)滿足關(guān)聯(lián)性盡可能大,而子圖之間關(guān)聯(lián)性盡可能低時(shí),這樣的子圖可以稱之為一個(gè)社區(qū)。
社區(qū)發(fā)現(xiàn)算法
社區(qū)發(fā)現(xiàn)算法有很多,例如LPA,HANP,SLPA以及Louvain,不同的算法劃分社區(qū)的效果不盡相同。Louvain算法是基于模塊度的社區(qū)發(fā)現(xiàn)算法,該算法在效率和效果上都表現(xiàn)較好,并且能夠發(fā)現(xiàn)層次性的社區(qū)結(jié)構(gòu),其優(yōu)化目標(biāo)是最大化整個(gè)社區(qū)網(wǎng)絡(luò)的模塊度。
模塊度
模塊度是評(píng)估一個(gè)社區(qū)網(wǎng)絡(luò)劃分好壞的度量方法,它的物理含義是社區(qū)內(nèi)節(jié)點(diǎn)的連邊數(shù)與隨機(jī)情況下的邊數(shù)只差,它的取值范圍是 [?1/2,1)。可以簡(jiǎn)單地理解為社區(qū)內(nèi)部邊的權(quán)重減去所有與社區(qū)節(jié)點(diǎn)相連的邊的權(quán)重和,對(duì)無(wú)向圖更好理解,即社區(qū)內(nèi)部邊的度數(shù)減去社區(qū)內(nèi)節(jié)點(diǎn)的總度數(shù)。
Louvain算法
算法流程: 1、初始時(shí)將每個(gè)頂點(diǎn)當(dāng)作一個(gè)社區(qū),社區(qū)個(gè)數(shù)與頂點(diǎn)個(gè)數(shù)相同。 2、依次將每個(gè)頂點(diǎn)與之相鄰頂點(diǎn)合并在一起,計(jì)算它們的模塊度增益是否大于0,如果大于0,就將該結(jié)點(diǎn)放入該相鄰結(jié)點(diǎn)所在社區(qū)。 3、迭代第二步,直至算法穩(wěn)定,即所有頂點(diǎn)所屬社區(qū)不再變化。 4、將各個(gè)社區(qū)所有節(jié)點(diǎn)壓縮成為一個(gè)結(jié)點(diǎn),社區(qū)內(nèi)點(diǎn)的權(quán)重轉(zhuǎn)化為新結(jié)點(diǎn)環(huán)的權(quán)重,社區(qū)間權(quán)重轉(zhuǎn)化為新結(jié)點(diǎn)邊的權(quán)重。 5、重復(fù)步驟1-3,直至算法穩(wěn)定。
import collections
import random
def load_graph ( path
) : G
= collections
. defaultdict
( dict ) with open ( path
) as text
: for line
in text
: vertices
= line
. strip
( ) . split
( ) v_i
= int ( vertices
[ 0 ] ) v_j
= int ( vertices
[ 1 ] ) w
= float ( vertices
[ 2 ] ) G
[ v_i
] [ v_j
] = wG
[ v_j
] [ v_i
] = w
return G
class Vertex ( ) : def __init__ ( self
, vid
, cid
, nodes
, k_in
= 0 ) : self
. _vid
= vidself
. _cid
= cidself
. _nodes
= nodesself
. _kin
= k_in
class Louvain ( ) : def __init__ ( self
, G
) : self
. _G
= Gself
. _m
= 0 self
. _cid_vertices
= { } self
. _vid_vertex
= { } for vid
in self
. _G
. keys
( ) : self
. _cid_vertices
[ vid
] = set ( [ vid
] ) self
. _vid_vertex
[ vid
] = Vertex
( vid
, vid
, set ( [ vid
] ) ) self
. _m
+= sum ( [ 1 for neighbor
in self
. _G
[ vid
] . keys
( ) if neighbor
> vid
] ) def first_stage ( self
) : mod_inc
= False visit_sequence
= self
. _G
. keys
( ) random
. shuffle
( list ( visit_sequence
) ) while True : can_stop
= True for v_vid
in visit_sequence
: v_cid
= self
. _vid_vertex
[ v_vid
] . _cidk_v
= sum ( self
. _G
[ v_vid
] . values
( ) ) + self
. _vid_vertex
[ v_vid
] . _kincid_Q
= { } for w_vid
in self
. _G
[ v_vid
] . keys
( ) : w_cid
= self
. _vid_vertex
[ w_vid
] . _cid
if w_cid
in cid_Q
: continue else : tot
= sum ( [ sum ( self
. _G
[ k
] . values
( ) ) + self
. _vid_vertex
[ k
] . _kin
for k
in self
. _cid_vertices
[ w_cid
] ] ) if w_cid
== v_cid
: tot
-= k_vk_v_in
= sum ( [ v
for k
, v
in self
. _G
[ v_vid
] . items
( ) if k
in self
. _cid_vertices
[ w_cid
] ] ) delta_Q
= k_v_in
- k_v
* tot
/ self
. _m cid_Q
[ w_cid
] = delta_Qcid
, max_delta_Q
= sorted ( cid_Q
. items
( ) , key
= lambda item
: item
[ 1 ] , reverse
= True ) [ 0 ] if max_delta_Q
> 0.0 and cid
!= v_cid
: self
. _vid_vertex
[ v_vid
] . _cid
= cidself
. _cid_vertices
[ cid
] . add
( v_vid
) self
. _cid_vertices
[ v_cid
] . remove
( v_vid
) can_stop
= False mod_inc
= True if can_stop
: break return mod_inc
def second_stage ( self
) : cid_vertices
= { } vid_vertex
= { } for cid
, vertices
in self
. _cid_vertices
. items
( ) : if len ( vertices
) == 0 : continue new_vertex
= Vertex
( cid
, cid
, set ( ) ) for vid
in vertices
: new_vertex
. _nodes
. update
( self
. _vid_vertex
[ vid
] . _nodes
) new_vertex
. _kin
+= self
. _vid_vertex
[ vid
] . _kin
for k
, v
in self
. _G
[ vid
] . items
( ) : if k
in vertices
: new_vertex
. _kin
+= v
/ 2.0 cid_vertices
[ cid
] = set ( [ cid
] ) vid_vertex
[ cid
] = new_vertexG
= collections
. defaultdict
( dict ) for cid1
, vertices1
in self
. _cid_vertices
. items
( ) : if len ( vertices1
) == 0 : continue for cid2
, vertices2
in self
. _cid_vertices
. items
( ) : if cid2
<= cid1
or len ( vertices2
) == 0 : continue edge_weight
= 0.0 for vid
in vertices1
: for k
, v
in self
. _G
[ vid
] . items
( ) : if k
in vertices2
: edge_weight
+= v
if edge_weight
!= 0 : G
[ cid1
] [ cid2
] = edge_weightG
[ cid2
] [ cid1
] = edge_weightself
. _cid_vertices
= cid_verticesself
. _vid_vertex
= vid_vertexself
. _G
= G
def get_communities ( self
) : communities
= [ ] for vertices
in self
. _cid_vertices
. values
( ) : if len ( vertices
) != 0 : c
= set ( ) for vid
in vertices
: c
. update
( self
. _vid_vertex
[ vid
] . _nodes
) communities
. append
( c
) return communities
def execute ( self
) : iter_time
= 1 while True : iter_time
+= 1 mod_inc
= self
. first_stage
( ) if mod_inc
: self
. second_stage
( ) else : break return self
. get_communities
( ) if __name__
== '__main__' : G
= load_graph
( 's.txt' ) algorithm
= Louvain
( G
) communities
= algorithm
. execute
( ) communities
= sorted ( communities
, key
= lambda b
: - len ( b
) ) count
= 0 for communitie
in communities
: count
+= 1 print ( "社區(qū)" , count
, " " , communitie
)
networkx和community社區(qū)劃分和可視化
安裝
使用community安裝python-louvain即可 pip install python-louvain pip install networkx
使用
最佳劃分
community
. best_partition
( graph
, partition
= None , weight
= 'weight' , resolution
= 1.0 )
Compute the partition of the graph nodes which maximises the modularity (or try…) using the Louvain heuristics. This is the partition of highest modularity, i.e. the highest partition of the dendrogram generated by the Louvain algorithm.
import community
import networkx
as nx
import matplotlib
. pyplot
as plt
G
= nx
. erdos_renyi_graph
( 30 , 0.05 )
partition
= community
. best_partition
( G
)
size
= float ( len ( set ( partition
. values
( ) ) ) )
pos
= nx
. spring_layout
( G
)
count
= 0 .
for com
in set ( partition
. values
( ) ) : count
= count
+ 1 . list_nodes
= [ nodes
for nodes
in partition
. keys
( ) if partition
[ nodes
] == com
] nx
. draw_networkx_nodes
( G
, pos
, list_nodes
, node_size
= 20 , node_color
= str ( count
/ size
) ) nx
. draw_networkx_edges
( G
, pos
, alpha
= 0.5 )
plt
. show
( )
總結(jié)
以上是生活随笔 為你收集整理的Python社区发现—Louvain—networkx和community 的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
如果覺得生活随笔 網(wǎng)站內(nèi)容還不錯(cuò),歡迎將生活随笔 推薦給好友。