ML HierarchicalClustering: A Custom Implementation of the Hierarchical Clustering Algorithm
Contents

Output
Implementation


Output

To be updated.

Implementation
# -*- coding: utf-8 -*-
import numpy as np

class cluster_node:
    """A node in the cluster tree: either a leaf (single sample) or a merged cluster."""
    def __init__(self, vec, left=None, right=None, distance=0.0, id=None, count=1):
        self.left = left          # left child (None for a leaf)
        self.right = right        # right child (None for a leaf)
        self.vec = vec            # feature vector (centroid for merged clusters)
        self.id = id              # >= 0 for leaves, < 0 for merged clusters
        self.distance = distance  # distance between the two merged children
        self.count = count        # only used for weighted average

def L2dist(v1, v2):
    # Euclidean (L2) distance
    return np.sqrt(np.sum((v1 - v2) ** 2))

def L1dist(v1, v2):
    # Manhattan (L1) distance
    return np.sum(np.abs(v1 - v2))

def hcluster(features, distance=L2dist):
    # cluster the rows of the "features" matrix
    distances = {}  # cache of distance calculations, keyed by id pairs
    currentclustid = -1

    # clusters are initially just the individual rows
    clust = [cluster_node(np.array(features[i]), id=i) for i in range(len(features))]

    while len(clust) > 1:
        lowestpair = (0, 1)
        closest = distance(clust[0].vec, clust[1].vec)
        # find the pair of clusters with the smallest distance
        for i in range(len(clust)):
            for j in range(i + 1, len(clust)):
                if (clust[i].id, clust[j].id) not in distances:
                    distances[(clust[i].id, clust[j].id)] = distance(clust[i].vec, clust[j].vec)
                d = distances[(clust[i].id, clust[j].id)]
                if d < closest:
                    closest = d
                    lowestpair = (i, j)

        # merge the closest pair: the new vector is the element-wise average
        mergevec = (clust[lowestpair[0]].vec + clust[lowestpair[1]].vec) / 2.0
        newcluster = cluster_node(mergevec,
                                  left=clust[lowestpair[0]],
                                  right=clust[lowestpair[1]],
                                  distance=closest,
                                  id=currentclustid)

        # ids of merged (non-leaf) clusters are negative
        currentclustid -= 1
        del clust[lowestpair[1]]
        del clust[lowestpair[0]]
        clust.append(newcluster)

    return clust[0]

def extract_clusters(clust, dist):
    # extract the list of sub-tree clusters from the hcluster tree with distance < dist
    # (clust is the tree built above, dist is the cut threshold)
    if clust.distance < dist:
        # we have found a cluster subtree
        return [clust]
    # otherwise check the left and right branches
    cl = []
    cr = []
    if clust.left is not None:
        cl = extract_clusters(clust.left, dist=dist)
    if clust.right is not None:
        cr = extract_clusters(clust.right, dist=dist)
    return cl + cr

def get_cluster_elements(clust):
    # return the ids of the elements in a cluster sub-tree
    if clust.id >= 0:
        # a non-negative id means this is a leaf
        return [clust.id]
    # otherwise check the left and right branches
    cl = []
    cr = []
    if clust.left is not None:
        cl = get_cluster_elements(clust.left)
    if clust.right is not None:
        cr = get_cluster_elements(clust.right)
    return cl + cr

def printclust(clust, labels=None, n=0):
    # indent to make a hierarchy layout
    print('  ' * n, end='')
    if clust.id < 0:
        # a negative id means this is a branch
        print('-')
    else:
        # a non-negative id means this is an endpoint (leaf)
        if labels is None:
            print(clust.id)
        else:
            print(labels[clust.id])
    if clust.left is not None:
        printclust(clust.left, labels=labels, n=n + 1)
    if clust.right is not None:
        printclust(clust.right, labels=labels, n=n + 1)

def getheight(clust):
    # height of the tree (number of leaves), computed recursively
    if clust.left is None and clust.right is None:
        return 1  # an endpoint has height 1
    # otherwise the height is the sum of the heights of the two branches
    return getheight(clust.left) + getheight(clust.right)

def getdepth(clust):
    # depth of the tree (cumulative merge distance), computed recursively
    if clust.left is None and clust.right is None:
        return 0
    return max(getdepth(clust.left), getdepth(clust.right)) + clust.distance
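
To see how the pieces fit together, here is a minimal usage sketch (not from the original post): the toy 2-D points and the dist=1.5 cut threshold are made-up values for illustration, and the function names refer to the definitions above.

import numpy as np

# Hypothetical toy data: two well-separated groups of 2-D points
features = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                     [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

tree = hcluster(features, distance=L2dist)  # build the full dendrogram

printclust(tree)  # print the tree as an indented layout; '-' marks merge nodes

# cut the dendrogram at an assumed threshold to obtain flat clusters
for subtree in extract_clusters(tree, dist=1.5):
    print(get_cluster_elements(subtree))  # row indices of one flat cluster

print(getheight(tree))  # number of leaves, 6 here
print(getdepth(tree))   # cumulative merge distance along the deepest path

Because hcluster replaces the closest pair with their element-wise average, every internal node carries a centroid; cutting the tree with extract_clusters at a threshold then turns the hierarchy into a flat clustering, one list of row indices per group.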