Data Mining: Unsupervised Learning (Clustering)
- 1. K-means
- 1.1 Generating random data of specified shapes
- 1.2 Clustering
- 1.3 Results
- 2. Hierarchical clustering
- 2.1 Code
- 2.2 Results
- 3. DBSCAN
- 3.1 Parameter selection
- 3.2 Code
- 3.3 Results
1. K-means
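K-means is a partition-based clustering algorithm: it splits the data into k clusters by assigning each point to its nearest cluster center and then recomputing each center as the mean of its assigned points, repeating until the assignments stop changing.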
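To make the partitioning idea concrete, here is a minimal NumPy sketch of the standard K-means iteration (Lloyd's algorithm). It is an illustration only, not what scikit-learn's KMeans does internally; the function name kmeans_sketch, its parameters, and the demo data are all made up for this example.

```python
import numpy as np


def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Toy K-means (Lloyd's algorithm); returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points;
        # a center that lost all its points stays where it was.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Two well-separated groups of points (made-up demo data).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
labels_demo, centers_demo = kmeans_sketch(X_demo, k=2)
```

Because each point is assigned purely by distance to a center, the resulting clusters are separated by straight boundaries, which is worth keeping in mind when looking at ring- or crescent-shaped data.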
1.1 Generating random data of specified shapes
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# *************** Generate random data of specified shapes ***************
from sklearn.datasets import make_circles, make_moons, make_blobs

n_samples = 1000

# Ring-shaped data (two concentric circles)
# n_samples: number of sample points
# factor: scale factor between the inner and outer circle (0 < factor < 1)
circles = make_circles(n_samples=n_samples, factor=0.5, noise=0.05)

# Crescent-shaped data
moons = make_moons(n_samples=n_samples, noise=0.05)

# Blob-shaped data
# random_state: random seed; a fixed value gives the same data on every run
# center_box: bounding box for the randomly generated cluster centers, default (-10, 10)
# cluster_std: standard deviation of each cluster, i.e. how compact it is, default 1.0
# centers: number of cluster centers to generate, default 3
blobs = make_blobs(n_samples=n_samples, random_state=100, center_box=(-10, 10), cluster_std=1, centers=3)

# Uniform random data with no cluster structure; every point gets label 0
random_data = np.random.rand(n_samples, 2), np.array([0 for i in range(n_samples)])

datasets = [circles, moons, blobs, random_data]
fig = plt.figure(figsize=(20, 8))

1.2 Clustering
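colors = "rgbykcm"
for index, data in enumerate(datasets):
    X = data[0]      # point coordinates
    Y_old = data[1]  # labels assigned when the data was generated
    # n_clusters=2 is used for every dataset, including the three-blob one
    km_cluster = KMeans(n_clusters=2)
    km_cluster.fit(X)
    Y_new = km_cluster.labels_  # labels assigned by K-means
    # Top row: original labels
    fig.add_subplot(2, len(datasets), index + 1)
    plt.scatter(X[:, 0], X[:, 1], c=[colors[y] for y in Y_old])
    # Bottom row: K-means labels
    fig.add_subplot(2, len(datasets), index + 1 + len(datasets))
    plt.scatter(X[:, 0], X[:, 1], c=[colors[y] for y in Y_new])
plt.show()

1.3 Results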
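Besides inspecting the scatter plots, one way to put a number on how well the K-means labels agree with the generating labels is the adjusted Rand index (ARI). The sketch below is an addition, not part of the original post: it uses its own demo data, and the explicit blob centers are an assumption chosen so that the three blobs are clearly separated. K-means generally suits compact, blob-like clusters and struggles with concentric rings, and the two scores make that contrast visible.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_circles
from sklearn.metrics import adjusted_rand_score

# Demo data; the blob centers are an assumption made for this illustration.
X_circ, y_circ = make_circles(n_samples=1000, factor=0.5, noise=0.05, random_state=0)
X_blob, y_blob = make_blobs(
    n_samples=1000, centers=[(-5, -5), (0, 5), (5, -5)], cluster_std=1.0, random_state=0
)

km_circ = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_circ)
km_blob = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_blob)

# ARI is 1.0 for a perfect match and close to 0 for a random labeling.
ari_circles = adjusted_rand_score(y_circ, km_circ.labels_)
ari_blobs = adjusted_rand_score(y_blob, km_blob.labels_)
print(f"circles ARI: {ari_circles:.3f}")
print(f"blobs   ARI: {ari_blobs:.3f}")
```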
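2. Hierarchical clustering
2.1 Code
AgglomerativeClustering(n_clusters, affinity, linkage)
- affinity: the distance metric used to compute the linkage (in recent scikit-learn releases this parameter is named metric)
- linkage: {"ward", "complete", "average", "single"}, default="ward"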
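As a rough illustration of the call above, the following sketch fits AgglomerativeClustering with two different linkage settings. It is an added example, not the post's original code: the crescent-shaped data and its settings are assumptions, and the metric argument is left at its default so the snippet does not depend on whether the installed version calls it affinity or metric.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons

# Crescent-shaped demo data (settings chosen for this illustration).
X_moons, y_moons = make_moons(n_samples=500, noise=0.05, random_state=10)

# linkage="ward" (the default) merges the pair of clusters that least
# increases within-cluster variance; "single" merges the two clusters whose
# closest points are nearest, which lets clusters follow elongated shapes.
ward_model = AgglomerativeClustering(n_clusters=2, linkage="ward")
single_model = AgglomerativeClustering(n_clusters=2, linkage="single")

ward_labels = ward_model.fit_predict(X_moons)
single_labels = single_model.fit_predict(X_moons)
```

Plotting X_moons colored by ward_labels and by single_labels side by side is a quick way to see how much the choice of linkage changes the result on non-convex shapes.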
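2.2 Results
3. DBSCAN
3.1 Parameter selection
To choose the neighborhood radius (eps), pick a point, compute its distance to every other point, sort the distances in ascending order, and look for the point where the distance jumps sharply.
Settling on good values takes a fair amount of experimentation and inspection of the results.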
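A common variant of this idea is the k-distance curve: instead of looking at one point, compute every point's distance to its k-th nearest neighbor, sort those distances, and look for the bend. The sketch below is an added illustration, not the author's code; the dataset and the choice k = 20 (mirroring a min_samples value) are assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X_kd, _ = make_moons(n_samples=1000, noise=0.05, random_state=10)

k = 20  # plays the role of min_samples; an assumption for this sketch

# Ask for k + 1 neighbors because each point is returned as its own
# nearest neighbor at distance 0.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X_kd)
distances, _ = nn.kneighbors(X_kd)

# Distance from every point to its k-th nearest neighbor, ascending.
k_distances = np.sort(distances[:, -1])

# Plotting k_distances against its index gives the curve to inspect, e.g.
#   import matplotlib.pyplot as plt; plt.plot(k_distances); plt.show()
# The distance at which the curve bends upward is a candidate eps.
print(k_distances[[0, len(k_distances) // 2, -1]])
```

The bend is read off by eye, so this narrows the range of eps values worth trying rather than replacing the experimentation.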
3.2 Code
# Load the clustering data
n_samples = 1000
from sklearn.datasets import make_circles, make_moons, make_blobs
from sklearn.cluster import DBSCAN
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

circles = make_circles(n_samples=n_samples, noise=0.05, factor=0.5, random_state=10)
moons = make_moons(n_samples=n_samples, noise=0.05, random_state=10)
blobs = make_blobs(n_samples=n_samples, centers=3, cluster_std=0.1, center_box=(-1, 1), random_state=10)
np.random.seed(10)
# Uniform random data with no cluster structure; every point gets label 0
random_data = (np.random.rand(n_samples, 2), np.zeros(n_samples).astype(int))
datasets = [circles, moons, blobs, random_data]

fig = plt.figure(figsize=(20, 8), dpi=72)
colors = "rgbky"
for index, data in enumerate(datasets):
    X = data[0]
    Y_old = data[1]
    dbscan_model = DBSCAN(eps=0.1, min_samples=20)
    dbscan_model.fit(X)
    Y_new = dbscan_model.labels_
    # Top row: original labels
    fig.add_subplot(2, len(datasets), index + 1)
    plt.scatter(X[:, 0], X[:, 1], c=[colors[y] for y in Y_old])
    plt.title("original labels")
    # Bottom row: DBSCAN labels. The modulo keeps the color index valid;
    # noise points (label -1) map to the last color, "y".
    fig.add_subplot(2, len(datasets), index + 1 + len(datasets))
    plt.scatter(X[:, 0], X[:, 1], c=[colors[y % len(colors)] for y in Y_new])
    plt.title("DBSCAN algorithm")
plt.show()

3.3 Results
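Alongside the plots, it can help to summarize a DBSCAN run by counting the clusters found and the points marked as noise (DBSCAN gives noise the label -1). The helper below is a hypothetical addition for illustration; summarize_labels is not a scikit-learn function, and the demo reuses the crescent data with the eps and min_samples values from the code above.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons


def summarize_labels(labels):
    """Return (number of clusters, number of noise points) for DBSCAN labels."""
    labels = np.asarray(labels)
    n_noise = int(np.sum(labels == -1))
    # Noise (-1) is not a cluster, so leave it out of the count.
    n_clusters = len(set(labels.tolist()) - {-1})
    return n_clusters, n_noise


X_db, _ = make_moons(n_samples=1000, noise=0.05, random_state=10)
labels_db = DBSCAN(eps=0.1, min_samples=20).fit(X_db).labels_
n_clusters_db, n_noise_db = summarize_labels(labels_db)
print(f"clusters: {n_clusters_db}, noise points: {n_noise_db}")
```

Rerunning this for a few eps and min_samples combinations gives a quick numerical view of how sensitive the clustering is to those two parameters.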
by CyrusMay 2022 04 05