K-Nearest Neighbors (KNN) Algorithm | Machine Learning
Goal: To classify a query point (with 2 features) into one of 2 classes using KNN on labeled training data.
K-Nearest Neighbor (KNN)
KNN is a basic machine learning algorithm that can be used for both classification and regression problems, though it sees limited use for regression. So, we will discuss classification problems only.
It involves finding the distance of the query point from every training point in the training dataset, sorting those distances, and picking the k points with the least distance. Then check which class these k points belong to; the class that appears most often among them is the predicted class.
[Figure omitted: scatter plot of red and green training points with a star-shaped query point]

Red and green are two classes here, and we have to predict the class of the star point. From the image, it is clear that the red-class points are much closer to the star than the green-class points, so the predicted class for this point will be red.
We will generally work with matrices, and make use of the numpy library to compute this Euclidean distance.
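As a quick illustration, the Euclidean distance between two feature vectors can be computed in a single numpy expression (the helper name `euclidean` is ours, chosen for clarity):

```python
import numpy as np

def euclidean(v1, v2):
    # sqrt of the sum of squared per-feature differences
    return np.sqrt(((v1 - v2) ** 2).sum())

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print(euclidean(a, b))  # 3-4-5 right triangle, so 5.0
```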
Algorithm:
STEP 1: Compute the distance of the query point from every training point in the training dataset.
STEP 2: Sort the distances in increasing order and pick the k points with the least distance.
STEP 3: Check which class holds the majority among these k points.
STEP 4: The class with the maximum majority is the predicted class of the point.
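The four steps above can be sketched compactly with vectorized numpy (names like `knn_predict` are illustrative; `train` holds feature columns followed by a final label column, the same layout the full code below uses):

```python
import numpy as np

def knn_predict(train, query, k=5):
    # STEP 1: distances from the query to every training point
    dists = np.sqrt(((train[:, :-1] - query) ** 2).sum(axis=1))
    # STEP 2: indices of the k nearest points
    nearest = np.argsort(dists)[:k]
    # STEP 3: labels of those k points and their frequencies
    labels, counts = np.unique(train[nearest, -1], return_counts=True)
    # STEP 4: the most frequent label is the prediction
    return labels[np.argmax(counts)]

# tiny example: two points of class 0 near the origin, two of class 1 far away
train = np.array([[0.0, 0.0, 0], [0.1, 0.1, 0], [5.0, 5.0, 1], [5.1, 5.0, 1]])
print(knn_predict(train, np.array([0.05, 0.0]), k=3))  # 0.0
```

With k=3 the three nearest neighbors are both class-0 points plus one class-1 point, so the majority vote returns class 0.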
Note: In the code, we have taken only two features for a clearer explanation, but the code works for N features as well; you just have to generate training data and a query point with n features. Further, we have used numpy to generate the two-feature data.
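As a sanity check of that claim, the same distance-sort-vote logic runs unchanged on, say, 4 features (this sketch and its synthetic data are ours, not part of the original code):

```python
import numpy as np

def knn(train, test, k=5):
    # same logic as before: distance to each row, sort, majority vote
    dist = sorted(
        (np.sqrt(((row[:-1] - test) ** 2).sum()), row[-1]) for row in train
    )[:k]
    labels, counts = np.unique([label for _, label in dist], return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(1)
# 4-feature training data: class 0 clustered near 0, class 1 near 5
a = np.hstack([rng.normal(0.0, 1.0, (50, 4)), np.zeros((50, 1))])
b = np.hstack([rng.normal(5.0, 1.0, (50, 4)), np.ones((50, 1))])
train = np.vstack([a, b])

print(knn(train, np.array([5.0, 5.0, 5.0, 5.0])))  # 1.0
```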
Python Code
import numpy as np

def distance(v1, v2):
    # Euclidean
    return np.sqrt(((v1 - v2) ** 2).sum())

def knn(train, test, k=5):
    dist = []
    for i in range(train.shape[0]):
        # Get the vector and label
        ix = train[i, :-1]
        iy = train[i, -1]
        # Compute the distance from test point
        d = distance(test, ix)
        dist.append([d, iy])
    # Sort based on distance and get top k
    dk = sorted(dist, key=lambda x: x[0])[:k]
    # Retrieve only the labels
    labels = np.array(dk)[:, -1]
    # Get frequencies of each label
    output = np.unique(labels, return_counts=True)
    # Find max frequency and corresponding label
    index = np.argmax(output[1])
    return output[0][index]

# monkey_data and chimp_data; data has 2 features
monkey_data = np.random.multivariate_normal([1.0, 2.0], [[1.5, 0.5], [0.5, 1]], 1000)
chimp_data = np.random.multivariate_normal([4.0, 4.0], [[1, 0], [0, 1.8]], 1000)

data = np.zeros((2000, 3))
data[:1000, :-1] = monkey_data
data[1000:, :-1] = chimp_data
data[1000:, -1] = 1

label_to_class = {1: 'chimp', 0: 'monkey'}

# Query point for the check
print("Enter the 1st feature")
x = input()
print("Enter the 2nd feature")
y = input()
x = float(x)
y = float(y)

query = np.array([x, y])
ans = knn(data, query)
print("the predicted class for the points is {}".format(label_to_class[ans]))

Output
Enter the 1st feature
3
Enter the 2nd feature
2
the predicted class for the points is chimp

Translated from: https://www.includehelp.com/ml-ai/k-nearest-neighbors-knn-algorithm.aspx
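For comparison, the same classification can be done with scikit-learn's KNeighborsClassifier (not used in the original article; shown here as a cross-check, assuming scikit-learn is installed and with a seeded generator so the synthetic data is reproducible):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
monkey = rng.multivariate_normal([1.0, 2.0], [[1.5, 0.5], [0.5, 1]], 1000)
chimp = rng.multivariate_normal([4.0, 4.0], [[1, 0], [0, 1.8]], 1000)

X = np.vstack([monkey, chimp])
y = np.array([0] * 1000 + [1] * 1000)  # 0 = monkey, 1 = chimp

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

label_to_class = {0: 'monkey', 1: 'chimp'}
print(label_to_class[clf.predict([[3.0, 2.0]])[0]])
```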