Machine Learning: Plotting ROC Curves for a Support Vector Machine (SVM) in Python (Binary and Multi-class)

Published: 2023/12/10 · python · 豆豆

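I. ROC curve

II. TP, FP, TN, FN

III. Plotting the ROC curve in Python (binary classification)

1. Approach

2. Key code

3. Complete code

IV. Plotting ROC curves in Python (multi-class classification)

V. References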

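I. ROC curve

For the definition, see:

"Machine Learning: An SVM Example"

Purpose: performance metrics such as the EER (equal error rate) and the AUC (area under the curve) can be read off the ROC curve, and these metrics are used to judge how good a trained SVM model is.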
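To make the two metrics concrete, here is a minimal pure-Python sketch of how AUC and EER can be derived from ROC points. The helper names `trapezoid_auc` and `equal_error_rate` and the five ROC points are made up for illustration; they are not part of sklearn. AUC is computed with the trapezoid rule, and the EER is taken as the point where the false positive rate equals the false negative rate (1 - TPR), found by linear interpolation along the curve.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear ROC curve (fpr must be non-decreasing)."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area


def equal_error_rate(fpr, tpr):
    """Approximate EER: the error rate at which FPR equals FNR (= 1 - TPR)."""
    for i in range(1, len(fpr)):
        # d = FPR - FNR; along the curve it moves from negative to positive.
        d_prev = fpr[i - 1] - (1.0 - tpr[i - 1])
        d_curr = fpr[i] - (1.0 - tpr[i])
        if d_prev <= 0.0 <= d_curr:
            if d_curr == d_prev:
                return fpr[i]
            t = -d_prev / (d_curr - d_prev)  # position of the crossing on this segment
            return fpr[i - 1] + t * (fpr[i] - fpr[i - 1])
    raise ValueError("ROC curve does not cross the FPR = FNR line")


# Hypothetical ROC points, chosen only for illustration.
example_fpr = [0.0, 0.0, 0.25, 0.25, 1.0]
example_tpr = [0.0, 0.5, 0.5, 1.0, 1.0]

auc_value = trapezoid_auc(example_fpr, example_tpr)
eer = equal_error_rate(example_fpr, example_tpr)
print(auc_value, eer)  # 0.875 0.25
```

A larger AUC and a smaller EER both indicate a better classifier.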

II. TP, FP, TN, FN
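The four counts compare a predicted label with the true label: TP (true positive) is a positive sample predicted positive, FP (false positive) a negative sample predicted positive, TN (true negative) a negative sample predicted negative, and FN (false negative) a positive sample predicted negative. The sketch below counts them in plain Python; the function name `confusion_counts` and the label lists are hypothetical and serve only as an illustration. The two rates used by the ROC curve follow directly: TPR = TP / (TP + FN) and FPR = FP / (FP + TN).

```python
def confusion_counts(true_labels, predicted_labels, positive=1):
    """Return (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, really positive
            else:
                fp += 1  # predicted positive, really negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, really positive
            else:
                tn += 1  # predicted negative, really negative
    return tp, fp, tn, fn


# Hypothetical labels, chosen only for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
tpr = tp / (tp + fn)  # true positive rate
fpr = fp / (fp + tn)  # false positive rate
print(tp, fp, tn, fn, tpr, fpr)  # 3 1 3 1 0.75 0.25
```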

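III. Plotting the ROC curve in Python (binary classification)

1. Approach

The approach relies on the roc_curve function, the true labels of the test samples (test_label), and the scores the trained model assigns to those samples (test_predict_label). Despite its name, test_predict_label holds the continuous output of decision_function, not hard class labels. roc_curve compares the true labels with these scores at a series of thresholds, which amounts to counting TP, FP, TN and FN at each threshold, and returns the false positive rate (FPR), the true positive rate (TPR) and the thresholds. Plotting the ROC curve needs only FPR and TPR.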

2. Key code

................
# train_data: training samples; test_data: test samples
# train_label: labels of the training samples; test_label: labels of the test samples

# fit() trains the model on the training samples and their labels;
# decision_function() then returns one continuous score per test sample.
# These scores (not hard labels) are what roc_curve() needs.
# classifier is an svm.SVC instance.
test_predict_label = classifier.fit(train_data, train_label).decision_function(test_data)

# Compare the true test labels with the scores. Each (FPR, TPR) pair
# is obtained by moving the threshold to a different value.
fpr, tpr, threshold = roc_curve(test_label, test_predict_label)  # false and true positive rates
roc_auc = auc(fpr, tpr)  # AUC is the area under the curve; the larger the better
..................

# test_predict_label
[ 0.17284263  0.65445393 -0.54087101  0.3555818   0.00579262 -0.20174248
  0.0565328   0.00571205 -0.1517872   0.25656427  0.39764688  0.04549989
  0.33455816 -0.12499602  0.23724787 -0.36250412 -0.0874348  -0.11575856
 -0.25270656 -0.23457408 -0.18239472 -0.10728706 -0.32201471  0.71954289
 -0.29292995 -0.22073314 -0.32473373 -0.19383585 -0.24296148  0.37524795]

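In binary classification, changing the threshold amounts to sliding the decision boundary from one extreme of the scores to the other, so every change of threshold changes the TP and FP counts.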

# threshold
[ 1.71954289  0.71954289  0.25656427  0.0565328   0.00571205 -0.0874348
 -0.10728706 -0.12499602 -0.1517872  -0.18239472 -0.20174248 -0.23457408
 -0.24296148 -0.54087101]

# fpr (false positive rate)
[0.         0.         0.         0.2        0.2        0.26666667
 0.26666667 0.4        0.4        0.46666667 0.46666667 0.6
 0.6        1.        ]

# tpr (true positive rate)
[0.         0.06666667 0.46666667 0.46666667 0.66666667 0.66666667
 0.73333333 0.73333333 0.8        0.8        0.93333333 0.93333333
 1.         1.        ]
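The threshold sweep described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not sklearn's implementation: the function name `roc_points` and the sample data are hypothetical, and every distinct score is used as a threshold. As the printed output above suggests, roc_curve additionally starts with a sentinel threshold (1.71954289, the largest score plus one) so that the curve begins at (0, 0), and it returns fewer thresholds than there are samples.

```python
def roc_points(true_labels, scores):
    """Sweep the threshold over every distinct score, highest first.

    A sample is predicted positive when its score >= threshold.
    Returns parallel lists (fpr, tpr, thresholds), starting at (0, 0).
    """
    n_pos = sum(1 for y in true_labels if y == 1)
    n_neg = len(true_labels) - n_pos
    fpr, tpr, thresholds = [0.0], [0.0], [float("inf")]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(true_labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(true_labels, scores) if s >= threshold and y != 1)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
        thresholds.append(threshold)
    return fpr, tpr, thresholds


# Hypothetical labels and scores, chosen only for illustration.
demo_labels = [1, 0, 1, 1, 0, 0]
demo_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

demo_fpr, demo_tpr, demo_thresholds = roc_points(demo_labels, demo_scores)
for f, t, th in zip(demo_fpr, demo_tpr, demo_thresholds):
    print(f"threshold={th:<5} fpr={f:.3f} tpr={t:.3f}")
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing and the sweep always ends at (1, 1).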

3. Complete code

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc  # compute the ROC curve and the AUC
from sklearn import model_selection

# Import some data to play with
iris = datasets.load_iris()
X = iris.data    # samples
y = iris.target  # labels

# Reduce to a binary problem: keeping y != 2 leaves two classes
X, y = X[y != 2], y[y != 2]

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# Shuffle and split into training and test sets
# train_data / test_data: training and test samples
# train_label / test_label: the corresponding labels
train_data, test_data, train_label, test_label = model_selection.train_test_split(
    X, y, test_size=.3, random_state=0)

# Create the classifier: linear kernel, other parameters at their defaults.
# It is named classifier so that it does not shadow the imported svm module.
classifier = svm.SVC(kernel='linear', probability=True, random_state=random_state)

# fit() trains the model; decision_function() returns one continuous score
# per test sample, which is what roc_curve() needs
test_predict_label = classifier.fit(train_data, train_label).decision_function(test_data)
print(test_predict_label)

# Compare the true test labels with the scores; each (FPR, TPR) pair
# comes from a different threshold
fpr, tpr, threshold = roc_curve(test_label, test_predict_label)
print(fpr)
print(tpr)
print(threshold)
roc_auc = auc(fpr, tpr)  # AUC is the area under the curve; the larger the better

lw = 2
plt.figure(figsize=(10, 10))  # a single figure; a bare plt.figure() would open an extra empty window
plt.plot(fpr, tpr, color='darkorange', lw=lw,
         label='ROC curve (area = %0.2f)' % roc_auc)  # FPR on the x axis, TPR on the y axis
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()

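The dashed line corresponds to an AUC of 0.5, the case where the true positive rate equals the false positive rate, i.e. a classifier no better than random guessing.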
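The claim about the diagonal can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original code: it computes AUC from the rank-sum (Mann-Whitney U) statistic, which equals the area under the ROC curve when no scores are tied, and applies it to scores that carry no information and to scores that separate the classes perfectly. The function name `rank_auc` and the synthetic data are hypothetical.

```python
import random


def rank_auc(true_labels, scores):
    """AUC from the rank-sum (Mann-Whitney U) statistic, assuming no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Sum of the 1-based ranks of the positive samples.
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if true_labels[i] == 1)
    n_pos = sum(1 for y in true_labels if y == 1)
    n_neg = len(true_labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


rng = random.Random(0)  # fixed seed so the run is repeatable
labels = [1] * 5000 + [0] * 5000

# Scores unrelated to the labels: the ROC curve hugs the diagonal.
random_scores = [rng.random() for _ in labels]
# Scores where every positive sample outranks every negative one.
perfect_scores = [y + rng.random() for y in labels]

auc_random = rank_auc(labels, random_scores)
auc_perfect = rank_auc(labels, perfect_scores)
print(round(auc_random, 3), auc_perfect)
```

The first value comes out close to 0.5 and the second is exactly 1.0, which is why curves are judged by how far they rise above the dashed diagonal.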

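IV. Plotting ROC curves in Python (multi-class classification)

For multi-class problems there are two main ways to obtain a ROC curve.
Suppose there are m test samples and n classes. After training, compute for every test sample its probability or confidence score under each class. This gives a matrix P of shape [m, n] in which each row holds one test sample's scores for all classes, ordered by class label. Correspondingly, convert each test sample's label to a binary (one-hot) form in which each position marks whether the sample belongs to that class, using the same class order so that it lines up with P. This gives a label matrix L, also of shape [m, n].
(1) Method 1: for each class we have the probability that each of the m test samples belongs to that class (one column of P). From each column of P and the matching column of L we can compute the false positive rate (FPR) and true positive rate (TPR) at every threshold and draw one ROC curve, n curves in total. Averaging the n curves gives the final ROC curve.
(2) Method 2:
For a single test sample, two things hold: 1) its label row consists only of 0s and 1s, where the position of the 1 gives its class (the "positive" of a binary problem) and each 0 stands for another class (the "negative"); 2) if the classifier classifies the sample correctly, the value in P at the position of the 1 is larger than the values at the positions of the 0s. Based on these two points, flatten L and P row by row into two columns of m × n entries each; the result is a single binary classification problem. This method therefore yields the final ROC curve directly.
The two methods produce different ROC curves, and therefore different AUC values. In Python, methods 1 and 2 correspond to the values 'macro' and 'micro' of the average parameter of sklearn.metrics.roc_auc_score. The implementation of both methods below follows the example in the official sklearn documentation.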
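Before the sklearn version, the two averaging schemes can be illustrated on a tiny example in plain Python. Everything here is a sketch under stated assumptions: the 6 × 3 probability matrix P, the labels, and the helpers `one_hot` and `pairwise_auc` are made up for illustration. AUC is computed as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as one half, and the macro value averages the per-class AUCs rather than the curves themselves.

```python
def pairwise_auc(true_labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(true_labels, scores) if y == 1]
    neg = [s for y, s in zip(true_labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_hot(labels, n_classes):
    """Turn class indices into rows of the label matrix L."""
    return [[1 if label == k else 0 for k in range(n_classes)] for label in labels]


# Hypothetical data: m = 6 test samples, n = 3 classes.
labels = [0, 0, 1, 1, 2, 2]
P = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
L = one_hot(labels, 3)

# Method 1 (macro): one binary problem per column, then average the results.
per_class_auc = [
    pairwise_auc([row[k] for row in L], [row[k] for row in P]) for k in range(3)
]
macro_auc = sum(per_class_auc) / len(per_class_auc)

# Method 2 (micro): flatten L and P row by row into one binary problem.
flat_L = [v for row in L for v in row]
flat_P = [v for row in P for v in row]
micro_auc = pairwise_auc(flat_L, flat_P)

print(per_class_auc)         # [1.0, 0.8125, 1.0]
print(macro_auc, micro_auc)  # macro = 0.9375, micro = 67/72 (about 0.9306)
```

Even on this small example the two averages differ, because the micro scheme compares scores across classes while the macro scheme only compares scores within one column.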

# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier

# Load the data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Binarize the labels (one column per class)
y = label_binarize(y, classes=[0, 1, 2])
# Number of classes
n_classes = y.shape[1]

# Shuffle and split into training and test sets
random_state = np.random.RandomState(0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)

# Learn to predict each class against the others (one-vs-rest)
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                         random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

# Compute the ROC curve and AUC of each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area (method 2)
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

# Compute macro-average ROC curve and ROC area (method 1)
# First aggregate all false positive rates
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
# Then interpolate all ROC curves at these points.
# np.interp is used directly; scipy.interp was only a deprecated alias for it.
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
# Finally average it and compute the AUC
mean_tpr /= n_classes
fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

# Plot all ROC curves
lw = 2
plt.figure()
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average ROC curve (area = {0:0.2f})'.format(roc_auc["micro"]),
         color='deeppink', linestyle=':', linewidth=4)
plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average ROC curve (area = {0:0.2f})'.format(roc_auc["macro"]),
         color='navy', linestyle=':', linewidth=4)

colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()

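V. References

An introduction to ROC and AUC, and how to compute AUC
ROC principles, and implementing binary and multi-class ROC curves in Python
Understanding ROC curves, AUC, precision, recall and F-measure, with a Python implementation
ROC curves
ROC curves and AUC for multi-class classification
Plotting ROC curves with Python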
