
Grid Search: Principles, Practice, and the Related API (GridSearchCV)

Published: 2023/12/14


Preface

Grid search is one of the most common techniques people use for hyperparameter tuning.

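1. Meaning, Pros and Cons, and a Simple Implementation

Meaning: we manually list the candidate values for every parameter of a model that we want to tune, and the program exhaustively runs every combination of those values and picks the best one. Grid search is usually paired with cross-validation, because cross-validation gives a more rigorous score for each combination.
Pros: every parameter combination is evaluated independently of the others, so the search is easy to run in parallel, which makes it fast when enough processors are available.
Cons: the number of combinations is the product of the number of candidate values per parameter, so the search becomes very expensive once there are many parameters.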
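
To make the "product of candidate values" point concrete, here is a minimal sketch that expands a grid into its combinations using only the standard library. The helper name expand_grid and the extra kernel parameter are assumptions made for this illustration, not part of the original article.

```python
from itertools import product

# Candidate values for two parameters (the same values used later in this article).
param_grid = {
    "gamma": [0.001, 0.01, 0.1, 1, 10, 100],
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
}


def expand_grid(grid):
    """Return every parameter combination in the grid as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
print(len(combinations))  # 6 * 6 = 36
print(combinations[0])    # {'gamma': 0.001, 'C': 0.001}

# Adding a third parameter with 3 candidate values (hypothetical) triples the work.
bigger_grid = dict(param_grid, kernel=["rbf", "poly", "sigmoid"])
print(len(expand_grid(bigger_grid)))  # 6 * 6 * 3 = 108
```

Each dict in combinations can be fitted and scored on its own, which is why the work spreads across processors so easily.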

1-1. A Simple Grid Search Implementation

About the code: using the iris dataset, two nested for loops iterate over two lists of parameter values. For each combination the model is trained on the training set and scored on the test set; the best score and its parameters are printed at the end.

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
print("Size of training set:{} size of testing set:{}".format(X_train.shape[0], X_test.shape[0]))

#### grid search start
best_score = 0
for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        svm = SVC(gamma=gamma, C=C)  # train once for every possible parameter combination
        svm.fit(X_train, y_train)
        score = svm.score(X_test, y_test)
        if score > best_score:  # keep the best-performing parameters
            best_score = score
            best_parameters = {'gamma': gamma, 'C': C}
#### grid search end

print("Best score:{:.2f}".format(best_score))
print("Best parameters:{}".format(best_parameters))

Output:
Size of training set:112 size of testing set:38
Best score:0.97
Best parameters:{'gamma': 0.001, 'C': 100}
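
One thing to watch: the search above picks its parameters by their score on the test set, so the reported best score is likely to be optimistic. A common remedy, sketched below as an illustration rather than as the method this article goes on to use, is to hold out a separate validation set for choosing parameters and to touch the test set only once at the end. The variable names and the second random_state value are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
# First hold out the test set, then split the remainder into training and validation sets.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, random_state=1)

candidates = [0.001, 0.01, 0.1, 1, 10, 100]
best_score = -1.0
best_parameters = None
for gamma in candidates:
    for C in candidates:
        svm = SVC(gamma=gamma, C=C).fit(X_train, y_train)
        score = svm.score(X_val, y_val)  # choose parameters on the validation set only
        if score > best_score:
            best_score = score
            best_parameters = {"gamma": gamma, "C": C}

# Retrain on training + validation data, then evaluate once on the untouched test set.
final_svm = SVC(**best_parameters).fit(X_trainval, y_trainval)
test_score = final_svm.score(X_test, y_test)
print("Sizes: train={} validation={} test={}".format(
    X_train.shape[0], X_val.shape[0], X_test.shape[0]))
print("Best parameters:{}".format(best_parameters))
print("Score on testing set:{:.2f}".format(test_score))
```

The drawback of a single validation split is that the result depends on how the data happened to be divided, which is what cross-validation addresses.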

1-2. Grid Search with Cross-Validation

About the code: again using the iris dataset, two nested for loops assign the parameter values. The difference is that inside the loop each combination is evaluated with the cross-validation function cross_val_score, and the mean of the fold scores is used as that combination's score. Once the loops finish, the best parameters are used to retrain on the full training set, and the final score is computed on the test set.

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score

iris = load_iris()
X_trainval, X_test, y_trainval, y_test = train_test_split(iris.data, iris.target, random_state=0)

best_score = 0.0
for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        svm = SVC(gamma=gamma, C=C)
        scores = cross_val_score(svm, X_trainval, y_trainval, cv=5)  # 5-fold cross-validation
        score = scores.mean()  # average over the folds
        if score > best_score:
            best_score = score
            best_parameters = {"gamma": gamma, "C": C}

# Retrain with the best parameters on the full training set, then score on the test set.
svm = SVC(**best_parameters)
svm.fit(X_trainval, y_trainval)
test_score = svm.score(X_test, y_test)
print("Best score on validation set:{:.2f}".format(best_score))
print("Best parameters:{}".format(best_parameters))
print("Score on testing set:{:.2f}".format(test_score))

Output:
Best score on validation set:0.97
Best parameters:{'gamma': 0.1, 'C': 10}
Score on testing set:0.97
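
To show what the five-fold cross-validation above is doing with the 112 training samples, here is a small standard-library sketch of how k-fold splitting divides the sample indices. It is a plain, unshuffled k-fold written for illustration; for classifiers, scikit-learn actually uses stratified folds, so its splits will differ. The function names and the dummy scorer are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # The first n_samples % k folds get one extra sample, so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(score_fn, n_samples, k):
    """Average score_fn(train, test) over the k folds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / k


def dummy_score(train, test):
    # Hypothetical scorer: a real one would fit a model on `train` and score it on `test`.
    return len(train) / (len(train) + len(test))


folds = list(k_fold_indices(112, 5))
print([len(test) for _, test in folds])  # [23, 23, 22, 22, 22]
mean_score = cross_validate(dummy_score, 112, 5)
```

Every sample lands in the test fold exactly once, so each parameter combination is scored on all of the training data without ever being tested on samples it was fitted on.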

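2. GridSearchCV (Grid Search & Cross-Validation)

2-1. Introduction to GridSearchCV

Meaning: GridSearchCV is a class in sklearn.model_selection that combines grid search with cross-validation. Given the lists of candidate parameter values, it finds the combination with the best cross-validated score among those candidates. It suits small datasets; the drawback is that it has to try every possible combination, which becomes very time-consuming on large datasets or with many parameters.
Note: when the dataset is large, one option is a coordinate-descent style of tuning, that is, tuning the parameters one at a time, starting with those that affect the model the most.
Grid search: try different parameter combinations to find the one with the highest accuracy on the validation set.
k-fold cross-validation: the dataset is split into k parts. Each part is used exactly once as the test set while the remaining k-1 parts are used to train the model, and the model is then scored on that test part. The k scores are averaged to give the final score.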
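
The one-parameter-at-a-time tuning mentioned in the note above can be sketched as follows. This is only an illustration under stated assumptions: toy_score is a made-up stand-in for a cross-validated score (it peaks at gamma=0.1, C=10), and coordinate_search is a hypothetical helper, not a scikit-learn function. The approach is greedy, so when parameters interact strongly it may settle on a combination that a full grid search would beat.

```python
import math


def coordinate_search(score_fn, grid, start, sweeps=1):
    """Tune one parameter at a time, keeping the others fixed at their current best."""
    current = dict(start)
    best = score_fn(current)
    evaluations = 1
    for _ in range(sweeps):
        for name, values in grid.items():
            for value in values:
                if value == current[name]:
                    continue  # this setting has already been scored
                trial = dict(current, **{name: value})
                score = score_fn(trial)
                evaluations += 1
                if score > best:
                    best, current = score, trial
    return current, best, evaluations


def toy_score(params):
    # Hypothetical score surface; in practice this would be a mean cross-validation score.
    return (-(math.log10(params["gamma"]) + 1) ** 2
            - (math.log10(params["C"]) - 1) ** 2)


grid = {
    "gamma": [0.001, 0.01, 0.1, 1, 10, 100],
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
}
best_params, best_value, n_evaluations = coordinate_search(
    toy_score, grid, start={"gamma": 0.001, "C": 0.001})
print(best_params)    # {'gamma': 0.1, 'C': 10}
print(n_evaluations)  # 11, versus 36 for the full grid
```

The cost grows with the sum of the candidate counts rather than their product, which is where the savings on large problems come from.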

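2-2. The GridSearchCV Class

GridSearchCV parameters

sklearn.model_selection.GridSearchCV(
    # The estimator to tune (any scikit-learn estimator, not only a classifier).
    estimator,
    # The parameters to search: a dict mapping parameter names to lists of candidate values, or a list of such dicts.
    param_grid,
    *,
    # The evaluation metric. None uses the estimator's own score method.
    scoring=None,
    # Number of jobs to run in parallel. None means 1 in most cases; -1 means use all processors.
    n_jobs=None,
    # Formerly controlled whether scores were weighted by fold size, on the assumption that the data is identically distributed across folds. Deprecated in 0.22 and removed in 0.24, so do not pass it on current versions.
    iid='deprecated',
    # Defaults to True: once the search finishes, fit again on the whole dataset passed to fit() using the best parameters.
    refit=True,
    # Cross-validation strategy. None means 5-fold cross-validation.
    cv=None,
    verbose=0,
    pre_dispatch='2*n_jobs',
    error_score=nan,
    return_train_score=False
)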
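
As a hedged illustration of a few of these parameters, the sketch below passes param_grid as a list of dicts, sets scoring explicitly, and fixes n_jobs. The particular kernels and candidate values are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()

# A list of dicts defines several sub-grids; combinations are formed within each dict only,
# so gamma is not searched for the linear kernel.
param_grid = [
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.01, 0.1]},  # 2 * 2 = 4 candidates
    {"kernel": ["linear"], "C": [1, 10]},                     # 2 candidates
]

search = GridSearchCV(
    SVC(),
    param_grid,
    scoring="accuracy",  # explicit metric instead of the estimator's default score
    n_jobs=1,            # set to -1 to use all processors
    cv=5,                # five-fold cross-validation
    refit=True,          # refit the best candidate on all data passed to fit()
)
search.fit(iris.data, iris.target)

n_candidates = len(search.cv_results_["params"])
print(n_candidates)  # 6
print(search.best_params_)
```

Splitting the grid this way avoids spending fits on combinations that make no difference to the model.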

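GridSearchCV attributes and methods

cv_results_ : dict of numpy (masked) ndarrays. A dict whose keys are column headers and whose values are columns; it can be loaded into a DataFrame. Note that the "params" key stores the list of parameter settings for all candidates.
best_estimator_ : the estimator refit with the best parameters. Not available if refit=False.
best_score_ : the mean cross-validated score of the best candidate.
best_params_ : the parameter setting that gave the best result.
best_index_ : int. The index (into the cv_results_ arrays) of the best candidate parameter setting. The dict at search.cv_results_['params'][search.best_index_] gives the parameter setting of the best model, the one with the highest mean score (search.best_score_).
grid.fit(): run the grid search.
grid.predict(): call predict on the estimator with the best parameters found.
grid.score(): the score of the refit best estimator on the data passed in, for example the test set.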
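
A short sketch of how these attributes relate to one another, assuming a small illustrative grid (the candidate values here are not from the original article):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

search = GridSearchCV(SVC(), {"gamma": [0.01, 0.1, 1], "C": [1, 10, 100]}, cv=5)
search.fit(X_train, y_train)

results = search.cv_results_
best = search.best_index_

# best_params_ and best_score_ are the entries of cv_results_ at best_index_.
same_params = results["params"][best] == search.best_params_
same_score = results["mean_test_score"][best] == search.best_score_

# score() and predict() are forwarded to the refit best_estimator_.
delegated = search.score(X_test, y_test) == search.best_estimator_.score(X_test, y_test)

# Print every candidate with its mean cross-validation score.
for params, mean_score in zip(results["params"], results["mean_test_score"]):
    print(params, round(mean_score, 3))
print(same_params, same_score, delegated)
```

Looking at all rows of cv_results_, not just the winner, shows how sensitive the score is to each parameter.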

2-3. GridSearchCV in Practice

from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

iris = load_iris()  # load the dataset used below

# List the parameters to tune together with their candidate values.
param_grid = {"gamma": [0.001, 0.01, 0.1, 1, 10, 100], "C": [0.001, 0.01, 0.1, 1, 10, 100]}
print("Parameters:{}".format(param_grid))

grid_search = GridSearchCV(SVC(), param_grid, cv=5)  # create a GridSearchCV instance
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=10)
# Run the search to find the best parameters; a new SVC with those parameters is then refit on the training set.
grid_search.fit(X_train, y_train)
print("Test set score:{:.2f}".format(grid_search.score(X_test, y_test)))
print("Best parameters:{}".format(grid_search.best_params_))
# best_score_ is the mean cross-validation score on the training set, not a plain training score.
print("Best score on train set:{:.2f}".format(grid_search.best_score_))

Output:
Parameters:{'gamma': [0.001, 0.01, 0.1, 1, 10, 100], 'C': [0.001, 0.01, 0.1, 1, 10, 100]}
Test set score:0.97
Best parameters:{'C': 10, 'gamma': 0.1}
Best score on train set:0.98


Summary

Today is Sunday! Doupo (斗破) has a new update, and I am going to watch it the moment I take a break! 😄
