
Grid Search

Published: 2023/12/14


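Grid search is an exhaustive search over specified parameter values: the parameters of an estimator are optimized with cross-validation to find the best-performing model.

In other words, the candidate values of each parameter are combined in every possible way, and the full list of combinations forms the "grid". Each combination is then used to train an SVM, and its performance is evaluated with cross-validation. After the fit function has tried every combination, it returns a suitable classifier automatically set to the best parameter combination; the parameter values are available through clf.best_params_.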
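The workflow described above can be sketched with scikit-learn's GridSearchCV. This is only an illustration: the iris dataset, the candidate values for kernel and C, and cv=5 are assumptions, not values taken from the article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumed toy dataset, used only so the sketch is self-contained.
X, y = load_iris(return_X_y=True)

# Assumed candidate values: 2 kernels x 3 values of C = 6 combinations.
param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}

# Every combination is trained and scored with 5-fold cross-validation.
clf = GridSearchCV(SVC(), param_grid, cv=5)
clf.fit(X, y)

print(clf.best_params_)                 # best combination found
print(len(clf.cv_results_["params"]))   # number of combinations tried
```

After fitting, clf itself behaves like the classifier refitted with the best combination, so clf.predict can be called directly.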

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/sinat_32547403/article/details/73008127

Cross-Validation and Grid Search

Cross-validation and grid search are two important, fundamental concepts in machine learning, but they are not easy to grasp when you are starting out. When I began learning, I did not understand them well, so this article walks through both concepts.

Grid Search

Grid search sounds grand, but put simply: you manually list all the values you want to try for the parameters of a model, and the program exhaustively runs every one of them for you. For decision trees, the maximum tree depth is commonly tuned; for AdaBoost, the number of weak classifiers.
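To make the "exhaustive" part concrete, here is a small sketch that enumerates a grid by hand with itertools.product. The parameter names and candidate values are assumptions picked for illustration.

```python
from itertools import product

# Assumed candidate values for two decision-tree parameters.
param_grid = {"max_depth": [1, 2, 3], "min_samples_split": [2, 10]}

# Fix an order for the parameter names, then take the Cartesian product
# of their candidate values: 3 x 2 = 6 combinations.
names = sorted(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

for combo in combinations:
    print(combo)
```

The number of combinations is the product of the list lengths, so the cost of a grid search grows quickly as parameters are added.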

Scoring Method

To decide which of the manually specified parameter values is best, you need a suitable scoring method. Which one to use depends on the problem; it might be accuracy, F1 score, F-beta, precision, recall, and so on.
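As a rough illustration of how these metrics differ, the sketch below scores one hand-made set of predictions with several scikit-learn metric functions. The labels and beta=0.5 are made up for the example.

```python
from sklearn.metrics import (accuracy_score, f1_score, fbeta_score,
                             make_scorer, precision_score, recall_score)

# Made-up labels: 3 positives, 5 negatives; one positive is missed.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)         # 7 of 8 correct
precision = precision_score(y_true, y_pred)       # no false positives
recall = recall_score(y_true, y_pred)             # 2 of 3 positives found
f1 = f1_score(y_true, y_pred)                     # harmonic mean of the two
f_half = fbeta_score(y_true, y_pred, beta=0.5)    # weights precision more

print(accuracy, precision, recall, f1, f_half)

# A metric function is wrapped with make_scorer before being passed
# to GridSearchCV through its scoring argument.
scorer = make_scorer(fbeta_score, beta=0.5)
```

The same predictions get quite different scores depending on the metric, which is why the choice has to follow from the problem.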

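Cross-Validation

With a good scoring method in hand, can a single result show that one parameter combination is better than another? Clearly that is not rigorous; as our primary school teachers taught us, take the average. That is where cross-validation comes in. K-fold cross-validation is used below to illustrate the idea.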

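First, split the data.
Split the original dataset into a training set and a test set, for example in an 8:2 ratio.
(Figure: the dataset split 8:2 into a training set and a test set.)
The training set is used to train the model; the test set is used to measure the model's accuracy.
Note: never use the test set for training. That would be like letting you memorize the exam answers first and then sit the exam.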
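The 8:2 split can be sketched as follows. The synthetic dataset from make_classification and the tree depth are assumptions used only to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed synthetic data: 1000 samples, 10 features.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# test_size=0.2 keeps 80% for training and holds out 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The model only ever sees the training portion during fitting.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# The held-out test set is used once, to estimate accuracy.
test_accuracy = model.score(X_test, y_test)
print(len(X_train), len(X_test))
print(test_accuracy)
```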

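Validation
In K-fold cross-validation, the training data is divided into K parts; K-1 parts are used as training data and the remaining part is used as validation data:

(Figure: K-fold cross-validation, with one fold held out for validation in each round.)
This process is repeated K times in total, so that each part serves as the validation data once. The K scores, computed with the scoring method chosen beforehand, are averaged and returned, and the parameter combination with the highest average score is then selected. That completes the cross-validation process.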
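The procedure above can be written out by hand as a pair of loops. This is a sketch under assumed settings (synthetic data, K=5, three candidate depths), not the article's own code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Assumed synthetic training data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

kfold = KFold(n_splits=5)   # K = 5 folds (assumed)
candidates = [1, 3, 5]      # assumed max_depth values to compare

mean_scores = {}
for depth in candidates:
    fold_scores = []
    # Each round trains on K-1 folds and validates on the remaining fold.
    for train_idx, val_idx in kfold.split(X):
        model = DecisionTreeClassifier(max_depth=depth, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[val_idx])
        fold_scores.append(accuracy_score(y[val_idx], predictions))
    # Average the K validation scores for this candidate.
    mean_scores[depth] = np.mean(fold_scores)

# Pick the candidate with the highest average validation score.
best_depth = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print(best_depth)
```

GridSearchCV automates these loops and additionally refits the best candidate on the full training data.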

### Example

The following simple example (predicting whether annual income exceeds 50,000 US dollars) shows how grid search and cross-validation are used. The dataset comes from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Census+Income).

    import numpy as np
    import pandas as pd
    from IPython.display import display
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import make_scorer, fbeta_score, accuracy_score
    from sklearn.model_selection import GridSearchCV, KFold

    data = pd.read_csv("census.csv")

    # Split the data into features and labels
    income_raw = data['income']
    features_raw = data.drop('income', axis=1)

    # Show part of the data
    # display(features_raw.head(n=1))

    # capital-gain and capital-loss are highly skewed, so apply a log transform
    skewed = ['capital-gain', 'capital-loss']
    features_raw[skewed] = data[skewed].apply(lambda x: np.log(x + 1))

    # Normalize the numerical features so that all features are treated equally.
    # Scale features_raw rather than data so the log transform is not overwritten.
    scaler = MinMaxScaler()
    numerical = ['age', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
    features_raw[numerical] = scaler.fit_transform(features_raw[numerical])
    # display(features_raw.head(n=1))

    # One-hot encoding: convert non-numeric features to numbers
    features = pd.get_dummies(features_raw)
    # Encode the label: 1 for '>50K', 0 for '<=50K'
    income = (income_raw == '>50K').astype(int)

    # Split the dataset
    X_train, X_test, y_train, y_test = train_test_split(
        features, income, test_size=0.2, random_state=0)

    # AdaBoost
    from sklearn.ensemble import AdaBoostClassifier
    clf_Ada = AdaBoostClassifier(random_state=0)

    # Decision tree
    from sklearn.tree import DecisionTreeClassifier
    clf_Tree = DecisionTreeClassifier(random_state=0)

    # KNN
    from sklearn.neighbors import KNeighborsClassifier
    clf_KNN = KNeighborsClassifier()

    # SVM
    from sklearn.svm import SVC
    clf_svm = SVC(random_state=0)

    # Logistic regression
    from sklearn.linear_model import LogisticRegression
    clf_log = LogisticRegression(random_state=0)

    # Random forest
    from sklearn.ensemble import RandomForestClassifier
    clf_forest = RandomForestClassifier(random_state=0)

    # GBDT
    from sklearn.ensemble import GradientBoostingClassifier
    clf_gbdt = GradientBoostingClassifier(random_state=0)

    # Gaussian naive Bayes
    from sklearn.naive_bayes import GaussianNB
    clf_NB = GaussianNB()

    scorer = make_scorer(accuracy_score)

    # Parameter tuning
    kfold = KFold(n_splits=10)

    # Decision tree: try max_depth from 1 to 9
    parameter_tree = {'max_depth': list(range(1, 10))}
    # return_train_score=True keeps the *_train_score columns in cv_results_
    grid = GridSearchCV(clf_Tree, parameter_tree, scoring=scorer, cv=kfold,
                        return_train_score=True)
    grid = grid.fit(X_train, y_train)

    print("best score: {}".format(grid.best_score_))
    display(pd.DataFrame(grid.cv_results_).T)

best score: 0.855737070514

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| mean_fit_time | 0.0562535 | 0.0692133 | 0.0885126 | 0.110233 | 0.128337 | 0.158719 | 0.17124 | 0.193637 | 0.223979 |
| mean_score_time | 0.00240474 | 0.00228212 | 0.00221529 | 0.0026047 | 0.00226772 | 0.00254297 | 0.00231481 | 0.00246696 | 0.00256622 |
| mean_test_score | 0.75114 | 0.823811 | 0.839345 | 0.839926 | 0.846671 | 0.852392 | 0.851508 | 0.853139 | 0.855737 |
| mean_train_score | 0.75114 | 0.82421 | 0.839628 | 0.840503 | 0.847878 | 0.853329 | 0.855264 | 0.859202 | 0.863667 |
| param_max_depth | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| params | {'max_depth': 1} | {'max_depth': 2} | {'max_depth': 3} | {'max_depth': 4} | {'max_depth': 5} | {'max_depth': 6} | {'max_depth': 7} | {'max_depth': 8} | {'max_depth': 9} |
| rank_test_score | 9 | 8 | 7 | 6 | 5 | 3 | 4 | 2 | 1 |
| split0_test_score | 0.760641 | 0.8267 | 0.843836 | 0.844666 | 0.851575 | 0.855721 | 0.855445 | 0.86042 | 0.859038 |
| split0_train_score | 0.750084 | 0.823828 | 0.839184 | 0.83943 | 0.847538 | 0.852913 | 0.854295 | 0.859947 | 0.863233 |
| split1_test_score | 0.758154 | 0.821172 | 0.839138 | 0.842454 | 0.845218 | 0.849641 | 0.847706 | 0.850746 | 0.852957 |
| split1_train_score | 0.750361 | 0.824442 | 0.839706 | 0.845911 | 0.850088 | 0.854203 | 0.855831 | 0.861482 | 0.864984 |
| split2_test_score | 0.754837 | 0.824212 | 0.840243 | 0.84052 | 0.8466 | 0.854616 | 0.854339 | 0.854063 | 0.856551 |
| split2_train_score | 0.750729 | 0.824718 | 0.839031 | 0.839307 | 0.847323 | 0.852237 | 0.854203 | 0.859578 | 0.86397 |
| split3_test_score | 0.73162 | 0.820619 | 0.838032 | 0.838308 | 0.8466 | 0.850746 | 0.848535 | 0.846877 | 0.852957 |
| split3_train_score | 0.753309 | 0.824503 | 0.839829 | 0.840106 | 0.848337 | 0.853742 | 0.85537 | 0.858104 | 0.863171 |
| split4_test_score | 0.746545 | 0.818684 | 0.83361 | 0.833886 | 0.83969 | 0.847982 | 0.845495 | 0.85047 | 0.848811 |
| split4_train_score | 0.751651 | 0.824718 | 0.840321 | 0.840597 | 0.844897 | 0.853558 | 0.858319 | 0.861912 | 0.864922 |
| split5_test_score | 0.754284 | 0.826147 | 0.844942 | 0.845218 | 0.854063 | 0.859038 | 0.85738 | 0.858209 | 0.861802 |
| split5_train_score | 0.750791 | 0.823889 | 0.839061 | 0.839338 | 0.847323 | 0.852729 | 0.854111 | 0.856967 | 0.862741 |
| split6_test_score | 0.754284 | 0.825318 | 0.838032 | 0.837756 | 0.845495 | 0.848535 | 0.848535 | 0.852128 | 0.857103 |
| split6_train_score | 0.750791 | 0.823981 | 0.839829 | 0.840167 | 0.848429 | 0.853773 | 0.855647 | 0.857766 | 0.863141 |
| split7_test_score | 0.749793 | 0.821399 | 0.835499 | 0.835499 | 0.844623 | 0.85264 | 0.852087 | 0.853746 | 0.85264 |
| split7_train_score | 0.75129 | 0.824416 | 0.840111 | 0.840418 | 0.848372 | 0.853501 | 0.854945 | 0.860811 | 0.863882 |
| split8_test_score | 0.753387 | 0.826375 | 0.838264 | 0.83854 | 0.84407 | 0.852087 | 0.852917 | 0.852364 | 0.858446 |
| split8_train_score | 0.750891 | 0.823864 | 0.839803 | 0.84008 | 0.848372 | 0.853071 | 0.854945 | 0.857801 | 0.863391 |
| split9_test_score | 0.747857 | 0.827481 | 0.841858 | 0.842411 | 0.84877 | 0.852917 | 0.85264 | 0.852364 | 0.857064 |
| split9_train_score | 0.751505 | 0.823741 | 0.839404 | 0.839681 | 0.848096 | 0.853563 | 0.854975 | 0.857647 | 0.863237 |
| std_fit_time | 0.0123583 | 0.00442788 | 0.00552026 | 0.00631691 | 0.0053195 | 0.0157011 | 0.00476991 | 0.00622854 | 0.0147429 |
| std_score_time | 0.000529214 | 0.000467091 | 0.000355028 | 0.000760624 | 0.000460829 | 0.000504627 | 0.000446289 | 0.000445256 | 0.000449312 |
| std_test_score | 0.00769898 | 0.00292464 | 0.00333118 | 0.00358776 | 0.00382496 | 0.00324406 | 0.00360414 | 0.00366389 | 0.00363761 |
| std_train_score | 0.000855482 | 0.000366166 | 0.000418973 | 0.00185264 | 0.00124698 | 0.000553171 | 0.00116151 | 0.00168732 | 0.000726325 |
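One way to read these results is to compare the mean_test_score and mean_train_score rows reported above. The sketch below copies those two rows into a DataFrame and computes the gap between them; the "gap" column is an addition for illustration, not part of the cv_results_ output.

```python
import pandas as pd

# mean_test_score and mean_train_score copied from the results above,
# one entry per max_depth from 1 to 9.
results = pd.DataFrame({
    "max_depth": range(1, 10),
    "mean_test_score": [0.75114, 0.823811, 0.839345, 0.839926, 0.846671,
                        0.852392, 0.851508, 0.853139, 0.855737],
    "mean_train_score": [0.75114, 0.82421, 0.839628, 0.840503, 0.847878,
                         0.853329, 0.855264, 0.859202, 0.863667],
})

# Difference between training and validation score for each depth.
results["gap"] = results["mean_train_score"] - results["mean_test_score"]

# Row with the highest mean validation score.
best_row = results.loc[results["mean_test_score"].idxmax()]

print(results)
print(int(best_row["max_depth"]))
```

In these numbers the best validation score is at the deepest tree tried, while the train/validation gap is also largest there, so a wider depth range might be worth checking before settling on a value.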
