XGBoost Parameters: XGBoost in Practice and Parameter Details

Published: 2025/3/15


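Advantages of XGBoost

Regularization
Parallel processing (split finding within a tree is parallelized; the trees themselves are still built one after another)
Flexibility: supports custom objective and loss functions, provided they are twice differentiable
Built-in handling of missing values
Pruning, which makes overfitting less likely
Built-in cross-validation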
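
The "twice differentiable" requirement exists because each tree is fitted from the first and second derivatives of the loss with respect to the current prediction. A minimal sketch, assuming the logistic loss and plain Python lists; the name `logistic_grad_hess` is illustrative and not part of XGBoost:

```python
import math


def logistic_grad_hess(margins, labels):
    """Return per-sample gradient and hessian of the logistic loss.

    margins: raw scores before the sigmoid; labels: 0 or 1.
    """
    grads, hesss = [], []
    for m, y in zip(margins, labels):
        p = 1.0 / (1.0 + math.exp(-m))   # predicted probability
        grads.append(p - y)              # first derivative w.r.t. the margin
        hesss.append(p * (1.0 - p))      # second derivative w.r.t. the margin
    return grads, hesss


grad, hess = logistic_grad_hess([0.0, 2.0], [1, 0])
print(grad, hess)
```

In the Python package, a custom objective passed to `xgb.train` through the `obj` argument takes the predictions and the training `DMatrix` and returns these same two quantities as arrays.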

Parameter settings

params = {
    'booster': 'gbtree',
    'objective': 'multi:softmax',  # multi-class classification
    'num_class': 10,               # number of classes; used together with multi:softmax
    'gamma': 0.1,                  # controls post-pruning; larger is more conservative, typically 0.1 or 0.2
    'max_depth': 12,               # tree depth; larger values overfit more easily
    'lambda': 2,                   # L2 regularization on the weights; larger values make overfitting less likely
    'subsample': 0.7,              # random sampling of training rows
    'colsample_bytree': 0.7,       # column sampling when building each tree
    'min_child_weight': 3,
    'silent': 1,                   # 1 suppresses runtime messages; 0 is usually the better choice
    'eta': 0.007,                  # acts like a learning rate
    'seed': 1000,
    'nthread': 4,                  # number of CPU threads
}
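
The dictionary above uses parameter names from older XGBoost releases. As a hedged sketch: recent releases replace `silent` with `verbosity` and rename `reg:linear` to `reg:squarederror`. The helper `modernize_params` below is hypothetical (it is not an XGBoost function) and only applies those two renames before the dictionary is handed to `xgb.train`; `train_model` is likewise an illustrative wrapper.

```python
def modernize_params(params):
    """Return a copy of params with two deprecated names replaced.

    Hypothetical helper for illustration; it is not part of XGBoost.
    """
    out = dict(params)
    if "silent" in out:
        # silent=1 meant "no messages"; verbosity=0 is the silent level.
        out["verbosity"] = 0 if out.pop("silent") else 1
    if out.get("objective") == "reg:linear":
        out["objective"] = "reg:squarederror"
    return out


def train_model(params, X, y, num_boost_round=100):
    """Train a booster. xgboost is imported lazily so that the helper
    above can be used without the package installed."""
    import xgboost as xgb
    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(modernize_params(params), dtrain, num_boost_round)


old = {"objective": "multi:softmax", "num_class": 10, "silent": 1, "eta": 0.007}
print(modernize_params(old))
```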

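booster [default=gbtree]: which booster to use, gbtree (tree-based) or gblinear (linear)
silent [default=0]: 0 prints runtime messages, 1 runs in silent mode
nthread: number of threads to run with
num_pbuffer: size of the prediction buffer, normally the number of training instances; set automatically, no need to set it by hand
num_feature: number of features; set automatically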

##############################################################################

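eta [default=0.3]: step-size shrinkage applied to each update to prevent overfitting
gamma [default=0]: minimum loss reduction required to split a leaf node further
max_depth [default=6]: maximum depth of a tree
min_child_weight [default=1]: minimum sum of instance weights required in a child node; if a split would produce a child below this value, splitting stops
max_delta_step [default=0]: maximum delta step allowed for each tree's weight estimate. Usually left at 0, which means no constraint. A positive value makes the update more conservative. In logistic regression with imbalanced classes it can help to set it to a value greater than 0
subsample [default=1]: fraction of the training set sampled to grow each tree; helps prevent overfitting
colsample_bytree [default=1]: fraction of features sampled when building each tree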
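
To make `subsample` and `colsample_bytree` concrete, here is a rough sketch of drawing the rows and columns one tree would see. It only illustrates the idea: XGBoost's internal sampler differs in detail, and `sample_for_tree` is a made-up name.

```python
import random


def sample_for_tree(n_rows, n_cols, subsample, colsample_bytree, seed):
    """Pick the rows and columns one tree would be grown on (sketch only)."""
    rng = random.Random(seed)                      # seeded for reproducibility
    n_r = max(1, round(n_rows * subsample))
    n_c = max(1, round(n_cols * colsample_bytree))
    rows = sorted(rng.sample(range(n_rows), n_r))  # sampling without replacement
    cols = sorted(rng.sample(range(n_cols), n_c))
    return rows, cols


rows, cols = sample_for_tree(n_rows=10, n_cols=5, subsample=0.7,
                             colsample_bytree=0.8, seed=1000)
print(len(rows), len(cols))
```

Because the generator is seeded, the same `seed` gives the same subset on every run, which is the role the `seed` parameter plays in training.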

#################################################################################

lambda: L2 regularization penalty coefficient
alpha: L1 regularization penalty coefficient
lambda_bias: L2 regularization on the bias term
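
How gamma, lambda, min_child_weight and eta interact is easier to see in the split criterion for tree boosters. The sketch below assumes alpha = 0 and takes the summed gradients (G) and hessians (H) of each child as given; the function names are illustrative, not XGBoost API.

```python
def leaf_weight(G, H, reg_lambda=1.0, eta=0.3):
    """Optimal leaf value -G / (H + lambda), shrunk by eta (alpha = 0 assumed)."""
    return -eta * G / (H + reg_lambda)


def split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Gain of a candidate split, or None if either child is too light."""
    if HL < min_child_weight or HR < min_child_weight:
        return None  # child hessian sum below min_child_weight: split rejected

    def score(G, H):
        return G * G / (H + reg_lambda)

    # Improvement of the two children over the unsplit parent, minus gamma.
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


print(split_gain(GL=-4.0, HL=2.0, GR=3.0, HR=2.0, reg_lambda=1.0, gamma=0.1))
print(leaf_weight(G=-4.0, H=2.0, reg_lambda=1.0, eta=0.3))
```

A split is only worth keeping when this gain is positive, so a larger gamma or lambda makes the tree more conservative.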

#################################################################################

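objective [ default=reg:linear ]
Defines the learning task and the corresponding learning objective. The available objectives are:
- “reg:linear”: linear regression (renamed “reg:squarederror” in later XGBoost releases).
- “reg:logistic”: logistic regression.
- “binary:logistic”: logistic regression for binary classification; the output is a probability.
- “binary:logitraw”: logistic regression for binary classification; the output is the raw score wTx before the logistic transformation.
- “count:poisson”: Poisson regression for count data; the output is the mean of the Poisson distribution. In Poisson regression, max_delta_step defaults to 0.7 (used to safeguard optimization).
- “multi:softmax”: makes XGBoost use the softmax objective for multi-class classification; the parameter num_class (number of classes) must also be set.
- “multi:softprob”: same as softmax, but the output is a vector of ndata * nclass values that can be reshaped into a matrix with ndata rows and nclass columns. Each row holds the probabilities of that sample belonging to each class.
- “rank:pairwise”: set XGBoost to do a ranking task by minimizing the pairwise loss.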
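
The relationships between these outputs can be sketched in plain Python: a “binary:logitraw” score becomes a “binary:logistic” probability through the sigmoid, and a flat “multi:softprob” vector becomes per-class rows whose argmax is the label “multi:softmax” would report. The data and helper names below are made up for illustration; depending on the version, the Python package may already return a two-dimensional array for softprob.

```python
import math


def sigmoid(margin):
    """Map a raw binary:logitraw score to a binary:logistic probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def reshape_softprob(flat, num_class):
    """Turn a flat ndata * nclass list into ndata rows of nclass probabilities."""
    return [flat[i:i + num_class] for i in range(0, len(flat), num_class)]


flat = [0.1, 0.7, 0.2, 0.6, 0.3, 0.1]            # 2 samples, 3 classes
rows = reshape_softprob(flat, num_class=3)
labels = [row.index(max(row)) for row in rows]   # most probable class per sample
print(rows, labels)
```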

base_score [ default=0.5 ]
- The initial prediction score of all instances (global bias).
- With a sufficient number of iterations, changing this value will not have much effect.

eval_metric [ default according to objective ]
- The evaluation metric used on the validation data. Each objective has a default metric (rmse for regression, error for classification, mean average precision for ranking).
- Multiple evaluation metrics can be added. Python users should pass the parameters as a list of pairs rather than a map, so that a later 'eval_metric' entry does not overwrite an earlier one.
- The available choices are:
  - “rmse”: root mean square error
  - “logloss”: negative log-likelihood
  - “error”: Binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
  - “merror”: Multiclass classification error rate. It is calculated as #(wrong cases)/#(all cases).
  - “mlogloss”: Multiclass logloss
  - “auc”: Area under the curve for ranking evaluation.
  - “ndcg”: Normalized Discounted Cumulative Gain
  - “map”: Mean average precision
  - “ndcg@n”, “map@n”: n can be assigned as an integer to cut off the top positions in the lists for evaluation.
  - “ndcg-”, “map-”, “ndcg@n-”, “map@n-”: In XGBoost, NDCG and MAP will evaluate the score of a list without any positive samples as 1. By adding “-” to the evaluation metric, XGBoost will evaluate these scores as 0, to be consistent under some conditions.
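
As a small sketch of two points from the list above: why a list of pairs is used for several metrics, and how “error”, “logloss” and “rmse” are computed. The data is invented and the functions are plain-Python illustrations, not XGBoost's own implementations.

```python
import math

# A list of pairs keeps both eval_metric entries;
# a dict could hold only one value for the key "eval_metric".
params = {"objective": "binary:logistic", "eta": 0.3}
param_pairs = list(params.items()) + [("eval_metric", "auc"),
                                      ("eval_metric", "logloss")]


def error_rate(probs, labels, threshold=0.5):
    """'error': #(wrong cases) / #(all cases); positive if prob > threshold."""
    wrong = sum((p > threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


def logloss(probs, labels, eps=1e-15):
    """'logloss': mean negative log-likelihood for binary labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def rmse(preds, targets):
    """'rmse': root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


probs, labels = [0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]
print(error_rate(probs, labels), logloss(probs, labels))
```
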
seed [ default=0 ]
- The random number seed. The default is 0.

Reference: 章華燕, "The Most Detailed XGBoost Hands-On Guide Ever" (zhuanlan.zhihu.com)

Parameter tuning

Fix the boosting parameters first, with preset initial values for the other parameters:

max_depth = 5
min_child_weight = 1
gamma = 0
subsample, colsample_bytree = 0.8
scale_pos_weight = 1

Then use cv to determine n_estimators.

Use grid search to determine max_depth and min_child_weight.

Tune the gamma parameter.

Tune subsample and colsample_bytree.

Tune the regularization parameters.

Lower the learning rate.
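
The staged procedure above can be sketched as a loop that tunes one group of parameters at a time while keeping earlier winners fixed. This is a minimal illustration, not the referenced guide's code: `staged_search`, `xgb_cv_score` and `toy_score` are hypothetical names, the candidate grids are assumptions, and `toy_score` stands in for a real cross-validation score so the example runs without data.

```python
from itertools import product


def staged_search(base_params, stages, score_fn):
    """Tune one group of parameters at a time, keeping earlier winners fixed.

    stages: list of dicts mapping parameter name -> list of candidate values.
    score_fn: callable(params) -> float, higher is better (e.g. a CV score).
    """
    best = dict(base_params)
    for grid in stages:
        names = list(grid)
        best_score, best_combo = None, None
        for combo in product(*(grid[n] for n in names)):
            trial = {**best, **dict(zip(names, combo))}
            score = score_fn(trial)
            if best_score is None or score > best_score:
                best_score, best_combo = score, combo
        best.update(zip(names, best_combo))
    return best


def xgb_cv_score(params, dtrain, num_boost_round=500):
    """Possible score function built on xgboost's cv (imported lazily).

    With early stopping, the number of rows in the result is the number
    of boosting rounds kept, which is how cv suggests n_estimators.
    """
    import xgboost as xgb
    result = xgb.cv(params, dtrain, num_boost_round=num_boost_round, nfold=5,
                    metrics="auc", early_stopping_rounds=50)
    return result["test-auc-mean"].iloc[-1]


base = {"max_depth": 5, "min_child_weight": 1, "gamma": 0,
        "subsample": 0.8, "colsample_bytree": 0.8, "scale_pos_weight": 1}
stages = [  # candidate values are illustrative assumptions
    {"max_depth": [3, 5, 7, 9], "min_child_weight": [1, 3, 5]},
    {"gamma": [0, 0.1, 0.2]},
    {"subsample": [0.6, 0.8, 1.0], "colsample_bytree": [0.6, 0.8, 1.0]},
    {"lambda": [0.1, 1, 10], "alpha": [0, 0.1, 1]},
]


def toy_score(p):
    # Stand-in for a real CV score so the sketch runs without data.
    return -abs(p["max_depth"] - 7) - abs(p["min_child_weight"] - 3) - p["gamma"]


print(staged_search(base, stages, toy_score))
```

With real data, `score_fn` would be something like `lambda p: xgb_cv_score(p, dtrain)`, followed by a final run with a lower `eta` and more boosting rounds.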

Reference: Dukey, "[Repost] Complete Guide to XGBoost Parameter Tuning (with Python Code)" (zhuanlan.zhihu.com)
