LightGBM Functions and Parameters Explained
LGBM Python API
Dataset
class lightgbm.Dataset(data, label=None, max_bin=None, reference=None, weight=None, group=None, init_score=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)
Create a labeled training set and validation set:
trn_data = lgb.Dataset(X_tr, label=y_tr)
val_data = lgb.Dataset(X_valid, label=y_valid)
Booster
class lightgbm.Booster(params=None, train_set=None, model_file=None, silent=False)    # a Booster instance is referred to as lgb below
- params    Parameters in dict form
- train_set    Training dataset.
- model_file    Path to the model file
- silent    Whether to print messages while constructing the model
Given a Booster instance (lgb), you can call the methods below, including predict:
lgb.add_valid(data, name)    Add a validation set.
lgb.attr(key)    Get attribute string from the Booster.
lgb.current_iteration()    Get the index of the current iteration.
lgb.dump_model(num_iteration=-1)    Dump Booster to JSON format.
lgb.eval(data, name, feval=None)    Evaluate for data. Returns a list with evaluation results.
lgb.eval_train(feval=None)    Evaluate for training data.
lgb.eval_valid(feval=None)    Evaluate for validation data.
lgb.feature_importance(importance_type='split', iteration=-1)    Get feature importances.
lgb.feature_name()    Get names of features.
lgb.free_dataset()    Free Booster's Datasets.
lgb.free_network()    Free Network.
lgb.get_leaf_output(tree_id, leaf_id)    Get the output of a leaf.
lgb.num_feature()    Get number of features.
lgb.predict(data, num_iteration=-1, raw_score=False, pred_leaf=False, pred_contrib=False, data_has_header=False, is_reshape=True, pred_parameter=None)    Make a prediction.
lgb.reset_parameter(params)    Reset parameters of the Booster.
lgb.rollback_one_iter()    Rollback one iteration.
lgb.save_model(filename, num_iteration=-1)    Save Booster to file.
lgb.set_attr(**kwargs)    Set attributes of the Booster.
lgb.set_network(machines, local_listen_port=12400, listen_time_out=120, num_machines=1)    Set the network configuration.
lgb.set_train_data_name(name)    Set the name of the training Dataset.
lgb.update(train_set=None, fobj=None)    Update (train) for one iteration.
Train API
lightgbm.train
lightgbm.train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, evals_result=None, verbose_eval=True, learning_rates=None, keep_training_booster=False, callbacks=None)
- params (dict) – Parameters for training.
- train_set (Dataset) – Data to be trained on.
- num_boost_round (int, optional (default=100)) – Number of boosting iterations, i.e. the number of weak learners.
- valid_sets – List of data to be evaluated during training.
The training arguments do not include parameters for the individual weak learners (i.e. the decision trees); params holds the parameters shared by all weak learners.
params參數(shù)包括:
- "objective" : "regression", "binary",
- "metric" : "rmse","auc"
- "boosting": 'gbdt',
- ?"max_depth": -1,
- "min_child_samples": 20,?
- "num_leaves" : 31,
- "learning_rate" : 0.1,?
- "subsample" : 0.8,
- "colsample_bytree" : 0.8,?
- "verbosity": -1
- "bagging_freq": 5,
- "bagging_fraction" : 0.4,
- "feature_fraction" : 0.05,
- "min_data_in_leaf": 80,
- "min_sum_heassian_in_leaf": 10,
- "tree_learner": "serial",
- "boost_from_average": "false",
- "lambda_l1" : 5,
- "lambda_l2" : 5
- "bagging_seed" : random_state
- ?"seed": random_state
boosting_type
Boosting algorithm: 'gbdt', 'dart', 'goss', 'rf'. The default is gbdt, which is also the most widely used.
dart: Dropouts meet Multiple Additive Regression Trees — regression trees that borrow the dropout trick from deep neural networks to fight overfitting. Generated trees are randomly dropped, and the boosted ensemble is then iteratively re-optimized over the remaining trees. Its characteristics:
- Training is slower, because random dropout prevents using the buffer that caches predictions
- Because of the randomness, early stopping may be unstable
How dart differs from gbdt: when computing the gradients that the next tree will fit, only a random subset of the already-generated trees is used, and DART normalizes a new tree before adding it.
goss: the basic idea is to first sort the training samples by gradient and, with a preset ratio, keep the samples with large gradients; a second ratio controls sampling from the small-gradient samples. To compensate for the resulting distortion of the sample distribution, GOSS multiplies the small-gradient samples by a constant factor when computing information gain, so the algorithm pays more attention to "under-trained" samples. By estimating the gain on a much smaller sample set, GOSS greatly reduces computation, and it can be shown that it does not lose much training accuracy.
rf: random forest, which should already be familiar.
num_leaves
Because LightGBM grows trees leaf-wise, tree complexity is tuned with num_leaves rather than max_depth. The rough conversion is num_leaves = 2^(max_depth); its value should be set below 2^(max_depth), otherwise overfitting is likely.
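The rule of thumb above in code form:

```python
# A tree of depth d has at most 2**d leaves, so keep num_leaves
# below that cap to control model complexity.
max_depth = 7
leaf_cap = 2 ** max_depth  # 128
num_leaves = 31            # a common default, well under the cap
print(num_leaves < leaf_cap)  # True
```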
max_depth
Maximum depth of each weak learner (decision tree); -1 means unlimited.
n_estimators
Number of weak learners: GBDT works by repeatedly fitting new weak learners to the gradient until the preset number of learners is reached.
learning_rate
Boosting learning rate.?
max_bin
Number of bins into which feature values are discretized by the histogram algorithm.
lightgbm.cv
lightgbm.cv(params, train_set, num_boost_round=10, folds=None, nfold=5, stratified=True, shuffle=True, metrics=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None)