sklearn XGBModel: An Introduction to feature_importances_ and plot_importance, with a Detailed Usage Guide
Contents

feature_importances_
1. Explanation of the feature_importances_ property
2. Source code of feature_importances_
plot_importance
1. Explanation of the plot_importance method
2. Source code of plot_importance
Related articles

ML/xgboost: Explanation of the get_score(importance_type=self.importance_type) method in the xgboost library's core.py
ML/xgboost: Explanation of the xgboost.plot_importance() function
feature_importances_
1. Explanation of the feature_importances_ property
XGBRegressor().feature_importances_
Note
- Feature importance is defined only for tree boosters: it is only defined when a decision-tree model is chosen as the base learner (`booster=gbtree`). It is not defined for other base-learner types, such as linear learners (`booster=gblinear`).
Returns
- feature_importances_ : array of shape ``[n_features]``, normalized so that the scores sum to 1.

Note: importance_type : string, default "gain". The feature importance type used for the feature_importances_ property: one of "gain", "weight", "cover", "total_gain" or "total_cover".
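A minimal usage sketch (the synthetic data and all parameter values are illustrative only, assuming the API version shown in the source code below):

```python
import numpy as np
from xgboost import XGBRegressor

# Synthetic regression data: 100 samples, 5 features.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.rand(100)

# The default booster is gbtree, so feature_importances_ is defined.
model = XGBRegressor(n_estimators=50, importance_type="gain")
model.fit(X, y)

# Array of shape [n_features]; the scores are normalized to sum to 1.
print(model.feature_importances_)
```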
2. Source code of feature_importances_
```python
class XGBModel(XGBModelBase):
    # pylint: disable=too-many-arguments, too-many-instance-attributes, invalid-name
    """Implementation of the Scikit-Learn API for XGBoost.

    Parameters
    ----------
    max_depth : int
        Maximum tree depth for base learners.
    learning_rate : float
        Boosting learning rate (xgb's "eta")
    n_estimators : int
        Number of boosted trees to fit.
    silent : boolean
        Whether to print messages while running boosting.
    objective : string or callable
        Specify the learning task and the corresponding learning objective or
        a custom objective function to be used (see note below).
    booster : string
        Specify which booster to use: gbtree, gblinear or dart.
    nthread : int
        Number of parallel threads used to run xgboost. (Deprecated, please use ``n_jobs``)
    n_jobs : int
        Number of parallel threads used to run xgboost. (replaces ``nthread``)
    gamma : float
        Minimum loss reduction required to make a further partition on a leaf node of the tree.
    min_child_weight : int
        Minimum sum of instance weight (hessian) needed in a child.
    max_delta_step : int
        Maximum delta step we allow each tree's weight estimation to be.
    subsample : float
        Subsample ratio of the training instance.
    colsample_bytree : float
        Subsample ratio of columns when constructing each tree.
    colsample_bylevel : float
        Subsample ratio of columns for each split, in each level.
    reg_alpha : float (xgb's alpha)
        L1 regularization term on weights.
    reg_lambda : float (xgb's lambda)
        L2 regularization term on weights.
    scale_pos_weight : float
        Balancing of positive and negative weights.
    base_score :
        The initial prediction score of all instances, global bias.
    seed : int
        Random number seed. (Deprecated, please use random_state)
    random_state : int
        Random number seed. (replaces seed)
    missing : float, optional
        Value in the data which needs to be present as a missing value. If
        None, defaults to np.nan.
    importance_type : string, default "gain"
        The feature importance type for the feature_importances_ property:
        either "gain", "weight", "cover", "total_gain" or "total_cover".
    \*\*kwargs : dict, optional
        Keyword arguments for XGBoost Booster object. Full documentation of parameters can
        be found here: https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst.
        Attempting to set a parameter via the constructor args and \*\*kwargs dict
        simultaneously will result in a TypeError.

        .. note:: \*\*kwargs unsupported by scikit-learn

            \*\*kwargs is unsupported by scikit-learn. We do not guarantee that parameters
            passed via this argument will interact properly with scikit-learn.

    Note
    ----
    A custom objective function can be provided for the ``objective``
    parameter. In this case, it should have the signature
    ``objective(y_true, y_pred) -> grad, hess``:

    y_true : array_like of shape [n_samples]
        The target values
    y_pred : array_like of shape [n_samples]
        The predicted values
    grad : array_like of shape [n_samples]
        The value of the gradient for each sample point.
    hess : array_like of shape [n_samples]
        The value of the second derivative for each sample point
    """

    def __init__(self, max_depth=3, learning_rate=0.1, n_estimators=100,
                 silent=True, objective="reg:linear", booster='gbtree',
                 n_jobs=1, nthread=None, gamma=0, min_child_weight=1,
                 max_delta_step=0, subsample=1, colsample_bytree=1,
                 colsample_bylevel=1, reg_alpha=0, reg_lambda=1,
                 scale_pos_weight=1, base_score=0.5, random_state=0,
                 seed=None, missing=None, importance_type="gain", **kwargs):
        if not SKLEARN_INSTALLED:
            raise XGBoostError('sklearn needs to be installed in order to use this module')
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.silent = silent
        self.objective = objective
        self.booster = booster
        self.gamma = gamma
        self.min_child_weight = min_child_weight
        self.max_delta_step = max_delta_step
        self.subsample = subsample
        self.colsample_bytree = colsample_bytree
        self.colsample_bylevel = colsample_bylevel
        self.reg_alpha = reg_alpha
        self.reg_lambda = reg_lambda
        self.scale_pos_weight = scale_pos_weight
        self.base_score = base_score
        self.missing = missing if missing is not None else np.nan
        self.kwargs = kwargs
        self._Booster = None
        self.seed = seed
        self.random_state = random_state
        self.nthread = nthread
        self.n_jobs = n_jobs
        self.importance_type = importance_type

    @property
    def feature_importances_(self):
        """Feature importances property

        .. note:: Feature importance is defined only for tree boosters

            Feature importance is only defined when the decision tree model is chosen as base
            learner (`booster=gbtree`). It is not defined for other base learner types, such
            as linear learners (`booster=gblinear`).

        Returns
        -------
        feature_importances_ : array of shape ``[n_features]``
        """
        if getattr(self, 'booster', None) is not None and self.booster != 'gbtree':
            raise AttributeError('Feature importance is not defined for Booster type {}'
                                 .format(self.booster))
        b = self.get_booster()
        score = b.get_score(importance_type=self.importance_type)
        all_features = [score.get(f, 0.) for f in b.feature_names]
        all_features = np.array(all_features, dtype=np.float32)
        return all_features / all_features.sum()
```
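As the property above shows, the per-feature scores come from Booster.get_score() and are then normalized to sum to 1. A short sketch, reusing the model fitted in the earlier example, of querying the raw, unnormalized scores for different importance types:

```python
booster = model.get_booster()
for imp_type in ("weight", "gain", "cover"):
    # get_score returns a dict {feature_name: score}; features that are
    # never used in any split are simply absent from the dict.
    print(imp_type, booster.get_score(importance_type=imp_type))
```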
plot_importance
1. Explanation of the plot_importance method
Purpose: plot feature importance based on the fitted trees.
Parameters
- booster : Booster, XGBModel or dict. Booster or XGBModel instance, or dict taken by Booster.get_fscore()
- ax : matplotlib Axes, default None. Target axes instance. If None, a new figure and axes will be created.
- grid : bool, default True (on). Turn the axes grids on or off.
- importance_type : str, default "weight". How the importance is calculated: either "weight", "gain", or "cover"
    * "weight" is the number of times a feature appears in a tree
    * "gain" is the average gain of splits which use the feature
    * "cover" is the average coverage of splits which use the feature, where coverage is defined as the number of samples affected by the split
- max_num_features : int, default None. Maximum number of top features displayed on the plot. If None, all features will be displayed.
- height : float, default 0.2. Bar height, passed to ax.barh()
- xlim : tuple, default None. Tuple passed to axes.xlim()
- ylim : tuple, default None. Tuple passed to axes.ylim()
- title : str, default "Feature importance". Axes title. To disable, pass None.
- xlabel : str, default "F score". X-axis title label. To disable, pass None.
- ylabel : str, default "Features". Y-axis title label. To disable, pass None.
- show_values : bool, default True. Show values on the plot. To disable, pass False.
- kwargs : Other keywords passed to ax.barh()
Returns
- ax : matplotlib Axes
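A minimal usage sketch, reusing the model fitted in the first example; the figure size and the choice of top-10 features are illustrative only:

```python
import matplotlib.pyplot as plt
from xgboost import plot_importance

fig, ax = plt.subplots(figsize=(8, 6))
# Plot the 10 highest-ranked features by split count ("weight");
# the function draws onto ax and returns the same Axes.
plot_importance(model, ax=ax, max_num_features=10, importance_type="weight")
plt.show()
```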
2. Source code of plot_importance
```python
# coding: utf-8
# pylint: disable=too-many-locals, too-many-arguments, invalid-name,
# pylint: disable=too-many-branches
"""Plotting Library."""
from __future__ import absolute_import

import re
from io import BytesIO
import numpy as np
from .core import Booster
from .sklearn import XGBModel


def plot_importance(booster, ax=None, height=0.2,
                    xlim=None, ylim=None, title='Feature importance',
                    xlabel='F score', ylabel='Features',
                    importance_type='weight', max_num_features=None,
                    grid=True, show_values=True, **kwargs):
    """Plot importance based on fitted trees.

    Parameters
    ----------
    booster : Booster, XGBModel or dict
        Booster or XGBModel instance, or dict taken by Booster.get_fscore()
    ax : matplotlib Axes, default None
        Target axes instance. If None, new figure and axes will be created.
    grid : bool, default True (on)
        Turn the axes grids on or off.
    importance_type : str, default "weight"
        How the importance is calculated: either "weight", "gain", or "cover"
        * "weight" is the number of times a feature appears in a tree
        * "gain" is the average gain of splits which use the feature
        * "cover" is the average coverage of splits which use the feature,
          where coverage is defined as the number of samples affected by the split
    max_num_features : int, default None
        Maximum number of top features displayed on plot. If None, all features
        will be displayed.
    height : float, default 0.2
        Bar height, passed to ax.barh()
    xlim : tuple, default None
        Tuple passed to axes.xlim()
    ylim : tuple, default None
        Tuple passed to axes.ylim()
    title : str, default "Feature importance"
        Axes title. To disable, pass None.
    xlabel : str, default "F score"
        X axis title label. To disable, pass None.
    ylabel : str, default "Features"
        Y axis title label. To disable, pass None.
    show_values : bool, default True
        Show values on plot. To disable, pass False.
    kwargs :
        Other keywords passed to ax.barh()

    Returns
    -------
    ax : matplotlib Axes
    """
    # TODO: move this to compat.py
    try:
        import matplotlib.pyplot as plt
    except ImportError:
        raise ImportError('You must install matplotlib to plot importance')

    # Accept a fitted sklearn wrapper, a raw Booster, or a ready-made score dict.
    if isinstance(booster, XGBModel):
        importance = booster.get_booster().get_score(importance_type=importance_type)
    elif isinstance(booster, Booster):
        importance = booster.get_score(importance_type=importance_type)
    elif isinstance(booster, dict):
        importance = booster
    else:
        raise ValueError('tree must be Booster, XGBModel or dict instance')

    if len(importance) == 0:
        raise ValueError('Booster.get_score() results in empty')

    # Sort features by score and optionally keep only the top max_num_features.
    tuples = [(k, importance[k]) for k in importance]
    if max_num_features is not None:
        tuples = sorted(tuples, key=lambda x: x[1])[-max_num_features:]
    else:
        tuples = sorted(tuples, key=lambda x: x[1])
    labels, values = zip(*tuples)

    if ax is None:
        _, ax = plt.subplots(1, 1)

    ylocs = np.arange(len(values))
    ax.barh(ylocs, values, align='center', height=height, **kwargs)

    if show_values is True:
        for x, y in zip(values, ylocs):
            ax.text(x + 1, y, x, va='center')

    ax.set_yticks(ylocs)
    ax.set_yticklabels(labels)

    if xlim is not None:
        if not isinstance(xlim, tuple) or len(xlim) != 2:
            raise ValueError('xlim must be a tuple of 2 elements')
    else:
        xlim = (0, max(values) * 1.1)
    ax.set_xlim(xlim)

    if ylim is not None:
        if not isinstance(ylim, tuple) or len(ylim) != 2:
            raise ValueError('ylim must be a tuple of 2 elements')
    else:
        ylim = (-1, len(values))
    ax.set_ylim(ylim)

    if title is not None:
        ax.set_title(title)
    if xlabel is not None:
        ax.set_xlabel(xlabel)
    if ylabel is not None:
        ax.set_ylabel(ylabel)
    ax.grid(grid)
    return ax


# Regexes for parsing the text dump of a tree (Booster.get_dump()).
_NODEPAT = re.compile(r'(\d+):\[(.+)\]')
_LEAFPAT = re.compile(r'(\d+):(leaf=.+)')
_EDGEPAT = re.compile(r'yes=(\d+),no=(\d+),missing=(\d+)')
_EDGEPAT2 = re.compile(r'yes=(\d+),no=(\d+)')


def _parse_node(graph, text, condition_node_params, leaf_node_params):
    """parse dumped node"""
    match = _NODEPAT.match(text)
    if match is not None:
        node = match.group(1)
        graph.node(node, label=match.group(2), **condition_node_params)
        return node
    match = _LEAFPAT.match(text)
    if match is not None:
        node = match.group(1)
        graph.node(node, label=match.group(2), **leaf_node_params)
        return node
    raise ValueError('Unable to parse node: {0}'.format(text))


def _parse_edge(graph, node, text, yes_color='#0000FF', no_color='#FF0000'):
    """parse dumped edge"""
    try:
        match = _EDGEPAT.match(text)
        if match is not None:
            yes, no, missing = match.groups()
            if yes == missing:
                graph.edge(node, yes, label='yes, missing', color=yes_color)
                graph.edge(node, no, label='no', color=no_color)
            else:
                graph.edge(node, yes, label='yes', color=yes_color)
                graph.edge(node, no, label='no, missing', color=no_color)
            return
    except ValueError:
        pass
    match = _EDGEPAT2.match(text)
    if match is not None:
        yes, no = match.groups()
        graph.edge(node, yes, label='yes', color=yes_color)
        graph.edge(node, no, label='no', color=no_color)
        return
    raise ValueError('Unable to parse edge: {0}'.format(text))


def to_graphviz(booster, fmap='', num_trees=0, rankdir='UT',
                yes_color='#0000FF', no_color='#FF0000',
                condition_node_params=None, leaf_node_params=None, **kwargs):
    """Convert specified tree to graphviz instance. IPython can automatically plot the
    returned graphviz instance. Otherwise, you should call the .render() method
    of the returned graphviz instance.

    Parameters
    ----------
    booster : Booster, XGBModel
        Booster or XGBModel instance
    fmap : str (optional)
        The name of feature map file
    num_trees : int, default 0
        Specify the ordinal number of target tree
    rankdir : str, default "UT"
        Passed to graphviz via graph_attr
    yes_color : str, default '#0000FF'
        Edge color when meets the node condition.
    no_color : str, default '#FF0000'
        Edge color when doesn't meet the node condition.
    condition_node_params : dict (optional)
        condition node configuration,
        {'shape': 'box', 'style': 'filled,rounded', 'fillcolor': '#78bceb'}
    leaf_node_params : dict (optional)
        leaf node configuration
        {'shape': 'box', 'style': 'filled', 'fillcolor': '#e48038'}
    kwargs :
        Other keywords passed to graphviz graph_attr

    Returns
    -------
    graph : graphviz.Digraph
    """
    if condition_node_params is None:
        condition_node_params = {}
    if leaf_node_params is None:
        leaf_node_params = {}

    try:
        from graphviz import Digraph
    except ImportError:
        raise ImportError('You must install graphviz to plot tree')

    if not isinstance(booster, (Booster, XGBModel)):
        raise ValueError('booster must be Booster or XGBModel instance')

    if isinstance(booster, XGBModel):
        booster = booster.get_booster()

    tree = booster.get_dump(fmap=fmap)[num_trees]
    tree = tree.split()

    kwargs = kwargs.copy()
    kwargs.update({'rankdir': rankdir})
    graph = Digraph(graph_attr=kwargs)

    for i, text in enumerate(tree):
        if text[0].isdigit():
            node = _parse_node(
                graph, text, condition_node_params=condition_node_params,
                leaf_node_params=leaf_node_params)
        else:
            if i == 0:
                # 1st string must be node
                raise ValueError('Unable to parse given string as tree')
            _parse_edge(graph, node, text, yes_color=yes_color,
                        no_color=no_color)

    return graph


def plot_tree(booster, fmap='', num_trees=0, rankdir='UT', ax=None, **kwargs):
    """Plot specified tree.

    Parameters
    ----------
    booster : Booster, XGBModel
        Booster or XGBModel instance
    fmap : str (optional)
        The name of feature map file
    num_trees : int, default 0
        Specify the ordinal number of target tree
    rankdir : str, default "UT"
        Passed to graphviz via graph_attr
    ax : matplotlib Axes, default None
        Target axes instance. If None, new figure and axes will be created.
    kwargs :
        Other keywords passed to to_graphviz

    Returns
    -------
    ax : matplotlib Axes
    """
    try:
        import matplotlib.pyplot as plt
        import matplotlib.image as image
    except ImportError:
        raise ImportError('You must install matplotlib to plot tree')

    if ax is None:
        _, ax = plt.subplots(1, 1)

    g = to_graphviz(booster, fmap=fmap, num_trees=num_trees,
                    rankdir=rankdir, **kwargs)

    # Render the graph to PNG in memory and display it on the Axes.
    s = BytesIO()
    s.write(g.pipe(format='png'))
    s.seek(0)
    img = image.imread(s)

    ax.imshow(img)
    ax.axis('off')
    return ax
```
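The same plotting module also renders individual trees via to_graphviz and plot_tree above. A short sketch, assuming the graphviz Python package and binaries are installed (as the code checks for) and reusing the fitted model from the first example:

```python
import matplotlib.pyplot as plt
from xgboost import plot_tree

# Render the first boosted tree of the fitted model onto a matplotlib Axes.
plot_tree(model, num_trees=0)
plt.show()
```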