sklearn: Introduction to SelectFromModel in sklearn.feature_selection and a detailed guide to its usage
Contents
Introduction to SelectFromModel
1. Feature selection using SelectFromModel and LassoCV
2. L1-based feature selection
3. Tree-based feature selection
Using SelectFromModel
1. Source code of SelectFromModel
Introduction to SelectFromModel
SelectFromModel is a meta-transformer that can be used with any estimator that exposes a coef_ or feature_importances_ attribute after fitting. Features whose corresponding coef_ or feature_importances_ values fall below the provided threshold parameter are considered unimportant and are removed. Besides a numeric threshold, a string argument selects one of the built-in heuristics: "mean", "median", or a float multiple of either, such as "0.1*mean".
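The threshold heuristics described above can be compared side by side. The following is a minimal sketch, assuming a RandomForestClassifier on the iris dataset as the importance source; the estimator, the dataset and the numeric threshold 0.3 are illustrative choices, not part of the original article.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

results = {}
for threshold in ("mean", "median", "0.1*mean", 0.3):
    # A fixed random_state makes every refit produce the same importances.
    forest = RandomForestClassifier(n_estimators=50, random_state=0)
    selector = SelectFromModel(forest, threshold=threshold).fit(X, y)
    importances = selector.estimator_.feature_importances_
    mask = selector.get_support()
    results[threshold] = (selector.threshold_, mask, importances)
    print(f"{threshold!s:>9}: threshold_={selector.threshold_:.3f}, "
          f"kept {mask.sum()} of {X.shape[1]} features")
```

In each case the resolved numeric value is available as threshold_, and a feature is kept when its importance is greater than or equal to that value.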
Official documentation: https://scikit-learn.org/stable/modules/feature_selection.html#feature-selection-using-selectfrommodel
Docstring summary: meta-transformer for selecting features based on importance weights. Added in version 0.17.

Parameters

| Parameter | Type and default | Description |
| --- | --- | --- |
| estimator | object | The base estimator from which the transformer is built. It can be either a fitted estimator (if prefit is set to True) or a non-fitted one. The estimator must have either a feature_importances_ or a coef_ attribute after fitting. |
| threshold | string or float, optional, default None | The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept; the others are discarded. If "median" (resp. "mean"), the threshold is the median (resp. the mean) of the feature importances. A scaling factor (e.g. "1.25*mean") may also be used. If None and the estimator has a penalty parameter set to l1, either explicitly or implicitly (e.g. Lasso), the threshold used is 1e-5. Otherwise "mean" is used by default. |
| prefit | bool, default False | Whether a prefit model is expected to be passed into the constructor directly. If True, transform must be called directly, and SelectFromModel cannot be used with cross_val_score, GridSearchCV and similar utilities that clone the estimator. Otherwise, train the model using fit and then call transform to do feature selection. |
| norm_order | non-zero int, inf or -inf, default 1 | Order of the norm used to filter the vectors of coefficients below threshold when the coef_ attribute of the estimator is 2-dimensional. |

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| estimator_ | an estimator | The base estimator from which the transformer is built. It is stored only when a non-fitted estimator is passed to SelectFromModel, i.e. when prefit is False. |
| threshold_ | float | The threshold value used for feature selection. |
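Two of the documented behaviours can be checked directly: the default threshold of 1e-5 for an L1-penalized estimator, and the role of norm_order when coef_ is 2-dimensional. This is a sketch under stated assumptions: Lasso(alpha=1.0) on the diabetes dataset and LogisticRegression on iris are illustrative choices, and the per-feature score is recomputed here with np.linalg.norm along the class axis to mirror what the docstring describes.

```python
import numpy as np
from sklearn.datasets import load_diabetes, load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression

# threshold=None with a Lasso estimator: the documented default is 1e-5,
# so only features whose coefficient is (almost) zero are dropped.
X_reg, y_reg = load_diabetes(return_X_y=True)
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X_reg, y_reg)
lasso_coef = np.abs(lasso_selector.estimator_.coef_)
print("Lasso default threshold_:", lasso_selector.threshold_)
print("features kept:", lasso_selector.get_support().sum(), "of", X_reg.shape[1])

# norm_order with a 2-D coef_: multi-class logistic regression on iris has
# one row of coefficients per class, reduced to one score per feature.
X_clf, y_clf = load_iris(return_X_y=True)
norm_results = {}
for order in (1, 2):
    selector = SelectFromModel(
        LogisticRegression(max_iter=1000), threshold="mean", norm_order=order
    ).fit(X_clf, y_clf)
    coef = selector.estimator_.coef_  # shape (n_classes, n_features)
    scores = np.linalg.norm(coef, axis=0, ord=order)
    norm_results[order] = (scores, selector.threshold_, selector.get_support())
    print(f"norm_order={order}: scores={np.round(scores, 3)}, "
          f"kept={selector.get_support()}")
```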
1. Feature selection using SelectFromModel and LassoCV
```python
# Author: Manoj Kumar <mks542@nyu.edu>
# License: BSD 3 clause

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example needs an older scikit-learn release to run as written.
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Load the boston dataset.
X, y = load_boston(return_X_y=True)

# We use the base estimator LassoCV since the L1 norm promotes sparsity of features.
clf = LassoCV()

# Set a minimum threshold of 0.25
sfm = SelectFromModel(clf, threshold=0.25)
sfm.fit(X, y)
# Keep the transformed data so it is defined even if the loop below never runs.
X_transform = sfm.transform(X)
n_features = X_transform.shape[1]

# Reset the threshold till the number of features equals two.
# Note that the attribute can be set directly instead of repeatedly
# fitting the metatransformer.
while n_features > 2:
    sfm.threshold += 0.1
    X_transform = sfm.transform(X)
    n_features = X_transform.shape[1]

# Plot the selected two features from X.
plt.title(
    "Features selected from Boston using SelectFromModel with "
    "threshold %0.3f." % sfm.threshold)
feature1 = X_transform[:, 0]
feature2 = X_transform[:, 1]
plt.plot(feature1, feature2, 'r.')
plt.xlabel("Feature number 1")
plt.ylabel("Feature number 2")
plt.ylim([np.min(feature2), np.max(feature2)])
plt.show()
```

2. L1-based feature selection
```python
>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
>>> model = SelectFromModel(lsvc, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 3)
```
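With an L1-penalized linear model, the regularization strength decides how many coefficients are driven to zero and therefore how many features survive. The sketch below varies C for LinearSVC on iris; the list of C values, max_iter and random_state are assumptions chosen for illustration, and the selector is fitted through SelectFromModel rather than passed in prefit. The scikit-learn user guide notes that smaller C generally selects fewer features, but the exact counts depend on the data and solver, so none are stated here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

n_selected = {}
thresholds = {}
for C in (0.001, 0.01, 0.1, 1.0):
    svc = LinearSVC(C=C, penalty="l1", dual=False, max_iter=5000, random_state=0)
    # penalty="l1" and threshold=None, so the documented default 1e-5 applies.
    selector = SelectFromModel(svc).fit(X, y)
    mask = selector.get_support()
    n_selected[C] = int(mask.sum())
    thresholds[C] = selector.threshold_
    print(f"C={C}: kept {n_selected[C]} of {X.shape[1]} features, mask={mask}")
```

Using get_support() instead of transform() avoids the warning that transform emits when a very small C leaves no feature selected.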
3. Tree-based feature selection
```python
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> clf = ExtraTreesClassifier(n_estimators=50)
>>> clf = clf.fit(X, y)
>>> clf.feature_importances_
array([ 0.04..., 0.05..., 0.4..., 0.4...])
>>> model = SelectFromModel(clf, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 2)
```
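The transformed array alone does not say which columns were kept. A minimal sketch of mapping the selection back to feature names with get_support() follows; fitting through SelectFromModel with random_state=0 is an assumption made here for reproducibility, and differs from the prefit usage in the snippet above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target
feature_names = np.array(iris.feature_names)

# threshold=None with a tree ensemble falls back to the "mean" heuristic.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))
X_new = selector.fit_transform(X, y)

mask = selector.get_support()                  # boolean mask over the columns
indices = selector.get_support(indices=True)   # integer positions of kept columns
selected_names = feature_names[mask]

print("importances:", np.round(selector.estimator_.feature_importances_, 3))
print("kept columns:", indices, list(selected_names))
print("X_new shape:", X_new.shape)
```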
Using SelectFromModel
1. Source code of SelectFromModel
class SelectFromModel, found at: sklearn.feature_selection.from_model

```python
# Excerpt only: the module's imports and its private helpers
# (_get_feature_importances, _calculate_threshold) are not shown.
class SelectFromModel(BaseEstimator, SelectorMixin, MetaEstimatorMixin):
    """Meta-transformer for selecting features based on importance weights.

    .. versionadded:: 0.17

    Parameters
    ----------
    estimator : object
        The base estimator from which the transformer is built.
        This can be both a fitted (if ``prefit`` is set to True)
        or a non-fitted estimator. The estimator must have either a
        ``feature_importances_`` or ``coef_`` attribute after fitting.

    threshold : string, float, optional default None
        The threshold value to use for feature selection. Features whose
        importance is greater or equal are kept while the others are
        discarded. If "median" (resp. "mean"), then the ``threshold`` value is
        the median (resp. the mean) of the feature importances. A scaling
        factor (e.g., "1.25*mean") may also be used. If None and if the
        estimator has a parameter penalty set to l1, either explicitly
        or implicitly (e.g, Lasso), the threshold used is 1e-5.
        Otherwise, "mean" is used by default.

    prefit : bool, default False
        Whether a prefit model is expected to be passed into the constructor
        directly or not. If True, ``transform`` must be called directly
        and SelectFromModel cannot be used with ``cross_val_score``,
        ``GridSearchCV`` and similar utilities that clone the estimator.
        Otherwise train the model using ``fit`` and then ``transform`` to do
        feature selection.

    norm_order : non-zero int, inf, -inf, default 1
        Order of the norm used to filter the vectors of coefficients below
        ``threshold`` in the case where the ``coef_`` attribute of the
        estimator is of dimension 2.

    Attributes
    ----------
    estimator_ : an estimator
        The base estimator from which the transformer is built.
        This is stored only when a non-fitted estimator is passed to the
        ``SelectFromModel``, i.e when prefit is False.

    threshold_ : float
        The threshold value used for feature selection.
    """

    def __init__(self, estimator, threshold=None, prefit=False, norm_order=1):
        self.estimator = estimator
        self.threshold = threshold
        self.prefit = prefit
        self.norm_order = norm_order

    def _get_support_mask(self):
        # SelectFromModel can directly call on transform.
        if self.prefit:
            estimator = self.estimator
        elif hasattr(self, 'estimator_'):
            estimator = self.estimator_
        else:
            raise ValueError(
                'Either fit SelectFromModel before transform or set "prefit='
                'True" and pass a fitted estimator to the constructor.')
        scores = _get_feature_importances(estimator, self.norm_order)
        threshold = _calculate_threshold(estimator, scores, self.threshold)
        return scores >= threshold

    def fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError(
                "Since 'prefit=True', call transform directly")
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y, **fit_params)
        return self

    @property
    def threshold_(self):
        scores = _get_feature_importances(self.estimator_, self.norm_order)
        return _calculate_threshold(self.estimator, scores, self.threshold)

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer only once.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError(
                "Since 'prefit=True', call transform directly")
        if not hasattr(self, "estimator_"):
            self.estimator_ = clone(self.estimator)
        self.estimator_.partial_fit(X, y, **fit_params)
        return self
```
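Because fit clones the wrapped estimator and fits the clone, an unfitted SelectFromModel can be placed inside utilities that clone and refit their components, which the docstring says is not possible with prefit=True. The following is a minimal sketch of that usage, assuming a two-step Pipeline evaluated with cross_val_score; the step names, the ExtraTreesClassifier selector, the "median" threshold and the LogisticRegression classifier are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    # prefit is left at its default (False), so the selector is refitted
    # on the training part of every cross-validation split.
    ("select", SelectFromModel(
        ExtraTreesClassifier(n_estimators=50, random_state=0),
        threshold="median")),
    ("classify", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipeline, X, y, cv=5)
print("accuracy per fold:", scores.round(3))
print("mean accuracy:", scores.mean().round(3))
```

Fitting the selector inside each split keeps the feature selection from seeing the held-out fold, which a selector fitted once on the full data would not guarantee.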