ML之DT: Binary classification on the Titanic dataset with a decision tree (DT), comparing performance with and without feature selection (FS)
Contents
Output
Design approach
Core code
Output
Feature selection shrinks the training matrix from 474 columns to 94. Each line of the scipy sparse-matrix printout below is a (row, column) index followed by the stored value; column 0 appears to carry the numeric age feature (the repeated value 31.194... suggests mean imputation of missing ages), and the remaining columns are one-hot indicators.

```
X_train after initial preprocessing: (984, 474)
  (0, 0)	31.19418104265403
  (0, 78)	1.0
  (0, 82)	1.0
  (0, 366)	1.0
  (0, 391)	1.0
  (0, 435)	1.0
  (0, 437)	1.0
  (0, 473)	1.0
  (1, 0)	31.19418104265403
  (1, 73)	1.0
  (1, 79)	1.0
  (1, 296)	1.0
  (1, 389)	1.0
  (1, 397)	1.0
  (1, 436)	1.0
  (1, 446)	1.0
  (2, 0)	31.19418104265403
  (2, 78)	1.0
  (2, 82)	1.0
  (2, 366)	1.0
  (2, 391)	1.0
  (2, 435)	1.0
  (2, 437)	1.0
  (2, 473)	1.0
  (3, 0)	32.0
  :	:
  (980, 473)	1.0
  (981, 0)	12.0
  (981, 73)	1.0
  (981, 81)	1.0
  (981, 84)	1.0
  (981, 390)	1.0
  (981, 435)	1.0
  (981, 436)	1.0
  (981, 473)	1.0
  (982, 0)	18.0
  (982, 78)	1.0
  (982, 81)	1.0
  (982, 277)	1.0
  (982, 390)	1.0
  (982, 435)	1.0
  (982, 437)	1.0
  (982, 473)	1.0
  (983, 0)	31.19418104265403
  (983, 78)	1.0
  (983, 82)	1.0
  (983, 366)	1.0
  (983, 391)	1.0
  (983, 435)	1.0
  (983, 436)	1.0
  (983, 473)	1.0

X_train_fs after feature selection (FS): (984, 94)
  (0, 93)	1.0
  (0, 85)	1.0
  (0, 83)	1.0
  (0, 76)	1.0
  (0, 71)	1.0
  (0, 27)	1.0
  (0, 24)	1.0
  (0, 0)	31.19418104265403
  (1, 84)	1.0
  (1, 74)	1.0
  (1, 63)	1.0
  (1, 25)	1.0
  (1, 19)	1.0
  (1, 0)	31.19418104265403
  (2, 93)	1.0
  (2, 85)	1.0
  (2, 83)	1.0
  (2, 76)	1.0
  (2, 71)	1.0
  (2, 27)	1.0
  (2, 24)	1.0
  (2, 0)	31.19418104265403
  (3, 93)	1.0
  (3, 85)	1.0
  (3, 83)	1.0
  :	:
  (980, 24)	1.0
  (980, 0)	31.19418104265403
  (981, 93)	1.0
  (981, 84)	1.0
  (981, 83)	1.0
  (981, 75)	1.0
  (981, 28)	1.0
  (981, 26)	1.0
  (981, 19)	1.0
  (981, 0)	12.0
  (982, 93)	1.0
  (982, 85)	1.0
  (982, 83)	1.0
  (982, 75)	1.0
  (982, 26)	1.0
  (982, 24)	1.0
  (982, 0)	18.0
  (983, 93)	1.0
  (983, 84)	1.0
  (983, 83)	1.0
  (983, 76)	1.0
  (983, 71)	1.0
  (983, 27)	1.0
  (983, 24)	1.0
  (983, 0)	31.19418104265403
```
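A printout like this is what you get from one-hot encoding the raw Titanic records into a scipy sparse matrix with DictVectorizer. The sketch below shows one plausible preprocessing path; the file name `titanic.txt`, the dropped columns, the 75/25 split, and the mean-imputation of age are assumptions inferred from the shape (984, 474) and the recurring value 31.194... above, not code from the original post.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer

# Hypothetical source file: a 1313-row Titanic table (984 rows = 75% train split).
titanic = pd.read_csv('titanic.txt')

y = titanic['survived']
X = titanic.drop(['row.names', 'name', 'survived'], axis=1)  # assumed column names

# Fill missing ages with the column mean (which would explain the value
# 31.19418104265403 that repeats in the sparse dump above), and mark other
# missing entries as 'UNKNOWN' so they become their own one-hot column.
X['age'] = X['age'].fillna(X['age'].mean())
X = X.fillna('UNKNOWN')

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33)

# DictVectorizer one-hot encodes every categorical value, which is how a
# handful of raw columns fans out into 474 sparse features.
vec = DictVectorizer()
X_train = vec.fit_transform(X_train.to_dict(orient='records'))
X_test = vec.transform(X_test.to_dict(orient='records'))
print('X_train after initial preprocessing:', X_train.shape)  # (984, 474)
```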
Design approach

In outline, as inferred from the title and the output above: vectorize the Titanic records, train a decision-tree classifier on the full 474-column feature matrix, repeat after univariate feature selection has reduced the matrix to 94 columns, and compare the two models' test-set accuracy. A sketch of this comparison follows.
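This minimal sketch continues from the hypothetical preprocessing variables above; `percentile=20` is an assumption that matches the reduction from 474 to 94 columns, since int(474 * 20 / 100) = 94.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectPercentile, chi2

# Baseline: decision tree on all 474 vectorized features.
dt = DecisionTreeClassifier(criterion='entropy')
dt.fit(X_train, y_train)
print('DT accuracy, all features:', dt.score(X_test, y_test))

# Keep the top 20% of features by chi-squared score. Note that the selector
# is fitted on the training set only and then applied to both splits.
fs = SelectPercentile(chi2, percentile=20)
X_train_fs = fs.fit_transform(X_train, y_train)  # (984, 94)
X_test_fs = fs.transform(X_test)

dt_fs = DecisionTreeClassifier(criterion='entropy')
dt_fs.fit(X_train_fs, y_train)
print('DT accuracy, selected features:', dt_fs.score(X_test_fs, y_test))
```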
Core code
The quoted library source, class SelectPercentile, found at sklearn.feature_selection.univariate_selection:

```python
class SelectPercentile(_BaseFilter):
    """Select features according to a percentile of the highest scores.

    Read more in the :ref:`User Guide <univariate_feature_selection>`.

    Parameters
    ----------
    score_func : callable
        Function taking two arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores.
        Default is f_classif (see below "See also"). The default function
        only works with classification tasks.

    percentile : int, optional, default=10
        Percent of features to keep.

    Attributes
    ----------
    scores_ : array-like, shape=(n_features,)
        Scores of features.

    pvalues_ : array-like, shape=(n_features,)
        p-values of feature scores, None if `score_func` returned only scores.

    Notes
    -----
    Ties between features with equal scores will be broken in an
    unspecified way.

    See also
    --------
    f_classif: ANOVA F-value between label/feature for classification tasks.
    mutual_info_classif: Mutual information for a discrete target.
    chi2: Chi-squared stats of non-negative features for classification tasks.
    f_regression: F-value between label/feature for regression tasks.
    mutual_info_regression: Mutual information for a continuous target.
    SelectKBest: Select features based on the k highest scores.
    SelectFpr: Select features based on a false positive rate test.
    SelectFdr: Select features based on an estimated false discovery rate.
    SelectFwe: Select features based on family-wise error rate.
    GenericUnivariateSelect: Univariate feature selector with configurable mode.
    """

    def __init__(self, score_func=f_classif, percentile=10):
        super(SelectPercentile, self).__init__(score_func)
        self.percentile = percentile

    def _check_params(self, X, y):
        if not 0 <= self.percentile <= 100:
            raise ValueError("percentile should be >=0, <=100; got %r"
                             % self.percentile)

    def _get_support_mask(self):
        check_is_fitted(self, 'scores_')

        # Cater for NaNs
        if self.percentile == 100:
            return np.ones(len(self.scores_), dtype=np.bool)
        elif self.percentile == 0:
            return np.zeros(len(self.scores_), dtype=np.bool)

        scores = _clean_nans(self.scores_)
        treshold = stats.scoreatpercentile(scores, 100 - self.percentile)
        mask = scores > treshold
        ties = np.where(scores == treshold)[0]
        if len(ties):
            max_feats = int(len(scores) * self.percentile / 100)
            kept_ties = ties[:max_feats - mask.sum()]
            mask[kept_ties] = True
        return mask
```
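As a quick illustration of the percentile semantics implemented in `_get_support_mask` above (threshold at the given percentile of the scores, with ties at the threshold broken arbitrarily), here is a small self-contained example on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif

# 100 features, of which 10 are informative; keep the top 20% by ANOVA F-score.
X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)
selector = SelectPercentile(f_classif, percentile=20)
X_new = selector.fit_transform(X, y)

print(X_new.shape)                   # (200, 20): int(100 * 20 / 100) columns kept
print(selector.get_support().sum())  # 20 entries of the boolean mask are True
print(selector.scores_.shape)        # per-feature scores computed by f_classif
```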