ML之DT: Binary classification on the Titanic dataset with a decision tree (DT), comparing performance with and without feature selection (FS)
Contents
Output
Design approach
Core code
Output
Feature selection shrinks the training matrix from 474 columns to 94. Each line of the scipy sparse-matrix printout below is a (row, column) index followed by the stored value; column 0 appears to carry the numeric age feature (the repeated value 31.194... suggests mean imputation of missing ages), and the remaining columns are one-hot indicators.

```
X_train after initial preprocessing: (984, 474)
  (0, 0)	31.19418104265403
  (0, 78)	1.0
  (0, 82)	1.0
  (0, 366)	1.0
  (0, 391)	1.0
  (0, 435)	1.0
  (0, 437)	1.0
  (0, 473)	1.0
  (1, 0)	31.19418104265403
  (1, 73)	1.0
  (1, 79)	1.0
  (1, 296)	1.0
  (1, 389)	1.0
  (1, 397)	1.0
  (1, 436)	1.0
  (1, 446)	1.0
  (2, 0)	31.19418104265403
  (2, 78)	1.0
  (2, 82)	1.0
  (2, 366)	1.0
  (2, 391)	1.0
  (2, 435)	1.0
  (2, 437)	1.0
  (2, 473)	1.0
  (3, 0)	32.0
  :	:
  (980, 473)	1.0
  (981, 0)	12.0
  (981, 73)	1.0
  (981, 81)	1.0
  (981, 84)	1.0
  (981, 390)	1.0
  (981, 435)	1.0
  (981, 436)	1.0
  (981, 473)	1.0
  (982, 0)	18.0
  (982, 78)	1.0
  (982, 81)	1.0
  (982, 277)	1.0
  (982, 390)	1.0
  (982, 435)	1.0
  (982, 437)	1.0
  (982, 473)	1.0
  (983, 0)	31.19418104265403
  (983, 78)	1.0
  (983, 82)	1.0
  (983, 366)	1.0
  (983, 391)	1.0
  (983, 435)	1.0
  (983, 436)	1.0
  (983, 473)	1.0

X_train_fs after feature selection (FS): (984, 94)
  (0, 93)	1.0
  (0, 85)	1.0
  (0, 83)	1.0
  (0, 76)	1.0
  (0, 71)	1.0
  (0, 27)	1.0
  (0, 24)	1.0
  (0, 0)	31.19418104265403
  (1, 84)	1.0
  (1, 74)	1.0
  (1, 63)	1.0
  (1, 25)	1.0
  (1, 19)	1.0
  (1, 0)	31.19418104265403
  (2, 93)	1.0
  (2, 85)	1.0
  (2, 83)	1.0
  (2, 76)	1.0
  (2, 71)	1.0
  (2, 27)	1.0
  (2, 24)	1.0
  (2, 0)	31.19418104265403
  (3, 93)	1.0
  (3, 85)	1.0
  (3, 83)	1.0
  :	:
  (980, 24)	1.0
  (980, 0)	31.19418104265403
  (981, 93)	1.0
  (981, 84)	1.0
  (981, 83)	1.0
  (981, 75)	1.0
  (981, 28)	1.0
  (981, 26)	1.0
  (981, 19)	1.0
  (981, 0)	12.0
  (982, 93)	1.0
  (982, 85)	1.0
  (982, 83)	1.0
  (982, 75)	1.0
  (982, 26)	1.0
  (982, 24)	1.0
  (982, 0)	18.0
  (983, 93)	1.0
  (983, 84)	1.0
  (983, 83)	1.0
  (983, 76)	1.0
  (983, 71)	1.0
  (983, 27)	1.0
  (983, 24)	1.0
  (983, 0)	31.19418104265403
```
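A printout like this is what you get from one-hot encoding the raw Titanic records into a scipy sparse matrix with DictVectorizer. The sketch below shows one plausible preprocessing path; the file name `titanic.txt`, the dropped columns, the 75/25 split, and the mean-imputation of age are assumptions inferred from the shape (984, 474) and the recurring value 31.194... above, not code from the original post.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer

# Hypothetical source file: a 1313-row Titanic table (984 rows = 75% train split).
titanic = pd.read_csv('titanic.txt')

y = titanic['survived']
X = titanic.drop(['row.names', 'name', 'survived'], axis=1)  # assumed column names

# Fill missing ages with the column mean (which would explain the value
# 31.19418104265403 that repeats in the sparse dump above), and mark other
# missing entries as 'UNKNOWN' so they become their own one-hot column.
X['age'] = X['age'].fillna(X['age'].mean())
X = X.fillna('UNKNOWN')

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33)

# DictVectorizer one-hot encodes every categorical value, which is how a
# handful of raw columns fans out into 474 sparse features.
vec = DictVectorizer()
X_train = vec.fit_transform(X_train.to_dict(orient='records'))
X_test = vec.transform(X_test.to_dict(orient='records'))
print('X_train after initial preprocessing:', X_train.shape)  # (984, 474)
```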
Design approach

In outline, as inferred from the title and the output above: vectorize the Titanic records, train a decision-tree classifier on the full 474-column feature matrix, repeat after univariate feature selection has reduced the matrix to 94 columns, and compare the two models' test-set accuracy. A sketch of this comparison follows.
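This minimal sketch continues from the hypothetical preprocessing variables above; `percentile=20` is an assumption that matches the reduction from 474 to 94 columns, since int(474 * 20 / 100) = 94.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectPercentile, chi2

# Baseline: decision tree on all 474 vectorized features.
dt = DecisionTreeClassifier(criterion='entropy')
dt.fit(X_train, y_train)
print('DT accuracy, all features:', dt.score(X_test, y_test))

# Keep the top 20% of features by chi-squared score. Note that the selector
# is fitted on the training set only and then applied to both splits.
fs = SelectPercentile(chi2, percentile=20)
X_train_fs = fs.fit_transform(X_train, y_train)  # (984, 94)
X_test_fs = fs.transform(X_test)

dt_fs = DecisionTreeClassifier(criterion='entropy')
dt_fs.fit(X_train_fs, y_train)
print('DT accuracy, selected features:', dt_fs.score(X_test_fs, y_test))
```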
Core code
The quoted library source, class SelectPercentile, found at sklearn.feature_selection.univariate_selection:

```python
class SelectPercentile(_BaseFilter):
    """Select features according to a percentile of the highest scores.

    Read more in the :ref:`User Guide <univariate_feature_selection>`.

    Parameters
    ----------
    score_func : callable
        Function taking two arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores.
        Default is f_classif (see below "See also"). The default function
        only works with classification tasks.

    percentile : int, optional, default=10
        Percent of features to keep.

    Attributes
    ----------
    scores_ : array-like, shape=(n_features,)
        Scores of features.

    pvalues_ : array-like, shape=(n_features,)
        p-values of feature scores, None if `score_func` returned only scores.

    Notes
    -----
    Ties between features with equal scores will be broken in an
    unspecified way.

    See also
    --------
    f_classif: ANOVA F-value between label/feature for classification tasks.
    mutual_info_classif: Mutual information for a discrete target.
    chi2: Chi-squared stats of non-negative features for classification tasks.
    f_regression: F-value between label/feature for regression tasks.
    mutual_info_regression: Mutual information for a continuous target.
    SelectKBest: Select features based on the k highest scores.
    SelectFpr: Select features based on a false positive rate test.
    SelectFdr: Select features based on an estimated false discovery rate.
    SelectFwe: Select features based on family-wise error rate.
    GenericUnivariateSelect: Univariate feature selector with configurable mode.
    """

    def __init__(self, score_func=f_classif, percentile=10):
        super(SelectPercentile, self).__init__(score_func)
        self.percentile = percentile

    def _check_params(self, X, y):
        if not 0 <= self.percentile <= 100:
            raise ValueError("percentile should be >=0, <=100; got %r"
                             % self.percentile)

    def _get_support_mask(self):
        check_is_fitted(self, 'scores_')

        # Cater for NaNs
        if self.percentile == 100:
            return np.ones(len(self.scores_), dtype=np.bool)
        elif self.percentile == 0:
            return np.zeros(len(self.scores_), dtype=np.bool)

        scores = _clean_nans(self.scores_)
        treshold = stats.scoreatpercentile(scores, 100 - self.percentile)
        mask = scores > treshold
        ties = np.where(scores == treshold)[0]
        if len(ties):
            max_feats = int(len(scores) * self.percentile / 100)
            kept_ties = ties[:max_feats - mask.sum()]
            mask[kept_ties] = True
        return mask
```
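As a quick illustration of the percentile semantics implemented in `_get_support_mask` above (threshold at the given percentile of the scores, with ties at the threshold broken arbitrarily), here is a small self-contained example on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif

# 100 features, of which 10 are informative; keep the top 20% by ANOVA F-score.
X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)
selector = SelectPercentile(f_classif, percentile=20)
X_new = selector.fit_transform(X, y)

print(X_new.shape)                   # (200, 20): int(100 * 20 / 100) columns kept
print(selector.get_support().sum())  # 20 entries of the boolean mask are True
print(selector.scores_.shape)        # per-feature scores computed by f_classif
```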