ML之DT: Binary classification of the Titanic dataset with a decision-tree (DT) algorithm, using cross-validated feature selection (FS) and a for loop to find the best FS percentile
生活随笔
This article, collected and compiled here, introduces binary classification of the Titanic dataset with a decision-tree (DT) algorithm, using cross-validated feature selection (FS) and a for loop to find the best FS percentile. The editor found it quite good and is sharing it here as a reference.
Contents
Output
Design approach
Core code
輸出結(jié)果
設(shè)計思路
Core code
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=i)
X_train_fs = fs.fit_transform(X_train, y_train)
scores = cross_val_score(dt, X_train_fs, y_train, cv=5)


class SelectPercentile(_BaseFilter):
    """Select features according to a percentile of the highest scores.

    Read more in the :ref:`User Guide <univariate_feature_selection>`.

    Parameters
    ----------
    score_func : callable
        Function taking two arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores.
        Default is f_classif (see below "See also"). The default function only
        works with classification tasks.

    percentile : int, optional, default=10
        Percent of features to keep.

    Attributes
    ----------
    scores_ : array-like, shape=(n_features,)
        Scores of features.

    pvalues_ : array-like, shape=(n_features,)
        p-values of feature scores, None if `score_func` returned only scores.

    Notes
    -----
    Ties between features with equal scores will be broken in an unspecified
    way.

    See also
    --------
    f_classif: ANOVA F-value between label/feature for classification tasks.
    mutual_info_classif: Mutual information for a discrete target.
    chi2: Chi-squared stats of non-negative features for classification tasks.
    f_regression: F-value between label/feature for regression tasks.
    mutual_info_regression: Mutual information for a continuous target.
    SelectKBest: Select features based on the k highest scores.
    SelectFpr: Select features based on a false positive rate test.
    SelectFdr: Select features based on an estimated false discovery rate.
    SelectFwe: Select features based on family-wise error rate.
    GenericUnivariateSelect: Univariate feature selector with configurable mode.
    """

    def __init__(self, score_func=f_classif, percentile=10):
        super(SelectPercentile, self).__init__(score_func)
        self.percentile = percentile

    def _check_params(self, X, y):
        if not 0 <= self.percentile <= 100:
            raise ValueError("percentile should be >=0, <=100; got %r"
                             % self.percentile)

    def _get_support_mask(self):
        check_is_fitted(self, 'scores_')
        # Cater for NaNs
        if self.percentile == 100:
            return np.ones(len(self.scores_), dtype=np.bool)
        elif self.percentile == 0:
            return np.zeros(len(self.scores_), dtype=np.bool)

        scores = _clean_nans(self.scores_)
        treshold = stats.scoreatpercentile(scores, 100 - self.percentile)
        mask = scores > treshold
        ties = np.where(scores == treshold)[0]
        if len(ties):
            max_feats = int(len(scores) * self.percentile / 100)
            kept_ties = ties[:max_feats - mask.sum()]
            mask[kept_ties] = True
        return mask
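The snippet above is the body of a single search iteration, followed by the scikit-learn source of SelectPercentile quoted for reference. A minimal sketch of the full for-loop search over feature percentiles, assuming the X_train/X_test/y_train/y_test split and the dt classifier from the Design approach sketch above (the percentile grid range(1, 100, 2) is an illustrative choice, not from the original post):

import numpy as np
from sklearn import feature_selection
from sklearn.model_selection import cross_val_score

percentiles = range(1, 100, 2)   # candidate percentages of features to keep
results = []

for i in percentiles:
    # Keep the top i% of features ranked by the chi-squared statistic.
    fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=i)
    X_train_fs = fs.fit_transform(X_train, y_train)
    # Mean 5-fold cross-validated accuracy of the decision tree on the filtered features.
    scores = cross_val_score(dt, X_train_fs, y_train, cv=5)
    results.append(scores.mean())

results = np.array(results)
best_percentile = percentiles[int(np.argmax(results))]
print('Best percentile of features: %d, CV accuracy: %.4f'
      % (best_percentile, results.max()))

# Refit the selector at the best percentile and evaluate on the held-out test set.
fs = feature_selection.SelectPercentile(feature_selection.chi2,
                                        percentile=best_percentile)
X_train_fs = fs.fit_transform(X_train, y_train)
dt.fit(X_train_fs, y_train)
X_test_fs = fs.transform(X_test)
print('Test accuracy: %.4f' % dt.score(X_test_fs, y_test))

Choosing the percentile by mean cross-validated accuracy on the training split keeps the test set untouched until the final evaluation.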
Summary
The above is the full content of "ML之DT: Binary classification of the Titanic dataset with a decision-tree (DT) algorithm, using cross-validated feature selection (FS) and a for loop to find the best FS percentile" that 生活随笔 has collected and compiled for you. We hope this article helps you solve the problems you run into.