Tang Yudi Study Notes 12: Building Decision Trees with sklearn

Published: 2023/12/10 · Programming Q&A · 37 · 豆豆


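I. Visualizing the tree model

1. Imports

2. Visualizing the tree model

3. Saving the tree as a .dot file

4. Generating the .png file in the .dot file's directory

II. Decision boundary analysis

1. Displaying the .png file

2. Plotting the decision boundary

3. Probability estimation

III. Effect of pre-pruning parameters

1. Regularization in decision trees

2. Example

3. Sensitivity of tree models to the data

IV. Regression tree model

Regression task

1. Building the data

2. Imports

3. Exporting the tree model

4. Generating the .png file in the .dot file's directory

5. Displaying the .png file

Comparing the effect of tree depth on the results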

I. Visualizing the tree model

1. Imports

import numpy as np
import os
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

# Default font sizes for axis labels and tick labels
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

# Silence library warnings in the notebook output
import warnings
warnings.filterwarnings('ignore')

2. Visualizing the tree model

Download the Graphviz installer: https://graphviz.org/download/
Environment variable configuration: https://jingyan.baidu.com/article/020278115032461bcc9ce598.html

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

3. Saving the tree as a .dot file (iris_tree.dot)

from sklearn.tree import export_graphviz

export_graphviz(
    tree_clf,
    out_file="iris_tree.dot",
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True
)
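To check what was exported without opening the file, the DOT source can also be obtained as a string. The sketch below is only an illustration: it refits the same iris tree so that it runs on its own, and the `random_state=42` and the variable name `dot_source` are assumptions that do not appear in the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
X, y = iris.data[:, 2:], iris.target  # petal length and width

# random_state is an assumption added here for reproducibility
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string
# instead of writing it to disk.
dot_source = export_graphviz(
    tree_clf,
    out_file=None,
    feature_names=list(iris.feature_names[2:]),
    class_names=list(iris.target_names),
    rounded=True,
    filled=True,
)
print(dot_source[:300])  # first few hundred characters of the graph definition
```

The printed text is the same graph description that the dot tool later turns into an image.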

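4. Generating the .png file in the .dot file's directory

The dot command-line tool from the graphviz package converts the .dot file into formats such as PDF or PNG. Here it produces iris_tree.png in the same directory as the .dot file:

Open a terminal, change to the directory containing the .dot file, and run: dot iris_tree.dot -T png -o iris_tree.png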
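If Graphviz is not installed, one possible alternative is to draw the tree directly with matplotlib through `sklearn.tree.plot_tree`. This is a sketch rather than the method used in these notes; the output file name `iris_tree_matplotlib.png`, the `Agg` backend, and `random_state=42` are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X, y = iris.data[:, 2:], iris.target
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

fig, ax = plt.subplots(figsize=(8, 5))
# plot_tree draws one annotated box per node onto the given axes
annotations = plot_tree(
    tree_clf,
    feature_names=list(iris.feature_names[2:]),
    class_names=list(iris.target_names),
    rounded=True,
    filled=True,
    ax=ax,
)
fig.savefig("iris_tree_matplotlib.png", dpi=150)  # hypothetical output file name
```

The resulting picture carries the same information as the Graphviz rendering, although the layout differs.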

II. Decision boundary analysis

1. Displaying the .png file

from IPython.display import Image
Image(filename='iris_tree.png', width=400, height=400)

2. Plotting the decision boundary

from matplotlib.colors import ListedColormap

def plot_decision_boundary(clf, X, y, axes=[0, 7.5, 0, 3], iris=True, legend=False, plot_training=True):
    # Build a 100 x 100 grid over the plotting area and predict a class for every grid point
    x1s = np.linspace(axes[0], axes[1], 100)
    x2s = np.linspace(axes[2], axes[3], 100)
    x1, x2 = np.meshgrid(x1s, x2s)
    X_new = np.c_[x1.ravel(), x2.ravel()]
    y_pred = clf.predict(X_new).reshape(x1.shape)
    # Fill each predicted region with its own color
    custom_cmap = ListedColormap(['#fafab0', '#9898ff', '#a0faa0'])
    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
    if not iris:
        custom_cmap2 = ListedColormap(['#7d7d58', '#4c4c7f', '#507d50'])
        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)
    if plot_training:
        # Overlay the training samples, one marker style per class
        plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo", label="Iris-Setosa")
        plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs", label="Iris-Versicolor")
        plt.plot(X[:, 0][y==2], X[:, 1][y==2], "g^", label="Iris-Virginica")
        plt.axis(axes)
    if iris:
        plt.xlabel("Petal length", fontsize=14)
        plt.ylabel("Petal width", fontsize=14)
    else:
        plt.xlabel(r"$x_1$", fontsize=18)
        plt.ylabel(r"$x_2$", fontsize=18, rotation=0)
    if legend:
        plt.legend(loc="lower right", fontsize=14)

plt.figure(figsize=(8, 4))
plot_decision_boundary(tree_clf, X, y)
# Positions of the actual splits
plt.plot([2.45, 2.45], [0, 3], "k-", linewidth=2)
plt.plot([2.45, 7.5], [1.75, 1.75], "k--", linewidth=2)
plt.plot([4.95, 4.95], [0, 1.75], "k:", linewidth=2)
plt.plot([4.85, 4.85], [1.75, 3], "k:", linewidth=2)
plt.text(1.40, 1.0, "Depth=0", fontsize=15)
plt.text(3.2, 1.80, "Depth=1", fontsize=13)
plt.text(4.05, 0.5, "(Depth=2)", fontsize=11)
plt.title('Decision Tree decision boundaries')

plt.show()
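The split positions drawn above are typed in by hand. As a rough sketch, the thresholds the fitted tree actually uses can be read from its `tree_` attribute. The `random_state=42` and the `splits` list are assumptions for this illustration. Depending on the random state, the root split may land on petal width instead of petal length, since either feature can isolate Iris-Setosa.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data[:, 2:], iris.target
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

tree = tree_clf.tree_
feature_names = iris.feature_names[2:]
splits = []
for node in range(tree.node_count):
    # A leaf has the same (sentinel) id for both children, so skip it
    if tree.children_left[node] != tree.children_right[node]:
        name = feature_names[tree.feature[node]]
        threshold = float(tree.threshold[node])
        splits.append((name, threshold))
        print(f"node {node}: {name} <= {threshold:.2f}")
```

A tree with max_depth=2 has only two internal nodes here, so the dotted "(Depth=2)" lines in the plot mark where a deeper tree would split next rather than splits of this model.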

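3. Probability estimation

Estimating class probabilities: take a flower whose petals are 5 cm long and 1.5 cm wide. It falls into the left leaf node at depth 2, so the decision tree should output the class proportions of the training samples in that leaf:

Iris-Setosa: 0% (0/54),
Iris-Versicolor: 90.7% (49/54),
Iris-Virginica: 9.3% (5/54).

# Class probabilities for petal length 5 cm, petal width 1.5 cm
tree_clf.predict_proba([[5, 1.5]])

# Class with the highest probability
tree_clf.predict([[5, 1.5]])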
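The fractions above can be reproduced by counting the training samples that share a leaf with the query point. This is a minimal sketch under the assumption that the tree is the same depth-2 iris tree; `random_state=42` and the helper variable names are not from the original notes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data[:, 2:], iris.target
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

sample = np.array([[5.0, 1.5]])
leaf_id = tree_clf.apply(sample)[0]          # index of the leaf the sample lands in
in_same_leaf = tree_clf.apply(X) == leaf_id  # training rows that end up in that leaf

# Count training samples per class inside the leaf, then normalize
counts = np.bincount(y[in_same_leaf], minlength=3)
manual_proba = counts / counts.sum()

print("class counts in leaf:", counts)
print("manual probabilities:", manual_proba)
print("predict_proba:       ", tree_clf.predict_proba(sample)[0])
```

Because the estimate depends only on the leaf, every point that falls into the same leaf receives the same probabilities, no matter how close it is to a boundary.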

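III. Effect of pre-pruning parameters

1. Regularization in decision trees

The DecisionTreeClassifier class has several parameters that restrict the shape of the decision tree in a similar way:

min_samples_split (the minimum number of samples a node must have before it can be split),
min_samples_leaf (the minimum number of samples a leaf node must have),
max_leaf_nodes (the maximum number of leaf nodes),
max_features (the maximum number of features evaluated for a split at each node),
max_depth (the maximum depth of the tree).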
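To get a feel for how the parameters listed above constrain the tree, the following sketch fits one tree per setting and reports its depth and leaf count. The specific values (3, 10, 4, 5) and the `summary` dictionary are illustrative assumptions; only `min_samples_leaf=4` is used elsewhere in these notes.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=100, noise=0.25, random_state=53)

# Illustrative settings, one regularization parameter at a time
settings = {
    "no restrictions": {},
    "max_depth=3": {"max_depth": 3},
    "min_samples_split=10": {"min_samples_split": 10},
    "min_samples_leaf=4": {"min_samples_leaf": 4},
    "max_leaf_nodes=5": {"max_leaf_nodes": 5},
}

summary = {}
for label, params in settings.items():
    clf = DecisionTreeClassifier(random_state=42, **params).fit(X, y)
    summary[label] = (clf.get_depth(), clf.get_n_leaves())
    print(f"{label:22s} depth={clf.get_depth()} leaves={clf.get_n_leaves()}")
```

Raising the min_* values or lowering the max_* values makes the model more regularized.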

2. Example

from sklearn.datasets import make_moons

X, y = make_moons(n_samples=100, noise=0.25, random_state=53)
tree_clf1 = DecisionTreeClassifier(random_state=42)
tree_clf2 = DecisionTreeClassifier(min_samples_leaf=4, random_state=42)  # minimum samples per leaf
tree_clf1.fit(X, y)
tree_clf2.fit(X, y)

plt.figure(figsize=(12, 4))
plt.subplot(121)
plot_decision_boundary(tree_clf1, X, y, axes=[-1.5, 2.5, -1, 1.5], iris=False)
plt.title('No restrictions')

plt.subplot(122)
plot_decision_boundary(tree_clf2, X, y, axes=[-1.5, 2.5, -1, 1.5], iris=False)
plt.title('min_samples_leaf=4')

plt.show()

With no restrictions, the tree bends its boundary around noisy, problematic points, so it overfits easily.
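One way to check the overfitting claim numerically is to compare accuracy on the training data with accuracy on held-out data. The sketch below is an assumption-laden illustration: it draws a larger moons sample (1000 points instead of 100) and uses a 50/50 split so that a test set exists; neither choice comes from the original notes.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Larger sample than in the notes, so part of it can be held out for testing
X, y = make_moons(n_samples=1000, noise=0.25, random_state=53)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

models = {
    "no restrictions": DecisionTreeClassifier(random_state=42),
    "min_samples_leaf=4": DecisionTreeClassifier(min_samples_leaf=4, random_state=42),
}

scores = {}
for label, clf in models.items():
    clf.fit(X_train, y_train)
    scores[label] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"{label:20s} train={scores[label][0]:.3f} test={scores[label][1]:.3f}")
```

A large gap between training and test accuracy is the usual sign of overfitting; the regularized tree typically shows a smaller gap, though the exact numbers depend on the data and the split.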

3. Sensitivity of tree models to the data

np.random.seed(6)
Xs = np.random.rand(100, 2) - 0.5
ys = (Xs[:, 0] > 0).astype(np.float32) * 2

# Rotate the same data set by 45 degrees
angle = np.pi / 4
rotation_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
Xsr = Xs.dot(rotation_matrix)

tree_clf_s = DecisionTreeClassifier(random_state=42)
tree_clf_s.fit(Xs, ys)
tree_clf_sr = DecisionTreeClassifier(random_state=42)
tree_clf_sr.fit(Xsr, ys)

plt.figure(figsize=(11, 4))
plt.subplot(121)
plot_decision_boundary(tree_clf_s, Xs, ys, axes=[-0.7, 0.7, -0.7, 0.7], iris=False)
plt.title('Sensitivity to training set rotation')

plt.subplot(122)
plot_decision_boundary(tree_clf_sr, Xsr, ys, axes=[-0.7, 0.7, -0.7, 0.7], iris=False)
plt.title('Sensitivity to training set rotation')

plt.show()
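The effect of the rotation can also be seen without a plot by comparing the size of the two trees. This sketch repeats the data construction so it runs on its own; the shortened names `tree_s` and `tree_sr` are assumptions for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

np.random.seed(6)
Xs = np.random.rand(100, 2) - 0.5
ys = (Xs[:, 0] > 0).astype(np.float32) * 2  # label depends only on the first feature

angle = np.pi / 4
rotation_matrix = np.array([[np.cos(angle), -np.sin(angle)],
                            [np.sin(angle), np.cos(angle)]])
Xsr = Xs.dot(rotation_matrix)  # same points, rotated by 45 degrees

tree_s = DecisionTreeClassifier(random_state=42).fit(Xs, ys)
tree_sr = DecisionTreeClassifier(random_state=42).fit(Xsr, ys)

print("original data: depth", tree_s.get_depth(), "leaves", tree_s.get_n_leaves())
print("rotated data:  depth", tree_sr.get_depth(), "leaves", tree_sr.get_n_leaves())
```

Decision trees split along one feature axis at a time, so a boundary that is a single vertical line before rotation has to be approximated by a staircase of splits afterwards.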

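IV. Regression tree model

Regression task

The split criterion changes: instead of an impurity measure such as Gini, a regression tree chooses splits that minimize the MSE (mean squared error).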
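As a rough illustration of the MSE criterion, the sketch below scans every candidate threshold on a single feature, computes the sample-weighted MSE of the two resulting groups, and compares the best threshold with the root split of a depth-1 DecisionTreeRegressor. The data follows the quadratic example used in this section; the helper `weighted_mse` and the brute-force search are assumptions for illustration and are not how scikit-learn implements the search internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
m = 200
X = np.random.rand(m, 1)
y = (4 * (X - 0.5) ** 2 + np.random.randn(m, 1) / 10).ravel()

def weighted_mse(threshold):
    """Sample-weighted MSE of the two groups produced by x <= threshold."""
    left = y[X[:, 0] <= threshold]
    right = y[X[:, 0] > threshold]
    # The variance of a group equals its MSE around the group mean
    return (left.size * left.var() + right.size * right.var()) / y.size

# Candidate thresholds: midpoints between consecutive sorted feature values
xs = np.sort(X[:, 0])
candidates = (xs[:-1] + xs[1:]) / 2
best_threshold = min(candidates, key=weighted_mse)

stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
print("brute-force threshold:", best_threshold)
print("tree root threshold:  ", stump.tree_.threshold[0])
```

Each leaf then predicts the mean target value of the training samples it contains, which is why the predictions of a regression tree form a step function.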

1. Building the data

# Noisy quadratic data set
np.random.seed(42)
m = 200
X = np.random.rand(m, 1)
y = 4 * (X - 0.5) ** 2
y = y + np.random.randn(m, 1) / 10

2. Imports

from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, y)

3. Exporting the tree model

export_graphviz(
    tree_reg,
    out_file="regression_tree.dot",
    feature_names=["x1"],
    rounded=True,
    filled=True
)

4. Generating the .png file in the .dot file's directory

dot regression_tree.dot -T png -o regression_tree.png

5. Displaying the .png file

# Your second decision tree looks like this
from IPython.display import Image
Image(filename="regression_tree.png", width=400, height=400)

Comparing the effect of tree depth on the results

from sklearn.tree import DecisionTreeRegressor

tree_reg1 = DecisionTreeRegressor(random_state=42, max_depth=2)
tree_reg2 = DecisionTreeRegressor(random_state=42, max_depth=3)
tree_reg1.fit(X, y)
tree_reg2.fit(X, y)

def plot_regression_predictions(tree_reg, X, y, axes=[0, 1, -0.2, 1], ylabel="$y$"):
    # Predict on a dense grid so the step-shaped prediction curve is visible
    x1 = np.linspace(axes[0], axes[1], 500).reshape(-1, 1)
    y_pred = tree_reg.predict(x1)
    plt.axis(axes)
    plt.xlabel("$x_1$", fontsize=18)
    if ylabel:
        plt.ylabel(ylabel, fontsize=18, rotation=0)
    plt.plot(X, y, "b.")
    plt.plot(x1, y_pred, "r.-", linewidth=2, label=r"$\hat{y}$")

plt.figure(figsize=(11, 4))
plt.subplot(121)
plot_regression_predictions(tree_reg1, X, y)
# Split positions at depth 0 and depth 1
for split, style in ((0.1973, "k-"), (0.0917, "k--"), (0.7718, "k--")):
    plt.plot([split, split], [-0.2, 1], style, linewidth=2)
plt.text(0.21, 0.65, "Depth=0", fontsize=15)
plt.text(0.01, 0.2, "Depth=1", fontsize=13)
plt.text(0.65, 0.8, "Depth=1", fontsize=13)
plt.legend(loc="upper center", fontsize=18)
plt.title("max_depth=2", fontsize=14)

plt.subplot(122)
plot_regression_predictions(tree_reg2, X, y, ylabel=None)
for split, style in ((0.1973, "k-"), (0.0917, "k--"), (0.7718, "k--")):
    plt.plot([split, split], [-0.2, 1], style, linewidth=2)
# Additional split positions at depth 2
for split in (0.0458, 0.1298, 0.2873, 0.9040):
    plt.plot([split, split], [-0.2, 1], "k:", linewidth=1)
plt.text(0.3, 0.5, "Depth=2", fontsize=13)
plt.title("max_depth=3", fontsize=14)

plt.show()
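The depth comparison can be summarized in numbers as well. In this sketch, which rebuilds the same quadratic data so it is self-contained, the `results` dictionary and the use of `mean_squared_error` on the training data are assumptions added for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
m = 200
X = np.random.rand(m, 1)
y = (4 * (X - 0.5) ** 2 + np.random.randn(m, 1) / 10).ravel()

results = {}
for depth in (2, 3):
    reg = DecisionTreeRegressor(random_state=42, max_depth=depth).fit(X, y)
    y_pred = reg.predict(X)
    n_levels = len(np.unique(y_pred))        # distinct predicted values (steps)
    train_mse = mean_squared_error(y, y_pred)
    results[depth] = (reg.get_n_leaves(), n_levels, train_mse)
    print(f"max_depth={depth}: leaves={reg.get_n_leaves()} "
          f"levels={n_levels} train MSE={train_mse:.4f}")
```

A tree of depth d has at most 2**d leaves, so each extra level can at most double the number of steps in the prediction curve while the training error goes down.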

# Unrestricted regression tree versus one with min_samples_leaf=10
tree_reg1 = DecisionTreeRegressor(random_state=42)
tree_reg2 = DecisionTreeRegressor(random_state=42, min_samples_leaf=10)
tree_reg1.fit(X, y)
tree_reg2.fit(X, y)

x1 = np.linspace(0, 1, 500).reshape(-1, 1)
y_pred1 = tree_reg1.predict(x1)
y_pred2 = tree_reg2.predict(x1)

plt.figure(figsize=(11, 4))

plt.subplot(121)
plt.plot(X, y, "b.")
plt.plot(x1, y_pred1, "r.-", linewidth=2, label=r"$\hat{y}$")
plt.axis([0, 1, -0.2, 1.1])
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", fontsize=18, rotation=0)
plt.legend(loc="upper center", fontsize=18)
plt.title("No restrictions", fontsize=14)

plt.subplot(122)
plt.plot(X, y, "b.")
plt.plot(x1, y_pred2, "r.-", linewidth=2, label=r"$\hat{y}$")
plt.axis([0, 1, -0.2, 1.1])
plt.xlabel("$x_1$", fontsize=18)
plt.title("min_samples_leaf={}".format(tree_reg2.min_samples_leaf), fontsize=14)

plt.show()
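To see what min_samples_leaf does to the regression tree, one can inspect how many training samples end up in each leaf. The following is a sketch on the same quadratic data; the `stats` dictionary and the leaf-size computation via `apply` are assumptions for the example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
m = 200
X = np.random.rand(m, 1)
y = (4 * (X - 0.5) ** 2 + np.random.randn(m, 1) / 10).ravel()

models = {
    "no restrictions": DecisionTreeRegressor(random_state=42),
    "min_samples_leaf=10": DecisionTreeRegressor(random_state=42, min_samples_leaf=10),
}

stats = {}
for label, reg in models.items():
    reg.fit(X, y)
    # apply() returns the leaf index of every sample; count samples per leaf
    leaf_sizes = np.bincount(reg.apply(X))
    leaf_sizes = leaf_sizes[leaf_sizes > 0]  # drop internal nodes, which hold no samples here
    train_mse = mean_squared_error(y, reg.predict(X))
    stats[label] = (reg.get_n_leaves(), int(leaf_sizes.min()), train_mse)
    print(f"{label:20s} leaves={stats[label][0]} "
          f"smallest leaf={stats[label][1]} train MSE={train_mse:.4f}")
```

The unrestricted tree keeps splitting until it follows individual noisy points, which gives a very low training error but a jagged curve; requiring at least 10 samples per leaf averages the noise out within each step.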
