Pruning a CART tree based on subtree sample counts, and plotting the tree before and after pruning
The code is as follows:
# -*- coding: utf-8 -*-
# Python 2 script: the reload/setdefaultencoding hack below exists only in Python 2.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

import os
import pandas as pd
import pydotplus
from IPython.display import display, Image
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree._tree import TREE_LEAF


def train():
    X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                               n_classes=2, random_state=0, shuffle=False)
    print "y=", y
    # Put the features and the label into a DataFrame
    df = pd.DataFrame({'Feature 1': X[:, 0], 'Feature 2': X[:, 1], 'Feature 3': X[:, 2],
                       'Feature 4': X[:, 3], 'Feature 5': X[:, 4], 'Feature 6': X[:, 5],
                       'Class': y})
    y_train = df['Class']
    X_train = df.drop('Class', axis=1)
    dt = DecisionTreeClassifier(random_state=42)
    dt.fit(X_train, y_train)
    return dt, X_train
# ----------------- the code above builds the decision tree model -----------------

# os.environ["PATH"] += os.pathsep + 'C:\\Anaconda3\\Library\\bin\\graphviz'


def draw_file(model, dot_file, png_file, X_train):
    # Export the tree to a .dot file, render it inline, and also write a PNG via graphviz.
    tree.export_graphviz(model, out_file=dot_file, feature_names=X_train.columns,
                         filled=True, rounded=True, special_characters=True)
    graph = pydotplus.graph_from_dot_file(dot_file)
    display(Image(graph.create_png()))
    from subprocess import check_call
    check_call(['dot', '-Tpng', dot_file, '-o', png_file])


# Pruning function. This is not the well-known CCP (cost-complexity) pruning: a node is
# turned into a leaf whenever the number of samples left at it (the smallest entry of
# tree_.value for that node) falls below the threshold.
def prune_index(inner_tree, index, threshold):
    if inner_tree.value[index].min() < threshold:
        # turn the node into a leaf by "unlinking" its children
        inner_tree.children_left[index] = TREE_LEAF    # prune the left subtree
        inner_tree.children_right[index] = TREE_LEAF   # prune the right subtree
    # if there are children, visit them as well
    if inner_tree.children_left[index] != TREE_LEAF:
        prune_index(inner_tree, inner_tree.children_left[index], threshold)    # recurse into the left subtree
        prune_index(inner_tree, inner_tree.children_right[index], threshold)   # recurse into the right subtree


# ***************************************************************
if __name__ == '__main__':
    model, X_train = train()
    dot_file = 'unprunedtree.dot'
    png_file = 'unprunedtree.png'
    draw_file(model, dot_file, png_file, X_train)
    # number of leaves before pruning (leaf nodes have children_left == TREE_LEAF == -1)
    print(sum(model.tree_.children_left < 0))
    print "************************************************"
    print model.tree_.value
    prune_index(model.tree_, 0, 5)
    dot_file = 'prunedtree.dot'
    png_file = 'prunedtree.png'
    print "The current model is", model
    draw_file(model, dot_file, png_file, X_train)
    # number of leaves after pruning
    print sum(model.tree_.children_left < 0)

# Reference:
# https://stackoverflow.com/questions/49428469/pruning-decision-trees/49496027

Before pruning: (tree rendered to unprunedtree.png)
After pruning: (tree rendered to prunedtree.png)
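The script above only renders the two trees and prints leaf counts. As a rough sketch of how one might also compare predictive accuracy before and after prune_index, assuming a hypothetical held-out split made with train_test_split (not part of the original post):

# Rough sketch (not from the original post): compare test accuracy before and
# after prune_index, using a hypothetical held-out split.
import copy

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree._tree import TREE_LEAF


def prune_index(inner_tree, index, threshold):
    # same rule as the prune_index above
    if inner_tree.value[index].min() < threshold:
        inner_tree.children_left[index] = TREE_LEAF
        inner_tree.children_right[index] = TREE_LEAF
    if inner_tree.children_left[index] != TREE_LEAF:
        prune_index(inner_tree, inner_tree.children_left[index], threshold)
        prune_index(inner_tree, inner_tree.children_right[index], threshold)


X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_tr, y_tr)
print("accuracy before pruning: %.3f" % accuracy_score(y_te, dt.predict(X_te)))

pruned = copy.deepcopy(dt)        # keep the unpruned model around for comparison
prune_index(pruned.tree_, 0, 5)   # same threshold (5) as the script above
print("accuracy after pruning:  %.3f" % accuracy_score(y_te, pruned.predict(X_te)))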
Note: what is the problem with the pruning method above?
Although the predictions, and therefore the accuracy, change after pruning,
the underlying tree is not actually shrunk: the node arrays still hold their pre-pruning entries, in particular:
model.tree_.impurity,
model.tree_.value,
model.tree_.children_left,
model.tree_.children_right
Only the child pointers of the pruned nodes are overwritten with TREE_LEAF; the entries for the now-unreachable descendant nodes all remain. So this is not really a very good pruning method.
Reference:
https://stackoverflow.com/questions/49428469/pruning-decision-trees/49496027#49496027
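For comparison, the CCP pruning mentioned in the code comment above is built into newer scikit-learn versions (0.22 and later) and does produce a genuinely smaller tree. A minimal sketch, not part of the original post:

# Sketch: built-in cost-complexity pruning (CCP) in scikit-learn >= 0.22.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)

full = DecisionTreeClassifier(random_state=42).fit(X, y)
# candidate alpha values along the pruning path
path = full.cost_complexity_pruning_path(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # pick one alpha just for illustration

pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X, y)
print("nodes without pruning: %d" % full.tree_.node_count)
print("nodes with ccp_alpha:  %d" % pruned.tree_.node_count)

Unlike the prune_index approach, the tree fitted with ccp_alpha really has fewer nodes, so its impurity, value and child-pointer arrays shrink accordingly.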