Decision Trees: CART and Model Trees
CART Trees
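Understanding: When a CART tree handles discrete data (a categorical target), it is called a classification tree. It uses the Gini index as the criterion for finding the best split of the data: the smaller the Gini index, the "purer" the data. The random forest code also makes use of the Gini index. When a CART tree handles continuous data, it is called a regression tree, and it uses squared error instead. It splits the data with a binary split into two subsets and computes the error of each candidate split. The split with the smallest error determines the best split feature index and feature value, and the tree is then built from there.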
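To make the Gini criterion concrete, here is a minimal sketch, assuming the class labels of a node are held in a plain Python list. `gini` and `split_gini` are hypothetical helper names written for this illustration; they are not functions from the author's code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gini(left, right):
    """Size-weighted Gini index of a binary split; CART prefers the smallest."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini(['a', 'a', 'a', 'a']))          # pure node: 0.0
print(gini(['a', 'a', 'b', 'b']))          # 50/50 mix: 0.5
print(split_gini(['a', 'a'], ['b', 'b']))  # perfect split: 0.0
```

A pure node scores 0, and a split that separates the classes perfectly scores 0 as well, which is why the split with the lowest weighted Gini index is chosen.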
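Building a regression tree requires a given error measure, which the split function uses to find the best binary split of the dataset. The function must also decide when to stop splitting; once splitting stops, it generates a leaf node. Here regLeaf() and regErr() produce the leaf node and the total squared error, respectively. The leaf model is the mean of the target variable. Because var() returns the mean squared deviation (the variance), its result must be multiplied by the number of samples in the dataset to obtain the total squared error.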
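The "variance times sample count" relationship can be checked directly. This is a sketch assuming the dataset is a NumPy 2-D array with the target in its last column; `reg_leaf` and `reg_err` are illustrative stand-ins written for this example, not the author's regLeaf()/regErr().

```python
import numpy as np

def reg_leaf(data):
    """Leaf model of a regression tree: mean of the target (last) column."""
    return float(np.mean(data[:, -1]))

def reg_err(data):
    """Total squared error of the target: variance times the number of samples."""
    return float(np.var(data[:, -1]) * data.shape[0])

data = np.array([[0.1, 1.0],
                 [0.4, 2.0],
                 [0.7, 6.0]])
y = data[:, -1]
print(reg_leaf(data))  # 3.0
# np.var() divides by m, so multiplying by m recovers the sum of squared deviations
print(reg_err(data), float(np.sum((y - y.mean()) ** 2)))  # both about 14, up to rounding
```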
When splitting the dataset, if no "good" binary split can be found, the function returns None in place of a split feature, together with the leaf value, and a leaf node is produced.
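The split search and its "no good split" exit can be sketched as follows. This is a simplified illustration, assuming a NumPy array with the target in the last column and an `ops=(tolS, tolN)` pair as in the code further down (tolS: minimum error reduction, tolN: minimum samples per child). `bin_split` and `choose_best_split` are hypothetical names for this sketch; rows with `feature > value` go left, matching the comparison used in treeForeCast().

```python
import numpy as np

def bin_split(data, feature, value):
    """Binary split: rows with feature > value go left, the rest go right."""
    mask = data[:, feature] > value
    return data[mask], data[~mask]

def reg_leaf(data):
    return float(np.mean(data[:, -1]))

def reg_err(data):
    return float(np.var(data[:, -1]) * data.shape[0])

def choose_best_split(data, leaf_type=reg_leaf, err_type=reg_err, ops=(1.0, 4)):
    """Return (feature, value) of the best split, or (None, leaf value) if none is good."""
    tol_s, tol_n = ops
    # All targets identical: nothing to split, make a leaf.
    if len(set(data[:, -1].tolist())) == 1:
        return None, leaf_type(data)
    base_err = err_type(data)
    best_err, best_feat, best_val = np.inf, 0, 0.0
    for feat in range(data.shape[1] - 1):
        for val in set(data[:, feat].tolist()):
            left, right = bin_split(data, feat, val)
            if left.shape[0] < tol_n or right.shape[0] < tol_n:
                continue  # child too small (pre-pruning via tolN)
            new_err = err_type(left) + err_type(right)
            if new_err < best_err:
                best_err, best_feat, best_val = new_err, feat, val
    # Error reduction too small, or no admissible split at all: make a leaf.
    if base_err - best_err < tol_s:
        return None, leaf_type(data)
    return best_feat, best_val

x = np.arange(10.0)
step = np.column_stack([x, np.where(x >= 5, 10.0, 0.0)])
print(choose_best_split(step))                # (0, 4.0)
print(choose_best_split(step, ops=(1.0, 6)))  # (None, 5.0): no split keeps 6 rows per side
```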
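Avoiding overfitting by reducing the complexity of a decision tree is called pruning. Pre-pruning stops splitting early (for example through the tolS and tolN thresholds in ops), while post-pruning first grows the tree and then merges leaves that do not help on a test set. Either method alone may not work well, so in general the two can be used together.
Model Trees: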
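Post-pruning can be sketched on a tree stored as nested dicts with the keys `spInd`, `spVal`, `left`, `right` used in the code below, with plain floats as leaves. This is a minimal illustration, assuming a NumPy test set with the target in the last column; `prune` and `collapse` are hypothetical names, and the function modifies the tree in place.

```python
import numpy as np

def is_tree(node):
    return isinstance(node, dict)

def collapse(tree):
    """Replace a subtree by the average of its two children (recursively)."""
    left = collapse(tree['left']) if is_tree(tree['left']) else tree['left']
    right = collapse(tree['right']) if is_tree(tree['right']) else tree['right']
    return (left + right) / 2.0

def prune(tree, test):
    """Post-prune a regression tree against held-out test rows."""
    if test.shape[0] == 0:
        return collapse(tree)  # no test data reaches this subtree
    mask = test[:, tree['spInd']] > tree['spVal']  # '>' goes left
    left_set, right_set = test[mask], test[~mask]
    if is_tree(tree['left']):
        tree['left'] = prune(tree['left'], left_set)
    if is_tree(tree['right']):
        tree['right'] = prune(tree['right'], right_set)
    if is_tree(tree['left']) or is_tree(tree['right']):
        return tree
    # Both children are leaves: merge them if that lowers the test error.
    err_split = (np.sum((left_set[:, -1] - tree['left']) ** 2)
                 + np.sum((right_set[:, -1] - tree['right']) ** 2))
    merged = (tree['left'] + tree['right']) / 2.0
    err_merged = np.sum((test[:, -1] - merged) ** 2)
    return merged if err_merged < err_split else tree

spurious = {'spInd': 0, 'spVal': 5.0, 'left': 1.0, 'right': 1.2}
print(prune(spurious, np.array([[2.0, 1.1], [8.0, 1.1]])))  # merged into one leaf near 1.1
```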
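Understanding: The difference between a model tree and a regression tree is that each leaf of a regression tree holds a constant value, while each leaf of a model tree holds a linear model, so the tree as a whole represents a piecewise linear function. A piecewise linear model fits one part of the dataset with one linear model and another part with a different linear model.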
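The payoff of linear leaves can be seen on toy data made of two linear pieces. This sketch assumes a single split at x > 4 and fits each side by least squares with a leading column of ones, the same X layout that linearSolve() builds; `fit_linear` and `sse` are hypothetical helpers, and `np.linalg.lstsq` stands in for the explicit matrix inverse.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit y ~ w0 + w1*x, with a leading column of ones."""
    X = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def sse(x, y, w):
    """Sum of squared errors of the linear model w on (x, y)."""
    return float(np.sum((y - (w[0] + w[1] * x)) ** 2))

x = np.arange(10.0)
y = np.where(x <= 4, 1 + 2 * x, 30 - 3 * x)  # two linear pieces, break between x=4 and x=5

mask = x > 4  # '>' side goes left, as in treeForeCast()
w_left, w_right = fit_linear(x[mask], y[mask]), fit_linear(x[~mask], y[~mask])
w_single = fit_linear(x, y)

print(sse(x[mask], y[mask], w_left) + sse(x[~mask], y[~mask], w_right))  # essentially 0
print(sse(x, y, w_single))  # a single line cannot follow both pieces
```

Two leaves with their own linear models fit this data almost exactly, whereas a regression tree would need many constant leaves to approximate the same shape.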
```python
from numpy import *

# Model tree
# Formats the data into the target variable Y and the independent variables X,
# then solves a simple linear regression on them.
def linearSolve(dataSet):
    m, n = shape(dataSet)
    X = mat(ones((m, n)))  # column 0 stays 1 (intercept term)
    Y = mat(ones((m, 1)))
    X[:, 1:n] = dataSet[:, 0:n-1]
    Y = dataSet[:, -1]
    xTx = X.T * X
    # The program fails if the matrix has no inverse
    if linalg.det(xTx) == 0.0:
        raise NameError('This matrix is singular, cannot do inverse,\n'
                        'try increasing the second value of ops')
    ws = xTx.I * (X.T * Y)
    return ws, X, Y

# Like regLeaf(): generates the leaf model when the data need no further splitting.
def modelLeaf(dataSet):
    ws, X, Y = linearSolve(dataSet)
    return ws

# Computes the error on the given dataset. Like regErr(), it is called by
# chooseBestSplit() to find the best split.
def modelErr(dataSet):
    ws, X, Y = linearSolve(dataSet)
    yHat = X * ws
    return sum(power(Y - yHat, 2))

# Keeps two input parameters to stay consistent with modelTreeEval()
def regTreeEval(model, inDat):
    return float(model)

# Formats the input data: prepends a column 0 whose entries are all 1
def modelTreeEval(model, inDat):
    n = shape(inDat)[1]
    X = mat(ones((1, n + 1)))
    X[:, 1:n+1] = inDat
    return float(X * model)

def isTree(obj):
    return type(obj).__name__ == 'dict'

# Given a tree, returns a prediction for a single data point.
# modelEval is a reference to the function that predicts at a leaf; it specifies
# the tree type so that the appropriate model is called at each leaf.
# The function walks the tree top-down until it reaches a leaf, then calls
# modelEval() on the input data. The default modelEval is regTreeEval().
def treeForeCast(tree, inData, modelEval=regTreeEval):
    if not isTree(tree):
        return modelEval(tree, inData)
    # inData is a 1 x n matrix, so index row 0 explicitly
    if inData[0, tree['spInd']] > tree['spVal']:
        return treeForeCast(tree['left'], inData, modelEval)
    else:
        return treeForeCast(tree['right'], inData, modelEval)

# Calls treeForeCast() repeatedly and returns the predictions as a column
# vector; useful for predicting over an entire test set.
def createForeCast(tree, testData, modelEval=regTreeEval):
    m = len(testData)
    yHat = mat(zeros((m, 1)))
    for i in range(m):
        yHat[i, 0] = treeForeCast(tree, mat(testData[i]), modelEval)
    return yHat
```

Building a GUI with Tkinter:
```python
from numpy import *
from tkinter import *
import regTrees
import matplotlib
matplotlib.use('TkAgg')  # select the TkAgg backend
# Link TkAgg with matplotlib
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure

def reDraw(tolS, tolN):
    reDraw.f.clf()                        # clear the previous figure
    reDraw.a = reDraw.f.add_subplot(111)  # add a fresh subplot
    if chkBtnVar.get():                   # is the "Model Tree" box checked?
        if tolN < 2:
            tolN = 2
        myTree = regTrees.createTree(reDraw.rawDat, regTrees.modelLeaf,
                                     regTrees.modelErr, (tolS, tolN))
        yHat = regTrees.createForeCast(myTree, reDraw.testDat,
                                       regTrees.modelTreeEval)
    else:
        myTree = regTrees.createTree(reDraw.rawDat, ops=(tolS, tolN))
        yHat = regTrees.createForeCast(myTree, reDraw.testDat)
    # reDraw.rawDat[:, 0].A: the matrix must be converted to an array
    reDraw.a.scatter(reDraw.rawDat[:, 0].A, reDraw.rawDat[:, 1].A, s=5)  # true values
    reDraw.a.plot(reDraw.testDat, yHat.A, linewidth=2.0)  # predicted values
    reDraw.canvas.draw()  # canvas.show() is deprecated and removed in newer matplotlib

def getInputs():
    try:
        tolN = int(tolNentry.get())    # expect an integer
    except ValueError:                 # clear the bad input and use the default
        tolN = 10
        print("enter Integer for tolN")
        tolNentry.delete(0, END)
        tolNentry.insert(0, '10')
    try:
        tolS = float(tolSentry.get())  # expect a float
    except ValueError:
        tolS = 1.0
        print("enter Float for tolS")
        tolSentry.delete(0, END)
        tolSentry.insert(0, '1.0')
    return tolN, tolS

def drawNewTree():
    tolN, tolS = getInputs()  # read the parameters from the text boxes
    reDraw(tolS, tolN)        # redraw the plot using tolN and tolS

# Lay out the GUI
root = Tk()
Label(root, text='Plot Place Holder').grid(row=0, columnspan=3)
Label(root, text='tolN').grid(row=1, column=0)
tolNentry = Entry(root)
tolNentry.grid(row=1, column=1)
tolNentry.insert(0, '10')
Label(root, text='tolS').grid(row=2, column=0)
tolSentry = Entry(root)
tolSentry.grid(row=2, column=1)
tolSentry.insert(0, '1.0')
# Clicking the "ReDraw" button calls drawNewTree()
Button(root, text='ReDraw', command=drawNewTree).grid(row=1, column=2, rowspan=3)
chkBtnVar = IntVar()
chkBtn = Checkbutton(root, text='Model Tree', variable=chkBtnVar)
chkBtn.grid(row=3, column=0, columnspan=2)

reDraw.f = Figure(figsize=(5, 4), dpi=100)
reDraw.canvas = FigureCanvasTkAgg(reDraw.f, master=root)
reDraw.canvas.draw()
reDraw.canvas.get_tk_widget().grid(row=0, columnspan=3)
reDraw.rawDat = mat(regTrees.loadDataSet('ex00.txt'))
reDraw.testDat = arange(reDraw.rawDat[:, 0].min(), reDraw.rawDat[:, 0].max(), 0.01)
reDraw(1.0, 10)
root.mainloop()
```