Chapter 8: Polynomial Regression and Model Generalization
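Problem: linear regression assumes that a linear relationship underlies the data.

If we treat x² as one feature and x as another, a data set that originally had the single feature x becomes a data set with two features. The expression is still a linear regression in those features, but seen as a function of x it is a nonlinear equation. Fitting a model this way is called polynomial regression.

PCA reduces the dimensionality of the data; polynomial regression raises it.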
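The idea of treating x² as a second feature can be sketched by hand, without any polynomial helper. This is a minimal illustration, not the author's code: the seed and the names `X2`, `lin_reg` and `poly_reg` are assumptions chosen for the example, while the data-generating formula mirrors the one used elsewhere in this chapter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
x = rng.uniform(-3.0, 3.0, size=100)
y = 0.5 * x**2 + x + 2 + rng.normal(0, 1, size=100)

X = x.reshape(-1, 1)          # one feature: x
X2 = np.hstack([X, X**2])     # two features: x and x^2

lin_reg = LinearRegression().fit(X, y)     # straight line in x
poly_reg = LinearRegression().fit(X2, y)   # still linear, but in (x, x^2)

print("R^2 with x only:   ", lin_reg.score(X, y))
print("R^2 with x and x^2:", poly_reg.score(X2, y))
print("coefficients:", poly_reg.coef_, "intercept:", poly_reg.intercept_)
```

The second model is an ordinary linear regression; only the feature matrix changed. Its fitted coefficients should land near the values used to generate the data.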
Fitting the same data with different values of degree:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.preprocessing import StandardScaler

    np.random.seed(666)
    x = np.random.uniform(-3.0, 3.0, size=100)
    X = x.reshape(-1, 1)
    y = 0.5 * x**2 + x + 2 + np.random.normal(0, 1, size=100)

    def PolynomialRegression(degree):
        # Expand the features, scale them, then fit a linear regression
        return Pipeline([
            ("poly", PolynomialFeatures(degree=degree)),
            ("std_scaler", StandardScaler()),
            ("lin_reg", LinearRegression()),
        ])

    # degree=2
    poly2_reg = PolynomialRegression(degree=2)
    poly2_reg.fit(X, y)
    y2_predict = poly2_reg.predict(X)
    print(mean_squared_error(y, y2_predict))

    plt.scatter(x, y)
    plt.plot(np.sort(x), y2_predict[np.argsort(x)], color='r')
    plt.show()

    # degree=10
    poly10_reg = PolynomialRegression(degree=10)
    poly10_reg.fit(X, y)
    y10_predict = poly10_reg.predict(X)
    print(mean_squared_error(y, y10_predict))

    plt.scatter(x, y)
    plt.plot(np.sort(x), y10_predict[np.argsort(x)], color='r')
    plt.show()

    # degree=100
    poly100_reg = PolynomialRegression(degree=100)
    poly100_reg.fit(X, y)
    y100_predict = poly100_reg.predict(X)
    print(mean_squared_error(y, y100_predict))

    plt.scatter(x, y)
    plt.plot(np.sort(x), y100_predict[np.argsort(x)], color='r')
    plt.show()
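The main problem machine learning has to deal with is overfitting.

Generalization ability: the ability to carry over from known cases to new ones. A curve fitted to the known training data may perform very poorly on new data; that is poor generalization.

We train a model not to fit these points as closely as possible, but to obtain a model that can predict: when new data arrives, the model should give a good answer.

So measuring how well the model fits the training data tells us little. What we really need is a way to measure how well the resulting model generalizes.

This is why we split the data into a training data set and a test data set.

If a model obtained from the training data also performs well on the test data, we say it generalizes well. If it performs poorly on the test data, its generalization is weak, and most likely we have overfitted.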
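The train/test comparison described above can be sketched by fitting several polynomial degrees and reporting both errors. This is an illustrative sketch, not the author's code: the split seed, the chosen degrees and the `results` dictionary are assumptions, and the data mirror the synthetic quadratic used in this chapter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

np.random.seed(666)
x = np.random.uniform(-3.0, 3.0, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x**2 + x + 2 + np.random.normal(0, 1, size=100)

# Assumed split seed; any fixed value works for the illustration
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

results = {}  # degree -> (train MSE, test MSE)
for degree in (1, 2, 10, 30):
    model = make_pipeline(
        PolynomialFeatures(degree=degree), StandardScaler(), LinearRegression()
    )
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

The training error can only shrink as the degree grows, so it says nothing about generalization. The test error is the number to watch: a gap that widens at high degree is the usual sign of overfitting.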
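Model complexity: what "complexity" means depends on the model.
KNN: the smaller K is, the more complex the model; K=1 is the most complex.
Polynomial regression: the higher the order (the larger degree), the more complex the model.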
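The KNN end of that complexity scale can be sketched with a small experiment. This is an assumption-laden illustration: it uses `KNeighborsRegressor` on the chapter's synthetic data (the original discusses KNN only in general terms), and the values of k and the `knn_results` name are chosen for the example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

np.random.seed(666)
x = np.random.uniform(-3.0, 3.0, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x**2 + x + 2 + np.random.normal(0, 1, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10)

knn_results = {}  # k -> (train MSE, test MSE)
for k in (1, 5, 20, len(X_train)):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, knn.predict(X_train))
    test_mse = mean_squared_error(y_test, knn.predict(X_test))
    knn_results[k] = (train_mse, test_mse)
    print(f"k={k:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

With k=1 every training point is its own nearest neighbor, so the training error is zero: the most complex model memorizes the data. With k equal to the training-set size the model predicts the mean of y for every input, which is the simplest model possible.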
Learning curves also reveal whether a model is overfitting or underfitting.

Learning curve: how the performance of the trained model changes as the number of training samples grows.
Learning curves for linear regression and polynomial regression:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.preprocessing import StandardScaler

    np.random.seed(666)
    x = np.random.uniform(-3.0, 3.0, size=100)
    X = x.reshape(-1, 1)
    y = 0.5 * x**2 + x + 2 + np.random.normal(0, 1, size=100)

    # Default test_size=0.25 leaves 75 training samples
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10)

    train_score = []
    test_score = []
    for i in range(1, 76):
        lin_reg = LinearRegression()
        # Train on the first i training samples; evaluate on those
        # samples and on the full test set
        lin_reg.fit(X_train[:i], y_train[:i])
        y_train_predict = lin_reg.predict(X_train[:i])
        train_score.append(mean_squared_error(y_train[:i], y_train_predict))
        y_test_predict = lin_reg.predict(X_test)
        test_score.append(mean_squared_error(y_test, y_test_predict))

    plt.plot([i for i in range(1, 76)], np.sqrt(train_score), label="train")
    plt.plot([i for i in range(1, 76)], np.sqrt(test_score), label="test")
    plt.legend()
    plt.show()

    # Wrapped in a function
    def plot_learning_curve(algo, X_train, X_test, y_train, y_test):
        train_score = []
        test_score = []
        for i in range(1, len(X_train) + 1):
            algo.fit(X_train[:i], y_train[:i])
            y_train_predict = algo.predict(X_train[:i])
            train_score.append(mean_squared_error(y_train[:i], y_train_predict))
            y_test_predict = algo.predict(X_test)
            test_score.append(mean_squared_error(y_test, y_test_predict))
        plt.plot([i for i in range(1, len(X_train) + 1)], np.sqrt(train_score), label="train")
        plt.plot([i for i in range(1, len(X_train) + 1)], np.sqrt(test_score), label="test")
        plt.legend()
        plt.axis([0, len(X_train) + 1, 0, 4])
        plt.show()

    plot_learning_curve(LinearRegression(), X_train, X_test, y_train, y_test)

    # Using polynomial regression
    def PolynomialRegression(degree):
        return Pipeline([
            ("poly", PolynomialFeatures(degree=degree)),
            ("std_scaler", StandardScaler()),
            ("lin_reg", LinearRegression()),
        ])

    poly2_reg = PolynomialRegression(degree=2)
    plot_learning_curve(poly2_reg, X_train, X_test, y_train, y_test)

    poly20_reg = PolynomialRegression(degree=20)  # overfitting
    plot_learning_curve(poly20_reg, X_train, X_test, y_train, y_test)
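scikit-learn also ships a `learning_curve` helper in `sklearn.model_selection` that computes the same kind of curve with cross-validation instead of a single split. The sketch below is an alternative to the hand-written loop, not the author's method; the training sizes, `cv=5` and the variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

np.random.seed(666)
x = np.random.uniform(-3.0, 3.0, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x**2 + x + 2 + np.random.normal(0, 1, size=100)

# With 100 samples and 5 folds, at most 80 samples are available for training
train_sizes, train_scores, test_scores = learning_curve(
    LinearRegression(),
    X,
    y,
    train_sizes=[10, 20, 40, 60, 80],
    cv=5,
    scoring="neg_mean_squared_error",
)

# Scores are negated MSE values, one column per fold; average, then take the root
train_rmse = np.sqrt(-train_scores.mean(axis=1))
test_rmse = np.sqrt(-test_scores.mean(axis=1))

for n, tr, te in zip(train_sizes, train_rmse, test_rmse):
    print(f"n={n:2d}  train RMSE={tr:.3f}  validation RMSE={te:.3f}")
```

Averaging over folds gives a smoother curve than a single train/test split, at the cost of fitting the model several times per training size.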
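The problem with tuning hyperparameters on a train/test split is that the tuning revolves, to some extent, around the test data set. We are looking for the set of parameters that makes the model trained on the training set perform best on the test set. Because the test set is known, we are in effect tuning against that particular test set, so overfitting can occur there too: the resulting model may be overfitted to the test set.

Solution: split the data into three parts: a training data set, a validation data set, and a test data set. The validation set takes over the role the test set played before.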
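A three-way split can be sketched by calling `train_test_split` twice. This is a minimal illustration under stated assumptions: the 60/20/20 proportions, the seed, the range of k and all variable names are chosen for the example rather than taken from the original.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

# First carve off the test set, which stays untouched during tuning
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=666
)
# Then split the remainder into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=666
)

# Tune k against the validation set only
best_k, best_val_score = None, -1.0
for k in range(1, 11):
    knn_clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_score = knn_clf.score(X_val, y_val)
    if val_score > best_val_score:
        best_k, best_val_score = k, val_score

# Refit on training + validation data, then report on the test set once
final_clf = KNeighborsClassifier(n_neighbors=best_k).fit(X_trainval, y_trainval)
test_score = final_clf.score(X_test, y_test)
print("best k:", best_k, "validation score:", best_val_score, "test score:", test_score)
```

The test score is computed exactly once, after all tuning decisions are final, so it remains an honest estimate of performance on unseen data.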
Validation data set and cross-validation:

    import numpy as np
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    digits = datasets.load_digits()  # handwritten digits data
    X = digits.data
    y = digits.target

    # Tuning with train_test_split only
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=666)

    best_score, best_p, best_k = 0, 0, 0
    for k in range(2, 11):
        for p in range(1, 6):
            knn_clf = KNeighborsClassifier(weights="distance", n_neighbors=k, p=p)
            knn_clf.fit(X_train, y_train)
            score = knn_clf.score(X_test, y_test)
            if score > best_score:
                best_score, best_p, best_k = score, p, k
    print("Best K=", best_k)
    print("Best P=", best_p)
    print("Best Score", best_score)

    # Using cross-validation
    knn_clf = KNeighborsClassifier()
    print(cross_val_score(knn_clf, X_train, y_train, cv=3))

    best_score, best_p, best_k = 0, 0, 0
    for k in range(2, 11):
        for p in range(1, 6):
            knn_clf = KNeighborsClassifier(weights="distance", n_neighbors=k, p=p)
            scores = cross_val_score(knn_clf, X_train, y_train, cv=3)
            score = np.mean(scores)
            if score > best_score:
                best_score, best_p, best_k = score, p, k
    print("Best K=", best_k)
    print("Best P=", best_p)
    print("Best Score", best_score)

    # Refit with the best parameters just found
    best_knn_clf = KNeighborsClassifier(weights="distance", n_neighbors=best_k, p=best_p)
    best_knn_clf.fit(X_train, y_train)
    print(best_knn_clf.score(X_test, y_test))  # test data the model has never seen

    # Grid search revisited; the "CV" in GridSearchCV stands for cross-validation
    param_grid = [
        {
            'weights': ['distance'],
            'n_neighbors': [i for i in range(2, 11)],
            'p': [i for i in range(1, 6)]
        }
    ]
    # n_jobs=-1 uses all CPU cores; cv is the number of folds
    # (set explicitly here, since the default differs between scikit-learn versions)
    grid_search = GridSearchCV(knn_clf, param_grid, verbose=1, cv=3, n_jobs=-1)
    grid_search.fit(X_train, y_train)
    print(grid_search.best_score_)
    print(grid_search.best_params_)

    print(cross_val_score(knn_clf, X_train, y_train, cv=5))
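Leave-one-out cross-validation (LOO-CV): if the training set has m samples, split it into m folds. Each round trains on m-1 samples and checks whether the prediction for the one remaining sample is correct. The average over all m rounds measures the accuracy of the model under the current parameters.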
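Leave-one-out can be sketched with scikit-learn's `LeaveOneOut` splitter passed to `cross_val_score`. The subset size of 300 samples and `n_neighbors=3` are assumptions made to keep the example fast; the original does not give code for this step.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
# Assumed subset: LOO fits one model per sample, so keep m small here
X_small, y_small = X[:300], y[:300]

knn_clf = KNeighborsClassifier(n_neighbors=3)
loo_scores = cross_val_score(knn_clf, X_small, y_small, cv=LeaveOneOut())

# One score per held-out sample: 1.0 if predicted correctly, 0.0 otherwise
print("number of rounds:", len(loo_scores))
print("LOO accuracy:", loo_scores.mean())
```

Because every round holds out a single sample, the estimate uses almost all of the data for training, but the cost grows linearly with m.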
Bias-variance trade-off
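For reference, the trade-off is usually stated through the standard decomposition of expected squared error. The notation below is an assumption, since the original formula did not survive: $f$ is the true function, $\hat{f}$ the fitted model, and $\varepsilon$ is zero-mean noise with variance $\sigma^2$, so that $y = f(x) + \varepsilon$.

```latex
\mathbb{E}\!\left[\bigl(y - \hat{f}(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Simple models tend to have high bias and low variance; complex models tend to have the reverse, and the noise term cannot be reduced by any choice of model.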
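Cause of high variance: the model is too complex. Instead of capturing the essence of the problem, it has learned a lot of noise.

High variance means poor generalization.

Remedy for high variance: model regularization.

α: a new hyperparameter that controls the strength of the regularization term.

This form of regularization is also called ridge regression.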
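The objective that ridge regression minimizes is commonly written as follows. This is the general textbook form, given here because the original formula was lost; $\theta_1, \dots, \theta_n$ are the feature coefficients, and the intercept $\theta_0$ is conventionally left out of the penalty. Constant factors such as $\tfrac{1}{2}$ or $\tfrac{1}{m}$ differ between texts and libraries.

```latex
J(\theta) = \mathrm{MSE}\bigl(y, \hat{y}; \theta\bigr) + \alpha \sum_{i=1}^{n} \theta_i^{2}
```

With $\alpha = 0$ this reduces to ordinary linear regression; as $\alpha$ grows, the penalty dominates and the coefficients are pushed toward zero.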
Ridge regression:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.preprocessing import StandardScaler

    np.random.seed(42)
    x = np.random.uniform(-3.0, 3.0, size=100)
    X = x.reshape(-1, 1)
    y = 0.5 * x + 3 + np.random.normal(0, 1, size=100)

    plt.scatter(x, y)
    plt.show()

    def PolynomialRegression(degree):
        return Pipeline([
            ("poly", PolynomialFeatures(degree=degree)),
            ("std_scaler", StandardScaler()),
            ("lin_reg", LinearRegression()),
        ])

    np.random.seed(666)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Baseline: degree-20 polynomial regression without regularization
    poly_reg = PolynomialRegression(degree=20)
    poly_reg.fit(X_train, y_train)
    y_predict = poly_reg.predict(X_test)
    print(mean_squared_error(y_test, y_predict))

    X_plot = np.linspace(-3, 3, 100).reshape(100, 1)
    y_plot = poly_reg.predict(X_plot)
    plt.scatter(x, y)
    plt.plot(X_plot[:, 0], y_plot, color='r')
    plt.axis([-3, 3, 0, 6])
    plt.show()

    # The same plotting code wrapped in a function
    def plot_model(model):
        X_plot = np.linspace(-3, 3, 100).reshape(100, 1)
        y_plot = model.predict(X_plot)
        plt.scatter(x, y)
        plt.plot(X_plot[:, 0], y_plot, color='r')
        plt.axis([-3, 3, 0, 6])
        plt.show()

    plot_model(poly_reg)

    # Using ridge regression
    def RidgeRegression(degree, alpha):
        return Pipeline([
            ("poly", PolynomialFeatures(degree=degree)),
            ("std_scaler", StandardScaler()),
            ("ridge_reg", Ridge(alpha=alpha)),
        ])

    ridge1_reg = RidgeRegression(20, 0.0001)
    ridge1_reg.fit(X_train, y_train)
    y1_predict = ridge1_reg.predict(X_test)
    print(mean_squared_error(y_test, y1_predict))
    plot_model(ridge1_reg)

    ridge2_reg = RidgeRegression(20, 1)
    ridge2_reg.fit(X_train, y_train)
    y2_predict = ridge2_reg.predict(X_test)
    print(mean_squared_error(y_test, y2_predict))
    plot_model(ridge2_reg)

    ridge3_reg = RidgeRegression(20, 100)
    ridge3_reg.fit(X_train, y_train)
    y3_predict = ridge3_reg.predict(X_test)
    print(mean_squared_error(y_test, y3_predict))
    plot_model(ridge3_reg)
Another form of model regularization: LASSO regularization.
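The LASSO objective is commonly written with an absolute-value penalty in place of the squared one. As with ridge regression, this is the general textbook form rather than a formula recovered from the original, and scaling constants vary between texts and libraries.

```latex
J(\theta) = \mathrm{MSE}\bigl(y, \hat{y}; \theta\bigr) + \alpha \sum_{i=1}^{n} \lvert \theta_i \rvert
```

The only change from ridge regression is the penalty term: $\lvert \theta_i \rvert$ instead of $\theta_i^{2}$.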
LASSO regression:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import Lasso
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.preprocessing import StandardScaler

    np.random.seed(42)
    x = np.random.uniform(-3.0, 3.0, size=100)
    X = x.reshape(-1, 1)
    y = 0.5 * x + 3 + np.random.normal(0, 1, size=100)

    plt.scatter(x, y)
    plt.show()

    def PolynomialRegression(degree):
        return Pipeline([
            ("poly", PolynomialFeatures(degree=degree)),
            ("std_scaler", StandardScaler()),
            ("lin_reg", LinearRegression()),
        ])

    np.random.seed(666)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Baseline: degree-20 polynomial regression without regularization
    poly_reg = PolynomialRegression(degree=20)
    poly_reg.fit(X_train, y_train)
    y_predict = poly_reg.predict(X_test)
    print(mean_squared_error(y_test, y_predict))

    def plot_model(model):
        X_plot = np.linspace(-3, 3, 100).reshape(100, 1)
        y_plot = model.predict(X_plot)
        plt.scatter(x, y)
        plt.plot(X_plot[:, 0], y_plot, color='r')
        plt.axis([-3, 3, 0, 6])
        plt.show()

    plot_model(poly_reg)

    # LASSO
    def LassoRegression(degree, alpha):
        return Pipeline([
            ("poly", PolynomialFeatures(degree=degree)),
            ("std_scaler", StandardScaler()),
            ("lasso_reg", Lasso(alpha=alpha)),
        ])

    lasso1_reg = LassoRegression(20, 0.01)
    lasso1_reg.fit(X_train, y_train)
    y1_predict = lasso1_reg.predict(X_test)
    print(mean_squared_error(y_test, y1_predict))
    plot_model(lasso1_reg)

    # Increase alpha
    lasso2_reg = LassoRegression(20, 0.1)
    lasso2_reg.fit(X_train, y_train)
    y2_predict = lasso2_reg.predict(X_test)
    print(mean_squared_error(y_test, y2_predict))
    plot_model(lasso2_reg)
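One practical difference between the two penalties is that LASSO tends to set some coefficients exactly to zero, while ridge regression only shrinks them. The sketch below counts zero coefficients for both on the same data; the alpha values, `include_bias=False`, `max_iter` and the variable names are assumptions for the illustration, not settings from the original.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

np.random.seed(42)
x = np.random.uniform(-3.0, 3.0, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x + 3 + np.random.normal(0, 1, size=100)

def fit_coefficients(regressor):
    # include_bias=False drops the constant column so every coefficient
    # belongs to a real polynomial feature x, x^2, ..., x^20
    model = make_pipeline(
        PolynomialFeatures(degree=20, include_bias=False),
        StandardScaler(),
        regressor,
    )
    model.fit(X, y)
    return model.steps[-1][1].coef_

ridge_coef = fit_coefficients(Ridge(alpha=1))
lasso_coef = fit_coefficients(Lasso(alpha=0.1, max_iter=100000))

n_zero_ridge = int(np.sum(ridge_coef == 0))
n_zero_lasso = int(np.sum(lasso_coef == 0))
print("zero coefficients, ridge:", n_zero_ridge)
print("zero coefficients, LASSO:", n_zero_lasso)
```

Coefficients that are exactly zero remove the corresponding features from the model, which is why LASSO is often used as a rough form of feature selection.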