[Python-ML] sklearn Pipeline Workflow and K-Fold Cross-Validation
# -*- coding: utf-8 -*-
'''
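Created on 2018-01-18
@author: Jason.F
@summary:
Pipeline: a workflow that chains data transformations and model fitting into a single estimator
K-fold cross-validation: uses sampling without replacement; the dataset is split into k folds, and in each round one fold serves as the test set while the other k-1 folds form the training set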
'''
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

# Load the data
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
le = LabelEncoder()
y = le.fit_transform(y)  # encode the class labels as integers
print(le.transform(['M', 'B']))

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)

# Chain standardization, PCA dimensionality reduction, and model training
pipe_lr = Pipeline([('scl', StandardScaler()),
                    ('pca', PCA(n_components=2)),
                    ('clf', LogisticRegression(random_state=1))])
pipe_lr.fit(X_train, y_train)
print('Test Accuracy:%.3f' % pipe_lr.score(X_test, y_test))

# K-fold cross-validation, written out by hand
# The folds are not shuffled, so random_state is omitted (it only applies when shuffle=True)
kfold = StratifiedKFold(n_splits=10)
scores = []
for k, (train, test) in enumerate(kfold.split(X_train, y_train)):
    pipe_lr.fit(X_train[train], y_train[train])
    score = pipe_lr.score(X_train[test], y_train[test])
    scores.append(score)
    print('Fold: %s, Class dist.: %s,Acc: %.3f' % (k + 1, np.bincount(y_train[train]), score))
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))

# K-fold cross-validation as implemented by scikit-learn
# n_jobs: number of CPUs to distribute the work across
scores = cross_val_score(estimator=pipe_lr, X=X_train, y=y_train, cv=10, n_jobs=1)
print('Test Accuracy:%s' % scores)  # per-fold cross-validation accuracies
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
Results:
[1 0]
Test Accuracy:0.947
Fold: 1, Class dist.: [256 153],Acc: 0.891
Fold: 2, Class dist.: [256 153],Acc: 0.978
Fold: 3, Class dist.: [256 153],Acc: 0.978
Fold: 4, Class dist.: [256 153],Acc: 0.913
Fold: 5, Class dist.: [256 153],Acc: 0.935
Fold: 6, Class dist.: [257 153],Acc: 0.978
Fold: 7, Class dist.: [257 153],Acc: 0.933
Fold: 8, Class dist.: [257 153],Acc: 0.956
Fold: 9, Class dist.: [257 153],Acc: 0.978
Fold: 10, Class dist.: [257 153],Acc: 0.956
CV accuracy: 0.950 +/- 0.029
Test Accuracy:[ 0.89130435 0.97826087 0.97826087 0.91304348 0.93478261 0.97777778
 0.93333333 0.95555556 0.97777778 0.95555556]
CV accuracy: 0.950 +/- 0.029
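The nearly constant "Class dist." values in the output come from stratification: each fold keeps roughly the same class ratio as the full training set, and every sample is held out exactly once. The following is a minimal sketch of that behavior on toy data. The arrays `X_toy` and `y_toy` and the choice of 3 folds are assumptions made for illustration and are not part of the original script.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (illustrative assumption): 12 samples of class 0 and 6 of class 1
y_toy = np.array([0] * 12 + [1] * 6)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)  # feature values do not affect the split

skf = StratifiedKFold(n_splits=3)
test_counts = np.zeros(len(y_toy), dtype=int)
train_dists = []
for train_idx, test_idx in skf.split(X_toy, y_toy):
    test_counts[test_idx] += 1                          # how often each sample is held out
    train_dists.append(np.bincount(y_toy[train_idx]))   # class counts in the training part

# Sampling without replacement: each sample lands in exactly one test fold
print('Each sample held out once:', bool((test_counts == 1).all()))
for dist in train_dists:
    print('Training class dist.:', dist)
```

Because the class counts here divide evenly by the number of folds, every training split has the same class distribution; with counts that do not divide evenly, the distributions differ by at most a sample or so, as in the results above.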
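To make clearer what the Pipeline does, the sketch below compares it with the same steps written out by hand: each transformer is fitted on the training data only, and the fitted transformers are then reused on the test data. As an assumption made to avoid the CSV download, it loads the breast cancer dataset bundled with scikit-learn via `load_breast_cancer`, whose label encoding differs from the `LabelEncoder` output in the script; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Bundled dataset (assumption: used here instead of the CSV download)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)

# Pipeline version: one fit call runs every step in order
pipe = Pipeline([('scl', StandardScaler()),
                 ('pca', PCA(n_components=2)),
                 ('clf', LogisticRegression(random_state=1))])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# Manual version: fit_transform on the training data, transform only on the test data
scl = StandardScaler()
pca = PCA(n_components=2)
clf = LogisticRegression(random_state=1)
X_train_pca = pca.fit_transform(scl.fit_transform(X_train))
clf.fit(X_train_pca, y_train)
manual_pred = clf.predict(pca.transform(scl.transform(X_test)))

print('Predictions identical:', np.array_equal(pipe_pred, manual_pred))
```

Bundling the steps this way is what keeps cross-validation honest: when the whole pipeline is refitted inside each fold, the scaler and PCA never see the held-out samples.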