[Kaggle] Digit Recognizer: Handwritten Digit Recognition
Contents: 1. Baseline KNN 2. Try SVC
Practice page: Digit Recognizer
Related post: [Hands On ML] 3. Classification (MNIST handwritten digit prediction)
1. Baseline KNN
import pandas as pd

# Load the Kaggle training and test sets
train = pd.read_csv('train.csv')
X_test = pd.read_csv('test.csv')
train.head()

# Separate the label column from the pixel features
y_train = train['label']
X_train = train.drop(['label'], axis=1)
X_train
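To make the data layout concrete, here is a minimal sketch on synthetic data. It assumes the Kaggle layout of one `label` column followed by 784 pixel columns named `pixel0` to `pixel783` (a flattened 28x28 image); the `demo` frame and its random values are stand-ins for illustration, not the real `train.csv`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv (assumed layout: 'label' + pixel0..pixel783)
rng = np.random.default_rng(0)
n_rows = 5
pixel_cols = [f"pixel{i}" for i in range(784)]
demo = pd.DataFrame(rng.integers(0, 256, size=(n_rows, 784)), columns=pixel_cols)
demo.insert(0, "label", rng.integers(0, 10, size=n_rows))

# Same split as in the post: labels on one side, pixel features on the other
y_demo = demo["label"]
X_demo = demo.drop(columns=["label"])

# Each row can be reshaped back into a 28x28 image for inspection
first_image = X_demo.iloc[0].to_numpy().reshape(28, 28)
print(X_demo.shape, y_demo.shape, first_image.shape)
```

Reshaping a row this way is handy for plotting a digit when checking that the features and labels line up.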
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

para_dict = [{'weights': ["uniform", "distance"], 'n_neighbors': [3, 4, 5], 'leaf_size': [10, 20]}]
knn_clf = KNeighborsClassifier()
grid_search = GridSearchCV(knn_clf, para_dict, cv=3, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)
Output:
GridSearchCV(cv=3, estimator=KNeighborsClassifier(), n_jobs=-1,
             param_grid=[{'leaf_size': [10, 20], 'n_neighbors': [3, 4, 5], 'weights': ['uniform', 'distance']}],
             scoring='accuracy')
grid_search.best_params_
grid_search.best_score_
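The grid search above can be tried at a much smaller scale before committing to the full dataset. The sketch below is an illustration only: it swaps in scikit-learn's bundled 8x8 digits dataset (`load_digits`) as stand-in data and drops `leaf_size` from the grid to keep the run short, so its scores are not comparable to the Kaggle results.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): 1797 small 8x8 digit images, not the Kaggle set
X_small, y_small = load_digits(return_X_y=True)

# Reduced grid: 2 weight schemes x 3 neighbor counts = 6 combinations
param_grid = {"weights": ["uniform", "distance"], "n_neighbors": [3, 4, 5]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3, scoring="accuracy")
search.fit(X_small, y_small)

# Best combination and its mean cross-validated accuracy
print(search.best_params_)
print(round(search.best_score_, 4))

# cv_results_ holds one entry per parameter combination
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(score, 4))
```

Looking at every row of `cv_results_`, not only the best one, shows how sensitive the accuracy is to each parameter.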
# Predict on the test set with the refitted best model
y_pred = grid_search.predict(X_test)

# Kaggle expects a 1-based ImageId column and a Label column
image_id = pd.Series(range(1, len(y_pred) + 1))
output = pd.DataFrame({'ImageId': image_id, 'Label': y_pred})
output.to_csv("submission.csv", index=False)
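The submission step can be wrapped in a small helper so that both models in this post reuse it. This is a minimal sketch; `build_submission` is a hypothetical helper name introduced here, and the three predictions are made-up values for illustration.

```python
import numpy as np
import pandas as pd

def build_submission(predictions):
    """Return a DataFrame with a 1-based ImageId column and a Label column."""
    predictions = np.asarray(predictions)
    image_id = np.arange(1, len(predictions) + 1)
    return pd.DataFrame({"ImageId": image_id, "Label": predictions})

# Made-up predictions, only to show the resulting file layout
demo_submission = build_submission([2, 0, 9])
print(demo_submission.to_csv(index=False))
```

Passing `index=False` matters: without it pandas writes an extra unnamed index column, which does not match the two-column `ImageId,Label` layout.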
Leaderboard
The KNN model above scores 0.97067, which currently ranks 2467th.
2. Try SVC
import pandas as pd

train = pd.read_csv('train.csv')
X_test = pd.read_csv('test.csv')
y_train = train['label']
X_train = train.drop(['label'], axis=1)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
# Scale the pixel features, then classify with an SVC
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ('clf', SVC(decision_function_shape="ovr", gamma="auto")),
])

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import reciprocal, uniform

# reciprocal(a, b) is log-uniform on [a, b];
# uniform(loc, scale) covers [loc, loc + scale], so C is drawn from [1, 11]
param_distributions = {"clf__gamma": reciprocal(0.001, 0.1), "clf__C": uniform(1, 10)}
rnd_search_cv = RandomizedSearchCV(pipeline, param_distributions, n_iter=10, verbose=2, cv=3)
rnd_search_cv.fit(X_train, y_train)
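The two search distributions are easy to misread, so here is a small sketch that samples from them directly. It uses the same `scipy.stats` calls as the search; the sample size and `random_state` are arbitrary choices for illustration. Note that `uniform(1, 10)` takes `loc` and `scale`, so it covers 1 to 11 rather than 1 to 10, which is consistent with the best `C` of about 10.73 reported for the fitted pipeline.

```python
from scipy.stats import reciprocal, uniform

gamma_dist = reciprocal(0.001, 0.1)  # log-uniform between 0.001 and 0.1
c_dist = uniform(1, 10)              # loc=1, scale=10: uniform between 1 and 11

# Draw samples the same way RandomizedSearchCV would (arbitrary size and seed)
gamma_samples = gamma_dist.rvs(size=1000, random_state=42)
c_samples = c_dist.rvs(size=1000, random_state=42)

print(gamma_samples.min(), gamma_samples.max())
print(c_samples.min(), c_samples.max())

# Lower and upper ends of the C distribution
print(c_dist.ppf([0, 1]))
```

A log-uniform distribution for `gamma` spreads samples evenly across orders of magnitude, which suits a parameter whose useful values span 0.001 to 0.1.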
Training took about 12 hours:
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 744.1min finished
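The log shows the 30 fits running on a single worker (`n_jobs=1`). One possible way to shorten such a search, sketched below under stated assumptions, is to search on a stratified subsample with `n_jobs=-1` and then refit the best configuration on all the data. This is not what the post did: the stand-in data (`load_digits`), the subsample size of 500, and `n_iter=5` are all illustrative choices.

```python
from scipy.stats import reciprocal, uniform
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's 8x8 digits, not the Kaggle set
X_all, y_all = load_digits(return_X_y=True)

# Stratified subsample so every digit class is represented in the search
X_sub, _, y_sub, _ = train_test_split(
    X_all, y_all, train_size=500, stratify=y_all, random_state=42
)

demo_pipeline = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])
demo_distributions = {
    "clf__gamma": reciprocal(0.001, 0.1),
    "clf__C": uniform(1, 10),
}

# n_jobs=-1 spreads the cross-validation fits over all available cores
demo_search = RandomizedSearchCV(
    demo_pipeline, demo_distributions, n_iter=5, cv=3, n_jobs=-1, random_state=42
)
demo_search.fit(X_sub, y_sub)
print(demo_search.best_params_)

# Refit an unfitted copy of the best pipeline on the full data
final_model = clone(demo_search.best_estimator_).fit(X_all, y_all)
print(final_model.score(X_all, y_all))
```

The trade-off is that hyperparameters chosen on a subsample may not be the best ones for the full dataset, so the final score should still be checked on held-out data.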
rnd_search_cv.best_estimator_
Pipeline(steps=[('scaler', StandardScaler()),
                ('clf', SVC(C=10.729327185542381, gamma=0.0022750096640207287))])
rnd_search_cv.best_score_
y_pred = rnd_search_cv.best_estimator_.predict(X_test)
image_id = pd.Series(range(1, len(y_pred) + 1))
output = pd.DataFrame({'ImageId': image_id, 'Label': y_pred})
output.to_csv("submission_svc.csv", index=False)
The SVC (support vector classifier) model scores 0.96464, lower than the KNN model above (0.97067).