[Machine Learning] Amazing! Just One Line of Python Code to Automate Model Training!
Automated machine learning (AutoML) refers to automating the components of the data-science model-development pipeline. AutoML reduces a data scientist's workload and speeds up the workflow. It can automate various pipeline components, including data understanding, EDA, data processing, model training, and hyperparameter tuning.
In an end-to-end machine learning project, the complexity of each component depends on the project. Many open-source AutoML libraries are available to speed up development. In this article, I will introduce an excellent Python library: LazyPredict.
What is LazyPredict?
LazyPredict is an open-source Python library that automates the model-training pipeline and speeds up the workflow. It can train about 30 classification models on a classification dataset and about 40 regression models on a regression dataset.
LazyPredict returns the trained models along with their performance metrics, without requiring much code. You can easily compare the metrics of every model, then tune the best one to improve performance further.
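To give a sense of that comparison step, here is a toy stand-in for the leaderboard DataFrame that LazyPredict returns from `fit()`. The model names and scores below are made up for illustration, not real output; the point is that the result is an ordinary pandas DataFrame you can sort and index:

```python
import pandas as pd

# Toy stand-in for the leaderboard DataFrame returned by LazyPredict's fit();
# the scores below are illustrative, not real output.
models = pd.DataFrame(
    {
        "Accuracy": [0.97, 0.99, 0.95],
        "F1 Score": [0.97, 0.99, 0.94],
        "Time Taken": [0.21, 0.02, 0.01],
    },
    index=["RandomForestClassifier", "LinearSVC", "GaussianNB"],
)

# Rank models by F1 score, best first, to pick a candidate for further tuning
leaderboard = models.sort_values("F1 Score", ascending=False)
print(leaderboard)
print("Best model:", leaderboard.index[0])
```

Because the leaderboard is just a DataFrame, the usual pandas tooling (filtering, plotting, exporting to CSV) applies directly.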
Installation
LazyPredict can be installed from PyPI:

```
pip install lazypredict
```

After installation, import the library to run automated training for classification and regression models:
```python
from lazypredict.Supervised import LazyRegressor, LazyClassifier
```

Usage
LazyPredict supports both classification and regression problems, so I will demonstrate it with two example datasets: Boston Housing (regression) and Titanic (classification).
Classification task
LazyPredict's usage is very intuitive, similar to scikit-learn. First create an instance of the LazyClassifier estimator for the classification task. It can evaluate with a custom metric; by default, every model is evaluated on accuracy, balanced accuracy, ROC AUC, and F1 score.
Before training models with LazyPredict, the dataset must be read and processed to make it suitable for training. After feature engineering and splitting the data into train and test sets, we can train models with LazyPredict.
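As a sketch of that preprocessing step: the column names below mimic the Titanic schema, and the tiny frame is made up for illustration since the dataset itself is not bundled here. One minimal approach with pandas and scikit-learn:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny made-up frame mimicking the Titanic schema, for illustration only
df = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3, 1, 2],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, None, 30.0, 40.0, 35.0, None],
    "Fare":     [71.3, 7.9, 13.0, 8.1, 53.1, 10.5],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Fill missing ages with the median and encode the categorical column as numbers
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

X = df.drop(columns="Survived")
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
```

After this, `X_train`, `X_test`, `y_train`, and `y_test` are ready to pass to LazyPredict's `fit()`.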
```python
# LazyClassifier instance and fitting data
cls = LazyClassifier(ignore_warnings=False, custom_metric=None)
models, predictions = cls.fit(X_train, X_test, y_train, y_test)
```

Regression task
Similar to classification, LazyPredict ships with automated model training for regression datasets. The implementation mirrors the classification task, except that a LazyRegressor instance is used instead.
```python
reg = LazyRegressor(ignore_warnings=False, custom_metric=None)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
```

Comparing the resulting performance metrics, the AdaBoost classifier was the best-performing model for the classification task, while GradientBoostingRegressor performed best for the regression task.
Complete examples
Classification
```python
from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=123)

clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

print(models)
```

| Model                          | Accuracy | Balanced Accuracy | ROC AUC  | F1 Score | Time Taken |
|:-------------------------------|---------:|------------------:|---------:|---------:|-----------:|
| LinearSVC                      | 0.989474 | 0.987544          | 0.987544 | 0.989462 | 0.0150008  |
| SGDClassifier                  | 0.989474 | 0.987544          | 0.987544 | 0.989462 | 0.0109992  |
| MLPClassifier                  | 0.985965 | 0.986904          | 0.986904 | 0.985994 | 0.426      |
| Perceptron                     | 0.985965 | 0.984797          | 0.984797 | 0.985965 | 0.0120046  |
| LogisticRegression             | 0.985965 | 0.98269           | 0.98269  | 0.985934 | 0.0200036  |
| LogisticRegressionCV           | 0.985965 | 0.98269           | 0.98269  | 0.985934 | 0.262997   |
| SVC                            | 0.982456 | 0.979942          | 0.979942 | 0.982437 | 0.0140011  |
| CalibratedClassifierCV         | 0.982456 | 0.975728          | 0.975728 | 0.982357 | 0.0350015  |
| PassiveAggressiveClassifier    | 0.975439 | 0.974448          | 0.974448 | 0.975464 | 0.0130005  |
| LabelPropagation               | 0.975439 | 0.974448          | 0.974448 | 0.975464 | 0.0429988  |
| LabelSpreading                 | 0.975439 | 0.974448          | 0.974448 | 0.975464 | 0.0310006  |
| RandomForestClassifier         | 0.97193  | 0.969594          | 0.969594 | 0.97193  | 0.033      |
| GradientBoostingClassifier     | 0.97193  | 0.967486          | 0.967486 | 0.971869 | 0.166998   |
| QuadraticDiscriminantAnalysis  | 0.964912 | 0.966206          | 0.966206 | 0.965052 | 0.0119994  |
| HistGradientBoostingClassifier | 0.968421 | 0.964739          | 0.964739 | 0.968387 | 0.682003   |
| RidgeClassifierCV              | 0.97193  | 0.963272          | 0.963272 | 0.971736 | 0.0130029  |
| RidgeClassifier                | 0.968421 | 0.960525          | 0.960525 | 0.968242 | 0.0119977  |
| AdaBoostClassifier             | 0.961404 | 0.959245          | 0.959245 | 0.961444 | 0.204998   |
| ExtraTreesClassifier           | 0.961404 | 0.957138          | 0.957138 | 0.961362 | 0.0270066  |
| KNeighborsClassifier           | 0.961404 | 0.95503           | 0.95503  | 0.961276 | 0.0560005  |
| BaggingClassifier              | 0.947368 | 0.954577          | 0.954577 | 0.947882 | 0.0559971  |
| BernoulliNB                    | 0.950877 | 0.951003          | 0.951003 | 0.951072 | 0.0169988  |
| LinearDiscriminantAnalysis     | 0.961404 | 0.950816          | 0.950816 | 0.961089 | 0.0199995  |
| GaussianNB                     | 0.954386 | 0.949536          | 0.949536 | 0.954337 | 0.0139935  |
| NuSVC                          | 0.954386 | 0.943215          | 0.943215 | 0.954014 | 0.019989   |
| DecisionTreeClassifier         | 0.936842 | 0.933693          | 0.933693 | 0.936971 | 0.0170023  |
| NearestCentroid                | 0.947368 | 0.933506          | 0.933506 | 0.946801 | 0.0160074  |
| ExtraTreeClassifier            | 0.922807 | 0.912168          | 0.912168 | 0.922462 | 0.0109999  |
| CheckingClassifier             | 0.361404 | 0.5               | 0.5      | 0.191879 | 0.0170043  |
| DummyClassifier                | 0.512281 | 0.489598          | 0.489598 | 0.518924 | 0.0119965  |

Regression
```python
from lazypredict.Supervised import LazyRegressor
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np

# Note: load_boston was removed in scikit-learn 1.2; this example
# follows the original article and requires an older scikit-learn.
boston = datasets.load_boston()
X, y = shuffle(boston.data, boston.target, random_state=13)
X = X.astype(np.float32)

offset = int(X.shape[0] * 0.9)

X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

reg = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

print(models)
```

| Model                         | Adjusted R-Squared | R-Squared | RMSE  | Time Taken |
|:------------------------------|-------------------:|----------:|------:|-----------:|
| SVR                           |               0.83 |      0.88 |  2.62 |       0.01 |
| BaggingRegressor              |               0.83 |      0.88 |  2.63 |       0.03 |
| NuSVR                         |               0.82 |      0.86 |  2.76 |       0.03 |
| RandomForestRegressor         |               0.81 |      0.86 |  2.78 |       0.21 |
| XGBRegressor                  |               0.81 |      0.86 |  2.79 |       0.06 |
| GradientBoostingRegressor     |               0.81 |      0.86 |  2.84 |       0.11 |
| ExtraTreesRegressor           |               0.79 |      0.84 |  2.98 |       0.12 |
| AdaBoostRegressor             |               0.78 |      0.83 |  3.04 |       0.07 |
| HistGradientBoostingRegressor |               0.77 |      0.83 |  3.06 |       0.17 |
| PoissonRegressor              |               0.77 |      0.83 |  3.11 |       0.01 |
| LGBMRegressor                 |               0.77 |      0.83 |  3.11 |       0.07 |
| KNeighborsRegressor           |               0.77 |      0.83 |  3.12 |       0.01 |
| DecisionTreeRegressor         |               0.65 |      0.74 |  3.79 |       0.01 |
| MLPRegressor                  |               0.65 |      0.74 |  3.80 |       1.63 |
| HuberRegressor                |               0.64 |      0.74 |  3.84 |       0.01 |
| GammaRegressor                |               0.64 |      0.73 |  3.88 |       0.01 |
| LinearSVR                     |               0.62 |      0.72 |  3.96 |       0.01 |
| RidgeCV                       |               0.62 |      0.72 |  3.97 |       0.01 |
| BayesianRidge                 |               0.62 |      0.72 |  3.97 |       0.01 |
| Ridge                         |               0.62 |      0.72 |  3.97 |       0.01 |
| TransformedTargetRegressor    |               0.62 |      0.72 |  3.97 |       0.01 |
| LinearRegression              |               0.62 |      0.72 |  3.97 |       0.01 |
| ElasticNetCV                  |               0.62 |      0.72 |  3.98 |       0.04 |
| LassoCV                       |               0.62 |      0.72 |  3.98 |       0.06 |
| LassoLarsIC                   |               0.62 |      0.72 |  3.98 |       0.01 |
| LassoLarsCV                   |               0.62 |      0.72 |  3.98 |       0.02 |
| Lars                          |               0.61 |      0.72 |  3.99 |       0.01 |
| LarsCV                        |               0.61 |      0.71 |  4.02 |       0.04 |
| SGDRegressor                  |               0.60 |      0.70 |  4.07 |       0.01 |
| TweedieRegressor              |               0.59 |      0.70 |  4.12 |       0.01 |
| GeneralizedLinearRegressor    |               0.59 |      0.70 |  4.12 |       0.01 |
| ElasticNet                    |               0.58 |      0.69 |  4.16 |       0.01 |
| Lasso                         |               0.54 |      0.66 |  4.35 |       0.02 |
| RANSACRegressor               |               0.53 |      0.65 |  4.41 |       0.04 |
| OrthogonalMatchingPursuitCV   |               0.45 |      0.59 |  4.78 |       0.02 |
| PassiveAggressiveRegressor    |               0.37 |      0.54 |  5.09 |       0.01 |
| GaussianProcessRegressor      |               0.23 |      0.43 |  5.65 |       0.03 |
| OrthogonalMatchingPursuit     |               0.16 |      0.38 |  5.89 |       0.01 |
| ExtraTreeRegressor            |               0.08 |      0.32 |  6.17 |       0.01 |
| DummyRegressor                |              -0.38 |     -0.02 |  7.56 |       0.01 |
| LassoLars                     |              -0.38 |     -0.02 |  7.56 |       0.01 |
| KernelRidge                   |             -11.50 |     -8.25 | 22.74 |       0.01 |

Conclusion
In this article, we covered LazyPredict, a library that can train roughly 70 classification and regression models with a few lines of Python. It is a very handy tool because it gives an overall picture of how models perform on a dataset and makes it easy to compare them.
Each model is trained with its default parameters, since LazyPredict performs no hyperparameter tuning. Once the best-performing model is selected, developers can tune it to improve performance further.
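For example, if AdaBoost came out on top of the classification leaderboard, a follow-up tuning pass with plain scikit-learn might look like this sketch (the parameter grid here is illustrative, not a recommendation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=123
)

# Illustrative grid; in practice choose ranges based on the first-pass results
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.5, 1.0],
}
search = GridSearchCV(
    AdaBoostClassifier(random_state=0), param_grid, cv=3, scoring="f1"
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Test F1:", search.score(X_test, y_test))
```

The point is the workflow: LazyPredict narrows roughly 30 candidates down to one family, and a focused grid search then squeezes out the remaining performance.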