The Ultimate Out-of-the-Box Automated Python Model Selection Methods
Choosing the best model is a key step after feature selection in any data science project. This process consists of using the best algorithms (supervised or unsupervised) to obtain the best predictions. Automated model selection methods for high-dimensional datasets generally include Libra and PyCaret. A unicorn data scientist needs to master the most advanced automated model selection methods. In this article, we will review the 2 best automated model selection methods used by Kaggle winners, both of which can be implemented in a few lines of Python.
For this article, we will analyze the sample chocolate bar rating dataset you can find here.
Photo by Klara Avsenik on Unsplash
This is a challenging dataset: after feature selection, it keeps 20 of the original 3,400 features, the ones correlated with the target feature ‘review date’.
1. Libra
The challenge is to find the best-performing combination of techniques so that you can minimize the error in your predictions. Libra provides out-of-the-box automated supervised machine learning that optimizes machine (or deep) learning pipelines, automatically searching for the best learning algorithms (neural network, SVM, decision tree, KNN, etc.) and the best hyperparameters in seconds. Click here to see a complete list of estimators/models available in Libra.
Here is an example predicting the review_date feature of the chocolate rating dataset, a complex multiclass classification problem (labels: 2006–2020).
    # import libraries (the "!" line is a notebook shell command)
    !pip install libra
    from libra import client

    # open the dataset
    a_client = client('../input/preprocess-choc/dfn.csv')
    print(a_client)

    # choose the model
    a_client.neural_network_query('review_date', epochs=20)
    a_client.analyze()

Automated neural network using Libra
Libra produces a neural network whose accuracy rises from 0.796 before optimization to 0.860 after it. Overfitting shrinks as well: the train/test accuracies go from 0.796 vs. 0.764 (a gap of 0.032) to 0.860 vs. 0.851 (a gap of 0.009), and the selected number of neural network layers changes from 3 to 6.
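The train/test gap used here as an overfitting indicator is just a subtraction. The small sketch below recomputes it from the accuracies quoted in this section; the helper name `overfitting_gap` is mine and is not part of Libra.

```python
# Accuracy figures quoted in the article for the Libra neural network.
runs = {
    "before optimization": {"train": 0.796, "test": 0.764},
    "after optimization": {"train": 0.860, "test": 0.851},
}


def overfitting_gap(train_acc: float, test_acc: float) -> float:
    """Return train accuracy minus test accuracy, rounded to 3 decimals."""
    return round(train_acc - test_acc, 3)


# Hypothetical helper applied to both runs.
gaps = {name: overfitting_gap(r["train"], r["test"]) for name, r in runs.items()}

for name, gap in gaps.items():
    print(f"{name}: train/test gap = {gap:.3f}")
```

A smaller gap means the model behaves on held-out data almost as it does on the training data, which is what you want to see after optimization.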
Photo by Nick Kavounidis on Unsplash
2. PyCaret
PyCaret is a simple, easy-to-use sequential pipeline that includes well-integrated preprocessing functions, hyperparameter tuning, and model ensembling.
    # import libraries (the "!" line is a notebook shell command)
    !pip install pycaret
    import pandas as pd
    from pycaret.classification import *

    # open the dataset
    dfn = pd.read_csv('../input/preprocess-choc/dfn.csv')

    # define target label and parameters
    exp1 = setup(dfn, target = 'review_date', feature_selection = True)

PyCaret preprocessing functions
All the preprocessing steps are applied within setup(). PyCaret offers more than 40 features to prepare data for machine learning; missing-value imputation, categorical variable encoding, label encoding (converting yes or no into 1 or 0), and the train-test split are performed automatically when setup() is initialized. For more details about PyCaret’s preprocessing abilities, click here.
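To make these steps concrete, here is a minimal plain-Python sketch of three of them: mean imputation, yes/no label encoding, and a train-test split. It is not PyCaret's implementation. The toy rows, the column names (`cocoa_percent`, `organic`), and the 30% test fraction are assumptions chosen only for illustration.

```python
import random
from statistics import mean

# Hypothetical toy rows; None marks a missing value.
rows = [
    {"cocoa_percent": 70.0, "organic": "yes"},
    {"cocoa_percent": None, "organic": "no"},
    {"cocoa_percent": 85.0, "organic": "yes"},
    {"cocoa_percent": 60.0, "organic": "no"},
    {"cocoa_percent": 75.0, "organic": "yes"},
]


def impute_mean(rows, column):
    """Replace missing values in `column` with the mean of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def encode_yes_no(rows, column):
    """Label-encode a yes/no column as 1/0."""
    mapping = {"yes": 1, "no": 0}
    return [{**r, column: mapping[r[column]]} for r in rows]


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


prepared = encode_yes_no(impute_mean(rows, "cocoa_percent"), "organic")
train, test = train_test_split(prepared)
print(len(train), "training rows,", len(test), "test rows")
```

The point of setup() is that you do not have to write or order these transformations yourself; the sketch only shows what kind of work is being hidden.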
Photo by toby otti on Unsplash
PyCaret compares models in one line, returning a table of k-fold cross-validated scores for each algorithm and metric.
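If "k-fold cross-validated scores" sounds abstract, the sketch below shows the idea in plain Python: split the data into k folds, train on k-1 of them, score on the remaining one, and average. The two toy "models" and the data are hypothetical stand-ins, and the folds are contiguous with no shuffling, which is a simplification rather than a description of what PyCaret does internally.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def majority_class(train_X, train_y):
    """Baseline model: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor(train_X, train_y):
    """1-nearest-neighbor on a single numeric feature."""
    def predict(x):
        i = min(range(len(train_X)), key=lambda j: abs(train_X[j] - x))
        return train_y[i]
    return predict


def cross_val_accuracy(fit, X, y, k=5):
    """Average accuracy of the model built by `fit` over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical toy data: one feature, two well-separated classes.
X = [1.0, 5.0, 1.2, 5.2, 0.9, 4.9, 1.1, 5.1, 1.3, 5.3]
y = [2006, 2020, 2006, 2020, 2006, 2020, 2006, 2020, 2006, 2020]

results = {
    "majority_class": cross_val_accuracy(majority_class, X, y, k=5),
    "nearest_neighbor": cross_val_accuracy(nearest_neighbor, X, y, k=5),
}
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean accuracy = {score:.2f}")
```

The sorted table at the end is the same kind of output a model comparison gives you: one averaged score per candidate, best first.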
    compare_models(fold = 5, turbo = True)

Best compare classifiers
PyCaret has over 60 open-source ready-to-use algorithms. Click here to see a complete list of estimators/models available in PyCaret.
The tune_model function is used for automatically tuning hyperparameters of a machine learning model. PyCaret uses random grid search over a predefined search space. This function returns a table with k-fold cross-validated scores.
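As a rough illustration of random search over a predefined search space, here is a plain-Python sketch. The search space, the parameter names, and the `toy_score` function are all hypothetical; in a real run the score would be a k-fold cross-validated metric, and PyCaret's own search space is defined inside the library.

```python
import random

# Hypothetical search space for a decision tree (names chosen for illustration).
search_space = {
    "max_depth": [2, 4, 6, 8, 10],
    "min_samples_leaf": [1, 2, 5, 10],
    "criterion": ["gini", "entropy"],
}


def toy_score(params):
    """Stand-in for a cross-validated score; best at max_depth=6, min_samples_leaf=2."""
    penalty = (abs(params["max_depth"] - 6) * 0.02
               + abs(params["min_samples_leaf"] - 2) * 0.01)
    return 0.9 - penalty


def random_search(space, score_fn, n_iter=10, seed=0):
    """Sample n_iter random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


best_params, best_score = random_search(search_space, toy_score, n_iter=10)
print(best_params, round(best_score, 3))
```

Raising n_iter explores more of the space at the cost of more training runs, which is the trade-off behind the n_iter argument of tune_model.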
The ensemble_model function is used for ensembling trained models. It takes only a trained model object and returns a table with k-fold cross-validated scores.
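The example in this article ensembles a decision tree by bagging. As a hedged sketch of that idea (not PyCaret's code), the snippet below trains several one-feature decision stumps on bootstrap samples and combines them by majority vote. The stump learner, the toy data, and the number of estimators are assumptions for illustration.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-feature decision stump: keep the threshold with the most training hits."""
    best = None
    for threshold in sorted(set(X)):
        left = [label for x, label in zip(X, y) if x <= threshold]
        right = [label for x, label in zip(X, y) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        hits = sum((left_label if x <= threshold else right_label) == label
                   for x, label in zip(X, y))
        if best is None or hits > best[0]:
            best = (hits, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagging(X, y, n_estimators=15, seed=0):
    """Train stumps on bootstrap samples and predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical toy data: one feature, two classes.
X = [1.0, 1.2, 0.9, 1.1, 1.3, 5.0, 5.2, 4.9, 5.1, 5.3]
y = [2006] * 5 + [2020] * 5

bagged = bagging(X, y)
print(bagged(0.5), bagged(6.0))
```

Each stump sees a slightly different sample, so their individual mistakes differ; voting averages those mistakes out, which is why bagging tends to reduce variance.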
    # create a decision tree model (the model ID is passed as a string)
    dt = create_model('dt')

    # ensemble the trained dt model
    dt_bagged = ensemble_model(dt)

    # plot the learning curve of dt
    plot_model(estimator = dt, plot = 'learning')

    # plot the learning curve of dt_bagged
    plot_model(estimator = dt_bagged, plot = 'learning')

Simple and bagged decision tree evaluation metrics
Performance evaluation and diagnostics of a trained machine learning model can be done using the plot_model function.
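One common diagnostic for a multiclass problem like this one is the confusion matrix, which counts how often each true label is predicted as each label. A minimal plain-Python sketch follows; the labels and predictions are made up for illustration and are not results from the chocolate dataset.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (labels, matrix), where matrix[i][j] counts true labels[i] predicted as labels[j]."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Hypothetical true and predicted review years.
y_true = [2006, 2006, 2007, 2007, 2008, 2008]
y_pred = [2006, 2007, 2007, 2007, 2008, 2006]

labels, matrix = confusion_matrix(y_true, y_pred)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

for label, row in zip(labels, matrix):
    print(label, row)
print(f"accuracy = {accuracy:.2f}")
```

The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the number of samples; the off-diagonal cells show which years get confused with which.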
    # hyperparameter tuning
    tuned_dt = tune_model(dt, optimize = "Accuracy", n_iter = 500)

    # evaluate the model
    evaluate_model(estimator = tuned_dt)

    # plot the tuned dt confusion matrix
    plot_model(tuned_dt, plot = 'confusion_matrix')

Decision tree classifier evaluation methods using PyCaret
Finally, the predict_model function can be used to make predictions on an unseen dataset.
    # predict labels; called without a data argument, predict_model scores the hold-out set
    predictions = predict_model(dt)

Review_date predictions using decision tree
Photo by Element5 Digital on Unsplash
If you have some spare time, I’d recommend reading this:
Sum Up
Refer to these links for the complete algorithm selection for chocolate bar review date estimation using these 2 methods:
https://jovian.ml/yeonathan/libra
https://jovian.ml/yeonathan/pycaret
This brief overview is a reminder of the importance of using the right algorithm selection methods in data science. This post aims to cover the 2 best automated algorithm selection methods in Python for high-dimensional datasets, and to share useful documentation.
Photo by Ingmar on Unsplash
I hope you enjoy it, keep exploring!
Translated from: https://towardsdatascience.com/the-ultimate-out-of-the-box-automated-python-model-selection-methods-f2188472d2a