Build Your Basic Machine Learning Web App with Streamlit
Data scientists and ML experts often find it difficult to showcase their findings and results to others. Usually, PowerPoint or a web development tool is needed to explain the results. With the introduction of Streamlit, it has become easier to develop web apps in Python. We can also build multiple models and show the results based on the user's selection. This makes it a handy library, especially for analysts or anyone who wants to show a proof-of-concept (POC) solution to clients or other team members. This article does not go into the machine learning algorithms themselves, as they are outside its scope.
Streamlit is an open-source Python framework that allows us to create interactive websites for machine learning and data science needs [1]. In this article, we will develop a web app for a classification problem in which the user can select the algorithm used to build the model, set the model parameters, and visualize the corresponding results.
1. Data Set
For demonstration purposes, I have taken a small diabetes dataset from Kaggle (the Pima Indians Diabetes Database). The objective of the dataset is to predict whether a patient is diabetic or non-diabetic. Personally, I have only explored small and medium-sized datasets with Streamlit.
2. Installing Streamlit
Let us begin by installing Streamlit using the command:
pip install streamlit
Run the following command to check that the installation works:
streamlit hello
To run a web app, run the command:
streamlit run <filename.py>
This command opens a browser window in which the web app is displayed. If any changes are made to the source file, we can see them in the app by using the re-run option.
3. Streamlit Components
This article discusses the following components and how they are used in our machine learning web app:
• Checkbox
• Title
• Sidebar
• Markdown
• Selectbox (drop-down box)
• Multi-select
• Radio (radio buttons)
• Number Input box
• Slider
• Caching
• Button
Let us now import Streamlit and the other libraries needed for our machine learning model:
import streamlit as st
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_confusion_matrix, plot_roc_curve, plot_precision_recall_curve
from sklearn.metrics import precision_score, recall_score
Title, Sidebar and Markdown
To start with, let us add a title and a sidebar to our app as below:
st.title("Predicting Diabetes Web App")
st.sidebar.title("Model Selection Panel")
st.markdown("Affected by Diabetes or not ?")
st.sidebar.markdown("Choose your model and its parameters")
The title function adds a title to our web app, and the sidebar creates a side panel for the app. Streamlit also supports Markdown text through the markdown function.
Checkbox
Let us now learn how to add a checkbox to the app's sidebar. In this app, we use the checkbox to decide whether the raw data loaded from our CSV file is displayed. The syntax for adding a checkbox is: checkbox(<label for our checkbox>, <True/False>).
def load_data():
    data = pd.read_csv("diabetes.csv")
    return data

df = load_data()
if st.sidebar.checkbox("Show raw data", False):
    st.subheader("Diabetes Raw Dataset")
    st.write(df)
Since the default value passed to the checkbox is False, the checkbox is unchecked when the web app loads. After running the web app, the checkbox appears in the sidebar, and ticking it displays the raw dataset.
Caching
Streamlit provides a caching feature so that data is not reloaded every time the app reruns. Unless the data source changes, the data is served from the cache, which saves CPU cycles and time. In the words of the documentation, it is a caching mechanism that "allows your app to stay performant even when loading data from the web, manipulating large datasets, or performing expensive computations" [1]. This is done with the decorator @st.cache, which is added before the function that needs caching. In our case, it is added to the data-loading function.
# @st.cache(allow_output_mutation=True)
@st.cache(persist=True)
def load_data():
    data = pd.read_csv("diabetes.csv")
    return data
Streamlit expects that the data returned by a function decorated with cache is not mutated, i.e. the cached data should not be changed while the app runs. To bypass this check, st.cache provides the option allow_output_mutation=True, along with other options that are described on the official site.
Drop-down
The selectbox adds a drop-down box to the app. In our app, we use it to select the classifier used to build the machine learning model. The syntax is: selectbox(<label for the selectbox>, (<options shown in the drop-down>)).
st.sidebar.subheader("Select your Classifier")
classifier = st.sidebar.selectbox("Classifier", ("Decision Tree", "Support Vector Machine (SVM)", "Logistic Regression", "Random Forest"))
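The selectbox returns the chosen label as a string, so the app has to map that label to the right set of model parameters. One possible way to do this, sketched here in plain Python, is a dictionary lookup. The names PARAMETERS_BY_CLASSIFIER and parameters_for are hypothetical, and the parameter names are those the complete script at the end of this article asks for.

```python
# Hypothetical mapping from a drop-down label to the hyperparameters
# that the sidebar would then ask for.
PARAMETERS_BY_CLASSIFIER = {
    "Decision Tree": ("criterion", "splitter"),
    "Logistic Regression": ("C", "max_iter"),
    "Random Forest": ("n_estimators", "max_depth", "bootstrap"),
}


def parameters_for(classifier):
    """Return the hyperparameter names for the selected classifier label."""
    try:
        return PARAMETERS_BY_CLASSIFIER[classifier]
    except KeyError:
        raise ValueError(f"No parameters registered for {classifier!r}") from None


selected = "Random Forest"  # stands in for the value returned by the selectbox
print(parameters_for(selected))  # ('n_estimators', 'max_depth', 'bootstrap')
```

A label without an entry, such as the SVM option, raises a ValueError here instead of being silently ignored.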
Multiselect
The multiselect widget allows the user to make multiple selections in a list box. In our app, we use this widget to choose the metrics for evaluating our machine learning model: the confusion matrix, the ROC curve and the precision-recall curve. The multiselect syntax is very similar to that of the selectbox: multiselect(<label for the multiselect>, (<options to display>)). Now, let us include it in our app and re-run the app to see the changes.
metrics = st.sidebar.multiselect("Select your metrics : ", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
Radio
This widget adds radio buttons to our app. The syntax is: radio(<label for the radio button>, <options to be displayed>, <index of the option pre-selected on first render>, <function that formats the displayed option labels>, key=<unique key for the widget, which can be used to refer to it in other parts of our app>). In our app, one place where we use radio buttons is in selecting the criterion the decision tree algorithm uses to split.
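As background on the two criterion options, "gini" and "entropy", the following plain-Python sketch computes both impurity measures for the labels that fall into one tree node. It is an illustration with made-up labels, not the scikit-learn implementation, and the function names are chosen here for clarity.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the fraction of samples that belongs to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: sum(-p_k * log2(p_k))."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


# Made-up node contents: 1 = diabetic, 0 = non-diabetic.
mixed_node = [1, 1, 0, 0]
pure_node = [1, 1, 1, 1]

print(gini(mixed_node), entropy(mixed_node))  # 0.5 1.0
print(gini(pure_node), entropy(pure_node))    # 0.0 0.0
```

Both measures are zero for a node that contains a single class and largest when the classes are evenly mixed, which is why either can be used to score a candidate split.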
criterion = st.sidebar.radio("Criterion (measures the quality of split)", ("gini", "entropy"), key='criterion')
Slider
Let us discuss how to add a slider widget to our app. The syntax is as follows: slider(<label for the slider>, <minimum value of the slider>, <maximum value of the slider>, <value displayed on first render>, step=<interval by which the slider increases/decreases>, key=<unique key for the slider>). In our app, we use a slider to set the maximum number of iterations for logistic regression, while the regularization parameter C is entered through a number input box.
Number Input box
This displays a numeric input widget in which users can enter numbers.
number_input(<label for the widget>, <minimum value for the widget>, <maximum value for the widget>, <default value of the widget when it is rendered for the first time>, step=<interval by which the value increases/decreases>, format=<the format in which the widget displays numbers>, key=<unique key for the widget>)
E.g.:
n_estimators = st.sidebar.number_input("The number of trees in the forest", 100, 5000, step=10, key='n_estimators')
Button
This displays a clickable button in our app, which can be added with the syntax: button(<description of the button's purpose>, key=<unique ID for the button>).
st.sidebar.button("Classify", key='classify')
I have discussed only a few widgets; you can find more, such as the progress bar and text_input, in the official Streamlit documentation. The entire Python script for the app is below; you can execute it directly after installing Streamlit in your Python environment.
############ Import the required libraries ##################
import streamlit as st
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
# Note: these plot_* helpers were removed in scikit-learn 1.2; newer releases offer
# ConfusionMatrixDisplay, RocCurveDisplay and PrecisionRecallDisplay (from_estimator) instead.
from sklearn.metrics import plot_confusion_matrix, plot_roc_curve, plot_precision_recall_curve
from sklearn.metrics import precision_score, recall_score


def main():
    st.title("Predicting Diabetes Web App")
    st.sidebar.title("Model Selection Panel")
    st.markdown("Affected by Diabetes or not ?")
    st.sidebar.markdown("Choose your model and its parameters")

    # @st.cache(allow_output_mutation=True)
    # Note: newer Streamlit releases deprecate st.cache in favor of st.cache_data.
    @st.cache(persist=True)
    def load_data():
        # Load our dataset
        data = pd.read_csv("diabetes.csv")
        return data

    def split(df):
        # Split the data into train and test sets
        y = df.Outcome
        # Every column except the target is used as a feature
        x = df.drop(columns=['Outcome'])
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
        return x_train, x_test, y_train, y_test

    def plot_metrics(metrics_list):
        # Draw each metric the user selected for the fitted model
        if 'Confusion Matrix' in metrics_list:
            st.subheader("Confusion Matrix")
            plot_confusion_matrix(model, x_test, y_test, display_labels=class_names)
            st.pyplot()
        if 'ROC Curve' in metrics_list:
            st.subheader("ROC Curve")
            plot_roc_curve(model, x_test, y_test)
            st.pyplot()
        if 'Precision-Recall Curve' in metrics_list:
            st.subheader('Precision-Recall Curve')
            plot_precision_recall_curve(model, x_test, y_test)
            st.pyplot()

    df = load_data()
    # Outcome 0 = non-diabetic, 1 = diabetic, so the labels follow that order
    class_names = ['Non-Diabetic', 'Diabetic']
    x_train, x_test, y_train, y_test = split(df)

    st.sidebar.subheader("Select your Classifier")
    classifier = st.sidebar.selectbox("Classifier", ("Decision Tree", "Logistic Regression", "Random Forest"))

    if classifier == 'Decision Tree':
        st.sidebar.subheader("Model parameters")
        # Choose parameters
        criterion = st.sidebar.radio("Criterion (measures the quality of split)", ("gini", "entropy"), key='criterion')
        splitter = st.sidebar.radio("Splitter (How to split at each node?)", ("best", "random"), key='splitter')
        metrics = st.sidebar.multiselect("Select your metrics : ", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
        if st.sidebar.button("Classify", key='classify'):
            st.subheader("Decision Tree Results")
            model = DecisionTreeClassifier(criterion=criterion, splitter=splitter)
            model.fit(x_train, y_train)
            accuracy = model.score(x_test, y_test)
            y_pred = model.predict(x_test)
            st.write("Accuracy: ", round(accuracy * 100, 2), "%")
            st.write("Precision: ", round(precision_score(y_test, y_pred), 2))
            st.write("Recall: ", round(recall_score(y_test, y_pred), 2))
            plot_metrics(metrics)

    if classifier == 'Logistic Regression':
        st.sidebar.subheader("Model Parameters")
        C = st.sidebar.number_input("C (Regularization parameter)", 0.01, 10.0, step=0.01, key='C_LR')
        max_iter = st.sidebar.slider("Maximum number of iterations", 100, 500, key='max_iter')
        metrics = st.sidebar.multiselect("Select your metrics?", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
        if st.sidebar.button("Classify", key='classify'):
            st.subheader("Logistic Regression Results")
            model = LogisticRegression(C=C, penalty='l2', max_iter=max_iter)
            model.fit(x_train, y_train)
            accuracy = model.score(x_test, y_test)
            y_pred = model.predict(x_test)
            st.write("Accuracy: ", round(accuracy * 100, 2), "%")
            st.write("Precision: ", round(precision_score(y_test, y_pred), 2))
            st.write("Recall: ", round(recall_score(y_test, y_pred), 2))
            plot_metrics(metrics)

    if classifier == 'Random Forest':
        st.sidebar.subheader("Model Hyperparameters")
        n_estimators = st.sidebar.number_input("The number of trees in the forest", 100, 5000, step=10, key='n_estimators')
        # Each widget needs its own key; the original reused 'n_estimators' here
        max_depth = st.sidebar.number_input("The maximum depth of the tree", 1, 20, step=1, key='max_depth')
        # The radio returns a string, so convert it to a real boolean
        bootstrap = st.sidebar.radio("Bootstrap samples when building trees", ('True', 'False'), key='bootstrap') == 'True'
        metrics = st.sidebar.multiselect("What metrics to plot?", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
        if st.sidebar.button("Classify", key='classify'):
            st.subheader("Random Forest Results")
            model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, bootstrap=bootstrap, n_jobs=-1)
            model.fit(x_train, y_train)
            accuracy = model.score(x_test, y_test)
            y_pred = model.predict(x_test)
            st.write("Accuracy: ", round(accuracy * 100, 2), "%")
            st.write("Precision: ", round(precision_score(y_test, y_pred), 2))
            st.write("Recall: ", round(recall_score(y_test, y_pred), 2))
            plot_metrics(metrics)

    if st.sidebar.checkbox("Show raw data", False):
        st.subheader("Diabetes Raw Dataset")
        st.write(df)


if __name__ == '__main__':
    main()
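The precision and recall values that the script prints come from comparing the predictions with the true labels. As a rough illustration, the plain-Python sketch below derives both from the four confusion matrix counts. The labels are made up, the function names are chosen here for clarity, and returning 0.0 when a denominator is zero is a choice made for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for eight patients: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

Here one diabetic patient is missed (a false negative) and one healthy patient is flagged (a false positive), so both precision and recall come out at 3 out of 4.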
References
[1] https://docs.streamlit.io/en/stable/getting_started.html
[2] https://www.datacamp.com/community/tutorials/decision-tree-classification-python
[3] https://www.kaggle.com/uciml/pima-indians-diabetes-database?select=diabetes.csv
[4] https://www.coursera.org/projects/data-science-streamlit-python
[5] https://www.coursera.org/projects/machine-learning-streamlit-python
Original article: https://medium.com/analytics-vidhya/build-your-basic-machine-learning-web-app-with-streamlit-60e29e43f5f7