Performance Metrics for Machine Learning Models
There are various metrics we can use to evaluate the performance of ML algorithms, for classification as well as regression. We must choose the metrics for evaluating ML performance carefully because:
- How the performance of ML algorithms is measured and compared will depend entirely on the metric we choose.
- How we weight the importance of various characteristics in the result will be influenced completely by the metric we choose.
The metrics that you choose to evaluate your machine learning model are very important. Choice of metrics influences how the performance of machine learning algorithms is measured and compared.
Contents
1. Performance Metrics for Classification Problems
2. Performance Metrics for Regression Problems
3. Distribution of Errors
Performance Metrics for Classification Problems
1. Accuracy
Accuracy is the most intuitive performance measure: it is simply the ratio of correctly predicted observations to the total observations.
As a heuristic, or rule of thumb, accuracy can tell us immediately whether a model is being trained correctly and how it may perform generally. However, it does not give detailed information regarding its application to the problem.
High accuracy alone does not mean our model is the best. Accuracy is a great measure, but only when we have balanced datasets where the numbers of positive and negative examples are almost the same.
When the data is imbalanced, accuracy is not the best measure, and accuracy cannot make use of probability scores.
Ex: In our Amazon food review sentiment analysis example with 100 reviews, only 10 people have said the review is positive. Let's assume our model is very bad and predicts every review as negative. It then classifies the 90 actual negative reviews correctly and the 10 actual positive reviews as negative. Even though the model is terrible at finding positive reviews, the accuracy of this bad model is still 90%.
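A minimal sketch of this pitfall with scikit-learn's accuracy_score (the labels are invented for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for 100 reviews: 10 positive (1), 90 negative (0).
y_true = np.array([1] * 10 + [0] * 90)

# A useless model that predicts "negative" for every single review.
y_pred = np.zeros(100, dtype=int)

# Accuracy is 0.90 even though the model never finds a positive review.
print(accuracy_score(y_true, y_pred))  # 0.9
```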
2. Confusion Matrix
The confusion matrix is one of the most intuitive and easiest metrics used for finding the correctness and accuracy of a model. It is used for classification problems where the output can be of two or more classes.
The confusion matrix cannot process probability scores; it works on the hard class labels.
A confusion matrix is an N × N matrix, where N is the number of classes being predicted. For the problem at hand, N = 2, and hence we get a 2 × 2 matrix.
Let's assume the following labels for the target variable in our Amazon food reviews example:
1: when a person says the review is positive.
0: when a person says the review is negative.
The confusion matrix is a table with two dimensions ("Actual" and "Predicted") and a set of classes in both dimensions. Our actual classes are the rows, and the predicted ones are the columns.
The Confusion matrix in itself is not a performance measure as such, but almost all of the performance metrics are based on the Confusion Matrix and the numbers inside it.
The terms associated with the confusion matrix are explained as follows:
True Negatives (TN): the case when both the actual class and the predicted class of a data point are 0.
Ex: A review that is actually negative (0) and that the model classifies as negative (0) is a True Negative.
False Positives (FP): the case when the actual class of a data point is 0 and the predicted class is 1.
Ex: A review that is actually negative (0) but that the model classifies as positive (1) is a False Positive.
False Negatives (FN): the case when the actual class of a data point is 1 and the predicted class is 0.
Ex: A review that is actually positive (1) but that the model classifies as negative (0) is a False Negative.
True Positives (TP): the case when both the actual class and the predicted class of a data point are 1.
Ex: A review that is actually positive (1) and that the model classifies as positive (1) is a True Positive.
N is the total number of negatives in our data and P is the total number of positives.
In terms of the confusion matrix, accuracy in classification problems is the number of correct predictions made by the model over all predictions made:
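$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$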
We can use the accuracy_score function of sklearn.metrics to compute the accuracy of our classification model.
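A minimal sketch, assuming scikit-learn; the labels are invented for illustration:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (1 = positive review, 0 = negative).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                  # 3 1 1 3

print(accuracy_score(y_true, y_pred))  # (TP + TN) / total = 6/8 = 0.75
```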
For a good model, the True Positive Rate and True Negative Rate should be high, and the False Positive Rate and False Negative Rate should be low.
Some examples: a False Positive (FP) in an anti-spam engine moves a trusted email to the junk folder; a False Negative (FN) in medical screening can incorrectly indicate the absence of disease when the patient is actually positive.
Precision, Recall (or Sensitivity), Specificity, F1-Score
Precision and Recall are extensively used in information retrieval problems when we have a large corpus of text data.
Precision: Of all the points the model predicted to be positive, precision tells us what percentage are actually positive.
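In terms of the confusion matrix:

$$\text{Precision} = \frac{TP}{TP + FP}$$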
Precision is about being precise. So even if we managed to capture only one cancer case, and we captured it correctly, then we are 100% precise.
Recall (or Sensitivity, or True Positive Rate): Of all the points that actually belong to the positive class, recall tells us how many the model predicted as positive.
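In terms of the confusion matrix:

$$\text{Recall} = \text{TPR} = \frac{TP}{TP + FN}$$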
Recall is not so much about capturing cases correctly but more about capturing all cases that have "cancer" with the answer "cancer". So if we simply label every case as "cancer", we have 100% recall.
So basically, if we want to focus more on minimizing False Negatives, we would want our recall to be as close to 100% as possible without precision being too bad; and if we want to focus on minimizing False Positives, then our focus should be to make precision as close to 100% as possible.
It is clear that recall gives us information about a classifier's performance with respect to false negatives (how many positives we missed), while precision gives us information about its performance with respect to false positives (how many of the predicted positives were wrong).
Specificity (or True Negative Rate): Specificity, in contrast to recall, may be defined as the fraction of actual negatives that our ML model correctly identifies as negative. We can easily calculate it from the confusion matrix with the following formula:
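$$\text{Specificity} = \text{TNR} = \frac{TN}{TN + FP}$$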
F1-Score: We don’t really want to carry both Precision and Recall in our pockets every time we make a model for solving a classification problem. So it’s best if we can get a single score that kind of represents both Precision(P) and Recall(R).
This score gives us the harmonic mean of precision and recall. Mathematically, the F1-score is the harmonic mean of precision and recall, in which precision and recall make an equal relative contribution. The best value of the F1-score is 1 and the worst is 0.
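In symbols:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$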
For multi-class classification, we use similar F1-based metrics. These are:
Micro F1-Score: Micro F1-score (short for micro-averaged F1 score) is used to assess the quality of multi-label binary problems. It measures the F1-score of the aggregated contributions of all classes.
If you are looking to select a model based on a balance between precision and recall, don’t miss out on assessing your F1-scores.
A micro F1-score of 1 is the best value (perfect micro-precision and micro-recall), and the worst value is 0. Note that precision and recall have the same relative contribution to the F1-score.
In the formulas below, C is the number of classes and k indexes the classes.
Micro F1-score is defined as the harmonic mean of the precision and recall:
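Pooling the per-class counts over all classes:

$$\text{Micro-Precision} = \frac{\sum_{k=1}^{C} TP_k}{\sum_{k=1}^{C} (TP_k + FP_k)}, \qquad \text{Micro-Recall} = \frac{\sum_{k=1}^{C} TP_k}{\sum_{k=1}^{C} (TP_k + FN_k)}$$

$$\text{Micro-}F_1 = 2 \cdot \frac{\text{Micro-Precision} \cdot \text{Micro-Recall}}{\text{Micro-Precision} + \text{Micro-Recall}}$$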
Micro-averaging F1-score is performed by first calculating the sum of all true positives, false positives, and false negatives over all the labels. Then we compute the micro-precision and micro-recall from the sums. And finally, we compute the harmonic mean to get the micro F1-score.
Micro-averaging will put more emphasis on the common labels in the data set since it gives each sample the same importance. This may be the preferred behavior for multi-label classification problems.
Macro F1-Score: Macro F1-score (short for macro-averaged F1 score) is used to assess the quality of problems with multiple binary labels or multiple classes.
Macro F1-score is defined as the average of the per-class F1-scores, i.e., the harmonic mean of precision and recall computed for each class:
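$$\text{Macro-}F_1 = \frac{1}{C} \sum_{k=1}^{C} 2 \cdot \frac{P_k \cdot R_k}{P_k + R_k}$$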
Here C is the number of classes, k indexes the classes, and P_k and R_k denote the precision and recall of class k.
Macro F1-score will give the same importance to each label/class. It will be low for models that only perform well on the common classes while performing poorly on the rare classes.
Hamming Loss: Hamming loss is the fraction of wrong labels out of the total number of labels. In multi-class classification, the Hamming loss is calculated as the Hamming distance between the actual and predicted labels.
This is a loss function, so the optimal value is zero.
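A minimal sketch of these multi-class metrics, assuming scikit-learn; the labels are invented for illustration:

```python
from sklearn.metrics import f1_score, hamming_loss

# Hypothetical predictions for a 3-class problem (classes 0, 1, 2).
y_true = [0, 1, 2, 0, 1, 2, 0, 2]
y_pred = [0, 2, 1, 0, 0, 2, 0, 2]

# Micro-averaging pools TP/FP/FN over all classes before computing F1;
# macro-averaging computes F1 per class and then takes the plain average.
print(f1_score(y_true, y_pred, average="micro"))
print(f1_score(y_true, y_pred, average="macro"))

# Hamming loss: fraction of wrongly predicted labels (0 is optimal).
print(hamming_loss(y_true, y_pred))
```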
3. Receiver Operating Characteristic (ROC) Curve
Receiver-operating characteristic (ROC) analysis was originally developed during World War II to analyze classification accuracy in differentiating signals from noise in radar detection. Recently, the methodology has been adapted to several clinical areas heavily dependent on screening and diagnostic tests, in particular, laboratory testing, epidemiology, radiology, and bioinformatics.
A Receiver Operating Characteristic (ROC) Curve is a way to compare diagnostic tests. It is a plot of the True Positive Rate against the False Positive Rate.
AUC (Area Under Curve)-ROC (Receiver Operating Characteristic) is a performance metric, based on varying threshold values, for classification problems. As the name suggests, ROC is a probability curve, and AUC measures the separability.
ROC is used in binary classification. To compute the ROC curve, we do the following (a code sketch follows this list):
- Get the classification model's probability predictions. The probabilities usually range between 0 and 1.
- Note that AUC does not care about the absolute probability scores; it only cares about the sorted order of the data.
- Sort the data by the predicted scores.
- The next step is to pick thresholds at which to classify the probabilities.
- To plot the ROC curve, we calculate the TPR and FPR for the different thresholds using a confusion matrix.
- For each threshold, we plot the FPR value on the x-axis and the TPR value on the y-axis. We then join the dots with a line.
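A minimal sketch, assuming scikit-learn (which performs the threshold sweep internally); the labels and scores are invented:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])

# roc_curve sweeps the thresholds and returns the (FPR, TPR) pairs to plot.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr)

# AUC summarizes the whole curve; it depends only on the ranking of scores.
print(roc_auc_score(y_true, y_score))
```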
In simple words, the AUC-ROC metric will tell us about the capability of the model in distinguishing the classes. Higher the AUC, the better the model.
The ROC curve is plotted with TPR against the FPR where TPR is on the y-axis and FPR is on the x-axis.
An excellent model has an AUC near 1, which means it has a good measure of separability. A poor model has an AUC near 0, which means it has the worst measure of separability; in fact, it is reversing the result, predicting 0s as 1s and 1s as 0s. And when the AUC is 0.5, the model has no class-separation capacity whatsoever.
4. Log Probability (Log Loss)
Log Loss is the most important classification metric based on probabilities.
If the model gives us probability scores, log loss is the best performance measure for both binary and multi-class classification.
The goal of our machine learning models is to minimize this value. A perfect model would have a log loss of 0.
It’s hard to interpret raw log-loss values, but log-loss is still a good metric for comparing models. For any given problem, a lower log-loss value means better predictions.
Log loss quantifies the average difference between predicted and expected probability distributions.
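For the binary case, the standard definition is (with y_i the true label and p_i the predicted probability of class 1):

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$

A minimal sketch with scikit-learn's log_loss (labels and probabilities are invented for illustration):

```python
from sklearn.metrics import log_loss

# Hypothetical true labels and predicted probabilities of class 1.
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.35]

# Lower is better; a perfect model would score 0.
print(log_loss(y_true, y_prob))
```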
Performance Metrics for Regression Problems
1. R2 or Coefficient of Determination
R2 is known as the coefficient of determination. It is a statistical measure of how close the data are to the fitted regression line, or in other words, it indicates the goodness of fit of a set of predictions to the actual values. The value of R2 lies between 0 and 1, where 0 means no fit and 1 means a perfect fit.
R-squared is calculated by dividing the sum of squares of residuals (SSres) from the regression model by the total sum of squares of errors (SStot) from the average model, and then subtracting the result from 1:
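$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$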
Here SSres (also written SSE) is the sum of squares of residuals, where a residual is the difference between the predicted value and the actual value; it is also called an error.
And SStot (also written SST) is the total sum of squared errors using a simple mean model.
An R-squared value of 0.81 tells us that the input variables explain 81% of the variation in the output variable. The higher the R-squared, the more variation is explained by the input variables and the better the model.
2. Adjusted R2
The limitation of R-squared is that it will either stay the same or increase with the addition of more variables, even if they do not have any relationship with the output variables.
To overcome this limitation, Adjusted R-squared comes into the picture, as it penalizes you for adding variables that do not improve your existing model.
Adjusted R2 conveys the same meaning as R2 but is an improvement on it. R2 suffers from the problem that the score improves as more terms are added even when the model is not improving, which may mislead the researcher. Adjusted R2 is always lower than R2, as it adjusts for the increasing number of predictors and only shows improvement if there is a real improvement:
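$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

where n is the number of samples and k is the number of predictors.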
Hence, if you are building Linear regression on multiple variables, it is always suggested that you use Adjusted R-squared to judge the goodness of the model.
3. Mean Squared Error (MSE)
MSE or Mean Squared Error is one of the most preferred metrics for regression tasks. It is simply the average of the squared differences between the target values and the values predicted by the regression model.
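In symbols, with y_i the actual value and ŷ_i the model's prediction:

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$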
As it squares the differences, it penalizes even small errors, which can lead to an over-estimation of how bad the model is. It is preferred over other metrics because it is differentiable and hence can be optimized better.
Here, the error term is squared and thus more sensitive to outliers.
4. Root Mean Squared Error (RMSE)
RMSE is the most widely used metric for regression tasks and is the square root of the averaged squared difference between the target value and the value predicted by the model.
Because MSE is made up of squared error terms, we take the square root of the MSE, which gives us the Root Mean Squared Error (RMSE):
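$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$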
RMSE is highly affected by outlier values. Hence, make sure you’ve removed outliers from your data set prior to using this metric.
5. Mean Absolute Error (MAE)
It is the simplest error metric used in regression problems. It is basically the average of the absolute differences between the predicted and actual values:
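$$MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$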
In simple words, with MAE we can get an idea of how wrong the predictions were.
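A minimal sketch of R2, MSE, RMSE, and MAE with scikit-learn (the values are invented for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values from a regression model.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

print(r2_score(y_true, y_pred))                      # R-squared
print(mean_squared_error(y_true, y_pred))            # MSE
print(np.sqrt(mean_squared_error(y_true, y_pred)))   # RMSE
print(mean_absolute_error(y_true, y_pred))           # MAE
```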
6. Median Absolute Deviation Error (MADE)
Median Absolute Deviation: The median absolute deviation (MAD) is a robust measure of how spread out a set of data is. The variance and standard deviation are also measures of spread, but they are more affected by extremely high or extremely low values and by non-normality.
First, find the median of the errors; then subtract this median from each error; then take the absolute values of these differences; finally, find the median of these absolute differences:
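$$MAD = \text{median}\left(\left| e_i - \text{median}(e) \right|\right)$$

where e_i are the individual prediction errors.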
Distribution of Errors
To understand the errors, we compute the error for every point and look at the distribution of these errors as a PDF and a CDF.
The probability distribution for a random error that is as likely to move the value in either direction is called a Gaussian distribution.
In the error PDF, most of the errors are small and very few errors are large; smaller errors are better for regression. In the error CDF, we can read off statements such as: 99% of errors are < 0.1 and 1% of errors are ≥ 0.1.
If we compare the error CDFs of two models, where model M1 has 95% of its errors below 0.1 and model M2 has 80% of its errors below 0.1, we conclude that M1 is better than M2.
I implemented some of these classification and regression metrics from scratch; for the complete code, visit my GitHub link.
Conclusion
In this post, we covered the various metrics used in classification and regression analysis in machine learning.
Translated from: https://medium.com/analytics-vidhya/performance-metrics-for-machine-learning-models-80d7666b432e
- 下一篇: tkmapper教程_tkmapper