Which Evaluation Metric Should You Use in Machine Learning Regression Problems?
If you’re like me, you might have used the R-Squared (R2), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) evaluation metrics in your regression problems without giving them a lot of thought. 🤔
Although all of them are common metrics, it’s not obvious which one to use when. After writing this article I have a new favorite and a new plan for reporting them going forward. 😀
I’ll share those conclusions with you in a bit. First, we’ll dig into each metric. You’ll learn the pros and cons of each for model selection and reporting. Let’s get to it! 🚀
R-Squared (R2)
R2 represents the proportion of variance explained by your model.
R2 is a relative metric, so you can use it to compare with other models trained on the same data. You can also use it to get a rough feel for how well a model performs in general.
Disclaimer: This article isn’t a review of machine learning methods, but make sure you use different data for training, validation, and testing. You always want to hold out some data that your model has not seen to evaluate its performance. Also, it’s a good idea to look at a plot of your model’s predictions vs. the actual values to see how well your model fits the data.
Let’s see how R2 is computed. Onward!
Formula and code
Here is one way to formulate R2.
1 - (SSE/SST)
SSE is the sum of squared errors; the sum of the squared differences between the actual values and predicted values.
SST is the total sum of squares (shown sometimes as TSS); the sum of the squared differences between the actual values and the mean of the actual values.
With more mathy notation:
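$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

Here $y_i$ is an actual value, $\hat{y}_i$ is the matching predicted value, and $\bar{y}$ is the mean of the actual values.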
Here’s what the code looks like, adapted from scikit-learn, the primary Python machine learning library.
numerator = ((y_true - y_pred) ** 2).sum()
denominator = ((y_true - np.average(y_true)) ** 2).sum()
r2_score = 1 - (numerator / denominator)
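The calculation above can be wrapped in a small self-contained function. This is only a sketch: the helper name `r_squared` and the toy arrays are made up for illustration and are not the article's dataset or part of scikit-learn.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R2 as 1 - SSE/SST (hypothetical helper, not a scikit-learn function)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = ((y_true - y_pred) ** 2).sum()         # sum of squared errors
    sst = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1 - sse / sst


# Made-up values, for illustration only
y_true = [1, 2, 3, 4]
y_pred = [1.5, 2, 3, 3.5]
r2 = r_squared(y_true, y_pred)
print(r2)
```

Note that this sketch does not guard against a constant `y_true`, where SST is 0 and the ratio is undefined.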
In words

- subtract the predicted values from the actual y values
- square the results
- sum them

That’s the numerator.

- subtract the mean of the actual y values from each actual y value
- square the results
- sum them

That’s the denominator.

1 minus the numerator divided by the denominator is the R2. 🎉
R2 is the default metric for scikit-learn regression problems: it is what the score method of a scikit-learn regressor returns. If you want to use it explicitly you can import it and then use it like this:
from sklearn.metrics import r2_score
r2_score(y_true, y_pred)

Interpretation
A model that explains no variance would have an R2 of 0. A model with an R2 of 1 would explain all of the variance. Higher scores are better.
However, if your R2 is 1 on your test set you are probably leaking information or the problem is fairly simple for your model to learn. 👍
In some fields, such as the social sciences, there are lots of factors that influence human behavior. Say you have a model with just a few independent variables that results in an R2 close to .5. Your model is able to account for half of the variance in your data, and that’s quite good. 😀
It is possible to have an R2 that is negative. Negative scores occur when the model’s predictions fit the data worse than simply predicting the mean of the output values. Predicting the mean each time is the null model.
Example
Say you have the following small toy test dataset:
All code is available on GitHub in this Jupyter notebook.
Here’s a plot of the actual and predicted y values.
The R2 of the model is 0.71. The model is accounting for 71% of the variance in the data. That’s not too shabby, although we’d like more test data. 😀
As another example, let’s say the true values for y are [55, 2, 3]. The mean is 20. Predicting 20 for each y value results in an R2 of 0.
A model that predicts [1, 2, 2] for the true values above results in an R2 of -0.59. Bottom line, you can do far worse than the null model! In fact, R2 has no lower bound: the worse the predictions, the more negative it gets. 😲
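The two cases above can be checked with scikit-learn's `r2_score`. A minimal sketch, assuming scikit-learn is installed; the variable names are chosen here for illustration.

```python
from sklearn.metrics import r2_score

y_true = [55, 2, 3]

# Null model: predict the mean of the actual values (20) every time
r2_null = r2_score(y_true, [20, 20, 20])

# A model whose predictions fit the data worse than the mean does
r2_worse = r2_score(y_true, [1, 2, 2])

print(round(r2_null, 2), round(r2_worse, 2))
```

The large miss on the first value (predicting 1 when the actual value is 55) is what pushes the second score below zero.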
As a brief aside, let’s look at the Adjusted R2 and machine learning vs. statistics.
Adjusted R2
The Adjusted R2 accounts for the addition of more predictor variables (features).
Adjusted R2 will only increase with a new predictor variable when that variable improves the model performance more than would be expected by chance. Adjusted R2 helps you focus on using the most parsimonious model possible. 😉
The Adjusted R2 is more common in statistical inference than in machine learning. Scikit-learn, the primary Python library for machine learning, doesn’t even have an Adjusted R2 metric. Statsmodels, the primary statistical library for Python, does. If you want to learn more about when to use which Python library for data science, I wrote a guide here.
You can compute the Adjusted R2 if you know the number of feature columns (p) and the number of observations (n). Here’s the code:
adjusted_r2 = 1 - ((1 - r2) * (n - 1)) / (n - p - 1)

n - 1 is the degrees of freedom (of the total sum of squares; n - p - 1 is the degrees of freedom of the residuals). Whenever you hear that term, you know you are in statistics land. In machine learning we generally care most about predictive ability, so R2 is favored over Adjusted R2.
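To see the penalty for extra features, the formula can be sketched as a function. The name `adjusted_r2` and the numbers below are illustrative assumptions, not values from the article's dataset.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R2 from plain R2, n observations, and p feature columns."""
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    return 1 - ((1 - r2) * (n - 1)) / (n - p - 1)


# Made-up numbers: the same R2 with more features gives a lower adjusted R2
few_features = adjusted_r2(0.9, n=10, p=2)
many_features = adjusted_r2(0.9, n=10, p=5)
print(few_features, many_features)
```

With the R2 held fixed, the score drops as p grows, which is the parsimony pressure described above.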
Another note on statistics vs. machine learning: our focus is on machine learning, so prediction rather than causality. R2, like the other metrics that we’ll see, doesn’t say anything about causality by itself.
Bottom Line
R2 tells you how much variance your model accounts for. It’s handy because the R2 for any regression problem will immediately provide some (limited) understanding of how well the model is performing. 😀
R2 is a relative metric. Let’s see a few absolute metrics now.
Root Mean Squared Error (RMSE)
RMSE is a very common evaluation metric. It can range between 0 and infinity. Lower values are better. To keep this straight, remember that it has error in the name and you want errors to be low.
Formula and code
The RMSE can be formulated like this:
square root of mean SSE
We saw SSE in the R2 score metric. It’s the sum of squared errors; the sum of the squared differences between the actual values and predicted values.
More mathy formula:
$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_i (y_i - \hat{y}_i)^2}$
In Python code:
np.sqrt(np.mean((y_true - y_pred) ** 2))

In words
- subtract the predicted values from the actual y values
- square the results
- sum them
- take the average
- take the square root
Here’s how to get the RMSE with a function in scikit-learn:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_true, y_pred, squared=False)

You can use the squared=False argument as of scikit-learn version 0.22.0. Prior to that you had to take the square root yourself like this: np.sqrt(mean_squared_error(y_actual, y_predicted)).
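Because support for the `squared` argument depends on the installed scikit-learn release, a sketch that takes the square root manually is the most portable. The arrays below are made up for illustration; the NumPy line and the scikit-learn line should agree.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values, for illustration only
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 7.0])

# RMSE straight from the formula
rmse_numpy = np.sqrt(np.mean((y_true - y_pred) ** 2))

# RMSE as the square root of scikit-learn's MSE
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse_numpy, rmse_sklearn)
```

The errors here are 1, 0, -2, and 0, so the mean squared error is 1.25 and the RMSE is its square root.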
Interpretation
Use RMSE if you want to:
- penalize large errors
- have the result be in the same units as the outcome variable
- use a loss function for validation that can be quickly computed
You could use the Mean Squared Error (MSE) with no Root, but then the units are not as easily comprehensible. Just take the square root of the MSE and you’ve got the RMSE. 👍
In this excellent article JJ points out some issues with RMSE. Namely, that “RMSE does not necessarily increase with the variance of the errors. RMSE increases with the variance of the frequency distribution of error magnitudes.”
Also, the RMSE is not so easily interpreted. The units might look familiar, but you are squaring differences. You can’t just say that an RMSE of 10 means you are off by 10 units on average, although that’s kind of how most folks think of the result. At least, it’s how I used to. 😉
Example
Turning to our example dataset again:
The RMSE is 0.48. The mean of the actual y values is 2.2. Together, that information tells us that the model is probably somewhere between great and terrible. It’s hard to do too much with this RMSE statistic without more context. 😐
Bottom Line
RMSE is an imperfect statistic for evaluation, but it’s very common. If you care a lot about penalizing large errors, it’s not a bad choice. It’s a great choice for a loss metric when hyperparameter tuning or batch training a deep neural network.
Mean Absolute Error
Mean Absolute Error (MAE) is the average of the absolute value of the errors.
Formula and code
Let’s get right to the math equation:
$\mathrm{MAE} = \frac{1}{n} \sum_i |y_i - \hat{y}_i|$
In code:
np.average(np.abs(y_true - y_pred))

In words
- subtract the predicted values from the actual y values
- take the absolute value of each error
- sum them
- take the average
Here’s how to get the MAE with a scikit-learn function:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_actual, y_predicted)

Interpretation
The MAE is conceptually the easiest evaluation metric for regression problems. It answers the question, “How far were you off in your predictions, on average?”
The units make intuitive sense. Yes! 🎉
For example, say you are predicting house sale prices and the mean actual sale price in the test set is $500,000. A MAE of $10,000 means the model was off by an average of $10k in its predictions. That’s not bad! 😀
Unlike RMSE scores, bad predictions don’t result in disproportionately high MAE scores.
The MAE is always less than or equal to the RMSE. The two are equal only when every error has the same magnitude.
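A small comparison makes the difference concrete. This is a sketch with made-up numbers and hypothetical helper names (`mae`, `rmse`): both prediction sets have the same MAE, but the one with a single large miss has a higher RMSE.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error (hypothetical helper)."""
    return np.mean(np.abs(y_true - y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error (hypothetical helper)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


# Made-up values, for illustration only
y_true = np.array([10.0, 10.0, 10.0, 10.0])
even = np.array([12.0, 8.0, 12.0, 8.0])       # every error has magnitude 2
one_big = np.array([10.0, 10.0, 10.0, 18.0])  # a single error of magnitude 8

print(mae(y_true, even), rmse(y_true, even))
print(mae(y_true, one_big), rmse(y_true, one_big))
```

When all errors have the same size the two metrics coincide; concentrating the same total error in one prediction leaves MAE unchanged and raises RMSE.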
Note that the MAE isn’t as quick to compute as RMSE as an optimization metric for a model with a training loop.
Example
Turning to our example dataset for a final time:
The MAE is 0.37. The predictions were off from the actual values, which have a mean of 2.2, by an average of 0.37. I can quickly understand that statement. 😀
The RMSE was 0.48 and the R2 was 0.71.
Bottom Line
MAE is the simplest evaluation metric and the most easily interpreted. It’s a great metric to use if you don’t want a few far-off predictions to overwhelm a lot of close ones. It’s a less good choice if you want to penalize predictions that were really far off the mark.
Wrap
So which metric should you use? In general, I suggest you report all three! 🚀
R2 gives people evaluating the performance an at-a-glance understanding of how well your model performs. It’s definitely worth reporting.
RMSE is less intuitive to understand, but extremely common. It penalizes really bad predictions. It also makes a great loss metric for a model to optimize because it can be computed quickly. It merits reporting.
I came out of this article with new respect for MAE. It’s straightforward to understand and treats all prediction errors proportionately. I would emphasize it in most regression problem evaluations.
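Reporting all three together can be sketched as one helper. The function name `regression_report` and the toy arrays are hypothetical, chosen here for illustration; the calculations follow the formulas given earlier in the article.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Return R2, RMSE, and MAE in one dict (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    sse = (errors ** 2).sum()                    # sum of squared errors
    sst = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return {
        "r2": 1 - sse / sst,
        "rmse": np.sqrt(np.mean(errors ** 2)),
        "mae": np.mean(np.abs(errors)),
    }


# Made-up values, for illustration only
report = regression_report([1, 2, 3, 4], [1.5, 2, 3, 3.5])
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```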
Disagree? Let me know on Twitter. 👍
I hope you enjoyed this guide to popular Python data science packages. If you did, please share it on your favorite social media so other folks can find it, too. 😀
I write about Python, SQL, Docker, and other tech topics. If any of that’s of interest to you, sign up for my mailing list of awesome data science resources and read more to help you grow your skills here. 👍
Happy reporting!
Translated from: https://towardsdatascience.com/which-evaluation-metric-should-you-use-in-machine-learning-regression-problems-20cdaef258e