Understanding and Remembering Precision and Recall
Hello folks, greetings. So maybe you are thinking: what's so hard about precision and recall? Why yet another article on this topic?
I recommend reading this article with patience and a notepad and pencil in hand. Also, concentrate… Reread the same lines if needed.
I have a hard time remembering things. I tend to forget things that I haven't used for a while, and over time I tend to forget the FORMULAS of precision and recall.
BUT, I have a tendency to reconstruct things in my mind. In high school, I had a hard time cramming. I couldn't remember formulas for long, so what I did was understand them in natural language (for example, English). Then, during my exams, I would simply recreate the formula from my understanding. At times, this ability even allowed me to invent new formulas. It wasn't really invention, just specialization. But I was a kid at the time, right!! So, let's keep calling it “invention” ;)
Now, you might be thinking, “I am not here to hear your story”. But I am here to make you hear my story XD. Just kidding! Let's start..
So, let's understand precision and recall in an intuitive manner. Then you won't need to Google what they mean and how they are formulated every single time.
Mostly, you might already be aware of the terms TP, FP, TN and FN. But I have a habit of explaining thoroughly, so feel free to skip that section if you already know them.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -
TP, FP, TN and FN
Assume that you are performing a classification task. Let us keep it very simple. Suppose you are performing single-label image classification. This means that the image belongs to one and only one of the given classes. Also, let's make it even simpler: consider that there is only one class.
Now, if you don't know the difference between single-label and multi-label classification, just Google it a bit.
So, you are now performing binary image classification. For example, the task of deciding whether an image contains a dog or not belongs to this category.
So, there are two target labels, depending on whether the predicted value is 1 or 0: dog and not dog. Consider being a dog as “positive” (1) and not being a dog as “negative” (0). In short, define positive as one of the two classes and negative as the other (leftover) class.
Now, you input an image into the model and the model predicts that the image is of a dog. This means that the model is “positive” that there is a dog. Now, suppose the image isn't actually of a dog: it is of a person, not a dog. Hence, the output of the model is wrong. Wrong means “false”. This is an example of a false positive.
Suppose that image actually did contain a dog. Then the model was correct. Correct means “true”. This becomes an example of a true positive.
So, a true positive means that the model says positive and is correct. And a false positive means that the model says positive but is wrong/incorrect.
The same goes for true negatives and false negatives. If the model predicts that there is no dog (i.e. negative) but there actually is a dog, then the model is wrong. This is a case of a false negative. Similarly, if the model predicts that there is no dog and the image actually doesn't contain a dog, then the model is correct. This is a case of a true negative.
So, you now have an idea of these terms. Let's extend this to the whole training data instead of a single image. Suppose you are classifying 100 images. The model classified 70 images correctly and 30 images incorrectly. Kudos! You now have a 70% accurate model.
Now, let's focus on the correct images, i.e. TRUE classifications. Suppose 20 of the 70 correctly classified images were not of dogs, i.e. they were NEGATIVES. In this case, the value of TRUE NEGATIVES is 20, and hence the value of TRUE POSITIVES is 50.
Now, consider the incorrectly classified images, i.e. FALSE classifications. Suppose 10 of the 30 incorrectly classified images were predicted as dogs, i.e. POSITIVE. Then the value of FALSE POSITIVES becomes 10. Similarly, the value of FALSE NEGATIVES becomes 20.
Now, let's add up: TP + FP + TN + FN = 50 + 10 + 20 + 20 = 100 = size of the training data.
Remember: Positive/Negative refers to the prediction made by the model. And True/False refers to the evaluation of that prediction, i.e. whether the prediction made is correct (true) or incorrect (false).
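As a minimal sketch of these four counts (in Python, with made-up labels purely for illustration), you can tally TP, FP, TN and FN directly from the ground-truth and predicted labels:

```python
# A minimal sketch: counting TP, FP, TN, FN for binary labels (1 = dog, 0 = not dog).
# The label lists below are made up for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground truth
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)  # said positive, was right
fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)  # said positive, was wrong
tn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 0)  # said negative, was right
fn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)  # said negative, was wrong

assert tp + fp + tn + fn == len(y_true)   # the four counts always cover the whole dataset
accuracy = (tp + tn) / len(y_true)        # fraction of correct predictions
print(tp, fp, tn, fn, accuracy)
```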
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —-
So, now that you have understood these terms, let's move on to precision and recall. If you have read other articles in the past, you might be thinking: what about the confusion matrix? Are you going to skip it? Maybe yes?! Maybe not! See, confusion matrices are too confusing. The only reason they are needed, or the only reason they are included as part of precision-recall articles, is that they help with the formulation of precision and recall.
And as I said earlier, I am bad at remembering formulas. So, let's just invent (create) them.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —-
Introduction
What does precision mean to you? Actually, the term precision is context dependent; it depends on the task that you are performing. Whether you are solving a math problem, performing image classification, or performing object detection, the term precision has a different meaning in each context. The current object detection metrics build on the same idea: they still use the same formula and then apply additional calculations on top of precision and recall. No comments on that part. I am not a researcher, and hence I can't comment on how they calculate those metrics.
For those who don't know what I mean by metrics, go Google it..
So, for now, let's understand the meaning of precision in its most commonly used formulation. Precision just measures how precise your model is. Above, I mentioned that your model is 70% accurate. But can you answer how precise it was? No..
Accuracy, here, means the percentage of images correctly classified by the model. So, what does precision mean?
The thing is, as I said, the concept of precision is context dependent. But you are lucky enough that for evaluating ML models, the concept remains the same throughout. But then, to understand precision in an intuitive manner, you will first need to understand “why do you need precision”?
Actually, you might have read several articles online about what precision and recall are. But none of them clearly mentions why you need them. Yes, there are separate articles covering this topic, and I will share a reference at the end. But let me try to explain here why you need precision and recall. I believe in completeness ;)
Actually, the reference that I am going to share is quite good. It also lists the formulas of precision and recall. But does it make you understand the formulas? No.. it just states the formula and leaves you to cram it. So, stay focused here XD
Why do you need recall?
“Are you serious? What about precision? You just skipped everything related to precision and jumped directly to recall..” Yeah, I hear you. But just wait and watch -_-.
The real question is, “I have accuracy. My model is 99% accurate. Why do I still need recall?”. Now, this depends on the task you perform. If you are classifying whether an image is of a dog (positive class) or not a dog (negative class), then accuracy is all you need. But if you are classifying whether a person is infected by COVID-19 or not, then you will need something more than accuracy. Let's understand this with an example.
Suppose you have 100 images to classify and the task is to predict whether each one is a dog or not. Now, the model classified 9 images as positive and 91 images as negative.
Suppose the values of TP, FP, TN and FN are 9, 0, 90 and 1 respectively.
Note that TP + FP = predicted positives = 9 and TN + FN = predicted negatives = 91.
That means the model correctly classified 99 images out of 100. Note that correct implies true, and trues = TP + TN = 9 + 90 = 99. That is, 99% accuracy.
Here, the model misclassified 1 image. Maybe it didn't learn the features properly, or maybe there's another reason, like an unbalanced dataset. But the thing to note is that the model did misclassify 1 image.
If you don't know what an unbalanced dataset means, and how an unbalanced dataset can cause such issues, Google it. Also, refer to the references I share at the end.
You can do 99 things for someone and all they’ll remember is the one thing you didn’t do.
Remember the quote? Yes.. and we are going to do the same with our model. We are going to look at that 1 misclassified image. Consider the task now. If we misclassify an image as not a dog, how will it impact the users? It won't, right? Or maybe just a little. Now, suppose the task was to classify whether an image captured by CCTV in a small town contained a lion or not, and if there was a lion, to alert all the citizens of the town so they could hide. Now, if the model misclassified an image of a lion, it would have a huge impact on the citizens.
Consider an even more serious task: classifying whether a person is infected by COVID-19 or not. If he/she is infected, alert the emergency staff and quarantine him/her. What if that infected person is not quarantined? The virus would spread, right? The impact of a wrong/false classification here is huge. Hence, even if the model is 99% accurate and only misclassified 1% of the data, we will still tell the model that it made a mistake and ask it to improve.
Hence, we need something more than accuracy. And that metric is called recall. Now, in order to know how recall helps here, we will need to understand what recall is.
Remember.. you haven't yet understood precision. I skipped that part :(
Recall
What does recall mean in simple terms? Forget about AI/ML for a moment. What do you mean by “I am trying to recall but I can't”? Or “let me try to recall what happened”. Does “recall” equal “think”? No.. it's “remember”. Actually, recall and remember have a slight difference in meaning but are mostly the same. In both of the sentences above, you can replace recall with remember and it would work fine.
So, recall = remember.
The thing here is, our model needs to recall whether the features of a person indicate that he/she is COVID-19 positive. Our model needs to remember the features of the COVID-19 positive class so that it does not misclassify a COVID-19 positive case as negative.
Recall can then be defined as the number of positive cases correctly classified (remembered/recalled) by the model divided by the total number of positive cases. Suppose there are 50 positive cases in the dataset. Now, on running predictions on this dataset, the model correctly predicts only 20 of those positive cases. This means that the model is only able to correctly remember 20 positive cases out of 50, and hence the recall is 40% (20/50 = 0.4).
Such a model predicting COVID-19 positive cases won't work, because it is marking 60% of the COVID-19 positive cases as negative. And that number (60%) is too high to ignore.
So, recall = number of positive cases correctly predicted by the model / total number of positive cases.
The number of cases correctly (true) classified as positive equals TP. The total number of positive cases in the dataset equals TP + FN, because FN means that the model said “negative” and the model is “wrong”; hence, the case was actually “positive”.
That means, the invented formula is: recall = TP / (TP + FN)
Hence, “How is the recall of the model?” simply answers the question “How many of the total positive datapoints (images) are correctly remembered by the model?”
Total positive datapoints = TP + FN
Because TP = the model predicts that the datapoint is positive and the model is correct, i.e. the datapoint is indeed positive.
And FN = the model predicts that the datapoint is negative and the model is wrong here, i.e. the datapoint is actually positive.
Also, datapoints correctly remembered by the model = TP + TN. That is, positive datapoints correctly remembered by the model = TP.
Finally, recall = positive datapoints correctly remembered / total positive datapoints = TP / (TP + FN)
So, remember that recall answers the question: How many of the total positive datapoints did the model correctly remember? Or: how well does the model recall positive datapoints?
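As a quick sketch of the formula just derived (the counts are the made-up ones from the 50-positive-cases example above):

```python
# A minimal sketch of recall, using the hypothetical COVID-19 example:
# 50 actually-positive cases, of which the model correctly flags only 20.
tp = 20   # positive cases the model "remembered" (predicted positive, and correct)
fn = 30   # positive cases the model missed (predicted negative, and wrong)

recall = tp / (tp + fn)           # positives correctly remembered / total positives
print(f"recall = {recall:.0%}")   # -> recall = 40%
```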
Wait.. what about TN and FP? Also, I have written “correctly predicts positive cases” all this time. So, what about the other cases? For instance, incorrectly classifying a negative case, i.e. classifying a person who is not infected with COVID-19 as positive. This becomes an example of FP. The model said that the person is infected, but he/she isn't. Now, does that matter? How much does it impact things to quarantine a person who is not infected? Just a little, yes? So, we can ignore it. Also, TN should be ignored, as the prediction is true (correct).
Why do you need precision?
I said that if a person who isn't infected with COVID-19 is predicted as infected (positive), then it does not matter. And you blindly believed me!
But but but.. What if you are living in North Korea? You will be shot dead if you are detected positive. “What the hell…. That's a high impact. You can't just ignore this. I want to live, man!!” Yeah.. I hear these words too. So, that's one reason you need precision.
There's another reason too. What if I simply ask the model to classify all the images as positive? In this case, TP = x, FP = 100 - x (if the size of the dataset is 100 and x is the number of actually positive datapoints), TN = 0 and FN = 0. Recall in this case would be recall = x / (x + 0) = 1, i.e. 100%.
What the heck!!! This means that we would shoot every human in North Korea, because the model classifies all the citizens of North Korea as COVID-19 positive, and we trust the model because its recall is 100%. Like, seriously!!!
That is another reason why we need precision.
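Here is a tiny sketch of that degenerate “predict everything as positive” classifier (the counts are made up, and precision, used in the last line, is formalised in the next section):

```python
# A minimal sketch: a classifier that labels every single case as positive.
# Hypothetical town of 100 people, of whom only 5 are actually infected.
actual_positives = 5
dataset_size = 100

tp = actual_positives                   # every infected person is "caught"...
fn = 0                                  # ...so nothing is missed
fp = dataset_size - actual_positives    # but every healthy person is flagged too
tn = 0

recall = tp / (tp + fn)       # = 1.0 -> a perfect-looking 100% recall
precision = tp / (tp + fp)    # = 0.05 -> only 5% of the "positives" are real (next section)
print(recall, precision)
```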
The things went in this order:
1. Accuracy alone won't work for certain tasks
2. We need recall
3. Recall alone won't work
4. We need precision along with recall
Precision
Ahh.. now you know why I skipped precision. But remember, I also skipped the confusion matrix, as it was too confusing.
At this stage, you should already have guessed that precision will have something to do with FP. If you haven't, go re-read the above two sections.
Consider the last example, where the model simply classified all the citizens as COVID-19 positive. In this case, though the recall of the model is high (100%), the precision of the model is very low. Hence, as with other topics in machine learning, there is a trade-off here too. Just like the bias-variance trade-off, there is a precision-recall trade-off.
After reading this article, I want you to show mathematically why there is a trade-off between precision and recall. And yeah.. Google a bit too. If you succeed, leave a comment here describing the method you used to prove it.
So, we need the model to also take care of “not misclassifying negative samples”, i.e. not marking an uninfected (negative) person as infected (positive).
We can do this by defining precision as the number of correct positive predictions divided by the total number of predicted positive cases. For example, suppose the number of positive cases in the dataset is 50 and the model predicts 80 cases as positive. Now, out of these 80 predictions, only 20 are correct and the other 60 are incorrect. That means 20 cases are predicted positive and are correct, i.e. TP = 20, and 60 cases are predicted positive but are incorrect, i.e. FP = 60.
As you can see, the model is not at all precise. The model says that 80 cases are positive, out of which only 20 are actually positive. Here, precision = 20/80 = 25%.
We simply formulated precision above. Precision = TP / (TP + FP)
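As a minimal sketch of that calculation (using the made-up 20-out-of-80 counts from the example above):

```python
# A minimal sketch of precision, using the hypothetical example above:
# the model claims 80 positives, but only 20 of those claims are correct.
tp = 20   # predicted positive and actually positive
fp = 60   # predicted positive but actually negative

precision = tp / (tp + fp)              # correct positive claims / all positive claims
print(f"precision = {precision:.0%}")   # -> precision = 25%
```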
To understand this in an intuitive way: “How precise is your model?” answers the question “How many datapoints are actually positive out of the total number of predicted positive datapoints?”
So, remember that precision answers the question: How many of the claimed (predicted) positive datapoints are actually positive? Or: how precise is the model in predicting positive datapoints?
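Coming back to the trade-off mentioned earlier, here is a small sketch (with made-up scores, purely as a hint for the exercise above) of how moving a classifier's decision threshold shifts the balance: a higher threshold tends to raise precision and lower recall, and a lower threshold does the opposite.

```python
# A minimal sketch: the same made-up scores, evaluated at two decision thresholds.
# 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.95, 0.80, 0.65, 0.60, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10]

def precision_recall(threshold):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(precision_recall(0.7))   # few, confident positives -> precision 1.0, recall 0.4
print(precision_recall(0.3))   # many positives -> precision 0.5, recall 0.8
```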
Conclusion
Both the definitions of precision and recall match their meanings in English.
Like, how many positive datapoints (out of the total number of positive datapoints) does the model remember? — Recall
And, how many (of the total predicted positive datapoints) are actually positive? — Precision
If you just understand what these two questions mean, you can rebuild the formulas whenever you need them. If you don't understand these questions clearly, try to translate them into your local language (mine is Gujarati) and you will be able to understand them.
Wait, wait.. is it going to end just like that? What about the confusion matrix?
The confusion matrix is just used to visualize all these things and help you cram the formulas. I won't cover it! But yes, I will help you cram the formulas using the confusion matrix here.
Here is an image that will help you cram the denominators of the precision and recall formulas. The numerator, which is the same for both, is TP.
(Image caption: cancer being the positive class instead of COVID-19)

What more? Nothing.. Maybe what I have written is too confusing. Maybe it is not. I don't know. Just leave your comments, bad or good, so that I can know.
But yeahh.. there is something more to do by yourself. Go read about the F1 score and why you need it. Short answer: because of the trade-off between precision and recall. How would you select a model? Based on precision? Or based on recall? The answer is the F1 score. Go read about it..
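For reference, here is a minimal sketch of the F1 score (the harmonic mean of precision and recall), using the made-up counts from the precision example above (TP = 20, FP = 60, FN = 30):

```python
# A minimal sketch of the F1 score: the harmonic mean of precision and recall.
tp, fp, fn = 20, 60, 30   # made-up counts from the precision example above

precision = tp / (tp + fp)                          # 0.25
recall = tp / (tp + fn)                             # 0.40
f1 = 2 * precision * recall / (precision + recall)  # ~0.31
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```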
The reference I promised is here: https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c
What more? Read about the ROC curve, mAP and AR. Or wait for me to post about them.. Bye!
Translated from: https://becominghuman.ai/understanding-and-remembering-precision-and-recall-e3261a1f487c