Machine Learning's Triangle of Error
By David Weinberger
AI Outside In is a column by PAIR’s writer-in-residence, David Weinberger, who offers his outsider perspective on key ideas in machine learning. His opinions are his own and do not necessarily reflect those of Google.
Machine learning's superpower
When we humans argue over what’s fair, sometimes it’s about principles, sometimes about consequences, and sometimes about trade-offs. But machine learning systems can bring us to think about fairness — and many other things — in terms of three interrelated factors: two ways the machine learning (ML) can go wrong, and the most basic way of adjusting the balance between these potential errors. The types of error you’ll prefer to live with depends entirely on the sort of fairness — defined mathematically — you’re aiming your ML system at. But one way or another, you have to decide.
At their heart, many ML systems are classifiers. They ask: Should this photo go into the bucket of beach photos or not? Should this dark spot on a medical scan be classified as a fibrous growth or something else? Should this book go on the “Recommended for You” or “You’re Gonna Hate It” list? ML’s superpower is that it lets computers make these sorts of “decisions” based on what they’ve inferred from looking at thousands or even millions of examples that have already been reliably classified. From these examples they notice patterns that indicate which categories new inputs should be put into.
While this works better than almost anyone would expect — and a tremendous amount of research is devoted to fundamental improvements in classification algorithms — virtually every ML system that classifies inputs mis-classifies some of them. An image classifier might think that the photo of a desert is a photo of a beach. The cellphone you’re dictating into might insist that you said “Wreck a nice beach” instead of “Recognize speech.”
So, researchers and developers typically test and tune their ML systems by having them classify data that’s already been reliably tagged — the same sort of data these systems were trained on. In fact, it’s typical to hold back some of the inputs the system is being trained on so that it can test itself on data it hasn’t yet seen. Since the right classifications are known for the test inputs, the developers can quickly see how well the system has done.
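The hold-out procedure described above can be sketched in a few lines. The toy dataset and the 80/20 split ratio here are illustrative assumptions, not details from the article:

```python
import random

# A toy dataset of (example_id, label) pairs; in practice these would be
# images or records that have already been reliably classified by humans.
labeled_examples = [(i, i % 2) for i in range(1000)]

# Hold back 20% of the labeled data so the system can later be tested
# on examples it never saw during training.
random.seed(0)
random.shuffle(labeled_examples)
split = int(0.8 * len(labeled_examples))
train_set = labeled_examples[:split]
test_set = labeled_examples[split:]

print(len(train_set), len(test_set))  # 800 200
```

Because the held-back examples come with known labels, comparing the system's predictions against them gives an immediate accuracy estimate.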
In this sort of basic testing, there are two ways the system can go wrong. An image classifier designed simply to identify photos of beaches might, say, put an image of the Sahara into the “Beach” bucket, or it might put an image of a beach into the “Not a Beach” bucket.
For this post’s purposes, let’s call the first “False alarms”: the ML thinks the photo of the Sahara depicts a beach.
The second “Missed targets”: the ML failed to recognize an actual beach photo.
ML practitioners use other terms for these errors. False alarms are false positives. Missed targets are false negatives. But just about everyone finds these names confusing, even many professionals. Non-medical folk understandably can assume that positive test results are always good news. In the ML world, it’s easy to confuse the positivity of the classification with the positivity of the trait being classified. For example, ML might be used to look at lots of metrics to assess whether a car is likely to need service soon. If a healthy car is put into the “Needs Service” bucket, it would count as a false positive even though we might think of needing service as a negative. And logically, shouldn’t a false negative be a positive? The concepts are crucial, but the terms are far from intuitive.
So, let’s go with false alarms and missed targets as we talk about errors.
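The mapping between the article's informal names and the standard terms can be made concrete with a toy beach-photo classifier. The example labels and predictions are made up for illustration:

```python
# "False alarm"   = false positive: predicted "beach", actually not a beach.
# "Missed target" = false negative: predicted "not beach", actually a beach.

actual    = ["beach", "beach", "desert", "desert", "beach"]
predicted = ["beach", "desert", "beach", "desert", "beach"]

false_alarms = sum(1 for a, p in zip(actual, predicted)
                   if p == "beach" and a != "beach")
missed_targets = sum(1 for a, p in zip(actual, predicted)
                     if p != "beach" and a == "beach")

print(false_alarms, missed_targets)  # 1 1
```

Here the Sahara photo classified as "beach" is the false alarm, and the real beach classified as "desert" is the missed target.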
Deep-reaching consequences
Take an example that doesn’t involve machine learning, at least not yet. Let’s say you’re adjusting a body scanner at an airport security checkpoint. Those who fly often (back in the day) can attest to the fact that most of the people for whom the scanner buzzes are in fact not security threats. They get manually screened by an agent — often a pat-down — and are sent on their way. That’s not an accident or a misadjustment. The scanners are set to generate false alarms rather frequently: if there’s any doubt, the machine beeps a human over to double check.
That’s a bit of a bother for the mis-classified passengers, but if the machine were set to create fewer false alarms, it potentially would miss genuine threats. So it errs on the side of false alarms, rather than missed targets.
There are two things to note here. First, reducing the false alarms can increase the number of missed targets, and vice versa. Second, which is the better thing to do depends on the goal of the machine learning system. And that always depends on the context.
For example, false alarms are not too much of a bother when the result is that more passengers get delayed for a few seconds. But if the ML is being used to recommend preventive surgery, false alarms could potentially lead people to put themselves at unnecessary risk. Having a kidney removed for no good reason is far worse than getting an unnecessary pat down. (This is obviously why a human doctor will be involved in your decision.)
The consequences can reach deep. If your ML system is predicting which areas of town ought to be patrolled most closely by the police, then tolerating a high rate of false alarms may mean that local people will feel targeted for stop-and-frisk operations, potentially alienating them from the police force, which can have its own harmful consequences on a community…as well as other highly consequential outcomes.
False alarms are possible in every system designed by humans. They can be very expensive, in whatever dimensions you’re calculating costs.
It gets no less complex when considering how many missed targets you’re going to design your ML system to accept. If you tune your airport scanner so that it generates fewer false alarms, some people who are genuine threats may be waved on through, endangering an entire airplane. On the other hand, if your ML is deciding who is worthy of being granted a loan, a false alarm — someone who is granted a loan and then defaults on it — may be more costly to the lender than the missed opportunity of turning down someone who would have repaid the loan.
Now, to not miss an opportunity to be confusing when talking about ML, consider an online book store that presents each user with suggestions for the next book to buy. What should the ML be told to prefer: Adding false alarms to the list, or avoiding missed opportunities? False alarms in this case are books the ML thinks the reader will be interested in, but the reader in fact doesn’t care about. Missed opportunities are the books the readers might actually buy but the ML thinks the reader wouldn’t care about. From the store’s point of view, what’s the best adjustment of those two sliders?
That question isn’t easy, and not just because the terms are non-intuitive for most of us. For one thing, should the buckets for books be “User Will Buy It” or, perhaps, “User Will Enjoy It”? Or maybe, “User Will Be Stretched By It”?
這個問題并不容易,不僅僅是因為這些術語對我們大多數人而言都不直觀。 一方面,書桶應該是“用戶愿意購買”還是“用戶喜歡”? 或者,“用戶會被它吸引”?
Then, for reasons external to ML, not all missed opportunities and false alarms are equal. For example, maybe your loan application ML is doing fine sorting applications into “Approve” and “Disapprove” buckets in terms of the missed opportunities and false alarms your company can tolerate. But suppose many more applications that become missed opportunities are coming from women or racial minorities. The system is performing up to specification, but that specification turns out to have unfair and unacceptable results.
努力思考并大聲說出來 (Think hard and out loud)
Adjusting the mix of false alarms and missed opportunities brings us to the third point of the Triangle of Error: the ML confidence level.
One of the easiest ways to adjust the percentage of false alarms and missed targets is to change the threshold of confidence required to make it into the bin. (Other ways include training the system on better data or adjusting its classification algorithms.) For example, suppose you’ve trained an ML system on hundreds of thousands of images that have been manually labeled as “Smiling” or “Not Smiling”. From this training, the ML has learned that a broad expanse of light patches towards the bottom of the image is highly correlated with smiles, but then there are the Clint Eastwoods whose smiles are much subtler. When the ML comes across a photo like that, it may classify it as smiling, but not as confidently as the image of the person with the broad, toothy grin.
If you want to lower the percentage of false alarms, you can raise the confidence level required to be put into the “Smiling” bin. Let’s say that on a scale of 0 to 10, the ML gives a particular toothy grin a 9, while Clint gets a 5. If you stipulate that it takes at least a 6 to make it into the “Smile” bin, Clint won’t make the grade; he’ll become a missed target. Your “Smile” bucket will become more accurate, but your “Not Smile” bucket will have at least one more missed opportunity.
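The trade-off the article describes can be sketched directly. The confidence scores below are invented to match the story: Clint's subtle smile scores a 5, and one non-smiler also happens to score a 5:

```python
# Each entry is (confidence_score_0_to_10, actually_smiling).
scores = [(9, True), (8, True), (5, True),   # the 5 is Clint's subtle smile
          (5, False), (3, False), (2, False)]

def count_errors(threshold):
    """Count false alarms and missed targets for a given confidence cutoff."""
    false_alarms = sum(1 for s, smiling in scores
                       if s >= threshold and not smiling)
    missed_targets = sum(1 for s, smiling in scores
                         if s < threshold and smiling)
    return false_alarms, missed_targets

print(count_errors(5))  # (1, 0): one non-smiler sneaks into "Smiling"
print(count_errors(6))  # (0, 1): false alarm gone, but Clint is now missed
```

Raising the threshold from 5 to 6 makes the "Smiling" bucket more accurate at the cost of a new missed target, which is exactly the slider the text describes.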
Was that the right choice? That’s not something the machine can answer. It takes humans — design teams, communities, the full range of people affected by the machine learning — to decide what they want from the system, and what the trade-offs should be to best achieve that result.
Deciding on the trade-offs occasions difficult conversations. But perhaps one of the most useful consequences of machine learning at the social level is not only that it requires us humans to think hard and out loud about these issues, but the requisite conversations implicitly acknowledge that we can never entirely escape error. At best we can decide how to err in ways that meet our goals and that treat all as fairly as possible.
Translated from: https://medium.com/people-ai-research/machine-learnings-triangle-of-error-2c05267cb2bd