Huber loss (reposted)
Original source: https://en.wikipedia.org/wiki/Huber_loss
In statistics, the Huber loss is a loss function used in robust regression, which is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used.
Definition
[Figure: Huber loss (green, $\delta = 1$) and squared error loss (blue) as a function of $y - f(x)$.]

The Huber loss function describes the penalty incurred by an estimation procedure $f$. Huber (1964) defines the loss function piecewise by[1]

$$L_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{for } |a| \le \delta,\\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$
This function is quadratic for small values of $a$ and linear for large values, with equal values and slopes of the two sections at the points where $|a| = \delta$. The variable $a$ often refers to the residual, that is, to the difference between the observed and predicted values, $a = y - f(x)$, so the former can be expanded to[2]

$$L_\delta(y, f(x)) = \begin{cases} \frac{1}{2}\left(y - f(x)\right)^2 & \text{for } |y - f(x)| \le \delta,\\ \delta\left(|y - f(x)| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$
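As a sketch, the piecewise definition above can be written directly in Python (the function name `huber_loss` is illustrative, not taken from any library):

```python
import numpy as np

def huber_loss(a, delta=1.0):
    """Huber loss of residual a: quadratic for |a| <= delta, linear beyond."""
    a = np.asarray(a, dtype=float)
    quadratic = 0.5 * a**2                      # used in the region |a| <= delta
    linear = delta * (np.abs(a) - 0.5 * delta)  # used in the region |a| > delta
    return np.where(np.abs(a) <= delta, quadratic, linear)

# At |a| = delta both branches agree (value delta**2 / 2), so the loss is
# continuous with a continuous first derivative there.
print(huber_loss(0.5))   # quadratic branch: 0.125
print(huber_loss(3.0))   # linear branch: 1.0 * (3.0 - 0.5) = 2.5
```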
Motivation
Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator in the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum_{i=1}^{n} L(a_i)$), the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
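The contrast between the two estimators is easy to demonstrate numerically: the mean (the minimizer of the summed squared loss) is pulled far toward a single outlier, while the median (the minimizer of the summed absolute loss) barely moves. A minimal illustration, with made-up data:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier

# The mean minimizes sum((x - c)**2); the median minimizes sum(|x - c|).
print(np.mean(data))    # 22.0 -- dragged toward the outlier
print(np.median(data))  # 3.0  -- essentially unaffected
```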
As defined above, the Huber loss function is convex in a uniform neighborhood of its minimum $a = 0$; at the boundary of this neighborhood, at the points $a = -\delta$ and $a = \delta$, it has a differentiable extension to an affine function. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the squared loss function) with the robustness of the median-unbiased estimator (using the absolute-value loss function).
Pseudo-Huber loss function
The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function; it ensures that derivatives of all orders are continuous. It is defined as[3][4]

$$L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right).$$
As such, this function approximates $a^2/2$ for small values of $a$ and approximates a straight line with slope $\delta$ for large values of $a$.
While the above is the most common form, other smooth approximations of the Huber loss function also exist.[5]
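A minimal sketch of the Pseudo-Huber formula above (the function name `pseudo_huber` is illustrative; note that SciPy ships an equivalent as `scipy.special.pseudo_huber`):

```python
import numpy as np

def pseudo_huber(a, delta=1.0):
    """Smooth approximation of the Huber loss; all derivatives are continuous."""
    return delta**2 * (np.sqrt(1.0 + (a / delta)**2) - 1.0)

# Near a = 0 the loss behaves like a**2 / 2; for large |a| it grows
# almost linearly with slope delta.
print(pseudo_huber(0.0))   # 0.0
print(pseudo_huber(3.0))   # sqrt(10) - 1, roughly 2.162
```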
Variant for classification
For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction $f(x)$ (a real-valued classifier score) and a true binary class label $y \in \{+1, -1\}$, the modified Huber loss is defined as[6]

$$L(y, f(x)) = \begin{cases} \max(0, 1 - y\,f(x))^2 & \text{for } y\,f(x) \ge -1,\\ -4\,y\,f(x) & \text{otherwise.} \end{cases}$$
The term $\max(0, 1 - y\,f(x))$ is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of $L$.[6]
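The two branches of the classification variant can be sketched as follows (the function name `modified_huber` is illustrative; a correctly classified point with margin at least 1 incurs zero loss, while badly misclassified points incur a loss linear in the margin):

```python
def modified_huber(y, score):
    """Modified Huber loss for a label y in {+1, -1} and a real-valued score f(x)."""
    z = y * score  # the margin y * f(x)
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2  # squared hinge region
    return -4.0 * z                    # linear region for badly wrong predictions

print(modified_huber(+1, 2.0))   # margin 2, correct with room to spare: 0.0
print(modified_huber(+1, 0.0))   # margin 0: 1.0
print(modified_huber(+1, -2.0))  # margin -2, badly wrong: 8.0
```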
Applications
The Huber loss function is used in robust statistics, M-estimation and additive modelling.[7]
See also
- Winsorizing
- Robust regression
- M-estimator
- Visual comparison of different M-estimators
References
Reposted from: https://www.cnblogs.com/davidwang456/articles/5586178.html