U25%(1,16) and U25%(1,168)on《C4.5:programs for machine learning》
when calculating
UCFU_{CF}UCF?(e,N)
CF: Confidence Level(here is 25%)
e:misclassifying counts of current subtree we focus on
N:counts of sub-datasets relevant to current subtree who is under judgment whether to be pruned or not.
“Pr” is short for “Pessimistic error rate”.
UCFU_{CF}UCF?(e,N)=Pr=e+0.5+Coeff22+Coeff2?{(e+0.5)[1?e+0.5N]+Coeff24}N+Coeff2①\frac{e+0.5+ \frac{Coeff^2}{2}+ \sqrt{ Coeff^2·\{ (e+0.5)[1- \frac{e+0.5}{N} ]+ \frac{Coeff^2}{4} \} } } {N+Coeff^2}①N+Coeff2e+0.5+2Coeff2?+Coeff2?{(e+0.5)[1?Ne+0.5?]+4Coeff2?}??①
The method to get Coeff\ Coeff?Coeff is as follows:
Coeff?Deviation[i?1]Deviation[i]?Deviation[i?1]=ConfidenceLevel?val[i?1]val[i]?val[i?1]②\frac{Coeff-Deviation[i-1]}{Deviation[i]-Deviation[i-1]}=\frac{Confidence \ Level-val[i-1]}{val[i]-val[i-1]}②Deviation[i]?Deviation[i?1]Coeff?Deviation[i?1]?=val[i]?val[i?1]Confidence?Level?val[i?1]?②
Now Let’s analyse the meaning of ② and how to get “i” in formula ②.
What is Deviation and Val?
Val[] = { 0, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00},
Deviation[] = {4.0, 3.09, 2.58, 2.33, 1.65, 1.28, 0.84, 0.25, 0.00};
for Normal Distribution:
P{X>Deviation[i]}=val[i],
for example:
P{X>2.58}=0.005
--------------------
Now let’s analyse the whole meaning of formula ②
the red line is P(X>x)=∫x+∞12πe?x22dxP(X>x)=\int_x^{+∞}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dxP(X>x)=∫x+∞?2π?1?e?2x2?dx
the blue line is f(x)=12πe?x22f(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}f(x)=2π?1?e?2x2?
and the two points are:
(deviation[i],val[i])=(0.25,0.4)
(deviation[i-1,val[i-1])=(0.84,0.2)
Let’s amplify the above figure to the following picture:
so tan C=Val[i]?Val[i?1]Deviation[i?1]?Deviation[i]=ConfidenceLevel?Val[i?1]Deviation[i?1]?Coeff\frac{Val[i]-Val[i-1]}{Deviation[i-1]-Deviation[i]}=\frac{Confidence\ Level-Val[i-1]}{Deviation[i-1]-Coeff}Deviation[i?1]?Deviation[i]Val[i]?Val[i?1]?=Deviation[i?1]?CoeffConfidence?Level?Val[i?1]?
then we can get ②and compute CoeffCoeffCoeff and Coeff2Coeff^2Coeff2
but how to get “i” in above formula?
when Val[i?1]<Confidencelevel≤Val[i]Val[i-1]<Confidence \ level≤Val[i]Val[i?1]<Confidence?level≤Val[i]
is satisfied,
we get “i”.
when Confidence Level=0.25
i=7
val[i-1]=0.2
val[i]=0.4
deviation[i-1]=0.84
deviation[i]=0.25
substitute the above values in ②
we get Coeff=0.6925
then,
Coeff2=0.479556Coeff^2=0.479556Coeff2=0.479556
--------------------
**Now Let’s calculate **
U0.25(1,16)U_{0.25}(1,16)U0.25?(1,16)
CF=0.25
e=1
N=16
Coeff2=0.479556Coeff^2=0.479556Coeff2=0.479556
We’ll get :
Pr=
1+0.5+0.4795562+0.479556?{(1+0.5)[1?1+0.516]+0.4795564}16+0.479556\frac{1+0.5+ \frac{0.479556}{2}+ \sqrt{ 0.479556·\{ (1+0.5)[1- \frac{1+0.5}{16} ]+ \frac{0.479556}{4} \} } } {16+0.479556}16+0.4795561+0.5+20.479556?+0.479556?{(1+0.5)[1?161+0.5?]+40.479556?}??=0.156681
N·Pr=16×0.156681=2.507
Compared with Quinlan’s book page 42th:
Now Let’s calculate
U0.25(1,168)U_{0.25}(1,168)U0.25?(1,168)
CF=0.25
e=1
N=168
Pr=1+0.5+0.4795562+0.479556?{(1+0.5)[1?1+0.5168]+0.4795564}168+0.479556=0.015536\frac{1+0.5+ \frac{0.479556}{2}+ \sqrt{ 0.479556·\{ (1+0.5)[1- \frac{1+0.5}{168} ]+ \frac{0.479556}{4} \} } } {168+0.479556}=0.015536168+0.4795561+0.5+20.479556?+0.479556?{(1+0.5)[1?1681+0.5?]+40.479556?}??=0.015536
N?U0.25(1,168)=N?Pr=168?0.015536=2.61N·U_{0.25}(1,168)=N·Pr=168·0.015536=2.61N?U0.25?(1,168)=N?Pr=168?0.015536=2.61
Compared with Quinlan’s Book on Page 42th:
The code to veriy the above value:
/*************************************************************************/ /* */ /* Statistical routines for C4.5 */ /* ----------------------------- */ /* */ /*************************************************************************/ #include <stdio.h> #include <math.h>/*************************************************************************/ /* */ /* Compute the additional errors if the error rate increases to the */ /* upper limit of the confidence level. The coefficient is the */ /* square of the number of standard deviations corresponding to the */ /* selected confidence level. (Taken from Documenta Geigy Scientific */ /* Tables (Sixth Edition), p185 (with modifications).) */ /* */ /*************************************************************************/ float CF=0.25;float Val[] = { 0, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00},//概率Dev[] = {4.0, 3.09, 2.58, 2.33, 1.65, 1.28, 0.84, 0.25, 0.00};//這個(gè)就是分位點(diǎn)// P{X>Dev[i]}=val[i],// according to standardized normal distribution N(0,1)// for example:// P{X>2.58}=0.005,float AddErrs(float N, float e)//葉子的悲觀錯(cuò)誤 {static float Coeff=0;float Val0, Pr;if ( ! Coeff )//compute only once when pruning trees{/* Compute and retain the coefficient value, interpolating fromthe values in Val and Dev */int i;i = 0;while ( CF > Val[i] ) i++;//這里可以看到,是根據(jù)置信度對(duì)悲觀錯(cuò)誤數(shù)量進(jìn)行計(jì)算的。//但是呢,也沒(méi)有像理論中直接計(jì)算“悲觀錯(cuò)誤率”Coeff = Dev[i-1] +(Dev[i] - Dev[i-1]) * (CF - Val[i-1]) /(Val[i] - Val[i-1]);Coeff = Coeff * Coeff;printf("Coeff=%f\n",Coeff);//CF定下來(lái)以后,Coeff就是定的}if ( e < 1E-6 ){return N * (1 - exp(log(CF) / N));}elseif ( e < 0.9999 ){Val0 = N * (1 - exp(log(CF) / N));return Val0 + e * (AddErrs(N, 1.0) - Val0);//這里在進(jìn)行遞歸調(diào)用}elseif ( e + 0.5 >= N ){return 0.67 * (N - e);}else{float fenzi;fenzi=(e + 0.5 + Coeff/2+ sqrt( Coeff * ((e + 0.5) * (1 - (e + 0.5)/N) + Coeff/4)) );printf("fenzi=%f\n",fenzi);Pr = (e + 0.5 + Coeff/2+ sqrt( Coeff * ((e + 0.5) * (1 - (e + 0.5)/N) + Coeff/4)) )/ (N + Coeff);printf("Pr=%f\n",Pr);return (N * Pr - e);} }int main() {float result=AddErrs(16, 1); printf("result=%f\n",result);exit(0);}running method:
gcc -o stats stats.c -lm
./stats
the result is:
Coeff=0.479556
fenzi=2.582031
Pr=0.156681
result=1.506894
總結(jié)
以上是生活随笔為你收集整理的U25%(1,16) and U25%(1,168)on《C4.5:programs for machine learning》的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: Earliest PEP Algorit
- 下一篇: Error Based Pruning剪