SRILM Source Analysis: Prob.h
Prob.h, Prob.cc
Author: jianzhu
Created: 08.09.11
--------------------------------------
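1. Overview
--------------------------------------
    These two files define a set of functions that convert between probabilities and
their logarithms (base-10 log probabilities as well as the integer Intlog and Bytelog
encodings) and that add and subtract probabilities stored as logarithms.
They also define a function that parses a floating-point number written as a string.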
--------------------------------------
2. What each function does
--------------------------------------
a) LogPtoProb
<src>
0  inline Prob LogPtoProb(LogP2 prob)
1  {
2      if (prob == LogP_Zero) {
3          return 0;
4      } else {
5          return exp(prob * M_LN10);
6      }
7  }
</src>
    Purpose: converts a base-10 log probability back into the probability itself.

    Details: line 2 checks whether the log probability is LogP_Zero, i.e. -HUGE_VAL
    (the log of probability 0); if so, it returns 0 directly. Otherwise line 5 runs:
    multiplying prob by M_LN10 gives the natural log of the probability, and applying
    exp() (base e) to that recovers the probability, which is returned.

    Note: prob     --> log10(a)
          M_LN10   --> ln10
          exp(X)   --> e^X

          log10(a)           = ln(a)/ln10
       -> prob*M_LN10        = ln(a)/ln10 * ln10 = ln(a)
       -> exp(prob * M_LN10) = exp(ln(a)) = a
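The derivation above can be checked with a small standalone sketch. It assumes plain double for SRILM's Prob/LogP/LogP2 types, takes LogP_Zero to be -HUGE_VAL as stated above, and uses a literal constant kLn10 (a name chosen here) in place of M_LN10, which POSIX defines but standard C++ does not. ProbToLogP from section c) is included so that the round trip can be shown.

```cpp
#include <cmath>

// Assumption: all probability types are plain double in this sketch
// (SRILM uses typedefs such as Prob, LogP and LogP2).
const double kLn10 = 2.30258509299404568402;  // ln(10); stands in for M_LN10
const double LogP_Zero = -HUGE_VAL;           // log10(0), i.e. -infinity

// Convert a base-10 log probability back to a probability.
double LogPtoProb(double logp)
{
    if (logp == LogP_Zero) {
        return 0.0;                     // log10(0) maps back to probability 0
    }
    return std::exp(logp * kLn10);      // logp * ln10 = ln(a), and e^(ln a) = a
}

// Convert a probability to its base-10 logarithm.
double ProbToLogP(double prob)
{
    return std::log10(prob);            // log10(0) yields -HUGE_VAL == LogP_Zero
}
```

For example, LogPtoProb(-1.0) is approximately 0.1, LogPtoProb(ProbToLogP(0.25)) is approximately 0.25, and ProbToLogP(0.0) compares equal to LogP_Zero.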
b) LogPtoPPL
<src>
0  inline Prob LogPtoPPL(LogP prob)
1  {
2      return exp(- prob * M_LN10);
3  }
</src>
    Purpose: converts a base-10 log probability into the corresponding perplexity.
    When there is only a single probability, this amounts to converting the base-10
    log probability into the reciprocal of the probability:
          perplexity = P(W1W2...WN)^(-1/N)
          With a single probability P (N = 1), this simplifies to
          perplexity = P^(-1)

    Details:
         prob     --> log10(a)
         M_LN10   --> ln10
         exp(X)   --> e^X

         log10(a)    = ln(a)/ln10
      -> prob*M_LN10 = ln(a)/ln10 * ln10 = ln(a)
      -> exp(-prob * M_LN10) = exp(-ln(a)) = exp(ln(a^-1)) = a^-1 = 1/a
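To extend the single-probability case to N words, one can average the per-word base-10 log probabilities before calling LogPtoPPL, since 10^(-(1/N) * log10 P) = P^(-1/N). The sketch below assumes doubles throughout and a constant kLn10 in place of M_LN10; SentencePPL is a hypothetical helper for illustration, not a function from Prob.h.

```cpp
#include <cmath>
#include <vector>

const double kLn10 = 2.30258509299404568402;  // ln(10); stands in for M_LN10

// Perplexity of a single base-10 log probability: 10^(-logp) = 1/p.
double LogPtoPPL(double logp)
{
    return std::exp(-logp * kLn10);
}

// Hypothetical helper (not part of Prob.h): perplexity of N words given
// their per-word log10 probabilities, i.e. P(w1...wN)^(-1/N).
// The caller must pass at least one word.
double SentencePPL(const std::vector<double> &wordLogPs)
{
    double total = 0.0;
    for (double lp : wordLogPs) {
        total += lp;                    // log10 P(w1...wN) = sum of log10 P(wi)
    }
    return LogPtoPPL(total / wordLogPs.size());  // average, then 10^(-avg)
}
```

For example, two words with log probabilities -1 and -3 average to -2, giving a perplexity of 100.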
c) ProbToLogP
<src>
0  inline LogP ProbToLogP(Prob prob)
1  {
2      return log10(prob);
3  }
</src>
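    Purpose: converts a probability into its base-10 logarithm.

    Details: line 2 calls log10 to convert prob directly into a base-10 logarithm and
    returns it. For prob == 0, log10 returns -HUGE_VAL, which is exactly LogP_Zero.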
d) MixLogP
<src>
0  inline LogP MixLogP(LogP prob1, LogP prob2, double lambda)
1  {
2      return ProbToLogP(lambda * LogPtoProb(prob1) +
3                        (1 - lambda) * LogPtoProb(prob2));
4  }
</src>
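    Purpose: linearly interpolates, with weight lambda, the probabilities whose base-10
    logs are prob1 and prob2, and returns the result as a base-10 log probability.

    Details: lines 2-3 first call LogPtoProb to recover the probabilities p1 and p2 from
    prob1 and prob2, then interpolate them as lambda*p1 + (1-lambda)*p2, and finally call
    ProbToLogP to convert the result back into a base-10 log probability, which is returned.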
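A minimal standalone sketch of MixLogP, assuming double for all types and a constant kLn10 in place of M_LN10. Note that the interpolation happens on probabilities, not on log probabilities: averaging the logs instead would give a geometric (log-linear) mixture, which is a different model.

```cpp
#include <cmath>

const double kLn10 = 2.30258509299404568402;  // ln(10); stands in for M_LN10
const double LogP_Zero = -HUGE_VAL;           // log10(0)

double LogPtoProb(double logp)
{
    return logp == LogP_Zero ? 0.0 : std::exp(logp * kLn10);
}

double ProbToLogP(double prob)
{
    return std::log10(prob);
}

// log10(lambda * p1 + (1 - lambda) * p2), with inputs and output in log10.
double MixLogP(double logp1, double logp2, double lambda)
{
    return ProbToLogP(lambda * LogPtoProb(logp1) +
                      (1 - lambda) * LogPtoProb(logp2));
}
```

For instance, if one model assigns a word probability 0.2 and the other 0.4, MixLogP(log10(0.2), log10(0.4), 0.5) returns log10(0.3).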
e) AddLogP
<src>
 0  inline LogP2 AddLogP(LogP2 x, LogP2 y)
 1  {
 2      if (x<y) {
 3          LogP2 temp = x; x = y; y = temp;
 4      }
 5      if (y == LogP_Zero) {
 6          return x;
 7      } else {
 8          LogP2 diff = y - x;
 9          return x + log10(1.0 + exp(diff * M_LN10));
10      }
11  }
</src>
    Purpose: adds the probabilities represented by two base-10 log probabilities and
    returns the base-10 log of the sum.

    Details: lines 2-4 handle the case x < y by swapping, so that x holds the larger
    value and y the smaller.
    Lines 5-6 handle the case where y is LogP_Zero (probability 0): x is returned as is.
    Lines 7-10 handle the remaining cases. Because x holds the larger value, diff <= 0
    and b/a lies in (0, 1], so exp() never overflows:
           x --> log10(a)
           y --> log10(b)

           diff = y-x = log10(b) - log10(a) = log10(b/a)
           diff * M_LN10 = ln(b/a)
           exp(diff*M_LN10) = b/a
           log10(1.0 + exp(diff * M_LN10)) = log10(1.0 + b/a) = log10( (a+b)/a )
           x + log10(1.0 + exp(diff * M_LN10)) = log10(a) + log10( (a+b)/a )
                                               = log10(a+b)
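The benefit of staying in the log domain shows up when probabilities are too small for a double. The sketch below (doubles throughout, kLn10 standing in for M_LN10, and NaiveAddLogP a hypothetical comparison function) adds 10^-400 and 10^-401: exponentiating first underflows to 0, while AddLogP only ever computes b/a.

```cpp
#include <cmath>

const double kLn10 = 2.30258509299404568402;  // ln(10); stands in for M_LN10
const double LogP_Zero = -HUGE_VAL;           // log10(0)

// log10(a + b) from x = log10(a), y = log10(b), without forming a or b.
double AddLogP(double x, double y)
{
    if (x < y) {                     // make x the larger operand
        double temp = x; x = y; y = temp;
    }
    if (y == LogP_Zero) {
        return x;                    // adding probability 0
    }
    double diff = y - x;             // log10(b/a) <= 0, so b/a is in (0, 1]
    return x + std::log10(1.0 + std::exp(diff * kLn10));
}

// Hypothetical naive version: leave the log domain, add, take the log again.
double NaiveAddLogP(double x, double y)
{
    return std::log10(std::pow(10.0, x) + std::pow(10.0, y));
}
```

Here AddLogP(-400.0, -401.0) returns -400 + log10(1.1), about -399.9586, whereas NaiveAddLogP(-400.0, -401.0) returns negative infinity because both powers underflow to 0. As an optional refinement that Prob.h does not use, std::log1p(t) / kLn10 computes log10(1 + t) more accurately when t is tiny.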
f) SubLogP
<src>
 0  inline LogP2 SubLogP(LogP2 x, LogP2 y)
 1  {
 2      assert(x >= y);
 3      if (x == y) {
 4          return LogP_Zero;
 5      } else if (y == LogP_Zero) {
 6          return x;
 7      } else {
 8          LogP2 diff = y - x;
 9          return x + log10(1.0 - exp(diff * M_LN10));
10      }
11  }
</src>
  Purpose: subtracts the probabilities represented by two base-10 log probabilities and
  returns the base-10 log of the difference.

  Details: line 2 asserts x >= y. Only positive numbers have a logarithm, so a-b must
  not be negative, i.e. a >= b; because log10 is increasing, this is the same as x >= y.
  Lines 3-4 handle x == y: the difference is 0, so LogP_Zero (negative infinity) is returned.
  Lines 5-6 handle y == LogP_Zero (probability 0): x is returned unchanged.
  Lines 7-10 handle the remaining cases:
         x --> log10(a)
         y --> log10(b)

         diff = y-x = log10(b) - log10(a) = log10(b/a)
         diff * M_LN10 = ln(b/a)
         exp(diff*M_LN10) = b/a
         log10(1.0 - exp(diff * M_LN10)) = log10(1.0 - b/a) = log10( (a-b)/a )
         x + log10(1.0 - exp(diff * M_LN10)) = log10(a) + log10( (a-b)/a )
                                             = log10(a-b)
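A standalone sketch of SubLogP under the same assumptions (doubles, kLn10 in place of M_LN10), with the assertion kept so that calling it with x < y aborts in debug builds.

```cpp
#include <cassert>
#include <cmath>

const double kLn10 = 2.30258509299404568402;  // ln(10); stands in for M_LN10
const double LogP_Zero = -HUGE_VAL;           // log10(0)

// log10(a - b) from x = log10(a), y = log10(b); requires a >= b.
double SubLogP(double x, double y)
{
    assert(x >= y);                  // a - b must not be negative
    if (x == y) {
        return LogP_Zero;            // a - b == 0
    } else if (y == LogP_Zero) {
        return x;                    // subtracting probability 0
    }
    double diff = y - x;             // log10(b/a) < 0, so 0 < b/a < 1
    return x + std::log10(1.0 - std::exp(diff * kLn10));
}
```

For example, SubLogP(log10(0.5), log10(0.2)) returns log10(0.3). When b is very close to a, 1 - b/a suffers from cancellation; -std::expm1(diff * kLn10) computes the same quantity more accurately, as an optional refinement that Prob.h does not use.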
g) weightLogP
<src>
 0  inline LogP weightLogP(double weight, LogP prob)
 1  {
 2      /*
 3       * avoid NaN if weight == 0 && prob == -Infinity
 4       */
 5      if (weight == 0.0) {
 6          return 0.0;
 7      } else {
 8          return weight * prob;
 9      }
10  }
</src>
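  Purpose: multiplies the log probability prob by weight and returns the result.

  Details: lines 5-6 handle weight == 0.0 by returning 0.0 directly. As the comment on
  lines 2-4 says, this avoids the NaN that 0 * (-Infinity) produces when prob is LogP_Zero.
  Otherwise lines 7-9 multiply weight by prob and return the result.
  Since weight * log10(p) = log10(p^weight), returning 0.0 for weight 0 corresponds to p^0 = 1.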
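The NaN case guarded against on lines 2-4 is easy to reproduce. In this sketch (doubles, LogP_Zero as -HUGE_VAL), multiplying 0.0 by LogP_Zero yields NaN, while weightLogP returns 0.0, i.e. probability 10^0 = 1.

```cpp
#include <cmath>

const double LogP_Zero = -HUGE_VAL;  // log10(0)

// Scale a log probability by a weight, treating 0 * log10(0) as 0
// instead of letting IEEE arithmetic produce NaN.
double weightLogP(double weight, double logp)
{
    if (weight == 0.0) {
        return 0.0;
    }
    return weight * logp;            // log10(p^weight)
}
```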
h) rint
<src>
0  inline double rint(double x)
1  {
2      if (x >= 0) {
3          return (double)(int)(x + 0.5);
4      } else {
5          return (double)(int)(x - 0.5);
6      }
7  }
</src>
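  Purpose: rounds the floating-point number x to the nearest integer, with halfway cases
  rounded away from zero. It is not a ceiling or floor operation.

  Details: lines 2-3 handle x >= 0: adding 0.5 and then truncating toward zero with the
  (int) cast rounds to the nearest integer, so 2.4 becomes 2 and 2.5 becomes 3.
  Lines 4-5 handle x < 0: subtracting 0.5 before truncating gives the symmetric result,
  so -2.4 becomes -2 and -2.5 becomes -3.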
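The difference between this fallback and the standard library is easiest to see on halfway cases. The sketch below copies the logic under the hypothetical name rintShim so that it does not collide with the C library's rint. Under the default rounding mode, std::rint rounds halfway cases to even (std::rint(2.5) is 2.0), whereas this version rounds them away from zero, which matches std::round instead.

```cpp
#include <cmath>

// Copy of the Prob.h fallback, renamed to avoid clashing with ::rint.
// Rounds to nearest, halfway cases away from zero.
// Caution: the (int) cast is undefined if x is outside the range of int.
double rintShim(double x)
{
    if (x >= 0) {
        return (double)(int)(x + 0.5);
    } else {
        return (double)(int)(x - 0.5);
    }
}
```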
i) finite
<src>
0  inline int finite (double x)
1  {
2      if (x < 1.e+300 && x > -1.e+300)
3          return 1;
4      else
5          return 0;
6  }
</src>
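  Purpose: checks whether a floating-point number is finite in a practical sense: it
  returns 1 if x lies strictly between -10^300 and 10^300, and 0 otherwise.

  Details: lines 2-3 return 1 when -10^300 < x < 10^300, i.e. x is treated as finite.
  Otherwise line 5 returns 0: any x whose magnitude is at least 10^300 is treated as
  infinite, and so is NaN, because every comparison with NaN is false.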
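A quick comparison with std::isfinite, using the hypothetical name finiteShim for a copy of the fallback above: both reject infinities and NaN, but the fallback also rejects large finite values such as 1e301.

```cpp
#include <cmath>

// Copy of the Prob.h fallback, renamed to avoid clashing with the C library.
// Magnitudes of at least 1e300, infinities and NaN all count as not finite.
int finiteShim(double x)
{
    if (x < 1.e+300 && x > -1.e+300)
        return 1;
    else
        return 0;
}
```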
j) parseLogP
<src>
 0  Boolean
 1  parseLogP(const char *str, LogP &result)
 2  {
 3      const unsigned maxDigits = 8;   // number of decimals in an integer
 4
 5      const char *cp = str;
 6      const char *cp0;
 7      Boolean minus = false;
 8
 9      /*
10       * Log probabilities are typically negative values of magnitude > 0.0001,
11       * and thus are usually formatted without exponential notation.
12       * We parse this type of format using integer arithmetic for speed,
13       * and fall back onto scanf() in all other cases.
14       * We also use scanf() when there are too many digits to handle with
15       * integers.
16       * Finally, we also parse +/- infinity values as they are printed by
17       * printf().  These are "[Ii]nf" or "[Ii]nfinity".
18       */
19
20      /*
21       * Parse optional sign
22       */
23      if (*cp == '-') {
24          minus = true;
25          cp++;
26      } else if (*cp == '+') {
27          cp++;
28      }
29      cp0 = cp;
30
31      unsigned digits = 0;        // total value of parsed digits
32      unsigned decimals = 1;      // scaling factor from decimal point
33      unsigned precision = 0;     // total number of parsed digits
34
35      /*
36       * Parse digits before decimal point
37       */
38      while (isdigit(*cp)) {
39          digits = digits * 10 + (*(cp++) - '0');
40          precision ++;
41      }
42
43      if (*cp == '.') {
44          cp++;
45
46          /*
47           * Parse digits after decimal point
48           */
49          while (isdigit(*cp)) {
50              digits = digits * 10 + (*(cp++) - '0');
51              precision ++;
52              decimals *= 10;
53          }
54      }
55
56      /*
57       * If we're at the end of the string then we're done.
58       * Otherwise there was either an error or some format we can't
59       * handle, so fall back on scanf(), after checking for infinity
60       * values.
61       */
62      if (*cp == '\0' && precision <= maxDigits) {
63          result = (minus ? - (LogP)digits : (LogP)digits) / (LogP)decimals;
64          return true;
65      } else if ((*cp0 == 'i' || *cp0 == 'I') &&
66                 (strncmp(cp0, "Inf", 3) == 0 || strncmp(cp0, "inf", 3) == 0))
67      {
68          result = (minus ? LogP_Zero : LogP_Inf);
69          return true;
70      } else {
71          return (sscanf(str, "%f", &result) == 1);
72      }
73  }
</src>
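  Purpose: parses a floating-point number written as a string and stores it in result.

  Details: lines 23-28 parse an optional sign; if the number is negative, minus is set to true.
  Lines 38-41 parse the digits before the decimal point and accumulate them in digits.
  Lines 43-54 parse the digits after the decimal point, appending them to digits as well
  and multiplying decimals by 10 for each one, so that decimals ends up as the power of ten
  that scales digits back down (for "-0.4771", digits = 4771 and decimals = 10000).
  Lines 62-64 handle the case where the whole string was consumed and precision <= maxDigits,
  i.e. the number has at most 8 digits: result is set to digits divided by decimals,
  negated if minus is true, and true is returned.
  Lines 65-69 check whether the text after the sign starts with "Inf" or "inf"; if so,
  result is set to LogP_Zero (negative infinity) when minus is true and to LogP_Inf
  (positive infinity) otherwise, and true is returned.
  Lines 70-72 handle every other case (for example exponential notation or more than 8
  digits) by calling sscanf to read a float from str into result, and return whether
  sscanf succeeded.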
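The strategy above (integer fast path, explicit infinity check, library fallback) can be sketched in modern C++ as follows. This is a simplified, hypothetical parseLogPSketch, not SRILM's function: it uses bool instead of Boolean, takes LogP to be float (which is what the "%f" format on line 71 requires), casts characters to unsigned char before calling std::isdigit (passing a negative char value is undefined behavior), and falls back on std::strtof rather than sscanf.

```cpp
#include <cctype>
#include <cmath>
#include <cstdlib>
#include <cstring>

typedef float LogP;                  // assumption: LogP is float, as "%f" implies
const LogP LogP_Zero = -HUGE_VALF;   // log10(0)
const LogP LogP_Inf = HUGE_VALF;

bool parseLogPSketch(const char *str, LogP &result)
{
    const unsigned maxDigits = 8;    // 8 digits fit easily in a 32-bit unsigned
    const char *cp = str;
    bool minus = false;

    if (*cp == '-') { minus = true; cp++; }   // optional sign
    else if (*cp == '+') { cp++; }
    const char *cp0 = cp;            // first character after the sign

    unsigned digits = 0;             // all digits read as one integer
    unsigned decimals = 1;           // 10^(number of digits after the '.')
    unsigned precision = 0;          // number of digits read

    while (std::isdigit((unsigned char)*cp)) {
        digits = digits * 10 + (*cp++ - '0');
        precision++;
    }
    if (*cp == '.') {
        cp++;
        while (std::isdigit((unsigned char)*cp)) {
            digits = digits * 10 + (*cp++ - '0');
            precision++;
            decimals *= 10;
        }
    }

    if (*cp == '\0' && precision <= maxDigits) {
        // Fast path: e.g. "-2.375" gives digits = 2375, decimals = 1000.
        result = (minus ? -(LogP)digits : (LogP)digits) / (LogP)decimals;
        return true;
    } else if (std::strncmp(cp0, "Inf", 3) == 0 ||
               std::strncmp(cp0, "inf", 3) == 0) {
        result = minus ? LogP_Zero : LogP_Inf;
        return true;
    } else {
        char *end = nullptr;         // fallback for exponents, long inputs, etc.
        result = std::strtof(str, &end);
        return end != str;           // true if a number was actually read
    }
}
```

For example, "-2.375" takes the fast path, "-inf" maps to LogP_Zero, and "-1.5e-05" falls through to std::strtof because of the exponent.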
k) ProbToBytelog
<src>
0  inline Bytelog ProbToBytelog(Prob prob)
1  {
2      return (int)rint(log(prob) * (10000.5 / 1024.0));
3  }
</src>
  Purpose: converts prob into a Bytelog.

  Details:
  What a Bytelog is: "A bytelog is a logarithm to base 1.0001, divided by 1024 and
  rounded to an integer."
  Line 2 therefore approximates log1.0001(prob) as ln(prob)*10000.5.
  Example: prob == 2.0 gives
       ln(prob)*10000.5 = 6931.8183791897330668270298306425
       log1.0001(prob)  = ln(prob) / ln(1.0001) = 6931.8183734137953551959678499998

  So 10000.5 is approximately equal to 1/ln(1.0001) (about 10000.4999917), which is why
  10000.5 is used directly in place of 1/ln(1.0001).
  Dividing this result by 1024 and rounding to an integer gives the Bytelog.
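A small sketch to check the numbers above. ExactScale is a hypothetical helper that computes 1/ln(1.0001); ProbToBytelog follows line 2 but, as assumptions of this sketch, uses std::rint and plain int in place of SRILM's rint and Bytelog type.

```cpp
#include <cmath>

// Exact scale factor for base-1.0001 logarithms: 1/ln(1.0001).
double ExactScale()
{
    return 1.0 / std::log(1.0001);   // about 10000.4999917
}

// ln(prob) * 10000.5 approximates log base 1.0001; dividing by 1024 and
// rounding gives the Bytelog.
int ProbToBytelog(double prob)
{
    return (int)std::rint(std::log(prob) * (10000.5 / 1024.0));
}
```

ProbToBytelog(2.0) gives 7 (6931.818 / 1024 is about 6.77) and ProbToBytelog(0.5) gives -7.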
l) ProbToIntlog
<src>
0  inline Intlog ProbToIntlog(Prob prob)
1  {
2      return (int)rint(log(prob) * 10000.5);
3  }
</src>
  Purpose: converts prob into an Intlog.

  Details:
  What an Intlog is: a logarithm to base 1.0001, rounded to an integer.
  Line 2 approximates log1.0001(prob) as ln(prob)*10000.5, using the same approximation
  (10000.5 approximately equals 1/ln(1.0001)) as ProbToBytelog; for prob == 2.0,
  ln(prob)*10000.5 = 6931.8183791897330668270298306425.
  Rounding the result to an integer gives the Intlog (6932 for prob == 2.0).
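Because an Intlog is a base-1.0001 logarithm, raising 1.0001 to the Intlog should approximately recover the probability. This sketch assumes plain int for the Intlog type and std::rint for rounding.

```cpp
#include <cmath>

// ln(prob) * 10000.5 approximates log base 1.0001; rounding gives the Intlog.
int ProbToIntlog(double prob)
{
    return (int)std::rint(std::log(prob) * 10000.5);
}
```

ProbToIntlog(2.0) returns 6932, and std::pow(1.0001, 6932) is about 2.0000363.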
m) LogPtoBytelog
<src>
0  inline Bytelog LogPtoBytelog(LogP prob)
1  {
2      return (int)rint(prob * (M_LN10 * 10000.5 / 1024.0));
3  }
</src>
  Purpose: converts a base-10 log probability (logProb) into a Bytelog.

  Details:
  A Bytelog is a logarithm to base 1.0001, divided by 1024 and rounded to an integer.
  Here the input is already a base-10 log: prob == log10(p) == ln(p)/ln(10), and
  M_LN10 == ln(10). Line 2 therefore recovers ln(p) as prob * M_LN10, then approximates
  log1.0001(p) as ln(p)*10000.5, using the same approximation as ProbToBytelog
  (10000.5 approximately equals 1/ln(1.0001); for p == 2.0 the result is 6931.818...).
  Dividing by 1024 and rounding to an integer gives the Bytelog. The constant
  M_LN10 * 10000.5 / 1024.0 folds all three scaling steps into one multiplication.
n) LogPtoIntlog
<src>
0  inline Intlog LogPtoIntlog(LogP prob)
1  {
2      return (int)rint(prob * (M_LN10 * 10000.5));
3  }
</src>
  Purpose: converts a base-10 log probability (logProb) into an Intlog.

  Details:
  An Intlog is a logarithm to base 1.0001, rounded to an integer.
  As in LogPtoBytelog, line 2 recovers ln(p) as prob * M_LN10 (prob == log10(p) ==
  ln(p)/ln(10); M_LN10 == ln(10)), then approximates log1.0001(p) as ln(p)*10000.5,
  since 10000.5 approximately equals 1/ln(1.0001) (for p == 2.0 the result is 6931.818...).
  Rounding the result to an integer gives the Intlog.
o) IntlogToLogP
<src>
0  inline LogP IntlogToLogP(double prob)  /* use double argument to avoid loss
1                                          * of information when converting from
2                                          * floating point values */
3  {
4      return prob/(M_LN10 * 10000.5);
5  }
</src>
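  Purpose: converts an Intlog back into a base-10 log probability (logProb).

  Details:
  An Intlog is a logarithm to base 1.0001, rounded to an integer. As the comment on
  lines 0-2 notes, the argument is a double so that no information is lost when the
  function is called with floating-point values.
  Line 4 inverts LogPtoIntlog. Since
       intlog = log1.0001(p) = ln(p)/ln(1.0001), approximately ln(p) * 10000.5
  dividing by (M_LN10 * 10000.5) gives
       intlog / (M_LN10 * 10000.5) = ln(p)/ln(10) = log10(p)
  which is the log probability.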
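A round-trip sketch of LogPtoIntlog (section n) and IntlogToLogP, assuming plain int for Intlog and kLn10 in place of M_LN10. Since rounding to an Intlog loses at most half a unit, the round trip is accurate to within 0.5 / (ln(10) * 10000.5), about 2.2e-5 in log10 units.

```cpp
#include <cmath>

const double kLn10 = 2.30258509299404568402;  // ln(10); stands in for M_LN10

// log10(p) -> Intlog: ln(p) = logp * ln10, times 10000.5, rounded.
int LogPtoIntlog(double logp)
{
    return (int)std::rint(logp * (kLn10 * 10000.5));
}

// Intlog -> log10(p): undo both scale factors.
double IntlogToLogP(double intlog)
{
    return intlog / (kLn10 * 10000.5);
}
```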
p) BytelogToLogP
<src>
0  inline LogP BytelogToLogP(double bytelog)  /* use double argument so we can
1                                              * scale float values without loss of
2                                              * precision */
3  {
4      return bytelog * (1024.0 / 10000.5 / M_LN10);
5  }
</src>
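    Purpose: converts a Bytelog back into a base-10 log probability (logProb).

    Details:
    What a Bytelog is:
    A bytelog is a logarithm to base 1.0001, divided by 1024 and rounded to
    an integer.
    Line 4 therefore multiplies bytelog by 1024.0 to get back (approximately)
    log1.0001(p), then divides by 10000.5 and by M_LN10. Dividing by 10000.5 * ln(10)
    is approximately the same as multiplying by ln(1.0001)/ln(10), which gives logProb:
    since log1.0001(p) = ln(p)/ln(1.0001),
    log1.0001(p) * (ln(1.0001)/ln(10)) = ln(p)/ln(10) = log10(p) --> logProb
    As the comment on lines 0-2 notes, the argument is a double so that float values
    can be scaled without loss of precision.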
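The same round trip at Bytelog resolution, under the same assumptions (plain int for Bytelog, kLn10 in place of M_LN10). One Bytelog step is 1024 / (10000.5 * ln(10)), about 0.044 in log10 units, so converting to a Bytelog and back can be off by up to about 0.022.

```cpp
#include <cmath>

const double kLn10 = 2.30258509299404568402;  // ln(10); stands in for M_LN10

// log10(p) -> Bytelog (section m): one combined scale factor, then rounding.
int LogPtoBytelog(double logp)
{
    return (int)std::rint(logp * (kLn10 * 10000.5 / 1024.0));
}

// Bytelog -> log10(p): the inverse scale factor.
double BytelogToLogP(double bytelog)
{
    return bytelog * (1024.0 / 10000.5 / kLn10);
}
```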
q) IntlogToBytelog
<src>
0  inline Bytelog IntlogToBytelog(Intlog intlog)
1  {
2      int bytelog = ((-intlog) + (1 << (BytelogShift-1))) >> BytelogShift;
3
4      if (bytelog > 255) {
5          bytelog = 255;
6      }
7      return -bytelog;
8  }
</src>
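  Purpose: converts an Intlog into a Bytelog.

  Details: line 2 negates intlog (log probabilities are at most 0, so -intlog >= 0),
  adds 1 << (BytelogShift-1), i.e. half of 2^BytelogShift, and shifts the sum right by
  BytelogShift bits. This divides by 2^BytelogShift and rounds to the nearest integer.
  Lines 4-6 clamp the magnitude to 255 so that it fits in one byte, and line 7 restores
  the negative sign.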
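A standalone version of IntlogToBytelog for experimenting with the rounding and the clamp. It assumes plain int for both Intlog and Bytelog and takes BytelogShift to be 10, matching the 1024 factor described for BytelogToIntlog.

```cpp
const int BytelogShift = 10;         // assumption: 2^10 = 1024

int IntlogToBytelog(int intlog)
{
    // -intlog >= 0 for log probabilities; adding 2^(shift-1) = 512 before the
    // right shift rounds to the nearest multiple of 1024 instead of truncating.
    int bytelog = ((-intlog) + (1 << (BytelogShift - 1))) >> BytelogShift;
    if (bytelog > 255) {
        bytelog = 255;               // clamp so the magnitude fits in one byte
    }
    return -bytelog;
}
```

IntlogToBytelog(-6932) returns -7 (6932 + 512 = 7444, and 7444 >> 10 = 7), IntlogToBytelog(-511) returns 0, IntlogToBytelog(-512) returns -1, and very small probabilities such as IntlogToBytelog(-1000000) are clamped to -255.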
r) BytelogToIntlog
<src>
0  inline Intlog BytelogToIntlog(Bytelog bytelog)
1  {
2      return bytelog << BytelogShift;
3  }
</src>
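  Purpose: converts a Bytelog into an Intlog.

  Details: a Bytelog and an Intlog differ only by a factor of 1024, so multiplying
  bytelog by 1024 gives the Intlog; line 2 does this by shifting left by BytelogShift
  (10) bits. This does not exactly undo IntlogToBytelog, whose rounding discards the
  low 10 bits.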
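One portability caveat: bytelogs of probabilities below 1 are negative, and before C++20 left-shifting a negative signed integer is undefined behavior (C has the same rule). A sketch that avoids the shift by multiplying by 2^BytelogShift instead, assuming plain int and BytelogShift == 10:

```cpp
const int BytelogShift = 10;         // assumption: 2^10 = 1024

// Same value as bytelog << BytelogShift for the bytelogs Prob.h produces,
// but well defined for negative bytelog in every C++ standard.
int BytelogToIntlog(int bytelog)
{
    return bytelog * (1 << BytelogShift);
}
```

BytelogToIntlog(-7) returns -7168. For bytelogs between -255 and 0, feeding the result back into IntlogToBytelog returns the original bytelog, while the reverse trip (Intlog to Bytelog to Intlog) generally does not return the original Intlog.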
--------------------------------------
Key points:
--------------------------------------
1. Converting between a logarithm and its argument
       From value to logarithm
           This can usually be done by calling the library function log10 directly.
       From logarithm to value
           log10(a) -->  a
           From the identities
           log10(a) = ln(a)/ln(10)
           M_LN10   = ln(10)
           it follows that
           log10(a) * M_LN10 = ln(a)/ln(10) * ln(10) = ln(a)
           and since
           exp(ln(a)) = a
           we get
           exp(log10(a) * M_LN10) = exp(ln(a)) = a
2. Adding and subtracting values stored as logarithms
       Addition
       log10(a), log10(b)  --> log10(a+b)

       Assume a >= b (AddLogP swaps its arguments to ensure this).
       Since log10 is increasing, this is the same as log10(a) >= log10(b), and by the
       quotient rule for logarithms:
       log10(b) - log10(a) = log10(b/a)
       1 + exp(log10(b/a)*M_LN10) = 1 + b/a = (a+b)/a
       Therefore
       log10(a) + log10(1 + exp(log10(b/a)*M_LN10))
       = log10(a) + log10((a+b)/a) = log10(a+b)

       Subtraction
       log10(a), log10(b)  --> log10(a-b)

       Assume a > b, so that a-b is positive and has a logarithm.
       By the quotient rule for logarithms:
       log10(b) - log10(a) = log10(b/a)
       1 - exp(log10(b/a)*M_LN10) = 1 - b/a = (a-b)/a
       Therefore
       log10(a) + log10(1 - exp(log10(b/a)*M_LN10))
       = log10(a) + log10((a-b)/a) = log10(a-b)