def pearson(x, y):
    n = len(x)
    vals = range(n)
    # Simple sums
    sumx = sum([float(x[i]) for i in vals])
    sumy = sum([float(y[i]) for i in vals])
    # Sums of squares
    sumxSq = sum([x[i] ** 2.0 for i in vals])
    sumySq = sum([y[i] ** 2.0 for i in vals])
    # Sum of products
    pSum = sum([x[i] * y[i] for i in vals])
    # Compute the Pearson score
    num = pSum - (sumx * sumy / n)
    den = ((sumxSq - pow(sumx, 2) / n) * (sumySq - pow(sumy, 2) / n)) ** 0.5
    if den == 0:
        # One input is constant, so r is undefined; report no correlation
        return 0
    r = num / den
    return r
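pearson works in a single pass using the shortcut Σxy - ΣxΣy/n. That is algebraically the same as working with deviations from the mean, but when the values are large relative to their spread the subtraction can cancel most significant digits, and the product under `** 0.5` can even come out slightly negative, which makes Python 3 return a complex number. A minimal sketch of the two-pass, mean-centered form follows. The name `pearson_centered`, the sample ratings, and returning 0.0 for constant input are assumptions for illustration, not part of the original.

```python
import math


def pearson_centered(x, y):
    """Pearson correlation computed from mean-centered values (two-pass form)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]  # deviations from the mean of x
    dy = [v - mean_y for v in y]  # deviations from the mean of y
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        # Assumption: report 0 when either input is constant (r is undefined there)
        return 0.0
    return num / den


# Hypothetical ratings two users gave the same five items
user_a = [2.5, 3.5, 3.0, 3.5, 2.5]
user_b = [3.0, 3.5, 1.5, 5.0, 3.5]
print(round(pearson_centered(user_a, user_b), 3))
```

For these sample ratings the script prints 0.398. On Python 3.10 and later, `statistics.correlation` computes the same coefficient, though it raises an error for constant input instead of returning 0.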
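3. Weighted Mean
Use: averaging in which each value counts in proportion to its weight; for example, predicting a result by using similarity scores as the weights.
Formula:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Code implementation:

# Weighted mean
def weightedmean(x, w):
    # Sum of each value multiplied by its weight
    num = sum([x[i] * w[i] for i in range(len(w))])
    # Sum of the weights
    den = sum([w[i] for i in range(len(w))])
    if den == 0:
        # The weights sum to zero, so the mean is undefined; return a fixed fallback
        return 1
    return num / den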
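The weighted mean is how similarity scores turn into a prediction: each neighbour's rating counts in proportion to how similar that neighbour is to the user. A small sketch with hypothetical reviewers and similarity values; `weighted_mean` is a hypothetical name, and raising `ValueError` when the weights sum to zero (instead of returning a fixed value) is a design assumption.

```python
def weighted_mean(values, weights):
    """Mean of values, each counted in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must be the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: fail loudly instead of returning a placeholder value
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: three reviewers' scores for one film, weighted by how
# similar each reviewer's taste is to the user we are predicting for
scores = [4.0, 3.0, 5.0]
similarities = [0.9, 0.4, 0.7]
print(round(weighted_mean(scores, similarities), 2))  # prints 4.15
```

The most similar reviewer (0.9) pulls the prediction toward their score of 4.0, while the weakly similar reviewer (0.4) has the least influence.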
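4. Tanimoto Coefficient
Use: measures how similar two sets are, as the size of their overlap relative to the size of their union.
Formula:

$$T(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

Code implementation:

# Tanimoto coefficient
def tanimoto(a, b):
    # Items of a that also appear in b
    c = [v for v in a if v in b]
    return float(len(c)) / (len(a) + len(b) - len(c))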
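tanimoto as written takes lists: each `v in b` test scans the whole list, and repeated elements of `a` are counted more than once, so `tanimoto(['a', 'a'], ['a'])` returns 2.0, outside the expected range 0 to 1. A minimal sketch, assuming the inputs should be treated as sets, converts them first; for sets, |A| + |B| - |A ∩ B| is simply |A ∪ B|, so this is the Jaccard index. The name `tanimoto_sets`, the sample data, and the 0.0 result for two empty inputs are assumptions.

```python
def tanimoto_sets(a, b):
    """Tanimoto (Jaccard) similarity of two collections treated as sets."""
    sa, sb = set(a), set(b)
    union = len(sa | sb)
    if union == 0:
        return 0.0  # assumption: two empty collections score 0
    return len(sa & sb) / union


# Hypothetical items liked by two shoppers
liked_by_x = ["shirt", "shoes", "pants", "socks"]
liked_by_y = ["shirt", "skirt", "shoes"]
print(tanimoto_sets(liked_by_x, liked_by_y))  # prints 0.4
```

The overlap is {shirt, shoes} (2 items) and the union has 5 items, giving 2 / 5.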
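5. Conditional Probability
Use: prediction; gives the probability of event A once event B is known to have occurred.
Formula:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Code implementation:

# Conditional probability P(A|B), given P(A and B) and P(B)
def condprobability(pab, pb):
    return pab / pb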
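Conditional probabilities are usually estimated from counts. A worked sketch, assuming hypothetical spam-filter counts: because both probabilities share the same denominator (the total number of e-mails), P(A|B) reduces to count(A and B) / count(B). The name `cond_probability` and the `ValueError` for P(B) = 0 are illustrative assumptions.

```python
def cond_probability(p_ab, p_b):
    """P(A|B) = P(A and B) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_ab / p_b


# Hypothetical counts from 100 e-mails:
# B = "marked as spam" (40 e-mails), A and B = "spam and contains 'free'" (12 e-mails)
total = 100
count_b = 40
count_a_and_b = 12
p = cond_probability(count_a_and_b / total, count_b / total)
print(round(p, 2))  # prints 0.3
```

So, in this made-up data, 30% of spam e-mails contain the word "free".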
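6. Gini Impurity
Use: measures how pure a set is; it is 0 when every item carries the same label and grows as the labels become more mixed.
Formula (p_j is the fraction of items with label j):

$$I_G = \sum_{j \ne k} p_j \, p_k = 1 - \sum_{j} p_j^2$$

Code implementation:

# Gini impurity
def giniimpurity(l):
    total = len(l)
    # Count how many times each label occurs
    counts = {}
    for item in l:
        counts.setdefault(item, 0)
        counts[item] += 1
    # Add p_j * p_k over every ordered pair of distinct labels
    imp = 0
    for j in counts:
        f1 = float(counts[j]) / total
        for k in counts:
            if j == k:
                continue
            f2 = float(counts[k]) / total
            imp += f1 * f2
    return imp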
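The double loop in giniimpurity adds p_j · p_k over pairs of distinct labels. Since the probabilities sum to 1, (Σ p_j)² = Σ p_j² + Σ_{j≠k} p_j p_k = 1, so the pairwise sum equals 1 - Σ p_j², which needs only one pass over the counts. The sketch below computes both forms with `collections.Counter` to show they agree; `gini_impurity`, `gini_pairwise`, and treating an empty list as pure are illustrative assumptions.

```python
from collections import Counter


def gini_impurity(items):
    """Gini impurity via the closed form 1 - sum(p_j^2)."""
    total = len(items)
    if total == 0:
        return 0.0  # assumption: an empty set is treated as pure
    counts = Counter(items)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def gini_pairwise(items):
    """Same quantity as a sum of p_j * p_k over ordered pairs of distinct labels."""
    total = len(items)
    probs = [c / total for c in Counter(items).values()]
    return sum(pj * pk
               for j, pj in enumerate(probs)
               for k, pk in enumerate(probs)
               if j != k)


labels = ["a", "a", "b", "c"]
print(gini_impurity(labels), gini_pairwise(labels))  # prints 0.625 0.625
```

The closed form does one pass over the distinct labels instead of one pass per pair, which matters when there are many labels.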
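7. Entropy
Use: another measure of how mixed (disordered) a set is; it is 0 for a pure set.
Formula (p_i is the fraction of items with label i):

$$H = -\sum_{i} p_i \log_2 p_i$$

Code implementation:

# Entropy
def entropy(l):
    from math import log
    log2 = lambda x: log(x) / log(2)
    total = len(l)
    # Count how many times each label occurs
    counts = {}
    for item in l:
        counts.setdefault(item, 0)
        counts[item] += 1
    # Subtract p * log2(p) for each distinct label
    ent = 0
    for i in counts:
        p = float(counts[i]) / total
        ent -= p * log2(p)
    return ent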
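A compact variant of the entropy calculation, assuming Python 3.3+ for `math.log2` and using `collections.Counter` for the tally. It is written as Σ p · log2(1/p), which equals -Σ p · log2 p and avoids printing -0.0 for a pure set. The usage shows entropy rising from 0 bits for a pure set to 1.5 bits for a three-way mix; `entropy_bits` is a hypothetical name.

```python
import math
from collections import Counter


def entropy_bits(items):
    """Shannon entropy in bits over the distinct labels in items."""
    total = len(items)
    counts = Counter(items)
    # p * log2(1/p) with p = c / total, i.e. the -p * log2(p) term of the formula
    return sum((c / total) * math.log2(total / c) for c in counts.values())


for labels in (["a", "a", "a", "a"], ["a", "a", "b", "b"], ["a", "a", "b", "c"]):
    print(labels, entropy_bits(labels))
```

The three lines report 0.0, 1.0 and 1.5: a fifty-fifty split carries one bit of uncertainty, and the uneven three-label mix carries more.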
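8. Variance
Use: evaluating prediction or classification results by measuring how widely numeric values spread around their mean.
Formula:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Code implementation:

# Variance (population variance: divides by n)
def variance(vals):
    mean = float(sum(vals)) / len(vals)
    s = sum([(v - mean) ** 2 for v in vals])
    return s / len(vals)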
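variance divides by n, so it is the population variance. When the values are only a sample from a larger population, dividing by n - 1 gives the usual unbiased estimate instead. A minimal sketch with a hypothetical `sample` flag, cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`; the data set is made up for illustration.

```python
import statistics


def variance(vals, sample=False):
    """Population variance by default; sample=True divides by n - 1 instead."""
    n = len(vals)
    divisor = n - 1 if sample else n
    if divisor <= 0:
        raise ValueError("not enough values")
    mean = sum(vals) / n
    return sum((v - mean) ** 2 for v in vals) / divisor


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))               # population variance: 4.0
print(variance(data, sample=True))  # sample variance: 32 / 7
# Cross-check against the standard library
print(statistics.pvariance(data), statistics.variance(data))
```

Here the mean is 5 and the squared deviations add up to 32, so the two versions give 32 / 8 and 32 / 7.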
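9. Dot Product
Use: computes the dot product of two vectors, the length of a vector, and the angle between two vectors.
Formula:

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i, \qquad \|\mathbf{a}\| = \sqrt{\sum_{i=1}^{n} a_i^2}, \qquad \theta = \arccos \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$$

Code implementation:

# Dot product
from math import acos

# Dot product of two vectors
def dotproduct(a, b):
    return sum([a[i] * b[i] for i in range(len(a))])

# Length (Euclidean norm) of a vector: square root of the sum of squares
def veclength(a):
    return sum([a[i] ** 2 for i in range(len(a))]) ** 0.5

# Angle between two vectors, in radians
def angle(a, b):
    dp = dotproduct(a, b)
    la = veclength(a)
    lb = veclength(b)
    costheta = dp / (la * lb)
    # Clamp rounding error so acos never receives a value outside [-1, 1]
    costheta = max(-1.0, min(1.0, costheta))
    return acos(costheta)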
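In recommendation and text-mining code, cos θ itself (the cosine similarity) is often used directly as the similarity score, since it is 1 for vectors pointing the same way and 0 for orthogonal ones. A minimal sketch, assuming hypothetical word-count vectors for two short documents; `dot`, `norm`, and `cosine_similarity` are illustrative names, and the clamp before `math.acos` guards against rounding pushing the cosine slightly past 1.

```python
import math


def dot(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    # Euclidean length: square root of the sum of squared components
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    la, lb = norm(a), norm(b)
    if la == 0 or lb == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (la * lb)


# Hypothetical word counts for two documents over the vocabulary
# ["data", "model", "python", "weather"]
doc1 = [3, 2, 1, 0]
doc2 = [1, 1, 0, 2]
sim = cosine_similarity(doc1, doc2)
angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, sim))))
print(round(sim, 3), round(angle_deg, 1))
```

Because word counts are never negative, the similarity here stays between 0 and 1, and scaling a document (doubling every count) leaves it unchanged.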