python 主语_前深度学习时代--FFM模型的原理与Python实现
基于上一篇分析中協同過濾、邏輯回歸及FM的比較,可以得出這樣一個結論:
主流模型迭代的關鍵在于增強模型表達能力,而增強方式的主要脈絡為:
更通俗的表達:
- 模型迭代在于找到更多的有效信息。
本文想要回顧的FFM(Field-aware Factorization Macheines)模型可以看作是FM模型的增強版,其正是沿用在FM模型的特征組合思想,并將其發揚光大,曾在多項CTR預估賽中奪魁,并且被Criteo、美團等公司深度應用在推薦系統與CTR預估等領域。
相較于FM模型,FFM模型在FM隱向量特征交叉組合的基礎上,進一步引入了特征感知(field-aware)這一概念,使得模型表達能力在理論上有了一個較大的提升。
FM模型表達式:
FFM的表達式如下所示:
從表達式中可以看出,FFM與FM的不同之處在于二階特征組合部分的隱向量由
FFM這么做的原因是什么?
什么是特征域Fileds,FFM怎么去做特征域級的特征組合?
參考論文原文 ,“features” can be grouped into “fields” 。把這句話的主語和賓語置換一下就可以得到Fileds的定義: “fields” is grouped by "features",也即特征域由某類特征組成。
借用論文的示例,或許可以更直觀的回答這個問題:
Table 1: An artificial CTR data set, where + (-) representsthe number of clicked (unclicked) impressions.Table 2:a example of click event在上面兩個表中,Publisher(P)、Advertiser(A)和Gander(G)是三個Fileds特征域。其中Publisher特征域的中的feature有ESPN、Vogue和NBC,套用作者定義那么有:fields of Publisher is grouped by features<ESPN,Vogue,NBC>。Advertiser與Gander特征域同理。
那么對于Table 2的數據,FFM的二階特征組合結果為:
而對于FM二階特征組合結果而言:
可以看到在FFM模型中,Feature ESPN在與NIKE及Male形成特征組合ESPN,NIKE)、(ESPN,Male)時,使用了不同的潛向量
及 來學習參數。同理,Feature NIKE在與ESPN及Male特征組合時,也分別使用了兩個不同的潛向量 及 來計算組合權重。而對于FM而言,Feature ESPN在與NIKE及Male特征組合時,使用的是同一個潛向量
,同時NIKE與Male特征也只存在唯一一個隱向量。至此,什么是特征域Fileds,FFM怎么去做特征域級的特征組合得到解釋。
一個問題
以上解釋了FFM模型相較于FM模型的優勢在于,每個特征在與不同Field下的Feature進行組合時,會使用與之特征域對應的隱向量來學習組合權重。而這也帶來了一個FFM不被大規模應用的問題--參數量暴增,解釋如下:
假設一份數據集dataset的維度為(m,n),FM在進行模型訓練時將隱向量特征維度設置為k維,那么FM的特征量便為nk。而對于FFM而言,假設dataset的n維特征對應著f個特征域,那么在隱向量維度同維k的情況下,FFM的特征量級為nfk(實際數量為n(f-1)k,由于特征不需要自我交叉,因此為f-1)。
nk與nfk之間的差別雖然只是線性級,但是由于互聯網數據特征量n動輒百萬級,雖然f的值會比n小若干個數量級,但這也足以使得FFM模型的參數量暴增到一個恐怖的級別。
Table 3:two CTR datasets Criteo and Avazu from Kaggle competitions以Table3數據集Criteo為例,當維度k取為10時,FM的參數量為
,而FFM的量級為 。雖然只增加了一個數量級別,但參數量從千萬級變為了億級別,可謂非常恐怖了。python實現
關于FFM的工程代碼其實論文作者已經在github上給出C++版本,python版,或者Amazon AI的馬超開源的XLearn。
本文引用Python implementation of FFM model (ctr, cvr)一文,來分析一下FFM的代碼實現。
import numpy as np import math import random class ffm(object):def __init__(self, feature_num, fild_num, feature_dim_num, feat_fild_dic, learning_rate, regular_para, stop_threshold):#n features, m domains, each feature dimension kself.n = feature_num #特征數量self.m = fild_num #特征域數self.k = feature_dim_num #隱向量特征長度self.dic = feat_fild_dic #特征對應的域#Set hyperparameter, learning rate eta, regularization coefficient lamdaself.eta = learning_rateself.lamda = regular_paraself.threshold = stop_thresholdself.w = np.random.rand(self.n, self.m , self.k) / math.sqrt(self.k)#權重初試值self.G = np.ones(shape = (feature_num, fild_num, feature_dim_num), dtype = np.float64)def train(self, tr_l, val_l, train_y, val_y, max_echo):#這一部分計算模型的訓練損失# tr_l, val_l, train_y, val_y, max_echo are# Training set, validation set, training set label, validation set label, maximum number of iterationsminloss = 0for i in range(max_echo):# Iterative training, max_echo is the maximum number of iterationsL_val = 0Logloss = 0order = range(len(train_y))# mess up the orderrandom.shuffle(order)for each_data_index in order:# Remove a recordtr_each_data = tr_l[each_data_index]# phi() is the model formulaphi = self.phi(tr_each_data)# y_i is the actual tag valuey_i = float(train_y[each_data_index])# Calculate the gradient belowg_phi = -y_i / (1 + math.exp(y_i * phi))# Begin to update the model parameters using the gradient descent methodself.sgd_para(tr_each_data, g_phi)# Next, check on the verification set, the basic process is the same as before.for each_vadata_index, each_va_y in enumerate(val_y):val_each_data = val_l[each_vadata_index]phi_v = self.phi(val_each_data)y_vai = float(each_va_y)Logloss += -(y_vai * math.log(phi_v) + (1 - y_vai) * math.log(1 - phi_v))Logloss = Logloss / len(val_y)# L_val += math.log(1+math.exp(-y_vai * phi_v))print("The %d iteration, LOGLOSS on the validation set: %f" % (i, Logloss))if minloss == 0:# minloss stores the smallest LOGLOSSminloss = Loglossif Logloss <= self.threshold:# It can also be considered that setting the threshold allows the program to jump, and personal needs can be removed.print('Less than the threshold!')breakif minloss < Logloss:# If the next iteration does not reduce LOGLOSS, break out (early stopping)print('early stopping')breakdef phi(self, tmp_dict):#這一部分計算這FFM二階部分的對應的值#Samples are normalized here to prevent calculation overflowsum_v = sum(tmp_dict.values())#First find the index of the non-zero feature in each piece of data and put it in a listphi_tmp = 0key_list = tmp_dict.keys()for i in range(len(key_list)):#feat_index is the index of the feature, fild_index1 is the index of the domain, and value1 is the value corresponding to the featurefeat_index1 = key_list[i]fild_index1 = self.dic[feat_index1]#The purpose of dividing here by sum_v is to normalize this one (return all feature values ??to between 0 and 1)#Of course, each feature has been normalized before (0-1)value1 = tmp_dict[feat_index1] / sum_v#Two non-zero features pairwise inner productfor j in range(i+1, len(key_list)):feat_index2 = key_list[j]fild_index2 = self.dic[feat_index2]value2 = tmp_dict[feat_index2] / sum_vw1 = self.w[feat_index1, fild_index2]w2 = self.w[feat_index2, fild_index1]#The final value is obtained by summing up multiple characteristic combinationsphi_tmp += np.dot(w1, w2) * value1 * value2return phi_tmpdef sgd_para(self, tmp_dict, g_phi):#這一部分梯度計算 ,參數的更新#學習率是用的AdaGrad算法sum_v = sum(tmp_dict.values())key_list = tmp_dict.keys()for i in range(len(key_list)):feat_index1 = key_list[i]fild_index1 = self.dic[feat_index1]value1 = tmp_dict[feat_index1] / sum_vfor j in range(i + 1, len(key_list)):feat_index2 = key_list[j]fild_index2 = self.dic[feat_index2]value2 = tmp_dict[feat_index2] / sum_vw1 = self.w[feat_index1, fild_index2]w2 = self.w[feat_index2, fild_index1]# Update g and Gg_feati_fildj = g_phi * value1 * value2 * w2 + self.lamda * w1g_featj_fildi = g_phi * value1 * value2 * w1 + self.lamda * w2self.G[feat_index1, fild_index2] += g_feati_fildj ** 2self.G[feat_index2, fild_index1] += g_featj_fildi ** 2# math.sqrt() can only accept one element, while np.sqrt() can root the entire vectorself.w[feat_index1, fild_index2] -= self.eta / np.sqrt(self.G[feat_index1, fild_index2]) * g_feati_fildjself.w[feat_index2, fild_index1] -= self.eta / np.sqrt(self.G[feat_index2, fild_index1]) * g_featj_fildi參考資料
- FFM原文
- https://programmersought.com/article/15904469481/
- 美團深入FFM原理與實踐
推薦閱讀
秋雨淅淅l:前深度學習時代--因子分解機模型FM的因與果。?zhuanlan.zhihu.com總結
以上是生活随笔為你收集整理的python 主语_前深度学习时代--FFM模型的原理与Python实现的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: qt能使用logback_SpringB
- 下一篇: 关不关机 扫地机器人_【小米智能家居】米