當前位置：首頁 > 编程语言 > python >内容正文

python

python 主语_前深度学习时代--FFM模型的原理与Python实现

發布時間：2024/10/5 python 26 豆豆

生活随笔收集整理的這篇文章主要介紹了 python 主语_前深度学习时代--FFM模型的原理与Python实现小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

基于上一篇分析中協同過濾、邏輯回歸及FM的比較，可以得出這樣一個結論：

主流模型迭代的關鍵在于增強模型表達能力，而增強方式的主要脈絡為：

引入其它可用特征信息（CF->LR）。

將現有特征進行組合（LR->POLY2->FM）。

更通俗的表達：

模型迭代在于找到更多的有效信息。

本文想要回顧的FFM（Field-aware Factorization Macheines）模型可以看作是FM模型的增強版，其正是沿用在FM模型的特征組合思想，并將其發揚光大，曾在多項CTR預估賽中奪魁，并且被Criteo、美團等公司深度應用在推薦系統與CTR預估等領域。

相較于FM模型，FFM模型在FM隱向量特征交叉組合的基礎上，進一步引入了特征感知（field-aware）這一概念，使得模型表達能力在理論上有了一個較大的提升。

FM模型表達式：

FFM的表達式如下所示：

從表達式中可以看出，FFM與FM的不同之處在于二階特征組合部分的隱向量由

變成了。而這便意味著FFM模型中每一維特征對應的不是唯一一個隱向量，而是一組特征隱向量，這也具體的引出了FFM與FM特征組合不同之處與FFM模型的提升之處。

FFM這么做的原因是什么？

FM模型中每一特征共用同一個特征隱向量，意味著每一特征與不同域特征進行組合時使用的是同一隱特征向量來學習隱向量參數，這樣不夠細致，存在明顯的信息浪費。

什么是特征域Fileds，FFM怎么去做特征域級的特征組合？

參考論文原文，“features” can be grouped into “fields” 。把這句話的主語和賓語置換一下就可以得到Fileds的定義： “fields” is grouped by "features"，也即特征域由某類特征組成。

借用論文的示例，或許可以更直觀的回答這個問題：

Table 1: An artificial CTR data set, where + (-) representsthe number of clicked (unclicked) impressions.

Table 2:a example of click event

在上面兩個表中，Publisher(P)、Advertiser(A)和Gander(G)是三個Fileds特征域。其中Publisher特征域的中的feature有ESPN、Vogue和NBC，套用作者定義那么有：fields of Publisher is grouped by features<ESPN,Vogue,NBC>。Advertiser與Gander特征域同理。

那么對于Table 2的數據，FFM的二階特征組合結果為：

而對于FM二階特征組合結果而言：

可以看到在FFM模型中，Feature ESPN在與NIKE及Male形成特征組合ESPN,NIKE)、(ESPN,Male)時，使用了不同的潛向量

及來學習參數。同理，Feature NIKE在與ESPN及Male特征組合時，也分別使用了兩個不同的潛向量及來計算組合權重。

而對于FM而言，Feature ESPN在與NIKE及Male特征組合時，使用的是同一個潛向量

，同時NIKE與Male特征也只存在唯一一個隱向量。

至此，什么是特征域Fileds，FFM怎么去做特征域級的特征組合得到解釋。

一個問題

以上解釋了FFM模型相較于FM模型的優勢在于，每個特征在與不同Field下的Feature進行組合時，會使用與之特征域對應的隱向量來學習組合權重。而這也帶來了一個FFM不被大規模應用的問題--參數量暴增，解釋如下：

假設一份數據集dataset的維度為(m,n)，FM在進行模型訓練時將隱向量特征維度設置為k維，那么FM的特征量便為nk。而對于FFM而言，假設dataset的n維特征對應著f個特征域，那么在隱向量維度同維k的情況下，FFM的特征量級為nfk（實際數量為n(f-1)k，由于特征不需要自我交叉，因此為f-1）。

nk與nfk之間的差別雖然只是線性級，但是由于互聯網數據特征量n動輒百萬級，雖然f的值會比n小若干個數量級，但這也足以使得FFM模型的參數量暴增到一個恐怖的級別。

Table 3:two CTR datasets Criteo and Avazu from Kaggle competitions

以Table3數據集Criteo為例，當維度k取為10時，FM的參數量為

，而FFM的量級為。雖然只增加了一個數量級別，但參數量從千萬級變為了億級別，可謂非常恐怖了。

python實現

關于FFM的工程代碼其實論文作者已經在github上給出C++版本，python版，或者Amazon AI的馬超開源的XLearn。

本文引用Python implementation of FFM model (ctr, cvr)一文，來分析一下FFM的代碼實現。

import numpy as np import math import random class ffm(object):def __init__(self, feature_num, fild_num, feature_dim_num, feat_fild_dic, learning_rate, regular_para, stop_threshold):#n features, m domains, each feature dimension kself.n = feature_num #特征數量self.m = fild_num #特征域數self.k = feature_dim_num #隱向量特征長度self.dic = feat_fild_dic #特征對應的域#Set hyperparameter, learning rate eta, regularization coefficient lamdaself.eta = learning_rateself.lamda = regular_paraself.threshold = stop_thresholdself.w = np.random.rand(self.n, self.m , self.k) / math.sqrt(self.k)#權重初試值self.G = np.ones(shape = (feature_num, fild_num, feature_dim_num), dtype = np.float64)def train(self, tr_l, val_l, train_y, val_y, max_echo):#這一部分計算模型的訓練損失# tr_l, val_l, train_y, val_y, max_echo are# Training set, validation set, training set label, validation set label, maximum number of iterationsminloss = 0for i in range(max_echo):# Iterative training, max_echo is the maximum number of iterationsL_val = 0Logloss = 0order = range(len(train_y))# mess up the orderrandom.shuffle(order)for each_data_index in order:# Remove a recordtr_each_data = tr_l[each_data_index]# phi() is the model formulaphi = self.phi(tr_each_data)# y_i is the actual tag valuey_i = float(train_y[each_data_index])# Calculate the gradient belowg_phi = -y_i / (1 + math.exp(y_i * phi))# Begin to update the model parameters using the gradient descent methodself.sgd_para(tr_each_data, g_phi)# Next, check on the verification set, the basic process is the same as before.for each_vadata_index, each_va_y in enumerate(val_y):val_each_data = val_l[each_vadata_index]phi_v = self.phi(val_each_data)y_vai = float(each_va_y)Logloss += -(y_vai * math.log(phi_v) + (1 - y_vai) * math.log(1 - phi_v))Logloss = Logloss / len(val_y)# L_val += math.log(1+math.exp(-y_vai * phi_v))print("The %d iteration, LOGLOSS on the validation set: %f" % (i, Logloss))if minloss == 0:# minloss stores the smallest LOGLOSSminloss = Loglossif Logloss <= self.threshold:# It can also be considered that setting the threshold allows the program to jump, and personal needs can be removed.print('Less than the threshold!')breakif minloss < Logloss:# If the next iteration does not reduce LOGLOSS, break out (early stopping)print('early stopping')breakdef phi(self, tmp_dict):#這一部分計算這FFM二階部分的對應的值#Samples are normalized here to prevent calculation overflowsum_v = sum(tmp_dict.values())#First find the index of the non-zero feature in each piece of data and put it in a listphi_tmp = 0key_list = tmp_dict.keys()for i in range(len(key_list)):#feat_index is the index of the feature, fild_index1 is the index of the domain, and value1 is the value corresponding to the featurefeat_index1 = key_list[i]fild_index1 = self.dic[feat_index1]#The purpose of dividing here by sum_v is to normalize this one (return all feature values ??to between 0 and 1)#Of course, each feature has been normalized before (0-1)value1 = tmp_dict[feat_index1] / sum_v#Two non-zero features pairwise inner productfor j in range(i+1, len(key_list)):feat_index2 = key_list[j]fild_index2 = self.dic[feat_index2]value2 = tmp_dict[feat_index2] / sum_vw1 = self.w[feat_index1, fild_index2]w2 = self.w[feat_index2, fild_index1]#The final value is obtained by summing up multiple characteristic combinationsphi_tmp += np.dot(w1, w2) * value1 * value2return phi_tmpdef sgd_para(self, tmp_dict, g_phi):#這一部分梯度計算，參數的更新#學習率是用的AdaGrad算法sum_v = sum(tmp_dict.values())key_list = tmp_dict.keys()for i in range(len(key_list)):feat_index1 = key_list[i]fild_index1 = self.dic[feat_index1]value1 = tmp_dict[feat_index1] / sum_vfor j in range(i + 1, len(key_list)):feat_index2 = key_list[j]fild_index2 = self.dic[feat_index2]value2 = tmp_dict[feat_index2] / sum_vw1 = self.w[feat_index1, fild_index2]w2 = self.w[feat_index2, fild_index1]# Update g and Gg_feati_fildj = g_phi * value1 * value2 * w2 + self.lamda * w1g_featj_fildi = g_phi * value1 * value2 * w1 + self.lamda * w2self.G[feat_index1, fild_index2] += g_feati_fildj ** 2self.G[feat_index2, fild_index1] += g_featj_fildi ** 2# math.sqrt() can only accept one element, while np.sqrt() can root the entire vectorself.w[feat_index1, fild_index2] -= self.eta / np.sqrt(self.G[feat_index1, fild_index2]) * g_feati_fildjself.w[feat_index2, fild_index1] -= self.eta / np.sqrt(self.G[feat_index2, fild_index1]) * g_featj_fildi

參考資料

FFM原文
https://programmersought.com/article/15904469481/
美團深入FFM原理與實踐

總結

以上是生活随笔為你收集整理的python 主语_前深度学习时代--FFM模型的原理与Python实现的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇： qt能使用logback_SpringB
下一篇：关不关机扫地机器人_【小米智能家居】米