
[Search/Recommendation Ranking] FM, FFM, AFM, PNN, DeepFM: CTR and CVR Prediction

Published: 2024/7/5


Contents

1. FM
- 1.1 Code: click prediction
  - Results and parameter scale
- 1.3 Comparison with other models
  - SVM
  - MF
2. FFM
- Comparison on one-hot features
- Example
- Training notes
  - Results and parameter scale
- Implementation
3. AFM
4. FNN/PNN
- 4.1 FNN
- 4.2 PNN
5. DeepFM
- Comparison with Wide&Deep
- Comparison with NFM
- FM can already learn from sparse input, so why share dense input with the deep part?

Purpose: predicting CTR and CVR.
CTR is the click-through rate and CVR is the conversion rate.

Feature selection:
discrete or continuous

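1. FM

Use: click prediction (clicked or not) and rating prediction.
Advantage: it still works when the data is sparse.
The weight of each second-order feature combination is the inner product of the embedding vectors of the two features involved, and it indicates how important that combination is. Once the FM model is trained, every feature has learned its own embedding vector.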

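Factorizing $w_{ij}$ into $v$, i.e. $w_{ij} = \langle v_i, v_j \rangle$, reduces the number of parameters and makes training easier. This is the matrix-factorization idea.
$v_i$ is the latent vector of feature $i$.
The cross-feature (second-order) term is $\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\langle v_i, v_j \rangle x_i x_j$.

Here $k$ (also written $f$) is the size of $v$, and $n$ is the number of features.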
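
To make the factorization concrete, here is a minimal plain-Python sketch of the FM score for one sample, written the direct way (every pair visited explicitly). The function names `fm_predict` and `pairwise_param_counts` and all numeric values are illustrative assumptions, not part of the original code.

```python
def fm_predict(x, w0, w, v):
    """Naive O(n^2 * k) FM score for one sample.

    x: list of n feature values
    w: list of n linear weights
    v: list of n latent vectors, each of length k
    """
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n - 1):
        for j in range(i + 1, n):
            # w_ij is never stored; it is rebuilt as the inner product <v_i, v_j>
            w_ij = sum(a * b for a, b in zip(v[i], v[j]))
            score += w_ij * x[i] * x[j]
    return score


def pairwise_param_counts(n, k):
    """Return (number of free w_ij weights, number of factorized weights)."""
    return n * (n - 1) // 2, n * k


# Hypothetical toy values: 3 features, latent size k = 2
x = [1.0, 0.0, 1.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
v = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
print(fm_predict(x, w0, w, v))
print(pairwise_param_counts(6040 + 3952, 16))
```

With the MovieLens sizes used later in this article (n = 6040 + 3952, k = 16), a free weight per pair would need 49,915,036 parameters, while the factorized form needs 159,872.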

1.1 Code: click prediction

vx: implemented with an embedding
wx: also implemented with an embedding

import numpy as np
import torch


class FactorizationMachineModel(torch.nn.Module):
    """
    A pytorch implementation of Factorization Machine.

    Reference:
        S Rendle, Factorization Machines, 2010.
    """

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.embedding = FeaturesEmbedding(field_dims, embed_dim)
        self.linear = FeaturesLinear(field_dims)
        self.fm = FactorizationMachine(reduce_sum=True)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        # (batch_size, 1) + (batch_size, 1)
        x = self.linear(x) + self.fm(self.embedding(x))
        # sigmoid: binary classification (clicked / not clicked)
        return torch.sigmoid(x.squeeze(1))

vx:

class FeaturesEmbedding(torch.nn.Module):

    def __init__(self, field_dims, embed_dim):
        """
        MovieLens has only two kinds of features, user and movie, so
        field_dims = [number of users n, number of movies k]: the list of
        cardinalities, one per field.
        An offset is added so that rows 0..n-1 of the embedding table hold the
        user features and rows n..n+k-1 hold the movie features.
        """
        super().__init__()
        self.embedding = torch.nn.Embedding(sum(field_dims), embed_dim)
        # np.int64 instead of np.long, which is not available in every NumPy release
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
        torch.nn.init.xavier_uniform_(self.embedding.weight.data)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        :return: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return self.embedding(x)
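
The offset trick is easier to see on a tiny example. The sketch below reproduces only the index arithmetic in plain Python. The helper names `field_offsets` and `to_global_ids` and the toy sizes are assumptions made for illustration.

```python
from itertools import accumulate


def field_offsets(field_dims):
    """Start index of each field inside one shared embedding table."""
    return [0] + list(accumulate(field_dims))[:-1]


def to_global_ids(x, field_dims):
    """Shift per-field local ids so that every field owns a disjoint id range."""
    return [local + off for local, off in zip(x, field_offsets(field_dims))]


field_dims = [3, 2]                        # hypothetical: 3 users, 2 movies
print(field_offsets(field_dims))           # [0, 3]
print(to_global_ids([2, 1], field_dims))   # [2, 4]
```

User 2 keeps row 2, while movie 1 is moved to row 4, so one table of `sum(field_dims)` rows can serve all fields without collisions.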

wx:

class FeaturesLinear(torch.nn.Module):

    def __init__(self, field_dims, output_dim=1):
        """
        Maps feature ids straight to an embedding of size output_dim, which is
        equivalent to applying a linear layer to the one-hot input.
        """
        super().__init__()
        self.fc = torch.nn.Embedding(sum(field_dims), output_dim)
        self.bias = torch.nn.Parameter(torch.zeros((output_dim,)))
        # np.int64 instead of np.long, which is not available in every NumPy release
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        :return: Float tensor of size ``(batch_size, output_dim)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        # sum the per-field weights over the field dimension, then add the bias
        return torch.sum(self.fc(x), dim=1) + self.bias

Interaction term: square of the sum minus sum of the squares

class FactorizationMachine(torch.nn.Module):

    def __init__(self, reduce_sum=True):
        super().__init__()
        self.reduce_sum = reduce_sum

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        square_of_sum = torch.sum(x, dim=1) ** 2  # (batch_size, embed_dim)
        sum_of_square = torch.sum(x ** 2, dim=1)  # (batch_size, embed_dim)
        ix = square_of_sum - sum_of_square        # (batch_size, embed_dim)
        if self.reduce_sum:
            ix = torch.sum(ix, dim=1, keepdim=True)  # (batch_size, 1)
        return 0.5 * ix
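
The "square of the sum minus sum of the squares" form is an algebraic identity for the sum of all pairwise inner products. The following plain-Python sketch checks it numerically on made-up vectors. The names `pairwise_sum` and `square_trick` are illustrative, and each vector stands for the embedding of one active feature.

```python
def pairwise_sum(vecs):
    """Sum over i < j of <v_i, v_j>, computed the direct O(n^2 * k) way."""
    total = 0.0
    for i in range(len(vecs) - 1):
        for j in range(i + 1, len(vecs)):
            total += sum(a * b for a, b in zip(vecs[i], vecs[j]))
    return total


def square_trick(vecs):
    """Same quantity in O(n * k): 0.5 * sum_f((sum_i v_if)^2 - sum_i v_if^2)."""
    k = len(vecs[0])
    total = 0.0
    for f in range(k):
        col = [v[f] for v in vecs]
        total += sum(col) ** 2 - sum(c * c for c in col)
    return 0.5 * total


vecs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # hypothetical embeddings
print(pairwise_sum(vecs), square_trick(vecs))  # 3.0 3.0
```

The second form touches each vector once per dimension, which is where FM's linear cost in the number of features comes from.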

Results and parameter scale

MovieLens:
k=16
n=6040+3952 (number of users plus number of movies)
epoch=31, test auc: 0.8142968981785388, early stop

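1.3 Comparison with other models

SVM

Compared with an SVM using a second-order polynomial kernel, FM has the advantage when samples are sparse. In addition, FM's training and prediction complexity is linear, whereas the second-order polynomial-kernel SVM has to compute the kernel matrix, whose complexity is N squared.

MF

FM: we can add any number of features, such as the user's historical average purchase value or the item's historical average purchase value.
MF, however, is limited to two kinds of features.
SVD++ is similar to MF. Both are less extensible than FM in terms of features.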

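2. FFM

Reference: Meituan's article
FM formula:

$FM(x)=w_0 + \sum_{i=1}^{n}w_ix_i+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\langle v_i,v_j \rangle x_ix_j$

The second-order term has n*k parameters.

FFM formula:

$FFM(x)=w_0 + \sum_{i=1}^{n}w_ix_i+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\langle v_{i,f_j},v_{j,f_i} \rangle x_ix_j$

The second-order term has n*k*f parameters, so it is more expressive.
$v_i$ -> $v_{i,f_j}$
A feature's single embedding becomes the latent vector that feature i uses against the field of feature j. Every feature has one vector per field, so every feature has its own matrix of latent vectors.
Advantage: more parameters, more expressive power.
Drawback: high complexity, heavyweight, prone to overfitting.
Complexity: FM's second-order term can be rewritten to cost O(kn), linear in n. FFM's cannot be simplified this way, so its prediction costs O(kn^2).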
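
As a rough illustration of the field-aware term, the plain-Python sketch below scores one sparse sample. Everything here (`ffm_interaction`, `ffm_param_count`, the dict-based sample and the toy numbers) is an assumption chosen for readability, not the article's PyTorch implementation.

```python
def ffm_interaction(x, field_of, v):
    """Second-order FFM term for one sample.

    x: dict feature_index -> value (only non-zero features are listed)
    field_of: list mapping each feature index to its field index
    v: v[i][f] is the latent vector that feature i uses against field f
    """
    active = sorted(x)
    total = 0.0
    for a in range(len(active) - 1):
        for b in range(a + 1, len(active)):
            i, j = active[a], active[b]
            vi = v[i][field_of[j]]   # v_{i, f_j}
            vj = v[j][field_of[i]]   # v_{j, f_i}
            total += sum(p * q for p, q in zip(vi, vj)) * x[i] * x[j]
    return total


def ffm_param_count(n, k, f):
    """Number of latent parameters: one k-vector per (feature, field) pair."""
    return n * k * f


# Hypothetical toy setup: 3 features, 2 fields, latent size 2
field_of = [0, 0, 1]
v = [
    [[1.0, 0.0], [2.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[3.0, -1.0], [5.0, 5.0]],
]
print(ffm_interaction({0: 1.0, 2: 1.0}, field_of, v))   # 5.0
```

Feature 0 uses its vector for field 1 (the field of feature 2), and feature 2 uses its vector for field 0. Because the vector depends on the partner's field, the sum cannot be collapsed with the square-of-sum trick that FM uses.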

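Simplified formula (second-order term only, without the bias and linear terms):

$\phi(x)=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\langle v_{i,f_j},v_{j,f_i} \rangle x_ix_j$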

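As a ranking model, FFM does outperform FM, but it needs far more parameter storage and cannot match FM's runtime efficiency. Ranking on small and medium datasets is not a problem, but once the data volume grows, the resource and efficiency requirements rise sharply. This is the main obstacle to using FFM in large-scale settings.
Purely structured data is still a poor fit for deep learning. LSTM and CNN do not classify it well.

To counter overfitting: use early stopping and choose a smaller k.
With tens of millions of training samples, k between 8 and 10 generally gives good results.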
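
Early stopping itself is simple to sketch. The function below is a minimal, framework-free illustration that assumes a list of per-epoch validation scores is already available. The name `early_stopping_epoch`, the `patience` parameter and the AUC values are all hypothetical.

```python
def early_stopping_epoch(val_scores, patience=2):
    """Return the index of the best epoch, stopping once the validation score
    has failed to improve for `patience` consecutive epochs."""
    best_score, best_epoch, bad_epochs = float("-inf"), -1, 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # stop training; keep the checkpoint from best_epoch
    return best_epoch


val_auc = [0.70, 0.75, 0.78, 0.77, 0.76, 0.80]   # made-up numbers
print(early_stopping_epoch(val_auc, patience=2))  # 2
```

Note the trade-off: with `patience=2` the run stops before it would reach the later, higher score, so the patience value has to be tuned to how noisy the validation metric is.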

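Comparison on one-hot features

FM:

FFM:
The three features "Day=26/11/15", "Day=1/7/14" and "Day=19/2/15" all represent dates, so they can be placed in the same field.
Country: one field
Ad_type: one field
That gives 3 fields in total.
The example above has n=7 features belonging to 3 fields (f=3).

Example

A value of 1 means the feature is present.
A value of 9.99 means the price equals 9.99.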
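
One way to picture the two kinds of values is to encode a raw record as (field, feature, value) triples. The sketch below is only an assumption about how such an encoding could look. The `encode` helper, the record and its field names are invented for illustration.

```python
def encode(record, numeric_fields):
    """Turn a raw record into (field, feature, value) triples."""
    triples = []
    for field, raw in record.items():
        if field in numeric_fields:
            # numeric feature: the value carries the number itself
            triples.append((field, field, float(raw)))
        else:
            # categorical feature: one-hot, 1.0 means "this feature is present"
            triples.append((field, f"{field}={raw}", 1.0))
    return triples


record = {"Country": "USA", "Ad_type": "Movie", "Price": 9.99}  # hypothetical
for triple in encode(record, numeric_fields={"Price"}):
    print(triple)
```

Categorical features that are absent are simply not emitted, which matches the sparse representation that FM and FFM expect.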

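Training notes

Several small details deserve special attention when training FFM.

First, sample normalization. FFM normalizes each sample by default, i.e. pa.norm is true. If this parameter is set to false, the data can easily overflow to inf, which in turn produces nan errors in the gradient computation. Normalizing at the sample level is therefore recommended.

Second, feature normalization. CTR/CVR models use several types of source features, including numeric and categorical ones. After encoding, categorical features only take the value 0 or 1, so a large numeric feature makes the categorical features very small after sample normalization, and they lose their discriminative power. For example, take a user-item record where the user is male and the item's sales volume is 5000 (assume all other features are zero). After normalization, the feature "sex=male" has a value slightly below 0.0002, while "volume" is approximately 1. The feature "sex=male" then has almost no effect in this sample, which is quite unreasonable. It is therefore essential to normalize the source numeric features to [0,1].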
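
The numbers in this example can be reproduced with a few lines of plain Python. The sketch assumes that sample normalization means dividing by the sample's L2 norm, which is consistent with the 0.0002 figure above. The min-max range of 0 to 10000 for the sales volume is an invented assumption.

```python
import math


def l2_normalize(values):
    """Scale one sample so that its L2 norm is 1."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def min_max_scale(value, lo, hi):
    """Map a numeric feature from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


raw = [1.0, 5000.0]                    # [sex=male, volume]
print(l2_normalize(raw))               # sex=male shrinks to just under 0.0002

# Assumed value range for volume: 0..10000
scaled = [1.0, min_max_scale(5000.0, 0.0, 10000.0)]
print(l2_normalize(scaled))            # sex=male now keeps a sizeable share
```

After the numeric feature is scaled to [0,1] first, the categorical feature is no longer drowned out by sample normalization.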

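Third, omit zero-valued features. The FFM model expression shows that zero-valued features contribute nothing to the model: every linear term and every cross term that contains a zero-valued feature is zero, so it plays no role in training the parameters or in predicting the target. Zero-valued features can therefore be skipped, which speeds up FFM training and prediction. This is also a clear advantage of using FFM on sparse samples.

This article introduced the ideas and theory behind FFM and, with reference to the source code, described its practical use and some of the finer details. In theory, FFM's way of factorizing parameters has notable advantages: it is particularly well suited to sparse samples while still giving good performance. In practice, using FFM for on-site CTR/CVR prediction is very reasonable, and all metrics show its excellent performance in click-through prediction. Of course, FFM does not suit every scenario or outperform every other model. It takes the right application scenario for FFM to live up to its reputation.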

Results and parameter scale

MovieLens
k=4
f=2
n=6040+3952 (number of users plus number of movies)
epoch=99, test auc: 0.8083788345826268

Implementation

class FieldAwareFactorizationMachine(torch.nn.Module):

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.num_fields = len(field_dims)
        # one embedding table per field: n * (f = num_fields) latent vectors in total
        self.embeddings = torch.nn.ModuleList([
            torch.nn.Embedding(sum(field_dims), embed_dim) for _ in range(self.num_fields)
        ])
        # np.int64 instead of np.long, which is not available in every NumPy release
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
        for embedding in self.embeddings:
            torch.nn.init.xavier_uniform_(embedding.weight.data)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        # xs[f] has shape (batch_size, num_fields, embed_dim)
        xs = [self.embeddings[i](x) for i in range(self.num_fields)]
        ix = list()
        for i in range(self.num_fields - 1):
            for j in range(i + 1, self.num_fields):
                ix.append(xs[j][:, i] * xs[i][:, j])
        # (batch_size, num_fields*(num_fields-1)/2, embed_dim); the middle size is 1 for MovieLens
        ix = torch.stack(ix, dim=1)
        return ix


class FieldAwareFactorizationMachineModel(torch.nn.Module):
    """
    A pytorch implementation of Field-aware Factorization Machine.

    Reference:
        Y Juan, et al. Field-aware Factorization Machines for CTR Prediction, 2015.
    """

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.linear = FeaturesLinear(field_dims)
        self.ffm = FieldAwareFactorizationMachine(field_dims, embed_dim)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        # sum over pairs -> (batch_size, embed_dim), then over dimensions -> (batch_size, 1)
        ffm_term = torch.sum(torch.sum(self.ffm(x), dim=1), dim=1, keepdim=True)
        x = self.linear(x) + ffm_term
        return torch.sigmoid(x.squeeze(1))

3. AFM

AFM: Attentional FM

AttentionalFactorizationMachineModel(field_dims, embed_dim=16, attn_size=16, dropouts=(0.2, 0.2))
AttentionalFactorizationMachineModel = Linear + AttentionalFactorizationMachine

Inner-product part:

import torch.nn.functional as F


class AttentionalFactorizationMachine(torch.nn.Module):

    def __init__(self, embed_dim, attn_size, dropouts):
        super().__init__()
        self.attention = torch.nn.Linear(embed_dim, attn_size)
        self.projection = torch.nn.Linear(attn_size, 1)
        self.fc = torch.nn.Linear(embed_dim, 1)
        self.dropouts = dropouts

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        num_fields = x.shape[1]
        row, col = list(), list()
        for i in range(num_fields - 1):
            for j in range(i + 1, num_fields):
                row.append(i), col.append(j)
        p, q = x[:, row], x[:, col]
        # num_pairs = num_fields * (num_fields - 1) / 2, which is 1 for MovieLens
        inner_product = p * q  # element-wise, (batch_size, num_pairs, embed_dim)
        # attention over the pairs
        attn_scores = F.relu(self.attention(inner_product))  # (batch_size, num_pairs, attn_size)
        attn_scores = F.softmax(self.projection(attn_scores), dim=1)  # (batch_size, num_pairs, 1)
        attn_scores = F.dropout(attn_scores, p=self.dropouts[0], training=self.training)
        # give every pairwise product a weight, then take the weighted sum over the pairs
        attn_output = torch.sum(attn_scores * inner_product, dim=1)  # (batch_size, embed_dim)
        attn_output = F.dropout(attn_output, p=self.dropouts[1], training=self.training)
        return self.fc(attn_output)  # (batch_size, 1)
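
The core of AFM, stripped of the learned layers, is a softmax-weighted sum over the pairwise products. The plain-Python sketch below shows only that pooling step and takes the unnormalized scores as given. The names `softmax` and `attention_pool` and the toy inputs are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(pair_products, pair_scores):
    """Weighted sum of pairwise element-wise products.

    pair_products: one vector (v_i * v_j, element-wise) per feature pair
    pair_scores: one unnormalized attention score per pair
    """
    weights = softmax(pair_scores)
    dim = len(pair_products[0])
    return [sum(w * p[d] for w, p in zip(weights, pair_products)) for d in range(dim)]


products = [[2.0, 0.0], [0.0, 4.0]]            # hypothetical pairwise products
print(attention_pool(products, [0.0, 0.0]))    # [1.0, 2.0]
```

With equal scores every pair gets the same weight, so the result is the plain average. A much larger score for one pair would make the output approach that pair's product, which is how AFM lets some interactions count more than others.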

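4. FNN/PNN

Drawback: few low-order feature combinations are learned, even though low-order features also matter a great deal for CTR.

4.1 FNN

A pre-trained FM is used first to obtain the latent vectors, which are then fed to a DNN as input for training.
Drawback: the result is limited by the quality of the FM pre-training.

4.2 PNN

To capture high-order feature combinations, PNN adds a product layer between the embedding layer and the first hidden layer. Depending on whether the product layer uses the inner product, the outer product, or both, this yields three variants: IPNN, OPNN and PNN*.

For each item, the cross features are concatenated onto its embeddings to obtain a new feature vector.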

class ProductNeuralNetworkModel(torch.nn.Module):
    """
    A pytorch implementation of inner/outer Product Neural Network.

    Reference:
        Y Qu, et al. Product-based Neural Networks for User Response Prediction, 2016.
    """

    def __init__(self, field_dims, embed_dim, mlp_dims, dropout, method='inner'):
        super().__init__()
        num_fields = len(field_dims)
        if method == 'inner':
            self.pn = InnerProductNetwork()
        elif method == 'outer':
            self.pn = OuterProductNetwork(num_fields, embed_dim)
        else:
            raise ValueError('unknown product type: ' + method)
        self.embedding = FeaturesEmbedding(field_dims, embed_dim)
        self.linear = FeaturesLinear(field_dims, embed_dim)
        self.embed_output_dim = num_fields * embed_dim
        # MultiLayerPerceptron is not shown in this article
        self.mlp = MultiLayerPerceptron(num_fields * (num_fields - 1) // 2 + self.embed_output_dim, mlp_dims, dropout)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        embed_x = self.embedding(x)    # (batch_size, num_fields, embed_dim); num_fields=2 for MovieLens
        cross_term = self.pn(embed_x)  # (batch_size, num_fields*(num_fields-1)/2); 1 pair for MovieLens
        # inner product case: (batch_size, num_fields*embed_dim + num_fields*(num_fields-1)/2)
        x = torch.cat([embed_x.view(-1, self.embed_output_dim), cross_term], dim=1)
        x = self.mlp(x)
        return torch.sigmoid(x.squeeze(1))
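
The width of the MLP input follows directly from the concatenation in `forward`. The small helper below (the name `pnn_mlp_input_dim` is an assumption) makes the arithmetic explicit.

```python
def pnn_mlp_input_dim(num_fields, embed_dim):
    """Width of the vector fed to the MLP in the PNN model above."""
    num_pairs = num_fields * (num_fields - 1) // 2   # one product per field pair
    flat_embed = num_fields * embed_dim              # all embeddings, flattened
    return num_pairs + flat_embed


print(pnn_mlp_input_dim(2, 16))    # 33: the MovieLens case (user, movie)
print(pnn_mlp_input_dim(10, 16))   # 205: a hypothetical 10-field dataset
```

The embedding part grows linearly with the number of fields, while the product part grows quadratically, so the product features dominate once there are many fields.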

$FM(x)=w_0 + \sum_{i=1}^{n}w_ix_i+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\langle v_i,v_j \rangle x_ix_j$
$FFM(x)=w_0 + \sum_{i=1}^{n}w_ix_i+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\langle v_{i,f_j},v_{j,f_i} \rangle x_ix_j$
$AFM(x)=w_0 + \sum_{i=1}^{n}w_ix_i+P^T\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}a_{ij}(v_i \odot v_j)x_ix_j$, where $\odot$ is the element-wise product and $a_{ij}$ is the attention weight of the pair
$FNN(x)=mlp(l_z),\ l_z=embed(x)$, where the embedding comes from a pre-trained FM
$PNN(x)=mlp(l_z+l_p),\ l_z=embed(x),\ l_p=$ inner product or outer product

num_fields = x.shape[1]
row, col = list(), list()
for i in range(num_fields - 1):
    for j in range(i + 1, num_fields):
        row.append(i), col.append(j)

Inner product:

torch.sum(x[:, row] * x[:, col], dim=2)
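
To see what `row` and `col` contain, the sketch below rebuilds the same pair indices with `itertools.combinations` and computes the inner products on plain lists. The helper names and the toy embeddings are assumptions for illustration.

```python
from itertools import combinations


def field_pairs(num_fields):
    """Index lists (row, col) covering every pair i < j, in the loop order above."""
    pairs = list(combinations(range(num_fields), 2))
    return [i for i, _ in pairs], [j for _, j in pairs]


def inner_products(embeds):
    """One scalar per field pair: <e_i, e_j>."""
    row, col = field_pairs(len(embeds))
    return [sum(a * b for a, b in zip(embeds[i], embeds[j])) for i, j in zip(row, col)]


print(field_pairs(3))                                        # ([0, 0, 1], [1, 2, 2])
print(inner_products([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))  # [11.0, 17.0, 39.0]
```

In the tensor version, indexing with `x[:, row]` and `x[:, col]` gathers both sides of every pair at once, so the sum over the last dimension yields all inner products in a single operation.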

Outer product:

class OuterProductNetwork(torch.nn.Module):

    def __init__(self, num_fields, embed_dim, kernel_type='mat'):
        super().__init__()
        num_ix = num_fields * (num_fields - 1) // 2  # number of field pairs
        if kernel_type == 'mat':
            kernel_shape = embed_dim, num_ix, embed_dim
        elif kernel_type == 'vec':
            kernel_shape = num_ix, embed_dim
        elif kernel_type == 'num':
            kernel_shape = num_ix, 1
        else:
            raise ValueError('unknown kernel type: ' + kernel_type)
        self.kernel_type = kernel_type
        self.kernel = torch.nn.Parameter(torch.zeros(kernel_shape))
        torch.nn.init.xavier_uniform_(self.kernel.data)

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        num_fields = x.shape[1]
        row, col = list(), list()
        for i in range(num_fields - 1):
            for j in range(i + 1, num_fields):
                row.append(i), col.append(j)
        p, q = x[:, row], x[:, col]
        if self.kernel_type == 'mat':
            # project p through the kernel matrix, then take the dot product with q
            kp = torch.sum(p.unsqueeze(1) * self.kernel, dim=-1).permute(0, 2, 1)
            return torch.sum(kp * q, -1)
        else:
            return torch.sum(p * q * self.kernel.unsqueeze(0), -1)

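5. DeepFM

DeepFM references:
https://www.cnblogs.com/xiaoqi/p/deepfm.html
https://zhuanlan.zhihu.com/p/35465875

Advantages:

No need to pre-train an FM to obtain latent vectors

No manual feature engineering

Learns low-order and high-order feature combinations at the same time
The FM module and the Deep module share the feature embedding, which makes training faster and more accurate

Drawbacks:

1. The dense vectors of the categorical features are concatenated as input, and the elements are then crossed pairwise. As a result the model has no notion of a field: neither the FM part nor the Deep part takes fields into account, although elements belonging to the same field should go through the same computation.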

class DeepFactorizationMachineModel(torch.nn.Module):
    """
    A pytorch implementation of DeepFM.

    Reference:
        H Guo, et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, 2017.
    """

    def __init__(self, field_dims, embed_dim, mlp_dims, dropout):
        super().__init__()
        self.linear = FeaturesLinear(field_dims)
        self.fm = FactorizationMachine(reduce_sum=True)
        self.embedding = FeaturesEmbedding(field_dims, embed_dim)
        self.embed_output_dim = len(field_dims) * embed_dim
        # MultiLayerPerceptron is not shown in this article
        self.mlp = MultiLayerPerceptron(self.embed_output_dim, mlp_dims, dropout)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        embed_x = self.embedding(x)  # shared by the FM part and the deep part
        x = self.linear(x) + self.fm(embed_x) + self.mlp(embed_x.view(-1, self.embed_output_dim))
        return torch.sigmoid(x.squeeze(1))
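
The way the three parts combine can be traced by hand on a toy example. The plain-Python sketch below assumes a single hidden layer with ReLU for the deep part, and all weights are made-up numbers. It mirrors the structure of `forward` and is not a reimplementation of the model.

```python
import math


def fm_second_order(embeds):
    """0.5 * sum over dimensions of (square of sum - sum of squares)."""
    total = 0.0
    for f in range(len(embeds[0])):
        col = [e[f] for e in embeds]
        total += sum(col) ** 2 - sum(c * c for c in col)
    return 0.5 * total


def mlp_one_hidden(flat, w1, b1, w2, b2):
    """Assumed deep part: one ReLU hidden layer followed by a scalar output."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, flat)) + b) for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def deepfm_predict(linear_weights, embeds, w1, b1, w2, b2, bias=0.0):
    flat = [val for e in embeds for val in e]       # the same embeddings, flattened
    logit = bias + sum(linear_weights)              # first-order part
    logit += fm_second_order(embeds)                # FM part on the shared embeddings
    logit += mlp_one_hidden(flat, w1, b1, w2, b2)   # deep part on the shared embeddings
    return 1.0 / (1.0 + math.exp(-logit))


embeds = [[1.0, 1.0], [1.0, 1.0]]                   # two fields, embed_dim = 2
prob = deepfm_predict([0.5, -0.5], embeds, [[1.0, 0.0, 0.0, 0.0]], [0.0], [1.0], 0.0)
print(prob)
```

Here the first-order part contributes 0, the FM part 2, and the deep part 1, so the logit is 3. Both the FM part and the deep part read the same `embeds`, which is the parameter sharing the advantages list refers to.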

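Comparison with Wide&Deep

Similarities and differences with Wide&Deep:

Similarity: both combine a linear model with a deep model, fusing low-order and high-order feature interactions.

Differences:

Input: the two parts of DeepFM share the same input, whereas in Wide&Deep the wide side takes sparse input and the deep side takes dense input.
Manual features: DeepFM needs no hand-crafted features, can be trained end to end, and is easier to deploy online, whereas Wide&Deep needs hand-crafted features added to its input to increase expressive power.

Comparison with NFM

DeepFM: the FM part and the deep part run in parallel
NFM: they run in series

FM can already learn from sparse input, so why share dense input with the deep part?

FM has linear complexity $O(kn)$, where $n$ is the number of features and $k$ is the latent-vector dimension, so its cost grows only linearly with the number of input features. However, one-hot encoded categorical features often have a dimension one to two orders of magnitude higher than the dense vectors, so sparse input would still bring a lot of redundant computation to the FM side, which is undesirable.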
