近期在看CF的相關論文,《Collaborative Filtering for Implicit Feedback Datasets》思想非常好,非常easy理解??墒菑哪繕撕瘮?
是怎樣推導出Xu和Yi的更新公式的推導過程卻沒有非常好的描寫敘述。所以以下寫一下
推導:
首先對Xu求導:
當中Y是item矩陣,n*f維,每一行是一個item_vec,C^u是n*n維的對角矩陣。
對角線上的每個元素是c_ui,P(u)是n*1的列向量,它的第i個元素為p_ui。
然后令導數=0,可得:
因為x_u和y_i在目標函數中是對稱的。所以非常easy得到:
當中X是user矩陣,m*f維度,每一行是一個user_vec,C^i是m*m的對角矩陣。對角線上的每個元素是c_ui。P(i)是m*1的列向量。它的第u和元素是p_ui
然后令導數=0,可得:
以下是論文算法思想的Python實現:
import numpy
as np
import scipy.sparse
as sparse
from scipy.sparse.linalg
import spsolve
import time
def load_matrix(filename, num_users, num_items):t0 = time.time()counts = np.zeros((num_users, num_items))total =
0.0num_zeros = num_users * num_items
'''假設要對一個列表或者數組既要遍歷索引又要遍歷元素時。能夠用enumerate,當傳入參數為文件時,索引為行號,元素相應的一行內容'''for i, line
in enumerate(open(filename,
'r')): user, item, count = line.strip().split(
'\t')user = int(user)item = int(item)count = float(count)
if user >= num_users:
continueif item >= num_items:
continueif count !=
0:counts[user, item] = counttotal += countnum_zeros -=
1if i %
100000 ==
0:
print 'loaded %i counts...' % ialpha = num_zeros / total
print 'alpha %.2f' % alphacounts *= alphacounts = sparse.csr_matrix(counts)t1 = time.time()
print 'Finished loading matrix in %f seconds' % (t1 - t0)
return counts
class ImplicitMF():def __init__(self, counts, num_factors=40, num_iterations=30,reg_param=0.8):self.counts = countsself.num_users = counts.shape[
0]self.num_items = counts.shape[
1]self.num_factors = num_factorsself.num_iterations = num_iterationsself.reg_param = reg_param
def train_model(self):self.user_vectors = np.random.normal(size=(self.num_users,self.num_factors))self.item_vectors = np.random.normal(size=(self.num_items,self.num_factors))
'''要生成非常大的數字序列的時候,用xrange會比range性能優非常多,因為不須要一上來就開辟一塊非常大的內存空間,這兩個基本上都是在循環的時候用'''for i
in xrange(self.num_iterations):t0 = time.time()
print 'Solving for user vectors...'self.user_vectors = self.iteration(
True, sparse.csr_matrix(self.item_vectors))
print 'Solving for item vectors...'self.item_vectors = self.iteration(
False, sparse.csr_matrix(self.user_vectors))t1 = time.time()
print 'iteration %i finished in %f seconds' % (i +
1, t1 - t0)
def iteration(self, user, fixed_vecs):num_solve = self.num_users
if user
else self.num_itemsnum_fixed = fixed_vecs.shape[
0]YTY = fixed_vecs.T.dot(fixed_vecs)eye = sparse.eye(num_fixed)lambda_eye = self.reg_param * sparse.eye(self.num_factors)solve_vecs = np.zeros((num_solve, self.num_factors))t = time.time()
for i
in xrange(num_solve):
if user:counts_i = self.counts[i].toarray()
else:counts_i = self.counts[:, i].T.toarray()
''' 原論文中c_ui=1+alpha*r_ui,可是在計算Y’CuY時為了減少時間復雜度,利用了Y'CuY=Y'Y+Y'(Cu-I)Y,因為Cu是對角矩陣,其元素為c_ui,即1+alpha*r_ui。所以Cu-I也就是對角元素為alpha*r_ui的對角矩陣'''CuI = sparse.diags(counts_i, [
0])pu = counts_i.copy()pu[np.where(pu !=
0)] =
1.0YTCuIY = fixed_vecs.T.dot(CuI).dot(fixed_vecs)YTCupu = fixed_vecs.T.dot(CuI + eye).dot(sparse.csr_matrix(pu).T)xu = spsolve(YTY + YTCuIY + lambda_eye, YTCupu)solve_vecs[i] = xu
if i %
1000 ==
0:
print 'Solved %i vecs in %d seconds' % (i, time.time() - t)t = time.time()
return solve_vecs
總結
以上是生活随笔為你收集整理的Alternating Least Squares(ASL) for Implicit Feedback Datasets的数学推导以及用Python实现的全部內容,希望文章能夠幫你解決所遇到的問題。
如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。