
Wide & Deep: a TensorFlow implementation

Published: 2024/1/17


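Preface
I recently read two Google papers, "Wide & Deep Learning" and "Deep & Cross Network". While they are still fresh I want to compare them and write a demo along the way, so that I am not fumbling around when I need them later.
The former recommends apps that a user is likely to enjoy; the latter predicts which ads a user is likely to click, for ad ranking. Personalized recommendation based on user profiles and behavior logs is an important step toward monetization. Done well, users are happy with the product and advertisers pay more; done badly, everyone goes hungry.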

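Why Deep-Network?
On recommendation, my earlier FTRL series of posts covered a linear model built on basic features plus pairwise (second-order) cross features. Its strengths: the model is simple and easy to understand, quick to engineer, and easy to adjust when bad cases show up. Its weaknesses are just as clear: it cannot represent higher-order abstract features, and its coverage of high-order cross features is incomplete. A deep network can express high-order abstract features, which makes up for exactly this weakness of the linear model.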

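Why Cross-Network?
Why should feature crosses stop at pairs? With higher-order crosses, hand-picking the combinations is laborious, and if every feature were crossed with every other the number of features would explode. Cross-Network (reference 5) handles this blow-up in the number of cross features well. So the problem was never truly intractable; it simply had not been solved yet.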
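Structure comparison
A picture explains it best. Left: Wide and Deep Network; right: Deep and Cross Network.

[Figure: architecture diagrams of the two networks]

The two diagrams show the overall architecture of each method.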
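Feature input
  1) W&D features fall into three groups:
    User-Feature: country, language, demographics.
    Contextual-Feature: device, hour of the day, day of the week.
    Impression-Feature: app age, historical statistics of an app.
  1.1) Input features of the Wide part:
    raw input features and transformed features [hand-picked cross features].
    Note: the cross-product transformation in W&D is applied only among discrete features, whether they are categorical text or discrete values. Continuous features are not involved, at least as used in the W&D paper.
  1.2) Input features of the Deep part: raw input + embedding
    Every non-continuous feature gets an embedding. These are all categorical features, and the embedding amounts to multiplying by an embedding matrix. The TensorFlow API is tf.feature_column.embedding_column, with trainable=True by default.
    Continuous features are mapped through the cumulative distribution function P(X≤x), which squeezes them into [0,1].
    Note: the Wide part is trained with FTRL + L1; the Deep part is trained with AdaGrad.
  The TensorFlow API for Wide&Deep is tf.estimator.DNNLinearCombinedClassifier.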
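  2) D&C input features and processing:
    All inputs are processed uniformly, with no distinction between inputs for the Deep part and inputs for the Cross part.
    A high-dimensional input (a feature with very many possible values) goes through an embedding matrix to get a lower-dimensional representation. The dense dimension is estimated as $6 \times (\text{category cardinality})^{1/4}$.
    Note: the embedding in W&D and D&C is not the Word2Vec of language modeling (which learns low-dimensional word representations from context). It only uses a matrix W to reduce the dimension of a discretized, very sparse one-hot vector. The parameter matrix is learned by ordinary gradient descent.
    Continuous values are compressed with a log transform.
    The features above are stacked and fed to both the deep network and the cross network.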
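
The three numeric transforms mentioned in this section can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (cdf_normalize, log_compress, embedding_dim) are hypothetical, the CDF is the empirical one computed from the sample itself, log1p is my choice so that 0 maps to 0, and rounding the embedding size to the nearest integer is also an assumption.

```python
import bisect
import math


def cdf_normalize(values):
    """Map each value to its empirical CDF P(X <= x), which lies in (0, 1]."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many sample values are <= v.
    return [bisect.bisect_right(ordered, v) / n for v in values]


def log_compress(value):
    """Compress a non-negative continuous value; log1p keeps 0 mapped to 0."""
    return math.log1p(value)


def embedding_dim(cardinality):
    """Rule of thumb from the text: 6 * cardinality ** (1/4), rounded."""
    return int(round(6 * cardinality ** 0.25))


ages = [18, 25, 25, 40, 63]
print(cdf_normalize(ages))    # [0.2, 0.6, 0.6, 0.8, 1.0]
print(embedding_dim(10000))   # 60
```

In a real pipeline the CDF would be estimated on training data and then applied to unseen values, which this sketch does not attempt.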
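Cross explained
Cross-network is described in detail in reference 5, but D&C uses a modified version of it.

$$x_l = x_0 \, x_{l-1}^{T} \, w_{\mathrm{embedding}} + b + x_{l-1}$$

For a single sample, $x_0$, $x_l$, $w_{\mathrm{embedding}}$ and $b$ all have size $[d \times 1]$. Note that w is shared across all the cross terms of a layer. Why shared? My guess is that it partly saves space, and another possible reason is that convergence would otherwise be difficult (to be confirmed).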
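
As a sanity check on the formula, here is a minimal NumPy sketch of one cross layer for a single sample. The function names are hypothetical. It also shows why sharing one d-dimensional w is cheap: since x_{l-1}^T w is a scalar, the d x d outer product never has to be materialized.

```python
import numpy as np


def cross_layer_naive(x0, x_prev, w, b):
    """x_l = x0 x_{l-1}^T w + b + x_{l-1}, building the d x d outer product."""
    outer = x0 @ x_prev.T            # [d, d], entry (i, j) is x0_i * x_prev_j
    return outer @ w + b + x_prev    # [d, 1]


def cross_layer_fast(x0, x_prev, w, b):
    """Same result without the d x d matrix: x_{l-1}^T w is a scalar."""
    scale = (x_prev.T @ w).item()
    return x0 * scale + b + x_prev   # [d, 1]


rng = np.random.default_rng(0)
d = 4
x0 = rng.normal(size=(d, 1))
w = rng.normal(size=(d, 1))
b = rng.normal(size=(d, 1))

# First layer: x_{l-1} is x0 itself.
x1_naive = cross_layer_naive(x0, x0, w, b)
x1_fast = cross_layer_fast(x0, x0, w, b)
print(np.allclose(x1_naive, x1_fast))
```

Each layer adds only 2d parameters (w and b), no matter how many cross terms the outer product contains.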
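Notes on implementing D&C in tf
  1) Representing multi-hot features
    Use tf.feature_column.indicator_column.
    Note that _IndicatorColumn cannot be stacked with an _EmbeddingColumn.
  2) Embedding
    Use tf.feature_column.embedding_column; trainable=True by default.
    To share an embedding across features: tf.contrib.layers.shared_embedding_columns
  3) Reading data
    The dataset parsing function must live inside input_fn.
    tf.cast versus tf.string_to_number: string fields need tf.string_to_number, because tf.cast does not convert strings to numbers.
  4) tf.estimator.Estimator
    The params argument of a custom model_fn is passed explicitly.
    Note that estimator comes with its own asynchronous update mechanism, SycOpt.
  5) Implementing cross-network
    Compute it with broadcasting.
    Verified: tile does not affect the gradient computation of the original parameters.
  6) Embedding variable-length features
    tf.feature_column + estimator does not support variable-length features, only fixed-length ones.
    Only tf.nn.embedding_lookup_sparse can handle variable-length features.
    Sample code for variable-length discrete string features is attached at the end.
    Forcing tf.feature_column to handle variable-length features raises an error: a dimension error when converting a SparseTensor to a Tensor, though I could not tell where inside it comes from.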
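
To make the multi-hot representation from item 1) concrete, here is a minimal pure-Python sketch. The vocabulary and the helper name multi_hot are hypothetical, and the sketch illustrates the encoding idea only; it does not claim to reproduce the internals of tf.feature_column.indicator_column.

```python
def multi_hot(tags, vocabulary):
    """Return a fixed-length 0/1 vector with a 1 for every tag that is present."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    vec = [0] * len(vocabulary)
    for tag in tags.split("|"):
        if tag in index:  # unknown tags are simply dropped in this sketch
            vec[index[tag]] = 1
    return vec


VOCAB = ["State-gov", "Self-emp-not-inc", "human"]  # hypothetical vocabulary
print(multi_hot("State-gov|human", VOCAB))  # [1, 0, 1]
```

The output length is fixed by the vocabulary, so inputs with different numbers of tags still produce vectors of the same size.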

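tf_debug
Because the model is written with tf.estimator, print cannot be used to inspect internal variables, so debugging becomes a real problem. tf.estimator was designed with this case in mind: it accepts externally defined hooks and supports tf_debug. The full code is in multi.py, linked below.
The hook looks like params['hooks'] = [tf_debug.LocalCLIDebugHook()]. It is passed into the estimator and used by train or evaluate.
As an example of inspecting internal variables with tf_debug, suppose we want to see what combiner=sum actually does in tf.feature_column.embedding_column.
Input for one feature:
1)State-gov|human 2)Self-emp-not-inc|human 3)State-gov|human
For convenience, initialize embedding-matrix = ones.

Running under debug shows the embedding-mat variable (a screenshot in the original post).

It also shows the result of processing the feature: the encoded representation and the index values (the indices on the input side of the embedding).

Finally it shows the resulting embedding-vec.

Finding: combiner=sum looks up the embedding-vec for each index and then sums those vectors to produce the embedding result. Replacing the all-ones matrix with a randomly initialized embedding-matrix gives the same conclusion.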
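
The behavior found above can be mimicked outside TensorFlow. The following NumPy sketch is an illustration, not the library's code: the token-to-id mapping, the embedding size and the helper name embed_sum are all assumptions chosen to mirror the experiment with an all-ones matrix.

```python
import numpy as np

VOCAB = {"State-gov": 0, "Self-emp-not-inc": 1, "human": 2}  # hypothetical ids
EMBED_DIM = 4                                                # hypothetical size
embedding_matrix = np.ones((len(VOCAB), EMBED_DIM))          # all-ones init


def embed_sum(feature, table, matrix):
    """Look up one row per token, then add the rows (combiner = sum)."""
    ids = [table[token] for token in feature.split("|")]
    return matrix[ids].sum(axis=0)


print(embed_sum("State-gov|human", VOCAB, embedding_matrix))  # [2. 2. 2. 2.]
```

With two tokens and an all-ones matrix every component is 2, which is the pattern to expect from a sum combiner.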
github source
Deep and Cross implemented with tf.feature_column + dataset + tf.estimator.
The dataset is the census income dataset.
D&C test demo: https://github.com/jxyyjm/tensorflow_test/blob/master/src/deep_and_cross.py
tf_debug test demo: https://github.com/jxyyjm/tensorflow_test/blob/master/src/multi.py
Below are several ways to implement the cross computation in tf. Applying tf.matmul / tf.tensordot is the core of it; keeping the code concise and efficient matters.

#!/usr/bin/python
# -*- coding:utf-8 -*-
# Written against the TensorFlow 1.x API (tf.contrib does not exist in 2.x).
# Every function below returns x0 * x^T * w + b; the residual term "+ x" of
# the cross layer is not added here.
import tensorflow as tf


def cross_op(x0, x, w, b):
    ## follows the definition literally; slowest and least efficient ##
    ## x0 and x are [m, d]; w and b are [d, 1] ##
    x0 = tf.expand_dims(x0, axis=2)  # m x d x 1
    x = tf.expand_dims(x, axis=2)    # m x d x 1
    multiple = w.get_shape().as_list()[0]
    x0_broad_horizon = tf.tile(x0, [1, 1, multiple])  # m x d x 1 -> m x d x d
    x_broad_vertical = tf.transpose(tf.tile(x, [1, 1, multiple]), [0, 2, 1])  # m x d x 1 -> m x d x d
    w_broad_horizon = tf.tile(w, [1, multiple])  # d x 1 -> d x d
    # m x d x d; multiplying by w relies on broadcasting
    mid_res = tf.multiply(tf.multiply(x0_broad_horizon, x_broad_vertical), tf.transpose(w_broad_horizon))
    res = tf.reduce_sum(mid_res, axis=2)  # m x d
    res = res + tf.transpose(b)  # m x d + 1 x d, broadcasting again
    return res


def cross_op2(x0, x, w, b):
    ## leans fully on broadcasting to implement cross; also inefficient ##
    x0 = tf.expand_dims(x0, axis=2)  # m x d x 1
    x = tf.expand_dims(x, axis=2)    # m x d x 1
    dot = tf.matmul(x0, tf.transpose(x, [0, 2, 1]))  # m x d x d
    mid_res = tf.multiply(dot, tf.transpose(w))
    res = tf.reduce_sum(mid_res, axis=2) + tf.transpose(b)  # m x d + 1 x d, broadcasting again
    return res


def cross_op_single_data(x0, x, w, b):
    ## the most concise cross implementation, for a single sample ##
    ## all parameters have size [d, 1] ##
    dot = tf.matmul(x0, tf.transpose(x))  # d x d
    cros = tf.tensordot(dot, w, [[1], [0]]) + b  # each row of dot times the column of w
    return cros


def cross_op_batch_data(x0, x, w, b):
    ## x0 and x have size [batch, d]; same approach as the functions below, efficient ##
    ## w and b have size [d, 1] ##
    x0 = tf.expand_dims(x0, 2)  # [batch, d, 1]
    x = tf.expand_dims(x, 2)    # [batch, d, 1]
    dot = tf.matmul(x0, tf.transpose(x, [0, 2, 1]))  # [batch, d, d] = batch x ([d x 1] x [1 x d])
    # cros = tf.tensordot(dot, w, [[1], [0]]) + b  # [batch, d, 1]; this is wrong
    cros = tf.tensordot(dot, w, 1) + b  # this form comes from maxnet; quite neat
    return tf.squeeze(cros, 2)


def cross_op_None_batch(x0, x, w, b):
    ## x0 and x have size [None, d]; uses keras.backend.batch_dot ##
    ## w and b have size [d, 1] ##
    x0 = tf.expand_dims(x0, 2)  # [batch, d, 1]
    x = tf.expand_dims(x, 2)    # [batch, d, 1]
    dot = tf.contrib.keras.backend.batch_dot(x0, tf.transpose(x, [0, 2, 1]), [2, 1])
    # cros = tf.tensordot(dot, w, [[1], [0]]) + b  # this is wrong
    cros = tf.tensordot(dot, w, 1) + b
    return tf.squeeze(cros, 2)
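
The commented-out tensordot line marked wrong in cross_op_batch_data can be checked numerically. This NumPy sketch assumes np.tensordot and tf.tensordot treat the axes argument the same way, which matches my understanding of both APIs. It shows that axes=1 contracts the last axis of dot (the x index), while [[1], [0]] contracts the middle axis (the x0 index) and therefore computes a different quantity with the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d = 3, 4
x0 = rng.normal(size=(batch, d))
x = rng.normal(size=(batch, d))
w = rng.normal(size=(d, 1))

# dot[n, i, j] = x0[n, i] * x[n, j], shape [batch, d, d]
dot = x0[:, :, None] @ x[:, None, :]

right = np.tensordot(dot, w, 1)           # sums over j, shape [batch, d, 1]
wrong = np.tensordot(dot, w, [[1], [0]])  # sums over i, shape [batch, d, 1]

expected = x0 * (x @ w)                   # x0_i * sum_j x_j w_j, shape [batch, d]
print(np.allclose(right.squeeze(2), expected))
print(np.allclose(wrong.squeeze(2), x * (x0 @ w)))
```

Both results have shape [batch, d, 1], so the mistake raises no shape error; only a numeric comparison like this one exposes it.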

Reference
1) Wide & Deep Learning for Recommender Systems (2016)
2) Deep & Cross Network for Ad Click Predictions (2017)
3) https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html (google research blog)
4) https://github.com/tensorflow/models/tree/master/official/wide_deep (wide&deep github code)
5) Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features (2016)

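Appendix: how tf.nn.embedding_lookup_sparse handles the embedding of variable-length strings.
The input data is:
csv = [
    "1,oscars|brad-pitt|awards",
    "2,oscars|film|reviews",
    "3,matt-damon|bourne",
]
The second column is the variable-length feature. It is processed as follows: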

import tensorflow as tf

# Purposefully omitting "bourne" to demonstrate OOV mappings.
TAG_SET = ["oscars", "brad-pitt", "awards", "film", "reviews", "matt-damon"]
NUM_OOV = 1

def sparse_from_csv(csv):
    ids, post_tags_str = tf.decode_csv(csv, [[-1], [""]])
    table = tf.contrib.lookup.index_table_from_tensor(
        mapping=TAG_SET, num_oov_buckets=NUM_OOV, default_value=-1)  ## builds a lookup table ##
    split_tags = tf.string_split(post_tags_str, "|")
    return ids, tf.SparseTensor(
        indices=split_tags.indices,
        values=table.lookup(split_tags.values),  ## the index each value maps to in the table ##
        dense_shape=split_tags.dense_shape)

# Optionally create an embedding for this.
TAG_EMBEDDING_DIM = 3

ids, tags = sparse_from_csv(csv)

embedding_params = tf.Variable(tf.truncated_normal([len(TAG_SET) + NUM_OOV, TAG_EMBEDDING_DIM]))
embedded_tags = tf.nn.embedding_lookup_sparse(embedding_params, sp_ids=tags, sp_weights=None)

# Test it out
with tf.Session() as s:
    s.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(s.run([ids, embedded_tags]))

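1) This handles variable-length features. The downside is that it cannot be folded into the tf.feature_column + tf.estimator framework, so the model inputs and the overall structure are all exposed, which is ugly.
2) Rewriting it to use a shared embedding is also very easy.
Reportedly the latest tf 1.5 adds "Add support for sparse multidimensional feature columns." Worth a look when I have time.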
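
For readers without a TF 1.x environment, the id mapping and the combine step of the appendix code can be imitated in pure Python. This is a sketch under assumptions: the helper names and the toy embedding table are hypothetical, and it assumes the combiner is "mean", which as far as I know is what tf.nn.embedding_lookup_sparse falls back to in TF 1.x when no combiner is passed.

```python
TAG_SET = ["oscars", "brad-pitt", "awards", "film", "reviews", "matt-damon"]
NUM_OOV = 1
INDEX = {tag: i for i, tag in enumerate(TAG_SET)}


def tags_to_ids(tag_string):
    """Map each tag to its id; with one OOV bucket, unknown tags get id len(TAG_SET)."""
    return [INDEX.get(tag, len(TAG_SET)) for tag in tag_string.split("|")]


def combine_mean(ids, table):
    """Average the embedding rows selected by ids, column by column."""
    rows = [table[i] for i in ids]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Toy embedding table: row i is [i, i, i], so results are easy to read.
table = [[float(i)] * 3 for i in range(len(TAG_SET) + NUM_OOV)]

print(tags_to_ids("matt-damon|bourne"))  # [5, 6]
print(combine_mean([5, 6], table))       # [5.5, 5.5, 5.5]
```

Rows with different numbers of tags all reduce to one vector of the same length, which is what makes the variable-length column usable as model input.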