An Adversarial Training Method for Text Classification
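I recently read some articles on text classification. One of them was "Adversarial Training Methods for Semi-Supervised Text Classification". Its main idea is to add a perturbation term during training: a small perturbation is added at the embedding layer. Training with this perturbation gives clearly better results than training without it.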
The network architecture of the model is shown in the figure below:
[Figure: network architecture of the model (image not preserved)]
The following describes how the adversarial perturbation r is generated.
Before the input enters the LSTM, the embeddings are normalized. Each word embedding w is mapped to a normalized vector v.
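The normalization formula below is a reconstruction based on the paper and the `_normalize` method in the code. Here $f_j$ is the relative frequency of word $j$, which is the `weights` vector in the code. $K$ is the vocabulary size. The square and the square root are applied separately to each embedding dimension.

$$
E(w)=\sum_{j=1}^{K} f_j\,w_j,\qquad
\mathrm{Var}(w)=\sum_{j=1}^{K} f_j\,\bigl(w_j-E(w)\bigr)^2,\qquad
v_k=\frac{w_k-E(w)}{\sqrt{\mathrm{Var}(w)}}
$$

The code adds $10^{-6}$ to $\mathrm{Var}(w)$ before taking the square root, which avoids division by zero.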
Next, define the model's loss function. Let x be the input, θ the model parameters, and r_adv the adversarial perturbation.
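The objective below is a reconstruction consistent with the paper and with `self.loss = loss + perturLoss` in the code. $\hat\theta$ is the copy of $\theta$ described in the next paragraph, and $\epsilon$ bounds the size of the perturbation (`config.model.epsilon`):

$$
L(\theta) = -\log p(y\mid x;\theta) \;-\; \log p(y\mid x + r_{adv};\theta),
\qquad
r_{adv} = \mathop{\arg\min}_{r,\ \|r\|_2\le\epsilon} \log p(y\mid x + r;\hat\theta)
$$

With binary labels, $-\log p(y\mid\cdot)$ is the sigmoid cross-entropy that the code computes with `tf.nn.sigmoid_cross_entropy_with_logits`. Minimizing exactly over $r$ is intractable for a neural network. The minimization is therefore approximated by linearizing the loss around $x$, which gives the gradient-based perturbation.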
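One detail: θ̂ is a copy of θ, but it is used only to compute the perturbation. It does not take part in backpropagation when the gradients are computed. The code achieves this by applying `tf.stop_gradient` to the gradient.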
Next, compute the perturbation:
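$$ g = \nabla_{x}\bigl(-\log p(y\mid x;\hat\theta)\bigr) $$
$$ r_{adv} = \epsilon\,\frac{g}{\|g\|_2} $$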
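First, differentiate the loss with respect to the input embeddings to get the gradient g. Then normalize g by its L2 norm and rescale it, so that the perturbation has L2 norm ε (`config.model.epsilon` in the code).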
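This normalization step corresponds to `_scaleL2` in the model code. Below is a minimal pure-Python sketch for a single example. It assumes the gradient is given as a list of per-token vectors, and the function name `scale_l2` is hypothetical. As in the original, the sketch first divides by the largest absolute entry so that squaring stays numerically stable, and it takes the norm over the whole sequence.

```python
import math


def scale_l2(grad, epsilon):
    """Return epsilon * grad / ||grad||_2 for one example.

    grad: list of per-token gradient vectors, shape [num_steps][d].
    The norm is taken over the whole sequence, like _scaleL2 over axes (1, 2).
    """
    # ||g|| = a * ||g / a|| for any a > 0; dividing by max |g| keeps the squares well-scaled
    alpha = max(abs(v) for row in grad for v in row) + 1e-12
    sq_sum = sum((v / alpha) ** 2 for row in grad for v in row)
    l2_norm = alpha * math.sqrt(sq_sum + 1e-6)
    return [[epsilon * v / l2_norm for v in row] for row in grad]
```

For example, `scale_l2([[3.0, 0.0], [0.0, 4.0]], 2.0)` returns approximately `[[1.2, 0.0], [0.0, 1.6]]`. This is the gradient direction, rescaled to length 2.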
This completes the perturbation. It is then added to the normalized word embeddings, and the perturbed input takes part in model training.
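The pure-Python sketch below makes the whole procedure concrete on a deliberately tiny model. It rests on one assumption: the Bi-LSTM + attention network is replaced by a hypothetical mean-pool + linear classifier (`forward`), so the gradient with respect to the embeddings can be written out by hand. All function names are illustrative and do not come from the original code. The sketch computes the clean loss, builds r_adv from the gradient, runs a second forward pass with the same parameters, and sums the two losses.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_xent(logit, label):
    """Stable sigmoid cross-entropy, same formula as tf.nn.sigmoid_cross_entropy_with_logits."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def forward(x, w, b):
    """Hypothetical toy classifier standing in for Bi-LSTM + attention:
    mean-pool the token vectors x ([T][d]) and apply a linear layer."""
    T = len(x)
    pooled = [sum(tok[i] for tok in x) / T for i in range(len(w))]
    return sum(wi * pi for wi, pi in zip(w, pooled)) + b


def adversarial_loss(x, label, w, b, epsilon):
    """Return (clean loss, loss on x + r_adv, their sum)."""
    T = len(x)
    logit = forward(x, w, b)
    clean = sigmoid_xent(logit, label)

    # Gradient of the loss w.r.t. each token vector x_t, written by hand:
    # dL/dlogit = sigmoid(logit) - label and dlogit/dx_t = w / T.
    # In the TensorFlow code, tf.stop_gradient makes this a constant (the theta-hat copy);
    # here nothing is differentiated automatically, so g is simply a fixed value.
    coef = (sigmoid(logit) - label) / T
    g = [[coef * wi for wi in w] for _ in range(T)]

    # r_adv = epsilon * g / ||g||_2, with the norm taken over the whole sequence
    norm = math.sqrt(sum(v * v for row in g for v in row)) + 1e-12
    r_adv = [[epsilon * v / norm for v in row] for row in g]

    # Second forward pass on the perturbed embeddings, sharing the same parameters
    x_adv = [[a + r for a, r in zip(tok, rt)] for tok, rt in zip(x, r_adv)]
    adv = sigmoid_xent(forward(x_adv, w, b), label)
    return clean, adv, clean + adv
```

For a linear model like this one, the perturbation always points in the direction that increases the loss. The adversarial term is therefore never smaller than the clean loss, and setting `epsilon` to 0 makes the two terms equal.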
The model code is as follows:
```python
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib, tf.nn.rnn_cell)


# Build the AdversarailLSTM model
class AdversarailLSTM(object):
    def __init__(self, config, wordEmbedding, indexFreqs):
        # Keep config on the instance (the methods below originally read a global `config`)
        self.config = config

        # Inputs
        self.inputX = tf.placeholder(tf.int32, [None, config.sequenceLength], name="inputX")
        self.inputY = tf.placeholder(tf.float32, [None, 1], name="inputY")
        self.dropoutKeepProb = tf.placeholder(tf.float32, name="dropoutKeepProb")

        # Weights from word frequencies; indices 0 and 1 (presumably padding/unknown) get fixed counts
        indexFreqs[0], indexFreqs[1] = 20000, 10000
        weights = tf.cast(tf.reshape(indexFreqs / tf.reduce_sum(indexFreqs), [1, len(indexFreqs)]), dtype=tf.float32)

        # Embedding layer
        with tf.name_scope("wordEmbedding"):
            # Initialize from pretrained word vectors and normalize them (kept fixed, not fine-tuned)
            normWordEmbedding = self._normalize(tf.cast(wordEmbedding, dtype=tf.float32, name="word2vec"), weights)
            self.embeddedWords = tf.nn.embedding_lookup(normWordEmbedding, self.inputX)

        # Binary cross-entropy loss on the clean input
        with tf.name_scope("loss"):
            with tf.variable_scope("Bi-LSTM", reuse=None):
                self.predictions = self._Bi_LSTMAttention(self.embeddedWords)
            # predictions are logits, so threshold sigmoid(logit) at 0.5
            self.binaryPreds = tf.cast(tf.greater_equal(tf.nn.sigmoid(self.predictions), 0.5), tf.float32, name="binaryPreds")
            losses = tf.nn.sigmoid_cross_entropy_with_logits(logits=self.predictions, labels=self.inputY)
            loss = tf.reduce_mean(losses)

        # Loss on the perturbed input, reusing all weights of the clean pass
        with tf.name_scope("perturloss"):
            with tf.variable_scope("Bi-LSTM", reuse=True):
                perturWordEmbedding = self._addPerturbation(self.embeddedWords, loss)
                perturPredictions = self._Bi_LSTMAttention(perturWordEmbedding)
            perturLosses = tf.nn.sigmoid_cross_entropy_with_logits(logits=perturPredictions, labels=self.inputY)
            perturLoss = tf.reduce_mean(perturLosses)

        self.loss = loss + perturLoss

    def _Bi_LSTMAttention(self, embeddedWords):
        config = self.config
        # Two-layer bidirectional LSTM
        with tf.name_scope("Bi-LSTM"):
            fwHiddenLayers = []
            bwHiddenLayers = []
            for idx, hiddenSize in enumerate(config.model.hiddenSizes):
                with tf.name_scope("Bi-LSTM" + str(idx)):
                    # Forward cell
                    lstmFwCell = tf.nn.rnn_cell.DropoutWrapper(
                        tf.nn.rnn_cell.LSTMCell(num_units=hiddenSize, state_is_tuple=True),
                        output_keep_prob=self.dropoutKeepProb)
                    # Backward cell
                    lstmBwCell = tf.nn.rnn_cell.DropoutWrapper(
                        tf.nn.rnn_cell.LSTMCell(num_units=hiddenSize, state_is_tuple=True),
                        output_keep_prob=self.dropoutKeepProb)
                    fwHiddenLayers.append(lstmFwCell)
                    bwHiddenLayers.append(lstmBwCell)

            # Stack the layers; with state_is_tuple=True each state is an (h, c) tuple, otherwise concatenated
            fwMultiLstm = tf.nn.rnn_cell.MultiRNNCell(cells=fwHiddenLayers, state_is_tuple=True)
            bwMultiLstm = tf.nn.rnn_cell.MultiRNNCell(cells=bwHiddenLayers, state_is_tuple=True)

            # Dynamic RNN: sequence lengths are optional; the full length is used if none are given
            # outputs = (output_fw, output_bw), each [batch_size, max_time, hidden_size]
            # current_state = (state_fw, state_bw), one (h, c) tuple per layer
            # Use the argument (clean or perturbed embeddings), not self.embeddedWords,
            # otherwise the perturbed pass would silently ignore the perturbation
            outputs, self.current_state = tf.nn.bidirectional_dynamic_rnn(
                fwMultiLstm, bwMultiLstm, embeddedWords,
                dtype=tf.float32, scope="bi-lstm" + str(idx))

        # As in the Bi-LSTM + attention paper, add the forward and backward outputs
        with tf.name_scope("Attention"):
            H = outputs[0] + outputs[1]
            # Sentence vector from attention
            output = self.attention(H)
            outputSize = config.model.hiddenSizes[-1]

        # Fully connected output layer (tf.get_variable so both passes share it)
        with tf.name_scope("output"):
            outputW = tf.get_variable("outputW", shape=[outputSize, 1],
                                      initializer=tf.contrib.layers.xavier_initializer())
            outputB = tf.get_variable("outputB", shape=[1], initializer=tf.constant_initializer(0.1))
            predictions = tf.nn.xw_plus_b(output, outputW, outputB, name="predictions")

        return predictions

    def attention(self, H):
        """Use attention to get a vector representation of the sentence."""
        config = self.config
        # Number of units in the last LSTM layer
        hiddenSize = config.model.hiddenSizes[-1]
        # Trainable attention vector; tf.get_variable so the perturbed pass reuses it
        W = tf.get_variable("attentionW", shape=[hiddenSize],
                            initializer=tf.random_normal_initializer(stddev=0.1))
        # Nonlinear transform of the Bi-LSTM output
        M = tf.tanh(H)
        # M is [batch_size, time_step, hidden_size]; reshape to [batch_size * time_step, hidden_size]
        # and multiply by W, turning each time step into a single score
        newM = tf.matmul(tf.reshape(M, [-1, hiddenSize]), tf.reshape(W, [-1, 1]))
        # Back to [batch_size, time_step]
        restoreM = tf.reshape(newM, [-1, config.sequenceLength])
        # Softmax over time steps
        self.alpha = tf.nn.softmax(restoreM)
        # Weighted sum of H with alpha: [batch_size, hidden_size, 1]
        r = tf.matmul(tf.transpose(H, [0, 2, 1]), tf.reshape(self.alpha, [-1, config.sequenceLength, 1]))
        # Drop only the last axis: [batch_size, hidden_size]
        sequeezeR = tf.squeeze(r, [2])
        sentenceRepren = tf.tanh(sequeezeR)
        # Dropout on the attention output
        output = tf.nn.dropout(sentenceRepren, self.dropoutKeepProb)
        return output

    def _normalize(self, wordEmbedding, weights):
        """Standardize the word embeddings with frequency-weighted mean and variance."""
        mean = tf.matmul(weights, wordEmbedding)  # [1, embedding_size]
        powWordEmbedding = tf.pow(wordEmbedding - mean, 2.)
        var = tf.matmul(weights, powWordEmbedding)  # [1, embedding_size]
        stddev = tf.sqrt(1e-6 + var)
        return (wordEmbedding - mean) / stddev

    def _addPerturbation(self, embedded, loss):
        """Add the adversarial perturbation to the word embeddings."""
        grad, = tf.gradients(loss, embedded,
                             aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
        # Treat the gradient as a constant (the theta-hat copy): no backpropagation through it
        grad = tf.stop_gradient(grad)
        perturb = self._scaleL2(grad, self.config.model.epsilon)
        return embedded + perturb

    def _scaleL2(self, x, norm_length):
        # shape(x) = [batch, num_step, d]
        # Divide x by max(abs(x)) for a numerically stable L2 norm: 2norm(x) = a * 2norm(x / a)
        # Scale over the full sequence, i.e. axes (1, 2)
        alpha = tf.reduce_max(tf.abs(x), (1, 2), keepdims=True) + 1e-12
        l2_norm = alpha * tf.sqrt(tf.reduce_sum(tf.pow(x / alpha, 2), (1, 2), keepdims=True) + 1e-6)
        x_unit = x / l2_norm
        return norm_length * x_unit
```
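The code adds adversarial training on top of a bidirectional LSTM + attention model and is trained on the IMDB movie review data. It does reach its best result quickly. However, training uses a lot of memory (my computer stopped after a few dozen steps), and each step also takes somewhat longer. This is expected. Every step adds an extra backward pass to get the gradient with respect to the embeddings, plus a second forward pass on the perturbed embeddings, which roughly doubles the size of the graph.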
Reposted from: https://www.cnblogs.com/danny92/p/10636890.html