
TensorFlow [Basics]: Understanding and Implementing LSTM

Published: 2025/3/15


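Preface

This post is based on 6_lstm.ipynb, the LSTM tutorial notebook in the TensorFlow GitHub repository. That code both implements an LSTM and gives hands-on practice with TensorFlow, so it kills two birds with one stone.

Source code: https://github.com/Salon-sai/learning-tensorflow/tree/master/lesson4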

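A Personal Note

I originally wanted to try NLP, but I have been busy with a project lately (I badly need a new computer and have half-jokingly considered some rather unsavoury side jobs to pay for it), and I have had a lot on my mind: restless, off my food, frowning, sleepless, tossing and turning, unable to let things go. That is why this post took so long to finish.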

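LSTM theory

There is a good article on Jianshu whose figures and formulas are worth consulting: [Translation] Understanding LSTM Networks. An LSTM paper for reference: https://arxiv.org/pdf/1402.1128v1.pdf

In essence, an LSTM decides what to forget from earlier text and what to remember from the current input. An LSTM is not an entirely different network from an RNN; it only changes the RNN's hidden layer. That hidden layer is carefully designed around gates (forget, input and output) that control a cell state.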

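Traditional RNNs

As the gap between a piece of relevant information and the point where it is needed grows, an RNN loses the ability to learn to connect such distant information. (I believe this is related to vanishing gradients: in a very deep network the gradient shrinks step by step as it is propagated back, so the earlier cells cannot learn from later content and end up learning only from nearby information.) LSTMs are designed to avoid this problem.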
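To make the vanishing-gradient intuition concrete, here is a minimal NumPy sketch on a hypothetical one-unit tanh RNN (not part of the original code). By the chain rule, the gradient of h_t with respect to h_0 is a product of per-step factors w * (1 - h_t^2). When those factors are below 1, the product decays exponentially with distance.

```python
import numpy as np


def rnn_gradient_norms(w, steps, x=0.5, h0=0.0):
    """Return |d h_t / d h_0| for t = 1..steps in a hypothetical scalar tanh RNN.

    Toy model: h_t = tanh(w * h_{t-1} + x) with a constant input x.
    d h_t / d h_{t-1} = w * (1 - h_t**2), so d h_t / d h_0 is the running
    product of these per-step factors.
    """
    h = h0
    grad = 1.0
    norms = []
    for _ in range(steps):
        h = np.tanh(w * h + x)
        grad *= w * (1.0 - h ** 2)
        norms.append(abs(grad))
    return norms


norms = rnn_gradient_norms(w=0.9, steps=50)
print(norms[0], norms[9], norms[49])
```

Every factor here is at most 0.9, so the gradient reaching step 0 from step 50 is vanishingly small. An early cell therefore gets almost no learning signal from distant outputs.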

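The LSTM hidden layer

(I have not worked through a proof of the formulas, so this is my own interpretation.) An LSTM cell passes two things on to the next cell as input: the state c_t and the output h_t.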

Hands-on code

config.py

# config.py
# -*- coding:utf-8 -*-
import string


class ModelConfig(object):
    def __init__(self):
        self.num_unrollings = 10  # length of each character sequence
        self.batch_size = 64  # number of sequences per batch
        # number of distinct characters: 26 lowercase letters plus a space
        self.vocabulary_size = len(string.ascii_lowercase) + 1
        self.summary_frequency = 100  # report (and periodically sample) every N steps
        self.num_steps = 7001  # number of training steps
        self.num_nodes = 64  # number of hidden units


config = ModelConfig()

As shown above, config.py holds the hyperparameters.

handle_data.py

# -*- coding:utf-8 -*-
import string
import zipfile

import numpy as np
import tensorflow as tf

first_letter = ord(string.ascii_lowercase[0])


class LoadData(object):
    def __init__(self, valid_size=1000):
        self.text = self._read_data()
        self.valid_text = self.text[:valid_size]
        self.train_text = self.text[valid_size:]

    def _read_data(self, filename='text8.zip'):
        with zipfile.ZipFile(filename) as f:
            # take the first file inside the archive
            name = f.namelist()[0]
            print('file name : %s ' % name)
            data = tf.compat.as_str(f.read(name))
        return data


def char2id(char):
    # map a character to an id: ' ' -> 0, 'a'..'z' -> 1..26
    if char in string.ascii_lowercase:
        return ord(char) - first_letter + 1
    elif char == ' ':
        return 0
    else:
        print("Unexpected character: %s " % char)
        return 0


def id2char(dictid):
    # map an id back to a character
    if dictid > 0:
        return chr(dictid + first_letter - 1)
    else:
        return ' '


def characters(probabilities):
    # turn each row of a probability (or one-hot) matrix into its most likely character
    return [id2char(c) for c in np.argmax(probabilities, 1)]


def batches2string(batches):
    # join a list of batches back into strings, to check they match the original text
    s = [''] * batches[0].shape[0]
    for b in batches:
        s = [''.join(x) for x in zip(s, characters(b))]
    return s

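Note that the data I use is text8.zip, which you can download yourself. LoadData reads the text out of the archive and splits it into valid_text (the first valid_size = 1000 characters) and train_text (the rest). The helpers char2id and id2char, which map characters to ids and back, are used later on.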
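A quick way to check the id mapping is a round trip through one-hot vectors. The sketch below copies char2id and id2char so it runs on its own, but drops the print for unexpected characters. one_hot is a hypothetical helper and is not part of handle_data.py.

```python
import string

import numpy as np

first_letter = ord(string.ascii_lowercase[0])
vocabulary_size = len(string.ascii_lowercase) + 1  # 27, as in config.py


def char2id(char):
    # 'a'..'z' -> 1..26; space (and any unexpected character) -> 0
    if char in string.ascii_lowercase:
        return ord(char) - first_letter + 1
    return 0


def id2char(dictid):
    return chr(dictid + first_letter - 1) if dictid > 0 else ' '


def one_hot(text):
    """Hypothetical helper: encode a string as a (len(text), 27) one-hot matrix."""
    m = np.zeros((len(text), vocabulary_size), dtype=np.float32)
    for row, ch in enumerate(text):
        m[row, char2id(ch)] = 1.0
    return m


encoded = one_hot("hello world")
decoded = ''.join(id2char(i) for i in np.argmax(encoded, 1))
print(decoded)
```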

BatchGenerator.py

# -*- coding:utf-8 -*-
import numpy as np

from config import config
from handle_data import char2id


class BatchGenerator(object):
    def __init__(self, text, batch_size, num_unrollings):
        self._text = text
        self._text_size = len(text)
        self._batch_size = batch_size
        self._num_unrollings = num_unrollings
        # distance (in characters) between the starting points of neighbouring sequences
        segment = self._text_size // self._batch_size
        # current read position of each sequence
        self._cursor = [offset * segment for offset in range(self._batch_size)]
        self._last_batch = self._next_batch()

    def _next_batch(self):
        """Generate one batch from the current cursor positions; shape is (batch_size, 27)."""
        batch = np.zeros(shape=(self._batch_size, config.vocabulary_size), dtype=np.float32)
        for b in range(self._batch_size):
            # one-hot encode the character under cursor b, then advance that cursor
            batch[b, char2id(self._text[self._cursor[b]])] = 1.0
            self._cursor[b] = (self._cursor[b] + 1) % self._text_size
        return batch

    def next(self):
        # The last batch of the previous call is prepended, so each sequence
        # in the returned batches is num_unrollings + 1 characters long.
        batches = [self._last_batch]
        for step in range(self._num_unrollings):
            batches.append(self._next_batch())
        self._last_batch = batches[-1]
        return batches

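This is a batch generator. From batch_size and num_unrollings, it produces batch_size strings, each num_unrollings + 1 characters long (next prepends the last batch of the previous call). The class may look convoluted. Running its output through the batches2string function from handle_data is a good way to understand it:

from config import config
from handle_data import LoadData, batches2string
from BatchGenerator import BatchGenerator

load_data = LoadData()
train_batches = BatchGenerator(load_data.train_text, config.batch_size, config.num_unrollings)
valid_batches = BatchGenerator(load_data.valid_text, 1, 1)

print(batches2string(train_batches.next()))
print(batches2string(train_batches.next()))
print(batches2string(valid_batches.next()))
print(batches2string(valid_batches.next()))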

You will see output like this:

['ons anarchi', 'when milita', 'lleria arch', ' abbeys and', 'married urr', 'hel and ric', 'y and litur', 'ay opened f', 'tion from t', 'migration t', 'new york ot', 'he boeing s', 'e listed wi', 'eber has pr', 'o be made t', 'yer who rec', 'ore signifi', 'a fierce cr', ' two six ei', 'aristotle s', 'ity can be ', ' and intrac', 'tion of the', 'dy to pass ', 'f certain d', 'at it will ', 'e convince ', 'ent told hi', 'ampaign and', 'rver side s', 'ious texts ', 'o capitaliz', 'a duplicate', 'gh ann es d', 'ine january', 'ross zero t', 'cal theorie', 'ast instanc', ' dimensiona', 'most holy m', 't s support', 'u is still ', 'e oscillati', 'o eight sub', 'of italy la', 's the tower', 'klahoma pre', 'erprise lin', 'ws becomes ', 'et in a naz', 'the fabian ', 'etchy to re', ' sharman ne', 'ised empero', 'ting in pol', 'd neo latin', 'th risky ri', 'encyclopedi', 'fense the a', 'duating fro', 'treet grid ', 'ations more', 'appeal of d', 'si have mad'] ['ists advoca', 'ary governm', 'hes nationa', 'd monasteri', 'raca prince', 'chard baer ', 'rgical lang', 'for passeng', 'the nationa', 'took place ', 'ther well k', 'seven six s', 'ith a gloss', 'robably bee', 'to recogniz', 'ceived the ', 'icant than ', 'ritic of th', 'ight in sig', 's uncaused ', ' lost as in', 'cellular ic', 'e size of t', ' him a stic', 'drugs confu', ' take to co', ' the priest', 'im to name ', 'd barred at', 'standard fo', ' such as es', 'ze on the g', 'e of the or', 'd hiver one', 'y eight mar', 'the lead ch', 'es classica', 'ce the non ', 'al analysis', 'mormons bel', 't or at lea', ' disagreed ', 'ing system ', 'btypes base', 'anguages th', 'r commissio', 'ess one nin', 'nux suse li', ' the first ', 'zi concentr', ' society ne', 'elatively s', 'etworks sha', 'or hirohito', 'litical ini', 'n most of t', 'iskerdoo ri', 'ic overview', 'air compone', 'om acnm acc', ' centerline', 'e than any ', 'devotional ', 'de such dev'] [' a'] ['an']

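The result is a list of batch_size strings, each num_unrollings + 1 characters long. Look closely and you will notice that neighbouring strings start segment = text_size // batch_size characters apart in the text.

_next_batch produces one batch_size-long array containing a single character per string, with the characters spaced segment apart. next prepends the last batch from the previous call and then calls _next_batch num_unrollings times in order. Joining the characters string by string gives the printout above. The last character of each string reappears as the first character of the next call (for example, 'ons anarchi' is followed by 'ists advoca').

In effect, we now have an object that iterates over the dataset. This class is worth studying, for example by stepping through it in a debugger.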
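The cursor arithmetic is easier to see on a toy string. Below is a simplified, hypothetical re-implementation of BatchGenerator. It returns characters instead of one-hot rows, so the output can be read directly:

```python
def cursor_strings(text, batch_size, num_unrollings, calls):
    """Hypothetical, simplified BatchGenerator returning characters, not one-hot rows."""
    size = len(text)
    segment = size // batch_size
    cursor = [offset * segment for offset in range(batch_size)]

    def next_batch():
        # one character per sequence, then advance each cursor by one
        chars = []
        for b in range(batch_size):
            chars.append(text[cursor[b]])
            cursor[b] = (cursor[b] + 1) % size
        return chars

    last = next_batch()
    results = []
    for _ in range(calls):
        # prepend the previous call's last batch, as BatchGenerator.next does
        batches = [last]
        for _ in range(num_unrollings):
            batches.append(next_batch())
        last = batches[-1]
        # read each sequence across time: column b of every batch
        results.append([''.join(col) for col in zip(*batches)])
    return results


out = cursor_strings("abcdefghijklmnopqrstuvwxyz", batch_size=2, num_unrollings=3, calls=2)
print(out)
```

With 26 letters and batch_size=2, segment is 13, so the two strings start at 'a' and 'n'. The second call begins with the character the first call ended on, just as in the text8 printout.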

sample.py

# -*- coding:utf-8 -*-
import random

import numpy as np

from config import config


def sample_distribution(distribution):
    # draw one index at random according to the given probability distribution
    r = random.uniform(0, 1)
    s = 0
    for i in range(len(distribution)):
        s += distribution[i]
        if s >= r:
            return i
    return len(distribution) - 1


def sample(prediction):
    # sample an index from the prediction and return it as a one-hot row vector
    p = np.zeros(shape=[1, config.vocabulary_size], dtype=np.float32)
    p[0, sample_distribution(prediction[0])] = 1.0
    return p


def random_distribution():
    # generate a random probability vector of shape 1 x 27
    b = np.random.uniform(0.0, 1.0, size=[1, config.vocabulary_size])
    return b / np.sum(b, 1)[:, None]
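sample_distribution is inverse-transform sampling: it walks the cumulative distribution until it passes a uniform random number. The sketch below checks this empirically. It compares the loop from sample.py with a hypothetical vectorized variant built on np.cumsum and np.searchsorted, and both should reproduce the target probabilities.

```python
import random

import numpy as np


def sample_distribution(distribution):
    # Copy of sample.py: walk the cumulative sum until it reaches r ~ U(0, 1).
    r = random.uniform(0, 1)
    s = 0
    for i in range(len(distribution)):
        s += distribution[i]
        if s >= r:
            return i
    return len(distribution) - 1


def sample_distribution_np(distribution, rng):
    # Hypothetical vectorized variant: searchsorted (side='left') returns the first
    # index whose cumulative probability is >= r, matching the loop's `s >= r` test.
    cdf = np.cumsum(distribution)
    idx = int(np.searchsorted(cdf, rng.uniform(0.0, 1.0)))
    return min(idx, len(distribution) - 1)  # guard against rounding in cdf[-1]


random.seed(0)
rng = np.random.default_rng(0)
p = [0.1, 0.2, 0.7]
n = 20000
loop_freq = np.bincount([sample_distribution(p) for _ in range(n)], minlength=3) / n
np_freq = np.bincount([sample_distribution_np(p, rng) for _ in range(n)], minlength=3) / n
print(loop_freq, np_freq)
```

NumPy's rng.choice(len(p), p=p) is another way to draw from the same distribution.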

lstm_model.py

# -*- coding:utf-8 -*-
# TensorFlow 1.x graph-mode code (under TF 2.x, use tf.compat.v1 with eager execution disabled)
import tensorflow as tf

from config import config


class LSTM_Cell(object):
    def __init__(self, train_data, train_label, num_nodes=64):
        # truncated_normal_initializer takes (mean, stddev), so this is
        # mean=-0.1, stddev=0.1, as in the original notebook
        with tf.variable_scope("input", initializer=tf.truncated_normal_initializer(-0.1, 0.1)):
            # input gate i_t
            self.ix, self.im, self.ib = self._generate_w_b(
                x_weights_size=[config.vocabulary_size, num_nodes],
                m_weights_size=[num_nodes, num_nodes],
                biases_size=[1, num_nodes])
        with tf.variable_scope("memory", initializer=tf.truncated_normal_initializer(-0.1, 0.1)):
            # candidate update for the cell state
            self.cx, self.cm, self.cb = self._generate_w_b(
                x_weights_size=[config.vocabulary_size, num_nodes],
                m_weights_size=[num_nodes, num_nodes],
                biases_size=[1, num_nodes])
        with tf.variable_scope("forget", initializer=tf.truncated_normal_initializer(-0.1, 0.1)):
            # forget gate f_t
            self.fx, self.fm, self.fb = self._generate_w_b(
                x_weights_size=[config.vocabulary_size, num_nodes],
                m_weights_size=[num_nodes, num_nodes],
                biases_size=[1, num_nodes])
        with tf.variable_scope("output", initializer=tf.truncated_normal_initializer(-0.1, 0.1)):
            # output gate o_t
            self.ox, self.om, self.ob = self._generate_w_b(
                x_weights_size=[config.vocabulary_size, num_nodes],
                m_weights_size=[num_nodes, num_nodes],
                biases_size=[1, num_nodes])
        # softmax classifier from the hidden output to the 27 characters
        self.w = tf.Variable(tf.truncated_normal([num_nodes, config.vocabulary_size], -0.1, 0.1))
        self.b = tf.Variable(tf.zeros([config.vocabulary_size]))
        # output and state carried over from one training batch to the next
        self.saved_output = tf.Variable(tf.zeros([config.batch_size, num_nodes]), trainable=False)
        self.saved_state = tf.Variable(tf.zeros([config.batch_size, num_nodes]), trainable=False)
        self.train_data = train_data
        self.train_label = train_label

    def _generate_w_b(self, x_weights_size, m_weights_size, biases_size):
        x_w = tf.get_variable("x_weights", x_weights_size)
        m_w = tf.get_variable("m_weights", m_weights_size)
        # bias of shape [1, num_nodes], broadcast over the batch dimension
        b = tf.get_variable("biases", biases_size, initializer=tf.constant_initializer(0.0))
        return x_w, m_w, b

    def _run(self, input, output, state):
        # one LSTM step: input = x_t, output = h_{t-1}, state = c_{t-1}
        forget_gate = tf.sigmoid(tf.matmul(input, self.fx) + tf.matmul(output, self.fm) + self.fb)
        input_gate = tf.sigmoid(tf.matmul(input, self.ix) + tf.matmul(output, self.im) + self.ib)
        update = tf.matmul(input, self.cx) + tf.matmul(output, self.cm) + self.cb
        # element-wise products, not matrix multiplication
        state = state * forget_gate + tf.tanh(update) * input_gate
        output_gate = tf.sigmoid(tf.matmul(input, self.ox) + tf.matmul(output, self.om) + self.ob)
        return output_gate * tf.tanh(state), state

    def loss_func(self):
        outputs = list()
        output = self.saved_output
        state = self.saved_state
        for i in self.train_data:
            output, state = self._run(i, output, state)
            outputs.append(output)
        # outputs now holds num_unrollings tensors of shape [batch_size, num_nodes]
        with tf.control_dependencies([self.saved_output.assign(output),
                                      self.saved_state.assign(state)]):
            # tf.concat(outputs, 0) stacks them along dim 0:
            # shape [num_unrollings * batch_size, num_nodes]
            logits = tf.nn.xw_plus_b(tf.concat(outputs, 0), self.w, self.b)
            # the labels are concatenated the same way so they line up with the logits
            loss = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(
                    labels=tf.concat(self.train_label, 0), logits=logits))
        train_prediction = tf.nn.softmax(logits)
        return logits, loss, train_prediction

This is the core of the post. __init__ defines a number of parameters, which I won't describe one by one; the figures and formulas make them clearer.

Internal structure of an LSTM cell

The LSTM variables

These variables also explain what the parameters defined in __init__ mean. Here is what each one stands for:

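x_t: the input vector of the LSTM cell
h_t: the output vector of the LSTM cell
c_t: the state vector of the LSTM cell
W, U and b: parameter matrices and bias vectors (in the code, the *x weights such as fx, the *m weights such as fm, and the *b biases such as fb)
f_t, i_t and o_t: gate vectors
Where:
f_t is the forget gate. It is the weight given to old, past information (0 means forget it, 1 means keep it).
i_t is the input gate. It is the weight given to new content (0 means ignore it, 1 means take it in).
o_t is the output gate. It controls how much of the state is exposed as the output.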

Formulas of the standard LSTM cell
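Written out in the usual column-vector notation, the standard LSTM formulation that _run implements is given below. Here ⊙ is element-wise multiplication. The code uses row vectors instead, so $W_f x_t$ corresponds to tf.matmul(input, self.fx) and $U_f h_{t-1}$ to tf.matmul(output, self.fm). The same pattern holds for the input (ix, im, ib), update (cx, cm, cb) and output (ox, om, ob) parameters.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```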

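These formulas are exactly what _run computes. Read them alongside the variables above and the code becomes clear.

The sigmoid function is what keeps each gate's weights between 0 and 1. When computing the current cell state, the vectors are multiplied element-wise (point by point), not with matrix multiplication. (I misread the formula here and wrote the code wrong at first.)

Also note that some formulations compute h_t by multiplying o_t directly with the state, without first applying an activation function to it. The _run code above does apply tanh to the state.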
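The same step can be written in plain NumPy. This sketch mirrors _run under the post's row-vector convention. The params dictionary and the small sizes are hypothetical choices for illustration. What matters is the gate arithmetic, in particular that the state update uses element-wise `*`, not matrix multiplication.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step mirroring LSTM_Cell._run, with row vectors as in the TF code.

    params is a hypothetical dict: for each gate g in "f", "i", "c", "o" it holds
    (W_g, U_g, b_g) with shapes (vocab, nodes), (nodes, nodes), (1, nodes).
    """
    def affine(g):
        W, U, b = params[g]
        return x @ W + h_prev @ U + b

    f = sigmoid(affine("f"))                    # forget gate, values in (0, 1)
    i = sigmoid(affine("i"))                    # input gate, values in (0, 1)
    o = sigmoid(affine("o"))                    # output gate, values in (0, 1)
    c = f * c_prev + i * np.tanh(affine("c"))   # element-wise products
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
vocab, nodes, batch = 27, 8, 4
params = {g: (rng.normal(0, 0.1, (vocab, nodes)),
              rng.normal(0, 0.1, (nodes, nodes)),
              np.zeros((1, nodes))) for g in "fico"}
x = np.eye(vocab)[[1, 2, 3, 0]]   # one-hot rows for "abc " (space = 0)
h, c = lstm_step(x, np.zeros((batch, nodes)), np.zeros((batch, nodes)), params)
print(h.shape, c.shape)
```

Feeding h and c back into the next call is exactly what loss_func does when it loops over train_data.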

loss_func computes the loss between the predictions and the targets with softmax and cross entropy, which gives us the final loss function.
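Two quantities are printed during training. The first is the mean softmax cross entropy, which loss_func computes with tf.nn.softmax_cross_entropy_with_logits. The second is perplexity, which the training loop reports as np.exp(logprob(...)). The sketch below computes both in NumPy. The functions are illustrative re-implementations, not TensorFlow's code.

```python
import numpy as np


def softmax(logits):
    # Subtract the row maximum first for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mean_softmax_cross_entropy(logits, labels):
    # Per-row cross entropy -sum(labels * log softmax(logits)), averaged over rows:
    # the quantity reduce_mean(softmax_cross_entropy_with_logits(...)) computes.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(np.mean(-(labels * log_probs).sum(axis=1)))


logits = np.zeros((2, 27))      # a model that has learned nothing: uniform scores
labels = np.eye(27)[[3, 5]]     # one-hot targets for 'c' and 'e'
loss = mean_softmax_cross_entropy(logits, labels)
perplexity = float(np.exp(loss))
probs = softmax(logits)
print(loss, perplexity)
```

A model that predicts uniformly over the 27 characters has a loss of log 27 and a perplexity of 27. That is a useful baseline when reading the "Minibatch perplexity" lines.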

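Helper functions in main.py

import numpy as np
import tensorflow as tf

from BatchGenerator import BatchGenerator
from config import config
from handle_data import LoadData, characters
from lstm_model import LSTM_Cell
from sample import random_distribution, sample


def get_optimizer(loss):
    global_step = tf.Variable(0, trainable=False)
    # start at 10.0 and multiply by 0.1 every 5000 steps
    learning_rate = tf.train.exponential_decay(10.0, global_step, 5000, 0.1, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    gradients, v = zip(*optimizer.compute_gradients(loss))
    # To avoid exploding gradients, compute the global L2 norm of all gradients.
    # If it is greater than 1.25, replace the gradients with
    # gradients * (1.25 / global_norm).
    gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
    # apply the clipped gradients with gradient descent
    optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)
    return optimizer, learning_rate


def logprob(predictions, labels):
    # average cross entropy per example (predictions are clamped to avoid log(0))
    predictions[predictions < 1e-10] = 1e-10
    return np.sum(np.multiply(labels, -np.log(predictions))) / labels.shape[0]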

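These two functions build the optimizer and compute the cross entropy (the loss), respectively. The optimizer is the part worth noting. Unlike the one in earlier lessons, it adds tf.clip_by_global_norm(gradients, 1.25).

LSTM's redesigned hidden layer mainly addresses vanishing gradients, but gradients can still explode. Vanishing gradients are the harder problem, because once the gradient vanishes the earlier LSTM units can hardly learn anything. Exploding gradients, by contrast, can be kept in check by capping the gradient's size. That is why we clip the gradients here.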

Gradient clipping formula
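Here is a NumPy sketch of the rule tf.clip_by_global_norm applies, as described in its documentation: every gradient is multiplied by clip_norm / max(global_norm, clip_norm). This is an illustrative re-implementation, not TensorFlow's own code.

```python
import numpy as np


def clip_by_global_norm(grads, clip_norm):
    # global_norm = sqrt(sum of squared entries over *all* gradient arrays)
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    # One shared factor: 1.0 if global_norm <= clip_norm, else clip_norm / global_norm.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm


grads = [np.array([3.0, 0.0]), np.array([[0.0], [4.0]])]  # global norm = 5
clipped, norm = clip_by_global_norm(grads, 1.25)
print(norm, clipped)
```

All gradients share one scale factor, so clipping shrinks the update without changing its direction. Clipping each value separately would change the direction.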

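In other words, global_norm is the L2 norm of all gradients taken together. When it exceeds clip_norm, every gradient t_i is replaced by t_i * clip_norm / global_norm; otherwise the gradients are left unchanged.

In addition, the learning rate is lowered with exponential decay. With staircase=True, it starts at 10.0 and is multiplied by 0.1 every 5000 steps.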
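The staircase schedule can be checked by hand. The one-line sketch below follows the formula documented for tf.train.exponential_decay, learning_rate * decay_rate ^ (global_step / decay_steps). With staircase=True, the division becomes integer division.

```python
def staircase_exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    # staircase=True: integer division, so the rate drops in discrete jumps
    return learning_rate * decay_rate ** (global_step // decay_steps)


# Values used in get_optimizer: start at 10.0, multiply by 0.1 every 5000 steps.
schedule = {step: staircase_exponential_decay(10.0, step, 5000, 0.1)
            for step in (0, 4999, 5000, 7000)}
print(schedule)
```

With num_steps = 7001, training therefore runs at 10.0 for the first 5000 steps and at 1.0 for the rest.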

Training

Defining the data flow and the model

loadData = LoadData()
train_text = loadData.train_text
valid_text = loadData.valid_text

train_batcher = BatchGenerator(text=train_text, batch_size=config.batch_size,
                               num_unrollings=config.num_unrollings)
valid_batcher = BatchGenerator(text=valid_text, batch_size=1, num_unrollings=1)

# the training data consists of num_unrollings + 1 placeholders, one per time step
train_data = list()
for _ in range(config.num_unrollings + 1):
    train_data.append(tf.placeholder(tf.float32, shape=[config.batch_size, config.vocabulary_size]))
# inputs are steps 0..num_unrollings-1, labels are steps 1..num_unrollings
train_input = train_data[:config.num_unrollings]
train_label = train_data[1:]

# define the LSTM training model
model = LSTM_Cell(train_data=train_input, train_label=train_label, num_nodes=config.num_nodes)
# get the loss and the prediction
logits, loss, train_prediction = model.loss_func()
optimizer, learning_rate = get_optimizer(loss)

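train_data holds num_unrollings + 1 batches, and the characters in consecutive batches are adjacent in the text. The LSTM predicts which character is most likely to come next, so the labels are the inputs shifted by one character. train_input is the first num_unrollings batches, and train_label is the last num_unrollings.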
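The one-character offset between train_input and train_label can be shown on a plain string. This is a toy illustration; in the real graph, each character is a whole batch of one-hot rows.

```python
text = "anarchism"
num_unrollings = 4

window = text[:num_unrollings + 1]   # num_unrollings + 1 consecutive characters, like train_data
inputs = window[:num_unrollings]     # like train_data[:num_unrollings]
labels = window[1:]                  # like train_data[1:]

# each input character is paired with the character that follows it
pairs = list(zip(inputs, labels))
print(pairs)
```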

Defining the sampler

# inputs, outputs and reset op for sampling (letting the trained RNN generate text on its own)
sample_input = tf.placeholder(tf.float32, shape=[1, config.vocabulary_size])
save_sample_output = tf.Variable(tf.zeros([1, config.num_nodes]))
save_sample_state = tf.Variable(tf.zeros([1, config.num_nodes]))
reset_sample_state = tf.group(save_sample_output.assign(tf.zeros([1, config.num_nodes])),
                              save_sample_state.assign(tf.zeros([1, config.num_nodes])))

sample_output, sample_state = model._run(sample_input, save_sample_output, save_sample_state)
with tf.control_dependencies([save_sample_output.assign(sample_output),
                              save_sample_state.assign(sample_state)]):
    # distribution over the next character
    sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, model.w, model.b))

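A sample here means that after every so many training steps, the current model generates a random piece of text. This lets you see how well it is learning (personally I think it is pretty bad, haha).

One thing to watch out for is control_dependencies. A TensorFlow graph does not execute statements in the order they are written: operations that do not depend on each other run in no particular order. Here we must save output and state first, because the next run reuses the previous output and state. That next run is either the next training batch or the next character when sampling.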

Start training

# training
with tf.Session() as session:
    tf.global_variables_initializer().run()
    print("Initialized....")
    mean_loss = 0
    for step in range(config.num_steps):
        batches = train_batcher.next()
        feed_dict = dict()
        for i in range(config.num_unrollings + 1):
            feed_dict[train_data[i]] = batches[i]
        _, l, predictions, lr = session.run(
            [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)
        # accumulate the loss so its average can be reported
        mean_loss += l
        if step % config.summary_frequency == 0:
            if step > 0:
                mean_loss = mean_loss / config.summary_frequency
            print('Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))
            mean_loss = 0
            labels = np.concatenate(list(batches)[1:])
            print('Minibatch perplexity: %.2f' % float(np.exp(logprob(predictions, labels))))
            if step % (config.summary_frequency * 10) == 0:
                # Generate some samples.
                print('=' * 80)
                for _ in range(5):
                    feed = sample(random_distribution())
                    sentence = characters(feed)[0]
                    reset_sample_state.run()
                    for _ in range(79):
                        prediction = sample_prediction.eval({sample_input: feed})
                        feed = sample(prediction)
                        sentence += characters(feed)[0]
                    print(sentence)
                print('=' * 80)
            reset_sample_state.run()

As before, we feed the prepared data into the loss function, accumulate the loss over the iterations, and optimize the model with gradient descent.

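Summary

This post is only a closer look at the formulas and principles behind LSTM (without proving its convergence or its handling of long-term dependencies), plus some practice with TensorFlow operations.

Using one-hot vectors as the character representation is a weak choice here. To improve accuracy, you would want learned vector representations such as word2vec for each character (or word).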
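One reason one-hot vectors are limiting is that every pair of them is equally far apart, so they carry no notion of similarity. A dense embedding table can capture similarity. Multiplying a one-hot row by a matrix simply selects a row of that matrix, which is effectively what tf.matmul(input, self.fx) does in _run. The sketch below shows the equivalence. The table here is random, whereas in practice it would be learned (for example with word2vec-style training), and embed_dim is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 27, 5                           # embed_dim: arbitrary illustrative size
embeddings = rng.normal(size=(vocab_size, embed_dim))   # would be learned in practice

ids = np.array([8, 9, 0])              # "hi " under the char2id mapping (space = 0)
one_hot = np.eye(vocab_size)[ids]      # shape (3, 27): the representation this post uses
via_matmul = one_hot @ embeddings      # multiplying one-hot rows by a matrix...
via_lookup = embeddings[ids]           # ...just selects rows, so a lookup is equivalent

# any two distinct one-hot vectors are exactly sqrt(2) apart
dist = np.linalg.norm(one_hot[0] - one_hot[1])
print(via_lookup.shape, dist)
```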

Reference

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/6_lstm.ipynb
https://liusida.github.io/2016/11/16/study-lstm/
http://www.jianshu.com/p/9dc9f41f0b29
https://arxiv.org/pdf/1402.1128v1.pdf

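Author: Salon_sai
Link: http://www.jianshu.com/p/b6130685d855
Source: Jianshu
Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, please credit the source.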
