RNN-LSTM Recurrent Neural Networks, Part 03: Advanced Tensorflow Implementation
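- Full code: click here to view
- For a simple binary-sequence example implemented in Tensorflow, click here
- For the basics of RNNs and LSTMs, see here
- This post covers the following:
- Training an RNN model that generates text character by character (the last part)
- Using Tensorflow's scan function to get the same effect as dynamic_rnn, which unrolls the loop at run time
- Using MultiRNNCell to build a multi-layer RNN
- Implementing Dropout and Layer Normalization
I. Model description and data processing
1. Model description
- We want to use an RNN to learn a language model that generates character sequences
- Existing implementations are available on GitHub:
- Torch implementation: https://github.com/karpathy/char-rnn
- Tensorflow implementation: https://github.com/sherjilozair/char-rnn-tensorflow
- The rest of this post shows how to implement it
2. Data processing
- The dataset is a collection of Shakespeare's writing (click here to view); any other text works as well
- Upper-case and lower-case letters are treated as different characters
- Download and read the data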
```python
'''Download and read the data'''
import os
import urllib.request

file_url = 'https://raw.githubusercontent.com/jcjohnson/torch-rnn/master/data/tiny-shakespeare.txt'
file_name = 'tinyshakespeare.txt'
if not os.path.exists(file_name):
    urllib.request.urlretrieve(file_url, filename=file_name)
with open(file_name, 'r') as f:
    raw_data = f.read()
print("Data length", len(raw_data))
```
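- Process the character data and convert it to numbers
- Use set to deduplicate and obtain all unique characters
- Map each character to a number (using a dictionary)
- Walk through the raw data to get the number for every character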
```python
'''Process the character data and convert it to numbers'''
vocab = set(raw_data)                  # deduplicate with set (upper and lower case stay distinct)
vocab_size = len(vocab)
idx_to_vocab = dict(enumerate(vocab))  # index -> character: 0, 1, 2, ..., vocab_size-1
vocab_to_idx = dict(zip(idx_to_vocab.values(), idx_to_vocab.keys()))  # swap (key, value) to get character -> index
data = [vocab_to_idx[c] for c in raw_data]  # map every character of raw_data to its index
del raw_data
```
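The character-to-index mapping above can be checked with a small round trip. The sketch below is illustrative only: the toy string and the helper names `encode` and `decode` are assumptions, not part of the original code. It also uses `sorted()` on the vocabulary, because iterating over a plain set of strings can yield a different order in each Python process, which matters when a model is trained in one run and used for generation in another.

```python
# Toy text standing in for the Shakespeare corpus (illustrative only).
text = "To be, or not to be"

# sorted() fixes the index order; a plain set of strings may be
# iterated in a different order in each Python process.
vocab = sorted(set(text))
vocab_to_idx = {ch: i for i, ch in enumerate(vocab)}
idx_to_vocab = {i: ch for ch, i in vocab_to_idx.items()}

def encode(s):
    """Map a string to a list of integer ids."""
    return [vocab_to_idx[ch] for ch in s]

def decode(ids):
    """Map a list of integer ids back to a string."""
    return "".join(idx_to_vocab[i] for i in ids)

ids = encode(text)
print(len(vocab), ids[:5])
```

Decoding the encoded ids returns the original string, and 'T' and 't' receive different ids, matching the case-sensitive treatment described above.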
- Generate batch data
- This uses the iterator from the PTB model in Tensorflow models: https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb
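```python
'''Hyperparameters'''
num_steps = 200        # number of time steps per training sequence
batch_size = 32
state_size = 100       # size of the cell state
num_classes = vocab_size
learning_rate = 1e-4

def gen_epochs(num_epochs, num_steps, batch_size):
    # reader is the module that defines ptb_iterator_oldversion (shown below)
    for i in range(num_epochs):
        yield reader.ptb_iterator_oldversion(data, batch_size, num_steps)
```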
- Implementation of the ptb_iterator function:
- The returned X and Y both have shape [batch_size, num_steps]
```python
import numpy as np

def ptb_iterator_oldversion(raw_data, batch_size, num_steps):
    """Iterate on the raw PTB data.

    This generates batch_size pointers into the raw PTB data, and allows
    minibatch iteration along these pointers.

    Args:
      raw_data: one of the raw data outputs from ptb_raw_data.
      batch_size: int, the batch size.
      num_steps: int, the number of unrolls.

    Yields:
      Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
      The second element of the tuple is the same data time-shifted to the
      right by one.

    Raises:
      ValueError: if batch_size or num_steps are too high.
    """
    raw_data = np.array(raw_data, dtype=np.int32)

    data_len = len(raw_data)
    batch_len = data_len // batch_size
    data = np.zeros([batch_size, batch_len], dtype=np.int32)
    for i in range(batch_size):
        data[i] = raw_data[batch_len * i:batch_len * (i + 1)]

    epoch_size = (batch_len - 1) // num_steps

    if epoch_size == 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")

    for i in range(epoch_size):
        x = data[:, i*num_steps:(i+1)*num_steps]
        y = data[:, i*num_steps+1:(i+1)*num_steps+1]
        yield (x, y)
```
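To see what the iterator produces, the same slicing logic can be run on a tiny sequence. This is a minimal pure-Python sketch with plain lists instead of NumPy arrays; the name `toy_batches` and the sequence 0..19 are assumptions made for illustration.

```python
def toy_batches(seq, batch_size, num_steps):
    """Yield (x, y) pairs; y is x shifted one position ahead in the sequence."""
    batch_len = len(seq) // batch_size
    # Row i holds the i-th contiguous slice of the sequence.
    rows = [seq[i * batch_len:(i + 1) * batch_len] for i in range(batch_size)]
    epoch_size = (batch_len - 1) // num_steps
    if epoch_size == 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    for i in range(epoch_size):
        lo, hi = i * num_steps, (i + 1) * num_steps
        x = [row[lo:hi] for row in rows]
        y = [row[lo + 1:hi + 1] for row in rows]
        yield x, y

batches = list(toy_batches(list(range(20)), batch_size=2, num_steps=3))
for x, y in batches:
    print(x, y)
```

Each row of y is the corresponding row of x moved forward by one element, so the target at every position is the next character. Consecutive batches continue each row where the previous batch stopped, which is why the training loop can carry the final state of one batch into the next.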
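II. Using tf.scan and dynamic_rnn
1. Why use tf.scan and dynamic_rnn
- In the first example we implemented earlier, the part that did not use dynamic_rnn split the three-dimensional input [batch_size, num_steps, state_size] along the num_steps dimension and stored the result of every step in a Python list
- Building the graph this way is slow. That did not show in our earlier example, but it becomes impractical when the number of steps is large (num_steps, i.e. the dependencies to be learned are long), especially with a deep RNN
- To compare its run time with dynamic_rnn, the list-based version is still given below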
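The cost difference can be pictured with a toy model of graph size. The sketch below is not how Tensorflow represents graphs internally; `build_unrolled`, `build_looped` and `ops_per_step` are hypothetical names, and the only point is that list-style unrolling adds one copy of the cell's operations per time step, while a loop construct keeps a single copy of the body.

```python
def build_unrolled(num_steps, ops_per_step=4):
    """Toy 'graph' for list-style unrolling: one copy of the cell's ops per step."""
    graph = []
    for t in range(num_steps):
        graph.extend("step%d/op%d" % (t, k) for k in range(ops_per_step))
    return graph

def build_looped(num_steps, ops_per_step=4):
    """Toy 'graph' for a loop construct: the cell's ops appear once, plus a loop node."""
    graph = ["loop(max_steps=%d)" % num_steps]
    graph.extend("body/op%d" % k for k in range(ops_per_step))
    return graph

for n in (10, 200):
    print(n, len(build_unrolled(n)), len(build_looped(n)))
```

In this toy model the unrolled graph grows linearly with num_steps and the looped one stays constant, which is consistent with the build times reported in the following sections.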
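2. The list-based approach (static_rnn)
- Build the computation graph
- My Tensorflow version here is 1.2.0, which differs slightly from 1.0
- It is much the same as the earlier example, so it is not explained again here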
```python
'''The list-based approach'''
def build_basic_rnn_graph_with_list(
        state_size = state_size,
        num_classes = num_classes,
        batch_size = batch_size,
        num_steps = num_steps,
        num_layers = 3,
        learning_rate = learning_rate):

    reset_graph()

    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='x')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='y')

    x_one_hot = tf.one_hot(x, num_classes)   # (batch_size, num_steps, num_classes)
    '''Split along the second dimension: num_steps * (batch_size, num_classes)'''
    rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(x_one_hot, num_steps, 1)]

    cell = tf.nn.rnn_cell.BasicRNNCell(state_size)
    init_state = cell.zero_state(batch_size, tf.float32)
    '''Use static_rnn'''
    rnn_outputs, final_state = tf.contrib.rnn.static_rnn(cell=cell, inputs=rnn_inputs, initial_state=init_state)
    #rnn_outputs, final_state = tf.nn.rnn(cell, rnn_inputs, initial_state=init_state)  # older Tensorflow API

    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
    logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]

    y_as_list = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(y, num_steps, 1)]

    #loss_weights = [tf.ones([batch_size]) for i in range(num_steps)]
    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_as_list, logits=logits)
    #losses = tf.nn.seq2seq.sequence_loss_by_example(logits, y_as_list, loss_weights)  # older Tensorflow API
    total_loss = tf.reduce_mean(losses)
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    return dict(
        x = x,
        y = y,
        init_state = init_state,
        final_state = final_state,
        total_loss = total_loss,
        train_step = train_step
    )
```
- Function that trains the network
- Similar to the earlier example
```python
'''Function that trains the RNN'''
def train_rnn(g, num_epochs, num_steps=num_steps, batch_size=batch_size, verbose=True, save=False):
    tf.set_random_seed(2345)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        training_losses = []
        for idx, epoch in enumerate(gen_epochs(num_epochs, num_steps, batch_size)):
            training_loss = 0
            steps = 0
            training_state = None
            for X, Y in epoch:
                steps += 1
                feed_dict = {g['x']: X, g['y']: Y}
                # carry the final state of the previous batch into this one
                if training_state is not None:
                    feed_dict[g['init_state']] = training_state
                training_loss_, training_state, _ = sess.run([g['total_loss'],
                                                              g['final_state'],
                                                              g['train_step']],
                                                             feed_dict=feed_dict)
                training_loss += training_loss_
            if verbose:
                print('epoch: {0} average loss: {1}'.format(idx, training_loss/steps))
            training_losses.append(training_loss/steps)

        if isinstance(save, str):
            g['saver'].save(sess, save)
    return training_losses
```
- Run it:
```python
start_time = time.time()
g = build_basic_rnn_graph_with_list()
print("Graph build time:", time.time()-start_time)
start_time = time.time()
train_rnn(g, 3)
print("Training time:", time.time()-start_time)
```
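- Results
- Graph build time: 113.43532419204712 seconds
- Run time for 3 epochs:
```
epoch: 0 average loss: 3.6314958388777985
epoch: 1 average loss: 3.287133811534136
epoch: 2 average loss: 3.250853428895446
Training time: 84.2816972732544
```
- Building the graph is clearly very slow, and this is with only a single cell layer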
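3. Using dynamic_rnn
- We already used it in the first example; here MultiRNNCell is used to build multiple cell layers, which is explained further below
- Build the model:
- tf.nn.embedding_lookup(params, ids) looks up the representation of ids in params, much like indexing a matrix with an array. Here two-dimensional ids are looked up in the two-dimensional embeddings: each number in a row of ids selects one row of embeddings, so the result has shape [batch_size, num_steps, state_size]. For the detailed output, see here
- I think of this as the learned representation of a character; the static_rnn version above used a one-hot representation instead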
```python
'''Use dynamic_rnn
 - Our hand-written cell and the static_rnn example both store the per-step tensors in a list,
   which makes building the graph slow
 - dynamic_rnn unrolls the loop at run time
'''
def build_multilayer_lstm_graph_with_dynamic_rnn(
        state_size = state_size,
        num_classes = num_classes,
        batch_size = batch_size,
        num_steps = num_steps,
        num_layers = 3,
        learning_rate = learning_rate):
    reset_graph()
    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='x')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='y')
    embeddings = tf.get_variable(name='embedding_matrix', shape=[num_classes, state_size])
    '''The input here is three-dimensional: [batch_size, num_steps, state_size]
     - each id in x selects one row of embeddings
    '''
    rnn_inputs = tf.nn.embedding_lookup(params=embeddings, ids=x)
    # One LSTMCell object per layer: in Tensorflow 1.2, [cell]*num_layers would make all layers share weights
    cells = [tf.nn.rnn_cell.LSTMCell(num_units=state_size, state_is_tuple=True) for _ in range(num_layers)]
    cell = tf.nn.rnn_cell.MultiRNNCell(cells=cells, state_is_tuple=True)
    init_state = cell.zero_state(batch_size, dtype=tf.float32)
    '''Use dynamic_rnn'''
    rnn_outputs, final_state = tf.nn.dynamic_rnn(cell=cell, inputs=rnn_inputs, initial_state=init_state)

    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))

    rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])   # flatten to a 2-D matrix
    y_reshape = tf.reshape(y, [-1])
    logits = tf.matmul(rnn_outputs, W) + b
    total_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_reshape))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    return dict(x = x,
                y = y,
                init_state = init_state,
                final_state = final_state,
                total_loss = total_loss,
                train_step = train_step)
```
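The relation between the embedding lookup used here and the one-hot input of the static_rnn version can be shown on plain lists. The following is a minimal sketch under the assumption of a 4-class vocabulary and a state size of 2; both helper functions are illustrative stand-ins, not Tensorflow calls.

```python
def embedding_lookup(embeddings, ids):
    """Pick rows of `embeddings` for every id in a [batch_size][num_steps] grid."""
    return [[embeddings[i] for i in row] for row in ids]

def one_hot_matmul(embeddings, ids):
    """Same result via an explicit one-hot vector times the embedding matrix."""
    num_classes, width = len(embeddings), len(embeddings[0])
    out = []
    for row in ids:
        out_row = []
        for i in row:
            one_hot = [1.0 if k == i else 0.0 for k in range(num_classes)]
            out_row.append([sum(one_hot[k] * embeddings[k][j] for k in range(num_classes))
                            for j in range(width)])
        out.append(out_row)
    return out

# 4 classes, state_size 2 (toy numbers).
embeddings = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
ids = [[0, 2, 1], [3, 3, 0]]          # batch_size=2, num_steps=3

looked_up = embedding_lookup(embeddings, ids)
print(looked_up[0])
```

The lookup gives the same values as multiplying a one-hot vector by the matrix, but skips the multiplication, and the result has the [batch_size, num_steps, state_size] layout described above.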
- Run it
```python
start_time = time.time()
g = build_multilayer_lstm_graph_with_dynamic_rnn()
print("Graph build time:", time.time()-start_time)
start_time = time.time()
train_rnn(g, 3)
print("Training time:", time.time()-start_time)
```
- Results (note that this is a 3-layer LSTM):
- Graph build time: 7.616888523101807 seconds, much faster than the static_rnn version above
- Training time (a 3-layer LSTM, so still quite slow):
```
epoch: 0 average loss: 3.604653576324726
epoch: 1 average loss: 3.3202743626188957
epoch: 2 average loss: 3.3155322650383257
Training time: 303.5468375682831
```
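4. Implementation with tf.scan
- If you are not familiar with tf.scan, read the official API documentation first; it is somewhat involved
- There is also an introduction on YouTube; click here to view
- scan is a higher-order function. In general: given a sequence $[x_0, x_1, \ldots, x_n]$ and an initial state $y_{-1}$, it computes $y_t = f(x_t, y_{t-1})$ to produce the output sequence $[y_0, y_1, \ldots, y_n]$
- Build the computation graph
- tf.transpose(rnn_inputs, [1,0,2]) swaps the first and second dimensions of rnn_inputs, giving [num_steps, batch_size, state_size]. dynamic_rnn has a time_major parameter that states whether num_steps is the first dimension; it defaults to False, i.e. not the first dimension
- tf.scan splits elems along the first dimension, so each iteration receives the data of one step (similar to our static_rnn example)
- The argument a has the same structure as initializer, so a[1] is the state; the cell is called with x and that state
- In each iteration the cell returns an rnn_output of shape (batch_size, state_size) and the corresponding state. After num_steps iterations rnn_outputs has shape (num_steps, batch_size, state_size), and likewise for the states
- Every input x yields a state (collected in final_states); we only need the last one, final_state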
```python
'''Use scan to reproduce the effect of dynamic_rnn'''
def build_multilayer_lstm_graph_with_scan(
        state_size = state_size,
        num_classes = num_classes,
        batch_size = batch_size,
        num_steps = num_steps,
        num_layers = 3,
        learning_rate = learning_rate):
    reset_graph()
    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='x')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='y')
    embeddings = tf.get_variable(name='embedding_matrix', shape=[num_classes, state_size])
    '''The input here is three-dimensional: [batch_size, num_steps, state_size]'''
    rnn_inputs = tf.nn.embedding_lookup(params=embeddings, ids=x)
    '''Build the multi-layer cell: one LSTMCell per layer, wrapped in MultiRNNCell'''
    cells = [tf.nn.rnn_cell.LSTMCell(num_units=state_size, state_is_tuple=True) for _ in range(num_layers)]
    cell = tf.nn.rnn_cell.MultiRNNCell(cells=cells, state_is_tuple=True)
    init_state = cell.zero_state(batch_size, dtype=tf.float32)
    '''Use tf.scan
     - the transpose makes the input time-major: [num_steps, batch_size, state_size]
     - tf.scan splits elems along the first dimension, one step per iteration
     - a has the same structure as initializer, so a[1] is the state passed to the cell
     - final_states holds the state after every step; only the last one is needed
    '''
    def testfn(a, x):
        return cell(x, a[1])
    rnn_outputs, final_states = tf.scan(fn=testfn, elems=tf.transpose(rnn_inputs, [1,0,2]),
                                        initializer=(tf.zeros([batch_size, state_size]), init_state))
    '''Or with a lambda'''
    #rnn_outputs, final_states = tf.scan(lambda a,x: cell(x, a[1]), tf.transpose(rnn_inputs, [1,0,2]),
    #                                    initializer=(tf.zeros([batch_size, state_size]), init_state))
    # squeeze only axis 0 so the batch dimension survives when batch_size=1
    final_state = tuple([tf.nn.rnn_cell.LSTMStateTuple(
        tf.squeeze(tf.slice(c, [num_steps-1,0,0], [1,batch_size,state_size]), axis=[0]),
        tf.squeeze(tf.slice(h, [num_steps-1,0,0], [1,batch_size,state_size]), axis=[0]))
        for c, h in final_states])

    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))

    rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])
    # scan outputs are time-major, so y is flattened in time-major order as well
    y_reshape = tf.reshape(tf.transpose(y, [1, 0]), [-1])
    logits = tf.matmul(rnn_outputs, W) + b
    total_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_reshape))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    return dict(x = x,
                y = y,
                init_state = init_state,
                final_state = final_state,
                total_loss = total_loss,
                train_step = train_step)
```
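The scan semantics used above can be sketched in plain Python. This is an illustration of the idea only: `scan`, `to_time_major` and `toy_cell` are hypothetical helpers, and unlike tf.scan, which returns stacked tensors for each element of the structure, this version returns a list with one accumulator per step.

```python
def scan(fn, elems, initializer):
    """Apply fn(accumulator, elem) along elems and collect every accumulator."""
    acc = initializer
    outputs = []
    for elem in elems:
        acc = fn(acc, elem)
        outputs.append(acc)
    return outputs

def to_time_major(batch_major):
    """[batch_size][num_steps] -> [num_steps][batch_size], like transposing the first two axes."""
    return [list(step) for step in zip(*batch_major)]

def toy_cell(x, state):
    """Hypothetical stand-in for an RNN cell: returns (output, new_state)."""
    new_state = [0.5 * s + xi for s, xi in zip(state, x)]
    output = [2.0 * s for s in new_state]
    return output, new_state

inputs = [[1.0, 2.0, 3.0],        # sequence 0
          [10.0, 20.0, 30.0]]     # sequence 1 -> batch_size=2, num_steps=3
init = ([0.0, 0.0], [0.0, 0.0])   # (dummy output, initial state): same structure as the cell's return value

steps = scan(lambda a, x: toy_cell(x, a[1]), to_time_major(inputs), init)
outputs = [out for out, _ in steps]   # one output per step
final_state = steps[-1][1]            # only the last state is needed
print(outputs, final_state)
```

As in the graph code, the accumulator is the pair (output, state), the lambda passes only a[1] to the cell, and the final state is read from the last step.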
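- Results
- Graph build time: 8.685610055923462 seconds (slightly slower than dynamic_rnn)
- Training time (about the same as dynamic_rnn):
```
epoch: 0 average loss: 3.6226147892831384
epoch: 1 average loss: 3.3211338095281318
epoch: 2 average loss: 3.3158331972429123
Training time: 303.2535448074341
```
- The scan version is only marginally slower than dynamic_rnn, but it is more flexible and makes the execution explicit. It is also easier to modify (for example, skipping a time step so that the state from t-2 feeds directly into t)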
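III. Multi-layer RNNs
1. Structure
- An LSTM carries two states: c, the memory cell, and h, the hidden state. Tensorflow represents them as a tuple, which is why the cells built for dynamic_rnn above take the state_is_tuple parameter; this form also runs faster
- In the multi-layer structure the cells are stacked, with each layer's output feeding the layer above it
- The stack can be wrapped so that it looks like a single cell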
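To make the (c, h) state tuple concrete, here is a minimal scalar LSTM step. It is a sketch, not the Tensorflow implementation: the weights are hypothetical, there are no bias terms apart from an assumed forget_bias of 1.0, and the gate names i, j, f, o are chosen to match the names used in the LSTMCell code later in this post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, state, w, forget_bias=1.0):
    """One step of a scalar LSTM. `state` is the tuple (c, h)."""
    c_prev, h_prev = state
    # Pre-activations of the four gates; w maps gate name -> (w_x, w_h).
    pre = {g: w[g][0] * x + w[g][1] * h_prev for g in ("i", "j", "f", "o")}
    c = sigmoid(pre["f"] + forget_bias) * c_prev + sigmoid(pre["i"]) * math.tanh(pre["j"])
    h = sigmoid(pre["o"]) * math.tanh(c)
    return h, (c, h)   # output and new state tuple

# Hypothetical weights, chosen only for illustration.
weights = {"i": (0.5, 0.1), "j": (1.0, -0.2), "f": (0.3, 0.3), "o": (0.8, 0.0)}

state = (0.0, 0.0)
for x in [1.0, -1.0, 0.5]:
    out, state = lstm_step(x, state, weights)
    print(out, state)
```

The output of a step equals the h component of the new state, while c is carried along separately; a multi-layer cell keeps one such tuple per layer.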
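2. Code
- Tensorflow implements this with tf.nn.rnn_cell.MultiRNNCell
- Create the cells
- Pass the list of cells to MultiRNNCell. In Tensorflow 1.2, [cell]*num_layers reuses one cell object, so every layer shares the same weights; create a separate cell per layer to give each layer its own parameters
- For LSTM, set state_is_tuple=True
```python
cells = [tf.nn.rnn_cell.LSTMCell(num_units=state_size, state_is_tuple=True) for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells=cells, state_is_tuple=True)
init_state = cell.zero_state(batch_size, dtype=tf.float32)
```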
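The idea of wrapping a stack so that it behaves like one cell can be sketched without Tensorflow. `ToyCell` and `ToyMultiCell` below are hypothetical classes written for this illustration. The sketch also shows the general Python fact behind the weight-sharing pitfall: multiplying a list repeats references to one object.

```python
class ToyCell:
    """Hypothetical cell with a single weight; output and new state are both weight * (x + state)."""
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x, state):
        new_state = self.weight * (x + state)
        return new_state, new_state

class ToyMultiCell:
    """Stack of cells that exposes the same (x, state) -> (output, state) interface."""
    def __init__(self, cells):
        self.cells = cells

    def zero_state(self):
        return tuple(0.0 for _ in self.cells)

    def __call__(self, x, states):
        new_states = []
        for cell, state in zip(self.cells, states):
            x, new_state = cell(x, state)   # this layer's output is the next layer's input
            new_states.append(new_state)
        return x, tuple(new_states)

num_layers = 3
shared = [ToyCell(0.5)] * num_layers                    # three references to ONE object
separate = [ToyCell(0.5) for _ in range(num_layers)]    # three independent objects

stack = ToyMultiCell(separate)
out, states = stack(1.0, stack.zero_state())
print(out, states)
```

The stacked cell returns the top layer's output and a tuple with one state per layer, which mirrors why init_state and final_state of a multi-layer cell are tuples.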
IV. Dropout
- Apply it to the input and output of a cell layer, not to the recurrent connection
1. A single cell layer
- With static_rnn
- Declare a placeholder: keep_prob = tf.placeholder(tf.float32, name='keep_prob')
- Input: rnn_inputs = [tf.nn.dropout(rnn_input, keep_prob) for rnn_input in rnn_inputs]
- Output: rnn_outputs = [tf.nn.dropout(rnn_output, keep_prob) for rnn_output in rnn_outputs]
- Add it to feed_dict: feed_dict = {g['x']: X, g['y']: Y, g['keep_prob']: keep_prob}
- With dynamic_rnn or scan
- Apply it to the tensor directly; the rest is the same: rnn_inputs = tf.nn.dropout(rnn_inputs, keep_prob)
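What a dropout call with a keep_prob does to a vector can be sketched in a few lines. This is a plain-Python illustration of inverted dropout (kept values are scaled by 1/keep_prob, as tf.nn.dropout does); the function name and the use of `random.Random` are assumptions made for the example.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale kept values by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

dropped = dropout(activations, keep_prob=0.9, rng=rng)
print(sum(dropped) / len(dropped))                    # stays close to the original mean
print(dropout([1.0, 2.0], keep_prob=1.0, rng=rng))    # keep_prob=1.0 leaves values unchanged
```

Because of the scaling, the expected activation is unchanged, so feeding keep_prob=1.0 at evaluation or generation time simply switches dropout off without any further correction.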
2. Multi-layer cells
- We said earlier that MultiRNNCell lets a stack of cells be treated as one cell, so how do we apply dropout to every layer?
- Use tf.nn.rnn_cell.DropoutWrapper
- Option 1: cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=input_keep_prob, output_keep_prob=output_keep_prob)
- If input_keep_prob and output_keep_prob are both 0.9, the effective keep probability between layers is 0.9*0.9=0.81
- Option 2: use only input_keep_prob or only output_keep_prob on the basic cell, and likewise only one of them on the MultiRNNCell
```python
# one wrapped cell per layer, so layers do not share weights
cells = [tf.nn.rnn_cell.DropoutWrapper(
             tf.nn.rnn_cell.LSTMCell(num_units=state_size, state_is_tuple=True),
             input_keep_prob=keep_prob)
         for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells=cells, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)
```
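The 0.81 figure follows from a value having to survive two independent masks: the output dropout of one layer and the input dropout of the next. A small sketch of that arithmetic, assuming the two masks are independent (the helper name is hypothetical):

```python
def keep_between_layers(input_keep_prob, output_keep_prob):
    """Probability that a value survives output dropout of one layer and input dropout of the next."""
    return output_keep_prob * input_keep_prob

both = keep_between_layers(0.9, 0.9)      # option 1: both wrappers on every layer
single = keep_between_layers(0.9, 1.0)    # option 2: input dropout only on each layer
print(round(both, 2), round(single, 2))
```

With option 2 a value passing between two layers meets only one mask, so the nominal keep probability is the one that actually applies.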
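V. Layer Normalization
1. Overview
- Layer Normalization was inspired by Batch Normalization and targets RNNs; see the related paper
- Batch Normalization is aimed mainly at conventional deep networks and CNNs; for its operation and derivation, see my earlier post
- It can speed up training and give better results
2. Code
- Copy the source of LSTMCell and modify it
- The layer normalization function
- The input tensor is two-dimensional; layer normalization is applied to it
- tf.nn.moments computes the mean and variance of the tensor
- The result is then scaled (scale) and shifted (shift)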
```python
'''layer normalization'''
def ln(tensor, scope=None, epsilon=1e-5):
    assert(len(tensor.get_shape()) == 2)
    m, v = tf.nn.moments(tensor, [1], keep_dims=True)
    if not isinstance(scope, str):
        scope = ''
    with tf.variable_scope(scope+'layer_norm'):
        scale = tf.get_variable(name='scale',
                                shape=[tensor.get_shape()[1]],
                                initializer=tf.constant_initializer(1))
        shift = tf.get_variable('shift',
                                [tensor.get_shape()[1]],
                                initializer=tf.constant_initializer(0))
    LN_initial = (tensor - m) / tf.sqrt(v + epsilon)
    return LN_initial*scale + shift
```
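The same computation can be followed on plain lists. This is a minimal sketch of the normalization step only; in the Tensorflow function above scale and shift are trainable variables, whereas here they are passed in as fixed lists, and the function name is an assumption.

```python
import math

def layer_norm(rows, scale, shift, epsilon=1e-5):
    """Normalize each row over its features, then apply a per-feature scale and shift."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + epsilon) * g + b
                    for v, g, b in zip(row, scale, shift)])
    return out

batch = [[1.0, 2.0, 3.0, 4.0],
         [10.0, 20.0, 30.0, 40.0]]
normed = layer_norm(batch, scale=[1.0] * 4, shift=[0.0] * 4)
for row in normed:
    print([round(v, 3) for v in row])
```

Each example is normalized using its own mean and variance across features (axis 1), so the statistics do not depend on the other examples in the batch. That is the difference from batch normalization, and it is why the two rows above, which differ only by a factor of 10, come out almost identical.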
- In the call method of LSTMCell, apply layer normalization to i, j, f and o
- Set bias to False in the _linear function, because layer normalization adds its own shift
```python
'''bias is False here because layer normalization adds its own shift'''
lstm_matrix = _linear([inputs, m_prev], 4 * self._num_units, bias=False)
i, j, f, o = array_ops.split(value=lstm_matrix, num_or_size_splits=4, axis=1)
'''apply ln'''
i = ln(i, scope = 'i/')
j = ln(j, scope = 'j/')
f = ln(f, scope = 'f/')
o = ln(o, scope = 'o/')
```
- Build the computation graph
- Choice of RNN, GRU or LSTM cell
- Dropout
- Layer Normalization
```python
'''The final combined model:
 - basic RNN, GRU, LSTM
 - dropout
 - layer normalization
'''
from LayerNormalizedLSTMCell import LayerNormalizedLSTMCell  # file containing the layer-normalized LSTMCell

def build_final_graph(
        cell_type = None,
        state_size = state_size,
        num_classes = num_classes,
        batch_size = batch_size,
        num_steps = num_steps,
        num_layers = 3,          # not used in this version: a single cell layer is built
        build_with_dropout = False,
        learning_rate = learning_rate):

    reset_graph()
    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='x')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='y')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    embeddings = tf.get_variable('embedding_matrix', [num_classes, state_size])
    rnn_inputs = tf.nn.embedding_lookup(embeddings, x)
    if cell_type == 'GRU':
        cell = tf.nn.rnn_cell.GRUCell(state_size)
    elif cell_type == 'LSTM':
        cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
    elif cell_type == 'LN_LSTM':
        cell = LayerNormalizedLSTMCell(state_size)   # the modified cell from the imported file
    else:
        cell = tf.nn.rnn_cell.BasicRNNCell(state_size)
    if build_with_dropout:
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=keep_prob)

    init_state = cell.zero_state(batch_size, tf.float32)
    '''dynamic_rnn'''
    rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, rnn_inputs, initial_state=init_state)
    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
    rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])
    y_reshaped = tf.reshape(y, [-1])
    logits = tf.matmul(rnn_outputs, W) + b
    predictions = tf.nn.softmax(logits)
    total_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    return dict(
        x = x,
        y = y,
        keep_prob = keep_prob,
        init_state = init_state,
        final_state = final_state,
        total_loss = total_loss,
        train_step = train_step,
        preds = predictions,
        saver = tf.train.Saver())
```
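VI. Generating text
1. Overview
- After training, save the trained model to disk; next time it can simply be loaded
- We supply the first character and the RNN then generates characters one at a time, each conditioned on the previous character
- So num_steps=1 and batch_size=1 (the prediction then has shape (1, num_classes), and one character is sampled from it according to the probabilities)
2. Code
- Build the graph (just pass the parameters): g = build_final_graph(cell_type='LN_LSTM', num_steps=1, batch_size=1)
- Generate text
- Restore the trained checkpoint
- Look up the number for the given first character
- Loop once for every character to be generated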
```python
'''Generate text'''
def generate_characters(g, checkpoint, num_chars, prompt='A', pick_top_chars=None):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        g['saver'].restore(sess, checkpoint)   # restore the trained checkpoint
        state = None
        current_char = vocab_to_idx[prompt]    # index of the prompt character
        chars = [current_char]
        for i in range(num_chars):             # number of characters to generate
            if state is not None:              # state is None on the first step; the graph then starts from the zero state
                feed_dict = {g['x']: [[current_char]], g['init_state']: state}
            else:
                feed_dict = {g['x']: [[current_char]]}
            # preds holds the predicted probabilities, shape (1, num_classes)
            preds, state = sess.run([g['preds'], g['final_state']], feed_dict)
            if pick_top_chars is not None:     # sample only from the pick_top_chars most likely characters
                p = np.squeeze(preds)
                p[np.argsort(p)[:-pick_top_chars]] = 0   # zero out the rest
                p = p / np.sum(p)              # renormalize: np.random.choice needs probabilities that sum to 1
                current_char = np.random.choice(vocab_size, 1, p=p)[0]
            else:
                current_char = np.random.choice(vocab_size, 1, p=np.squeeze(preds))[0]
            chars.append(current_char)
    chars = map(lambda x: idx_to_vocab[x], chars)
    result = "".join(chars)
    print(result)
    return result
```
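The pick_top_chars branch restricts sampling to the most likely characters and renormalizes their probabilities. The same step can be sketched with the standard library alone; the function name `sample_top_k` and the toy probability vector are assumptions for illustration.

```python
import random

def sample_top_k(probs, k, rng):
    """Sample an index from probs, restricted to the k most likely entries."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    weights = [probs[i] / total for i in top]   # renormalize so the kept weights sum to 1
    return rng.choices(top, weights=weights, k=1)[0]

rng = random.Random(0)
probs = [0.05, 0.40, 0.10, 0.30, 0.15]          # toy softmax output over 5 characters
draws = [sample_top_k(probs, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))
```

Only the two most likely indices can ever be drawn with k=2, which removes the low-probability characters that tend to produce garbled words, at the cost of less varied output.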
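- Results
- Because training takes a long time, an LSTM was trained for only 30 epochs; the output is shown below
- You can tune the parameters yourself and may get better results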
ANKO: HFOFMFRone s the statlighte thithe thit.BODEN --I I's a tomir.I'tshis and on ar tald the theand this he sile be cares hat s ond tho fo hour he singe sime shind and somante tat ond treang tatsing of the an the to to fook.. Ir ard the with ane she stale..ANTE --KINEShow the ard and a beat the weringe be thing or.Bo hith tho he melan to the mute steres.The singer stis ard stis.BACE CANKONS CORESard the sids ing tho the the sackes tom theINWe stoe shit a dome thorate seomser hith.Thatthow oundTANTONT. SEAT THONTITE SERTI 1 23SHe the mathe a tomonerind is ingit ofres treacentit. Sher stard on this the tor an the candin he whor he sath heres andstha dortour tit thas stand. I'd and or a
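#2017/06/25 Updated results
- Switched to a larger dataset (click to view) and used the layer-normalized LSTM model
- Parameter settings:
- num_steps=80
- batch_size=50
- state_size=512
- num_classes = vocab_size
- learning_rate = 5e-4
- 30 epochs
- It ran overnight on a lab computer; do the results look a bit better?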
AKTIN: Yousa hand it have to turn you, sir.I have. I've got to here hard on myplay as a space state, and why hehappened. What we alwaws whothis?JOCASTAND :PADM You, sir!A battle. An arm of the ship is still.THE WINDEN'S CORUSHan's laser guns at the forest fire. The crowd spots his blackfolkwark and sees the bedroom and twists and sees Leiawho is shaking. A huge creature has a long time,hold her hand and his timmed, that we see the saulyand. Thecrowd ruised by the staircase.EXT. MAZ' CASTLE RUINS - DAYRey and Wicket and CAMERA is heard. Here as so they helfthis tonight, he spins and sit in a startled bright.LUKE(into propecy)The defenstity! Thank you.LUKEI'm afraid to have a lossing live,or help. We're
Reference
- https://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html
- https://karpathy.github.io/2015/05/21/rnn-effectiveness/
- http://jmlr.org/proceedings/papers/v37/ioffe15.pdf
- tensorflow scan:
- https://www.tensorflow.org/api_docs/python/tf/scan
- https://www.youtube.com/watch?v=A6qJMB3stE4&t=621s
- Original post: http://lawlite.me/2017/06/21/RNN-LSTM%E5%BE%AA%E7%8E%AF%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C-03Tensorflow%E8%BF%9B%E9%98%B6%E5%AE%9E%E7%8E%B0/