Assignment | 05-week1 - Character level language model - Dinosaurus land
This series only adds my personal study notes to the programming assignments of the original course; if there are any mistakes, corrections are welcome. - ZJ
Coursera course | deeplearning.ai | NetEase Cloud Classroom
CSDN:http://blog.csdn.net/JUNJUN_ZHAO/article/details/79409325
Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task. Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely!
Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this dataset. (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs’ wrath!
By completing this assignment you will learn:
- How to store text data for processing using an RNN
- How to synthesize data, by sampling predictions at each time step and passing them to the next RNN-cell unit
- How to build a character-level text generation recurrent neural network
- Why clipping the gradients is important (it prevents exploding gradients)
We will begin by loading in some functions that we have provided for you in rnn_utils. Specifically, you have access to functions such as rnn_forward and rnn_backward which are equivalent to those you’ve implemented in the previous assignment.
```python
import numpy as np
from utils import *
import random
```

For reference, here is the code in utils.py:

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def smooth(loss, cur_loss):
    return loss * 0.999 + cur_loss * 0.001

def print_sample(sample_ix, ix_to_char):
    txt = ''.join(ix_to_char[ix] for ix in sample_ix)
    txt = txt[0].upper() + txt[1:]  # capitalize first character
    print('%s' % (txt, ), end='')

def get_initial_loss(vocab_size, seq_length):
    return -np.log(1.0/vocab_size)*seq_length

def initialize_parameters(n_a, n_x, n_y):
    """
    Initialize parameters with small random values

    Returns:
    parameters -- python dictionary containing:
        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        b -- Bias, numpy array of shape (n_a, 1)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    """
    np.random.seed(1)
    Wax = np.random.randn(n_a, n_x)*0.01  # input to hidden
    Waa = np.random.randn(n_a, n_a)*0.01  # hidden to hidden
    Wya = np.random.randn(n_y, n_a)*0.01  # hidden to output
    b = np.zeros((n_a, 1))   # hidden bias
    by = np.zeros((n_y, 1))  # output bias
    parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
    return parameters

def rnn_step_forward(parameters, a_prev, x):
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    a_next = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)  # hidden state
    p_t = softmax(np.dot(Wya, a_next) + by)                     # probabilities for next chars
    return a_next, p_t

def rnn_step_backward(dy, gradients, parameters, x, a, a_prev):
    gradients['dWya'] += np.dot(dy, a.T)
    gradients['dby'] += dy
    da = np.dot(parameters['Wya'].T, dy) + gradients['da_next']  # backprop into h
    daraw = (1 - a * a) * da                                     # backprop through tanh nonlinearity
    gradients['db'] += daraw
    gradients['dWax'] += np.dot(daraw, x.T)
    gradients['dWaa'] += np.dot(daraw, a_prev.T)
    gradients['da_next'] = np.dot(parameters['Waa'].T, daraw)
    return gradients

def update_parameters(parameters, gradients, lr):
    parameters['Wax'] += -lr * gradients['dWax']
    parameters['Waa'] += -lr * gradients['dWaa']
    parameters['Wya'] += -lr * gradients['dWya']
    parameters['b'] += -lr * gradients['db']
    parameters['by'] += -lr * gradients['dby']
    return parameters

def rnn_forward(X, Y, a0, parameters, vocab_size=27):
    # Initialize x, a and y_hat as empty dictionaries
    x, a, y_hat = {}, {}, {}
    a[-1] = np.copy(a0)
    # initialize your loss to 0
    loss = 0
    for t in range(len(X)):
        # Set x[t] to be the one-hot vector representation of the t'th character in X.
        # If X[t] == None, we just have x[t] = 0. This is used to set the input for the first timestep to the zero vector.
        x[t] = np.zeros((vocab_size, 1))
        if (X[t] != None):
            x[t][X[t]] = 1
        # Run one step forward of the RNN
        a[t], y_hat[t] = rnn_step_forward(parameters, a[t-1], x[t])
        # Update the loss by subtracting the cross-entropy term of this time-step from it.
        loss -= np.log(y_hat[t][Y[t], 0])
    cache = (y_hat, a, x)
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    # Initialize gradients as an empty dictionary
    gradients = {}
    # Retrieve from cache and parameters
    (y_hat, a, x) = cache
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    # each one should be initialized to zeros of the same dimension as its corresponding parameter
    gradients['dWax'], gradients['dWaa'], gradients['dWya'] = np.zeros_like(Wax), np.zeros_like(Waa), np.zeros_like(Wya)
    gradients['db'], gradients['dby'] = np.zeros_like(b), np.zeros_like(by)
    gradients['da_next'] = np.zeros_like(a[0])
    ### START CODE HERE ###
    # Backpropagate through time
    for t in reversed(range(len(X))):
        dy = np.copy(y_hat[t])
        dy[Y[t]] -= 1
        gradients = rnn_step_backward(dy, gradients, parameters, x[t], a[t], a[t-1])
    ### END CODE HERE ###
    return gradients, a
```

1 - Problem Statement
1.1 - Dataset and Preprocessing
Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size.
```python
data = open('dinos.txt', 'r').read()
data = data.lower()                  # lowercase everything
chars = list(set(data))              # convert to a set to drop duplicates, then back to a list
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))
```

```
There are 19909 total characters and 27 unique characters in your data.
```

The characters are a-z (26 characters) plus the "\n" (or newline character), which in this assignment plays a role similar to the `<EOS>` (or "End of sentence") token we had discussed in lecture, only here it indicates the end of the dinosaur name rather than the end of a sentence. In the cell below, we create a python dictionary (i.e., a hash table) to map each character to an index from 0-26. We also create a second python dictionary that maps each index back to the corresponding character. This will help you figure out what index corresponds to what character in the probability distribution output of the softmax layer. Below, char_to_ix and ix_to_char are the python dictionaries.
```python
char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)
print(char_to_ix)
```

```
{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
{'\n': 0, 'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
```

1.2 - Overview of the model
Your model will have the following structure:
- Initialize parameters
- Run the optimization loop
- Forward propagation to compute the loss function
- Backward propagation to compute the gradients with respect to the loss function
- Clip the gradients to avoid exploding gradients
- Using the gradients, update your parameters with the gradient descent update rule.
- Return the learned parameters
At each time-step, the RNN tries to predict what the next character is, given the previous characters. The dataset $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters in the training set, while $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is such that at every time-step $t$, we have $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.
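As a quick illustration (using the char_to_ix mapping built above and a made-up three-letter name, not necessarily from the dataset), X and Y for one training example relate as follows:

```python
# Hypothetical training example: the name "mia" (for illustration only).
name = "mia"
X = [None] + [char_to_ix[ch] for ch in name]   # [None, 13, 9, 1]  -> x<0> is treated as the zero vector
Y = X[1:] + [char_to_ix["\n"]]                 # [13, 9, 1, 0]     -> y<t> = x<t+1>, ending with "\n"
print(X, Y)
```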
2 - Building blocks of the model
In this part, you will build two important blocks of the overall model:
- Gradient clipping: to avoid exploding gradients
- Sampling: a technique used to generate characters
You will then apply these two functions to build the model.
2.1 - Clipping the gradients in the optimization loop
In this section you will implement the clip function that you will call inside of your optimization loop. Recall that your overall loop structure usually consists of a forward pass, a cost computation, a backward pass, and a parameter update. Before updating the parameters, you will perform gradient clipping when needed to make sure that your gradients are not “exploding,” meaning taking on overly large values.
Note: the loop consists of four parts (forward propagation, loss computation, backward propagation, and parameter update), and gradient clipping is performed just before the parameter update.
In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of gradients if needed. There are different ways to clip gradients; we will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie between some range [-N, N]. More generally, you will provide a maxValue (say 10). In this example, if any component of the gradient vector is greater than 10, it would be set to 10; and if any component of the gradient vector is less than -10, it would be set to -10. If it is between -10 and 10, it is left alone.
Note: we use the simplest element-wise clipping procedure to keep every gradient value within [-N, N]. With a maxValue of 10, any value greater than 10 is set to 10, and any value smaller than -10 is set to -10.
Exercise: Implement the function below to return the clipped gradients of your dictionary gradients. Your function takes in a maximum threshold and returns the clipped versions of your gradients. You can check out this hint for examples of how to clip in numpy. You will need to use the argument out = ....
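For instance, a minimal illustration of numpy clipping with the out argument (the array values here are made up):

```python
import numpy as np

g = np.array([[11.0, -3.0], [-12.5, 0.2]])
np.clip(g, -10, 10, out=g)   # clip in place: every entry now lies in [-10, 10]
print(g)                     # [[ 10.   -3. ] [-10.    0.2]]
```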
```python
### GRADED FUNCTION: clip

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''
    ### START CODE HERE ###
    # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for name, val in gradients.items():
        gradients[name] = np.clip(val, -maxValue, maxValue, out=gradients[name])
    ### END CODE HERE ###

    return gradients
```

So the gradient clipping used here simply bounds every gradient value to the interval [-maxValue, maxValue], cutting off the parts that are too large or too small.
```python
np.random.seed(3)
dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
```

```
gradients["dWaa"][1][2] = 10.0
gradients["dWax"][3][1] = -10.0
gradients["dWya"][1][2] = 0.2971381536101662
gradients["db"][4] = [10.]
gradients["dby"][1] = [8.45833407]
```

**Expected output:**
| **gradients[“dWaa”][1][2] ** | 10.0 |
| **gradients[“dWax”][3][1]** | -10.0 |
| **gradients[“dWya”][1][2]** | 0.29713815361 |
| **gradients[“db”][4]** | [ 10.] |
| **gradients[“dby”][1]** | [ 8.45833407] |
2.2 - Sampling
Now assume that your model is trained. You would like to generate new text (characters). The process of generation is explained in the picture below:
Exercise: Implement the sample function below to sample characters. You need to carry out 4 steps:
Step 1: Pass the network the first "dummy" input $x^{\langle 1 \rangle} = \vec{0}$ (the vector of zeros). This is the default input before we've generated any characters. We also set $a^{\langle 0 \rangle} = \vec{0}$.
Step 2: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:
$$ a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t \rangle} + b) \tag{1} $$
$$ z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y \tag{2} $$
$$ \hat{y}^{\langle t+1 \rangle} = \mathrm{softmax}(z^{\langle t+1 \rangle}) \tag{3} $$
Note that $\hat{y}^{\langle t+1 \rangle}$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by "i" is the next character. We have provided a softmax() function that you can use.
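As a tiny sanity check with made-up scores, the provided softmax() returns a vector whose entries lie between 0 and 1 and sum to 1:

```python
z = np.array([[1.0], [2.0], [0.5]])   # hypothetical scores for a 3-character vocabulary
y = softmax(z)
print(y.ravel())   # each entry is between 0 and 1
print(y.sum())     # 1.0
```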
- Step 3: Carry out sampling: Pick the next character's index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle}$. This means that if $\hat{y}^{\langle t+1 \rangle}_i = 0.16$, you will pick the index "i" with 16% probability. To implement it, you can use np.random.choice.
Here is an example of how to use np.random.choice():
```python
np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())
```

This means that you will pick the index according to the distribution:
$P(\text{index} = 0) = 0.1, P(\text{index} = 1) = 0.0, P(\text{index} = 2) = 0.7, P(\text{index} = 3) = 0.2$.
- Step 4: The last step to implement in sample() is to overwrite the variable x, which currently stores $x^{\langle t \rangle}$, with the value of $x^{\langle t+1 \rangle}$. You will represent $x^{\langle t+1 \rangle}$ by creating a one-hot vector corresponding to the character you've chosen as your prediction. You will then forward propagate $x^{\langle t+1 \rangle}$ in Step 1 and keep repeating the process until you get a "\n" character, indicating you've reached the end of the dinosaur name.
```python
# GRADED FUNCTION: sample

def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN.
    (Sampling can simply be understood as drawing characters at random from these distributions.)

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """
    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]  # by has shape (27, 1): we predict at the character level (equation (2)), so its rows equal the vocabulary size
    n_a = Waa.shape[1]
    # (debugging notes)
    # print("Waa.shape:", Waa.shape)   # Waa.shape: (100, 100)
    # print('by:', by.shape)           # by: (27, 1)
    # print('b:', b.shape)

    ### START CODE HERE ###
    # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
    # x is a one-hot vector of shape (vocab_size, 1) = (27, 1), i.e. one of the 27 characters
    x = np.zeros((vocab_size, 1))
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((n_a, 1))

    # Create an empty list of indices; this will contain the indices of the characters to generate (≈1 line)
    indices = []

    # idx is a flag to detect a newline character, we initialize it to -1
    idx = -1

    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well
    # trained model), which helps debugging and prevents entering an infinite loop.
    counter = 0
    newline_character = char_to_ix['\n']   # index of the "\n" character

    while (idx != newline_character and counter != 50):
        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.matmul(Wax, x) + np.matmul(Waa, a_prev) + b)
        z = np.matmul(Wya, a) + by
        y = softmax(z)

        # for grading purposes
        np.random.seed(counter + seed)

        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        idx = np.random.choice(range(vocab_size), p=y.ravel())

        # Append the index to "indices"
        indices.append(idx)

        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        x = np.zeros((vocab_size, 1))
        x[idx] = 1

        # Update "a_prev" to be "a"
        a_prev = a

        # for grading purposes
        seed += 1
        counter += 1

    ### END CODE HERE ###

    if (counter == 50):
        indices.append(char_to_ix['\n'])

    return indices
```
```python
np.random.seed(2)
_, n_a = 20, 100
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}

indices = sample(parameters, char_to_ix, 0)
print("Sampling:")
print("list of sampled indices:", indices)
print("list of sampled characters:", [ix_to_char[i] for i in indices])
```

```
Sampling:
list of sampled indices: [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 3, 6, 23, 13, 1, 0]
list of sampled characters: ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'c', 'f', 'w', 'm', 'a', '\n']
```

**Expected output:**
| **list of sampled indices:** | [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0] |
| **list of sampled characters:** | [‘l’, ‘q’, ‘x’, ‘n’, ‘m’, ‘i’, ‘j’, ‘v’, ‘x’, ‘f’, ‘m’, ‘k’, ‘l’, ‘f’, ‘u’, ‘o’, ‘u’, ‘n’, ‘c’, ‘b’, ‘a’, ‘u’, ‘r’, ‘x’, ‘g’, ‘y’, ‘f’, ‘y’, ‘r’, ‘j’, ‘p’, ‘b’, ‘c’, ‘h’, ‘o’, ‘l’, ‘k’, ‘g’, ‘a’, ‘l’, ‘j’, ‘b’, ‘g’, ‘g’, ‘k’, ‘e’, ‘f’, ‘l’, ‘y’, ‘\n’, ‘\n’] |
3 - Building the language model
It is time to build the character-level language model for text generation.
3.1 - Gradient descent
In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN:
- Forward propagate through the RNN to compute the loss
- Backward propagate through time to compute the gradients of the loss with respect to the parameters
- Clip the gradients if necessary
- Update your parameters using gradient descent
Exercise: Implement this optimization process (one step of stochastic gradient descent).
We provide you with the following functions:
```python
def rnn_forward(X, Y, a_prev, parameters):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
    ....
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It returns also all the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters
```

```python
# GRADED FUNCTION: optimize

def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    """
    Execute one step of the optimization to train the model.

    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        b -- Bias, numpy array of shape (n_a, 1)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.

    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
        db -- Gradients of bias vector, of shape (n_a, 1)
        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """

    ### START CODE HERE ###

    # Forward propagate through time (≈1 line)
    loss, cache = rnn_forward(X, Y, a_prev, parameters)

    # Backpropagate through time (≈1 line)
    gradients, a = rnn_backward(X, Y, parameters, cache)

    # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
    gradients = clip(gradients, 5)

    # Update parameters (≈1 line)
    parameters = update_parameters(parameters, gradients, learning_rate)

    ### END CODE HERE ###

    return loss, gradients, a[len(X)-1]
```

```python
np.random.seed(1)
vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4,14,11,22,25, 26]

loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])
```

```
Loss = 126.50397572165383
gradients["dWaa"][1][2] = 0.1947093153471825
np.argmax(gradients["dWax"]) = 93
gradients["dWya"][1][2] = -0.007773876032003897
gradients["db"][4] = [-0.06809825]
gradients["dby"][1] = [0.01538192]
a_last[4] = [-1.]
```

**Expected output:**
| **Loss ** | 126.503975722 |
| **gradients[“dWaa”][1][2]** | 0.194709315347 |
| **np.argmax(gradients[“dWax”])** | 93 |
| **gradients[“dWya”][1][2]** | -0.007773876032 |
| **gradients[“db”][4]** | [-0.06809825] |
| **gradients[“dby”][1]** | [ 0.01538192] |
| **a_last[4]** | [-1.] |
3.2 - Training the model
Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 100 steps of stochastic gradient descent, you will sample 10 randomly chosen names to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order. (Shuffling the data up front means any example can be drawn at random.)
Exercise: Follow the instructions and implement model(). When examples[index] contains one dinosaur name (string), to create an example (X, Y), you can use this:
```python
index = j % len(examples)
X = [None] + [char_to_ix[ch] for ch in examples[index]]
Y = X[1:] + [char_to_ix["\n"]]
```

Note that we use: index = j % len(examples), where j = 1....num_iterations, to make sure that examples[index] is always a valid statement (index is smaller than len(examples)).
The first entry of X being None will be interpreted by rnn_forward() as setting $x^{\langle 0 \rangle} = \vec{0}$. Further, this ensures that Y is equal to X but shifted one step to the left, and with an additional "\n" appended to signify the end of the dinosaur name.
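The graded model() function itself is not reproduced in this post. As a rough sketch, assuming the helper functions above and hyperparameters along the lines of the run printed below (n_a = 50 hidden units, printing 7 sampled names every 2000 iterations), it might look roughly like this:

```python
def model(data, ix_to_char, char_to_ix, num_iterations=35000, n_a=50, dino_names=7, vocab_size=27):
    """Train the character-level RNN and periodically print sampled names (sketch only)."""
    n_x, n_y = vocab_size, vocab_size
    parameters = initialize_parameters(n_a, n_x, n_y)
    loss = get_initial_loss(vocab_size, dino_names)

    # One training example per line; shuffle so SGD visits the names in random order.
    with open("dinos.txt") as f:
        examples = [x.lower().strip() for x in f.readlines()]
    np.random.seed(0)
    np.random.shuffle(examples)

    a_prev = np.zeros((n_a, 1))
    for j in range(num_iterations):
        # Build one (X, Y) training pair as described above.
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]]
        Y = X[1:] + [char_to_ix["\n"]]

        # One step of SGD: forward pass, backprop, clipping, parameter update.
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters)
        loss = smooth(loss, curr_loss)   # exponentially smoothed loss, for nicer printing

        # Periodically sample a few names to see how training is going.
        if j % 2000 == 0:
            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')
            seed = 0
            for name in range(dino_names):
                sampled_indices = sample(parameters, char_to_ix, seed)
                print_sample(sampled_indices, ix_to_char)
                seed += 1
            print('\n')
    return parameters
```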
Run the following cell; you should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names.
```python
parameters = model(data, ix_to_char, char_to_ix)
```

```
Iteration: 0, Loss: 23.087336

Nkzxwtdmfqoeyhsqwasjkjvu
Kneb
Kzxwtdmfqoeyhsqwasjkjvu
Neb
Zxwtdmfqoeyhsqwasjkjvu
Eb
Xwtdmfqoeyhsqwasjkjvu

Iteration: 2000, Loss: 27.884160

Liusskeomnolxeros
Hmdaairus
Hytroligoraurus
Lecalosapaus
Xusicikoraurus
Abalpsamantisaurus
Tpraneronxeros

Iteration: 4000, Loss: 25.901815

Mivrosaurus
Inee
Ivtroplisaurus
Mbaaisaurus
Wusichisaurus
Cabaselachus
Toraperlethosdarenitochusthiamamumamaon

Iteration: 6000, Loss: 24.608779

Onwusceomosaurus
Lieeaerosaurus
Lxussaurus
Oma
Xusteonosaurus
Eeahosaurus
Toreonosaurus

Iteration: 8000, Loss: 24.070350

Onxusichepriuon
Kilabersaurus
Lutrodon
Omaaerosaurus
Xutrcheps
Edaksoje
Trodiktonus

Iteration: 10000, Loss: 23.844446

Onyusaurus
Klecalosaurus
Lustodon
Ola
Xusodonia
Eeaeosaurus
Troceosaurus

Iteration: 12000, Loss: 23.291971

Onyxosaurus
Kica
Lustrepiosaurus
Olaagrraiansaurus
Yuspangosaurus
Eealosaurus
Trognesaurus

Iteration: 14000, Loss: 23.382339

Meutromodromurus
Inda
Iutroinatorsaurus
Maca
Yusteratoptititan
Ca
Troclosaurus

Iteration: 16000, Loss: 23.259291

Meustomia
Indaadps
Justolongchudosatrus
Macabosaurus
Yuspanhosaurus
Caaerosaurus
Trodon

Iteration: 18000, Loss: 22.940799

Phusaurus
Meicamitheastosaurus
Mussteratops
Peg
Ytrong
Egaltor
Trolome

Iteration: 20000, Loss: 22.894192

Meutrodon
Lledansteh
Lwuspconyxauosaurus
Macalosaurus
Yusocichugus
Eiagosaurus
Trrangosaurus

Iteration: 22000, Loss: 22.851820

Onustolia
Midcagosaurus
Mwrrodonnonus
Ola
Yurodon
Eiaeptia
Trodoniohus

Iteration: 24000, Loss: 22.700408

Meutosaurus
Jmacagosaurus
Kurrodon
Macaistel
Yuroeleton
Eiaeror
Trodonosaurus

Iteration: 26000, Loss: 22.736918

Niutosaurus
Liga
Lustoingosaurus
Necakroia
Xrprinhtilus
Eiaestehastes
Trocilosaurus

Iteration: 28000, Loss: 22.595568

Meutosaurus
Kolaaeus
Kystodonisaurus
Macahtopadrus
Xtrrararkaumurpasaurus
Eiaeosaurus
Trodmanolus

Iteration: 30000, Loss: 22.609381

Meutosaurus
Kracakosaurus
Lustodon
Macaisthachwisaurus
Wusqandosaurus
Eiacosaurus
Trsatisaurus

Iteration: 32000, Loss: 22.251308

Mausinasaurus
Incaadropeglsaurus
Itrosaurus
Macamisaurus
Wuroenatoraerax
Ehanosaurus
Trnanclodratosaurus

Iteration: 34000, Loss: 22.477910

Mawspichaniaekorocimamroberax
Inda
Itrus
Macaesis
Wrosaurus
Elaeosaurus
Stegngosaurus
```

Conclusion
You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results. Our implementation generated some really cool names like maconucon, marloralus and macingsersaurus. Your model hopefully also learned that dinosaur names tend to end in saurus, don, aura, tor, etc.
If your model generates some non-cool names, don’t blame the model entirely–not all actual dinosaur names sound cool. (For example, dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest!
This assignment had used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the English language requires a much bigger dataset, usually needs much more computation, and could run for many hours on GPUs. We ran our dinosaur name model for quite some time, and so far our favorite name is the great, undefeatable, and fierce: Mangosaurus!
4 - Writing like Shakespeare
The rest of this notebook is optional and is not graded, but we hope you’ll do it anyway since it’s quite fun and informative.
A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespearian poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text, e.g., where a character appearing somewhere in a sequence can influence what should be a different character much later in the sequence. These long-term dependencies were less important with dinosaur names, since the names were quite short.
We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes.
```python
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from shakespeare_utils import *
import sys
import io
```

```
Using TensorFlow backend.
Loading text data...
Creating training set...
number of training examples: 31412
Vectorizing training set...
Loading model...
```

To save you some time, we have already trained a model for ~1000 epochs on a collection of Shakespearian poems called "The Sonnets".
Let’s train the model for one more epoch. When it finishes training for an epoch—this will also take a few minutes—you can run generate_output, which will prompt asking you for an input (<40 characters). The poem will start with your sentence, and our RNN-Shakespeare will complete the rest of the poem for you! For example, try “Forsooth this maketh no sense ” (don’t enter the quotation marks). Depending on whether you include the space at the end, your results might also differ–try it both ways, and try other inputs as well.
```python
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])
```

```
Epoch 1/1
31412/31412 [==============================] - 45s 1ms/step - loss: 2.5432
<keras.callbacks.History at 0x1f40ab380f0>
```

```python
# Run this cell to try with different inputs without having to re-train the model
generate_output()
```

```
Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: where are you ? my love.

Here is your poem:

where are you ? my love. so to eve to by monter the time the bid,
and beautyso hearting foot chalke deand:
the lopperveh that bace my hister live mied,
my peeter's berllose briat of wrateling true,
a bud my ispeles thought i ashaying wited,
a wend the state's bucince i be peter tingside
is care on mening beronss, bage my theors,
on time thou thy srabus cide midh storms now butr,
he his witth fassude it tand:
i me and the
```

A personal aside: looking at the poems this model generates, I feel AI/ML/DL still has enormous room to improve and innovate. Emotion is the soul of poetry, and I am very curious how a machine could ever be given emotion.
The RNN-Shakespeare model is very similar to the one you have built for dinosaur names. The only major differences are:
- LSTMs instead of the basic RNN to capture longer-range dependencies
- The model is a deeper, stacked LSTM model (2 layers); a rough sketch is given below this list
- Using Keras instead of plain Python to simplify the code
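As a rough sketch of what such a stacked, two-layer LSTM character model looks like in Keras (the sequence length, vocabulary size, and layer width below are assumptions for illustration; the actual model loaded from shakespeare_utils may differ):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

Tx = 40          # assumed context length in characters
vocab_size = 38  # assumed number of distinct characters
n_a = 128        # assumed hidden units per LSTM layer

model = Sequential()
# First LSTM layer returns the full sequence so a second LSTM can be stacked on top of it.
model.add(LSTM(n_a, return_sequences=True, input_shape=(Tx, vocab_size)))
# Second LSTM layer keeps only its final hidden state.
model.add(LSTM(n_a))
# Softmax over the vocabulary gives the probability of each possible next character.
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```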
If you want to learn more, you can also check out the Keras Team’s text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py.
Congratulations on finishing this notebook!
References:
- This exercise took inspiration from Andrej Karpathy’s implementation: https://gist.github.com/karpathy/d4dee566867f8291f086. To learn more about text generation, also check out Karpathy’s blog post.
- For the Shakespearian poem generator, our implementation was based on the implementation of an LSTM text generator by the Keras team: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py