Create Your Own Harry Potter Short Story Using RNN and TensorFlow
Data Science
“Of course it is happening inside your head, Harry, but why on earth should that mean that it is not real?” [1]
Still waiting for your Hogwarts letter? Want to enjoy the feast in the Great Hall? Explore the secret passages in Hogwarts? Buy your first wand from Ollivander’s? *sigh* You are not alone.
I have (after all this time?) always been obsessed with Harry Potter, and I recently started learning neural networks. It’s fascinating to see how creative you can get with Deep Learning, so I thought why not brew them up?
So I built a simple text generation model using TensorFlow to create my own version of a Harry Potter short story (it can’t get as good as J.K. Rowling, duh!).
This article runs you through the entire code I wrote to implement it. But for all the Hermiones out there, you can find the GitHub code here and run it yourself!
So here’s something which will cast a Banishing Charm on your boredom while you’re quarantined.
Background
What is an RNN?
A Recurrent Neural Network differs from other neural networks in that it keeps a memory (its hidden state) of everything it has processed so far in a sequence and uses that memory to compute the next step. For a simple introduction to RNNs, you can refer to this.
GRU vs LSTM
Both of these are great for text generation, but GRUs are a newer concept, and there isn’t really a way to determine which one is better in general. Tuning your hyper-parameters well will improve your model’s performance more than choosing a “better” architecture. [2]
However, if the amount of data is not a problem, LSTMs tend to perform better. If you have less data, GRUs have fewer parameters, so they train faster and generalize better on smaller datasets.
Feel free to check out this article for a more detailed explanation.
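In practice the two are nearly drop-in replacements in Keras, so the cheapest experiment is simply to swap the layer and compare. Here is a minimal sketch of that idea (my own illustration, not from the original code; the helper name recurrent_layer is made up):

import tensorflow as tf

# A minimal sketch: GRU and LSTM take the same constructor arguments here,
# so you can benchmark both on the same data with a one-line swap.
def recurrent_layer(units, use_lstm=False):
    layer_cls = tf.keras.layers.LSTM if use_lstm else tf.keras.layers.GRU
    return layer_cls(units, return_sequences=True, stateful=True,
                     recurrent_initializer='glorot_uniform')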
Why character-based?
When working with large datasets like this, the number of unique words in the corpus is much higher than the number of unique characters. A word-level model would have a huge vocabulary, and assigning one-hot encodings to such large matrices quickly runs into memory issues; the labels alone could take up terabytes of memory.
So, the same principles you would use to predict words can be applied here, but now you’ll be working with a much smaller vocabulary.
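If you want to see the difference on your own corpus, a quick sanity check like the sketch below works (my own illustration; it assumes text already holds the combined books built in the next section):

# Rough sketch: compare character-level vs word-level vocabulary sizes.
# The character vocabulary is on the order of a hundred symbols, while the
# word vocabulary of the full series runs into the tens of thousands.
char_vocab = sorted(set(text))
word_vocab = sorted(set(text.split()))
print('unique characters:', len(char_vocab))
print('unique words (whitespace-split):', len(word_vocab))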
The code
So let’s get started!
First, import the libraries you need
import tensorflow as tf
import numpy as np
import os
import time
Now, read the data
You can find and download transcripts of all the Harry Potter books from this Kaggle dataset. Here, I am combining all the seven books into one text file named ‘harrypotter.txt’. You can also train your model on any one book if you like. Just experiment with it!
files = ['1SorcerersStone.txt', '2ChamberofSecrets.txt', '3ThePrisonerOfAzkaban.txt', '4TheGobletOfFire.txt', '5OrderofthePhoenix.txt', '6TheHalfBloodPrince.txt', '7DeathlyHollows.txt']

with open('harrypotter.txt', 'w') as outfile:
    for file in files:
        with open(file) as infile:
            outfile.write(infile.read())

text = open('harrypotter.txt').read()
Looking at the data
print(text[:300])

“Harry Potter and the Sorcerer’s Stone

CHAPTER ONE

THE BOY WHO LIVED

Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they” [3]
Processing the data
We map all the unique characters in vocab to numbers by making two look-up tables:
mapping the characters to numbers (char2index)
mapping the numbers back to the characters (index2char)
Then we convert our text to numbers.
vocab = sorted(set(text))
char2index = {u: i for i, u in enumerate(vocab)}
index2char = np.array(vocab)
text_as_int = np.array([char2index[c] for c in text])

# how it looks:
print('{} -- characters mapped to int -- > {}'.format(repr(text[:13]), text_as_int[:13]))

'Harry Potter ' -- characters mapped to int -- > [39 64 81 81 88 3 47 78 83 83 68 81 3]
Each input sequence for our model will contain seq_length characters from the text, and its corresponding target sequence will be of the same length, with all characters shifted one place to the right. So we break the text into chunks of seq_length+1. [4]
tf.data.Dataset.from_tensor_slices converts the text vector into a stream of character indices and the batch method lets us group these characters into batches of the required length.
By using the map method to apply a simple function to each batch, we create our inputs and targets.
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)

char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

def split_input_target(data):
    input_text = data[:-1]
    target_text = data[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)
Before feeding this data into the model, we shuffle the data and divide it into batches. tf.data maintains a buffer in which it shuffles elements.
BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
Building the Model
Given all the characters computed until this moment, what will the next character be? This is what we will be training our RNN model to predict.
I have used tf.keras.Sequential to define the model since all the layers in it only have a single input and produce a single output. The different layers used are:
tf.keras.layers.Embedding : This is the input layer. An embedding is used to map all the unique characters to vectors in multi-dimensional space, having embedding_dim dimensions.
tf.keras.layers.GRU: A type of RNN layer with rnn_units units. (You can also use an LSTM layer here to see what works best for your data.)
tf.keras.layers.Dense: This is the output layer, with vocab_size outputs.
It is also useful to define all the hyper-parameters separately so that it’s easier for you to change them later without editing the model definition.
vocab_size = len(vocab)
embedding_dim = 300

# Number of RNN units
rnn_units1 = 512
rnn_units2 = 256
rnn_units = [rnn_units1, rnn_units2]

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units[0], return_sequences=True,
                            stateful=True, recurrent_initializer='glorot_uniform'),
        tf.keras.layers.GRU(rnn_units[1], return_sequences=True,
                            stateful=True, recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

model = build_model(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)
Training the model
The standard tf.keras.losses.sparse_categorical_crossentropy loss function works well here because it is applied across the last dimension of the predictions. We set from_logits to True because the model returns logits. Then we choose the adam optimizer and compile our model.
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])
You can configure checkpoints like this to ensure that the model weights are saved during training.
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix, save_weights_only=True)
The training time of each epoch depends on your model layers and the hyper-parameters used. I have set epochs to 50 to see how accuracy and loss change over time, but you may not need to train for all 50 epochs. Make sure to stop training when you see your loss start to increase or stay constant for a few epochs (or automate this, as sketched after the training code below). The path to the most recent checkpoint is stored in latest_check. If you are using Google Colab, set the runtime to GPU to reduce training time.
EPOCHS = 50
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

latest_check = tf.train.latest_checkpoint(checkpoint_dir)
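If you would rather not watch the loss by hand, one option (my addition, not part of the original code) is to let Keras stop for you with an EarlyStopping callback:

# A hedged alternative to stopping training manually: end training once the
# loss has not improved for a few epochs and keep the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='loss',  # no validation split here, so monitor the training loss
    patience=3,
    restore_best_weights=True)

history = model.fit(dataset, epochs=EPOCHS,
                    callbacks=[checkpoint_callback, early_stop])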
Text generation
If you wish to use a different batch size, you need to rebuild the model and reload the checkpoints before running. I have used batch_size of 1 to keep it simple.
(You can run a model.summary() to get insights on the layers of your model and the output shape after each layer)
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(latest_check)
model.build(tf.TensorShape([1, None]))
model.summary()
The following function now generates the text:
It accepts a start_string, initializes the RNN state and sets the number of output characters to num_generate
It gets the prediction distribution of the next character using start_string and the RNN state, then calculates the index of the predicted character, which becomes the next input to the model.
The output state returned by the model is fed back into the model, so it now has more context. After predicting the next character, the cycle continues; this is how the RNN builds up its memory from the previous outputs. [4]
A lower scaling value (acting like a sampling temperature) results in more predictable text, whereas a higher value gives more surprising text.
def generate_text(model, start_string):
    num_generate = 1000  # number of output characters (choose what you like)
    # Convert the start string to numbers (vectorize)
    input_eval = [char2index[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    text_generated = []
    scaling = 0.5  # kept at a lower value here
    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)
        predictions = predictions / scaling
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # pass the predicted character (with the previous hidden state) as the next input
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(index2char[predicted_id])
    return start_string + ''.join(text_generated)
And you’re done!
Outputs
You can try giving it different start strings to get different outputs.
Here is a part of the output using my favorite character:
print(generate_text(model, start_string=u"Severus Snape"))

Severus Snape moved to the scarlet Hogwarts students. Hermione said, “Well, I think it’s all right, all right, a bit dead before. . . .”“I think I’ll have to go to the other than you be to help him a question of the staff table and the doors opened and he stared at the clock to Harry. “I think it make the sword of Gryffindor, who was there too, he was on his pillows, and he and Ron stared at him. “I am sure we can bother the boy — ““You should have been there,” said Ron, and he took a strange and color.“I mean, he was a really good …
You can also try different sentences:
Voldemort died of coronavirus.”“You didn’t know what to do,” said Harry, “it was a surrounding cloak, he was the one who sustain you to go to the way.”“Yeah, well, I think you might have done that!” she said, striding up the steps, and the strength were so far as he was a pretty great tent that was the first time they might have realized I saw him to be devastated and screaming of the crowd through the darkness at the time shouts and silence.“You see, Harry!”“I don’t know, see you haven’t got anything to do with a prater of the Ministry of Magic …
Here is one example if you train the model using just the first book, Sorcerer’s Stone [3]:
Dumbledore in the Leaky Cauldron, now empty. Harry had never been to London before. Although Hagrid seemed very cold and green eyes. He was still shaking.

Harry sat down next to the bowl of peas. “What did you talk to Professor Dumbledore.”

She eyed him with a mixture of shock and suspicion.

“Who’s there?” he said suddenly as they climbed the street. He could just see the bundle of blankets on the step of number four.

Dudley’s favorite punching bag was Harry, but he couldn’t often catch him. Harry didn’t say anything …
You’ll see the model knows when to capitalize words and when to start a new paragraph, and it imitates the magical writing vocabulary!
Mischief Managed.
To make the sentences more coherent, you can improve the model by:
changing the different parameter values like seq_length, rnn_units, embedding_dim, and scaling to find the best settings
training it for more epochs
adding more layers of GRU / LSTM (a rough sketch follows below)
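For instance, a deeper variant could look something like the sketch below (my own illustration with untuned values, not the configuration used for the outputs above):

# Sketch of a deeper model: an extra GRU layer with dropout in between.
def build_deeper_model(vocab_size, embedding_dim, batch_size):
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(512, return_sequences=True, stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.GRU(512, return_sequences=True, stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.GRU(256, return_sequences=True, stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])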
This model can be trained on any other series you like. Do share your own stories in the comments and have fun!
[1] J.K. Rowling, Harry Potter and the Deathly Hallows, 2007
[2] Denny Britz, Recurrent Neural Network Tutorial, Part 4 — Implementing a GRU/LSTM RNN with Python and Theano, October 27, 2015
[3] J.K. Rowling, Harry Potter and the Sorcerer’s Stone, 1998
[4] Text generation with an RNN, TensorFlow
Originally published at: https://medium.com/towards-artificial-intelligence/create-your-own-harry-potter-short-story-using-rnn-and-tensorflow-853b3ed1b8f3