How to import a locally downloaded IMDB dataset into Keras?
Step 1: Download the dataset locally
Extraction code: 9h3u
Storage location: C:/Users/<username>/.keras/datasets
(The username differs from person to person, and the storage location may also vary slightly between machines.)
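Before importing, it can help to confirm the file landed where Keras will look for it. A minimal check, assuming the downloaded file is named imdb.npz (the default file name that imdb.load_data() looks for):
import os
# keras.datasets.imdb.load_data() searches ~/.keras/datasets by default
expected = os.path.join(os.path.expanduser('~'), '.keras', 'datasets', 'imdb.npz')
print(expected, os.path.exists(expected))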
Step 2: Import the dataset
import keras
import numpy as np
# load data
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
----------
Check that the dataset was imported correctly
print(train_labels[0]) #1
print(max([max(sequence) for sequence in train_data])) #9999
----------
Some minor problems encountered and how to fix them:
If a few errors appear and the last one looks roughly like this: raise ValueError("Object arrays cannot be loaded when " ValueError: Object arrays cannot be loaded ……
it means the numpy version is too new. Mine was 1.16.4 at first; I switched to 1.16.2.
Changing the version:
In cmd, enter xxxxxxxxxxxxxxxx numpy==1.16.2
where xxxxxxxxxx is the command given at https://mirrors.tuna.tsinghua.edu.cn/help/pypi/, which speeds up the download; copy it as-is and just change some-package to numpy==1.16.2.
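For reference, the command shown on that mirror page generally takes the form below (check the page itself in case it has changed); with some-package replaced as described, it becomes:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple numpy==1.16.2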
Step 3: Complete code example for binary classification of movie reviews
import keras
import numpy as np
# load data
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
print(train_labels[0]) #1
print(max([max(sequence) for sequence in train_data])) #9999
# decode the indices back into words; requires downloading imdb_word_index.json to C:/Users/<username>/.keras/datasets
# Link: https://pan.baidu.com/s/1kkmpXrr1tkFtg7D3LX_lcw  extraction code: wzjw
word_index = imdb.get_word_index() # dictionary mapping words to integer indices
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()]) # reverse keys and values: map integer indices to words
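# note: indices 0, 1 and 2 are reserved (padding / start of sequence / unknown), hence the offset of 3 in the decode below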
decoded_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])
#print(decoded_review)
# one-hot encode the lists, e.g. turn [3,5] into [0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,...]
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1. # set the specified indices of results[i] to 1
    return results
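# optional sanity check of the one-hot encoding on a tiny example:
#print(vectorize_sequences([[3, 5]], dimension=8)) # [[0. 0. 0. 1. 0. 1. 0. 0.]]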
# handle input data
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
#print(x_train[0]) #[0. 1. 1. ... 0. 0. 0.]
# handle output data
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
# hold out a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
# build model
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
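# optional: inspect layer shapes and parameter counts
#model.summary() # 160,016 + 272 + 17 = 160,305 trainable parameters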
# train model
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))
history_dict = history.history
#print(history_dict.keys()) #dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
# plot training loss, validation loss, training accuracy, and validation accuracy
import matplotlib.pyplot as plt
# plot loss
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc)+1)
plt.plot(epochs, loss, 'bo', label='Training loss') #blue o
plt.plot(epochs, val_loss, 'b', label='Validation loss') #blue solid line
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
# plot accuracy
plt.clf() # clear the figure
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
----------
Here the validation set is used to pick the best number of epochs for training the NN (the training set ---> the NN parameters); the curves give epochs=4.
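If you prefer to read the epoch with the lowest validation loss directly off the history object instead of the plots, a minimal sketch:
best_epoch = int(np.argmin(history.history['val_loss'])) + 1 # epochs are counted from 1 in the plots
print(best_epoch)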
Then rebuild the NN with this setting and train it on the full train_data, commenting out history and everything after it:
model.fit(x_train, y_train, epochs=4, batch_size=512)
results = model.evaluate(x_test, y_test)
print(results)
print(model.predict(x_test))
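To turn the sigmoid outputs into 0/1 labels and check the accuracy by hand, something along these lines works (a small sketch; predict() returns one probability per review):
pred_labels = (model.predict(x_test) > 0.5).astype('float32').ravel() # threshold at 0.5
print((pred_labels == y_test).mean()) # fraction of reviews classified correctly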
------------
Results of further experiments (changing one variable at a time):
[0.29455984374523164, 0.88312] # original structure, three layers in total
[0.2833905682277679, 0.88576] # two layers in total
[0.30949291754722597, 0.87984] # 32 units per hidden layer
[0.08610797638118267, 0.88308] # mse loss
[0.32080167996406556, 0.87764] # tanh instead of relu
The chosen structure is a reasonable fit.