Deep Learning (Morvan Neural Networks, Lecture 3): Keras

Published: 2023/12/13 pytorch 25 豆豆

Neural Networks & Keras

Contents
- 1. Introduction to Keras
  - 1.1 Primer: Artificial vs. Biological Neural Networks
  - 1.2 What Is a Neural Network?
  - 1.3 Neural Networks and Gradient Descent
  - 1.4 Primer: The Neural Network Black Box Is Not So Black
  - 1.5 Why Keras?
  - 1.6 Compatible Backends
- 2. How to Build Various Neural Networks
  - 2.1 Regressor
  - 2.2 Classifier
  - 2.3 What Is a Convolutional Neural Network (CNN)?
  - 2.4 CNN
  - 2.5 What Is a Recurrent Neural Network (RNN)?
  - 2.6 What Is an LSTM Recurrent Neural Network?
  - 2.7 RNN Classifier
  - 2.8 RNN Regressor
  - 2.9 What Is an Autoencoder?
  - 2.10 Autoencoder

1. Introduction to Keras

1.1 Primer: Artificial vs. Biological Neural Networks

About 90 billion nerve cells make up our complex nervous system, a count on the same order as the number of stars in our galaxy.

1.2 What Is a Neural Network?

An artificial neural network is a nervous system that exists inside a computer.

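1.3 Neural Networks and Gradient Descent

Optimization
Training a neural network is an optimization problem. Common optimization methods include Newton's method, the least squares method, and gradient descent.

Gradient descent

Global and local optima

Gradient descent usually ends in a local optimum rather than the global one, but a neural network can make that local optimum good enough.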
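The difference between a local and a global optimum can be seen on a one-dimensional example. The sketch below is only an illustration: the function f, the learning rate, and the starting points are hypothetical choices, not taken from the tutorial. Gradient descent started on the left reaches the lower (global) minimum, while the same procedure started on the right settles in a shallower local minimum.

```python
def f(x):
    """A non-convex example function with one local and one global minimum."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Analytic derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=1000):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_left = gradient_descent(-2.0)   # ends in the global minimum (x near -1.30)
x_right = gradient_descent(2.0)   # ends in a local minimum (x near 1.13)
print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop where the gradient is zero, yet f(x_left) is clearly lower than f(x_right): where descent ends depends on where it starts.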

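1.4 Primer: The Neural Network Black Box Is Not So Black

The input (the "baby" in this analogy) after it has been processed by the first layer of the network is called a feature representation.

Rather than saying that the black box processes data, it is more accurate to say that it converts one feature representation into another, one conversion after another, layer by layer.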

1.5 Why Keras?

If TensorFlow and Theano are the giants of the neural network world, then Keras is the one standing on their shoulders.

Keras is a high-level neural network package compatible with both Theano and TensorFlow. Assembling a neural network with it is much faster: a few statements are enough.

Its broad compatibility also lets Keras run without trouble on Windows, macOS, and Linux.

1.6 Compatible Backends

Keras has two backends, that is, the libraries Keras relies on to do the actual computation. One is Theano and the other is TensorFlow.

If we choose Theano as the backend, Keras uses Theano to build the required network underneath; likewise, if we choose TensorFlow, Keras builds the network with TensorFlow.

Importing Keras prints the backend currently in use:

    import keras
    Using Theano backend.

The backend can be changed.
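In the multi-backend Keras 2.x releases that this tutorial appears to target, the backend is typically selected either through the KERAS_BACKEND environment variable or through the "backend" field of ~/.keras/keras.json. The following is a minimal sketch under that assumption; the keras import is left commented out so the snippet runs even where Keras is not installed, and the configuration is only built in memory rather than written to disk.

```python
import json
import os

# Option 1: set the environment variable before Keras is imported.
os.environ["KERAS_BACKEND"] = "theano"   # or "tensorflow"
# import keras   # would then report the Theano backend on import

# Option 2: edit the "backend" field of ~/.keras/keras.json.
# The dictionary below mirrors the usual layout of that file.
config = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano",
}
print(json.dumps(config, indent=4))
```

The environment variable takes effect for a single run, while the JSON file changes the default for every later import.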

2. How to Build Various Neural Networks

2.1 Regressor

A neural network can be used for regression: given a set of data such as the one generated below, it fits a line to the data and can predict the output for a new input x.

    # 1. Import modules and create data
    import numpy as np
    np.random.seed(1337)  # for reproducibility
    from keras.models import Sequential
    from keras.layers import Dense
    import matplotlib.pyplot as plt

    # create some data
    X = np.linspace(-1, 1, 200)
    np.random.shuffle(X)    # randomize the data
    Y = 0.5 * X + 2 + np.random.normal(0, 0.05, (200, ))
    # plot data
    plt.scatter(X, Y)
    plt.show()

    X_train, Y_train = X[:160], Y[:160]     # first 160 data points
    X_test, Y_test = X[160:], Y[160:]       # last 40 data points

    # 2. Build the model
    # build a neural network from the 1st layer to the last layer
    model = Sequential()
    model.add(Dense(units=1, input_dim=1))
    # choose loss function and optimizing method
    model.compile(loss='mse', optimizer='sgd')

    # 3. Train and evaluate
    # training
    print('Training -----------')
    for step in range(301):
        cost = model.train_on_batch(X_train, Y_train)
        if step % 100 == 0:
            print('train cost: ', cost)

    # test
    print('\nTesting ------------')
    cost = model.evaluate(X_test, Y_test, batch_size=40)
    print('test cost:', cost)

    # 4. Predict new samples
    W, b = model.layers[0].get_weights()
    print('Weights=', W, '\nbiases=', b)

    # plotting the prediction
    Y_pred = model.predict(X_test)
    plt.scatter(X_test, Y_test)
    plt.plot(X_test, Y_pred)
    plt.show()

2.2 Classifier

    import numpy as np
    np.random.seed(1337)  # for reproducibility
    from keras.datasets import mnist
    from keras.utils import np_utils
    from keras.models import Sequential
    from keras.layers import Dense, Activation
    from keras.optimizers import RMSprop

    # 1. Data preprocessing: scale X into [0, 1], one-hot encode y
    # download the mnist to the path '~/.keras/datasets/' if it is the first time to be called
    # X_train shape (60000, 28, 28), y_test shape (10000,)
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # data pre-processing
    X_train = X_train.reshape(X_train.shape[0], -1) / 255.   # normalize
    X_test = X_test.reshape(X_test.shape[0], -1) / 255.      # normalize
    y_train = np_utils.to_categorical(y_train, num_classes=10)
    y_test = np_utils.to_categorical(y_test, num_classes=10)

    # 2. Build the model by passing several layers in at once
    # Another way to build your neural net
    model = Sequential([
        Dense(32, input_dim=784),
        Activation('relu'),
        Dense(10),
        Activation('softmax'),
    ])

    # 3. Define the optimizer, compile the model, train
    # Another way to define your optimizer
    rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)

    # We add metrics to get more results you want to see
    model.compile(optimizer=rmsprop,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    print('Training ------------')
    # Another way to train the model
    model.fit(X_train, y_train, epochs=2, batch_size=32)

    # 4. Evaluate the model
    print('\nTesting ------------')
    # Evaluate the model with the metrics we defined earlier
    loss, accuracy = model.evaluate(X_test, y_test)

    print('test loss: ', loss)
    print('test accuracy: ', accuracy)

Output:

    Using TensorFlow backend.
    Training ------------
    Epoch 1/2
    60000/60000 [==============================] - 5s 84us/step - loss: 0.3434 - acc: 0.9046
    Epoch 2/2
    60000/60000 [==============================] - 4s 67us/step - loss: 0.1948 - acc: 0.9437

    Testing ------------
    10000/10000 [==============================] - 0s 35us/step
    test loss: 0.174235421626
    test accuracy: 0.9505

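The regression network used model.add to add layers one at a time; this time the layers are placed directly inside the model as a list. Think of a pipe built from sections: the data drops from one section into the next, and then into the one after that.

The optimizer can be a default one or the one we defined in the previous step. The loss function differs between classification and regression; here it is cross-entropy. metrics lists the extra quantities to compute, such as cost, accuracy, or score.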
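To make the preprocessing and the loss concrete, here is a small pure-Python sketch of one-hot encoding, softmax, and categorical cross-entropy. The helper names are hypothetical and the functions only mirror, in simplified form, what to_categorical, the softmax activation, and the 'categorical_crossentropy' loss compute; this is not the Keras implementation.

```python
import math


def one_hot(label, num_classes=10):
    """Turn an integer label into a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                          # subtract the max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))


y_true = one_hot(3)
confident = softmax([0, 0, 0, 5, 0, 0, 0, 0, 0, 0])   # high score on class 3
uniform = softmax([0] * 10)                          # no preference at all
loss_confident = categorical_crossentropy(y_true, confident)
loss_uniform = categorical_crossentropy(y_true, uniform)
print(loss_confident, loss_uniform)
```

A prediction spread evenly over 10 classes gives a loss of ln(10), about 2.30, which is roughly the loss an untrained 10-class model starts from; a confident correct prediction drives the loss toward 0.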

2.3 What Is a Convolutional Neural Network (CNN)?

Convolution
The network no longer processes the input pixel by pixel; instead it processes one small patch of pixels at a time. This preserves the continuity of the image information, so the network sees shapes rather than isolated points, and it deepens the network's understanding of the image.

Pooling
Pooling is a filtering step: it picks out the useful information in a layer and passes it on to the next layer for analysis. It also lightens the computational load on the network.

2.4 CNN

    import numpy as np
    np.random.seed(1337)  # for reproducibility
    from keras.datasets import mnist
    from keras.utils import np_utils
    from keras.models import Sequential
    from keras.layers import Dense, Activation, Convolution2D, MaxPooling2D, Flatten
    from keras.optimizers import Adam

    # download the mnist to the path '~/.keras/datasets/' if it is the first time to be called
    # X_train shape (60000, 28, 28), y_test shape (10000,)
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # data pre-processing
    X_train = X_train.reshape(-1, 1, 28, 28) / 255.
    X_test = X_test.reshape(-1, 1, 28, 28) / 255.
    y_train = np_utils.to_categorical(y_train, num_classes=10)
    y_test = np_utils.to_categorical(y_test, num_classes=10)

    # 1. Build the model: conv-pool-conv-pool-fc-fc
    # Another way to build your CNN
    model = Sequential()

    # Conv layer 1 output shape (32, 28, 28)
    model.add(Convolution2D(
        batch_input_shape=(None, 1, 28, 28),
        filters=32,
        kernel_size=5,
        strides=1,
        padding='same',     # Padding method
        data_format='channels_first',
    ))
    model.add(Activation('relu'))

    # Pooling layer 1 (max pooling) output shape (32, 14, 14)
    model.add(MaxPooling2D(
        pool_size=2,
        strides=2,
        padding='same',     # Padding method
        data_format='channels_first',
    ))

    # Conv layer 2 output shape (64, 14, 14)
    model.add(Convolution2D(64, 5, strides=1, padding='same', data_format='channels_first'))
    model.add(Activation('relu'))

    # Pooling layer 2 (max pooling) output shape (64, 7, 7)
    model.add(MaxPooling2D(2, 2, 'same', data_format='channels_first'))

    # Fully connected layer 1 input shape (64 * 7 * 7) = (3136), output shape (1024)
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Activation('relu'))

    # Fully connected layer 2 to shape (10) for 10 classes
    model.add(Dense(10))
    model.add(Activation('softmax'))

    # 2. Define the optimizer, compile the model, train
    # Another way to define your optimizer
    adam = Adam(lr=1e-4)

    # We add metrics to get more results you want to see
    model.compile(optimizer=adam,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    print('Training ------------')
    # Another way to train the model
    model.fit(X_train, y_train, epochs=1, batch_size=64,)

    # 3. Evaluate
    print('\nTesting ------------')
    # Evaluate the model with the metrics we defined earlier
    loss, accuracy = model.evaluate(X_test, y_test)

    print('\ntest loss: ', loss)
    print('\ntest accuracy: ', accuracy)

Output:

    Using TensorFlow backend.
    Training ------------
    Epoch 1/1
    60000/60000 [==============================] - 557s 9ms/step - loss: 0.2698 - acc: 0.9265

    Testing ------------
    10000/10000 [==============================] - 44s 4ms/step

    test loss: 0.0994714692663

    test accuracy: 0.9691
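The output shapes noted in the comments of the CNN code can be checked by hand. The sketch below assumes the usual rule for padding='same', namely that the spatial output size is ceil(input / stride), and adds a tiny 2x2 max-pooling function on nested lists. Both helpers are hypothetical illustrations, not Keras functions.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a conv or pooling layer with padding='same'."""
    return math.ceil(size / stride)


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]


size = 28
size = same_padding_output(size, stride=1)   # conv layer 1 keeps 28
size = same_padding_output(size, stride=2)   # pooling layer 1 halves to 14
size = same_padding_output(size, stride=1)   # conv layer 2 keeps 14
size = same_padding_output(size, stride=2)   # pooling layer 2 halves to 7
flattened = 64 * size * size                 # 64 feature maps after conv layer 2
print(size, flattened)

patch = [[1, 3, 2, 0],
         [4, 2, 1, 1],
         [0, 0, 5, 6],
         [1, 2, 7, 8]]
pooled = max_pool_2x2(patch)
print(pooled)
```

Each pooling step keeps only the largest value of every 2x2 block, which is why the 28x28 input shrinks to 7x7 after two pooling layers and the Flatten layer sees 64 * 7 * 7 = 3136 values.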

2.5 What Is a Recurrent Neural Network (RNN)?

Today we talk about the recurrent neural network (RNN), which is at home in language analysis and other sequential data.

Hold the name "Steve Jobs" in your mind, then try to say it backwards. "sboJ ev(*#&"... not easy, is it? This shows how much order matters for prediction. We can predict the next word when the words follow a known order, but once the order is scrambled we can no longer make sense of what we are saying.

(1) Sequential data

Imagine a sequence of data: data0, data1, data2, data3. When predicting result0 we rely on data0, and every other prediction likewise relies on a single data item only. The same NN is used each time. But these data are related and ordered, just as in cooking sauce A has to go in before sauce B or the flavors get mixed up. An ordinary neural network structure cannot let the NN learn the relationships between these data.

(2) Neural networks for sequential data

So how do we get the NN to analyze the relationships between the data as well? Think about how humans analyze how things relate: the most basic way is to remember what happened before. So we give the neural network the same ability to remember earlier events.

When analyzing data0, we store the result in memory. When analyzing data1, the NN produces a new memory, but the new memory and the old memory are not connected, so we simply bring the old memory along and analyze the two together. As more ordered data arrives, the RNN accumulates all of the earlier memories and analyzes them together.

Let us go through the process again, this time with a little math. Each time the RNN finishes a computation it produces a description of the current state, which we abbreviate as s(t). When the RNN then analyzes x(t+1), it produces s(t+1) from x(t+1), but the output y(t+1) is created jointly by s(t) and s(t+1). The RNN we usually see can therefore also be drawn in this form.
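The recurrence can be sketched with a scalar RNN. This follows the common textbook formulation, in which the previous state s(t) enters the new state s(t+1) and the output is read from s(t+1), so y(t+1) depends on both. The weights w_x, w_s, and w_y are hypothetical values chosen for illustration only.

```python
import math


def rnn_step(x, s_prev, w_x=0.5, w_s=0.8, w_y=1.0):
    """One step of a scalar RNN: new state from the input and the old state."""
    s = math.tanh(w_x * x + w_s * s_prev)   # s(t+1) depends on x(t+1) and s(t)
    y = w_y * s                             # output read from the new state
    return s, y


def run_rnn(xs):
    """Feed a sequence through the same cell, carrying the state along."""
    s = 0.0
    ys = []
    for x in xs:
        s, y = rnn_step(x, s)
        ys.append(y)
    return ys


early = run_rnn([1.0, 0.0, 0.0])   # the signal arrives first
late = run_rnn([0.0, 0.0, 1.0])    # the same signal arrives last
print(early)
print(late)
```

The two inputs contain the same values in a different order, yet the final outputs differ: the state carries a fading trace of what came earlier, which is exactly the memory an ordinary feed-forward network lacks.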

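(3) What RNNs are used for
An RNN does not come in just one form; its structure is very flexible. For a classification problem, such as deciding whether a sentence someone said carries a positive or a negative sentiment, we can use an RNN that outputs its decision only at the last time step.

For image captioning, a single X stands for the input image, and the RNN generates a passage of text describing it.

For language translation, the RNN takes a passage of English and translates it into Chinese.

With these different forms, RNNs become powerful, and there are many interesting applications: describing photos as mentioned above, writing academic papers, writing program scripts, or composing music. Most people cannot even tell whether the result was written by a machine.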

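2.6 What Is an LSTM Recurrent Neural Network?

Today we talk about the weakness of the plain RNN and about LSTM, the technique proposed to fix it. LSTM stands for long short-term memory and is one of the most popular forms of RNN today.

(1) The weakness of RNNs

As mentioned before, an RNN learns from ordered data. To remember the data, it builds up a memory of earlier events, much like a person does. A plain RNN, however, is like an old grandfather who is sometimes forgetful. Why is that? (The gradient vanishes over time.)

Imagine an RNN whose input is the sentence: "Today I am going to make braised pork ribs. First prepare the ribs, then ..., and finally a delicious dish comes out of the pot." Just saying it makes the mouth water. Now ask the RNN which dish was made today. It might answer "spicy chicken". Because the answer is wrong, the RNN has to learn the relationship between the long sequence X and "braised pork ribs", but the key information "braised pork ribs" appears at the very beginning of the sentence.

Now look at how the RNN learns. The memory of "braised pork ribs" has to travel a long way to reach the last time step. There we obtain the error, and when this error is propagated backwards it is multiplied at every step by the network's own weight W.

If W is smaller than 1, say 0.9, the error is multiplied by 0.9 again and again, and by the time it reaches the initial time step it is close to zero. For the initial step the error has effectively disappeared. This problem is called the vanishing gradient (gradient vanishing).

Conversely, if W is larger than 1, say 1.1, the repeated multiplication produces an enormous number that overwhelms the RNN. This is called the exploding gradient (gradient exploding).

This is why a plain RNN cannot recall memories from long ago.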
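The effect of the repeated multiplication is easy to reproduce. The sketch below is a deliberate simplification that treats the weight as a single scalar, as the explanation above does; real RNNs multiply by a weight matrix and an activation derivative at each step.

```python
def backpropagated_error(w, steps, error=1.0):
    """Multiply an error signal by the same weight w once per time step."""
    for _ in range(steps):
        error *= w
    return error


for steps in (10, 50, 100):
    print(steps,
          backpropagated_error(0.9, steps),
          backpropagated_error(1.1, steps))
```

After 100 steps the factor 0.9 has shrunk the error to roughly 0.00003, while the factor 1.1 has blown it up to more than 10,000.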

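(2) LSTM

LSTM was created to solve this problem. Compared with a plain RNN, an LSTM has three additional controllers (gates): the input gate, the output gate, and the forget gate. The inside of an LSTM RNN now looks like this.

It has an extra global memory, drawn as a thick line. To make this easier to follow, think of the thick line as the main plot of a film or game, and of the original RNN as the side plot. All three gates sit on the original RNN. Start with the input side: if the current side plot matters a great deal for the ending, the input gate writes it into the main plot in proportion to its importance. Next the forget side: if the current side plot changes what we thought about earlier events, the forget gate drops some of the earlier main plot and replaces it proportionally with the new one. The update of the main plot therefore depends on the input and forget gates. Finally the output side: the output gate decides what to output based on the current main plot and side plot.

With these control mechanisms, LSTM works like a remedy that slows memory loss, and it can deliver better results.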
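The gate mechanism can be written down for a scalar cell. The sketch below follows the standard LSTM equations, with the cell state c playing the role of the "main plot" and the candidate g the "side plot". The parameter values are hypothetical and chosen so that the forget gate stays almost fully open and the input gate almost fully shut.

```python
import math


def sigmoid(z):
    """Squash a value into (0, 1) so it can act as a gate."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell.

    p maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w, u, b = p[name]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))        # input gate: how much new content to write
    f = sigmoid(pre("forget"))       # forget gate: how much old memory to keep
    o = sigmoid(pre("output"))       # output gate: how much memory to expose
    g = math.tanh(pre("candidate"))  # candidate content (the "side plot")
    c = f * c_prev + i * g           # updated cell state (the "main plot")
    h = o * math.tanh(c)             # output of this step
    return h, c


# Hypothetical parameters: keep the old memory, ignore most new input.
params = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0                      # start with something stored in memory
for x in [0.3, -0.7, 0.5] * 20:      # 60 time steps of unrelated input
    h, c = lstm_step(x, h, c, params)
print(c)
```

After 60 steps the stored value is still close to 1, whereas a plain multiplication by 0.9 per step would have reduced it to about 0.002. In a trained LSTM the gates are learned rather than fixed, so the cell decides from the data when to keep and when to overwrite its memory.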

2.7 RNN Classifier

This time we use a recurrent neural network (RNN) for classification on the MNIST dataset, mainly with the SimpleRNN layer.

MNIST images are 28×28 pixels. To use an RNN, we treat each image as sequential data. Each row is one input unit, so the input size is INPUT_SIZE = 28. Row 1 is fed in first, then row 2, row 3, row 4, and so on up to row 28. One image is therefore one sequence, so the number of time steps is TIME_STEPS = 28.

    import numpy as np
    np.random.seed(1337)  # for reproducibility

    from keras.datasets import mnist
    from keras.utils import np_utils
    from keras.models import Sequential
    from keras.layers import SimpleRNN, Activation, Dense
    from keras.optimizers import Adam

    TIME_STEPS = 28     # same as the height of the image
    INPUT_SIZE = 28     # same as the width of the image
    BATCH_SIZE = 50
    BATCH_INDEX = 0
    OUTPUT_SIZE = 10
    CELL_SIZE = 50
    LR = 0.001

    # download the mnist to the path '~/.keras/datasets/' if it is the first time to be called
    # X_train shape (60000, 28, 28), y_test shape (10000,)
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # data pre-processing
    X_train = X_train.reshape(-1, 28, 28) / 255.      # normalize
    X_test = X_test.reshape(-1, 28, 28) / 255.        # normalize
    y_train = np_utils.to_categorical(y_train, num_classes=10)
    y_test = np_utils.to_categorical(y_test, num_classes=10)

    # 1. Build the model
    # build RNN model
    model = Sequential()

    # RNN cell
    model.add(SimpleRNN(
        # for batch_input_shape, if using tensorflow as the backend, we have to put None for the batch_size.
        # Otherwise, model.evaluate() will get error.
        batch_input_shape=(None, TIME_STEPS, INPUT_SIZE),   # Or: input_dim=INPUT_SIZE, input_length=TIME_STEPS,
        units=CELL_SIZE,    # number of units in the RNN cell
        unroll=True,
    ))

    # output layer
    model.add(Dense(OUTPUT_SIZE))
    model.add(Activation('softmax'))

    # 2. Compile the model and train
    # optimizer
    adam = Adam(LR)
    model.compile(optimizer=adam,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # training
    for step in range(4001):
        # data shape = (batch_num, steps, inputs/outputs)
        X_batch = X_train[BATCH_INDEX: BATCH_INDEX+BATCH_SIZE, :, :]
        Y_batch = y_train[BATCH_INDEX: BATCH_INDEX+BATCH_SIZE, :]
        cost = model.train_on_batch(X_batch, Y_batch)
        BATCH_INDEX += BATCH_SIZE
        BATCH_INDEX = 0 if BATCH_INDEX >= X_train.shape[0] else BATCH_INDEX

        if step % 500 == 0:
            cost, accuracy = model.evaluate(X_test, y_test, batch_size=y_test.shape[0], verbose=False)
            print('test cost: ', cost, 'test accuracy: ', accuracy)

Output:

    test cost: 2.40573239326 test accuracy: 0.0390999987721
    test cost: 0.608026027679 test accuracy: 0.817900002003
    test cost: 0.450786024332 test accuracy: 0.864799976349
    test cost: 0.341593921185 test accuracy: 0.899800002575
    test cost: 0.343054682016 test accuracy: 0.898400008678
    test cost: 0.27272310853 test accuracy: 0.92040002346
    test cost: 0.299111783504 test accuracy: 0.908800005913
    test cost: 0.228507757187 test accuracy: 0.932900011539
    test cost: 0.243453606963 test accuracy: 0.927900016308

If you are interested, change the values of BATCH_SIZE and CELL_SIZE and see how these two parameters affect training time and accuracy.

2.8 RNN Regressor

(1) Generate the sequence
This time we use an RNN to solve a regression problem. First we generate the sequence sin(x), with cos(x) as the corresponding output. Each sequence has 20 time steps, and each training batch has BATCH_SIZE = 50 sequences.

(2) Build the model
Next we add the LSTM RNN layer. Its input is the training data, and the size of its output is set by CELL_SIZE. Because every input step has a corresponding output, return_sequences=True. The output at each point is influenced by all earlier outputs, and the state also has to be carried over from one batch to the next, so stateful=True.

    model.add(LSTM(
        batch_input_shape=(BATCH_SIZE, TIME_STEPS, INPUT_SIZE),   # Or: input_dim=INPUT_SIZE, input_length=TIME_STEPS,
        units=CELL_SIZE,
        return_sequences=True,  # True: output at all steps. False: output as last step.
        stateful=True,          # True: the final state of batch1 is feed into the initial state of batch2
    ))

Finally we add the output layer. Since the LSTM layer produces an output at every step, the Dense layer is wrapped in TimeDistributed.

    model.add(TimeDistributed(Dense(OUTPUT_SIZE)))

    import numpy as np
    np.random.seed(1337)  # for reproducibility
    import matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense
    from keras.optimizers import Adam

    BATCH_START = 0
    TIME_STEPS = 20
    BATCH_SIZE = 50
    INPUT_SIZE = 1
    OUTPUT_SIZE = 1
    CELL_SIZE = 20
    LR = 0.006

    # 1. Generate the sequence
    def get_batch():
        global BATCH_START, TIME_STEPS
        # xs shape (50batch, 20steps)
        xs = np.arange(BATCH_START, BATCH_START+TIME_STEPS*BATCH_SIZE).reshape((BATCH_SIZE, TIME_STEPS)) / (10*np.pi)
        seq = np.sin(xs)
        res = np.cos(xs)
        BATCH_START += TIME_STEPS
        # plt.plot(xs[0, :], res[0, :], 'r', xs[0, :], seq[0, :], 'b--')
        # plt.show()
        return [seq[:, :, np.newaxis], res[:, :, np.newaxis], xs]

    # 2. Build the LSTM model
    model = Sequential()
    # build a LSTM RNN
    model.add(LSTM(
        batch_input_shape=(BATCH_SIZE, TIME_STEPS, INPUT_SIZE),   # Or: input_dim=INPUT_SIZE, input_length=TIME_STEPS,
        units=CELL_SIZE,
        return_sequences=True,  # True: output at all steps. False: output as last step.
        stateful=True,          # True: the final state of batch1 is feed into the initial state of batch2
    ))
    # add output layer
    model.add(TimeDistributed(Dense(OUTPUT_SIZE)))
    adam = Adam(LR)
    model.compile(optimizer=adam,
                  loss='mse',)

    print('Training ------------')
    for step in range(501):
        # data shape = (batch_num, steps, inputs/outputs)
        X_batch, Y_batch, xs = get_batch()
        cost = model.train_on_batch(X_batch, Y_batch)
        pred = model.predict(X_batch, BATCH_SIZE)
        plt.plot(xs[0, :], Y_batch[0].flatten(), 'r', xs[0, :], pred.flatten()[:TIME_STEPS], 'b--')
        plt.ylim((-1.2, 1.2))
        plt.draw()
        plt.pause(0.1)
        if step % 10 == 0:
            print('train cost: ', cost)

train cost: 0.0412582
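Why stateful=True fits this data can be seen from the index grid that get_batch builds. The sketch below reproduces only that grid in pure Python (before the division by 10*pi); batch_indices is a hypothetical helper written for this illustration, not part of the tutorial code.

```python
TIME_STEPS = 20
BATCH_SIZE = 50


def batch_indices(batch_start):
    """Index grid used by get_batch, shape (BATCH_SIZE, TIME_STEPS)."""
    flat = list(range(batch_start, batch_start + TIME_STEPS * BATCH_SIZE))
    return [flat[r * TIME_STEPS:(r + 1) * TIME_STEPS] for r in range(BATCH_SIZE)]


first = batch_indices(0)             # first call: BATCH_START = 0
second = batch_indices(TIME_STEPS)   # second call: BATCH_START += TIME_STEPS

# Row i of the second batch starts one step after row i of the first ends.
for i in (0, 1, 49):
    print(first[i][-1], second[i][0])
```

Because each row of a batch continues exactly where the same row of the previous batch stopped, handing the final state of sample i to sample i of the next batch, which is what a stateful layer does, keeps every sequence unbroken.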

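2.9 What Is an Autoencoder?

Today we talk about how a neural network can learn in an unsupervised way, namely with an autoencoder.

Picture a neural network whose job is to take in an image, encode it, and then restore the image from the encoded version. Too abstract? Fine, let us be more concrete.

Suppose the network looks like this. Matched against the image above, we can see that the image goes through a compression step followed by a decompression step. During compression the quality of the original image is reduced; during decompression the original image is restored from a file that is small but still contains all of the key information. Why do this?

Sometimes a neural network has to accept a huge amount of input. When the input is a high-resolution image, it can contain tens of millions of values, and learning directly from that many inputs is hard work. So why not compress first: extract the most representative information from the original image, shrink the input, and feed the reduced information to the network. Learning then becomes much easier.

This is where the autoencoder comes in. The original data, the white X, is compressed and then decompressed into the black X. Comparing the black X with the white X gives the prediction error, which is backpropagated to improve the autoencoder's accuracy step by step. The middle part of a trained autoencoder is what captures the essence of the original data. Note that from start to finish we only used the input data X and never the labels that belong to X, so an autoencoder can be regarded as a form of unsupervised learning. When an autoencoder is actually put to use, usually only its first half is needed.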
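The compress, decompress, compare loop can be sketched with a toy linear autoencoder in pure Python. Everything here is a hypothetical illustration: the data are 2-D points on a line, the code is a single number, and the starting weights and learning rate are arbitrary. Note that the training target is the input itself, so no labels are involved.

```python
# Toy data: 2-D points that all lie on the line x2 = 2 * x1.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def encode(x, enc):
    """Compress a 2-D point into a single number (the code)."""
    return enc[0] * x[0] + enc[1] * x[1]


def decode(z, dec):
    """Expand the code back into a 2-D point."""
    return (dec[0] * z, dec[1] * z)


def reconstruction_loss(enc, dec):
    """Mean squared error between each input and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, enc), dec)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train(enc, dec, lr=0.05, steps=500):
    """Full-batch gradient descent; the target is the input itself."""
    enc, dec = list(enc), list(dec)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = encode(x, enc)
            x_hat = decode(z, dec)
            err = (x_hat[0] - x[0], x_hat[1] - x[1])
            dz = 2 * (err[0] * dec[0] + err[1] * dec[1])   # backprop into the code
            for j in range(2):
                g_dec[j] += 2 * err[j] * z / len(data)
                g_enc[j] += dz * x[j] / len(data)
        for j in range(2):
            enc[j] -= lr * g_enc[j]
            dec[j] -= lr * g_dec[j]
    return enc, dec


enc0, dec0 = (0.5, 0.1), (0.1, 0.5)   # arbitrary starting weights
enc, dec = train(enc0, dec0)
loss_before = reconstruction_loss(enc0, dec0)
loss_after = reconstruction_loss(enc, dec)
print(loss_before, loss_after)
```

Since the points lie on a line, one number is enough to describe each of them, and the reconstruction error falls to nearly zero. The Keras model in section 2.10 applies the same idea with non-linear layers and a 2-dimensional code for 784-dimensional images.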

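(1) Encoder

This part is called the encoder. The encoder captures the essence of the original data, so we only need to build a small neural network that learns from this condensed data. That reduces the load on the network and still gives good results.

If you know PCA (principal component analysis): when extracting the main features, an autoencoder does the same job and can even outperform PCA. In other words, an autoencoder can reduce the dimensionality of features just as PCA does.

(2) Decoder
As for the decoder, we can put it to use as well. During training the decoder decompresses the condensed information back into the original information, so it works as a decompressor, and we can even regard it as a generator (similar to a GAN). A special kind of autoencoder built for this purpose is the variational autoencoder (the original post links to a detailed description of it).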

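2.10 Autoencoder

Put simply, an autoencoder compresses the input data and then decompresses it. Many original features are compressed into a few that represent the original data; after decompression the data is restored to its original dimension and compared with the original.

It is an unsupervised algorithm: it needs only the input data, and the decompressed result is compared with the original data itself.

Today's task is to compress the 28×28 = 784-dimensional datasets.mnist data down to 2 dimensions and then visualize how the classes separate in a two-dimensional space.

(1) Build the model
encoding_dim is the dimension to compress to.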

    import numpy as np
    np.random.seed(1337)  # for reproducibility

    from keras.datasets import mnist
    from keras.models import Model
    from keras.layers import Dense, Input
    import matplotlib.pyplot as plt

    # download the mnist to the path '~/.keras/datasets/' if it is the first time to be called
    # x_train shape (60000, 28, 28), y_test shape (10000,)
    (x_train, _), (x_test, y_test) = mnist.load_data()

    # data pre-processing
    x_train = x_train.astype('float32') / 255. - 0.5       # minmax_normalized
    x_test = x_test.astype('float32') / 255. - 0.5         # minmax_normalized
    x_train = x_train.reshape((x_train.shape[0], -1))
    x_test = x_test.reshape((x_test.shape[0], -1))
    print(x_train.shape)
    print(x_test.shape)

    # in order to plot in a 2D figure
    encoding_dim = 2

    # this is our input placeholder
    input_img = Input(shape=(784,))

    # encoder layers
    encoded = Dense(128, activation='relu')(input_img)
    encoded = Dense(64, activation='relu')(encoded)
    encoded = Dense(10, activation='relu')(encoded)
    encoder_output = Dense(encoding_dim)(encoded)

    # decoder layers
    decoded = Dense(10, activation='relu')(encoder_output)
    decoded = Dense(64, activation='relu')(decoded)
    decoded = Dense(128, activation='relu')(decoded)
    decoded = Dense(784, activation='tanh')(decoded)

    # construct the autoencoder model
    autoencoder = Model(inputs=input_img, outputs=decoded)

    # construct the encoder model for plotting
    encoder = Model(inputs=input_img, outputs=encoder_output)

    # compile autoencoder
    autoencoder.compile(optimizer='adam', loss='mse')

    # training
    autoencoder.fit(x_train, x_train,
                    epochs=20,
                    batch_size=256,
                    shuffle=True)

    # plotting
    encoded_imgs = encoder.predict(x_test)
    plt.scatter(encoded_imgs[:, 0], encoded_imgs[:, 1], c=y_test)
    plt.colorbar()
    plt.show()

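In the final visualization, the autoencoder separates the digits from one another. The autoencoder can therefore serve as a feature compression method with the same purpose as PCA, and it often works somewhat better than PCA because its structure is non-linear.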

    Epoch 1/20
    60000/60000 [==============================] - 5s 86us/step - loss: 0.0683
    Epoch 2/20
    60000/60000 [==============================] - 5s 78us/step - loss: 0.0565
    Epoch 3/20
    60000/60000 [==============================] - 5s 76us/step - loss: 0.0515
    Epoch 4/20
    60000/60000 [==============================] - 5s 88us/step - loss: 0.0478
    Epoch 5/20
    60000/60000 [==============================] - 4s 71us/step - loss: 0.0459
    Epoch 6/20
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0445
    Epoch 7/20
    60000/60000 [==============================] - 4s 65us/step - loss: 0.0435
    Epoch 8/20
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0427
    Epoch 9/20
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0421
    Epoch 10/20
    60000/60000 [==============================] - 4s 71us/step - loss: 0.0416
    Epoch 11/20
    60000/60000 [==============================] - 4s 73us/step - loss: 0.0412
    Epoch 12/20
    60000/60000 [==============================] - 5s 78us/step - loss: 0.0410
    Epoch 13/20
    60000/60000 [==============================] - 5s 77us/step - loss: 0.0406
    Epoch 14/20
    60000/60000 [==============================] - 5s 81us/step - loss: 0.0403
    Epoch 15/20
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0401
    Epoch 16/20
    60000/60000 [==============================] - 4s 66us/step - loss: 0.0398
    Epoch 17/20
    60000/60000 [==============================] - 5s 79us/step - loss: 0.0395
    Epoch 18/20
    60000/60000 [==============================] - 4s 70us/step - loss: 0.0393
    Epoch 19/20
    60000/60000 [==============================] - 4s 70us/step - loss: 0.0392
    Epoch 20/20
    60000/60000 [==============================] - 4s 74us/step - loss: 0.0391
