【LSTM】MATLAB Simulation of a Face Recognition Algorithm Based on an LSTM Network
1. Software Version
MATLAB R2021a
2. Algorithm Theory
The Long Short-Term Memory (LSTM) model was first proposed by Hochreiter and Schmidhuber in 1997. Its core idea is a special neuron structure that can store information over long time spans. The basic structure of an LSTM network is shown below:
Figure 1: Basic structure of an LSTM network
As Figure 1 shows, the LSTM network consists of three parts: an input layer, a memory block, and an output layer. The memory block is composed of an input gate, a forget gate, and an output gate; through these three gates, the LSTM model controls the read and write operations of all neurons in the network.
The basic principle of the LSTM model is to use multiple gates to counteract the vanishing-gradient problem of plain RNNs. Because an LSTM can preserve gradient information over long time spans, it can process signals over extended durations, which makes it well suited to signals of various frequencies and to mixed high/low-frequency signals. Inside the memory cell, the input gate, forget gate, and output gate together form a nonlinear summation unit. All three gates use the sigmoid activation function: an output near 1 corresponds to the "open" state and an output near 0 to the "closed" state.
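To make the gating mechanism concrete, here is a small Python sketch (Python is used only for illustration; `sigmoid` is a local helper, not part of the article's MATLAB program) showing how a sigmoid output acts as a soft "open"/"closed" switch:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# A gate value near 1 means "open" (signal passes); near 0 means "closed".
gate_open   = sigmoid(6.0)    # strongly positive pre-activation -> close to 1
gate_closed = sigmoid(-6.0)   # strongly negative pre-activation -> close to 0

# Gating is elementwise multiplication: the gate scales the candidate signal.
signal = 0.7
print(gate_open * signal, gate_closed * signal)
```

Because the sigmoid is smooth, the "switch" is differentiable, which is what lets the gates be trained by gradient descent.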
The figure below shows the internal structure of the memory block in the LSTM model:
Figure 2: Internal structure of the LSTM memory cell
As Figure 2 shows, the memory cell works as follows: when the input gate is "open", external information is read into the memory cell; when the input gate is "closed", external information cannot enter. The forget gate and the output gate have analogous control functions. Through these three gates, the LSTM model can retain gradient information in the memory cell over long periods. While the memory cell is holding information, the forget gate stays "open" (so the stored state is retained) and the input gate stays "closed".
Once the input gate opens, the memory cell starts receiving and storing external information; once it closes, the cell stops accepting input. Meanwhile, when the output gate opens, the information stored in the cell is passed on to the next layer. The forget gate, finally, resets the neuron's state when necessary.
The forward pass of the LSTM network is governed by the following equations, where (matching the notation of the code in Section 3) $U_*$ multiplies the input $x_t$, $W_*$ multiplies the previous hidden state $h_{t-1}$, $\sigma$ denotes the sigmoid function, $\odot$ denotes elementwise multiplication, and biases are omitted:

1. Input gate: $i_t = \sigma(x_t U_i + h_{t-1} W_i)$
2. Forget gate: $f_t = \sigma(x_t U_f + h_{t-1} W_f)$
3. Output gate: $o_t = \sigma(x_t U_o + h_{t-1} W_o)$
4. Candidate memory: $g_t = \tanh(x_t U_g + h_{t-1} W_g)$
5. Cell-state update: $c_t = f_t \odot c_{t-1} + i_t \odot g_t$
6. Memory-cell output: $h_t = o_t \odot \tanh(c_t)$

For the backward pass, the gradient of each gate is obtained from these equations by the chain rule, i.e. back-propagation through time; the corresponding derivative computations appear in the backward loop of the code in Section 3.
The overall flow of the LSTM-based visual recognition algorithm is shown in the figure below:

Figure 3: Flowchart of the LSTM-based visual recognition algorithm
According to the flowchart in Figure 3, the LSTM-based visual recognition algorithm studied in this article proceeds as follows:

Step 1: Image acquisition. This article takes face images as the research object.

Step 2: Image preprocessing. The images to be recognized are preprocessed as described in Section 2 of this chapter, yielding cleaner images.

Step 3: Image segmentation. The image is divided into fixed-size sub-images; the sub-image size is chosen according to the relationship between the recognition target and the overall scene size.

Step 4: Geometric-element extraction. Edge detection is applied to each sub-image to obtain the geometric elements it contains, and these elements are assembled into "sentence" information.

Step 5: The sentence information is fed into the LSTM network. This is the core step, so the recognition process of the LSTM network is described next. First, the sentences enter the network through the LSTM input layer; the basic structure is shown below:
圖3基于LSTM網(wǎng)絡(luò)的識(shí)別結(jié)構(gòu)圖
Suppose that at some time step $t$ the input feature is $x_t$ and the output result is $y_t$; the input and output of the memory block are $c_{t-1}$ and $c_t$; and $g_t$ and $h_t$ denote the output of the LSTM neuron's activation function and the hidden-layer output, respectively. The training procedure of the whole LSTM then follows the forward and backward passes described above.
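Steps 3 and 4 above (tiling the image and extracting edge features per tile) can be sketched in Python. The tile size, the Sobel operator, and the per-tile mean-gradient "token" below are illustrative assumptions, not details fixed by the article:

```python
import numpy as np

def split_into_tiles(img, th, tw):
    """Split an H x W image into non-overlapping th x tw sub-images
    (H and W are assumed to be multiples of th and tw)."""
    H, W = img.shape
    return (img.reshape(H // th, th, W // tw, tw)
               .swapaxes(1, 2)
               .reshape(-1, th, tw))

def sobel_edges(tile):
    """Gradient magnitude via 3x3 Sobel kernels (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = tile.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            patch = tile[r:r+3, c:c+3]
            gx[r, c] = (patch * kx).sum()
            gy[r, c] = (patch * ky).sum()
    return np.hypot(gx, gy)

# Toy 32 x 32 "image" with a vertical step edge inside the first tile column
img = np.zeros((32, 32))
img[:, 4:] = 1.0

tiles  = split_into_tiles(img, 8, 8)                     # 16 sub-images, 8 x 8
tokens = [float(sobel_edges(t).mean()) for t in tiles]   # one "word" per tile
print(len(tiles), len(tokens))
```

The sequence `tokens` plays the role of the "sentence" that Step 5 feeds to the LSTM: tiles containing edges produce nonzero entries, flat tiles produce zeros.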
3. Core Code
function nn = func_LSTM(train_x, train_y, test_x, test_y)
% (sigmoid, sigmoid_output_to_derivative and tan_h_output_to_derivative
% are helper functions defined elsewhere in the project)

binary_dim     = 8;
largest_number = 2^binary_dim - 1;
binary         = cell(largest_number, 1);
for i = 1:largest_number + 1
    binary{i}     = dec2bin(i-1, binary_dim);
    int2binary{i} = binary{i};
end

% input variables
alpha      = 0.000001;  % learning rate
input_dim  = 2;
hidden_dim = 32;
output_dim = 1;

% initialize neural network weights
% in_gate = sigmoid(X(t) * U_i + H(t-1) * W_i)
U_i = 2 * rand(input_dim,  hidden_dim) - 1;
W_i = 2 * rand(hidden_dim, hidden_dim) - 1;
U_i_update = zeros(size(U_i));
W_i_update = zeros(size(W_i));

% forget_gate = sigmoid(X(t) * U_f + H(t-1) * W_f)
U_f = 2 * rand(input_dim,  hidden_dim) - 1;
W_f = 2 * rand(hidden_dim, hidden_dim) - 1;
U_f_update = zeros(size(U_f));
W_f_update = zeros(size(W_f));

% out_gate = sigmoid(X(t) * U_o + H(t-1) * W_o)
U_o = 2 * rand(input_dim,  hidden_dim) - 1;
W_o = 2 * rand(hidden_dim, hidden_dim) - 1;
U_o_update = zeros(size(U_o));
W_o_update = zeros(size(W_o));

% g_gate = tanh(X(t) * U_g + H(t-1) * W_g)
U_g = 2 * rand(input_dim,  hidden_dim) - 1;
W_g = 2 * rand(hidden_dim, hidden_dim) - 1;
U_g_update = zeros(size(U_g));
W_g_update = zeros(size(W_g));

out_para        = 2 * zeros(hidden_dim, output_dim);
out_para_update = zeros(size(out_para));

% C(t) = C(t-1) .* forget_gate + g_gate .* in_gate
% S(t) = tanh(C(t)) .* out_gate
% Out  = sigmoid(S(t) * out_para)

% train
iter = 9999;  % training iterations
for j = 1:iter
    % generate a simple addition problem (a + b = c)
    a_int = randi(round(largest_number/2));  % int version
    a     = int2binary{a_int+1};             % binary encoding
    b_int = randi(floor(largest_number/2));
    b     = int2binary{b_int+1};
    % true answer
    c_int = a_int + b_int;
    c     = int2binary{c_int+1};
    % where we'll store our best guess (binary encoded)
    d = zeros(size(c));

    overallError  = 0;   % total error
    output_deltas = [];  % difference in output layer, i.e. (target - out)

    % initialize hidden layer H(0) and cell state C(0) as zero vectors
    H = zeros(1, hidden_dim);
    C = zeros(1, hidden_dim);
    I = []; F = []; O = []; G = [];  % gate activations per time step

    % forward pass over the bit sequence, least-significant bit first
    for position = 0:binary_dim-1
        % X ------> input,  size: 1 x input_dim
        % y ------> label,  size: 1 x output_dim
        X = [a(binary_dim - position)-'0', b(binary_dim - position)-'0'];
        y = [c(binary_dim - position)-'0']';

        % equations (1)-(6) in a forward pass; no bias is used here
        in_gate     = sigmoid(X * U_i + H(end, :) * W_i);   % equation (1)
        forget_gate = sigmoid(X * U_f + H(end, :) * W_f);   % equation (2)
        out_gate    = sigmoid(X * U_o + H(end, :) * W_o);   % equation (3)
        g_gate      = tanh(X * U_g + H(end, :) * W_g);      % equation (4)
        C_t = C(end, :) .* forget_gate + g_gate .* in_gate; % equation (5)
        H_t = tanh(C_t) .* out_gate;                        % equation (6)

        % store the gate activations for the backward pass
        I = [I; in_gate]; F = [F; forget_gate];
        O = [O; out_gate]; G = [G; g_gate];
        C = [C; C_t]; H = [H; H_t];

        % compute the predicted output and its error
        pred_out      = sigmoid(H_t * out_para);
        output_error  = y - pred_out;
        output_deltas = [output_deltas; output_error];
        overallError  = overallError + abs(output_error(1));

        % decode the estimate so it can be printed out
        d(binary_dim - position) = round(pred_out);
    end

    % backward pass (back-propagation through time), starting at the last cell
    for position = 0:binary_dim-1
        X     = [a(position+1)-'0', b(position+1)-'0'];
        H_t   = H(end-position,   :);  % H(t)
        H_t_1 = H(end-position-1, :);  % H(t-1)
        C_t   = C(end-position,   :);  % C(t)
        C_t_1 = C(end-position-1, :);  % C(t-1)
        O_t = O(end-position, :); F_t = F(end-position, :);
        G_t = G(end-position, :); I_t = I(end-position, :);

        % output layer difference
        output_diff   = output_deltas(end-position, :);
        H_t_diff      = output_diff * (out_para') .* sigmoid_output_to_derivative(H_t);
        out_para_diff = (H_t') * output_diff;

        % gate differences
        O_t_diff = H_t_diff .* tanh(C_t) .* sigmoid_output_to_derivative(O_t); % out gate
        C_t_diff = H_t_diff .* O_t .* tan_h_output_to_derivative(C_t);         % cell state
        F_t_diff = C_t_diff .* C_t_1 .* sigmoid_output_to_derivative(F_t);     % forget gate
        I_t_diff = C_t_diff .* G_t .* sigmoid_output_to_derivative(I_t);       % in gate
        G_t_diff = C_t_diff .* I_t .* tan_h_output_to_derivative(G_t);         % g gate

        % differences of the weight matrices
        U_i_diff = X' * I_t_diff .* sigmoid_output_to_derivative(U_i);
        W_i_diff = (H_t_1)' * I_t_diff .* sigmoid_output_to_derivative(W_i);
        U_o_diff = X' * O_t_diff .* sigmoid_output_to_derivative(U_o);
        W_o_diff = (H_t_1)' * O_t_diff .* sigmoid_output_to_derivative(W_o);
        U_f_diff = X' * F_t_diff .* sigmoid_output_to_derivative(U_f);
        W_f_diff = (H_t_1)' * F_t_diff .* sigmoid_output_to_derivative(W_f);
        U_g_diff = X' * G_t_diff .* tan_h_output_to_derivative(U_g);
        W_g_diff = (H_t_1)' * G_t_diff .* tan_h_output_to_derivative(W_g);

        % accumulate the updates
        U_i_update = U_i_update + U_i_diff; W_i_update = W_i_update + W_i_diff;
        U_o_update = U_o_update + U_o_diff; W_o_update = W_o_update + W_o_diff;
        U_f_update = U_f_update + U_f_diff; W_f_update = W_f_update + W_f_diff;
        U_g_update = U_g_update + U_g_diff; W_g_update = W_g_update + W_g_diff;
        out_para_update = out_para_update + out_para_diff;
    end

    % apply the accumulated updates, then reset the accumulators
    U_i = U_i + U_i_update * alpha; W_i = W_i + W_i_update * alpha;
    U_o = U_o + U_o_update * alpha; W_o = W_o + W_o_update * alpha;
    U_f = U_f + U_f_update * alpha; W_f = W_f + W_f_update * alpha;
    U_g = U_g + U_g_update * alpha; W_g = W_g + W_g_update * alpha;
    out_para = out_para + out_para_update * alpha;

    U_i_update = U_i_update * 0; W_i_update = W_i_update * 0;
    U_o_update = U_o_update * 0; W_o_update = W_o_update * 0;
    U_f_update = U_f_update * 0; W_f_update = W_f_update * 0;
    U_g_update = U_g_update * 0; W_g_update = W_g_update * 0;
    out_para_update = out_para_update * 0;
end

% build the final GRNN classifier from the training data, with the spread
% parameter derived from the learned LSTM output weights
nn = newgrnn(train_x', train_y(:,1)', mean(mean(abs(out_para)))/2);
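The training data inside `func_LSTM` is the classic 8-bit binary-addition toy task. A small Python sketch of the same encoding (mirroring MATLAB's `dec2bin` and the least-significant-bit-first feeding order of the loop above) may help when reading the code:

```python
binary_dim = 8
largest_number = 2**binary_dim - 1   # 255, as in the MATLAB code

def int2binary(n):
    """8-bit binary string, most-significant bit first, like dec2bin(n, 8)."""
    return format(n, '0{}b'.format(binary_dim))

a_int, b_int = 5, 3                  # one sample addition problem
a, b = int2binary(a_int), int2binary(b_int)
c = int2binary(a_int + b_int)        # target sum, binary encoded

# The MATLAB loop indexes a(binary_dim - position), i.e. it feeds the
# network the bit pairs from least-significant to most-significant:
steps = [(int(a[binary_dim - 1 - p]), int(b[binary_dim - 1 - p]))
         for p in range(binary_dim)]
print(a, b, c, steps[0])
```

Feeding the low-order bits first is what makes the task learnable by a recurrent model: the carry produced at one step is exactly the state that must be remembered for the next step.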
4. Simulation Steps and Results
Using the LSTM recognition algorithm described above, face images collected under different levels of interference were identified; the resulting recognition-accuracy curves are shown in the figure below.

(Figure: recognition-accuracy curves of the four algorithms under different interference levels)

The simulation results show that as the interference in the collected images decreases, the LSTM-based algorithm achieves the best recognition accuracy; the RNN and the RBM-based deep network achieve comparable rates, while the plain neural network (NN) performs clearly worse. The exact recognition rates are listed in the table below:
Table 1: Recognition rates of the four compared algorithms
| Algorithm | -15 dB | -10 dB | -5 dB | 0 dB | 5 dB | 10 dB | 15 dB |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NN | 17.5250 | 30.9500 | 45.0000 | 52.6000 | 55.4750 | 57.5750 | 57.6000 |
| RBM | 19.4000 | 40.4500 | 58.4750 | 67.9500 | 70.4000 | 72.2750 | 71.8750 |
| RNN | 20.6750 | 41.1500 | 60.0750 | 68.6000 | 72.5500 | 73.3500 | 73.3500 |
| LSTM | 23.1000 | 46.3500 | 65.0250 | 72.9500 | 75.6000 | 76.1000 | 76.3250 |
5.參考文獻(xiàn)
[01]米良川,楊子夫,李德升等.自動(dòng)機(jī)器人視覺(jué)控制系統(tǒng)[J].工業(yè)控制計(jì)算機(jī).2003.3.
[02]Or1ando,Fla.Digital Image Processing Techniques.Academic Pr,Inc.1984
[03]K.Fukushima.A neural network model for selective attention in visual pattern recognition. Biological Cybernetics[J]October 1986?55(1):5-15.
[04]T.H.Hidebrandt Optimal Training of Thresholded Linear Correlation Classifiers[J]. IEEE Transaction Neural Networks.1991?2(6):577-588.
[05]Van Ooyen B.Nienhuis Pattern Recognition in the Neocognitron Is Improved by Neural Adaption[J].Biological Cybernetics.1993,70:47-53.
[06]Bao Qing Li BaoXinLi. Building pattern classifiers using convolutional neural networks[J]. Neural.Networks?vol.5(3): 3081-3085.
[07]E S ackinger?,B boser,Y lecun?,L jaclel. Application of the ANNA Neural Network Chip to High Speed Character Recognition[J]. IEEE Transactions on Neural Networks 1992.3:498-505.
總結(jié)
以上是生活随笔為你收集整理的【LSTM】基于LSTM网络的人脸识别算法的MATLAB仿真的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: 【PSO运输优化】基于MATLAB的PS
- 下一篇: 【pointnet++点云识别】基于po