Bagging and time series forecasting: LSTM time series forecasting with Keras

Published: 2024/9/19

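1. Background

Observing data at a sequence of time points is commonplace in everyday life, and fields such as agriculture, commerce, meteorology, the military, and medicine all produce large amounts of time series data. Time series forecasting means predicting the likely future values of a series from its own history and from any other related series that may influence the result. Real-world forecasting problems of this kind are numerous, including speech analysis, noise cancellation, and analysis of stock and futures markets. At their core, they all amount to inferring the value of the series at time T+1 from the observations at the previous T time steps.

For time series forecasting we can use the traditional ARIMA model, a model based on time series decomposition such as STL, or Prophet, which was open-sourced by Facebook. Now that machine learning and artificial intelligence are so popular, machine learning methods such as deep learning can also be applied. This article shows how to implement LSTM-based time series forecasting with Keras and Python.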

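2. Introduction to the LSTM Model

The Long Short-Term Memory (LSTM) model is essentially a specific form of recurrent neural network (RNN). LSTM adds gates to the RNN to address the RNN's short-term memory problem, so that the network can make effective use of long-range temporal information. On top of the basic RNN structure, LSTM adds three logical control units, the input gate, the output gate, and the forget gate, each connected to a multiplicative element (see Figure 1). By setting the weights on the edges that connect the network's memory cell to the other parts, they control the input and output of the information flow and the state of the memory cell. The structure is shown in the figure below.

Figure 1: LSTM concept diagram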

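The components in the figure above are as follows:

Input Gate: controls whether information flows into the memory cell, denoted $i_t$.

Forget Gate: controls whether the information in the memory cell at the previous time step is carried over into the memory cell at the current time step, denoted $f_t$.

Output Gate: controls whether the information in the memory cell at the current time step flows into the current hidden state, denoted $o_t$.

Cell: the memory cell, which represents the memory of the neuron's state and gives the LSTM unit the ability to store, read, reset, and update long-range historical information, denoted $c_t$.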

At time t, the LSTM network is defined by the following formulas:
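The formula image did not survive in this copy. A common way to write the LSTM update at time $t$ is sketched below. The weight names $W$, $U$, $b$ and the symbols $x_t$ (input), $h_t$ (hidden state) and $\tilde{c}_t$ (candidate cell state) are assumptions made here, not necessarily the notation of the original figure, and some LSTM variants add peephole terms that are omitted.

$$
\begin{aligned}
i_t &= \mathrm{sigmoid}(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \mathrm{sigmoid}(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \mathrm{sigmoid}(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here $i_t$, $f_t$ and $o_t$ are the input, forget and output gates, $c_t$ is the cell state, the $U$ matrices are the recurrent weights applied to $h_{t-1}$, and $\odot$ is element-wise multiplication.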

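Besides the input gate, forget gate, output gate, and cell state introduced above, the weight matrices in the formulas are the recurrent connection weights of the corresponding gates, and sigmoid and tanh are the two activation functions.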

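The structure of a hidden-layer cell is shown in Figure 2. During training, the data features at time t are first fed to the input layer and passed through the activation function. That output, the hidden-layer output at time t-1, and the information stored in the cell at time t-1 are then fed into the LSTM node. After processing by the input gate, output gate, forget gate, and cell, the node passes its result to the next hidden layer or to the output layer. Finally, the backpropagated error is computed and the weights are updated.

Figure 2: LSTM detail diagram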
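To make the data flow through one cell more tangible, here is a minimal NumPy sketch of a single LSTM forward step applied along a short sequence. It is an illustration under assumptions, not Keras's internal implementation: the function names `sigmoid` and `lstm_step`, the stacking order of the gates inside `W`, `U` and `b`, and the random weights are all hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*n_hidden, n_features), U: (4*n_hidden, n_hidden), b: (4*n_hidden,).
    Rows are stacked as input gate, forget gate, output gate, candidate
    (this ordering is an assumption of the sketch).
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i_t = sigmoid(z[0:n])            # input gate
    f_t = sigmoid(z[n:2 * n])        # forget gate
    o_t = sigmoid(z[2 * n:3 * n])    # output gate
    c_tilde = np.tanh(z[3 * n:4 * n])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # keep part of the old memory, add new
    h_t = o_t * np.tanh(c_t)            # expose part of the memory
    return h_t, c_t

# Toy dimensions: 4 features, 5 hidden units, 15 time steps.
rng = np.random.default_rng(0)
n_features, n_hidden, n_steps = 4, 5, 15
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_features))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
sequence = rng.normal(size=(n_steps, n_features))
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)
```

Only the final hidden state `h` is kept here, which is also what a single LSTM layer followed by a Dense layer consumes when the LSTM does not return the full sequence.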

3. Preparing the LSTM Model

3.1 Load the required packages

from math import sqrt
from numpy import concatenate
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import pandas as pd
import numpy as np

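3.2 Define a function that converts the forecasting problem into a supervised learning problem

As mentioned above, time series forecasting essentially means inferring the value at time T+1 from the observations at the previous T time steps. This turns it into a supervised learning problem in which the inputs are historical values and the outputs are the values to be predicted. The first step in forecasting with an LSTM is therefore to arrange the dataset in the usual supervised learning layout: one row per sample, so that the number of rows is the number of samples and the number of columns is the total number of variables. This is done with the shift function of the pandas DataFrame.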

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = [], []
    # i: n_in, n_in-1, ..., 1 is the number of lag periods,
    # corresponding to t-n_in, ..., t-1
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # i: 0, 1, ..., n_out-1 is the number of periods ahead,
    # corresponding to t, t+1, ..., t+n_out-1
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # Put the shifted copies side by side
    agg = concat(cols, axis=1)
    agg.columns = names
    # Drop the rows made incomplete by shifting
    if dropnan:
        agg.dropna(inplace=True)
    return agg
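To make the effect of shift concrete, here is a minimal sketch on a made-up five-point series. The column name `y` and the values are illustrative assumptions, not the article's data. It builds by hand the frame that a call with n_in=1 and n_out=2 would produce for a single variable.

```python
import pandas as pd

# Toy univariate series; the values are made up for illustration.
df = pd.DataFrame({'y': [10, 20, 30, 40, 50]})

# shift(1) moves values down one row (the previous step),
# shift(-1) moves them up one row (the next step).
framed = pd.concat([df.shift(1), df, df.shift(-1)], axis=1)
framed.columns = ['y(t-1)', 'y(t)', 'y(t+1)']

# The first and last rows contain NaN after shifting and are dropped.
framed = framed.dropna()
print(framed)
```

Each remaining row is one supervised sample: the first column is the input and the last two columns are the targets.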

3.3 Define a function that prepares the data

def prepare_data(filepath, n_in, n_out=30, n_vars=4, train_proportion=0.8):
    # Read the dataset
    dataset = read_csv(filepath, encoding='utf-8')
    # Use the date column as a timestamp index
    dataset['日期'] = pd.to_datetime(dataset['日期'])
    dataset.set_index("日期", inplace=True)
    values = dataset.values
    # Make sure all values are float32
    values = values.astype('float32')
    # Normalize every variable to [0, 1]
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(values)
    # Convert the time series problem into a supervised learning problem
    reframed = series_to_supervised(scaled, n_in, n_out)
    # Keep all lagged variables as inputs and only var1 (the target) as outputs.
    # The inputs are ordered lag by lag, from t-1 back to t-n_in.
    contain_vars = []
    for i in range(1, n_in+1):
        contain_vars += [('var%d(t-%d)' % (j, i)) for j in range(1, n_vars+1)]
    data = reframed[contain_vars + ['var1(t)'] + [('var1(t+%d)' % (j)) for j in range(1, n_out)]].copy()
    # Rename the columns, in the same lag-by-lag order as the selection above
    col_names = ['Y', 'X1', 'X2', 'X3']
    contain_vars = []
    for i in range(1, n_in+1):
        contain_vars += [('%s(t-%d)' % (col_names[j], i)) for j in range(n_vars)]
    data.columns = contain_vars + ['Y(t)'] + [('Y(t+%d)' % (j)) for j in range(1, n_out)]
    # Split the dataset into a training set and a test set
    values = data.values
    n_train = round(data.shape[0]*train_proportion)
    train = values[:n_train, :]
    test = values[n_train:, :]
    # Split into inputs X and outputs y
    train_X, train_y = train[:, :n_in*n_vars], train[:, n_in*n_vars:]
    test_X, test_y = test[:, :n_in*n_vars], test[:, n_in*n_vars:]
    # Reshape X into the LSTM input format [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], n_in, n_vars))
    test_X = test_X.reshape((test_X.shape[0], n_in, n_vars))
    return scaler, data, train_X, train_y, test_X, test_y, dataset
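The final reshape only gives a meaningful [samples, timesteps, features] array when the flat input columns are grouped lag by lag, with all variables of one lag next to each other. The following sketch checks this on a made-up row; the numbers are hypothetical labels in which the first digit is the lag and the second digit is the variable.

```python
import numpy as np

n_in, n_vars = 3, 2

# One sample whose columns are grouped lag by lag:
# [var1(t-1), var2(t-1), var1(t-2), var2(t-2), var1(t-3), var2(t-3)]
flat = np.array([[11, 12, 21, 22, 31, 32]])

# Same reshape as in prepare_data: [samples, timesteps, features]
reshaped = flat.reshape((flat.shape[0], n_in, n_vars))
print(reshaped)
```

Each row of `reshaped[0]` now holds all variables for one lag. If the columns were grouped by variable instead, the same reshape would silently mix variables and lags within a timestep.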

3.4 Define a function that fits the LSTM model

def fit_lstm(data_prepare, n_neurons=50, n_batch=72, n_epoch=100, loss='mae', optimizer='adam', repeats=1):
    train_X = data_prepare[2]
    train_y = data_prepare[3]
    test_X = data_prepare[4]
    test_y = data_prepare[5]
    model_list = []
    for i in range(repeats):
        # Design the network: one LSTM layer followed by one Dense output per forecast step
        model = Sequential()
        model.add(LSTM(n_neurons, input_shape=(train_X.shape[1], train_X.shape[2])))
        model.add(Dense(train_y.shape[1]))
        model.compile(loss=loss, optimizer=optimizer)
        # Fit the network, using the test set as validation data
        history = model.fit(train_X, train_y, epochs=n_epoch, batch_size=n_batch, validation_data=(test_X, test_y), verbose=0, shuffle=False)
        # Plot the learning curves
        pyplot.plot(history.history['loss'], color='blue', label='train')
        pyplot.plot(history.history['val_loss'], color='yellow', label='test')
        # Keep the fitted model
        model_list.append(model)
    pyplot.legend(["train", "test"])
    pyplot.show()
    return model_list

3.5 Define the prediction function

def lstm_predict(model, data_prepare):
    scaler = data_prepare[0]
    test_X = data_prepare[4]
    test_y = data_prepare[5]
    # Make the predictions
    yhat = model.predict(test_X)
    # Map the test-set predictions back to the original scale.
    # Every output column is a value of Y, so only the scaling
    # parameters of the first column (Y) are needed.
    scale_new = MinMaxScaler()
    scale_new.min_, scale_new.scale_ = scaler.min_[0], scaler.scale_[0]
    inv_yhat = scale_new.inverse_transform(yhat)
    # Map the actual test-set values back to the original scale
    inv_y = scale_new.inverse_transform(test_y)
    return inv_yhat, inv_y
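The single-column inverse works because min-max scaling is an affine map applied column by column. The sketch below reproduces that arithmetic with plain NumPy on made-up numbers. The names `scale` and `offset` are hypothetical local variables that play the role of scaler.scale_[0] and scaler.min_[0] for feature_range=(0, 1).

```python
import numpy as np

# Made-up target column (think of it as Y before scaling).
y = np.array([100.0, 150.0, 200.0])

# For feature_range=(0, 1) the forward map is: scaled = y * scale + offset
scale = 1.0 / (y.max() - y.min())
offset = -y.min() * scale
y_scaled = y * scale + offset

# A fake two-step forecast in scaled units. Every entry is a value of Y,
# so the same scale and offset apply to every column.
forecast_scaled = np.array([[0.0, 0.5], [0.5, 1.0]])
forecast = (forecast_scaled - offset) / scale
print(forecast)
```

Applying the inverse with only the first column's parameters is what lets a 30-column forecast matrix be rescaled even though the scaler was fitted on four input columns.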

3.6 Define the forecast evaluation function (RMSE)

# Compute the RMSE of each forecast step
def evaluate_forecasts(test, forecasts, n_out):
    rmse_dic = {}
    for i in range(n_out):
        # Column i holds the actual and predicted values i+1 steps ahead
        actual = [float(row[i]) for row in test]
        predicted = [float(forecast[i]) for forecast in forecasts]
        rmse = sqrt(mean_squared_error(actual, predicted))
        rmse_dic['t+' + str(i+1) + ' RMSE'] = rmse
    return rmse_dic
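The per-step RMSE is a column-wise statistic, so it can also be computed in one vectorized expression. This is a small sketch on made-up arrays with two forecast steps; the array names and values are assumptions for illustration.

```python
import numpy as np

# Rows are test samples, columns are forecast steps (t+1, t+2).
actual = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
forecast = np.array([[1.0, 3.0], [3.0, 5.0], [5.0, 7.0]])

# Average the squared errors over samples (axis=0), then take the root.
rmse_per_step = np.sqrt(np.mean((actual - forecast) ** 2, axis=0))
print(rmse_per_step)
```

The first step is predicted exactly and the second is off by 1 in every sample, so the two entries differ accordingly.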

3.7 Define a function that visualizes the forecasts

# Plot the forecasts on top of the original data
def plot_forecasts(series, forecasts):
    # Plot the original dataset in blue
    pyplot.plot(series.values, color='blue')
    n_seq = len(forecasts[0])
    # Plot each forecast in red, starting from the last observed value
    for i in range(1, len(forecasts)+1):
        xaxis = [x for x in range(i, i+n_seq+1)]
        yaxis = [float(series.iloc[i-1,0])] + list(forecasts[i-1])
        pyplot.plot(xaxis, yaxis, color='red')
    # Show the figure
    pyplot.show()

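4. Building the LSTM Model

4.1 Build the model (n_in = 15, n_neuron = 5, n_batch = 16, n_epoch = 200)

To reduce randomness, the model is built five times and the average of the five results is used as the final forecast.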

# Define the required variables
filepath = r'C:\Users\87689\Desktop\國貿實習\Premium\導出文件.csv'
n_in = 15
n_out = 30
n_vars = 4
n_neuron = 5
n_batch = 16
n_epoch = 200
repeats = 5
inv_yhat_list = []
inv_y_list = []

data_prepare = prepare_data(filepath, n_in, n_out)
scaler, data, train_X, train_y, test_X, test_y, dataset = data_prepare
model_list = fit_lstm(data_prepare, n_neuron, n_batch, n_epoch, repeats=repeats)
for model in model_list:
    # Predict once per model and keep both the forecasts and the actual values
    inv_yhat, inv_y = lstm_predict(model, data_prepare)
    inv_yhat_list.append(inv_yhat)
    inv_y_list.append(inv_y)

Figure 3: Model training results

Compute the averaged forecast.

# Sum the forecasts of all runs, then divide by the number of runs
inv_yhat_ave = np.zeros(inv_y.shape)
for i in range(repeats):
    inv_yhat_ave += inv_yhat_list[i]

inv_yhat_ave = inv_yhat_ave/repeats
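The same element-wise average can be written by stacking the per-run forecasts and taking the mean along the run axis. This is a sketch on made-up one-sample, two-step forecasts from three hypothetical runs.

```python
import numpy as np

# Forecasts from three hypothetical runs, each of shape (samples, steps).
preds = [
    np.array([[1.0, 2.0]]),
    np.array([[3.0, 4.0]]),
    np.array([[5.0, 9.0]]),
]

# Stack to shape (runs, samples, steps) and average over the runs.
ave = np.mean(np.stack(preds), axis=0)
print(ave)
```

The result has the same shape as a single run's forecast, so it can be passed to the same evaluation and plotting functions.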

4.2 Model evaluation

Compute the per-step forecast RMSE of each of the five forecasts inv_yhat and of the final averaged forecast inv_yhat_ave.

rmse_dic_list = []
for i in range(len(model_list)):
    inv_yhat = inv_yhat_list[i]
    inv_y = inv_y_list[i]
    rmse_dic = evaluate_forecasts(inv_y, inv_yhat, n_out)
    rmse_dic_list.append(rmse_dic)

# Append the RMSE of the averaged forecast as the last entry
rmse_dic_list.append(evaluate_forecasts(inv_y, inv_yhat_ave, n_out))

df_dic = {}
for i in range(len(rmse_dic_list) - 1):
    df_dic['Run %d' % (i+1)] = pd.Series(rmse_dic_list[i])

# The last entry belongs to the averaged forecast
df_dic['Average'] = pd.Series(rmse_dic_list[-1])
rmse_df = DataFrame(df_dic)
rmse_df

Table 1: Forecast RMSE by step

Step	Run 1	Run 2	Run 3	Run 4	Run 5	Average
t+1 RMSE	6.054318	5.910827	6.574757	5.514524	5.930769	5.608112
t+2 RMSE	6.799496	5.919717	7.129066	5.930466	6.346187	6.065158
t+3 RMSE	6.961642	6.410686	7.275462	6.546540	6.857233	6.618044
t+4 RMSE	7.707383	6.725169	7.962916	7.179770	7.364481	7.175948
t+5 RMSE	8.543821	7.359477	8.997917	7.591111	8.004835	7.831434
t+6 RMSE	8.826944	8.790049	9.158018	8.390160	8.501117	8.547602
t+7 RMSE	9.372653	8.842808	9.847240	8.818847	8.995596	9.000587
t+8 RMSE	9.869172	9.206043	10.790298	9.389183	9.621195	9.657852
t+9 RMSE	10.224256	10.113056	11.275415	9.925362	10.320555	10.177834
t+10 RMSE	10.730779	10.613619	11.738241	10.547615	10.858770	10.750603
t+11 RMSE	11.217751	11.153954	12.502412	11.135731	11.692412	11.458523
t+12 RMSE	11.975135	12.117445	13.382915	11.581188	12.537667	12.201281
t+13 RMSE	12.431174	12.648199	14.090454	12.092397	12.351822	12.579149
t+14 RMSE	13.060560	13.141831	14.255130	12.452494	13.279099	13.074444
t+15 RMSE	13.879692	13.775036	15.346088	13.138019	13.793227	13.845060
t+16 RMSE	14.670389	14.491605	16.118887	13.422946	14.143394	14.421311
t+17 RMSE	15.237153	15.503956	16.679258	13.896419	14.108534	14.909649
t+18 RMSE	15.390742	15.054655	16.010253	14.280028	15.037541	15.060793
t+19 RMSE	16.030557	15.657191	16.441218	14.518066	15.145838	15.417519
t+20 RMSE	15.671409	15.547522	16.150036	14.704495	15.150355	15.359049
t+21 RMSE	15.757660	15.787190	16.898795	14.751765	15.152527	15.546924
t+22 RMSE	15.208240	17.009959	16.376563	14.852501	15.567227	15.670801
t+23 RMSE	15.162898	16.254727	16.888183	15.513842	15.526986	15.738396
t+24 RMSE	15.516408	16.282854	16.762683	15.109268	15.232268	15.642263
t+25 RMSE	15.338432	16.417778	16.450235	15.218523	15.879392	15.764465
t+26 RMSE	15.215581	16.714160	16.687302	15.704890	15.650851	15.860303
t+27 RMSE	15.410981	16.711455	17.264010	15.838977	15.801374	16.052185
t+28 RMSE	15.969366	16.453532	16.564865	15.876696	16.078542	16.110161
t+29 RMSE	16.291474	16.447350	16.771656	15.870124	16.276672	16.236876
t+30 RMSE	16.122111	16.578749	16.643779	16.260352	16.013707	16.189853

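Next, compute the mean per-step forecast error rate of the final averaged forecast inv_yhat_ave. On average, the forecasts are higher than the actual values.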

s = inv_yhat_ave[0].shape
erro_rate = np.zeros(s)
# Accumulate the signed relative error of every test sample, step by step
for i in range(len(inv_y)):
    erro_rate += inv_yhat_ave[i]/inv_y[i]-1

# Average over the test samples
erro_rate_ave = erro_rate/len(inv_y)
err_df = DataFrame(pd.Series(erro_rate_ave))
err_df.columns = ['Mean forecast error rate']
err_df.index = ['%d-step-ahead forecast' % (i+1) for i in range(n_out)]
err_df
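
Table 2: Mean forecast error rate by step

Step	Mean forecast error rate
1-step-ahead forecast	0.046550
2-step-ahead forecast	0.047578
3-step-ahead forecast	0.050722
4-step-ahead forecast	0.052867
5-step-ahead forecast	0.059091
6-step-ahead forecast	0.063377
7-step-ahead forecast	0.064786
8-step-ahead forecast	0.064920
9-step-ahead forecast	0.065614
10-step-ahead forecast	0.066760
11-step-ahead forecast	0.074791
12-step-ahead forecast	0.077122
13-step-ahead forecast	0.076288
14-step-ahead forecast	0.076228
15-step-ahead forecast	0.085402
16-step-ahead forecast	0.089160
17-step-ahead forecast	0.090592
18-step-ahead forecast	0.096903
19-step-ahead forecast	0.099449
20-step-ahead forecast	0.096841
21-step-ahead forecast	0.099562
22-step-ahead forecast	0.100283
23-step-ahead forecast	0.093786
24-step-ahead forecast	0.091290
25-step-ahead forecast	0.091107
26-step-ahead forecast	0.086526
27-step-ahead forecast	0.086098
28-step-ahead forecast	0.085978
29-step-ahead forecast	0.085980
30-step-ahead forecast	0.077837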
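The error rate above is a signed quantity: it measures bias, so over-forecasts and under-forecasts can cancel each other out. The sketch below, on made-up numbers, contrasts it with the mean absolute percentage error, which does not cancel. The array names are assumptions for illustration, and the absolute variant is an addition that is not computed in the code above.

```python
import numpy as np

# Rows are test samples, columns are forecast steps.
y = np.array([[100.0, 200.0], [100.0, 200.0]])
yhat = np.array([[110.0, 190.0], [90.0, 210.0]])

# Signed relative error, averaged over samples (same quantity as erro_rate_ave).
bias = (yhat / y - 1).mean(axis=0)

# Absolute relative error, averaged over samples.
mape = np.abs(yhat / y - 1).mean(axis=0)

print(bias, mape)
```

A positive signed value, as in Table 2, therefore indicates systematic over-forecasting rather than the typical size of the error.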

4.3 Visualizing the forecasts

The first ten samples of the test set

dataset = data_prepare[6]
test_X = data_prepare[4]
n_real = len(dataset)-len(test_X)-len(inv_yhat_ave[0])
# Plot one extra point
y_real = DataFrame(dataset['Y'][n_real:n_real+10+30])
plot_forecasts(y_real, inv_yhat_ave[0:10])

Figure 4: Forecast visualization 1

The whole test set

n_real = len(dataset)-len(test_X)-len(inv_yhat_ave[0])
# Plot one extra point
y_real = DataFrame(dataset['Y'][n_real:])
plot_forecasts(y_real, inv_yhat_ave)

Figure 5: Forecast visualization 2

4.4 Exporting the results

Yhat

pre_df = DataFrame(inv_yhat_ave)
# Format the timestamps so that only the date is shown
date_index = dataset.index[n_in-1+len(train_X)-1:n_in-1+len(train_X)+len(test_X)-1]
pydate_array = date_index.to_pydatetime()
date_only_array = np.vectorize(lambda s: s.strftime('%Y-%m-%d'))(pydate_array)
date_only_series = pd.Series(date_only_array)
pre_df = pre_df.set_index(date_only_series)
names_columns = ['%d periods ahead' % (i+1) for i in range(n_out)]
pre_df.columns = names_columns
# Round to two decimal places
pre_df = pre_df.round(decimals=2)
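As a possible simplification, a pandas DatetimeIndex can format itself as date-only strings without going through Python datetime objects. This is a sketch on a made-up three-day index; the variable names and dates are assumptions for illustration.

```python
import pandas as pd

# Toy daily index standing in for a slice of dataset.index.
idx = pd.date_range('2020-01-01', periods=3, freq='D')

# Format every timestamp as a date-only string.
date_only = idx.strftime('%Y-%m-%d')
print(list(date_only))
```

The resulting index of strings can be passed to set_index in the same way as date_only_series.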

????Y

actual_df = DataFrame(inv_y)
names_columns = ['%d periods ahead' % (i+1) for i in range(n_out)]
actual_df.columns = names_columns
actual_df = actual_df.set_index(date_only_series)
actual_df = actual_df.round(decimals=2)

Export to xlsx

# The context manager saves and closes the workbook on exit
with pd.ExcelWriter('Y-結果導出.xlsx') as writer:
    pre_df.to_excel(writer, sheet_name="Yhat")
    actual_df.to_excel(writer, sheet_name="Y")
