多元时间序列回归模型_多元时间序列分析和预测:将向量自回归(VAR)模型应用于实际的多元数据集...
多元時間序列回歸模型
Multivariate Time Series Analysis
多元時間序列分析
A univariate time series data contains only one single time-dependent variable while a multivariate time series data consists of multiple time-dependent variables. We generally use multivariate time series analysis to model and explain the interesting interdependencies and co-movements among the variables. In the multivariate analysis — the assumption is that the time-dependent variables not only depend on their past values but also show dependency between them. Multivariate time series models leverage the dependencies to provide more reliable and accurate forecasts for a specific given data, though the univariate analysis outperforms multivariate in general[1]. In this article, we apply a multivariate time series method, called Vector Auto Regression (VAR) on a real-world dataset.
單變量時間序列數(shù)據(jù)僅包含一個時間相關(guān)的變量,而多元時間序列數(shù)據(jù)則包含多個時間相關(guān)的變量。 我們通常使用多元時間序列分析來建模和解釋變量之間有趣的相互依存關(guān)系和共同運動。 在多變量分析中,假定時間相關(guān)變量不僅取決于它們的過去值,而且還顯示它們之間的依賴關(guān)系。 多元時間序列模型利用依存關(guān)系為特定的給定數(shù)據(jù)提供更可靠,更準(zhǔn)確的預(yù)測,盡管單變量分析通常優(yōu)于多元變量[1]。 在本文中,我們在現(xiàn)實世界的數(shù)據(jù)集上應(yīng)用了一種稱為向量自動回歸(VAR)的多元時間序列方法。
Vector Auto Regression (VAR)
向量自回歸(VAR)
VAR model is a stochastic process that represents a group of time-dependent variables as a linear function of their own past values and the past values of all the other variables in the group.
VAR模型是一個隨機過程,將一組時間相關(guān)變量表示為它們自己的過去值以及該組中所有其他變量的過去值的線性函數(shù)。
For instance, we can consider a bivariate time series analysis that describes a relationship between hourly temperature and wind speed as a function of past values [2]:
例如,我們可以考慮一個雙變量時間序列分析,該分析描述了每小時溫度和風(fēng)速之間的關(guān)系,該關(guān)系是過去值的函數(shù)[2]:
temp(t) = a1 + w11* temp(t-1) + w12* wind(t-1) + e1(t-1)
temp(t)= a1 + w11 * temp(t-1)+ w12 * wind(t-1)+ e1(t-1)
wind(t) = a2 + w21* temp(t-1) + w22*wind(t-1) +e2(t-1)
wind(t)= a2 + w21 * temp(t-1)+ w22 * wind(t-1)+ e2(t-1)
where a1 and a2 are constants; w11, w12, w21, and w22 are the coefficients; e1 and e2 are the error terms.
其中a1和a2是常數(shù); w11,w12,w21和w22是系數(shù); e1和e2是誤差項。
Dataset
數(shù)據(jù)集
Statmodels is a python API that allows users to explore data, estimate statistical models, and perform statistical tests [3]. It contains time series data as well. We download a dataset from the API.
Statmodels是python API,允許用戶瀏覽數(shù)據(jù),估計統(tǒng)計模型并執(zhí)行統(tǒng)計測試[3]。 它還包含時間序列數(shù)據(jù)。 我們從API下載數(shù)據(jù)集。
To download the data, we have to install some libraries and then load the data:
要下載數(shù)據(jù),我們必須安裝一些庫,然后加載數(shù)據(jù):
import pandas as pdimport statsmodels.api as sm
from statsmodels.tsa.api import VAR
data = sm.datasets.macrodata.load_pandas().data
data.head(2)
The output shows the first two observations of the total dataset:
輸出顯示了總數(shù)據(jù)集的前兩個觀察值:
A snippet of the dataset數(shù)據(jù)集的摘要The data contains a number of time-series data, we take only two time-dependent variables “realgdp” and “realdpi” for experiment purposes and use “year” columns as the index of the data.
數(shù)據(jù)包含許多時間序列數(shù)據(jù),出于實驗?zāi)康?#xff0c;我們僅采用兩個與時間相關(guān)的變量“ realgdp”和“ realdpi”,并使用“ year”列作為數(shù)據(jù)索引。
data1 = data[["realgdp", 'realdpi']]data1.index = data["year"]
output:
輸出:
A snippet of the data數(shù)據(jù)片段Let's visualize the data:
讓我們可視化數(shù)據(jù):
data1.plot(figsize = (8,5))Both of the series show an increasing trend over time with slight ups and downs.
這兩個系列都顯示出隨著時間的推移呈上升趨勢,并有輕微的起伏。
Stationary
固定式
Before applying VAR, both the time series variable should be stationary. Both the series are not stationary since both the series do not show constant mean and variance over time. We can also perform a statistical test like the Augmented Dickey-Fuller test (ADF) to find stationarity of the series using the AIC criteria.
應(yīng)用VAR之前,兩個時間序列變量均應(yīng)為固定值。 兩個序列都不是平穩(wěn)的,因為兩個序列都沒有顯示出恒定的均值和隨時間變化。 我們還可以執(zhí)行統(tǒng)計測試(如增強迪基-富勒檢驗(ADF)),以使用AIC標(biāo)準(zhǔn)查找系列的平穩(wěn)性。
adfuller_test = adfuller(data1['realgdp'], autolag= "AIC")print("ADF test statistic: {}".format(adfuller_test[0]))
print("p-value: {}".format(adfuller_test[1]))
output:
輸出:
Result of the statistical test for stationarity平穩(wěn)性統(tǒng)計檢驗的結(jié)果In both cases, the p-value is not significant enough, meaning that we can not reject the null hypothesis and conclude that the series are non-stationary.
在這兩種情況下,p值都不足夠顯著,這意味著我們不能拒絕原假設(shè)并得出結(jié)論該序列是非平穩(wěn)的。
Differencing
差異化
As both the series are not stationary, we perform differencing and later check the stationarity.
由于兩個系列都不平穩(wěn),因此我們進行微分,然后檢查平穩(wěn)性。
data_d = data1.diff().dropna()The “realgdp” series becomes stationary after first differencing of the original series as the p-value of the test is statistically significant.
由于測試的p值具有統(tǒng)計意義,因此“ realgdp”系列在對原始系列進行第一次求差后將變得平穩(wěn)。
ADF test for one differenced realgdp dataADF測試一個不同的realgdp數(shù)據(jù)The “realdpi” series becomes stationary after first differencing of the original series as the p-value of the test is statistically significant.
由于原始的p值在統(tǒng)計上具有顯著性,因此“ realdpi”系列在與原始系列進行首次差異化處理后就變得穩(wěn)定了。
ADF test for one differenced realdpi dataADF測試一個不同的realdpi數(shù)據(jù)Model
模型
In this section, we apply the VAR model on the one differenced series. We carry-out the train-test split of the data and keep the last 10-days as test data.
在本節(jié)中,我們將VAR模型應(yīng)用于一個差分序列。 我們對數(shù)據(jù)進行火車測試拆分,并保留最后10天作為測試數(shù)據(jù)。
train = data_d.iloc[:-10,:]test = data_d.iloc[-10:,:]
Searching optimal order of VAR model
搜索VAR模型的最優(yōu)階
In the process of VAR modeling, we opt to employ Information Criterion Akaike (AIC) as a model selection criterion to conduct optimal model identification. In simple terms, we select the order (p) of VAR based on the best AIC score. The AIC, in general, penalizes models for being too complex, though the complex models may perform slightly better on some other model selection criterion. Hence, we expect an inflection point in searching the order (p), meaning that, the AIC score should decrease with order (p) gets larger until a certain order and then the score starts increasing. For this, we perform grid-search to investigate the optimal order (p).
在VAR建模過程中,我們選擇采用信息準(zhǔn)則赤池(AIC)作為模型選擇標(biāo)準(zhǔn)來進行最佳模型識別。 簡單來說,我們根據(jù)最佳AIC得分選擇VAR的階數(shù)(p)。 通常,AIC會因過于復(fù)雜而對模型進行懲罰,盡管復(fù)雜模型在某些其他模型選擇標(biāo)準(zhǔn)上可能會稍好一些。 因此,我們期望在搜索階數(shù)(p)時出現(xiàn)拐點,這意味著,隨著階數(shù)(p)的增大,AIC分?jǐn)?shù)應(yīng)減小,直到達(dá)到某個階數(shù),然后分?jǐn)?shù)才開始增加。 為此,我們執(zhí)行網(wǎng)格搜索以研究最佳階數(shù)(p)。
forecasting_model = VAR(train)results_aic = []for p in range(1,10):
results = forecasting_model.fit(p)
results_aic.append(results.aic)
In the first line of the code: we train VAR model with the training data. Rest of code: perform a for loop to find the AIC scores for fitting order ranging from 1 to 10. We can visualize the results (AIC scores against orders) to better understand the inflection point:
在代碼的第一行:我們使用訓(xùn)練數(shù)據(jù)訓(xùn)練VAR模型。 其余代碼:執(zhí)行for循環(huán)以找到適合訂單的AIC得分,范圍從1到10。我們可以可視化結(jié)果(針對訂單的AIC得分),以更好地了解拐點:
import seaborn as snssns.set()
plt.plot(list(np.arange(1,10,1)), results_aic)
plt.xlabel("Order")
plt.ylabel("AIC")
plt.show()Investigating optimal order of VAR models研究VAR模型的最佳順序
From the plot, the lowest AIC score is achieved at the order of 2 and then the AIC scores show an increasing trend with the order p gets larger. Hence, we select the 2 as the optimal order of the VAR model. Consequently, we fit order 2 to the forecasting model.
從圖中可以看到,最低的AIC得分約為2,然后,隨著p的增大,AIC得分呈上升趨勢。 因此,我們選擇2作為VAR模型的最優(yōu)順序。 因此,我們將訂單2擬合到預(yù)測模型。
let's check the summary of the model:
讓我們檢查一下模型的摘要:
results = forecasting_model.fit(2)results.summary()
The summary output contains much information:
摘要輸出包含許多信息:
Forecasting
預(yù)測
We use 2 as the optimal order in fitting the VAR model. Thus, we take the final 2 steps in the training data for forecasting the immediate next step (i.e., the first day of the test data).
我們使用2作為擬合VAR模型的最佳順序。 因此,我們在訓(xùn)練數(shù)據(jù)中采取最后的2個步驟來預(yù)測下一步(即測試數(shù)據(jù)的第一天)。
Forecasting test data預(yù)測測試數(shù)據(jù)Now, after fitting the model, we forecast for the test data where the last 2 days of training data set as lagged values and steps set as 10 days as we want to forecast for the next 10 days.
現(xiàn)在,在擬合模型之后,我們預(yù)測測試數(shù)據(jù),其中訓(xùn)練數(shù)據(jù)的最后2天設(shè)置為滯后值,步長設(shè)置為10天,因為我們希望在接下來的10天進行預(yù)測。
laaged_values = train.values[-2:]forecast = pd.DataFrame(results.forecast(y= laaged_values, steps=10), index = test.index, columns= ['realgdp_1d', 'realdpi_1d'])forecastThe output:
輸出:
First differenced forecasts一階差異預(yù)測We have to note that the aforementioned forecasts are for the one differenced model. Hence, we must reverse the first differenced forecasts into the original forecast values.
我們必須注意,上述預(yù)測是針對一種差異模型的。 因此,我們必須將最初的差異預(yù)測反轉(zhuǎn)為原始預(yù)測值。
forecast["realgdp_forecasted"] = data1["realgdp"].iloc[-10-1] + forecast_1D['realgdp_1d'].cumsum()forecast["realdpi_forecasted"] = data1["realdpi"].iloc[-10-1] + forecast_1D['realdpi_1d'].cumsum()output:
輸出:
Forecasted values for 1 differenced series and for the original series1個差異序列和原始序列的預(yù)測值The first two columns are the forecasted values for 1 differenced series and the last two columns show the forecasted values for the original series.
前兩列是1個差異序列的預(yù)測值,后兩列顯示原始序列的預(yù)測值。
Now, we visualize the original test values and the forecasted values by VAR.
現(xiàn)在,我們通過VAR可視化原始測試值和預(yù)測值。
Original and Forecasted values for realgdp and realdpirealgdp和realdpi的原始值和預(yù)測值The original realdpi and the forecasted realdpi show a similar pattern throwout the forecasted days. For realgdp: the first half of the forecasted values show a similar pattern as the original values, on the other hand, the last half of the forecasted values do not follow similar pattern.
原始的realdpi和預(yù)測的realdpi顯示出相似的模式,從而排除了預(yù)測的天數(shù)。 對于realgdp:預(yù)測值的前半部分顯示與原始值相似的模式,另一方面,預(yù)測值的后半部分沒有遵循相似的模式。
To sum up, in this article, we discuss multivariate time series analysis and applied the VAR model on a real-world multivariate time series dataset.
綜上所述,在本文??中,我們討論了多元時間序列分析,并將VAR模型應(yīng)用于實際的多元時間序列數(shù)據(jù)集。
You can also read the article — A real-world time series data analysis and forecasting, where I applied ARIMA (univariate time series analysis model) to forecast univariate time series data.
您還可以閱讀這篇文章- 真實的時間序列數(shù)據(jù)分析和預(yù)測 ,在這里我應(yīng)用了ARIMA(單變量時間序列分析模型)來預(yù)測單變量時間序列數(shù)據(jù)。
[1] https://homepage.univie.ac.at/robert.kunst/prognos4.pdf
[1] https://homepage.univie.ac.at/robert.kunst/prognos4.pdf
[2] https://www.aptech.com/blog/introduction-to-the-fundamentals-of-time-series-data-and-analysis/
[2] https://www.aptech.com/blog/introduction-to-the-fundamentals-of-time-series-data-and-analysis/
[3] https://www.statsmodels.org/stable/index.html
[3] https://www.statsmodels.org/stable/index.html
翻譯自: https://towardsdatascience.com/multivariate-time-series-forecasting-456ace675971
多元時間序列回歸模型
總結(jié)
以上是生活随笔為你收集整理的多元时间序列回归模型_多元时间序列分析和预测:将向量自回归(VAR)模型应用于实际的多元数据集...的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 做梦梦到屋子漏雨是怎么回事
- 下一篇: 梦到刑事案件有什么兆头