Forecasting Football Fever
By Aditya Pethe
From September to January every year, football takes over America. Games dominate TV on Sunday and Monday nights, and my brother tears his hair out each week over his consistently underperforming fantasy teams. The hype seems to reach an unbearable level by the time the playoffs roll around.
But is there a way to measure and forecast that hype? I decided to use one of my favorite NFL players, Peyton Manning, to explore seasonality in Deephaven’s Jupyter Notebooks. Using a dataset of daily page views for Manning’s Wikipedia article over an 8-year period from 2008 to 2016, my goal was to break down how football hype evolved throughout the season.
To do this, I decided to take two approaches to analyzing seasonality. The first was the traditional ARIMA model, and the second was the newer Fbprophet library. I would use both these methods to fit, predict, and validate models to see which was better at understanding NFL hype.
OUR DATA
We can plot our data in Deephaven with the following code:
At a high level, our data is the log-transformed daily count of Wikipedia page views for Peyton Manning, recorded over about 8 years. The data appears to exhibit some strong seasonal trends that we can look into.
Additionally, before we begin breaking down our data, we want a consistent way to visualize our forecasts. We can produce a function that takes our training, testing, and any forecast data and plots it with Deephaven. This allows us to combine analysis from multiple libraries and methods with Deephaven’s powerful and interactive plotting.
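The Deephaven plotting helper itself is not reproduced here, so the sketch below covers only the step it relies on: splitting the daily series chronologically into training and testing portions. The function name `chronological_split`, the 365-day holdout, and the toy values are assumptions for illustration, not the author's code.

```python
def chronological_split(series, holdout=365):
    """Split a time-ordered list into (train, test) without shuffling.

    The last `holdout` observations become the test set, so the model is
    always evaluated on data that comes after everything it was fit on.
    """
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]


# Toy stand-in for ~8 years of daily log page views (hypothetical values).
daily_log_views = [8.0 + 0.001 * day for day in range(2900)]
train, test = chronological_split(daily_log_views, holdout=365)
print(len(train), len(test))  # 2535 365
```

Keeping the split chronological matters for time series: a random split would let the model see observations from the period it is later asked to forecast.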
ARIMA
ARIMA stands for autoregressive integrated moving average.
The autoregressive (AR) component of the model is a linear combination of the previous N lagged observations. For our Peyton Manning model, this means some linear combination of the observations from the previous N periods (days, weeks, months, or years).
The moving average (MA) component of the model is a linear combination of the error terms from the previous N lags.
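The equations that accompanied these two definitions are not reproduced here. In standard textbook notation, which is an assumption about what the original showed, an AR model of order p and an MA model of order q can be written as:

```latex
% AR(p): the current value is a linear combination of the previous p values
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t

% MA(q): the current value is a linear combination of the previous q error terms
y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
```

Here $\phi_i$ and $\theta_j$ are the coefficients to be estimated, $\varepsilon_t$ is a white-noise error term, and $c$ and $\mu$ are constants.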
The ARIMA model will estimate the coefficients for both these linear combinations, given three parameters as input:
p: The order of the autoregressive model (the number of lagged terms), described in the AR equation above.
q: The order of the moving average model (the number of lagged terms), described in the MA equation above.
d: The number of differences required to make the time series stationary. A stationary time series is essentially a time series without a time-dependent trend, excluding the seasonality.
A stationary series and a nonstationary one may both exhibit seasonal patterns; what sets the nonstationary series apart is that its level also drifts over time.
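To make the role of d concrete, here is a minimal sketch of differencing in plain Python. The function name `difference` and the sample values are hypothetical; the point is only that each round of differencing replaces the series with its step-to-step changes, which removes a trend.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'_t = y_t - y_{t-1}."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series


# A series with a steadily accelerating upward trend (hypothetical values).
trending = [1, 3, 6, 10, 15]
print(difference(trending, d=1))  # [2, 3, 4, 5]  -> trend still visible
print(difference(trending, d=2))  # [1, 1, 1]     -> constant, no trend left
```

A series that is already stationary needs no differencing at all, which corresponds to d = 0.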
Now that we know what parameters we need to find, we can analyze our Peyton Manning data. At first glance, our data seems stationary. There doesn’t appear to be a time-dependent trend outside seasonal fluctuations, but we can test for this using the Augmented Dickey-Fuller Test.
Our test returns a p-value well below the significance level, so we can reject the null hypothesis of a unit root and treat our series as stationary. Our parameter value for d is therefore zero.
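For reference, the regression behind the Augmented Dickey-Fuller test is commonly written as follows. The symbols are the usual textbook ones rather than anything taken from the original post:

```latex
\Delta y_t = \alpha + \beta t + \gamma \, y_{t-1}
           + \sum_{i=1}^{k} \delta_i \, \Delta y_{t-i} + \varepsilon_t
```

The null hypothesis is $\gamma = 0$, meaning the series has a unit root and is nonstationary. A small p-value is evidence against that null, which is why a low p-value supports setting d to zero.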
Now we need to find the parameter values of p and q. In order to do this, I used autocorrelation plots. Autocorrelation and partial autocorrelation plots show how strongly lagged terms are correlated with a given observation. While partial autocorrelation plots show the correlation with a lag term independent of other lags, autocorrelation plots factor in the “inertia” from other lags. Because of this, we can use partial autocorrelation to estimate our parameter for p, and autocorrelation to estimate our parameter for q.
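As a rough illustration of what an autocorrelation plot measures, the sketch below computes the sample autocorrelation at a few lags for a made-up series with a weekly spike. The function name `autocorrelation` and the data are hypothetical, and the partial autocorrelation, which additionally removes the influence of the intermediate lags, is left out for brevity.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Hypothetical series with a spike every 7th day, mimicking weekly game nights.
weekly = [10.0 if day % 7 == 0 else 1.0 for day in range(140)]
for lag in (1, 6, 7, 14):
    print(lag, round(autocorrelation(weekly, lag), 3))
```

On data like this, the lags that are multiples of 7 stand out with strongly positive values while the lags in between are negative, which is the kind of periodic pattern to look for in the plots.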
Both plots show periodic behavior in the lags, with a period of around 7 days. This makes sense: Peyton Manning search frequency probably increases on game nights, when football is being played. In fact, these autocorrelation plots even show a slight 6-day correlation, which is likely due to Sunday night football. But since the 7-day lags have the highest correlation with the observed value, we can estimate both p and q to be 7.
I should note that these autocorrelation plots exposed a problem: the ARIMA parameters did not allow lag orders above roughly 10, which meant that modeling annual (365-day) or monthly (30-day) seasonality would be very difficult.
Now that we have our parameters, we can produce our ARIMA model.
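The fitting code is not reproduced here. As a rough illustration of what a fitted model with d = 0 computes, the sketch below forms a one-step-ahead prediction from AR and MA coefficients. The function name `arma_one_step` and every coefficient value are hypothetical; a real library would estimate the coefficients from the training data.

```python
def arma_one_step(history, errors, phi, theta, const=0.0):
    """One-step-ahead ARMA(p, q) prediction from known coefficients.

    history: past observations, oldest first (needs at least len(phi) values)
    errors:  past one-step forecast errors, oldest first (at least len(theta))
    phi:     AR coefficients, phi[0] applies to the most recent observation
    theta:   MA coefficients, theta[0] applies to the most recent error
    """
    ar_part = sum(coef * history[-(i + 1)] for i, coef in enumerate(phi))
    ma_part = sum(coef * errors[-(j + 1)] for j, coef in enumerate(theta))
    return const + ar_part + ma_part


# Hypothetical coefficients and data, chosen only to make the arithmetic easy.
# 1.0 + (0.5 * 10 + 0.3 * 9) + 0.4 * (-0.2) = 8.62, up to floating-point rounding.
prediction = arma_one_step(
    history=[8.0, 9.0, 10.0],
    errors=[0.5, -0.2],
    phi=[0.5, 0.3],   # p = 2
    theta=[0.4],      # q = 1
    const=1.0,
)
print(prediction)
```

With p = q = 7, as estimated above, the same computation would simply use seven coefficients in each list.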
Before we make our forecasts, we can check our model’s assumptions of constant variance and normally distributed residuals with a residual plot and a density plot.
Since the residuals appear to be randomly distributed, and the kernel probability density plot appears normal, our model assumptions check out.
Plotting our model yields the following:
As we can see, not having access to the other scales of seasonality hurts this model’s viability. Not being able to capture multiple seasonal trends means that ARIMA is limited to one seasonality at a time. Regardless, we can compute some error metrics to validate our model.
MSE (mean squared error): 0.8916776825661407
MAPE (mean absolute percentage error): 0.10230290573107942
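For readers unfamiliar with these two metrics, here is a minimal sketch of how they can be computed in plain Python. The function names and sample values are hypothetical, and MAPE is returned as a fraction rather than a percentage, which matches the scale of the figures reported above.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, returned as a fraction.

    Undefined when any actual value is zero.
    """
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical log page views and forecasts.
actual = [8.0, 4.0, 2.0, 4.0]
predicted = [6.0, 4.0, 3.0, 5.0]
print(mean_squared_error(actual, predicted))              # 1.5
print(mean_absolute_percentage_error(actual, predicted))  # 0.25
```

MSE penalizes large misses heavily because errors are squared, while MAPE expresses the typical miss relative to the size of the actual value.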
SARIMA
We can actually validate our ARIMA model using the auto-SARIMA model from pmdarima. The auto-SARIMA model estimates the parameter values for p, q, and d for us so there is no need for the prelude above. In addition, SARIMA takes m, the period of seasonality, as a parameter. Unfortunately, the model parameter limitations again constrain us to m < 10, so we may only look at weekly seasonality.
Fitting and plotting our model gives us the following:
Lastly, we can validate our model with error metrics:
MSE (mean squared error): 0.8916776825661407
MAPE (mean absolute percentage error): 0.10789283997956421
We see that our SARIMA model performed nearly identically to our ARIMA model, and in fact our ARIMA model gave a slightly lower mean absolute percentage error than SARIMA. This suggests that the parameters we picked by hand for ARIMA were a reasonable choice.
PROPHET
For our final model, we will be using Fbprophet.
Fbprophet is a library from Facebook intended to handle seasonal time-series datasets. Prophet implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. In general, using Prophet requires much less hands-on work than our ARIMA model, and for the most part, we can feed our data directly into Prophet like so:
This allows us to forecast one year ahead, and compare actual data with expected values and their boundaries.
In addition, Prophet allows us to break down this data into seasonal components:
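These components correspond to the terms of the additive model that Prophet fits. In the notation used by the Prophet documentation and paper (the symbols below come from there, not from the original post):

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here $g(t)$ is the trend, $s(t)$ is the periodic seasonality (weekly and yearly in this case), $h(t)$ captures holiday effects, and $\varepsilon_t$ is the remaining error.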
Manning’s page views peaked in 2012–2013, his MVP year. Unsurprisingly, Monday night football is when most fans look Manning up, and the monthly seasonal breakdown shows the crazy highs of December and March in stark contrast to the great drought of the summer.
Prophet can do even more: it can mark changepoints in the data, the points where the trend is most likely to shift.
With this feature, Prophet roughly estimates the start and end of the season, especially capturing the window of the playoffs.
By the eye test alone, our Prophet models look much better and more coherent than ARIMA. But we can again validate the model predictions using MSE and MAPE.
MSE (mean squared error): 0.35800021765342394
MAPE (mean absolute percentage error): 0.059460265364126956
CONCLUSION
Both error metrics clearly point to Prophet as the more accurate model. For large time-series data with multiple seasonalities, ARIMA has many shortcomings. Simply regressing on previous lags to estimate future values won’t cut it for more complex time-series datasets. ARIMA may be useful for more limited datasets with simpler seasonal effects, but for things like sensor data, page views, or energy consumption, complex nonlinear models like Prophet are required to make predictions.
Deephaven’s integration with Jupyter Notebooks lets users run unique, library-specific plotting methods and operations side by side with Deephaven features. Deephaven’s plotting in particular provides user-friendly, interactive visualization options when used in conjunction with new, cutting-edge libraries like fbprophet.
Translated from: https://medium.com/dev-genius/forecasting-football-fever-fe46fa779b69