The Power of Bayesian Inference Estimated Using PyMC3

Inside AI
If you’ve steered clear of Bayesian regression because of its complexity, this article shows how to apply simple MCMC Bayesian inference to linear data with outliers in Python, using linear regression and Gaussian random walk priors, testing assumptions about observation errors drawn from Normal versus Student-T prior distributions, and comparing the results against ordinary least squares.
Ref: Jupyter Notebook and GitHub Repository
The beauty of Bayesian methods is their ability to generate probability estimates of the quality of derived models, based on a priori expectations of the behaviour of the model parameters. Rather than a single point solution, we obtain a distribution of solutions, each with an assigned probability. A priori expectations of the data behaviour are formulated into a set of Priors: probability distributions over the model parameters.
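As a minimal, self-contained sketch of this idea (our illustration, not code from the article), a prior on a single unknown mean is updated by a handful of observations into a posterior distribution of solutions:

```python
import pymc3 as pm

with pm.Model() as intro_model:
    # A priori expectation of the parameter's behaviour
    mu = pm.Normal('mu', mu=0, sigma=10)
    # Conditioning on three observed data points
    pm.Normal('obs', mu=mu, sigma=1, observed=[0.9, 1.1, 1.0])
    intro_trace = pm.sample(1000, tune=1000, chains=2, cores=2)

# Not a point solution: a whole distribution of candidate values for mu
print(intro_trace['mu'].mean(), intro_trace['mu'].std())
```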
The following scenario is based on Thomas Wiecki’s excellent article “GLM: Robust Linear Regression” [1] and is an exploration of the Python library PyMC3 [2] as a means to model data using Markov chain Monte Carlo (MCMC) methods in Python, with some discussion of the predictive power of the derived Bayesian models. Different priors are considered, including Normal, Student-T, Gamma and Gaussian Random Walk, with a view to modelling the sample data both using generalized linear models (GLM) and by taking a non-parametric approach, estimating observed data directly from the imputed priors.

Sample Data
Our sample data is generated as a simple linear model with given intercept and slope, with an additional random Gaussian noise term added to each observation. To make the task interesting, a small number of outliers are added to the distribution which have no algebraic relationship to the true linear model. Our objective is to model the observed linear model as closely as possible, with the greatest possible robustness to disruption by the outliers and the observation error terms.

Normal vs Student-T priors
We compare models based on Normal versus Student-T observation error priors. Wiecki shows that a Student-T prior is better at handling outliers than the Normal, as the distribution has longer tails, conveying the notion that an outlier is not to be considered wholly unexpected.
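The difference in tail mass is easy to quantify with scipy (an illustrative check of ours, not from the article): a point four standard deviations out is vanishingly rare under a Normal but quite plausible under a low-degrees-of-freedom Student-T.

```python
from scipy import stats

# Two-sided probability of an observation at least 4 sigma from the mean
print(2 * stats.norm.sf(4))        # Normal: ~6e-05
print(2 * stats.t.sf(4, df=3))     # Student-T, nu=3: ~0.03
```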
True linear model and Ordinary Least Squares (OLS) regression
```python
def add_outliers(y, outliers_x, outliers_y):
    y_observed = y
    for i in range(len(outliers_x)):
        y_observed[outliers_x[i]] = outliers_y[i]
    y_predicted = np.append(y_observed, [0] * observables)
    y_predicted = np.ma.MaskedArray(y_predicted,
                                    np.arange(y_predicted.size) >= observables)
    return y_observed, y_predicted

x_observed = np.linspace(0, 1.5, observables)
x_delta = np.linspace(1.5, 3.0, observables)
x_predicted = np.append(x_observed, x_delta)

predicted_line = true_intercept + true_slope * x_predicted

y = (true_intercept + true_slope * x_observed
     + np.random.default_rng(seed=123).normal(scale=0.5, size=observables))  # Add noise
y_observed, y_predicted = add_outliers(y, outliers_x, outliers_y)

def plot_sample_data(ax, y_min, y_max):
    ax.set_xlim(-0.04, 3.04)
    ax.set_ylim(y_min, y_max)
    plt.plot(x_predicted, y_predicted, '+', label='Sampled data', c=pal_2[5])

def plot_true_regression_line(title, loc):
    plt.plot(x_predicted, predicted_line, lw=2., label='True regression', c='orange')
    plt.axvline(x=x_predicted[observables], linestyle='--', c='b', alpha=0.25)
    plt.title(title)
    plt.legend(loc=loc)

data = dict(x=x_observed, y=y_observed)
fig = plt.figure(figsize=(fig_size_x, fig_size_y))
ax1 = fig.add_subplot(121)
plot_sample_data(ax1, y_min_compact, y_max_compact)
seaborn.regplot(x="x", y="y", data=data, marker="+", color="b",
                scatter=False, label="RMSE regression")
plot_true_regression_line('Simulated sample data - OLS Regression', 'best')

ax2 = fig.add_subplot(122)
plot_sample_data(ax2, y_min_compact, y_max_compact)
seaborn.regplot(x="x", y="y", data=data, marker="+", color="g",
                robust=True, scatter=False, label="Robust RMSE regression")
plot_true_regression_line('Simulated sample data - Robust OLS Regression', 'best')
```

We start with a simulated dataset of observations of the form y = ax + b + epsilon, with the noise term epsilon drawn from a normal distribution. Here we 1) create observation values in the range x from 0 to 1.5, based on the ‘true’ linear model, 2) add test outliers, and 3) extend the x axis from 1.5 to 3.0 so we can test the predictive power of our models in this region.
In the predictive region we have used the numpy.ma.MaskedArray() function in order to indicate to PyMC3 that we have unknown observational data in this region [10].
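A masked array simply pairs the data with a boolean mask (a small illustration of ours):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])
# Mask everything from index 3 onwards; PyMC3 treats masked entries
# as missing observations to be imputed.
y_masked = np.ma.MaskedArray(y, np.arange(y.size) >= 3)
print(y_masked)  # [1.0 2.0 3.0 -- -- --]  (masked values print as --)
```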
It is useful to visualize the simulated data, along with the true regression line and ordinary least squares regression.
The Seaborn library allows us to draw this plot directly, and we can also set a ‘robust’ flag, which calculates the regression line while de-weighting outliers.
Plot of Sample Points, true regression line, Ordinary Least Squares regression (OLS) and Robust OLS.

Bayesian Regression with PyMC3
Following the example of Wiecki, we can create linear regression models (GLM) in PyMC3, generating the linear model y(x) from the formula ‘y ~ x’.
To test the notion of robust regression, we create two models, one based on a Normal prior of observational errors and a second based on the Student-T distribution, which we expect to be less influenced by outliers.
```python
data = dict(x=x_observed, y=y_observed)

with pm.Model() as n_model:
    pm.glm.GLM.from_formula('y ~ x', data)
    trace_normal_glm = pm.sample(samples_glm, tune=tune_glm, random_seed=123,
                                 cores=2, chains=2)

with pm.Model() as t_model:
    pm.glm.GLM.from_formula('y ~ x', data, family=pm.glm.families.StudentT())
    trace_studentT_glm = pm.sample(samples_glm, tune=tune_glm, random_seed=123,
                                   cores=2, chains=2)

fig = plt.figure(figsize=(fig_size_x, fig_size_y))
ax1 = fig.add_subplot(121)
plot_sample_data(ax1, y_min_compact, y_max_compact)
pm.plot_posterior_predictive_glm(trace_normal_glm, eval=x_predicted, samples=50,
                                 label='Posterior predictive regression', c=pal_1[4])
plot_true_regression_line('Regression Fit assuming Normal error prior', 'best')

ax2 = fig.add_subplot(122)
plot_sample_data(ax2, y_min_compact, y_max_compact)
pm.plot_posterior_predictive_glm(trace_studentT_glm, eval=x_predicted, samples=50,
                                 label='Posterior predictive regression', c=pal_1[4])
plot_true_regression_line('Regression Fit assuming Student-T error prior', 'best')
```

The pymc3.sample() method allows us to sample conditioned priors. In the case of the Normal model, the default priors will be for intercept, slope and the standard deviation of epsilon. In the case of the Student-T model, the priors will be for intercept, slope and lam.
Running these two models creates trace objects containing a family of posteriors sampled for each model over all chains, conditioned on the observations, from which we can plot individual GLM regression lines.
Plot of Sample Points, true regression line and Bayesian Linear Model, with Normal and Student-T priors

Comparison of Normal vs Student-T error distributions
A summary of the parameter distributions can be obtained from pymc3.summary(trace), and we see here that the intercept from the Student-T prior is 1.155 with slope 2.879, much closer to the true values of 1 and 3 than the Normal prior's result of intercept 2.620 with slope 1.608.
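For reference, a sketch of how these figures are read off the traces from the models above (the parameter names 'Intercept' and 'x' follow PyMC3's GLM naming convention; the approximate values are those quoted in the text):

```python
summary_n = pm.summary(trace_normal_glm)
summary_t = pm.summary(trace_studentT_glm)

# The 'mean' column holds the posterior mean of each parameter
print(summary_n.loc[['Intercept', 'x'], 'mean'])  # ~2.620, ~1.608
print(summary_t.loc[['Intercept', 'x'], 'mean'])  # ~1.155, ~2.879
```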
Refinement of Student-T error distribution
It is clear from the plots above that we have improved the regression by using a Student-T prior, but we hope to do better by refining the Student-T degrees-of-freedom parameter.
```python
def create_normal_glm_model(samples, tune, x, y):
    with pm.Model() as model_normal:
        a_0 = pm.Normal('a_0', mu=1, sigma=10)
        b_0 = pm.Normal('b_0', mu=1, sigma=10)
        mu_0 = a_0 * x + b_0
        sigma_0 = pm.HalfCauchy('sigma_0', beta=10)
        y_0 = pm.Normal('y_0', mu=mu_0, sigma=sigma_0, observed=y)
        trace_normal = pm.sample(samples, tune=tune, random_seed=123,
                                 cores=2, chains=2)
        return trace_normal, pm.sample_posterior_predictive(
            trace_normal, random_seed=123)['y_0']

def create_studentT_glm_model(samples, tune, x, y):
    with pm.Model() as model_studentT:
        a_0 = pm.Normal('a_0', mu=1, sigma=10)
        b_0 = pm.Normal('b_0', mu=1, sigma=10)
        mu_0 = a_0 * x + b_0
        nu_0 = pm.Gamma('nu_0', alpha=2, beta=0.1)
        y_0 = pm.StudentT('y_0', mu=mu_0, lam=8.368, nu=nu_0, observed=y)
        trace_studentT = pm.sample(samples, tune=tune, random_seed=123,
                                   cores=2, chains=2)
        return trace_studentT, pm.sample_posterior_predictive(
            trace_studentT, random_seed=123)['y_0']

trace_normal_glm_explicit, y_posterior_normal_glm = create_normal_glm_model(
    samples_glm, tune_glm, x_observed, y_observed)
trace_studentT_glm_explicit, y_posterior_studentT_glm = create_studentT_glm_model(
    samples_glm, tune_glm, x_observed, y_observed)
```

Models: The distribution for the Normal model’s standard deviation prior is a HalfCauchy, a positive-domain Student-T with one degree of freedom [10], which mirrors our previous model.
We also need to construct a new Student-T model where, as well as parameterizing the GLM, we also create a prior on nu, the degrees-of-freedom parameter.
nu is modelled as a Gamma distribution, nu ~ gamma(alpha=2, beta=0.1), as recommended by Andrew Gelman, retaining the lam value of 8.368 from the run above.
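To get a feel for what this prior says (our check, not from the article; note that scipy uses a shape/scale parameterization with scale = 1/beta):

```python
from scipy import stats

nu_prior = stats.gamma(a=2, scale=1 / 0.1)   # Gamma(alpha=2, beta=0.1)
print(nu_prior.mean())                       # 20.0: mildly favours near-Normal tails
print(nu_prior.ppf([0.05, 0.95]))            # wide interval, so the data can pull nu low
```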
Results: From summary(trace) we can see that the intercept from the Student-T error distribution is improved to 1.144 with slope 2.906, against the previous result of 1.155 with slope 2.879.
```python
def calculate_regression_line(trace):
    a0 = trace['a_0']
    b0 = trace['b_0']
    y_pred = np.array([])
    for x_i in x_predicted:
        y_quartiles = np.percentile(a0 * x_i + b0, quartiles)
        y_pred = np.append(y_pred, y_quartiles)
    return np.transpose(y_pred.reshape(len(x_predicted), no_quartiles))

def plot_intervals(ax, y_pred, title, pal, loc, y_min, y_max):
    plt.fill_between(x_predicted, y_pred[0, :], y_pred[4, :], alpha=0.5, color=pal[1])
    plt.fill_between(x_predicted, y_pred[1, :], y_pred[3, :], alpha=0.5, color=pal[2])
    plot_intervals_line(ax, y_pred, title, pal, loc, 1, y_min, y_max)

def plot_intervals_line(ax, y_pred, title, pal, loc, lw, y_min, y_max):
    plot_sample_data(ax, y_min, y_max)
    plt.plot(x_predicted, y_pred[0, :], label='CI 95%', lw=lw, c=pal[1])
    plt.plot(x_predicted, y_pred[4, :], lw=lw, c=pal[1])
    plt.plot(x_predicted, y_pred[1, :], label='CI 50%', lw=lw, c=pal[2])
    plt.plot(x_predicted, y_pred[3, :], lw=lw, c=pal[2])
    plot_true_regression_line(title, loc)

def plot_regression(y_pred_normal, y_pred_student, loc, y_min, y_max):
    fig = plt.figure(figsize=(fig_size_x, fig_size_y))
    ax1 = fig.add_subplot(121)
    plot_intervals(ax1, y_pred_normal, 'GLM Regression Fit - Normal error prior',
                   pal_1, loc, y_min, y_max)
    plt.plot(x_predicted, y_pred_normal[2, :], c=pal_1[5], label='Posterior regression')
    ax2 = fig.add_subplot(122)
    plot_intervals(ax2, y_pred_student, 'GLM Regression Fit - Student-T error prior',
                   pal_1, loc, y_min, y_max)
    plt.plot(x_predicted, y_pred_student[2, :], c=pal_1[5], label='Posterior regression')

y_pred_normal_glm = calculate_regression_line(trace_normal_glm_explicit)
y_pred_studentT_glm = calculate_regression_line(trace_studentT_glm_explicit)

plot_regression(y_pred_normal_glm, y_pred_studentT_glm, 4, y_min_compact, y_max_compact)
```

Visualizing credibility intervals: Having fitted the linear models and obtained intercept and slope for every sample, we can calculate the expected y(x) at every point on the x axis and then use numpy.percentile() to aggregate these samples into credibility intervals (CI) on x.
We can visualize the mean, 50% and 95% credibility intervals of the sampled conditioned priors as all values for all Markov chains are held in the trace object. It is clear below how close the credibility intervals lie to the true regression for the Student-T model.
Later in this article we consider a non-parametric model which follows the linear trend line while modelling fluctuations in the observations, conveying local information which we might wish to preserve.
Plot of Credibility Intervals for Bayesian Linear Model, with Normal and Student-T priors

Posterior predictive power
Bayesian inference works by seeking modifications to the parameterized prior probability distributions in order to maximise the likelihood of the observed data over the prior parameters.
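Stated in Bayes' theorem terms (our restatement, not the article's), the sampled posterior over the model parameters $\theta$ given observations $y$ combines this likelihood with the priors:

$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)} \propto p(y \mid \theta)\,p(\theta)$$

so the fit to the data is always balanced against the a priori expectations encoded in $p(\theta)$.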
So what happens to the expected posterior in regions where we have missing sample data? We should find that the posterior predictions follow the regression line, taking into account fluctuations due to the error term.
Modelling unobserved data
We have created a y_predicted array of observations which is masked in the region 1.5 to 3.0, indicating that these are unobserved samples. PyMC3 adds these missing observations as new priors, y_missing, and computes a posterior distribution over the full sample space.
```python
trace_normal_glm_pred, y_posterior_normal_glm_pred = create_normal_glm_model(
    samples_glm, tune_glm, x_predicted, y_predicted)
trace_studentT_glm_pred, y_posterior_studentT_glm_pred = create_studentT_glm_model(
    samples_glm, tune_glm, x_predicted, y_predicted)

pm.summary(trace_studentT_glm_pred)
```

Re-running the above models with this extended sample space yields statistics which now include the imputed observations.
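As a quick check (a sketch of ours; the variable name follows PyMC3's auto-imputation convention of suffixing the observed variable's name with _missing, as the summary below confirms), the imputed points can be read straight from the trace:

```python
y_missing = trace_studentT_glm_pred['y_0_missing']
print(y_missing.shape)          # (draws x chains, number of masked points)
print(y_missing.mean(axis=0))   # posterior mean of each imputed observation
```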
```python
def plot_actuals(y_posterior_normal, y_posterior_student, loc, y_min, y_max):
    y_pred_normal = np.percentile(y_posterior_normal, quartiles, axis=0)
    y_pred_studentT = np.percentile(y_posterior_student, quartiles, axis=0)
    fig = plt.figure(figsize=(fig_size_x, fig_size_y))
    ax1 = fig.add_subplot(121)
    plt.plot(x_predicted, y_pred_normal[2, :], alpha=0.75, lw=3, color=pal_2[5],
             label='Posterior mean')
    plot_intervals(ax1, y_pred_normal, 'Predictive Samples - Normal error prior',
                   pal_2, loc, y_min, y_max)
    ax2 = fig.add_subplot(122)
    plt.plot(x_predicted, y_pred_studentT[2, :], alpha=0.75, lw=3, color=pal_2[5],
             label='Posterior mean')
    plot_intervals(ax2, y_pred_studentT, 'Predictive Samples - Student-T error prior',
                   pal_2, loc, y_min, y_max)
    return fig

fig = plot_actuals(y_posterior_normal_glm_pred, y_posterior_studentT_glm_pred,
                   4, y_min_compact, y_max_compact)
fig.suptitle('Predicting from Linear Model', fontsize=16)
```

When we run this model we see in the summary not only the priors for a_0, b_0 and nu_0 but also all the missing y values, as y_0_missing.
Having inferred the full range of predicted sample observations we can plot these posterior means, along with credibility intervals.
We see below that the inferred observations have been well fitted to the expected linear model.
Plot of Predictive samples for Bayesian Linear Model, with Normal and Student-T priors

Non-parametric Inference Models
Having verified the linear model, it would be nice to see if we can model these sample observations with a non-parametric model, where we dispense with finding intercept and slope parameters and instead try to model the observed data directly, with priors on every datapoint across the sample space [11].
Non-Stationary data and Random Walk prior [12]
The simple assumption with a non-parametric model is that the data is stationary, so the observed samples at each datapoint do not trend in a systematic way.
```python
def create_normal_stationary_inference_model(samples, tune, y):
    with pm.Model() as model_normal:
        sigma_0 = pm.HalfCauchy('sigma_0', beta=10)
        sigma_1 = pm.HalfCauchy('sigma_1', beta=10)
        mu_0 = pm.Normal('mu_0', mu=0, sd=sigma_1, shape=y.size)
        y_0 = pm.Normal('y_0', mu=mu_0, sd=sigma_0, observed=y)
        trace = pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)
        return trace, pm.sample_posterior_predictive(trace, random_seed=123)['y_0']

def create_studentT_stationary_inference_model(samples, tune, y):
    with pm.Model() as model_studentT:
        nu_0 = pm.Gamma('nu_0', alpha=2, beta=0.1)
        sigma_1 = pm.HalfCauchy('sigma_1', beta=10)
        mu_0 = pm.Normal('mu_0', mu=0, sd=sigma_1, shape=y.size)
        y_0 = pm.StudentT('y_0', mu=mu_0, lam=8.368, nu=nu_0, observed=y)
        trace = pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)
        return trace, pm.sample_posterior_predictive(trace, random_seed=123)['y_0']
```

But real-world data does very often trend, or is at least constrained by requirements of smoothness between adjacent data points, and then the choice of priors is crucial [13].
In our scenario the data is non-stationary, since it has a clear trend over x. If we use a simple Gaussian prior for our observations, then even after conditioning on the observed data, as soon as we enter the unobserved region the predicted observations rapidly return to their prior mean.
Plot of Predictive samples for non-parametric model with stationary prior on y, for Normal and Student-T priors

Gaussian Random Walk prior
The Gaussian random walk prior gives us the ability to model non-stationary data because it estimates an observable based on the previous data point's state plus a random walk step. The PyMC3 implementation also allows us to model a drift parameter, which adds a fixed scalar to each random walk step.
```python
def create_normal_grw_inference_model(samples, tune, y, drift):
    with pm.Model() as model_normal:
        sigma_0 = pm.HalfCauchy('sigma_0', beta=10)
        sigma_1 = pm.HalfCauchy('sigma_1', beta=10)
        if drift:
            mu_drift = pm.Normal('mu_drift', mu=1, sigma=2)
            mu_0 = pm.GaussianRandomWalk('mu_0', mu=mu_drift, sd=sigma_1, shape=y.size)
        else:
            mu_0 = pm.GaussianRandomWalk('mu_0', mu=0, sd=sigma_1, shape=y.size)
        y_0 = pm.Normal('y_0', mu=mu_0, sd=sigma_0, observed=y)
        trace = pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)
        return trace, pm.sample_posterior_predictive(trace, random_seed=123)['y_0']

def create_studentT_grw_inference_model(samples, tune, y, drift):
    with pm.Model() as model_studentT:
        nu_0 = pm.Gamma('nu_0', alpha=2, beta=0.1)
        sigma_1 = pm.HalfCauchy('sigma_1', beta=10)
        if drift:
            mu_drift = pm.Normal('mu_drift', mu=1, sigma=2)
            mu_0 = pm.GaussianRandomWalk('mu_0', mu=mu_drift, sd=sigma_1, shape=y.size)
        else:
            mu_0 = pm.GaussianRandomWalk('mu_0', mu=0, sd=sigma_1, shape=y.size)
        y_0 = pm.StudentT('y_0', mu=mu_0, lam=8.368, nu=nu_0, observed=y)
        trace = pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)
        return trace, pm.sample_posterior_predictive(trace, random_seed=123)['y_0']
```

In these two inference models we have used a single scalar drift parameter, which in this scenario is equivalent to the slope of the data in the direction of the drift.
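One way to sanity-check that claim (our sketch, assuming the drift-enabled runs have stored their traces in trace_normal and trace_studentT, as in the plotting code further below): on a uniform grid, the inferred drift per step should be roughly the true slope times the step size.

```python
dx = x_observed[1] - x_observed[0]            # (approximately) uniform step size
drift_per_x = trace_normal['mu_drift'].mean() / dx
print(drift_per_x)                            # should land near true_slope = 3
```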
It is important that the data, including outliers, is correctly ordered with an equal sample step size, as the prior will assume this behaviour in modelling the data. For this reason outliers are inserted into the x array at index points, not appended as (x, y) pairs.
Below we run the two models, first with the drift parameter set to zero, leading to a static line at the last observation point, and second with the drift inferred from the data.
As can be seen, this final model gives a very nice fit to the linear trend, particularly when using the Student-T prior which is less influenced by the outliers.
Plot of posterior means for non-parametric model with random walk without drift prior on y, for Normal and Student-T priors

Plot of posterior means for non-parametric model with random walk with drift prior on y, for Normal and Student-T priors

Plot posterior samples from Gaussian random walk
```python
def plot_samples(y_posterior, samples):
    show_label = True
    for rand_loc in np.random.randint(0, len(y_posterior), samples):
        rand_sample = y_posterior[rand_loc]
        if show_label:
            plt.plot(x_predicted, rand_sample, alpha=0.05, c=pal_1[5],
                     label='Posterior samples')
            show_label = False
        else:
            plt.plot(x_predicted, rand_sample, alpha=0.05, c=pal_1[5])

def plot_posterior_predictive(y_posterior_normal, y_posterior_student, loc,
                              samples, y_min, y_max):
    y_pred_normal = np.percentile(y_posterior_normal, quartiles, axis=0)
    y_pred_studentT = np.percentile(y_posterior_student, quartiles, axis=0)
    fig = plt.figure(figsize=(fig_size_x, fig_size_y))
    ax1 = fig.add_subplot(121)
    plot_samples(y_posterior_normal, samples)
    plt.plot(x_predicted, y_pred_normal[2, :], alpha=0.75, lw=3, color=pal_2[5],
             label='Posterior mean')
    plot_intervals_line(ax1, y_pred_normal,
                        'Random Walk Prior: Predictive Samples - Normal error prior',
                        pal_2, loc, 2, y_min, y_max)
    ax2 = fig.add_subplot(122)
    plot_samples(y_posterior_student, samples)
    plt.plot(x_predicted, y_pred_studentT[2, :], alpha=0.75, lw=3, color=pal_2[5],
             label='Posterior mean')
    plot_intervals_line(ax2, y_pred_studentT,
                        'Random Walk Prior: Predictive Samples - Student-T error prior',
                        pal_2, loc, 2, y_min, y_max)

plot_posterior_predictive(y_posterior_normal, y_posterior_studentT, 4, 250,
                          y_min_spread, y_max_compact)
pm.traceplot(trace_studentT)
```

Also, we now have the benefit that we can see the effect of the outliers in local perturbations to linearity.
Below we visualize some of the actual random walk samples.
We should not expect any particular path to be close to the actual predicted path of our sample data. In this sense it is a truly stochastic method as our result is derived from the statistical mean of the sampled paths, not from any particular sample’s path.
Plot of Predictive samples for non-parametric model with random walk with drift prior on y, for Normal and Student-T priors

Impact of Outliers on Random Walk predictive posteriors
```python
# Remove the outliers
y = (true_intercept + true_slope * x_observed
     + np.random.default_rng(seed=123).normal(scale=0.5, size=observables))  # Add noise
y_observed, y_predicted = add_outliers(y, [], [])

trace_normal, y_posterior_normal_linear = create_normal_grw_inference_model(
    samples_predictive, tune_predictive, y_predicted, True)    # With Drift
trace_studentT, y_posterior_studentT_linear = create_studentT_grw_inference_model(
    samples_predictive, tune_predictive, y_predicted, True)    # With Drift
```

Having seen the effect that the outliers have on the posterior mean, we surmise that the noisiness of the predictive samples might be due to perturbation of the random walk caused by the outliers.
To test this, we have here removed the outliers and rerun the same two predictive models.
Plot of Predictive posterior means for no-outlier data, with random walk with drift prior on y, for Normal and Student-T priors

Plot of Predictive samples for no-outlier data, with random walk with drift prior on y, for Normal and Student-T priors

As can be seen, the predictive samples are much cleaner, with very tight credibility intervals and much smaller divergence in the random walk samples. This gives us confidence that, with a correct model, the Gaussian random walk is a well-conditioned prior for non-stationary model fitting.

Conclusion
The intention of this article was to show the simplicity and power of Bayesian learning with PyMC3 and hopefully we have:
- shown that the Student-T error prior performs better than the Normal prior in robust linear regression;
- demonstrated that refining the degrees of freedom of the Student-T error prior improves the fit to the regression line;
- shown that we can quickly and easily draw credibility intervals on the posterior samples for both the linear regression and the non-parametric models;
- demonstrated that the non-parametric model using a Gaussian random walk with drift can predict the linear model in unobserved regions, while at the same time modelling local fluctuations in the observed data which we might want to preserve;
- investigated the Gaussian random walk prior and shown that its chaotic behaviour allows it to explore a wide sample space while not preventing it from generating smooth predictive posterior means.
Thanks for reading, please feel free to add comments or corrections. Full source code can be found in the Jupyter Notebook and GitHub Repository.
Translated from: https://towardsdatascience.com/the-power-of-bayesian-inference-estimated-using-pymc3-6357e3af7f1f