General Linear Models and Linear Mixed Models: How Linear Mixed Models Work
Mathematical Statistics and Machine Learning for Life Sciences
This is the seventeenth article in my column Mathematical Statistics and Machine Learning for Life Sciences, where I try to explain, in a simple way, some mysterious analytical techniques used in Bioinformatics and Computational Biology. The Linear Mixed Model (LMM), also known as the Linear Mixed Effects Model, is one of the key techniques in traditional Frequentist statistics. Here I will attempt to derive the LMM solution from scratch from the Maximum Likelihood principle by optimizing the mean and variance parameters of the Fixed and Random Effects. However, before diving into the derivations, I will start slowly in this post with an introduction to when and how to run LMM in practice. I will cover examples of linear modeling from both the Frequentist and Bayesian frameworks.
Problem of Non-Independence in Data
Traditional Mathematical Statistics is based to a large extent on the assumptions of the Maximum Likelihood principle and the Normal distribution. In the case of, e.g., multiple linear regression, these assumptions might be violated if there is non-independence in the data. Provided that the data are expressed as a p by n matrix, where p is the number of variables and n is the number of observations, there can be two types of non-independence in the data:
- non-independent variables / features (multicollinearity)
- non-independent statistical observations (grouping of samples)
In both cases, the matrix that needs to be inverted for the Linear Model solution is singular (or nearly so), because its determinant is close to zero due to correlated variables or observations. This problem is particularly manifested when working with high-dimensional data (p >> n), where variables can become redundant and correlated; this is known as the Curse of Dimensionality.
The Curse of Dimensionality: the solution of the linear model diverges in high-dimensional space, in the p >> n limit

To overcome the problem of non-independent variables, one can, for example, select the most informative variables with LASSO, Ridge or Elastic Net regression, while the non-independence among statistical observations can be taken into account via Random Effects modelling within the Linear Mixed Model.
I covered a few variable selection methods, including LASSO, in my post Select Features for OMICs Integration. In the next section, we will see an example of longitudinal data where the grouping of data points should be addressed through Random Effects modelling.
LMM and Random Effects modeling are widely used in various types of data analysis in Life Sciences. One example is the GCTA tool that contributed a lot to the research on the long-standing problem of Missing Heritability. The idea of GCTA is to fit genetic variants with small effects all together as a Random Effect within the LMM framework. Thanks to the GCTA model, the problem of Missing Heritability seems to be solved, at least for Human Height.
B. Maher, Nature, volume 456, 2008

Another popular example from computational biology is Differential Gene Expression analysis with the DESeq / DESeq2 R package, which does not really run LMM but performs a variance stabilization / shrinkage that is one of the essential points of LMM. The advantage of this approach is that lowly expressed genes can borrow some information from the highly expressed genes, which allows for more stable and robust testing.
Finally, LMM is one of the most popular analytical techniques in Evolutionary Science and Ecology, where the state-of-the-art MCMCglmm package is used for estimating, e.g., trait heritability.
Example of Non-Independence in Data
As we concluded previously, LMM should be used when there is some sort of clustering among statistical observations / samples. This can be, for example, due to different geographic locations where the samples were collected, or different experimental protocols that produced the samples. Batch effects in Biomedical Sciences are an example of such a grouping factor that leads to non-independence between statistical observations. If not properly corrected for, batch effects in RNAseq data can lead to a totally opposite co-expression pattern between two genes (Simpson's paradox).
Another example can be genetic relatedness between individuals. Finally, this can be repeated measurements performed on the same individuals but at different time points, i.e. technical (not biological) replicates.
As an example of such clustering, we will consider a sleep deprivation study in which the sleeping time of 18 individuals was restricted, and their Reaction to a series of tests was measured over 10 days. The data include three variables: 1) Reaction, 2) Days, 3) Subject, i.e. the same individual was followed over 10 days. To check how the overall Reaction of the individuals changed in response to sleep deprivation, we will fit an Ordinary Least Squares (OLS) Linear Regression with Reaction as the response variable and Days as the predictor / explanatory variable using lm, and display it with ggplot.
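A minimal R sketch of this step, assuming the sleepstudy data set that ships with the lme4 package (it contains exactly the Reaction, Days and Subject variables described above); the object name ols_fit is mine:

```r
library(lme4)     # provides the sleepstudy data set
library(ggplot2)

data("sleepstudy")

# Naive Ordinary Least Squares fit: Reaction as response, Days as predictor
ols_fit <- lm(Reaction ~ Days, data = sleepstudy)
summary(ols_fit)

# Scatter plot with the OLS regression line and its 95% confidence band
ggplot(sleepstudy, aes(x = Days, y = Reaction)) +
  geom_point() +
  geom_smooth(method = "lm", level = 0.95)
```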
We can observe that Reaction vs. Days has an increasing trend, but with a lot of variation between days and individuals. Looking at the summary of the linear regression fit, we conclude that the slope is significantly different from zero, i.e. there is a statistically significant increasing relation between Reaction and Days. The grey area around the fitted line represents the 95% confidence interval according to the formula:
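A standard way to write this interval, consistent with the 1.96 Z-score discussed next, is:

$$\mathrm{CI}_{95\%} = \hat{y} \pm 1.96 \cdot \mathrm{SE}(\hat{y})$$

where ŷ is the fitted value and SE(ŷ) is its standard error.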
The magic number 1.96 originates from the Gaussian distribution and reflects the Z-score value covering 95% of the data in the distribution. To demonstrate how the confidence intervals are calculated under the hood by ggplot, we will implement an identical Linear Regression fit in plain R using the predict function.
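A sketch of how this could look in base R (reusing the ols_fit object from the previous snippet; new_days is my name for the prediction grid):

```r
# Fitted values together with 95% confidence intervals computed by predict
new_days <- data.frame(Days = seq(0, 9, by = 0.1))
ci <- predict(ols_fit, newdata = new_days, interval = "confidence", level = 0.95)

# Base R plot: data points, regression line and the confidence band
plot(sleepstudy$Days, sleepstudy$Reaction, xlab = "Days", ylab = "Reaction", pch = 19)
lines(new_days$Days, ci[, "fit"], col = "blue", lwd = 2)
lines(new_days$Days, ci[, "lwr"], col = "blue", lty = 2)
lines(new_days$Days, ci[, "upr"], col = "blue", lty = 2)
```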
However, there is a problem with the fit above. Ordinary Least Squares (OLS) assumes that all the observations are independent, which should result in uncorrelated and hence Normally distributed residuals. However, we know that the data points on the plot belong to 18 individuals (10 for each), i.e. the data points cluster within individuals and are therefore not independent. As an alternative, we can fit a linear model (lm) for each individual separately.
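One way to do this in R is sketched below; lmList from lme4 is used here as a convenient shortcut, although a plain loop over subjects would work equally well (ind_fits is my name):

```r
library(lme4)
library(ggplot2)

# One OLS fit per individual
ind_fits <- lmList(Reaction ~ Days | Subject, data = sleepstudy)
coef(ind_fits)   # per-subject intercepts and slopes

# Visualize the individual fits, one panel per Subject
ggplot(sleepstudy, aes(x = Days, y = Reaction)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  facet_wrap(~ Subject)
```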
We can see that most of the individuals have an increasing Reaction profile, while some have a neutral or even decreasing profile. Doesn't it look strange that the overall Reaction increases while individual slopes might be decreasing? Is the fit above really good enough?
Did we capture all the variation in the data with the naive Ordinary Least Squares (OLS) Linear Regression model?
The answer is NO, because we have not taken the non-independence between data points into account. As we will see later, we can do much better with a Linear Mixed Model (LMM) that accounts for non-independence between the samples via Random Effects. Although the term "Random Effects" might sound mysterious, we will show below that it is essentially equivalent to introducing one more fitting parameter in the Maximum Likelihood optimization.
Frequentist Linear Mixed Model
The naive linear fit that we used above is called Fixed Effects modeling, as it fixes the coefficients of the Linear Regression: the Slope and the Intercept. In contrast, Random Effects modeling allows for individual-level Slopes and Intercepts, i.e. the parameters of the Linear Regression are no longer fixed but vary around their mean values.
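For the sleep study this corresponds to a model of roughly the following form (a sketch; the notation with index j for subjects and i for measurements is mine, not the original article's):

$$\text{Reaction}_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\text{Days}_{ij} + \varepsilon_{ij}, \qquad (u_{0j}, u_{1j}) \sim \mathcal{N}(0, \Sigma), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$$

Here β0 and β1 are the Fixed Effects (population-level intercept and slope), while u0j and u1j are the Random Effects that shift the intercept and slope for each Subject j.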
Variation of intercepts and slopes between individuals from the sleep study

This concept is strongly reminiscent of Bayesian statistics, where the parameters of a model are random while the data is fixed, in contrast to the Frequentist approach, where the parameters are fixed but the data is random. Indeed, later we will show that we obtain similar results with both the Frequentist Linear Mixed Model and the Bayesian Hierarchical Model. Another strength of LMM and Random Effects is that the fit is performed on all individuals simultaneously, in the context of each other, that is, all individual fits "know" about each other. Therefore, the slopes, intercepts and confidence intervals of the individual fits are affected by their common statistic, the shared variance; this is called shrinkage toward the mean, and we will cover it in more detail when deriving LMM from scratch in the next post.
We will fit an LMM with random slopes and intercepts for the effect of Days for each individual (Subject) using the lmer function from the lme4 R package. This corresponds to adding the (Days | Subject) term to the linear model Reaction ~ Days that was previously used inside the lm function.
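A sketch of this fit (lmm_fit is my name for the model object; ols_fit is the naive fit from earlier):

```r
library(lme4)

# Random intercept and random slope for Days, grouped by Subject
lmm_fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(lmm_fit)

# Compare goodness of fit with the naive OLS model
# (for a strict AIC/BIC comparison, the LMM can be refit with REML = FALSE)
AIC(ols_fit, lmm_fit)
BIC(ols_fit, lmm_fit)
```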
We can immediately see two types of statistics reported: Fixed and Random Effects. The slope and intercept values for the Fixed Effects look fairly similar to the ones obtained above with the OLS Linear Regression. On the other hand, the Random Effects statistics are where the adjustment for non-independence between samples occurs. We can see two types of variance reported: the one shared across slopes and intercepts, Name = (Intercept) and Name = Days, which reflects grouping the data points by Subject, and a Residual variance that remains un-modelled, i.e. we cannot further reduce this variance within the given model. Further, comparing the Residual errors between the Fixed Effects (lm) and Random Effects (lmer) models, we can see that the Residual error decreased for the Random Effects model, meaning that we captured more variation in the response variable with the Random Effects model. The same conclusion can be drawn from comparing the AIC and BIC values for the two models; again, the LMM with Random Effects simply fits the data better. Now let us visualize the difference between Fixed Effects modeling and LMM modeling.
For this reason, we need to visualize the confidence intervals of the LMM model. The standard way to build confidence intervals in the Frequentist / Maximum Likelihood framework is via bootstrapping. We will start with the population-level (overall / average) fit and re-run it a number of times, using resampling with replacement and randomly removing 75% of the samples at each iteration. At each iteration I am going to save the LMM fit statistics. After the bootstrapped statistics have been accumulated, I am going to make two plots: first, showing the bootstrapped LMM fits against the naive Fixed Effects fit used in the previous section; second, from the accumulated bootstrapped LMM fits I will compute the median, i.e. the 50th percentile, as well as the 5th and 95th percentiles that determine the confidence intervals of the population-level LMM fit; this will again be plotted versus the naive Fixed Effects fit.
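A simplified sketch of such a bootstrap loop (the 25% subsample size and the loop structure follow the description above; the number of iterations, the seed and the object names are mine, and the original implementation may differ):

```r
set.seed(123)
n_boot <- 100
boot_coefs <- matrix(NA, nrow = n_boot, ncol = 2,
                     dimnames = list(NULL, c("Intercept", "Days")))

for (i in seq_len(n_boot)) {
  # Resample with replacement, keeping roughly 25% of the rows each time
  # (singular-fit warnings are possible on such small subsamples)
  idx <- sample(seq_len(nrow(sleepstudy)),
                size = round(0.25 * nrow(sleepstudy)), replace = TRUE)
  boot_fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy[idx, ])
  boot_coefs[i, ] <- fixef(boot_fit)   # population-level intercept and slope
}

# 5%, 50% and 95% percentiles of the bootstrapped population-level fit
apply(boot_coefs, 2, quantile, probs = c(0.05, 0.5, 0.95))
```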
Fixed Effects (blue line, grey area) vs. bootstrapped LMM (black and red lines)

Above, the Fixed Effects fit (blue line + grey 95% confidence interval area) is displayed together with the computed bootstrapped LMM fits (left plot) and the summary statistics (percentiles) of the bootstrapped LMM fits (right plot). We can observe that the population-level LMM fit (lmer, red line, right plot) is very similar to the Fixed Effects fit (lm, blue line on both plots); the difference is hardly noticeable, they overlap well. However, the computed bootstrapped fits (black thick lines, left plot) and the confidence intervals for the LMM (red dashed lines, right plot) are a bit wider than for the Fixed Effects fit (grey area on both plots). This difference is partially due to the fact that the Fixed Effects fit does not account for individual-level variation, in contrast to the LMM, which accounts for both population- and individual-level variation.
Another interesting thing is that we observe variations of Slope and Intercept around their mean values:
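Schematically, one can think of the bootstrapped intercepts and slopes as being distributed around the population-level values (a sketch in my notation, not a formula from the original post):

$$\text{Intercept} \sim \mathcal{N}(\beta_0, \sigma_{\text{Intercept}}^2), \qquad \text{Slope} \sim \mathcal{N}(\beta_1, \sigma_{\text{Days}}^2)$$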
Therefore, one can hypothesize that the bootstrapping procedure for building confidence intervals within the Frequentist framework can be viewed as allowing the slopes and intercepts to follow some initial (Prior) distributions, and then sampling their plausible values from those distributions. This sounds much like Bayesian statistics. Indeed, bootstrapping is very similar to the workhorse of Bayesian statistics, which is Markov Chain Monte Carlo (MCMC). In other words, Frequentist analysis with bootstrapping is to a large extent equivalent to Bayesian analysis; we will revisit this later in more detail.
What about individual slopes, intercepts and confidence intervals for each of the 18 individuals from the sleep deprivation study? Here we again plot their Fixed Effects statistics together with the LMM statistics.
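The individual-level intercepts and slopes behind such a plot can be extracted, for instance, like this (a sketch reusing the lmm_fit and ind_fits objects introduced earlier):

```r
# Per-subject intercepts and slopes from the mixed model (shrunk toward the mean)
coef(lmm_fit)$Subject

# Per-subject intercepts and slopes from the separate OLS fits (no shrinkage)
coef(ind_fits)
```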
Again, the red solid and dashed lines correspond to the LMM fit, while the blue solid line and the grey area depict the Fixed Effects model. We can see that the individual LMM fits (lmer) and their confidence intervals might be very different from the Fixed Effects (lm) model. In other words, the individual fits are "shrunk" toward their common population-level mean / median; all the fits help each other to have more stable slopes, intercepts and confidence intervals that resemble the population-level ones. In the next post, when deriving LMM from scratch, we will see that this shrinkage toward the mean is achieved by adding one more fitting parameter (the shared variance) in the Maximum Likelihood optimization procedure.
Frequentist / Maximum Likelihood vs. Bayesian Fit
Before moving to the Bayesian Multilevel Models, let us briefly introduce the major differences between the Frequentist and Bayesian approaches. The Frequentist fit used by LMM through lme4 / lmer is based on the Maximum Likelihood principle, where we maximize the likelihood L(y) of observing the data y, which is equivalent to minimizing the residuals of the model, i.e. the Ordinary Least Squares approach. In contrast, the Bayesian linear model is based on the Maximum Posterior Probability principle, where we assume the data to be distributed with some likelihood L(y) and add a Prior assumption on the parameters of the Linear Model.
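In symbols, with θ denoting the parameters of the model (a sketch of the two principles; the notation is mine):

$$\hat{\theta}_{\mathrm{ML}} = \underset{\theta}{\arg\max}\; L(y \mid \theta), \qquad \hat{\theta}_{\mathrm{MAP}} = \underset{\theta}{\arg\max}\; P(\theta \mid y), \qquad P(\theta \mid y) \propto L(y \mid \theta)\, P(\theta)$$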
Here we calculate a probability distribution of the parameters of the model (and not of the data), which automatically gives us uncertainties (Credible Intervals) on the parameters.
Bayesian Multilevel Model
Linear Mixed Models (LMM) with Bayesian Prior distributions applied to the parameters are called Bayesian Multilevel Models or Bayesian Hierarchical Models. Here, for the Bayesian fitting, we will use the brms R package, which has a syntax identical to lme4 / lmer. However, an important difference to remember is that fitting LMM via lme4 / lmer applies the Maximum Likelihood (ML) principle, i.e. it does not use prior assumptions about the parameters (or, one could say, it uses flat Priors), while the Bayesian Multilevel Model in brms sets reasonable priors reflecting the data. Another thing worth mentioning is that brms uses the probabilistic programming language Stan under the hood. We start with the Bayesian population-level fit using brms and display the results:
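A sketch of such a fit (the priors and sampler settings below are brms defaults; the chain, iteration and seed values are mine, and the original post may have used different options):

```r
library(brms)

# Bayesian multilevel model with the same formula as the lmer fit;
# brms translates it to Stan and samples the posterior with MCMC
brm_fit <- brm(Reaction ~ Days + (Days | Subject), data = sleepstudy,
               chains = 4, iter = 2000, seed = 123)
summary(brm_fit)

# Fitted values with 95% credible intervals for plotting the population-level fit
fitted(brm_fit, probs = c(0.025, 0.975))
```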
Above, we again plot the Fixed Effects population-level fit as the blue line with the grey area for the confidence intervals; we also add the population-level Bayesian Multilevel Model fit, using a solid red line for the median and red dashed lines for the credible intervals. As for the bootstrapped LMM fit, we can conclude that the population-level Bayesian Multilevel fit perfectly overlaps with the Fixed Effects fit, while the Bayesian credible intervals are somewhat wider than the 95% confidence intervals of the Fixed Effects fit. What about the individual fits?
Similarly to the individual bootstrapped Frequentist LMM fits, we can see that the individual Bayesian fits with brms (red solid lines) do not always converge to the Fixed Effects Frequentist fits (blue solid lines), but rather "try" to align with the overall population-level fit (previous plot) in order to be as similar to each other as possible. The Bayesian credible intervals again sometimes look very different from the Frequentist Fixed Effects confidence intervals. This is the result of using Bayesian Priors and accounting for the non-normality and non-independence in the data via the multilevel modeling.
Summary
In this post, we have learnt that the Frequentist Linear Mixed Model (LMM) and the Bayesian Multilevel (Hierarchical) Model are used to account for non-independence, and hence non-normality, of data points. These models usually provide a better fit and explain more variation in the data compared to the Ordinary Least Squares (OLS) linear regression model (Fixed Effects). While the population-level mean fit of the models typically converges to the Fixed Effects model, the individual fits as well as the credible and confidence intervals can be very different, reflecting a better accounting for non-normality in the data.
In the comments below, let me know which analytical techniques from Life Sciences seem especially mysterious to you and I will try to cover them in future posts. Check the code from this post on my Github. Follow me at Medium Nikolay Oskolkov, on Twitter @NikolayOskolkov, and connect on Linkedin. In the next post, we are going to derive the Linear Mixed Model and program it from scratch from the Maximum Likelihood; stay tuned.
Translated from: https://towardsdatascience.com/how-linear-mixed-model-works-350950a82911