batch lr alternatives: Alternative approaches to modeling the relationship
Linear regression is one of the best-known and simplest tools in statistics and machine learning.
In this article, we explore the linear regression algorithm: how it works and how you can use it better.
Linear regression (LR) is a simple yet powerful supervised learning technique. It is applied in a large number of situations.
LR determines how the input variables, called explanatory variables, affect the output variable, called the response variable. It uses the best-fitting straight line, the one with the smallest sum of squared residuals, known as the regression line or the least-squares line. A linear model with only one independent variable is called simple linear regression, while multiple linear regression has more than one explanatory variable.
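For simple linear regression, the least-squares line has a closed-form solution: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point (mean of x, mean of y). The following Python sketch fits it with the standard library only. The function names and the experience/salary numbers are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares (closed form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has zero variance, so the slope is undefined")
    b1 = sxy / sxx
    # Intercept: the least-squares line passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed y minus fitted y for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data: years of experience vs. salary (in thousands)
experience = [1, 2, 3, 4, 5]
salary = [40, 45, 52, 58, 65]

b0, b1 = fit_simple_ols(experience, salary)
print(f"salary = {b0:.2f} + {b1:.2f} * experience")  # salary = 33.10 + 6.30 * experience
```

With an intercept in the model, the residuals of a least-squares fit sum to zero (up to rounding), and no other straight line gives a smaller sum of squared residuals.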
LR deals with continuous variables. It helps companies make forecasts, such as future market trends or the relationship between salary and experience. LR is used in forecasting, time-series analysis, and the study of cause-and-effect relationships, for example the association between reckless driving and road injuries.
The relationship modeled by LR can be either positive or negative. A positive relationship between two variables means that an increase in the value of one variable tends to come with an increase in the value of the other. On the other hand, a negative relationship means that an increase in the value of one variable tends to come with a decrease in the value of the other.
Assumptions of linear regression
· The relationship between the dependent variable y and the independent variable x is linear. The model must be linear in the coefficients of x, and the coefficients must not be functions of one another.
· In non-financial applications, the independent variables must also be non-random. In financial scenarios, where the independent variable is often random, the model can still be accurate as long as the error term and the independent variable are uncorrelated.
· Multicollinearity occurs when the independent variables are correlated with one another. It can be checked with the correlation matrix, in which the correlation coefficients between the independent variables should be well below 1 in absolute value. Tolerance is another measure of multicollinearity. Tolerance is defined as T = 1 - R², where R² comes from regressing one independent variable on all the others; T < 0.1 suggests possible multicollinearity and T < 0.01 indicates it. The variance inflation factor (VIF) is 1/T, and VIF > 10 indicates multicollinearity among the variables.
· The error term is normally distributed. This can be checked with a histogram or a Q-Q plot of the residuals: the histogram should be symmetrical and bell-shaped, and the points of the Q-Q plot should lie on the 45-degree line.
· The variance of the error term is constant. This is called homoscedasticity, or constant error variance, and it can be evaluated with a scatter plot of the residuals. The Breusch-Pagan test is used to test for homoscedasticity: it runs an auxiliary regression of the squared residuals on the independent variables.
· Autocorrelation occurs when the residuals are not independent of each other. The Durbin-Watson (DW) test checks the null hypothesis that the residuals are not autocorrelated. A DW statistic well below 2 signals that neighboring residuals are positively correlated with one another.
· LR tends to make more reliable predictions if your input and output variables have a Gaussian distribution. Multivariate normality assumes that all variables are multivariate normal. This can be checked with a histogram or a Q-Q plot, and normality can be further verified with a goodness-of-fit test such as the Kolmogorov-Smirnov test. When the data are not normally distributed, a log transformation can be applied.
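Several of the checks in the list above reduce to simple formulas. The Python sketch below, using only the standard library, computes tolerance and VIF for the two-predictor case (where R² comes from regressing one predictor on the other), the Durbin-Watson statistic, and a Breusch-Pagan statistic in the n·R² form (Koenker's studentized variant, which regresses the squared residuals on the predictor). The helper names and data are hypothetical; for real work, statsmodels provides `variance_inflation_factor`, `durbin_watson`, and `het_breuschpagan`, which handle the general multi-predictor case.

```python
import math


def fit_simple_ols(xs, ys):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    sxx = sum((x - xm) ** 2 for x in xs)
    b1 = sxy / sxx
    return ym - b1 * xm, b1


def r_squared(xs, ys):
    """R^2 of the simple OLS fit of ys on xs."""
    b0, b1 = fit_simple_ols(xs, ys)
    ym = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ym) ** 2 for y in ys)
    return 1 - sse / sst


def tolerance_and_vif(x1, x2):
    """Tolerance T = 1 - R^2 of x1 regressed on x2, and VIF = 1 / T.

    Two-predictor case only; with more predictors, R^2 comes from a
    multiple regression of x1 on all the other predictors."""
    tol = 1 - r_squared(x2, x1)
    vif = math.inf if tol == 0 else 1 / tol
    return tol, vif


def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squared residuals.

    Values near 2 suggest no first-order autocorrelation; values well
    below 2 suggest positive autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def breusch_pagan_lm(xs, resid):
    """n * R^2 from regressing squared residuals on x (one regressor).

    Under homoscedasticity this is approximately chi-square with 1 degree
    of freedom, so values above about 3.84 reject it at the 5% level."""
    return len(xs) * r_squared(xs, [e * e for e in resid])


# Hypothetical predictors: x2 is almost exactly 2 * x1
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
tol, vif = tolerance_and_vif(x1, x2)
print(f"tolerance = {tol:.4f}, VIF = {vif:.1f}")
```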
Level of accuracy of the prediction
· The size of the residuals shows how well a regression line estimates Y values from X values. This measure is called the standard error of the estimate, and it is the standard deviation of the residuals. The smaller it is, the more precise the predictions tend to be.
· The goodness of fit of the model is measured by R², which in simple linear regression is the square of the correlation coefficient between x and y. The higher R², the better the fit. It always lies between 0 and 1, and the stronger the linear relationship, the closer R² is to 1.
· Adjusted R² adjusts R² for the number of explanatory variables in the equation. It is used to decide whether extra explanatory variables belong in the equation. Unadjusted R² gives the most optimistic estimate of the strength of the relationship. Adjusted R² can be negative, although this is uncommon.
In an overfitting setting, R² can be high even though the model's predictive ability on new data has decreased. Adjusted R² does not have this problem. Adding any variable to the model increases R² and never lowers it, whereas adjusted R² rises only if the new predictor improves the LR model enough to offset the lost degree of freedom.
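The standard error of the estimate (SEE), R², and adjusted R² can all be computed from two sums of squares: the unexplained variation SSE (the squared residuals) and the total variation SST. This standard-library Python sketch, with hypothetical function names and data, uses the usual formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def fit_quality(xs, ys):
    """SEE, R^2 and adjusted R^2 for a simple OLS fit (k = 1)."""
    n, k = len(xs), 1
    xm, ym = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    b0 = ym - b1 * xm
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - ym) ** 2 for y in ys)                          # total variation
    see = math.sqrt(sse / (n - k - 1))  # standard deviation of the residuals
    r2 = 1 - sse / sst
    return see, r2, adjusted_r2(r2, n, k)


# Hypothetical experience/salary data
see, r2, adj = fit_quality([1, 2, 3, 4, 5], [40, 45, 52, 58, 65])
print(f"SEE = {see:.3f}, R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")

# A weak fit with several predictors and few observations goes negative
print(adjusted_r2(0.05, n=10, k=3))
```

Because SEE² = (1 - adjusted R²) · SST/(n - 1), the standard error of the estimate and adjusted R² always move in opposite directions when a variable is added or removed.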
Alternative approaches to modeling the relationship
· Many potential explanatory variables are categorical and cannot be measured on a quantitative scale. The usual trick is to use dummy variables. A dummy variable takes the value 0 or 1, for example to encode gender or the quarter of the year.
· An interaction variable is the product of two explanatory variables. Include an interaction variable in the regression equation if you believe that the effect of one explanatory variable on y depends on the value of another explanatory variable.
· Nonlinear transformations of variables are used when scatterplots show curvature. You can transform the dependent variable y, any of the explanatory variables x, or both. Common transformations are the natural logarithm, the square root, the reciprocal, and the square.
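In practice, the three techniques in this list come down to building extra columns before fitting the model. The following Python sketch is a hypothetical feature builder; the record keys `quarter`, `price`, and `ad_spend` are invented for illustration. It encodes the quarter as 0/1 dummies, dropping Q1 as the baseline category to avoid perfect collinearity with the intercept (the dummy-variable trap), adds a price × ad-spend interaction, and takes the natural log of ad spend.

```python
import math

QUARTERS = ("Q1", "Q2", "Q3", "Q4")


def build_features(records):
    """Turn raw records into numeric regression features.

    Q1 is the baseline category: it gets no dummy, and its effect is
    absorbed by the intercept."""
    rows = []
    for r in records:
        if r["quarter"] not in QUARTERS:
            raise ValueError(f"unknown quarter: {r['quarter']!r}")
        if r["ad_spend"] <= 0:
            raise ValueError("the log transform needs ad_spend > 0")
        row = {}
        # Dummy variables: at most one is 1; all are 0 for the baseline Q1
        for q in QUARTERS[1:]:
            row[f"is_{q}"] = 1 if r["quarter"] == q else 0
        row["price"] = r["price"]
        row["ad_spend"] = r["ad_spend"]
        # Interaction term: lets the effect of price depend on ad spend
        row["price_x_ad"] = r["price"] * r["ad_spend"]
        # Nonlinear transform, often used in place of raw ad_spend when the
        # scatterplot of y against ad_spend curves (diminishing returns)
        row["log_ad_spend"] = math.log(r["ad_spend"])
        rows.append(row)
    return rows


sample = [
    {"quarter": "Q1", "price": 10.0, "ad_spend": 100.0},
    {"quarter": "Q3", "price": 12.0, "ad_spend": 50.0},
]
for row in build_features(sample):
    print(row)
```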
Why log your variables in a regression?
· The variable is right-skewed, and taking its log makes the distribution of the transformed variable more symmetrical. But this alone is not a good enough reason to log the variable: no regression rule requires the independent or dependent variables to be normally distributed. If you have outliers in your dependent or independent variables, a log transformation reduces their influence.
· The variance of your regression residuals increases with your regression predictions. Taking the log of your dependent or independent variables may reduce this heteroscedasticity.
· Your regression residuals are not normally distributed. This may or may not be a problem for you. If the residuals are not normal, you can log the dependent or independent variables and check whether the residuals are closer to normal after the log transformation.
· The dependent and independent variables do not have a linear relationship. For example, consider income and food consumption: a proportional rise in income raises food consumption up to a certain level, after which consumption flattens out or even decreases.
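The first reason in this list, reducing right skew, is easy to check numerically. This Python sketch computes the moment-based sample skewness before and after taking natural logs. The doubling "income" values are made up so that the effect is easy to see: their logs are evenly spaced and therefore perfectly symmetrical.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment / (second central moment) ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data: each value doubles, so the right tail is long
incomes = [1, 2, 4, 8, 16, 32, 64, 128]
logged = [math.log(v) for v in incomes]

print(f"skewness before log: {skewness(incomes):.3f}")
print(f"skewness after log:  {skewness(logged):.3f}")
```

As the text cautions, symmetry on its own is not a reason to log a variable; what matters is how the residuals behave after refitting the model.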
The relevance of the independent variable
The underlying idea is parsimony: explain as much as possible with as little as possible. It favors a model with fewer explanatory variables. The techniques below can be used to assess the significance of the explanatory variables in a linear regression equation.
The correlation coefficient describes the strength and direction of the linear relationship between x and y. A hypothesis test helps determine whether the population correlation coefficient is close to zero or significantly different from zero.
If the test concludes that the correlation coefficient is significantly different from zero, the correlation is significant. If the test shows that the correlation coefficient could be zero, we conclude that it is not significant. Significance can be tested in two ways: with the p-value or with the t statistic.
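The t statistic for testing whether the population correlation coefficient is zero is t = r·√(n - 2)/√(1 - r²), with n - 2 degrees of freedom. The standard-library Python sketch below computes r and this t value for hypothetical experience/salary data; the helper names are our own. Turning t into a two-sided p-value needs a t-distribution CDF, for example `2 * scipy.stats.t.sf(abs(t), n - 2)` if SciPy is available.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    sxx = sum((x - xm) ** 2 for x in xs)
    syy = sum((y - ym) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect correlation: degenerate test
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


experience = [1, 2, 3, 4, 5]
salary = [40, 45, 52, 58, 65]
r = pearson_r(experience, salary)
t = correlation_t(r, len(experience))
print(f"r = {r:.4f}, t = {t:.2f} with {len(experience) - 2} df")
```

Here |t| is far above 2, so the correlation would be judged significant; with only five made-up points, this illustrates the arithmetic rather than anything about real salaries.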
Use the t-values of the regression coefficients to decide whether to include or exclude explanatory variables in the regression equation. A variable is considered significant, and kept in the regression equation, if its p-value is < 0.05 at the 95% confidence level; as a rule of thumb this corresponds to |t| > 2. If |t| is less than 1, it is a mathematical fact that excluding the variable from the regression equation decreases the standard error of the estimate and increases adjusted R².
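A coefficient's t statistic is its estimate divided by its standard error. In simple linear regression, SE(b1) = SEE/√Σ(x - mean of x)², and the slope's t value equals the t statistic for testing the correlation coefficient. The following Python sketch (hypothetical helper name and data, standard library only) computes it and applies the |t| > 2 rule of thumb from the text.

```python
import math


def slope_t(xs, ys):
    """Return (b1, SE(b1), t) for the slope of a simple OLS fit."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    b1 = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sxx
    b0 = ym - b1 * xm
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    see = math.sqrt(sse / (n - 2))  # standard error of the estimate (k = 1)
    se_b1 = see / math.sqrt(sxx)    # standard error of the slope
    return b1, se_b1, b1 / se_b1


b1, se_b1, t = slope_t([1, 2, 3, 4, 5], [40, 45, 52, 58, 65])
keep = abs(t) > 2  # rule of thumb: roughly p < 0.05 unless n is very small
print(f"b1 = {b1:.2f}, SE = {se_b1:.3f}, t = {t:.2f}, keep variable: {keep}")
```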
Use the F-test to determine whether the explained variation is large relative to the unexplained variation. The F-test of significance is the hypothesis test for the overall linear relationship, and it has an associated p-value. If the F-value in the ANOVA table is large and the corresponding p-value is small, reject the null hypothesis and conclude that the explanatory variables have some explanatory value.
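The F statistic compares explained with unexplained variation per degree of freedom: F = (SSR/k) / (SSE/(n - k - 1)), where SSR = SST - SSE. The Python sketch below computes it for a one-predictor model (hypothetical helper and data, standard library only); with a single explanatory variable, F equals the square of the slope's t statistic. A p-value would come from the F distribution, for example `scipy.stats.f.sf(F, k, n - k - 1)` if SciPy is available.

```python
def f_statistic(xs, ys):
    """Overall F statistic for the simple OLS model y = b0 + b1 * x (k = 1)."""
    n, k = len(xs), 1
    xm, ym = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    b0 = ym - b1 * xm
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - ym) ** 2 for y in ys)                          # total
    ssr = sst - sse                                               # explained
    return (ssr / k) / (sse / (n - k - 1))


F = f_statistic([1, 2, 3, 4, 5], [40, 45, 52, 58, 65])
print(f"F = {F:.1f} on (1, 3) degrees of freedom")
```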
Conclusion
Regression analysis is used in a broad sense. Still, it focuses on quantifying how the dependent variable shifts in response to changes in the independent variables, since all regression models, linear or nonlinear, link the dependent variable to the independent variables.
Now, share your thoughts on Twitter and LinkedIn! Do you agree or disagree with Saurav Singla's ideas and examples? Want to tell us your story? Tweet @SauravSingla_08 or comment to Saurav_Singla right now!
Source: https://medium.com/swlh/isnt-linear-regression-for-machine-learning-d31543f49181