Regression Basics: Code Walk-Through

Published: 2023/12/15

This article walks through the basics of regression, with code and thorough explanations, using a complete data project built on a Kaggle used car dataset. The project uses Linear Regression, Ridge CV, Lasso CV, and Elastic Net CV models to predict sale price. Full code is available on GitHub.

Getting Set Up

The used car dataset is available for download from Kaggle as a CSV file. The information is provided by cardekho, an Indian used car website. Once you have downloaded the CSV file, you can utilize the Pandas library to view and analyze the data.

The file path inside the quotes depends on where you saved the file. On a Mac you can find the path by right-clicking the file and holding the Option key. An option to ‘Copy ‘file.csv’ as Pathname’ should appear, and you can then paste the path between the quotes, as I have below.

import pandas as pd

# upload data from csv
car = pd.read_csv('/Users/Jewel/Desktop/Car-Price-Prediction/car details.csv')

Now the CSV is loaded as a Pandas DataFrame. To see what kind of data we are working with, we can use the code below to view the first 5 rows of the data frame that we just created. To see a specific number of rows, put that number in the parentheses (car.head(10) would show the first 10 rows).

car.head()

This allows us to see what the data frame contains, the columns and some examples of the used car details included. Each row is a unique car, and the columns represent the different features of the car. In this project we will use the features to predict the ‘selling_price’ of each car. To do this, we will use regression, but first we should explore the data and clean it as needed.

Exploratory Data Analysis & Cleaning

1. Check for Missing Values

car.isnull().sum()

There are no missing values! Yay. Missing values make it difficult to model the data correctly. Many datasets contain a lot of them, and before modeling we either delete the rows that contain missing values or replace each missing entry with another value (the column mean, the previous value, and so on).
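
Since the car data has nothing missing, here is a rough sketch of the two options just described on a made-up data frame. The frame `toy` and its values are hypothetical and only there for illustration; they are not part of the original project.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real car data set has no missing values.
toy = pd.DataFrame({
    "km_driven": [70000.0, np.nan, 50000.0, 120000.0],
    "fuel": ["Petrol", "Diesel", None, "Diesel"],
})

# Count missing values per column, the same check as car.isnull().sum().
missing_counts = toy.isnull().sum()

# Option 1: drop every row that contains at least one missing value.
dropped = toy.dropna()

# Option 2a: replace missing numbers with the column mean.
filled_mean = toy.copy()
filled_mean["km_driven"] = filled_mean["km_driven"].fillna(filled_mean["km_driven"].mean())

# Option 2b: carry the previous row's value forward.
filled_prev = toy.ffill()
```

Which option is appropriate depends on the data: dropping rows is simplest but throws information away, while filling keeps the rows at the cost of inventing values.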

2. Check Data Types

The next step in exploring the data is to see what types of data are stored in the columns. This is helpful for noticing if a column that appears to be filled with numbers is incorrectly coded as an ‘object’. If this is the case you can easily change the data type so that the computer correctly understands the information you are providing.

car.dtypes

float = numbers with decimals (1.678)
int = integer or whole number without decimals (1, 2, 3)
obj = object, string, or words (‘hello’)

The 64 after these data types refers to how many bits of storage the value occupies. You will often see 32 or 64.

The data types all look correct, judging by the first few rows of the data frame we have seen so far.
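
Nothing needs converting in this data set, but if a numeric column had been read in as text, the fix mentioned earlier might look something like the sketch below. The frame `toy` is hypothetical, and `pd.to_numeric` is just one of several ways to do this.

```python
import pandas as pd

# Hypothetical example: a numeric column that was read in as text.
toy = pd.DataFrame({"km_driven": ["70000", "50000", "120000"]})

# pd.to_numeric parses the strings into numbers;
# errors="coerce" would turn any unparseable entry into NaN instead of raising.
toy["km_driven"] = pd.to_numeric(toy["km_driven"], errors="coerce")

# The column can now be used in calculations such as sums and means.
total_km = toy["km_driven"].sum()
```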

4. Data Overview

car.describe()

This table gives us an overview of statistical information regarding the numerical data columns. Since only three of the columns are integers or floats, only these three appear in the table. If the numerical data had incorrectly been coded as ‘objects’ we would not be able to view the statistical information on the columns.

Using this table, we can see different values such as the mean, min, max and standard deviation. This table is useful for giving a quick overview of the data in the columns, and may allow us to identify outliers. For instance, when I was first looking at this table I thought that an average price of $500,000 for a used car was incredibly high. But then I looked at the data source and realized that the data is for an Indian company, and 500,000 rupees is about $6,600. Much more reasonable!

Another thing that stood out from this table is that the minimum value for km_driven was 1 kilometer. That seemed very low for a used car, so I wanted to investigate this car further. To do this, we can sort the data frame by lowest to highest number of kilometers driven.

car.sort_values(by = 'km_driven')

The car at the top is the one with 1km on it, and since the car has had at least two owners and is from 2014, it does not seem realistic that it would only have 1 kilometer on it. This may be an outlier, or the data could have been entered incorrectly, so to be safe I will delete this row.

# 1312 is the index of the row (shown at the far left of the row in the sorted output)
car.drop([1312], inplace = True)
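
Dropping by a hard-coded index works for this particular file. As an alternative sketch, the same kind of row could be selected by a condition instead. The frame `toy` and the 100 km cut-off are assumptions made up for this example, not values from the original project.

```python
import pandas as pd

# Hypothetical toy data with one implausibly low odometer reading.
toy = pd.DataFrame({
    "year": [2014, 2018, 2012],
    "km_driven": [1, 35000, 90000],
    "owner": ["Second Owner", "First Owner", "First Owner"],
})

# Select rows below an assumed plausibility threshold of 100 km.
suspicious = toy[toy["km_driven"] < 100]

# Drop those rows by their index labels, leaving the rest untouched.
cleaned = toy.drop(index=suspicious.index)
```

It is still worth looking at the `suspicious` rows before dropping them, as I did with the sorted table.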

5. Visualize Correlations

Using the Seaborn library, we can also visualize the correlations between different features. Correlations are only computed for the columns with numerical data, but it is still helpful to investigate the relationships between these features.

import seaborn as sns

# annot = True shows the correlation values in the squares
# numeric_only = True limits the calculation to the numerical columns (newer pandas versions raise an error on text columns without it)
sns.heatmap(car.corr(numeric_only = True), annot = True);

A positive correlation approaching 1 tells us that the two features have a positive linear relationship (as one goes up, so does the other). A negative correlation approaching -1 indicates the two have a negative linear relationship (as one increases, the other decreases).

The information we gain from this correlation chart makes sense intuitively, and may confirm some of our beliefs, which is always good to see. Year and selling price are positively correlated, so newer cars tend to sell for more. Year and kilometers driven are negatively correlated, so older cars tend to have more kilometers on them.
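
To make the numbers in the heatmap less of a black box, here is a minimal sketch of the Pearson correlation computed by hand and compared with what pandas returns. The frame `toy` and the helper `pearson` are hypothetical names with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: newer cars cost more and have fewer kilometers.
toy = pd.DataFrame({
    "year": [2010, 2012, 2014, 2016, 2018],
    "km_driven": [120000, 90000, 80000, 40000, 20000],
    "selling_price": [150000, 200000, 320000, 450000, 600000],
})

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

r_year_price = pearson(toy["year"], toy["selling_price"])  # positive
r_year_km = pearson(toy["year"], toy["km_driven"])         # negative

# pandas computes the same quantity for every pair of numeric columns.
corr_matrix = toy.corr()
```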

Feature Engineering

Now that we have taken a closer look at the data and the statistics that surround it, we can prepare our data for modeling.

The ‘name’ column in the data frame is very specific, and when we are modeling on a relatively limited data set it is sometimes better to be more vague. This allows the model to make predictions based on more past samples. To investigate the variety of car names in the data, we can use the code below to count how many of each car name is included.

car['name'].value_counts()

There are over 1400 different car names included, and many have fewer than 30 examples. Our model will have a hard time predicting sale price if there are only a couple of examples of each car. To make this a more general feature, we can include just the brand of the car. Luckily the brand name is the first word in each of the rows, so we can create a new feature with just the car brand names.

# make an empty list to append new names to
brand_name = []
for x in car.name:
    y = x.split(' ')
    brand_name.append(y[0])  # append only the first word to the list
# we can drop the previous column that had the full name
car = car.drop(['name'], axis = 1)
# and add the new column that just has the brand name
car['brand_name'] = brand_name
# now let's check how many of each brand is in the column
car.brand_name.value_counts()

Since this column now holds only the brand name, there are far fewer than 1400 different values, and many brands have over 30 cars. At this point you could still choose to delete the rows for brands that have only 1 or 2 cars (Force, Isuzu, Kia, Daewoo…), but I will leave these in for now. We have already reduced the variety dramatically, so our model should be stronger now.
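
For reference, the brand extraction can also be written without a loop, and the optional step of removing rare brands can be sketched as follows. The frame `toy` and the `min_count` threshold are assumptions for illustration only; I did not apply this filter in the project.

```python
import pandas as pd

# Hypothetical toy data in the same shape as the 'name' column.
toy = pd.DataFrame({
    "name": ["Maruti Swift Dzire VDI", "Maruti Alto 800 LXI", "Hyundai i20 Magna",
             "Hyundai Verna 1.6 SX", "Daewoo Matiz SD"],
    "selling_price": [450000, 150000, 400000, 600000, 60000],
})

# Vectorized equivalent of the loop: split on whitespace and keep the first word.
toy["brand_name"] = toy["name"].str.split().str[0]

# Keep only brands that appear at least min_count times (assumed threshold).
min_count = 2
brand_counts = toy["brand_name"].value_counts()
common_brands = brand_counts[brand_counts >= min_count].index
filtered = toy[toy["brand_name"].isin(common_brands)]
```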

Binarize the Features

The last step before we can model on the features is to create binary columns for the features that are not numerical. It is hard for the computer to understand the difference in meaning between all the car names, so binarizing simply tells the computer “yes this is a Volvo,” or “no, this is not a Volvo.”

Binarizing creates a specific column for each car brand, and then each row will have a 0 (not that car brand) or a 1 (it is that car brand). This process of creating binary columns for each feature is also called ‘dummifying’ the variable, and can be done easily with code in Pandas.

# set this equal to a new variable since it will be a different data set
# dropping the first column just removes the redundancy of having all the columns there
car_dummies = pd.get_dummies(car, drop_first = True)
car_dummies.head()

With the dummy/binary columns, the data frame is now filled with columns of 0s and 1s, except for the three numerical columns. Although there are many more columns, this process makes it possible for the model to understand the information you are providing it with.
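
A tiny example makes the effect of `drop_first` easier to see. The frame `toy` is hypothetical, and `dtype=int` is my addition here so the output shows 0s and 1s (recent pandas versions show True/False by default).

```python
import pandas as pd

# Hypothetical toy data with one numeric and one text column.
toy = pd.DataFrame({
    "km_driven": [70000, 50000, 120000],
    "brand_name": ["Audi", "BMW", "Volvo"],
})

# Each brand becomes its own 0/1 column; drop_first removes the first
# category (Audi), which is implied when all other brand columns are 0.
toy_dummies = pd.get_dummies(toy, drop_first=True, dtype=int)
```

The Audi row ends up with 0 in both remaining brand columns, which is why the dropped column carries no extra information.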

Modeling

Now (finally) we can model our data to predict the sale price of these used cars.

Split Target & Predictor Variables

First, we need to define our X and y variables. y is what we are predicting (sale price) and X is everything we are using to help us make this prediction.

X = car_dummies.copy()
# .pop() removes the column from X and saves it to the new variable
y = X.pop('selling_price')

Train & Test Split

Next we need to create a training and test group for our model. We can use the code below to randomly select 70% of the data as the train group for our model to train on, and 30% will remain as our test group to test the quality of our model.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
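
To see what the split does, here is a small sketch on ten made-up rows. The names `X_toy` and `y_toy` are hypothetical; the point is the 70/30 sizes, the matching row labels, and that a fixed `random_state` reproduces the same split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy predictors and target with 10 rows.
X_toy = pd.DataFrame({"km_driven": range(10)})
y_toy = pd.Series(range(100, 110), name="selling_price")

# 30% of the rows go to the test group, the rest to the train group.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=1)

# Running the split again with the same random_state selects the same rows.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_toy, y_toy, test_size=0.3, random_state=1)
```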

Standardize X Values

Finally, we need to standardize all the X values so that they are on a consistent scale. Standardizing subtracts each column's mean and divides by its standard deviation, so the relative spacing of the values within a column does not change.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# fit_transform on the train data only, because you don't want your model to be influenced in any way by the test data.
# The test data acts as unseen, brand new data to test the quality of the model.
# index = X_train.index keeps the row labels aligned with y_train
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
# only transform the test data, so it is scaled with the mean and standard deviation learned from the training data
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)
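
The sketch below reproduces what the scaler does by hand on made-up numbers, to show that the test data is scaled with the training mean and standard deviation. The arrays `train` and `test` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature train and test sets.
train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
test = np.array([[3.0], [6.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the train mean and std

# The same calculation by hand: z = (x - mean) / std,
# using the population standard deviation (ddof=0) of the training data.
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_test = (test - mean) / std
```

Note that the test value 3.0 becomes 0 because it equals the training mean, not the test mean.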

Linear Regression

The first model we will try is a simple Linear Regression. It is important to assess the cross validation score, training score, and test score to reflect on how the model is performing.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# create a model instance
lr_model = LinearRegression()
# fit the model on the training data
lr_model.fit(X_train, y_train)
# get cross validated scores
scores = cross_val_score(lr_model, X_train, y_train, cv=5)
print("Cross-validated training scores:", scores)
print("Mean cross-validated training score:", scores.mean())
# training score
print("Training Score:", lr_model.score(X_train, y_train))
# evaluate the model on the test set
print("Test Score:", lr_model.score(X_test, y_test))

[Output: Linear Regression Scores]

Since the training score was much better than the test score, the model was overfitting to the training data. This means that the model has a very hard time predicting on unseen data, which is not good! Also, the mean cross validation score is a very large negative number. These scores are R² values, which can fall below zero when a model predicts worse than simply guessing the mean. In theory we want all three of these scores to be as close to 1.0 as possible, but since the scores are so bad for this model, we can try some other regression models that also regularize the coefficients.
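
A negative score can look like a bug, so here is a minimal sketch of how R² is calculated and how it goes negative. The arrays and the helper `r_squared` are hypothetical; `r2_score` from scikit-learn is used only as a cross-check.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual prices and two sets of predictions.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
good_pred = np.array([110.0, 190.0, 310.0, 390.0])
bad_pred = np.array([400.0, 100.0, 500.0, 0.0])

def r_squared(y, pred):
    """R^2 = 1 - (squared error of the model) / (squared error of guessing the mean)."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_good = r_squared(y_true, good_pred)  # close to 1
r2_bad = r_squared(y_true, bad_pred)    # below 0: worse than guessing the mean
```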

Ridge CV

Ridge regression is one way to regularize the coefficients, and is often helpful when dealing with collinearity. Ridge is typically useful when there is a large number of features, and many or all of them impact the target variable with similar strength. When working on any regression problem, it is good practice to try out all of these models to see which performs best. With the RidgeCV model we can also set a range of alpha values to try, and the model will choose the best one.

from sklearn.linear_model import RidgeCV
import numpy as np

# create a RidgeCV model instance
ridge_model = RidgeCV(alphas=np.logspace(-10, 10, 30), cv=5)
# fit the model
ridge_model.fit(X_train, y_train)
# mean cv score on training data
scores = cross_val_score(ridge_model, X_train, y_train, cv=5)
print("Cross-validated training scores:", scores)
print("Mean cross-validated training score:", scores.mean())
# training score
print("Training Score:", ridge_model.score(X_train, y_train))
# evaluate the model on the test set
print("Test Score:", ridge_model.score(X_test, y_test))

[Output: Ridge CV Scores]

Now all three scores are roughly the same, within a few decimals. Since these scores are much better than the Linear Regression scores, we can assume that the regularization was helpful for modeling.
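
If you are curious which alpha RidgeCV settled on, the fitted model stores it in `alpha_`. The sketch below shows this on synthetic data; the data, the coefficient values, and the alpha grid are all assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data: 5 features with known coefficients plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Try 13 alphas between 0.001 and 1000 with 5-fold cross validation.
alphas = np.logspace(-3, 3, 13)
ridge = RidgeCV(alphas=alphas, cv=5).fit(X, y)

# alpha_ holds the regularization strength that scored best.
best_alpha = ridge.alpha_
train_score = ridge.score(X, y)
```

If `best_alpha` lands on the edge of the grid, it is usually worth widening the range and fitting again.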

Lasso CV

Another way we can regularize our features is by using Lasso. This regularization is often helpful for reducing collinearity when there are many features that have almost no impact on the target variable. Lasso shrinks the coefficients of these features to zero, and keeps only the features that strongly impact the predictions. Again, it is always best to try all the models and see which ends up working best for your data.

from sklearn.linear_model import LassoCV

# create a LassoCV model instance
# (eps must be a single number and only shapes the automatic alpha grid, so it is left out here because alphas is given explicitly)
lasso_model = LassoCV(alphas=np.logspace(-8, 8, 20), max_iter = 1000000, cv=5)
# fit the model
lasso_model.fit(X_train, y_train)
# evaluate on the training set
training_score = lasso_model.score(X_train, y_train)
# evaluate on the test set
test_score = lasso_model.score(X_test, y_test)
# mean cv score on training data
scores = cross_val_score(lasso_model, X_train, y_train, cv=5)
print("Cross-validated training scores:", scores)
print("Mean cross-validated training score:", scores.mean())
print("Training Score:", training_score)
print("Test Score:", test_score)

[Output: Lasso CV Scores]

The Lasso scores are very similar to the Ridge scores, but the mean CV score is slightly higher. Either the mean CV score or the test score is a good metric for comparing models.
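
The difference between the two penalties is easiest to see on data where most features are pure noise. This is a sketch on synthetic data with an arbitrarily chosen alpha, not a rerun of the car models.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first two of ten features affect the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 50 * X[:, 0] - 30 * X[:, 1] + rng.normal(scale=1.0, size=200)

# Same (assumed) alpha for both models, chosen only for illustration.
lasso = Lasso(alpha=2.0).fit(X, y)
ridge = Ridge(alpha=2.0).fit(X, y)

# Lasso sets the coefficients of irrelevant features to exactly zero;
# Ridge only shrinks them towards zero.
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
```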

Elastic Net CV

The last model we will test is Elastic Net CV, which creates a combination of Lasso and Ridge regularizations.

# Elastic net model with scores
from sklearn.linear_model import ElasticNetCV

enet_model = ElasticNetCV(alphas=np.logspace(-4, 4, 10),
                          l1_ratio=np.array([.1, .5, .7, .9, .95, .99, 1]),
                          max_iter = 100000,
                          cv=5)
# fit the model
enet_model.fit(X_train, y_train)
# evaluate on the training set
training_score = enet_model.score(X_train, y_train)
# evaluate on the test set
test_score = enet_model.score(X_test, y_test)
# mean cv score on training data
scores = cross_val_score(enet_model, X_train, y_train, cv=5)
print("Cross-validated training scores:", scores)
print("Mean cross-validated training score:", scores.mean())
print()
print("Training Score:", training_score)
print("Test Score:", test_score)

[Output: Elastic Net CV Scores]

All three of the regularized models scored relatively similarly, but by a small margin (.001), Lasso CV performed the best when comparing mean cross validation scores. Let’s look a bit closer at the Lasso model and how it made its predictions.
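
For anyone wondering how Elastic Net combines the two, the sketch below computes the penalty term that scikit-learn documents for its ElasticNet objective. The function name `elastic_net_penalty` and the coefficient values are hypothetical.

```python
import numpy as np

def elastic_net_penalty(coef, alpha, l1_ratio):
    """Penalty added to the loss: a blend of the Lasso (L1) and Ridge (L2) terms."""
    l1 = np.sum(np.abs(coef))  # sum of absolute coefficients
    l2 = np.sum(coef ** 2)     # sum of squared coefficients
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2

coef = np.array([3.0, -4.0, 0.0])

pure_lasso = elastic_net_penalty(coef, alpha=1.0, l1_ratio=1.0)  # only the L1 term
pure_ridge = elastic_net_penalty(coef, alpha=1.0, l1_ratio=0.0)  # only the L2 term
mixed = elastic_net_penalty(coef, alpha=1.0, l1_ratio=0.5)       # half of each
```

This is why `l1_ratio=1` in the grid above corresponds to a pure Lasso penalty.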

Looking at the Best Model

Feature Importance

Viewing the coefficients can show us which features had the most (or least) impact on how the model made predictions. The chart below shows the features that had the largest positive impact on the sale price of a car. For instance, being a BMW, Mercedes, or Audi raises the predicted sale price, as does being a newer car. By changing ascending to False, we could also view the features that negatively impact price.

import matplotlib.pyplot as plt

fi = pd.DataFrame({
    'feature': X_train.columns,
    'importance': lasso_model.coef_
})
fi.sort_values('importance', ascending=True, inplace=True)

# sns.set_style('ticks')
sns.set(font_scale = 2)
fig, ax = plt.subplots()
# set the figure size in inches
fig.set_size_inches(16, 12)
# fi[-15:] keeps the 15 rows with the largest coefficients
sns.barplot(x='importance', y='feature', data=fi[-15:], orient='h', palette = 'rocket', saturation=0.7)
ax.set_title("Feature Importance", fontsize=40, y=1.01)
ax.set_xlabel('Importance', fontsize = 30)
ax.set_ylabel('Feature', fontsize = 30)

[Figure: Lasso Positive Feature Importance]

Predictions & Residuals

Another way we could evaluate our model is by looking at what the model predicted compared to what the actual value was. This shows us where our model is making mistakes, and how wrong its predictions were. We can view this in a data frame, or we can transfer this information to a graph that contrasts the actual and predicted values.

# predicted y values
predictions = lasso_model.predict(X_test)
# residuals (or error between predictions and actual)
residuals = y_test - predictions

# data frame of actual values, predicted values, and residuals
residuals_df = pd.DataFrame({'actual': y_test.values, 'predictions': predictions})
residuals_df['residuals'] = residuals_df.actual - residuals_df.predictions
residuals_df

[Output: DataFrame of actual and predicted values, and residuals (actual - predicted)]

# plot the actual values against the predicted values
sns.set(style='white', font_scale = 2)
fig, ax = plt.subplots()
fig.set_size_inches(16, 12)
ax = sns.regplot(x="predictions", y="actual", data= residuals_df, scatter_kws = {'color': 'lightsalmon'},
                 line_kws = {'color': 'darksalmon'})
ax.set_xlabel('Predicted', fontsize = 30)
ax.set_ylabel('Actual', fontsize = 30)

From these values we can see that our model does a decent job of creating accurate predictions, but there are many outliers that don’t seem to conform to our model.

Root Mean Squared Error

One last way we can evaluate our model’s performance is by calculating the root mean squared error (root MSE). This is the mean of all the squared residuals, and then the square root of that value. It tells us roughly how far off our predictions typically are, in the same units as the sale price.

from sklearn.metrics import mean_squared_error
(mean_squared_error(y_test, predictions))**0.5
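
To show what that one-liner is doing, here is the same calculation step by step on four made-up prices. The arrays `actual` and `predicted` are hypothetical and are not values from the car data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted sale prices.
actual = np.array([300000.0, 450000.0, 120000.0, 800000.0])
predicted = np.array([320000.0, 400000.0, 150000.0, 700000.0])

# Step 1: residuals (actual - predicted).
residuals = actual - predicted
# Step 2: mean of the squared residuals (MSE).
mse = np.mean(residuals ** 2)
# Step 3: square root of the MSE, back in the units of the prices.
rmse = np.sqrt(mse)

# The same result using scikit-learn, as in the code above.
rmse_sklearn = mean_squared_error(actual, predicted) ** 0.5
```

Because the residuals are squared before averaging, the one large miss of 100,000 pulls the result up more than the three smaller misses do.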

The root MSE for this model was 327,518.584 rupees, which is the equivalent of about $4,370. While we always want to minimize this value, being able to predict a used car’s price within about $4,000 is not a bad accomplishment. Using this model, the car company could reasonably price their cars based only on details such as the brand, kilometers driven, fuel type, and year.

Conclusion

Hopefully this was a helpful data project walk-through that guided you through some of the fundamentals of EDA, regression, and modeling in Python. Check out the full code on GitHub for more details.

If you’re ready for the next step, check out a guided walk-through of the iris data set to learn the basics of classification.

Translated from: https://towardsdatascience.com/regression-basics-code-walk-through-c2eac24da2e9
