
Kaggle Machine Learning Exercise (House Price Prediction)

Posted 2024/10/8 in Programming Q&A by 豆豆

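This article, collected and edited by 生活随笔, presents the Kaggle machine learning exercise on house price prediction. We found it useful and are sharing it here for reference.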

Source: Kaggle

Machine Learning Micro-Course Home Page

Recap

Here’s the code you’ve written so far. Start by running it again.

```python
# Code you have previously used to load data
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Path of the file to read. We changed the directory structure to simplify submitting to a competition
iowa_file_path = 'train.csv'

home_data = pd.read_csv(iowa_file_path)
# Create target object and call it y
y = home_data.SalePrice
# Create X
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[features]

# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Specify model
iowa_model = DecisionTreeRegressor(random_state=1)
# Fit model
iowa_model.fit(train_X, train_y)

# Make validation predictions and calculate mean absolute error
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE when not specifying max_leaf_nodes: {:,.0f}".format(val_mae))

# Using best value for max_leaf_nodes
iowa_model = DecisionTreeRegressor(max_leaf_nodes=100, random_state=1)
iowa_model.fit(train_X, train_y)
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE for best value of max_leaf_nodes: {:,.0f}".format(val_mae))

# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state=1)
rf_model.fit(train_X, train_y)
rf_val_predictions = rf_model.predict(val_X)
rf_val_mae = mean_absolute_error(val_y, rf_val_predictions)

print("Validation MAE for Random Forest Model: {:,.0f}".format(rf_val_mae))
```

Output:

```
Validation MAE when not specifying max_leaf_nodes: 29,653
Validation MAE for best value of max_leaf_nodes: 27,283
Validation MAE for Random Forest Model: 22,762
```
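The recap hard-codes `max_leaf_nodes=100` as the "best value" without showing how it was picked. One common approach, sketched here, is to fit a tree for each candidate size and keep the size with the lowest validation MAE. Because `train.csv` is not bundled with this article, the sketch uses synthetic data from scikit-learn's `make_regression` as a stand-in; the helper name `get_mae` and the candidate list are illustrative assumptions, and the winning size on the real Iowa data may differ.

```python
# Sketch: choose max_leaf_nodes by comparing validation MAE.
# Assumption: synthetic regression data stands in for the Iowa housing features.
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Fit a tree limited to max_leaf_nodes leaves and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Hypothetical stand-in data: 500 rows, 7 columns (same count as `features`).
X, y = make_regression(n_samples=500, n_features=7, noise=20.0, random_state=1)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Illustrative candidate sizes; each gets its own fitted tree.
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
scores = {n: get_mae(n, train_X, val_X, train_y, val_y)
          for n in candidate_max_leaf_nodes}

# Keep the size whose validation MAE is smallest.
best_tree_size = min(scores, key=scores.get)

for n, mae in scores.items():
    print(f"max_leaf_nodes={n:>4}: validation MAE {mae:,.1f}")
print("Best max_leaf_nodes:", best_tree_size)
```

Once the best size is known, it is written directly into the model, which is what the `max_leaf_nodes=100` line in the recap does.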

Creating a Model For the Competition

Build a Random Forest model and train it on all of X and y.

```python
# To improve accuracy, create a new Random Forest model which you will train on all training data
rf_model_on_full_data = RandomForestRegressor(random_state=1)

# Fit rf_model_on_full_data on all of X and y (every row of train.csv, not just the train_X split)
rf_model_on_full_data.fit(X, y)
```

Output (printed by an older scikit-learn release; newer releases use different defaults, e.g. n_estimators=100):

```
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
                      oob_score=False, random_state=1, verbose=0, warm_start=False)
```
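Why refit at all? When neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the rows, so the validation models learned from only three quarters of the labeled data. After validation has been used to compare models, refitting on every row gives the final model more examples. The sketch below, again on hypothetical synthetic data, only makes the row counts visible. It sets `n_estimators=100` explicitly so it behaves the same on old and new scikit-learn releases (older releases defaulted to 10 trees).

```python
# Sketch: compare how many rows the validation model and the full-data model see.
# Assumption: synthetic data stands in for home_data[features] and SalePrice.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=7, noise=20.0, random_state=1)

# Default split: 25% of the rows go to validation.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

rf_model = RandomForestRegressor(n_estimators=100, random_state=1)
rf_model.fit(train_X, train_y)           # learns from the training split only

rf_model_on_full_data = RandomForestRegressor(n_estimators=100, random_state=1)
rf_model_on_full_data.fit(X, y)          # learns from every labeled row

print("rows used by validation model:", len(train_X))
print("rows used by final model:     ", len(X))
```

Because the full-data model has already seen the validation rows, scoring it on `val_X` would be overly optimistic; the validation MAE from the split remains the honest estimate.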

Make Predictions

Read the “test” data file and apply your model to make predictions.

```python
# Path to the file you will use for predictions
test_data_path = 'test.csv'

# Read the test data file using pandas
test_data = pd.read_csv(test_data_path)

# Create test_X, which comes from test_data but includes only the columns used for prediction.
# The list of columns is stored in the variable called features.
test_X = test_data[features]

# Make the predictions to submit, using the model trained on the full data
test_preds = rf_model_on_full_data.predict(test_X)

# Save the predictions in the format used for competition scoring
output = pd.DataFrame({'Id': test_data.Id,
                       'SalePrice': test_preds})
output.to_csv('submission.csv', index=False)
```
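Before uploading, a quick sanity check on `submission.csv` can catch mistakes: the file built in the code above has exactly the `Id` and `SalePrice` columns, with one prediction per test row. The sketch below uses a tiny hypothetical `test_data` frame and placeholder predictions instead of the real `test.csv` and fitted model, so only the checks themselves carry over.

```python
# Sketch: build and sanity-check a submission file.
# Assumption: a hypothetical three-row test_data and placeholder predictions
# stand in for test.csv and rf_model_on_full_data.predict(test_X).
import numpy as np
import pandas as pd

test_data = pd.DataFrame({"Id": [1, 2, 3]})                  # hypothetical Ids
test_preds = np.array([100000.0, 150000.0, 200000.0])        # placeholder values

# One prediction per test row, and no missing values.
assert len(test_preds) == len(test_data)
assert not np.isnan(test_preds).any()

output = pd.DataFrame({"Id": test_data.Id, "SalePrice": test_preds})
output.to_csv("submission.csv", index=False)

# Read the file back to confirm its layout.
check = pd.read_csv("submission.csv")
print(check.columns.tolist())  # ['Id', 'SalePrice']
print(check.shape)             # (3, 2)
```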

Kaggle really is a good platform for learning.
