[Kaggle] Housing Prices: House Price Prediction
Contents
- 1. Baseline
  - 1. Feature selection
  - 2. Outlier removal
  - 3. Modeling and prediction
- 2. Feature engineering to be optimized
House price prediction: Kaggle competition page
Reference article: Kaggle competition: house price prediction (top 4%)
1. Baseline
    import numpy as np
    import pandas as pd
    # IPython/Jupyter magic: render plots inline (notebook only)
    %matplotlib inline
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import StratifiedShuffleSplit
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.preprocessing import LabelBinarizer
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import FeatureUnion
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import cross_val_score

    # Load the competition data from the working directory
    train = pd.read_csv("./train.csv")
    test = pd.read_csv("./test.csv")
    # RangeIndex: 1460 entries, 0 to 1459
    # Data columns (total 81 columns):

1. Feature selection
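- The data has 79 features; we select the 10 with the highest correlation coefficients with the target, SalePrice.
Most correlated features: ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea', 'TotalBsmtSF', '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt', 'YearRemodAdd']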
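The selection code itself is not shown in this post, so the following is only a minimal sketch of one way to rank features by correlation with SalePrice. The helper name `top_correlated_features`, the choice to rank by absolute Pearson correlation, and the synthetic `demo` frame are all assumptions made for illustration. They are not the author's code.

```python
import numpy as np
import pandas as pd


def top_correlated_features(df, target, k=10):
    """Return the k numeric columns most correlated with `target` (hypothetical helper)."""
    # Pearson correlation is only defined for numeric columns.
    corr = df.select_dtypes(include="number").corr()[target]
    # Drop the target itself and rank by absolute correlation (an assumption).
    ranked = corr.drop(target).abs().sort_values(ascending=False)
    return ranked.head(k).index.tolist()


# Small synthetic frame standing in for train.csv (illustration only).
rng = np.random.default_rng(0)
area = rng.normal(1500, 300, 200)
demo = pd.DataFrame({
    "GrLivArea": area,
    "Noise": rng.normal(0, 1, 200),
    "SalePrice": 100 * area + rng.normal(0, 5000, 200),
})
print(top_correlated_features(demo, "SalePrice", k=1))
```

On the real data the call would presumably look like `top_correlated_features(train, "SalePrice", k=10)`.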
2. Outlier removal
- Some data points are anomalous; delete them
- Treat year columns as categorical (string) variables
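The post does not say which rows were dropped or which year columns were converted. The sketch below assumes a commonly used rule for this dataset (very large GrLivArea with a low SalePrice) and assumes YearBuilt and YearRemodAdd are the year columns. The thresholds and the helper name `drop_outliers_and_cast_years` are illustrative, not taken from the source.

```python
import pandas as pd


def drop_outliers_and_cast_years(df, area_max=4000, price_min=300000,
                                 year_cols=("YearBuilt", "YearRemodAdd")):
    """Drop very large but cheap houses, then cast year columns to strings."""
    # Assumed outlier rule: large living area combined with a low sale price.
    is_outlier = (df["GrLivArea"] > area_max) & (df["SalePrice"] < price_min)
    cleaned = df.loc[~is_outlier].copy()
    # Strings make the encoder treat each year as a category, not a magnitude.
    for col in year_cols:
        cleaned[col] = cleaned[col].astype(str)
    return cleaned


# Three made-up rows; the second one matches the assumed outlier rule.
demo = pd.DataFrame({
    "GrLivArea": [1500, 4700, 2000],
    "SalePrice": [200000, 180000, 250000],
    "YearBuilt": [1990, 2008, 2003],
    "YearRemodAdd": [1995, 2008, 2003],
})
cleaned = drop_outliers_and_cast_years(demo)
print(cleaned)
```

Outlier rows should only be removed from the training set; the test set must keep every row so that each Id gets a prediction.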
3. Modeling and prediction
    from sklearn.ensemble import RandomForestRegressor

    # full_pipeline, X_train, y_train and X_test come from preparation steps
    # that are not shown in this post.
    prepare_select_and_predict_pipeline = Pipeline([
        ('preparation', full_pipeline),
        ('forst_reg', RandomForestRegressor(random_state=0)),
    ])

    # Search over the imputation strategy and the random forest size/features
    param_grid = [{
        'preparation__num_pipeline__imputer__strategy': ['mean', 'median', 'most_frequent'],
        'forst_reg__n_estimators': [50, 100, 150, 200, 250, 300, 330, 350],
        'forst_reg__max_features': [45, 50, 55, 65],
    }]

    grid_search_prep = GridSearchCV(prepare_select_and_predict_pipeline, param_grid, cv=7,
                                    scoring='neg_mean_squared_error', verbose=2, n_jobs=-1)
    grid_search_prep.fit(X_train, y_train)
    grid_search_prep.best_params_

    # Predict on the test set with the best pipeline and write the submission file
    final_model = grid_search_prep.best_estimator_
    y_pred_test = final_model.predict(X_test)
    result = pd.DataFrame()
    result['Id'] = test['Id']
    result['SalePrice'] = y_pred_test
    result.to_csv('housing_price_10_features.csv', index=False)
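The grid search above refers to `full_pipeline` and to a parameter path `preparation__num_pipeline__imputer__strategy`, but the pipeline definition is not included in the post. The sketch below shows one structure that would make that path resolve: a `FeatureUnion` with a branch named `num_pipeline` that contains a step named `imputer`. The `DataFrameSelector` class, the `cat_pipeline` branch, the column lists and the `demo` frame are assumptions for illustration. The author's actual pipeline may differ.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: select a fixed list of columns from a DataFrame."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names]


# Assumed column split: numeric features vs. year-as-string features.
num_cols = ["OverallQual", "GrLivArea"]
cat_cols = ["YearBuilt"]

num_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_cols)),
    ("imputer", SimpleImputer(strategy="median")),  # step tuned by the grid search
    ("std_scaler", StandardScaler()),
])
cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_cols)),
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("one_hot", OneHotEncoder(handle_unknown="ignore")),
])
full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

# Made-up rows with missing numeric values to exercise the imputer.
demo = pd.DataFrame({
    "OverallQual": [5, 7, np.nan, 8],
    "GrLivArea": [1200.0, 1800.0, 1500.0, np.nan],
    "YearBuilt": ["1990", "2003", "1990", "2008"],
})
features = full_pipeline.fit_transform(demo)
print(features.shape)
```

One-hot encoding the year columns widens the feature matrix well beyond the 10 selected columns. That would be consistent with the grid trying `max_features` values of 45 to 65.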
Score: 19154.16762
2. Feature engineering to be optimized
To study: My Top 1% Approach: EDA, New Models and Stacking