[TensorFlow] Basic Steps for Using TensorFlow: Linear Regression as an Exercise

Published: 2024/9/27


Preparation

Load the required libraries

from __future__ import print_function

import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

# tf.logging belongs to the TensorFlow 1.x API used throughout this exercise.
tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

Load the dataset

california_housing_dataframe = pd.read_csv("https://download.mlcc.google.cn/mledu-datasets/california_housing_train.csv", sep=",")

Randomize the data so that no pathological ordering can harm the performance of stochastic gradient descent. In addition, scale median_house_value to units of thousands, so that the model can learn from the data more easily with learning rates in a commonly used range.

california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))
california_housing_dataframe["median_house_value"] /= 1000.0
california_housing_dataframe

Output: the shuffled DataFrame (table not preserved).

Inspect the data

Before using the data, call california_housing_dataframe.describe() to get a quick summary of useful statistics for each column: number of examples, mean, standard deviation, maximum, minimum, and various quantiles.

california_housing_dataframe.describe()

Output: the per-column summary table (table not preserved).
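To make the summary above concrete, here is a minimal standard-library sketch of the statistics that describe() reports for a single numeric column. The helper name describe_column and the sample values are hypothetical, and the choice of the sample standard deviation and "inclusive" quantiles is an assumption intended to mirror the pandas defaults.

```python
import statistics


def describe_column(values):
    """Return describe()-style summary statistics for one numeric column."""
    data = sorted(values)
    # "inclusive" quantiles interpolate linearly between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": data[-1],
    }


# Made-up values for illustration, not taken from the housing dataset.
sample_column = [2.0, 4.0, 4.0, 6.0, 9.0]
summary = describe_column(sample_column)
for name, value in summary.items():
    print(name, value)
```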

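Build the first model

The goal of this exercise is to predict median_house_value, using total_rooms as the input feature.
To train the model, we use the LinearRegressor interface provided by the TensorFlow Estimator API. This API takes care of much of the low-level model plumbing and provides convenient methods for model training, evaluation, and inference.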

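Define features and configure feature columns

To import training data into TensorFlow, you need to specify the type of data each feature contains. Two main types of data are used:

Categorical data: textual data. The housing dataset in this exercise does not contain any categorical features; examples would be the style of a home or the words in a real-estate ad.
Numerical data: data that is a number (integer or float).
Here the numerical input feature is total_rooms. The code below extracts the total_rooms data from california_housing_dataframe and defines the feature column with numeric_column, which marks its data as numeric: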

# Define the input feature: total_rooms.
my_feature = california_housing_dataframe[["total_rooms"]]

# Configure a numeric feature column for total_rooms.
feature_columns = [tf.feature_column.numeric_column("total_rooms")]

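Note: the total_rooms data has the shape of a one-dimensional array (a list of the total number of rooms for each block). This is the default shape for numeric_column, so we do not have to pass it as an argument.

Define the target

Next, define the target, which is median_house_value. It can be extracted from california_housing_dataframe: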

# Define the label.
targets = california_housing_dataframe["median_house_value"]

Configure the LinearRegressor

Next, we configure a linear regression model with LinearRegressor and train it with GradientDescentOptimizer, which implements mini-batch stochastic gradient descent (SGD). The learning_rate parameter controls the size of the gradient step.

Note: to be safe, gradient clipping can also be applied to the optimizer via clip_gradients_by_norm. Gradient clipping ensures that the magnitude of the gradients does not become too large during training; overly large gradients can cause gradient descent to fail.

# Use gradient descent as the optimizer for training the model.
# Set a learning rate of 0.0000001 for Gradient Descent.
my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0000001)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)

# Configure the linear regression model with our feature columns and optimizer.
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
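
The idea behind clipping gradients by norm can be illustrated without TensorFlow. The sketch below is plain Python and assumes that clipping is applied to the combined (global) L2 norm of all gradients; the function name clip_by_global_norm and the sample gradient values are hypothetical and are not part of the article's code.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Rescale gradients so that their combined L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for g in gradients))
    if global_norm <= clip_norm:
        # Small gradients pass through unchanged.
        return list(gradients), global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in gradients], global_norm


# Made-up gradients whose norm (50) is far above the threshold of 5.0.
clipped, norm_before = clip_by_global_norm([30.0, 40.0], clip_norm=5.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, clipped, norm_after)
```

Clipping keeps the direction of the gradient and only shrinks its length, so a single extreme example cannot throw the weights far off in one step.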

Define the input function

To import data into the LinearRegressor, we need to define an input function that tells TensorFlow how to preprocess the data, as well as how to batch, shuffle, and repeat it during model training.
First, convert the pandas feature data into a dict of NumPy arrays. Then use the TensorFlow Dataset API to construct a Dataset object from the data, split it into batches of size batch_size, and repeat it for the specified number of epochs (num_epochs).

Note: when the default value num_epochs=None is passed to repeat(), the input data is repeated indefinitely.

Next, if shuffle is set to True, the data is shuffled so that it is passed to the model in random order during training. The buffer_size argument specifies the size of the dataset from which shuffle randomly samples.

Finally, the input function constructs an iterator for the dataset and returns the next batch of data to the LinearRegressor.

Train the model

Call train() on linear_regressor to train the model. We wrap my_input_fn in a lambda so that my_feature and targets can be passed in as arguments (see the TensorFlow input function tutorial for details). To start, we train for 100 steps.

_ = linear_regressor.train(
    input_fn=lambda: my_input_fn(my_feature, targets),
    steps=100
)
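
To give a feel for what 100 training steps do, here is a minimal plain-Python sketch of stochastic gradient descent on a one-feature linear model with squared-error loss and a batch size of 1. It is an illustration under stated assumptions, not a reproduction of LinearRegressor's internals: gradient clipping is omitted, the function names and data are hypothetical, and only the learning rate 0.0000001 is taken from the article.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate, steps, seed=0):
    """Fit y ~ weight * x + bias with one randomly chosen example per step."""
    rng = random.Random(seed)
    weight, bias = 0.0, 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))
        error = (weight * xs[i] + bias) - ys[i]
        # Gradients of the squared error (prediction - target) ** 2.
        weight -= learning_rate * 2.0 * error * xs[i]
        bias -= learning_rate * 2.0 * error
    return weight, bias


def mean_squared_error(xs, ys, weight, bias):
    """Average squared difference between predictions and targets."""
    return sum((weight * x + bias - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data lying exactly on the line y = 0.1 * x.
xs = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]
ys = [50.0, 100.0, 150.0, 200.0, 250.0]

mse_before = mean_squared_error(xs, ys, 0.0, 0.0)
weight, bias = sgd_linear_regression(xs, ys, learning_rate=0.0000001, steps=100)
mse_after = mean_squared_error(xs, ys, weight, bias)
print(weight, bias, mse_before, mse_after)
```

Because the feature values are in the thousands, their squares dominate the weight update, which is why such a small learning rate is needed: with noticeably larger feature values the same rate would overshoot and diverge.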

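Evaluate the model

Make predictions on the training data to see how well the model fit that data during training.
Note: training error measures how well the model fits the training data, but it does not measure how well the model generalizes to new data.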
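
Training error for a regression model is commonly summarized as the mean squared error (MSE) and its square root (RMSE). The sketch below shows that computation in plain Python; the helper name and the prediction and target values are hypothetical and are not results from the model above.

```python
import math


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Made-up values in thousands of dollars, matching the scaled target unit.
sample_predictions = [100.0, 150.0, 200.0, 250.0]
sample_targets = [106.0, 142.0, 200.0, 250.0]

mse = mean_squared_error(sample_predictions, sample_targets)
rmse = math.sqrt(mse)
print("MSE:", mse)    # MSE: 25.0
print("RMSE:", rmse)  # RMSE: 5.0
```

RMSE is expressed in the same unit as the target, so with median_house_value scaled to thousands, an RMSE of 5.0 would correspond to a typical error of about 5,000 dollars.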

…I don't feel like copying over any more for now; to be continued.
