Validation Before Testing | Machine Learning

Published: 2025/3/11

This article was collected and compiled by 生活随笔 and is shared here for reference.

In my previous article, we discussed why we need to train and test our model, and we wrote code to split the given data into training and test sets.

Before moving on to validation, let us see why we need a validation step before testing on a given dataset. When we deal with a large amount of data, there is a chance that the data our model learned from produces a biased result. In that case, when we use the test set to check the accuracy of our model, two cases can arise:

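1. Underfitting: the model is too simple to capture the underlying pattern, so it performs poorly on the training data and on the test data.

2. Overfitting: the model fits the training data too closely, noise included, so it fails to generalize and performs poorly on the test data.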

Image source: https://docs.aws.amazon.com/machine-learning/latest/dg/images/mlconcepts_image5.png
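To make the two cases concrete, here is a minimal plain-Python sketch on synthetic data. The data, the noise level and the three toy models are illustrative assumptions, not part of the original article.

- A constant model underfits: it has a high error on both sets.
- A "memorizing" 1-nearest-neighbour model overfits: it has zero training error but a larger validation error than a simple straight-line fit.

```python
import random

def mse(pred, actual):
    """Mean squared error between two equal-length lists."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

# Synthetic data (assumption): y = 3x + 2 plus Gaussian noise
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]
x_tr, y_tr = xs[:200], ys[:200]   # training part
x_va, y_va = xs[200:], ys[200:]   # validation part

# Underfitting model: always predict the mean of the training targets
mean_y = sum(y_tr) / len(y_tr)
def constant(x):
    return mean_y

# Reasonable model: least-squares straight line fitted on the training set
mean_x = sum(x_tr) / len(x_tr)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(x_tr, y_tr))
         / sum((x - mean_x) ** 2 for x in x_tr))
intercept = mean_y - slope * mean_x
def linear(x):
    return slope * x + intercept

# Overfitting model: return the target of the nearest training point (1-NN),
# which reproduces the training data, noise included, exactly
def memorize(x):
    i = min(range(len(x_tr)), key=lambda j: abs(x_tr[j] - x))
    return y_tr[i]

models = {"constant": constant, "linear": linear, "memorize": memorize}
train_mse = {n: mse([f(x) for x in x_tr], y_tr) for n, f in models.items()}
val_mse = {n: mse([f(x) for x in x_va], y_va) for n, f in models.items()}

for name in models:
    print(f"{name:9s} train MSE = {train_mse[name]:8.3f}   "
          f"validation MSE = {val_mse[name]:8.3f}")
```

The memorizing model looks perfect on the data it has seen. Only data it has not seen, such as a validation set, reveals that it generalizes worse than the straight line.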

So how do we deal with this problem? The answer is fairly simple: we use a third dataset, the validation set, to check the results of the model trained on the training set. This lets us adjust hyperparameters such as the learning rate and the batch size until we get balanced results on the validation set. That, in turn, should improve the accuracy of our model when estimating the target values of the test set.
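The tuning loop can be sketched in plain Python. This is only an illustration under stated assumptions:

- the data are synthetic;
- the model is a one-feature linear model trained by mini-batch gradient descent (the helper names `train_sgd` and `mse` are hypothetical);
- the grid of learning rates and batch sizes is small and arbitrary.

Each combination is trained on the training set and scored on the validation set. Only the winning combination is evaluated, once, on the test set.

```python
import itertools
import random

def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_sgd(xs, ys, lr, batch_size, epochs=30, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(idx)  # new random sample order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradients of the mean squared error over this batch
            errs = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errs, batch)) / len(batch)
            grad_b = 2 * sum(errs) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Synthetic data (assumption): y = 2x + 1 plus noise, x in [0, 1)
rng = random.Random(42)
xs = [rng.random() for _ in range(400)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]
x_train, y_train = xs[:240], ys[:240]
x_val, y_val = xs[240:320], ys[240:320]
x_test, y_test = xs[320:], ys[320:]

# Hyperparameter grid: (learning rate, batch size), illustrative values only
grid = list(itertools.product([0.001, 0.01, 0.1], [8, 32]))
val_scores = {}
for lr, batch_size in grid:
    w, b = train_sgd(x_train, y_train, lr, batch_size)
    val_scores[(lr, batch_size)] = mse(w, b, x_val, y_val)

# Pick the combination with the lowest validation error ...
best = min(val_scores, key=val_scores.get)
# ... then retrain with it and use the test set only once, at the very end
w, b = train_sgd(x_train, y_train, *best)
test_score = mse(w, b, x_test, y_test)
print("best (learning rate, batch size):", best)
print("test MSE:", round(test_score, 4))
```

The test set plays no part in choosing the hyperparameters. Because of this, its score remains an honest estimate of how the tuned model handles unseen data.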

Image source: https://rpubs.com/charlydethibault/348566

Here you can see that the validation set is simply a subset of the training set that we create ourselves. Keep in mind that when we partition a dataset, its rows are shuffled randomly first to avoid biased results.
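Why shuffle? Suppose the rows happen to be stored in some order, for example sorted by the target. Cutting off the last quarter without shuffling then gives a validation set that does not look like the rest of the data.

The plain-Python sketch below illustrates this. The helper `split_off` and the toy data are hypothetical. In scikit-learn, `train_test_split` shuffles by default (`shuffle=True`).

```python
import random

def split_off(rows, fraction, shuffle=True, seed=0):
    """Hypothetical helper: split rows into (kept, held_out) parts."""
    rows = list(rows)  # work on a copy so the caller's list is untouched
    if shuffle:
        random.Random(seed).shuffle(rows)
    n_kept = len(rows) - round(len(rows) * fraction)
    return rows[:n_kept], rows[n_kept:]

def mean(values):
    return sum(values) / len(values)

# Toy targets stored in sorted order (assumption for illustration)
targets = list(range(100))

kept_plain, val_plain = split_off(targets, 0.25, shuffle=False)
kept_shuf, val_shuf = split_off(targets, 0.25, shuffle=True)

print("mean of all targets:        ", mean(targets))
print("validation mean, no shuffle:", mean(val_plain))
print("validation mean, shuffled:  ", mean(val_shuf))
```

Without shuffling, the held-out quarter contains only the largest targets (75 to 99). Its validation results would therefore say little about the rest of the data.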

So, let us write some simple code to create a validation set in Python:

File: headbrain.csv

Here is the code:

```python
# -*- coding: utf-8 -*-
"""
Created on Wed Aug 1 22:18:11 2018

@author: Raunak Goswami
"""
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

# Read the data.
# headbrain.csv must be stored in the same folder as this script.
data = pd.read_csv('headbrain.csv')

# Show the first five records of the dataset
print(data.head())

# x holds the feature values, i.e. head size (third column)
x = data.iloc[:, 2:3].values
# y holds the target values, i.e. brain weight (fourth column)
y = data.iloc[:, 3:4].values

# Split x and y into two parts:
#   1. training variables x_train and y_train
#   2. test variables x_test and y_test
# test_size=1/4 puts a quarter of the rows in the test set (train:test = 3:1)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1/4, random_state=0)

# Split the training data again into training and validation sets.
# Note that the validation set is 1/4 of the *training* set,
# not 1/4 of the whole dataset.
x_training, x_validate, y_training, y_validate = train_test_split(
    x_train, y_train, test_size=1/4, random_state=0)
```

After running this Python code in the Spyder IDE that ships with the Anaconda distribution, check the Variable Explorer:

In the image above, you can see that we have split the train variables (x_train, y_train) into training variables (x_training, y_training) and validation variables (x_validate, y_validate).
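It also helps to work out how large each part ends up. The second split takes 1/4 of the *training* set, so the final proportions of the whole dataset are:

- test: 1/4
- validation: 1/4 × 3/4 = 3/16
- training: 3/4 × 3/4 = 9/16

The sketch below computes the row counts. It assumes scikit-learn's convention of rounding the test count up when `test_size` is a fraction. The function name `nested_split_sizes` is hypothetical.

```python
import math

def nested_split_sizes(n_samples, test_frac=0.25, val_frac=0.25):
    """Row counts after a train/test split followed by a train/validation split.

    Assumes the held-out count is rounded up, as scikit-learn's
    train_test_split does for a fractional test_size.
    """
    n_test = math.ceil(test_frac * n_samples)
    n_train = n_samples - n_test
    n_validate = math.ceil(val_frac * n_train)  # fraction of the TRAINING set
    n_training = n_train - n_validate
    return n_training, n_validate, n_test

print(nested_split_sizes(100))  # (56, 19, 25)
```

With 100 rows, this gives 56/19/25, close to the exact 9/16, 3/16 and 1/4 shares (56.25, 18.75 and 25 rows).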

So, guys, that is it for today. I hope you liked this article. Have a great day ahead!

Original article: https://www.includehelp.com/ml-ai/validation-before-testing.aspx
