Image Classification (CIFAR-10) on Kaggle
So far, we have been using Gluon's data package to obtain image datasets directly in tensor format. In practice, however, image datasets often exist as image files. In this section, we will start with raw image files and organize, read, and convert them to tensor format step by step. We previously performed an experiment on the CIFAR-10 dataset, an important dataset in computer vision. Now we will apply the knowledge learned in the previous sections to take part in the Kaggle competition that addresses the CIFAR-10 image classification problem.
The competition's web address is https://www.kaggle.com/c/cifar-10
Fig. 1 shows the information on the competition webpage. In order to submit results, please register an account on the Kaggle website first.
Fig. 1 CIFAR-10 image classification competition webpage information. The dataset for the competition can be accessed by clicking the "Data" tab.
First, import the packages or modules required for the competition.
import collections
from d2l import mxnet as d2l
import math
from mxnet import autograd, gluon, init, npx
from mxnet.gluon import nn
import os
import pandas as pd
import shutil
import time

npx.set_np()
- Obtaining and Organizing the Dataset
The competition data is divided into a training set and a test set. The training set contains 50,000 images. The test set contains 300,000 images, of which 10,000 images are used for scoring, while the other 290,000 non-scoring images are included to prevent the manual labeling of the test set and the submission of labeled results. The images in both datasets are in PNG format, with a height and width of 32 pixels and three color channels (RGB). The images cover 10 categories: planes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks. The upper-left corner of Fig. 1 shows some images of planes, cars, and birds in the dataset.
1.1. Downloading the Dataset
After logging in to Kaggle, click the "Data" tab on the CIFAR-10 image classification competition webpage shown in Fig. 1 and download the dataset by clicking the "Download All" button. After unzipping the downloaded file in ../data, and unzipping train.7z and test.7z inside it, you will find the entire dataset in the following paths:
· ../data/cifar-10/train/[1-50000].png
· ../data/cifar-10/test/[1-300000].png
· ../data/cifar-10/trainLabels.csv
· ../data/cifar-10/sampleSubmission.csv
Here, the "train" and "test" folders contain the training and testing images respectively, trainLabels.csv has labels for the training images, and sampleSubmission.csv is a sample submission. To make it easier to get started, we provide a small-scale sample of the dataset: it contains the first 1,000 training images and 5 random testing images. To use the full dataset of the Kaggle competition, you need to set the following demo variable to False.
#@save
d2l.DATA_HUB['cifar10_tiny'] = (d2l.DATA_URL + 'kaggle_cifar10_tiny.zip',
                                '2068874e4b9a9f0fb07ebe0ad2b29754449ccacd')

# If you use the full dataset downloaded for the Kaggle competition, set the
# demo variable to False
demo = True
if demo:
    data_dir = d2l.download_extract('cifar10_tiny')
else:
    data_dir = '../data/cifar-10/'
1.2. Organizing the Dataset
We need to organize the dataset to facilitate model training and testing. Let us first read the labels from the csv file. The following function returns a dictionary that maps the filename without extension to its label.
#@save
def read_csv_labels(fname):
    """Read fname to return a name to label dictionary."""
    with open(fname, 'r') as f:
        # Skip the file header line (column name)
        lines = f.readlines()[1:]
    tokens = [l.rstrip().split(',') for l in lines]
    return dict(((name, label) for name, label in tokens))
labels = read_csv_labels(os.path.join(data_dir, 'trainLabels.csv'))
print('# training examples:', len(labels))
print('# classes:', len(set(labels.values())))
# training examples: 1000
# classes: 10
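As a quick sanity check (not part of the original competition code), we can peek at a few entries of the returned dictionary; the exact file names and labels shown depend on the copy of the data you downloaded.

# Inspect a few (filename-without-extension, label) pairs from the dictionary
for name, label in list(labels.items())[:3]:
    print(name, '->', label)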
Next, we define the reorg_train_valid function to split the validation set out of the original training set. The argument valid_ratio in this function is the ratio of the number of examples in the validation set to the number of examples in the original training set. In particular, let n be the number of images of the class with the fewest examples, and r be the ratio; then we will use max(⌊nr⌋, 1) images of each class for the validation set. Take valid_ratio=0.1 as an example: since the original training set has 50,000 images, 45,000 images will be used for training and stored in the path "train_valid_test/train" when tuning hyperparameters, while the other 5,000 images will be stored as a validation set in the path "train_valid_test/valid". After organizing the data, images of the same class will be placed under the same folder so that we can read them later.
#@save
def copyfile(filename, target_dir):
    """Copy a file into a target directory."""
    d2l.mkdir_if_not_exist(target_dir)
    shutil.copy(filename, target_dir)
#@save
def reorg_train_valid(data_dir, labels, valid_ratio):
    # The number of examples of the class with the least examples in the
    # training dataset
    n = collections.Counter(labels.values()).most_common()[-1][1]
    # The number of examples per class for the validation set
    n_valid_per_label = max(1, math.floor(n * valid_ratio))
    label_count = {}
    for train_file in os.listdir(os.path.join(data_dir, 'train')):
        label = labels[train_file.split('.')[0]]
        fname = os.path.join(data_dir, 'train', train_file)
        # Copy to train_valid_test/train_valid with a subfolder per class
        copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                     'train_valid', label))
        if label not in label_count or label_count[label] < n_valid_per_label:
            # Copy to train_valid_test/valid
            copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                         'valid', label))
            label_count[label] = label_count.get(label, 0) + 1
        else:
            # Copy to train_valid_test/train
            copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                         'train', label))
    return n_valid_per_label
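To make the split size concrete, here is a small self-contained sketch (with made-up label counts) of the same computation that reorg_train_valid performs: the class with the fewest images determines how many images per class go into the validation set.

import collections
import math

# Hypothetical labels: 20 'cat' images and only 8 'bird' images
toy_labels = {str(i): 'cat' for i in range(20)}
toy_labels.update({str(100 + i): 'bird' for i in range(8)})

n = collections.Counter(toy_labels.values()).most_common()[-1][1]  # 8
n_valid_per_label = max(1, math.floor(n * 0.1))  # max(1, floor(0.8)) = 1
print(n, n_valid_per_label)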
The reorg_test function below is used to organize the testing set, to facilitate reading during prediction.
#@save
def reorg_test(data_dir):
    for test_file in os.listdir(os.path.join(data_dir, 'test')):
        copyfile(os.path.join(data_dir, 'test', test_file),
                 os.path.join(data_dir, 'train_valid_test', 'test',
                              'unknown'))
Finally, we use a function to call the previously defined read_csv_labels, reorg_train_valid, and reorg_test functions.
def reorg_cifar10_data(data_dir, valid_ratio):
    labels = read_csv_labels(os.path.join(data_dir, 'trainLabels.csv'))
    reorg_train_valid(data_dir, labels, valid_ratio)
    reorg_test(data_dir)
We only set the batch size to 1 for the demo dataset. During actual training and testing, the complete dataset of the Kaggle competition should be used and batch_size should be set to a larger integer, such as 128. We use 10% of the training examples as the validation set for tuning hyperparameters.
batch_size = 1 if demo else 128
valid_ratio = 0.1
reorg_cifar10_data(data_dir, valid_ratio)
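If you want to confirm what the reorganization produced, the following optional snippet (not part of the original code) walks the output directory and reports how many class folders and images each split contains; the counts depend on whether you use the demo sample or the full dataset.

# Assumes reorg_cifar10_data has already run on data_dir
for split in ('train', 'valid', 'train_valid', 'test'):
    split_dir = os.path.join(data_dir, 'train_valid_test', split)
    num_classes = len(os.listdir(split_dir))
    num_images = sum(len(files) for _, _, files in os.walk(split_dir))
    print(split, ':', num_classes, 'class folders,', num_images, 'images')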
- Image Augmentation
To cope with overfitting, we use image augmentation. For example, by adding transforms.RandomFlipLeftRight(), the images can be flipped at random. We can also normalize the three RGB channels of color images using transforms.Normalize(). Below, we list some of these operations; you can choose to use or modify them depending on requirements.
transform_train = gluon.data.vision.transforms.Compose([
    # Magnify the image to a square of 40 pixels in both height and width
    gluon.data.vision.transforms.Resize(40),
    # Randomly crop a square image of 40 pixels in both height and width to
    # produce a small square of 0.64 to 1 times the area of the original
    # image, and then shrink it to a square of 32 pixels in both height and
    # width
    gluon.data.vision.transforms.RandomResizedCrop(32, scale=(0.64, 1.0),
                                                   ratio=(1.0, 1.0)),
    gluon.data.vision.transforms.RandomFlipLeftRight(),
    gluon.data.vision.transforms.ToTensor(),
    # Normalize each channel of the image
    gluon.data.vision.transforms.Normalize([0.4914, 0.4822, 0.4465],
                                           [0.2023, 0.1994, 0.2010])])
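To see what the augmentation pipeline emits, the hedged sketch below pushes a synthetic uint8 image (random pixels standing in for one CIFAR-10 file) through transform_train; after ToTensor and Normalize the result should be a float32 array in channel-first (3, 32, 32) layout.

from mxnet import np

# Synthetic 32x32 RGB image in HWC uint8 layout (hypothetical data)
img = np.random.randint(0, 255, size=(32, 32, 3)).astype('uint8')
out = transform_train(img)
print(out.shape, out.dtype)  # expected: (3, 32, 32) float32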
To ensure the certainty of the output during testing, we only perform normalization on the images.
transform_test = gluon.data.vision.transforms.Compose([
    gluon.data.vision.transforms.ToTensor(),
    gluon.data.vision.transforms.Normalize([0.4914, 0.4822, 0.4465],
                                           [0.2023, 0.1994, 0.2010])])
- Reading the Dataset
Next, we can create ImageFolderDataset instances to read the organized dataset containing the original image files, where each example includes the image and the label.
train_ds, valid_ds, train_valid_ds, test_ds = [
    gluon.data.vision.ImageFolderDataset(
        os.path.join(data_dir, 'train_valid_test', folder))
    for folder in ['train', 'valid', 'train_valid', 'test']]
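As a quick check (not in the original code), ImageFolderDataset exposes the class folders it found through its synsets attribute, and len() gives the number of examples; the numbers printed depend on the dataset variant you use.

print(len(train_ds), len(valid_ds), len(train_valid_ds), len(test_ds))
print(train_valid_ds.synsets)  # the ten CIFAR-10 class names, in folder order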
We specify the defined image augmentation operations in DataLoader. During training, we only use the validation set to evaluate the model, so we need to ensure the certainty of its output. During prediction, we will train the model on the combined training set and validation set to make full use of all labelled data.
train_iter, train_valid_iter = [gluon.data.DataLoader(
    dataset.transform_first(transform_train), batch_size, shuffle=True,
    last_batch='keep') for dataset in (train_ds, train_valid_ds)]

valid_iter, test_iter = [gluon.data.DataLoader(
    dataset.transform_first(transform_test), batch_size, shuffle=False,
    last_batch='keep') for dataset in (valid_ds, test_ds)]
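To verify the data pipeline end to end, the optional snippet below draws one mini-batch from train_iter; with the transforms above, each image batch should have shape (batch_size, 3, 32, 32) and dtype float32, with a label vector of matching length.

for X, y in train_iter:
    print(X.shape, X.dtype, y.shape)
    break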
- Defining the Model
Here, we build the residual blocks based on the HybridBlock class; this is done to improve execution efficiency.
class Residual(nn.HybridBlock):
    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
        super(Residual, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(num_channels, kernel_size=3, padding=1,
                               strides=strides)
        self.conv2 = nn.Conv2D(num_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2D(num_channels, kernel_size=1,
                                   strides=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm()
        self.bn2 = nn.BatchNorm()

    def hybrid_forward(self, F, X):
        Y = F.npx.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        return F.npx.relu(Y + X)
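As a hedged shape check in the spirit of the earlier ResNet discussion, the snippet below runs a randomly initialized Residual block on a dummy batch; with the default arguments the output shape should match the input shape.

from mxnet import np

blk = Residual(3)
blk.initialize()
X = np.random.uniform(size=(4, 3, 6, 6))
print(blk(X).shape)  # expected: (4, 3, 6, 6)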
Next, we define the ResNet-18 model.
def resnet18(num_classes):
    net = nn.HybridSequential()
    net.add(nn.Conv2D(64, kernel_size=3, strides=1, padding=1),
            nn.BatchNorm(), nn.Activation('relu'))

    def resnet_block(num_channels, num_residuals, first_block=False):
        blk = nn.HybridSequential()
        for i in range(num_residuals):
            if i == 0 and not first_block:
                blk.add(Residual(num_channels, use_1x1conv=True, strides=2))
            else:
                blk.add(Residual(num_channels))
        return blk

    net.add(resnet_block(64, 2, first_block=True),
            resnet_block(128, 2),
            resnet_block(256, 2),
            resnet_block(512, 2))
    net.add(nn.GlobalAvgPool2D(), nn.Dense(num_classes))
    return net
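For orientation, a quick optional shape check: feeding a single fake 32x32 RGB image through a freshly initialized network should produce one score per CIFAR-10 class.

from mxnet import np

tmp_net = resnet18(10)
tmp_net.initialize()
X = np.random.uniform(size=(1, 3, 32, 32))
print(tmp_net(X).shape)  # expected: (1, 10)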
The CIFAR-10 image classification challenge uses 10 categories. We will perform Xavier random initialization on the model before training begins.
def get_net(ctx):
    num_classes = 10
    net = resnet18(num_classes)
    net.initialize(ctx=ctx, init=init.Xavier())
    return net
loss = gluon.loss.SoftmaxCrossEntropyLoss()
- Defining the Training Functions
We will select the model and tune hyperparameters according to the model's performance on the validation set. Next, we define the model training function train. We record the training time of each epoch, which helps us compare the time costs of different models.
def train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,
          lr_decay):
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': lr, 'momentum': 0.9, 'wd': wd})
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        if epoch > 0 and epoch % lr_period == 0:
            trainer.set_learning_rate(trainer.learning_rate * lr_decay)
        for X, y in train_iter:
            y = y.astype('float32').as_in_ctx(ctx)
            with autograd.record():
                y_hat = net(X.as_in_ctx(ctx))
                l = loss(y_hat, y).sum()
            l.backward()
            trainer.step(batch_size)
            train_l_sum += float(l)
            train_acc_sum += float((y_hat.argmax(axis=1) == y).sum())
            n += y.size
        time_s = "time %.2f sec" % (time.time() - start)
        if valid_iter is not None:
            valid_acc = d2l.evaluate_accuracy_gpu(net, valid_iter)
            epoch_s = ("epoch %d, loss %f, train acc %f, valid acc %f, "
                       % (epoch + 1, train_l_sum / n, train_acc_sum / n,
                          valid_acc))
        else:
            epoch_s = ("epoch %d, loss %f, train acc %f, " %
                       (epoch + 1, train_l_sum / n, train_acc_sum / n))
        print(epoch_s + time_s + ', lr ' + str(trainer.learning_rate))
- Training and Validating the Model
Now, we can train and validate the model. The following hyperparameters can all be tuned. For example, we can increase the number of epochs. Because lr_period and lr_decay are set to 80 and 0.1 respectively, the learning rate of the optimization algorithm will be multiplied by 0.1 after every 80 epochs. For simplicity, we only train one epoch here.
ctx, num_epochs, lr, wd = d2l.try_gpu(), 1, 0.1, 5e-4
lr_period, lr_decay, net = 80, 0.1, get_net(ctx)
net.hybridize()
train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,
      lr_decay)
epoch 1, loss 2.859060, train acc 0.100000, valid acc 0.100000, time 9.51 sec, lr 0.1
- Classifying the Testing Set and Submitting Results on Kaggle
After obtaining a satisfactory model design and hyperparameters, we use all the training data (including the validation set) to retrain the model and classify the testing set.
net, preds = get_net(ctx), []
net.hybridize()
train(net, train_valid_iter, None, num_epochs, lr, wd, ctx, lr_period,
      lr_decay)
for X, _ in test_iter:
    y_hat = net(X.as_in_ctx(ctx))
    preds.extend(y_hat.argmax(axis=1).astype(int).asnumpy())
sorted_ids = list(range(1, len(test_ds) + 1))
sorted_ids.sort(key=lambda x: str(x))
df = pd.DataFrame({'id': sorted_ids, 'label': preds})
df['label'] = df['label'].apply(lambda x: train_valid_ds.synsets[x])
df.to_csv('submission.csv', index=False)
epoch 1, loss 2.873863, train acc 0.106000, time 9.55 sec, lr 0.1
After executing the above code, we will get a "submission.csv" file. The format of this file is consistent with the Kaggle competition requirements.
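If you want to eyeball the file before uploading, the optional snippet below prints its first rows using the pandas already imported above; it should show an id column and a label column of class names, though the actual predictions depend on your trained model.

# Inspect the generated submission file (output values are model-dependent)
print(pd.read_csv('submission.csv').head())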
- Summary
· We can create an ImageFolderDataset instance to read the dataset containing the original image files.
· We can use convolutional neural networks, image augmentation, and hybrid programming to take part in an image classification competition.