MMDetection: Data Preparation

Published: 2024/4/11


Introduction

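The previous article in this series covered the configuration file, which coordinates the entire MMDetection pipeline and is therefore very important. A config file, however, is just a collection of key-value pairs; the classes and functions it refers to are defined in the mmdet folder, which is the core of MMDetection. Anyone familiar with PyTorch knows that data and model are the two central parts of a training workflow, so this article and the next one cover data preparation and models in MMDetection, respectively.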

Datasets

Dataset formats

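First, note that the two benchmark datasets for object detection, COCO and VOC, use two different storage formats (data formats). I won't go into the details of these formats here; readers who are unfamiliar with them can refer to my other blog posts. MMDetection already wraps several common dataset formats as PyTorch Dataset classes, and COCO and VOC are of course among them.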

Custom Dataset

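As PyTorch users know, using a custom dataset (or dataset format) that is not yet supported means writing a matching Dataset class. Such a class defines how the storage format is parsed, how annotation files are read, and how image files are loaded. In MMDetection these classes live in the datasets directory under mmdet; for example, mmdet/datasets/voc.py defines the handling of the PASCAL VOC dataset.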

Taking VOC as an example, let us look directly at the source code, shown below (I have omitted the function that evaluates mAP on the dataset). Most of its functionality is inherited from XMLDataset: VOC stores its annotations as XML files, so the XML handling is factored out into XMLDataset for reuse.

```python
# Excerpt from mmdet/datasets/voc.py; the mAP evaluation method is omitted.
from .builder import DATASETS
from .xml_style import XMLDataset


@DATASETS.register_module()
class VOCDataset(XMLDataset):

    CLASSES = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
               'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
               'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train',
               'tvmonitor')

    def __init__(self, **kwargs):
        super(VOCDataset, self).__init__(**kwargs)
        # The dataset year is inferred from the image prefix path.
        if 'VOC2007' in self.img_prefix:
            self.year = 2007
        elif 'VOC2012' in self.img_prefix:
            self.year = 2012
        else:
            raise ValueError('Cannot infer dataset year from img_prefix')
```

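Next is the implementation of XMLDataset, with some function bodies omitted. The two key functions are load_annotations and get_ann_info: the former reads the id and size of every image and stores them in a list, while the latter, given an image index, reads all bounding boxes of that image together with their class labels.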

```python
# Imports used by this excerpt.
import os.path as osp
import xml.etree.ElementTree as ET

import mmcv
import numpy as np
from PIL import Image

from .builder import DATASETS
from .custom import CustomDataset


@DATASETS.register_module()
class XMLDataset(CustomDataset):

    def __init__(self, min_size=None, **kwargs):
        assert self.CLASSES or kwargs.get('classes', None), \
            'CLASSES in `XMLDataset` can not be None.'
        super(XMLDataset, self).__init__(**kwargs)
        self.cat2label = {cat: i for i, cat in enumerate(self.CLASSES)}
        self.min_size = min_size

    def load_annotations(self, ann_file):
        """Read the id and size of every image listed in ann_file."""
        data_infos = []
        img_ids = mmcv.list_from_file(ann_file)
        for img_id in img_ids:
            filename = f'JPEGImages/{img_id}.jpg'
            xml_path = osp.join(self.img_prefix, 'Annotations',
                                f'{img_id}.xml')
            tree = ET.parse(xml_path)
            root = tree.getroot()
            size = root.find('size')
            if size is not None:
                width = int(size.find('width').text)
                height = int(size.find('height').text)
            else:
                # Fall back to opening the image when the XML has no size.
                img_path = osp.join(self.img_prefix, 'JPEGImages',
                                    '{}.jpg'.format(img_id))
                img = Image.open(img_path)
                width, height = img.size
            data_infos.append(
                dict(id=img_id, filename=filename, width=width, height=height))
        return data_infos

    def _filter_imgs(self, min_size=32):
        pass  # body omitted in this excerpt

    def get_ann_info(self, idx):
        """Read the boxes and labels of the image at index idx."""
        img_id = self.data_infos[idx]['id']
        xml_path = osp.join(self.img_prefix, 'Annotations', f'{img_id}.xml')
        tree = ET.parse(xml_path)
        root = tree.getroot()
        bboxes = []
        labels = []
        bboxes_ignore = []
        labels_ignore = []
        for obj in root.findall('object'):
            name = obj.find('name').text
            if name not in self.CLASSES:
                continue
            label = self.cat2label[name]
            difficult = int(obj.find('difficult').text)
            bnd_box = obj.find('bndbox')
            # TODO: check whether it is necessary to use int
            # Coordinates may be float type
            bbox = [
                int(float(bnd_box.find('xmin').text)),
                int(float(bnd_box.find('ymin').text)),
                int(float(bnd_box.find('xmax').text)),
                int(float(bnd_box.find('ymax').text))
            ]
            ignore = False
            if self.min_size:
                assert not self.test_mode
                w = bbox[2] - bbox[0]
                h = bbox[3] - bbox[1]
                if w < self.min_size or h < self.min_size:
                    ignore = True
            # Difficult or too-small objects go to the ignore lists.
            if difficult or ignore:
                bboxes_ignore.append(bbox)
                labels_ignore.append(label)
            else:
                bboxes.append(bbox)
                labels.append(label)
        if not bboxes:
            bboxes = np.zeros((0, 4))
            labels = np.zeros((0, ))
        else:
            bboxes = np.array(bboxes, ndmin=2) - 1
            labels = np.array(labels)
        if not bboxes_ignore:
            bboxes_ignore = np.zeros((0, 4))
            labels_ignore = np.zeros((0, ))
        else:
            bboxes_ignore = np.array(bboxes_ignore, ndmin=2) - 1
            labels_ignore = np.array(labels_ignore)
        ann = dict(
            bboxes=bboxes.astype(np.float32),
            labels=labels.astype(np.int64),
            bboxes_ignore=bboxes_ignore.astype(np.float32),
            labels_ignore=labels_ignore.astype(np.int64))
        return ann

    def get_cat_ids(self, idx):
        pass  # body omitted in this excerpt
```
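To make the parsing logic of get_ann_info easier to follow, here is a minimal standalone sketch that applies the same idea to an XML string using only the standard library. The sample annotation, the two-class list and the function name parse_voc_objects are made up for illustration; the sketch leaves out the min_size filter and the NumPy conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, used only for illustration.
SAMPLE_XML = """
<annotation>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name><difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
  </object>
  <object>
    <name>unicorn</name><difficult>0</difficult>
    <bndbox><xmin>1</xmin><ymin>1</ymin><xmax>10</xmax><ymax>10</ymax></bndbox>
  </object>
</annotation>
"""

CLASSES = ('dog', 'person')  # assumed class list for this example


def parse_voc_objects(xml_text, classes):
    """Split the objects of one VOC annotation into kept and ignored boxes."""
    cat2label = {name: i for i, name in enumerate(classes)}
    root = ET.fromstring(xml_text)
    kept, ignored = [], []
    for obj in root.findall('object'):
        name = obj.find('name').text
        if name not in cat2label:
            continue  # objects of unlisted classes are skipped entirely
        box = obj.find('bndbox')
        # Subtract 1 to mirror the "- 1" applied to the arrays in get_ann_info.
        bbox = [int(float(box.find(key).text)) - 1
                for key in ('xmin', 'ymin', 'xmax', 'ymax')]
        target = ignored if int(obj.find('difficult').text) else kept
        target.append((bbox, cat2label[name]))
    return kept, ignored


kept, ignored = parse_voc_objects(SAMPLE_XML, CLASSES)
print(kept)     # [([47, 239, 194, 370], 0)]
print(ignored)  # [([7, 11, 351, 369], 1)]
```

The difficult object ends up in the ignored list and the object of the unlisted class is dropped, which matches the two branches in the excerpt above.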

Neither of these two classes in the inheritance chain defines the __getitem__() method that is so familiar from PyTorch. It is implemented in CustomDataset, the class that XMLDataset inherits from. CustomDataset is the root of all dataset classes in MMDetection and defines the generic logic for object detection datasets, including __getitem__(idx), which reads the annotation and image for a given index. The image then goes through a so-called pipeline, which in MMDetection is a sequence of processing steps.
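The division of labor described here can be sketched in a few lines of plain Python. This is not the CustomDataset source: the class TinyDetectionDataset and the function load_image_stub are hypothetical stand-ins that only show how __getitem__ bundles image info and annotation into one dictionary and hands it to the pipeline.

```python
class TinyDetectionDataset:
    """Hypothetical stand-in for a CustomDataset-style class (not mmdet code)."""

    def __init__(self, data_infos, annotations, pipeline):
        self.data_infos = data_infos    # per-image metadata, as from load_annotations
        self.annotations = annotations  # per-image boxes and labels
        self.pipeline = pipeline        # list of callables, each dict -> dict

    def __len__(self):
        return len(self.data_infos)

    def get_ann_info(self, idx):
        return self.annotations[idx]

    def __getitem__(self, idx):
        # Bundle image info and annotation into one dict, then run the pipeline.
        results = dict(img_info=self.data_infos[idx],
                       ann_info=self.get_ann_info(idx))
        for transform in self.pipeline:
            results = transform(results)
        return results


def load_image_stub(results):
    # A real pipeline step would read the file; this stub only records the path.
    results['filename'] = results['img_info']['filename']
    return results


dataset = TinyDetectionDataset(
    data_infos=[dict(id='000001', filename='JPEGImages/000001.jpg',
                     width=500, height=375)],
    annotations=[dict(bboxes=[[47, 239, 194, 370]], labels=[11])],
    pipeline=[load_image_stub],
)
item = dataset[0]
print(item['filename'], item['ann_info']['labels'])
```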

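The point of this walkthrough of the VOC Dataset wrapper is the following: to use a brand-new dataset you must either define a class that conforms to the CustomDataset interface, or convert the dataset into one of the formats MMDetection already supports (such as COCO or PASCAL VOC) or into the middle format. The official recommendation is conversion, which can be done offline or online. Offline conversion means reorganizing the dataset files on disk with a script; online conversion means writing a new Dataset class that customizes the data-reading code, so that the format is converted batch by batch during training. MMDetection recommends offline conversion to COCO format: it is done once, after which users only need to change the annotation paths and the class list in the config file. In addition, MMDetection supports mask AP evaluation only for datasets in COCO format.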

The basic COCO annotation format is shown below; for the required fields and their meaning, see my blog post on the COCO dataset format. Writing a conversion script is usually not a burden: most open-source datasets ship with conversion tooling, and plenty of open-source code exists for datasets annotated with labeling tools. MMDetection suggests placing offline conversion scripts under tools/dataset_converters/ in the repository root; that directory already contains official scripts for converting VOC and Cityscapes to COCO format.

```python
'images': [
    {
        'file_name': 'COCO_val2014_000000001268.jpg',
        'height': 427,
        'width': 640,
        'id': 1268
    },
    ...
],

'annotations': [
    {
        'segmentation': [[192.81,
            247.09,
            ...
            219.03,
            249.06]],  # if you have mask labels
        'area': 1035.749,
        'iscrowd': 0,
        'image_id': 1268,
        'bbox': [192.81, 224.8, 74.73, 33.43],
        'category_id': 16,
        'id': 42986
    },
    ...
],

'categories': [
    {'id': 0, 'name': 'car'},
]
```
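As a rough illustration of what an offline conversion script does, the sketch below turns records with VOC-style corner boxes into a COCO-style dictionary. It is not the official converter: the function to_coco and the record layout are assumptions made for this example, and the width and height are computed as plain differences, which a real script may handle differently.

```python
def to_coco(records, classes):
    """Convert records with [xmin, ymin, xmax, ymax] boxes to a COCO-style dict."""
    cat_ids = {name: i for i, name in enumerate(classes)}
    coco = dict(
        images=[],
        annotations=[],
        categories=[dict(id=i, name=name) for name, i in cat_ids.items()],
    )
    ann_id = 0
    for img_id, rec in enumerate(records):
        coco['images'].append(dict(id=img_id, file_name=rec['filename'],
                                   width=rec['width'], height=rec['height']))
        for (xmin, ymin, xmax, ymax), name in rec['objects']:
            w, h = xmax - xmin, ymax - ymin
            coco['annotations'].append(dict(
                id=ann_id,
                image_id=img_id,
                category_id=cat_ids[name],
                bbox=[xmin, ymin, w, h],  # COCO stores [x, y, width, height]
                area=w * h,
                iscrowd=0,
            ))
            ann_id += 1
    return coco


# Hypothetical input record, for illustration only.
records = [dict(filename='000001.jpg', width=500, height=375,
                objects=[([47, 239, 194, 370], 'dog')])]
coco = to_coco(records, classes=('cat', 'dog'))
print(coco['annotations'][0]['bbox'])  # [47, 239, 147, 131]
```

The resulting dictionary can be written to disk with json.dump to obtain an annotation file.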

Preparing a custom dataset

I used the official script to convert VOC to COCO format with the command python tools/dataset_converters/pascal_voc.py /VOC/VOCdevkit/ -o /VOC/VOCdevkit/ --out-format coco. Because the VOC2012 test set has no annotations, the part of the official script that converts the test set needs to be modified. The result is a suitable custom dataset in COCO format (with VOC standing in for a custom dataset here), consisting of the following COCO-format annotation files.

```
voc0712_train.json
voc0712_trainval.json
voc0712_val.json
voc07_test.json
voc07_train.json
voc07_trainval.json
voc07_val.json
voc12_train.json
voc12_trainval.json
voc12_val.json
```

This completes the preprocessing of the dataset. Two more steps are needed before training on it: modifying the config file for the custom dataset and checking its annotations.

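Start with the config file, which needs changes in three places. The first is the dataset_type field, which should be set to the predefined dataset type the data was converted to, such as CocoDataset. The second is the classes field, a list of strings naming all target classes. It does not have to contain every class: training on a subset of the classes is also controlled through this field. Along with this change, the train, val and test entries under the data field need their classes and file paths updated. The third is num_classes in the model field, which should be 20 for VOC, for example. For the COCO-format VOC data above, the config file looks as follows.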

```python
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]

# 1. Number of classes in the detection head (20 for VOC).
model = dict(roi_head=dict(bbox_head=dict(num_classes=20)))

# 2. Dataset type and class list.
dataset_type = 'CocoDataset'
classes = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
           'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
           'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor')
data_root = 'your_dataset_root/'  # replace with the root directory of your dataset
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

# 3. Classes and annotation paths for each split.
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'voc0712_train.json',
        img_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'voc0712_val.json',
        img_prefix=data_root,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'voc0712_val.json',
        img_prefix=data_root,
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')
```
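Since the three changes have to agree with one another, a small sanity check can catch mismatches before training starts. The sketch below is only an illustration under the assumption that the config has been loaded into plain nested dictionaries; check_dataset_config is a hypothetical helper, not part of MMDetection, and the class list is shortened for brevity.

```python
def check_dataset_config(cfg):
    """Collect mismatches between the class lists and the detection head size."""
    problems = []
    num_classes = cfg['model']['roi_head']['bbox_head']['num_classes']
    for split in ('train', 'val', 'test'):
        split_cfg = cfg['data'][split]
        classes = split_cfg.get('classes')
        if classes is None:
            problems.append(f'data.{split}: classes is missing')
        elif len(classes) != num_classes:
            problems.append(
                f'data.{split}: {len(classes)} classes but num_classes={num_classes}')
        if split_cfg.get('type') != cfg['dataset_type']:
            problems.append(f'data.{split}: type differs from dataset_type')
    return problems


classes = ('aeroplane', 'bicycle', 'bird')  # shortened list, for illustration only
split = dict(type='CocoDataset', classes=classes)
good_cfg = dict(
    dataset_type='CocoDataset',
    model=dict(roi_head=dict(bbox_head=dict(num_classes=3))),
    data=dict(train=dict(split), val=dict(split), test=dict(split)),
)
# Same config, but the head still expects 20 classes.
bad_cfg = dict(good_cfg,
               model=dict(roi_head=dict(bbox_head=dict(num_classes=20))))

print(check_dataset_config(good_cfg))
for problem in check_dataset_config(bad_cfg):
    print(problem)
```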

Next, check that the annotations of the custom dataset are valid, because training on inaccurate annotations is pointless. Below is an example of a valid COCO annotation. The annotations can also be checked with the simple visualization script provided in the official repository by running python tools/misc/browse_dataset.py config.py; the visualized results, shown in the figure below, indicate that the annotations are fine.

```python
'annotations': [
    {
        'segmentation': [[192.81,
            247.09,
            ...
            219.03,
            249.06]],  # if you have mask labels
        'area': 1035.749,
        'iscrowd': 0,
        'image_id': 1268,
        'bbox': [192.81, 224.8, 74.73, 33.43],
        'category_id': 16,
        'id': 42986
    },
    ...
],

# MMDetection automatically maps the uncontinuous `id` to the continuous label indices.
'categories': [
    {'id': 1, 'name': 'a'}, {'id': 3, 'name': 'b'}, {'id': 4, 'name': 'c'}, {'id': 16, 'name': 'd'}, {'id': 17, 'name': 'e'},
]
```
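The mapping from non-contiguous category ids to contiguous label indices mentioned in the comment can be pictured as follows. This is a simplified sketch rather than the MMDetection implementation; build_cat2label and find_invalid_annotations are hypothetical helpers, and the second and third annotations are made-up negative examples.

```python
def build_cat2label(categories, classes):
    """Map each category id to a contiguous label index, in the order of classes."""
    name2id = {cat['name']: cat['id'] for cat in categories}
    return {name2id[name]: label for label, name in enumerate(classes)}


def find_invalid_annotations(annotations, cat2label):
    """Return ids of annotations with an unknown category or an empty box."""
    bad = []
    for ann in annotations:
        _, _, w, h = ann['bbox']
        if ann['category_id'] not in cat2label or w <= 0 or h <= 0:
            bad.append(ann['id'])
    return bad


categories = [
    {'id': 1, 'name': 'a'}, {'id': 3, 'name': 'b'}, {'id': 4, 'name': 'c'},
    {'id': 16, 'name': 'd'}, {'id': 17, 'name': 'e'},
]
annotations = [
    {'id': 42986, 'image_id': 1268, 'category_id': 16,
     'bbox': [192.81, 224.8, 74.73, 33.43]},
    {'id': 42987, 'image_id': 1268, 'category_id': 2,   # unknown category
     'bbox': [10.0, 10.0, 5.0, 5.0]},
    {'id': 42988, 'image_id': 1268, 'category_id': 3,   # zero-width box
     'bbox': [10.0, 10.0, 0.0, 5.0]},
]

cat2label = build_cat2label(categories, classes=('a', 'b', 'c', 'd', 'e'))
print(cat2label)  # {1: 0, 3: 1, 4: 2, 16: 3, 17: 4}
print(find_invalid_annotations(annotations, cat2label))  # [42987, 42988]
```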

The official tutorial also introduces a middle format for datasets. I don't think it is very widely used, so I skip it here.

Dataset wrappers

MMDetection also supports wrapping existing datasets in order to mix them or change their distribution. Three wrappers are currently supported:

RepeatDataset: simply repeats the whole dataset
ClassBalancedDataset: repeats the dataset in a class-balanced manner
ConcatDataset: concatenates datasets

RepeatDataset is simple to use. Given an original dataset A, repeating it only requires the number of repetitions, times, so the data.train field is built as follows.

```python
dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(  # This is the original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
```
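Conceptually, repeating a dataset needs no copying: the wrapper can report a longer length and wrap the index around. The following is a minimal sketch of that idea with a plain list standing in for the dataset; RepeatSketch is a hypothetical name, not the MMDetection class.

```python
class RepeatSketch:
    """Present a dataset as if it were repeated `times` times."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __len__(self):
        return self.times * self._ori_len

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Wrap the index back into the range of the original dataset.
        return self.dataset[idx % self._ori_len]


repeated = RepeatSketch(['img0', 'img1', 'img2'], times=3)
print(len(repeated))  # 9
print(repeated[4])    # img1
```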

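ClassBalancedDataset repeats a dataset based on class frequency. It requires the wrapped Dataset class to implement the get_cat_ids(idx) method, and the only parameter to specify is the threshold oversample_thr. The wrapper resamples the original dataset so that the classes become closer to balanced, as in the example below.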

```python
dataset_A_train = dict(
    type='ClassBalancedDataset',
    oversample_thr=1e-3,
    dataset=dict(  # This is the original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
```
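One common way to implement frequency-based repetition, and to my understanding the scheme this wrapper follows, is to give each category a repeat factor of max(1, sqrt(oversample_thr / f)), where f is the fraction of images containing that category, and to repeat each image according to the largest factor among its categories. The sketch below works under that assumption; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def repeat_factors(image_cat_ids, oversample_thr):
    """Compute one repeat factor per image from per-category image frequencies."""
    num_images = len(image_cat_ids)
    # Count in how many images each category appears.
    counts = Counter(c for cats in image_cat_ids for c in set(cats))
    cat_repeat = {
        c: max(1.0, math.sqrt(oversample_thr / (n / num_images)))
        for c, n in counts.items()
    }
    # An image inherits the largest factor among the categories it contains.
    return [max((cat_repeat[c] for c in set(cats)), default=1.0)
            for cats in image_cat_ids]


def balanced_indices(image_cat_ids, oversample_thr):
    """Expand image indices so that rare categories are seen more often."""
    indices = []
    for idx, factor in enumerate(repeat_factors(image_cat_ids, oversample_thr)):
        indices.extend([idx] * math.ceil(factor))
    return indices


# Category 0 appears in every image, category 1 only in the last one.
image_cat_ids = [[0], [0], [0], [0, 1]]
print(balanced_indices(image_cat_ids, oversample_thr=0.5))  # [0, 1, 2, 3, 3]
```

In this toy example the list of category ids per image plays the role of get_cat_ids(idx), which is why the wrapped dataset has to provide that method.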

Finally, ConcatDataset deserves special attention: the VOC0712 set used earlier is in fact a concatenation of two datasets. There are three ways to concatenate datasets, described in turn.

If the datasets to be concatenated share the same annotation format, as VOC2007 and VOC2012 do, they can be concatenated as follows. A dataset built this way is evaluated on each sub-dataset separately by default; to evaluate on the concatenated dataset as a whole, set separate_eval below to False.

```python
dataset_A_train = dict(
    type='Dataset_A',
    ann_file=['anno_file_1', 'anno_file_2'],
    pipeline=train_pipeline,
    separate_eval=True,
)
```

If the datasets to be concatenated have different formats, they are concatenated by listing their configs, as below. The separate_eval setting is supported in the same way.

```python
dataset_A_train = dict()
dataset_B_train = dict()
# dataset_A_val and dataset_A_test are placeholders defined in the same way.

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=[
        dataset_A_train,
        dataset_B_train
    ],
    val=dataset_A_val,
    test=dataset_A_test
)
```

Datasets can also be concatenated explicitly, as follows.

```python
dataset_A_val = dict()
dataset_B_val = dict()

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dataset_A_train,
    val=dict(
        type='ConcatDataset',
        datasets=[dataset_A_val, dataset_B_val],
        separate_eval=False))
```
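Whichever config style is used, a concatenated dataset has to translate a global index into a sub-dataset and a local index. A minimal sketch of that lookup with cumulative sizes is shown below; ConcatSketch and the two toy lists are hypothetical and only illustrate the indexing, not evaluation or separate_eval.

```python
import bisect
from itertools import accumulate


class ConcatSketch:
    """Present several datasets as one, indexed end to end."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # cumulative_sizes[i] is the total length of datasets[0..i].
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find the first sub-dataset whose cumulative size exceeds idx.
        dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][idx - offset]


voc07 = ['07_a', '07_b']          # stand-ins for two sub-datasets
voc12 = ['12_a', '12_b', '12_c']
voc0712 = ConcatSketch([voc07, voc12])
print(len(voc0712))  # 5
print(voc0712[1])    # 07_b
print(voc0712[2])    # 12_a
```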

Data pipelines

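In MMDetection, the data preparation pipeline is decoupled from the dataset definition. A Dataset defines how annotation files are processed, while a pipeline defines the steps that prepare a data dictionary (the item provided by the Dataset object). A pipeline is therefore usually a sequence of operations, each of which takes a dictionary as input and outputs a new dictionary for the next step.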

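The figure above shows a classic pipeline. The blue boxes are operation names. As data flows through the pipeline, each operation adds new keys to the dictionary (green in the figure) or updates existing ones (yellow in the figure).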
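The way keys are added and updated can be traced with a toy pipeline. The two stub operations below are hypothetical and do no real image processing; they only mimic how a loading step adds keys and a resizing step both updates and adds keys.

```python
def load_stub(results):
    info = results['img_info']
    results['img_shape'] = (info['height'], info['width'])  # adds a new key
    return results


def resize_stub(results, scale=2):
    h, w = results['img_shape']
    results['img_shape'] = (h * scale, w * scale)  # updates an existing key
    results['scale_factor'] = scale                # adds a new key
    return results


def run_pipeline(results, transforms):
    """Apply each transform in order and report which keys it touched."""
    for transform in transforms:
        before = dict(results)  # shallow snapshot taken before the step
        results = transform(results)
        added = sorted(k for k in results if k not in before)
        updated = sorted(k for k in before
                         if k in results and results[k] != before[k])
        print(f'{transform.__name__}: added={added}, updated={updated}')
    return results


out = run_pipeline(dict(img_info=dict(height=375, width=500)),
                   [load_stub, resize_stub])
print(out['img_shape'], out['scale_factor'])
```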

The operations in a pipeline fall into four groups: data loading, preprocessing, formatting, and test-time augmentation.

Below is a classic pipeline config for Faster R-CNN. The parameters and purpose of each operation, and how each one modifies the dictionary, are described in the official tutorial.

```python
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```

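Pipelines can of course be customized as well. A custom operation takes a dictionary as input and returns a dictionary as output, and it must go through the registry mechanism to be usable. This takes the following three steps.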

Define the operation in my_pipeline.py.

```python
from mmdet.datasets import PIPELINES


@PIPELINES.register_module()
class MyTransform:

    def __call__(self, results):
        # Add a marker key to the results dict and pass it on.
        results['dummy'] = True
        return results
```

Import the custom pipeline.

```python
from .my_pipeline import MyTransform
```

Add the operation to the config file.
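For this last step, a config fragment along the following lines should work: the operation is referenced by its registered class name, like the built-in ones. Where exactly MyTransform sits in the list is an assumption made for this sketch; the right position depends on what the operation needs from the results dictionary.

```python
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='MyTransform'),  # custom operation (position assumed for illustration)
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
```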

Summary

This article covered data preparation in MMDetection. Data is key to model training, and the corresponding official documentation covers this part in more detail. Finally, if this article helped you, likes, bookmarks and comments are welcome; your support keeps me writing.
