Building a COCO dataset and running experiments with mmdetection
1. dataset2coco
First, put the annotated JSON files and the images together in one folder named images.
format.py
Rewrites the imagePath field in every JSON file so that all paths follow one uniform format: ××××.jpg (the JSON file's base name plus .jpg).
format.py:
import os
import re

dir_path = '/home/chenghiuyi/03 DLA-CHD/DLA-CHD_TRAIN_NO_CHECK/邏輯分類/01 data/images/'
pattern = re.compile('"imagePath": "(.+?jpg)",')

for file in os.listdir(dir_path):
    if os.path.splitext(file)[-1] != '.json':
        continue
    with open(os.path.join(dir_path, file), encoding='utf-8') as f:
        content = f.read()
    imagePath = pattern.findall(content)[0]
    print('imagePath ', imagePath)
    # Replace the recorded path with "<json base name>.jpg"
    new_content = content.replace(imagePath, os.path.splitext(file)[0] + '.jpg')
    with open(os.path.join(dir_path, file), 'w', encoding='utf-8') as nf:
        nf.write(new_content)
checkClass.py
Checks the labels in the annotated files for wrong label names. Change CLASS_REAL_NAMES to the labels of your own dataset, and dir_path to the directory holding the JSON files to check.
checkClass.py:
import os
import json

import numpy as np

CLASS_REAL_NAMES = ['page box', 'centerfold strip', 'text', 'figure']
dir_path = '/home/chenghiuyi/03 DLA-CHD/DLA-CHD_TRAIN/物理分類/01 data/new_result_json/'
class_ids = []

for file in os.listdir(dir_path):
    if os.path.splitext(file)[-1] != '.json':
        continue
    with open(dir_path + file, 'r', encoding='utf8') as fp:
        json_data = json.load(fp)
    json_shapes = json_data["shapes"]
    for json_label in json_shapes:
        # Report any label that is not in the expected class list
        if json_label["label"] not in CLASS_REAL_NAMES:
            print('error', json_label["label"], file)
        class_ids.append(json_label["label"])

class_ids = np.unique(class_ids)
print('There are {} classes in total'.format(len(class_ids)))
print('They are:')
for id in class_ids:
    print('"{}",'.format(id), end="")
print()
index = 1
for id in class_ids:
    print('"{}":{},'.format(id, index))
    index += 1
The final output is:
There are 4 classes in total
They are:
"centerfold strip","figure","page box","text",
"centerfold strip":1,
"figure":2,
"page box":3,
"text":4,
The line "centerfold strip","figure","page box","text", can be copied directly into the class list that needs to be modified in mmdetection.
labels.txt
Edit labels.txt so that it contains the labels of your own dataset.
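For reference, one plausible labels.txt for the four classes found above. labelme2coco.py (below) expects the first line to be __ignore__ (it maps to class id -1 and is skipped); the order of the remaining lines determines the COCO category ids:

```
__ignore__
page box
centerfold strip
text
figure
```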
Create a training set, validation set, and test set for each subset (using labelme2coco.py)
This converts the dataset into COCO format. Note that labelme2coco.py must be in the same directory as the images folder to convert, and images must contain both the image files and the JSON files.
Notes:
labelme2coco.py has been modified quite a bit; the parts I changed are marked with comments containing "cheng".
The walkthrough below splits two datasets 8:1:1 each and then merges them. With a single dataset, step (3) is unnecessary.
(1) Comment out the dataset-merging code, uncomment the dataset-processing code, and adjust the split ratio as needed.
(2) Run the commands below to generate the 8:1:1-split COCO datasets coco_811_1 and coco_811_2. The two subsets are images_1_3000 and images_2_1000:
① python labelme2coco.py --input_dir images_1_3000 --output_dir coco_811_1 --labels labels.txt
② python labelme2coco.py --input_dir images_2_1000 --output_dir coco_811_2 --labels labels.txt
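For reference, each output directory (e.g. coco_811_1) ends up with the standard COCO layout that labelme2coco.py creates:

```
coco_811_1/
├── annotations/
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   └── instances_test2017.json
├── train2017/
├── val2017/
├── test2017/
└── visualization/   (only without --noviz)
```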
(3) Merge the two small datasets. Note: skip this step if you have only one dataset.
① Create coco_811_merge.
② Merge the train2017, val2017, and test2017 folders of coco_811_1 and coco_811_2 into it.
③ Comment out the dataset-processing code and uncomment the dataset-merging code.
④ Run:
python labelme2coco.py --input_dir images_all --output_dir coco_811 --labels labels.txt
The final output is the COCO dataset covering both subsets.
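Step ② above can be scripted. A minimal sketch, assuming the coco_811_1 / coco_811_2 layout produced by the commands above (adjust the paths to your own directories):

```python
import os
import shutil

# Hypothetical paths matching the walkthrough above; change as needed.
src_dirs = ["coco_811_1", "coco_811_2"]
dst_dir = "coco_811_merge"

for split in ("train2017", "val2017", "test2017"):
    # Create the merged split folder, then copy every image from each subset into it.
    os.makedirs(os.path.join(dst_dir, split), exist_ok=True)
    for src in src_dirs:
        split_dir = os.path.join(src, split)
        if not os.path.isdir(split_dir):  # skip subsets that don't exist
            continue
        for name in os.listdir(split_dir):
            shutil.copy(os.path.join(split_dir, name),
                        os.path.join(dst_dir, split, name))
```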
labelme2coco.py:
Note: if you want visualization, uncomment the visualization-handling code.
"""https://www.cnblogs.com/gy77/p/15408027.html"""
import argparse
import collections
import datetime
import glob
import json
import os
import os.path as osp
import shutil
import sys
import uuid

import cv2
import imgviz                        # needed if the visualization code is uncommented
import labelme
import numpy as np
from matplotlib import pyplot as plt  # needed if the visualization code is uncommented
from sklearn.model_selection import train_test_split

try:
    import pycocotools.mask
except ImportError:
    print("Please install pycocotools:\n\n    pip install pycocotools\n")
    sys.exit(1)


def to_coco(args, label_files, num):
    now = datetime.datetime.now()
    data = dict(
        info=dict(
            description=None,
            url=None,
            version=None,
            year=now.year,
            contributor=None,
            date_created=now.strftime("%Y-%m-%d %H:%M:%S.%f"),
        ),
        licenses=[dict(url=None, id=0, name=None)],
        images=[],
        type="instances",
        annotations=[],
        categories=[],
    )

    # Build the class-name -> id mapping from labels.txt;
    # the first line must be "__ignore__" and is skipped.
    class_name_to_id = {}
    for i, line in enumerate(open(args.labels).readlines()):
        class_id = i - 1
        class_name = line.strip()
        if class_id == -1:
            assert class_name == "__ignore__"
            continue
        class_name_to_id[class_name] = class_id
        data["categories"].append(
            dict(supercategory=None, id=class_id, name=class_name)
        )

    if num == 0:
        out_ann_file = osp.join(args.output_dir, "annotations", "instances_train2017.json")
    elif num == 1:
        out_ann_file = osp.join(args.output_dir, "annotations", "instances_val2017.json")
    else:
        out_ann_file = osp.join(args.output_dir, "annotations", "instances_test2017.json")

    for image_id, filename in enumerate(label_files):
        label_file = labelme.LabelFile(filename=filename)
        base = osp.splitext(osp.basename(filename))[0]
        if num == 0:
            out_img_file = osp.join(args.output_dir, "train2017", base + ".jpg")
        elif num == 1:
            out_img_file = osp.join(args.output_dir, "val2017", base + ".jpg")
        else:
            out_img_file = osp.join(args.output_dir, "test2017", base + ".jpg")
        print("| ", out_img_file)

        # cheng: read the image that sits next to the json (filename[:-5] strips ".json")
        img = cv2.imread(filename[:-5] + '.jpg')
        shutil.copy(filename[:-5] + '.jpg', out_img_file)
        data["images"].append(
            dict(
                license=0,
                url=None,
                file_name=base + ".jpg",
                height=img.shape[0],
                width=img.shape[1],
                date_captured=None,
                id=image_id,
            )
        )

        masks = {}
        segmentations = collections.defaultdict(list)
        for shape in label_file.shapes:
            points = shape["points"]
            label = shape["label"]
            group_id = shape.get("group_id")
            shape_type = shape.get("shape_type", "polygon")
            mask = labelme.utils.shape_to_mask(img.shape[:2], points, shape_type)
            if group_id is None:
                group_id = uuid.uuid1()
            instance = (label, group_id)
            if instance in masks:
                masks[instance] = masks[instance] | mask
            else:
                masks[instance] = mask
            if shape_type == "rectangle":
                (x1, y1), (x2, y2) = points
                x1, x2 = sorted([x1, x2])
                y1, y2 = sorted([y1, y2])
                points = [x1, y1, x2, y1, x2, y2, x1, y2]
            else:
                points = np.asarray(points).flatten().tolist()
            segmentations[instance].append(points)
        segmentations = dict(segmentations)

        for instance, mask in masks.items():
            cls_name, group_id = instance
            if cls_name not in class_name_to_id:
                continue
            cls_id = class_name_to_id[cls_name]
            mask = np.asfortranarray(mask.astype(np.uint8))
            mask = pycocotools.mask.encode(mask)
            area = float(pycocotools.mask.area(mask))
            bbox = pycocotools.mask.toBbox(mask).flatten().tolist()
            data["annotations"].append(
                dict(
                    id=len(data["annotations"]),
                    image_id=image_id,
                    category_id=cls_id,
                    segmentation=segmentations[instance],
                    area=area,
                    bbox=bbox,
                    iscrowd=0,
                )
            )

    with open(out_ann_file, "w") as f:
        json.dump(data, f)


def main():
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("--input_dir", help="input annotated directory")
    parser.add_argument("--output_dir", help="output dataset directory")
    parser.add_argument("--labels", help="labels file", required=True)
    parser.add_argument("--noviz", help="no visualization", action="store_true")
    args = parser.parse_args()

    if osp.exists(args.output_dir):
        print("Output directory already exists:", args.output_dir)
        sys.exit(1)
    os.makedirs(args.output_dir)
    print("| Creating dataset dir:", args.output_dir)
    if not args.noviz:
        os.makedirs(osp.join(args.output_dir, "visualization"))
    for sub_dir in ("annotations", "train2017", "val2017", "test2017"):
        if not os.path.exists(osp.join(args.output_dir, sub_dir)):
            os.makedirs(osp.join(args.output_dir, sub_dir))

    # "*.**g" matches image files such as .jpg / .jpeg / .png
    feature_files = glob.glob(osp.join(args.input_dir, "*.**g"))
    print(feature_files)
    print('| Image number: ', len(feature_files))
    label_files = glob.glob(osp.join(args.input_dir, "*.json"))
    print('| Json number: ', len(label_files))

    # 8:1:1 split: hold out 20%, then split that half-and-half into val and test.
    # Only the json lists (y_*) are used below; each image is located from its json name.
    X_train, X_validate_test, y_train, y_validate_test = train_test_split(
        feature_files, label_files, test_size=0.2)
    X_validate, X_test, y_validate, y_test = train_test_split(
        X_validate_test, y_validate_test, test_size=0.5)
    print("| Train number:", len(y_train),
          '\t Value number:', len(y_validate),
          '\t Test number:', len(y_test))

    print("—" * 50)
    print("| Train images:")
    to_coco(args, y_train, num=0)
    print("—" * 50)
    print("| Val images:")
    to_coco(args, y_validate, num=1)
    print("—" * 50)
    print("| Test images:")
    to_coco(args, y_test, num=2)


if __name__ == "__main__":
    print("—" * 50)
    main()
    print("—" * 50)
check_label_figure_bbox_number.py
Counts the number of images and bounding boxes per label in the training, validation, and test sets.
check_label_figure_bbox_number.py:
from pycocotools.coco import COCO

dataDir = '/home/chenghiuyi/03 DLA-CHD/DLA-CHD_TRAIN/物理分類/01 data/coco'
dataType = 'test2017'
annFile = '{}/annotations/instances_{}.json'.format(dataDir, dataType)

coco = COCO(annFile)
cats = coco.loadCats(coco.getCatIds())
cat_nms = [cat['name'] for cat in cats]
print('number of categories: ', len(cat_nms))
print('COCO categories: \n', cat_nms)

print("{} {} {}".format('label', 'figure number', 'bbox number'))
for catId in coco.getCatIds():
    img_number = coco.getImgIds(catIds=catId)
    ann_number = coco.getAnnIds(catIds=catId)
    print("{} {} {}".format(cat_nms[catId], len(img_number), len(ann_number)))
You can paste the output into Word, replace the spaces with commas, and turn it into a table.
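If you'd rather skip the manual replace-in-Word step, a small sketch like the following prints comma-separated rows directly (the `rows` values here are hypothetical stand-ins for the per-category counts computed above):

```python
# Turn "label figure-number bbox-number" rows into CSV text directly.
# `rows` stands in for (category name, image count, annotation count)
# tuples from the script above; the numbers are hypothetical.
rows = [
    ("page box", 100, 100),
    ("text", 98, 1543),
]
lines = ["label,figure number,bbox number"]
for name, n_img, n_ann in rows:
    lines.append("{},{},{}".format(name, n_img, n_ann))
csv_text = "\n".join(lines)
print(csv_text)
```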
statistical_aspect_ratio.py
Computes aspect-ratio statistics of the bounding boxes.
statistical_aspect_ratio.py:
import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['figure.figsize'] = (10.0, 10.0)

ann_json_path1 = "/home/chenghiuyi/03 DLA-CHD/DLA-CHD_TRAIN/物理分類/01 data/coco/annotations/instances_train2017.json"
ann_json_path2 = "/home/chenghiuyi/03 DLA-CHD/DLA-CHD_TRAIN/物理分類/01 data/coco/annotations/instances_val2017.json"
ann_json_path3 = "/home/chenghiuyi/03 DLA-CHD/DLA-CHD_TRAIN/物理分類/01 data/coco/annotations/instances_test2017.json"
ann_json_path_all = [ann_json_path1, ann_json_path2, ann_json_path3]

label_number = 5
aspect_ratio = [[] for _ in range(label_number)]
bbox_w = []
bbox_h = []
all_label_ratio_top3 = []

for ann_json_path in ann_json_path_all:
    with open(ann_json_path) as f:
        ann = json.load(f)
    categorys_dic = dict([(i['id'], i['name']) for i in ann['categories']])
    categorys_num = dict([i['name'], 0] for i in ann['categories'])
    for i in ann['annotations']:
        categorys_num[categorys_dic[i['category_id']]] += 1
    for i in ann['annotations']:
        if i['category_id'] in range(label_number):
            bbox_w.append(round(i['bbox'][2], 2))
            bbox_h.append(round(i['bbox'][3], 2))
            wh = round(i['bbox'][2] / i['bbox'][3], 0)
            if wh < 1:  # always record the longer side over the shorter one
                wh = round(i['bbox'][3] / i['bbox'][2], 0)
            aspect_ratio[i['category_id']].append(wh)

print('label', ' ', 'aspect_ratio')
for i in range(label_number):
    # For each category, find the three most frequent (rounded) aspect ratios
    bbox_wh_unique = set(aspect_ratio[i])
    bbox_count_unique = [aspect_ratio[i].count(j) for j in bbox_wh_unique]
    sort = np.argsort(-np.array(bbox_count_unique))
    top_3 = np.array(list(bbox_wh_unique))[sort[0:3]]
    print(categorys_dic[i], ' ', top_3)
    for j in range(len(top_3)):
        all_label_ratio_top3.append(top_3[j])

all_label_ratio_top3 = pd.Series(all_label_ratio_top3)
all_label_ratio_top3 = all_label_ratio_top3.value_counts()
all_label_ratio_top3.sort_index(inplace=True)
print('-' * 50)
print('all_label_top3')
print('ratio number')
print(all_label_ratio_top3)
Again, you can paste the output into Word, replace the spaces with commas, and turn it into a table.
That is the whole process of converting labelme annotations into a COCO dataset and gathering statistics about it.
My coding skills are limited, so if anything here is wrong, corrections and suggestions are welcome.