Training a custom model with TensorFlow Object Detection on Windows
- 1. Environment
- 2. Generate CSV files from the XML files, then generate record files
- 2.1 Use the script below on both the training and test sets to generate their CSV files
- 2.2 Generate a record file from each of the two CSV files
- 3. Modify the config file
- 4. Train and save the model
- 5. Validate the model
- 6. Real-time detection with a ZED camera
- 7. Real-time detection on Android
- 7.1 Convert the pb file to a tflite file
- 7.2 Install Android Studio
- 8. Augment the training images
- 9. Apply the model to a network camera
- Project files
- References
1. Environment
1.1 Create a Python 3.7 virtual environment, install tensorflow-gpu==1.13.1, and install PIL (pip install pillow).
1.2 Download labelImg and use it to annotate your images; saving produces one XML file per image. (Three handy shortcuts: Ctrl+S to save, D for the next image, W for the box tool. Labels are best kept as plain string tags.)
1.3 Create four folders: train (training images), train_xml (the labelImg XML files for the training images), and likewise test and test_xml.
1.4 Clone TensorFlow's models repository; its models and config files are what we use to train on our own data.
1.5.1 Download the protoc release matching your setup, unzip it, and copy protoc.exe from its bin folder into C:\Windows.
1.5.2 Open a command window in the models\research\ directory and run:
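This compiles the protobuf definitions the API depends on; for the r1.13 models repo the standard command is the one below (protoc 3.4+ expands the wildcard itself on Windows):

```
protoc object_detection/protos/*.proto --python_out=.
```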
1.5.3 Under 'This PC' - 'Properties' - 'Advanced system settings' - 'Environment Variables' - 'System variables', create a new variable named PYTHONPATH and add the full paths of both models/research/ and models/research/slim, separated by a semicolon.
1.5.4 Copy the nets folder from the slim folder into research/object_detection.
1.5.5 Open a terminal in the slim folder and run:
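For the r1.13 repo this is the slim package build and install (the step where the BUILD file mentioned below can get in the way):

```
python setup.py build
python setup.py install
```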
If this fails, rename the BUILD file in the slim folder (it collides with the build directory that setup.py creates).
1.5.6 Test the API by running:
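A common smoke test for the installation, run from models\research:

```
python object_detection/builders/model_builder_test.py
```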
If no errors are reported, the installation works.
2. Generate CSV files from the XML files, then generate record files
2.1 Use the script below on both the training and test sets to generate their CSV files
```python
"""
Convert the XML annotation files to a CSV file (written into the XML folder itself).
Change the three marked places.
"""
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

os.chdir(r'F:\bomb\test_xml')  # 1. change to your train (or test) xml folder
path = r'F:\bomb\test_xml'     # 2. same as above


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '\\*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            # labelImg layout: member[0] is the class name, member[4] is the bndbox
            try:
                value = (
                    xml_file.split('\\')[-1].split('.')[0] + '.jpg',
                    int(root.find('size')[0].text),   # width
                    int(root.find('size')[1].text),   # height
                    member[0].text,                   # class
                    int(member[4][0].text),           # xmin
                    int(member[4][1].text),           # ymin
                    int(member[4][2].text),           # xmax
                    int(member[4][3].text))           # ymax
                # append inside the try so a malformed object is skipped
                # instead of re-appending the previous row
                xml_list.append(value)
            except Exception:
                pass
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('test.csv', index=None)  # 3. output name: train.csv or test.csv
    print('Successfully converted xml to csv.')


main()
```

2.2 Generate a record file from each of the two CSV files
Create a new generate_tfrecord.py file and adjust the paths in it to your own:
```python
# -*- coding: utf-8 -*-
"""
Convert a CSV file into a TFRecord file.

Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=train.csv --output_path=train.record
  python generate_tfrecord.py --csv_input=F:\bomb\test.csv --output_path=F:\bomb\test.record
"""
# Change three places.
import sys
sys.path.append(r'C:\models-r1.13.0\research\object_detection\utils')  # 1. path to research\object_detection\utils in your models clone
import dataset_util

import os
import io
import pandas as pd
import tensorflow as tf
from PIL import Image
# from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
# first arg: flag name, second: default value, third: description
FLAGS = flags.FLAGS


# 2. Replace these labels with your own classes!
def class_text_to_int(row_label):
    if row_label == 'class3':
        return 1
    elif row_label == 'class4':
        return 2
    elif row_label == 'class5':
        return 3
    elif row_label == 'class6':
        return 4
    elif row_label == 'class7':
        return 5
    else:
        return 0


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins, xmaxs, ymins, ymaxs = [], [], [], []
    classes_text, classes = [], []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), r'F:\bomb\train')  # 3. your train (or test) image folder
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
```

Save this py file, then run it from the terminal in the virtual environment (activate tf1.13). Hold Shift and right-click in the folder to choose "Open command window here", then run it as follows:
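For example (paths taken from the usage notes in the script's own docstring):

```
python generate_tfrecord.py --csv_input=F:\bomb\train.csv --output_path=F:\bomb\train.record
```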
csv_input is the path to your CSV file; output_path is the path and name of the record file to generate. (If it errors out, try writing the paths as absolute paths.)
3. Modify the config file
1. Create your own pbtxt label map (bomb.pbtxt). I wrote mine by hand, following the files already in research\object_detection\data and changing the entries to my own classes.
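A minimal sketch of such a label map, using the example class names from generate_tfrecord.py above; the ids must match what class_text_to_int returns:

```
item {
  id: 1
  name: 'class3'
}
item {
  id: 2
  name: 'class4'
}
item {
  id: 3
  name: 'class5'
}
item {
  id: 4
  name: 'class6'
}
item {
  id: 5
  name: 'class7'
}
```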
2. Create your own project folder (bomb) under object_detection. In the object_detection/samples/configs folder, find the config file of the model you want, copy it into the project folder you just created, and edit it for your own classes.
3. The main edits: num_classes is the number of classes; pick a batch_size that suits your data and GPU (the learning rate should be scaled up or down in proportion to the batch size); num_steps depends on how long you want to train; fine_tune_checkpoint is for transfer learning and must be commented out if you have no model.ckpt file; input_path points to your record files; label_map_path points to your pbtxt file.
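A sketch of the fields in question for an SSD-style config; the folder and file names here are this post's examples, not fixed values:

```
model {
  ssd {
    num_classes: 5        # number of your own classes
    # ...
  }
}
train_config: {
  batch_size: 24          # scale to your GPU memory; scale the learning rate with it
  num_steps: 200000       # depends on how long you can train
  fine_tune_checkpoint: "ssdlite_mobilenet_v1_coco/model.ckpt"  # comment out if you have no checkpoint
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "bomb/train.record"
  }
  label_map_path: "bomb/bomb.pbtxt"
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "bomb/test.record"
  }
  label_map_path: "bomb/bomb.pbtxt"
}
```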
4. Train and save the model
First run python setup.py install in the models/research/ directory, then run python setup.py install in the models/research/slim directory.
1. Train the model: open the virtual environment in the object_detection folder.
train_dir is your project folder; pipeline_config_path is the config file you just edited inside it.
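A typical invocation (in the r1.13 repo train.py sits in object_detection/legacy; the folder names below are this post's examples):

```
python legacy/train.py --logtostderr --train_dir=bomb --pipeline_config_path=bomb/ssdlite_mobilenet_v1_coco.config
```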
Here I ran into out-of-GPU-memory errors; lowering batch_size in the config wasn't enough, so I added the following at the top of legacy/train.py:
```python
# Put this with the imports at the top of the file.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

or:
```python
import tensorflow as tf
config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
sess = tf.compat.v1.Session(config=config)
```

To monitor training with TensorBoard:
```
tensorboard --logdir=bomb831
```

2. Save (export) the model.
After training finishes, open the virtual environment in the object_detection folder. Here bomb2/ssdlite_mobilenet_v1_coco.config is the config file in your project folder, bomb2/model.ckpt-100 is the checkpoint produced by the previous step, and bomb2_model is the folder the exported model is saved to.
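The export step uses export_inference_graph.py from object_detection; with this post's names the command would look like:

```
python export_inference_graph.py --input_type image_tensor --pipeline_config_path bomb2/ssdlite_mobilenet_v1_coco.config --trained_checkpoint_prefix bomb2/model.ckpt-100 --output_directory bomb2_model
```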
5. Validate the model
Put your validation images in a folder; I put mine in object_detection/test_images.
```python
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf

config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
sess = tf.compat.v1.Session(config=config)

import zipfile
from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
import cv2

# Five main places to change; review the rest yourself.
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops
from utils import label_map_util
from utils import visualization_utils as vis_util

# Model preparation: any model exported with export_inference_graph.py can be
# loaded by pointing PATH_TO_FROZEN_GRAPH at the new .pb file. By default an
# "SSD with MobileNet" model is used; see the detection model zoo for other
# out-of-the-box models with varying speed and accuracy.
MODEL_NAME = 'bomb2_model'  # 1. folder of your exported model
# Path to the frozen detection graph, the actual model used for detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
# Label map used to attach the correct label to each box.
PATH_TO_LABELS = os.path.join('data', 'bomb2.pbtxt')  # 2. your pbtxt; I put mine in object_detection\data\bomb2.pbtxt

# Load the (frozen) TensorFlow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

# Load the label map: it maps indices to category names, so that when the
# network predicts 5 we know which class that corresponds to.
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)


def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)


PATH_TO_TEST_IMAGES_DIR = 'test_images'  # 3. folder of the test images, here object_detection\test_images
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, '{}.jpg'.format(i)) for i in range(1, 3)]  # 4. image names, here 1.jpg and 2.jpg
# TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, '1.png')]

IMAGE_SIZE = (20, 14)  # size, in inches, of the output images


def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to the input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in ['num_detections', 'detection_boxes', 'detection_scores',
                        'detection_classes', 'detection_masks']:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for a single image.
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe translates masks from box coordinates to image coordinates.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[1], image.shape[2])
                detection_masks_reframed = tf.cast(tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension.
                tensor_dict['detection_masks'] = tf.expand_dims(detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            # Run inference
            output_dict = sess.run(tensor_dict, feed_dict={image_tensor: image})

            # All outputs are float32 numpy arrays, so convert types as appropriate.
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.int64)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict


i = 40
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # The array-based representation of the image is used to draw boxes and labels.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions: the model expects images of shape [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
    # Visualize the detection results.
    image = vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        # instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        min_score_thresh=0.1,  # 5. confidence threshold
        line_thickness=4)
    cv2.imwrite(f'test_images\\{i}.jpg', image)
    i += 1  # increment so successive results don't overwrite each other
    cv2.namedWindow('1', 0)
    cv2.imshow('1', image_np)
    cv2.waitKey(0)
    # plt.figure(figsize=IMAGE_SIZE)
    # plt.imshow(image_np)
    # plt.show()
```

6. Real-time detection with a ZED camera
For the ZED camera you can refer to my earlier blog post. In the downloaded zed_sdk folder, open your virtual environment and run the Python API installer:
```
python get_python_api.py
```

It will then prompt you to download a whl file.
Follow its prompts to download and install, after which you can call the ZED Python API.
Below, the trained model is used with the ZED camera for real-time object detection; a frame-grabbing sketch follows.
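A minimal frame-grabbing sketch, assuming the ZED SDK 3.x Python API (pyzed); the detection itself is the same sess.run(...) loop as in sections 5 and 9:

```python
import pyzed.sl as sl
import cv2

zed = sl.Camera()
init_params = sl.InitParameters()
if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("cannot open ZED camera")

runtime = sl.RuntimeParameters()
mat = sl.Mat()
while True:
    if zed.grab(runtime) == sl.ERROR_CODE.SUCCESS:
        zed.retrieve_image(mat, sl.VIEW.LEFT)                 # left-eye RGBA image
        frame = cv2.cvtColor(mat.get_data(), cv2.COLOR_BGRA2BGR)
        # feed `frame` into the frozen graph exactly as in the RTSP loop in section 9
        cv2.imshow("zed", frame)
        if cv2.waitKey(1) == ord('q'):
            break
zed.close()
```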
7. Real-time detection on Android
7.1 Convert the pb file to a tflite file
2. Download the Windows version of Bazel.
3. Rename the downloaded exe to bazel.exe, put it in a folder, and add that folder to the system PATH environment variable; bazel will then run successfully.
Then build with bazel to obtain the input/output array names and tensor shape parameters of the .pb model. I didn't get this to work; see my mentor's blog for details. If you use one of TensorFlow's own models, though, you can skip the build, because the names and shapes are already known.
4. Run a py file like the following to generate the tflite file.
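A minimal conversion sketch, assuming TF 1.13 and a graph exported with object_detection/export_tflite_ssd_graph.py; the array names and the 300x300 input shape are the standard ones for that export and may differ for your model:

```python
import tensorflow as tf

# The SSD postprocess op is not a standard TFLite op, so custom ops are allowed.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='tflite_graph.pb',
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess',
                   'TFLite_Detection_PostProcess:1',
                   'TFLite_Detection_PostProcess:2',
                   'TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]})
converter.allow_custom_ops = True
tflite_model = converter.convert()
with open('detect.tflite', 'wb') as f:
    f.write(tflite_model)
```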
7.2 Install Android Studio
Install Java and Android Studio. On first run it errored out with no SDK found, so in the Android Studio install directory open \bin\idea.properties and append disable.android.first.run=true at the end, which skips the SDK check on first launch.
Then run it, download the SDK and JDK, and finally open the official demo project.
7.2.1 Put your generated tflite file in android/app/src/main/assets, and create a txt file in the same directory holding your recognition labels.
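For example, with this post's class names (some versions of the demo expect a leading '???' placeholder for the background class — check your checkout):

```
???
class3
class4
class5
class6
class7
```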
7.2.2 Remove one comment line in the gradle file (in the TF Lite demo this is typically the apply from: 'download_model.gradle' line, so the build stops overwriting your own model).
7.2.3 Modify three places (copy the keyword, right-click app >> Find in Files >> enter the keyword to locate the file).
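In the official TF Lite detection demo these three values live as constants in DetectorActivity.java; a sketch of what they might look like after editing (names and values to be verified against your checkout):

```java
// DetectorActivity.java -- constants from the TF Lite detection demo
private static final int TF_OD_API_INPUT_SIZE = 300;                 // model input resolution
private static final boolean TF_OD_API_IS_QUANTIZED = false;         // false for a float tflite model
private static final String TF_OD_API_MODEL_FILE = "detect.tflite";  // your model in assets/
private static final String TF_OD_API_LABELS_FILE = "file:///android_asset/labelmap.txt";  // your label file
```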
While running the program I hit an SDK incompatibility: the little green app icon showed a cross, and the build failed with "No variants found for 'app'. Check build files to ensure at least one variant exists.". The fix: in SDK Manager, select and download Android 10, then File -> Sync Project with Gradle Files.
After that it can run on the phone; enable Developer options in the phone's settings.
8. Augment the training images
First pip install imgaug, then run the script to augment the training set. I hit an encoding problem along the way; the fix was to add encoding='utf-8' to the open call (I believe I edited elementtree.py). Run this script as the main of a fresh project: it misbehaves when run inside this one.
```python
import xml.etree.ElementTree as ET
import pickle
import os
from os import getcwd
import numpy as np
from PIL import Image
import imgaug as ia
from imgaug import augmenters as iaa

ia.seed(1)


def read_xml_annotation(root, image_id):
    in_file = open(os.path.join(root, image_id), encoding='UTF-8')
    tree = ET.parse(in_file)
    root = tree.getroot()
    bndboxlist = []
    for object in root.findall('object'):  # all 'object' nodes under root
        bndbox = object.find('bndbox')
        xmin = int(bndbox.find('xmin').text)
        xmax = int(bndbox.find('xmax').text)
        ymin = int(bndbox.find('ymin').text)
        ymax = int(bndbox.find('ymax').text)
        bndboxlist.append([xmin, ymin, xmax, ymax])
    return bndboxlist


def change_xml_list_annotation(root, image_id, new_target, saveroot, id, h, w):
    in_file = open(os.path.join(root, str(image_id) + '.xml'), encoding='UTF-8')
    tree = ET.parse(in_file)
    xmlroot = tree.getroot()
    index = 0
    aaa = xmlroot.find('path')
    # aaa.text = 'C:\\Users\\YFZX\\Desktop\\6#\\img_aug\\' + str(image_id) + "_aug_" + str(id) + '.jpg'
    for object in xmlroot.findall('object'):
        bndbox = object.find('bndbox')
        new_xmin = new_target[index][0]
        new_ymin = new_target[index][1]
        new_xmax = new_target[index][2]
        new_ymax = new_target[index][3]
        bndbox.find('xmin').text = str(new_xmin)
        bndbox.find('ymin').text = str(new_ymin)
        bndbox.find('xmax').text = str(new_xmax)
        bndbox.find('ymax').text = str(new_ymax)
        index = index + 1
    # Only write the XML if the (last) augmented box stays inside the image.
    if new_xmin > 0 and new_ymin > 0 and new_xmax < w and new_ymax < h:
        tree.write(os.path.join(saveroot, str(image_id) + "_aug_" + str(id) + '.xml'))


def mkdir(path):
    path = path.strip().rstrip("\\")
    if not os.path.exists(path):
        os.makedirs(path)
        print(path + ' created')
        return True
    else:
        print(path + ' already exists')
        return False


if __name__ == "__main__":
    IMG_DIR = r"D:\models-r1.13.0\models-r1.13.0\research\object_detection\bomb819\test"              # original images
    XML_DIR = r"D:\models-r1.13.0\models-r1.13.0\research\object_detection\bomb819\test_xml"          # original xml
    AUG_XML_DIR = r"D:\models-r1.13.0\models-r1.13.0\research\object_detection\bomb819\test_xml_aug"  # folder for the augmented XML
    mkdir(AUG_XML_DIR)
    AUG_IMG_DIR = r"D:\models-r1.13.0\models-r1.13.0\research\object_detection\bomb819\test_aug"      # folder for the augmented images
    mkdir(AUG_IMG_DIR)

    AUGLOOP = 60  # number of augmented copies per image
    boxes_img_aug_list = []
    new_bndbox = []
    new_bndbox_list = []

    # Image augmentation pipeline
    seq = iaa.Sequential([
        iaa.Flipud(0.5),  # vertically flip 50% of the images
        iaa.Fliplr(0.5),  # mirror
        # iaa.Multiply((1.2, 1.5), per_channel=0.2),  # change brightness, doesn't affect BBs
        # iaa.GaussianBlur(sigma=(0, 3.0)),
        # iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.3 * 255), per_channel=0.5),  # loc: noise mean, scale: variance; white noise per channel with 50% probability
        iaa.Multiply((0.75, 1.5), per_channel=1),  # multiply pixel values by 0.75-1.5 to vary brightness/color
        # iaa.Affine(
        #     translate_px={"x": 15, "y": 15},
        #     scale=(0.8, 0.95),
        #     rotate=(-30, 30)
        # ),  # translate/scale/rotate; affects BBs
        iaa.Crop(percent=(0, 0.1), keep_size=True),  # crop 0-10% of width/height, keep the original size
        iaa.Affine(
            scale=(0.8, 1.5),
            translate_percent=None,
            translate_px=None,
            rotate=(-180, 180),
            shear=0.0,
            order=1,
            cval=0,
            mode='constant',
        )
    ], random_order=True)

    for root, sub_folders, files in os.walk(XML_DIR):
        for name in files:
            bndbox = read_xml_annotation(XML_DIR, name)
            for epoch in range(AUGLOOP):
                seq_det = seq.to_deterministic()  # transform boxes and image identically, not randomly
                # Read the image
                img = Image.open(os.path.join(IMG_DIR, name[:-4] + '.jpg'))
                img = np.array(img)
                h = img.shape[0]
                w = img.shape[1]
                # Augment the bounding-box coordinates
                for i in range(len(bndbox)):
                    bbs = ia.BoundingBoxesOnImage([
                        ia.BoundingBox(x1=bndbox[i][0], y1=bndbox[i][1], x2=bndbox[i][2], y2=bndbox[i][3]),
                    ], shape=img.shape)
                    bbs_aug = seq_det.augment_bounding_boxes([bbs])[0]
                    boxes_img_aug_list.append(bbs_aug)
                    # new_bndbox_list: [[x1, y1, x2, y2], ...]
                    new_bndbox_list.append([int(bbs_aug.bounding_boxes[0].x1),
                                            int(bbs_aug.bounding_boxes[0].y1),
                                            int(bbs_aug.bounding_boxes[0].x2),
                                            int(bbs_aug.bounding_boxes[0].y2)])
                # Save the augmented image
                image_aug = seq_det.augment_images([img])[0]
                path = os.path.join(AUG_IMG_DIR, str(name[:-4]) + "_aug_" + str(epoch) + '.jpg')
                # image_auged = bbs.draw_on_image(image_aug, thickness=0)
                Image.fromarray(image_aug).save(path)
                # Save the augmented XML
                change_xml_list_annotation(XML_DIR, name[:-4], new_bndbox_list, AUG_XML_DIR, epoch, h, w)
                print(str(name[:-4]) + "_aug_" + str(epoch) + '.jpg')
                new_bndbox_list = []
```

Draw the boxes on the augmented images to check them:
```python
import os
import cv2 as cv
import xml.etree.ElementTree as ET


def xml_to_jpg(imgs_path, xmls_path, out_path):
    imgs_list = os.listdir(imgs_path)  # image list
    xmls_list = os.listdir(xmls_path)  # xml list
    if len(imgs_list) <= len(xmls_list):
        # Fewer (or as many) images than xml files: for each image, find the matching xml.
        for imgName in imgs_list:
            temp1 = imgName.split('.')[0]   # file stem, e.g. 123.jpg -> 123
            temp1_ = imgName.split('.')[1]  # extension
            if temp1_ != 'jpg':
                continue
            for xmlName in xmls_list:
                temp2 = xmlName.split('.')[0]
                temp2_ = xmlName.split('.')[1]
                if temp2_ != 'xml':
                    continue
                if temp2 != temp1:  # names differ: keep looking
                    continue
                else:
                    # Names match: read the box coordinates from the xml and draw them.
                    img_path = os.path.join(imgs_path, imgName)
                    xml_path = os.path.join(xmls_path, xmlName)
                    img = cv.imread(img_path)
                    labelled = img
                    root = ET.parse(xml_path).getroot()
                    for obj in root.iter('object'):
                        bbox = obj.find('bndbox')
                        xmin = int(bbox.find('xmin').text.strip())
                        ymin = int(bbox.find('ymin').text.strip())
                        xmax = int(bbox.find('xmax').text.strip())
                        ymax = int(bbox.find('ymax').text.strip())
                        labelled = cv.rectangle(labelled, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2)
                    cv.imwrite(out_path + '\\' + imgName, labelled)
                    break
    else:
        # Fewer xml files than images: for each xml, find the matching image (same logic as above).
        for xmlName in xmls_list:
            temp1 = xmlName.split('.')[0]
            temp1_ = xmlName.split('.')[1]
            if temp1_ != 'xml':
                continue
            for imgName in imgs_list:
                temp2 = imgName.split('.')[0]
                temp2_ = imgName.split('.')[1]  # image extension
                if temp2_ != 'jpg':
                    continue
                if temp2 != temp1:
                    continue
                else:
                    img_path = os.path.join(imgs_path, imgName)
                    xml_path = os.path.join(xmls_path, xmlName)
                    img = cv.imread(img_path)
                    labelled = img
                    root = ET.parse(xml_path).getroot()
                    for obj in root.iter('object'):
                        bbox = obj.find('bndbox')
                        xmin = int(bbox.find('xmin').text.strip())
                        ymin = int(bbox.find('ymin').text.strip())
                        xmax = int(bbox.find('xmax').text.strip())
                        ymax = int(bbox.find('ymax').text.strip())
                        labelled = cv.rectangle(labelled, (xmin, ymin), (xmax, ymax), (0, 0, 255), 1)
                    cv.imwrite(out_path + '\\' + imgName, labelled)
                    break


if __name__ == '__main__':
    # Use English paths; Chinese paths fail to load.
    imgs_path = r'C:\Users\YFZX\Desktop\models-r1.13.0\models-r1.13.0\research\object_detection\bomb819\train'              # images
    xmls_path = r'C:\Users\YFZX\Desktop\models-r1.13.0\models-r1.13.0\research\object_detection\bomb819\train_xml'          # xml
    retangele_img_path = r'C:\Users\YFZX\Desktop\models-r1.13.0\models-r1.13.0\research\object_detection\bomb819\train_xml_tojpg'  # output of boxed images
    xml_to_jpg(imgs_path, xmls_path, retangele_img_path)
```

9. Apply the model to a network camera
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from time import sleep
import numpy as np
import os
import sys
import tensorflow as tf

config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
sess = tf.compat.v1.Session(config=config)
import cv2

os.chdir(r'E:\models-r1.13.0\models-r1.13.0\research\object_detection')
sys.path.append("..")

# Object detection imports
from utils import label_map_util
from utils import visualization_utils as vis_util

# Model preparation
MODEL_NAME = 'bomb819_model'
PATH_TO_CKPT = MODEL_NAME + '/1_frozen_inference_graph.pb'
PATH_TO_LABELS = os.path.join('data', '1_bomb.pbtxt')
NUM_CLASSES = 5

# Load the (frozen) TensorFlow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

# Load the label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES,
                                                            use_display_name=True)
category_index = label_map_util.create_category_index(categories)


# Helper code
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)


with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Input and output tensors of detection_graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where an object was detected.
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score is the confidence level, shown on the result image with the class label.
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')

        # The RTSP stream to run detection on
        # url = "rtsp://admin:yzwlgzw123@192.168.0.99/12"
        url = "rtsp://admin:Yfzx6666@192.168.3.144/11"
        vidcap = cv2.VideoCapture(url)
        # The default frame resolution is system dependent.
        while (1):
            sleep(0.001)
            ret, image = vidcap.read()
            if ret == True:
                image_np = image
                # x, y = image_np.shape[0:2]
                # image_np = cv2.resize(image_np, (int(y * 2), int(x * 2)))
                # Expand dimensions: the model expects images of shape [1, None, None, 3]
                image_np_expanded = np.expand_dims(image_np, axis=0)
                # Actual detection.
                (boxes, scores, classes, num) = sess.run(
                    [detection_boxes, detection_scores, detection_classes, num_detections],
                    feed_dict={image_tensor: image_np_expanded})
                # Visualization of the results of a detection.
                vis_util.visualize_boxes_and_labels_on_image_array(
                    image_np,
                    np.squeeze(boxes),
                    np.squeeze(classes).astype(np.int32),
                    np.squeeze(scores),
                    category_index,
                    use_normalized_coordinates=True,
                    line_thickness=2)
                # print(scores)
                cv2.imshow("capture", image_np)
                if cv2.waitKey(1) == ord('q'):
                    break
        vidcap.release()
        cv2.destroyAllWindows()
```

A note: I'm new to the workplace, and this project was the second task of my job, completed entirely under the guidance of my mentor Yang. I'll keep working hard to learn from him!
Project files
The files are too large to attach; leave a comment or send me a private message to get them.
References
[1] https://blog.csdn.net/weixin_42232538/article/details/111141445
[2] https://my.oschina.net/u/3732258/blog/4698658