
Object Detection: Document Layout Analysis Using Monk AI

Published: 2023/12/15


Computer Vision

Introduction

This article shows how object detection can be used to predict the different regions of a document image. It is useful for cropping out headlines, paragraphs, tables, images, etc. from a document image so that they can be processed later to extract whatever information is needed. We compare the performance of 3 different object detection architectures on this task, namely YOLOv3, Faster-RCNN, and SSD512, and use the Monk library to load these models.

A detailed tutorial is available on GitHub.

About the Dataset

The training dataset used for this task is the PRImA Layout Analysis Dataset. It includes a wide variety of document types, reflecting various challenges in layout analysis. Particular emphasis is placed on:

Magazine scans from a variety of mainstream news, business, and technology publications, which contain a mixture of simple and complex layouts (e.g. non-Manhattan layouts, varying font sizes, etc.)
Technical articles on a variety of disciplines, including papers in journals and conference proceedings, with both simple and complex layouts present.

The dataset contains 18 labels, namely ‘caption’, ‘chart’, ‘credit’, ‘drop-capital’, ‘floating’, ‘footer’, ‘frame’, ‘graphics’, ‘header’, ‘heading’, ‘image’, ‘linedrawing’, ‘maths’, ‘noise’, ‘page-number’, ‘paragraph’, ‘separator’ and ‘table’.

It can be downloaded from here.

Monk AI

Monk Object Detection is a collection of object detection pipelines. The benefit of each pipeline is two-fold: installation is made compatible with multiple operating systems, CUDA versions, and Python versions, and the code is kept short through a standardized workflow. Monk Object Detection lets a user solve a computer vision problem in very few lines of code. For this task, we’ll use 3 pipelines of this library for 3 different architectures: yolov3, gluoncv_finetune, and mxrcnn.

Table of Contents

1. Installing Monk Object Detection Toolkit
2. Using the Pre-trained model for the Document Layout Analysis Task
3. Training your own Model
   Downloading and Pre-Processing Data (Format Conversion, Selective Data Augmentation)
   Training the model from Scratch
4. Inference and Comparison

1. Installing Monk Object Detection Toolkit

First of all, clone the library to your system using the following command:

! git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git

Then choose the pipeline that you want to install and the requirements file of that pipeline that matches your system’s CUDA version or Colab version. These are the commands for the pipelines that I’ve used for this task:

# For yolov3 (used for the YOLOv3 architecture)
! cd Monk_Object_Detection/7_yolov3/installation && cat requirements.txt | xargs -n 1 -L 1 pip install
# For gluoncv_finetune (used for the SSD512 architecture)
! cd Monk_Object_Detection/1_gluoncv_finetune/installation && cat requirements_cuda10.1.txt | xargs -n 1 -L 1 pip install
# For mxrcnn (used for the Faster-RCNN architecture)
! cd Monk_Object_Detection/3_mxrcnn/installation && cat requirements_cuda10.1.txt | xargs -n 1 -L 1 pip install

For more pipelines or ways to install, visit the Monk Object Detection Library.

2. Using the Pre-trained model for the Document Layout Analysis Task

If you don’t want to train the model yourself and just want to use the model that we’ve trained for the task, you can use the following code:

For YOLOv3:

import os
import sys
from IPython.display import Image
sys.path.append("Monk_Object_Detection/7_yolov3/lib")
from infer_detector import Infer
gtf = Infer()

Download and initialize the pre-trained model:

! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1Si1puABMiijtvLvH-XMnr2pVj4K2lUkO' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1Si1puABMiijtvLvH-XMnr2pVj4K2lUkO" -O obj_dla_yolov3_trained.zip && rm -rf /tmp/cookies.txt
! unzip -qq obj_dla_yolov3_trained.zip
! mv dla_yolov3/yolov3.cfg .
f = open("dla_yolov3/classes.txt")
class_list = f.readlines()
f.close()
model_name = "yolov3"
weights = "dla_yolov3/dla_yolov3.pt"
gtf.Model(model_name, class_list, weights, use_gpu=True, input_size=416)

And you can test it:

#change test1 to whatever image you want it to test for.
img_path = "test1.jpg"
gtf.Predict(img_path, conf_thres=0.3, iou_thres=0.5)
Image(filename='output/test1.jpg')

For SSD512:

import os
import sys
sys.path.append("Monk_Object_Detection/1_gluoncv_finetune/lib/")
from inference_prototype import Infer

Download and initialize the pre-trained model:

! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1E6T7RKGwy-v1MUxVJm-rxt5XcRyr2SQ7' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1E6T7RKGwy-v1MUxVJm-rxt5XcRyr2SQ7" -O obj_dla_ssd512_trained.zip && rm -rf /tmp/cookies.txt
! unzip -qq obj_dla_ssd512_trained.zip
model_name = "ssd_512_vgg16_atrous_coco"
params_file = "dla_ssd512/dla_ssd512-vgg16.params"
class_list = ["paragraph", "heading", "credit", "footer", "drop-capital", "floating", "noise", "maths", "header", "caption", "image", "linedrawing", "graphics", "fname", "page-number", "chart", "separator", "table"]
gtf = Infer(model_name, params_file, class_list, use_gpu=True)

And you can test it:

#change test1 to whatever image you want it to test for.
img_name = "test1.jpg"
visualize = True
thresh = 0.3
output = gtf.run(img_name, visualize=visualize, thresh=thresh)

For Faster-RCNN:

import os
import sys
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/")
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/mx-rcnn")
from infer_base import *

Download and initialize the pre-trained model:

! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1TZQSBiMDBrGhcT75AknTbofirSFXprt8' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1TZQSBiMDBrGhcT75AknTbofirSFXprt8" -O obj_dla_faster_rcnn_trained.zip && rm -rf /tmp/cookies.txt
! unzip -qq obj_dla_faster_rcnn_trained.zip
class_file = set_class_list("dla_fasterRCNN/classes.txt")
set_model_params(model_name="vgg16", model_path="dla_fasterRCNN/dla_fasterRCNN-vgg16.params")
set_hyper_params(gpus="0", batch_size=1)
set_img_preproc_params(img_short_side=300, img_long_side=500, mean=(196.45086004329943, 199.09071480252155, 197.07683846968297), std=(0.25779948968052024, 0.2550292865960972, 0.2553027154941914))
initialize_rpn_params()
initialize_rcnn_params()
sym = set_network()
mod = load_model(sym)

And you can test it:

#change test1 to whatever image you want it to test for.
set_output_params(vis_thresh=0.9, vis=True)
Infer("test1.jpg", mod);

3. Train Your Own Model

Data Preparation

The dataset can be downloaded using the following command:

! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1iBfafT1WHAtKAW0a1ifLzvW5f0ytm2i_' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1iBfafT1WHAtKAW0a1ifLzvW5f0ytm2i_" -O PRImA_Layout_Analysis_Dataset.zip && rm -rf /tmp/cookies.txt
! unzip -qq PRImA_Layout_Analysis_Dataset.zip

All the images in the dataset are in TIFF format. Training on TIFF images was over 5x slower than training on JPEG images because of their huge size, so the TIFF images were converted to JPEG.

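import glob
import os
from PIL import Image

# root_dir, img_dir and final_root_dir are assumed to be defined earlier (dataset paths).
for path in glob.glob(root_dir + img_dir + '*.tif'):
    im = Image.open(path)
    # rstrip/lstrip remove character sets, not suffixes or prefixes, so take the base name instead.
    name = os.path.splitext(os.path.basename(path))[0]
    im.save(final_root_dir + img_dir + name + '.jpg', 'JPEG')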

The data is provided in the VOC format. To use it with various pipelines, we first convert it to Monk format, which is directly compatible with many Monk pipelines; later on, it can easily be converted to another format if required. If you want to skip the Monk format and convert directly to some other required format, you can check out that pipeline’s example notebooks here.

Monk Format

Annotation file format

Labels: xmin ymin xmax ymax label
xmin, ymin: top left corner of the bounding box
xmax, ymax: bottom right corner of the bounding box
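
As a rough illustration of the label layout above, the sketch below turns one VOC-style XML annotation into an image name plus a Monk-style label string. The XML snippet, the file name page_001.jpg, and the helper voc_to_monk_row are made up for this example; the actual conversion code lives in the notebooks and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation with one <object> per labelled region.
VOC_XML = """
<annotation>
  <filename>page_001.jpg</filename>
  <object>
    <name>heading</name>
    <bndbox><xmin>40</xmin><ymin>30</ymin><xmax>560</xmax><ymax>90</ymax></bndbox>
  </object>
  <object>
    <name>paragraph</name>
    <bndbox><xmin>40</xmin><ymin>110</ymin><xmax>560</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>
"""

def voc_to_monk_row(xml_text):
    """Return (image file name, 'xmin ymin xmax ymax label ...') for one annotation."""
    root = ET.fromstring(xml_text.strip())
    parts = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [box.findtext(tag) for tag in ("xmin", "ymin", "xmax", "ymax")]
        parts.extend(coords + [obj.findtext("name")])
    return root.findtext("filename"), " ".join(parts)

image_id, label = voc_to_monk_row(VOC_XML)
print(image_id, "->", label)
```

Each image ends up as one row: its file name and a single space-separated string that repeats the five fields for every box.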

The code for data conversion is straightforward but very long. You can check out the code in one of the notebooks here.

The pipelines used for this task have the following format requirements:

The yolov3 pipeline, used for the YOLOv3 architecture, requires data in YOLOv3 format. You can check out this conversion in this notebook.

The gluoncv-finetune pipeline, used for the SSD512 architecture, takes Monk format directly for training, so no further conversion was needed.

The mxrcnn pipeline, used for the Faster-RCNN architecture, requires data in COCO format. You can check out this conversion in this notebook.
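
To give a feel for what these conversions involve at the level of a single box, here is a minimal sketch. It assumes the usual box conventions: YOLO stores a normalized center, width, and height, while COCO stores the top-left corner plus width and height in pixels. The helper names monk_to_yolo and monk_to_coco are hypothetical, and the full file layouts that the pipelines expect are handled in the notebooks, not here.

```python
def monk_to_yolo(box, img_w, img_h):
    """(xmin, ymin, xmax, ymax) in pixels -> (x_center, y_center, width, height) scaled to 0..1."""
    xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / img_w,
        (ymin + ymax) / 2 / img_h,
        (xmax - xmin) / img_w,
        (ymax - ymin) / img_h,
    )

def monk_to_coco(box):
    """(xmin, ymin, xmax, ymax) in pixels -> [x, y, width, height] in pixels."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

box = (100, 50, 300, 150)  # a box on a 400 x 200 image
print(monk_to_yolo(box, img_w=400, img_h=200))
print(monk_to_coco(box))
```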

Selective Data Augmentation

The dataset had an imbalance problem. Since most of a document is text, there were far more paragraphs in the dataset than other labels such as tables or graphs. To handle this heavy bias, we augmented only those document images that contained at least one of these minority labels. For example, if a document only had paragraphs and images, we didn’t augment it. But if it had tables, charts, graphs, or any other minority label, we augmented that image many times over. This process reduced the bias in the dataset by around 25%. The selection and augmentation were done during the format conversion from VOC to Monk format. You can check out the code in one of the notebooks here.
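
The selection rule described above can be sketched in a few lines. This is only an illustration: the set MAJORITY_LABELS and the helper should_augment are assumptions made for this example, based on the statement that pages containing only paragraphs and images were left alone.

```python
# Assumption: these labels count as the majority; every other label is a minority label.
MAJORITY_LABELS = {"paragraph", "image"}

def should_augment(labels):
    """Return True if the page contains at least one minority label."""
    return any(label not in MAJORITY_LABELS for label in labels)

# Hypothetical pages, each listed with the labels found on it.
pages = {
    "page_001.jpg": ["paragraph", "paragraph", "image"],
    "page_002.jpg": ["paragraph", "table", "caption"],
}
to_augment = [name for name, labels in pages.items() if should_augment(labels)]
print(to_augment)
```

Only the pages selected this way would then be passed to the augmentation step.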

For data augmentation, we used the Albumentations library. It offers many ways to augment data, such as random cropping, translation, hue, saturation, contrast, brightness, etc. You can read more about this library here. It can be installed directly with pip:

! pip install albumentations

Following is the function that we wrote for data augmentation. There were a few cases where bounding boxes went outside the image and the Albumentations library wasn’t able to handle it, so we’ve written a custom function to make sure that labels are inside the image.

import cv2
import albumentations as A

# final_root_dir, img_dir and the list `combined` are assumed to be defined earlier in the notebook.
def augmentData(fname, boxes):
    image = cv2.imread(final_root_dir + img_dir + fname)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Note: IAAPerspective and IAAAdditiveGaussianNoise are only available in older Albumentations releases.
    transform = A.Compose([
        A.IAAPerspective(p=0.7),
        A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=5, p=0.5),
        A.IAAAdditiveGaussianNoise(),
        A.ChannelShuffle(),
        A.RandomBrightnessContrast(),
        A.RGBShift(p=0.8),
        A.HueSaturationValue(p=0.8)
    ], bbox_params=A.BboxParams(format='pascal_voc', min_visibility=0.2))

    # Generate up to 8 augmented copies of the image.
    for i in range(1, 9):
        label = ""
        transformed = transform(image=image, bboxes=boxes)
        transformed_image = transformed['image']
        transformed_bboxes = transformed['bboxes']
        # Skip this copy if any transformed box is degenerate.
        flag = False
        for box in transformed_bboxes:
            x_min, y_min, x_max, y_max, class_name = box
            if x_max <= x_min or y_max <= y_min:
                flag = True
                break
            label += str(int(x_min)) + ' ' + str(int(y_min)) + ' ' + str(int(x_max)) + ' ' + str(int(y_max)) + ' ' + class_name + ' '

        if flag:
            continue
        # cv2.imwrite expects BGR, so convert back before saving.
        cv2.imwrite(final_root_dir + img_dir + str(i) + fname, cv2.cvtColor(transformed_image, cv2.COLOR_RGB2BGR))
        label = label[:-1]  # drop the trailing space
        combined.append([str(i) + fname, label])
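
The function above drops an augmented copy when a box comes out degenerate. Keeping a box inside the image could additionally be done by clamping its corners, roughly as sketched below. The helper clip_box is hypothetical and is not the custom function from the notebooks.

```python
def clip_box(box, img_w, img_h):
    """Clamp (x_min, y_min, x_max, y_max, label) to the image; return None if nothing remains."""
    x_min, y_min, x_max, y_max, label = box
    x_min, x_max = max(0, x_min), min(img_w, x_max)
    y_min, y_max = max(0, y_min), min(img_h, y_max)
    if x_max <= x_min or y_max <= y_min:
        return None  # the box lies entirely outside the image
    return (x_min, y_min, x_max, y_max, label)

# A box that sticks out on the left and bottom of a 600 x 650 image.
print(clip_box((-10, 20, 50, 700, "table"), img_w=600, img_h=650))
# A box entirely to the right of the image.
print(clip_box((610, 10, 700, 50, "caption"), img_w=600, img_h=650))
```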

Calculating Mean and Standard deviation of the dataset

The mxrcnn pipeline (used for Faster-RCNN) also requires the mean and standard deviation of the dataset as parameters. They can be calculated using the following function:

import cv2
import numpy as np

# `files` (list of image file names), final_root_dir and img_dir are assumed to be defined earlier.
def normalize():
    channel_sum = np.zeros(3)
    channel_sum_squared = np.zeros(3)
    num_pixels = 0
    for file in files:
        file_path = final_root_dir + img_dir + file
        img = cv2.imread(file_path)
        img = img / 255.
        num_pixels += (img.size / 3)
        channel_sum += np.sum(img, axis=(0, 1))
        channel_sum_squared += np.sum(np.square(img), axis=(0, 1))

    mean = channel_sum / num_pixels
    std = np.sqrt((channel_sum_squared / num_pixels) - mean**2)

    # bgr to rgb conversion
    rgb_mean = list(mean)[::-1]
    rgb_std = list(std)[::-1]
    return rgb_mean, rgb_std

mean, std = normalize()
# Scale the mean back to the 0-255 range; std stays on the 0-1 scale.
mean = [x * 255 for x in mean]
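
The function relies on the identity Var(x) = E[x^2] - (E[x])^2, which lets the statistics be accumulated image by image without holding the whole dataset in memory. Below is a small single-channel sketch of the same idea in plain Python; the helper running_mean_std and the toy batches are made up for this example.

```python
import math

def running_mean_std(batches):
    """Accumulate sum and sum of squares per batch, then derive mean and std."""
    total = 0.0
    total_sq = 0.0
    count = 0
    for batch in batches:
        for value in batch:
            total += value
            total_sq += value * value
        count += len(batch)
    mean = total / count
    std = math.sqrt(total_sq / count - mean ** 2)
    return mean, std

# Two toy "images", each a flat list of pixel values already scaled to 0..1.
batches = [[0.0, 1.0], [1.0, 0.0, 0.5, 0.5]]
mean_1ch, std_1ch = running_mean_std(batches)
print(mean_1ch, std_1ch)
```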

Train Your Own Model

This is where the real power of the Monk library kicks in. Writing code for object detection architectures can be very tedious, but with the Monk Object Detection Library it takes very few lines of code.

For comparison purposes, all 3 architectures were trained for 30 epochs with a learning rate of 0.003.

For YOLOv3:

import os
import sys
sys.path.append("Monk_Object_Detection/7_yolov3/lib")
from train_detector import Detector
gtf = Detector()
# Dataset directories
img_dir = "Document_Layout_Analysis/Images/"
label_dir = "Document_Layout_Analysis/labels/"
class_list_file = "Document_Layout_Analysis/classes.txt"
gtf.set_train_dataset(img_dir, label_dir, class_list_file, batch_size=16)
gtf.set_val_dataset(img_dir, label_dir)
gtf.set_model(model_name="yolov3")
# SGD was found to perform better than the Adam optimiser on this task
gtf.set_hyperparams(optimizer="sgd", lr=0.003, multi_scale=False, evolve=False)
gtf.Train(num_epochs=30)

For Faster-RCNN:

import os
import sys
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/")
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/mx-rcnn")
from train_base import *
# Dataset params
root_dir = "./"
coco_dir = "Document_Layout_Analysis"
img_dir = "Images"
set_dataset_params(root_dir=root_dir, coco_dir=coco_dir, imageset=img_dir)
set_model_params(model_name="vgg16")
set_hyper_params(gpus="0", lr=0.003, lr_decay_epoch='20', epochs=30, batch_size=8)
set_output_params(log_interval=500, save_prefix="model_vgg16")
# Preprocessing image parameters (mean and std calculated during data pre-processing)
set_img_preproc_params(img_short_side=300, img_long_side=500, mean=(196.45086004329943, 199.09071480252155, 197.07683846968297), std=(0.25779948968052024, 0.2550292865960972, 0.2553027154941914))
initialize_rpn_params()
initialize_rcnn_params()
# Removing cache if any
if os.path.isdir("./cache/"):
    os.system("rm -r ./cache/")
roidb = set_dataset()
sym = set_network()
train(sym, roidb)

For SSD512:

import os
import sys
sys.path.append("Monk_Object_Detection/1_gluoncv_finetune/lib/")
from detector_prototype import Detector
gtf = Detector()
root = "Document_Layout_Analysis/"
img_dir = "Images/"
anno_file = "train_labels.csv"
batch_size = 8
gtf.Dataset(root, img_dir, anno_file, batch_size=batch_size)
# VGG16 architecture with atrous convolutions, pretrained on the COCO dataset, is used for this task
pretrained = True
gpu = True
model_name = "ssd_512_vgg16_atrous_coco"
gtf.Model(model_name, use_pretrained=pretrained, use_gpu=gpu)
gtf.Set_Learning_Rate(0.003)
epochs = 30
params_file = "saved_model.params"
gtf.Train(epochs, params_file)

These models were trained on an NVIDIA Tesla V100 with 16 GB of memory. YOLOv3 took the least time to train (6–7 hrs), SSD512 took around 11 hrs, and Faster-RCNN took the longest (24+ hrs).

4. Inference and Comparison

The inference code is almost the same as the code used with the pre-trained model. You can check it out in the notebooks here.

The following results were obtained on test images after training the models from scratch:

Results Obtained from YOLOv3:

The outputs produced by YOLOv3 were very accurate. Among the 3 architectures, it is the only model that was able to identify drop-capitals. Although the confidence of its predictions is low compared to the other models, its classification is the most accurate of the three.

Inference on Test Images from YOLOv3 Architecture

Results Obtained from Faster-RCNN:

Faster-RCNN detected bounding boxes with very high confidence, but it missed some important regions, such as the footer in the 1st example, the heading in the 2nd example, and the drop capital in the 3rd. If we lower the confidence threshold to recover the missing boxes, it produces a lot of random boxes with no clear indication of what they represent.

Results Obtained from SSD512:

SSD512 produces outputs with very high confidence, many of them 0.9+. It was also the only model that was able to identify footers and noise such as division lines in the document. But it also produced repeated or incorrect labels, such as ‘floating’ in the 2nd example (an extra box with an incorrect label), and graphics and paragraph in the third (2 boxes with different labels for the same region).

Inference on Test Images from SSD512 Architecture
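
Two boxes with different labels covering the same region, as seen in the SSD512 output, can be spotted by their intersection over union (IoU). The sketch below is not part of the original pipelines: the helpers iou and drop_duplicates, the 0.8 threshold, and the sample detections are all assumptions chosen for illustration. It keeps only the highest-scoring box among heavily overlapping ones, regardless of label.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def drop_duplicates(detections, iou_thresh=0.8):
    """detections: list of (box, label, score). Keep the best-scoring box per overlapping group."""
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(iou(det[0], other[0]) < iou_thresh for other in kept):
            kept.append(det)
    return kept

# Made-up detections: the first two cover almost the same region with different labels.
detections = [
    ((12, 12, 110, 110), "paragraph", 0.90),
    ((10, 10, 110, 110), "graphics", 0.95),
    ((200, 200, 300, 260), "caption", 0.80),
]
kept = drop_duplicates(detections)
print([label for _, label, _ in kept])
```

Whether discarding the lower-scoring label is the right call depends on the application; the point is only that such duplicates are easy to detect after inference.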

The following inferences can be drawn from the outputs in this tutorial:

The Monk library makes it very easy for students, researchers, and competitors to create deep learning models and to try different hyper-parameter settings to increase the accuracy of a model in very few lines of code.

Faster-RCNN gave the worst performance on this task, whereas SSD512 and YOLOv3 gave comparable results.

If you want a model that doesn’t take much time to train, and missing minute details like footers or separators won’t affect your work, go for YOLOv3.

If these small details are crucial for your work and the focus is more on bounding box prediction than on classification, go for SSD512. It is also worth noting that the gluoncv-finetune pipeline of Monk AI (which was used for SSD512) also provides architectures pre-trained on various other datasets, such as the COCO dataset.

Original article: https://medium.com/@swapnil.ahlawat/object-detection-document-layout-analysis-using-monk-object-detection-toolkit-6c57200bde5
