Detecting geometric shapes with OpenCV + ConvNets

Published 2023/12/15 by 豆豆

A simple yet powerful pipeline for detecting shapes in scanned documents

What is this about?

One of the most rapidly growing subfields of artificial intelligence is natural language processing (NLP). It deals with the interactions between computers and human (natural) languages, in particular how to program computers to process and make sense of large amounts of natural language data.

Challenges in natural language processing frequently involve speech recognition, natural language understanding and natural language generation, among others. Of these, information extraction problems such as NER (Named Entity Recognition) are fast becoming one of the cornerstone applications of NLP. In this post, I am going to share a solution to one of the trickiest problems that come up while performing NER.

Why do we need a custom solution?

Recent developments in deep learning have led to an explosion of sophisticated techniques for entity extraction and other NLP tasks. More often than not, enterprise-grade OCR software (ABBYY, Adlib, etc.) is used to transform massive volumes of unstructured, image-based documents into fully searchable PDF and PDF/A assets. Subsequently, state-of-the-art algorithms (BERT, ELMo, etc.) can be used to build highly contextual language models that interpret the extracted information and achieve the NLP objective.

In reality, though, not all documents consist solely of language-based data. A document can contain many other non-linguistic elements, such as radio buttons, a signature block or some other geometrical shape, that may hold useful information but cannot easily be interpreted by OCR or by any of the aforementioned algorithms. So there is a need for a specialized solution that identifies and interprets such elements, and that is our why.

An example of check boxes and radio buttons in a document

How do we do it?

Now this is where things get interesting. How do we extract and identify such elements from a scanned document? To answer this, the author proposes a 3-step architecture that could potentially be used to detect any shape (a universal shape detector? maybe). It's a pretty straightforward approach, and one that promises good accuracy.

Step 1: Convert the documents (PDFs etc.) to image files. Write heuristic code based on the OpenCV APIs to extract all potential image segments. This code should be optimized for coverage (recall) rather than accuracy.

Step 2: Label the images extracted in Step 1. Create a CNN-based deep learning network and train it on the labelled images. This step takes care of accuracy.

Step 3: Create a scikit-learn pipeline that integrates both of the above steps, so that when a document is ingested, all potential image segments are extracted and the trained CNN model then predicts which of them have the desired shape.

A high-level overview of the solution

Design considerations

It's important that the OpenCV code identifies as many image segments of the desired shape as possible. Essentially, we need a wide detection range and shouldn't worry about false positives; they will be taken care of by the subsequent ConvNet model. We also need to parameterize the classes/functions as much as possible, which will make it easy to configure the code for a variety of documents going forward. I have chosen a CNN for image classification because it is easy and quick to model, but any other algorithm can be used as long as performance and accuracy stay within acceptable limits.
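
One way to make "parameterize everything" concrete is to keep the detector thresholds in a single configuration object with presets per document family. The sketch below is hypothetical: the field names, default values and preset names are illustrative assumptions, deliberately loose so that the OpenCV step favours recall and leaves false positives to the ConvNet.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShapeDetectorConfig:
    """Tunable thresholds for the OpenCV candidate-extraction step (illustrative)."""
    min_side: int = 8         # smallest box side, in pixels, to keep
    max_side: int = 80        # largest box side, in pixels, to keep
    aspect_tol: float = 0.4   # allowed deviation of width/height from 1.0
    approx_eps: float = 0.04  # polygon approximation tolerance, as a fraction of perimeter


# Presets per document family; the values are placeholders to be tuned.
PRESETS = {
    "default": ShapeDetectorConfig(),
    "high_dpi_forms": ShapeDetectorConfig(min_side=20, max_side=160),
}


def get_config(doc_type="default", **overrides):
    """Start from a preset (falling back to 'default') and apply per-call overrides."""
    base = PRESETS.get(doc_type, PRESETS["default"])
    return replace(base, **overrides)  # unknown field names raise TypeError
```

Keeping the config frozen means a preset cannot be mutated by accident; every variation is an explicit `get_config(..., field=value)` call.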

Pipelining plays a pivotal role in structuring ML code. It helps streamline the workflow and enforces the order in which steps are executed. Moreover, production-level code should always be organized as a pipeline.

Let's take 3 steps

Step #1: The OpenCV code

This code serves a dual purpose: 1) creating training/test data (when executed standalone) and 2) extracting image segments when integrated into the pipeline.

The extraction code can currently detect two types of shape (radio buttons and checkboxes), but additional shapes can easily be supported by adding new methods to the ShapeFinder class. The checkbox method identifies squares/rectangles; see the Git repository linked at the end for the complete code base.

Use pdf2image to convert PDFs to images, as in the function below. I have not included this in the Git repository because my data was already in image format.

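```python
import os

import numpy as np
from pdf2image import convert_from_path


def pdf_to_images(dirname, dpi=300, poppler_path=None):
    """Convert every PDF in `dirname` into a list of page images (NumPy arrays)."""
    images = []
    for filename in os.listdir(dirname):
        # Skip anything that is not a PDF file
        if os.path.splitext(filename)[1].lower() != '.pdf':
            continue
        # On Windows, pass the Poppler bin folder if it is not on PATH, e.g.
        # r'C:\Program Files (x86)\poppler-0.68.0_x86\poppler-0.68.0\bin'
        pages = convert_from_path(os.path.join(dirname, filename), dpi=dpi,
                                  poppler_path=poppler_path)
        images.extend(np.array(page) for page in pages)  # PIL image -> array
    return images
```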

Now let's talk about Step #2, i.e. the Convolutional Neural Network.

Since the extracted image segments are relatively small, a simple 3-layer CNN will do the job, but we still need to add some regularization and use the Adam optimizer during training.
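
One possible reading of "a simple 3-layer CNN" with regularization and Adam is sketched below in Keras. This is an assumption: the framework, layer sizes and hyperparameters are not given in this post, so the 32×32 grayscale input, filter counts, L2 weight and dropout rate are all illustrative. Crops must be resized to the input size before training or prediction.

```python
from tensorflow import keras


def build_shape_cnn(input_size=32, l2=1e-4, dropout=0.3, lr=1e-3):
    """Three conv layers + sigmoid head for binary shape / not-shape classification."""
    reg = keras.regularizers.l2(l2)  # L2 weight penalty on every conv kernel
    model = keras.Sequential([
        keras.Input(shape=(input_size, input_size, 1)),
        keras.layers.Conv2D(16, 3, activation="relu", padding="same", kernel_regularizer=reg),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation="relu", padding="same", kernel_regularizer=reg),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(64, 3, activation="relu", padding="same", kernel_regularizer=reg),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dropout(dropout),  # further regularization before the head
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

A single sigmoid unit with binary cross-entropy matches the current binary setup, where each crop either is or is not the target shape.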

For better accuracy, the network should be trained separately on each type of image sample. You may create a new network when a new shape is added, but for now I have used the same one for both checkboxes and radio buttons. It is currently only a binary classification, but finer categorization is also possible, for example:

- Ticked checkbox
- Empty checkbox
- Others
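
Moving from binary output to the three categories above would mean a final `Dense(3, activation="softmax")` layer trained with a categorical loss (for example `sparse_categorical_crossentropy` with integer labels). The predicted probabilities then have to be mapped back to class names. A small NumPy sketch; the class order and the low-confidence fallback to "other" are assumptions:

```python
import numpy as np

CLASS_NAMES = ("ticked", "empty", "other")  # must match the label order used in training


def decode_predictions(probs, class_names=CLASS_NAMES, min_confidence=0.5, fallback="other"):
    """Map softmax rows of shape (n, n_classes) to (label, confidence) pairs.

    Rows whose top probability is below min_confidence are reported as
    `fallback`, so uncertain crops are not silently called ticked or empty.
    """
    probs = np.asarray(probs, dtype=float)
    top = probs.argmax(axis=1)                 # index of the most likely class per row
    conf = probs[np.arange(len(probs)), top]   # its probability
    return [
        (class_names[i] if c >= min_confidence else fallback, float(c))
        for i, c in zip(top, conf)
    ]
```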

Finally, in Step #3, we stitch everything together into a single scikit-learn pipeline and expose it through the predict function.
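
A minimal scikit-learn sketch of that wiring, under stated assumptions: `SegmentExtractor`, `ShapeClassifier` and `build_pipeline` are hypothetical names; the extractor takes any callable that returns `(x, y, w, h)` boxes for a page image; and the classifier wraps any pre-trained model whose `predict()` takes a list of crops and returns one probability per crop (for instance a thin wrapper that resizes crops and calls a Keras model). Both `fit()` methods are no-ops because the heuristics are hand-tuned and the CNN is trained separately, and `__sklearn_is_fitted__` reports them as ready so the pipeline can predict directly.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class SegmentExtractor(BaseEstimator, TransformerMixin):
    """Step 1: turn page images into (page_index, box, crop) candidates."""

    def __init__(self, find_boxes=None):
        self.find_boxes = find_boxes  # callable: image -> list of (x, y, w, h)

    def fit(self, X, y=None):
        return self  # heuristic step: nothing to learn

    def __sklearn_is_fitted__(self):
        return True

    def transform(self, X):
        segments = []
        for page_idx, image in enumerate(X):
            for (x, y, w, h) in self.find_boxes(image):
                segments.append((page_idx, (x, y, w, h), image[y:y + h, x:x + w]))
        return segments


class ShapeClassifier(BaseEstimator):
    """Step 2: keep only the candidates the pre-trained model accepts."""

    def __init__(self, model=None, threshold=0.5):
        self.model = model          # any object with predict(list_of_crops)
        self.threshold = threshold  # minimum probability to accept a crop

    def fit(self, X, y=None):
        return self  # the model is trained separately, outside the pipeline

    def __sklearn_is_fitted__(self):
        return True

    def predict(self, X):
        if not X:
            return []
        # ravel() accepts both flat scores and Keras-style (n, 1) outputs
        scores = np.asarray(self.model.predict([crop for _, _, crop in X]), dtype=float).ravel()
        return [(page, box) for (page, box, _), s in zip(X, scores) if s >= self.threshold]


def build_pipeline(find_boxes, model, threshold=0.5):
    """Chain extraction and classification so predict() runs them in order."""
    return Pipeline([
        ("extract", SegmentExtractor(find_boxes=find_boxes)),
        ("classify", ShapeClassifier(model=model, threshold=threshold)),
    ])
```

`pipe.predict(list_of_page_images)` then returns `(page_index, box)` pairs for the accepted shapes.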

One important piece of functionality that I have not covered is associating each checkbox or radio button with its corresponding text in the document. Frankly, detecting elements without that association is of little use in real-world applications. I will leave this as an open challenge for you, but think of it as a text-proximity problem.
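
As one possible starting point for the text-proximity idea (a hypothetical sketch, not part of the original solution), each detected box can be paired with the nearest OCR word that sits on the same line and to its right, which is a typical layout for form labels. The word-box format `(x, y, w, h, text)` and the `max_gap` cut-off are assumptions; labels placed to the left or above would need other rules.

```python
def associate_labels(shape_boxes, word_boxes, max_gap=200):
    """Pair each (x, y, w, h) shape box with the nearest word to its right.

    word_boxes are (x, y, w, h, text) tuples, e.g. from an OCR engine.
    A word qualifies if the shape's vertical centre falls within the word's
    vertical extent (same line) and the word starts at or after the shape's
    right edge, less than max_gap pixels away. Returns one text (or None)
    per shape, in input order.
    """
    labels = []
    for sx, sy, sw, sh in shape_boxes:
        centre_y = sy + sh / 2
        right_edge = sx + sw
        best_text, best_gap = None, max_gap
        for wx, wy, ww, wh, text in word_boxes:
            on_same_line = wy <= centre_y <= wy + wh
            gap = wx - right_edge  # horizontal distance from the shape to the word
            if on_same_line and 0 <= gap < best_gap:
                best_text, best_gap = text, gap
        labels.append(best_text)
    return labels
```

In practice, adjacent OCR words would first be merged into phrases so that a label like "Yes, I agree" is returned whole.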

Final thoughts

‘One size doesn’t always fit all’, and this is especially true here: think of this code as a kind of template. As-is, it is not intended to work for everyone, and that’s perfectly fine, but the approach will work for the given documents/shapes, provided some effort is put into fine-tuning the parameters and creating the training data.

Link to Git

Drop your feedback in the comments!

Original article: https://medium.com/swlh/extraction-of-geometrical-elements-using-opencv-convnets-48fd92168dfe