SorterBot - Part 1
A web-based solution to control a swarm of Raspberry Pis, featuring a real-time dashboard, a deep learning inference engine, 1-click Cloud deployment, and dataset labeling tools.
This is the first article of the three-part SorterBot series.
- Part 1 — General project description and the Web Application
- Part 2 — Controlling the Robotic Arm
- Part 3 — Transfer Learning and Cloud Deployment (coming soon)
Source code on GitHub:
- Control Panel: Django backend and React frontend, running on EC2
- Inference Engine: Object Recognition with PyTorch, running on ECS
- Raspberry: Python script to control the Robotic Arm
- Installer: AWS CDK, GitHub Actions, and a bash script to deploy the solution
- LabelTools: Dataset labeling tools built with Python and OpenCV
I recently completed an AI mentorship program at SharpestMinds, whose central element was building a project, or better yet, a complete product. I chose the latter, and in this article I describe what I built, how I built it, and what I learned along the way. Before we get started, I would like to extend special thanks to my mentor, Tomas Babej (CTO @ ProteinQure), for his invaluable help throughout this journey.
When thinking about what to build, I came up with the idea of a web-based solution to control a swarm of Raspberry Pis, featuring a real-time dashboard, a deep learning inference engine, 1-click Cloud deployment, and dataset labeling tools. The Raspberry Pis can have any sensors and actuators attached to them. They collect data and send it to the inference engine, which processes it and turns it into commands that the actuators can execute. A control panel is included to manage and monitor the system, and the subsystems communicate with each other using either WebSockets or REST API calls.
As an implementation of this general idea, I built SorterBot, where the sensor is a camera and the actuators are a robotic arm and an electromagnet. The solution automatically sorts metal objects based on how they look. When the user starts a session, the arm scans the area in front of it, locates the objects and containers within its reach, then automatically divides the objects into as many groups as there are containers. Finally, it moves the objects to their corresponding containers.
SorterBot automatically picks up objects

To process the images taken by the arm’s camera, I built an inference engine based on Facebook AI’s Detectron2 framework. When a picture arrives for processing, the engine localizes the items and containers in the image, then saves the bounding boxes to the database. After the last picture of a session has been processed, the items are clustered into as many groups as there are containers. Finally, the inference engine generates commands instructing the arm to move similar-looking items into the same container.
To make the system easier to control and monitor, I built a control panel, using React for the front end and Django for the back end. The front end shows a list of registered arms, allows the user to start a session, and displays existing sessions with their statuses. Under each session, the user can access logically grouped logs, as well as before-and-after overview images of the working area. To avoid paying for AWS resources unnecessarily, the user can also start and stop the ECS cluster where the inference engine runs, using a button in the header.
User Interface of the Control Panel

To make it easier for the user to see what the arm is doing, I used OpenCV to stitch together the pictures that the camera took during the session. Another set of pictures is taken after the arm has moved the objects to the containers, so the user can see a before/after overview of the area and verify that the arm actually moved the objects.
Overview image made of the session images stitched together

The back end communicates with the Raspberry Pis via WebSockets and REST calls, handles the database, and controls the inference engine. To deliver updates from the back end in real time as they happen, the front end also communicates with the back end via WebSockets.
Since the solution consists of many different AWS resources, and manually provisioning them is very tedious, I automated the deployment process using AWS CDK and a lengthy bash script. To deploy the solution, 6 environment variables have to be set and a single bash script has to be run. After the process finishes (which takes around 30 minutes), the user can log in to the control panel from any web browser and start using the solution.
Web應(yīng)用程序 (The Web Application)
Conceptually, the communication protocol has two parts. The first is a heartbeat sequence that the arm runs at regular intervals to check whether everything is ready for a session to be started. The second is the session sequence, responsible for coordinating the execution of a whole session across the subsystems.
Diagram illustrating how the different parts of the solution communicate with each other

Heartbeat Sequence
The point where the execution of the first part starts is marked with a green rectangle. As the first step, the Raspberry Pi pings the WebSocket connection to the inference engine. If the connection is healthy, it skips ahead to the next step. If the inference engine appears to be offline, the Pi requests its IP address from the control panel. After the control panel returns the IP (or ‘false’ if the inference engine really is offline), the Pi tries to establish a connection with the new address. This behavior lets the inference engine be turned off when it is not in use, which lowers costs significantly. It also simplifies setting up the arms, which is especially important when multiple arms are used.
Regardless of whether the connection with the new IP succeeds, the result gets reported to the control panel alongside the arm’s ID. When the control panel receives the connection status, it first checks whether the arm ID is already registered in the database, and registers it if needed. After that, the connection status is pushed to the UI, where a status LED lights up green or orange, indicating whether the connection succeeded.
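The reconnect logic above can be sketched as a small decision function. This is a hedged illustration, not the project's actual code; `fetch_engine_ip` and `reconnect` stand in for the real control-panel request and WebSocket connection attempt:

```python
# Sketch of one heartbeat iteration as described above. All names are
# hypothetical stand-ins for the real network calls.

def heartbeat_step(ping_ok, fetch_engine_ip, reconnect):
    """Return (connected, ip_used) after one heartbeat iteration.

    ping_ok:         result of pinging the current WebSocket connection
    fetch_engine_ip: callable asking the control panel for the engine's IP;
                     returns an IP string, or False if the engine is offline
    reconnect:       callable that tries to connect to the given IP
    """
    if ping_ok:
        return True, None          # connection healthy, skip to next step
    ip = fetch_engine_ip()         # ask the control panel for a fresh IP
    if ip is False:
        return False, None         # engine really is offline
    return reconnect(ip), ip       # try the new address; report either way
```

Either way, the outcome is reported back to the control panel, as described in the next paragraph of the sequence.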
An arm as it appears on the UI, with the start button and status light

On the UI, next to the status LED, there is a ‘play’ button. When the user clicks it, the arm’s ID is added to a list in the database containing the IDs of the arms that should start a session. When an arm checks in with a green connection status, the control panel checks whether the arm’s ID is in that list. If it is, the ID is removed and a response is sent back telling the arm to start a session. If it isn’t, a response is sent back telling the arm to restart the heartbeat sequence without starting a session.
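The check-in flow can be sketched like this. It is a minimal stand-in for the real Django views, with plain sets replacing the database tables:

```python
# Hypothetical sketch of the control panel's check-in handler: a set holds
# the IDs of arms whose 'play' button was pressed.

pending_starts = set()          # stand-in for the database-backed start list
registered_arms = set()         # stand-in for the arms table

def press_play(arm_id):
    """Called when the user clicks the 'play' button for an arm."""
    pending_starts.add(arm_id)

def check_in(arm_id, status_green):
    """Handle an arm's heartbeat check-in; return the command for the arm."""
    registered_arms.add(arm_id)         # register the arm if it is new
    if status_green and arm_id in pending_starts:
        pending_starts.discard(arm_id)  # consume the pending start request
        return "start_session"
    return "restart_heartbeat"
```

A start request is consumed exactly once, so an arm that checks in again after starting a session simply resumes the heartbeat loop.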
Session Sequence
The arm’s first task is to take pictures for inference. To do that, it moves to the inference position, then starts rotating at its base. It stops at fixed intervals, and at each stop the camera takes a picture, which is sent directly to the inference engine as bytes over the WebSocket connection.
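A rough sketch of the scan pass might look like the following; the sweep range, step size, and function names are assumptions for illustration, not the project's real configuration:

```python
# Illustrative sketch of the scan pass: evenly spaced stop angles across the
# arm's sweep, taking and sending a picture at each stop.

def scan_angles(start_deg=0, end_deg=180, step_deg=30):
    """Angles (degrees) at which the arm stops to take a picture."""
    return list(range(start_deg, end_deg + 1, step_deg))

def capture_and_send(angle, take_picture, ws_send):
    """Take a picture at the given stop and send the raw bytes over WebSocket."""
    jpeg_bytes = take_picture(angle)   # camera returns encoded image bytes
    ws_send(jpeg_bytes)                # sent directly as bytes, no re-encoding
```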
High-level diagram of the Inference Engine

When the image data arrives from the Raspberry Pi, image processing begins. First, the image is decoded from bytes, and the resulting NumPy array is used as the input of the Detectron2 object recognizer. The model outputs bounding box coordinates of the recognized objects alongside their classes. The coordinates are relative distances from the top-left corner of the image, measured in pixels. Only binary classification is done here, meaning an object can be either an item or a container; further clustering of items happens in a later step. At the end of processing, the results are saved to the PostgreSQL database, then the images are written to disk for later use by the vectorizer and archived to S3 for reference. Saving and uploading the image are not on the critical path, so they run in a separate thread. This lowers execution time, as the sequence can continue before the upload finishes.
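Moving the archival work off the critical path can be sketched with a background thread. `save_to_disk` and `upload_to_s3` are hypothetical stand-ins for the real disk write and S3 upload:

```python
# Sketch of keeping archival work off the critical path: the inference
# sequence continues while saving/uploading runs in a background thread.
import threading

def archive_async(image_bytes, save_to_disk, upload_to_s3):
    """Persist the image without blocking the inference sequence."""
    def work():
        save_to_disk(image_bytes)   # later reused by the vectorizer
        upload_to_s3(image_bytes)   # archived for later reference
    t = threading.Thread(target=work, daemon=True)
    t.start()
    return t                        # caller may join at the end of the session
```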
When evaluating models in Detectron2’s model zoo, I chose Faster R-CNN R-50 FPN, as it provides the lowest inference time (43 ms), lowest training time (0.261 s/iteration), and lowest training memory consumption (3.4 GB) among the available architectures, without giving up too much accuracy (41.0 box AP, which is 92.5% of the best network’s box AP).
High-level diagram of the Vectorizer

After all of the session images have been processed and the signal to generate session commands has arrived, stitching these pictures together starts in a separate process, providing a ‘before’ overview for the user. In parallel, all the image processing results belonging to the current session are loaded from the database. First, the coordinates are converted to absolute polar coordinates using an arm-specific constant sent with the request. The constant, r, represents the distance between the center of the image and the arm’s base axis. The relative coordinates (x and y on the drawing below) are pixel distances from the top-left corner of the image. The angle at which the image was taken is denoted by γ. Δγ represents the angular difference between the given item and the image’s center, and can be calculated using equation 1) on the drawing below. The first absolute polar coordinate of the item (the angle, γ’) then follows directly: γ’ = γ + Δγ. The second coordinate (the radius, r’) can be calculated using equation 2) on the drawing.
Drawing and equations used to convert relative coordinates to absolute polar coordinates

After the conversion of the coordinates, the bounding boxes belonging to the same physical object are replaced by their averaged absolute coordinates.
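The exact equations live in the drawing above, so the following is only one plausible reconstruction of the conversion, assuming a top-down camera and a made-up millimeters-per-pixel scale `s`. Treat it as an illustration of the shape of the computation, not the project's actual formulas:

```python
# Illustrative reconstruction of the coordinate conversion. The image size
# (w, h) and scale s are assumptions; r is the arm-specific constant from
# the text (distance from image center to the arm's base axis).
import math

def to_polar(x, y, gamma_deg, r, w=640, h=480, s=0.5):
    """Convert pixel coords (x, y) in an image taken at base angle gamma_deg
    into absolute polar coords (gamma_prime_deg, r_prime) around the base."""
    dx = s * (x - w / 2)          # tangential offset from image center, mm
    dy = s * (h / 2 - y)          # radial offset from image center, mm
    radial = r + dy               # distance of the point along the radius
    delta_gamma = math.degrees(math.atan2(dx, radial))  # analogue of eq. 1)
    r_prime = math.hypot(radial, dx)                    # analogue of eq. 2)
    return gamma_deg + delta_gamma, r_prime
```

A point at the image center maps back to (γ, r), which is a quick sanity check on any version of these formulas.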
In the preprocessing step for the vectorizer, the images saved to disk in the previous step are loaded, then cropped to the bounding box of each object, resulting in a small picture of every item.
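The cropping step amounts to slicing each image to its bounding box; with NumPy this would be `img[y1:y2, x1:x2]`, sketched here with plain nested lists to keep the example dependency-free:

```python
# Minimal sketch of the cropping step, using nested lists (rows of pixels)
# in place of a real image array.

def crop(img, box):
    """Crop an image to a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in img[y1:y2]]
```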
Example of an object cropped around its bounding box

These pictures are converted to tensors, then added to a PyTorch dataloader. Once all the images are cropped, the resulting batch is processed by the vectorizer network. The chosen architecture is a ResNet18 model, which is appropriate for these small images. A PyTorch hook is inserted after the last fully connected layer, so in each inference step the output of that layer, a 512-dimensional feature vector, is copied to a tensor outside the network. After the vectorizer has processed all of the images, the resulting tensor is used directly as the input of the K-Means clustering algorithm. The other required input, the number of clusters to compute, is simply the count of recognized containers read from the database. This step outputs a set of pairings representing which item goes to which container. Lastly, these pairings are replaced with absolute coordinates that are sent to the robotic arm.
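Extracting the feature vectors requires PyTorch and the trained ResNet18, so the sketch below covers only the downstream step: clustering already-extracted vectors into as many groups as there are containers. It uses a toy K-Means with fixed initial centroids; a real pipeline would typically use a library implementation such as `sklearn.cluster.KMeans`:

```python
# Toy K-Means over pre-extracted feature vectors: items get a cluster index
# in 0..k-1, where k is the number of recognized containers. Fixed initial
# centroids keep the run deterministic for this illustration.
import math

def kmeans(vectors, k, iters=10):
    """Return a cluster index (0..k-1) for every vector."""
    centroids = [list(v) for v in vectors[:k]]      # naive fixed init
    labels = [0] * len(vectors)
    for _ in range(iters):
        for i, v in enumerate(vectors):             # assignment step
            labels[i] = min(range(k), key=lambda c: math.dist(v, centroids[c]))
        for c in range(k):                          # update step
            members = [v for i, v in enumerate(vectors) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each label pairs an item with a container, which is exactly the set of pairings the text describes.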
The commands are pairs of coordinates representing items and containers. The arm executes them one by one, moving each object to its container using the electromagnet.
After the objects have been moved, the arm takes another set of pictures to be stitched together, giving an overview of the area after the operation. Finally, the arm resets to its initial position and the session is complete.
To be continued in Part 2…
Translated from: https://medium.com/swlh/web-application-to-control-a-swarm-of-raspberry-pis-with-an-ai-enabled-inference-engine-b3cb4b4c9fd