Segmentation and Object Detection - Part 2
FAU Lecture Notes on Deep Learning
These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video & matching slides. We hope you enjoy it as much as the videos. Of course, this transcript was created with deep learning techniques largely automatically and only minor manual modifications were performed. Try it yourself! If you spot mistakes, please let us know!
Navigation
Previous Lecture / Watch this Video / Top Level / Next Lecture
U-net for cell segmentation. Image created using gifify. Source: YouTube

Welcome back to deep learning! So today, we want to talk about more advanced methods of image segmentation. Let’s look at our slides. You can see that this is part two of this lecture video series on image segmentation and object detection.
Image under CC BY 4.0 from the Deep Learning Lecture.

Now, the key idea that we need to know about is how to integrate context knowledge. Just using the encoder-decoder structure that we talked about in the last video will not be enough to get a good segmentation. The key concept is that you somehow have to tell your method what happened where in order to get a good segmentation mask. You need to balance local and global information. Of course, this is very important because the local information is crucial for good pixel accuracy, and the global context is important in order to figure out the classes correctly. CNNs typically struggle with this balance. So, we now need some good ideas on how to incorporate this context information.
Image under CC BY 4.0 from the Deep Learning Lecture.

Now, Long et al. showed one of the first approaches to do so. They essentially use an upsampling that consists of learnable transposed convolutions. The key idea was to add links combining the final prediction with the previous, lower layers at finer strides. Additionally, they added 1x1 convolutions after the pooling layers, and the predictions were then summed up to make local predictions with a global structure. So the network topology is a directed acyclic graph with skip connections from lower to higher layers. This way, you can then refine a coarse segmentation.
Image under CC BY 4.0 from the Deep Learning Lecture.

So, let’s look at this idea in some more detail. If you look at the ground truth here on the bottom right, it has a very high resolution. If you simply used your CNN and upsampled, you would get a very coarse result, as shown on the left-hand side. So, what do Long et al. propose? Well, they propose to take the information from the previous downsampling step, which still has a higher resolution, and combine it in the decoder branch using a sum to produce a more highly resolved image. Of course, you can then do this again in the decoder branch. You can see that this way we can upsample the segmentation and reuse the information from the encoder branch in order to produce better, more highly resolved results. So, you can introduce those skip connections, and they produce much better segmentations than if you were just using the decoder and upsampling that information.
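The fusion step described above can be sketched in a few lines. This is a toy illustration with random arrays: the shapes, the class count, and the nearest-neighbour upsampling standing in for a learned transposed convolution are assumptions for illustration, not the exact FCN implementation:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling as a simple stand-in for the
    # learnable transposed convolution used in practice.
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Hypothetical class-score maps: one at stride 32, one at stride 16
# (the latter obtained from a pooling layer via a 1x1 convolution).
coarse_scores = np.random.randn(4, 4, 21)   # H/32 x W/32 x num_classes
pool4_scores  = np.random.randn(8, 8, 21)   # H/16 x W/16 x num_classes

# FCN-style skip: upsample the coarse prediction and sum it with the
# finer-stride prediction from the encoder branch.
fused = upsample2x(coarse_scores) + pool4_scores
print(fused.shape)  # (8, 8, 21)
```

Repeating this fusion at stride 8 and then upsampling to the input size gives the progressively refined segmentations shown on the slide.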
Image under CC BY 4.0 from the Deep Learning Lecture.

You see, integrating context knowledge is key. In SegNet, a different approach was taken. You also have an encoder-decoder structure that is convolutional. Here, the key idea was that in the upsampling steps, you reuse the max-pooling indices from the corresponding downsampling steps such that you get a better-resolved decoding. This is already a pretty good idea to integrate the context knowledge.
Image under CC BY 4.0 from the Deep Learning Lecture.

An even better idea is then demonstrated in U-net. Here, the network consists of an encoder branch, a contracting path that captures the context. The decoder branch does a symmetric expansion for the localization. So, the encoder follows the typical structure of a CNN. The decoder consists of upsampling steps and a concatenation with the feature maps of the respective layers of the corresponding encoder step. The training strategy also relies on data augmentation: non-rigid deformations, rotations, and translations were used to give the U-net an additional kick of performance.
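To contrast this with the FCN-style summation, here is a rough sketch of a U-net skip connection, which concatenates the encoder feature map with the upsampled decoder feature map along the channel axis. The shapes and the nearest-neighbour upsampling are illustrative assumptions:

```python
import numpy as np

def unet_skip(decoder_feat, encoder_feat):
    # U-net concatenates along the channel axis, so the following
    # convolutions can learn how to mix fine and coarse information
    # (FCN instead sums score maps; SegNet reuses pooling indices).
    upsampled = decoder_feat.repeat(2, axis=0).repeat(2, axis=1)
    return np.concatenate([encoder_feat, upsampled], axis=-1)

encoder_feat = np.random.randn(64, 64, 128)  # high-resolution encoder output
decoder_feat = np.random.randn(32, 32, 256)  # coarser decoder input
merged = unet_skip(decoder_feat, encoder_feat)
print(merged.shape)  # (64, 64, 384)
```

Because the channels are concatenated rather than added, the channel count grows (128 + 256 = 384), and the subsequent convolutions decide how to combine the two sources.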
Image under CC BY 4.0 from the Deep Learning Lecture.

You can say that U-net is essentially the state-of-the-art method for image segmentation. This is also the reason why it has this name: it stems from its shape. You can see that you get this U structure because you have a high resolution on the fine levels. Then you downsample to a lower resolution, and the decoder branch upsamples everything again. The key ingredient here is the skip connections that connect the respective levels of the encoder and the decoder. This way, you can get very, very good image segmentations. It’s quite straightforward to train, and this paper has been cited thousands of times (August 11th, 2020: 16,471 citations). Every day you can check the citation count, and it has already increased. Olaf Ronneberger was able to publish a very important paper here, and it’s dominating the entire scene of image segmentation.
Image under CC BY 4.0 from the Deep Learning Lecture.

You can see that there are many additional approaches. They can be combined with the U-net: dilated convolutions and many more. Many of these very small changes have been suggested, and they may be useful for particular tasks, but for general image segmentation, the U-net has been shown to still outperform such approaches. Still, there are approaches using dilated convolutions, there are network stacks that can be very beneficial, and there are also multi-scale networks that go even further into this idea of using the image at different scales. You can also do things like deferring the context modeling to another network. You can also incorporate recurrent neural networks. Also very nice is the idea of refining the resulting segmentation maps using a conditional random field.
Image under CC BY 4.0 from the Deep Learning Lecture.

We show some of these additional approaches here, such that you can see what we are talking about. Dilated convolutions are the atrous convolutions that we already talked about. The idea is that you use dilated convolutions to expand the receptive field exponentially without losing resolution. You introduce a dilation rate L that controls the upsampling factor. You then stack these layers on top of each other such that the receptive field grows exponentially while the number of filter parameters grows only linearly. So, in specific applications where a broad range of magnifications occurs, this can be very useful. It really depends on your application.
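A quick back-of-the-envelope sketch of this growth: for stride-1 layers, each convolution adds (kernel - 1) * dilation to the receptive field, so doubling the dilation rate per layer gives exponential growth at a constant parameter cost per layer:

```python
def receptive_field(dilations, kernel=3):
    # Each stride-1 layer adds (kernel - 1) * dilation pixels
    # to the one-dimensional receptive field.
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# Doubling the dilation rate per layer: the receptive field grows
# exponentially, while each layer still has only 3x3 = 9 weights.
print(receptive_field([1, 2, 4, 8]))  # 31
print(receptive_field([1, 1, 1, 1]))  # 9 -- same depth without dilation
```

So four dilated 3x3 layers already cover a 31-pixel context, where four ordinary 3x3 layers cover only 9 pixels.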
Image under CC BY 4.0 from the Deep Learning Lecture.

Examples of this are DeepLab, ENet, and the multi-scale context aggregation module in [28]. The main issue, of course, is that there is no efficient implementation available. So, the benefit is somewhat unclear.
Image under CC BY 4.0 from the Deep Learning Lecture.

Another approach that I would like to show you here is the so-called stacked hourglass network. Here, the idea is that you use something very similar to a U-net, but you put an additional trainable part in the skip connections. That is essentially the main idea.
Image under CC BY 4.0 from the Deep Learning Lecture.

Then, you can take this hourglass module and stack several of them behind each other. So, you essentially have multiple refinement steps, one after the other, and you always return to the original resolution. You can plug in a second network essentially as a kind of artifact-correction network. What’s really nice about this kind of hourglass network approach is that you return to the original resolution.
Convolutional pose machines also stack several modules on top of each other to enable pose tracking. This can also be combined with segmentation. Image created using gifify. Source: YouTube

Let’s say you’re predicting several classes at the same time. Then, you end up with several segmentation masks for the different classes. This idea can then be picked up in something that is called a convolutional pose machine. In the convolutional pose machine, you use the area where your hourglasses connect, where you have one U-net essentially stacked on top of another U-net. At this layer, you can then also use the resulting segmentation maps per class in order to inform them of each other. So, you can use the context information of other things that have been detected in the image in order to steer this refinement. In convolutional pose machines, you do that for the pose detection of the joints of a body model. Of course, if you have the left knee joint, the right knee joint, and other joints of the body, the information about the other joints helps in decoding the correct positions.
X-ray transform invariant landmark detection by Bastian Bier. Image under CC BY 4.0 from the Deep Learning Lecture.

This idea has also been used by my colleague Bastian Bier for the detection of anatomical landmarks in the analysis of X-ray projections. I’m showing a small video here. You’ve already seen it in the introduction, and now you finally have all the context that you need to understand the method. So, behind it is an approach that is very similar to convolutional pose machines, where the landmarks then start informing each other about their orientations and positions in order to get improved detection results.
Image under CC BY 4.0 from the Deep Learning Lecture.

So what else? I already hinted at the conditional random fields. Here, the idea is that you refine the output using a conditional random field. Each pixel is modeled as a node in a random field. The pairwise terms between the pixels are very interesting because they can capture long-range dependencies as well as fine, local information.
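As a sketch of the underlying model (following the fully connected CRF of Krähenbühl and Koltun that DeepLab builds on; the exact kernels vary between implementations), a label assignment $\mathbf{x}$ over all pixels is scored by an energy of unary and pairwise potentials:

```latex
E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)
```

Here, the unary term $\psi_u$ comes from the CNN’s per-pixel class scores, while the pairwise term $\psi_p$ typically combines an appearance kernel (pixels with similar position and color prefer the same label) with a smoothness kernel (nearby pixels prefer the same label). Because every pixel pair contributes, these terms can couple distant regions while still sharpening local boundaries.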
Image under CC BY 4.0 from the Deep Learning Lecture.

So, if you see the output here, this is from DeepLab. Here, you see how the iterative refinement of the conditional random field can help to improve the segmentation. You can then also combine this with atrous convolutions as in [4]. You could even model the conditional random field with recurrent neural networks, as shown in reference [29]. This then also allows end-to-end training of the entire conditional random field.
Image under CC BY 4.0 from the Deep Learning Lecture.

There are also a couple of advanced topics that I still want to hint at. Of course, you can also work on the losses. So far, we’ve only seen the segmentation loss itself, but of course, you can also mix and match previous ideas that we already saw in this class. For example, you can use a GAN in order to augment your loss. The idea here is that you essentially create a segmentor. You then use the output of the segmentor as input to a GAN-type discriminator. The discriminator now gets the task of deciding whether a given segmentation is an automatic one or a manual one. This can then be used as a kind of additional adversarial loss, inspired by the ideas of generative adversarial networks. You find this very often in the literature as the so-called adversarial loss.
Image under CC BY 4.0 from the Deep Learning Lecture.

So, how is this then implemented? Well, the idea is that if you have a data set of N training images and the corresponding label maps, you can build the following loss function: it is essentially the multi-class cross-entropy loss, and on top you put the adversarial loss that works on your segmentation masks. So here, you essentially train your segmentor both with the ground truth labels and on fooling the discriminator. This is essentially nothing else than a multi-task learning approach with an adversarial task.
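A toy numerical sketch of such a combined objective for a single image; the function names, the weighting factor `lam`, and the one-sided discriminator term are illustrative assumptions, not the exact formulation of any particular paper:

```python
import numpy as np

def cross_entropy(pred, target, eps=1e-8):
    # Multi-class cross-entropy, averaged over all pixels.
    return -np.mean(np.sum(target * np.log(pred + eps), axis=-1))

def adversarial_seg_loss(pred, target, disc_on_pred, lam=0.1):
    # Segmentation term plus an adversarial term that rewards fooling
    # the discriminator, i.e. pushing its "this segmentation looks
    # manual" output towards 1.
    mce = cross_entropy(pred, target)
    bce = -np.log(disc_on_pred + 1e-8)  # binary cross-entropy vs. label 1
    return mce + lam * bce

pred = np.full((4, 4, 3), 1.0 / 3.0)       # uniform softmax output, 3 classes
target = np.eye(3)[np.zeros((4, 4), int)]  # one-hot ground truth, all class 0
loss = adversarial_seg_loss(pred, target, disc_on_pred=0.5)
print(round(loss, 4))  # log(3) + 0.1 * log(2), about 1.1679
```

In actual training, the discriminator has its own loss and is updated in alternation with the segmentor, just as in a standard GAN.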
Image under CC BY 4.0 from the Deep Learning Lecture.

Okay, this already brings us to the end of our short video today. We have now seen the key ideas on how to build good segmentation networks. In particular, U-net is one of the key ideas that you should know about. Now that we have discussed segmentation networks, we can talk in the next lecture about object detection and how to actually implement it very quickly. This is the other side of image interpretation: we will also be able to figure out where different instances in the image actually are. So I hope you liked this small video, and I’m looking forward to seeing you in the next one. Thank you very much and bye-bye.
If you liked this post, you can find more essays here, more educational material on Machine Learning here, or have a look at our Deep Learning Lecture. I would also appreciate a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced. If you are interested in generating transcripts from video lectures, try AutoBlog.
Translated from: https://towardsdatascience.com/segmentation-and-object-detection-part-2-a334b91255f1