DL: Overview of Deep Learning Algorithms (a Collection of Neural Network Models): Chinese Explanations and Reflections on "THE NEURAL NETWORK ZOO" (Part 4)
Contents
CNN
DN
DCIGN
Related Articles
DL: Overview of Deep Learning Algorithms (a Collection of Neural Network Models): Chinese Explanations and Reflections on "THE NEURAL NETWORK ZOO" (Part 1)
DL: Overview of Deep Learning Algorithms (a Collection of Neural Network Models): Chinese Explanations and Reflections on "THE NEURAL NETWORK ZOO" (Part 2)
DL: Overview of Deep Learning Algorithms (a Collection of Neural Network Models): Chinese Explanations and Reflections on "THE NEURAL NETWORK ZOO" (Part 3)
DL: Overview of Deep Learning Algorithms (a Collection of Neural Network Models): Chinese Explanations and Reflections on "THE NEURAL NETWORK ZOO" (Part 4)
DL: Overview of Deep Learning Algorithms (a Collection of Neural Network Models): Chinese Explanations and Reflections on "THE NEURAL NETWORK ZOO" (Part 5)
DL: Overview of Deep Learning Algorithms (a Collection of Neural Network Models): Chinese Explanations and Reflections on "THE NEURAL NETWORK ZOO" (Part 6)
CNN
Convolutional neural networks (CNN, or deep convolutional neural networks, DCNN) are quite different from most other networks. They are primarily used for image processing, but can also be used for other types of input such as audio. A typical use case for CNNs is one where you feed the network images and the network classifies the data, e.g. it outputs "cat" if you give it a cat picture and "dog" when you give it a dog picture. CNNs tend to start with an input "scanner", which is not intended to parse all the training data at once. For example, to input an image of 200 x 200 pixels, you wouldn't want a layer with 40,000 nodes. Rather, you create a scanning input layer of, say, 20 x 20, which you feed the first 20 x 20 pixels of the image (usually starting in the upper left corner). Once you have passed that input (and possibly used it for training), you feed it the next 20 x 20 pixels: you move the scanner one pixel to the right. Note that you wouldn't move the input over by 20 pixels (or whatever the scanner width is); you're not dissecting the image into blocks of 20 x 20, but rather crawling over it.

This input data is then fed through convolutional layers instead of normal layers, where not all nodes are connected to all nodes. Each node only concerns itself with closely neighbouring cells (how close depends on the implementation, but usually not more than a few). These convolutional layers also tend to shrink as they become deeper, mostly by easily divisible factors of the input (so 20 would probably go to a layer of 10, followed by a layer of 5). Powers of two are very commonly used here, as by definition they can be divided cleanly and completely: 32, 16, 8, 4, 2, 1. Besides these convolutional layers, CNNs also often feature pooling layers. Pooling is a way to filter out details: a commonly used pooling technique is max pooling, where we take, say, a 2 x 2 block of pixels and pass on only the pixel with the largest value (in the original article's example, the most red).
To apply CNNs for audio, you basically feed the input audio waves and inch over the length of the clip, segment by segment. Real world implementations of CNNs often glue an FFNN to the end to further process the data, which allows for highly non-linear abstractions. These networks are called DCNNs but the names and abbreviations between these two are often used interchangeably.
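The pixel-by-pixel "scanning" convolution and the 2 x 2 max pooling described above can be sketched in a few lines of NumPy. This is an illustrative toy, not a library API: `convolve2d` and `max_pool2d` are hypothetical helper names, and the 3 x 3 averaging kernel is an arbitrary example filter.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image one pixel at a time ("crawling"),
    taking a dot product at each position (valid padding, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep only the strongest activation
    in each size x size block, filtering out fine detail."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # tiny 6 x 6 "image"
kernel = np.ones((3, 3)) / 9.0                     # 3 x 3 averaging filter
fmap = convolve2d(image, kernel)                   # 4 x 4 feature map
pooled = max_pool2d(fmap)                          # shrinks to 2 x 2
print(fmap.shape, pooled.shape)                    # (4, 4) (2, 2)
```

Note how each output node depends only on a small neighbourhood of the input, and how pooling shrinks the layer by an easily divisible factor, just as the text describes.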
LeCun, Yann, et al. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE 86.11 (1998): 2278-2324.
Original Paper PDF
DN
Deconvolutional networks (DN), also called inverse graphics networks (IGN), are reversed convolutional neural networks. Imagine feeding a network the word "cat" and training it to produce cat-like pictures by comparing what it generates to real pictures of cats. DNs can be combined with FFNNs just like regular CNNs, but this is about the point where the line is drawn with coming up with new abbreviations. They may be referred to as deep deconvolutional neural networks, but you could argue that when you stick FFNNs to the back and the front of DNs, you have yet another architecture which deserves a new name. Note that in most applications one wouldn't actually feed text-like input to the network, but more likely a binary classification input vector: think <0, 1> being cat, <1, 0> being dog and <1, 1> being cat and dog. The pooling layers commonly found in CNNs are often replaced with similar inverse operations, mainly interpolation and extrapolation with biased assumptions (if a pooling layer uses max pooling, then when reversing it you can only invent new data that is lower than the recorded maximum).
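The idea of reversing a pooling layer under a biased assumption can be sketched as follows. This is a simplification for illustration: it repeats each pooled maximum over its block (nearest-neighbour upsampling), whereas the deconvolutional networks of Zeiler et al. additionally record "switch" locations of the pooled maxima, which this toy omits; `max_unpool2d` is a hypothetical helper name.

```python
import numpy as np

def max_unpool2d(pooled, size=2):
    """Approximate inverse of max pooling. Each pooled value was the
    maximum of its block, so any reconstruction may only invent values
    at or below it; repeating the maximum is the simplest such choice."""
    return np.repeat(np.repeat(pooled, size, axis=0), size, axis=1)

pooled = np.array([[5.0, 3.0],
                   [2.0, 8.0]])   # 2 x 2 feature map after max pooling
up = max_unpool2d(pooled)         # upsampled back to 4 x 4
print(up.shape)                   # (4, 4)
```

Every entry in each reconstructed 2 x 2 block equals the block's former maximum, so nothing in the output exceeds the data that pooling passed on, matching the "exclusively lower new data" constraint in the text.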
Zeiler, Matthew D., et al. “Deconvolutional networks.” Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
Original Paper PDF
DCIGN
Deep convolutional inverse graphics networks (DCIGN) have a somewhat misleading name, as they are actually VAEs, but with CNNs and DNs for the encoder and decoder respectively. These networks attempt to model "features" in the encoding as probabilities, so that the network can learn to produce a picture with a cat and a dog together, having only ever seen the two in separate pictures. Similarly, you could feed it a picture of a cat with your neighbour's annoying dog on it and ask it to remove the dog, without it ever having performed such an operation. Demos have shown that these networks can also learn to model complex transformations of images, such as changing the light source or rotating a 3D object. These networks tend to be trained with back-propagation.
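What "modelling features as probabilities" means can be sketched with the standard VAE reparameterisation trick: the encoder outputs a mean and a variance per feature rather than a fixed value, and the decoder is fed a sample from that distribution. This is a minimal illustration under assumed names, not the authors' exact implementation; the 2-D latent code and its "cat-ness"/"dog-ness" interpretation are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, rng):
    """Reparameterisation trick: draw z = mu + sigma * eps, so each
    encoded "feature" is a Gaussian that the decoder learns to invert.
    A smooth latent space is what lets features be mixed or removed."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Hypothetical 2-D latent code: feature 0 = "cat-ness", feature 1 = "dog-ness".
mu = np.array([1.0, 0.0])          # encoder's predicted means
log_var = np.array([-2.0, -2.0])   # encoder's predicted log-variances
z = sample_latent(mu, log_var, rng)
print(z.shape)                     # (2,)
```

Setting both latent means high would ask the decoder for a picture containing both a cat and a dog, even if no training image ever showed the two together.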
Kulkarni, Tejas D., et al. “Deep convolutional inverse graphics network.” Advances in Neural Information Processing Systems. 2015.
Original Paper PDF