Convolutional Neural Network Structure
CNNs are a special type of ANN that accepts images as inputs. Below is the representation of a basic neuron of an ANN, which takes a vector X as input. The values in X are multiplied by the corresponding weights to form a linear combination. A non-linearity, or activation function, is then applied to this combination to produce the final output.
Why CNN?
Grayscale images have pixel values ranging from 0 to 255, i.e. 8-bit values. If an image has size NxM, the input vector has N*M elements; for an RGB image it has N*M*3. A 30x30 RGB image already requires 2700 input neurons, and a 256x256 RGB image over 100,000. An ANN takes a vector of inputs and produces an output vector from a hidden layer that is fully connected to the input, so the number of weights for a 224x224x3 input is very high: a single neuron fully connected to that input has 224x224x3 weights coming into it. This demands more computation, memory, and data. A CNN instead exploits the structure of images, leading to a sparse connection between input and output neurons. Each layer of a CNN performs a convolution: the image volume (for an RGB image) is taken as input, and a kernel/filter is applied to it to produce the output. A CNN also shares parameters between output neurons, which means that a feature detector (for example, a horizontal edge detector) that is useful in one part of the image is probably useful in another part of the image as well.
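To make the weight-count argument above concrete, here is a small sketch (the function names are illustrative, not from the original article) comparing a fully connected layer against shared convolution kernels:

```python
# Parameter counts: dense layer vs. convolution kernels.
# Sizes match the 224x224x3 example discussed above.

def fc_weights(h, w, c, out_neurons):
    """Weights of a dense layer: every output neuron sees every input value."""
    return h * w * c * out_neurons

def conv_weights(f, c_in, n_kernels):
    """Weights of a conv layer: each kernel is f x f x c_in and is
    shared across all spatial positions of the image."""
    return f * f * c_in * n_kernels

print(fc_weights(224, 224, 3, 1))  # 150528 weights into a single dense neuron
print(conv_weights(3, 3, 64))      # 1728 weights for 64 shared 3x3 kernels
```

Parameter sharing is what keeps the second number independent of the image size.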
Convolutions
Every output neuron is connected to a small neighborhood in the input through a weight matrix, also referred to as a kernel or filter. We can define multiple kernels for each convolution layer, each giving rise to an output. Each filter is slid over the input image, producing a 2D output, and the outputs corresponding to all the filters are stacked to form an output volume.
Convolution operation, Image by indoml

Here the matrix values are multiplied by the corresponding values of the kernel filter, and a summation is performed to get the final output. The kernel filter slides over the input matrix to produce the output map. If the input matrix has dimensions Nx x Ny and the kernel matrix has dimensions Fx x Fy, the final output will have dimensions (Nx - Fx + 1) x (Ny - Fy + 1). In CNNs, the weights constitute the kernel filters, and K kernels will produce K feature maps.
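The sliding-and-summing operation described above can be sketched directly in numpy (a minimal "valid" convolution, written for clarity rather than speed):

```python
import numpy as np

def conv2d(x, k):
    """Valid cross-correlation (the 'convolution' used in CNNs): slide
    kernel k over input x, multiplying elementwise and summing."""
    nx, ny = x.shape
    fx, fy = k.shape
    # Output dimensions follow the formula (Nx - Fx + 1) x (Ny - Fy + 1).
    out = np.zeros((nx - fx + 1, ny - fy + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+fx, j:j+fy] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((3, 3))
print(conv2d(x, k).shape)  # (2, 2), i.e. 4 - 3 + 1 along each axis
```

With an all-ones kernel each output cell is simply the sum of the 3x3 patch under it.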
Padding
Padded convolution is used when preserving the dimensions of the input matrix is important to us, and it helps keep more of the information at the borders of an image. We have seen that convolution reduces the size of the feature map. To retain the dimensions of the feature map as those of the input map, we pad the rows and columns with zeros.
Padding, Image by author

In the above figure, with padding of 1, we were able to preserve the dimensions of a 3x3 input. The size of the output feature map is N - F + 2P + 1, where N is the size of the input map, F is the size of the kernel matrix, and P is the amount of padding. To preserve the dimensions, N - F + 2P + 1 should be equal to N. Therefore,
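A quick sketch of this "same" padding arithmetic (helper name is illustrative):

```python
import numpy as np

def same_padding(f):
    """Padding needed so an FxF valid convolution preserves the input
    size: solve N - F + 2P + 1 = N for P."""
    assert f % 2 == 1, "odd kernel sizes give integer padding"
    return (f - 1) // 2

print(same_padding(3))  # 1 -- matches the 3x3 example above
print(same_padding(5))  # 2

x = np.ones((3, 3))
padded = np.pad(x, same_padding(3))  # zero-pad 1 row/column on every side
print(padded.shape)                  # (5, 5); a 3x3 kernel then yields 3x3 again
```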
Condition for retaining dimensions: P = (F - 1)/2, Image by author

Stride
Stride refers to the number of pixels the kernel filter moves at a time. A stride of 2 means the kernel skips over 2 pixels before performing the next convolution operation.
Stride demonstration, Image by indoml

In the figure above, the kernel filter slides over the input matrix one pixel at a time. With a stride of 2, this movement happens two pixels at a time before each convolution, as in the image below.
Stride demonstration, Image by indoml

An observation to make here is that the output feature map shrinks (here by a factor of 4) when the stride is increased from 1 to 2. The dimension of the output feature map is (N - F + 2P)/S + 1.
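The general shape formula can be checked with a small helper (the name is illustrative):

```python
def out_dim(n, f, p=0, s=1):
    """Spatial size of a convolution output: floor((N - F + 2P)/S) + 1."""
    return (n - f + 2 * p) // s + 1

print(out_dim(7, 3, p=0, s=1))  # 5: a 7x7 input gives a 5x5 output
print(out_dim(7, 3, p=0, s=2))  # 3: doubling the stride shrinks it to 3x3
print(out_dim(5, 3, p=1, s=1))  # 5: "same" padding preserves the input size
```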
Pooling
Pooling provides translational invariance by subsampling: it reduces the size of the feature maps. The two commonly used pooling techniques are max pooling and average pooling.
Max pooling operation, Image by indoml

In the above operation, the 4x4 matrix is divided into four 2x2 matrices, and from each one the greatest value is picked (for max pooling) or the average of the four values is taken (for average pooling). This reduces the size of the feature maps and therefore reduces the number of parameters without losing important information. One thing to note here is that pooling reduces the Nx and Ny values of the input feature map but does not reduce Nc (the number of channels). The hyperparameters involved in a pooling operation are the filter dimension, the stride, and the type of pooling (max or average); there are no parameters for gradient descent to learn.
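The 2x2 pooling described above can be sketched as follows (a minimal non-overlapping version; the function name is illustrative):

```python
import numpy as np

def pool2d(x, f=2, s=2, mode="max"):
    """Pool an NxN map with an f x f window and stride s. Channels would
    be pooled independently, so Nc stays unchanged."""
    m = (x.shape[0] - f) // s + 1
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            patch = x[i*s:i*s+f, j*s:j*s+f]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

x = np.array([[1., 3., 2., 1.],
              [4., 6., 5., 0.],
              [7., 2., 9., 8.],
              [1., 0., 3., 4.]])
print(pool2d(x))              # max pooling:     [[6. 5.] [7. 9.]]
print(pool2d(x, mode="avg"))  # average pooling: [[3.5 2.] [2.5 6.]]
```

Note that `pool2d` has no weights at all, matching the point that pooling contributes nothing for gradient descent to learn.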
Output Feature Map
The size of the output feature map or volume depends on: the size of the input (N), the size of the kernel (F), the padding (P), the stride (S), and the number of kernels.
Naive Convolution
These are the building blocks of convolutional neural networks and depend on the above parameters. The dimension of the output feature map can be formulated as:
The dimension of the output feature map, Image by author

Dilated Convolution
This has an additional parameter known as the dilation rate, which is used to increase the receptive field of a convolution; it is also known as atrous convolution. A 3x3 convolution with a dilation rate of 2 covers the same area as a naive 5x5 convolution while having only 9 parameters, delivering a broader field of view at the same computational cost. Dilated convolutions should be used only when a wide field of view is needed and one cannot afford multiple convolutions or larger kernels. The image below depicts the receptive coverage of a dilated convolution.
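The 3x3-sees-5x5 claim follows from the standard effective-receptive-field formula, sketched here (helper name is illustrative):

```python
def effective_kernel(f, d):
    """Effective receptive field of an FxF kernel with dilation rate d:
    F + (F - 1) * (d - 1). Inserting d-1 gaps between taps widens the
    field without adding weights."""
    return f + (f - 1) * (d - 1)

print(effective_kernel(3, 1))  # 3: a plain convolution
print(effective_kernel(3, 2))  # 5: covers a 5x5 area with only 9 weights
print(effective_kernel(3, 3))  # 7: the field grows while parameters stay fixed
```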
Image by Paul-Louis Pröve

Transposed Convolution
Transposed convolution is used with the aim of increasing the size of the output feature map. It is used in encoder-decoder networks to increase the spatial dimensions; the input image is appropriately padded before the convolution operation.
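A sketch of the shape arithmetic for a transposed convolution, assuming the common convention (stride S, padding P, kernel F, and no output padding; the helper name is illustrative):

```python
def transposed_out(n, f, s=1, p=0):
    """Output size of a transposed convolution: (N - 1) * S - 2P + F.
    It inverts the shape arithmetic of a regular convolution."""
    return (n - 1) * s - 2 * p + f

print(transposed_out(2, 3, s=2))  # 5: a 2x2 map is upsampled to 5x5
# Sanity check: a regular conv with the same F, S, P maps 5 back to 2.
print((5 - 3) // 2 + 1)           # 2
```

This inverse relationship is why encoder-decoder networks pair strided convolutions on the way down with transposed convolutions on the way up.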
Image by Divyanshu Mishra

The End
Thank you and stay tuned for more blogs on AI.
Translated from: https://towardsdatascience.com/convolutional-neural-networks-f62dd896a856
卷積神經(jīng)網(wǎng)絡(luò)結(jié)構(gòu)
總結(jié)
以上是生活随笔為你收集整理的卷积神经网络结构_卷积神经网络的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 快手极速版怎么设置私信权限
- 下一篇: 快手怎么设置陌生人私信