Scratch to SOTA: Build Famous Classification Nets 2 (AlexNet / VGG)
Introduction
In the last article, we reviewed how some of the most famous classification networks are evaluated on ImageNet. We also finished building a PyTorch evaluation dataset class, as well as efficient evaluation functions. We will soon see how they come in handy for validating our model structures and training.
We will build AlexNet and VGG models in this article. Despite their influential contributions to computer vision and deep learning, their structures are straightforward in retrospect. Therefore, in addition to building them, we will also play with "weight porting" and a sliding-window implementation using convolutional layers.
I personally find borrowing weights from other models a simple but useful technique to practice. Other than using it to double-check our model structure, it can also be used for porting model weights between different frameworks, as well as for initializing the backbone of a detector or a modified network by reshaping the weights (we will dabble in this for our sanity check).
Cool, so let’s begin.
Overview
- Explaining the structures of AlexNet and the VGG family
- Building our own library modules for them
- Implementing sliding windows with convolution
- Replacing the fully connected classifier head with a convolutional classifier head for "dense evaluation"
- Discussion on weight initialization
- Sanity check by porting weights from the pretrained models
AlexNet
AlexNet is often regarded as the model that marked the dawn of the current deep learning era. It was the winner of ImageNet 2012 with a top-5 error rate of 15.3%, beating the runner-up of the year by a whopping 10.9 percentage points.
It has 60 million parameters, and given GPUs' limited memory nearly 10 years ago, AlexNet had to be split and trained across two GTX 580 3GB GPUs. For this reason, it can be confusing to decipher the exact structure of a single-GPU AlexNet from the original paper. In fact, the official PyTorch implementation of AlexNet takes reference from this paper (check footnote 1 on page 5), although PyTorch's implementation still differs from it by using 256 kernels in the 4th convolutional layer instead of the 384 described in the paper (aargh!).
We will also ignore the Local Response Normalization (LRN) feature of the network. While the paper states that LRN reduces the top-1 and top-5 error rates by a non-negligible amount (1.4% and 1.2%), it is not often used nowadays, as its effects are insignificant for most networks. Batch normalization is the default scheme to apply if we want the model to learn better; we will add that for our VGG networks.
In this article, we will follow PyTorch's AlexNet structure so that we can use its pretrained weights. The structure we will use is illustrated in the following figure. The spatial dimension of the feature maps after convolution/max-pooling can be computed as floor((W - F + 2P) / S) + 1, where W is the current feature map's width/height, F is the filter's size, P is the padding size and S is the stride. floor() means we round down to the nearest integer. The "thickness" or "depth" of the output feature maps depends on the number of kernels (filters) we use at the current stage.
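As a quick worked example: AlexNet's first convolutional layer applies 64 kernels of size 11 x 11 with stride 4 and padding 2 to a 224 x 224 input, so each output side is floor((224 - 11 + 2 x 2) / 4) + 1 = floor(217 / 4) + 1 = 55, giving a 64 x 55 x 55 feature map.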
With the PyTorch library, this AlexNet structure is pretty easy to implement. In the __init__() method, we can first ignore the head argument, which controls what kind of classification head we want to use.
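Here is a minimal sketch of the feature extractor, following torchvision's AlexNet layer sizes (the head argument and the class scaffolding around it are assumptions for illustration):

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, head="fc", num_classes=1000):
        super().__init__()
        # Convolutional feature extractor; note the 256 kernels in the
        # 4th conv layer (torchvision's choice), not the paper's 384.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # -> 64 x 55 x 55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> 64 x 27 x 27
            nn.Conv2d(64, 192, kernel_size=5, padding=2),           # -> 192 x 27 x 27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> 192 x 13 x 13
            nn.Conv2d(192, 384, kernel_size=3, padding=1),          # -> 384 x 13 x 13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),          # -> 256 x 13 x 13
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),          # -> 256 x 13 x 13
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> 256 x 6 x 6
        )
```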
AlexNet __init__() ConvNet Feature Extractor

As we can see in the code, and as you may already know, one beautiful property of convolutional layers is that they do not care about your input size. Given any input size, they will generate the corresponding output size according to the formula above. However, if we append fully connected layers to the convolutional layers, then we need to consider the final feature map's total number of elements. In AlexNet, the flattened feature map must be a 9216-dimensional vector before going into the fully connected classifier.
There are two ways to satisfy this constraint. We can either have a resize operation that converts the input images to 224 x 224, or we can "resize" the final feature map to make it 256 x 6 x 6 before putting it into the fully connected layers. The second solution can be accomplished with nn.AdaptiveAvgPool2d(). It is kind of analogous to interpolation. A good explanation of it can be found here. With this bit of knowledge, we can build our fully connected classifier.
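Continuing inside __init__(), a sketch of the fully connected head; putting the adaptive pooling and a Flatten at the front of the classifier is one possible arrangement, and the exact dropout placement is an assumption:

```python
        # FC classification head: pool to 256 x 6 x 6, flatten to 9216,
        # then three fully connected layers.
        if head == "fc":
            self.classifier = nn.Sequential(
                nn.AdaptiveAvgPool2d((6, 6)),
                nn.Flatten(),
                nn.Linear(256 * 6 * 6, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.5),
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.5),
                nn.Linear(4096, num_classes),
            )
```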
AlexNet __init__() ConvNet Classification Head

ConvNet's Implementation of Sliding Windows
Fully connected layers are sort of annoying in neural networks for computer vision tasks. They have too many parameters and impose a strict input-dimension requirement. So the question is: can we do without them? Luckily, yes. In many later networks, fully connected layers are replaced by simple average pooling. In our case, if we want to stay true to the network structure, we can replace them with their equivalent convolutional implementation, ditching the need to ensure a fixed input size.
Replacing FC Layers with Convolution

As in the above diagram, the 256 x 6 x 6 final feature map is on the left. It can be flattened to a 9216-dimensional feature vector and passed through a 4096-unit fully connected layer (top right). Every unit (green dot) has 9216 connections, linking it to each element of the feature map, and the layer produces a 4096 x 1 feature vector. In this way, each FC unit is effectively a 256 x 6 x 6 kernel (bottom right). We can thus replace the 4096-unit FC layer with a convolutional layer of 4096 kernels of size 256 x 6 x 6, which produces a 4096 x 1 x 1 feature map instead. It is obvious that, aside from the extra dimensions, the two outputs are equivalent.
With this derivation, we can implement the ConvNet version of the classification head. Continuing from the previous code snippet, we have:
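A sketch of the convolutional head, continuing the if branch from the sketch above; the 1 x 1 convolutions play the role of the remaining FC layers:

```python
        # Convolutional classification head: the first FC layer becomes a
        # conv layer with 6 x 6 kernels; the rest become 1 x 1 convolutions.
        elif head == "conv":
            self.classifier = nn.Sequential(
                nn.Conv2d(256, 4096, kernel_size=6),   # -> 4096 x 1 x 1 for a 224 x 224 input
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.5),
                nn.Conv2d(4096, 4096, kernel_size=1),  # 1 x 1 conv == per-location FC layer
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.5),
                nn.Conv2d(4096, num_classes, kernel_size=1),
            )
```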
At first glance, this conversion seems redundant. However, remember the one beautiful property of ConvNets we mentioned minutes ago? Now we can allow inputs of any size, as long as they are larger than 224 x 224.
For example, if we have an input image whose size is between 287 x 287 and 318 x 318, we will get a final feature map of 256 x 8 x 8. As our fully connected layers can only take in a flattened feature vector corresponding to a 256 x 6 x 6 feature map, we would have to apply nn.AdaptiveAvgPool2d(). Alternatively, we can try "dense evaluation": slide a window across the feature map to get 9 crops of 256 x 6 x 6, and put them into the FC layers one by one to generate final outputs (figure below, top). The outputs can then be averaged.
However, if we are using the ConvNet implementation of the FC layers, we are naturally using sliding windows (figure above, bottom). The output feature map can be averaged across the spatial dimensions to get the final output.
With ConvNet’s implementation of the classification head, we can now be lenient with our input image size.
Let’s write the last bit of code to complete our AlexNet implementation. The forward() method of the class is:
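Here is a sketch, assuming the structure above; since both heads are nn.Sequential modules, one code path serves both:

```python
    def forward(self, x):
        # The convolutional feature extractor accepts any input >= 224 x 224.
        x = self.features(x)
        # FC head: flattens internally and returns N x num_classes.
        # Conv head: returns an N x num_classes x H x W map (H = W = 1 for 224 x 224).
        x = self.classifier(x)
        return x
```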
In the same script, we can define a builder function that helps us generate the specified AlexNet. When we have pretrained weights ready, we can load them into the network here.
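A possible builder function (the function name and the weight-loading convention are assumptions):

```python
def alexnet(head="fc", weight_path=None):
    """Build an AlexNet with the given head; optionally load ported weights."""
    model = AlexNet(head=head)
    if weight_path is not None:
        model.load_state_dict(torch.load(weight_path))
    return model
```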
With that, we have finished our AlexNet model. We can now move on to the sanity check with weight porting.
Sanity Check with Weight Porting
We are going to instantiate a pretrained AlexNet from torchvision and copy its weights to our model. In this section, we will go through how the weights are indexed and how to reshape them for our ConvNet implementation of the FC classifier.
First, we instantiate all three networks:
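For illustration, assuming the class defined above and torchvision's pretrained flag (its API at the time of writing); the variable names are my own:

```python
import torchvision

torch_alexnet = torchvision.models.alexnet(pretrained=True)  # reference model
fc_alexnet = AlexNet(head="fc")      # ours, FC classification head
conv_alexnet = AlexNet(head="conv")  # ours, convolutional classification head
```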
Let's print out all the states and parameters of the three networks. If we run the script at this stage, we should get something like the output below.
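One way to produce such a comparison, assuming the three state dicts enumerate their entries in the same order (the exact names and shapes printed depend on how the models are defined):

```python
for (tn, tw), (fn, fw), (cn, cw) in zip(
    torch_alexnet.state_dict().items(),
    fc_alexnet.state_dict().items(),
    conv_alexnet.state_dict().items(),
):
    print(f"Torch name: {tn} {tw.shape}")
    print(f"FC name   : {fn} {fw.shape}")
    print(f"Conv name : {cn} {cw.shape}")
```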
```
Torch name: features.0.weight torch.Size([64, 3, 11, 11])
FC name   : features.0.weight torch.Size([64, 3, 11, 11])
Conv name : features.0.weight torch.Size([64, 3, 11, 11])

Torch name: features.0.bias torch.Size([64])
FC name   : features.0.bias torch.Size([64])
Conv name : features.0.bias torch.Size([64])

Torch name: features.3.weight torch.Size([192, 64, 5, 5])
FC name   : features.3.weight torch.Size([192, 64, 5, 5])
Conv name : features.3.weight torch.Size([192, 64, 5, 5])

...

Torch name: classifier.0.weight torch.Size([4096, 25088])
FC name   : classifier.2.weight torch.Size([4096, 25088])
Conv name : classifier.0.weight torch.Size([4096, 512, 7, 7])

Torch name: classifier.0.bias torch.Size([4096])
FC name   : classifier.2.bias torch.Size([4096])
Conv name : classifier.0.bias torch.Size([4096])

Torch name: classifier.3.weight torch.Size([4096, 4096])
FC name   : classifier.5.weight torch.Size([4096, 4096])
Conv name : classifier.3.weight torch.Size([4096, 4096, 1, 1])

...
```
To transfer the weights from PyTorch's pretrained AlexNet to our AlexNet with the FC classification head, we can create an OrderedDict() and store the pretrained AlexNet's weights under our model's parameter names. We then load this OrderedDict into our model.
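A sketch of this porting step, assuming both state dicts enumerate parameters in the same order; for the FC head the shapes already match, so the tensors can be copied as they are:

```python
from collections import OrderedDict

ported = OrderedDict()
for (our_name, _), (_, torch_weight) in zip(
    fc_alexnet.state_dict().items(), torch_alexnet.state_dict().items()
):
    ported[our_name] = torch_weight  # same tensor, stored under our parameter name
fc_alexnet.load_state_dict(ported)
```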
This process can be repeated for our AlexNet with the convolutional head. However, as convolutional layers' weights and FC layers' weights have different shapes, we need to reshape the weights for our classifier.
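The same loop works if we additionally reshape each pretrained tensor to our parameter's shape. For example, the first FC layer's [4096, 9216] weight becomes a [4096, 256, 6, 6] convolutional weight; the reshape is a no-op for the feature-extractor weights and the biases:

```python
ported = OrderedDict()
for (our_name, our_weight), (_, torch_weight) in zip(
    conv_alexnet.state_dict().items(), torch_alexnet.state_dict().items()
):
    # e.g. [4096, 9216] -> [4096, 256, 6, 6] for the first classifier layer
    ported[our_name] = torch_weight.reshape(our_weight.shape)
conv_alexnet.load_state_dict(ported)
```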
During the loading of the weights there are no errors, indicating that we have got the weights' names and shapes right. However, to guarantee that everything is correct, we should test our model with the evaluation script we wrote in the last article.
Torchvision's pretrained AlexNet and ours have exactly the same accuracy, indicating we have done everything correctly.
Next, let's conduct the same center-crop evaluation with our convolutionally headed AlexNet. As this AlexNet outputs a feature map instead of a feature vector, we first need to write a wrapper that averages the model's output.
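A sketch of such a wrapper (the class name is my own):

```python
class AveragedOutput(nn.Module):
    """Average the conv head's N x C x H x W logits over spatial positions."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        out = self.model(x)          # N x num_classes x H x W
        return out.mean(dim=(2, 3))  # N x num_classes
```

We can then evaluate AveragedOutput(conv_alexnet) exactly as we evaluated the FC-headed model.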
Then, we can pass the wrapped model to our evaluation function to get the outcome.
Finally, we can test the "dense evaluation" by passing in images larger than 224 x 224 and averaging the output for prediction.
As can be seen, dense evaluation is about 1.3% more accurate than center-crop evaluation. Yet, because most of the computation in the convolutional feature extractor is shared, the evaluation does not take much longer.
Hooray! Now we have our own enhanced AlexNet.
VGG
When I first studied neural networks, I felt overwhelmed by the huge combinatorial possibilities of network structure. How deep do I need to go? What should the kernel sizes be? How many kernels should I use? What should the strides be? … AlexNet did not help with my queries; there isn't much of a pattern in AlexNet that we can follow. Then VGG networks came to the rescue!
The VGG family is a landmark in deep learning, not only because it was the runner-up in the 2014 ImageNet competition (with an impressive 7.3% top-5 error rate), but also because it helped standardise the structure of networks.
Patterns in VGG
Use 3 x 3 convolutional kernels across the network. Two 3 x 3 kernels stacked together have a 5 x 5 receptive field (i.e. one element of the output feature map is derived from a 5 x 5 region of the input image); three stacked together have a 7 x 7 receptive field. If we use a stack of three 3 x 3 kernels instead of a single 7 x 7 kernel, not only do we get the same receptive field, we can also imbue it with three times as many non-linearities (with ReLU) while using far fewer parameters (3 x (3 x 3 x C x C) vs 7 x 7 x C x C, assuming the input and output maps both have C channels).
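To put numbers on it: the three-layer stack costs 3 x (3 x 3 x C x C) = 27 x C x C weights, versus 7 x 7 x C x C = 49 x C x C for the single layer, roughly 45% fewer.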
Only use max-pooling of size 2 and stride 2 to downsample the feature map. Unlike AlexNet, where some convolutional layers also downsample the feature map, VGG only downsamples with max-pooling. This means all the convolutional layers have stride 1.
Double the channels after every downsampling. As the feature map's width and height are halved, its channels are doubled, with twice as many kernels applied at each convolutional layer.
Now we have the luxury of restricted choices while building our networks. In fact, many later networks also adopt these patterns in their structure. We seldom see large kernels nowadays, and the heuristic of only doubling the channels after downsampling gives rise to the concept of a "module": a stack of convolutional layers with the same number of kernels.
With the above heuristics, the structure of the whole VGG family can be summarized in a single table, as in the paper.
VGG Family Structures

With this structured design, we can easily code all the networks in the VGG family.
The code here takes heavy reference from the official code for torchvision's VGG models. I have (hopefully) added value with detailed remarks.
We first define the structure of the VGG networks. It is basically a matter of putting the table above into a dictionary.
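A sketch of such a dictionary, mirroring the paper's table; the numbers are the output channels of 3 x 3 conv layers, "M" marks a 2 x 2 max-pooling, and the dictionary and key names are my own:

```python
cfgs = {
    "vgg11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "vgg19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
```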
Next, we can define the class's __init__() and forward() methods:
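A sketch under the assumptions above; _get_conv_layers follows the article, while _get_classifier and _initialize_weights are names I made up for the helpers discussed below:

```python
class VGG(nn.Module):
    def __init__(self, cfg, head="fc", bn=False, num_classes=1000):
        super().__init__()
        self.features = self._get_conv_layers(cfg, bn)
        self.classifier = self._get_classifier(head, num_classes)
        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)    # 512 x 7 x 7 for a 224 x 224 input
        x = self.classifier(x)  # FC head flattens internally; conv head keeps H x W
        return x
```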
As with our AlexNet, we will also implement a convolutional classification head for VGG ("dense evaluation" originates from the VGG paper, after all). The bn (batch normalization) argument determines whether we include batch normalization in our network. Batch normalization is a technique that came after VGG; we will retrofit it into VGG, as it increases both accuracy and training speed.
We now move on to the _get_conv_layers() method.
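A sketch of this method, building the conv stack from a configuration list like the ones above:

```python
    def _get_conv_layers(self, cfg, bn):
        layers, in_channels = [], 3
        for v in cfg:
            if v == "M":
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers.append(nn.Conv2d(in_channels, v, kernel_size=3, padding=1))
                if bn:
                    # batch norm goes after the convolution, before the ReLU
                    layers.append(nn.BatchNorm2d(v))
                layers.append(nn.ReLU(inplace=True))
                in_channels = v
        return nn.Sequential(*layers)
```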
As mentioned in the comments in the code snippet above, if we choose to add batch normalization, it is usually added after the convolution but before the non-linearity.
The classification head's code is very straightforward too. One thing to note is that dropout, unlike batch normalization, is usually added after the activation and before the next convolution.
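A sketch of the head builder, mirroring the AlexNet version but with VGG's 512 x 7 x 7 final feature map (the method name is an assumption):

```python
    def _get_classifier(self, head, num_classes):
        if head == "fc":
            return nn.Sequential(
                nn.AdaptiveAvgPool2d((7, 7)),
                nn.Flatten(),
                nn.Linear(512 * 7 * 7, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.5),  # dropout after the activation
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.5),
                nn.Linear(4096, num_classes),
            )
        # Convolutional head for dense evaluation.
        return nn.Sequential(
            nn.Conv2d(512, 4096, kernel_size=7),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Conv2d(4096, 4096, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Conv2d(4096, num_classes, kernel_size=1),
        )
```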
Weight Initialization
We are almost done here. However, unlike AlexNet, whose weights can all be casually initialized from a zero-mean normal distribution with a standard deviation of 0.01, care must be taken in initializing the weights of VGG networks, as they are much deeper and do not converge easily. In fact, for the ImageNet competition, the authors of VGG first trained shallower versions of the network and then slowly added more layers to make it deeper.
We do not need to go through that tedious process ourselves: with Kaiming or Xavier initialization, we can train the whole deep network from scratch. These PDF slides explain the two types of initialization quite well.
However, there are a few confusing choices to make. Firstly, for each type of initialization, should we use a Gaussian distribution or a uniform distribution? This stackexchange discussion mentions that for Xavier initialization, the uniform distribution seems to be slightly better, while for Kaiming initialization, the Gaussian distribution is used for all layers in the original ResNet paper. So I guess we can go with Kaiming normal and Xavier uniform.
The second question is that Kaiming initialization has two modes, "fan-in" and "fan-out"; which one should we use? This PyTorch forum discussion states that "fan-in" should be the default mode. That sounds good, except, as mentioned in the forum discussion, torchvision's ResNet as well as its VGG use "fan-out". I did quite a bit of searching online but could not find an explanation for this choice.
In this script, I decided to use “fan-in” mode, so the code looks like this:
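A sketch of the initialization: Kaiming normal in "fan-in" mode for the conv layers, while the treatment of the batch-norm and linear layers follows common practice and is an assumption:

```python
    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_in", nonlinearity="relu")
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, mean=0, std=0.01)
                nn.init.constant_(m.bias, 0)
```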
Just like in AlexNet, we can write some builder functions. Two examples are below:
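Two possible builders; the function names are my own, and VGG16 / VGG19 correspond to configurations D and E in the paper:

```python
def vgg16(head="fc"):
    """VGG16 without batch normalization."""
    return VGG(cfgs["vgg16"], head=head, bn=False)


def vgg19_bn(head="fc"):
    """VGG19 with batch normalization retrofitted."""
    return VGG(cfgs["vgg19"], head=head, bn=True)
```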
With this, we have completed our implementation of the VGG family.
Kudos!
Sanity Check
The sanity check for VGG is the same as the one we wrote for AlexNet above. As with AlexNet, "dense evaluation" achieves higher accuracy.
The completed code for the models can be found in this repository.
Conclusion
In this article, we implemented AlexNet and the VGG family. The networks themselves are not difficult to implement, but the ideas of using convolutional layers to implement sliding windows, as well as weight initialization and porting, may be tricky to understand.
In the next article, we will write a training script. We will discuss training-data augmentation, PyTorch's data parallelism, and distributed data parallelism.
Translated from: https://medium.com/swlh/scratch-to-sota-build-famous-classification-nets-2-alexnet-vgg-50a4f55f7f56