Evaluating the Computational Cost of Network Models
Contents
Computational cost
Memory access

Computational cost
Performance metric:
● FLOPS: floating point operations per second
Computational-cost metric:
● MACCs (or MADDs): multiply-accumulate operations
The difference between FLOPS and FLOPs:
FLOPS (note: all upper case) is short for floating point operations per second, i.e. the number of floating-point operations executed per second. It measures computation speed and is a hardware performance metric.
FLOPs (note: lower-case s) is short for floating point operations (the s marks the plural), i.e. the number of floating-point operations. It measures the amount of computation and can be used to gauge the complexity of an algorithm or model.
Note: MACCs count multiply-accumulate operations, while FLOPs count the multiplications and additions separately and sum them.
Dot-product example:
● y = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + ... + w[n-1]*x[n-1]
Each w[i]*x[i] term together with its accumulation counts as one MACC, so the expression is n MACCs.
It contains n floating-point multiplications and n - 1 floating-point additions, so it is 2n - 1 FLOPs.
One MACC is therefore roughly two FLOPs.
Note: strictly speaking, there are only n - 1 additions, one fewer than the number of multiplications. The MACC count is an approximation here, much like Big-O notation is an approximation of an algorithm's complexity.
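The counting rule above is easy to turn into code. Below is a minimal sketch (the helper name dot_product_cost is illustrative, not something from the original text):

```python
# Minimal sketch: operation counts for the dot product
#   y = w[0]*x[0] + w[1]*x[1] + ... + w[n-1]*x[n-1]

def dot_product_cost(n: int) -> dict:
    """Exact MACC and FLOP counts for an n-element dot product."""
    maccs = n             # one multiply-accumulate per element pair
    flops = 2 * n - 1     # n multiplications plus (n - 1) additions
    return {"MACCs": maccs, "FLOPs": flops}

print(dot_product_cost(1000))  # {'MACCs': 1000, 'FLOPs': 1999}
```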
FLOPs of an actual convolution:
For the details of this computation, see "Pruning Convolutional Neural Networks for Resource Efficient Inference" (ICLR 2017).
Assuming the convolution is implemented with a sliding window and the cost of the nonlinearity is ignored, the FLOPs of one convolution layer are:
FLOPs = 2 × Hout × Wout × (Cin × K × K + 1) × Cout
(Hout and Wout are the output height and width; the "+ 1" accounts for the bias.)
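As a quick check, the formula can be written as a small helper; the function name conv_flops and its argument order are assumptions for illustration, not taken from the paper:

```python
# Sketch of the convolution-FLOPs formula above (sliding-window convolution,
# nonlinearity ignored); the "+ 1" term is the per-output bias addition.

def conv_flops(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    return 2 * h_out * w_out * (c_in * k * k + 1) * c_out

# Example: the 3x3, 256 -> 512 convolution on a 28x28 output used later in this post.
print(conv_flops(28, 28, 256, 512, 3))  # 1,850,490,880, i.e. roughly 1.85 GFLOPs
```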
Per-layer FLOPs in a neural network:
● Fully Connected Layer: multiplying a vector of length I with an I × J matrix to get a vector of length J takes I × J MACCs, or (2I - 1) × J FLOPs.
● Activation Layer: we do not measure these in MACCs but in FLOPs, because they are not dot products.
● Convolution Layer: K × K × Cin × Hout × Wout × Cout MACCs
● Depthwise-Separable Layer: (K × K × Cin × Hout × Wout) + (Cin × Hout × Wout × Cout) MACCs
  = Cin × Hout × Wout × (K × K + Cout) MACCs
● The reduction factor versus a regular convolution is K × K × Cout / (K × K + Cout), as worked through in the sketch below.
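A sketch of these per-layer MACC formulas, plus the reduction factor for the depthwise-separable case (all function names here are illustrative):

```python
# Per-layer MACC counts, following the formulas listed above.

def fc_maccs(i: int, j: int) -> int:
    """Fully connected layer: I x J MACCs."""
    return i * j

def conv_maccs(k: int, c_in: int, h_out: int, w_out: int, c_out: int) -> int:
    """Regular convolution: K * K * Cin * Hout * Wout * Cout MACCs."""
    return k * k * c_in * h_out * w_out * c_out

def dw_separable_maccs(k: int, c_in: int, h_out: int, w_out: int, c_out: int) -> int:
    """Depthwise-separable convolution: Cin * Hout * Wout * (K*K + Cout) MACCs."""
    return c_in * h_out * w_out * (k * k + c_out)

# Reduction factor for K = 3, Cout = 512:
k, c_out = 3, 512
print(k * k * c_out / (k * k + c_out))          # ~8.84, i.e. roughly 9x fewer MACCs
print(conv_maccs(3, 256, 28, 28, 512))          # 924,844,032
print(dw_separable_maccs(3, 256, 28, 28, 512))  # 104,566,784
```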
Memory access
● Computational cost is only one aspect of running speed; another important aspect, memory bandwidth, can matter even more than the amount of computation.
● On a modern computer, a single memory access from main memory is much slower than a single computation, by a factor of about 100 or more!
● How many memory accesses does a network make? For each layer they include:
    1. reading the layer's input
    2. computing the result, which includes loading the weights
    3. writing the layer's output
Memory for weights
● Fully Connected: with an input of size I and an output of size J, the total is (I + 1) × J weights (the +1 accounts for the bias).
● Convolutional layers have fewer weights than fully-connected layers: K × K × Cin × Cout
● Because memory accesses are so slow, heavy memory traffic has a large effect on how fast the network runs, sometimes even more than the amount of computation.
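These weight counts can be sketched the same way (the fully-connected example below includes the bias row; the helper names are assumptions):

```python
# Sketch of the weight-count formulas above.

def fc_weights(i: int, j: int) -> int:
    """Fully connected layer: (I + 1) * J parameters; the +1 is the bias."""
    return (i + 1) * j

def conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Convolution layer: K * K * Cin * Cout parameters (biases ignored)."""
    return k * k * c_in * c_out

print(fc_weights(4096, 4096))     # 16,781,312: FC layers dominate the parameter count
print(conv_weights(3, 256, 512))  # 1,179,648
```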
Feature maps and intermediate results
● Convolution Layer (the weights here are negligible):
input = Hin × Win × Cin × K × K × Cout
output = Hout × Wout × Cout
weights = K × K × Cin × Cout + Cout
Example: Cin = 256, Cout = 512, H = W = 28, K = 3, S = 1
1. Normal convolution layer
     input = 28 × 28 × 256 × 3 × 3 × 512 = 924,844,032
     output = 28 × 28 × 512 = 401,408
     weights = 3 × 3 × 256 × 512 + 512 = 1,180,160
     total = 926,425,600
2. Depthwise layer + pointwise layer
    1) Depthwise layer
        input = 28 × 28 × 256 × 3 × 3 = 1,806,336
        output = 28 × 28 × 256 = 200,704
        weights = 3 × 3 × 256 + 256 = 2,560
        total = 2,009,600
    2) Pointwise layer
        input = 28 × 28 × 256 × 1 × 1 × 512 = 102,760,448
        output = 28 × 28 × 512 = 401,408
        weights = 1 × 1 × 256 × 512 + 512 = 131,584
        total = 103,293,440
        total of both layers = 105,303,040
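The example above can be reproduced with a short script; the formulas are exactly the ones listed, and the function names are illustrative:

```python
# Sketch reproducing the memory-access totals above
# (input reads + output writes + weight reads).

def conv_accesses(h_out, w_out, c_in, c_out, k):
    """Regular convolution layer."""
    inp = h_out * w_out * c_in * k * k * c_out  # input values read
    out = h_out * w_out * c_out                 # output values written
    wts = k * k * c_in * c_out + c_out          # weights plus biases read
    return inp + out + wts

def separable_accesses(h_out, w_out, c_in, c_out, k):
    """Depthwise layer followed by a 1x1 pointwise layer."""
    dw = (h_out * w_out * c_in * k * k      # depthwise input reads
          + h_out * w_out * c_in            # depthwise output writes
          + k * k * c_in + c_in)            # depthwise weights + biases
    pw = (h_out * w_out * c_in * c_out      # pointwise input reads
          + h_out * w_out * c_out           # pointwise output writes
          + c_in * c_out + c_out)           # pointwise weights + biases
    return dw + pw

print(conv_accesses(28, 28, 256, 512, 3))       # 926,425,600
print(separable_accesses(28, 28, 256, 512, 3))  # 105,303,040, about 8.8x fewer
```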
Case study
● Input dimension: 126x224
MobileNet V1 parameters (multiplier = 1.0): 1.6M
MobileNet V2 parameters (multiplier = 1.0): 0.5M
MobileNet V2 parameters (multiplier = 1.4): 1.0M
MobileNet V1 MACCs (multiplier = 1.0): 255M
MobileNet V2 MACCs (multiplier = 1.0): 111M
MobileNet V2 MACCs (multiplier = 1.4): 214M
MobileNet V1 memory accesses (multiplier = 1.0): 283M
MobileNet V2 memory accesses (multiplier = 1.0): 159M
MobileNet V2 memory accesses (multiplier = 1.4): 286M
MobileNet V2 (multiplier = 1.4) is slightly slower than MobileNet V1 (multiplier = 1.0)
This provides some proof for my hypothesis that the amount of memory accesses is the primary factor for determining the speed of the neural net.
Conclusion
“I hope this shows that all these things — number of computations, number of parameters, and number of memory accesses — are deeply related. A model that works well on mobile needs to carefully balance those factors.”