Deep Learning Model Zoo
1. Image Classification
Dataset: ImageNet, 1000 classes
1.1 Quantization
Classification model Paddle-Lite latency (ms)

| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
|---|---|---|---|---|---|---|---|---|
| Qualcomm 835 | MobileNetV1 | FP32 baseline | 96.1942 | 53.2058 | 32.4468 | 88.4955 | 47.95 | 27.5189 |
| Qualcomm 835 | MobileNetV1 | quant_aware | 60.8186 | 32.1931 | 16.4275 | 56.4311 | 29.5446 | 15.1053 |
| Qualcomm 835 | MobileNetV1 | quant_post | 60.5615 | 32.4016 | 16.6596 | 56.5266 | 29.7178 | 15.1459 |
| Qualcomm 835 | MobileNetV2 | FP32 baseline | 65.715 | 38.1346 | 25.155 | 61.3593 | 36.2038 | 22.849 |
| Qualcomm 835 | MobileNetV2 | quant_aware | 48.3655 | 30.2021 | 21.9303 | 46.1487 | 27.3146 | 18.3053 |
| Qualcomm 835 | MobileNetV2 | quant_post | 48.3495 | 30.3069 | 22.1506 | 45.8715 | 27.4105 | 18.2223 |
| Qualcomm 835 | ResNet50 | FP32 baseline | 526.811 | 319.6486 | 205.8345 | 506.1138 | 335.1584 | 214.8936 |
| Qualcomm 835 | ResNet50 | quant_aware | 475.4538 | 256.8672 | 139.699 | 461.7344 | 247.9506 | 145.9847 |
| Qualcomm 835 | ResNet50 | quant_post | 476.0507 | 256.5963 | 139.7266 | 461.9176 | 248.3795 | 149.353 |
| Qualcomm 855 | MobileNetV1 | FP32 baseline | 33.5086 | 19.5773 | 11.7534 | 31.3474 | 18.5382 | 10.0811 |
| Qualcomm 855 | MobileNetV1 | quant_aware | 36.7067 | 21.628 | 11.0372 | 14.0238 | 8.199 | 4.2588 |
| Qualcomm 855 | MobileNetV1 | quant_post | 37.0498 | 21.7081 | 11.0779 | 14.0947 | 8.1926 | 4.2934 |
| Qualcomm 855 | MobileNetV2 | FP32 baseline | 25.0396 | 15.2862 | 9.6609 | 22.909 | 14.1797 | 8.8325 |
| Qualcomm 855 | MobileNetV2 | quant_aware | 28.1583 | 18.3317 | 11.8103 | 16.9158 | 11.1606 | 7.4148 |
| Qualcomm 855 | MobileNetV2 | quant_post | 28.1631 | 18.3917 | 11.8333 | 16.9399 | 11.1772 | 7.4176 |
| Qualcomm 855 | ResNet50 | FP32 baseline | 185.3705 | 113.0825 | 87.0741 | 177.7367 | 110.0433 | 74.4114 |
| Qualcomm 855 | ResNet50 | quant_aware | 327.6883 | 202.4536 | 106.243 | 243.5621 | 150.0542 | 78.4205 |
| Qualcomm 855 | ResNet50 | quant_post | 328.2683 | 201.9937 | 106.744 | 242.6397 | 150.0338 | 79.8659 |
| Kirin 970 | MobileNetV1 | FP32 baseline | 101.2455 | 56.4053 | 35.6484 | 94.8985 | 51.7251 | 31.9511 |
| Kirin 970 | MobileNetV1 | quant_aware | 62.5012 | 32.1863 | 16.6018 | 57.7477 | 29.2116 | 15.0703 |
| Kirin 970 | MobileNetV1 | quant_post | 62.4412 | 32.2585 | 16.6215 | 57.825 | 29.2573 | 15.1206 |
| Kirin 970 | MobileNetV2 | FP32 baseline | 70.4176 | 42.0795 | 25.1939 | 68.9597 | 39.2145 | 22.6617 |
| Kirin 970 | MobileNetV2 | quant_aware | 52.9961 | 31.5323 | 22.1447 | 49.4858 | 28.0856 | 18.7287 |
| Kirin 970 | MobileNetV2 | quant_post | 53.0961 | 31.7987 | 21.8334 | 49.383 | 28.2358 | 18.3642 |
| Kirin 970 | ResNet50 | FP32 baseline | 586.8943 | 344.0858 | 228.2293 | 573.3344 | 351.4332 | 225.8006 |
| Kirin 970 | ResNet50 | quant_aware | 488.361 | 260.1697 | 142.416 | 479.5668 | 249.8485 | 138.1742 |
| Kirin 970 | ResNet50 | quant_post | 489.6188 | 258.3279 | 142.6063 | 480.0064 | 249.5339 | 138.5284 |
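
For reference, `quant_post` calibrates quantization scales on a trained FP32 model without further training, while `quant_aware` simulates quantization during fine-tuning; both use the same int8 representation at inference time. Below is a minimal, framework-agnostic NumPy sketch of symmetric max-abs int8 quantization, intended only to illustrate where the size and latency savings come from; it is not the PaddleSlim implementation.

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric int8 quantization: real values -> int8 codes."""
    q = np.clip(np.round(x / scale), -127, 127)
    return q.astype(np.int8)

def dequantize_int8(q, scale):
    """Map int8 codes back to approximate real values."""
    return q.astype(np.float32) * scale

# Calibration, conceptually as in post-training quantization: choose the scale
# so that the observed max-abs value maps to 127.
activations = np.random.randn(1000).astype(np.float32)  # stand-in for calibration data
scale = np.abs(activations).max() / 127.0

q = quantize_int8(activations, scale)
recovered = dequantize_int8(q, scale)
print("max abs quantization error:", np.abs(activations - recovered).max())
```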
1.2 Pruning
Notes on PaddleLite inference latency:
- Environment: Qualcomm Snapdragon 845 + armv8
- Speed metric: latency with Thread1 / Thread2 / Thread4
- PaddleLite version: v2.3
1.3 Distillation
Note: the "_vd" suffix indicates that the pretrained model was trained with Mixup. For an introduction to Mixup, see mixup: Beyond Empirical Risk Minimization.
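
For readers unfamiliar with Mixup: it trains on convex combinations of pairs of samples and of their (one-hot) labels, with the mixing coefficient drawn from a Beta(α, α) distribution. A minimal NumPy sketch of the batch transformation follows (illustrative only; not the exact recipe used to train the `_vd` models):

```python
import numpy as np

def mixup_batch(images, one_hot_labels, alpha=0.2, rng=np.random):
    """Mixup (Zhang et al.): x = lam * x_i + (1 - lam) * x_j, same for labels."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(images))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * one_hot_labels + (1.0 - lam) * one_hot_labels[perm]
    return mixed_images, mixed_labels

# Tiny example: 4 "images" of shape (3, 8, 8) and 10-class one-hot labels.
x = np.random.rand(4, 3, 8, 8).astype(np.float32)
y = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, size=4)]
mx, my = mixup_batch(x, y, alpha=0.2)
print(mx.shape, my.shape)
```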
1.4 Search
Dataset: ImageNet, 1000 classes
Note: the token list for MobileNetV2_NAS is [4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6, 2, 0, 3, 4, 5, 0, 4, 5, 5, 1, 4, 8, 0, 0]; the token list for Darts_SA is [5, 5, 0, 5, 5, 10, 7, 7, 5, 7, 7, 11, 10, 12, 10, 0, 5, 3, 10, 8].
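
Each token in the lists above selects one option at the corresponding position of the search space, so a token vector fully specifies one candidate architecture. The snippet below is a purely hypothetical decoding with made-up option tables, meant only to show how such a vector can be turned into per-block settings; the real MobileNetV2_NAS and Darts_SA search-space definitions are not reproduced here.

```python
# Hypothetical decoding of a NAS token vector into block configs.
# The option tables below are invented for illustration only.
KERNEL_SIZES = [3, 5, 7]
EXPANSION_RATIOS = [1, 2, 3, 4, 6]

def decode_tokens(tokens):
    """Interpret tokens pairwise as (kernel_size, expansion_ratio) per block."""
    blocks = []
    for k_tok, e_tok in zip(tokens[0::2], tokens[1::2]):
        blocks.append({
            "kernel_size": KERNEL_SIZES[k_tok % len(KERNEL_SIZES)],
            "expansion": EXPANSION_RATIOS[e_tok % len(EXPANSION_RATIOS)],
        })
    return blocks

tokens = [4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6, 2, 0, 3, 4, 5, 0, 4, 5, 5, 1, 4, 8, 0, 0]
for i, cfg in enumerate(decode_tokens(tokens)):
    print(f"block {i}: {cfg}")
```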
2. Object Detection
2.1 Quantization
Dataset: COCO 2017

| Model | Compression method | Dataset | Images/GPU | Box AP (input 608) | Box AP (input 416) | Box AP (input 320) | Model size (MB) | TensorRT latency (V100, ms) |
|---|---|---|---|---|---|---|---|---|
| MobileNet-V1-YOLOv3 | - | COCO | 8 | 29.3 | 29.3 | 27.1 | 95 | - |
| MobileNet-V1-YOLOv3 | quant_post | COCO | 8 | 27.9 (-1.4) | 28.0 (-1.3) | 26.0 (-1.0) | 25 | - |
| MobileNet-V1-YOLOv3 | quant_aware | COCO | 8 | 28.1 (-1.2) | 28.2 (-1.1) | 25.8 (-1.2) | 26.3 | - |
| R34-YOLOv3 | - | COCO | 8 | 36.2 | 34.3 | 31.4 | 162 | - |
| R34-YOLOv3 | quant_post | COCO | 8 | 35.7 (-0.5) | - | - | 42.7 | - |
| R34-YOLOv3 | quant_aware | COCO | 8 | 35.2 (-1.0) | 33.3 (-1.0) | 30.3 (-1.1) | 44 | - |
| R50-dcn-YOLOv3 obj365_pretrain | - | COCO | 8 | 41.4 | - | - | 177 | 18.56 |
| R50-dcn-YOLOv3 obj365_pretrain | quant_aware | COCO | 8 | 40.6 (-0.8) | 37.5 | 34.1 | 66 | 14.64 |
Dataset: WIDER FACE

| Model | Compression method | Images/GPU | Input size | Easy/Medium/Hard | Model size (KB) |
|---|---|---|---|---|---|
| BlazeFace | - | 8 | 640 | 91.5/89.2/79.7 | 815 |
| BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 |
| BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 |
| BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 |
| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 |
| BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 |
| BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 |
| BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 |
| BlazeFace-NAS | quant_aware | 8 | 640 | 83.1/79.7/64.2 (-0.6/-1.0/-1.6) | 71 |
2.2 Pruning
Dataset: Pascal VOC & COCO 2017
Notes on PaddleLite inference latency:
- Environment: Qualcomm Snapdragon 845 + armv8
- Speed metric: latency with Thread1 / Thread2 / Thread4
- PaddleLite version: v2.3

| Model | Compression method | Dataset | Images/GPU | Box AP (input 608) | Box AP (input 416) | Box AP (input 320) | Model size (MB) | GFLOPs (608×608) | PaddleLite latency (ms) (608×608) | TensorRT speed (FPS) (608×608) |
|---|---|---|---|---|---|---|---|---|---|---|
| MobileNet-V1-YOLOv3 | Baseline | Pascal VOC | 8 | 76.2 | 76.7 | 75.3 | 94 | 40.49 | 1238 / 796.943 / 520.101 | 60.04 |
| MobileNet-V1-YOLOv3 | sensitive -52.88% | Pascal VOC | 8 | 77.6 (+1.4) | 77.7 (+1.0) | 75.5 (+0.2) | 31 | 19.08 | 602.497 / 353.759 / 222.427 | 99.36 |
| MobileNet-V1-YOLOv3 | - | COCO | 8 | 29.3 | 29.3 | 27.0 | 95 | 41.35 | - | - |
| MobileNet-V1-YOLOv3 | sensitive -51.77% | COCO | 8 | 26.0 (-3.3) | 25.1 (-4.2) | 22.6 (-4.4) | 32 | 19.94 | - | 73.93 |
| R50-dcn-YOLOv3 | - | COCO | 8 | 39.1 | - | - | 177 | 89.60 | - | 27.68 |
| R50-dcn-YOLOv3 | sensitive -9.37% | COCO | 8 | 39.3 (+0.2) | - | - | 150 | 81.20 | - | 30.08 |
| R50-dcn-YOLOv3 | sensitive -24.68% | COCO | 8 | 37.3 (-1.8) | - | - | 113 | 67.48 | - | 34.32 |
| R50-dcn-YOLOv3 obj365_pretrain | - | COCO | 8 | 41.4 | - | - | 177 | 89.60 | - | - |
| R50-dcn-YOLOv3 obj365_pretrain | sensitive -9.37% | COCO | 8 | 40.5 (-0.9) | - | - | 150 | 81.20 | - | - |
| R50-dcn-YOLOv3 obj365_pretrain | sensitive -24.68% | COCO | 8 | 37.8 (-3.3) | - | - | 113 | 67.48 | - | - |
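
The `sensitive -X%` entries denote FLOPs reductions obtained with sensitivity-based channel pruning: each layer is pruned alone at several ratios, the resulting accuracy drop is recorded, and per-layer ratios are then chosen so that fragile layers are pruned less. The sketch below illustrates only that selection logic; `evaluate` is a hypothetical placeholder standing in for "prune this layer and re-run evaluation", and the numbers it returns are invented.

```python
PRUNE_RATIOS = [0.1, 0.2, 0.3, 0.5, 0.7]
BASELINE_MAP = 76.2  # FP32 MobileNet-V1-YOLOv3 on Pascal VOC, input 608 (from the table above)

def evaluate(layer_idx, ratio):
    """Placeholder for 'prune layer `layer_idx` by `ratio`, then re-evaluate mAP'.
    A real run would prune the network and test it; a fake quadratic drop keeps
    the sketch self-contained."""
    fragility = 0.2 + 0.3 * layer_idx            # pretend later layers are more sensitive
    return BASELINE_MAP - 10.0 * fragility * ratio ** 2

def sensitivity_table(num_layers):
    """mAP after pruning each layer alone at each candidate ratio."""
    return {i: {r: evaluate(i, r) for r in PRUNE_RATIOS} for i in range(num_layers)}

def pick_ratios(table, max_drop=1.0):
    """Per layer, keep the largest ratio whose solo mAP drop stays within budget."""
    chosen = {}
    for layer, curve in table.items():
        ok = [r for r, m in curve.items() if BASELINE_MAP - m <= max_drop]
        chosen[layer] = max(ok) if ok else 0.0
    return chosen

print(pick_ratios(sensitivity_table(num_layers=5), max_drop=1.0))
```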
2.3 Distillation
Dataset: Pascal VOC & COCO 2017
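
As background for this section, distillation trains the compact detector to match a larger teacher's predictions in addition to the ground-truth loss. The snippet below sketches a generic soft-label distillation loss with a temperature; it is a textbook formulation for illustration, not the specific distillation scheme used for these detection models.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * ce + (1 - alpha) * (T ** 2) * kl))

student = np.random.randn(8, 20)   # e.g. 20 Pascal VOC classes
teacher = np.random.randn(8, 20)
labels = np.random.randint(0, 20, size=8)
print(distillation_loss(student, teacher, labels))
```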
2.4 Search
Dataset: WIDER FACE
Note: the hardware latency numbers are computed from the provided hardware latency table, which was measured with PaddleLite on a Snapdragon 855 chip. The detailed configuration of BlazeFace-NASV2 is available here.
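
In other words, per-operator latencies are measured once on the target chip with PaddleLite, and each candidate architecture's latency is then estimated during the search by summing the table entries for its operators, so candidates never have to be run on the device. A minimal sketch of that estimation step follows; the table keys and latency values below are invented for illustration.

```python
# Hypothetical op-level latency table (ms), as would be measured on the target
# chip with PaddleLite; the entries here are made up.
LATENCY_TABLE_MS = {
    ("conv2d", 3, 32, 112): 1.8,            # (op, kernel, out_channels, feature size)
    ("conv2d", 3, 64, 56): 2.4,
    ("depthwise_conv2d", 3, 64, 56): 0.9,
    ("fc", 0, 1000, 1): 0.6,
}

def estimate_latency(ops):
    """Estimate whole-model latency by summing per-op table entries."""
    return sum(LATENCY_TABLE_MS[op] for op in ops)

candidate = [
    ("conv2d", 3, 32, 112),
    ("depthwise_conv2d", 3, 64, 56),
    ("conv2d", 3, 64, 56),
    ("fc", 0, 1000, 1),
]
print("estimated latency (ms):", estimate_latency(candidate))
```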
3. Image Segmentation
Dataset: Cityscapes
3.1 Quantization
Segmentation model Paddle-Lite latency (ms), input size 769x769

| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
|---|---|---|---|---|---|---|---|---|
| Qualcomm 835 | Deeplabv3-MobileNetV1 | FP32 baseline | 1227.9894 | 734.1922 | 527.9592 | 1109.96 | 699.3818 | 479.0818 |
| Qualcomm 835 | Deeplabv3-MobileNetV1 | quant_aware | 848.6544 | 512.785 | 382.9915 | 752.3573 | 455.0901 | 307.8808 |
| Qualcomm 835 | Deeplabv3-MobileNetV1 | quant_post | 840.2323 | 510.103 | 371.9315 | 748.9401 | 452.1745 | 309.2084 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | FP32 baseline | 1282.8126 | 793.2064 | 653.6538 | 1193.9908 | 737.1827 | 593.4522 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | quant_aware | 976.0495 | 659.0541 | 513.4279 | 892.1468 | 582.9847 | 484.7512 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | quant_post | 981.44 | 658.4969 | 538.6166 | 885.3273 | 586.1284 | 484.0018 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | FP32 baseline | 568.8748 | 339.8578 | 278.6316 | 420.6031 | 281.3197 | 217.5222 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | quant_aware | 608.7578 | 347.2087 | 260.653 | 241.2394 | 177.3456 | 143.9178 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | quant_post | 609.0142 | 347.3784 | 259.9825 | 239.4103 | 180.1894 | 139.9178 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | FP32 baseline | 639.4425 | 390.1851 | 322.7014 | 477.7667 | 339.7411 | 262.2847 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | quant_aware | 703.7275 | 497.689 | 417.1296 | 394.3586 | 300.2503 | 239.9204 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | quant_post | 705.7589 | 474.4076 | 427.2951 | 394.8352 | 297.4035 | 264.6724 |
| Kirin 970 | Deeplabv3-MobileNetV1 | FP32 baseline | 1682.1792 | 1437.9774 | 1181.0246 | 1261.6739 | 1068.6537 | 690.8225 |
| Kirin 970 | Deeplabv3-MobileNetV1 | quant_aware | 1062.3394 | 1248.1014 | 878.3157 | 774.6356 | 710.6277 | 528.5376 |
| Kirin 970 | Deeplabv3-MobileNetV1 | quant_post | 1109.1917 | 1339.6218 | 866.3587 | 771.5164 | 716.5255 | 500.6497 |
| Kirin 970 | Deeplabv3-MobileNetV2 | FP32 baseline | 1771.1301 | 1746.0569 | 1222.4805 | 1448.9739 | 1192.4491 | 760.606 |
| Kirin 970 | Deeplabv3-MobileNetV2 | quant_aware | 1320.2905 | 921.4522 | 676.0732 | 1145.8801 | 821.5685 | 590.1713 |
| Kirin 970 | Deeplabv3-MobileNetV2 | quant_post | 1320.386 | 918.5328 | 672.2481 | 1020.753 | 820.094 | 591.4114 |
3.2 Pruning
Notes on PaddleLite inference latency:
- Environment: Qualcomm Snapdragon 845 + armv8
- Speed metric: latency with Thread1 / Thread2 / Thread4
- PaddleLite version: v2.3