TVM Performance Evaluation and Analysis (Part 5)
Figure 3. A further speedup with operator fusion
Table 1. Performance issue with cuBLAS batch matmul
Table 2. Finding the best combination of number_thread. The results were obtained on an NVIDIA M40 GPU with CUDA 8.0.
Figure 4. DLPack provides an intermediate wrapper that is shared between frameworks and TVM
Figure 5. The OpenGL/WebGL Backend
Figure 6. TVM uses a unified AST to define kernels and compiles them to code for different platforms.
Figure 7. The benchmark is run in four different settings
Figure 8. Inference Speed of Different Backends on ImageNet
Figure 9. Mali T860 and T880
Figure 10. Inference Speed of Different Backends on ImageNet
Table 3. Inference Speed of FP16 on ImageNet