Deploying Mobilenet SSD on Android with TVM
What is TVM? In its official description, it is an end-to-end IR (intermediate representation) stack for deploying deep learning workloads onto hardware. Put another way, it is an end-to-end solution for distributing deep learning models across all kinds of hardware devices. For more on TVM, see the TVM homepage.
First published at: https://zhuanlan.zhihu.com/p/70982338
Author: Zhang Xindong
我們?cè)诙松线M(jìn)行CNN部署的時(shí)候,為了最大化的發(fā)揮硬件的性能,之前的框架多是用的手工調(diào)優(yōu)的算法Op,如NCNN、Tengine、Feather等等;TVM可謂是另辟蹊徑,讓框架去自適應(yīng)的尋找最優(yōu)或次優(yōu)的算子Op,這類的框架應(yīng)該是后續(xù)發(fā)展的主流。
If you have read my earlier posts, you may recall that deploying a CNN on an embedded device generally involves these steps: model design, model training, model pruning, and online deployment. We will not go through each of them here; this article focuses on online deployment. For the first three steps as applied to Mobilenet-SSD, see the earlier articles in this column.
How do we deploy with TVM? At a high level there are two steps: 1. generate the end-to-end IR stack; 2. run inference with the TVM runtime. The first step has by far the biggest impact on performance, since this is where the adaptive search for optimal or near-optimal operators happens. The official TVM documentation offers two strategies: use the configurations TVM has already tuned for your platform, or run AutoTVM, which uses optimization and search to find an optimal or near-optimal forward-pass schedule for the target platform. For the second strategy on an Android target, the device must first establish an RPC connection with the host machine, as in the sketch below. With that in mind, let us start deploying Mobilenet-SSD.
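To make the AutoTVM route concrete, here is a minimal sketch of the RPC tuning loop, written against TVM 0.6-era APIs (the exact extract_from_program signature varies across versions, so treat this as an assumption). It further assumes an RPC tracker is running on the host (python -m tvm.exec.rpc_tracker --port=9190), the Android device has registered itself under the key 'android' through the TVM RPC app, and the TVM_NDK_CC environment variable points at an NDK compiler; mod and params are the ones produced in the next section.
# Hedged sketch of the AutoTVM RPC tuning loop; see assumptions above.
import tvm
from tvm import relay, autotvm
from tvm.autotvm.tuner import XGBTuner

target = tvm.target.arm_cpu(model='rk3399')
# Extract tunable conv2d tasks from the relay module built below.
tasks = autotvm.task.extract_from_program(
    mod[mod.entry_func], target=target, params=params,
    ops=(relay.op.get('nn.conv2d'),))
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(build_func='ndk'),  # cross-compile via the NDK
    runner=autotvm.RPCRunner('android', host='0.0.0.0', port=9190,
                             number=10, timeout=10))
for task in tasks:
    tuner = XGBTuner(task, loss_type='rank')
    tuner.tune(n_trial=min(1000, len(task.config_space)),
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file('mssd_tune.log')])
# Later, wrap relay.build in: with autotvm.apply_history_best('mssd_tune.log'):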
Generating the end-to-end IR stack
# tvm, relay
import tvm
from tvm import relay
# os and numpy
import numpy as np
import os.path
# Tensorflow imports
import tensorflow as tf
# Tensorflow utility functions
import tvm.relay.testing.tf as tf_testing
from PIL import Image
from tvm.contrib import util, ndk, graph_runtime
model_name = "./tflite_graph.pb"
arch = 'arm64'
# target_host = 'llvm -target=%s-linux-android' % arch
# target = 'opencl'
# target = tvm.target.mali(model='rk3399')
# target_host = tvm.target.arm_cpu(model='rk3399')
target_host = None
target = tvm.target.arm_cpu(model = 'rk3399')
# target_host = None
# target = 'llvm -target=%s-linux-android' % arch
layout = 'NCHW'
with tf.gfile.FastGFile(model_name, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    graph = tf.import_graph_def(graph_def, name='')
    graph_def = tf_testing.ProcessGraphDefParam(graph_def)
shape_dict = {'normalized_input_image_tensor': (1, 300, 300, 3)}
dtype_dict = {'normalized_input_image_tensor': 'float32'}
mod, params = relay.frontend.from_tensorflow(
    graph_def,
    layout=layout,
    shape=shape_dict,
    outputs=['raw_outputs/box_encodings', 'raw_outputs/class_predictions']
)
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(
        mod[mod.entry_func],
        target=target,
        target_host=target_host,
        params=params
    )
fcompile = ndk.create_shared
lib.export_library("./tvm_android/tvm_lib/deploy_lib.so", fcompile)
with open("./tvm_android/tvm_lib/deploy_graph.json", "w") as fo:
    fo.write(graph)
with open("./tvm_android/tvm_lib/deploy_param.params", "wb") as fo:
    fo.write(relay.save_param_dict(params))
As the code above shows, since our hardware platform is the RK3399, we use the configuration TVM already provides for it, tvm.target.arm_cpu(model='rk3399'). The outputs are the two nodes 'raw_outputs/box_encodings' and 'raw_outputs/class_predictions'. Because we are deploying on Android, remember to pass ndk.create_shared as the fcompile argument when exporting the IR and model. When compilation finishes, we end up with three files: deploy_lib.so, deploy_graph.json, and deploy_param.params.
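Before moving on to C++, one hedged way to sanity-check the three exported files is to drive them on the device from the host through TVM's RPC server; the device address 192.168.1.100:9090 and the direct rpc.connect (rather than going through a tracker) are assumptions for illustration.
# Hedged sketch: run the exported module on the device over RPC.
import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import graph_runtime

remote = rpc.connect('192.168.1.100', 9090)    # device running the TVM RPC server
remote.upload('./tvm_android/tvm_lib/deploy_lib.so')
rlib = remote.load_module('deploy_lib.so')
graph_json = open('./tvm_android/tvm_lib/deploy_graph.json').read()
param_bytes = bytearray(
    open('./tvm_android/tvm_lib/deploy_param.params', 'rb').read())
ctx = remote.cpu(0)                            # matches the arm_cpu build target
module = graph_runtime.create(graph_json, rlib, ctx)
module.load_params(param_bytes)
# Feed a dummy NHWC image matching shape_dict and read back the raw outputs.
data = np.random.uniform(-1.0, 1.0, size=(1, 300, 300, 3)).astype('float32')
module.set_input('normalized_input_image_tensor', tvm.nd.array(data, ctx))
module.run()
print(module.get_output(0).shape, module.get_output(1).shape)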
Running inference with the TVM runtime
TVM ships a very small runtime in its C++ sources; the first step is to integrate and adapt it into your build. A reference Android.mk is shown below; be sure to replace OpenCV_BASE and TVM_BASE with the directories where your own libraries live.
LOCAL_PATH := $(call my-dir)
OpenCV_BASE = /Users/xindongzhang/armnn-tflite/OpenCV-android-sdk/
TVM_BASE = /Users/xindongzhang/tvm/
include $(CLEAR_VARS)
LOCAL_MODULE := OpenCL
LOCAL_SRC_FILES := /Users/xindongzhang/Desktop/tvm-ssd/tvm_android/jni/libOpenCL.so
include $(PREBUILT_SHARED_LIBRARY)
include $(CLEAR_VARS)
OPENCV_INSTALL_MODULES := on
OPENCV_LIB_TYPE := STATIC
include $(OpenCV_BASE)/sdk/native/jni/OpenCV.mk
LOCAL_MODULE := tvm_mssd
LOCAL_C_INCLUDES += $(OPENCV_INCLUDE_DIR)
LOCAL_C_INCLUDES += $(TVM_BASE)/include
LOCAL_C_INCLUDES += $(TVM_BASE)/3rdparty/dlpack/include
LOCAL_C_INCLUDES += $(TVM_BASE)/3rdparty/dmlc-core/include
LOCAL_C_INCLUDES += $(TVM_BASE)/3rdparty/HalideIR/src
LOCAL_C_INCLUDES += $(TVM_BASE)/topi/include
LOCAL_SRC_FILES := \
main.cpp \
$(TVM_BASE)/src/runtime/c_runtime_api.cc \
$(TVM_BASE)/src/runtime/cpu_device_api.cc \
$(TVM_BASE)/src/runtime/workspace_pool.cc \
$(TVM_BASE)/src/runtime/module_util.cc \
$(TVM_BASE)/src/runtime/system_lib_module.cc \
$(TVM_BASE)/src/runtime/module.cc \
$(TVM_BASE)/src/runtime/registry.cc \
$(TVM_BASE)/src/runtime/file_util.cc \
$(TVM_BASE)/src/runtime/dso_module.cc \
$(TVM_BASE)/src/runtime/thread_pool.cc \
$(TVM_BASE)/src/runtime/threading_backend.cc \
$(TVM_BASE)/src/runtime/ndarray.cc \
$(TVM_BASE)/src/runtime/graph/graph_runtime.cc \
$(TVM_BASE)/src/runtime/opencl/opencl_device_api.cc \
$(TVM_BASE)/src/runtime/opencl/opencl_module.cc
LOCAL_LDLIBS := -landroid -llog -ldl -lz -fuse-ld=gold
LOCAL_CFLAGS := -O2 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing \
-ffunction-sections -fdata-sections -ffast-math -ftree-vectorize \
-fPIC -Ofast -ffast-math -w -std=c++14
LOCAL_CPPFLAGS := -O2 -fvisibility=hidden -fvisibility-inlines-hidden \
-fomit-frame-pointer -fstrict-aliasing -ffunction-sections \
-fdata-sections -ffast-math -fPIC -Ofast -ffast-math -std=c++14
LOCAL_LDFLAGS += -Wl,--gc-sections
LOCAL_CFLAGS += -fopenmp
LOCAL_CPPFLAGS += -fopenmp
LOCAL_LDFLAGS += -fopenmp
LOCAL_ARM_NEON := true
APP_ALLOW_MISSING_DEPS = true
LOCAL_SHARED_LIBRARIES := \
OpenCL
include $(BUILD_EXECUTABLE)
With the runtime integrated, we can write the C++ application code. Here we only consider inference; postprocessing is skipped, and you can refer to the other articles in this column for it. The code is fairly simple, so I will paste it directly.
#include "tvm/runtime/c_runtime_api.h"
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>
#include <dlpack/dlpack.h>
#include <opencv2/opencv.hpp>
#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
int main(void)
{
    std::ifstream graph_file("./deploy_graph.json");
    std::ifstream model_file("./deploy_param.params", std::ios::binary);
    std::string graph_content(
        (std::istreambuf_iterator<char>(graph_file)),
        std::istreambuf_iterator<char>()
    );
    std::string model_params(
        (std::istreambuf_iterator<char>(model_file)),
        std::istreambuf_iterator<char>()
    );
    tvm::runtime::Module mod_dylib = tvm::runtime::Module::LoadFromFile("./deploy_lib.so");
    // Only needed for system-lib builds; unused with a dynamic deploy_lib.so.
    tvm::runtime::Module mod_syslib = (*tvm::runtime::Registry::Get("module._GetSystemLib"))();
    // The device type must match the target used in relay.build;
    // we built with tvm.target.arm_cpu, so we run on the CPU here.
    // int device_type = kDLOpenCL;
    int device_type = kDLCPU;
    int device_id = 0;
    tvm::runtime::Module mod =
        (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))
        (graph_content.c_str(), mod_dylib, device_type, device_id);
    int INPUT_SIZE = 300;
    cv::Mat raw_image = cv::imread("./body.jpg");
    int raw_image_height = raw_image.rows;
    int raw_image_width = raw_image.cols;
    cv::Mat image;
    cv::resize(raw_image, image, cv::Size(INPUT_SIZE, INPUT_SIZE));
    image.convertTo(image, CV_32FC3);
    // Same normalization the TF exporter used: scale to [-1, 1].
    image = (image * 2.0f / 255.0f) - 1.0f;
    TVMByteArray params;
    params.data = reinterpret_cast<const char*>(model_params.c_str());
    params.size = model_params.size();
    mod.GetFunction("load_params")(params);
    // Note: the graph input declared in shape_dict is NHWC (1, 300, 300, 3),
    // while the shape below is NCHW; make sure this matches your converted
    // graph, and reorder the HWC OpenCV buffer if NCHW is what it expects.
    std::vector<int64_t> input_shape = {1, 3, 300, 300};
    DLTensor input;
    input.ctx = DLContext{kDLCPU, 0};
    input.data = image.data;
    input.ndim = 4;
    input.dtype = DLDataType{kDLFloat, 32, 1};
    input.shape = input_shape.data();
    input.strides = nullptr;
    input.byte_offset = 0;
    // Warm up.
    for (int i = 0; i < 3; ++i) {
        mod.GetFunction("set_input")("normalized_input_image_tensor", &input);
        mod.GetFunction("run")();
    }
    // Average wall-clock latency over N runs (std::clock would sum
    // CPU time across threads and overstate multi-threaded runs).
    int N = 10;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) {
        mod.GetFunction("set_input")("normalized_input_image_tensor", &input);
        mod.GetFunction("run")();
    }
    auto end = std::chrono::steady_clock::now();
    double duration = std::chrono::duration<double>(end - start).count() / N;
    std::cout << duration << " s per inference" << std::endl;
    return 0;
}
With that, we have completed the deployment of Mobilenet-SSD on rk3399-android using TVM.
Closing remarks
This article walked through deploying a CNN on the RK3399 with TVM, using a Mobilenet SSD detector as the example. Follow-up articles in this column will explore deployment with AutoTVM. Comments and questions are welcome, and feel free to follow this column. Thanks for reading.