Introduction to MXNet's Dependency Libraries and Their Simple Usage
MXNet is an open-source deep learning framework whose core code is implemented in C++. Building it from source requires several other open-source libraries. This post gives a brief introduction to each of MXNet's dependencies:
1. OpenBLAS: short for Open Basic Linear Algebra Subprograms, an open-source, optimized, high-performance multi-core BLAS library. It mainly provides matrix-matrix, matrix-vector, and vector-vector operations. It is released under the BSD-3-Clause license, so it can be used commercially; the latest release at the time of writing is 0.3.3. Its source code is hosted on GitHub and is continuously maintained by Zhang Xianyi and others.
OpenBLAS is a high-performance open-source BLAS implementation based on GotoBLAS2 1.13 (BSD version), initiated by the Parallel Software and Computational Science Laboratory of the Institute of Software, Chinese Academy of Sciences.
BLAS is an application programming interface (API) standard that specifies numerical libraries for basic linear algebra operations (such as vector and matrix multiplication). The routines were first published in 1979 and serve as building blocks for larger numerical packages (such as LAPACK). BLAS is widely used in high-performance computing.
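As a complement to the test code below, here is a minimal sketch of the most commonly used level-3 BLAS routine, the general matrix-matrix multiply, called through the same CBLAS interface that OpenBLAS exposes. The matrix values and the helper name test_openblas_dgemm are made up purely for illustration:

#include <cstdio>
#include <cblas.h>

int test_openblas_dgemm()
{
    // C(2x2) = 1.0 * A(2x3) * B(3x2) + 0.0 * C, all matrices row-major
    const int M = 2, N = 2, K = 3;
    double A[M * K] = { 1, 2, 3,
                        4, 5, 6 };
    double B[K * N] = { 1, 0,
                        0, 1,
                        1, 1 };
    double C[M * N] = { 0 };

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K, 1.0, A, K, B, N, 0.0, C, N);

    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j)
            fprintf(stdout, "%.2f ", C[i * N + j]);
        fprintf(stdout, "\n");
    }
    return 0;
}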
The test code is as follows (openblas_test.cpp):
#include "openblas_test.hpp"
#include <iostream>
#include <cblas.h>

int test_openblas_1()
{
    int th_model = openblas_get_parallel();
    switch (th_model) {
    case OPENBLAS_SEQUENTIAL:
        printf("OpenBLAS is compiled sequentially.\n");
        break;
    case OPENBLAS_THREAD:
        printf("OpenBLAS is compiled using the normal threading model\n");
        break;
    case OPENBLAS_OPENMP:
        printf("OpenBLAS is compiled using OpenMP\n");
        break;
    }

    int n = 2;
    double* x = (double*)malloc(n * sizeof(double));
    double* upperTriangleResult = (double*)malloc(n * (n + 1) * sizeof(double) / 2);

    for (int j = 0; j < n * (n + 1) / 2; j++)
        upperTriangleResult[j] = 0;

    x[0] = 1; x[1] = 3;

    // packed symmetric rank-1 update: A := alpha * x * x^T + A (upper triangle only)
    cblas_dspr(CblasRowMajor, CblasUpper, n, 1, x, 1, upperTriangleResult);

    double*& A = upperTriangleResult;
    std::cout << A[0] << "\t" << A[1] << std::endl << "*\t" << A[2] << std::endl;

    free(upperTriangleResult);
    free(x);

    return 0;
}
The execution result is as follows:
2. DLPack: consists of a single header file, dlpack.h. DLPack is an open in-memory tensor structure used to share tensors across different frameworks, such as TensorFlow, PyTorch, and MXNet, without any data copying. (A minimal usage sketch is given after the list below.)
The dlpack.h file defines two enumerations and four structs:
Enumeration DLDeviceType: the supported device types, including CPU, CUDA GPU, OpenCL, Apple GPU, AMD GPU, etc.
Enumeration DLDataTypeCode: the supported data types, including signed int, unsigned int, and float.
Struct DLContext: a device context for tensors and operators; its members are the device type and the device id.
Struct DLDataType: the data type of a tensor; its members are code (the base type, which must be one of the DLDataTypeCode values), bits (the number of bits, e.g. 8, 16, or 32), and lanes (the number of lanes of the type).
Struct DLTensor: a plain tensor object that does not manage memory. Its members are the data pointer (void*), the DLContext, the number of dimensions, the DLDataType, the tensor's shape, the tensor's strides, and the byte offset to the start of the data.
Struct DLManagedTensor: a wrapper that manages the memory of a DLTensor.
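To make these structs concrete, here is a minimal sketch (not from the test project) that wraps an existing float buffer in a DLTensor on the CPU without copying data. It assumes the dlpack.h version described above, with DLContext and the kDLCPU / kDLFloat enum values (newer dlpack releases rename DLContext to DLDevice); the helper name wrap_cpu_matrix is made up for illustration, and dlpack.h is assumed to be on the include path:

#include <cstdint>
#include "dlpack.h"

// Wrap an existing 2 x 3 float buffer in a DLTensor; no data is copied
// and the DLTensor does not take ownership of the memory.
DLTensor wrap_cpu_matrix(float* data, int64_t* shape /* length 2, e.g. {2, 3} */)
{
    DLTensor t;
    t.data = data;                 // raw data pointer
    t.ctx.device_type = kDLCPU;    // device type from DLDeviceType
    t.ctx.device_id = 0;
    t.ndim = 2;                    // number of dimensions
    t.dtype.code = kDLFloat;       // base type from DLDataTypeCode
    t.dtype.bits = 32;
    t.dtype.lanes = 1;
    t.shape = shape;
    t.strides = nullptr;           // nullptr means compact row-major layout
    t.byte_offset = 0;
    return t;
}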
3. MShadow: short for Matrix Shadow, a lightweight CPU/GPU matrix and tensor template library implemented in C++/CUDA. All of its files are .h or .cuh headers, so it can be used by simply including them. Note: if MSHADOW_STAND_ALONE is not added to the project's preprocessor definitions, additional CBLAS, MKL, or CUDA support is required. If you define MSHADOW_STAND_ALONE to avoid depending on other libraries, some functions are left unimplemented; for example, in dot_engine-inl.h some function bodies contain the statement: LOG(FATAL) << "Not implemented!";
For the tests here, the MSHADOW_STAND_ALONE macro is not enabled; only the MSHADOW_USE_CBLAS macro is enabled.
The test code is as follows (mshadow_test.cpp):
#include "mshadow_test.hpp"
#include <iostream>
#include <cmath>
#include "mshadow/tensor.h"// reference: mshadow source code: mshadow/guideint test_mshadow_1()
{// intialize tensor engine before using tensor operationmshadow::InitTensorEngine<mshadow::cpu>();// assume we have a float spacefloat data[20];// create a 2 x 5 x 2 tensor, from existing spacemshadow::Tensor<mshadow::cpu, 3> ts(data, mshadow::Shape3(2, 5, 2));// take first subscript of the tensormshadow::Tensor<mshadow::cpu, 2> mat = ts[0];// Tensor object is only a handle, assignment means they have same data content// we can specify content type of a Tensor, if not specified, it is float bydefaultmshadow::Tensor<mshadow::cpu, 2, float> mat2 = mat;mat = mshadow::Tensor<mshadow::cpu, 1>(data, mshadow::Shape1(10)).FlatTo2D();// shaape of matrix, note size order is same as numpyfprintf(stdout, "%u X %u matrix\n", mat.size(0), mat.size(1));// initialize all element to zeromat = 0.0f;// assign some valuesmat[0][1] = 1.0f; mat[1][0] = 2.0f;// elementwise operationsmat += (mat + 10.0f) / 10.0f + 2.0f;// print out matrix, note: mat2 and mat1 are handles(pointers)for (mshadow::index_t i = 0; i < mat.size(0); ++i) {for (mshadow::index_t j = 0; j < mat.size(1); ++j) {fprintf(stdout, "%.2f ", mat2[i][j]);}fprintf(stdout, "\n");}mshadow::TensorContainer<mshadow::cpu, 2> lhs(mshadow::Shape2(2, 3)), rhs(mshadow::Shape2(2, 3)), ret(mshadow::Shape2(2, 2));lhs = 1.0;rhs = 1.0;ret = mshadow::expr::implicit_dot(lhs, rhs.T());mshadow::VectorDot(ret[0].Slice(0, 1), lhs[0], rhs[0]);fprintf(stdout, "vdot=%f\n", ret[0][0]);int cnt = 0;for (mshadow::index_t i = 0; i < ret.size(0); ++i) {for (mshadow::index_t j = 0; j < ret.size(1); ++j) {fprintf(stdout, "%.2f ", ret[i][j]);}fprintf(stdout, "\n");}fprintf(stdout, "\n");for (mshadow::index_t i = 0; i < lhs.size(0); ++i) {for (mshadow::index_t j = 0; j < lhs.size(1); ++j) {lhs[i][j] = cnt++;fprintf(stdout, "%.2f ", lhs[i][j]);}fprintf(stdout, "\n");}fprintf(stdout, "\n");mshadow::TensorContainer<mshadow::cpu, 1> index(mshadow::Shape1(2)), choosed(mshadow::Shape1(2));index[0] = 1; index[1] = 2;choosed = mshadow::expr::mat_choose_row_element(lhs, index);for (mshadow::index_t i = 0; i < choosed.size(0); ++i) {fprintf(stdout, "%.2f ", choosed[i]);}fprintf(stdout, "\n");mshadow::TensorContainer<mshadow::cpu, 2> recover_lhs(mshadow::Shape2(2, 3)), small_mat(mshadow::Shape2(2, 3));small_mat = -100.0f;recover_lhs = mshadow::expr::mat_fill_row_element(small_mat, choosed, index);for (mshadow::index_t i = 0; i < recover_lhs.size(0); ++i) {for (mshadow::index_t j = 0; j < recover_lhs.size(1); ++j) {fprintf(stdout, "%.2f ", recover_lhs[i][j] - lhs[i][j]);}}fprintf(stdout, "\n");rhs = mshadow::expr::one_hot_encode(index, 3);for (mshadow::index_t i = 0; i < lhs.size(0); ++i) {for (mshadow::index_t j = 0; j < lhs.size(1); ++j) {fprintf(stdout, "%.2f ", rhs[i][j]);}fprintf(stdout, "\n");}fprintf(stdout, "\n");mshadow::TensorContainer<mshadow::cpu, 1> idx(mshadow::Shape1(3));idx[0] = 8;idx[1] = 0;idx[2] = 1;mshadow::TensorContainer<mshadow::cpu, 2> weight(mshadow::Shape2(10, 5));mshadow::TensorContainer<mshadow::cpu, 2> embed(mshadow::Shape2(3, 5));for (mshadow::index_t i = 0; i < weight.size(0); ++i) {for (mshadow::index_t j = 0; j < weight.size(1); ++j) {weight[i][j] = i;}}embed = mshadow::expr::take(idx, weight);for (mshadow::index_t i = 0; i < embed.size(0); ++i) {for (mshadow::index_t j = 0; j < embed.size(1); ++j) {fprintf(stdout, "%.2f ", embed[i][j]);}fprintf(stdout, "\n");}fprintf(stdout, "\n\n");weight = mshadow::expr::take_grad(idx, embed, 10);for (mshadow::index_t i = 0; i < weight.size(0); ++i) {for (mshadow::index_t j = 0; j < weight.size(1); ++j) {fprintf(stdout, 
"%.2f ", weight[i][j]);}fprintf(stdout, "\n");}fprintf(stdout, "upsampling\n");#ifdef small
#undef small
#endifmshadow::TensorContainer<mshadow::cpu, 2> small(mshadow::Shape2(2, 2));small[0][0] = 1.0f;small[0][1] = 2.0f;small[1][0] = 3.0f;small[1][1] = 4.0f;mshadow::TensorContainer<mshadow::cpu, 2> large(mshadow::Shape2(6, 6));large = mshadow::expr::upsampling_nearest(small, 3);for (mshadow::index_t i = 0; i < large.size(0); ++i) {for (mshadow::index_t j = 0; j < large.size(1); ++j) {fprintf(stdout, "%.2f ", large[i][j]);}fprintf(stdout, "\n");}small = mshadow::expr::pool<mshadow::red::sum>(large, small.shape_, 3, 3, 3, 3);// shutdown tensor enigne after usagefor (mshadow::index_t i = 0; i < small.size(0); ++i) {for (mshadow::index_t j = 0; j < small.size(1); ++j) {fprintf(stdout, "%.2f ", small[i][j]);}fprintf(stdout, "\n");}fprintf(stdout, "mask\n");mshadow::TensorContainer<mshadow::cpu, 2> mask_data(mshadow::Shape2(6, 8));mshadow::TensorContainer<mshadow::cpu, 2> mask_out(mshadow::Shape2(6, 8));mshadow::TensorContainer<mshadow::cpu, 1> mask_src(mshadow::Shape1(6));mask_data = 1.0f;for (int i = 0; i < 6; ++i) {mask_src[i] = static_cast<float>(i);}mask_out = mshadow::expr::mask(mask_src, mask_data);for (mshadow::index_t i = 0; i < mask_out.size(0); ++i) {for (mshadow::index_t j = 0; j < mask_out.size(1); ++j) {fprintf(stdout, "%.2f ", mask_out[i][j]);}fprintf(stdout, "\n");}mshadow::ShutdownTensorEngine<mshadow::cpu>();return 0;
}// user defined unary operator addone
struct addone {// map can be template functiontemplate<typename DType>MSHADOW_XINLINE static DType Map(DType a) {return a + static_cast<DType>(1);}
};
// user defined binary operator max of two
struct maxoftwo {// map can also be normal functions,// however, this can only be applied to float tensorMSHADOW_XINLINE static float Map(float a, float b) {if (a > b) return a;else return b;}
};int test_mshadow_2()
{// intialize tensor engine before using tensor operation, needed for CuBLASmshadow::InitTensorEngine<mshadow::cpu>();// take first subscript of the tensormshadow::Stream<mshadow::cpu> *stream_ = mshadow::NewStream<mshadow::cpu>(0);mshadow::Tensor<mshadow::cpu, 2, float> mat = mshadow::NewTensor<mshadow::cpu>(mshadow::Shape2(2, 3), 0.0f, stream_);mshadow::Tensor<mshadow::cpu, 2, float> mat2 = mshadow::NewTensor<mshadow::cpu>(mshadow::Shape2(2, 3), 0.0f, stream_);mat[0][0] = -2.0f;mat = mshadow::expr::F<maxoftwo>(mshadow::expr::F<addone>(mat) + 0.5f, mat2);for (mshadow::index_t i = 0; i < mat.size(0); ++i) {for (mshadow::index_t j = 0; j < mat.size(1); ++j) {fprintf(stdout, "%.2f ", mat[i][j]);}fprintf(stdout, "\n");}mshadow::FreeSpace(&mat); mshadow::FreeSpace(&mat2);mshadow::DeleteStream(stream_);// shutdown tensor enigne after usagemshadow::ShutdownTensorEngine<mshadow::cpu>();return 0;
}
The execution result of test_mshadow_2 is as follows:
4. DMLC-Core: short for Distributed Machine Learning Common Codebase, the base module that supports all DMLC projects; it provides common building blocks for implementing efficient and scalable distributed machine learning libraries.
The test code is as follows (dmlc_test.cpp):
#include "dmlc_test.hpp"
#include <iostream>
#include <cstdio>
#include <functional>
#include <dmlc/parameter.h>
#include <dmlc/registry.h>

// reference: dmlc-core/example and dmlc-core/test

struct MyParam : public dmlc::Parameter<MyParam> {
    float learning_rate;
    int num_hidden;
    int activation;
    std::string name;

    // declare parameters in header file
    DMLC_DECLARE_PARAMETER(MyParam) {
        DMLC_DECLARE_FIELD(num_hidden).set_range(0, 1000)
            .describe("Number of hidden unit in the fully connected layer.");
        DMLC_DECLARE_FIELD(learning_rate).set_default(0.01f)
            .describe("Learning rate of SGD optimization.");
        DMLC_DECLARE_FIELD(activation).add_enum("relu", 1).add_enum("sigmoid", 2)
            .describe("Activation function type.");
        DMLC_DECLARE_FIELD(name).set_default("mnet")
            .describe("Name of the net.");

        // user can also set nhidden besides num_hidden
        DMLC_DECLARE_ALIAS(num_hidden, nhidden);
        DMLC_DECLARE_ALIAS(activation, act);
    }
};

// register it in cc file
DMLC_REGISTER_PARAMETER(MyParam);

int test_dmlc_parameter()
{
    int argc = 4;
    char* argv[4] = {
#ifdef _DEBUG
        "E:/GitCode/MXNet_Test/lib/dbg/x64/ThirdPartyLibrary_Test.exe",
#else
        "E:/GitCode/MXNet_Test/lib/rel/x64/ThirdPartyLibrary_Test.exe",
#endif
        "num_hidden=100",
        "name=aaa",
        "activation=relu"
    };

    MyParam param;
    std::map<std::string, std::string> kwargs;
    for (int i = 0; i < argc; ++i) {
        char name[256], val[256];
        if (sscanf(argv[i], "%[^=]=%[^\n]", name, val) == 2) {
            kwargs[name] = val;
        }
    }

    fprintf(stdout, "Docstring\n---------\n%s", MyParam::__DOC__().c_str());
    fprintf(stdout, "start to set parameters ...\n");
    param.Init(kwargs);
    fprintf(stdout, "-----\n");
    fprintf(stdout, "param.num_hidden=%d\n", param.num_hidden);
    fprintf(stdout, "param.learning_rate=%f\n", param.learning_rate);
    fprintf(stdout, "param.name=%s\n", param.name.c_str());
    fprintf(stdout, "param.activation=%d\n", param.activation);

    return 0;
}

namespace tree {

struct Tree {
    virtual void Print() = 0;
    virtual ~Tree() {}
};

struct BinaryTree : public Tree {
    virtual void Print() {
        printf("I am binary tree\n");
    }
};

struct AVLTree : public Tree {
    virtual void Print() {
        printf("I am AVL tree\n");
    }
};

// registry to get the trees
struct TreeFactory
    : public dmlc::FunctionRegEntryBase<TreeFactory, std::function<Tree*()> > {
};

#define REGISTER_TREE(Name)                                         \
    DMLC_REGISTRY_REGISTER(::tree::TreeFactory, TreeFactory, Name)  \
    .set_body([]() { return new Name(); })

DMLC_REGISTRY_FILE_TAG(my_tree);

} // namespace tree

// usually this sits on a separate file
namespace dmlc {
DMLC_REGISTRY_ENABLE(tree::TreeFactory);
}

namespace tree {
// Register the trees, can be in separate files
REGISTER_TREE(BinaryTree)
.describe("This is a binary tree.");

REGISTER_TREE(AVLTree);

DMLC_REGISTRY_LINK_TAG(my_tree);
}

int test_dmlc_registry()
{
    // construct a binary tree
    tree::Tree *binary = dmlc::Registry<tree::TreeFactory>::Find("BinaryTree")->body();
    binary->Print();
    // construct an AVL tree
    tree::Tree *avl = dmlc::Registry<tree::TreeFactory>::Find("AVLTree")->body();
    avl->Print();

    delete binary;
    delete avl;

    return 0;
}
The execution result of test_dmlc_parameter is as follows:
5. TVM: a compiler stack for deep learning systems. It aims to close the gap between deep learning frameworks and performance- and efficiency-oriented hardware backends, working together with the frameworks to provide end-to-end compilation for different backends. Besides dlpack and dmlc-core, TVM also depends on HalideIR. Moreover, building TVM here produced a pile of C2440 and C2664 errors, i.e. errors about being unable to convert one type to another. Since building the MXNet source currently only requires the files under the c_api, core, and pass directories of tvm's nnvm/src, debugging the TVM library is left for later.
6. OpenCV: optional; for the build process see: https://blog.csdn.net/fengbingchun/article/details/84030309
7. CUDA and cuDNN: optional; for the build process see: https://blog.csdn.net/fengbingchun/article/details/53892997
GitHub: https://github.com/fengbingchun/MXNet_Test