
[Caffe Explained] Caffe from Math Formulas to Code, Part 5: Convolution in Caffe

Published: 2025/3/20 · Programming Q&A · 31 · 豆豆

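This article, collected and compiled by 生活随笔, covers "[Caffe Explained] Caffe from Math Formulas to Code, Part 5: Convolution in Caffe". We found it worthwhile and share it here for reference.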

First published on the WeChat official account 《与有三学AI》.


Today we cover the layers related to convolution:

im2col_layer.cpp

base_conv_layer.cpp

conv_layer.cpp

deconv_layer.cpp

inner_product_layer.cpp

01 im2col_layer.cpp

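This is an important operation in Caffe. It is also a big part of why Caffe uses so much GPU memory. The idea of im2col is to copy out, in one go, every image patch the kernel will slide over, and then perform the convolution as a matrix multiplication. Concretely, its input is a C*H*W blob. im2col turns it into a K' x (H_out x W_out) matrix, where K' = C*kernel_r*kernel_r and kernel_r is the kernel size (only square kernels are considered here). H_out x W_out is the output size, which equals H x W only for stride 1 with "same" padding.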
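To put a number on that memory cost, here is a small helper that computes the column matrix size for one image. `col_buffer_shape` and `ColBufferShape` are hypothetical names, not Caffe APIs. The sketch assumes square kernels, no dilation and 4-byte floats:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Caffe): shape and size of the im2col
// column matrix for ONE image, assuming a square kernel, no dilation, float data.
struct ColBufferShape {
  int rows;           // K' = C * k * k
  int cols;           // H_out * W_out
  std::size_t bytes;  // rows * cols * sizeof(float)
};

ColBufferShape col_buffer_shape(int C, int H, int W, int k, int stride, int pad) {
  assert(k > 0 && stride > 0 && pad >= 0);
  const int out_h = (H + 2 * pad - k) / stride + 1;
  const int out_w = (W + 2 * pad - k) / stride + 1;
  ColBufferShape s;
  s.rows = C * k * k;
  s.cols = out_h * out_w;
  s.bytes = static_cast<std::size_t>(s.rows) * s.cols * sizeof(float);
  return s;
}
```

Take a 64-channel 56x56 input with a 3x3, stride-1, pad-1 kernel. The result is a 576 x 3136 matrix of about 7.2 MB, nine times the size of the input itself, because every pixel is copied once per kernel offset.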

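Without this trick, as Yangqing Jia once quipped, consider a blob with input size W*H and D channels, convolved with M kernels of size K*K. A plain for-loop implementation needs the six nested loops below, which is extremely inefficient.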

for w in 1..W
  for h in 1..H
    for x in 1..K
      for y in 1..K
        for m in 1..M
          for d in 1..D
            output(w, h, m) += input(w+x, h+y, d) * filter(m, x, y, d)
          end
        end
      end
    end
  end
end
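For reference, here is a runnable C++ version of those six loops. It is a sketch that assumes stride 1, no padding and Caffe's channel-first layout. The output shrinks to (H-K+1) x (W-K+1) so that input(w+x, h+y) never runs off the edge, a detail the pseudocode leaves out. `naive_conv` is a hypothetical name:

```cpp
#include <cassert>
#include <vector>

// Naive direct convolution with six nested loops (stride 1, no padding).
// Row-major, channel-first layouts:
//   input : D x H x W
//   filter: M x D x K x K
//   output: M x (H-K+1) x (W-K+1)
std::vector<float> naive_conv(const std::vector<float>& input, int D, int H, int W,
                              const std::vector<float>& filter, int M, int K) {
  assert(H >= K && W >= K);
  assert(static_cast<int>(input.size()) == D * H * W);
  assert(static_cast<int>(filter.size()) == M * D * K * K);
  const int out_h = H - K + 1, out_w = W - K + 1;
  std::vector<float> output(M * out_h * out_w, 0.0f);
  for (int h = 0; h < out_h; ++h)
    for (int w = 0; w < out_w; ++w)
      for (int y = 0; y < K; ++y)
        for (int x = 0; x < K; ++x)
          for (int m = 0; m < M; ++m)
            for (int d = 0; d < D; ++d)
              output[(m * out_h + h) * out_w + w] +=
                  input[(d * H + h + y) * W + w + x] *
                  filter[((m * D + d) * K + y) * K + x];
  return output;
}
```

Each output value is touched D*K*K times through scattered memory accesses. im2col plus a single GEMM avoids exactly that pattern.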

So how exactly does im2col work? First, here is Yangqing Jia's own answer: https://www.zhihu.com/question/28385679

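As described above, the C*H*W blob is turned into a K' x (H_out x W_out) matrix, or its transpose (H_out x W_out) x K'. The filters are arranged as a matrix as well, so that a single matrix multiplication yields the result. Let's look at a small example.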

Here is a figure borrowed from another blogger. It differs from Caffe in the details but still helps with understanding: http://blog.csdn.net/mrhiuser/article/details/52672824

Take 4*4 input data and a 3*3 kernel with stride=1. The im2col operation is shown in that figure.

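In other words, after im2col the 4*4 matrix becomes a 9*4 matrix: 9 = 3*3 kernel offsets and 4 = 2*2 output positions. The kernel can be flattened the same way, and the convolution becomes a product of two matrices.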
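Below is a minimal single-channel im2col laid out the way Caffe does it, with one row per kernel offset and one column per output pixel. It handles stride only, with no padding or dilation. `im2col_simple` and `conv_via_im2col` are illustrative names. On the 4*4 example it produces the 9*4 matrix, and a 1*9 filter row times that matrix reproduces the convolution:

```cpp
#include <cassert>
#include <vector>

// Simplified single-channel im2col: stride s, no padding, no dilation.
// Returns a (k*k) x (out_h*out_w) row-major matrix: row = kernel offset
// (ky, kx), column = output pixel (oy, ox), as in Caffe's layout.
std::vector<float> im2col_simple(const std::vector<float>& img, int H, int W,
                                 int k, int s) {
  const int out_h = (H - k) / s + 1, out_w = (W - k) / s + 1;
  std::vector<float> col(k * k * out_h * out_w);
  for (int ky = 0; ky < k; ++ky)
    for (int kx = 0; kx < k; ++kx)
      for (int oy = 0; oy < out_h; ++oy)
        for (int ox = 0; ox < out_w; ++ox)
          col[((ky * k + kx) * out_h + oy) * out_w + ox] =
              img[(oy * s + ky) * W + ox * s + kx];
  return col;
}

// Convolution as a matrix product: (1 x rows) filter times (rows x cols) column matrix.
std::vector<float> conv_via_im2col(const std::vector<float>& col, int rows, int cols,
                                   const std::vector<float>& filter) {
  assert(static_cast<int>(filter.size()) == rows);
  std::vector<float> out(cols, 0.0f);
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c)
      out[c] += filter[r] * col[r * cols + c];
  return out;
}
```

With input 0..15 and an all-ones 3x3 filter, the four outputs are the 3x3 window sums 45, 54, 81 and 90.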

Here is the im2col code:

// Helper from Caffe's im2col.cpp: true iff 0 <= a < b, using one unsigned comparison.
inline bool is_a_ge_zero_and_a_lt_b(int a, int b) {
  return static_cast<unsigned>(a) < static_cast<unsigned>(b);
}

template <typename Dtype>
void im2col_cpu(const Dtype* data_im, const int channels,
    const int height, const int width, const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w,
    const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w,
    Dtype* data_col) {
  // Inputs: data_im, kernel_h, kernel_w and the other convolution parameters; output: data_col.
  // output_h and output_w are the spatial size of the convolution output.
  const int output_h = (height + 2 * pad_h -
    (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1;
  const int output_w = (width + 2 * pad_w -
    (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1;
  const int channel_size = height * width;
  // Outer loop over channels; data_im advances by one channel plane each time.
  for (int channel = channels; channel--; data_im += channel_size) {
    // Two loops over the kernel offsets (kernel_row, kernel_col):
    // each (channel, kernel_row, kernel_col) triple fills one row of data_col.
    for (int kernel_row = 0; kernel_row < kernel_h; kernel_row++) {
      for (int kernel_col = 0; kernel_col < kernel_w; kernel_col++) {
        int input_row = -pad_h + kernel_row * dilation_h;
        // Loops over output_h and output_w: one value per output pixel in that row.
        for (int output_rows = output_h; output_rows; output_rows--) {
          // Boundary case: this input row lies in the padding, so write zeros.
          if (!is_a_ge_zero_and_a_lt_b(input_row, height)) {
            for (int output_cols = output_w; output_cols; output_cols--) {
              *(data_col++) = 0;
            }
          } else {
            int input_col = -pad_w + kernel_col * dilation_w;
            for (int output_col = output_w; output_col; output_col--) {
              if (is_a_ge_zero_and_a_lt_b(input_col, width)) {
                // The core assignment. Given the loop order, data_col is built from
                // consecutive segments of output_h * output_w values, one segment
                // per (channel, kernel_row, kernel_col).
                *(data_col++) = data_im[input_row * width + input_col];
              } else {
                *(data_col++) = 0;
              }
              input_col += stride_w;
            }
          }
          input_row += stride_h;
        }
      }
    }
  }
}

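The explanations are in the comments above. col2im is very similar: it writes the columns back into the image, adding up overlapping values instead of copying them. See the source for details. Writing this code yourself from scratch would probably take a fair amount of debugging.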
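To show what "very similar" means, here is a simplified single-channel col2im. `col2im_simple` is a hypothetical name, and the sketch supports stride and padding but not dilation. It walks the column matrix in the same order as im2col_cpu but scatters instead of gathering. Like Caffe's col2im_cpu, it zeroes the image first and then adds, so pixels shared by several windows accumulate. That summation is exactly what backpropagation through the copy requires:

```cpp
#include <cassert>
#include <vector>

// Simplified single-channel col2im (stride s, padding p, no dilation).
// Same loop order as im2col_cpu, but every column entry is ADDED back to the
// pixel it came from; entries that came from padding are simply dropped.
std::vector<float> col2im_simple(const std::vector<float>& col, int H, int W,
                                 int k, int s, int p) {
  const int out_h = (H + 2 * p - k) / s + 1;
  const int out_w = (W + 2 * p - k) / s + 1;
  assert(static_cast<int>(col.size()) == k * k * out_h * out_w);
  std::vector<float> img(H * W, 0.0f);  // start from zero, then accumulate
  const float* data_col = col.data();
  for (int ky = 0; ky < k; ++ky)
    for (int kx = 0; kx < k; ++kx)
      for (int oy = 0; oy < out_h; ++oy) {
        const int iy = oy * s + ky - p;
        for (int ox = 0; ox < out_w; ++ox, ++data_col) {
          const int ix = ox * s + kx - p;
          if (iy >= 0 && iy < H && ix >= 0 && ix < W)
            img[iy * W + ix] += *data_col;
        }
      }
  return img;
}
```

Feeding it an all-ones column matrix from the 4*4, 3*3 example returns, for each pixel, the number of windows that covered it. The count is 1 at the corners and 4 at the centre pixels.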

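With this core routine in place, the im2col layer's Forward just calls im2col, with bottom_data as input and top_data as output. Its Backward just calls col2im, with top_diff as input and bottom_diff as output. That code is not reproduced here.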

02 conv_layer.cpp, base_conv_layer.cpp

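The math needs no introduction, so let's go straight to the code; this time we read two files together. Since conv_layer.cpp depends on base_conv_layer.cpp, let's first see what base_conv_layer.hpp contains. It is a lot.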

Member variables in base_conv_layer.hpp:

/// @brief The spatial dimensions of a filter kernel.
Blob<int> kernel_shape_;
/// @brief The spatial dimensions of the stride.
Blob<int> stride_;
/// @brief The spatial dimensions of the padding.
Blob<int> pad_;
/// @brief The spatial dimensions of the dilation.
Blob<int> dilation_;
/// @brief The spatial dimensions of the convolution input.
Blob<int> conv_input_shape_;
/// @brief The spatial dimensions of the col_buffer.
vector<int> col_buffer_shape_;
/// @brief The spatial dimensions of the output.
vector<int> output_shape_;
const vector<int>* bottom_shape_;

int num_spatial_axes_;
int bottom_dim_;
int top_dim_;

int channel_axis_;
int num_;
int channels_;
int group_;
int out_spatial_dim_;
int weight_offset_;
int num_output_;
bool bias_term_;
bool is_1x1_;
bool force_nd_im2col_;

int num_kernels_im2col_;
int num_kernels_col2im_;
int conv_out_channels_;
int conv_in_channels_;
int conv_out_spatial_dim_;
int kernel_dim_;
int col_offset_;
int output_offset_;

Blob<Dtype> col_buffer_;
Blob<Dtype> bias_multiplier_;

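There are a great many, because convolution has accumulated many parameters over the years, and we cannot explain them all here. stride_, pad_ and dilation_ control how the kernel is placed as it slides (step size, border padding and dilation). kernel_shape_ is the kernel size, conv_input_shape_ is the input size and output_shape_ is the output size. We'll deal with the rest as they come up and skip them for now. For more detail, this blog post is a good reference: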

http://blog.csdn.net/lanxuecc/article/details/53188738

Now on to conv_layer.cpp itself. For a convolution, the output size depends on many parameters, so the first step is to compute it.

template <typename Dtype>
void ConvolutionLayer<Dtype>::compute_output_shape() {
  const int* kernel_shape_data = this->kernel_shape_.cpu_data();
  const int* stride_data = this->stride_.cpu_data();
  const int* pad_data = this->pad_.cpu_data();
  const int* dilation_data = this->dilation_.cpu_data();
  this->output_shape_.clear();
  for (int i = 0; i < this->num_spatial_axes_; ++i) {
    // i + 1 to skip channel axis
    const int input_dim = this->input_shape(i + 1);
    // Effective kernel size once dilation is taken into account.
    const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;
    const int output_dim = (input_dim + 2 * pad_data[i] - kernel_extent)
        / stride_data[i] + 1;
    this->output_shape_.push_back(output_dim);
  }
}
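Here is the same formula as a standalone function. This is a sketch, and `conv_output_dim` is not a Caffe function. Dilation enlarges the effective kernel: with dilation 2, a 3*3 kernel spans a 5*5 region:

```cpp
#include <cassert>

// Output length along one spatial axis, using the same formula as
// ConvolutionLayer::compute_output_shape (integer division truncates).
int conv_output_dim(int input_dim, int kernel, int stride, int pad, int dilation) {
  assert(kernel > 0 && stride > 0 && dilation > 0);
  const int kernel_extent = dilation * (kernel - 1) + 1;  // effective kernel size
  return (input_dim + 2 * pad - kernel_extent) / stride + 1;
}
```

For instance, 224 with k=3, pad=1, stride=1 stays 224. 224 with k=7, stride=2, pad=3 gives 112, and 227 with k=11, stride=4 gives 55. Because of integer division, leftover rows or columns that cannot hold a full window are dropped.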

Next, the forward function:

template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = top[i]->mutable_cpu_data();
    for (int n = 0; n < this->num_; ++n) {
      this->forward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight,
          top_data + n * this->top_dim_);
      if (this->bias_term_) {
        const Dtype* bias = this->blobs_[1]->cpu_data();
        this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
      }
    }
  }
}

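A convolution layer takes blobs in and produces blobs out. The code above shows that the kernel weights live in this->blobs_[0]->cpu_data(). The bias, if the layer has one (bias_term_), lives in this->blobs_[1]->cpu_data(). The outer loop runs over bottom.size(), so the layer can in fact take several inputs; each bottom[i] produces top[i] using the same weights. The inner loop over num_ processes the batch one image at a time.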

Let's look at the core function inside, this->forward_cpu_gemm.

It takes input and weights, writes output, and uses col_buff as the intermediate column matrix. https://tangxman.github.io/2015/12/07/caffe-conv/ explains this function in detail; here is a rough summary.

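First, in call order: for an ordinary convolution such as 3*3 (i.e. not 1x1), forward_cpu_gemm calls conv_im2col_cpu (defined in base_conv_layer.hpp). As the name says, it converts the image into one large matrix. The kernels need no such conversion. The weight blob has shape num_output x (C / group) x k x k and is stored row-major, so it can already be used as a num_output x kernel_dim_ matrix.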

然后利用caffe_cpu_gemm計算矩陣相乘得到卷積后的結(jié)果。

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
    const Dtype* weights, Dtype* output, bool skip_im2col) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    if (!skip_im2col) {
      // Unless this is a 1x1 convolution or skip_im2col is set,
      // conv_im2col_cpu unrolls every kernel-sized image patch visited by the
      // sliding window into a column, giving a matrix with kernel_dim_ rows per group
      // and (output height * output width) columns.
      conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    }
    col_buff = col_buffer_.cpu_data();
  }
  // Compute with Caffe's caffe_cpu_gemm.
  // Example: 20 input feature maps, 10 output feature maps, group_ = 2.
  // The layer is then split into two independent 10 -> 5 convolutions.
  // Grouping was historically used to split a network across GPUs (as in AlexNet);
  // here the groups are simply processed one after another by this loop.
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ /
        group_, conv_out_spatial_dim_, kernel_dim_,
        (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)0., output + output_offset_ * g);
    // weights <--- blobs_[0]->cpu_data(). By analogy with a fully connected layer,
    // weights are the parameters and col_buff is the data: output = weights x col_buff.
    // weights:  (conv_out_channels_ / group_) x kernel_dim_
    // col_buff: kernel_dim_ x conv_out_spatial_dim_
    // output:   (conv_out_channels_ / group_) x conv_out_spatial_dim_
  }
}
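The group loop above can be reproduced with a naive GEMM. This sketch uses hypothetical names (`gemm_nn`, `grouped_forward`). It assumes the offsets are defined as in Caffe's base_conv_layer.cpp, with kernel_dim_ = (input channels / group_) * k * k. Each group multiplies its own slice of the weights by its own slice of col_buff and writes its own slice of the output:

```cpp
#include <cassert>
#include <vector>

// Naive row-major GEMM: C (MxN) = A (MxK) * B (KxN).
// This corresponds to caffe_cpu_gemm with NoTrans/NoTrans, alpha = 1, beta = 0.
void gemm_nn(int M, int N, int K, const float* A, const float* B, float* C) {
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < K; ++p) acc += A[i * K + p] * B[p * N + j];
      C[i * N + j] = acc;
    }
}

// Grouped forward pass in the style of forward_cpu_gemm (assumed offsets):
//   weight_offset = out_channels * kernel_dim / group
//   col_offset    = kernel_dim * spatial          (col has group * kernel_dim rows)
//   output_offset = out_channels * spatial / group
void grouped_forward(const std::vector<float>& weights, const std::vector<float>& col,
                     std::vector<float>& output, int out_channels, int kernel_dim,
                     int spatial, int group) {
  assert(out_channels % group == 0);
  const int weight_offset = out_channels * kernel_dim / group;
  const int col_offset = kernel_dim * spatial;
  const int output_offset = out_channels * spatial / group;
  output.assign(out_channels * spatial, 0.0f);
  for (int g = 0; g < group; ++g)
    gemm_nn(out_channels / group, spatial, kernel_dim,
            weights.data() + weight_offset * g, col.data() + col_offset * g,
            output.data() + output_offset * g);
}
```

Take group = 2 with 1x1 "kernels". The first output channel sees only the first half of the columns, and the second sees only the second half. That isolation is exactly what grouping means.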

Backward pass:

template <typename Dtype>
void ConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
  for (int i = 0; i < top.size(); ++i) {
    const Dtype* top_diff = top[i]->cpu_diff();
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
    // Bias gradient, if necessary.
    if (this->bias_term_ && this->param_propagate_down_[1]) {
      Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
      for (int n = 0; n < this->num_; ++n) {
        this->backward_cpu_bias(bias_diff, top_diff + n * this->top_dim_);
      }
    }
    if (this->param_propagate_down_[0] || propagate_down[i]) {
      for (int n = 0; n < this->num_; ++n) {
        // gradient w.r.t. weight. Note that we will accumulate diffs.
        if (this->param_propagate_down_[0]) {
          this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
              top_diff + n * this->top_dim_, weight_diff);
        }
        // gradient w.r.t. bottom data, if necessary.
        if (propagate_down[i]) {
          this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
              bottom_diff + n * this->bottom_dim_);
        }
      }
    }
  }
}

Leaving the bias aside, the source above makes two main calls: this->weight_cpu_gemm and this->backward_cpu_gemm.

this->backward_cpu_gemm computes the gradient with respect to bottom_data, i.e. it backpropagates into the feature map.

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::backward_cpu_gemm(const Dtype* output,
    const Dtype* weights, Dtype* input) {
  // Here output is top_diff and input is bottom_diff.
  Dtype* col_buff = col_buffer_.mutable_cpu_data();
  if (is_1x1_) {
    col_buff = input;
  }
  for (int g = 0; g < group_; ++g) {
    // col_buff = weights^T x top_diff (beta = 0 overwrites the buffer).
    caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, kernel_dim_,
        conv_out_spatial_dim_, conv_out_channels_ / group_,
        (Dtype)1., weights + weight_offset_ * g, output + output_offset_ * g,
        (Dtype)0., col_buff + col_offset_ * g);
  }
  if (!is_1x1_) {
    // Scatter the columns back into image layout, summing overlapping patches.
    conv_col2im_cpu(col_buff, input);
  }
}

weight_cpu_gemm computes the gradient with respect to the weights:

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::weight_cpu_gemm(const Dtype* input,
    const Dtype* output, Dtype* weights) {
  // Here input is bottom_data, output is top_diff and weights is weight_diff.
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    col_buff = col_buffer_.cpu_data();
  }
  for (int g = 0; g < group_; ++g) {
    // weight_diff += top_diff x col_buff^T. beta = 1, so the gradients of
    // all images in the batch accumulate into weight_diff.
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, conv_out_channels_ / group_,
        kernel_dim_, conv_out_spatial_dim_,
        (Dtype)1., output + output_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)1., weights + weight_offset_ * g);
  }
}
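The two backward GEMMs can be written out with plain loops for a single group. This sketch uses hypothetical names. W is the (M x Kd) weight matrix, col is the (Kd x N) column matrix from im2col, and top_diff is the (M x N) output gradient. Note the beta values. backward_cpu_gemm overwrites col_buff (beta = 0) before col2im. weight_cpu_gemm uses beta = 1, so gradients from every image accumulate into weight_diff:

```cpp
#include <cassert>
#include <vector>

// Single-group sketches of the two backward GEMMs (row-major).
// W: M x Kd, col: Kd x N, top_diff: M x N.

// Like backward_cpu_gemm: col_diff (Kd x N) = W^T * top_diff, beta = 0.
std::vector<float> data_grad_cols(const std::vector<float>& W,
                                  const std::vector<float>& top_diff,
                                  int M, int Kd, int N) {
  assert(static_cast<int>(W.size()) == M * Kd);
  assert(static_cast<int>(top_diff.size()) == M * N);
  std::vector<float> col_diff(Kd * N, 0.0f);
  for (int r = 0; r < Kd; ++r)
    for (int c = 0; c < N; ++c)
      for (int m = 0; m < M; ++m)
        col_diff[r * N + c] += W[m * Kd + r] * top_diff[m * N + c];
  return col_diff;  // col2im would then scatter-add this into bottom_diff
}

// Like weight_cpu_gemm: weight_diff (M x Kd) += top_diff * col^T, beta = 1.
void accumulate_weight_grad(const std::vector<float>& top_diff,
                            const std::vector<float>& col,
                            std::vector<float>& weight_diff, int M, int Kd, int N) {
  assert(static_cast<int>(weight_diff.size()) == M * Kd);
  for (int m = 0; m < M; ++m)
    for (int r = 0; r < Kd; ++r)
      for (int c = 0; c < N; ++c)
        weight_diff[m * Kd + r] += top_diff[m * N + c] * col[r * N + c];
}
```

Calling accumulate_weight_grad twice with the same image doubles weight_diff. This mirrors the "Note that we will accumulate diffs" comment in Backward_cpu.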

There are many details here. If something doesn't make sense, go back to the source, and read it several times if once isn't enough.

03 deconv_layer.cpp

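Convolution maps the lower grid in the figure to the upper one: each output pixel depends on 9 input pixels. Deconvolution goes the other way. Multiply each input pixel (upper grid) by the kernel and place the result at the corresponding output positions (lower grid). Repeat this for every input pixel, then sum all the outputs that land on the same position.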
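That scatter-add description translates directly into loops. Here is a single-channel sketch with hypothetical names, supporting stride and padding but not dilation. The output size s*(H-1) + k - 2p inverts the convolution formula. As far as I can tell it matches DeconvolutionLayer::compute_output_shape:

```cpp
#include <cassert>
#include <vector>

struct Map2D {
  int h, w;
  std::vector<float> data;  // row-major h x w
};

// Single-channel transposed convolution ("deconvolution") by scatter-add:
// every input pixel multiplies the whole k x k kernel, and the products are
// added into the output window that pixel maps to. Overlaps are summed.
Map2D deconv_scatter(const std::vector<float>& in, int H, int W,
                     const std::vector<float>& kernel, int k, int s, int p) {
  assert(static_cast<int>(in.size()) == H * W);
  assert(static_cast<int>(kernel.size()) == k * k);
  const int OH = s * (H - 1) + k - 2 * p;
  const int OW = s * (W - 1) + k - 2 * p;
  Map2D out{OH, OW, std::vector<float>(OH * OW, 0.0f)};
  for (int iy = 0; iy < H; ++iy)
    for (int ix = 0; ix < W; ++ix)
      for (int ky = 0; ky < k; ++ky)
        for (int kx = 0; kx < k; ++kx) {
          const int oy = iy * s + ky - p, ox = ix * s + kx - p;
          if (oy >= 0 && oy < OH && ox >= 0 && ox < OW)  // cropped by padding
            out.data[oy * OW + ox] += in[iy * W + ix] * kernel[ky * k + kx];
        }
  return out;
}
```

Consider a 2*2 input of ones, a 3*3 kernel of ones and stride 2. The result is a 5*5 output whose shared middle row and column collect contributions from two or four input pixels.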

template <typename Dtype>
void DeconvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = top[i]->mutable_cpu_data();
    for (int n = 0; n < this->num_; ++n) {
      this->backward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight,
          top_data + n * this->top_dim_);
      if (this->bias_term_) {
        const Dtype* bias = this->blobs_[1]->cpu_data();
        this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
      }
    }
  }
}

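The forward pass calls backward_cpu_gemm directly, and the backward pass in turn calls forward_cpu_gemm. This takes some repeated thought to absorb; if it doesn't click the first time, go over it again.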
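Why can the forward pass simply reuse backward_cpu_gemm? Convolution is a linear map, and deconvolution applies its transpose (adjoint). backward_cpu_gemm followed by col2im computes exactly that transpose. A quick 1-D numeric check confirms the defining property <conv(x), y> = <x, deconv(y)>. It is a sketch with hypothetical names, stride s and no padding:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-D strided convolution (cross-correlation, no padding), as in Caffe.
std::vector<double> conv1d(const std::vector<double>& x, const std::vector<double>& w, int s) {
  const int k = static_cast<int>(w.size());
  const int m = (static_cast<int>(x.size()) - k) / s + 1;
  std::vector<double> y(m, 0.0);
  for (int o = 0; o < m; ++o)
    for (int j = 0; j < k; ++j) y[o] += w[j] * x[o * s + j];
  return y;
}

// Matching 1-D transposed convolution: scatter-add each input value times the kernel.
std::vector<double> deconv1d(const std::vector<double>& y, const std::vector<double>& w, int s) {
  const int k = static_cast<int>(w.size());
  const int n = s * (static_cast<int>(y.size()) - 1) + k;
  std::vector<double> x(n, 0.0);
  for (int o = 0; o < static_cast<int>(y.size()); ++o)
    for (int j = 0; j < k; ++j) x[o * s + j] += w[j] * y[o];
  return x;
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double acc = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) acc += a[i] * b[i];
  return acc;
}
```

With x = 1..7, kernel (1, 0, -1), stride 2 and y = (1, 2, 3), both inner products come out to -12.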

template <typename Dtype>
void DeconvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
  for (int i = 0; i < top.size(); ++i) {
    const Dtype* top_diff = top[i]->cpu_diff();
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
    // Bias gradient, if necessary.
    if (this->bias_term_ && this->param_propagate_down_[1]) {
      Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
      for (int n = 0; n < this->num_; ++n) {
        this->backward_cpu_bias(bias_diff, top_diff + n * this->top_dim_);
      }
    }
    if (this->param_propagate_down_[0] || propagate_down[i]) {
      for (int n = 0; n < this->num_; ++n) {
        // Gradient w.r.t. weight. Note that we will accumulate diffs.
        if (this->param_propagate_down_[0]) {
          this->weight_cpu_gemm(top_diff + n * this->top_dim_,
              bottom_data + n * this->bottom_dim_, weight_diff);
        }
        // Gradient w.r.t. bottom data, if necessary, reusing the column buffer
        // we might have just computed above.
        if (propagate_down[i]) {
          this->forward_cpu_gemm(top_diff + n * this->top_dim_, weight,
              bottom_diff + n * this->bottom_dim_,
              this->param_propagate_down_[0]);
        }
      }
    }
  }
}

04 inner_product_layer.cpp

Now that we have read the convolution layer, it is time for the fully connected layer.

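What distinguishes a fully connected layer from a convolutional layer? There is no local connectivity: every output depends on every input. You can think of it as a convolution whose kernel is as large as the input. If the input feature map is C*H*W, the kernel is also C*H*W, and each output is a single 1*1 value.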

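Its setup function (LayerSetUp) has a few jobs, the most important being to set the shape of the weights; the key code is below. num_output is the number of outputs. For ImageNet's 1000 classes, for example, the final output is a 1000-dimensional vector.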

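K_ is the size of one sample. With axis=1, each input sample of C*H*W values is flattened into one vector of length K_. Each of the N_ outputs then reduces that whole vector to a single number.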

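const int num_output = this->layer_param_.inner_product_param().num_output();
N_ = num_output;
const int axis = bottom[0]->CanonicalAxisIndex(
    this->layer_param_.inner_product_param().axis());
// Dimensions from "axis" onward are flattened into one vector of length K_.
K_ = bottom[0]->count(axis);
// Initialize the weights
vector<int> weight_shape(2);
if (transpose_) {
  weight_shape[0] = K_;
  weight_shape[1] = N_;
} else {
  weight_shape[0] = N_;
  weight_shape[1] = K_;
}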

So the weight holds N_*K_ values: an N_ x K_ matrix, or K_ x N_ when transpose_ is set.

With that in place, forward works just like conv_layer: a matrix multiplication, plus the bias.
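In other words, the forward pass is one matrix product with the bias broadcast over the batch. Here is a sketch that assumes transpose_ is false, so the weight is N_ x K_. `inner_product_forward` is a hypothetical name; Caffe performs the product with caffe_cpu_gemm:

```cpp
#include <cassert>
#include <vector>

// Inner-product forward as one GEMM, assuming transpose_ == false
// (weight stored as N x K, one row per output):
//   top (M x N) = bottom (M x K) * weight^T (K x N) + bias (added to every row)
// M = batch size, K = C*H*W per sample (axis = 1), N = num_output.
std::vector<float> inner_product_forward(const std::vector<float>& bottom,
                                         const std::vector<float>& weight,
                                         const std::vector<float>& bias,
                                         int M, int K, int N) {
  assert(static_cast<int>(bottom.size()) == M * K);
  assert(static_cast<int>(weight.size()) == N * K);
  assert(static_cast<int>(bias.size()) == N);
  std::vector<float> top(M * N);
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = bias[j];
      for (int p = 0; p < K; ++p) acc += bottom[i * K + p] * weight[j * K + p];
      top[i * N + j] = acc;
    }
  return top;
}
```

Each row of the weight acts as one "kernel" the size of the whole flattened input, which matches the convolution view above.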

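That's it for this section. There were no complicated formulas, but there is plenty here to chew on, and it takes careful study to really understand. caffe_cpu_gemm is the core of all the computation in this section; if you are interested, go and read it!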

My Zhihu column will also start publishing this series in parallel; you are welcome to discuss it there.

https://zhuanlan.zhihu.com/c_151876233

Note: some images are from the internet.

—END—

A small plug: I have set up some courses and chats on GitChat; you are welcome to get in touch.

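Thank you for reading patiently; any suggestions about shortcomings are welcome. More content will follow from time to time, so please follow the WeChat official account 有三AI!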
