CV Algorithm Reproduction (Classification 6/6): MobileNet (V1 2017, V2 2018, V3 2019, Google)

Published: 2023/11/27


Acknowledgements: 霹靂吧啦Wz (霹靂吧啦Wz's channel on Bilibili)

1 Key points

1.1 PyTorch syntax

To get a DW (depthwise) convolution, just set the groups parameter of nn.Conv2d.
PyTorch ships a ReLU6 activation function: nn.ReLU6().
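The two points above can be sketched as follows. This is a minimal illustration, not code from the original post: the channel count (32) and input size (56 x 56) are arbitrary assumptions. Setting `groups` equal to the number of input channels gives each kernel a single channel, which is what makes the convolution depthwise.

```python
import torch
from torch import nn

in_ch = 32  # assumed channel count, for illustration only

# Regular 3x3 convolution: every kernel spans all 32 input channels.
standard = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, bias=False)
# Depthwise 3x3 convolution: groups == in_channels, so each kernel sees one channel.
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False)
act = nn.ReLU6()

x = torch.randn(1, in_ch, 56, 56)  # dummy input [N, C, H, W]
y = act(depthwise(x))

print(tuple(standard.weight.shape))   # (32, 32, 3, 3)
print(tuple(depthwise.weight.shape))  # (32, 1, 3, 3)
print(tuple(y.shape))                 # (1, 32, 56, 56)
```

The weight shapes show the saving directly: the depthwise layer stores 32 times fewer weights than the regular layer here.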

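2 Network overview

2.1 Historical significance

In 2017, to meet the needs of mobile and embedded vision tasks, MobileNet V1 introduced an architecture that is small (about 1/32 of the parameters of VGG16) and cheap to run (about 1/3 of the computation of GoogLeNet), with accuracy only 0.9% lower than VGG16.

In 2018, MobileNet V2 was proposed.

In 2019, MobileNet V3 was proposed.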

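2.2 Network highlights

V1 highlights

Depthwise convolution (DW convolution), which greatly reduces the computation and the number of parameters
Two added hyperparameters: α (a multiplier on the number of convolution kernels) and β (the input image resolution)

V2 highlights

Inverted Residuals (the inverted residual structure)
Linear Bottlenecks

V3 highlights

Updated block (bneck): a simple modification of the V2 inverted residual structure.
1. SE module: an attention mechanism.
2. Updated activation functions.
Parameters searched with NAS (neural architecture search)
Redesigned time-consuming layers

Result: more accurate and more efficient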

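2.3 V1 network

Introduction to DW and PW convolution

Regular convolution:

Kernel channels = input feature channels. This follows from the definition of convolution (for a long time I wrongly assumed a kernel had a single channel). For example, if the input has 1024 channels, each kernel must also have 1024 channels. Each kernel channel is applied to the matching input channel, the 1024 per-channel results at the same position are summed (single elements, not whole feature maps), and the bias is added once. The result is one element of the output feature map.
Number of kernels = number of output channels. This is why the 1*1 convolution was originally introduced not for feature extraction but to increase or reduce the number of channels.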
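The computation of one output element described above can be sketched in plain Python. The function name `conv_output_element` and the toy 2-channel, 2x2 values are assumptions made for illustration; they do not come from the original post.

```python
def conv_output_element(patch, kernel, bias):
    """Compute ONE element of ONE output feature map.

    patch:  input values under the kernel window, shape [C][k][k]
    kernel: a single convolution kernel with the same shape [C][k][k]
    bias:   the scalar bias of this kernel, added once
    """
    total = 0.0
    for c in range(len(patch)):              # loop over input channels
        for i in range(len(kernel[c])):      # kernel rows
            for j in range(len(kernel[c][i])):  # kernel columns
                total += patch[c][i][j] * kernel[c][i][j]
    return total + bias


# Toy example: 2 input channels, 2x2 kernel.
patch = [[[1, 2], [3, 4]],
         [[5, 6], [7, 8]]]
kernel = [[[1, 0], [0, 1]],
          [[0, 1], [1, 0]]]
value = conv_output_element(patch, kernel, bias=0.5)
print(value)  # 18.5  (channel 0 gives 5, channel 1 gives 13, plus bias 0.5)
```

A layer with N such kernels repeats this for every window position and every kernel, which yields N output channels.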

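DW convolution:

Each kernel has a single channel and is convolved with exactly one input channel. See the figure above for how DW convolution works.
This operation convolves each input channel independently and makes no use of the information that different channels carry at the same spatial position. A pointwise convolution is therefore needed to combine these feature maps into new feature maps.

PW convolution:

It works exactly like a 1*1 convolution under a different name; its role is to control the number of output channels. See the figure above.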
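A minimal pure-Python sketch of the DW and PW steps on nested lists may make the division of labour clearer. The helper names `depthwise_conv` and `pointwise_conv`, the absence of padding, stride 1, and the toy values are all assumptions for illustration, not the original author's code.

```python
def depthwise_conv(x, kernels):
    """x: [C][H][W]; kernels: [C][k][k]. One single-channel kernel per input
    channel, no padding, stride 1. Channels are never mixed."""
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h_out = len(channel) - k + 1
        w_out = len(channel[0]) - k + 1
        fmap = [[sum(channel[r + i][c + j] * kernel[i][j]
                     for i in range(k) for j in range(k))
                 for c in range(w_out)]
                for r in range(h_out)]
        out.append(fmap)
    return out


def pointwise_conv(x, weights):
    """x: [C][H][W]; weights: [N][C]. A 1x1 convolution: at each position,
    mix the C channel values into N output channels."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt[c] * x[c][r][col] for c in range(len(x)))
              for col in range(w)]
             for r in range(h)]
            for wt in weights]


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # channel 0
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]   # channel 1
dw_kernels = [[[1, 1], [1, 1]],
              [[1, 0], [0, 0]]]
pw_weights = [[1, 1], [1, -1], [0, 2]]    # 3 output channels from 2 inputs

dw = depthwise_conv(x, dw_kernels)
pw = pointwise_conv(dw, pw_weights)
print(dw)  # [[[12, 16], [24, 28]], [[1, 1], [1, 1]]]
print(pw)  # [[[13, 17], [25, 29]], [[11, 15], [23, 27]], [[2, 2], [2, 2]]]
```

The DW step keeps the channel count at 2; only the PW step changes it (here to 3) and combines information across channels.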

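Computation cost

M is the number of input channels and N the number of output channels. Kernels are usually 3*3, so replacing a regular convolution with DW+PW gives a cost ratio (DW+PW cost divided by regular convolution cost) of 1/N + 1/9, which is about 1/9 when N is large.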
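The ratio can be checked numerically. The sketch below counts multiply-accumulate operations; the symbols `dk` (kernel size) and `df` (output feature map size) and the layer dimensions are assumptions chosen for illustration, while `m` and `n` are the M and N of the text.

```python
def standard_conv_macs(dk, m, n, df):
    """Multiply-accumulates of a regular dk x dk convolution, M -> N channels,
    producing a df x df output map."""
    return dk * dk * m * n * df * df


def separable_conv_macs(dk, m, n, df):
    """DW (dk x dk, per channel) followed by PW (1x1, M -> N)."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


dk, m, n, df = 3, 512, 512, 14  # assumed layer dimensions
ratio = separable_conv_macs(dk, m, n, df) / standard_conv_macs(dk, m, n, df)
print(ratio)                 # about 0.113
print(1 / n + 1 / dk ** 2)   # same value: 1/N + 1/dk^2
```

For a 3x3 kernel the 1/9 term dominates, so the regular convolution costs roughly 8 to 9 times as much as DW+PW.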

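In practice, however, because of how the underlying hardware and libraries execute it, a DW convolution may actually take longer to run than a regular convolution.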

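Network structure (much like VGG: a plain stack of convolution layers)

Notes and explanations:

conv / s2: regular convolution with stride 2.
3 x 3 x 3 x 32: kernel size 3 x 3; kernel depth 3; 32 kernels (i.e. 32 output feature maps).
conv dw / s1: DW convolution with stride 1.
3 x 3 x 32 dw: kernel size 3 x 3; a DW kernel has depth 1, so the depth is omitted; 32 kernels.
A PW convolution is just a regular 1x1 convolution, so in the network each DW convolution is followed by a regular 1x1 convolution.

In V1, the DW kernels tend to "die": most of their parameters end up as 0 and contribute almost nothing. V2 improves on this.

Results

2.4 V2 network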

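Inverted residual structure

In the figure, the left side is a residual block and the right side is an inverted residual block; the order of dimension reduction and expansion is reversed. A residual block reduces channels with a 1x1 convolution, applies a 3x3 convolution, then expands with a 1x1 convolution. An inverted residual block expands with a 1x1 convolution, applies a 3x3 DW convolution, then reduces with a 1x1 convolution.

The V2 paper has two block variants, as follows:

In the paper, the shortcut connection exists only when stride=1 and the input and output feature tensors have the same shape.

ReLU6 activation function

Inside the block, the activation is ReLU6, i.e. ReLU clipped at 6. Regular ReLU destroys a lot of information in low-dimensional features and has little effect only on high-dimensional ones, which is why the last 1x1 convolution of each block (the low-dimensional output) uses a linear activation instead (the Linear Bottleneck).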
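For reference, ReLU6 is simply ReLU with its output capped at 6. A minimal scalar sketch (the helper names `relu` and `relu6` are illustrative, not library calls):

```python
def relu(x):
    """Regular ReLU: negative values become 0, positive values pass through."""
    return max(x, 0.0)


def relu6(x):
    """ReLU6: same as ReLU, but the output is additionally clipped at 6."""
    return min(max(x, 0.0), 6.0)


for x in (-2.0, 0.5, 6.0, 9.0):
    print(x, relu(x), relu6(x))
# -2.0 0.0 0.0
# 0.5 0.5 0.5
# 6.0 6.0 6.0
# 9.0 9.0 6.0
```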

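V2 network structure

t is the expansion factor: the multiple by which the channels fed to the DW convolution are expanded.
A bottleneck uses the block variant with the shortcut branch only when s=1 (i.e. the DW convolution in the bottleneck has stride 1). (Reason: with a DW stride of 2 the feature map size changes, so the input and output can no longer be added.)
The last 1 x 1 convolution layer receives a 1x1x1280 input, so it is equivalent to a fully connected layer.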
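To make the roles of t and s concrete, the sketch below traces channels and feature map size through the network. The stage settings (t, c, n, s) are copied from `inverted_residual_setting` in the model.py listing in section 3.1; the function name `trace_mobilenet_v2`, the 224 x 224 input, and the assumption that padding lets each stride-2 layer halve the size exactly are mine.

```python
# (t, c, n, s) per stage, as in inverted_residual_setting in model.py
SETTINGS = [(1, 16, 1, 1), (6, 24, 2, 2), (6, 32, 3, 2), (6, 64, 4, 2),
            (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1)]


def trace_mobilenet_v2(input_size=224):
    """Return rows of (layer, out_channels, out_size, has_shortcut)."""
    size = input_size // 2          # first 3x3 conv has stride 2
    channels = 32
    rows = [("conv 3x3", channels, size, False)]
    for t, c, n, s in SETTINGS:
        for i in range(n):
            stride = s if i == 0 else 1   # only the first block of a stage uses s
            shortcut = stride == 1 and channels == c
            size //= stride
            rows.append((f"bottleneck t={t} hidden={channels * t}", c, size, shortcut))
            channels = c
    rows.append(("conv 1x1", 1280, size, False))
    rows.append(("avgpool", 1280, 1, False))
    return rows


rows = trace_mobilenet_v2()
for row in rows:
    print(row)
```

With these settings the trace ends at a 7 x 7 x 1280 map before pooling, and 10 of the 17 bottleneck blocks carry a shortcut (every block after the first one in a stage).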

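Results

2.5 V3 network

Block structure of V3

NL: denotes the nonlinear activation function (a notation V2 does not use).

The final 1x1 convolution layer reduces the dimension and, again, has no activation function.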

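SE module (attention mechanism)

SE stands for Squeeze and Excitation. The module was proposed mainly to account for the interdependence between a model's channels.

Details:

pool: apply global pooling to each channel's feature map, e.g. global average pooling, which takes the mean of all elements in that feature map.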
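A minimal pure-Python sketch of an SE block follows. Only the pooling step is spelled out in the text above; the two fully connected layers, the ReLU between them, and the hard-sigmoid gate are assumptions based on the usual SE design as used in MobileNetV3, and the function names and toy weights are hypothetical.

```python
def hard_sigmoid(x):
    """Piecewise-linear approximation of sigmoid: ReLU6(x + 3) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


def se_block(x, w1, b1, w2, b2):
    """x: [C][H][W]; w1: [C_r][C]; w2: [C][C_r]. Returns (scaled x, channel weights)."""
    # Squeeze: global average pooling, one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: FC layer that reduces the channels, then ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z)) + b) for row, b in zip(w1, b1)]
    # Excitation, step 2: FC layer back to C channels, then a gate in [0, 1].
    s = [hard_sigmoid(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(w2, b2)]
    # Scale: multiply every element of channel c by its weight s[c].
    out = [[[v * s_c for v in row] for row in ch] for ch, s_c in zip(x, s)]
    return out, s


x = [[[1, 2], [3, 4]],    # channel 0, mean 2.5
     [[0, 0], [0, 4]]]    # channel 1, mean 1.0
w1, b1 = [[1.0, -1.0]], [0.0]          # 2 channels -> 1 hidden unit
w2, b2 = [[2.0], [-2.0]], [0.0, 0.0]   # 1 hidden unit -> 2 channels
out, s = se_block(x, w1, b1, w2, b2)
print(s)    # [1.0, 0.0]
print(out)  # [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
```

In this toy case the block keeps channel 0 untouched and suppresses channel 1 entirely, which is the per-channel reweighting that "attention" refers to here.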

[Figure: a worked example of the SE computation]

[Figure: SE combined with ResNet (SE-ResNet)]

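Redesigned time-consuming layers

Fewer kernels in the first convolution layer (32 in V2, 16 in V3). (Me: does this even count as an innovation? It looks like something found purely by experiment: accuracy does not drop and speed improves...)
1. Effect: same accuracy, 2 ms faster (about a 3% speedup)
Streamlined Last Stage structure
1. Effect: 7 ms faster (an 11% speedup)

Redesigned activation function

Problem: the swish activation function improves accuracy, but it is expensive to compute and differentiate, and it is unfriendly to quantization.

Solution: the h-swish activation function, which has a shape close to swish but is much cheaper to compute and much easier to quantize.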
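The two functions can be compared directly. The formulas below are the standard definitions, swish(x) = x * sigmoid(x) and h-swish(x) = x * ReLU6(x + 3) / 6; the scalar helper names are illustrative only.

```python
import math


def swish(x):
    """swish(x) = x * sigmoid(x); needs an exponential."""
    return x / (1.0 + math.exp(-x))


def relu6(x):
    return min(max(x, 0.0), 6.0)


def h_swish(x):
    """h-swish(x) = x * ReLU6(x + 3) / 6; only add, clip, multiply."""
    return x * relu6(x + 3.0) / 6.0


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"{x:5.1f}  swish={swish(x):7.4f}  h_swish={h_swish(x):7.4f}")
```

h-swish is exactly 0 for x <= -3 and exactly x for x >= 3; in between it tracks swish closely without evaluating an exponential, which is what makes it cheaper and easier to quantize.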

V3 network structure

3 Code structure (V2)

model.py
train.py
predict.py

3.1 model.py

from torch import nn
import torch

"""
Purpose: round a channel count to the nearest multiple of divisor.
ch: number of convolution kernels (i.e. output feature-map channels)
divisor: the base value.
_make_divisible adjusts ch to an integer multiple of divisor.
Possible reason: this helps parallel or multi-machine distributed computation.
"""
def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    # The next line rounds ch to the nearest multiple of divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


"""
Conv + BN + ReLU6 module
Inherits from nn.Sequential because the official PyTorch pretrained weights are loaded later.
groups=1 is a regular convolution; groups equal to the input channel count gives a DW convolution.
"""
class ConvBNReLU(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        # pass the three layers to nn.Sequential
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )


# Inverted residual structure
class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio  # expansion factor t
        # use_shortcut: whether to use the shortcut branch
        self.use_shortcut = stride == 1 and in_channel == out_channel

        layers = []
        if expand_ratio != 1:  # when t is 1, the expanding 1x1 convolution is skipped
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        # layers.append() inserts one item at a time; layers.extend() inserts several at once.
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv (linear activation)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
            # Note: a linear activation is just y=x, so no activation layer follows this BN layer.
        ])

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)


"""
alpha: multiplier on the number of convolution kernels
round_nearest: channel counts are rounded to a multiple of this value (passed to _make_divisible as divisor)
"""
class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        # _make_divisible: round the channel count to a multiple of round_nearest.
        # Possible reason: this helps parallel or multi-machine distributed computation.
        input_channel = _make_divisible(32 * alpha, round_nearest)
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s (see the network structure table for their meaning)
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)

        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)  # bias set to 0
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)  # BN scale (gamma) set to 1
                nn.init.zeros_(m.bias)  # bias set to 0
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # normal distribution with mean 0 and std 0.01
                nn.init.zeros_(m.bias)  # bias set to 0

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

3.2 train.py

import torch
import torch.nn as nn
from torchvision import transforms, datasets
import json
import os
import torch.optim as optim
from model import MobileNetV2
# import torchvision.models.mobilenet: open this module to find the download URL of the pretrained mobilenet_v2 weights.


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(256),
                                   transforms.CenterCrop(224),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 16
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)

    print("using {} images for training, {} images for validation.".format(train_num, val_num))

    net = MobileNetV2(num_classes=5)
    # load pretrain weights
    # download url: https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
    # ImageNet pretrained model; it has 1000 output nodes, so its last layer cannot be reused here.
    model_weight_path = "./mobilenet_v2.pth"
    assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
    pre_weights = torch.load(model_weight_path, map_location=device)

    # delete classifier weights
    # Walk the weight dict and drop every entry whose name contains "classifier",
    # i.e. the parameters of the final fully connected layer.
    pre_dict = {k: v for k, v in pre_weights.items() if "classifier" not in k}
    missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)  # load weights from the dict

    # freeze features weights
    # Freeze the weights of the features part (the feature extractor).
    for param in net.features.parameters():  # net.parameters() would freeze the whole network
        param.requires_grad = False  # no gradients, so no parameter updates

    net.to(device)

    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)

    best_acc = 0.0
    save_path = './MobileNetV2.pth'
    for epoch in range(5):
        # train
        net.train()
        running_loss = 0.0
        for step, data in enumerate(train_loader, start=0):
            images, labels = data
            optimizer.zero_grad()
            logits = net(images.to(device))
            loss = loss_function(logits, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            # print train process
            rate = (step + 1) / len(train_loader)
            a = "*" * int(rate * 50)
            b = "." * int((1 - rate) * 50)
            print("\rtrain loss: {:^3.0f}%[{}->{}]{:.4f}".format(int(rate * 100), a, b, loss.item()), end="")
        print()

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            for val_data in validate_loader:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))  # eval model only have last output layer
                # loss = loss_function(outputs, test_labels)
                predict_y = torch.max(outputs, dim=1)[1]
                acc += (predict_y == val_labels.to(device)).sum().item()
            val_accurate = acc / val_num
            if val_accurate > best_acc:
                best_acc = val_accurate
                torch.save(net.state_dict(), save_path)
            # average the loss over the number of batches in the epoch
            print('[epoch %d] train_loss: %.3f  test_accuracy: %.3f' %
                  (epoch + 1, running_loss / len(train_loader), val_accurate))

    print('Finished Training')


if __name__ == '__main__':
    main()

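Training result with every layer except the output layer frozen:

The author's best result was 94% accuracy.

If all layers are updated during training, starting from the pretrained model, accuracy can reach 98.1%.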

3.3 predict.py

import torch
from model import MobileNetV2
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
import json

data_transform = transforms.Compose(
    [transforms.Resize(256),
     transforms.CenterCrop(224),
     transforms.ToTensor(),
     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

# load image
img = Image.open("../tulip.jpg")
plt.imshow(img)
# [C, H, W] after the transform
img = data_transform(img)
# expand batch dimension -> [N, C, H, W]
img = torch.unsqueeze(img, dim=0)

# read class_indict
try:
    with open('./class_indices.json', 'r') as json_file:
        class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

# create model
model = MobileNetV2(num_classes=5)
# load model weights (map to CPU, since the model is run on CPU here)
model_weight_path = "./MobileNetV2.pth"
model.load_state_dict(torch.load(model_weight_path, map_location="cpu"))
model.eval()
with torch.no_grad():
    # predict class
    output = torch.squeeze(model(img))  # squeeze out the batch dimension
    predict = torch.softmax(output, dim=0)  # convert the outputs to a probability distribution
    predict_cla = torch.argmax(predict).item()
print(class_indict[str(predict_cla)], predict[predict_cla].item())
plt.show()

Prediction output:
