New Year Food Appreciation: Food-101 Classification with the CBAM Attention Mechanism
- I. Data Preprocessing
- 1. Dataset Introduction
- 2. Reading the Labels
- 3. Unified Naming
- 4. Organizing Image Paths
- 5. Splitting Training and Validation Sets
- 6. Defining the Food Dataset
- II. Attention Mechanisms
- 1. Overview
- 2. The Convolutional Block Attention Module (CBAM)
- Channel Attention Module (CAM)
- Spatial Attention Module (SAM)
- 3. Combining the Channel and Spatial Attention Modules
- III. ResNet-CBAM: A ResNet with the Convolutional Block Attention Module
- 1. Code Implementation
- 2. Model Testing
- 3. Inspecting the Network Structure
- 4. Model Training
- 5. Model Prediction
- IV. Summary and Reflections
- About the Author
Spring Festival is the grandest holiday of the year, and the New Year's Eve dinner is probably the best meal of it. Exciting just to think about! As the New Year bell rings, the annual food photo contest is about to begin, so come see what's on the table!
I. Data Preprocessing

Dataset used in this project: the 101-class food dataset

```python
# Unzip the dataset
!unzip -oq /home/aistudio/data/data70204/images.zip -d food
```

1. Dataset Introduction
This dataset contains all 101 food classes. It provides a more interesting simple training set for image analysis than CIFAR-10 or MNIST, and it also ships with heavily downscaled versions of the images to enable quick tests.

The 101 categories are listed below (indexed from 0):
'apple_pie': 0, 'baby_back_ribs': 1, 'baklava': 2, 'beef_carpaccio': 3, 'beef_tartare': 4, 'beet_salad': 5, 'beignets': 6, 'bibimbap': 7, 'bread_pudding': 8, 'breakfast_burrito': 9, 'bruschetta': 10,
'caesar_salad': 11, 'cannoli': 12, 'caprese_salad': 13, 'carrot_cake': 14, 'ceviche': 15, 'cheesecake': 16, 'cheese_plate': 17, 'chicken_curry': 18, 'chicken_quesadilla': 19, 'chicken_wings': 20,
'chocolate_cake': 21, 'chocolate_mousse': 22, 'churros': 23, 'clam_chowder': 24, 'club_sandwich': 25, 'crab_cakes': 26, 'creme_brulee': 27, 'croque_madame': 28, 'cup_cakes': 29, 'deviled_eggs': 30,
'donuts': 31, 'dumplings': 32, 'edamame': 33, 'eggs_benedict': 34, 'escargots': 35, 'falafel': 36, 'filet_mignon': 37, 'fish_and_chips': 38, 'foie_gras': 39, 'french_fries': 40,
'french_onion_soup': 41, 'french_toast': 42, 'fried_calamari': 43, 'fried_rice': 44, 'frozen_yogurt': 45, 'garlic_bread': 46, 'gnocchi': 47, 'greek_salad': 48, 'grilled_cheese_sandwich': 49, 'grilled_salmon': 50,
'guacamole': 51, 'gyoza': 52, 'hamburger': 53, 'hot_and_sour_soup': 54, 'hot_dog': 55, 'huevos_rancheros': 56, 'hummus': 57, 'ice_cream': 58, 'lasagna': 59, 'lobster_bisque': 60,
'lobster_roll_sandwich': 61, 'macaroni_and_cheese': 62, 'macarons': 63, 'miso_soup': 64, 'mussels': 65, 'nachos': 66, 'omelette': 67, 'onion_rings': 68, 'oysters': 69, 'pad_thai': 70,
'paella': 71, 'pancakes': 72, 'panna_cotta': 73, 'peking_duck': 74, 'pho': 75, 'pizza': 76, 'pork_chop': 77, 'poutine': 78, 'prime_rib': 79, 'pulled_pork_sandwich': 80,
'ramen': 81, 'ravioli': 82, 'red_velvet_cake': 83, 'risotto': 84, 'samosa': 85, 'sashimi': 86, 'scallops': 87, 'seaweed_salad': 88, 'shrimp_and_grits': 89, 'spaghetti_bolognese': 90,
'spaghetti_carbonara': 91, 'spring_rolls': 92, 'steak': 93, 'strawberry_shortcake': 94, 'sushi': 95, 'tacos': 96, 'takoyaki': 97, 'tiramisu': 98, 'tuna_tartare': 99, 'waffles': 100
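As a quick sanity check (a hedged sketch; it assumes the unzipped images live under food/<class_name>/ as in the unzip step above), you can confirm that exactly 101 class folders were extracted and that none are empty:

```python
import os

# Assumes the layout produced by the unzip step: food/<class_name>/*.jpg
root = "food"
classes = sorted(os.listdir(root))
print(len(classes))  # expect 101

# Count images per class to spot empty or truncated folders
for name in classes[:3]:  # first few classes only
    n = len(os.listdir(os.path.join(root, name)))
    print(name, n)
```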
2. Reading the Labels

Before training a classifier we need to know how many classes there are. The model works with numbers, not strings, so each class name (a string) must be mapped to a unique integer.
```python
txtpath = r"classes.txt"
fp = open(txtpath)
arr = []
for lines in fp.readlines():
    lines = lines.replace("\n", "")  # strip trailing newlines
    arr.append(lines)
fp.close()

number = []
for item in range(len(arr)):
    number.append(item)
categorys = dict(zip(arr, number))  # class name -> integer label
print(categorys)
```

Output (identical to the mapping listed above):

```
{'apple_pie': 0, 'baby_back_ribs': 1, ..., 'waffles': 100}
```

3. Unified Naming
Renaming all images under one scheme makes the dataset easier to check.

```python
# Gather the images into one folder and rename them uniformly
import os
from PIL import Image

categorys = arr
if not os.path.exists("temporary"):
    os.mkdir("temporary")
for category in categorys:
    # Folder holding this category's images
    path = r"food/{}/".format(category)
    count = 0
    for filename in os.listdir(path):
        img = Image.open(path + filename)
        img = img.resize((512, 512), Image.ANTIALIAS)  # resize every image to 512x512
        img = img.convert('RGB')  # required when saving as .jpg
        img.save(r"temporary/{}{}.jpg".format(category, str(count)))
        count += 1
```

4. Organizing Image Paths
Collect the image paths so the images can be fed to the network.

```python
# Collect image paths and their labels
import os
import string

train_list = open('train_list.txt', mode='w')
paths = r'temporary/'
# List the contents of the folder
dirs = os.listdir(paths)
# Iterate over the images in the directory
for path in dirs:
    # Concatenate into a full path
    imgPath = paths + path
    train_list.write(imgPath + '\t')
    for category in categorys:
        if category == path.replace(".jpg", "").rstrip(string.digits):
            train_list.write(str(categorys[category]) + '\n')
train_list.close()
```

5. Splitting Training and Validation Sets
The validation set checks whether the model overfits. The split here holds out one of every five images as validation data, i.e., an 80/20 train/validation split.

```python
# Split into training and validation sets
import shutil

train_dir = '/home/aistudio/work/trainImages'
eval_dir = '/home/aistudio/work/evalImages'
train_list_path = '/home/aistudio/train_list.txt'
target_path = "/home/aistudio/"

if not os.path.exists(train_dir):
    os.mkdir(train_dir)
if not os.path.exists(eval_dir):
    os.mkdir(eval_dir)
with open(train_list_path, 'r') as f:
    data = f.readlines()
    for i in range(len(data)):
        img_path = data[i].split('\t')[0]
        class_label = data[i].split('\t')[1][:-1]
        if i % 5 == 0:  # every fifth image goes to the validation set
            eval_target_dir = os.path.join(eval_dir, str(class_label))
            eval_img_path = os.path.join(target_path, img_path)
            if not os.path.exists(eval_target_dir):
                os.mkdir(eval_target_dir)
            shutil.copy(eval_img_path, eval_target_dir)
        else:
            train_target_dir = os.path.join(train_dir, str(class_label))
            train_img_path = os.path.join(target_path, img_path)
            if not os.path.exists(train_target_dir):
                os.mkdir(train_target_dir)
            shutil.copy(train_img_path, train_target_dir)
print('Finished splitting the training and validation sets!')
```

Output:

```
Finished splitting the training and validation sets!
```

6. Defining the Food Dataset
A crucial step in any classification pipeline is normalization. Here each pixel value is shifted and scaled from the 0-255 range into [-1, 1] (mean 127.5, std 127.5). This matters a great deal for the network downstream: without normalization the network may fail to learn anything and emit the same output for every input.
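To see the arithmetic concretely, here is a minimal sketch (the pixel values are made up for illustration) of what Normalize(mean=127.5, std=127.5) does to each channel:

```python
import numpy as np

# Hypothetical pixel values in [0, 255]
patch = np.array([0.0, 127.5, 200.0, 255.0], dtype='float32')

# The same arithmetic Normalize applies per channel: (x - mean) / std
normalized = (patch - 127.5) / 127.5
print(normalized)  # approximately [-1.  0.  0.5686  1.] -- values now lie in [-1, 1]
```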
```python
import os
import numpy as np
import paddle
from paddle.io import Dataset
from paddle.vision.datasets import DatasetFolder, ImageFolder
from paddle.vision.transforms import Compose, Resize, BrightnessTransform, ColorJitter, Normalize, Transpose

class FoodsDataset(Dataset):
    """Step 1: inherit from paddle.io.Dataset"""
    def __init__(self, mode='train'):
        """Step 2: implement the constructor, define how data is read, and split the datasets"""
        super(FoodsDataset, self).__init__()
        train_image_dir = '/home/aistudio/work/trainImages'
        eval_image_dir = '/home/aistudio/work/evalImages'
        test_image_dir = '/home/aistudio/work/evalImages'
        transform_train = Compose([Normalize(mean=[127.5, 127.5, 127.5],
                                             std=[127.5, 127.5, 127.5],
                                             data_format='HWC'),
                                   Transpose()])
        transform_eval = Compose([Normalize(mean=[127.5, 127.5, 127.5],
                                            std=[127.5, 127.5, 127.5],
                                            data_format='HWC'),
                                  Transpose()])
        train_data_folder = DatasetFolder(train_image_dir, transform=transform_train)
        eval_data_folder = DatasetFolder(eval_image_dir, transform=transform_eval)
        test_data_folder = DatasetFolder(test_image_dir)
        self.mode = mode
        if self.mode == 'train':
            self.data = train_data_folder
        elif self.mode == 'eval':
            self.data = eval_data_folder
        elif self.mode == 'test':
            self.data = test_data_folder

    def __getitem__(self, index):
        """Step 3: implement __getitem__, returning one sample (data, label) for a given index"""
        data = np.array(self.data[index][0]).astype('float32')
        if self.mode == 'test':
            return data
        else:
            label = np.array([self.data[index][1]]).astype('int64')
            return data, label

    def __len__(self):
        """Step 4: implement __len__, returning the total size of the dataset"""
        return len(self.data)

train_dataset = FoodsDataset(mode='train')
val_dataset = FoodsDataset(mode='eval')
test_dataset = FoodsDataset(mode='test')
# Inspect the training data: 80,800 training samples in total
print(len(train_dataset))
```

Output:

```
80800
```

II. Attention Mechanisms
The attention mechanism originates from research on human vision. In cognitive science, because of bottlenecks in information processing, humans selectively attend to part of the available information while ignoring the rest; this is what is usually called attention. Different regions of the human retina process information with different acuity, and only the fovea has the highest acuity. To make sensible use of limited visual processing resources, humans select a specific region of the visual field and concentrate on it. For example, when reading, people typically attend to and process only a small number of the words on the page. In short, an attention mechanism involves two things: deciding which part of the input to attend to, and allocating the limited processing resources to the important parts.
1. Overview

Human vision rapidly scans the whole scene to find the region worth focusing on, the so-called focus of attention, and then devotes more attentional resources to that region to extract finer detail about the target while suppressing irrelevant information.

This is how humans use limited attentional resources to quickly filter high-value information out of a flood of input, a survival mechanism shaped by long-term evolution. Visual attention greatly improves both the efficiency and the accuracy of visual processing.

The figure above illustrates how humans allocate limited attention when viewing an image: the red regions mark what the visual system attends to most. In a scene like this, people direct their attention to faces, the titles of text, and the opening sentences of paragraphs.

Attention in deep learning is essentially analogous to human selective visual attention: the core goal is to select, out of a mass of information, the pieces most relevant to the current task.
2. The Convolutional Block Attention Module (CBAM)

CBAM is short for Convolutional Block Attention Module. Paper: https://arxiv.org/abs/1807.06521

CBAM consists of two submodules:
- Channel Attention Module (CAM)
- Spatial Attention Module (SAM)
CBAM is a simple yet effective attention module for feed-forward convolutional networks. Given an intermediate feature map, CBAM sequentially infers attention maps along two independent dimensions, channel and spatial, and multiplies each attention map with the input feature map for adaptive feature refinement. Because CBAM is a lightweight, general-purpose module, it can be integrated seamlessly into any CNN architecture with negligible overhead and trained end to end together with the base CNN.
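In the notation of the CBAM paper, given an intermediate feature map $F \in \mathbb{R}^{C \times H \times W}$, the two refinement steps are:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F'$$

where $M_c \in \mathbb{R}^{C \times 1 \times 1}$ is the channel attention map, $M_s \in \mathbb{R}^{1 \times H \times W}$ is the spatial attention map, and $\otimes$ denotes element-wise multiplication with broadcasting.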
The authors validated CBAM with extensive experiments on ImageNet-1K and on the MS COCO and VOC 2007 detection datasets. The experiments show consistent gains in both classification and detection across a range of models, demonstrating CBAM's broad applicability.
Channel Attention Module (CAM)

The channel attention module squeezes the feature map along its spatial dimensions into a one-dimensional vector before further processing. When squeezing spatially, it uses not only average pooling but also max pooling.

Average pooling and max pooling aggregate the spatial information of the feature map; the two pooled descriptors are sent through a shared network, and the outputs are merged by element-wise summation to produce the channel attention map.

For a single image, channel attention captures which content in the image actually matters. Average pooling receives feedback from every pixel of the feature map, whereas during backpropagation max pooling routes gradients only to the location with the strongest response.
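Concretely, the paper computes the channel attention map with a shared two-layer MLP (reduction ratio $r$, weights $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$) applied to both pooled descriptors:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{\mathrm{avg}})) + W_1(W_0(F^c_{\mathrm{max}}))\big)$$

where $\sigma$ is the sigmoid function; in the code below the shared MLP is implemented as two 1x1 convolutions (fc1 and fc2).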
CAM implementation:

```python
import paddle
from paddle import nn

class CAM_Module(nn.Layer):
    def __init__(self, channels, reduction=16):
        super(CAM_Module, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(output_size=1)
        self.max_pool = nn.AdaptiveMaxPool2D(output_size=1)
        # Shared MLP implemented as two 1x1 convolutions
        self.fc1 = nn.Conv2D(in_channels=channels, out_channels=channels // reduction,
                             kernel_size=1, padding=0)
        self.relu = nn.ReLU()
        self.fc2 = nn.Conv2D(in_channels=channels // reduction, out_channels=channels,
                             kernel_size=1, padding=0)
        self.sigmoid_channel = nn.Sigmoid()

    def forward(self, x):
        # Channel Attention Module
        avg = self.relu(self.fc1(self.avg_pool(x)))
        avg = self.fc2(avg)
        mx = self.relu(self.fc1(self.max_pool(x)))
        mx = self.fc2(mx)
        x = avg + mx
        x = self.sigmoid_channel(x)  # channel attention map, shape [N, C, 1, 1]
        return x
```

Spatial Attention Module (SAM)
The spatial attention module squeezes along the channel dimension, applying both average pooling and max pooling across channels:

- MaxPool takes the maximum across channels, once for each of the H x W spatial positions;
- AvgPool takes the mean across channels, likewise once per spatial position.

The two resulting single-channel maps are then concatenated into a 2-channel feature map.
Implementation:

```python
import paddle
from paddle import nn

class SAM_Module(nn.Layer):
    def __init__(self):
        super(SAM_Module, self).__init__()
        self.conv_after_concat = nn.Conv2D(in_channels=2, out_channels=1,
                                           kernel_size=7, stride=1, padding=3)
        self.sigmoid_spatial = nn.Sigmoid()

    def forward(self, x):
        # Spatial Attention Module
        module_input = x
        avg = paddle.mean(x, axis=1, keepdim=True)  # mean over channels
        mx = paddle.max(x, axis=1, keepdim=True)    # maximum over channels
        x = paddle.concat([avg, mx], axis=1)        # 2-channel map [avg; max]
        x = self.conv_after_concat(x)
        x = self.sigmoid_spatial(x)
        x = module_input * x                        # refine the input features
        return x
```

3. Combining the Channel and Spatial Attention Modules
Combining the channel and spatial attention modules gives the convolutional block attention module, CBAM. Below is a CBAM implementation in PaddlePaddle:

```python
import paddle
from paddle import nn

class CBAM_Module(nn.Layer):
    def __init__(self, channels, reduction=16):
        super(CBAM_Module, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(output_size=1)
        self.max_pool = nn.AdaptiveMaxPool2D(output_size=1)
        self.fc1 = nn.Conv2D(in_channels=channels, out_channels=channels // reduction,
                             kernel_size=1, padding=0)
        self.relu = nn.ReLU()
        self.fc2 = nn.Conv2D(in_channels=channels // reduction, out_channels=channels,
                             kernel_size=1, padding=0)
        self.sigmoid_channel = nn.Sigmoid()
        self.conv_after_concat = nn.Conv2D(in_channels=2, out_channels=1,
                                           kernel_size=7, stride=1, padding=3)
        self.sigmoid_spatial = nn.Sigmoid()

    def forward(self, x):
        # Channel Attention Module
        module_input = x
        avg = self.relu(self.fc1(self.avg_pool(x)))
        avg = self.fc2(avg)
        mx = self.relu(self.fc1(self.max_pool(x)))
        mx = self.fc2(mx)
        x = avg + mx
        x = self.sigmoid_channel(x)
        # Spatial Attention Module
        x = module_input * x                        # apply channel attention
        module_input = x
        avg = paddle.mean(x, axis=1, keepdim=True)  # mean over channels
        mx = paddle.max(x, axis=1, keepdim=True)    # maximum over channels
        x = paddle.concat([avg, mx], axis=1)
        x = self.conv_after_concat(x)
        x = self.sigmoid_spatial(x)
        x = module_input * x                        # apply spatial attention
        return x
```

III. ResNet-CBAM: A ResNet with the Convolutional Block Attention Module
ResNet paper: https://arxiv.org/pdf/1512.03385.pdf

We add channel and spatial attention to ResNet to form ResNet-CBAM. The constraint is that ResNet's internal structure must not change: placing CBAM inside the residual blocks alters the architecture, which makes the pretrained parameters unusable. Placing CBAM after the first convolution and after the last convolution stage leaves the backbone unchanged, so the pretrained parameters still load, as the shape check below illustrates.
- (Figure: Channel Attention)
- (Figure: Spatial Attention)
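Why this placement is safe: CBAM preserves the shape of its input feature map, so inserting it between existing stages leaves the input shape of every pretrained layer unchanged. A quick shape check (a minimal sketch, assuming the CBAM_Module class defined above; the channel counts are those of ResNet-50's stem and final stage):

```python
import paddle

cbam1 = CBAM_Module(channels=64)    # matches the stem's 64 output channels
cbam2 = CBAM_Module(channels=2048)  # matches layer4's 2048 output channels

x = paddle.rand([1, 64, 56, 56])
print(cbam1(x).shape)  # [1, 64, 56, 56]: CBAM leaves the feature-map shape intact
```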
1. Code Implementation
```python
import paddle
import paddle.nn as nn
from paddle.utils.download import get_weights_path_from_url

__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152']

model_urls = {
    'resnet18': ('https://paddle-hapi.bj.bcebos.com/models/resnet18.pdparams',
                 'cf548f46534aa3560945be4b95cd11c4'),
    'resnet34': ('https://paddle-hapi.bj.bcebos.com/models/resnet34.pdparams',
                 '8d2275cf8706028345f78ac0e1d31969'),
    'resnet50': ('https://paddle-hapi.bj.bcebos.com/models/resnet50.pdparams',
                 'ca6f485ee1ab0492d38f323885b0ad80'),
    'resnet101': ('https://paddle-hapi.bj.bcebos.com/models/resnet101.pdparams',
                  '02f35f034ca3858e1e54d4036443c92d'),
    'resnet152': ('https://paddle-hapi.bj.bcebos.com/models/resnet152.pdparams',
                  '7ad16a2f1e7333859ff986138630fd7a'),
}

class BasicBlock(nn.Layer):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        self.conv1 = nn.Conv2D(inplanes, planes, 3, padding=1, stride=stride, bias_attr=False)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2D(planes, planes, 3, padding=1, bias_attr=False)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out

class BottleneckBlock(nn.Layer):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(BottleneckBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        width = int(planes * (base_width / 64.)) * groups
        self.conv1 = nn.Conv2D(inplanes, width, 1, bias_attr=False)
        self.bn1 = norm_layer(width)
        self.conv2 = nn.Conv2D(width, width, 3, padding=dilation, stride=stride,
                               groups=groups, dilation=dilation, bias_attr=False)
        self.bn2 = norm_layer(width)
        self.conv3 = nn.Conv2D(width, planes * self.expansion, 1, bias_attr=False)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU()
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out

class ResNet(nn.Layer):
    def __init__(self, block, depth, num_classes=1000, with_pool=True):
        super(ResNet, self).__init__()
        layer_cfg = {
            18: [2, 2, 2, 2],
            34: [3, 4, 6, 3],
            50: [3, 4, 6, 3],
            101: [3, 4, 23, 3],
            152: [3, 8, 36, 3]
        }
        layers = layer_cfg[depth]
        self.num_classes = num_classes
        self.with_pool = with_pool
        self._norm_layer = nn.BatchNorm2D
        self.inplanes = 64
        self.dilation = 1
        self.conv1 = nn.Conv2D(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias_attr=False)
        self.bn1 = self._norm_layer(self.inplanes)
        self.relu = nn.ReLU()
        # CBAM after the first convolution
        self.CBAM_Module1 = CBAM_Module(channels=self.inplanes)
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        # CBAM after the last convolution stage (2048 channels for ResNet-50)
        self.CBAM_Module2 = CBAM_Module(channels=2048)
        if with_pool:
            self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
        if num_classes > 0:
            self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2D(self.inplanes, planes * block.expansion, 1, stride=stride,
                          bias_attr=False),
                norm_layer(planes * block.expansion),
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, 1, 64,
                            previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, norm_layer=norm_layer))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.CBAM_Module1(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.CBAM_Module2(x)
        if self.with_pool:
            x = self.avgpool(x)
        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            x = self.fc(x)
        return x

def _resnet(arch, Block, depth, pretrained, **kwargs):
    model = ResNet(Block, depth, **kwargs)
    if pretrained:
        assert arch in model_urls, \
            "{} does not have a pretrained model yet, please set pretrained=False".format(arch)
        weight_path = get_weights_path_from_url(model_urls[arch][0], model_urls[arch][1])
        param = paddle.load(weight_path)
        model.set_dict(param)
    return model

def resnet18(pretrained=False, **kwargs):
    return _resnet('resnet18', BasicBlock, 18, pretrained, **kwargs)

def resnet34(pretrained=False, **kwargs):
    return _resnet('resnet34', BasicBlock, 34, pretrained, **kwargs)

def resnet50(pretrained=False, **kwargs):
    return _resnet('resnet50', BottleneckBlock, 50, pretrained, **kwargs)

def resnet101(pretrained=False, **kwargs):
    return _resnet('resnet101', BottleneckBlock, 101, pretrained, **kwargs)

def resnet152(pretrained=False, **kwargs):
    return _resnet('resnet152', BottleneckBlock, 152, pretrained, **kwargs)

model = resnet50(pretrained=True, num_classes=101)
```

2. Model Testing
Check that the model runs end to end: feed it an input and inspect the output. This project classifies 101 foods, so the output shape should be [1, 101].

```python
x = paddle.rand([1, 3, 512, 512])
out = model(x)
print(out)
```

Output (values abridged):

```
Tensor(shape=[1, 101], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [[ 0.23755740,  0.15992080, -0.16790576, ..., -0.35889381,  0.19253115]])
```

3. Inspecting the Network Structure
```python
import paddle

model = paddle.Model(model)
model.summary((16, 3, 512, 512))
```

Output (abridged; the repeated BottleneckBlock rows are omitted):

```
--------------------------------------------------------------------------------
 Layer (type)            Input Shape            Output Shape         Param #
================================================================================
   Conv2D-1          [[16, 3, 512, 512]]     [16, 64, 256, 256]       9,408
 BatchNorm2D-1      [[16, 64, 256, 256]]     [16, 64, 256, 256]        256
    ReLU-1          [[16, 64, 256, 256]]     [16, 64, 256, 256]         0
     ...                    ...                     ...                 ...
 CBAM_Module-1      [[16, 64, 256, 256]]     [16, 64, 256, 256]         0
  MaxPool2D-1       [[16, 64, 256, 256]]     [16, 64, 128, 128]         0
     ...                    ...                     ...                 ...
 CBAM_Module-2      [[16, 2048, 16, 16]]     [16, 2048, 16, 16]         0
AdaptiveAvgPool2D-3 [[16, 2048, 16, 16]]      [16, 2048, 1, 1]          0
   Linear-1             [[16, 2048]]             [16, 101]           206,949
================================================================================
Total params: 24,295,343
Trainable params: 24,189,103
Non-trainable params: 106,240
--------------------------------------------------------------------------------
Input size (MB): 48.00
Forward/backward pass size (MB): 22449.39
Params size (MB): 92.68
Estimated Total Size (MB): 22590.07
--------------------------------------------------------------------------------
{'total_params': 24295343, 'trainable_params': 24189103}
```

4. Model Training
```python
# Use Paddle's VisualDL callback to log training information to a directory
callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir')

def create_optim(parameters):
    step_each_epoch = len(train_dataset) // 32
    lr = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=0.01,
                                                  T_max=step_each_epoch * 10)
    return paddle.optimizer.Momentum(learning_rate=lr,
                                     parameters=parameters,
                                     weight_decay=paddle.regularizer.L2Decay(0.002))

# Training configuration
model.prepare(create_optim(model.parameters()),   # optimizer
              paddle.nn.CrossEntropyLoss(),       # loss function
              paddle.metric.Accuracy(topk=(1, 5)))  # evaluation metric

model.fit(train_dataset,
          val_dataset,
          epochs=10,
          shuffle=True,
          save_dir='./chk_points/',
          batch_size=32,
          callbacks=callback,
          verbose=1)
```

Training log (abridged; framework warnings omitted, and the source log breaks off during epoch 8):

```
The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/10
step 2525/2525 [==============================] - loss: 1.6131 - acc_top1: 0.4604 - acc_top5: 0.7048 - 1s/step
save checkpoint at /home/aistudio/chk_points/0
Eval begin...
step 632/632 [==============================] - loss: 2.3558 - acc_top1: 0.4844 - acc_top5: 0.7572 - 983ms/step
Eval samples: 20200
Epoch 2/10
step 2525/2525 [==============================] - loss: 1.4524 - acc_top1: 0.6161 - acc_top5: 0.8601 - 1s/step
save checkpoint at /home/aistudio/chk_points/1
Eval begin...
step 632/632 [==============================] - loss: 2.2584 - acc_top1: 0.5791 - acc_top5: 0.8365 - 1s/step
Eval samples: 20200
Epoch 3/10
step 2525/2525 [==============================] - loss: 1.3926 - acc_top1: 0.6323 - acc_top5: 0.8714 - 1s/step
save checkpoint at /home/aistudio/chk_points/2
Eval begin...
step 632/632 [==============================] - loss: 2.0664 - acc_top1: 0.4394 - acc_top5: 0.7272 - 966ms/step
Eval samples: 20200
Epoch 4/10
step 2525/2525 [==============================] - loss: 0.9868 - acc_top1: 0.6557 - acc_top5: 0.8828 - 1s/step
save checkpoint at /home/aistudio/chk_points/3
Eval begin...
step 632/632 [==============================] - loss: 3.4569 - acc_top1: 0.5897 - acc_top5: 0.8445 - 955ms/step
Eval samples: 20200
Epoch 5/10
step 2525/2525 [==============================] - loss: 1.4642 - acc_top1: 0.6878 - acc_top5: 0.9018 - 1s/step
save checkpoint at /home/aistudio/chk_points/4
Eval begin...
step 632/632 [==============================] - loss: 1.6742 - acc_top1: 0.6254 - acc_top5: 0.8488 - 969ms/step
Eval samples: 20200
Epoch 6/10
step 2525/2525 [==============================] - loss: 0.8651 - acc_top1: 0.7369 - acc_top5: 0.9241 - 1s/step
save checkpoint at /home/aistudio/chk_points/5
Eval begin...
step 632/632 [==============================] - loss: 1.4554 - acc_top1: 0.6617 - acc_top5: 0.8814 - 965ms/step
Eval samples: 20200
Epoch 7/10
step 2525/2525 [==============================] - loss: 0.8263 - acc_top1: 0.7954 - acc_top5: 0.9487 - 1s/step
save checkpoint at /home/aistudio/chk_points/6
Eval begin...
step 632/632 [==============================] - loss: 1.9954 - acc_top1: 0.7577 - acc_top5: 0.9257 - 951ms/step
Eval samples: 20200
Epoch 8/10
step 1730/2525 [===================>..........] - loss: 0.5766 - acc_top1: 0.8649 - acc_top5: 0.9735 - ETA: 17:04 - 1s/step
```

Save the model for inference:

```python
model.save('infer/foods', training=False)
```

5. Model Prediction
```python
# Run prediction
result = model.predict(val_dataset)
```

Output:

```
Predict begin...
step 20200/20200 [==============================] - 42ms/step
Predict samples: 20200
```

```python
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# A few sampled indices to visualize
indexs = [11, 15, 201, 999, 1000, 5778, 6656, 9199, 10384, 20030]

def get_label(predict):
    # Map the predicted integer back to its class name
    label = list(categorys.keys())[list(categorys.values()).index(predict)]
    return label

# Plotting helper
def show_img(img, predict):
    plt.figure()
    plt.title('predict: {}'.format(get_label(predict)))
    img = np.transpose(img, (1, 2, 0))  # CHW -> HWC for display
    # Note: inputs were normalized to [-1, 1], so the displayed colors look shifted
    plt.imshow(img, cmap='gray')
    plt.show()

for idx in indexs:
    show_img(val_dataset[idx][0], np.argmax(result[0][idx]))
```
IV. Summary and Reflections
When I started this project I used VGG; I tried VGG-11 and VGG-16, but VGG has too many parameters, trains slowly, and performed poorly. I then tried other networks such as MobileNetV1, MobileNetV2, and ResNet, but during training the loss stayed stuck around 4.5. So I changed course and decided to improve on an existing network instead, and that is how the residual network with the convolutional block attention module (ResNet-CBAM) came about.

After several rounds of tuning I settled on ResNet50-CBAM. In practice I found that deeper is not always better: a deeper network brings more computation, which hurts both training and inference. Weighing these trade-offs, I chose to add the attention mechanism on top of ResNet50, and it delivered solid results.
About the Author

Zheng Bopei, undergraduate (class of 2018), Automation major, College of Robotics, Beijing Union University

Baidu PaddlePaddle Developer Technical Expert (PPDE)

Member of the official PaddlePaddle support and Q&A teams

Certified member, Chaihuo Maker Space, Shenzhen

Baidu Brain conversational AI trainer

I've reached the highest level on AI Studio with 9 badges. Follow me!
https://aistudio.baidu.com/aistudio/personalcenter/thirdview/147378