【Knowledge Distillation】Distilling a ResNet Image Classifier with CoatNet
This article is reposted from https://blog.csdn.net/hhhhhhhhhhwwwwwwwwww/article/details/127787791 for archiving and study only; it will be removed immediately upon any infringement.
Table of Contents
- Abstract
- The Distillation Process
- Final Results
- Data Preparation
- Teacher Network
  - Steps
    - Importing the Required Libraries
    - Defining the Training and Validation Functions
    - Defining Global Parameters
    - Image Preprocessing and Augmentation
    - Reading the Data
    - Setting Up the Model and Loss
- Student Network
  - Steps
    - Importing the Required Libraries
    - Defining the Training and Validation Functions
    - Defining Global Parameters
    - Image Preprocessing and Augmentation
    - Reading the Data
    - Setting Up the Model and Loss
- Distilling the Student Network
  - Steps
    - Importing the Required Libraries
    - Defining the Distillation Function
    - Defining the Training and Validation Functions
    - Defining Global Parameters
    - Image Preprocessing and Augmentation
    - Reading the Data
    - Setting Up the Model and Loss
- Comparing the Results
- Summary
Abstract
Knowledge Distillation (KD) extracts the knowledge contained in an already-trained model and "distills" it into another model. Hinton first proposed knowledge distillation (dark-knowledge extraction) in "Distilling the Knowledge in a Neural Network": soft targets produced by the teacher network (complex, but with superior prediction accuracy) are added as part of the total loss to guide the training of the student network (compact, low-complexity, and better suited to inference deployment), thereby achieving knowledge transfer. Paper link: https://arxiv.org/pdf/1503.02531.pdf
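Concretely, Hinton's total loss mixes the hard-label cross-entropy with a temperature-softened KL term, with student logits $z_s$, teacher logits $z_t$, softmax $\sigma$, temperature $T$, and mixing weight $\alpha$ (the `distillation` function used later in this post follows the same form, with an extra constant factor of 2 on the soft term):

$$
L_{total} = \alpha \, T^{2} \, \mathrm{KL}\big(\sigma(z_t/T) \,\|\, \sigma(z_s/T)\big) + (1-\alpha)\,\mathrm{CE}(z_s, y)
$$

The $T^{2}$ factor compensates for the $1/T^{2}$ scaling that the temperature introduces into the gradients of the soft term, keeping the two terms on a comparable scale.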
The Distillation Process
Knowledge distillation uses a Teacher-Student setup: the teacher is the provider of the "knowledge" and the student is its recipient. The process splits into two stages:
- Train the original model: train the "Teacher" model, abbreviated Net-T. It is relatively complex and may even be an ensemble of several separately trained models. We place no restrictions on the teacher's architecture, parameter count, or whether it is an ensemble; the only requirement is that for an input X it produces an output Y which, after a softmax mapping, gives the probability of each class.
- Train the compact model: train the "Student" model, abbreviated Net-S, a single model with fewer parameters and a simpler structure. Likewise, for an input X it produces an output Y which, after softmax, also yields per-class probabilities.
- The teacher has strong learning capacity and can transfer what it has learned to the weaker student, strengthening the student's generalization. The heavy but accurate teacher is never put into production; it acts purely as a mentor, while the light, nimble student model is what is actually deployed for prediction. (A minimal sketch of this two-stage pipeline follows this list.)
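To make the two stages concrete, here is a minimal, self-contained sketch with toy linear models and random data. It is my illustration, not the original author's code; the real scripts below use coatnet_2 / resnet18 and ImageFolder data.

```python
# Toy sketch of the two-stage KD pipeline (illustration only, not the
# original post's code): stage 1 trains a teacher on hard labels,
# stage 2 trains a student on hard labels plus the frozen teacher's
# temperature-softened outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(64, 10)             # toy inputs
y = torch.randint(0, 3, (64,))      # toy labels for 3 classes

# Stage 1: train the teacher on hard labels
teacher = nn.Linear(10, 3)
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-2)
for _ in range(100):
    opt_t.zero_grad()
    F.cross_entropy(teacher(x), y).backward()
    opt_t.step()

# Stage 2: train the student against labels + the teacher's soft targets
student = nn.Linear(10, 3)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-2)
T, alpha = 7.0, 0.7
for _ in range(100):
    opt_s.zero_grad()
    s_logits = student(x)
    with torch.no_grad():
        t_logits = teacher(x)       # frozen teacher: no gradients flow back
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    hard = F.cross_entropy(s_logits, y)
    (alpha * soft + (1 - alpha) * hard).backward()
    opt_s.step()
```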
Final Results
Let me state the conclusion up front! The teacher network is coatnet_2 and the student network is ResNet18. See the table below:
| Model | Epochs | Best ACC |
| --- | --- | --- |
| coatnet_2 | 50 | 92% |
| ResNet18 | 50 | 86% |
| ResNet18 + KD | 50 | 89% |
Under identical conditions, adding knowledge distillation lifts ResNet18's accuracy by 3 points, which is a substantial gain. The accuracy curves are plotted in the "Comparing the Results" section at the end.
Data Preparation
The data is the plant-seedlings dataset I have used in earlier image-classification posts. First split it into a training set and a validation set by running the following code:
```python
import glob
import os
import shutil

image_list = glob.glob('data1/*/*.png')
print(image_list)
file_dir = 'data'
if os.path.exists(file_dir):
    print('true')
    # os.rmdir(file_dir)
    shutil.rmtree(file_dir)  # delete, then recreate
    os.makedirs(file_dir)
else:
    os.makedirs(file_dir)

from sklearn.model_selection import train_test_split

trainval_files, val_files = train_test_split(image_list, test_size=0.3, random_state=42)
train_dir = 'train'
val_dir = 'val'
train_root = os.path.join(file_dir, train_dir)
val_root = os.path.join(file_dir, val_dir)
for file in trainval_files:
    file_class = file.replace("\\", "/").split('/')[-2]
    file_name = file.replace("\\", "/").split('/')[-1]
    file_class = os.path.join(train_root, file_class)
    if not os.path.isdir(file_class):
        os.makedirs(file_class)
    shutil.copy(file, file_class + '/' + file_name)

for file in val_files:
    file_class = file.replace("\\", "/").split('/')[-2]
    file_name = file.replace("\\", "/").split('/')[-1]
    file_class = os.path.join(val_root, file_class)
    if not os.path.isdir(file_class):
        os.makedirs(file_class)
    shutil.copy(file, file_class + '/' + file_name)
```
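After the split, a quick sanity check (my addition, not part of the original post) confirms that the 70/30 split produced the class-per-folder layout that `ImageFolder` expects:

```python
# Sketch (not in the original post): count the copied files per split.
import glob

print('train:', len(glob.glob('data/train/*/*.png')))
print('val:  ', len(glob.glob('data/val/*/*.png')))
```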
Teacher Network
The teacher network is coatnet_2, a fairly large model at about 200 MB on disk. After training for 50 epochs, the best model reaches roughly 92%.
Steps
Create teacher_train.py and insert the code below.
Importing the Required Libraries
```python
import torch.optim as optim
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from torchvision import datasets
from torch.autograd import Variable
from model.coatnet import coatnet_2
import json
import os
```
Defining the Training and Validation Functions
```python
# Training loop
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    sum_loss = 0
    total_num = len(train_loader.dataset)
    print(total_num, len(train_loader))
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data).to(device), Variable(target).to(device)
        output = model(data)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print_loss = loss.data.item()
        sum_loss += print_loss
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))
    ave_loss = sum_loss / len(train_loader)
    print('epoch:{},loss:{}'.format(epoch, ave_loss))


Best_ACC = 0
# Validation loop
@torch.no_grad()
def val(model, device, test_loader):
    global Best_ACC
    model.eval()
    test_loss = 0
    correct = 0
    total_num = len(test_loader.dataset)
    print(total_num, len(test_loader))
    with torch.no_grad():
        for data, target in test_loader:
            data, target = Variable(data).to(device), Variable(target).to(device)
            output = model(data)
            loss = criterion(output, target)
            _, pred = torch.max(output.data, 1)
            correct += torch.sum(pred == target)
            print_loss = loss.data.item()
            test_loss += print_loss
        correct = correct.data.item()
        acc = correct / total_num
        avgloss = test_loss / len(test_loader)
        if acc > Best_ACC:
            torch.save(model, file_dir + '/' + 'best.pth')
            Best_ACC = acc
        print('\nVal set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            avgloss, correct, len(test_loader.dataset), 100 * acc))
    return acc
```
Defining Global Parameters
```python
if __name__ == '__main__':
    # Create the folder that stores checkpoints
    file_dir = 'CoatNet'
    if os.path.exists(file_dir):
        print('true')
        os.makedirs(file_dir, exist_ok=True)
    else:
        os.makedirs(file_dir)

    # Global parameters
    modellr = 1e-4
    BATCH_SIZE = 16
    EPOCHS = 50
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```
Image Preprocessing and Augmentation
```python
    # Data preprocessing and augmentation
    transform = transforms.Compose([
        transforms.RandomRotation(10),
        transforms.GaussianBlur(kernel_size=(5, 5), sigma=(0.1, 3.0)),
        transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.44127703, 0.4712498, 0.43714803],
                             std=[0.18507297, 0.18050247, 0.16784933])
    ])
    transform_test = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.44127703, 0.4712498, 0.43714803],
                             std=[0.18507297, 0.18050247, 0.16784933])
    ])
```
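The mean/std constants above were computed for the plant-seedlings dataset. If you substitute your own data, a sketch like the following (my addition, not from the original post) recomputes the per-channel statistics over the training set:

```python
# Sketch (not in the original post): recompute per-channel mean/std for a
# new dataset, assuming the ImageFolder layout created above.
import torch
import torchvision.transforms as transforms
from torchvision import datasets

stat_set = datasets.ImageFolder('data/train', transform=transforms.Compose([
    transforms.Resize((224, 224)), transforms.ToTensor()]))
loader = torch.utils.data.DataLoader(stat_set, batch_size=64)

n, mean, sq_mean = 0, torch.zeros(3), torch.zeros(3)
for images, _ in loader:
    b = images.size(0)
    flat = images.view(b, 3, -1)
    mean += flat.mean(2).sum(0)            # sum of per-image channel means
    sq_mean += (flat ** 2).mean(2).sum(0)  # sum of per-image channel E[x^2]
    n += b
mean /= n
std = (sq_mean / n - mean ** 2).sqrt()
print('mean:', mean, 'std:', std)
```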
Reading the Data
Use PyTorch's default data-loading pipeline.
```python
    # Read the data
    dataset_train = datasets.ImageFolder('data/train', transform=transform)
    dataset_test = datasets.ImageFolder("data/val", transform=transform_test)
    with open('class.txt', 'w') as file:
        file.write(str(dataset_train.class_to_idx))
    with open('class.json', 'w', encoding='utf-8') as file:
        file.write(json.dumps(dataset_train.class_to_idx))
    # Build the loaders
    train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = torch.utils.data.DataLoader(dataset_test, batch_size=BATCH_SIZE, shuffle=False)
```
Setting Up the Model and Loss
```python
    # Instantiate the model and move it to the GPU
    criterion = nn.CrossEntropyLoss()
    model_ft = coatnet_2()
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, 12)
    model_ft.to(DEVICE)
    # A plain Adam optimizer with a low learning rate
    optimizer = optim.Adam(model_ft.parameters(), lr=modellr)
    cosine_schedule = optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=20, eta_min=1e-9)
    # Training
    val_acc_list = {}
    for epoch in range(1, EPOCHS + 1):
        train(model_ft, DEVICE, train_loader, optimizer, epoch)
        cosine_schedule.step()
        acc = val(model_ft, DEVICE, test_loader)
        val_acc_list[epoch] = acc
        with open('result.json', 'w', encoding='utf-8') as file:
            file.write(json.dumps(val_acc_list))
    torch.save(model_ft, 'CoatNet/model_final.pth')
```
With the code above in place, you can start training the teacher network.
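Before moving on to the student, it is worth checking that the saved teacher checkpoint loads and produces 12-way logits. A minimal sketch (my addition; it assumes the `CoatNet/best.pth` written by the run above):

```python
# Sketch (not in the original post): sanity-check the saved teacher.
import torch

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
teacher = torch.load('CoatNet/best.pth', map_location=DEVICE)
teacher.eval()
with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224, device=DEVICE)
    print(teacher(dummy).shape)  # expect torch.Size([1, 12])
```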
Student Network
The student network is ResNet18, a much smaller model at about 40 MB on disk. After training for 50 epochs, the best model reaches roughly 86%.
Steps
Create student_train.py and insert the code below.
Importing the Required Libraries
```python
import torch.optim as optim
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from torchvision import datasets
from torch.autograd import Variable
from torchvision.models.resnet import resnet18
import json
import os
```
Defining the Training and Validation Functions
```python
# Training loop
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    sum_loss = 0
    total_num = len(train_loader.dataset)
    print(total_num, len(train_loader))
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data).to(device), Variable(target).to(device)
        output = model(data)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print_loss = loss.data.item()
        sum_loss += print_loss
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))
    ave_loss = sum_loss / len(train_loader)
    print('epoch:{},loss:{}'.format(epoch, ave_loss))


Best_ACC = 0
# Validation loop
@torch.no_grad()
def val(model, device, test_loader):
    global Best_ACC
    model.eval()
    test_loss = 0
    correct = 0
    total_num = len(test_loader.dataset)
    print(total_num, len(test_loader))
    with torch.no_grad():
        for data, target in test_loader:
            data, target = Variable(data).to(device), Variable(target).to(device)
            output = model(data)
            loss = criterion(output, target)
            _, pred = torch.max(output.data, 1)
            correct += torch.sum(pred == target)
            print_loss = loss.data.item()
            test_loss += print_loss
        correct = correct.data.item()
        acc = correct / total_num
        avgloss = test_loss / len(test_loader)
        if acc > Best_ACC:
            torch.save(model, file_dir + '/' + 'best.pth')
            Best_ACC = acc
        print('\nVal set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            avgloss, correct, len(test_loader.dataset), 100 * acc))
    return acc
```
Defining Global Parameters
```python
if __name__ == '__main__':
    # Create the folder that stores checkpoints
    file_dir = 'resnet'
    if os.path.exists(file_dir):
        print('true')
        os.makedirs(file_dir, exist_ok=True)
    else:
        os.makedirs(file_dir)

    # Global parameters
    modellr = 1e-4
    BATCH_SIZE = 16
    EPOCHS = 50
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```
Image Preprocessing and Augmentation
```python
    # Data preprocessing and augmentation
    transform = transforms.Compose([
        transforms.RandomRotation(10),
        transforms.GaussianBlur(kernel_size=(5, 5), sigma=(0.1, 3.0)),
        transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.44127703, 0.4712498, 0.43714803],
                             std=[0.18507297, 0.18050247, 0.16784933])
    ])
    transform_test = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.44127703, 0.4712498, 0.43714803],
                             std=[0.18507297, 0.18050247, 0.16784933])
    ])
```
Reading the Data
Use PyTorch's default data-loading pipeline.
```python
    # Read the data
    dataset_train = datasets.ImageFolder('data/train', transform=transform)
    dataset_test = datasets.ImageFolder("data/val", transform=transform_test)
    with open('class.txt', 'w') as file:
        file.write(str(dataset_train.class_to_idx))
    with open('class.json', 'w', encoding='utf-8') as file:
        file.write(json.dumps(dataset_train.class_to_idx))
    # Build the loaders
    train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = torch.utils.data.DataLoader(dataset_test, batch_size=BATCH_SIZE, shuffle=False)
```
Setting Up the Model and Loss
```python
    # Instantiate the model and move it to the GPU
    criterion = nn.CrossEntropyLoss()
    model_ft = resnet18()
    print(model_ft)
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, 12)
    model_ft.to(DEVICE)
    # A plain Adam optimizer with a low learning rate
    optimizer = optim.Adam(model_ft.parameters(), lr=modellr)
    cosine_schedule = optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=20, eta_min=1e-9)
    # Training
    val_acc_list = {}
    for epoch in range(1, EPOCHS + 1):
        train(model_ft, DEVICE, train_loader, optimizer, epoch)
        cosine_schedule.step()
        acc = val(model_ft, DEVICE, test_loader)
        val_acc_list[epoch] = acc
        with open('result_student.json', 'w', encoding='utf-8') as file:
            file.write(json.dumps(val_acc_list))
    torch.save(model_ft, 'resnet/model_final.pth')
```
With the code above in place, you can start training the student network.
Distilling the Student Network
The student network is still ResNet18; this time it is distilled with the teacher network. After training for 50 epochs, the final accuracy is 89%.
Steps
Create student_kd_train.py and insert the code below.
Importing the Required Libraries
```python
import torch.optim as optim
import torch
import torch.nn as nn
import torch.nn.functional as F  # needed by the distillation loss below (missing from the original listing)
import torch.nn.parallel
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from torchvision import datasets
from torch.autograd import Variable
from torchvision.models.resnet import resnet18
import json
import os
```
Defining the Distillation Function
```python
def distillation(y, labels, teacher_scores, temp, alpha):
    # Soft term: KL divergence between the temperature-softened student and
    # teacher distributions, scaled by temp^2; hard term: plain cross-entropy.
    return nn.KLDivLoss()(F.log_softmax(y / temp, dim=1),
                          F.softmax(teacher_scores / temp, dim=1)) * (temp * temp * 2.0 * alpha) \
           + F.cross_entropy(y, labels) * (1. - alpha)
```
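To get a feel for the temperature, compare a softmax of the same logits at T=1 and at T=7 (the value used below). A quick sketch, my addition:

```python
# Sketch (not in the original post): temperature flattens the distribution,
# exposing the relative scores of the wrong classes ("dark knowledge").
import torch
import torch.nn.functional as F

logits = torch.tensor([[8.0, 2.0, 1.0]])
print(F.softmax(logits, dim=1))        # ~[0.997, 0.002, 0.001]: nearly one-hot
print(F.softmax(logits / 7.0, dim=1))  # ~[0.558, 0.237, 0.205]: much softer
```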
Defining the Training and Validation Functions
```python
# Training loop with distillation
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    sum_loss = 0
    total_num = len(train_loader.dataset)
    print(total_num, len(train_loader))
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        teacher_output = teacher_model(data)      # get the teacher's logits
        teacher_output = teacher_output.detach()  # block backprop into the teacher
        loss = distillation(output, target, teacher_output, temp=7.0, alpha=0.7)  # student learns from labels + teacher
        loss.backward()
        optimizer.step()
        print_loss = loss.data.item()
        sum_loss += print_loss
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))
    ave_loss = sum_loss / len(train_loader)
    print('epoch:{},loss:{}'.format(epoch, ave_loss))


Best_ACC = 0
# Validation loop
@torch.no_grad()
def val(model, device, test_loader):
    global Best_ACC
    model.eval()
    test_loss = 0
    correct = 0
    total_num = len(test_loader.dataset)
    print(total_num, len(test_loader))
    with torch.no_grad():
        for data, target in test_loader:
            data, target = Variable(data).to(device), Variable(target).to(device)
            output = model(data)
            loss = criterion(output, target)
            _, pred = torch.max(output.data, 1)
            correct += torch.sum(pred == target)
            print_loss = loss.data.item()
            test_loss += print_loss
        correct = correct.data.item()
        acc = correct / total_num
        avgloss = test_loss / len(test_loader)
        if acc > Best_ACC:
            torch.save(model, file_dir + '/' + 'best.pth')
            Best_ACC = acc
        print('\nVal set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            avgloss, correct, len(test_loader.dataset), 100 * acc))
    return acc
```
Defining Global Parameters
```python
if __name__ == '__main__':
    # Create the folder that stores checkpoints
    file_dir = 'resnet_kd'
    if os.path.exists(file_dir):
        print('true')
        os.makedirs(file_dir, exist_ok=True)
    else:
        os.makedirs(file_dir)

    # Global parameters
    modellr = 1e-4
    BATCH_SIZE = 16
    EPOCHS = 50
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```
Image Preprocessing and Augmentation
```python
    # Data preprocessing and augmentation
    transform = transforms.Compose([
        transforms.RandomRotation(10),
        transforms.GaussianBlur(kernel_size=(5, 5), sigma=(0.1, 3.0)),
        transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.44127703, 0.4712498, 0.43714803],
                             std=[0.18507297, 0.18050247, 0.16784933])
    ])
    transform_test = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.44127703, 0.4712498, 0.43714803],
                             std=[0.18507297, 0.18050247, 0.16784933])
    ])
```
Reading the Data
Use PyTorch's default data-loading pipeline.
```python
    # Read the data
    dataset_train = datasets.ImageFolder('data/train', transform=transform)
    dataset_test = datasets.ImageFolder("data/val", transform=transform_test)
    with open('class.txt', 'w') as file:
        file.write(str(dataset_train.class_to_idx))
    with open('class.json', 'w', encoding='utf-8') as file:
        file.write(json.dumps(dataset_train.class_to_idx))
    # Build the loaders
    train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = torch.utils.data.DataLoader(dataset_test, batch_size=BATCH_SIZE, shuffle=False)
```
Setting Up the Model and Loss
```python
    # Instantiate the models and move them to the GPU
    criterion = nn.CrossEntropyLoss()
    model_ft = resnet18()
    print(model_ft)
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, 12)
    model_ft.to(DEVICE)
    # Load the trained teacher and put it in eval mode. The original listing
    # never defined teacher_model, which train() requires; the path below
    # assumes the checkpoint written by teacher_train.py.
    teacher_model = torch.load('CoatNet/best.pth', map_location=DEVICE)
    teacher_model.eval()
    # A plain Adam optimizer with a low learning rate
    optimizer = optim.Adam(model_ft.parameters(), lr=modellr)
    cosine_schedule = optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=20, eta_min=1e-9)
    # Training
    val_acc_list = {}
    for epoch in range(1, EPOCHS + 1):
        train(model_ft, DEVICE, train_loader, optimizer, epoch)
        cosine_schedule.step()
        acc = val(model_ft, DEVICE, test_loader)
        val_acc_list[epoch] = acc
        # Write result_kd.json so the comparison script below can find it (the
        # original wrote result_student.json, clobbering the student run)
        with open('result_kd.json', 'w', encoding='utf-8') as file:
            file.write(json.dumps(val_acc_list))
    torch.save(model_ft, 'resnet_kd/model_final.pth')
```
With the code above in place, distillation can begin!
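Once training finishes, the distilled student can be used on its own for inference. A minimal sketch (my addition; the example image path is a hypothetical placeholder):

```python
# Sketch (not in the original post): single-image inference with the
# distilled student; 'data/val/some_class/example.png' is a placeholder.
import json
import torch
import torchvision.transforms as transforms
from PIL import Image

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('resnet_kd/best.pth', map_location=DEVICE)
model.eval()

transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.44127703, 0.4712498, 0.43714803],
                         std=[0.18507297, 0.18050247, 0.16784933])])

with open('class.json', encoding='utf-8') as f:
    idx_to_class = {v: k for k, v in json.load(f).items()}

img = Image.open('data/val/some_class/example.png').convert('RGB')
with torch.no_grad():
    pred = model(transform_test(img).unsqueeze(0).to(DEVICE)).argmax(1).item()
print(idx_to_class[pred])
```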
Comparing the Results
Load the saved result files and plot the accuracy curves.
```python
import numpy as np
from matplotlib import pyplot as plt
import json

teacher_file = 'result.json'
student_file = 'result_student.json'
student_kd_file = 'result_kd.json'

def read_json(file):
    with open(file, 'r', encoding='utf8') as fp:
        json_data = json.load(fp)
        print(json_data)
    return json_data

teacher_data = read_json(teacher_file)
student_data = read_json(student_file)
student_kd_data = read_json(student_kd_file)

x = [int(x) for x in list(dict(teacher_data).keys())]
print(x)

plt.plot(x, list(teacher_data.values()), label='teacher')
plt.plot(x, list(student_data.values()), label='student without KD')
plt.plot(x, list(student_kd_data.values()), label='student with KD')
plt.title('Test accuracy')
plt.legend()
plt.show()
```
Summary
Knowledge distillation is a common technique for compressing and improving lightweight models. This post walked through a simple example of distilling a student network with a teacher network.
The code and dataset used in this walkthrough are available at:
https://download.csdn.net/download/hhhhhhhhhhwwwwwwwwww/86947893