AnnaAraslanova/FBNet Code Analysis
AnnaAraslanova/FBNet is a comparatively good third-party implementation of FBNet. Latency is approximated with measurements taken on an x86 processor. Note the following:
- PyTorch GPU data parallelism places requirements on the input data;
- using plain BN layers directly inside the stochastic supernet seems questionable.
supernet_main_file.py
`train_supernet` trains the stochastic supernet.
`sample_architecture_from_the_supernet` selects the best architecture from it.
```python
if __name__ == "__main__":
    assert args.train_or_sample in ['train', 'sample']
    if args.train_or_sample == 'train':
        train_supernet()
    elif args.train_or_sample == 'sample':
        assert args.architecture_name != '' and args.architecture_name not in MODEL_ARCH
        hardsampling = False if args.hardsampling_bool_value in ['False', '0'] else True
        sample_architecture_from_the_supernet(unique_name_of_arch=args.architecture_name, hardsampling=hardsampling)
```
train_supernet
(Flowchart: train_supernet, driven by config_for_supernet, calls create_directories_from_list, get_logger, SummaryWriter, LookUpTable, get_loaders, get_test_loader, FBNet_Stochastic_SuperNet, weights_init, SupernetLoss, check_tensor_in_list, CosineAnnealingLR, TrainerSupernet and TrainerSupernet.train_loop.)

The random seed is set first to make the run reproducible.
```python
manual_seed = 1
np.random.seed(manual_seed)
torch.manual_seed(manual_seed)
torch.cuda.manual_seed_all(manual_seed)
torch.backends.cudnn.benchmark = True
```
`CONFIG_SUPERNET` stores the configuration parameters of the supernet. `create_directories_from_list` creates the TensorBoard log directory.
`get_logger` builds a logger from the given file path and sets its format.
`SummaryWriter` creates an asynchronous writer for TensorBoard events.
After version 1.7 the parameter was renamed to `logdir`.
```python
create_directories_from_list([CONFIG_SUPERNET['logging']['path_to_tensorboard_logs']])

logger = get_logger(CONFIG_SUPERNET['logging']['path_to_log_file'])
writer = SummaryWriter(log_dir=CONFIG_SUPERNET['logging']['path_to_tensorboard_logs'])
```
`LookUpTable` writes its results to a file.
```python
#### LookUp table consists all information about layers
lookup_table = LookUpTable(calulate_latency=CONFIG_SUPERNET['lookup_table']['create_from_scratch'])
```
`get_loaders` splits the training and validation datasets.
```python
#### DataLoading
train_w_loader, train_thetas_loader = get_loaders(CONFIG_SUPERNET['dataloading']['w_share_in_train'],
                                                  CONFIG_SUPERNET['dataloading']['batch_size'],
                                                  CONFIG_SUPERNET['dataloading']['path_to_save_data'],
                                                  logger)
test_loader = get_test_loader(CONFIG_SUPERNET['dataloading']['batch_size'],
                              CONFIG_SUPERNET['dataloading']['path_to_save_data'])
```
`FBNet_Stochastic_SuperNet` is then instantiated.
`nn.Module.apply` applies `fn` recursively to every submodule (as returned by `.children()`) as well as `self`. Typical use includes initializing the parameters of a model (see also `torch.nn.init`).
Why is `weights_init` applied here instead of doing the initialization inside the modules themselves?
There is no functionality for resuming training from a saved snapshot.
`torch.nn.DataParallel` implements data parallelism at the module level. The container parallelizes the application of the given `module` by splitting the input across the specified devices, chunking along the batch dimension (other objects are copied once per device). In the forward pass, the module is replicated on each device and each replica handles a portion of the input. During the backward pass, gradients from each replica are summed into the original module. The batch size should be larger than the number of GPUs used.
See also: Use nn.DataParallel instead of multiprocessing.
Arbitrary positional and keyword inputs may be passed to DataParallel, but some types are handled specially. Tensors are scattered on the given `dim` (default 0). Tuples, lists and dicts are shallow-copied. Other types are shared between the different threads and can be corrupted if written to in the model's forward pass.
Before running this DataParallel module, the parallelized `module` must have its parameters and buffers on `device_ids[0]`.
On every forward pass, the module is replicated on each device, so any update to the running module in `forward` is lost. For example, if `module` has a counter attribute that is incremented in each `forward`, it will stay at its initial value, because the updates are made on the replicas, which are destroyed after `forward`. However, DataParallel guarantees that the replica on `device[0]` shares its parameters and buffers with the base parallelized `module`, so in-place updates to parameters and buffers on `device[0]` are recorded. For example, `BatchNorm2d` and `spectral_norm()` rely on this behavior to update their buffers.
Forward and backward hooks defined on `module` and its submodules are invoked `len(device_ids)` times, each with inputs located on a particular device. In particular, hooks are only guaranteed to execute in the correct order with respect to operations on the corresponding device. For example, it is not guaranteed that a hook set via `register_forward_pre_hook()` runs before all `len(device_ids)` `forward()` calls, but each such hook runs before the corresponding device's `forward()` call.
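A minimal sketch of how the supernet might be built, initialized and wrapped, following the description above; `cnt_classes` and `device_ids` are assumptions, not the repository's exact arguments:

```python
import torch.nn as nn

# Hedged sketch: build the supernet, initialize weights recursively, then wrap it.
model = FBNet_Stochastic_SuperNet(lookup_table, cnt_classes=10).cuda()  # params/buffers on device 0 first
model = model.apply(weights_init)                                       # nn.Module.apply as described above
model = nn.DataParallel(model, device_ids=[0, 1])                       # replicate per device, split the batch
```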
The network weights and the architecture parameters are attached to separate optimizers.
`SupernetLoss` computes the loss that includes the latency term.
`torch.optim.lr_scheduler.CosineAnnealingLR` sets the learning rate of each parameter group using a cosine annealing schedule, where $\eta_{max}$ is set to the initial lr and $T_{cur}$ is the number of epochs since the last restart in SGDR:
$$
\begin{aligned}
\eta_{t+1} &= \eta_{min} + (\eta_t - \eta_{min})\,\frac{1 + \cos\!\left(\frac{T_{cur}+1}{T_{max}}\pi\right)}{1 + \cos\!\left(\frac{T_{cur}}{T_{max}}\pi\right)}, & T_{cur} &\neq (2k+1)T_{max}; \\
\eta_{t+1} &= \eta_{t} + (\eta_{max} - \eta_{min})\,\frac{1 - \cos\!\left(\frac{1}{T_{max}}\pi\right)}{2}, & T_{cur} &= (2k+1)T_{max}.
\end{aligned}
$$
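A minimal sketch of this optimizer and scheduler setup. The hyperparameter values and the name-based parameter split are assumptions (the repository separates parameters with `check_tensor_in_list`); this is not the exact code:

```python
import torch

# Hedged sketch: separate optimizers for weights w and architecture parameters theta,
# with cosine annealing applied only to the weight optimizer.
thetas_params  = [p for name, p in model.named_parameters() if 'thetas' in name]
weights_params = [p for name, p in model.named_parameters() if 'thetas' not in name]

w_optimizer = torch.optim.SGD(weights_params, lr=0.1, momentum=0.9, weight_decay=1e-4)
theta_optimizer = torch.optim.Adam(thetas_params, lr=0.01, weight_decay=5e-4)

criterion = SupernetLoss().cuda()  # cross-entropy combined with the latency term
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer,
    T_max=CONFIG_SUPERNET['train_settings']['cnt_epochs'],
    eta_min=0.001)
```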
`TrainerSupernet` wraps the training procedure.
```python
#### Training Loop
trainer = TrainerSupernet(criterion, w_optimizer, theta_optimizer, w_scheduler, logger, writer)
trainer.train_loop(train_w_loader, train_thetas_loader, test_loader, model)
```
get_logger
""" Make python logger """# [!] Since tensorboardX use default logger (e.g. logging.info()), we should use custom loggerlogger = logging.getLogger('fbnet')log_format = '%(asctime)s | %(message)s'formatter = logging.Formatter(log_format, datefmt='%m/%d %I:%M:%S %p')file_handler = logging.FileHandler(file_path)file_handler.setFormatter(formatter)stream_handler = logging.StreamHandler()stream_handler.setFormatter(formatter)logger.addHandler(file_handler)logger.addHandler(stream_handler)logger.setLevel(logging.INFO)return loggerLookUpTable
(Flowchart: LookUpTable takes candidate_blocks and search_space, calls _generate_layers_parameters, then _create_from_operations if calulate_latency is set, otherwise _create_from_file.)

CANDIDATE_BLOCKS lists the 9 blocks from Table 2 of the paper; their detailed parameters live in PRIMITIVES.
| Block | Expansion | Kernel | Group |
| --- | --- | --- | --- |
| k3_e1 | 1 | 3 | 1 |
| k3_e1_g2 | 1 | 3 | 2 |
| k3_e3 | 3 | 3 | 1 |
| k3_e6 | 6 | 3 | 1 |
| k5_e1 | 1 | 5 | 1 |
| k5_e1_g2 | 1 | 5 | 2 |
| k5_e3 | 3 | 5 | 1 |
| k5_e6 | 6 | 5 | 1 |
| skip | - | - | - |
SEARCH_SPACE corresponds to the network structure in Table 1 of the paper (the code's SEARCH_SPACE covers only the TBS layers).
| Input shape | Block | f (channels) | n (layers) | s (stride) |
| --- | --- | --- | --- | --- |
| $224^2 \times 3$ | 3x3 conv | 16 | 1 | 2 |
| $112^2 \times 16$ | TBS | 16 | 1 | 1 |
| $112^2 \times 16$ | TBS | 24 | 4 | 2 |
| $56^2 \times 24$ | TBS | 32 | 4 | 2 |
| $28^2 \times 32$ | TBS | 64 | 4 | 2 |
| $14^2 \times 64$ | TBS | 112 | 4 | 1 |
| $14^2 \times 112$ | TBS | 184 | 4 | 2 |
| $7^2 \times 184$ | TBS | 352 | 1 | 1 |
| $7^2 \times 352$ | 1x1 conv | 1984 | 1 | 1 |
| $7^2 \times 1504~(1984)$ | 7x7 avgpool | - | 1 | 1 |
| $1504$ | fc | 1000 | 1 | - |
The number of layers is inferred from the number of input shapes in `search_space`.
The operation dictionary `self.lookup_table_operations` is created.
`_generate_layers_parameters` parses the layer parameters and the input shapes from SEARCH_SPACE.
`_create_from_operations` measures the latency of every operation and writes it to a file.
`_read_lookup_table_from_file` reads the results back from that file.
```python
# lookup_table
self.lookup_table_latency = None
if calulate_latency:
    self._create_from_operations(cnt_of_runs=CONFIG_SUPERNET['lookup_table']['number_of_runs'],
                                 write_to_file=CONFIG_SUPERNET['lookup_table']['path_to_lookup_table'])
else:
    self._create_from_file(path_to_file=CONFIG_SUPERNET['lookup_table']['path_to_lookup_table'])
```
_generate_layers_parameters
`_generate_layers_parameters` reads the parameters from the `search_space` dictionary and builds the per-layer parameter list `layers_parameters`. The parameter order here has to match PRIMITIVES.
```python
# layers_parameters are : C_in, C_out, expansion, stride
layers_parameters = [(search_space["input_shape"][layer_id][0],
                      search_space["channel_size"][layer_id],
                      # expansion (set to -999) embedded into operation and will not be considered
                      # (look fbnet_building_blocks/fbnet_builder.py - this is facebookresearch code
                      # and I don't want to modify it)
                      -999,
                      search_space["strides"][layer_id]) for layer_id in range(self.cnt_layers)]

# layers_input_shapes are (C_in, input_w, input_h)
layers_input_shapes = search_space["input_shape"]

return layers_parameters, layers_input_shapes
```
_create_from_operations
(Call graph: _create_from_operations calls _calculate_latency and _write_lookup_table_to_file.)
```python
self.lookup_table_latency = self._calculate_latency(self.lookup_table_operations,
                                                    self.layers_parameters,
                                                    self.layers_input_shapes,
                                                    cnt_of_runs)
if write_to_file is not None:
    self._write_lookup_table_to_file(write_to_file)
```
_calculate_latency
`latency_table_layer_by_ops` creates one dictionary per TBS layer to record the latency of each operation.
Random input data is generated; `globals()` returns a dictionary representing the current global symbol table. This is always the dictionary of the current module (inside a function or method, this is the module where it is defined, not the module from which it is called).
`timeit.timeit` creates a Timer instance from the given statement, setup code and timer function, and runs its `timeit()` method with `number` executions. The optional `globals` argument specifies a namespace in which to execute the code.
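A minimal sketch of measuring one operation's latency this way; the function and argument names are assumptions, not the repository's exact code:

```python
import timeit
import torch

def measure_op_latency(op, input_shape, cnt_of_runs=50):
    """Hedged sketch: average forward time (seconds) of `op` on random data."""
    globals_dict = {'op': op, 'x': torch.randn(1, *input_shape)}
    total = timeit.timeit('op(x)', globals=globals_dict, number=cnt_of_runs)
    return total / cnt_of_runs
```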
_write_lookup_table_to_file
(Call graph: _write_lookup_table_to_file calls clear_files_in_the_list and add_text_to_file.) `clear_files_in_the_list` empties any existing files.
`ops` is the list of operation names; the first line of the file prints these names.
The latency of each operation is then printed, one TBS layer per line.
`add_text_to_file` saves the result as a file.
_create_from_file
(Call graph: _create_from_file calls _read_lookup_table_from_file.)
```python
self.lookup_table_latency = self._read_lookup_table_from_file(path_to_file)
```
_read_lookup_table_from_file
The results are read from the file; the first line holds the operation names.
```python
latences = [line.strip('\n') for line in open(path_to_file)]
ops_names = latences[0].split(" ")
latences = [list(map(float, layer.split(" "))) for layer in latences[1:]]

lookup_table_latency = [{op_name: latences[i][op_id]
                         for op_id, op_name in enumerate(ops_names)
                        } for i in range(self.cnt_layers)]
return lookup_table_latency
```
get_loaders
Random crop, random horizontal flip, and normalization.
```python
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
])
train_data = datasets.CIFAR10(root=path_to_save_data, train=True, download=True, transform=train_transform)
```
Index lists are created to split the dataset.
`torch.utils.data.SubsetRandomSampler` samples elements randomly from a given list of indices, without replacement.
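A minimal sketch of the split, assuming `w_share_in_train` is the fraction of the training set used for weight training; the exact arguments are assumptions, not the repository's code:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

def split_train_loaders(train_data, w_share_in_train, batch_size):
    """Hedged sketch: the first part of the indices trains w, the rest trains theta."""
    num_train = len(train_data)
    split = int(np.floor(w_share_in_train * num_train))
    indices = list(range(num_train))

    train_w_loader = DataLoader(train_data, batch_size=batch_size,
                                sampler=SubsetRandomSampler(indices[:split]),
                                pin_memory=True, num_workers=2)
    train_thetas_loader = DataLoader(train_data, batch_size=batch_size,
                                     sampler=SubsetRandomSampler(indices[split:]),
                                     pin_memory=True, num_workers=2)
    return train_w_loader, train_thetas_loader
```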
get_test_loader
The test set is only normalized.
```python
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
])
test_data = datasets.CIFAR10(root=path_to_save_data, train=False,
                             download=True, transform=test_transform)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          shuffle=False, num_workers=16)
return test_loader
```
FBNet_Stochastic_SuperNet
(Structure: FBNet_Stochastic_SuperNet is composed of ConvBNRelu and MixedOperation modules.) `ConvBNRelu` builds the basic block; only the convolution parameters are initialized.
`torch.nn.ModuleList` holds submodules in a list. A ModuleList can be indexed like a regular Python list, but the modules it contains are properly registered and visible to all Module methods.
`MixedOperation` runs the list of candidate operations and accumulates the latency-weighted sum.
forward
The network is abstracted as:
(first → stages_to_search → last_stages)
```python
y = self.first(x)
for mixed_op in self.stages_to_search:
    y, latency_to_accumulate = mixed_op(y, temperature, latency_to_accumulate)
y = self.last_stages(y)
return y, latency_to_accumulate
```
MixedOperation
`MixedOperation` builds the operation list, the latency list and the corresponding parameters from the `proposed_operations` dictionary.
The keys of `proposed_operations` are extracted into the list `ops_names`; `latency` is a dictionary.
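A minimal sketch of what the constructor described here might look like; the argument names and the uniform initialization of `thetas` are assumptions, not the repository's exact code:

```python
import torch
import torch.nn as nn

class MixedOperation(nn.Module):
    # Hedged sketch of the constructor described above.
    def __init__(self, layer_parameters, proposed_operations, latency):
        super(MixedOperation, self).__init__()
        ops_names = [op_name for op_name in proposed_operations]
        # instantiate every candidate block for this layer
        self.ops = nn.ModuleList([proposed_operations[op_name](*layer_parameters)
                                  for op_name in ops_names])
        # per-operation latency from the lookup table, same order as self.ops
        self.latency = [latency[op_name] for op_name in ops_names]
        # architecture parameters theta_l, one scalar per candidate operation
        self.thetas = nn.Parameter(torch.Tensor([1.0 / len(ops_names) for _ in ops_names]))
```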
forward
$$
\begin{aligned}
m_{l,i} &= \text{GumbelSoftmax}(\theta_{l,i} \mid \theta_l) \\
&= \frac{\exp[(\theta_{l,i} + g_{l,i})/\tau]}{\sum_i \exp[(\theta_{l,i} + g_{l,i})/\tau]},
\end{aligned}
$$

$$
x_{l+1} = \sum_i m_{l,i} \cdot b_{l,i}(x_l),
$$

$$
\text{LAT}(a) = \sum_l \sum_i m_{l,i} \cdot \text{LAT}(b_{l,i}).
$$
`torch.nn.functional.gumbel_softmax` samples from the Gumbel-Softmax distribution ([Concrete Distribution], [Gumbel-Softmax]) and optionally discretizes.
Parameters:
- logits: [..., num_features] unnormalized log probabilities
- tau: non-negative scalar temperature
- hard: if True, the returned samples are discretized as one-hot vectors, but differentiated as if they were the soft samples in autograd
- dim (int): the dimension along which softmax is computed. Default: -1.
Returns:
Sampled tensor of the same shape as logits, drawn from the Gumbel-Softmax distribution. If hard=True, the returned samples are one-hot; otherwise they are probability distributions that sum to 1 across dim.
This function is here for legacy reasons and may be removed from nn.functional in the future.
The main trick for `hard` is to compute y_hard - y_soft.detach() + y_soft.
This achieves two things (see the sketch after this list):
- it makes the output value exactly one-hot (since we add and then subtract the y_soft value);
- it makes the gradient equal to the y_soft gradient (since we detach all other gradients).
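A minimal self-contained illustration of that straight-through trick; it mirrors the idea, not the exact library source:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5, requires_grad=True)
tau = 1.0

# soft Gumbel-Softmax sample
gumbels = -torch.empty_like(logits).exponential_().log()   # Gumbel(0, 1) noise
y_soft = F.softmax((logits + gumbels) / tau, dim=-1)

# straight-through: the forward value is exactly one-hot,
# while the gradient is that of y_soft (all other paths are detached)
index = y_soft.argmax(dim=-1, keepdim=True)
y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
y = y_hard - y_soft.detach() + y_soft
```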
Here `self.thetas` needs a `torch.Tensor.unsqueeze` to become 2-dimensional.
```python
soft_mask_variables = nn.functional.gumbel_softmax(self.thetas, temperature)
output = sum(m * op(x) for m, op in zip(soft_mask_variables, self.ops))
latency = sum(m * lat for m, lat in zip(soft_mask_variables, self.latency))
latency_to_accumulate = latency_to_accumulate + latency
return output, latency_to_accumulate
```
weights_init
`weights_init` only initializes convolution and fully connected layers.
```python
if deepth > max_depth:
    return
if isinstance(m, torch.nn.Conv2d):
    torch.nn.init.kaiming_uniform_(m.weight.data)
    if m.bias is not None:
        torch.nn.init.constant_(m.bias.data, 0)
elif isinstance(m, torch.nn.Linear):
    m.weight.data.normal_(0, 0.01)
    if m.bias is not None:
        m.bias.data.zero_()
elif isinstance(m, torch.nn.BatchNorm2d):
    return
elif isinstance(m, torch.nn.ReLU):
    return
elif isinstance(m, torch.nn.Module):
    deepth += 1
    for m_ in m.modules():
        weights_init(m_, deepth)
else:
    raise ValueError("%s is unk" % m.__class__.__name__)
```
SupernetLoss
```python
def __init__(self):
    super(SupernetLoss, self).__init__()
    self.alpha = CONFIG_SUPERNET['loss']['alpha']
    self.beta = CONFIG_SUPERNET['loss']['beta']
    self.weight_criterion = nn.CrossEntropyLoss()
```
forward
$$
\mathcal{L}(a, w_a) = \text{CE}(a, w_a) \cdot \alpha \log(\text{LAT}(a))^\beta.
$$
The mean of `torch.log(latency ** self.beta)` needs to be taken.
`self.beta` should be applied outside the log, i.e. `torch.log(latency) ** self.beta`; inside the log it only rescales the term by a constant, since $\log(x^\beta) = \beta \log x$, so it loses its intended effect as an exponent.
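A hedged sketch of a loss closer to the paper's formula, averaging the per-device latency first and applying beta outside the log (the repository's own forward follows below):

```python
import torch

def supernet_loss_paper_form(ce, latency, alpha, beta):
    """Hedged sketch of CE(a, w_a) * alpha * log(LAT(a))^beta; not the repository's code."""
    lat = torch.log(latency.mean()) ** beta
    return alpha * ce * lat
```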
```python
ce = self.weight_criterion(outs, targets)
lat = torch.log(latency ** self.beta)

losses_ce.update(ce.item(), N)
losses_lat.update(lat.item(), N)

loss = self.alpha * ce * lat
return loss  # .unsqueeze(0)
```
TrainerSupernet
`AverageMeter` accumulates values and reports their running average.
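A typical implementation looks roughly like this (a sketch matching the `update`/`get_avg`/`reset` calls used below, not necessarily the repository's exact code):

```python
class AverageMeter:
    """Hedged sketch: keeps a running sum and count to report the average."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.sum = 0.0
        self.cnt = 0

    def update(self, val, n=1):
        self.sum += val * n
        self.cnt += n

    def get_avg(self):
        return self.sum / self.cnt if self.cnt > 0 else 0.0
```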
```python
def __init__(self, criterion, w_optimizer, theta_optimizer, w_scheduler, logger, writer):
    self.top1       = AverageMeter()
    self.top3       = AverageMeter()
    self.losses     = AverageMeter()
    self.losses_lat = AverageMeter()
    self.losses_ce  = AverageMeter()

    self.logger = logger
    self.writer = writer

    self.criterion = criterion
    self.w_optimizer = w_optimizer
    self.theta_optimizer = theta_optimizer
    self.w_scheduler = w_scheduler

    self.temperature = CONFIG_SUPERNET['train_settings']['init_temperature']
    self.exp_anneal_rate = CONFIG_SUPERNET['train_settings']['exp_anneal_rate']  # apply it every epoch
    self.cnt_epochs = CONFIG_SUPERNET['train_settings']['cnt_epochs']
    self.train_thetas_from_the_epoch = CONFIG_SUPERNET['train_settings']['train_thetas_from_the_epoch']
    self.print_freq = CONFIG_SUPERNET['train_settings']['print_freq']
    self.path_to_save_model = CONFIG_SUPERNET['train_settings']['path_to_save_model']
```
train_loop
(Call graph: train_loop calls _training_step and _validate.) The network weights are first trained alone for `self.train_thetas_from_the_epoch` epochs.
Each call to `_training_step` trains one epoch; the name is not very descriptive.
After that, weights and architecture parameters are trained alternately. The alternating updates reduce efficiency to some extent.
```python
for epoch in range(self.train_thetas_from_the_epoch, self.cnt_epochs):
    self.writer.add_scalar('learning_rate/weights', self.w_optimizer.param_groups[0]['lr'], epoch)
    self.writer.add_scalar('learning_rate/theta', self.theta_optimizer.param_groups[0]['lr'], epoch)

    self.logger.info("Start to train weights for epoch %d" % (epoch))
    self._training_step(model, train_w_loader, self.w_optimizer, epoch, info_for_logger="_w_step_")
    self.w_scheduler.step()

    self.logger.info("Start to train theta for epoch %d" % (epoch))
    self._training_step(model, train_thetas_loader, self.theta_optimizer, epoch, info_for_logger="_theta_step_")

    top1_avg = self._validate(model, test_loader, epoch)
    if best_top1 < top1_avg:
        best_top1 = top1_avg
        self.logger.info("Best top1 acc by now. Save model")
        save(model, self.path_to_save_model)

    self.temperature = self.temperature * self.exp_anneal_rate
```
_training_step
The `latency_to_accumulate` variable has to be constructed explicitly, with one element per device.
`_intermediate_stats_logging` records the loss, top-1, top-3, cross-entropy and latency terms.
`_epoch_stats_logging` writes the per-epoch statistics to TensorBoard.
_validate
Validates the accuracy.
```python
model.eval()
start_time = time.time()

with torch.no_grad():
    for step, (X, y) in enumerate(loader):
        X, y = X.cuda(), y.cuda()
        N = X.shape[0]

        latency_to_accumulate = torch.Tensor([[0.0]]).cuda()
        outs, latency_to_accumulate = model(X, self.temperature, latency_to_accumulate)
        loss = self.criterion(outs, y, latency_to_accumulate, self.losses_ce, self.losses_lat, N)

        self._intermediate_stats_logging(outs, y, loss, step, epoch, N, len_loader=len(loader), val_or_train="Valid")

top1_avg = self.top1.get_avg()
self._epoch_stats_logging(start_time=start_time, epoch=epoch, val_or_train='val')
for avg in [self.top1, self.top3, self.losses]:
    avg.reset()
return top1_avg
```
_intermediate_stats_logging
`accuracy` computes the accuracy.
```python
prec1, prec3 = accuracy(outs, y, topk=(1, 5))  # note: topk=(1, 5), so the "top3" meters actually track top-5
self.losses.update(loss.item(), N)
self.top1.update(prec1.item(), N)
self.top3.update(prec3.item(), N)
```
Information is logged when the step count hits the print interval, or at the final step.
```python
if (step > 1 and step % self.print_freq == 0) or step == len_loader - 1:
    self.logger.info(val_or_train +
        ": [{:3d}/{}] Step {:03d}/{:03d} Loss {:.3f} "
        "Prec@(1,3) ({:.1%}, {:.1%}), ce_loss {:.3f}, lat_loss {:.3f}".format(
            epoch + 1, self.cnt_epochs, step, len_loader - 1, self.losses.get_avg(),
            self.top1.get_avg(), self.top3.get_avg(), self.losses_ce.get_avg(), self.losses_lat.get_avg()))
```
_epoch_stats_logging
Writes the per-epoch statistics to TensorBoard.
```python
self.writer.add_scalar('train_vs_val/'+val_or_train+'_loss'+info_for_logger, self.losses.get_avg(), epoch)
self.writer.add_scalar('train_vs_val/'+val_or_train+'_top1'+info_for_logger, self.top1.get_avg(), epoch)
self.writer.add_scalar('train_vs_val/'+val_or_train+'_top3'+info_for_logger, self.top3.get_avg(), epoch)
self.writer.add_scalar('train_vs_val/'+val_or_train+'_losses_lat'+info_for_logger, self.losses_lat.get_avg(), epoch)
self.writer.add_scalar('train_vs_val/'+val_or_train+'_losses_ce'+info_for_logger, self.losses_ce.get_avg(), epoch)

top1_avg = self.top1.get_avg()
self.logger.info(info_for_logger+val_or_train + ": [{:3d}/{}] Final Prec@1 {:.4%} Time {:.2f}".format(
    epoch+1, self.cnt_epochs, top1_avg, time.time() - start_time))
```
accuracy
`torch.topk` returns the k largest elements of the given input tensor along a given dimension. If `dim` is not given, the last dimension of the input is chosen. If `largest` is False, the k smallest elements are returned. A namedtuple of (values, indices) is returned, where the indices are the indices of the elements in the original input tensor. If the boolean option `sorted` is True, the returned k elements are guaranteed to be sorted.
""" Computes the precision@k for the specified values of k """maxk = max(topk)batch_size = target.size(0)_, pred = output.topk(maxk, 1, True, True)pred = pred.t()# one-hot caseif target.ndimension() > 1:target = target.max(1)[1]correct = pred.eq(target.view(1, -1).expand_as(pred))res = []for k in topk:correct_k = correct[:k].view(-1).float().sum(0)res.append(correct_k.mul_(1.0 / batch_size))return resPRIMITIVES
"skip": lambda C_in, C_out, expansion, stride, **kwargs: Identity(C_in, C_out, stride),"ir_k3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, **kwargs),"ir_k5": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=5, **kwargs),"ir_k7": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=7, **kwargs),"ir_k1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=1, **kwargs),"shuffle": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, shuffle_type="mid", pw_group=4, **kwargs),"basic_block": lambda C_in, C_out, expansion, stride, **kwargs: CascadeConv3x3(C_in, C_out, stride),"shift_5x5": lambda C_in, C_out, expansion, stride, **kwargs: ShiftBlock5x5(C_in, C_out, expansion, stride),# layer search 2"ir_k3_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, **kwargs),"ir_k3_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=3, **kwargs),"ir_k3_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=3, **kwargs),"ir_k3_s4": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 4, stride, kernel=3, shuffle_type="mid", pw_group=4, **kwargs),"ir_k5_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=5, **kwargs),"ir_k5_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=5, **kwargs),"ir_k5_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=5, **kwargs),"ir_k5_s4": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 4, stride, kernel=5, shuffle_type="mid", pw_group=4, **kwargs),# layer search se"ir_k3_e1_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, se=True, **kwargs),"ir_k3_e3_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=3, se=True, **kwargs),"ir_k3_e6_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=3, se=True, **kwargs),"ir_k3_s4_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in,C_out,4,stride,kernel=3,shuffle_type="mid",pw_group=4,se=True,**kwargs),"ir_k5_e1_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=5, se=True, **kwargs),"ir_k5_e3_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=5, se=True, **kwargs),"ir_k5_e6_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=5, se=True, **kwargs),"ir_k5_s4_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in,C_out,4,stride,kernel=5,shuffle_type="mid",pw_group=4,se=True,**kwargs),# layer search 3 (in addition to layer search 2)"ir_k3_s2": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, shuffle_type="mid", pw_group=2, **kwargs),"ir_k5_s2": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=5, shuffle_type="mid", pw_group=2, **kwargs),"ir_k3_s2_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in,C_out,1,stride,kernel=3,shuffle_type="mid",pw_group=2,se=True,**kwargs),"ir_k5_s2_se": lambda C_in, C_out, expansion, stride, **kwargs: 
IRFBlock(C_in,C_out,1,stride,kernel=5,shuffle_type="mid",pw_group=2,se=True,**kwargs),# layer search 4 (in addition to layer search 3)"ir_k3_sep": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=3, cdw=True, **kwargs),"ir_k33_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, cdw=True, **kwargs),"ir_k33_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=3, cdw=True, **kwargs),"ir_k33_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=3, cdw=True, **kwargs),# layer search 5 (in addition to layer search 4)"ir_k7_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=7, **kwargs),"ir_k7_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=7, **kwargs),"ir_k7_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=7, **kwargs),"ir_k7_sep": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=7, cdw=True, **kwargs),"ir_k7_sep_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=7, cdw=True, **kwargs),"ir_k7_sep_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=7, cdw=True, **kwargs),"ir_k7_sep_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=7, cdw=True, **kwargs), }ConvBNRelu
The `ConvBNRelu` module has many options; BN can be disabled, or `FrozenBatchNorm2d` can be used instead. By comparison, `nn.BatchNorm2d(C_out, affine=affine)` is the more common choice.
```python
def __init__(self, input_depth, output_depth, kernel, stride, pad, no_bias, use_relu, bn_type, group=1, *args, **kwargs):
    super(ConvBNRelu, self).__init__()

    assert use_relu in ["relu", None]
    if isinstance(bn_type, (list, tuple)):
        assert len(bn_type) == 2
        assert bn_type[0] == "gn"
        gn_group = bn_type[1]
        bn_type = bn_type[0]
    assert bn_type in ["bn", "af", "gn", None]
    assert stride in [1, 2, 4]

    op = Conv2d(input_depth, output_depth, kernel_size=kernel, stride=stride,
                padding=pad, bias=not no_bias, groups=group, *args, **kwargs)
    nn.init.kaiming_normal_(op.weight, mode="fan_out", nonlinearity="relu")
    if op.bias is not None:
        nn.init.constant_(op.bias, 0.0)
    self.add_module("conv", op)

    if bn_type == "bn":
        bn_op = BatchNorm2d(output_depth)
    elif bn_type == "gn":
        bn_op = nn.GroupNorm(num_groups=gn_group, num_channels=output_depth)
    elif bn_type == "af":
        bn_op = FrozenBatchNorm2d(output_depth)
    if bn_type is not None:
        self.add_module("bn", bn_op)

    if use_relu == "relu":
        self.add_module("relu", nn.ReLU(inplace=True))
```
sample_architecture_from_the_supernet
(Flowchart: sample_architecture_from_the_supernet → argument verification → get_logger → LookUpTable → FBNet_Stochastic_SuperNet → load → hardsampling? If yes, take the argmax; if no, apply softmax → writh_new_ARCH_to_fbnet_modeldef → End.)

Load the model. Since `save` stores a model wrapped in `torch.nn.DataParallel`, the input to `load` has to match; its `module` attribute holds the original model.
`numpy.linspace` returns evenly spaced numbers over a specified interval.
scipy.special.softmax
With hard sampling, each TBS layer directly takes the operation with the largest $\theta$; otherwise the probabilities are computed:
$$
P_{\theta_l}(b_l = b_{l,i}) = \text{softmax}(\theta_{l,i}; \theta_l) = \frac{\exp(\theta_{l,i})}{\sum_i \exp(\theta_{l,i})}, \qquad
P_{\theta}(a) = \prod_l P_{\theta_l}\!\left(b_l = b_{l,i}^{(a)}\right).
$$
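A minimal sketch of both branches; the variable names and the sampling in the soft branch are assumptions about the repository's behavior, not its exact code:

```python
import numpy as np
from scipy.special import softmax

# Hedged sketch: pick one operation per TBS layer from the trained thetas.
ops_names = []
for mixed_op in model.module.stages_to_search:          # model is the DataParallel wrapper
    thetas = mixed_op.thetas.detach().cpu().numpy()
    if hardsampling:
        chosen = int(np.argmax(thetas))                  # take the op with the largest theta
    else:
        probs = softmax(thetas)                          # P(b_l = b_{l,i})
        chosen = int(np.random.choice(len(probs), p=probs))
    ops_names.append(list(CANDIDATE_BLOCKS)[chosen])
```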
`writh_new_ARCH_to_fbnet_modeldef` then writes the sampled architecture out.
load
```python
model.load_state_dict(torch.load(model_path))
```
writh_new_ARCH_to_fbnet_modeldef
`MODEL_ARCH` stores the model architecture definitions.
Check whether the name already exists.
`ops_names` is converted into the string list `ops`, which is then grouped by stage and joined into `ops_lines`.
```python
### create text to insert
text_to_write = " \"" + my_unique_name_for_ARCH + "\": {\n\
\"block_op_type\": [\n"
ops = ["[\"" + str(op) + "\"], " for op in ops_names]
ops_lines = [ops[0], ops[1:5], ops[5:9], ops[9:13], ops[13:17], ops[17:21], ops[21]]
ops_lines = [''.join(line) for line in ops_lines]
text_to_write += ' ' + '\n '.join(ops_lines)
```
The per-layer dimension information is recorded next; `e` is the expansion ratio.
```python
e = [(op_name[-1] if op_name[-2] == 'e' else '1') for op_name in ops_names]
text_to_write += "\n\
],\n\
\"block_cfg\": {\n\
\"first\": [16, 2],\n\
\"stages\": [\n\
[["+e[0]+", 16, 1, 1]], # stage 1\n\
[["+e[1]+", 24, 1, 2]], [["+e[2]+", 24, 1, 1]], \
[["+e[3]+", 24, 1, 1]], [["+e[4]+", 24, 1, 1]], # stage 2\n\
[["+e[5]+", 32, 1, 2]], [["+e[6]+", 32, 1, 1]], \
[["+e[7]+", 32, 1, 1]], [["+e[8]+", 32, 1, 1]], # stage 3\n\
[["+e[9]+", 64, 1, 2]], [["+e[10]+", 64, 1, 1]], \
[["+e[11]+", 64, 1, 1]], [["+e[12]+", 64, 1, 1]], # stage 4\n\
[["+e[13]+", 112, 1, 1]], [["+e[14]+", 112, 1, 1]], \
[["+e[15]+", 112, 1, 1]], [["+e[16]+", 112, 1, 1]], # stage 5\n\
[["+e[17]+", 184, 1, 2]], [["+e[18]+", 184, 1, 1]], \
[["+e[19]+", 184, 1, 1]], [["+e[20]+", 184, 1, 1]], # stage 6\n\
[["+e[21]+", 352, 1, 1]], # stage 7\n\
],\n\
\"backbone\": [num for num in range(23)],\n\
},\n\
},\n\
}\
"
```
`./fbnet_building_blocks/fbnet_modeldef.py` is read, the new text is appended, and the file is written back.
The closing brace at the end of the file has to be skipped.
`next` retrieves the next item from the iterator by calling its `__next__()` method. If `default` is given, it is returned when the iterator is exhausted; otherwise StopIteration is raised.
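A hedged sketch of that read, skip-the-brace, append and rewrite step; not necessarily the repository's exact code, and it assumes `text_to_write` re-closes the MODEL_ARCH dictionary itself:

```python
with open('./fbnet_building_blocks/fbnet_modeldef.py', 'r') as f:
    lines = f.readlines()

# use next() to find the last line that is only the closing brace and cut the file there
end = len(lines) - next(i for i, line in enumerate(reversed(lines)) if line.strip() == '}') - 1

with open('./fbnet_building_blocks/fbnet_modeldef.py', 'w') as f:
    f.writelines(lines[:end])
    f.write(text_to_write)
```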
References:
- Lambda Lambda Lambda
- Print lists in Python (4 Different Ways)
- Optional: Data Parallelism