Parameter Initialization Methods in PyTorch
Weight Initialization
PyTorch's default parameter initialization lives in each layer's reset_parameters() method. For example, nn.Linear and nn.Conv2d both default to a uniform distribution over [-limit, limit], where limit = 1 / sqrt(fan_in) and fan_in is the number of input units of the parameter tensor.
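As a quick sanity check of that default, the sketch below builds an `nn.Linear` (the layer sizes are arbitrary, chosen only for illustration) and verifies that every default-initialized weight and bias falls inside `[-1/sqrt(fan_in), 1/sqrt(fan_in)]`:

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

fan_in, fan_out = 100, 10          # arbitrary sizes for this check
layer = nn.Linear(fan_in, fan_out)

# The default bound described above: 1 / sqrt(fan_in)
limit = 1.0 / math.sqrt(fan_in)    # 0.1

# Both weight and bias lie inside [-limit, limit]
print(layer.weight.abs().max().item() <= limit)  # True
print(layer.bias.abs().max().item() <= limit)    # True
```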
Below are several common initialization methods.
Xavier Initialization
The idea behind Xavier initialization is to keep the variance of the inputs and outputs the same, which prevents all output values from collapsing toward 0. It is a general-purpose method that works with any activation function.
```python
# default method
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
```

You can also pass a `gain` argument to scale the initialization's standard deviation to match a particular activation function:

```python
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('relu'))
```

References:
- Understanding the difficulty of training deep feedforward neural networks
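A small experiment (sizes arbitrary, chosen only for the demo) makes the variance-preservation idea concrete: with Xavier initialization and `fan_in == fan_out`, a unit-variance input keeps roughly unit variance after the linear map:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fan = 512                          # arbitrary width for the demo
w = torch.empty(fan, fan)
nn.init.xavier_uniform_(w)         # Var(w) = 2 / (fan_in + fan_out)

x = torch.randn(4096, fan)         # unit-variance input
y = x @ w.t()

# Output variance stays close to the input variance of 1.0
print(abs(y.var().item() - 1.0) < 0.2)  # True
```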
He et al. Initialization
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
The idea behind He initialization is that in a ReLU network, roughly half of the neurons in each layer are activated while the other half output 0. It is the recommended choice for ReLU networks.
```python
# he initialization
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, mode='fan_in')
```

Orthogonal Initialization
Mainly used to combat vanishing and exploding gradients in deep networks; it is a common choice for initializing RNN parameters.
```python
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.orthogonal_(m.weight)
```

Batchnorm Initialization
Before a nonlinear activation, we want the values to follow a well-behaved distribution (e.g. Gaussian) so that gradients are easy to compute and parameters easy to update. Batch Normalization forces the outputs through a Gaussian normalization followed by a linear transform: y = γ · (x − μ) / sqrt(σ² + ε) + β.
Implementation:
```python
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
```

Single-layer Initialization
```python
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
nn.init.xavier_uniform_(conv1.weight)
nn.init.constant_(conv1.bias, 0.1)
```

Model Initialization
```python
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv2d') != -1:
        nn.init.xavier_normal_(m.weight.data)
        nn.init.constant_(m.bias.data, 0.0)
    elif classname.find('Linear') != -1:
        nn.init.xavier_normal_(m.weight)
        nn.init.constant_(m.bias, 0.0)

net = Net()
net.apply(weights_init)  # apply() recursively visits every module in the network and calls the given function on each of them
```

Avoid relying on members whose names start with an underscore: they are internal and may change without notice. A better approach is to check whether a module is an instance of a given type:
```python
def weights_init(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(m.weight)
        nn.init.constant_(m.bias, 0.0)
```

All the Initialization Methods
```python
import torch
import torch.nn as nn

w = torch.empty(2, 3)

# 1. Uniform distribution - U(a, b)
# torch.nn.init.uniform_(tensor, a=0, b=1)
nn.init.uniform_(w)
# tensor([[ 0.0578,  0.3402,  0.5034],
#         [ 0.7865,  0.7280,  0.6269]])

# 2. Normal distribution - N(mean, std)
# torch.nn.init.normal_(tensor, mean=0, std=1)
nn.init.normal_(w)
# tensor([[ 0.3326,  0.0171, -0.6745],
#         [ 0.1669,  0.1747,  0.0472]])

# 3. Constant - fixed value val
# torch.nn.init.constant_(tensor, val)
nn.init.constant_(w, 0.3)
# tensor([[ 0.3000,  0.3000,  0.3000],
#         [ 0.3000,  0.3000,  0.3000]])

# 4. Ones on the diagonal, zeros everywhere else
# torch.nn.init.eye_(tensor)
nn.init.eye_(w)
# tensor([[ 1.,  0.,  0.],
#         [ 0.,  1.,  0.]])

# 5. Dirac delta initialization, only for {3, 4, 5}-dimensional torch.Tensor
# torch.nn.init.dirac_(tensor)
w1 = torch.empty(3, 16, 5, 5)
nn.init.dirac_(w1)

# 6. xavier_uniform initialization
# torch.nn.init.xavier_uniform_(tensor, gain=1)
# From - Understanding the difficulty of training deep feedforward neural networks - Bengio 2010
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
# tensor([[ 1.3374,  0.7932, -0.0891],
#         [-1.3363, -0.0206, -0.9346]])

# 7. xavier_normal initialization
# torch.nn.init.xavier_normal_(tensor, gain=1)
nn.init.xavier_normal_(w)
# tensor([[-0.1777,  0.6740,  0.1139],
#         [ 0.3018, -0.2443,  0.6824]])

# 8. kaiming_uniform initialization
# From - Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He Kaiming 2015
# torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
# tensor([[ 0.6426, -0.9582, -1.1783],
#         [-0.0515, -0.4975,  1.3237]])

# 9. kaiming_normal initialization
# torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
# tensor([[ 0.2530, -0.4382,  1.5995],
#         [ 0.0544,  1.6392, -2.0752]])

# 10. (Semi-)orthogonal matrix
# From - Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe 2013
# torch.nn.init.orthogonal_(tensor, gain=1)
nn.init.orthogonal_(w)
# tensor([[ 0.5786, -0.5642, -0.5890],
#         [-0.7517, -0.0886, -0.6536]])

# 11. Sparse matrix
# Non-zero elements are drawn from the normal distribution N(0, 0.01).
# From - Deep learning via Hessian-free optimization - Martens 2010
# torch.nn.init.sparse_(tensor, sparsity, std=0.01)
nn.init.sparse_(w, sparsity=0.1)
# tensor(1.00000e-03 *
#        [[-0.3382,  1.9501, -1.7761],
#         [ 0.0000,  0.0000,  0.0000]])
```

Xavier Uniform Distribution
torch.nn.init.xavier_uniform_(tensor, gain=1)
Fills the input tensor with values drawn from a uniform distribution U(−a, a), where a = gain * sqrt(6 / (fan_in + fan_out)) (equivalently, gain * sqrt(2 / (fan_in + fan_out)) * sqrt(3)). The gain is set according to the activation function, e.g. `nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))`. The method comes from Glorot, X. and Bengio, Y., "Understanding the difficulty of training deep feedforward neural networks", and is also known as Glorot initialization. Parameters: `tensor` – an n-dimensional torch.Tensor; `gain` – an optional scaling factor.

```python
import torch
from torch import nn

w = torch.empty(3, 5)
nn.init.xavier_uniform_(w, gain=1)
print(w)
```

Xavier Normal Distribution
torch.nn.init.xavier_normal_(tensor, gain=1)
Fills the input tensor with values drawn from a normal distribution with mean = 0 and std = gain * sqrt(2 / (fan_in + fan_out)). Also derived in Glorot, X. and Bengio, Y. (2010), "Understanding the difficulty of training deep feedforward neural networks", and known as Glorot initialization. Parameters: `tensor` – an n-dimensional torch.Tensor; `gain` – an optional scaling factor.

```python
b = torch.empty(3, 4)
nn.init.xavier_normal_(b, gain=1)
print(b)
```

The Kaiming methods come from the paper "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". The derivation again starts from the variance-consistency principle; Kaiming initialization was proposed because Xavier initialization performs poorly with ReLU-family activations. See the paper for details.

Kaiming Uniform Distribution
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
Fills the tensor with values drawn from a uniform distribution U(−bound, bound), where bound = sqrt(6 / ((1 + a²) * fan_in)) and `a` is the negative slope of the activation (0 for ReLU). `mode` – either 'fan_in' or 'fan_out': 'fan_in' keeps the variance consistent in the forward pass, 'fan_out' keeps it consistent in the backward pass. `nonlinearity` – 'relu' or 'leaky_relu'; the default is 'leaky_relu'. Example: `nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')`.

```python
w = torch.empty(3, 5)
nn.init.kaiming_uniform_(w, a=0, mode='fan_in')
print(w)
```

Kaiming Normal Distribution
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
Fills the tensor with values drawn from a zero-mean normal distribution N(0, std), where std = sqrt(2 / ((1 + a²) * fan_in)) and `a` is the negative slope of the activation (0 for ReLU). `mode` and `nonlinearity` behave as in the uniform variant. Example: `nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')`.

Other Methods
Uniform Initialization
torch.nn.init.uniform_(tensor, a=0, b=1)
Fills the tensor with values drawn from the uniform distribution U(a, b).
- `tensor` – an n-dimensional torch.Tensor
- `a` – lower bound of the uniform distribution
- `b` – upper bound of the uniform distribution
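For instance (bounds chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

w = torch.empty(3, 5)
nn.init.uniform_(w, a=-0.5, b=0.5)   # arbitrary bounds

# Every entry lies in [a, b]
print(w.min().item() >= -0.5 and w.max().item() <= 0.5)  # True
```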
Normal Initialization
torch.nn.init.normal_(tensor, mean=0, std=1)
Fills the tensor with values drawn from the normal distribution N(mean, std); the defaults are mean 0 and std 1.
- `tensor` – an n-dimensional torch.Tensor
- `mean` – mean of the normal distribution
- `std` – standard deviation of the normal distribution
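A quick check (tensor size and std chosen arbitrarily) that the sample statistics match the requested parameters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

w = torch.empty(1000, 1000)          # large tensor so the estimates are tight
nn.init.normal_(w, mean=0.0, std=0.02)

print(round(w.mean().item(), 3))     # ≈ 0.0
print(round(w.std().item(), 3))      # ≈ 0.02
```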
Constant Initialization
torch.nn.init.constant_(tensor, val)
Fills the tensor with the constant value `val`, e.g. `nn.init.constant_(w, 0.3)`.
Identity Initialization
torch.nn.init.eye_(tensor)
Initializes a 2-dimensional tensor with the identity matrix.
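For example:

```python
import torch
import torch.nn as nn

w = torch.empty(4, 4)
nn.init.eye_(w)

print(torch.equal(w, torch.eye(4)))  # True
```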
Orthogonal Initialization
torch.nn.init.orthogonal_(tensor, gain=1)
Makes the tensor (semi-)orthogonal. Paper: "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks" – Saxe, A. et al. (2013).
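For a wide matrix the rows come out orthonormal, which is easy to verify:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

w = torch.empty(3, 5)
nn.init.orthogonal_(w)

# For rows <= cols, the rows are orthonormal: w @ w.T ≈ I
print(torch.allclose(w @ w.t(), torch.eye(3), atol=1e-5))  # True
```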
Sparse Initialization
torch.nn.init.sparse_(tensor, sparsity, std=0.01)
Fills the 2-D tensor sparsely: in each column a fraction of the entries is set to zero, and the non-zero elements are drawn from the normal distribution N(0, std).
- `sparsity` – the fraction of elements in each column to be set to zero
- `std` – standard deviation of the normal distribution used for the non-zero values
`nn.init.sparse_(w, sparsity=0.1)`
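A quick check of the per-column sparsity (tensor size chosen arbitrarily):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

w = torch.empty(10, 5)
nn.init.sparse_(w, sparsity=0.2)     # zero out 20% of each column

zeros_per_col = (w == 0).sum(dim=0)
# ceil(0.2 * 10) = 2 entries per column are zeroed
print((zeros_per_col >= 2).all().item())  # True
```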
Dirac Initialization
torch.nn.init.dirac_(tensor)
Fills a {3, 4, 5}-dimensional tensor with the Dirac delta function, preserving the identity of as many input channels as possible in a convolutional layer. Parameters: `tensor` – a {3, 4, 5}-dimensional torch.Tensor.

```python
w = torch.empty(3, 16, 5, 5)
nn.init.dirac_(w)
print(w)
print(w.sum())  # tensor(3.)
```

Computing the Gain: calculate_gain
torch.nn.init.calculate_gain(nonlinearity, param=None)
For a given nonlinearity, returns the recommended gain value. Parameters: `nonlinearity` – the nonlinear function (an `nn.functional` name); `param` – an optional parameter of the nonlinear function.

```python
from torch import nn

gain = nn.init.calculate_gain('leaky_relu')
print(gain)  # 1.4141428569978354
```

| Nonlinearity | Recommended gain |
| --- | --- |
| Linear / Identity | 1 |
| Conv{1,2,3}D | 1 |
| Sigmoid | 1 |
| Tanh | 5/3 |
| ReLU | sqrt(2) |
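The table values can be reproduced directly:

```python
import math

from torch import nn

print(nn.init.calculate_gain('linear'))      # 1
print(nn.init.calculate_gain('tanh'))        # 5/3 ≈ 1.6667
print(nn.init.calculate_gain('relu'))        # sqrt(2) ≈ 1.4142
# leaky_relu uses the default negative_slope of 0.01
print(nn.init.calculate_gain('leaky_relu'))  # sqrt(2 / (1 + 0.01**2))
```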