Deep Learning Optimization Algorithm Implementations (Momentum, Adam)
Contents
- Momentum
  - Initialization
  - Updating the parameters
- Adam
  - Initialization
  - Updating the parameters
Besides plain gradient descent, there are several other general-purpose optimization algorithms, all of which tend to outperform it. This post only records the Momentum and Adam algorithms encountered while completing the assignments from Andrew Ng's deep learning course, and only gives brief code. For the underlying theory, see 深度学习优化算法解析(Momentum, RMSProp, Adam), which explains the three optimization algorithms from Andrew Ng's course in more detail.
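For comparison, the baseline these methods improve on is plain gradient descent, which steps each parameter directly along its gradient. A minimal sketch using the same parameter-dictionary convention as the assignment code below (the function name and toy values here are illustrative, not from the assignment):

```python
import numpy as np

def update_parameters_with_gd(parameters, grads, learning_rate):
    """Plain gradient descent: W := W - lr * dW,  b := b - lr * db."""
    L = len(parameters) // 2  # number of layers in the neural network
    for l in range(1, L + 1):
        parameters['W' + str(l)] -= learning_rate * grads['dW' + str(l)]
        parameters['b' + str(l)] -= learning_rate * grads['db' + str(l)]
    return parameters
```

Momentum and Adam both replace the raw gradient in this update with a smoothed (and, for Adam, rescaled) statistic of past gradients.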
Momentum
Initialization
```python
import numpy as np

def initialize_velocity(parameters):
    """
    Initializes the velocity as a python dictionary with:
    - keys: "dW1", "db1", ..., "dWL", "dbL"
    - values: numpy arrays of zeros of the same shape as the corresponding
      gradients/parameters.
    Arguments:
    parameters -- python dictionary containing your parameters.
        parameters['W' + str(l)] = Wl
        parameters['b' + str(l)] = bl
    Returns:
    v -- python dictionary containing the current velocity.
        v['dW' + str(l)] = velocity of dWl
        v['db' + str(l)] = velocity of dbl
    """
    L = len(parameters) // 2  # number of layers in the neural network
    v = {}
    # Initialize velocity
    for l in range(L):
        ### START CODE HERE ### (approx. 2 lines)
        v['dW' + str(l + 1)] = np.zeros(np.shape(parameters['W' + str(l + 1)]))
        v['db' + str(l + 1)] = np.zeros(np.shape(parameters['b' + str(l + 1)]))
        ### END CODE HERE ###
    return v
```

Updating the parameters
```python
def update_parameters_with_momentum(parameters, grads, v, beta, learning_rate):
    """
    Update parameters using Momentum
    Arguments:
    parameters -- python dictionary containing your parameters:
        parameters['W' + str(l)] = Wl
        parameters['b' + str(l)] = bl
    grads -- python dictionary containing your gradients for each parameter:
        grads['dW' + str(l)] = dWl
        grads['db' + str(l)] = dbl
    v -- python dictionary containing the current velocity:
        v['dW' + str(l)] = ...
        v['db' + str(l)] = ...
    beta -- the momentum hyperparameter, scalar
    learning_rate -- the learning rate, scalar
    Returns:
    parameters -- python dictionary containing your updated parameters
    v -- python dictionary containing your updated velocities
    """
    L = len(parameters) // 2  # number of layers in the neural network
    # Momentum update for each parameter
    for l in range(L):
        ### START CODE HERE ### (approx. 4 lines)
        # compute velocities
        v['dW' + str(l + 1)] = beta * v['dW' + str(l + 1)] + (1 - beta) * grads['dW' + str(l + 1)]
        v['db' + str(l + 1)] = beta * v['db' + str(l + 1)] + (1 - beta) * grads['db' + str(l + 1)]
        # update parameters
        parameters['W' + str(l + 1)] += -learning_rate * v['dW' + str(l + 1)]
        parameters['b' + str(l + 1)] += -learning_rate * v['db' + str(l + 1)]
        ### END CODE HERE ###
    return parameters, v
```

Adam
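To see the velocity term at work, here is a self-contained toy trace of the same momentum rule on a single 1x1 parameter with a constant gradient (the values are made up purely for illustration): the velocity ramps up toward the gradient over successive steps instead of jumping there immediately.

```python
import numpy as np

# Momentum rule on one toy parameter:
#   v = beta * v + (1 - beta) * dW;  W = W - lr * v
beta, lr = 0.9, 0.1
W = np.array([[1.0]])
v = np.zeros_like(W)
dW = np.array([[2.0]])  # constant gradient, for illustration only

for _ in range(3):
    v = beta * v + (1 - beta) * dW   # v: 0.2 -> 0.38 -> 0.542
    W = W - lr * v
```

With a constant gradient, v converges geometrically toward dW; with a noisy gradient, the same averaging damps the oscillations, which is the point of the method.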
Initialization
```python
def initialize_adam(parameters):
    """
    Initializes v and s as two python dictionaries with:
    - keys: "dW1", "db1", ..., "dWL", "dbL"
    - values: numpy arrays of zeros of the same shape as the corresponding
      gradients/parameters.
    Arguments:
    parameters -- python dictionary containing your parameters.
        parameters["W" + str(l)] = Wl
        parameters["b" + str(l)] = bl
    Returns:
    v -- python dictionary that will contain the exponentially weighted
         average of the gradient.
        v["dW" + str(l)] = ...
        v["db" + str(l)] = ...
    s -- python dictionary that will contain the exponentially weighted
         average of the squared gradient.
        s["dW" + str(l)] = ...
        s["db" + str(l)] = ...
    """
    L = len(parameters) // 2  # number of layers in the neural network
    v = {}
    s = {}
    # Initialize v, s. Input: "parameters". Outputs: "v, s".
    for l in range(L):
        ### START CODE HERE ### (approx. 4 lines)
        v['dW' + str(l + 1)] = np.zeros(np.shape(parameters['W' + str(l + 1)]))
        v['db' + str(l + 1)] = np.zeros(np.shape(parameters['b' + str(l + 1)]))
        s['dW' + str(l + 1)] = np.zeros(np.shape(parameters['W' + str(l + 1)]))
        s['db' + str(l + 1)] = np.zeros(np.shape(parameters['b' + str(l + 1)]))
        ### END CODE HERE ###
    return v, s
```

Updating the parameters
```python
def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01,
                                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """
    Update parameters using Adam
    Arguments:
    parameters -- python dictionary containing your parameters:
        parameters['W' + str(l)] = Wl
        parameters['b' + str(l)] = bl
    grads -- python dictionary containing your gradients for each parameter:
        grads['dW' + str(l)] = dWl
        grads['db' + str(l)] = dbl
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    t -- Adam counter: number of update steps taken so far, used for bias correction
    learning_rate -- the learning rate, scalar
    beta1 -- Exponential decay hyperparameter for the first moment estimates
    beta2 -- Exponential decay hyperparameter for the second moment estimates
    epsilon -- hyperparameter preventing division by zero in Adam updates
    Returns:
    parameters -- python dictionary containing your updated parameters
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    """
    L = len(parameters) // 2  # number of layers in the neural network
    v_corrected = {}          # bias-corrected first moment estimate
    s_corrected = {}          # bias-corrected second moment estimate
    # Perform Adam update on all parameters
    for l in range(L):
        ### START CODE HERE ### (approx. 2 lines)
        v['dW' + str(l + 1)] = beta1 * v['dW' + str(l + 1)] + (1 - beta1) * grads['dW' + str(l + 1)]
        v['db' + str(l + 1)] = beta1 * v['db' + str(l + 1)] + (1 - beta1) * grads['db' + str(l + 1)]
        ### END CODE HERE ###
        ### START CODE HERE ### (approx. 2 lines)
        v_corrected['dW' + str(l + 1)] = v['dW' + str(l + 1)] / (1 - beta1 ** t)
        v_corrected['db' + str(l + 1)] = v['db' + str(l + 1)] / (1 - beta1 ** t)
        ### END CODE HERE ###
        ### START CODE HERE ### (approx. 2 lines)
        s['dW' + str(l + 1)] = beta2 * s['dW' + str(l + 1)] + (1 - beta2) * np.square(grads['dW' + str(l + 1)])
        s['db' + str(l + 1)] = beta2 * s['db' + str(l + 1)] + (1 - beta2) * np.square(grads['db' + str(l + 1)])
        ### END CODE HERE ###
        ### START CODE HERE ### (approx. 2 lines)
        s_corrected['dW' + str(l + 1)] = s['dW' + str(l + 1)] / (1 - beta2 ** t)
        s_corrected['db' + str(l + 1)] = s['db' + str(l + 1)] / (1 - beta2 ** t)
        ### END CODE HERE ###
        ### START CODE HERE ### (approx. 2 lines)
        # Divide by the bias-corrected second moment (s_corrected), not the raw s
        parameters['W' + str(l + 1)] += -learning_rate * v_corrected['dW' + str(l + 1)] / (np.sqrt(s_corrected['dW' + str(l + 1)]) + epsilon)
        parameters['b' + str(l + 1)] += -learning_rate * v_corrected['db' + str(l + 1)] / (np.sqrt(s_corrected['db' + str(l + 1)]) + epsilon)
        ### END CODE HERE ###
    return parameters, v, s
```
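A self-contained single-step trace of the same Adam rule on one 1x1 parameter (toy values, for illustration only) shows why the bias correction matters: at t = 1 it rescales v and s, which start at zero, back to the raw gradient and its square, so the very first step has magnitude close to the learning rate rather than being vanishingly small.

```python
import numpy as np

# One Adam step on a single toy parameter, with bias correction
beta1, beta2, lr, eps = 0.9, 0.999, 0.1, 1e-8
W = np.array([[1.0]])
v = np.zeros_like(W)
s = np.zeros_like(W)
dW = np.array([[2.0]])
t = 1

v = beta1 * v + (1 - beta1) * dW       # v = 0.2
s = beta2 * s + (1 - beta2) * dW ** 2  # s = 0.004
v_hat = v / (1 - beta1 ** t)           # bias-corrected: 2.0
s_hat = s / (1 - beta2 ** t)           # bias-corrected: 4.0
W = W - lr * v_hat / (np.sqrt(s_hat) + eps)  # step of about lr in magnitude
```

Without the correction, the first update would divide 0.2 by sqrt(0.004), distorting the effective step size in the early iterations.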