A History of Neural Network Optimizers
| Method | Paper / Source | Author(s) | Year |
| --- | --- | --- | --- |
| Gradient Descent | "Méthode générale pour la résolution des systèmes d'équations simultanées" (in French) | Augustin Cauchy | 1847 |
| Early SGD | "A Stochastic Approximation Method", The Annals of Mathematical Statistics, Vol. 22, No. 3 (Sep. 1951), pp. 400-407 | Herbert Robbins and Sutton Monro | 1951 |
| Early SGD | "Stochastic Estimation of the Maximum of a Regression Function", Ann. Math. Statist., Vol. 23, No. 3 (1952), pp. 462-466 | J. Kiefer and J. Wolfowitz | 1952 |
| Momentum | "Some Methods of Speeding up the Convergence of Iteration Methods" | B. T. Polyak | 1964 |
| Nesterov's Accelerated Gradient | "A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²)" | Yu. E. Nesterov | 1983 |
| RMSProp | Proposed by the author in a lecture (unpublished) | Geoffrey Hinton | - |
| AdaGrad | "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" | John Duchi et al. | 2011 |
| AdaDelta | "ADADELTA: An Adaptive Learning Rate Method" | Matthew D. Zeiler | 2012 |
| Adam | "Adam: A Method for Stochastic Optimization", Section 1 | Diederik P. Kingma and Jimmy Lei Ba | 2015 |
| AdaMax | "Adam: A Method for Stochastic Optimization", Section 7 | Diederik P. Kingma and Jimmy Lei Ba | 2015 |
| Nadam | "Incorporating Nesterov Momentum into Adam" | Timothy Dozat | 2015 |
| SGDW | "Decoupled Weight Decay Regularization" | Ilya Loshchilov and Frank Hutter | 2017 |
| AdaBound | "Adaptive Gradient Methods with Dynamic Bound of Learning Rate" | Liangchen Luo et al. | 2019 |
| RAdam | "On the Variance of the Adaptive Learning Rate and Beyond" | Liyuan Liu et al. | 2019 |
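To make the lineage above concrete, here is a minimal NumPy sketch of the Adam update rule from Kingma and Ba's Algorithm 1; the function and variable names are my own, not from any of the cited papers.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step (Kingma & Ba, 2015, Algorithm 1). t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (squared-gradient) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for m
    v_hat = v / (1 - beta2 ** t)              # bias correction for v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note how it combines the earlier ideas in the table: the first-moment estimate is momentum (Polyak, 1964), and the per-coordinate second-moment scaling descends from AdaGrad/RMSProp.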
Mini-BGD and BGD are relaxed variants of GD: during backpropagation, the gradient is computed on a randomly selected batch or mini-batch rather than on the full dataset.
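As a minimal sketch of that idea (the data and variable names here are illustrative, not from the cited papers), mini-batch SGD on a one-parameter linear regression looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + noise
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

w = 0.0
lr, batch_size = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))                     # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]             # random mini-batch
        pred = w * X[b, 0]
        grad = 2 * np.mean((pred - y[b]) * X[b, 0])   # d/dw of mean squared error on the batch
        w -= lr * grad                                # SGD update
```

With a batch size of 1 this reduces to classic SGD; with the full dataset as one batch it reduces to BGD.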
Note: some of the works mentioned in the references below concern brain science; although related to SGD, I have not included them in the table.
References:
[1]https://stats.stackexchange.com/questions/313681/who-invented-stochastic-gradient-descent