A History of Neural Network Optimizers
| Method | Paper / Source | Author(s) | Year |
| --- | --- | --- | --- |
| Gradient Descent | "Méthode générale pour la résolution des systèmes d'équations simultanées" (in French) | Augustin Cauchy | 1847 |
| Early SGD | "A Stochastic Approximation Method", The Annals of Mathematical Statistics, Vol. 22, No. 3 (Sep. 1951), pp. 400-407 | Herbert Robbins and Sutton Monro | 1951 |
| Early SGD | "Stochastic Estimation of the Maximum of a Regression Function", Ann. Math. Statist., Vol. 23, No. 3 (1952), pp. 462-466 | J. Kiefer and J. Wolfowitz | 1952 |
| Momentum | "Some Methods of Speeding up the Convergence of Iteration Methods" | B. T. Polyak | 1964 |
| Nesterov's Accelerated Gradient | "A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²)" | Yu. E. Nesterov | 1983 |
| RMSProp | Proposed by the author in a lecture (unpublished) | Geoffrey Hinton | - |
| AdaGrad | "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" | John Duchi et al. | 2011 |
| AdaDelta | "ADADELTA: An Adaptive Learning Rate Method" | Matthew D. Zeiler | 2012 |
| Adam | "Adam: A Method for Stochastic Optimization", Section 1 | Diederik P. Kingma and Jimmy Lei Ba | 2015 |
| AdaMax | "Adam: A Method for Stochastic Optimization", Section 7 | Diederik P. Kingma and Jimmy Lei Ba | 2015 |
| Nadam | "Incorporating Nesterov Momentum into Adam" | Timothy Dozat | 2015 |
| SGDW | "Decoupled Weight Decay Regularization" | Ilya Loshchilov and Frank Hutter | 2017 |
| AdaBound | "Adaptive Gradient Methods with Dynamic Bound of Learning Rate" | Liangchen Luo et al. | 2019 |
| RAdam | "On the Variance of the Adaptive Learning Rate and Beyond" | Liyuan Liu et al. | 2019 |
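To make the lineage above concrete, here is a minimal NumPy sketch of the Adam update rule from Kingma and Ba's Algorithm 1; the function and variable names are my own, not from any of the cited papers.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step (Kingma & Ba, 2015, Algorithm 1). t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (squared-gradient) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for m
    v_hat = v / (1 - beta2 ** t)              # bias correction for v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note how it combines the earlier ideas in the table: the first-moment estimate is momentum (Polyak, 1964), and the per-coordinate second-moment scaling descends from AdaGrad/RMSProp.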
Mini-BGD and BGD are relaxed variants of GD: during backpropagation, the gradient is computed on a randomly selected batch or mini-batch rather than on the full dataset.
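As a minimal sketch of that idea (the data and variable names here are illustrative, not from the cited papers), mini-batch SGD on a one-parameter linear regression looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + noise
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

w = 0.0
lr, batch_size = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))                     # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]             # random mini-batch
        pred = w * X[b, 0]
        grad = 2 * np.mean((pred - y[b]) * X[b, 0])   # d/dw of mean squared error on the batch
        w -= lr * grad                                # SGD update
```

With a batch size of 1 this reduces to classic SGD; with the full dataset as one batch it reduces to BGD.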
Note: some of the works mentioned in the references below concern brain science; although related to SGD, I have not included them in the table.
References:
[1]https://stats.stackexchange.com/questions/313681/who-invented-stochastic-gradient-descent