TensorFlow Learning Notes (13): A Summary of Commonly Used TensorFlow Optimizers
Since deep learning mostly comes down to optimizing with respect to gradients, the optimizers below are, in the end, all variants of the gradient descent algorithm.
Commonly used optimizer classes
Ⅰ.class tf.train.Optimizer
Base class for optimizers. This class defines the API for adding the operations that train a model. You will basically never use this class directly; instead you will use one of its subclasses, such as GradientDescentOptimizer, AdagradOptimizer, or MomentumOptimizer.
Below, GradientDescentOptimizer is covered in some detail, including its methods; for the other classes only the constructor is described, because the remaining methods are much the same across subclasses.
Ⅱ.class tf.train.GradientDescentOptimizer
This class implements the gradient descent optimizer. (As the theory would suggest, the constructor only needs a learning rate.)
__init__(learning_rate, use_locking=False, name='GradientDescent')
Purpose: creates a new gradient descent optimizer (a short sketch follows the argument list).
Args:
learning_rate: A Tensor or a floating point value. The learning rate to use.
use_locking: If True, use locks for update operations.
name: Optional name for the operations created when applying gradients. Defaults to "GradientDescent".
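As a quick illustration, here is a minimal sketch using the TF 1.x API covered in this post; the toy loss and variable are just placeholders:

```python
import tensorflow as tf

# A toy variable and a quadratic loss whose minimum is at x = 3.
x = tf.Variable(0.0, name="x")
loss = tf.square(x - 3.0)

# Construct the optimizer with a fixed learning rate and minimize the loss.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run(x))  # should end up close to 3.0
```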
compute_gradients(loss, var_list=None, gate_gradients=GATE_OP, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)
Purpose: computes the gradients of loss for the variables in var_list. Returns a list of (gradient, variable) pairs, where each gradient is the gradient for the corresponding variable. This is the first half of minimize(); a short sketch follows the argument list.
Args:
loss: A Tensor containing the value to minimize.
var_list: Optional list of variables to differentiate with respect to. Defaults to the variables collected under GraphKeys.TRAINABLE_VARIABLES.
gate_gradients: How to gate the computation of gradients. Can be GATE_NONE, GATE_OP, or GATE_GRAPH.
aggregation_method: Specifies the method used to combine gradient terms. Valid values are defined in the class AggregationMethod.
colocate_gradients_with_ops: If True, try colocating gradients with the corresponding op.
grad_loss: Optional. A Tensor holding the gradient computed for loss.
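A hedged sketch of calling compute_gradients() on its own (TF 1.x API; loss here stands for whatever loss tensor your model defines):

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# List of (gradient, variable) pairs over all trainable variables.
grads_and_vars = optimizer.compute_gradients(loss)

# Inspect which gradient tensor belongs to which variable.
for grad, var in grads_and_vars:
    print(var.name, grad)
```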
apply_gradients(grads_and_vars, global_step=None, name=None)
Purpose: applies the gradients to the variables; i.e. it actually performs the gradient descent update. This is the second half of minimize(). Returns an operation that applies the gradients.
Args:
grads_and_vars: The list of (gradient, variable) pairs returned by compute_gradients().
global_step: Optional Variable to increment by one after the variables have been updated.
name: Optional name for the returned operation.
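Splitting compute_gradients() and apply_gradients() is mainly useful when you want to transform the gradients before applying them, e.g. gradient clipping. A minimal sketch (the clipping range is arbitrary, only for illustration):

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = optimizer.compute_gradients(loss)

# Clip every gradient into [-1, 1] before the update; skip variables with no gradient.
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)
```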
get_name()
minimize(loss, global_step=None, var_list=None, gate_gradients=GATE_OP, aggregation_method=None, colocate_gradients_with_ops=False, name=None, grad_loss=None)
Purpose: the method you will use most often.
It minimizes loss by updating var_list; it is simply compute_gradients() followed by apply_gradients(), as the sketch below shows.
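A brief sketch of minimize() together with a global step counter (TF 1.x API; the counter is optional):

```python
global_step = tf.train.get_or_create_global_step()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# global_step is incremented by one every time train_op runs.
train_op = optimizer.minimize(loss, global_step=global_step)
```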
Ⅲ.class tf.train.AdadeltaOptimizer
Implements the Adadelta algorithm; it can be seen as an improved version of the Adagrad algorithm described below.
Constructor:
tf.train.AdadeltaOptimizer.__init__(learning_rate=0.001, rho=0.95, epsilon=1e-08, use_locking=False, name='Adadelta')
Purpose: constructs a new optimizer that uses the Adadelta algorithm.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
rho: A Tensor or a floating point value. The decay rate.
epsilon: A Tensor or a floating point value. A constant epsilon used to better condition the grad update.
use_locking: If True use locks for update operations.
name: Optional name for the operations. Defaults to "Adadelta".
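For example (a minimal sketch; the values shown are simply the defaults):

```python
optimizer = tf.train.AdadeltaOptimizer(learning_rate=0.001, rho=0.95, epsilon=1e-08)
train_op = optimizer.minimize(loss)
```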
Ⅳ.class tf.train.AdagradOptimizer
Optimizer that implements the Adagrad algorithm.
See the Adagrad paper (Duchi et al., 2011).
tf.train.AdagradOptimizer.__init__(learning_rate, initial_accumulator_value=0.1, use_locking=False, name='Adagrad')
Construct a new Adagrad optimizer.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
initial_accumulator_value: A floating point value. Starting value for the accumulators; must be positive.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying gradients. Defaults to "Adagrad".
Raises:
ValueError: If the initial_accumulator_value is invalid.

The Optimizer base class provides methods to compute gradients for a loss and apply gradients to variables. A collection of subclasses implement classic optimization algorithms such as GradientDescent and Adagrad.
You never instantiate the Optimizer class itself, but instead instantiate one of the subclasses.
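A short Adagrad sketch (TF 1.x API; the learning rate is only illustrative):

```python
optimizer = tf.train.AdagradOptimizer(learning_rate=0.01, initial_accumulator_value=0.1)
train_op = optimizer.minimize(loss)
```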
Ⅴ.class tf.train.MomentumOptimizer
Optimizer that implements the Momentum algorithm.
tf.train.MomentumOptimizer.__init__(learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False)
Construct a new Momentum optimizer.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
momentum: A Tensor or a floating point value. The momentum.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying gradients. Defaults to "Momentum".
use_nesterov: If True use Nesterov Momentum.
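For instance (a minimal sketch; the momentum value 0.9 is a common choice rather than a prescribed one):

```python
optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9, use_nesterov=True)
train_op = optimizer.minimize(loss)
```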
Ⅵ.class tf.train.AdamOptimizer
Optimizer that implements the Adam algorithm.
Constructor:
tf.train.AdamOptimizer.__init__(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
Construct a new Adam optimizer.
Initialization:
m_0 <- 0 (Initialize initial 1st moment vector)
v_0 <- 0 (Initialize initial 2nd moment vector)
t <- 0 (Initialize timestep)
The update rule for a variable with gradient g uses an optimization described at the end of section 2 of the paper:
t <- t + 1
lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
The default value of 1e-8 for epsilon might not be a good default in general. For example, when training an Inception network on ImageNet a current good choice is 1.0 or 0.1.
Note that in the dense implementation of this algorithm, m_t, v_t and variable will be updated even if g is zero, whereas in the sparse implementation m_t, v_t and variable are not updated in iterations where g is zero.
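To make the update rule concrete, here is a tiny plain-Python sketch of a single Adam step following the formulas above (purely an illustration of the math, not how TensorFlow implements it internally; all numbers are made up):

```python
import math

learning_rate, beta1, beta2, epsilon = 0.001, 0.9, 0.999, 1e-8
m, v, t = 0.0, 0.0, 0           # 1st/2nd moment estimates and timestep
variable, g = 1.0, 0.5          # current parameter value and its gradient

t += 1
lr_t = learning_rate * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
m = beta1 * m + (1 - beta1) * g
v = beta2 * v + (1 - beta2) * g * g
variable = variable - lr_t * m / (math.sqrt(v) + epsilon)
print(variable)
```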
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
epsilon: A small constant for numerical stability.
use_locking: If True use locks for update operations.
name: Optional name for the operations created when applying gradients. Defaults to "Adam".
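In practice you usually just construct it with the defaults (a minimal sketch, consistent with the TF 1.x API used above):

```python
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)
train_op = optimizer.minimize(loss)
```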
3. Examples
I. Linear Regression
If you are not yet familiar with the theory behind linear regression, see
http://blog.csdn.net/xierhacker/article/details/53257748
http://blog.csdn.net/xierhacker/article/details/53261008
If you already know it, skip ahead.
The code:
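A minimal linear-regression sketch along these lines (TF 1.x API; the synthetic data, hyperparameters and variable names are illustrative rather than the exact original listing):

```python
import numpy as np
import tensorflow as tf

# Synthetic data: y = 3x + 2 plus a little noise.
train_x = np.linspace(-1, 1, 100).astype(np.float32)
train_y = 3.0 * train_x + 2.0 + 0.1 * np.random.randn(100).astype(np.float32)

# Model: y_hat = w * x + b
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0, name="weight")
b = tf.Variable(0.0, name="bias")
y_hat = w * x + b

# Mean squared error loss and a gradient descent training step.
loss = tf.reduce_mean(tf.square(y_hat - y))
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):
        _, l = sess.run([train_op, loss], feed_dict={x: train_x, y: train_y})
        if step % 50 == 0:
            print("step", step, "loss", l)
    print("w =", sess.run(w), "b =", sess.run(b))
```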
Results:
Here you can try more of the optimizers above to compare their performance and how they respond to hyperparameter tuning.
Summary