TensorFlow-CIFAR10 CNN代码分析
- CIFAR
- 代碼組織
- 代碼分析
- cifar10_trainpy
- cifar10py
- cifar10_evalpy
- Reference
根據TensorFlow 1.2.1,改了官方版本的報錯。
CIFAR
想了解更多信息請參考CIFAR-10 page,以及Alex Krizhevsky寫的技術報告
- 相關核心數學對象,如卷積、修正線性激活、最大池化以及局部響應歸一化;
- 訓練過程中一些網絡行為的可視化,這些行為包括輸入圖像、損失情況、網絡行為的分布情況以及梯度;
- 算法學習參數的移動平均值的計算函數,以及在評估階段使用這些平均值提高預測性能;
- 實現了一種機制,使得學習率隨著時間的推移而遞減;
- 為輸入數據設計預存取隊列,將磁盤延遲和高開銷的圖像預處理操作與模型分離開來處理;
代碼組織
| cifar10_input.py | 讀取本地CIFAR-10的二進制文件格式的內容。 |
| cifar10.py | 建立CIFAR-10的模型。 |
| cifar10_train.py | 在CPU或GPU上訓練CIFAR-10的模型。 |
| cifar10_multi_gpu_train.py | 在多GPU上訓練CIFAR-10的模型。 |
| cifar10_eval.py | 評估CIFAR-10模型的預測性能。 |
代碼分析
由于TensorFlow 1.0有些版本改動,導致新版本和以前代碼不兼容,具體bug解決方法見:TensorFlow CIFAR-10訓練例子報錯解決
下面是其官方的訓練效果。10萬步的準確率為86%左右,所以其實并不用訓練到100k。
System | Step Time (sec/batch) | Accuracy ------------------------------------------------------------------ 1 Tesla K20m | 0.35-0.60 | ~86% at 60K steps (5 hours) 1 Tesla K40m | 0.25-0.35 | ~86% at 100K steps (4 hours)- 1
- 2
- 3
- 4
cifar10_train.py
代碼核心的train()函數,如下:
def train():"""Train CIFAR-10 for a number of steps."""with tf.Graph().as_default():global_step = tf.Variable(0, trainable=False)# Get images and labels for CIFAR-10.# 輸入圖像的預處理,包括亮度、對比度、圖像翻轉等操作images, labels = cifar10.distorted_inputs()# Build a Graph that computes the logits predictions from the# inference model.logits = cifar10.inference(images)# Calculate loss.loss = cifar10.loss(logits, labels)# Build a Graph that trains the model with one batch of examples and# updates the model parameters.train_op = cifar10.train(loss, global_step)# Create a saver.saver = tf.train.Saver(tf.all_variables())# Build the summary operation based on the TF collection of Summaries.summary_op = tf.summary.merge_all()# Build an initialization operation to run below.init = tf.initialize_all_variables()# Start running operations on the Graph.sess = tf.Session(config=tf.ConfigProto(log_device_placement=FLAGS.log_device_placement))sess.run(init)# Start the queue runners.tf.train.start_queue_runners(sess=sess)summary_writer = tf.summary.FileWriter(FLAGS.train_dir,graph_def=sess.graph_def)# 按照設置的迭代次數迭代for step in xrange(FLAGS.max_steps):start_time = time.time()_, loss_value = sess.run([train_op, loss])duration = time.time() - start_timeassert not np.isnan(loss_value), 'Model diverged with loss = NaN'# 每10個輸入數據顯示次step,loss,時間等運行數據if step % 10 == 0:num_examples_per_step = FLAGS.batch_sizeexamples_per_sec = num_examples_per_step / durationsec_per_batch = float(duration)format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f ''sec/batch)')print(format_str % (datetime.now(), step, loss_value,examples_per_sec, sec_per_batch))# 每100個輸入數據將網絡的狀況體現在summary里if step % 100 == 0:summary_str = sess.run(summary_op)summary_writer.add_summary(summary_str, step)# Save the model checkpoint periodically.# 每1000個輸入數據保存次模型if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')saver.save(sess, checkpoint_path, global_step=step)def main(argv=None): # pylint: disable=unused-argument# 檢查目錄下是否有數據,沒有則下載。cifar10.maybe_download_and_extract()# 刪除訓練日志。if gfile.Exists(FLAGS.train_dir):gfile.DeleteRecursively(FLAGS.train_dir)gfile.MakeDirs(FLAGS.train_dir)# 訓練train()if __name__ == '__main__':# 處理flag解析,并執(zhí)行main函數。tf.app.run()- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
其中的distorded_inputs函數,對訓練集合進行隨機的一些操作,包括顛倒,隨機翻轉等,以保證包含驗證集中情況。在cifar10_input.py中,具體代碼如下:
def distorted_inputs(data_dir, batch_size):"""Construct distorted input for CIFAR training using the Reader ops.Args:data_dir: Path to the CIFAR-10 data directory.batch_size: Number of images per batch.Returns:images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.labels: Labels. 1D tensor of [batch_size] size."""# 訓練集合具有更多隨機的操作,包括顛倒,隨機翻轉,以保證包含驗證集中情況。filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)for i in xrange(1, 6)]for f in filenames:if not gfile.Exists(f):raise ValueError('Failed to find file: ' + f)# Create a queue that produces the filenames to read.filename_queue = tf.train.string_input_producer(filenames)# Read examples from files in the filename queue.read_input = read_cifar10(filename_queue)reshaped_image = tf.cast(read_input.uint8image, tf.float32)height = IMAGE_SIZEwidth = IMAGE_SIZE# Image processing for training the network. Note the many random# distortions applied to the image.# 步驟1:隨機截取一個以[高,寬]為大小的圖矩陣。distorted_image = tf.random_crop(reshaped_image, [height, width, 3])# 步驟2:隨機顛倒圖片的左右。概率為50%distorted_image = tf.image.random_flip_left_right(distorted_image)# Because these operations are not commutative, consider randomizing# randomize the order their operation.# 步驟3:隨機改變圖片的亮度以及色彩對比。distorted_image = tf.image.random_brightness(distorted_image,max_delta=63)distorted_image = tf.image.random_contrast(distorted_image,lower=0.2, upper=1.8)# Subtract off the mean and divide by the variance of the pixels.float_image = tf.image.per_image_standardization(distorted_image)# Ensure that the random shuffling has good mixing properties.# queue里有了不少于40%的數據的時候訓練才能開始min_fraction_of_examples_in_queue = 0.4min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *min_fraction_of_examples_in_queue)print('Filling queue with %d CIFAR images before starting to train. ''This will take a few minutes.' % min_queue_examples)# Generate a batch of images and labels by building up a queue of examples.return _generate_image_and_label_batch(float_image, read_input.label,min_queue_examples, batch_size)- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
而對于驗證集,不需要上述操作,所以使用inputs()函數,基本和上面相似,不贅述了,在cifar10_input.py中,具體代碼如下:
def inputs(eval_data, data_dir, batch_size):"""Construct input for CIFAR evaluation using the Reader ops.Args:eval_data: bool, indicating if one should use the train or eval data set.data_dir: Path to the CIFAR-10 data directory.batch_size: Number of images per batch.Returns:images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.labels: Labels. 1D tensor of [batch_size] size."""if not eval_data:filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)for i in xrange(1, 6)]num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAINelse:filenames = [os.path.join(data_dir, 'test_batch.bin')]num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVALfor f in filenames:if not gfile.Exists(f):raise ValueError('Failed to find file: ' + f)# Create a queue that produces the filenames to read.filename_queue = tf.train.string_input_producer(filenames)# Read examples from files in the filename queue.read_input = read_cifar10(filename_queue)reshaped_image = tf.cast(read_input.uint8image, tf.float32)height = IMAGE_SIZEwidth = IMAGE_SIZE# Image processing for evaluation.# Crop the central [height, width] of the image.resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,width, height)# Subtract off the mean and divide by the variance of the pixels.float_image = tf.image.per_image_standardization(resized_image)# Ensure that the random shuffling has good mixing properties.min_fraction_of_examples_in_queue = 0.4min_queue_examples = int(num_examples_per_epoch *min_fraction_of_examples_in_queue)# Generate a batch of images and labels by building up a queue of examples.return _generate_image_and_label_batch(float_image, read_input.label, min_queue_examples, batch_size)- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
cifar10.py
這一部分是訓練CNN模型的代碼,inference()函數主要是定義CNN結構,在MINIST中已經詳細解釋過,參考tensorFlow搭建CNN-mnist上手:
def inference(images):"""Build the CIFAR-10 model.Args:images: Images returned from distorted_inputs() or inputs().Returns:Logits."""# We instantiate all variables using tf.get_variable() instead of# tf.Variable() in order to share variables across multiple GPU training runs.# If we only ran this model on a single GPU, we could simplify this function# by replacing all instances of tf.get_variable() with tf.Variable().## conv1with tf.variable_scope('conv1') as scope:kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],stddev=1e-4, wd=0.0)conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))bias = tf.nn.bias_add(conv, biases)conv1 = tf.nn.relu(bias, name=scope.name)_activation_summary(conv1)# pool1pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],padding='SAME', name='pool1')# norm1norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,name='norm1')# conv2with tf.variable_scope('conv2') as scope:kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64],stddev=1e-4, wd=0.0)conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))bias = tf.nn.bias_add(conv, biases)conv2 = tf.nn.relu(bias, name=scope.name)_activation_summary(conv2)# norm2norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,name='norm2')# pool2pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],strides=[1, 2, 2, 1], padding='SAME', name='pool2')# local3with tf.variable_scope('local3') as scope:# Move everything into depth so we can perform a single matrix multiply.dim = 1for d in pool2.get_shape()[1:].as_list():dim *= dreshape = tf.reshape(pool2, [FLAGS.batch_size, dim])weights = _variable_with_weight_decay('weights', shape=[dim, 384],stddev=0.04, wd=0.004)biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)_activation_summary(local3)# local4with tf.variable_scope('local4') as scope:weights = _variable_with_weight_decay('weights', shape=[384, 192],stddev=0.04, wd=0.004)biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)_activation_summary(local4)# softmax, i.e. softmax(WX + b)with tf.variable_scope('softmax_linear') as scope:weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],stddev=1 / 192.0, wd=0.0)biases = _variable_on_cpu('biases', [NUM_CLASSES],tf.constant_initializer(0.0))softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)_activation_summary(softmax_linear)return softmax_linear- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
訓練CNN,train()函數:
def train(total_loss, global_step):"""Train CIFAR-10 model.Create an optimizer and apply to all trainable variables. Add movingaverage for all trainable variables.Args:total_loss: Total loss from loss().global_step: Integer Variable counting the number of training stepsprocessed.Returns:train_op: op for training."""# Variables that affect learning rate.num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_sizedecay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)# Decay the learning rate exponentially based on the number of steps.lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,global_step,decay_steps,LEARNING_RATE_DECAY_FACTOR,staircase=True)tf.summary.scalar('learning_rate', lr)# Generate moving averages of all losses and associated summaries.loss_averages_op = _add_loss_summaries(total_loss)# Compute gradients.# control dependencies的運用。這里只有l(wèi)oss_averages_op完成了# 我們才會進行gradient descent的優(yōu)化。with tf.control_dependencies([loss_averages_op]):opt = tf.train.GradientDescentOptimizer(lr)grads = opt.compute_gradients(total_loss)# Apply gradients.apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)# Add histograms for trainable variables.for var in tf.trainable_variables():tf.summary.histogram(var.op.name, var)# Add histograms for gradients.for grad, var in grads:if grad is not None:tf.summary.histogram(var.op.name + '/gradients', grad)# Track the moving averages of all trainable variables.variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)variables_averages_op = variable_averages.apply(tf.trainable_variables())with tf.control_dependencies([apply_gradient_op, variables_averages_op]):train_op = tf.no_op(name='train')return train_op- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
cifar10_eval.py
然后是評估函數evaluate():
def evaluate():"""Eval CIFAR-10 for a number of steps."""with tf.Graph().as_default():# Get images and labels for CIFAR-10.eval_data = FLAGS.eval_data == 'test'images, labels = cifar10.inputs(eval_data=eval_data)# Build a Graph that computes the logits predictions from the# inference model.logits = cifar10.inference(images)# Calculate predictions.top_k_op = tf.nn.in_top_k(logits, labels, 1)# Restore the moving average version of the learned variables for eval.variable_averages = tf.train.ExponentialMovingAverage(cifar10.MOVING_AVERAGE_DECAY)variables_to_restore = variable_averages.variables_to_restore()saver = tf.train.Saver(variables_to_restore)# Build the summary operation based on the TF collection of Summaries.summary_op = tf.summary.merge_all()graph_def = tf.get_default_graph().as_graph_def()summary_writer = tf.summary.FileWriter(FLAGS.eval_dir,graph_def=graph_def)while True:eval_once(saver, summary_writer, top_k_op, summary_op)if FLAGS.run_once:breaktime.sleep(FLAGS.eval_interval_secs)- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
主線就是這樣,具體代碼見:CIFAR_TensorFlow
Reference
卷積神經網絡
原文地址: http://blog.csdn.net/shine19930820/article/details/76608648
總結
以上是生活随笔為你收集整理的TensorFlow-CIFAR10 CNN代码分析的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 理解ResNet结构与TensorFlo
- 下一篇: 深度学习笔记:优化方法总结(BGD,SG