DL之CNN: Variable-length text recognition with a CNN-RNN (GRU, 2) algorithm (keras + tensorflow)

Published: 2025/3/21

Output

To be updated later……

Implementation code

To be updated later……

image_ocr code: see the companion article "DL之CNN：利用CNN(keras, CTC loss, {image_ocr})算法實現OCR光學字符識別" (OCR with a CNN using keras, CTC loss, and image_ocr).
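
CTC is what lets the network handle text of varying length: the model emits one label per time step, and a string is recovered by merging consecutive repeats and then dropping the blank label. The following is a minimal sketch of that collapse rule in plain Python, not code taken from image_ocr. The function name `ctc_collapse` and the `-` blank symbol are illustrative assumptions; the Keras/TensorFlow code works with integer class indices instead.

```python
from itertools import groupby

BLANK = "-"  # illustrative blank symbol (assumption); real code uses a class index


def ctc_collapse(path):
    """Collapse a per-time-step label path: merge repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive symbols
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != BLANK)


print(ctc_collapse("hh-e-ll-lo--"))  # hello
print(ctc_collapse("c-aa-t"))        # cat
```

Note that a doubled letter such as the "ll" in "hello" only survives when a blank separates the two runs, which is why the blank label is needed at all.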

# DL之CNN: variable-length text recognition with a CNN-RNN (GRU, 2) algorithm (keras + tensorflow)

# Keras's CTC loss function is located in
# https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py
# and reads as follows:
import tensorflow as tf
from tensorflow.python.ops import ctc_ops as ctc


def ctc_batch_cost(y_true, y_pred, input_length, label_length):
    """Runs CTC loss algorithm on each batch element.

    # Arguments
        y_true: tensor `(samples, max_string_length)`
            containing the truth labels.
        y_pred: tensor `(samples, time_steps, num_categories)`
            containing the prediction, or output of the softmax.
        input_length: tensor `(samples, 1)` containing the sequence length for
            each batch item in `y_pred`.
        label_length: tensor `(samples, 1)` containing the sequence length for
            each batch item in `y_true`.

    # Returns
        Tensor with shape (samples,1) containing the
            CTC loss of each element.
    """
    # ctc_label_dense_to_sparse is a helper defined in the same backend file.
    label_length = tf.to_int32(tf.squeeze(label_length))
    input_length = tf.to_int32(tf.squeeze(input_length))
    sparse_labels = tf.to_int32(ctc_label_dense_to_sparse(y_true, label_length))

    y_pred = tf.log(tf.transpose(y_pred, perm=[1, 0, 2]) + 1e-8)

    return tf.expand_dims(ctc.ctc_loss(inputs=y_pred,
                                       labels=sparse_labels,
                                       sequence_length=input_length), 1)


# Variable-length text recognition
import os
import itertools
import re
import datetime
import cairocffi as cairo
import editdistance
import numpy as np
from scipy import ndimage
import pylab

from keras import backend as K
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers import Input, Dense, Activation, Reshape, Lambda
from keras.layers.merge import add, concatenate
from keras.layers.recurrent import GRU
from keras.models import Model
from keras.optimizers import SGD
from keras.utils.data_utils import get_file
from keras.preprocessing import image
from keras.callbacks import EarlyStopping, Callback

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
import matplotlib.pyplot as plt

# TensorFlow 1.x session setup: allocate GPU memory on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))

OUTPUT_DIR = 'image_ocr'
np.random.seed(55)

# Import the helper functions from the official Keras example file:
# !wget https://raw.githubusercontent.com/fchollet/keras/master/examples/image_ocr.py
from image_ocr import *

# Define the required parameters
run_name = datetime.datetime.now().strftime('%Y:%m:%d:%H:%M:%S')
start_epoch = 0
stop_epoch = 200
img_w = 128
img_h = 64
words_per_epoch = 16000
val_split = 0.2
val_words = int(words_per_epoch * (val_split))

# Network parameters
conv_filters = 16
kernel_size = (3, 3)
pool_size = 2
time_dense_size = 32
rnn_size = 512
input_shape = (img_w, img_h, 1)

# Build the generator from these functions and parameters;
# it produces captcha-like text images of variable length
fdir = os.path.dirname(get_file('wordlists.tgz',
                                origin='http://www.mythic-ai.com/datasets/wordlists.tgz',
                                untar=True))
img_gen = TextImageGenerator(monogram_file=os.path.join(fdir, 'wordlist_mono_clean.txt'),
                             bigram_file=os.path.join(fdir, 'wordlist_bi_clean.txt'),
                             minibatch_size=32,
                             img_w=img_w,
                             img_h=img_h,
                             downsample_factor=(pool_size ** 2),
                             val_split=words_per_epoch - val_words)

# Build the CNN
act = 'relu'

input_data = Input(name='the_input', shape=input_shape, dtype='float32')
inner = Conv2D(conv_filters, kernel_size, padding='same',
               activation=act, kernel_initializer='he_normal',
               name='conv1')(input_data)
inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
inner = Conv2D(conv_filters, kernel_size, padding='same',
               activation=act, kernel_initializer='he_normal',
               name='conv2')(inner)
inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)

conv_to_rnn_dims = (img_w // (pool_size ** 2), (img_h // (pool_size ** 2)) * conv_filters)
inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)

# Cuts down input size going into RNN:
inner = Dense(time_dense_size, activation=act, name='dense1')(inner)

# Two layers of bidirectional GRUs
# GRU seems to work as well, if not better than LSTM:
gru_1 = GRU(rnn_size, return_sequences=True,
            kernel_initializer='he_normal', name='gru1')(inner)
gru_1b = GRU(rnn_size, return_sequences=True, go_backwards=True,
             kernel_initializer='he_normal', name='gru1_b')(inner)
gru1_merged = add([gru_1, gru_1b])
gru_2 = GRU(rnn_size, return_sequences=True,
            kernel_initializer='he_normal', name='gru2')(gru1_merged)
gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True,
             kernel_initializer='he_normal', name='gru2_b')(gru1_merged)

# Transforms RNN output to character activations
inner = Dense(img_gen.get_output_size(), kernel_initializer='he_normal',
              name='dense2')(concatenate([gru_2, gru_2b]))
y_pred = Activation('softmax', name='softmax')(inner)
Model(inputs=input_data, outputs=y_pred).summary()

labels = Input(name='the_labels', shape=[img_gen.absolute_max_string_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')

# Keras doesn't currently support loss funcs with extra parameters,
# so CTC loss is implemented in a lambda layer
loss_out = Lambda(ctc_lambda_func, output_shape=(1,),
                  name='ctc')([y_pred, labels, input_length, label_length])

# clipnorm seems to speed up convergence
sgd = SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)

model = Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)

# The loss calc occurs elsewhere, so use a dummy lambda func for the loss
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)

# Resume from saved weights when not starting at epoch 0
if start_epoch > 0:
    weight_file = os.path.join(OUTPUT_DIR,
                               os.path.join(run_name, 'weights%02d.h5' % (start_epoch - 1)))
    model.load_weights(weight_file)

# Captures output of softmax so we can decode the output during visualization
test_func = K.function([input_data], [y_pred])

# Callback: after a fixed number of iterations it saves the model
# and visualizes the current training results
viz_cb = VizCallback(run_name, test_func, img_gen.next_val())

# Run training (steps_per_epoch and validation_steps are counted in batches)
model.fit_generator(generator=img_gen.next_train(),
                    steps_per_epoch=(words_per_epoch - val_words),
                    epochs=stop_epoch,
                    validation_data=img_gen.next_val(),
                    validation_steps=val_words,
                    callbacks=[EarlyStopping(patience=10), viz_cb, img_gen],
                    initial_epoch=start_epoch)
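
The reshape between the CNN and the GRU layers in the code above fixes how many time steps the recurrent layers see. Here is a small sketch of that arithmetic, using the parameter values from the code. The helper name `rnn_input_dims` and its `num_pools` argument are hypothetical and exist only for illustration.

```python
def rnn_input_dims(img_w, img_h, pool_size, conv_filters, num_pools=2):
    """Return (time_steps, features_per_step) after `num_pools` max-pool layers."""
    # Each max-pool layer divides both width and height by pool_size
    downsample = pool_size ** num_pools
    time_steps = img_w // downsample
    # Remaining image rows and all conv channels are flattened into one vector
    features = (img_h // downsample) * conv_filters
    return time_steps, features


time_steps, features = rnn_input_dims(img_w=128, img_h=64, pool_size=2, conv_filters=16)
print(time_steps, features)  # 32 256
```

This matches `conv_to_rnn_dims` in the listing: each 128x64 image becomes a sequence of 32 steps with 256 features per step. Since CTC emits at most one character per time step, the number of steps also bounds the length of text the model can output for one image.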
