[ACM 2020 - Text Recognition in the Wild: A Survey] Notes on an OCR recognition survey
Introduction
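1. Three factors driving the development of deep-learning-based STR (scene text recognition):
(1) Advanced hardware: high-performance computing makes it possible to train large-scale recognition networks
(2) Deep-learning-based STR algorithms learn features automatically
(3) Strong demand for STR applications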
BACKGROUND
Fundamental problems in STR:
(1) Text localization
- Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, and Xiaolin Li. 2017. Single shot text detector with regional attention. In Proceedings of ICCV. 3047–3055.
- Fangneng Zhan and Shijian Lu. 2019. ESIR: End-to-end scene text recognition via iterative image rectification. In Proceedings of CVPR. 2059–2068.
- Fang Yin, Rui Wu, Xiaoyang Yu, and Guanglu Sun. 2019. Video text localization based on Adaboost. Multimedia Tools and Applications 78, 5 (2019), 5345–5354.
(2) Text verification
- Tao Wang, David J Wu, Adam Coates, and Andrew Y Ng. 2012. End-to-end text recognition with convolutional neural networks. In Proceedings of ICPR. 3304–3308.
- Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep features for text spotting. In Proceedings of ECCV. 512–528.
(3) Text detection
Regression-based:
- Yuliang Liu and Lianwen Jin. 2017. Deep matching prior network: Toward tighter multi-oriented text detection. In Proceedings of CVPR. 1962–1969.
- Yuliang Liu, Lianwen Jin, Shuaitao Zhang, Canjie Luo, and Sheng Zhang. 2019. Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90 (2019), 337–345.
Segmentation-based:
- Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, and Xiang Bai. 2019. TextField: learning a deep direction field for irregular scene text detection. IEEE Transactions on Image Processing 28, 11 (2019), 5566–5579.
- Yuliang Liu, Lianwen Jin, and Chuanming Fang. 2020. Arbitrarily Shaped Scene Text Detection with a Mask Tightness Text Detector. IEEE Transactions on Image Processing 29 (2020), 2918–2930.
(4) Text segmentation
Text-line segmentation:
- Fangneng Zhan and Shijian Lu. 2019. ESIR: End-to-end scene text recognition via iterative image rectification. In Proceedings of CVPR. 2059–2068.
Character segmentation (used by early text recognition methods):
- Palaiahnakote Shivakumara, Souvik Bhowmick, Bolan Su, Chew Lim Tan, and Umapada Pal. 2011. A new gradient based character segmentation method for video text recognition. In Proceedings of ICDAR. 126–130.
- Anand Mishra, Karteek Alahari, and CV Jawahar. 2012. Scene text recognition using higher order language priors. In Proceedings of BMVC. 1–11.
(5) Text recognition
- Zhanzhan Cheng, Fan Bai, Yunlu Xu, Gang Zheng, Shiliang Pu, and Shuigeng Zhou. 2017. Focusing attention: Towards accurate text recognition in natural images. In Proceedings of ICCV. 5086–5094.
- Canjie Luo, Lianwen Jin, and Zenghui Sun. 2019. MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition. Pattern Recognition 90 (2019), 109–118.
- Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.
(6) End-to-end system
- Hui Li, Peng Wang, and Chunhua Shen. 2017. Towards end-to-end text spotting with convolutional recurrent neural networks. In Proceedings of ICCV. 5238–5246.
- Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, and Changming Sun. 2018. An end-to-end textspotter with explicit alignment and attention. In Proceedings of CVPR. 5020–5029.
- Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, and Liangwei Wang. 2020. ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network. In Proceedings of CVPR.
Others: script identification, text enhancement, text tracking, NLP
METHODOLOGIES
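STR methods commonly fall into two groups: methods based on single-character segmentation and methods that recognize the whole text line
1. Character-segmentation-based methods
Three steps: image preprocessing, character segmentation, and single-character recognition
- Zhaoyi Wan, Mingling He, Haoran Chen, Xiang Bai, and Cong Yao. 2020. TextScanner: Reading Characters in Order for Robust Scene Text Recognition. In Proceedings of AAAI.
Performs character-level recognition through semantic segmentation, using two branches that handle character classification and character localization respectively
Problems:
(1) Character localization is considered one of the most challenging tasks in STR, and recognition quality depends on how well the characters are localized
(2) Single-character recognition ignores contextual semantic information, so word-level results may be poor
2. Text-line recognition
Four steps: image preprocessing, feature extraction, sequence modeling, and text-line prediction; the first and third steps are optional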
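(1) Image preprocessing
Background removal
Traditional binarization works for document images; for complex natural-scene images, GAN-based methods can be used to remove the background
- Canjie Luo, Qingxiang Lin, Yuliang Liu, Jin Lianwen, and Shen Chunhua. 2020. Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild. CoRR abs/2001.04189 (2020). (removes the background with a GAN-based method)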
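The notes do not say which traditional binarization is meant. As one common choice, the sketch below uses Otsu's method, which picks the gray level that maximizes the between-class variance. The function names `otsu_threshold` and `binarize` are illustrative, and the input is assumed to be an 8-bit grayscale array.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the "dark" class per threshold
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean per threshold
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Assumed convention: pixels brighter than the threshold become 1, others 0."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```

This works when foreground and background form two clear intensity modes, as in scanned documents. Scene images often violate that assumption, which is the motivation for the learned background removal cited above.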
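Image super-resolution
Blurry, low-resolution images are handled with image super-resolution
- Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, and Ping Luo. 2019. TextSR: Content-Aware Text Super-Resolution Guided by Recognition. CoRR abs/1909.07113 (2019). (the first work to combine image super-resolution with the recognition task)
- https://github.com/JasonBoy1/TextZoom (super-resolution + recognition)
Image rectification
A purpose-designed rectification network handles irregular text images and normalizes the input image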
- Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. 2015. Spatial transformer networks. In Proceedings of NIPS. 2017–2025.(STN)
- Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.(TPS)
- Canjie Luo, Lianwen Jin, and Zenghui Sun. 2019. MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition. Pattern Recognition 90 (2019), 109–118. (proposes a multi-object rectification network that predicts an offset for each part of the image to correct irregular text)
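To make the rectification idea concrete, here is a minimal NumPy sketch of the sampling step used by an affine spatial transformer: a 2x3 matrix maps every output pixel to a source coordinate, and the output is read from the input by bilinear interpolation. It covers only the affine case. The TPS rectification in ASTER and the per-part offsets in MORAN are not shown, and in a real network the matrix `theta` would be predicted by a small localization network rather than passed in by hand. The function names are illustrative.

```python
import numpy as np

def affine_grid(theta: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Map output pixels (normalized to [-1, 1]) to source coordinates via a 2x3 matrix."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # (H, W, 3)
    return coords @ theta.T                                   # (H, W, 2): x_src, y_src

def bilinear_sample(img: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Sample a single-channel image at the grid; out-of-range points clamp to the border."""
    h, w = img.shape
    x = (grid[..., 0] + 1) * (w - 1) / 2      # normalized -> pixel coordinates
    y = (grid[..., 1] + 1) * (h - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    wx = np.clip(x - x0, 0, 1)
    wy = np.clip(y - y0, 0, 1)
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the image is returned unchanged; `[[-1, 0, 0], [0, 1, 0]]` mirrors it horizontally. Because the sampling is differentiable in `theta`, the rectifier can be trained jointly with the recognizer.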
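(2) Feature extraction
The quality of image feature extraction directly affects final recognition performance. Deeper, more advanced feature extractors give better results but require more memory and more compute; combining background removal with a simple feature extractor may be one direction for future work
CNN-based: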
- Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, and C Lee Giles. 2017. Learning to read irregular text with attention mechanisms. In Proceedings of IJCAI. 3280–3286.(VGG)
- Qingqing Wang, Wenjing Jia, Xiangjian He, Yue Lu, Michael Blumenstein, Ye Huang, and Shujing Lyu. 2019. ReELFA: A Scene Text Recognizer with Encoded Location and Focused Attention. In Proceedings of ICDAR: Workshops. 71–76.(VGG)
- Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.(ResNet)
- Xiaoxue Chen, Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, and Canjie Luo. 2020. Adaptive Embedding Gate for Attention-Based Scene Text Recognition. Neurocomputing 381 (2020), 261–271.(ResNet)
- Yunze Gao, Yingying Chen, Jinqiao Wang, Ming Tang, and Hanqing Lu. 2018. Dense Chained Attention Network for Scene Text Recognition. In Proceedings of ICIP. 679–683.(DenseNet)
- Yunze Gao, Yingying Chen, Jinqiao Wang, Ming Tang, and Hanqing Lu. 2019. Reading scene text with fully convolutional sequence modeling. Neurocomputing 339 (2019), 161–170. (DenseNet)
RCNN-based:
- Chen-Yu Lee and Simon Osindero. 2016. Recursive recurrent nets with attention modeling for OCR in the wild. In Proceedings of CVPR. 2231–2239.
- Jianfeng Wang and Xiaolin Hu. 2017. Gated recurrent convolution neural network for OCR. In Proceedings of NIPS. 335–344.
CNN + Attention:
Because extracting features directly with a CNN may introduce extra noise, an attention mechanism is added to emphasize the text content and suppress the background
- Yaping Zhang, Shuai Nie, Wenju Liu, Xing Xu, Dongxiang Zhang, and Heng Tao Shen. 2019. Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition. In Proceedings of CVPR. 2740–2749.
- Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, and Xiang Bai. 2019. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Trans. Pattern Anal. Mach. Intell (2019).
- Yunlong Huang, Zenghui Sun, Lianwen Jin, and Canjie Luo. 2020. EPAN: Effective parts attention network for scene text recognition. Neurocomputing 376 (2020), 202–213.
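(3) Sequence modeling
The sequence model bridges the visual features of the image and the recognition prediction. It captures contextual information across characters and uses it to predict the character at the next time step, so it performs better than predicting each character independently
Bidirectional LSTM: captures long-range dependencies, but cannot be parallelized because of the recurrent structure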
- Siwei Wang, Yongtao Wang, Xiaoran Qin, Qijie Zhao, and Zhi Tang. 2019. Scene Text Recognition via Gated Cascade Attention. In Proceedings of ICME. 1018–1023.
- Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, and Xiang Bai. 2019. Symmetry-constrained rectification network for scene text recognition. In Proceedings of ICCV. 9147–9156.
- Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, and Mingxiang Cai. 2020. Decoupled Attention Network for Text Recognition. In Proceedings of AAAI.
- Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, and Hwalsuk Lee. 2019. What is wrong with scene text recognition model comparisons? dataset and model analysis. In Proceedings of ICCV. 4714–4722.
- Canjie Luo, Lianwen Jin, and Zenghui Sun. 2019. MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition. Pattern Recognition 90 (2019), 109–118.
CNN: a CNN can model the context of long text by controlling its receptive field, and it can be parallelized
- Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, and Errui Ding. 2020. Towards Accurate Scene Text Recognition with Semantic Reasoning Networks. In Proceedings of CVPR.
- Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, and Weiping Wang. 2020. SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition. In Proceedings of CVPR.
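How much context a convolutional sequence model sees is set by its receptive field. The sketch below applies the standard recurrence along the width axis, where each layer adds `(kernel - 1) * dilation * jump` and `jump` is the product of the preceding strides. The layer configurations in the usage note are hypothetical and not taken from any cited paper.

```python
def receptive_field(layers):
    """Receptive field along one axis.

    layers: list of (kernel_size, stride, dilation) tuples, input to output.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump   # growth contributed by this layer
        jump *= stride                          # spacing of outputs in input pixels
    return rf
```

Three stacked 3-wide convolutions with stride 1 cover 7 input columns, while the same three layers with dilations 1, 2, and 4 cover 15, so dilation or extra depth widens the context without any recurrence.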
Transformer: uses the attention mechanism to encode the image as a sequence, and can be parallelized
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of NIPS. 5998–6008.
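The parallelism comes from the fact that self-attention relates all positions in a single matrix product instead of stepping through time. Below is a minimal single-head sketch of the scaled dot-product attention from Vaswani et al., applied to a sequence of feature columns. The projection matrices are plain arguments here, and multi-head splitting, positional encoding, and masking are omitted.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)    # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (T, d_model) feature sequence; w_q, w_k, w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (T, T) pairwise similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights
```

Every output position is a weighted mix of all input positions, computed for the whole sequence at once.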
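(4) Text recognition (text-line prediction)
Decodes the encoded features of the text image into the corresponding character sequence. The two mainstream approaches are CTC-based and attention-based
CTC-based methods:
CTC is widely used in speech recognition and online handwriting recognition. In STR, CTC performs recognition by computing a conditional probability: under an agreed mapping from frame-level paths to label sequences, it maximizes the sum of the conditional probabilities of all paths that lead from the input to the target output. Training also needs no alignment annotations.
- Yunze Gao, Yingying Chen, Jinqiao Wang, Ming Tang, and Hanqing Lu. 2019. Reading scene text with fully convolutional sequence modeling. Neurocomputing 339 (2019), 161–170.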
- Baoguang Shi, Xiang Bai, and Cong Yao. 2017. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell 39, 11 (2017), 2298–2304.
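The two ideas in the CTC description can be sketched in NumPy: the mapping that collapses a frame-level path into a label sequence (merge repeats, then drop blanks), and the forward algorithm that sums the probability of every path mapping to a given label. This is a plain-probability illustration that assumes class 0 is the blank; practical implementations work in log space and use a framework's built-in CTC loss.

```python
import numpy as np

def ctc_greedy_decode(probs: np.ndarray, blank: int = 0) -> list:
    """Take the best class per frame, then merge repeats and drop blanks."""
    out, prev = [], None
    for p in probs.argmax(axis=1):
        if p != prev and p != blank:
            out.append(int(p))
        prev = p
    return out

def ctc_label_prob(probs: np.ndarray, labels: list, blank: int = 0) -> float:
    """Sum of the probabilities of all frame-level paths that collapse to `labels`.

    probs: (T, C) per-frame class probabilities.
    """
    ext = [blank]
    for c in labels:                 # interleave blanks: b, l1, b, l2, ..., b
        ext += [c, blank]
    T, S = probs.shape[0], len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                      # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]             # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]             # skip the blank between different labels
            alpha[t, s] = a * probs[t, ext[s]]
    total = alpha[T - 1, S - 1]
    if S > 1:
        total += alpha[T - 1, S - 2]
    return float(total)
```

The CTC training loss is the negative log of `ctc_label_prob` for the ground-truth label, which is why only the transcription is needed and no per-frame alignment.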
Problems with CTC:
- Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. 2006. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of ICML. 369–376.
- Hu Liu, Sheng Jin, and Changshui Zhang. 2018. Connectionist temporal classification with maximum entropy regularization. In Proceedings of NIPS. 831–841.
- code: https://github.com/liuhu-bigeye/enctc.crnn
Proposes a maximum-entropy-based regularization method to improve CTC
- Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, and Cong Yao. 2019. 2D-CTC for Scene Text Recognition. CoRR abs/1907.09705 (2019).
Attempts to compute CTC with an extra dimension along the height direction, but the final improvement is limited
Other improvements to CTC:
- Xinjie Feng, Hongxun Yao, and Shengping Zhang. 2019. Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets. Complexity 2019 (2019), 9345861:1–9345861:11.
Proposes combining focal loss with CTC to address class imbalance among recognition samples
- Wenyang Hu, Xiaocong Cai, Jun Hou, Shuai Yi, and Zhiping Lin. 2020. GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition. In Proceedings of AAAI.
Uses a graph convolutional network to improve the accuracy and robustness of CTC
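As a rough illustration of the focal-loss idea listed above, the sketch below down-weights samples that CTC already handles well. It assumes the common formulation in which the sequence probability is recovered from the CTC loss as p = exp(-loss) and the loss is scaled by alpha * (1 - p)^gamma. The exact form in the cited paper may differ, and the default values of `alpha` and `gamma` are placeholders.

```python
import math

def focal_ctc_loss(ctc_loss: float, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Scale a per-sample CTC loss so that easy samples contribute less.

    ctc_loss: negative log-probability of the target sequence (>= 0).
    alpha, gamma: assumed hyperparameters, not values from the paper.
    """
    p = math.exp(-ctc_loss)                    # probability of the target sequence
    return alpha * (1.0 - p) ** gamma * ctc_loss
```

A sample with a small CTC loss (p close to 1) is scaled almost to zero, while a hard sample keeps most of its weight, so rare characters that are poorly recognized have more influence on the gradient.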
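Attention-based methods:
Attention was first proposed in [1] for machine translation and was later applied to image captioning [2], text recognition [3], remote-sensing image classification [4], and other tasks. In STR it is usually combined with an RNN to form the recognition module: the attention mechanism learns the alignment by combining the information output before the target character with the feature vectors produced by the encoder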
- [1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR.
- [2] Xinwei He, Yang Yang, Baoguang Shi, and Xiang Bai. 2019. VD-SAN: Visual-Densely Semantic Attention Network for Image Caption Generation. Neurocomputing 328 (2019), 48–55.
- [3] Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.
- [4] Qi Wang, Shaoteng Liu, Jocelyn Chanussot, and Xuelong Li. 2018. Scene classification with recurrent attention of VHR remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 57, 2 (2018), 1155–1167.
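One decoding step of this mechanism can be sketched in NumPy using the additive scoring function from [1]: the previous decoder state is compared with every encoder feature vector, the scores are normalized into alignment weights, and the weighted sum becomes the context used to predict the next character. The parameter names and shapes are illustrative, and the RNN update and character classifier that follow this step are omitted.

```python
import numpy as np

def attention_step(h_enc, s_prev, w_h, w_s, v):
    """One additive-attention step.

    h_enc: (T, d_enc) encoder feature vectors
    s_prev: (d_dec,) previous decoder state
    w_h: (d_enc, d_att), w_s: (d_dec, d_att), v: (d_att,) learned parameters
    Returns the context vector (d_enc,) and the alignment weights (T,).
    """
    scores = np.tanh(h_enc @ w_h + s_prev @ w_s) @ v   # (T,) one score per frame
    scores = scores - scores.max()                      # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()     # softmax over frames
    return weights @ h_enc, weights
```

Because the weights are recomputed at every step from the decoder state, the model learns where to look for each character without any character-level position labels.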
Attention applied to STR:
- Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.
- Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, and C Lee Giles. 2017. Learning to read irregular text with attention mechanisms. In Proceedings of IJCAI. 3280–3286.
- Canjie Luo, Lianwen Jin, and Zenghui Sun. 2019. MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition. Pattern Recognition 90 (2019), 109–118.
- Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, and Xiang Bai. 2019. Symmetry-constrained rectification network for scene text recognition. In Proceedings of ICCV. 9147–9156.
Improvements built on attention models:
- Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, and C Lee Giles. 2017. Learning to read irregular text with attention mechanisms. In Proceedings of IJCAI. 3280–3286.
- Hui Li, Peng Wang, Chunhua Shen, and Guyu Zhang. 2019. Show, attend and read: A simple and strong baseline for irregular text recognition. In Proceedings of AAAI. 8610–8617.
- Yunlong Huang, Zenghui Sun, Lianwen Jin, and Canjie Luo. 2020. EPAN: Effective parts attention network for scene text recognition. Neurocomputing 376 (2020), 202–213.
Proposes a high-order character language model
- Xiaoxue Chen, Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, and Canjie Luo. 2020. Adaptive Embedding Gate for Attention-Based Scene Text Recognition. Neurocomputing 381 (2020), 261–271.
ASTER builds a bidirectional attention decoder
- Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.
Transformer-based methods drop the traditional RNN structure and allow parallel processing
- Yiwei Zhu, Shilin Wang, Zheng Huang, and Kai Chen. 2019. Text Recognition in Images Based on Transformer with Hierarchical Attention. In Proceedings of ICIP. 1945–1949.
- Peng Wang, Lu Yang, Hui Li, Yuyan Deng, Chunhua Shen, and Yanning Zhang. 2019. A Simple and Robust Convolutional-Attention Network for Irregular Text Recognition. CoRR abs/1904.01375 (2019).
- Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, and Errui Ding. 2020. Towards Accurate Scene Text Recognition with Semantic Reasoning Networks. In Proceedings of CVPR.
- Zhanzhan Cheng, Fan Bai, Yunlu Xu, Gang Zheng, Shiliang Pu, and Shuigeng Zhou. 2017. Focusing attention: Towards accurate text recognition in natural images. In Proceedings of ICCV. 5086–5094.
- Yunlong Huang, Zenghui Sun, Lianwen Jin, and Canjie Luo. 2020. EPAN: Effective parts attention network for scene text recognition. Neurocomputing 376 (2020), 202–213.
Problems with attention:
Combining CTC and attention:
- Wenyang Hu, Xiaocong Cai, Jun Hou, Shuai Yi, and Zhiping Lin. 2020. GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition. In Proceedings of AAAI.
- Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, and R. Manmatha. 2020. SCATTER: Selective Context Attentional Scene Text Recognizer. In Proceedings of CVPR.
CTC vs. attention:
Word level: attention performs better
Sentence level: CTC performs better
- Fuze Cong, Wenping Hu, Huo Qiang, and Li Guo. 2019. A Comparative Study of Attention-based Encoder Decoder Approaches to Natural Scene Text Recognition. In Proceedings of ICDAR. 916–921.
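End-to-end systems:
Non-end-to-end systems usually treat detection and recognition as two independent sub-tasks chained in series to localize and recognize the text in an image. In an end-to-end system, detection and recognition share one network and are trained jointly; such a system usually still consists of three parts: text-box localization, text recognition, and post-processing
Advantages of end-to-end systems:
(1) Splitting detection and recognition into two independent tasks lets errors accumulate from one stage to the next; an end-to-end system trains both parts together, so this accumulation is avoided
(2) Detection and recognition share network parameters and training data, so the system can be optimized as a whole
(3) Transfers quickly to different scenarios, with relatively low dependence on data
(4) Fast execution and low memory overhead
Drawbacks:
(1) The network is hard to design, in particular how detection and recognition connect and share information
(2) Detection and recognition differ markedly in how they learn and converge, so balancing the two is difficult
(3) Joint training still leaves considerable room for optimization
Comparison of end-to-end methods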
EVALUATIONS AND PROTOCOLS
Datasets: synthetic datasets and real-world datasets
Synthetic datasets: Synth90k, SynthText, Verisimilar Synthesis, UnrealText
Real-world datasets: regular Latin text (IIIT5K-Words (IIIT5K), Street View Text (SVT), ICDAR 2003 (IC03), ICDAR 2011 (IC11), ICDAR 2013 (IC13), Street View House Number (SVHN)); irregular Latin text (SVT-P, CUTE, IC15, COCO-Text, Total-Text); Chinese natural-scene datasets (RCTW-17, MTWI, CTW, LSVT, ArT, etc.)
Recognition evaluation metrics:
End-to-end system evaluation metrics:
DISCUSSION AND FUTURE DIRECTIONS
Although STR has made significant breakthroughs, much remains to be improved. Future directions for STR include the following:
Resources
Paper: https://arxiv.org/abs/2005.03492
https://github.com/HCIILAB/Scene-Text-Recognition