Qi Song

Papers from this author

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

Qi Song, Qianyi Jiang, Xiaolin Wei, Nan Li, Rui Zhang

Auto-TLDR; ReADS: Rectified Attentional Double Supervised Network for General Scene Text Recognition

In recent years, scene text recognition has commonly been treated as a sequence-to-sequence problem. Connectionist Temporal Classification (CTC) and attentional sequence recognition (Attn) are the two prevailing approaches, but each can fail in certain scenarios. CTC focuses on individual characters but is weak at modeling semantic dependencies in text. Attn-based methods model contextual semantics better but tend to overfit on limited training data. In this paper, we design a Rectified Attentional Double Supervised Network (ReADS) for general scene text recognition. To overcome the weaknesses of CTC and Attn, our method applies both, with different modules, in two supervised branches that complement each other. Moreover, spatial and channel attention mechanisms are introduced to suppress background noise and extract valid foreground information. Finally, a simple rectification network is implemented to rectify irregular text. ReADS can be trained end-to-end and requires only word-level annotations. Extensive experiments on various benchmarks verify the effectiveness of ReADS, which achieves state-of-the-art performance.
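
As background on the per-character behavior of CTC described in the abstract above, the following is a minimal sketch of standard greedy CTC decoding: take the most likely class at each frame, merge consecutive repeats, then drop blanks. It is not the ReADS implementation. The function name `ctc_greedy_decode`, the blank index `BLANK = 0`, and the toy character set are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs, charset):
    """Greedy CTC decoding over per-frame class probabilities.

    frame_probs: list of per-frame probability lists; index 0 is the blank,
                 index k (k >= 1) corresponds to charset[k - 1].
    charset:     string of recognizable characters (hypothetical alphabet).
    """
    # Step 1: pick the most likely class independently at every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Step 2: merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best)]
    # Step 3: remove blanks and map the remaining indices to characters.
    return "".join(charset[label - 1] for label in collapsed if label != BLANK)
```

Each frame is decided on its own here, which shows why CTC attends to individual characters and carries no explicit model of dependencies between them. A blank between two identical labels is what allows a doubled letter to survive the merge step.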

Robust Lexicon-Free Confidence Prediction for Text Recognition

Qi Song, Qianyi Jiang, Rui Zhang, Xiaolin Wei

Auto-TLDR; Confidence Measurement for Optical Character Recognition using Single-Input Multi-Output Network

Benefiting from the success of deep learning, Optical Character Recognition (OCR) has boomed in recent years. Text recognition results are vulnerable to slight perturbations in the input image, so a method for measuring how reliable the results are is crucial. In this paper, we present a novel method for measuring the confidence of a text recognition result, which can be embedded in any text recognizer with little overhead. Our method consists of two stages in a coarse-to-fine style. The first stage uses a Single-Input Multi-Output (SIMO) network to generate multiple candidates, which vote to produce a coarse score. The second stage calculates a refined confidence score from the voting result and the conditional probabilities of the Top-1 recognition sequence. Highly competitive performance on several standard benchmarks validates the efficiency and effectiveness of the proposed method. Moreover, it can be applied to both Latin and non-Latin languages.
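
The coarse-to-fine idea described above can be illustrated with a small sketch. The abstract does not give the scoring formulas, so everything here is an assumption: the coarse score is taken as the fraction of candidates that agree with the Top-1 string, and the refined score multiplies it by the geometric mean of the Top-1 per-character conditional probabilities. The function names `coarse_score` and `refined_confidence` are hypothetical.

```python
import math
from collections import Counter


def coarse_score(candidates, top1):
    """Stage 1 (assumed form): share of candidate strings that match top1."""
    if not candidates:
        return 0.0
    votes = Counter(candidates)
    return votes[top1] / len(candidates)


def refined_confidence(candidates, top1, char_probs):
    """Stage 2 (assumed form): voting score scaled by sequence likelihood.

    char_probs: conditional probability of each character in the Top-1
                sequence, as reported by the recognizer.
    """
    # An empty sequence or a zero probability gives no usable evidence.
    if not char_probs or any(p <= 0.0 for p in char_probs):
        return 0.0
    # Geometric mean keeps the score comparable across sequence lengths.
    seq_prob = math.exp(sum(math.log(p) for p in char_probs) / len(char_probs))
    return coarse_score(candidates, top1) * seq_prob
```

For example, with candidates `["cat", "cat", "cot", "cat"]` and Top-1 `"cat"`, the coarse score is 0.75, and the refined score falls below that as the per-character probabilities drop.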