Paper download is intended for registered attendees only, and is subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.
Papers from this author
Gaussian Constrained Attention Network for Scene Text Recognition
Auto-TLDR; Gaussian Constrained Attention Network for Scene Text Recognition
Scene text recognition has been a hot topic in computer vision. Recent methods adopt the attention mechanism for sequence prediction which achieve convincing results. However, we argue that the existing attention mechanism faces the problem of attention diffusion, in which the model may not focus on a certain character area. In this paper, we propose Gaussian Constrained Attention Network to deal with this problem. It is a 2D attention-based method integrated with a novel Gaussian Constrained Refinement Module, which predicts an additional Gaussian mask to refine the attention weights. Different from adopting an additional supervision on the attention weights simply, our proposed method introduce an explicit refinement. In this way, the attention weights will be more concentrated and the attention-based recognition network achieves better performance. The proposed Gaussian Constrained Refinement Module is flexible and can be applied to existing attention-based methods directly. The experiments on several benchmark datasets demonstrate the effectiveness of our proposed method. Our code has been available at https://github.com/Pay20Y/GCAN.
Progressive Cluster Purification for Unsupervised Feature Learning
Auto-TLDR; Progressive Cluster Purification for Unsupervised Feature Learning
In unsupervised feature learning, sample specificity based methods ignore the inter-class information, which deteriorates the discriminative capability of representation models. Clustering based methods are error-prone to explore the complete class boundary information due to the inevitable class inconsistent samples in each cluster. In this work, we propose a novel clustering based method, which, by iteratively excluding class inconsistent samples during progressive cluster formation, alleviates the impact of noise samples in a simple-yet-effective manner. Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training, while the sizes of clusters continuously expand consistently with the growth of model representation capability. With a well-designed cluster purification mechanism, it further purifies clusters by filtering noise samples which facilitate the subsequent feature learning by utilizing the refined clusters as pseudo-labels. Experiments on commonly used benchmarks demonstrate that the proposed PCP improves baseline method with significant margins. Our code will be available at https://github.com/zhangyifei0115/PCP.
Self-Training for Domain Adaptive Scene Text Detection
Auto-TLDR; A self-training framework for image-based scene text detection
Though deep learning based scene text detection has achieved great progress, well-trained detectors suffer from severe performance degradation for different domains. In general, a tremendous amount of data is indispensable to train the detector in the target domain. However, data collection and annotation are expensive and time-consuming. To address this problem, we propose a self-training framework to automatically mine hard examples with pseudo-labels from unannotated videos or images. To reduce the noise of hard examples, a novel text mining module is implemented based on the fusion of detection and tracking results. Then, an image-to-video generation method is designed for the tasks that videos are unavailable and only images can be used. Experimental results on standard benchmarks, including ICDAR2015, MSRA-TD500, ICDAR2017 MLT, demonstrate the effectiveness of our self-training method. The simple Mask R-CNN adapted with self-training and fine-tuned on real data can achieve comparable or even superior results with the state-of-the-art methods.