ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Sample-Aware Data Augmentor for Scene Text Recognition

Guanghao Meng, Tao Dai, Shudeng Wu, Bin Chen, Jian Lu, Yong Jiang, Shutao Xia

Auto-TLDR; Sample-Aware Data Augmentation for Scene Text Recognition

Deep neural networks (DNNs) have been widely used in scene text recognition and have achieved remarkable performance. Such DNN-based scene text recognizers usually require large amounts of training data, but collecting and annotating data is costly in practice. To alleviate this issue, data augmentation is often applied when training scene text recognizers. However, existing data augmentation methods, such as affine and elastic transformations, suffer from under- and over-diversity owing to the complexity of text contents and shapes. In this paper, we propose a sample-aware data augmentor that transforms samples adaptively based on their contents. Specifically, our data augmentor consists of three parts: a gated module, an affine transformation module, and an elastic transformation module. The affine transformation module focuses on preserving the affinity of samples, while the elastic transformation module aims to improve their diversity. With the gated module, the data augmentor adaptively determines the transformation type based on the properties of the training samples and the capability of the recognizer during training. In addition, our framework introduces an adversarial learning strategy to optimize the augmentor and the recognizer jointly. Extensive experiments on scene text recognition benchmarks show that our sample-aware data augmentor significantly improves the performance of state-of-the-art scene text recognizers.
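The gating idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper's gated module is a learned network, whereas here a hand-set sigmoid of the recognizer's current loss on a sample (hypothetical `weight` and `bias` parameters) stands in for it. The affine branch is a simple shear plus scale, the elastic branch is a simplified column-wise smooth random displacement, the adversarial joint training is omitted, and images are plain nested lists of grayscale intensities. All function names are hypothetical.

```python
import math
import random


def sample_nearest(img, y, x):
    """Nearest-neighbour lookup; coordinates outside the image read as 0.0 (background)."""
    h, w = len(img), len(img[0])
    yi, xi = int(round(y)), int(round(x))
    if 0 <= yi < h and 0 <= xi < w:
        return img[yi][xi]
    return 0.0


def affine_transform(img, shear, scale):
    """Affine branch: horizontal shear plus uniform scaling about the image centre.

    Each output pixel is inverse-mapped into the input, so no holes appear.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Forward map (centred): (x', y') = scale * (x + shear * y, y); invert it here.
            dy = (y - cy) / scale
            dx = (x - cx) / scale - shear * dy
            row.append(sample_nearest(img, cy + dy, cx + dx))
        out.append(row)
    return out


def elastic_transform(img, num_ctrl, magnitude, rng):
    """Simplified elastic branch: a smooth vertical displacement per column.

    Random offsets at `num_ctrl` (>= 2) evenly spaced control points are linearly
    interpolated across the width, bending the text line non-rigidly.
    """
    h, w = len(img), len(img[0])
    offsets = [rng.uniform(-magnitude, magnitude) for _ in range(num_ctrl)]
    out = [[0.0] * w for _ in range(h)]
    for x in range(w):
        t = x * (num_ctrl - 1) / max(w - 1, 1)  # position along the control points
        i = min(int(t), num_ctrl - 2)
        frac = t - i
        dy = (1 - frac) * offsets[i] + frac * offsets[i + 1]
        for y in range(h):
            out[y][x] = sample_nearest(img, y - dy, x)
    return out


def gate_probability(sample_loss, weight, bias):
    """Hypothetical stand-in for the learned gate: P(elastic branch) as a sigmoid of the loss."""
    return 1.0 / (1.0 + math.exp(-(weight * sample_loss + bias)))


def sample_aware_augment(img, sample_loss, rng, weight=-2.0, bias=1.0):
    """Route one sample to the elastic or the affine branch according to the gate."""
    if rng.random() < gate_probability(sample_loss, weight, bias):
        return "elastic", elastic_transform(img, num_ctrl=4, magnitude=1.5, rng=rng)
    shear, scale = rng.uniform(-0.3, 0.3), rng.uniform(0.9, 1.1)
    return "affine", affine_transform(img, shear, scale)


if __name__ == "__main__":
    rng = random.Random(0)
    # A 6x8 toy "glyph": a vertical bar in columns 2-5.
    img = [[1.0 if 2 <= x <= 5 else 0.0 for x in range(8)] for _ in range(6)]
    for loss in (0.1, 2.0):
        branch, aug = sample_aware_augment(img, loss, rng)
        print(loss, branch, round(gate_probability(loss, -2.0, 1.0), 3))
```

With the negative default `weight`, samples the recognizer already handles well (low loss) are more likely to receive the more diverse elastic warp, while hard samples tend to get the milder affine transform. That direction is an illustrative assumption, not a finding reported in the paper; in the actual framework the gate is trained rather than hand-set.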

Similar papers

IBN-STR: A Robust Text Recognizer for Irregular Text in Natural Scenes

Xiaoqian Li, Jie Liu, Shuwu Zhang

Auto-TLDR; IBN-STR: A Robust Text Recognition System Based on Data and Feature Representation

Although text recognition methods based on deep neural networks perform well, challenges remain owing to the variety of text styles, perspective distortion, strongly curved text, and so on. To obtain a robust text recognizer, we improve performance in two respects: data and feature representation. On the data side, we transform input images into S-shaped distorted images to increase the diversity of the training data, and we also explore the effects of different training data. On the feature representation side, combining instance normalization and batch normalization improves the model's capacity and generalization ability. This paper proposes IBN-STR, a robust attention-based text recognizer. Extensive experiments analyze and compare models with respect to both data and feature representation, and verify the effectiveness of IBN-STR on both regular and irregular text instances. Furthermore, IBN-STR is an end-to-end recognition system that can achieve state-of-the-art performance.
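The combination of instance and batch normalization can be sketched in the style of IBN-Net, which normalizes some channels per instance and the rest per batch. The abstract does not say how IBN-STR splits the channels, so the split point `in_channels`, the nested-list tensor layout `x[n][c][i]`, and the omission of learnable scale/shift parameters and running statistics are all assumptions of this minimal sketch; the function names are hypothetical.

```python
import math


def _normalize(values, eps):
    """Zero-mean, unit-variance normalization of a flat list (biased variance, as in BN/IN)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def ibn_forward(x, in_channels, eps=1e-5):
    """IBN-style normalization of a batch x[n][c][i] (N samples, C channels, L positions).

    Channels [0, in_channels) use instance norm: statistics per sample and channel,
    which removes per-sample shifts and scales (e.g. global brightness/contrast changes).
    Channels [in_channels, C) use batch norm with training-mode statistics: one mean and
    variance per channel over the whole batch, which keeps differences between samples.
    """
    n_batch, n_chan, length = len(x), len(x[0]), len(x[0][0])
    out = [[None] * n_chan for _ in range(n_batch)]
    for c in range(n_chan):
        if c < in_channels:
            for n in range(n_batch):
                out[n][c] = _normalize(x[n][c], eps)
        else:
            # Pool all samples for this channel; sample n occupies one contiguous slice.
            pooled = [v for n in range(n_batch) for v in x[n][c]]
            normed = _normalize(pooled, eps)
            for n in range(n_batch):
                out[n][c] = normed[n * length:(n + 1) * length]
    return out


if __name__ == "__main__":
    # Two samples whose features differ only by a constant offset (e.g. brightness).
    x = [
        [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
        [[11.0, 12.0, 13.0], [11.0, 12.0, 13.0]],
    ]
    y = ibn_forward(x, in_channels=1)
    print("IN channel:", [[round(v, 3) for v in y[n][0]] for n in range(2)])
    print("BN channel:", [[round(v, 3) for v in y[n][1]] for n in range(2)])
```

In this example the instance-normalized channel becomes identical for both samples, because the offset between them is removed, while the batch-normalized channel keeps the first sample below the second. Mixing the two is one way to gain some invariance to appearance changes without discarding all sample-level information.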

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

Qi Song, Qianyi Jiang, Xiaolin Wei, Nan Li, Rui Zhang

Auto-TLDR; ReADS: Rectified Attentional Double Supervised Network for General Scene Text Recognition

A Multi-Head Self-Relation Network for Scene Text Recognition

Weakly Supervised Attention Rectification for Scene Text Recognition

2D License Plate Recognition based on Automatic Perspective Rectification

Gaussian Constrained Attention Network for Scene Text Recognition

Cost-Effective Adversarial Attacks against Scene Text Recognition

Text Recognition in Real Scenarios with a Few Labeled Samples

MEAN: A Multi-Element Attention Based Network for Scene Text Recognition

Recognizing Multiple Text Sequences from an Image by Pure End-To-End Learning

Robust Lexicon-Free Confidence Prediction for Text Recognition

Text Recognition - Real World Data and Where to Find Them

Transferable Adversarial Attacks for Deep Scene Text Detection

Feature Embedding Based Text Instance Grouping for Largely Spaced and Occluded Text Detection

An Accurate Threshold Insensitive Kernel Detector for Arbitrary Shaped Text

Stratified Multi-Task Learning for Robust Spotting of Scene Texts

Scene Text Detection with Selected Anchors

A Transformer-Based Radical Analysis Network for Chinese Character Recognition

Cross-Lingual Text Image Recognition Via Multi-Task Sequence to Sequence Learning

Self-Training for Domain Adaptive Scene Text Detection

Local Gradient Difference Based Mass Features for Classification of 2D-3D Natural Scene Text Images

Global Context-Based Network with Transformer for Image2latex

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions

Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering

Mutually Guided Dual-Task Network for Scene Text Detection

Attentive Part-Aware Networks for Partial Person Re-Identification

Improving Word Recognition Using Multiple Hypotheses and Deep Embeddings

DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents

Position-Aware and Symmetry Enhanced GAN for Radial Distortion Correction

LODENet: A Holistic Approach to Offline Handwritten Chinese and Japanese Text Line Recognition

Radical Counter Network for Robust Chinese Character Recognition

ID Documents Matching and Localization with Multi-Hypothesis Constraints

ConvMath : A Convolutional Sequence Network for Mathematical Expression Recognition

RLST: A Reinforcement Learning Approach to Scene Text Detection Refinement

Multi-Task Learning Based Traditional Mongolian Words Recognition

TCATD: Text Contour Attention for Scene Text Detection

Stroke Based Posterior Attention for Online Handwritten Mathematical Expression Recognition

Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

PointDrop: Improving Object Detection from Sparse Point Clouds Via Adversarial Data Augmentation

On-Device Text Image Super Resolution

A Gated and Bifurcated Stacked U-Net Module for Document Image Dewarping

Pose Variation Adaptation for Person Re-Identification

PICK: Processing Key Information Extraction from Documents Using Improved Graph Learning-Convolutional Networks

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Fast Approximate Modelling of the Next Combination Result for Stopping the Text Recognition in a Video

Deep Space Probing for Point Cloud Analysis

DmifNet:3D Shape Reconstruction Based on Dynamic Multi-Branch Information Fusion

Boundary-Aware Graph Convolution for Semantic Segmentation