ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strongly forbidden.

Collaborative Human Machine Attention Module for Character Recognition

Chetan Ralekar, Tapan Gandhi, Santanu Chaudhury

Auto-TLDR; A Collaborative Human-Machine Attention Module for Deep Neural Networks

Deep learning models that include attention mechanisms have been shown to improve the performance and efficiency of various computer vision tasks such as pattern recognition, object detection, and face recognition. Although the human visual attention mechanism is the source of inspiration for these models, recent attention models treat 'attention' as a pure machine vision optimization problem, and visual attention remains the most neglected aspect. This paper therefore presents a collaborative human and machine attention module that considers both human visual attention and the network's attention. The proposed module is inspired by the dorsal ('where') pathways of visual processing and can be integrated with any convolutional neural network (CNN) model. First, the module computes a spatial attention map from the input feature maps, which is then combined with the visual attention maps. The visual attention maps are created from eye fixations obtained in an eye-tracking experiment with human participants. The visual attention map covers the highly salient and discriminative image regions, as humans tend to focus on such regions, whereas the other relevant image regions are handled by the spatial attention map. Combining the two maps yields a finer refinement of the feature maps, which in turn improves performance. The comparative analysis reveals that our model not only shows significant improvement over the baseline model but also outperforms the other models in the comparison. We hope that our findings using a collaborative human-machine attention module will be helpful in other vision tasks as well.
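
The abstract does not say how the two maps are built or fused, so the following is only a rough NumPy sketch of the general idea, not the authors' module. Every name in it (`fixation_map`, `spatial_attention`, `collaborative_attention`) is hypothetical. It assumes that fixations are (row, col) points blurred with a Gaussian, that the spatial attention is a parameter-free sigmoid over channel-pooled features (a real module would likely use a learned convolution), and that the maps are fused with an element-wise maximum.

```python
import numpy as np


def fixation_map(fixations, height, width, sigma=2.0):
    """Hypothetical helper: turn (row, col) eye fixations into a heat map in [0, 1]
    by summing an isotropic Gaussian centred on each fixation."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    heat = np.zeros((height, width), dtype=np.float64)
    for r, c in fixations:
        heat += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def spatial_attention(features):
    """Assumed stand-in for the network's spatial attention: sigmoid of the sum of
    the channel-wise mean and max of a (C, H, W) feature tensor. No learned weights."""
    pooled = features.mean(axis=0) + features.max(axis=0)
    return 1.0 / (1.0 + np.exp(-pooled))


def collaborative_attention(features, visual_map):
    """Fuse the two maps with an element-wise maximum (an assumption) and use the
    result to rescale every channel of the feature tensor."""
    combined = np.maximum(spatial_attention(features), visual_map)
    return features * combined[None, :, :], combined


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))          # toy (C, H, W) feature maps
visual = fixation_map([(4, 4), (10, 12)], 16, 16)    # two made-up fixations
refined, combined = collaborative_attention(features, visual)
print(refined.shape, combined.shape)  # (8, 16, 16) (16, 16)
```

With the maximum as the fusion rule, regions that humans fixated keep a weight close to 1, while the remaining regions fall back on the network's own spatial attention. Other fusion rules (averaging, multiplication, a learned combination) would fit the abstract's description equally well.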

Similar papers

Classifying Eye-Tracking Data Using Saliency Maps

Shafin Rahman, Sejuti Rahman, Omar Shahid, Md. Tahmeed Abdullah, Jubair Ahmed Sourov

Auto-TLDR; Saliency-based Feature Extraction for Automatic Classification of Eye-tracking Data

From Early Biological Models to CNNs: Do They Look Where Humans Look?

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Fully Convolutional Neural Networks for Raw Eye Tracking Data Segmentation, Generation, and Reconstruction

Dual-Attention Guided Dropblock Module for Weakly Supervised Object Localization

GazeMAE: General Representations of Eye Movements Using a Micro-Macro Autoencoder

Context-Aware Residual Module for Image Classification

A General End-To-End Method for Characterizing Neuropsychiatric Disorders Using Free-Viewing Visual Scanning Tasks

Attention As Activation

Arbitrary Style Transfer with Parallel Self-Attention

Global-Local Attention Network for Semantic Segmentation in Aerial Images

Attention Pyramid Module for Scene Recognition

Question-Agnostic Attention for Visual Question Answering

Utilising Visual Attention Cues for Vehicle Detection and Tracking

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

FatNet: A Feature-Attentive Network for 3D Point Cloud Processing

Saliency Prediction on Omnidirectional Images with Brain-Like Shallow Neural Network

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Adaptive Feature Fusion Network for Gaze Tracking in Mobile Tablets

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

Attention-Based Selection Strategy for Weakly Supervised Object Localization

RLST: A Reinforcement Learning Approach to Scene Text Detection Refinement

Detection and Correspondence Matching of Corneal Reflections for Eye Tracking Using Deep Learning

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

User-Independent Gaze Estimation by Extracting Pupil Parameter and Its Mapping to the Gaze Angle

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

Second-Order Attention Guided Convolutional Activations for Visual Recognition

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

Selective Kernel and Motion-Emphasized Loss Based Attention-Guided Network for HDR Imaging of Dynamic Scenes

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Directed Variational Cross-encoder Network for Few-Shot Multi-image Co-segmentation

A Transformer-Based Radical Analysis Network for Chinese Character Recognition

Do Not Treat Boundaries and Regions Differently: An Example on Heart Left Atrial Segmentation

Self and Channel Attention Network for Person Re-Identification

ACRM: Attention Cascade R-CNN with Mix-NMS for Metallic Surface Defect Detection

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Enhancing Semantic Segmentation of Aerial Images with Inhibitory Neurons

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Rotation Invariant Aerial Image Retrieval with Group Convolutional Metric Learning

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

BCAU-Net: A Novel Architecture with Binary Channel Attention Module for MRI Brain Segmentation

DARN: Deep Attentive Refinement Network for Liver Tumor Segmentation from 3D CT Volume

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

TSMSAN: A Three-Stream Multi-Scale Attentive Network for Video Saliency Detection