ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Wei Wu, Jiale Yu

Auto-TLDR; An improved bilinear pooling method for image-based action recognition

Action recognition in still images is a challenging task because of the complexity of human motions and the variation in background within the same action category. Moreover, some actions fall into fine-grained categories with only small visual differences between them, so extracting discriminative features or modeling various semantic parts is essential for image-based action recognition. Many methods rely on expensive manual annotations to learn discriminative part information, which may severely limit potential real-life applications. In recent years, bilinear pooling has proven effective for image classification because it learns distinctive features automatically. Inspired by this model, this paper proposes an improved bilinear pooling method that avoids the shortcomings of traditional bilinear pooling methods. Previous bilinear pooling approaches retain much noisy background or harmful feature information, which limits their application to action recognition. In our method, an attention mechanism is introduced into the hierarchical bilinear pooling framework with mask aggregation for action recognition. The proposed model generates distinctive, ROI-aware feature information by combining multiple attention mask maps from channel-wise and spatial-wise attention features. More specifically, our method makes the network attend more closely to the discriminative regions of the vital objects in an image. We verify our model on two challenging datasets: 1) the Stanford 40 action dataset and 2) our own action dataset, which includes 60 categories. Experimental results demonstrate the effectiveness of our approach, which outperforms traditional and state-of-the-art methods.
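
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: bilinear pooling of two feature maps, with channel-wise and spatial-wise attention weights applied before pooling. It is not the authors' model. The function names, the parameter-free sigmoid attention, and the random input features are all assumptions made for illustration, and the hierarchical (cross-layer) structure and mask aggregation mentioned in the abstract are not reproduced.

```python
import numpy as np


def channel_attention(feat):
    """Hypothetical channel attention: sigmoid of each channel's global average.

    feat has shape (C, H, W); the result has shape (C,).
    No learned weights are used; this is for illustration only.
    """
    gap = feat.mean(axis=(1, 2))
    return 1.0 / (1.0 + np.exp(-gap))


def spatial_attention(feat):
    """Hypothetical spatial attention: sigmoid of the channel-averaged response.

    feat has shape (C, H, W); the result has shape (H, W).
    """
    avg = feat.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-avg))


def bilinear_pool(feat_a, feat_b, mask=None):
    """Bilinear pooling of two feature maps with the same spatial size.

    feat_a: (Ca, H, W), feat_b: (Cb, H, W).
    mask: optional (H, W) non-negative weights that down-weight locations
          (for example, background) before pooling.
    Returns a vector of length Ca * Cb, signed-square-rooted and L2-normalized.
    """
    ca, h, w = feat_a.shape
    cb = feat_b.shape[0]
    a = feat_a.reshape(ca, h * w)
    b = feat_b.reshape(cb, h * w)
    if mask is not None:
        a = a * mask.reshape(1, h * w)
    # Average of the outer products a_i b_i^T over all spatial locations i.
    pooled = (a @ b.T) / (h * w)
    vec = pooled.reshape(-1)
    # Signed square root followed by L2 normalization.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Random stand-ins for two convolutional feature maps (assumed shapes).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 7, 7))
y = rng.standard_normal((8, 7, 7))

cw = channel_attention(x)   # (8,) channel weights
sm = spatial_attention(x)   # (7, 7) spatial mask
desc = bilinear_pool(x * cw[:, None, None], y, mask=sm)
print(desc.shape)           # (64,)
```

In a trained network the attention weights would come from learned layers rather than fixed sigmoids; the sketch only shows where such masks would enter the pooling step.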

Similar papers

Dual-Attention Guided Dropblock Module for Weakly Supervised Object Localization

Junhui Yin, Siqing Zhang, Dongliang Chang, Zhanyu Ma, Jun Guo

Auto-TLDR; Dual-Attention Guided Dropblock for Weakly Supervised Object Localization

Aggregating Object Features Based on Attention Weights for Fine-Grained Image Retrieval

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

Attention Pyramid Module for Scene Recognition

Multi-Order Feature Statistical Model for Fine-Grained Visual Categorization

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

A Grid-Based Representation for Human Action Recognition

MFI: Multi-Range Feature Interchange for Video Action Recognition

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Global-Local Attention Network for Semantic Segmentation in Aerial Images

DeepPear: Deep Pose Estimation and Action Recognition

Semantic Bilinear Pooling for Fine-Grained Recognition

A Novel Region of Interest Extraction Layer for Instance Segmentation

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

SCA Net: Sparse Channel Attention Module for Action Recognition

Arbitrary Style Transfer with Parallel Self-Attention

Attention As Activation

Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification

Second-Order Attention Guided Convolutional Activations for Visual Recognition

Selective Kernel and Motion-Emphasized Loss Based Attention-Guided Network for HDR Imaging of Dynamic Scenes

Progressive Scene Segmentation Based on Self-Attention Mechanism

Context-Aware Residual Module for Image Classification

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

ACRM: Attention Cascade R-CNN with Mix-NMS for Metallic Surface Defect Detection

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

Human-Centric Parsing Network for Human-Object Interaction Detection

Saliency Prediction on Omnidirectional Images with Brain-Like Shallow Neural Network

Learnable Higher-Order Representation for Action Recognition

RSAN: Residual Subtraction and Attention Network for Single Image Super-Resolution

Self and Channel Attention Network for Person Re-Identification

Accurate Cell Segmentation in Digital Pathology Images Via Attention Enforced Networks

Free-Form Image Inpainting Via Contrastive Attention Network

Spatial-Related and Scale-Aware Network for Crowd Counting

Self-Selective Context for Interaction Recognition

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

Skin Lesion Classification Using Weakly-Supervised Fine-Grained Method

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Attention-Based Selection Strategy for Weakly Supervised Object Localization

Real-Time Semantic Segmentation Via Region and Pixel Context Network

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

EDD-Net: An Efficient Defect Detection Network

Pose-Aware Multi-Feature Fusion Network for Driver Distraction Recognition

Local Attention and Global Representation Collaborating for Fine-Grained Classification

Face Anti-Spoofing Using Spatial Pyramid Pooling

DARN: Deep Attentive Refinement Network for Liver Tumor Segmentation from 3D CT Volume

Adaptive Feature Fusion Network for Gaze Tracking in Mobile Tablets

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction