ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

SCA Net: Sparse Channel Attention Module for Action Recognition

Hang Song, Yonghong Song, Yuanlin Zhang

Auto-TLDR; SCA Net: Efficient Group Convolution for Sparse Channel Attention

Channel attention has recently shown strong performance when incorporated into deep convolutional neural networks. However, existing methods usually require extensive computing resources because of their complex structure, which makes it hard for 3D CNNs to take full advantage of them. In this paper, we propose a lightweight sparse channel attention (SCA) module implemented with efficient group convolution. It adopts the idea of sparse channel connection and uses far fewer parameters while still bringing a clear performance gain. To address the lack of local channel interaction caused by group convolution, a key function called Aggregate-Shuffle-Diverge (ASD) is used to enhance information flow over each group with no additional parameters. We also adjust existing mainstream 3D CNNs by employing 3D convolution factorization to further reduce the number of parameters. The SCA module can be flexibly incorporated into most existing 3D CNNs, which then achieve a good trade-off between performance and complexity on the action recognition task with factorized I3D or 3D ResNet backbone networks. The experimental results also indicate that the resulting network, SCA Net, achieves outstanding performance on the UCF-101 and HMDB-51 datasets.
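The parameter savings that the abstract attributes to group convolution and 3D convolution factorization can be illustrated with a small counting sketch. This is not the authors' implementation: the helper names (conv3d_params, factorized_conv3d_params), the channel sizes, the group count, and the choice of a spatial (1, k, k) convolution followed by a temporal (k, 1, 1) convolution are all assumptions made for illustration, since the abstract does not specify these details.

```python
def conv3d_params(c_in, c_out, kernel, groups=1, bias=False):
    """Number of learnable weights in a 3D convolution.

    kernel is a (kt, kh, kw) tuple. With groups > 1, each output channel
    only connects to c_in / groups input channels (sparse connection).
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel
    weights = (c_in // groups) * c_out * kt * kh * kw
    return weights + (c_out if bias else 0)


def factorized_conv3d_params(c_in, c_out, k, c_mid=None):
    """Weights of an ASSUMED factorization: (1, k, k) then (k, 1, 1)."""
    c_mid = c_out if c_mid is None else c_mid
    spatial = conv3d_params(c_in, c_mid, (1, k, k))
    temporal = conv3d_params(c_mid, c_out, (k, 1, 1))
    return spatial + temporal


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    full = conv3d_params(64, 64, (3, 3, 3))
    fact = factorized_conv3d_params(64, 64, 3)
    print("full 3x3x3 conv:      ", full)   # 110592
    print("factorized conv:      ", fact)   # 49152

    dense = conv3d_params(256, 256, (1, 1, 1))
    grouped = conv3d_params(256, 256, (1, 1, 1), groups=16)
    print("dense channel mixing: ", dense)    # 65536
    print("grouped (16 groups):  ", grouped)  # 4096
```

With these hypothetical sizes, the grouped layer uses 1/16 of the weights of the dense one, at the cost of no interaction between groups. That gap is what a parameter-free mixing step such as the ASD function described above is meant to compensate for.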

Similar papers

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition

Kaiyu Shan, Yongtao Wang, Zhi Tang, Ying Chen, Yangyan Li

Auto-TLDR; Mixed Temporal Convolution for Action Recognition

Region-Based Non-Local Operation for Video Classification

Learnable Higher-Order Representation for Action Recognition

MFI: Multi-Range Feature Interchange for Video Action Recognition

Improved Residual Networks for Image and Video Recognition

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

Attention Pyramid Module for Scene Recognition

Attention As Activation

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

Second-Order Attention Guided Convolutional Activations for Visual Recognition

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Dynamic Multi-Path Neural Network

You Ought to Look Around: Precise, Large Span Action Detection

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Self and Channel Attention Network for Person Re-Identification

Transitional Asymmetric Non-Local Neural Networks for Real-World Dirt Road Segmentation

Context-Aware Residual Module for Image Classification

Motion Complementary Network for Efficient Action Recognition

TinyVIRAT: Low-Resolution Video Action Recognition

Arbitrary Style Transfer with Parallel Self-Attention

Single View Learning in Action Recognition

Attention Stereo Matching Network

Progressive Scene Segmentation Based on Self-Attention Mechanism

RWF-2000: An Open Large Scale Video Database for Violence Detection

CQNN: Convolutional Quadratic Neural Networks

Feature-Dependent Cross-Connections in Multi-Path Neural Networks

Aggregating Object Features Based on Attention Weights for Fine-Grained Image Retrieval

FatNet: A Feature-Attentive Network for 3D Point Cloud Processing

Dual-Attention Guided Dropblock Module for Weakly Supervised Object Localization

Efficient Super Resolution by Recursive Aggregation

CAggNet: Crossing Aggregation Network for Medical Image Segmentation

HANet: Hybrid Attention-Aware Network for Crowd Counting

DeepPear: Deep Pose Estimation and Action Recognition

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

A Duplex Spatiotemporal Filtering Network for Video-Based Person Re-Identification

RSAN: Residual Subtraction and Attention Network for Single Image Super-Resolution

BCAU-Net: A Novel Architecture with Binary Channel Attention Module for MRI Brain Segmentation

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

MFST: Multi-Features Siamese Tracker

Rethinking of Deep Models Parameters with Respect to Data Distribution

ACRM: Attention Cascade R-CNN with Mix-NMS for Metallic Surface Defect Detection

Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition

Residual Fractal Network for Single Image Super Resolution by Widening and Deepening

Hierarchically Aggregated Residual Transformation for Single Image Super Resolution

Single Image Super-Resolution with Dynamic Residual Connection

Attention-Driven Body Pose Encoding for Human Activity Recognition

Not All Domains Are Equally Complex: Adaptive Multi-Domain Learning