ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, Julien Morlier

Auto-TLDR; Attentional Blocks for Action Recognition in Table Tennis Strokes

Abstract

The paper addresses the problem of recognizing actions in video with low inter-class variability, such as table tennis strokes. Two-stream "twin" convolutional neural networks with 3D convolutions are applied to both RGB data and optical flow. Actions are recognized by classifying temporal windows. We introduce 3D attention modules and examine their impact on classification efficiency. In the context of studying athletes' performance, a corpus of table tennis strokes is considered. Using attention blocks in the network speeds up the training step and improves the classification scores by up to 5% with our twin model. We visualize the impact on the obtained features and observe a correlation between attention and the players' movements and positions. We compare the scores of a state-of-the-art action classification method and the proposed approach with attention blocks on the corpus. The proposed model with attention blocks outperforms both the previous model without them and our baseline.
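
The abstract states that actions are recognized by classifying temporal windows. The sketch below illustrates that general idea only and is not the authors' implementation: the window length, the stride, and the sum-of-scores aggregation are assumptions, and the per-window scores stand in for the output of a hypothetical classifier (the function names are illustrative as well).

```python
from typing import List, Sequence, Tuple


def temporal_windows(num_frames: int, length: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, keeping full windows only.

    `length` and `stride` are illustrative parameters; the values used in the
    paper are not given on this page.
    """
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    return [(s, s + length) for s in range(0, num_frames - length + 1, stride)]


def classify_video(window_scores: Sequence[Sequence[float]]) -> int:
    """Sum the per-class scores over all windows and return the best class index.

    Summation is an assumed aggregation rule, chosen for simplicity.
    """
    if not window_scores:
        raise ValueError("at least one window is required")
    totals = [sum(per_class) for per_class in zip(*window_scores)]
    return max(range(len(totals)), key=totals.__getitem__)


if __name__ == "__main__":
    # A 10-frame clip cut into 4-frame windows with a stride of 3 frames.
    print(temporal_windows(10, 4, 3))
    # Hypothetical scores for two classes over three windows.
    print(classify_video([[0.1, 0.9], [0.6, 0.4], [0.2, 0.8]]))
```

Overlapping windows (a stride smaller than the window length) give each frame several votes, which can smooth out a single misclassified window at the cost of more classifier evaluations.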

Similar papers

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Mirco Planamente, Andrea Bottino, Barbara Caputo

Auto-TLDR; A Single Stream Architecture for Egocentric Action Recognition from the First-Person Point of View

Region-Based Non-Local Operation for Video Classification

Single View Learning in Action Recognition

Learnable Higher-Order Representation for Action Recognition

TinyVIRAT: Low-Resolution Video Action Recognition

RWF-2000: An Open Large Scale Video Database for Violence Detection

MFI: Multi-Range Feature Interchange for Video Action Recognition

Motion Complementary Network for Efficient Action Recognition

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition

RMS-Net: Regression and Masking for Soccer Event Spotting

SCA Net: Sparse Channel Attention Module for Action Recognition

Attention-Driven Body Pose Encoding for Human Activity Recognition

Modeling Long-Term Interactions to Enhance Action Recognition

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Developing Motion Code Embedding for Action Recognition in Videos

AttendAffectNet: Self-Attention Based Networks for Predicting Affective Responses from Movies

Improved Residual Networks for Image and Video Recognition

What and How? Jointly Forecasting Human Action and Pose

2D Deep Video Capsule Network with Temporal Shift for Action Recognition

A Grid-Based Representation for Human Action Recognition

Extracting Action Hierarchies from Action Labels and their Use in Deep Action Recognition

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Attention As Activation

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Hierarchical Multimodal Attention for Deep Video Summarization

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

Late Fusion of Bayesian and Convolutional Models for Action Recognition

You Ought to Look Around: Precise, Large Span Action Detection

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition

Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

Attention Pyramid Module for Scene Recognition

Extraction and Analysis of 3D Kinematic Parameters of Table Tennis Ball from a Single Camera

Vision-Based Multi-Modal Framework for Action Recognition

Wavelet Attention Embedding Networks for Video Super-Resolution

Motion U-Net: Multi-Cue Encoder-Decoder Network for Motion Segmentation

Video Face Manipulation Detection through Ensemble of CNNs

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

Feature-Supervised Action Modality Transfer

Knowledge Distillation for Action Anticipation Via Label Smoothing

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Continuous Sign Language Recognition with Iterative Spatiotemporal Fine-Tuning