ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strongly forbidden.

Feature-Supervised Action Modality Transfer

Fida Mohammad Thoker, Cees Snoek

Auto-TLDR; Cross-Modal Action Recognition and Detection in Non-RGB Video Modalities by Learning from Large-Scale Labeled RGB Data

This paper strives for action recognition and detection in video modalities like RGB, depth maps or 3D-skeleton sequences when only limited modality-specific labeled examples are available. For the RGB modality, and the optical flow derived from it, many large-scale labeled datasets have been made available. They have become the de facto pre-training choice when recognizing or detecting new actions from RGB datasets that have limited amounts of labeled examples available. Unfortunately, large-scale labeled action datasets for other modalities are unavailable for pre-training. In this paper, our goal is to recognize actions from limited examples in non-RGB video modalities by learning from large-scale labeled RGB data. To this end, we propose a two-step training process: (i) we extract action representation knowledge from an RGB-trained teacher network and adapt it to a non-RGB student network; (ii) we then fine-tune the transfer model with the available labeled examples of the target modality. For the knowledge transfer we introduce feature-supervision strategies, which rely on unlabeled pairs of two modalities (the RGB and the target modality) to transfer feature-level representations from the teacher to the student network. Ablations and generalizations with two RGB source datasets and two non-RGB target datasets demonstrate that an optical-flow teacher provides better action transfer features than RGB for both depth maps and 3D-skeletons, even when evaluated on a different target domain or for a different task. Compared to alternative cross-modal action transfer methods, we show improved performance, especially when labeled non-RGB examples to learn from are scarce.
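
Step (i) of the training process can be pictured with a toy sketch. The abstract does not specify the networks or the exact feature-supervision loss, so everything here is an assumption made for illustration: the teacher and student are plain linear maps, the loss is a mean squared error between their features, and the names `teacher_W`, `student_W`, `make_pair`, `feature_loss` and `gradient_step` are hypothetical. The only ideas taken from the abstract are that the teacher stays fixed, that the student sees the target modality, and that the supervision comes from unlabeled pairs of the two modalities.

```python
import random

random.seed(0)

FEAT_DIM, IN_DIM, N_PAIRS = 3, 4, 64

# Assumption: a fixed linear map stands in for the frozen RGB-trained teacher.
teacher_W = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_pair():
    """Return one unlabeled (rgb, target) pair; no action label is involved.

    Assumption: the target-modality input is a simple invertible transform of
    the RGB input, so that a linear student can match the teacher exactly.
    """
    rgb = [random.uniform(-1, 1) for _ in range(IN_DIM)]
    target = [2.0 * v for v in reversed(rgb)]
    return rgb, target


pairs = [make_pair() for _ in range(N_PAIRS)]
# The teacher is never updated, so its features can be computed once.
teacher_feats = [matvec(teacher_W, rgb) for rgb, _ in pairs]


def feature_loss(W):
    """Mean squared error between student and teacher features over all pairs."""
    total = 0.0
    for (_, target), t in zip(pairs, teacher_feats):
        s = matvec(W, target)
        total += sum((si - ti) ** 2 for si, ti in zip(s, t)) / FEAT_DIM
    return total / N_PAIRS


def gradient_step(W, lr):
    """One full-batch gradient-descent step on feature_loss for the student."""
    grad = [[0.0] * IN_DIM for _ in range(FEAT_DIM)]
    for (_, target), t in zip(pairs, teacher_feats):
        s = matvec(W, target)
        for i in range(FEAT_DIM):
            err = 2.0 * (s[i] - t[i]) / (FEAT_DIM * N_PAIRS)
            for j in range(IN_DIM):
                grad[i][j] += err * target[j]
    return [[W[i][j] - lr * grad[i][j] for j in range(IN_DIM)] for i in range(FEAT_DIM)]


# The student starts from zeros and only ever sees the target modality.
student_W = [[0.0] * IN_DIM for _ in range(FEAT_DIM)]
initial_loss = feature_loss(student_W)
for _ in range(200):
    student_W = gradient_step(student_W, lr=0.3)
final_loss = feature_loss(student_W)

print(f"feature loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

In this toy setting the student's features converge towards the teacher's without using any labels. Step (ii) would then be ordinary supervised fine-tuning of the student on the few labeled target-modality examples, which is not shown here.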

Similar papers

Single View Learning in Action Recognition

Gaurvi Goyal, Nicoletta Noceti, Francesca Odone

Auto-TLDR; Cross-View Action Recognition Using Domain Adaptation for Knowledge Transfer

DeepPear: Deep Pose Estimation and Action Recognition

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Vision-Based Multi-Modal Framework for Action Recognition

Temporal Extension Module for Skeleton-Based Action Recognition

What and How? Jointly Forecasting Human Action and Pose

Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Attention-Driven Body Pose Encoding for Human Activity Recognition

Recurrent Graph Convolutional Networks for Skeleton-Based Action Recognition

A Grid-Based Representation for Human Action Recognition

RMS-Net: Regression and Masking for Soccer Event Spotting

JT-MGCN: Joint-Temporal Motion Graph Convolutional Network for Skeleton-Based Action Recognition

Learning Group Activities from Skeletons without Individual Action Labels

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Knowledge Distillation for Action Anticipation Via Label Smoothing

Feature Pyramid Hierarchies for Multi-Scale Temporal Action Detection

A Boundary-Aware Distillation Network for Compressed Video Semantic Segmentation

Learnable Higher-Order Representation for Action Recognition

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion

TinyVIRAT: Low-Resolution Video Action Recognition

You Ought to Look Around: Precise, Large Span Action Detection

Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

RWF-2000: An Open Large Scale Video Database for Violence Detection

MFI: Multi-Range Feature Interchange for Video Action Recognition

Motion Complementary Network for Efficient Action Recognition

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition

A Detection-Based Approach to Multiview Action Classification in Infants

From Human Pose to On-Body Devices for Human-Activity Recognition

Developing Motion Code Embedding for Action Recognition in Videos

Learning Object Deformation and Motion Adaption for Semi-Supervised Video Object Segmentation

Bridging the Gap between Natural and Medical Images through Deep Colorization

ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos

A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition

Modeling Long-Term Interactions to Enhance Action Recognition

Improving Visual Relation Detection Using Depth Maps

Precise Temporal Action Localization with Quantified Temporal Structure of Actions

Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows