Vision-Based Multi-Modal Framework for Action Recognition

Djamila Romaissa Beddiar, Mourad Oussalah, Brahim Nini

Auto-TLDR; Multi-modal Framework for Human Activity Recognition Using RGB, Depth and Skeleton Data

Human activity recognition plays a central role in the development of intelligent systems for video surveillance, public security, health care and home monitoring, where the detection and recognition of activities can improve the quality of life and security of humans. Typically, automated, intuitive and real-time systems are required to recognize human activities and accurately identify unusual behaviors in order to prevent dangerous situations. In this work, we explore the combination of three modalities (RGB, depth and skeleton data) to design a robust multi-modal framework for vision-based human activity recognition. In particular, spatial information, body shape/posture and the temporal evolution of actions are highlighted using illustrative representations obtained from a combination of dynamic RGB images, dynamic depth images and skeleton data representations. Each video is therefore represented by three images that summarize the ongoing action. Our framework takes advantage of transfer learning from pre-trained models to extract significant features from these newly created images. Next, we fuse the extracted features using Canonical Correlation Analysis and train a Long Short-Term Memory network to classify actions from the visual descriptive images. Experimental results demonstrate the reliability of our feature-fusion framework, which captures highly significant features and achieves state-of-the-art performance on the public UTD-MHAD and NTU RGB+D datasets.
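
To make the feature-fusion step concrete, here is a minimal Python sketch of Canonical Correlation Analysis fusion using scikit-learn. The feature dimensions and the random placeholder arrays stand in for CNN features extracted from the dynamic RGB and dynamic depth images; they are illustrative assumptions, not the paper's actual configuration.

import numpy as np
from sklearn.cross_decomposition import CCA

# Placeholder deep features for N videos from two modalities, e.g. features
# extracted with a pre-trained CNN from dynamic RGB and dynamic depth images.
rng = np.random.default_rng(0)
n_videos, d_rgb, d_depth = 200, 256, 256
feat_rgb = rng.normal(size=(n_videos, d_rgb))
feat_depth = rng.normal(size=(n_videos, d_depth))

# Project both feature sets into a shared, maximally correlated subspace.
cca = CCA(n_components=64)
rgb_c, depth_c = cca.fit_transform(feat_rgb, feat_depth)

# Fused per-video descriptor: concatenation of the canonical variates,
# which can then be fed to a downstream classifier (e.g. an LSTM).
fused = np.concatenate([rgb_c, depth_c], axis=1)   # shape (200, 128)
print(fused.shape)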

Similar papers

Attention-Driven Body Pose Encoding for Human Activity Recognition

Bappaditya Debnath, Swagat Kumar, Marry O'Brien, Ardhendu Behera

Auto-TLDR; Attention-based Body Pose Encoding for Human Activity Recognition

This article proposes a novel attention-based body pose encoding for human activity recognition. Most existing human activity recognition approaches based on 3D pose data enrich the input data using additional handcrafted representations such as velocity, super normal vectors, pairwise relations, and so on. The enriched data complements the 3D body joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve this, the approach exploits two body pose streams: 1) a spatial stream which encodes the spatial relationship between various body joints at each time point to learn the spatial structure involving the spatial distribution of different body joints; and 2) a temporal stream that learns the temporal variation of individual body joints over the entire sequence duration to produce a temporally enhanced representation. Afterwards, these two pose streams are fused with a multi-head attention mechanism. We also capture the contextual information from the RGB video stream using a deep Convolutional Neural Network (CNN) model combined with multi-head attention and a bidirectional Long Short-Term Memory (LSTM) network. Finally, the RGB video stream is combined with the fused body pose stream to give a novel end-to-end deep model for effective human activity recognition. The proposed model is evaluated on three datasets including the challenging NTU-RGBD dataset and achieves state-of-the-art results.

A Grid-Based Representation for Human Action Recognition

Soufiane Lamghari, Guillaume-Alexandre Bilodeau, Nicolas Saunier

Auto-TLDR; GRAR: Grid-based Representation for Action Recognition in Videos

Human action recognition (HAR) in videos is a fundamental research topic in computer vision. It consists mainly in understanding actions performed by humans based on a sequence of visual observations. In recent years, HAR has witnessed significant progress, especially with the emergence of deep learning models. However, most existing approaches for action recognition rely on information that is not always relevant for the task, and are limited in the way they fuse temporal information. In this paper, we propose a novel method for human action recognition that efficiently encodes the most discriminative appearance information of an action, with explicit attention on representative pose features, into a new compact grid representation. Our GRAR (Grid-based Representation for Action Recognition) method is tested on several benchmark datasets, demonstrating that our model can accurately recognize human actions, despite intra-class appearance variations and occlusion challenges.

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Raphael Memmesheimer, Nick Theisen, Dietrich Paulus

Auto-TLDR; One-Shot Action Recognition using Metric Learning

Recognizing an activity with a single reference sample using metric learning approaches is a promising research field. The majority of few-shot methods focus on object recognition or face identification. We propose a metric learning approach that reduces the action recognition problem to a nearest neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using a triplet loss, we learn a feature embedding. The resulting encoder transforms features into an embedding space in which closer distances encode similar actions while higher distances encode different actions. Our approach is based on a signal-level formulation and remains flexible across a variety of modalities. It further outperforms the baseline on the large-scale NTU RGB+D 120 dataset for the One-Shot action recognition protocol by \ntuoneshotimpro%. With just 60% of the training data, our approach still outperforms the baseline approach by \ntuoneshotimproreduced%. With 40% of the training data, our approach performs comparably well to the second follow-up approach. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton and fused data, and on the Simitate dataset for motion capture data. Furthermore, our inter-joint and inter-sensor experiments suggest good capabilities on previously unseen setups.
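
The general recipe described here (train an encoder with a triplet loss, then classify by nearest-neighbour search in the embedding space) can be sketched in a few lines of PyTorch. The tiny encoder, tensor shapes and random data below are assumptions made purely for illustration and are not the authors' architecture.

import torch
import torch.nn as nn

# Toy encoder standing in for a deep residual CNN; input is a signal-level
# "image" of shape (1, 64, 64), output a 32-dimensional embedding.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 32),
)
triplet_loss = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# One training step on a random anchor / positive / negative batch.
anchor, positive, negative = (torch.randn(8, 1, 64, 64) for _ in range(3))
loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# One-shot recognition: the nearest reference embedding determines the class.
references = encoder(torch.randn(5, 1, 64, 64))   # one reference per class
query = encoder(torch.randn(1, 1, 64, 64))
predicted_class = torch.cdist(query, references).argmin(dim=1)
print(predicted_class.item())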

Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning

Giancarlo Paoletti, Jacopo Cavazza, Cigdem Beyan, Alessio Del Bue

Auto-TLDR; Unsupervised Learning for Human Action Recognition from Skeletal Data

This paper tackles the problem of human action recognition, defined as classifying which action is displayed in a trimmed sequence, from skeletal data. Although state-of-the-art approaches designed for this application are all supervised, in this paper we pursue a more challenging direction: solving the problem with unsupervised learning. To this end, we propose a novel subspace clustering method, which exploits covariance matrices to enhance the action's discriminability, together with a timestamp pruning approach that allows us to better handle the temporal dimension of the data. Through a broad experimental validation, we show that our computational pipeline not only surpasses existing unsupervised approaches but can also achieve favorable performance compared to supervised methods.

What and How? Jointly Forecasting Human Action and Pose

Yanjun Zhu, Yanxia Zhang, Qiong Liu, Andreas Girgensohn

Auto-TLDR; Forecasting Human Actions and Motion Trajectories with Joint Action Classification and Pose Regression

Forecasting human actions and motion trajectories addresses the problem of predicting what a person is going to do next and how they will perform it. This is crucial in a wide range of applications such as assisted living and future co-robotic settings. We propose to simultaneously learn actions and action-related human motion dynamics, while existing works perform them independently. In this paper, we present a method to jointly forecast categories of human action and the pose of skeletal joints in the hope that the two tasks can help each other. As a result, our system can predict not only the future actions but also the motion trajectories that will result. To achieve this, we define a task of joint action classification and pose regression. We employ a sequence to sequence encoder-decoder model combined with multi-task learning to forecast future actions and poses progressively before the action happens. Experimental results on two public datasets, IkeaDB and OAD, demonstrate the effectiveness of the proposed method.

DeepPear: Deep Pose Estimation and Action Recognition

Wen-Jiin Tsai, You-Ying Jhuang

Auto-TLDR; Human Action Recognition Using RGB Video Using 3D Human Pose and Appearance Features

Human action recognition has recently become a popular research topic because it can be applied in many applications such as intelligent surveillance systems, human-robot interaction, and autonomous vehicle control. Human action recognition using RGB video is a challenging task because the learning of actions is easily affected by the cluttered background. To cope with this problem, the proposed method first estimates 3D human poses, which helps remove the cluttered background and focus on the human body. In addition to the human poses, the proposed method also utilizes appearance features near the predicted joints to make the action prediction context-aware. Instead of using 3D convolutional neural networks as many action recognition approaches do, the proposed method uses a two-stream architecture that aggregates the results from skeleton-based and appearance-based approaches to perform action recognition. Experimental results show that the proposed method achieves state-of-the-art performance on NTU RGB+D, a large-scale dataset for human action recognition.

A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition

Qianhui Men, Edmond S. L. Ho, Shum Hubert P. H., Howard Leung

Auto-TLDR; Two-Stream Recurrent Neural Network for Human-Human Interaction Recognition

This paper addresses the problem of recognizing human-human interaction from skeletal sequences. Existing methods are mainly designed to classify single human actions. Many of them simply stack the movement features of two characters to deal with human interaction, while neglecting the abundant relationships between characters. In this paper, we propose a novel two-stream recurrent neural network that adopts the geometric features from both single actions and interactions to describe the spatial correlations with different discriminative abilities. The first stream is constructed under pairwise joint distance (PJD) in a fully-connected mesh to categorize the interactions with explicit distance patterns. To better distinguish similar interactions, in the second stream, we combine PJD with the spatial features from individual joint positions using graph convolutions to detect the implicit correlations among joints, where the joint connections in the graph are adaptive for flexible correlations. After spatial modeling, each stream is fed to a bi-directional LSTM to encode two-way temporal properties. To take advantage of the diverse discriminative power of the two streams, we introduce a late fusion algorithm that combines their output predictions based on information entropy. Experimental results show that the proposed framework achieves state-of-the-art performance on 3D and comparable performance on 2D interaction datasets. Moreover, the late fusion results demonstrate its effectiveness in improving the recognition accuracy compared with the single streams.
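
The pairwise joint distance (PJD) feature over a "fully-connected mesh" between two interacting skeletons can be illustrated with a short NumPy sketch; the frame count, joint count and random coordinates below are assumptions chosen only for the example.

import numpy as np

# Skeletons of two interacting persons: T frames, J joints, 3D coordinates.
T, J = 30, 25
person_a = np.random.rand(T, J, 3)
person_b = np.random.rand(T, J, 3)

# Pairwise joint distances between every joint of person A and every joint
# of person B, per frame: a (T, J, J) fully-connected distance mesh.
pjd = np.linalg.norm(person_a[:, :, None, :] - person_b[:, None, :, :], axis=-1)

# Flattened per-frame feature sequence, ready for a recurrent model (e.g. a
# bi-directional LSTM).
pjd_features = pjd.reshape(T, -1)      # shape (30, 625)
print(pjd_features.shape)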

Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition

Negar Heidari, Alexandros Iosifidis

Auto-TLDR; Temporal Attention Module for Efficient Graph Convolutional Network-based Action Recognition

Graph convolutional networks (GCNs) have been very successful in modeling non-Euclidean data structures, like sequences of body skeletons forming actions modeled as spatio-temporal graphs. Most GCN-based action recognition methods use deep feed-forward networks with high computational complexity to process all skeletons in an action. This leads to a high number of floating point operations (ranging from 16G to 100G FLOPs) to process a single sample, making their adoption in restricted computation application scenarios infeasible. In this paper, we propose a temporal attention module (TAM) for increasing the efficiency of skeleton-based action recognition by selecting the most informative skeletons of an action at the early layers of the network. We incorporate the TAM into a light-weight GCN topology to further reduce the overall number of computations. Experimental results on two benchmark datasets show that the proposed method outperforms the baseline GCN-based method by a large margin while requiring 2.9 times fewer computations. Moreover, it performs on par with the state-of-the-art with up to 9.6 times fewer computations.
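
The core idea of scoring frames and keeping only the most informative skeletons can be sketched as follows in PyTorch. The scoring network, the value of k and the tensor shapes are illustrative assumptions, not the paper's actual TAM design.

import torch
import torch.nn as nn

class TemporalAttentionSelect(nn.Module):
    """Scores each skeleton frame and keeps only the top-k most informative
    ones, so that later (heavier) layers process fewer frames."""
    def __init__(self, feat_dim, k):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.k = k

    def forward(self, x):                          # x: (batch, T, feat_dim)
        weights = self.score(x).squeeze(-1)        # (batch, T) frame scores
        idx = weights.topk(self.k, dim=1).indices  # indices of kept frames
        idx = idx.sort(dim=1).values               # preserve temporal order
        batch = torch.arange(x.size(0)).unsqueeze(1)
        return x[batch, idx]                       # (batch, k, feat_dim)

x = torch.randn(4, 300, 75)                        # 300 frames of 25 joints x 3 coords
selected = TemporalAttentionSelect(75, k=60)(x)
print(selected.shape)                              # torch.Size([4, 60, 75])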

JT-MGCN: Joint-Temporal Motion Graph Convolutional Network for Skeleton-Based Action Recognition

Suekyeong Nam, Seungkyu Lee

Auto-TLDR; Joint-temporal Motion Graph Convolutional Networks for Action Recognition

Recently, action recognition methods using graph convolutional networks (GCN) have shown remarkable performance thanks to their concise but effective representation of human body motion. Prior methods construct the human body motion graph by building edges between neighboring or distant body joints. On the other hand, human actions contain many temporal variations, with strong temporal correlations between joint motions. Thus the characterization of an action requires a comprehensive analysis of joint motion correlations in the spatial and temporal domains. In this paper, we propose Joint-temporal Motion Graph Convolutional Networks (JT-MGCN) in which joint-temporal edges learn the correlations between different joints at different times. Experimental evaluation on large public datasets such as the NTU RGB+D and Kinetics-Skeleton datasets shows outstanding action recognition performance.

Space-Time Domain Tensor Neural Networks: An Application on Human Pose Classification

Konstantinos Makantasis, Athanasios Voulodimos, Anastasios Doulamis, Nikolaos Doulamis, Nikolaos Bakalos

Auto-TLDR; Tensor-Based Neural Network for Spatiotemporal Pose Classification using Three-Dimensional Skeleton Data

Recent advances in sensing technologies require the design and development of pattern recognition models capable of processing spatiotemporal data efficiently. In this study, we propose a spatially and temporally aware tensor-based neural network for human pose classification using three-dimensional skeleton data. Our model employs three novel components: first, an input layer capable of constructing highly discriminative spatiotemporal features; second, a tensor fusion operation that produces compact yet rich representations of the data; and third, a tensor-based neural network that processes data representations in their original tensor form. Our model is end-to-end trainable and characterized by a small number of trainable parameters, making it suitable for problems where the annotated data is limited. Experimental evaluation of the proposed model indicates that it can achieve state-of-the-art performance.

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Ziyang Song, Ziyi Yin, Zejian Yuan, Chong Zhang, Wanchao Chi, Yonggen Ling, Shenghao Zhang

Auto-TLDR; Attention-Oriented Multi-Level Network for Action Recognition in Interaction Scenes

Despite the notable progress made in action recognition tasks, not much work has been done in action recognition specifically for human-robot interaction. In this paper, we deeply explore the characteristics of the action recognition task in interaction scenes and propose an attention-oriented multi-level network framework to meet the need for real-time interaction. Specifically, a Pre-Attention network is first employed to roughly focus on the interactor in the scene at low resolution and then perform fine-grained pose estimation at high resolution. A second, compact CNN receives the extracted skeleton sequence as input for action recognition, utilizing attention-like mechanisms to capture local spatial-temporal patterns and global semantic information effectively. To evaluate our approach, we construct a new action dataset specifically for the recognition task in interaction scenes. Experimental results on our dataset and high efficiency (112 fps at 640 x 480 RGBD) on the mobile computing platform (Nvidia Jetson AGX Xavier) demonstrate the excellent applicability of our method to action recognition in real-time human-robot interaction.

Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

Konstantinos Papadopoulos, Enjie Ghorbel, Djamila Aouada, Bjorn Ottersten

Auto-TLDR; Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Spatio-temporal Graph Convolutional Networks (ST-GCNs) have shown great performance in the context of skeleton-based action recognition. Nevertheless, ST-GCNs use raw skeleton data as vertex features. Such features have low dimensionality and might not be optimal for action discrimination. Moreover, a single layer of temporal convolution is used to model short-term temporal dependencies, which can be insufficient for capturing long-term dependencies. In this paper, we extend the Spatio-Temporal Graph Convolutional Network for skeleton-based action recognition by introducing two novel modules, namely, the Graph Vertex Feature Encoder (GVFE) and the Dilated Hierarchical Temporal Convolutional Network (DH-TCN). On the one hand, the GVFE module learns appropriate vertex features for action recognition by encoding raw skeleton data into a new feature space. On the other hand, the DH-TCN module is capable of capturing both short-term and long-term temporal dependencies using a hierarchical dilated convolutional network. Experiments have been conducted on the challenging NTU RGB-D 60, NTU RGB-D 120 and Kinetics datasets. The obtained results show that our method competes with state-of-the-art approaches while using a smaller number of layers and parameters, thus reducing the required training time and memory.

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Zhengyuan Yang, Amanda Kay, Yuncheng Li, Wendi Cross, Jiebo Luo

Auto-TLDR; Body Language Based Emotion Recognition for Psychiatric Symptoms Prediction

Inspired by the human ability to infer emotions from body language, we propose an automated framework for body language based emotion recognition starting from regular RGB videos. In collaboration with psychologists, we further extend the framework to psychiatric symptom prediction. Because a specific application domain of the proposed framework may only supply a limited amount of data, the framework is designed to work on a small training set and possess good transferability. The proposed system in the first stage generates sequences of body language predictions based on human poses estimated from input videos. In the second stage, the predicted sequences are fed into a temporal network for emotion interpretation and psychiatric symptom prediction. We first validate the accuracy and transferability of the proposed body language recognition method on several public action recognition datasets. We then evaluate the framework on the proposed URMC dataset, which consists of conversations between a standardized patient and a behavioral health professional, along with expert annotations of body language, emotions, and potential psychiatric symptoms. The proposed framework outperforms other methods on the URMC dataset.

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Rongxiao Tang, Wang Luyang, Zhenhua Guo

Auto-TLDR; Multi-task Neural Network for Action Recognition and 3D Human Pose Estimation

Action recognition and 3D human pose estimation are fundamental and closely related problems in computer vision. In this work, we propose a multi-task neural network for action recognition and 3D human pose estimation. The results of previous methods are still error-prone, especially when tested on images taken in the wild, which leads to erroneous action recognition. To solve this problem, we propose a principled approach to generate high-quality 3D pose ground truth given any in-the-wild image with a person inside. We achieve this by first devising a novel stereo-inspired neural network to directly map any 2D pose to a high-quality 3D counterpart. Based on these high-quality 3D labels, we carefully design the multi-task framework for action recognition and 3D human pose estimation. The proposed architecture can utilize the shallow and deep features of the images, as well as the in-the-wild 3D human key-points, to guide a more precise result. High-quality 3D key-points can fully reflect the morphological features of motions, thus boosting the performance on action recognition. Experiments demonstrate that 3D pose estimation leads to significantly higher performance on action recognition than separate learning. We also evaluate the generalization ability of our method both quantitatively and qualitatively. The proposed architecture performs favorably against the baseline 3D pose estimation methods. In addition, the reported results on the Penn Action and NTU datasets demonstrate the effectiveness of our method on the action recognition task.

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

Michael Lao Banteng, Zhiyong Wu

Auto-TLDR; Two-stream channel-wise dense connection GCN for human action recognition

The skeleton-based action recognition task has drawn much attention for many years. The Graph Convolutional Network (GCN) has proved its effectiveness in this task. However, how to improve the model's robustness to different human actions and how to make effective use of the features produced by the network are main topics that need to be further explored. Human actions are time-series sequences, meaning that temporal information is a key factor in modeling the representation of the data. The ranges of body parts involved in small actions (e.g. raising a glass or shaking the head) and big actions (e.g. walking or jumping) are diverse. It is crucial for the model to generate and utilize more features that can be adaptive to a wider range of actions. Furthermore, feature channels are specific to the action class, so the model needs to weigh their importance and pay attention to the more related ones. To address these problems, in this work, we propose a two-stream channel-wise dense connection GCN (2s-CDGCN). Specifically, the skeleton data is extracted and processed into spatial and temporal information for better feature representation. A channel-wise attention module is used to select and emphasize the more useful features generated by the network. Moreover, to ensure maximum information flow, dense connections are introduced into the network structure, which enables the network to reuse the skeleton features and generate more information that is adaptive and related to different human actions. Our model has shown its ability to improve the accuracy of the human action recognition task on two large datasets, NTU-RGB+D and Kinetics. Extensive evaluations were conducted to prove the effectiveness of our model.

Recurrent Graph Convolutional Networks for Skeleton-Based Action Recognition

Guangming Zhu, Lu Yang, Liang Zhang, Peiyi Shen, Juan Song

Auto-TLDR; Recurrent Graph Convolutional Network for Human Action Recognition

Human action recognition is one of the challenging and active research fields due to its wide applications. Recently, graph convolutions for skeleton-based action recognition have attracted much attention. Generally, the adjacency matrices of the graph are fixed to the hand-crafted physical connectivity of the human joints, or learned adaptively via deep learning. The hand-crafted or learned adjacency matrices are fixed when processing each frame of an action sequence. However, the interactions of different subsets of joints may play a core role at different phases of an action. Therefore, it is reasonable to evolve the graph topology with time. In this paper, a recurrent graph convolution is proposed, in which the graph topology is evolved via a long short-term memory (LSTM) network. The proposed recurrent graph convolutional network (R-GCN) can recurrently learn data-dependent graph topologies for different layers, different time steps and different kinds of actions. Experimental results on the NTU RGB+D and Kinetics-Skeleton datasets demonstrate the advantages of the proposed R-GCN.

Learning Group Activities from Skeletons without Individual Action Labels

Fabio Zappardino, Tiberio Uricchio, Lorenzo Seidenari, Alberto Del Bimbo

Auto-TLDR; Lean Pose Only for Group Activity Recognition

To understand human behavior, we must not just recognize individual actions but also model possibly complex group activities and interactions. Hierarchical models obtain the best results in group activity recognition but require fine-grained individual action annotations at the actor level. In this paper we show that, using only skeletal data, we can train a state-of-the-art end-to-end system using only group activity labels at the sequence level. Our experiments show that models trained without individual action supervision perform poorly. On the other hand, we show that pseudo-labels can be computed from any pre-trained feature extractor with comparable final performance. Finally, our carefully designed lean pose-only architecture shows highly competitive results versus more complex multimodal approaches, even in the self-supervised variant.

Single View Learning in Action Recognition

Gaurvi Goyal, Nicoletta Noceti, Francesca Odone

Auto-TLDR; Cross-View Action Recognition Using Domain Adaptation for Knowledge Transfer

Viewpoint is an essential aspect of how an action is visually perceived, with the motion appearing substantially different for some viewpoint pairs. Data-driven action recognition algorithms compensate for this by including a variety of viewpoints in their training data, adding to the cost of data acquisition as well as training. We propose a novel methodology that leverages deeply pretrained features to learn actions from a single viewpoint using domain adaptation for knowledge transfer. We demonstrate the effectiveness of this pipeline on 3 different datasets: IXMAS, MoCA and NTU RGB+D, and compare with both classical and deep learning methods. Our method requires little training data and demonstrates unparalleled cross-view action recognition accuracy for single-view learning.

Temporal Extension Module for Skeleton-Based Action Recognition

Yuya Obinata, Takuma Yamamoto

Auto-TLDR; Extended Temporal Graph for Action Recognition with Kinetics-Skeleton

We present a module that extends the temporal graph of a graph convolutional network (GCN) for action recognition with a sequence of skeletons. Existing methods attempt to represent a more appropriate spatial graph within each frame, but disregard optimization of the temporal graph across frames. Concretely, these methods only connect vertices corresponding to the same joint between frames. In this work, we focus on adding connections to multiple neighboring vertices in adjacent frames and extracting additional features based on the extended temporal graph. Our module is a simple yet effective method to extract correlated features of multiple joints in human movement. Moreover, our module aids in further performance improvements, along with other GCN methods that optimize only the spatial graph. We conduct extensive experiments on two large datasets, NTU RGB+D and Kinetics-Skeleton, and demonstrate that our module is effective for several existing models and that our final model achieves state-of-the-art performance.
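
The extension from "same joint only" to "spatial neighbours as well" in the inter-frame edges can be illustrated with a small NumPy adjacency-building sketch. The toy 3-joint skeleton and the exact connectivity rule are assumptions for illustration, not the module's actual construction.

import numpy as np

def extended_temporal_adjacency(spatial_adj, num_frames):
    """Builds a block spatio-temporal adjacency in which each joint connects
    not only to itself in the next frame but also to its spatial neighbours
    in the next frame (an 'extended temporal graph')."""
    J = spatial_adj.shape[0]
    A = np.zeros((J * num_frames, J * num_frames))
    inter = spatial_adj + np.eye(J)            # neighbours plus the joint itself
    for t in range(num_frames):
        cur = slice(t * J, (t + 1) * J)
        A[cur, cur] = spatial_adj              # intra-frame spatial edges
        if t + 1 < num_frames:
            nxt = slice((t + 1) * J, (t + 2) * J)
            A[cur, nxt] = inter                # inter-frame edges to neighbours
            A[nxt, cur] = inter.T
    return A

# Toy 3-joint chain skeleton over 4 frames.
spatial = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(extended_temporal_adjacency(spatial, 4).shape)   # (12, 12)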

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Tianshan Liu, Kin-Man Lam

Auto-TLDR; flow-guided spatial attention tracking for egocentric activity recognition

The popularity of wearable cameras has opened up a new dimension for egocentric activity recognition. While some methods introduce attention mechanisms into deep learning networks to capture fine-grained hand-object interactions, they often neglect exploring the spatio-temporal relationships. Generating spatial attention, without adequately exploiting temporal consistency, will result in potentially sub-optimal performance in the video-based task. In this paper, we propose a flow-guided spatial attention tracking (F-SAT) module, which is based on enhancing motion patterns and inter-frame information, to highlight the discriminative features from regions of interest across a video sequence. A new form of input, namely the optical-flow volume, is presented to provide informative cues from moving parts for spatial attention tracking. The proposed F-SAT module is deployed to a two-branch-based deep architecture, which fuses complementary information for egocentric activity recognition. Experimental results on three egocentric activity benchmarks show that the proposed method achieves state-of-the-art performance.

Feature-Supervised Action Modality Transfer

Fida Mohammad Thoker, Cees Snoek

Auto-TLDR; Cross-Modal Action Recognition and Detection in Non-RGB Video Modalities by Learning from Large-Scale Labeled RGB Data

This paper strives for action recognition and detection in video modalities like RGB, depth maps or 3D-skeleton sequences when only limited modality-specific labeled examples are available. For the RGB modality, and the derived optical-flow modality, many large-scale labeled datasets have been made available. They have become the de facto pre-training choice when recognizing or detecting new actions from RGB datasets that have limited amounts of labeled examples available. Unfortunately, large-scale labeled action datasets for other modalities are unavailable for pre-training. In this paper, our goal is to recognize actions from limited examples in non-RGB video modalities, by learning from large-scale labeled RGB data. To this end, we propose a two-step training process: (i) we extract action representation knowledge from an RGB-trained teacher network and adapt it to a non-RGB student network; (ii) we then fine-tune the transferred model with available labeled examples of the target modality. For the knowledge transfer we introduce feature-supervision strategies, which rely on unlabeled pairs of two modalities (the RGB and the target modality) to transfer feature-level representations from the teacher to the student network. Ablations and generalizations with two RGB source datasets and two non-RGB target datasets demonstrate that an optical-flow teacher provides better action transfer features than RGB for both depth maps and 3D-skeletons, even when evaluated on a different target domain, or for a different task. Compared to alternative cross-modal action transfer methods, we show a good improvement in performance, especially when labeled non-RGB examples to learn from are scarce.

RWF-2000: An Open Large Scale Video Database for Violence Detection

Ming Cheng, Kunjing Cai, Ming Li

Auto-TLDR; Flow Gated Network for Violence Detection in Surveillance Cameras

In recent years, surveillance cameras have been widely deployed in public places, and the general crime rate has been reduced significantly due to these ubiquitous devices. Usually, these cameras provide cues and evidence after crimes have been committed, while they are rarely used to prevent or stop criminal activities in time. It is both time- and labor-consuming to manually monitor the large amount of video data from surveillance cameras. Therefore, automatically recognizing violent behaviors from video signals becomes essential. In this paper, we summarize several existing video datasets for violence detection and propose a new video dataset with 2,000 videos, all captured by surveillance cameras in real-world scenes. We also present a new method that utilizes the merits of both 3D-CNNs and optical flow, namely the Flow Gated Network. The proposed approach obtains an accuracy of 87.25% on the test set of our proposed RWF-2000 database. The proposed database and source code of this paper are currently openly accessible.

Learning Recurrent High-Order Statistics for Skeleton-Based Hand Gesture Recognition

Xuan Son Nguyen, Luc Brun, Olivier Lezoray, Sébastien Bougleux

Auto-TLDR; Exploiting High-Order Statistics in Recurrent Neural Networks for Hand Gesture Recognition

High-order statistics have been proven useful in the framework of Convolutional Neural Networks (CNN) for a variety of computer vision tasks. In this paper, we propose to exploit high-order statistics in the framework of Recurrent Neural Networks (RNN) for skeleton-based hand gesture recognition. Our method is based on the Statistical Recurrent Unit (SRU), an un-gated architecture that has been introduced as an alternative model to the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). The SRU captures sequential information by generating recurrent statistics that depend on a context of previously seen data and by computing moving averages at different scales. The integration of high-order statistics in the SRU significantly improves the performance of the original one, resulting in a model that is competitive with state-of-the-art methods on the Dynamic Hand Gesture (DHG) dataset, and outperforms them on the First-Person Hand Action (FPHA) dataset.

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Camille Maurice, Francisco Madrigal, Frederic Lerasle

Auto-TLDR; Fusion of Deep Neural Network and Bayesian-based Approach for Temporal Action Recognition

The activities we perform in our daily life are generally carried out as a succession of atomic actions following a logical order, and during a video sequence actions usually follow such an order. In this paper, we propose a hybrid approach resulting from the fusion of a deep learning neural network with a Bayesian-based approach. The latter models human-object interactions and transitions between actions. The key idea is to combine both approaches in the final prediction. We validate our strategy on two public datasets: CAD-120 and Watch-n-Patch. We show that our fusion approach yields accuracy gains of +4% and +6%, respectively, over a baseline approach. Temporal action recognition performance is clearly improved by the fusion, especially when classes are imbalanced.
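
A late fusion of the two streams' class probabilities can be as simple as a weighted average, sketched below in Python; the fixed weight and the example probability vectors are placeholders, not the paper's actual fusion rule.

import numpy as np

def late_fusion(p_network, p_bayes, w=0.5):
    """Combines the class-probability vectors of a neural-network stream and
    a Bayesian stream by a weighted average, then renormalizes."""
    fused = w * p_network + (1.0 - w) * p_bayes
    return fused / fused.sum()

p_network = np.array([0.7, 0.2, 0.1])   # e.g. deep-network prediction
p_bayes = np.array([0.4, 0.5, 0.1])     # e.g. Bayesian-model prediction
print(late_fusion(p_network, p_bayes).argmax())   # fused action class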

Anticipating Activity from Multimodal Signals

Tiziana Rotondo, Giovanni Maria Farinella, Davide Giacalone, Sebastiano Mauro Strano, Valeria Tomaselli, Sebastiano Battiato

Auto-TLDR; Exploiting Multimodal Signal Embedding Space for Multi-Action Prediction

Images, videos, audio signals, sensor data, can be easily collected in huge quantity by different devices and processed in order to emulate the human capability of elaborating a variety of different stimuli. Are multimodal signals useful to understand and anticipate human actions if acquired from the user viewpoint? This paper proposes to build an embedding space where inputs of different nature, but semantically correlated, are projected in a new representation space and properly exploited to anticipate the future user activity. To this purpose, we built a new multimodal dataset comprising video, audio, tri-axial acceleration, angular velocity, tri-axial magnetic field, pressure and temperature. To benchmark the proposed multimodal anticipation challenge, we consider classic classifiers on top of deep learning methods used to build the embedding space representing multimodal signals. The achieved results show that the exploitation of different modalities is useful to improve the anticipation of the future activity.

Modeling Long-Term Interactions to Enhance Action Recognition

Alejandro Cartas, Petia Radeva, Mariella Dimiccoli

Auto-TLDR; A Hierarchical Long Short-Term Memory Network for Action Recognition in Egocentric Videos

In this paper, we propose a new approach to understanding actions in egocentric videos that exploits the semantics of object interactions at both the frame and temporal levels. At the frame level, we use a region-based approach that takes as input a primary region roughly corresponding to the user's hands and a set of secondary regions potentially corresponding to the interacting objects, and calculates the action score through a CNN formulation. This information is then fed to a Hierarchical Long Short-Term Memory Network (HLSTM) that captures temporal dependencies between actions within and across shots. Ablation studies thoroughly validate the proposed approach, showing in particular that both levels of the HLSTM architecture contribute to performance improvement. Furthermore, quantitative comparisons show that the proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks, without relying on motion information.

Extracting Action Hierarchies from Action Labels and their Use in Deep Action Recognition

Konstadinos Bacharidis, Antonis Argyros

Auto-TLDR; Exploiting the Information Content of Language Label Associations for Human Action Recognition

Human activity recognition is a fundamental and challenging task in computer vision. Its solution can support multiple and diverse applications in areas including smart homes, surveillance, daily living assistance, and Human-Robot Collaboration (HRC). In realistic conditions, the complexity of human activities ranges from simple coarse actions, such as sitting or standing up, to more complex activities that consist of multiple actions with subtle variations in appearance and motion patterns. A large variety of existing datasets target specific action classes, with some of them being coarse and others being fine-grained. In all of them, a description of the action and its complexity is manifested in the action label sentence. As the action/activity complexity increases, so do the label sentence size and the amount of action-related semantic information contained in this description. In this paper, we propose an approach to exploit the information content of these action labels to formulate a coarse-to-fine action hierarchy based on linguistic label associations, and we investigate the potential benefits and drawbacks. Moreover, in a series of quantitative and qualitative experiments, we show that the exploitation of this hierarchical organization of action classes at different levels of granularity improves the learning speed and overall performance of a range of baseline and mid-range deep architectures for human action recognition (HAR).

A Detection-Based Approach to Multiview Action Classification in Infants

Carolina Pacheco, Effrosyni Mavroudi, Elena Kokkoni, Herbert Tanner, Rene Vidal

Auto-TLDR; Multiview Action Classification for Infants in a Pediatric Rehabilitation Environment

Activity recognition in children and infants is important in applications such as safety monitoring, behavior assessment, and child-robot interaction, among others. However, it differs from activity recognition in adults not only because body poses and proportions are different, but also because of the way in which actions are performed. This paper addresses the problem of infant action classification (up to 2 years old) in challenging conditions. The actions are performed in a pediatric rehabilitation environment in which not only infants but also robots and adults are present, with the infant being one of the smallest actors in the scene. We propose a multiview action classification system based on Faster R-CNN and LSTM networks, which fuses information from different views by using learnable fusion coefficients derived from detection confidence scores. The proposed system is view-independent, learns features that are close to view-invariant, and can handle new or missing views at test time. Our approach outperforms the state-of-the-art baseline model for this dataset by 11.4% in terms of average classification accuracy on four classes (crawl, sit, stand and walk). Moreover, experiments on an extended dataset of 6 subjects (8 to 24 months old) show that the proposed fusion strategy outperforms the best post-processing fusion strategy by 2.5% and 6.8% average classification accuracy in Leave One Super-session Out and Leave One Subject Out cross-validation, respectively.

2D Deep Video Capsule Network with Temporal Shift for Action Recognition

Théo Voillemin, Hazem Wannous, Jean-Philippe Vandeborre

Auto-TLDR; Temporal Shift Module over Capsule Network for Action Recognition in Continuous Videos

Action recognition in continuous video streams has been a growing field over the past few years. Deep learning techniques, and in particular Convolutional Neural Networks (CNNs), have achieved good results on this topic. However, intrinsic CNN limitations begin to cap the results, since 2D CNNs cannot capture temporal information and 3D CNNs are too resource-demanding for real-time applications. The Capsule Network, an evolution of the CNN, has already proven its benefits on small and low-information datasets like MNIST, but its true potential has not yet emerged. In this paper we tackle the action recognition problem by proposing a new architecture combining a Temporal Shift module with a deep Capsule Network. The Temporal Shift module allows us to insert temporal information into a 2D Capsule Network at zero computational cost, conserving the lightness of 2D capsules and their ability to connect spatial features. Our proposed approach achieves or approaches state-of-the-art results on color and depth information on public datasets such as First Person Hand Action and DHG 14/28, with 10 to 40 times fewer parameters than existing approaches.

Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows

Peng-Yuan Kao, Yan-Jing Lei, Chia-Hao Chang, Chu-Song Chen, Ming-Sui Lee, Yi-Ping Hung

Auto-TLDR; 3D Convolutional Neural Network for Activity Recognition with FPV Videos

First-person-view (FPV) cameras are finding wide use in daily life to record activities and sports. In this paper, we propose a succinct and robust 3D convolutional neural network (CNN) architecture accompanied with an ensemble-learning network for activity recognition with FPV videos. The proposed 3D CNN is trained on low-resolution (32x32) sparse optical flows using FPV video datasets consisting of daily activities. According to the experimental results, our network achieves an average accuracy of 90%.

IPT: A Dataset for Identity Preserved Tracking in Closed Domains

Thomas Heitzinger, Martin Kampel

Auto-TLDR; Identity Preserved Tracking Using Depth Data for Privacy and Privacy

We present a public dataset for Identity Preserved Tracking (IPT) consisting of sequences of depth data recorded using an Orbbec Astra depth sensor. The dataset features sequences in ten different locations with a high amount of background variation and is designed to be applicable to a wide range of tasks. Its labeling is versatile, allowing for tracking in either 3D space or image coordinates. In addition to frame-by-frame 3D and inferred bounding box labeling, we provide supplementary annotations of camera poses and room layouts, split into multiple semantically distinct categories. Intended use cases are applications where both a high-level understanding of the scene and privacy are central points of consideration, such as active and assisted living (AAL), security and industrial safety. Compared to similar public datasets, IPT distinguishes itself with its sequential data format, 3D instance labeling and room layout annotation. We present baseline object detection results in image coordinates using a YOLOv3 network architecture and implement a background model suitable for online tracking applications to increase detection accuracy. Additionally, we propose a novel volumetric non-maximum suppression (V-NMS) approach, taking advantage of known room geometry. Lastly, we provide baseline person tracking results using the Multiple Object Tracking Challenge (MOTChallenge) evaluation metrics of the CVPR19 benchmark.

From Human Pose to On-Body Devices for Human-Activity Recognition

Fernando Moya Rueda, Gernot Fink

Auto-TLDR; Transfer Learning from Human Pose Estimation for Human Activity Recognition using Inertial Measurements from On-Body Devices

Human Activity Recognition (HAR), using inertial measurements from on-body devices, has not seen a great advantage from deep architectures. This is mainly due to the lack of annotated data, diversity of on-body device configurations, the class-unbalance problem, and non-standard human activity definitions. Approaches for improving the performance of such architectures, e.g., transfer learning, are therefore difficult to apply. This paper introduces a method for transfer learning from human-pose estimations as a source for improving HAR using inertial measurements obtained from on-body devices. We propose to fine-tune deep architectures, trained using sequences of human poses from a large dataset and their derivatives, for solving HAR on inertial measurements from on-body devices. Derivatives of human poses will be considered as a sort of synthetic data for HAR. We deploy two different temporal-convolutional architectures as classifiers. An evaluation of the method is carried out on three benchmark datasets improving the classification performance.

Depth Videos for the Classification of Micro-Expressions

Ankith Jain Rakesh Kumar, Bir Bhanu, Christopher Casey, Sierra Cheung, Aaron Seitz

Auto-TLDR; RGB-D Dataset for the Classification of Facial Micro-expressions

Facial micro-expressions are spontaneous, subtle, involuntary muscle movements occurring briefly on the face. The spotting and recognition of these expressions are difficult due to the subtle behavior, and the time duration of these expressions is about half a second, which makes it difficult for humans to identify them. These micro-expressions have many applications in our daily life, such as in the field of online learning, game playing, lie detection, and therapy sessions. Traditionally, researchers use RGB images/videos to spot and classify these micro-expressions, which pose challenging problems, such as illumination, privacy concerns and pose variation. The use of depth videos solves these issues to some extent, as the depth videos are not susceptible to the variation in illumination. This paper describes the collection of a first RGB-D dataset for the classification of facial micro-expressions into 6 universal expressions: Anger, Happy, Sad, Fear, Disgust, and Surprise. This paper shows the comparison between the RGB and Depth videos for the classification of facial micro-expressions. Further, a comparison of results shows that depth videos alone can be used to classify facial micro-expressions correctly in a decision tree structure by using the traditional and deep learning approaches with good classification accuracy. The dataset will be released to the public in the near future.

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Mirco Planamente, Andrea Bottino, Barbara Caputo

Auto-TLDR; A Single Stream Architecture for Egocentric Action Recognition from the First-Person Point of View

Wearable cameras are becoming more and more popular in several applications, increasing the interest of the research community in developing approaches for recognizing actions from the first-person point of view. An open challenge in egocentric action recognition is that videos lack detailed information about the main actor's pose and thus tend to record only parts of the movement when focusing on manipulation tasks. Thus, the amount of information about the action itself is limited, making crucial the understanding of the manipulated objects and their context. Many previous works addressed this issue with two-stream architectures, where one stream is dedicated to modeling the appearance of objects involved in the action, and another to extracting motion features from optical flow. In this paper, we argue that learning features jointly from these two information channels is beneficial to better capture the spatio-temporal correlations between the two. To this end, we propose a single-stream architecture able to do so, thanks to the addition of a self-supervised block that uses a pretext motion prediction task to intertwine motion and appearance knowledge. Experiments on several publicly available databases show the power of our approach.

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

Bing Li, Longteng Kong, Dongming Zhang, Xiuguo Bao, Di Huang, Yunhong Wang

Auto-TLDR; TEMSN: Temporal Enhanced Multi-Stream Network for Compressed Video Action Recognition

Current compressed video action recognition methods are mainly based on completely received compressed videos. However, in real transmission, the compressed video packets are usually received out of order or lost due to network jitter or congestion. It is of great significance to recognize actions in early phases with limited packets, e.g. to quickly forecast the potential risks from videos. In this paper, we propose a Temporal Enhanced Multi-Stream Network (TEMSN) for practical compressed video action recognition. First, we use three compressed modalities as complementary cues and build a multi-stream network to capture the rich information from compressed video packets. Second, we design a temporal enhanced module based on an encoder-decoder structure, applied to each stream, to infer the missing packets and generate more complete action dynamics. Thanks to the rich modalities and the temporal enhancement, our approach is able to better model the action with limited compressed packets. Experiments on the HMDB-51 and UCF-101 datasets validate its effectiveness and efficiency.

Weight Estimation from an RGB-D Camera in Top-View Configuration

Marco Mameli, Marina Paolanti, Nicola Conci, Filippo Tessaro, Emanuele Frontoni, Primo Zingaretti

Auto-TLDR; Top-View Weight Estimation using Deep Neural Networks

The development of so-called soft-biometrics aims at providing information related to the physical and behavioural characteristics of a person. This paper focuses on bodyweight estimation based on the observation from a top-view RGB-D camera. In fact, the capability to estimate the weight of a person can be of help in many different applications, from health-related scenarios to business intelligence and retail analytics. To deal with this issue, a TVWE (Top-View Weight Estimation) framework is proposed with the aim of predicting the weight. The approach relies on the adoption of Deep Neural Networks (DNNs) that have been trained on depth data. Each network has also been modified in its top section to replace classification with prediction inference. The performance of five state-of-the-art DNNs has been compared, namely VGG16, ResNet, Inception, DenseNet and Efficient-Net. In addition, a convolutional auto-encoder has also been included for completeness. Considering the limited literature in this domain, the TVWE framework has been evaluated on a new publicly available dataset: “VRAI Weight Estimation Dataset”, which also collects, for each subject, labels related to weight, gender, and height. The experimental results have demonstrated that the proposed methods are suitable for this task, bringing different and significant insights for the application of the solution in different domains.

Pose-Aware Multi-Feature Fusion Network for Driver Distraction Recognition

Mingyan Wu, Xi Zhang, Linlin Shen, Hang Yu

Auto-TLDR; Multi-Feature Fusion Network for Distracted Driving Detection using Pose Estimation

Traffic accidents caused by distracted driving have gradually increased in recent years. In this work, we propose a novel multi-feature fusion network based on pose estimation for image-based distracted driving detection. Since the hands are the most important parts of the driver for inferring distracted actions, our proposed method first detects hands using the human body posture information. In addition to the features extracted from the whole image, our network also includes the important information of hand and human body posture. The global feature, hand and pose features are finally fused by a weighted combination of probability vectors and concatenation of feature maps. The experimental results show that our method achieves state-of-the-art performance on our own SZ Bus Driver dataset and the public AUC Distracted Driver dataset.

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan, Ali Etemad

Auto-TLDR; Fused RGB-D Facial Recognition using Attention-Aware Feature Fusion

With recent advances in RGB-D sensing technologies as well as improvements in machine learning and fusion techniques, RGB-D facial recognition has become an active area of research. A novel attention-aware method is proposed to fuse two image modalities, RGB and depth, for enhanced RGB-D facial recognition. The proposed method first extracts features from both modalities using a convolutional feature extractor. These features are then fused using a two-layer attention mechanism. The first layer focuses on the fused feature maps generated by the feature extractor, exploiting the relationship between feature maps using LSTM recurrent learning. The second layer focuses on the spatial features of those maps using convolution. The training database is preprocessed and augmented through a set of geometric transformations, and the learning process is further aided using transfer learning from a pure 2D RGB image training process. Comparative evaluations demonstrate that the proposed method outperforms other state-of-the-art approaches, including both traditional and deep neural network-based methods, on the challenging CurtinFaces and IIIT-D RGB-D benchmark databases, achieving classification accuracies over 98.2% and 99.3% respectively. The proposed attention mechanism is also compared with other attention mechanisms, demonstrating more accurate results.

Temporal Binary Representation for Event-Based Action Recognition

Simone Undri Innocenti, Federico Becattini, Federico Pernici, Alberto Del Bimbo

Auto-TLDR; Temporal Binary Representation for Gesture Recognition

In this paper we present an event aggregation strategy to convert the output of an event camera into frames processable by traditional Computer Vision algorithms. The proposed method first generates sequences of intermediate binary representations, which are then losslessly transformed into a compact format by simply applying a binary-to-decimal conversion. This strategy allows us to encode temporal information directly into pixel values, which are then interpreted by deep learning models. We apply our strategy, called Temporal Binary Representation, to the task of Gesture Recognition, obtaining state-of-the-art results on the popular DVS128 Gesture Dataset. To underline the effectiveness of the proposed method compared to existing ones, we also collect an extension of the dataset under more challenging conditions on which to perform experiments.
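
The binary-to-decimal aggregation idea can be sketched in a few lines of NumPy: each group of consecutive binary event frames is read as one unsigned integer per pixel. The frame size, number of frames and the random event data below are illustrative assumptions, not the paper's experimental setup.

import numpy as np

def temporal_binary_representation(events, bits=8):
    """Aggregates a stream of binary event frames of shape (T, H, W) into
    compact frames: every group of `bits` consecutive binary frames becomes
    one uint8 frame via a per-pixel binary-to-decimal conversion."""
    T, H, W = events.shape
    T = (T // bits) * bits                     # drop any incomplete group
    groups = events[:T].reshape(-1, bits, H, W)
    weights = (2 ** np.arange(bits))[::-1].reshape(1, bits, 1, 1)
    return (groups * weights).sum(axis=1).astype(np.uint8)   # (T // bits, H, W)

frames = (np.random.rand(64, 128, 128) > 0.9).astype(np.uint8)
tbr = temporal_binary_representation(frames, bits=8)
print(tbr.shape, tbr.dtype)                    # (8, 128, 128) uint8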

Developing Motion Code Embedding for Action Recognition in Videos

Maxat Alibayev, David Andrea Paulius, Yu Sun

Auto-TLDR; Motion Embedding via Motion Codes for Action Recognition

We propose a motion embedding strategy based on motion codes, vectorized representations of motions derived from their salient mechanical attributes. We show that these motion codes provide a robust motion representation. We train a deep neural network that learns to embed demonstration videos into motion codes, and we integrate the features extracted by this embedding model into a current state-of-the-art action recognition model. The resulting model achieves higher accuracy than the baseline on a verb classification task on egocentric videos from the EPIC-KITCHENS dataset.
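
As a loose sketch of the idea (not the authors' model), the code below maps a precomputed video feature vector to a vector of motion-code attributes and concatenates it with backbone features before classification. The dimensions, the attribute set, and the fusion point are purely hypothetical.

```python
import torch
import torch.nn as nn

class MotionCodeEmbedder(nn.Module):
    """Hypothetical sketch: map a video feature vector to a motion-code vector
    of mechanical attributes. The attribute count and layer sizes are
    assumptions; the paper's model and code definition may differ."""
    def __init__(self, feat_dim=1024, code_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim))

    def forward(self, video_feat):
        return torch.sigmoid(self.net(video_feat))   # one score per attribute

def fuse_with_backbone(backbone_feat, motion_code):
    """Integration idea: concatenate the predicted code with the action
    recognition backbone's features before its classifier."""
    return torch.cat([backbone_feat, motion_code], dim=1)
```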

Learning Dictionaries of Kinematic Primitives for Action Classification

Alessia Vignolo, Nicoletta Noceti, Alessandra Sciutti, Francesca Odone, Giulio Sandini

Auto-TLDR; Action Understanding using Visual Motion Primitives

This paper proposes a method based on visual motion primitives to address the problem of action understanding. The approach builds, in an unsupervised way, a dictionary of kinematic primitives from a set of sub-movements obtained by segmenting the velocity profile of an action at local minima derived directly from the optical flow. The dictionary is then used to describe each sub-movement as a linear combination of atoms via sparse coding. The descriptive capability of the proposed motion representation is experimentally validated on the MoCA dataset, a collection of synchronized multi-view videos and motion-capture data of cooking activities. The results show that the approach, despite its simplicity, performs well in action classification, especially when the motion primitives are combined over time. The method also proves tolerant to viewpoint changes and can thus support cross-view action recognition. Overall, it may be seen as the backbone of a general approach to action understanding, with potential applications in robotics.
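
A minimal sketch of the pipeline described above, assuming the velocity profile of each action is already available as a 1-D signal (e.g. from optical-flow magnitude): segment it at local minima, resample the sub-movements to a fixed length, and learn a sparse-coding dictionary over them. The toy data and all parameter values are illustrative only.

```python
import numpy as np
from scipy.signal import argrelextrema
from sklearn.decomposition import DictionaryLearning

def segment_at_minima(velocity):
    """Split a 1-D velocity profile into sub-movements at its local minima."""
    minima = argrelextrema(velocity, np.less)[0]
    bounds = [0, *minima.tolist(), len(velocity)]
    return [velocity[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]

def resample(segment, length=32):
    """Bring each sub-movement to a fixed length before dictionary learning."""
    xp = np.linspace(0, 1, len(segment))
    return np.interp(np.linspace(0, 1, length), xp, segment)

# Toy velocity profiles standing in for optical-flow magnitudes of actions
velocities = [np.abs(np.random.randn(200)).cumsum() % 3 for _ in range(20)]
segments = np.stack([resample(s) for v in velocities for s in segment_at_minima(v)])

# Unsupervised dictionary of kinematic primitives + sparse codes per sub-movement
dico = DictionaryLearning(n_components=16, transform_algorithm="lasso_lars")
codes = dico.fit(segments).transform(segments)
```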

Inferring Tasks and Fluents in Videos by Learning Causal Relations

Haowen Tang, Ping Wei, Huan Li, Nanning Zheng

Auto-TLDR; Joint Learning of Complex Task and Fluent States in Videos

Recognizing time-varying object states in complex tasks is an important and challenging issue. In this paper, we propose a novel model to jointly infer object fluents and complex tasks in videos. A task is a complex goal-driven human activity and a fluent is defined as a time-varying object state. A hierarchical graph represents a task as a human action stream and multiple concurrent object fluents which vary as the human performs the actions. In this process, the human actions serve as the causes of object state changes which conversely reflect the effects of human actions. Given an input video, a causal sampling beam search (CSBS) algorithm is proposed to jointly infer the task category and the states of objects in each video frame. For model learning, a structural SVM framework is adopted to jointly train the task, fluent, cause, and effect parameters. We collected a new large-scale dataset of tasks and fluents in third-person view videos. It contains 14 categories of tasks, 24 categories of object fluents, 50 categories of object states, 809 videos, and 333,351 frames. Experimental results demonstrate the effectiveness of the proposed method.

RefiNet: 3D Human Pose Refinement with Depth Maps

Andrea D'Eusanio, Stefano Pini, Guido Borghi, Roberto Vezzani, Rita Cucchiara

Auto-TLDR; RefiNet: A Multi-stage Framework for 3D Human Pose Estimation

Human Pose Estimation is a fundamental task for many applications in the Computer Vision community, and it has been widely investigated in the 2D domain, i.e. on intensity images. Consequently, most of the available methods for this task are based on 2D Convolutional Neural Networks and huge manually-annotated RGB datasets, achieving stunning results. In this paper, we propose RefiNet, a multi-stage framework that regresses an extremely precise 3D human pose from a given 2D pose and a depth map. The framework consists of three modules, each specialized in a particular refinement and data representation, i.e. depth patches, 3D skeleton, and point clouds. Moreover, we collect a new dataset, namely Baracca, acquired with RGB, depth, and thermal cameras and specifically created for the automotive context. Experimental results confirm the quality of the refinement procedure, which largely improves the human pose estimates of off-the-shelf 2D methods.

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, Julien Morlier

Auto-TLDR; Attentional Blocks for Action Recognition in Table Tennis Strokes

The paper addresses the problem of recognizing actions with low inter-class variability in video, such as table tennis strokes. Two-stream, "twin" convolutional neural networks with 3D convolutions are used on both RGB data and optical flow. Actions are recognized by classifying temporal windows. We introduce 3D attention modules and examine their impact on classification efficiency. In the context of studying sportsmen's performances, a corpus of table tennis strokes is considered. The use of attention blocks in the network speeds up the training step and improves the classification scores by up to 5% with our twin model. We visualize the impact on the obtained features and observe a correlation between attention and player movements and positions. A score comparison between a state-of-the-art action classification method and the proposed approach with attentional blocks is performed on the corpus. The proposed model with attention blocks outperforms both the previous model without them and our baseline.
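
The exact form of the attentional block is not given in the abstract; as one plausible illustration, the sketch below inserts a squeeze-and-excitation style channel attention over space-time features inside a residual connection, which is where such a module would sit in a twin 3D-CNN.

```python
import torch
import torch.nn as nn

class Attention3D(nn.Module):
    """A generic 3D attention block (squeeze-and-excitation over space-time),
    shown only to illustrate where such a module would sit in a twin 3D-CNN;
    the paper's attentional block may be defined differently."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):              # x: (B, C, T, H, W)
        b, c, *_ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x + x * w               # residual re-weighting of 3D features
```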

MFI: Multi-Range Feature Interchange for Video Action Recognition

Sikai Bai, Qi Wang, Xuelong Li

Auto-TLDR; Multi-range Feature Interchange Network for Action Recognition in Videos

Short-range motion features and long-range dependencies are two complementary and vital cues for action recognition in videos, but it remains unclear how to extract them efficiently and effectively. In this paper, we propose a novel network that captures both within a unified 2D framework. Specifically, we first construct a Short-range Temporal Interchange (STI) block, which contains a Channel-wise Temporal Interchange (CTI) module for encoding short-range motion features. A Graph-based Regional Interchange (GRI) module is then built to model long-range dependencies using graph convolution. Finally, we replace the original bottleneck blocks in the ResNet with STI blocks and insert several GRI modules between STI blocks to form a Multi-range Feature Interchange (MFI) network. Extensive experiments on three action recognition datasets (Something-Something V1, HMDB51, and UCF101) demonstrate that the proposed MFI network achieves impressive results at very limited computing cost.
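
As a rough sketch of what a channel-wise temporal interchange can look like, the function below shifts a fraction of channels forward and backward in time between neighbouring frames; this is one common realization of the idea and may differ from the paper's exact CTI rule.

```python
import torch

def channel_temporal_interchange(x, fold_div=8):
    """Exchange information across time, channel-wise.
    x: (B, T, C, H, W) frame-level feature maps."""
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # leave the rest untouched
    return out
```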

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition

Gibran Benitez-Garcia, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, Keiji Yanai

Auto-TLDR; IPN Hand: A Benchmark Dataset for Continuous Hand Gesture Recognition

Continuous hand gesture recognition (HGR) is an essential part of human-computer interaction, with a wide range of applications in the automotive sector, consumer electronics, home automation, and others. In recent years, accurate and efficient deep learning models have been proposed for HGR. However, the currently available public datasets lack the real-world elements needed to build responsive and efficient HGR systems. In this paper, we introduce a new benchmark dataset named IPN Hand with sufficient size, variation, and real-world elements to train and evaluate deep neural networks. The dataset contains more than 4,000 gesture samples and 800,000 RGB frames from 50 distinct subjects. We design 13 different static and dynamic gestures focused on interaction with touchless screens. We especially consider the scenario in which continuous gestures are performed without transition states and subjects perform natural hand movements as non-gesture actions. Gestures were collected from about 30 diverse scenes, with real-world variation in background and illumination. With our dataset, the performance of three 3D-CNN models is evaluated on the tasks of isolated and continuous real-time HGR. Furthermore, we analyze the possibility of increasing recognition accuracy by adding modalities derived from RGB frames, i.e., optical flow and semantic segmentation, while keeping the real-time performance of the 3D-CNN model. Our empirical study also provides a comparison with the publicly available nvGesture (NVIDIA) dataset. The experimental results show that the state-of-the-art ResNext-101 model loses about 30% accuracy when evaluated on our real-world dataset, demonstrating that the IPN Hand dataset can serve as a benchmark and may help the community move forward in continuous HGR.

Personalized Models in Human Activity Recognition Using Deep Learning

Hamza Amrani, Daniela Micucci, Paolo Napoletano

Auto-TLDR; Incremental Learning for Personalized Human Activity Recognition

Current sensor-based human activity recognition techniques that rely on a user-independent model struggle to generalize to new users and to the changes a person may make over time in the way he or she carries out activities. Incremental learning makes it possible to obtain personalized models, which may improve classifier performance thanks to continuous learning from user data. Moreover, deep learning techniques have proven more effective than traditional ones for generating user-independent models. The aim of our work is therefore to combine deep learning with incremental learning in order to obtain personalized models that perform better than both user-independent models and personalized models obtained with traditional machine learning techniques. The evaluation compares the results obtained by a state-of-the-art technique with those obtained by two neural networks (a ResNet and a simplified CNN) on three datasets. It shows that the neural networks adapt to a new user faster than the baseline.
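
A minimal sketch of the incremental-learning step, assuming a pretrained user-independent classifier and a DataLoader of the new user's labelled sensor windows; the optimizer, hyper-parameters, and function name are illustrative only, not the paper's procedure.

```python
import torch
import torch.nn as nn

def personalize(model, user_loader, epochs=3, lr=1e-4):
    """Start from a user-independent model and continue training it only on
    the new user's labelled data to obtain a personalized model."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for signals, labels in user_loader:   # sensor windows + activity labels
            opt.zero_grad()
            loss = loss_fn(model(signals), labels)
            loss.backward()
            opt.step()
    return model
```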

Video Representation Fusion Network For Multi-Label Movie Genre Classification

Tianyu Bi, Dmitri Jarnikov, Johan Lukkien

Auto-TLDR; A Video Representation Fusion Network for Movie Genre Classification

In this paper, we introduce a Video Representation Fusion Network (VRFN) for movie genre classification. Unlike previous works, which use frame-level features for movie genre classification, our approach uses a video classification architecture to create video-level features from groups of frames and fuses these features temporally to learn long-term spatiotemporal information for the movie genre classification task. We use a pre-trained I3D model to generate intermediate video representations and connect it with a C3D-LSTM model for feature fusion and movie genre classification. The LMTD-9 dataset, which contains 4007 trailers multi-labeled with 9 movie genres, is used for training and evaluating the model. The experimental results demonstrate that learning long-term temporal dependencies by fusing video representations improves performance in movie genre classification. Our best model outperforms the state-of-the-art methods by 3.4% in AUPRC (macro).
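
A minimal sketch of the fusion stage, assuming video-level features have already been extracted (e.g. one vector per frame group from a pretrained I3D): an LSTM fuses them temporally and a sigmoid head produces independent per-genre scores for the multi-label task. Feature and hidden sizes are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TrailerGenreClassifier(nn.Module):
    """Temporal fusion of precomputed video-level features for multi-label
    movie genre classification (illustrative sizes only)."""
    def __init__(self, feat_dim=1024, hidden=512, num_genres=9):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_genres)

    def forward(self, clip_feats):               # (B, num_clips, feat_dim)
        _, (h, _) = self.lstm(clip_feats)
        return torch.sigmoid(self.head(h[-1]))   # independent per-genre scores
```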