ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

DeepPear: Deep Pose Estimation and Action Recognition

Wen-Jiin Tsai, You-Ying Jhuang

Auto-TLDR; Human Action Recognition from RGB Video Using 3D Human Pose and Appearance Features

Human action recognition has recently become a popular research topic because it can be applied in many domains, such as intelligent surveillance systems, human-robot interaction, and autonomous vehicle control. Recognizing human actions from RGB video is challenging because action learning is easily disturbed by cluttered backgrounds. To cope with this problem, the proposed method first estimates 3D human poses, which helps remove the cluttered background and focus on the human body. In addition to the human poses, the proposed method also uses appearance features near the predicted joints to make our action prediction context-aware. Instead of using 3D convolutional neural networks, as many action recognition approaches do, the proposed method uses a two-stream architecture that aggregates the results of skeleton-based and appearance-based approaches to perform action recognition. Experimental results show that the proposed method achieves state-of-the-art performance on NTU RGB+D, a large-scale dataset for human action recognition.
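The abstract does not say how the appearance features are sampled or how the two streams are aggregated. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes the predicted 3D joints have already been projected to 2D pixel coordinates, that "appearance features near the joints" means fixed-size image patches centred on each joint, and that the two streams are combined by late fusion (a weighted average of per-class softmax scores). The helpers `crop_joint_patches` and `fuse_two_stream` and the parameters `size` and `weight` are hypothetical names.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def crop_joint_patches(frame: np.ndarray, joints_2d: np.ndarray, size: int = 32) -> np.ndarray:
    """Hypothetical helper: cut a size x size patch centred on each 2D joint.

    frame: (H, W, C) image; joints_2d: (J, 2) pixel coordinates as (x, y).
    The frame is zero-padded so patches near the border keep a fixed shape.
    """
    h, w = frame.shape[:2]
    half = size // 2
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)), mode="constant")
    patches = []
    for x, y in np.round(joints_2d).astype(int):
        # Clamp the joint to the image, then shift by `half` for the padding.
        x = min(max(x, 0), w - 1) + half
        y = min(max(y, 0), h - 1) + half
        patches.append(padded[y - half:y - half + size, x - half:x - half + size])
    return np.stack(patches)  # (J, size, size, C)

def fuse_two_stream(skeleton_logits: np.ndarray, appearance_logits: np.ndarray,
                    weight: float = 0.5):
    """Assumed late fusion: weighted average of the two streams' class probabilities."""
    probs = weight * softmax(skeleton_logits) + (1.0 - weight) * softmax(appearance_logits)
    return probs, int(np.argmax(probs))

# Usage with random data standing in for a video frame and projected joints.
rng = np.random.default_rng(0)
frame = rng.random((120, 160, 3))
joints = np.array([[10.0, 20.0], [159.0, 119.0], [80.0, 60.0]])
patches = crop_joint_patches(frame, joints, size=32)
print(patches.shape)  # (3, 32, 32, 3)

probs, label = fuse_two_stream(np.array([2.0, 0.5, 0.1]), np.array([0.2, 1.5, 0.3]))
print(probs, label)
```

In a real two-stream model the patches would feed an appearance network and the joints a skeleton network, each producing the logits passed to the fusion step. Averaging probabilities is just one common aggregation choice; a learned weighting or feature-level fusion would be equally plausible readings of the abstract.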

Similar papers

Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition

Negar Heidari, Alexandros Iosifidis

Auto-TLDR; Temporal Attention Module for Efficient Graph Convolutional Network-based Action Recognition

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

Temporal Extension Module for Skeleton-Based Action Recognition

Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

JT-MGCN: Joint-Temporal Motion Graph Convolutional Network for Skeleton-Based Action Recognition

Recurrent Graph Convolutional Networks for Skeleton-Based Action Recognition

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Attention-Driven Body Pose Encoding for Human Activity Recognition

What and How? Jointly Forecasting Human Action and Pose

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Vision-Based Multi-Modal Framework for Action Recognition

A Grid-Based Representation for Human Action Recognition

A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Learning Group Activities from Skeletons without Individual Action Labels

Feature-Supervised Action Modality Transfer

Learning Recurrent High-Order Statistics for Skeleton-Based Hand Gesture Recognition

RefiNet: 3D Human Pose Refinement with Depth Maps

Orthographic Projection Linear Regression for Single Image 3D Human Pose Estimation

Single View Learning in Action Recognition

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Space-Time Domain Tensor Neural Networks: An Application on Human Pose Classification

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

MFI: Multi-Range Feature Interchange for Video Action Recognition

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

2D Deep Video Capsule Network with Temporal Shift for Action Recognition

Kernel-based Graph Convolutional Networks

PEAN: 3D Hand Pose Estimation Adversarial Network

Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows

RWF-2000: An Open Large Scale Video Database for Violence Detection

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Light3DPose: Real-Time Multi-Person 3D Pose Estimation from Multiple Views

You Ought to Look Around: Precise, Large Span Action Detection

The Role of Cycle Consistency for Generating Better Human Action Videos from a Single Frame

Pose-Aware Multi-Feature Fusion Network for Driver Distraction Recognition

Inferring Tasks and Fluents in Videos by Learning Causal Relations

Learnable Higher-Order Representation for Action Recognition

Temporal Binary Representation for Event-Based Action Recognition

TinyVIRAT: Low-Resolution Video Action Recognition

Motion Complementary Network for Efficient Action Recognition

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition

Unsupervised 3D Human Pose Estimation in Multi-view-multi-pose Video

A Detection-Based Approach to Multiview Action Classification in Infants

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

HPERL: 3D Human Pose Estimation from RGB and LiDAR