ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition

Gibran Benitez-Garcia, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, Keiji Yanai

Auto-TLDR; IPN Hand: A Benchmark Dataset for Continuous Hand Gesture Recognition

Continuous hand gesture recognition (HGR) is an essential part of human-computer interaction, with a wide range of applications in the automotive sector, consumer electronics, home automation, and others. In recent years, accurate and efficient deep learning models have been proposed for HGR. However, the publicly available datasets lack the real-world elements needed to build responsive and efficient HGR systems. In this paper, we introduce a new benchmark dataset named IPN Hand with sufficient size, variation, and real-world elements to train and evaluate deep neural networks. This dataset contains more than 4,000 gesture samples and 800,000 RGB frames from 50 distinct subjects. We design 13 different static and dynamic gestures focused on interaction with touchless screens. We pay particular attention to scenarios in which continuous gestures are performed without transition states, and in which subjects perform natural movements with their hands as non-gesture actions. Gestures were collected from about 30 diverse scenes, with real-world variation in background and illumination. With our dataset, the performance of three 3D-CNN models is evaluated on the tasks of isolated and continuous real-time HGR. Furthermore, we analyze the possibility of increasing the recognition accuracy by adding multiple modalities derived from RGB frames, i.e., optical flow and semantic segmentation, while keeping the real-time performance of the 3D-CNN model. Our empirical study also provides a comparison with the publicly available nvGesture (NVIDIA) dataset. The experimental results show that the accuracy of the state-of-the-art ResNeXt-101 model drops by about 30% on our real-world dataset, demonstrating that the IPN Hand dataset can be used as a benchmark and may help the community advance continuous HGR.

Similar papers

Recognizing American Sign Language Nonmanual Signal Grammar Errors in Continuous Videos

Elahe Vahdani, Longlong Jing, Ying-Li Tian, Matt Huenerfauth

Auto-TLDR; ASL-HW-RGBD: Recognizing Grammatical Errors in Continuous Sign Language

RWF-2000: An Open Large Scale Video Database for Violence Detection

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Continuous Sign Language Recognition with Iterative Spatiotemporal Fine-Tuning

What and How? Jointly Forecasting Human Action and Pose

Temporal Binary Representation for Event-Based Action Recognition

Depth Videos for the Classification of Micro-Expressions

A Grid-Based Representation for Human Action Recognition

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Audio-Video Detection of the Active Speaker in Meetings

Motion Complementary Network for Efficient Action Recognition

Vision-Based Multi-Modal Framework for Action Recognition

Modeling Long-Term Interactions to Enhance Action Recognition

Single View Learning in Action Recognition

Feature-Supervised Action Modality Transfer

Estimation of Clinical Tremor Using Spatio-Temporal Adversarial AutoEncoder

Deep Real-Time Hand Detection Using CFPN on Embedded Systems

TinyVIRAT: Low-Resolution Video Action Recognition

Feasibility Study of Using MyoBand for Learning Electronic Keyboard

Exploiting the Logits: Joint Sign Language Recognition and Spell-Correction

Learning Dictionaries of Kinematic Primitives for Action Classification

2D Deep Video Capsule Network with Temporal Shift for Action Recognition

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

Weight Estimation from an RGB-D Camera in Top-View Configuration

Image Sequence Based Cyclist Action Recognition Using Multi-Stream 3D Convolution

Identity-Aware Facial Expression Recognition in Compressed Video

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition

Anomaly Detection, Localization and Classification for Railway Inspection

Human or Machine? It Is Not What You Write, but How You Write It

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

Developing Motion Code Embedding for Action Recognition in Videos

Spatial Bias in Vision-Based Voice Activity Detection

A Detection-Based Approach to Multiview Action Classification in Infants

IPT: A Dataset for Identity Preserved Tracking in Closed Domains

Concept Embedding through Canonical Forms: A Case Study on Zero-Shot ASL Recognition

Attribute-Based Quality Assessment for Demographic Estimation in Face Videos

Conditional-UNet: A Condition-Aware Deep Model for Coherent Human Activity Recognition from Wearables

Motion U-Net: Multi-Cue Encoder-Decoder Network for Motion Segmentation

Attention-Driven Body Pose Encoding for Human Activity Recognition

A Prototype-Based Generalized Zero-Shot Learning Framework for Hand Gesture Recognition

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Learnable Higher-Order Representation for Action Recognition

Real Time Fencing Move Classification and Detection at Touch Time During a Fencing Match

A Systematic Investigation on End-To-End Deep Recognition of Grocery Products in the Wild