ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Raphael Memmesheimer, Nick Theisen, Dietrich Paulus

Auto-TLDR; One-Shot Action Recognition using Metric Learning

Abstract Slides

Recognizing an activity with a single reference sample using metric learning approaches is a promising research field. The majority of few-shot methods focus on object recognition or face-identification. We propose a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using triplet loss, we learn a feature embedding. The resulting encoder transforms features into an embedding space in which closer distances encode similar actions while higher distances encode different actions. Our approach is based on a signal level formulation and remains flexible across a variety of modalities. It further outperforms the baseline on the large scale NTU RGB+D 120 dataset for the One-Shot action recognition protocol by \ntuoneshotimpro%. With just 60% of the training data, our approach still outperforms the baseline approach by \ntuoneshotimproreduced%. With 40% of the training data, our approach performs comparably well as the second follow up. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton and fused data and the Simitate dataset for motion capturing data. Furthermore, our inter-joint and inter-sensor experiments suggest good capabilities on previously unseen setups.

Similar papers

Vision-Based Multi-Modal Framework for Action Recognition

Djamila Romaissa Beddiar, Mourad Oussalah, Brahim Nini

Auto-TLDR; Multi-modal Framework for Human Activity Recognition Using RGB, Depth and Skeleton Data

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Similar papers

Vision-Based Multi-Modal Framework for Action Recognition

Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

A Grid-Based Representation for Human Action Recognition

JT-MGCN: Joint-Temporal Motion Graph Convolutional Network for Skeleton-Based Action Recognition

Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition

Attention-Driven Body Pose Encoding for Human Activity Recognition

What and How? Jointly Forecasting Human Action and Pose

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

Feature-Supervised Action Modality Transfer

DeepPear: Deep Pose Estimation and Action Recognition

A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition

Recurrent Graph Convolutional Networks for Skeleton-Based Action Recognition

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Temporal Extension Module for Skeleton-Based Action Recognition

Nonlinear Ranking Loss on Riemannian Potato Embedding

A Prototype-Based Generalized Zero-Shot Learning Framework for Hand Gesture Recognition

Space-Time Domain Tensor Neural Networks: An Application on Human Pose Classification

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Developing Motion Code Embedding for Action Recognition in Videos

Single View Learning in Action Recognition

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Anticipating Activity from Multimodal Signals

Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Learning Embeddings for Image Clustering: An Empirical Study of Triplet Loss Approaches

Generalized Local Attention Pooling for Deep Metric Learning

Learning Group Activities from Skeletons without Individual Action Labels

Building Computationally Efficient and Well-Generalizing Person Re-Identification Models with Metric Learning

RGB-Infrared Person Re-Identification Via Image Modality Conversion

Learning Recurrent High-Order Statistics for Skeleton-Based Hand Gesture Recognition

Multi-Level Deep Learning Vehicle Re-Identification Using Ranked-Based Loss Functions

Semantics to Space(S2S): Embedding Semantics into Spatial Space for Zero-Shot Verb-Object Query Inferencing

Not 3D Re-ID: Simple Single Stream 2D Convolution for Robust Video Re-Identification

Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Loop-closure detection by LiDAR scan re-identification

Progressive Learning Algorithm for Efficient Person Re-Identification

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval

Deep Gait Relative Attribute Using a Signed Quadratic Contrastive Loss

One-Shot Representational Learning for Joint Biometric and Device Authentication

Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows

Top-DB-Net: Top DropBlock for Activation Enhancement in Person Re-Identification

Deep Top-Rank Counter Metric for Person Re-Identification

Kernel-based Graph Convolutional Networks

3D Facial Matching by Spiral Convolutional Metric Learning and a Biometric Fusion-Net of Demographic Properties

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Modeling Long-Term Interactions to Enhance Action Recognition