ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Self-Supervised Learning of Dynamic Representations for Static Images

Siyang Song, Enrique Sanchez, Linlin Shen, Michel Valstar

Auto-TLDR; Facial Action Unit Intensity Estimation and Affect Estimation from Still Images with Multiple Temporal Scale

Facial actions are spatio-temporal signals by nature, and therefore their modeling is crucially dependent on the availability of temporal information. In this paper, we focus on inferring such temporal dynamics of facial actions when no explicit temporal information is available, i.e., from still images. We present a novel approach to capture multiple scales of such temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation. In particular, 1) we propose a framework that infers a dynamic representation (DR) from a still image, which captures the bi-directional flow of time within a short time window centered at the input image; 2) we show that we can train our method without the need for explicitly generating target representations, allowing the network to represent dynamics more broadly; and 3) we propose a multiple temporal scale approach that infers DRs for different window lengths (MDR) from a still image. We empirically validate the value of our approach on the task of frame ranking, and show that our proposed MDR attains state-of-the-art results on BP4D for AU intensity estimation and on SEMAINE for dimensional affect estimation, using only still images at test time.
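
To make the idea of "different window lengths centered at the input image" concrete, here is a minimal sketch of how training-time frame windows at several temporal scales might be selected around one frame of a video. It is an illustration only, not the authors' implementation: the function names, the half-width values, and the choice to clip windows at the video boundaries are all assumptions that do not come from the abstract.

```python
def centered_window(center, half_width, num_frames):
    """Return the frame indices within half_width frames of `center`.

    The window covers frames both before and after the center frame and is
    clipped to the valid range [0, num_frames - 1] (clipping is an assumption).
    """
    start = max(0, center - half_width)
    stop = min(num_frames - 1, center + half_width)
    return list(range(start, stop + 1))


def multi_scale_windows(center, half_widths, num_frames):
    """Map each (hypothetical) half-width to its window of frame indices."""
    return {h: centered_window(center, h, num_frames) for h in half_widths}


# Three illustrative temporal scales around frame 10 of a 100-frame video.
windows = multi_scale_windows(center=10, half_widths=(1, 2, 4), num_frames=100)
for half_width, frames in windows.items():
    print(half_width, frames)
```

Each window spans frames on both sides of the center frame, mirroring the bi-directional time window the abstract describes; at test time only the center still image would be given.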

Similar papers

Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-To-Video Synthesis

Fu-En Yang, Jing-Cheng Chang, Yuan-Hao Lee, Yu-Chiang Frank Wang

Auto-TLDR; Dual Motion Transfer GAN for Convolutional Neural Networks

Identity-Aware Facial Expression Recognition in Compressed Video

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

Two-Stream Temporal Convolutional Network for Dynamic Facial Attractiveness Prediction

Video-Based Facial Expression Recognition Using Graph Convolutional Networks

Deep Multi-Task Learning for Facial Expression Recognition and Synthesis Based on Selective Feature Sharing

A Quantitative Evaluation Framework of Video De-Identification Methods

What and How? Jointly Forecasting Human Action and Pose

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

The Role of Cycle Consistency for Generating Better Human Action Videos from a Single Frame

Quantified Facial Temporal-Expressiveness Dynamics for Affect Analysis

Learning Emotional Blinded Face Representations

A Grid-Based Representation for Human Action Recognition

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Facial Expression Recognition by Using a Disentangled Identity-Invariant Expression Representation

Future Urban Scenes Generation through Vehicles Synthesis

Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

Interpretable Emotion Classification Using Temporal Convolutional Models

Automatic Annotation of Corpora for Emotion Recognition through Facial Expressions Analysis

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry

MRP-Net: A Light Multiple Region Perception Neural Network for Multi-Label AU Detection

Motion-Supervised Co-Part Segmentation

Pixel-based Facial Expression Synthesis

Learning to Take Directions One Step at a Time

Let's Play Music: Audio-Driven Performance Video Generation

Learning Disentangled Representations for Identity Preserving Surveillance Face Camouflage

Video Semantic Segmentation Using Deep Multi-View Representation Learning

High Resolution Face Age Editing

AttendAffectNet: Self-Attention Based Networks for Predicting Affective Responses from Movies

Temporal Binary Representation for Event-Based Action Recognition

Facial Expression Recognition Using Residual Masking Network

Automatic Estimation of Self-Reported Pain by Interpretable Representations of Motion Dynamics

Shape Consistent 2D Keypoint Estimation under Domain Shift

Depth Videos for the Classification of Micro-Expressions

Vision-Based Multi-Modal Framework for Action Recognition

Siamese-Structure Deep Neural Network Recognizing Changes in Facial Expression According to the Degree of Smiling

Context Matters: Self-Attention for Sign Language Recognition

Talking Face Generation Via Learning Semantic and Temporal Synchronous Landmarks

Unsupervised Face Manipulation Via Hallucination

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

Audio-Video Detection of the Active Speaker in Meetings

Coherence and Identity Learning for Arbitrary-Length Face Video Generation

Siamese Fully Convolutional Tracker with Motion Correction

Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion

A Prototype-Based Generalized Zero-Shot Learning Framework for Hand Gesture Recognition

Magnifying Spontaneous Facial Micro Expressions for Improved Recognition

STaRFlow: A SpatioTemporal Recurrent Cell for Lightweight Multi-Frame Optical Flow Estimation

Residual Learning of Video Frame Interpolation Using Convolutional LSTM