ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Video-Based Facial Expression Recognition Using Graph Convolutional Networks

Daizong Liu, Hongting Zhang, Pan Zhou

Auto-TLDR; Graph Convolutional Network for Video-based Facial Expression Recognition


Facial expression recognition (FER), which aims to classify the expression present in a facial image or video, has attracted considerable research interest in artificial intelligence and multimedia. For video-based FER, it is sensible to capture the dynamic variation of expression across frames. However, existing methods directly use a CNN-RNN or 3D CNN to extract spatial-temporal features from different facial units, rather than concentrating on particular regions while capturing expression variation, which limits FER performance. In this paper, we introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based FER. First, the GCN layer learns more contributive facial expression features that concentrate on certain regions, after sharing information between nodes that represent CNN-extracted features. Then, an LSTM layer learns long-term dependencies among the GCN-learned features to model the variation. In addition, a weight assignment mechanism weights the outputs of different nodes for final classification by characterizing the expression intensity in each frame. To the best of our knowledge, this is the first use of a GCN for the FER task. We evaluate our method on three widely used datasets, CK+, Oulu-CASIA, and MMI, and on one challenging in-the-wild dataset, AFEW8.0; the experimental results demonstrate that our method outperforms existing methods.
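The graph step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each node is the CNN feature vector of one frame, that edges come from a softmax over pairwise cosine similarity, and that a node's output norm can stand in for expression intensity. The helper names (build_adjacency, gcn_layer, frame_weights) are hypothetical, and the LSTM stage is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def build_adjacency(features):
    """Assumption: edge weights are a row-wise softmax of cosine similarity."""
    return [softmax([cosine(f, g) for g in features]) for f in features]

def gcn_layer(features, adjacency, weight):
    """One graph-convolution step, ReLU(A @ X @ W), on plain Python lists."""
    n, d, k = len(features), len(features[0]), len(weight[0])
    # Each node aggregates the features of all nodes, weighted by adjacency.
    agg = [[sum(adjacency[i][j] * features[j][c] for j in range(n))
            for c in range(d)] for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d)))
             for o in range(k)] for i in range(n)]

def frame_weights(node_outputs):
    """Hypothetical intensity proxy: softmax over each node's L2 norm."""
    return softmax([math.sqrt(sum(x * x for x in f)) for f in node_outputs])

# Three toy "frames" with 2-dimensional features and an identity projection.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adjacency = build_adjacency(frames)
hidden = gcn_layer(frames, adjacency, [[1.0, 0.0], [0.0, 1.0]])
weights = frame_weights(hidden)
```

In a full model along the lines of the abstract, the node outputs would be passed through an LSTM before the per-frame weights combine them for classification; here the weights are only computed to show the shape of the idea.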

Similar papers

Identity-Aware Facial Expression Recognition in Compressed Video

Xiaofeng Liu, Linghao Jin, Xu Han, Jun Lu, Jonghye Woo, Jane You

Auto-TLDR; Exploring Facial Expression Representation in Compressed Video with Mutual Information Minimization


This paper explores a facial expression representation, free of inter-subject variations, in the compressed video domain. Most previous methods process the RGB images of a sequence, whereas valuable, off-the-shelf expression-related muscle movement is already embedded in the compression format. In the compressed domain, where data is compressed by up to two orders of magnitude, we can explicitly infer the expression from the residual frames, and it is possible to extract identity factors from the I-frame with a pre-trained face recognition network. By enforcing marginal independence between the two, the expression feature is expected to be purer for the expression and robust to identity shifts. Specifically, we propose a novel collaborative min-min game for mutual information (MI) minimization in latent space. We do not need identity labels or multiple expression samples from the same person for identity elimination. Moreover, when the apex frame is annotated in the dataset, a complementary constraint can be added to regularize the feature-level game. At test time, only the compressed residual frames are required for expression prediction. Our solution achieves comparable or better performance than recent decoded-image-based methods on typical FER benchmarks, with about 3× faster inference on compressed data.

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

Jun Weng, Yang Yang, Zichang Tan, Zhen Lei

Auto-TLDR; Attentive Hybrid Architecture for Facial Expression Recognition

Deep Multi-Task Learning for Facial Expression Recognition and Synthesis Based on Selective Feature Sharing

Facial Expression Recognition by Using a Disentangled Identity-Invariant Expression Representation

Unconstrained Facial Expression Recognition Based on Cascade Decision and Gabor Filters

Facial Expression Recognition Using Residual Masking Network

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

Interpretable Emotion Classification Using Temporal Convolutional Models

Depth Videos for the Classification of Micro-Expressions

Two-Stream Temporal Convolutional Network for Dynamic Facial Attractiveness Prediction

Recurrent Graph Convolutional Networks for Skeleton-Based Action Recognition

Self-Supervised Learning of Dynamic Representations for Static Images

A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition

G-FAN: Graph-Based Feature Aggregation Network for Video Face Recognition

Responsive Social Smile: A Machine-Learning Based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening

Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion

A Grid-Based Representation for Human Action Recognition

Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition

MFI: Multi-Range Feature Interchange for Video Action Recognition

Automatic Annotation of Corpora for Emotion Recognition through Facial Expressions Analysis

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

MRP-Net: A Light Multiple Region Perception Neural Network for Multi-Label AU Detection

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

JT-MGCN: Joint-Temporal Motion Graph Convolutional Network for Skeleton-Based Action Recognition

Constructing Geographic and Long-term Temporal Graph for Traffic Forecasting

Edge-Aware Graph Attention Network for Ratio of Edge-User Estimation in Mobile Networks

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Continuous Sign Language Recognition with Iterative Spatiotemporal Fine-Tuning

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

Attention-Driven Body Pose Encoding for Human Activity Recognition

Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Boundary-Aware Graph Convolution for Semantic Segmentation

Siamese-Structure Deep Neural Network Recognizing Changes in Facial Expression According to the Degree of Smiling

Dual-Mode Iterative Denoiser: Tackling the Weak Label for Anomaly Detection

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Visual Oriented Encoder: Integrating Multimodal and Multi-Scale Contexts for Video Captioning

Audio-Visual Speech Recognition Using a Two-Step Feature Fusion Strategy

Let's Play Music: Audio-Driven Performance Video Generation

Context Matters: Self-Attention for Sign Language Recognition

Geographic-Semantic-Temporal Hypergraph Convolutional Network for Traffic Flow Prediction

Wavelet Attention Embedding Networks for Video Super-Resolution

GCNs-Based Context-Aware Short Text Similarity Model

Quality-Based Representation for Unconstrained Face Recognition

Vision-Based Multi-Modal Framework for Action Recognition

AOAM: Automatic Optimization of Adjacency Matrix for Graph Convolutional Network

A Duplex Spatiotemporal Filtering Network for Video-Based Person Re-Identification

Zero-Shot Text Classification with Semantically Extended Graph Convolutional Network