ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

Zhihua Li, Zheng Zhang, Lijun Yin

Auto-TLDR; Temporal Fusion and Self-Attention Network for Facial Action Unit Detection

Abstract

Research on facial action unit (AU) detection using deep spatial learning models has achieved remarkable performance in recent years; however, it is still far from reaching its full potential because it makes little use of the temporal information of AUs across frames. Since AU occurrence in one frame is highly likely to be related to the previous frames of a temporal sequence, exploring the temporal correlation of AUs across frames is a key motivation of this work. In this paper, we propose a novel temporal fusion and AU-supervised self-attention network (SAT-Net) to address the AU detection problem. First, we feed the deep features of a sequence into a convolutional LSTM network, fuse the previous temporal information into the feature map of the last frame, and continue to learn AU occurrence from the fused map. Second, since AU detection is a multi-label classification problem in which each label depends only on certain facial areas, we propose a new self-learned attention mechanism that focuses the detection of each AU on the relevant facial areas by learning an individual attention mask for each AU, thus increasing AU independence without losing any spatial relations. Our extensive experiments show that the proposed framework achieves better AU detection results than state-of-the-art methods on two benchmark databases, BP4D and DISFA.
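The two steps in the abstract can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not taken from the paper: feature maps are single-channel H x W grids, the temporal fusion is a standard ConvLSTM cell whose final hidden state is taken as the fused map, and each AU's attention mask is a free grid of logits passed through a sigmoid. The helper names (`convlstm_fuse`, `au_probabilities`, etc.) and all parameter values are hypothetical. This is not the authors' implementation and omits the AU supervision of the masks and all training.

```python
import math
import random

# Hypothetical sketch: single-channel maps stored as lists of lists (H x W).

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def conv2d_same(x, k):
    """3x3 cross-correlation with zero padding; output has the same H x W as x."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = s
    return out

def add_maps(*maps):
    # Element-wise sum of equally sized maps.
    return [[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
            for i in range(len(maps[0]))]

def mul_maps(a, b):
    # Element-wise (Hadamard) product.
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def apply_map(f, m):
    return [[f(v) for v in row] for row in m]

def convlstm_fuse(frames, params):
    """Run a single-channel ConvLSTM over the frame sequence and return the last
    hidden state, used here (an assumption) as the temporally fused feature map."""
    h, w = len(frames[0]), len(frames[0][0])
    hid = [[0.0] * w for _ in range(h)]
    cell = [[0.0] * w for _ in range(h)]
    for x in frames:
        pre = {}
        for gate in "ifog":  # input, forget, output gates and candidate
            wx, wh, b = params[gate]
            bias = [[b] * w for _ in range(h)]
            pre[gate] = add_maps(conv2d_same(x, wx), conv2d_same(hid, wh), bias)
        i_g = apply_map(sigmoid, pre["i"])
        f_g = apply_map(sigmoid, pre["f"])
        o_g = apply_map(sigmoid, pre["o"])
        g = apply_map(math.tanh, pre["g"])
        cell = add_maps(mul_maps(f_g, cell), mul_maps(i_g, g))
        hid = mul_maps(o_g, apply_map(math.tanh, cell))
    return hid

def au_probabilities(fused, mask_logits, weights, biases):
    """One soft spatial mask per AU: mask the shared fused map, average-pool,
    and apply an independent sigmoid (multi-label, not softmax)."""
    area = len(fused) * len(fused[0])
    probs = []
    for logits, wt, b in zip(mask_logits, weights, biases):
        mask = apply_map(sigmoid, logits)  # values in (0, 1)
        pooled = sum(map(sum, mul_maps(mask, fused))) / area
        probs.append(sigmoid(wt * pooled + b))
    return probs

# Usage with random, untrained parameters (shapes are illustrative only).
random.seed(0)
H, W, T, N_AU = 4, 4, 3, 5

def rand_kernel():
    return [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]

params = {gate: (rand_kernel(), rand_kernel(), 0.0) for gate in "ifog"}
frames = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
fused = convlstm_fuse(frames, params)
mask_logits = [[[random.gauss(0.0, 1.0) for _ in range(W)] for _ in range(H)]
               for _ in range(N_AU)]
probs = au_probabilities(fused, mask_logits, [1.0] * N_AU, [0.0] * N_AU)
print([round(p, 3) for p in probs])
```

Using an independent sigmoid per AU, rather than a softmax across AUs, is what makes this a multi-label classifier: several AUs can be active in the same frame, and each mask only weights the regions relevant to its own AU while the shared fused map keeps the full spatial layout.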

Similar papers

MRP-Net: A Light Multiple Region Perception Neural Network for Multi-Label AU Detection

Yang Tang, Shuang Chen, Honggang Zhang, Gang Wang, Rui Yang

Auto-TLDR; MRP-Net: A Fast and Light Neural Network for Facial Action Unit Detection

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Video-Based Facial Expression Recognition Using Graph Convolutional Networks

Self-Supervised Learning of Dynamic Representations for Static Images

Interpretable Emotion Classification Using Temporal Convolutional Models

Two-Stream Temporal Convolutional Network for Dynamic Facial Attractiveness Prediction

MFI: Multi-Range Feature Interchange for Video Action Recognition

Global Feature Aggregation for Accident Anticipation

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

TSMSAN: A Three-Stream Multi-Scale Attentive Network for Video Saliency Detection

Facial Expression Recognition Using Residual Masking Network

Attention-Driven Body Pose Encoding for Human Activity Recognition

Identity-Aware Facial Expression Recognition in Compressed Video

Deep Multi-Task Learning for Facial Expression Recognition and Synthesis Based on Selective Feature Sharing

Depth Videos for the Classification of Micro-Expressions

A Grid-Based Representation for Human Action Recognition

What and How? Jointly Forecasting Human Action and Pose

A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition

2D Deep Video Capsule Network with Temporal Shift for Action Recognition

ACCLVOS: Atrous Convolution with Spatial-Temporal ConvLSTM for Video Object Segmentation

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Context Matters: Self-Attention for Sign Language Recognition

Real-Time Driver Drowsiness Detection Using Facial Action Units

Attention Pyramid Module for Scene Recognition

RWF-2000: An Open Large Scale Video Database for Violence Detection

Exploring Spatial-Temporal Representations for fNIRS-based Intimacy Detection via an Attention-enhanced Cascade Convolutional Recurrent Neural Network

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Facial Expression Recognition by Using a Disentangled Identity-Invariant Expression Representation

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

Global-Local Attention Network for Semantic Segmentation in Aerial Images

TinyVIRAT: Low-Resolution Video Action Recognition

Attentive Visual Semantic Specialized Network for Video Captioning

Dual-Attention Guided Dropblock Module for Weakly Supervised Object Localization

Context Visual Information-Based Deliberation Network for Video Captioning

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Audio-Visual Speech Recognition Using a Two-Step Feature Fusion Strategy

Learning Semantic Representations Via Joint 3D Face Reconstruction and Facial Attribute Estimation

Wavelet Attention Embedding Networks for Video Super-Resolution

Not 3D Re-ID: Simple Single Stream 2D Convolution for Robust Video Re-Identification

MA-LSTM: A Multi-Attention Based LSTM for Complex Pattern Extraction

Early Wildfire Smoke Detection in Videos

RMS-Net: Regression and Masking for Soccer Event Spotting

Automatic Annotation of Corpora for Emotion Recognition through Facial Expressions Analysis