ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion

Mariana-Iuliana Georgescu, Radu Ionescu

Auto-TLDR; Knowledge Distillation for Facial Expression Recognition under Occlusion

In this paper, we study the task of facial expression recognition under strong occlusion. We are particularly interested in cases where 50% of the face is occluded, e.g. when the subject wears a Virtual Reality (VR) headset. While previous studies show that pre-training convolutional neural networks (CNNs) on fully-visible (non-occluded) faces improves accuracy, we propose to employ knowledge distillation to achieve further improvements. First, we employ the classic teacher-student training strategy, in which the teacher is a CNN trained on fully-visible faces and the student is a CNN trained on occluded faces. Second, we propose a new approach for knowledge distillation based on triplet loss. During training, the goal is to reduce the distance between an anchor embedding, produced by a student CNN that takes occluded faces as input, and a positive embedding (from the same class as the anchor), produced by a teacher CNN trained on fully-visible faces, so that it becomes smaller than the distance between the anchor and a negative embedding (from a different class than the anchor), produced by the student CNN. Third, we propose to combine the distilled embeddings obtained through the classic teacher-student strategy and our novel teacher-student strategy based on triplet loss into a single embedding vector. We conduct experiments on two benchmarks, FER+ and AffectNet, with two CNN architectures, VGG-f and VGG-face, showing that knowledge distillation can bring significant improvements over the state-of-the-art methods designed for occluded faces in the VR setting. Furthermore, we obtain accuracy rates that are quite close to those of the state-of-the-art models that take fully-visible faces as input.
For example, on the FER+ data set, our VGG-face based on concatenated distilled embeddings attains an accuracy rate of 82.75% on lower-half-visible faces, which is only 2.24% below the accuracy rate of a state-of-the-art VGG-13 that is evaluated on fully-visible faces. Given that our model sees only the lower half of the face, we consider this to be a remarkable achievement. In conclusion, we consider that our distilled CNN models can provide useful feedback for the task of recognizing the facial expressions of a person wearing a VR headset.
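
The triplet-based distillation and the embedding concatenation described in the abstract can be illustrated with a minimal sketch. It assumes the standard hinge form of triplet loss with Euclidean distance, which may differ from the exact loss used in the paper. The function names and the `margin` value are hypothetical and chosen only for illustration, and the embeddings are toy vectors rather than CNN outputs.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_distillation_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not confirmed by the abstract).

    anchor:   student embedding of an occluded face
    positive: teacher embedding of a fully-visible face, same class as anchor
    negative: student embedding of an occluded face, different class
    margin:   hypothetical hyperparameter, not taken from the paper
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    # The loss is zero once the positive is closer than the negative by `margin`.
    return max(0.0, d_pos - d_neg + margin)

def concatenate_embeddings(emb_classic, emb_triplet):
    """Join the two distilled embeddings into a single vector."""
    return list(emb_classic) + list(emb_triplet)

# Toy example: the positive is far from the anchor, so the loss is large.
print(triplet_distillation_loss([0.0, 0.0], [3.0, 4.0], [1.0, 0.0]))  # 5.0
# Toy example: the positive coincides with the anchor, so the loss vanishes.
print(triplet_distillation_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]))  # 0.0
```

In a real training loop the loss would be averaged over mini-batches of triplets and back-propagated through the student only. The abstract does not state whether the teacher is kept frozen, so that detail is left open here.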

Similar papers

Unconstrained Facial Expression Recogniton Based on Cascade Decision and Gabor Filters

Yanhong Wu, Lijie Zhang, Guannan Chen, Pablo Navarrete Michelini

Auto-TLDR; Convolutional Neural Network for Facial Expression Recognition under unconstrained natural conditions

Facial Expression Recognition (FER) research with Convolutional Neural Networks (CNNs) has been active, especially under unconstrained natural conditions. From our observation, prior work treats all expressions equally in classification, and the recognition accuracy of some expressions is always higher than that of others. In this paper, we assume that an expression with higher accuracy is easier to recognize, and that the easier expressions hinder the recognition of the harder ones. We therefore propose a novel algorithm for unconstrained FER based on cascade decision and Gabor filters, in which easier expressions are recognized before difficult ones. This simple method trains up to five models that recognize the expression in a given facial image in a cascade. The first model is a binary classifier for Happy, which has the highest accuracy. The second model is a binary classifier for Surprise, which has the second-highest accuracy. The third model is a binary classifier for Neutral, which has the third-highest accuracy. The fourth model classifies Sad, which has the fourth-highest accuracy. The final model is a 3-class classifier for Angry, Disgust and Fear. Gabor filters are included in every model to improve robustness to illumination variations and face poses. Extensive experimental results on several datasets validate the effectiveness of the proposed method. We obtain an accuracy of 77.6% on FER2013 with the final models, outperforming the latest state-of-the-art methods.
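
The cascade order described in this abstract can be sketched as a simple control flow. This is only an illustration under stated assumptions: the classifiers are hypothetical stubs that read a toy `hint` field instead of trained CNNs, the Gabor filtering step is omitted, and the names `cascade_predict`, `make_stub` and `final_stub` do not come from the paper.

```python
# Order of the binary stages, from easiest to hardest expression,
# as listed in the abstract.
CASCADE_ORDER = ["Happy", "Surprise", "Neutral", "Sad"]
FINAL_CLASSES = ["Angry", "Disgust", "Fear"]

def cascade_predict(image, binary_stages, final_classifier):
    """Run the binary stages in order; fall back to the 3-class model."""
    for label, is_label in binary_stages:
        if is_label(image):
            return label  # an earlier, easier stage claimed this image
    return final_classifier(image)

def make_stub(label):
    """Hypothetical stand-in for a trained binary classifier."""
    return lambda image: image.get("hint") == label

def final_stub(image):
    """Hypothetical stand-in for the final 3-class classifier."""
    hint = image.get("hint")
    return hint if hint in FINAL_CLASSES else "Fear"

stages = [(label, make_stub(label)) for label in CASCADE_ORDER]

print(cascade_predict({"hint": "Neutral"}, stages, final_stub))  # Neutral
print(cascade_predict({"hint": "Disgust"}, stages, final_stub))  # Disgust
```

Ordering the stages by accuracy means an image is handed to a harder classifier only after every easier one has rejected it, which is the intuition the abstract gives for the cascade.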

Facial Expression Recognition Using Residual Masking Network

Luan Pham, Vu Huynh, Tuan Anh Tran

Auto-TLDR; Deep Residual Masking for Automatic Facial Expression Recognition

Deep Multi-Task Learning for Facial Expression Recognition and Synthesis Based on Selective Feature Sharing

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

Identity-Aware Facial Expression Recognition in Compressed Video

Video-Based Facial Expression Recognition Using Graph Convolutional Networks

Efficient Online Subclass Knowledge Distillation for Image Classification

Quality-Based Representation for Unconstrained Face Recognition

Learning Emotional Blinded Face Representations

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

Feature-Supervised Action Modality Transfer

Facial Expression Recognition by Using a Disentangled Identity-Invariant Expression Representation

Siamese-Structure Deep Neural Network Recognizing Changes in Facial Expression According to the Degree of Smiling

Distilling Spikes: Knowledge Distillation in Spiking Neural Networks

Automatic Annotation of Corpora for Emotion Recognition through Facial Expressions Analysis

End-To-End Triplet Loss Based Emotion Embedding System for Speech Emotion Recognition

Channel Planting for Deep Neural Networks Using Knowledge Distillation

Knowledge Distillation Beyond Model Compression

Compact CNN Structure Learning by Knowledge Distillation

Responsive Social Smile: A Machine-Learning Based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening

3D Facial Matching by Spiral Convolutional Metric Learning and a Biometric Fusion-Net of Demographic Properties

Interpretable Emotion Classification Using Temporal Convolutional Models

SSDL: Self-Supervised Domain Learning for Improved Face Recognition

Multi-Order Feature Statistical Model for Fine-Grained Visual Categorization

Automatic Student Network Search for Knowledge Distillation

Feature Fusion for Online Mutual Knowledge Distillation

Self-Supervised Learning of Dynamic Representations for Static Images

Knowledge Distillation with a Precise Teacher and Prediction with Abstention

A Boundary-Aware Distillation Network for Compressed Video Semantic Segmentation

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry

DAIL: Dataset-Aware and Invariant Learning for Face Recognition

Depth Videos for the Classification of Micro-Expressions

G-FAN: Graph-Based Feature Aggregation Network for Video Face Recognition

Lightweight Low-Resolution Face Recognition for Surveillance Applications

Exploiting Distilled Learning for Deep Siamese Tracking

Progressive Learning Algorithm for Efficient Person Re-Identification

Not 3D Re-ID: Simple Single Stream 2D Convolution for Robust Video Re-Identification

Rotation Invariant Aerial Image Retrieval with Group Convolutional Metric Learning

Learning Disentangled Representations for Identity Preserving Surveillance Face Camouflage

Real-Time Driver Drowsiness Detection Using Facial Action Units

Building Computationally Efficient and Well-Generalizing Person Re-Identification Models with Metric Learning

MRP-Net: A Light Multiple Region Perception Neural Network for Multi-Label AU Detection

Inner Eye Canthus Localization for Human Body Temperature Screening

Video Face Manipulation Detection through Ensemble of CNNs

Person Recognition with HGR Maximal Correlation on Multimodal Data

A Flatter Loss for Bias Mitigation in Cross-Dataset Facial Age Estimation

An Experimental Evaluation of Recent Face Recognition Losses for Deepfake Detection

Teacher-Student Competition for Unsupervised Domain Adaptation