ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Person Recognition with HGR Maximal Correlation on Multimodal Data

Yihua Liang, Fei Ma, Yang Li, Shao-Lun Huang

Auto-TLDR; A correlation-based multimodal person recognition framework that learns discriminative embeddings of persons by jointly learning visual and audio features

Multimodal person recognition is a common task in video analysis and public surveillance, where information from multiple modalities, such as images and audio extracted from videos, is used to jointly determine the identity of a person. Previous person recognition techniques either use only uni-modal data or consider only shared representations between different input modalities, leaving the extraction of their relationship with identity information to downstream tasks. Furthermore, real-world data often contain noise, which makes recognition more challenging in practical situations. In our work, we propose a novel correlation-based multimodal person recognition framework that is relatively simple but can effectively learn supervised information in multimodal data fusion and resist noise. Specifically, our framework learns discriminative embeddings of persons by jointly learning visual and audio features while maximizing the HGR maximal correlation among the multimodal inputs and persons' identities. Experiments are conducted on a subset of VoxCeleb2. Compared with state-of-the-art methods, the proposed method demonstrates improved accuracy and robustness to noise.
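
For readers unfamiliar with the quantity named in the abstract, the following is a minimal sketch of HGR (Hirschfeld-Gebelein-Rényi) maximal correlation for two discrete variables. It is not the authors' framework, which learns neural embeddings from images and audio; it only illustrates the classical fact that, for a finite joint distribution P(x, y), the HGR maximal correlation equals the second-largest singular value of the matrix with entries P(x, y) / sqrt(P(x) P(y)). The function name `hgr_maximal_correlation` and the toy distributions are assumptions made for illustration.

```python
import numpy as np


def hgr_maximal_correlation(joint):
    """HGR maximal correlation of two discrete variables.

    `joint` is a 2-D array of non-negative weights; entry [i, j] is
    proportional to P(X = i, Y = j). Illustrative helper, not from the paper.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalise to a probability table

    px = joint.sum(axis=1)               # marginal of X
    py = joint.sum(axis=0)               # marginal of Y

    # Drop symbols that never occur so we do not divide by zero.
    keep_x, keep_y = px > 0, py > 0
    joint = joint[np.ix_(keep_x, keep_y)]
    px, py = px[keep_x], py[keep_y]

    # B[i, j] = P(i, j) / sqrt(P(i) P(j)); its largest singular value is 1.
    b = joint / np.sqrt(np.outer(px, py))
    singular_values = np.linalg.svd(b, compute_uv=False)

    # The second singular value is the maximal correlation; if one variable
    # is constant there is no second value and the correlation is 0.
    return float(singular_values[1]) if singular_values.size > 1 else 0.0


# Toy joint distributions (made up for illustration).
independent = np.outer([0.5, 0.5], [0.3, 0.7])
noisy_copy = np.array([[0.4, 0.1], [0.1, 0.4]])
exact_copy = np.diag([0.5, 0.5])

print(hgr_maximal_correlation(independent))
print(hgr_maximal_correlation(noisy_copy))
print(hgr_maximal_correlation(exact_copy))
```

The value is close to 0 for independent variables and reaches 1 when one variable determines the other. For continuous, high-dimensional inputs such as face images and speech, this closed form is not available, which is why learned feature functions are typically used instead.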

Similar papers

Robust Audio-Visual Speech Recognition Based on Hybrid Fusion

Hong Liu, Wenhao Li, Bing Yang

Auto-TLDR; Hybrid Fusion Based AVSR with Residual Networks and Bidirectional Gated Recurrent Unit for Robust Speech Recognition in Noise Conditions

End-To-End Triplet Loss Based Emotion Embedding System for Speech Emotion Recognition

DAIL: Dataset-Aware and Invariant Learning for Face Recognition

Audio-Visual Speech Recognition Using a Two-Step Feature Fusion Strategy

G-FAN: Graph-Based Feature Aggregation Network for Video Face Recognition

Mutual Alignment between Audiovisual Features for End-To-End Audiovisual Speech Recognition

Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification

RGB-Infrared Person Re-Identification Via Image Modality Conversion

Angular Sparsemax for Face Recognition

Building Computationally Efficient and Well-Generalizing Person Re-Identification Models with Metric Learning

Unsupervised Disentangling of Viewpoint and Residues Variations by Substituting Representations for Robust Face Recognition

Three-Dimensional Lip Motion Network for Text-Independent Speaker Recognition

Attentive Part-Aware Networks for Partial Person Re-Identification

Learning Disentangled Representations for Identity Preserving Surveillance Face Camouflage

Cc-Loss: Channel Correlation Loss for Image Classification

Identity-Aware Facial Expression Recognition in Compressed Video

SATGAN: Augmenting Age Biased Dataset for Cross-Age Face Recognition

Single-Modal Incremental Terrain Clustering from Self-Supervised Audio-Visual Feature Learning

Spatial Bias in Vision-Based Voice Activity Detection

Face Image Quality Assessment for Model and Human Perception

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

Learning Emotional Blinded Face Representations

Audio-Video Detection of the Active Speaker in Meetings

3D Facial Matching by Spiral Convolutional Metric Learning and a Biometric Fusion-Net of Demographic Properties

Automatic Annotation of Corpora for Emotion Recognition through Facial Expressions Analysis

AttendAffectNet: Self-Attention Based Networks for Predicting Affective Responses from Movies

Audio-Visual Predictive Coding for Self-Supervised Visual Representation Learning

A Base-Derivative Framework for Cross-Modality RGB-Infrared Person Re-Identification

Progressive Learning Algorithm for Efficient Person Re-Identification

Multi-Label Contrastive Focal Loss for Pedestrian Attribute Recognition

Multi-Level Deep Learning Vehicle Re-Identification Using Ranked-Based Loss Functions

Rethinking ReID: Multi-Feature Fusion Person Re-Identification Based on Orientation Constraints

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Hybrid Network for End-To-End Text-Independent Speaker Identification

Deep Gait Relative Attribute Using a Signed Quadratic Contrastive Loss

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Dual Loss for Manga Character Recognition with Imbalanced Training Data

Self and Channel Attention Network for Person Re-Identification

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Unsupervised Co-Segmentation for Athlete Movements and Live Commentaries Using Crossmodal Temporal Proximity

Progressive Unsupervised Domain Adaptation for Image-Based Person Re-Identification

Ballroom Dance Recognition from Audio Recordings

SoftmaxOut Transformation-Permutation Network for Facial Template Protection

Talking Face Generation Via Learning Semantic and Temporal Synchronous Landmarks

Lightweight Low-Resolution Face Recognition for Surveillance Applications

Deep Top-Rank Counter Metric for Person Re-Identification

Toward Text-Independent Cross-Lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification