ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Toward Text-Independent Cross-Lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset

Yi-Chieh Wu, Wen-Hung Liao

Auto-TLDR; Cross-lingual Speech for Biometric Recognition

Abstract

Over 40% of the world's population is bilingual. Existing speaker identification/verification systems, however, assume the same language for both the enrollment and recognition stages. In this work, we investigate the feasibility of employing multilingual speech for biometric applications. We establish a dataset containing audio recorded in English, Mandarin and Taiwanese. Three acoustic features, namely i-vector, d-vector and x-vector, are evaluated for both speaker verification (SV) and speaker identification (SI) tasks. Preliminary experimental results indicate that x-vector achieves the best overall performance. Additionally, the model trained with hybrid data demonstrates the highest accuracy, at the cost of additional data collection effort. In SI tasks, we obtained over 91% cross-lingual accuracy with all models using 3-second audio. In SV tasks, the equal error rate (EER) on cross-lingual tests is at most 6.52%, which is observed for the model trained on the English corpus. The outcome suggests the feasibility of adopting cross-lingual speech in building text-independent speaker recognition systems.

Similar papers

Detection of Calls from Smart Speaker Devices

Vinay Maddali, David Looney, Kailash Patil

Auto-TLDR; Distinguishing Between Smart Speaker and Cell Devices Using Only the Audio Using a Feature Set

Hybrid Network for End-To-End Text-Independent Speaker Identification

DenseRecognition of Spoken Languages

Spatial Bias in Vision-Based Voice Activity Detection

End-To-End Triplet Loss Based Emotion Embedding System for Speech Emotion Recognition

ResMax: Detecting Voice Spoofing Attacks with Residual Network and Max Feature Map

Audio-Video Detection of the Active Speaker in Meetings

Influence of Event Duration on Automatic Wheeze Classification

Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

Person Recognition with HGR Maximal Correlation on Multimodal Data

Ballroom Dance Recognition from Audio Recordings

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

Three-Dimensional Lip Motion Network for Text-Independent Speaker Recognition

Which are the factors affecting the performance of audio surveillance systems?

Mood Detection Analyzing Lyrics and Audio Signal Based on Deep Learning Architectures

Optimal Strategies for Comparing Covariates to Solve Matching Problems

Digit Recognition Applied to Reconstructed Audio Signals Using Deep Learning

Handwritten Signature and Text Based User Verification Using Smartwatch

Automatic Annotation of Corpora for Emotion Recognition through Facial Expressions Analysis

The Application of Capsule Neural Network Based CNN for Speech Emotion Recognition

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Improving Mix-And-Separate Training in Audio-Visual Sound Source Separation with an Object Prior

Audio-Visual Speech Recognition Using a Two-Step Feature Fusion Strategy

Video Episode Boundary Detection with Joint Episode-Topic Model

Electroencephalography Signal Processing Based on Textural Features for Monitoring the Driver’s State by a Brain-Computer Interface

Trajectory-User Link with Attention Recurrent Networks

Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

Audio-Based Near-Duplicate Video Retrieval with Audio Similarity Learning

Enhanced User Interest and Expertise Modeling for Expert Recommendation

Cross-Lingual Text Image Recognition Via Multi-Task Sequence to Sequence Learning

Cross-People Mobile-Phone Based Airwriting Character Recognition

Responsive Social Smile: A Machine-Learning Based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening

Tackling Contradiction Detection in German Using Machine Translation and End-To-End Recurrent Neural Networks

Segmenting Messy Text: Detecting Boundaries in Text Derived from Historical Newspaper Images

One-Shot Learning for Acoustic Identification of Bird Species in Non-Stationary Environments

Memetic Evolution of Training Sets with Adaptive Radial Basis Kernels for Support Vector Machines

Attentive Visual Semantic Specialized Network for Video Captioning

Feature Engineering and Stacked Echo State Networks for Musical Onset Detection

Learning Neural Textual Representations for Citation Recommendation

Robust Audio-Visual Speech Recognition Based on Hybrid Fusion

Learning Metric Features for Writer-Independent Signature Verification Using Dual Triplet Loss

PIN: A Novel Parallel Interactive Network for Spoken Language Understanding

Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes

Assessing the Severity of Health States Based on Social Media Posts

Multi-Attribute Learning with Highly Imbalanced Data

Exploring Seismocardiogram Biometrics with Wavelet Transform

Are Multiple Cross-Correlation Identities Better Than Just Two? Improving the Estimate of Time Differences-Of-Arrivals from Blind Audio Signals

Unsupervised Co-Segmentation for Athlete Movements and Live Commentaries Using Crossmodal Temporal Proximity