Digit Recognition Applied to Reconstructed Audio Signals Using Deep Learning

Anastasia-Sotiria Toufa, Constantine Kotropoulos

Auto-TLDR; Compressed Sensing for Digit Recognition in Audio Reconstruction

Poster

Compressed sensing allows signal reconstruction from a few measurements. This work proposes a complete pipeline for digit recognition applied to reconstructed audio signals. The reconstruction procedure exploits the assumption that the original signal lies in the range of a generator: a pretrained generator of a Generative Adversarial Network generates audio digits. A new reconstruction method is proposed that uses only the most active segment of the signal, i.e., the segment with the highest energy. The underlying assumption is that such a segment offers a more compact representation that preserves the meaningful content of the signal. Cases in which the reconstruction produces noise instead of a digit are treated as outliers. To detect and reject them, three unsupervised indicators are used, namely the total energy of the reconstructed signal, the predictions of a one-class Support Vector Machine, and the confidence of a pretrained classifier used for recognition. This classifier is based on neural network architectures and is pretrained on original audio recordings, employing three input representations, i.e., raw audio, spectrogram, and gammatonegram. Experiments analyzing both the quality of reconstruction and the performance of the classifiers in digit recognition demonstrate that the proposed method yields higher reconstruction quality and digit recognition accuracy.
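
As a rough illustration of the most-active-segment idea above, the following Python sketch (our own illustration; the function name, segment length and hop are assumptions, not the authors' code) selects the highest-energy window of a one-dimensional signal; the same energy value can serve as the first of the three outlier indicators.

```python
import numpy as np

def most_active_segment(signal, segment_len, hop=None):
    """Return the highest-energy window of a 1-D numpy signal (illustrative sketch)."""
    hop = hop or segment_len // 2
    best_start, best_energy = 0, -np.inf
    for start in range(0, max(1, len(signal) - segment_len + 1), hop):
        seg = signal[start:start + segment_len]
        energy = float(np.sum(seg ** 2))       # segment energy
        if energy > best_energy:
            best_start, best_energy = start, energy
    # The returned energy can also be thresholded to flag noisy reconstructions as outliers.
    return signal[best_start:best_start + segment_len], best_energy
```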

Similar papers

Hybrid Network for End-To-End Text-Independent Speaker Identification

Wajdi Ghezaiel, Luc Brun, Olivier Lezoray

Auto-TLDR; Text-Independent Speaker Identification with Scattering Wavelet Network and Convolutional Neural Networks

Slides Poster Similar

Deep learning has recently improved the performance of Speaker Identification (SI) systems, and promising results have been obtained with Convolutional Neural Networks (CNNs). This success is mostly driven by the advent of large datasets. However, in the context of commercial applications, collecting large amounts of training data is not always possible. In addition, the robustness of an SI system is adversely affected by short utterances. SI with only a few, short utterances is therefore a challenging problem. In this paper, we propose a novel text-independent speaker identification system that can identify speakers by learning from only a few short training utterances. To achieve this, we combine a CNN with a Scattering Wavelet Network: a two-stage feature extraction framework in which a two-layer wavelet scattering network is coupled with a CNN. The proposed architecture takes variable-length speech segments. To evaluate the effectiveness of the proposed approach, the TIMIT and LibriSpeech datasets are used in the experiments. The conducted experiments show that our hybrid architecture performs successfully for SI, even with a small number and short duration of training samples. Compared with related methods, the obtained results show that the hybrid architecture achieves better performance.

Ballroom Dance Recognition from Audio Recordings

Tomas Pavlin, Jan Cech, Jiri Matas

Auto-TLDR; A CNN-based approach to classify ballroom dances given audio recordings

Slides Poster Similar

We propose a CNN-based approach to classify ten genres of ballroom dances from audio recordings, five Latin and five standard, namely Cha Cha Cha, Jive, Paso Doble, Rumba, Samba, Quickstep, Slow Foxtrot, Slow Waltz, Tango and Viennese Waltz. We compute a spectrogram of the audio signal and treat it as an image that serves as the input of the CNN. The classification is performed independently on 5-second spectrogram segments in a sliding-window fashion, and the results are then aggregated. The method was tested on the following datasets: the publicly available Extended Ballroom dataset collected by Marchand and Peeters (2016), and two YouTube datasets collected by us, one in studio quality and the other, more challenging, recorded on mobile phones. The method achieved accuracies of 93.9%, 96.7% and 89.8%, respectively, and runs in real time. We implemented a web application to demonstrate the proposed method.
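
A minimal sketch of the sliding-window classification and aggregation step described above (our own illustration; the Keras-style `model.predict` call and the window parameters are assumptions, not the authors' code):

```python
import numpy as np

def classify_recording(spectrogram, model, frames_per_5s, hop_frames):
    """Classify 5-second spectrogram windows independently, then aggregate (sketch)."""
    probs = []
    for start in range(0, spectrogram.shape[1] - frames_per_5s + 1, hop_frames):
        window = spectrogram[:, start:start + frames_per_5s]
        # Treat the window as a single-channel image for a Keras-like CNN.
        probs.append(model.predict(window[np.newaxis, ..., np.newaxis])[0])
    # Aggregate by averaging the per-window class probabilities.
    return int(np.argmax(np.mean(probs, axis=0)))
```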

Dense Recognition of Spoken Languages

Jaybrata Chakraborty, Bappaditya Chakraborty, Ujjwal Bhattacharya

Auto-TLDR; DenseNet: A Dense Convolutional Network Architecture for Speech Recognition in Indian Languages

Slides Poster Similar

In the present study, we have, for the first time, considered a large number of Indian languages for recognition from their audio signals of different sources. A dense convolutional network architecture (DenseNet) has been proposed for this classification problem. Dynamic elimination of low-energy frames from the input speech signal has been considered as a preprocessing operation. The mel-spectrogram of the pre-processed speech signal is fed to a DenseNet architecture for recognition of its language. The recognition performance of the proposed architecture has been compared with that of several state-of-the-art deep architectures, which include a traditional convolutional neural network (CNN), multiple ResNet architectures, and CNN-BLSTM and DenseNet-BLSTM hybrid architectures. Additionally, we obtained the recognition performance of a stacked BLSTM architecture fed with different sets of handcrafted features for comparison purposes. Simulations have been performed on two different standard datasets: (i) the IITKGP-MLILSC dataset of news clips in 27 different Indian languages and (ii) the Linguistic Data Consortium (LDC) dataset of telephonic conversations in 5 different Indian languages. The recognition performance of the proposed framework has been found to be consistently and significantly better than that of all other frameworks implemented in this study.
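
A possible reading of the preprocessing described above, sketched with librosa (the energy threshold, frame sizes and mel parameters are our own assumptions, not the paper's settings):

```python
import numpy as np
import librosa

def preprocess(y, sr, frame_len=512, keep_ratio=0.1):
    """Drop low-energy frames, then compute a log mel-spectrogram (illustrative sketch)."""
    # Non-overlapping frames so that concatenating the retained ones is exact.
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=frame_len)
    energy = np.sum(frames ** 2, axis=0)                 # per-frame energy
    keep = energy > keep_ratio * energy.max()            # crude dynamic energy threshold
    y_active = frames[:, keep].T.reshape(-1)             # concatenate the retained frames
    mel = librosa.feature.melspectrogram(y=y_active, sr=sr)
    return librosa.power_to_db(mel)
```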

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Herremans Dorien

Auto-TLDR; Exploring the effect of spectrogram reconstruction loss on automatic music transcription

Slides Similar

Most state-of-the-art automatic music transcription (AMT) models break down the main transcription task into sub-tasks such as onset prediction and offset prediction and train them with onset and offset labels. These predictions are then concatenated together and used as the input to train another model with the pitch labels to obtain the final transcription. We attempt to use only the pitch labels (together with a spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks. In this paper, we do not aim at achieving state-of-the-art transcription accuracy; instead, we explore the effect that spectrogram reconstruction has on our AMT model. Our proposed model consists of two U-nets: the first U-net transcribes the spectrogram into a posteriorgram, and a second U-net transforms the posteriorgram back into a spectrogram. A reconstruction loss is applied between the original spectrogram and the reconstructed spectrogram to constrain the second U-net to focus only on reconstruction. We train our model on different datasets including MAPS, MAESTRO, and MusicNet. Our experiments show that adding the reconstruction loss can generally improve the note-level transcription accuracy when compared to the same model without the reconstruction part. Moreover, it can also boost the frame-level precision above that of the state-of-the-art models. The feature maps learned by our U-net contain grid-like structures (not present in the baseline model), which implies that, in the presence of the reconstruction loss, the model is probably trying to count along both the time and frequency axes, resulting in a higher note-level transcription accuracy.
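
A minimal PyTorch sketch of the combined objective described above (the two U-nets are placeholders, the transcriber is assumed to end in a sigmoid, and the loss weighting is our own assumption):

```python
import torch
import torch.nn.functional as F

def amt_loss(spec, pitch_labels, transcriber, reconstructor, alpha=1.0):
    """Pitch loss plus spectrogram reconstruction loss (sketch; both U-nets are placeholders)."""
    posteriorgram = transcriber(spec)            # first U-net: spectrogram -> posteriorgram in [0, 1]
    spec_hat = reconstructor(posteriorgram)      # second U-net: posteriorgram -> spectrogram
    transcription = F.binary_cross_entropy(posteriorgram, pitch_labels)
    reconstruction = F.mse_loss(spec_hat, spec)  # keeps the posteriorgram informative enough to rebuild the input
    return transcription + alpha * reconstruction
```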

The Application of Capsule Neural Network Based CNN for Speech Emotion Recognition

Xincheng Wen, Kunhong Liu

Auto-TLDR; CapCNN: A Capsule Neural Network for Speech Emotion Recognition

Slides Poster Similar

The abstraction of audio features makes it impossible to fully use the inherent relationships among them. This paper proposes a model that combines a convolutional neural network (CNN) and a capsule neural network (CapsNet), named CapCNN. The advantage of CapCNN is that it addresses time sensitivity while focusing on the overall characteristics of the signal. In this study, we find that CapCNN handles the speech emotion recognition task well. Compared with other state-of-the-art methods, our algorithm shows high performance on the CASIA and EMODB datasets. A detailed analysis confirms that our method provides balanced results across the various classes.

Which are the factors affecting the performance of audio surveillance systems?

Antonio Greco, Antonio Roberto, Alessia Saggese, Mario Vento

Auto-TLDR; Sound Event Recognition Using Convolutional Neural Networks and Visual Representations on MIVIA Audio Events

Slides Similar

Sound event recognition systems are rapidly becoming part of our life, since they can be profitably used in several vertical markets, ranging from audio security applications to scene classification and multi-modal analysis in social robotics. In recent years, a non-negligible part of the scientific community has started to apply Convolutional Neural Networks (CNNs) to image-based representations of the audio stream, due to their successful adoption in almost all computer vision tasks. In this paper, we carry out a detailed benchmark of various widely used CNN architectures and visual representations on a popular dataset, namely the MIVIA Audio Events database. Our analysis is aimed at understanding how these factors affect sound event recognition performance, with a particular focus on the false positive rate, which is very relevant in audio surveillance solutions. In fact, although most of the proposed solutions achieve a high recognition rate, their capability of distinguishing the events-of-interest from the background is often not yet sufficient for real systems and prevents their usage in real applications. Our comprehensive experimental analysis investigates this aspect and allows us to identify useful design guidelines for increasing the specificity of sound event recognition systems.

A Joint Representation Learning and Feature Modeling Approach for One-Class Recognition

Pramuditha Perera, Vishal Patel

Auto-TLDR; Combining Generative Features and One-Class Classification for Effective One-class Recognition

Slides Poster Similar

One-class recognition is traditionally approached either as a representation learning problem or a feature modelling problem. In this work, we argue that both of these approaches have their own limitations; and a more effective solution can be obtained by combining the two. The proposed approach is based on the combination of a generative framework and a one-class classification method. First, we learn generative features using the one-class data with a generative framework. We augment the learned features with the corresponding reconstruction errors to obtain augmented features. Then, we qualitatively identify a suitable feature distribution that reduces the redundancy in the chosen classifier space. Finally, we force the augmented features to take the form of this distribution using an adversarial framework. We test the effectiveness of the proposed method on three one-class classification tasks and obtain state-of-the-art results.

Audio-Video Detection of the Active Speaker in Meetings

Francisco Madrigal, Frederic Lerasle, Lionel Pibre, Isabelle Ferrané

Auto-TLDR; Active Speaker Detection with Visual and Contextual Information from Meeting Context

Slides Poster Similar

Meetings are a common activity that presents certain challenges when creating systems to assist them. Such is the case of active speaker detection, which can provide useful information for human interaction modeling or human-robot interaction. Active speaker detection is mostly done using speech; however, certain visual and contextual information can provide additional insights. In this paper we propose an active speaker detection framework that integrates audiovisual features with social information from the meeting context. The visual cue is processed using a Convolutional Neural Network (CNN) that captures spatio-temporal relationships. We analyze several CNN architectures with both cues: raw pixels (RGB images) and motion (estimated with optical flow). Contextual reasoning is done with an original methodology based on the gaze of all participants. We evaluate our proposal on a public state-of-the-art benchmark, the AMI corpus, and show how the addition of visual and context information improves the performance of active speaker detection.

S2I-Bird: Sound-To-Image Generation of Bird Species Using Generative Adversarial Networks

Joo Yong Shim, Joongheon Kim, Jong-Kook Kim

Auto-TLDR; Generating bird images from sound using conditional generative adversarial networks

Slides Poster Similar

Generating images from sound is a challenging task. This paper proposes a novel deep learning model that generates bird images from their corresponding sound information. Our proposed model includes a sound encoder that extracts suitable feature representations from audio recordings, and it then generates bird images that correspond to the birds' calls using conditional generative adversarial networks (GANs) with auxiliary classifiers. We demonstrate that our model produces better image generation results, outperforming other state-of-the-art methods in a similar context.

Phase Retrieval Using Conditional Generative Adversarial Networks

Tobias Uelwer, Alexander Oberstraß, Stefan Harmeling

Auto-TLDR; Conditional Generative Adversarial Networks for Phase Retrieval

Slides Poster Similar

In this paper, we propose the application of conditional generative adversarial networks to solve various phase retrieval problems. We show that including knowledge of the measurement process at training time leads to an optimization at test time that is more robust to initialization than existing approaches involving generative models. In addition, conditioning the generator network on the measurements enables us to achieve much more detailed results. We empirically demonstrate that these advantages provide meaningful solutions to the Fourier and the compressive phase retrieval problem and that our method outperforms well-established projection-based methods as well as existing methods that are based on neural networks. Like other deep learning methods, our approach is very robust to noise and can therefore be very useful for real-world applications.

Adversarially Training for Audio Classifiers

Raymel Alfonso Sallo, Mohammad Esmaeilpour, Patrick Cardinal

Auto-TLDR; Adversarially Training for Robust Neural Networks against Adversarial Attacks

Slides Poster Similar

In this paper, we investigate the potential effect of adversarial training on the robustness of six advanced deep neural networks against a variety of targeted and non-targeted adversarial attacks. We first show that the ResNet-56 model trained on the 2D representation of the discrete wavelet transform appended with the tonnetz chromagram outperforms other models in terms of recognition accuracy. We then demonstrate the positive impact of adversarial training on this model as well as on other deep architectures against six types of attack algorithms (white- and black-box), at the cost of reduced recognition accuracy and limited adversarial perturbation. We run our experiments on two benchmark environmental sound datasets and show that, without any imposed limitations on the budget allocations for the adversary, the fooling rate of the adversarially trained models can exceed 90%. In other words, adversarial attacks exist at any scale, but they might require higher adversarial perturbations compared to non-adversarially trained models.

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

Auto-TLDR; Environmental Sound Classification with Short-Time Fourier Transform Spectrograms

Slides Poster Similar

Environmental Sound Classification (ESC) is an active research area in the audio domain and has seen a lot of progress in the past years. However, many of the existing approaches achieve high accuracy by relying on domain-specific features and architectures, making it harder to benefit from advances in other fields (e.g., the image domain). Additionally, some of the past successes have been attributed to a discrepancy in how results are evaluated (i.e., on unofficial splits of the UrbanSound8K (US8K) dataset), distorting the overall progression of the field. The contribution of this paper is twofold. First, we present a model that is inherently compatible with mono and stereo sound inputs. Our model is based on simple log-power Short-Time Fourier Transform (STFT) spectrograms and combines them with several well-known approaches from the image domain (i.e., ResNet, Siamese-like networks and attention). We investigate the influence of cross-domain pre-training, architectural changes, and evaluate our model on standard datasets. We find that our model outperforms all previously known approaches in a fair comparison by achieving accuracies of 97.0 % (ESC-10), 91.5 % (ESC-50) and 84.2 % / 85.4 % (US8K mono / stereo). Second, we provide a comprehensive overview of the actual state of the field, by differentiating several previously reported results on the US8K dataset between official and unofficial splits. For better reproducibility, our code (including any re-implementations) is made available.
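
A small sketch of the log-power STFT input representation mentioned above, written in PyTorch (window and hop sizes are our own assumptions):

```python
import torch

def log_power_stft(waveform, n_fft=1024, hop=256, eps=1e-10):
    """Log-power STFT spectrogram; waveform may be (samples,) or (channels, samples)."""
    window = torch.hann_window(n_fft)
    stft = torch.stft(waveform, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    power = stft.abs() ** 2                 # squared magnitude per time-frequency bin
    return torch.log10(power + eps)         # one spectrogram per input channel (mono or stereo)
```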

End-To-End Triplet Loss Based Emotion Embedding System for Speech Emotion Recognition

Puneet Kumar, Sidharth Jain, Balasubramanian Raman, Partha Pratim Roy, Masakazu Iwamura

Auto-TLDR; End-to-End Neural Embedding System for Speech Emotion Recognition

Slides Poster Similar

In this paper, an end-to-end neural embedding system based on triplet loss and residual learning has been proposed for speech emotion recognition. The proposed system learns the embeddings from the emotional information of the speech utterances. The learned embeddings are used to recognize the emotions portrayed by given speech samples of various lengths. The proposed system implements a Residual Neural Network architecture. It is trained using softmax pre-training and a triplet loss function. The weights between the fully connected and embedding layers of the trained network are used to calculate the embedding values. The embedding representations of various emotions are mapped onto a hyperplane, and the angles among them are computed using the cosine similarity. These angles are utilized to classify a new speech sample into its appropriate emotion class. The proposed system has demonstrated 91.67% and 64.44% accuracy while recognizing emotions for the RAVDESS and IEMOCAP datasets, respectively.
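
A minimal sketch of the cosine-similarity classification step described above (our own illustration; the class prototypes are assumed to be precomputed reference embeddings per emotion):

```python
import numpy as np

def classify_by_angle(embedding, class_prototypes):
    """Assign the emotion whose prototype embedding forms the smallest angle with the sample (sketch)."""
    e = embedding / np.linalg.norm(embedding)
    best_class, best_cos = None, -1.0
    for label, proto in class_prototypes.items():
        cos = float(np.dot(e, proto / np.linalg.norm(proto)))   # cosine similarity
        if cos > best_cos:
            best_class, best_cos = label, cos
    angle = np.degrees(np.arccos(np.clip(best_cos, -1.0, 1.0)))  # smallest angle, in degrees
    return best_class, angle
```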

Leveraging Synthetic Subject Invariant EEG Signals for Zero Calibration BCI

Nik Khadijah Nik Aznan, Amir Atapour-Abarghouei, Stephen Bonner, Jason Connolly, Toby Breckon

Auto-TLDR; SIS-GAN: Subject Invariant SSVEP Generative Adversarial Network for Brain-Computer Interface

Slides Similar

Recently, substantial progress has been made in the area of Brain-Computer Interface (BCI) using modern machine learning techniques to decode and interpret brain signals. While Electroencephalography (EEG) has provided a non-invasive method of interfacing with a human brain, the acquired data is often heavily subject and session dependent. This makes seamless incorporation of such data into real-world applications intractable as the subject and session data variance can lead to long and tedious calibration requirements and cross-subject generalisation issues. Focusing on a Steady State Visual Evoked Potential (SSVEP) classification systems, we propose a novel means of generating highly-realistic synthetic EEG data invariant to any subject, session or other environmental conditions. Our approach, entitled the Subject Invariant SSVEP Generative Adversarial Network (SIS-GAN), produces synthetic EEG data from multiple SSVEP classes using a single network. Additionally, by taking advantage of a fixed-weight pre-trained subject classification network, we ensure that our generative model remains agnostic to subject-specific features and thus produces subject-invariant data that can be applied to new previously unseen subjects. Our extensive experimental evaluation demonstrates the efficacy of our synthetic data, leading to superior performance, with improvements of up to 16% in zero-calibration classification tasks when trained using our subject-invariant synthetic EEG signals.

Electroencephalography Signal Processing Based on Textural Features for Monitoring the Driver’s State by a Brain-Computer Interface

Giulia Orrù, Marco Micheletto, Fabio Terranova, Gian Luca Marcialis

Auto-TLDR; One-dimensional Local Binary Pattern Algorithm for Estimating Driver Vigilance in a Brain-Computer Interface System

Slides Poster Similar

In this study, we investigate a textural processing method for the electroencephalography (EEG) signal as an indicator to estimate the driver's vigilance in a hypothetical Brain-Computer Interface (BCI) system. The novelty of the proposed solution lies in employing the one-dimensional Local Binary Pattern (1D-LBP) algorithm for feature extraction from pre-processed EEG data. From the resulting feature vector, the classification is done according to three vigilance classes: awake, tired and drowsy. The claim is that the class transitions can be detected by describing the variations of the micro-patterns' occurrences along the EEG signal. The 1D-LBP is able to describe them by encoding mutual variations of temporally "close" samples of the signal as a short bit-code. Our analysis allows us to conclude that the adoption of the 1D-LBP has led to a significant performance improvement. Moreover, capturing the class transitions from the EEG signal is effective, although the overall performance is not yet good enough to develop a BCI for assessing the driver's vigilance in real environments.
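
A compact sketch of a one-dimensional LBP feature extractor in the spirit of the description above (the neighbourhood radius and the histogram layout are our own assumptions, not the paper's exact configuration):

```python
import numpy as np

def lbp_1d(signal, radius=2):
    """1D-LBP: encode each sample by thresholding its 2*radius neighbours, then histogram the codes."""
    codes = []
    for i in range(radius, len(signal) - radius):
        neighbours = np.concatenate([signal[i - radius:i], signal[i + 1:i + 1 + radius]])
        bits = (neighbours >= signal[i]).astype(int)
        codes.append(int("".join(map(str, bits)), 2))   # short bit-code describing the local micro-pattern
    # Histogram of micro-pattern occurrences, used as the feature vector.
    return np.bincount(np.asarray(codes, dtype=int), minlength=2 ** (2 * radius))
```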

Space-Time Domain Tensor Neural Networks: An Application on Human Pose Classification

Konstantinos Makantasis, Athanasios Voulodimos, Anastasios Doulamis, Nikolaos Doulamis, Nikolaos Bakalos

Auto-TLDR; Tensor-Based Neural Network for Spatiotemporal Pose Classification using Three-Dimensional Skeleton Data

Slides Poster Similar

Recent advances in sensing technologies require the design and development of pattern recognition models capable of processing spatiotemporal data efficiently. In this study, we propose a spatially and temporally aware tensor-based neural network for human pose classification using three-dimensional skeleton data. Our model employs three novel components: first, an input layer capable of constructing highly discriminative spatiotemporal features; second, a tensor fusion operation that produces compact yet rich representations of the data; and third, a tensor-based neural network that processes data representations in their original tensor form. Our model is end-to-end trainable and characterized by a small number of trainable parameters, making it suitable for problems where the annotated data is limited. Experimental evaluation of the proposed model indicates that it can achieve state-of-the-art performance.

Influence of Event Duration on Automatic Wheeze Classification

Bruno M Rocha, Diogo Pessoa, Alda Marques, Paulo Carvalho, Rui Pedro Paiva

Auto-TLDR; Experimental Design of the Non-wheeze Class for Wheeze Classification

Slides Poster Similar

Patients with respiratory conditions typically exhibit adventitious respiratory sounds, such as wheezes. Wheeze events have variable duration. In this work we studied the influence of event duration on wheeze classification, namely how the creation of the non-wheeze class affected the classifiers' performance. First, we evaluated several classifiers on an open access respiratory sound database, with the best one reaching sensitivity and specificity values of 98% and 95%, respectively. Then, by changing one parameter in the design of the non-wheeze class, i.e., event duration, the best classifier only reached sensitivity and specificity values of 53% and 75%, respectively. These results demonstrate the importance of experimental design on the assessment of wheeze classification algorithms' performance.

Detection of Calls from Smart Speaker Devices

Vinay Maddali, David Looney, Kailash Patil

Auto-TLDR; Distinguishing Between Smart Speaker and Cell Devices Using Only the Audio Using a Feature Set

Slides Poster Similar

The ubiquity of smart speakers is increasing, with a growing number of households utilising these devices to make calls over the telephony network. As the technology is typically configured to retain the cellular phone number of the user, it presents challenges in applications where knowledge of the true call origin is required. There are a wide range of makes and models for these devices, as is the case with cell phones, and it is challenging to detect the general category as a smart speaker or cell, independent of the designated phone number. In this paper, we present an approach to differentiate between calls originating from smart speakers and ones from cellular devices using only the audio. We present a feature set that characterises the relevant acoustic information, such as the degree of reverberation and noise, to distinguish between these categories. When evaluated on a dataset spanning multiple models for each device category, as well as different modes-of-usage and microphone-speaker distances, the method yields an Equal Error Rate (EER) of 12.6%.

AttendAffectNet: Self-Attention Based Networks for Predicting Affective Responses from Movies

Thi Phuong Thao Ha, Bt Balamurali, Herremans Dorien, Roig Gemma

Auto-TLDR; AttendAffectNet: A Self-Attention Based Network for Emotion Prediction from Movies

Slides Poster Similar

In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet. We take both audio and video into account and incorporate the relation among multiple modalities by applying self-attention mechanism in a novel manner into the extracted features for emotion prediction. We compare it to the typically temporal integration of the self-attention based model, which in our case, allows to capture the relation of temporal representations of the movie while considering the sequential dependencies of emotion responses. We demonstrate the effectiveness of our proposed architectures on the extended COGNIMUSE dataset [1], [2] and the MediaEval 2016 Emotional Impact of Movies Task [3], which consist of movies with emotion annotations. Our results show that applying the self-attention mechanism on the different audio-visual features, rather than in the time domain, is more effective for emotion prediction. Our approach is also proven to outperform state-of-the-art models for emotion prediction.

Uncertainty-Aware Data Augmentation for Food Recognition

Eduardo Aguilar, Bhalaji Nagarajan, Rupali Khatun, Marc Bolaños, Petia Radeva

Auto-TLDR; Data Augmentation for Food Recognition Using Epistemic Uncertainty

Slides Poster Similar

Food recognition has recently attracted the attention of many researchers. However, high food ambiguity, inter-class variability and intra-class similarity pose a real challenge for deep learning and computer vision algorithms. In order to improve their performance, it is necessary to better understand what the model learns and, from this, to determine the type of data that should be additionally included to be most beneficial to the training procedure. In this paper, we propose a new data augmentation strategy that estimates and uses the epistemic uncertainty to guide the model training. The method follows an active learning framework, where new synthetic images are generated from the hard-to-classify real ones present in the training data, based on the epistemic uncertainty. Hence, it allows the food recognition algorithm to focus on difficult images in order to learn their discriminative features. On the other hand, avoiding data generation from images that do not contribute to the recognition makes the procedure faster and more efficient. We show that the proposed method improves food recognition and provides a better trade-off between micro- and macro-recall measures.

Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes

Andre Mendes, Julian Togelius, Leandro Dos Santos Coelho

Auto-TLDR; Multi-Task Learning and Semi-Supervised Learning for Multi-Stage Processes

Similar

In multi-stage processes, decisions occur in an ordered sequence of stages. Early stages usually have more observations with general information (easier/cheaper to collect), while later stages have fewer observations but more specific data. This situation can be represented by a dual funnel structure, in which the sample size decreases from one stage to the other while the information increases. Training classifiers in this scenario is challenging since information in the early stages may not contain distinct patterns to learn (underfitting). In contrast, the small sample size in later stages can cause overfitting. We address both cases by introducing a framework that combines adversarial autoencoders (AAE), multi-task learning (MTL), and multi-label semi-supervised learning (MLSSL). We improve the decoder of the AAE with an MTL component so it can jointly reconstruct the original input and use feature nets to predict the features for the next stages. We also introduce a sequence constraint in the output of an MLSSL classifier to guarantee the sequential pattern in the predictions. Using real-world data from different domains (selection process, medical diagnosis), we show that our approach outperforms other state-of-the-art methods.

Signal Generation Using 1d Deep Convolutional Generative Adversarial Networks for Fault Diagnosis of Electrical Machines

Russell Sabir, Daniele Rosato, Sven Hartmann, Clemens Gühmann

Auto-TLDR; Large Dataset Generation from Faulty AC Machines using Deep Convolutional GAN

Slides Poster Similar

AC machines may be subjected to different electrical or mechanical faults during their operation. Fault patterns can be detected in the DC current from the machine's E-Drive system with the help of deep or machine learning algorithms. However, deep or machine learning algorithms require large amounts of data for training, and without a large dataset the algorithms fail to generalize or give their optimal performance. Collecting large amounts of data from a faulty machine can be a tedious task. It is expensive and not always possible. In some cases, the machine is completely damaged even before a sufficient amount of data can be collected. Also, data collection from a defective machine may cause permanent damage to the connected system. Therefore, in this paper the problem of small datasets is tackled by presenting a methodology for large-dataset generation using the well-known generative model, the Generative Adversarial Network (GAN). As an example, the stator open circuit fault in a synchronous machine is considered. DC currents from the machine's E-Drive system are measured on different healthy and faulty machines and are used for the training of two 1d DCGANs (Deep Convolutional GANs), one for the current signal from the healthy machine and the other for that from the faulty machine. Conventional GANs are difficult to train; however, in this paper the training parameters of the 1d DCGAN are tuned, which results in an improved training process. The performance of the generator during the training of the 1d DCGAN is evaluated using the Fréchet Inception Distance (FID) metric. The proposed 1d DCGAN model is said to converge when the FID score between the real and generated signals falls below a certain threshold. The generated signals from the trained 1d DCGAN are further evaluated using the PDF (Probability Density Function), frequency-domain analysis and other measures that check for duplication of the real data and their statistical diversity. The trained 1d DCGAN is able to generate DC current signals for building large datasets for the training of deep or machine learning models.

Separation of Aleatoric and Epistemic Uncertainty in Deterministic Deep Neural Networks

Denis Huseljic, Bernhard Sick, Marek Herde, Daniel Kottke

Auto-TLDR; AE-DNN: Modeling Uncertainty in Deep Neural Networks

Slides Poster Similar

Despite the success of deep neural networks (DNN) in many applications, their ability to model uncertainty is still significantly limited. For example, in safety-critical applications such as autonomous driving, it is crucial to obtain a prediction that reflects different types of uncertainty to address life-threatening situations appropriately. In such cases, it is essential to be aware of the risk (i.e., aleatoric uncertainty) and the reliability (i.e., epistemic uncertainty) that comes with a prediction. We present AE-DNN, a model allowing the separation of aleatoric and epistemic uncertainty while maintaining a proper generalization capability. AE-DNN is based on deterministic DNN, which can determine the respective uncertainty measures in a single forward pass. In analyses with synthetic and image data, we show that our method improves the modeling of epistemic uncertainty while providing an intuitively understandable separation of risk and reliability.

Radar Image Reconstruction from Raw ADC Data Using Parametric Variational Autoencoder with Domain Adaptation

Michael Stephan, Thomas Stadelmayer, Avik Santra, Georg Fischer, Robert Weigel, Fabian Lurz

Auto-TLDR; Parametric Variational Autoencoder-based Human Target Detection and Localization for Frequency Modulated Continuous Wave Radar

Slides Poster Similar

This paper presents a parametric variational autoencoder-based human target detection and localization framework working directly with the raw analog-to-digital converter data from the frequency modulated continuous wave radar. We propose a parametrically constrained variational autoencoder, with residual and skip connections, capable of generating the clustered and localized target detections on the range-angle image. Furthermore, to circumvent the problem of training the proposed neural network on all possible scenarios using real radar data, we propose domain adaptation strategies whereby we first train the neural network using ray tracing based model data and then adapt the network to work on real sensor data. This strategy ensures better generalization and scalability of the proposed neural network even though it is trained with limited radar data. We demonstrate the superior detection and localization performance of our proposed solution compared to the conventional signal processing pipeline and earlier state-of-art deep U-Net architecture with range-doppler images as inputs.

Audio-Based Near-Duplicate Video Retrieval with Audio Similarity Learning

Pavlos Avgoustinakis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Andreas L. Symeonidis, Ioannis Kompatsiaris

Auto-TLDR; AuSiL: Audio Similarity Learning for Near-duplicate Video Retrieval

Slides Poster Similar

In this work, we address the problem of audio-based near-duplicate video retrieval. We propose the Audio Similarity Learning (AuSiL) approach that effectively captures temporal patterns of audio similarity between video pairs. For the robust similarity calculation between two videos, we first extract representative audio-based video descriptors by leveraging transfer learning based on a Convolutional Neural Network (CNN) trained on a large scale dataset of audio events, and then we calculate the similarity matrix derived from the pairwise similarity of these descriptors. The similarity matrix is subsequently fed to a CNN network that captures the temporal structures existing within its content. We train our network following a triplet generation process and optimizing the triplet loss function. To evaluate the effectiveness of the proposed approach, we have manually annotated two publicly available video datasets based on the audio duplicity between their videos. The proposed approach achieves very competitive results compared to three state-of-the-art methods. Also, unlike the competing methods, it is very robust for the retrieval of audio duplicates generated with speed transformations.
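
A minimal sketch of the pairwise-similarity-matrix computation described above (our own illustration; each descriptor matrix is assumed to contain one row per video segment):

```python
import numpy as np

def similarity_matrix(desc_a, desc_b):
    """Pairwise cosine similarities between per-segment audio descriptors of two videos (sketch)."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    return a @ b.T   # (segments_a x segments_b) matrix, subsequently fed to the temporal CNN
```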

One-Shot Learning for Acoustic Identification of Bird Species in Non-Stationary Environments

Michelangelo Acconcjaioco, Stavros Ntalampiras

Auto-TLDR; One-shot Learning in the Bioacoustics Domain using Siamese Neural Networks

Slides Poster Similar

This work introduces the one-shot learning paradigm in the computational bioacoustics domain. Even though most of the related literature assumes the availability of data characterizing the entire class dictionary of the problem at hand, that is rarely true, as a habitat's species composition is only known up to a certain extent. Thus, the problem needs to be addressed by methodologies able to cope with non-stationarity. To this end, we propose a framework able to detect changes in the class dictionary and incorporate new classes on the fly. We design a one-shot learning architecture composed of a Siamese Neural Network operating in the logMel spectrogram space. We extensively examine the proposed approach on two datasets of various bird species using suitable figures of merit. Interestingly, such a learning scheme exhibits state-of-the-art performance, while taking into account extreme non-stationarity cases.

Are Multiple Cross-Correlation Identities Better Than Just Two? Improving the Estimate of Time Differences-Of-Arrivals from Blind Audio Signals

Danilo Greco, Jacopo Cavazza, Alessio Del Bue

Auto-TLDR; Improving Blind Channel Identification Using Cross-Correlation Identity for Time Differences-of-Arrivals Estimation

Slides Poster Similar

Given an unknown audio source, the estimation of time differences-of-arrivals (TDOAs) can be efficiently and robustly solved using blind channel identification and exploiting the cross-correlation identity (CCI). Prior "blind" works have improved the estimate of TDOAs by means of different algorithmic solutions and optimization strategies, while always sticking to the case N = 2 microphones. But what if we can obtain a direct improvement in performance by just increasing N? In this paper we try to investigate this direction, showing that, despite the arguable simplicity, this is capable of (sharply) improving upon state-of-the-art blind channel identification methods based on CCI, without modifying the computational pipeline. Inspired by our results, we seek to warm up the community and the practitioners by paving the way (with two concrete, yet preliminary, examples) towards joint approaches in which advances in the optimization are combined with an increased number of microphones, in order to achieve further improvements.
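
For context, a plain cross-correlation TDOA estimate between two microphones is sketched below; note that this is the classical baseline, not the CCI-based blind channel identification method discussed above:

```python
import numpy as np

def tdoa_crosscorr(x1, x2, fs):
    """Baseline TDOA between two microphone signals via plain cross-correlation (illustrative only)."""
    corr = np.correlate(x1, x2, mode="full")
    lag = np.argmax(corr) - (len(x2) - 1)   # lag (in samples) maximising the correlation
    return lag / fs                         # time difference-of-arrival in seconds
```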

Toward Text-Independent Cross-Lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset

Yi-Chieh Wu, Wen-Hung Liao

Auto-TLDR; Cross-lingual Speech for Biometric Recognition

Poster Similar

Over 40% of the world's population is bilingual. Existing speaker identification/verification systems, however, assume the same language type for both the enrollment and recognition stages. In this work, we investigate the feasibility of employing multilingual speech for biometric applications. We establish a dataset containing audio recorded in English, Mandarin and Taiwanese. Three acoustic features, namely i-vector, d-vector and x-vector, have been evaluated for both speaker verification (SV) and identification (SI) tasks. Preliminary experimental results indicate that the x-vector achieves the best overall performance. Additionally, the model trained with hybrid data demonstrates the highest accuracy, at the cost of additional data collection effort. In SI tasks, we obtained over 91% cross-lingual accuracy for all models using 3-second audio. In SV tasks, the EER among cross-lingual tests is at most 6.52%, observed on the model trained on the English corpus. The outcome suggests the feasibility of adopting cross-lingual speech in building text-independent speaker recognition systems.

Combining GANs and AutoEncoders for Efficient Anomaly Detection

Fabio Carrara, Giuseppe Amato, Luca Brombin, Fabrizio Falchi, Claudio Gennaro

Auto-TLDR; CBIGAN: Anomaly Detection in Images with Consistency Constrained BiGAN

Slides Poster Similar

In this work, we propose CBiGAN --- a novel method for anomaly detection in images, where a consistency constraint is introduced as a regularization term in both the encoder and decoder of a BiGAN. Our model exhibits fairly good modeling power and reconstruction consistency capability. We evaluate the proposed method on MVTec AD --- a real-world benchmark for unsupervised anomaly detection on high-resolution images --- and compare against standard baselines and state-of-the-art approaches. Experiments show that the proposed method improves the performance of BiGAN formulations by a large margin and performs comparably to expensive state-of-the-art iterative methods while reducing the computational cost. We also observe that our model is particularly effective in texture-type anomaly detection, as it sets a new state of the art in this category. The code will be publicly released.

Spatial Bias in Vision-Based Voice Activity Detection

Kalin Stefanov, Mohammad Adiban, Giampiero Salvi

Auto-TLDR; Spatial Bias in Vision-based Voice Activity Detection in Multiparty Human-Human Interactions

Poster Similar

We present models for automatic vision-based voice activity detection (VAD) in multiparty human-human interactions that are aimed at complementing the acoustic VAD methods. We provide evidence that this type of vision-based VAD models are susceptible to spatial bias in the datasets. The physical settings of the interaction, usually constant throughout data acquisition, determines the distribution of head poses of the participants. Our results show that when the head pose distributions are significantly different in the training and test sets, the performance of the models drops significantly. This suggests that previously reported results on datasets with a fixed physical configuration may overestimate the generalization capabilities of this type of models. We also propose a number of possible remedies to the spatial bias, including data augmentation, input masking and dynamic features, and provide an in-depth analysis of the visual cues used by our models.

Anticipating Activity from Multimodal Signals

Tiziana Rotondo, Giovanni Maria Farinella, Davide Giacalone, Sebastiano Mauro Strano, Valeria Tomaselli, Sebastiano Battiato

Auto-TLDR; Exploiting Multimodal Signal Embedding Space for Multi-Action Prediction

Slides Poster Similar

Images, videos, audio signals, sensor data, can be easily collected in huge quantity by different devices and processed in order to emulate the human capability of elaborating a variety of different stimuli. Are multimodal signals useful to understand and anticipate human actions if acquired from the user viewpoint? This paper proposes to build an embedding space where inputs of different nature, but semantically correlated, are projected in a new representation space and properly exploited to anticipate the future user activity. To this purpose, we built a new multimodal dataset comprising video, audio, tri-axial acceleration, angular velocity, tri-axial magnetic field, pressure and temperature. To benchmark the proposed multimodal anticipation challenge, we consider classic classifiers on top of deep learning methods used to build the embedding space representing multimodal signals. The achieved results show that the exploitation of different modalities is useful to improve the anticipation of the future activity.

Data Augmentation Via Mixed Class Interpolation Using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery

Hiroshi Sasaki, Chris G. Willcocks, Toby Breckon

Auto-TLDR; C2GMA: A Generative Domain Transfer Model for Non-visible Domain Classification

Slides Poster Similar

Machine learning driven object detection and classification within non-visible imagery has an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer due to the limited quantity and variety of non-visible spectral domain imagery, in contrast to the high data availability of visible-band imagery that readily enables contemporary deep learning driven detection and classification approaches. To address this problem, this paper proposes and evaluates a novel data augmentation approach that leverages the more readily available visible-band imagery via a generative domain transfer model. The model can synthesise large volumes of non-visible domain imagery by image-to-image (I2I) translation from the visible image domain. Furthermore, we show that the generation of interpolated mixed class (non-visible domain) image examples via our novel Conditional CycleGAN Mixup Augmentation (C2GMA) methodology can lead to a significant improvement in the quality of non-visible domain classification tasks that otherwise suffer due to limited data availability. Focusing on classification within the Synthetic Aperture Radar (SAR) domain, our approach is evaluated on a variation of the Statoil/C-CORE Iceberg Classifier Challenge dataset and achieves 75.4% accuracy, demonstrating a significant improvement when compared against traditional data augmentation strategies (Rotation, Mixup, and MixCycleGAN).

Improving Gravitational Wave Detection with 2D Convolutional Neural Networks

Siyu Fan, Yisen Wang, Yuan Luo, Alexander Michael Schmitt, Shenghua Yu

Auto-TLDR; Two-dimensional Convolutional Neural Networks for Gravitational Wave Detection from Time Series with Background Noise

Poster Similar

Sensitive gravitational wave (GW) detectors such as the Laser Interferometer Gravitational-wave Observatory (LIGO) realize the direct observation of GW signals that confirm Einstein's general theory of relativity. However, it remains challenging to quickly detect faint GW signals from a large number of time series with background noise under unknown probability distributions. Traditional methods such as matched filtering generally assume Additive White Gaussian Noise (AWGN) and are far from real-time due to their high computational complexity. To avoid these weaknesses, one-dimensional (1D) Convolutional Neural Networks (CNNs) have been introduced to achieve fast online detection in milliseconds, but they do not sufficiently consider the trade-off between frequency and time features, which is revisited in this paper through data pre-processing and subsequent two-dimensional (2D) CNNs during offline training to improve the online detection sensitivity. In this work, the input data is pre-processed to form a 2D spectrum by the Short-time Fourier transform (STFT), where frequency features are extracted without learning. Then, two 1D convolutions are carried out across the time and frequency axes respectively, and the time-amplitude and frequency-amplitude feature maps are subsequently concatenated in equal proportion, so that the frequency and time features are treated equally as the input of our following two-dimensional CNNs. Our simulations are performed on a generated dataset with uniformly varying SNR (2-17), which combines the GW signal generated by PyCBC and the background noise sampled directly from LIGO. Satisfying the real-time online detection requirement without a noise distribution assumption, the experiments in this paper demonstrate better performance on average compared to 1D CNNs, especially in the cases of lower SNR (4-9).
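
A rough PyTorch sketch of our reading of the dual-axis front end described above (channel sizes and the activation are assumptions; the concatenated features would then be reshaped for the following network):

```python
import torch
import torch.nn as nn

class DualAxisFrontEnd(nn.Module):
    """1-D convolutions along the time and frequency axes of an STFT spectrogram,
    followed by concatenation of the two feature maps (illustrative sketch)."""
    def __init__(self, freq_bins, time_steps, channels=16):
        super().__init__()
        self.time_conv = nn.Conv1d(freq_bins, channels, kernel_size=3, padding=1)   # slides along time
        self.freq_conv = nn.Conv1d(time_steps, channels, kernel_size=3, padding=1)  # slides along frequency

    def forward(self, spec):                                    # spec: (batch, freq_bins, time_steps)
        t = torch.relu(self.time_conv(spec))                    # time-amplitude feature map
        f = torch.relu(self.freq_conv(spec.transpose(1, 2)))    # frequency-amplitude feature map
        return torch.cat([t.flatten(1), f.flatten(1)], dim=1)   # concatenated features for the next stage
```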

Improving Mix-And-Separate Training in Audio-Visual Sound Source Separation with an Object Prior

Quan Nguyen, Simone Frintrop, Timo Gerkmann, Mikko Lauri, Julius Richter

Auto-TLDR; Object-Prior: Learning the 1-to-1 correspondence between visual and audio signals by audio-visual sound source methods

Slides Similar

The performance of an audio-visual sound source separation system is determined by its ability to separate audio sources given images of the sources and the audio mixture. The goal of this study is to investigate the ability to learn the mapping between the sounds and the images of instruments by audio-visual sound source separation methods based on the state-of-the-art PixelPlayer [1]. Theoretical and empirical analyses illustrate that the PixelPlayer is not properly trained to learn the 1-to-1 correspondence between visual and audio signals during its mix-and-separate training process. Based on the insights from this analysis, a weakly-supervised method called Object-Prior is proposed and evaluated on two audio-visual datasets. The experimental results show that the proposed Object-Prior method outperforms the PixelPlayer and other baselines in the audio-visual sound source separation task. It is also more robust against asynchronized data, where the frame and the audio do not come from the same video, and recognizes musical instruments based on their sound with higher accuracy than the PixelPlayer. This indicates that learning the 1-to-1 correspondence between visual and audio features of an instrument improves the effectiveness of audio-visual sound source separation.

Automatic Annotation of Corpora for Emotion Recognition through Facial Expressions Analysis

Alex Mircoli, Claudia Diamantini, Domenico Potena, Emanuele Storti

Auto-TLDR; Automatic annotation of video subtitles on the basis of facial expressions using machine learning algorithms

Slides Poster Similar

The recent diffusion of social networks has made available an unprecedented amount of user-generated content, which may be analyzed in order to determine people's opinions and emotions about a large variety of topics. Research has made many efforts in defining accurate algorithms for analyzing emotions expressed by users in texts; however, their performance often relies on the existence of large annotated datasets, whose current scarcity represents a major issue. The manual creation of such datasets is a costly and time-consuming activity, and hence there is an increasing demand for techniques for the automatic annotation of corpora. In this work we present a methodology for the automatic annotation of video subtitles based on the analysis of the facial expressions of people in videos, with the goal of creating annotated corpora that may be used to train emotion recognition algorithms. Facial expressions are analyzed through machine learning algorithms, on the basis of a set of manually-engineered facial features that are extracted from video frames. The soundness of the proposed methodology has been evaluated through extensive experimentation aimed at determining the performance of each methodological step on real datasets.

Generative Deep-Neural-Network Mixture Modeling with Semi-Supervised MinMax+EM Learning

Nilay Pande, Suyash Awate

Auto-TLDR; Semi-supervised Deep Neural Networks for Generative Mixture Modeling and Clustering

Slides Poster Similar

Deep neural networks (DNNs) for generative mixture modeling typically rely on unsupervised learning that employs hard clustering schemes, or variational learning with loose / approximate bounds, or under-regularized modeling. We propose a novel statistical framework for a DNN mixture model using a single generative adversarial network. Our learning formulation proposes a novel data-likelihood term relying on a well-regularized / constrained Gaussian mixture model in the latent space along with a prior term on the DNN weights. Our min-max learning increases the data likelihood using a tight variational lower bound using expectation maximization (EM). We leverage our min-max EM learning scheme for semi-supervised learning. Results on three real-world datasets demonstrate the benefits of our compact modeling and learning formulation over the state of the art for mixture modeling and clustering.

Single-Modal Incremental Terrain Clustering from Self-Supervised Audio-Visual Feature Learning

Reina Ishikawa, Ryo Hachiuma, Akiyoshi Kurobe, Hideo Saito

Auto-TLDR; Multi-modal Variational Autoencoder for Terrain Type Clustering

Slides Poster Similar

The key to an accurate understanding of terrain is to extract the informative features from the multi-modal data obtained from different devices. Sensors, such as RGB cameras, depth sensors, vibration sensors, and microphones, are used as the multi-modal data. Many studies have explored ways to use them, especially in the robotics field. Some papers have successfully introduced single-modal or multi-modal methods. However, in practice, robots can be faced with extreme conditions; microphones do not work well in the crowded scenes, and an RGB camera cannot capture terrains well in the dark. In this paper, we present a novel framework using the multi-modal variational autoencoder and the Gaussian mixture model clustering algorithm on image data and audio data for terrain type clustering. Our method enables the terrain type clustering even if one of the modalities (either image or audio) is missing at the test-time. We evaluated the clustering accuracy with a conventional multi-modal terrain type clustering method and we conducted ablation studies to show the effectiveness of our approach.

Deep Learning on Active Sonar Data Using Bayesian Optimization for Hyperparameter Tuning

Henrik Berg, Karl Thomas Hjelmervik

Auto-TLDR; Bayesian Optimization for Sonar Operations in Littoral Environments

Slides Poster Similar

Sonar operations in littoral environments may be challenging due to an increased probability of false alarms. Machine learning can be used to train classifiers that are able to filter out most of the false alarms automatically, however, this is a time consuming process, with many hyperparameters that need to be tuned in order to yield useful results. In this paper, Bayesian optimization is used to search for good values for some of the hyperparameters, like topology and training parameters, resulting in performance superior to earlier trial-and-error based training. Additionally, we analyze some of the parameters involved in the Bayesian optimization, as well as the resulting hyperparameter values.
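
A minimal sketch of Bayesian hyperparameter search; the paper does not state which library was used, so Optuna is chosen here purely for illustration, and the objective is a toy stand-in for training and validating the sonar classifier:

```python
import optuna

def objective(trial):
    """Stand-in objective: in practice this would train the sonar classifier with the suggested
    hyperparameters and return its validation score (names and ranges are our own illustration)."""
    n_layers = trial.suggest_int("n_layers", 1, 4)
    n_units = trial.suggest_int("n_units", 32, 512)
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    # Toy surrogate score so the sketch runs end-to-end without the sonar data.
    return -((n_layers - 2) ** 2) - ((n_units - 128) / 128) ** 2 - (lr - 1e-2) ** 2

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```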

Fully Convolutional Neural Networks for Raw Eye Tracking Data Segmentation, Generation, and Reconstruction

Wolfgang Fuhl, Yao Rong, Enkelejda Kasneci

Auto-TLDR; Semantic Segmentation of Eye Tracking Data with Fully Convolutional Neural Networks

Slides Poster Similar

In this paper, we use fully convolutional neural networks for the semantic segmentation of eye tracking data. We also use these networks for reconstruction, and in conjunction with a variational auto-encoder to generate eye movement data. The first improvement of our approach is that no input window is necessary, due to the use of fully convolutional networks and therefore any input size can be processed directly. The second improvement is that the used and generated data is raw eye tracking data (position X, Y and time) without preprocessing. This is achieved by pre-initializing the filters in the first layer and by building the input tensor along the z axis. We evaluated our approach on three publicly available datasets and compare the results to the state of the art.

Feature Engineering and Stacked Echo State Networks for Musical Onset Detection

Peter Steiner, Azarakhsh Jalalvand, Simon Stone, Peter Birkholz

Auto-TLDR; Echo State Networks for Onset Detection in Music Analysis

In music analysis, one of the most fundamental tasks is note onset detection - detecting the beginning of new note events. Since the target function of onset detection is related to other tasks, such as beat tracking and tempo estimation, onset detection forms the basis for them. Furthermore, it can help to improve Automatic Music Transcription (AMT). Typically, different approaches for onset detection follow a similar outline: an audio signal is transformed into an Onset Detection Function (ODF), which should remain low (i.e., close to zero) most of the time but exhibit pronounced peaks at onset times; the onsets can then be extracted by applying peak-picking algorithms to the ODF. In recent years, several kinds of neural networks have been used successfully to compute the ODF from feature vectors. Currently, Convolutional Neural Networks (CNNs) define the state of the art. In this paper, we build upon an alternative approach that obtains the ODF with Echo State Networks (ESNs), which have achieved results comparable to CNNs in several tasks, such as speech and image recognition. In contrast to the typical iterative training procedures of deep learning architectures, such as CNNs or networks consisting of Long Short-Term Memory cells (LSTMs), only a very small part of the ESN weights is trained, in one shot, using linear regression. By comparing the performance of several feature extraction methods and pre-processing steps, and by introducing a new way to stack ESNs, we expand our previous approach to achieve results that fall between a bidirectional LSTM network and a CNN, with relative improvements of 1.8% and -1.4%, respectively. For the evaluation, we used exactly the same 8-fold cross-validation setup as for the reference results.
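
For readers unfamiliar with ESNs, the sketch below shows the core mechanics under simple assumptions: a fixed random reservoir with leaky-integrator updates and a ridge-regression readout trained in one shot. It is a toy, not the paper's stacked architecture, and the feature frames and onset targets are synthetic.

```python
# Minimal ESN sketch: random fixed reservoir, leaky state update, and a
# readout trained in one shot with ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
n_in, n_res, leak, rho = 12, 200, 0.3, 0.9

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))    # rescale spectral radius

def reservoir_states(features):
    """Run features (T, n_in) through the reservoir, return states (T, n_res)."""
    x = np.zeros(n_res)
    states = []
    for u in features:
        pre = np.tanh(W_in @ u + W @ x)
        x = (1 - leak) * x + leak * pre             # leaky integration
        states.append(x.copy())
    return np.array(states)

# Toy data: feature frames and a 0/1 onset target per frame.
feats = rng.normal(size=(500, n_in))
onset_target = (rng.random(500) < 0.05).astype(float)

S = reservoir_states(feats)
readout = Ridge(alpha=1e-2).fit(S, onset_target)    # one-shot linear training
odf = readout.predict(S)                            # onset detection function
print("ODF range:", odf.min(), odf.max())
```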

Computational Data Analysis for First Quantization Estimation on JPEG Double Compressed Images

Sebastiano Battiato, Oliver Giudice, Francesco Guarnera, Giovanni Puglisi

Auto-TLDR; Exploiting Discrete Cosine Transform Coefficients for Multimedia Forensics

The work of multimedia forensics experts consists in answering questions about the integrity of a specific media content and where it comes from. Exploiting any traces left in JPEG double compressed images is often one of the main investigative paths used for these purposes. It is thus fundamental to have tools and algorithms able to reliably estimate the first quantization matrix, in order to proceed further with camera model identification and related tasks. In this paper, a technique based on extensive simulation is proposed, with the aim of inferring the first quantization for a certain number of Discrete Cosine Transform (DCT) coefficients by exploiting local image statistics without using any a-priori knowledge. The method also provides a reliable confidence value for the estimation, which is of great importance for forensic purposes. Experimental results w.r.t. the state of the art demonstrate the effectiveness of the proposed technique both in terms of precision and overall reliability.
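
As a rough sketch of the kind of local DCT statistics such a method works with (not the authors' estimator), the code below computes 8x8 blockwise DCT coefficients of a grayscale image and histograms one low-frequency coefficient across blocks.

```python
# Sketch: gather per-block DCT coefficients from a grayscale image, the raw
# material from which first-quantization statistics can be estimated.
import numpy as np
from scipy.fft import dctn

def blockwise_dct(img, block=8):
    h, w = (np.array(img.shape) // block) * block
    coeffs = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            coeffs.append(dctn(img[i:i + block, j:j + block], norm="ortho"))
    return np.array(coeffs)               # (n_blocks, 8, 8)

img = np.random.default_rng(0).integers(0, 256, size=(64, 64)).astype(float)
C = blockwise_dct(img - 128.0)            # JPEG-style level shift

# Histogram of one low-frequency coefficient across blocks; periodic gaps in
# such histograms are the traces that double compression leaves behind.
hist, edges = np.histogram(C[:, 0, 1], bins=31)
print(hist)
```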

On the Use of Benford's Law to Detect GAN-Generated Images

Nicolo Bonettini, Paolo Bestagini, Simone Milani, Stefano Tubaro

Auto-TLDR; Using Benford's Law to Detect GAN-generated Images from Natural Images

The advent of Generative Adversarial Network (GAN) architectures has given anyone the ability to generate incredibly realistic synthetic imagery. The malicious diffusion of GAN-generated images may lead to serious social and political consequences (e.g., fake news spreading, opinion formation, etc.). It is therefore important to regulate the widespread distribution of synthetic imagery by developing solutions able to detect it. In this paper, we study the possibility of using Benford’s law to discriminate GAN-generated images from natural photographs. Benford’s law describes the distribution of the most significant digit of quantized Discrete Cosine Transform (DCT) coefficients. Extending and generalizing this property, we show that it is possible to extract a compact feature vector from an image. This feature vector can be fed to an extremely simple classifier for GAN-generated image detection, even in data-scarcity scenarios where Convolutional Neural Network (CNN) architectures tend to fail.
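
A minimal sketch of the underlying idea (ours, not the paper's exact feature): compute the empirical first-digit distribution of quantized DCT coefficients and its divergence from the Benford reference p(d) = log10(1 + 1/d). The quantization step and image are toy choices.

```python
# Sketch: first-digit statistics of quantized DCT coefficients vs. Benford's law.
import numpy as np
from scipy.fft import dctn

def first_digits(values):
    v = np.abs(values)
    v = v[v >= 1]                                   # ignore zeros / sub-unit values
    return (v / 10 ** np.floor(np.log10(v))).astype(int)

img = np.random.default_rng(1).integers(0, 256, size=(64, 64)).astype(float)
blocks = [dctn(img[i:i + 8, j:j + 8], norm="ortho")
          for i in range(0, 64, 8) for j in range(0, 64, 8)]
coeffs = np.rint(np.array(blocks) / 4.0)            # toy quantization step of 4

digits = first_digits(coeffs.ravel())
empirical = np.array([(digits == d).mean() for d in range(1, 10)])
benford = np.log10(1 + 1 / np.arange(1, 10))

# A simple divergence between the two distributions could act as one entry of
# the compact feature vector fed to the final (simple) classifier.
kl = np.sum(empirical * np.log((empirical + 1e-12) / benford))
print("first-digit distribution:", np.round(empirical, 3), "KL vs Benford:", kl)
```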

On the Evaluation of Generative Adversarial Networks by Discriminative Models

Amirsina Torfi, Mohammadreza Beyki, Edward Alan Fox

Auto-TLDR; Domain-agnostic GAN Evaluation with Siamese Neural Networks

Generative Adversarial Networks (GANs) can accurately model complex multi-dimensional data and generate realistic samples. However, because they estimate data distributions implicitly, their evaluation is a challenging task. The majority of research efforts tackling this issue have been validated by qualitative visual evaluation; such approaches do not generalize well beyond the image domain, and since many of the proposed evaluation metrics are bound to the vision domain, they are difficult to apply elsewhere. Quantitative measures are necessary to better guide the training and comparison of different GAN models. In this work, we leverage Siamese neural networks to propose a domain-agnostic evaluation metric that: (1) agrees with qualitative human evaluation, (2) is robust to common GAN issues such as mode dropping and invention, and (3) does not require any pretrained classifier. The empirical results in this paper demonstrate the superiority of this method over the popular Inception Score and show that it is competitive with the FID score.
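
One simple way a learned discriminative embedding could be turned into a score is sketched below (a hedged illustration of the general idea, not the paper's metric): embed held-out real samples and generated samples with the same network and compare the two sets by their average embedding distance. The `embed` function is a hypothetical stand-in for one tower of a trained Siamese network.

```python
# Sketch: scoring generated samples against real ones in a shared embedding
# space. `embed` is a hypothetical stand-in for a trained Siamese tower.
import numpy as np

rng = np.random.default_rng(0)

def embed(x):
    # Placeholder embedding network: a fixed random projection.
    W = np.random.default_rng(123).normal(size=(x.shape[1], 16))
    return x @ W

real = rng.normal(loc=0.0, size=(256, 64))
generated = rng.normal(loc=0.3, size=(256, 64))      # slightly off-distribution

e_real, e_gen = embed(real), embed(generated)

def mean_dist(a, b):
    return np.mean(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1))

# Compare the real-vs-generated distance with the spread among real samples;
# a ratio near 1 suggests the two sets are hard to tell apart.
score = mean_dist(e_real, e_gen) / mean_dist(e_real, e_real)
print("relative embedding distance (closer to 1 is better):", round(score, 3))
```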

CardioGAN: An Attention-Based Generative Adversarial Network for Generation of Electrocardiograms

Subhrajyoti Dasgupta, Sudip Das, Ujjwal Bhattacharya

Auto-TLDR; CardioGAN: Generative Adversarial Network for Synthetic Electrocardiogram Signals

The electrocardiogram (ECG) signal is studied to obtain crucial information about the condition of a patient's heart. Machine-learning-based automated medical diagnostic systems that may help evaluate the condition of the heart from this signal must be trained on large volumes of labelled samples, which increases the risk of compromising patients' privacy. To address this issue, the generation of synthetic electrocardiogram signals by learning only the general distributions of the available real training samples has been attempted in the literature. However, these studies did not pay sufficient attention to specific vital details of these signals, such as the P wave, the QRS complex, and the T wave. This shortcoming often results in the generation of unrealistic synthetic signals, such as a signal missing one or more of these components. In the present study, a novel deep generative architecture, termed CardioGAN, based on a generative adversarial network and powered by an effective attention mechanism, has been designed; it is capable of learning the intricate inter-dependencies among the various parts of real samples, leading to the generation of more realistic electrocardiogram signals. It also helps reduce the risk of breaching patients' privacy. Extensive experimentation establishes that the proposed method achieves better performance in generating synthetic electrocardiogram signals than existing methods. The source code will be made available on GitHub.
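
To make the attention mechanism concrete, here is a small self-attention block of the kind often inserted into GAN generators for 1D signals (a generic SAGAN-style sketch under our own assumptions, not CardioGAN's actual architecture).

```python
# Sketch: a SAGAN-style self-attention block for 1D feature maps, the kind of
# module an attention-powered ECG generator could interleave with conv layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention1d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv1d(channels, channels // 8, 1)
        self.key = nn.Conv1d(channels, channels // 8, 1)
        self.value = nn.Conv1d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))    # learned mixing weight

    def forward(self, x):                            # x: (batch, C, T)
        q = self.query(x).transpose(1, 2)            # (batch, T, C//8)
        k = self.key(x)                              # (batch, C//8, T)
        attn = F.softmax(torch.bmm(q, k), dim=-1)    # (batch, T, T)
        v = self.value(x)                            # (batch, C, T)
        out = torch.bmm(v, attn.transpose(1, 2))     # attend over time steps
        return self.gamma * out + x                  # residual connection

block = SelfAttention1d(channels=64)
print(block(torch.randn(2, 64, 256)).shape)          # torch.Size([2, 64, 256])
```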

Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

Stéphane Lathuiliere, Pablo Mesejo, Radu Horaud

Auto-TLDR; Deep Visual Voice Activity Detection with Optical Flow

Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient either because the acoustic signal is difficult to analyze or is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used for learning and for testing V-VAD, lack content variability. We introduce a novel methodology to automatically create and annotate very large datasets in-the-wild, based on combining A-VAD and face detection. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with such a dataset.

Estimation of Clinical Tremor Using Spatio-Temporal Adversarial AutoEncoder

Li Zhang, Vidya Koesmahargyo, Isaac Galatzer-Levy

Auto-TLDR; ST-AAE: Spatio-temporal Adversarial Autoencoder for Clinical Assessment of Hand Tremor Frequency and Severity

Collecting sufficient well-labeled training data is a challenging task in many clinical applications. Besides the tremendous effort required for data collection, clinical assessments are also affected by raters’ variability, which may be significant even among experienced clinicians. The high demand for reproducible and scalable data-driven approaches in these areas necessitates research on learning with limited data. In this work, we propose a spatio-temporal adversarial autoencoder (ST-AAE) for clinical assessment of hand tremor frequency and severity. The ST-AAE integrates spatial and temporal information simultaneously into the original AAE, taking optical flows as inputs. Using only optical flows, irrelevant background or static objects from RGB frames are largely eliminated, so that the AAE is directed to effectively learn key feature representations of the latent space from tremor movements. The ST-AAE was evaluated with both volunteer and clinical data. The volunteer results showed that the ST-AAE improved model performance significantly, with a 15% increase in accuracy. Leave-one-subject-out cross-validation was used to evaluate the accuracy on all 3068 video segments from 28 volunteers; the weighted average of the ROC AUCs is 0.97. These results demonstrate that the ST-AAE model, trained with a small number of subjects, generalizes well to different subjects. In addition, the model trained only on volunteer data was also evaluated on 32 clinical videos from 9 essential tremor patients; the model predictions correlate well with the clinical ratings, with correlation coefficients r = 0.91 and r = 0.98 for in-person ratings and video-watching ratings, respectively.
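
The evaluation protocol is straightforward to reproduce in outline. The sketch below shows leave-one-subject-out cross-validation with scikit-learn on toy features and a toy classifier; it illustrates the bookkeeping only, not the ST-AAE itself.

```python
# Sketch: leave-one-subject-out cross-validation, the protocol used to check
# that a tremor model generalizes across subjects (toy data, toy classifier).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                      # per-segment features
y = rng.integers(0, 2, size=300)                    # tremor present / absent
subjects = rng.integers(0, 28, size=300)            # subject id per segment

aucs, sizes = [], []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    if len(np.unique(y[test_idx])) == 2:            # AUC needs both classes
        aucs.append(roc_auc_score(y[test_idx], scores))
        sizes.append(len(test_idx))

weighted_auc = np.average(aucs, weights=sizes)      # segment-weighted average
print("weighted AUC over held-out subjects:", round(weighted_auc, 3))
```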

How to Define a Rejection Class Based on Model Learning?

Sarah Laroui, Xavier Descombes, Aurelia Vernay, Florent Villiers, Francois Villalba, Eric Debreuve

Auto-TLDR; An innovative learning strategy for supervised classification that is able, by design, to reject a sample as not belonging to any of the known classes

In supervised classification, the learning process typically trains a classifier to optimize the accuracy of classifying data into the classes that appear in the learning set, and only them. While this framework fits many use cases, there are situations where the learning process is knowingly performed using a learning set that only represents the data observed so far among a virtually unconstrained variety of possible samples. It is then crucial to define a classifier that has the ability to reject a sample, i.e., to classify it into a rejection class that has not yet been defined. Although obvious solutions can add this ability a posteriori to a classically learned classifier, a better approach is to account for this requirement directly in the classifier design. In this paper, we propose an innovative learning strategy for supervised classification that is able, by design, to reject a sample as not belonging to any of the known classes. To this end, we model each class as the combination of a probability density function (PDF) and a threshold that is computed with respect to the other classes. Several alternatives are proposed and compared in this framework. A comparison with straightforward approaches is also provided.
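
As a minimal sketch of the PDF-plus-threshold idea (our illustration, not the paper's specific alternatives), the code below models each class with a Gaussian kernel density estimate and rejects samples whose density falls below every per-class threshold; here each threshold is simply a low quantile of the class's own training densities, a simplification of the paper's inter-class thresholds.

```python
# Sketch: per-class density models with rejection. Each class gets a KDE and a
# threshold; a test sample below every threshold goes to the rejection class.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
classes = {
    0: rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
    1: rng.normal(loc=5.0, scale=1.0, size=(200, 2)),
}

models, thresholds = {}, {}
for label, X in classes.items():
    kde = KernelDensity(bandwidth=0.5).fit(X)
    models[label] = kde
    thresholds[label] = np.quantile(kde.score_samples(X), 0.05)

def classify(x, reject_label=-1):
    x = np.atleast_2d(x)
    scores = {c: models[c].score_samples(x)[0] for c in models}
    best = max(scores, key=scores.get)
    return best if scores[best] >= thresholds[best] else reject_label

print(classify([0.2, -0.3]))     # expected: class 0
print(classify([5.1, 4.8]))      # expected: class 1
print(classify([20.0, 20.0]))    # expected: -1 (rejected as unknown)
```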

Video Analytics Gait Trend Measurement for Fall Prevention and Health Monitoring

Lawrence O'Gorman, Xinyi Liu, Md Imran Sarker, Mariofanna Milanova

Auto-TLDR; Towards Health Monitoring of Gait with Deep Learning

We design a video analytics system to measure gait over time and to detect trend and outliers in the data. The purpose is health monitoring, the thesis being that trend in particular can lead to early detection of declining health and be used to prevent accidents such as falls in the elderly. We use the OpenPose deep learning tool to recognize the back and neck angle features of walking people, and we measure speed as well. Trend and outlier statistics are calculated on time series of these features. A challenge in this work is the lack of test data of decaying gait. We first designed experiments to measure the consistency of the system on a healthy population, then analytically altered this real data to simulate gait decay. Results on about 4000 gait samples of 50 people over 3 months showed good separation of healthy-gait subjects from those with trend or outliers, and furthermore the trend measurement was able to detect subtle decay in gait not easily discerned by the human eye.
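
The trend-and-outlier bookkeeping on a feature time series can be illustrated with the sketch below (a simplification under assumed thresholds, not the paper's exact statistics): a linear fit for trend and a robust z-score for outliers.

```python
# Sketch: detect a trend and flag outliers in a time series of a gait feature
# (e.g., daily median walking speed). Slope and z-score thresholds are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(90)
speed = 1.2 - 0.002 * days + rng.normal(scale=0.03, size=90)   # slow decay
speed[40] -= 0.3                                               # one bad day

# Trend: slope of an ordinary least-squares line fit over time.
slope = np.polyfit(days, speed, deg=1)[0]
declining = slope < -0.001          # assumed threshold on speed loss per day

# Outliers: robust z-score based on the median absolute deviation (MAD).
median = np.median(speed)
mad = np.median(np.abs(speed - median))
robust_z = 0.6745 * (speed - median) / mad
outlier_days = days[np.abs(robust_z) > 3.5]

print("slope per day:", round(slope, 5), "declining trend:", declining)
print("outlier days:", outlier_days)
```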