ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Feature Engineering and Stacked Echo State Networks for Musical Onset Detection

Peter Steiner, Azarakhsh Jalalvand, Simon Stone, Peter Birkholz

Auto-TLDR; Echo State Networks for Onset Detection in Music Analysis

In music analysis, one of the most fundamental tasks is note onset detection - detecting the beginning of new note events. As the target function of onset detection is related to other tasks, such as beat tracking or tempo estimation, onset detection is the basis for such related tasks. Furthermore, it can help to improve Automatic Music Transcription (AMT). Typically, different approaches for onset detection follow a similar outline: an audio signal is transformed into an Onset Detection Function (ODF), which should have rather low values (i.e., close to zero) most of the time but pronounced peaks at onset times, which can then be extracted by applying peak-picking algorithms to the ODF. In recent years, several kinds of neural networks have been used successfully to compute the ODF from feature vectors. Currently, Convolutional Neural Networks (CNNs) define the state of the art. In this paper, we build upon an alternative approach to obtain an ODF with Echo State Networks (ESNs), which have achieved results comparable to CNNs in several tasks, such as speech and image recognition. In contrast to the typical iterative training procedures of deep learning architectures, such as CNNs or networks consisting of Long Short-Term Memory (LSTM) cells, in ESNs only a very small part of the weights is trained, easily and in one shot, using linear regression. By comparing the performance of several feature extraction methods and pre-processing steps, and by introducing a new way to stack ESNs, we expand our previous approach to achieve results that fall between a bidirectional LSTM network and a CNN, with relative improvements of 1.8% and -1.4%, respectively. For the evaluation, we used exactly the same 8-fold cross validation setup as for the reference results.
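
The outline in the abstract (feature vectors, then an ESN whose readout is fitted in one shot by linear regression, then peak picking on the resulting ODF) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single reservoir rather than stacked ESNs, omits the feature engineering, and every function name and hyperparameter (`spectral_radius`, `leak`, `ridge`, `threshold`) is an assumption chosen for illustration.

```python
import numpy as np


def make_reservoir(n_inputs, n_reservoir, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_inputs))
    w_res = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_reservoir))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    return w_in, w_res


def run_reservoir(features, w_in, w_res, leak=1.0):
    """Feed a (frames x features) matrix through the reservoir, frame by frame."""
    state = np.zeros(w_res.shape[0])
    states = np.zeros((features.shape[0], w_res.shape[0]))
    for t, u in enumerate(features):
        pre_activation = w_in @ u + w_res @ state
        state = (1.0 - leak) * state + leak * np.tanh(pre_activation)
        states[t] = state
    return states


def add_bias(states):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([states, np.ones((states.shape[0], 1))])


def train_readout(states, targets, ridge=1e-3):
    """Fit the output weights in one shot with ridge (regularized linear) regression."""
    x = add_bias(states)
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def predict_odf(states, w_out):
    """Map reservoir states to a one-dimensional onset detection function."""
    return add_bias(states) @ w_out


def pick_peaks(odf, threshold=0.5):
    """Return frame indices that are local maxima at or above the threshold."""
    peaks = []
    for t in range(1, len(odf) - 1):
        if odf[t] >= threshold and odf[t] > odf[t - 1] and odf[t] >= odf[t + 1]:
            peaks.append(t)
    return peaks
```

Only `train_readout` involves learning, and it solves a single linear system; the reservoir weights stay at their random initial values. A typical call sequence would be `make_reservoir`, `run_reservoir` on the training features, `train_readout` against a binary onset target, then `predict_odf` and `pick_peaks` on unseen audio features.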

Similar papers

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

Auto-TLDR; Environmental Sound Classification with Short-Time Fourier Transform Spectrograms

Which are the factors affecting the performance of audio surveillance systems?

Mood Detection Analyzing Lyrics and Audio Signal Based on Deep Learning Architectures

DenseRecognition of Spoken Languages

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

Detection of Calls from Smart Speaker Devices

Hybrid Network for End-To-End Text-Independent Speaker Identification

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Ballroom Dance Recognition from Audio Recordings

The Application of Capsule Neural Network Based CNN for Speech Emotion Recognition

Radar Image Reconstruction from Raw ADC Data Using Parametric Variational Autoencoder with Domain Adaptation

Adversarially Training for Audio Classifiers

Audio-Based Near-Duplicate Video Retrieval with Audio Similarity Learning

Location Prediction in Real Homes of Older Adults based on K-Means in Low-Resolution Depth Videos

Exploring Spatial-Temporal Representations for fNIRS-based Intimacy Detection via an Attention-enhanced Cascade Convolutional Recurrent Neural Network

Classification of Intestinal Gland Cell-Graphs Using Graph Neural Networks

Wireless Localisation in WiFi Using Novel Deep Architectures

Regularized Flexible Activation Function Combinations for Deep Neural Networks

Emerging Relation Network and Task Embedding for Multi-Task Regression Problems

Improving Gravitational Wave Detection with 2D Convolutional Neural Networks

End-To-End Triplet Loss Based Emotion Embedding System for Speech Emotion Recognition

Neuron-Based Network Pruning Based on Majority Voting

Influence of Event Duration on Automatic Wheeze Classification

MEG: Multi-Evidence GNN for Multimodal Semantic Forensics

Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

ResMax: Detecting Voice Spoofing Attacks with Residual Network and Max Feature Map

Learning Stable Deep Predictive Coding Networks with Weight Norm Supervision

AttendAffectNet: Self-Attention Based Networks for Predicting Affective Responses from Movies

RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Electroencephalography Signal Processing Based on Textural Features for Monitoring the Driver’s State by a Brain-Computer Interface

Spatial Bias in Vision-Based Voice Activity Detection

Space-Time Domain Tensor Neural Networks: An Application on Human Pose Classification

Deep Learning on Active Sonar Data Using Bayesian Optimization for Hyperparameter Tuning

A Low-Complexity R-Peak Detection Algorithm with Adaptive Thresholding for Wearable Devices

Resource-efficient DNNs for Keyword Spotting using Neural Architecture Search and Quantization

Improving Mix-And-Separate Training in Audio-Visual Sound Source Separation with an Object Prior

Recursive Convolutional Neural Networks for Epigenomics

Dimensionality Reduction for Data Visualization and Linear Classification, and the Trade-Off between Robustness and Classification Accuracy

Trajectory-User Link with Attention Recurrent Networks

GazeMAE: General Representations of Eye Movements Using a Micro-Macro Autoencoder

Deep Composer: A Hash-Based Duplicative Neural Network for Generating Multi-Instrument Songs

One-Shot Learning for Acoustic Identification of Bird Species in Non-Stationary Environments

Signal Generation Using 1d Deep Convolutional Generative Adversarial Networks for Fault Diagnosis of Electrical Machines

Hierarchical Multimodal Attention for Deep Video Summarization

Verifying the Causes of Adversarial Examples

Deep Transfer Learning for Alzheimer’s Disease Detection

Transfer Learning with Graph Neural Networks for Short-Term Highway Traffic Forecasting

Recognizing Bengali Word Images - A Zero-Shot Learning Perspective