ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Heuristics for Evaluation of AI Generated Music

Edmund Dervakos, Giorgos Filandrianos, Giorgos Stamou

Auto-TLDR; Evaluation of generative models in the symbolic music domain using the circle of fifths

Evaluation of generative AI is a difficult problem, especially in artistic domains such as music, in which the aesthetic qualities of generated samples are to an extent subjective. The most widely accepted method for evaluating such models is to conduct a survey of users, which is a resource-intensive process. In this work we propose a framework for cheaply evaluating generative models in the symbolic music domain by utilizing tools from music theory, such as the circle of fifths, with the goal of producing quantifiable metrics which reflect the "musicality" of a written score or MIDI file.
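As a rough illustration of how the circle of fifths can yield a number from a symbolic score, the sketch below computes the average circle-of-fifths distance between consecutive notes of a melody given as MIDI pitch numbers. This is not the metric proposed in the paper, which the abstract does not specify; the function names and the choice of consecutive-note distance are assumptions made purely for illustration.

```python
from typing import Sequence


def fifths_position(pitch_class: int) -> int:
    """Position of a pitch class (0 = C, ..., 11 = B) on the circle of fifths.

    A perfect fifth spans 7 semitones, and 7 is its own inverse modulo 12,
    so multiplying by 7 maps C -> 0, G -> 1, D -> 2, ..., F -> 11.
    """
    return (pitch_class * 7) % 12


def fifths_distance(pitch_a: int, pitch_b: int) -> int:
    """Shortest number of steps between two pitches around the circle (0 to 6)."""
    diff = abs(fifths_position(pitch_a % 12) - fifths_position(pitch_b % 12))
    return min(diff, 12 - diff)


def mean_fifths_distance(midi_pitches: Sequence[int]) -> float:
    """Hypothetical heuristic: mean circle-of-fifths distance between
    consecutive notes. Lower values mean the melody moves between
    harmonically close pitch classes."""
    if len(midi_pitches) < 2:
        return 0.0
    steps = [fifths_distance(p, q) for p, q in zip(midi_pitches, midi_pitches[1:])]
    return sum(steps) / len(steps)
```

For example, `mean_fifths_distance([60, 67, 62])` (C, G, D) averages two single-step moves, while a C to F-sharp leap sits at the maximum distance of 6.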

Similar papers

Deep Composer: A Hash-Based Duplicative Neural Network for Generating Multi-Instrument Songs

Jacob Galajda, Brandon Royal, Kien Hua

Auto-TLDR; Deep Composer for Intelligence Duplication

Music is one of the most appreciated forms of art, and generating songs has become a popular subject in the artificial intelligence community. There are various networks that can produce pleasant-sounding music, but no model has been able to produce music that duplicates the style of a specific artist or artists. In this paper, we extend a previous single-instrument model: the Deep Composer, a model we believe to be capable of achieving this. Deep Composer originates from the Deep Segment Hash Learning (DSHL) single-instrument model and is designed to learn how a specific artist would place individual segments of music together rather than create music similar to a specific genre. To the best of our knowledge, no other network has been designed to achieve this. For these reasons, we introduce a new field of study, Intelligence Duplication (ID). AI research generally focuses on developing techniques to mimic universal intelligence. Intelligence Duplication (ID) research focuses on techniques to artificially duplicate or clone a specific mind such as Mozart. Additionally, we present a new retrieval algorithm, Segment Barrier Retrieval (SBR), to improve retrieval accuracy within the hash-space as opposed to the more traditionally used feature-space. SBR prevents retrieval branches from entering areas of low density within the hash-space, a phenomenon we identify and label as segment sparsity. To test our Deep Composer and the effectiveness of SBR, we evaluate various models with different SBR threshold values and conduct qualitative surveys for each model. The survey results indicate that our Deep Composer model is capable of learning music generation from multiple composers. Our extended Deep Composer model provides a more suitable platform for Intelligence Duplication. Future work can apply this platform to duplicate great composers such as Mozart or allow them to collaborate in the virtual space.

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans

Auto-TLDR; Exploring the effect of spectrogram reconstruction loss on automatic music transcription

Most state-of-the-art automatic music transcription (AMT) models break down the main transcription task into sub-tasks such as onset prediction and offset prediction and train them with onset and offset labels. These predictions are then concatenated and used as the input to train another model with the pitch labels to obtain the final transcription. We attempt to use only the pitch labels (together with a spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks. In this paper, we do not aim at achieving state-of-the-art transcription accuracy; instead, we explore the effect that spectrogram reconstruction has on our AMT model. Our proposed model consists of two U-nets: the first U-net transcribes the spectrogram into a posteriorgram, and the second U-net transforms the posteriorgram back into a spectrogram. A reconstruction loss is applied between the original spectrogram and the reconstructed spectrogram to constrain the second U-net to focus only on reconstruction. We train our model on different datasets including MAPS, MAESTRO, and MusicNet. Our experiments show that adding the reconstruction loss can generally improve the note-level transcription accuracy when compared to the same model without the reconstruction part. Moreover, it can also boost the frame-level precision above that of the state-of-the-art models. The feature maps learned by our U-net contain grid-like structures (not present in the baseline model), which implies that in the presence of the reconstruction loss, the model is probably trying to count along both the time and frequency axes, resulting in a higher note-level transcription accuracy.
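The training objective described above, a transcription term on the posteriorgram plus a reconstruction term on the spectrogram, can be sketched as follows. The abstract does not state which loss functions or weighting the authors use, so the binary cross-entropy and mean squared error terms, the `recon_weight` parameter, and all function names here are assumptions for illustration only. Inputs are plain nested lists indexed as frames by bins.

```python
import math
from typing import List

Matrix = List[List[float]]


def binary_cross_entropy(pred: Matrix, target: Matrix, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def mean_squared_error(a: Matrix, b: Matrix) -> float:
    """Mean squared difference between two equally shaped matrices."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def combined_loss(posteriorgram: Matrix, pitch_labels: Matrix,
                  reconstructed: Matrix, spectrogram: Matrix,
                  recon_weight: float = 1.0) -> float:
    """Transcription loss on the first network's output plus a weighted
    reconstruction loss on the second network's output (assumed form)."""
    transcription = binary_cross_entropy(posteriorgram, pitch_labels)
    reconstruction = mean_squared_error(reconstructed, spectrogram)
    return transcription + recon_weight * reconstruction
```

Setting `recon_weight=0.0` reduces this to the baseline without the reconstruction part, which is the comparison the abstract reports.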

Adversarial Training for Aspect-Based Sentiment Analysis with BERT

Akbar Karimi, Andrea Prati, Leonardo Rossi

Auto-TLDR; Adversarial Training of BERT for Aspect-Based Sentiment Analysis

Mood Detection Analyzing Lyrics and Audio Signal Based on Deep Learning Architectures

On the Evaluation of Generative Adversarial Networks by Discriminative Models

PIN: A Novel Parallel Interactive Network for Spoken Language Understanding

Trajectory-User Link with Attention Recurrent Networks

CardioGAN: An Attention-Based Generative Adversarial Network for Generation of Electrocardiograms

An Unsupervised Approach towards Varying Human Skin Tone Using Generative Adversarial Networks

Text Synopsis Generation for Egocentric Videos

S2I-Bird: Sound-To-Image Generation of Bird Species Using Generative Adversarial Networks

GCNs-Based Context-Aware Short Text Similarity Model

Tackling Contradiction Detection in German Using Machine Translation and End-To-End Recurrent Neural Networks

Generating Private Data Surrogates for Vision Related Tasks

Learning Neural Textual Representations for Citation Recommendation

Ballroom Dance Recognition from Audio Recordings

GAP: Quantifying the Generative Adversarial Set and Class Feature Applicability of Deep Neural Networks

Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability

Graph Discovery for Visual Test Generation

Sequential Domain Adaptation through Elastic Weight Consolidation for Sentiment Analysis

Signal Generation Using 1d Deep Convolutional Generative Adversarial Networks for Fault Diagnosis of Electrical Machines

Edge-Aware Graph Attention Network for Ratio of Edge-User Estimation in Mobile Networks

Visual Oriented Encoder: Integrating Multimodal and Multi-Scale Contexts for Video Captioning

Transformer Networks for Trajectory Forecasting

Transformer Reasoning Network for Image-Text Matching and Retrieval

Assessing the Severity of Health States Based on Social Media Posts

AttendAffectNet: Self-Attention Based Networks for Predicting Affective Responses from Movies

Emerging Relation Network and Task Embedding for Multi-Task Regression Problems

Scientific Document Summarization using Citation Context and Multi-objective Optimization

Local Facial Attribute Transfer through Inpainting

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Information Graphic Summarization Using a Collection of Multimodal Deep Neural Networks

Equation Attention Relationship Network (EARN): A Geometric Deep Metric Framework for Learning Similar Math Expression Embedding

SAGE: Sequential Attribute Generator for Analyzing Glioblastomas Using Limited Dataset

Automatic Student Network Search for Knowledge Distillation

Future Urban Scenes Generation through Vehicles Synthesis

Context Visual Information-Based Deliberation Network for Video Captioning

Few-Shot Font Generation with Deep Metric Learning

Let's Play Music: Audio-Driven Performance Video Generation

Reducing the Variance of Variational Estimates of Mutual Information by Limiting the Critic's Hypothesis Space to RKHS

The Role of Cycle Consistency for Generating Better Human Action Videos from a Single Frame

Mutual Information Based Method for Unsupervised Disentanglement of Video Representation

A Quantitative Evaluation Framework of Video De-Identification Methods

Feature Engineering and Stacked Echo State Networks for Musical Onset Detection

Hierarchical Multimodal Attention for Deep Video Summarization

KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding

Adversarial Knowledge Distillation for a Compact Generator

Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes