Tomohiro Tanaka

Papers from this author

Unsupervised Sound Source Localization From Audio-Image Pairs Using Input Gradient Map

Tomohiro Tanaka, Takahiro Shinozaki


Auto-TLDR; Unsupervised Sound Localization Using Gradient Method


Humans easily and routinely identify the image region that corresponds to a sound they hear in daily life. We formulate this task as unsupervised sound source localization, which uses no tagged data. Several recently proposed methods use the activations of hidden or output layers of a neural network, such as an attention layer or the feature maps of a convolutional neural network (CNN). We propose a different strategy that obtains the localization map at the input side by applying the widely used input gradient method. It is computationally efficient and, because it does not depend on the network structure, can easily be applied to any existing technique. Taking advantage of this property, we propose a method that combines it with existing methods for higher sound localization performance. Experiments are performed using the Flickr-SoundNet data set. When a pre-trained image front-end is used, the proposed method gives better results than the attention-based method. In the completely unsupervised condition, the gradient method performs comparably to the conventional methods, and the best results are obtained by the combination method.
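The idea of reading a localization map off the input gradient can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_score`, `audio_weight`, and `input_gradient_map` are hypothetical names, the score is a toy function standing in for a trained audio-image matching network, and the gradient is estimated by finite differences, whereas a real network would normally obtain it by backpropagation.

```python
def toy_score(image, audio_weight):
    """Hypothetical audio-image matching score (stand-in for a trained network)."""
    total = 0.0
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            total += audio_weight[r][c] * pixel * pixel
    return total


def input_gradient_map(score_fn, image, eps=1e-4):
    """Estimate |d score / d pixel| for every pixel and scale the map to [0, 1].

    Central finite differences are used here only to keep the sketch
    dependency-free; they are an assumption, not the method of the paper.
    """
    work = [list(row) for row in image]  # copy so the caller's image is untouched
    rows, cols = len(work), len(work[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            original = work[r][c]
            work[r][c] = original + eps
            up = score_fn(work)
            work[r][c] = original - eps
            down = score_fn(work)
            work[r][c] = original
            grad[r][c] = abs(up - down) / (2 * eps)
    peak = max(max(row) for row in grad)
    if peak > 0:
        grad = [[g / peak for g in row] for row in grad]
    return grad


# Toy data: the "sound" depends strongly on pixel (1, 2) and weakly on pixel (0, 0).
image = [[1.0, 1.0, 1.0] for _ in range(3)]
audio_weight = [[0.5, 0.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]]

saliency = input_gradient_map(lambda img: toy_score(img, audio_weight), image)

# The pixel with the largest gradient magnitude is taken as the source location.
source = max(((r, c) for r in range(3) for c in range(3)),
             key=lambda rc: saliency[rc[0]][rc[1]])
print("estimated source pixel:", source)
```

Because the map is computed only from the score and the input, the same function can be applied to any scoring model without touching its internal layers, which mirrors the structure-independence argument in the abstract.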